The MIM Portal and services are hosted entirely within Azure, with the only access being through the Azure Panel Application.
The issue: customers were reporting connection problems when navigating the portal and submitting requests.
High level design
Contributions : Jose Garza
This was a mystery case that piqued my interest. Even with testing I could not reproduce it, but there was a variable we had not considered: the application proxy, which turned out to be a key player in the issue. In my original lab I only used one AP. When we introduced the second proxy, we saw in the IIS logs that requests were jumping around, even from the same user, which caused the weird behavior.
The App Proxy is a cloud service with many different instances behind the same endpoint (URL). The Azure PaaS load balancer should ensure that the same source IP and source port reach the same proxy instance, but this is not 100% guaranteed.
The request then reaches a connector. Statistically, if you have several connectors, requests that reach different instances can end up at different connectors, which is what we saw in the IIS logs below.
When the request reaches the backend (or a load balancer in between), the connector IP is the one presented as the client IP, which breaks the session. (For security reasons an HTTP client cannot manipulate its source IP, of course, which is why proxies introduced the X-Forwarded-For header to carry the original client address.)
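To make the header's role concrete, here is a minimal Python sketch of how a backend or load balancer could recover the original client address. The function name `original_client_ip` and the sample addresses are hypothetical, and it assumes the left-most entry of X-Forwarded-For is the original client, which is the usual convention but should only be trusted when the header is set by a proxy you control.

```python
from typing import Optional


def original_client_ip(x_forwarded_for: Optional[str], peer_ip: str) -> str:
    """Return the left-most X-Forwarded-For entry, or the peer IP if the header is absent.

    peer_ip is the address of the TCP peer, which behind the App Proxy
    would be the connector rather than the real user.
    """
    if not x_forwarded_for:
        return peer_ip
    # The header is a comma-separated list; each proxy appends the address it saw.
    first = x_forwarded_for.split(",")[0].strip()
    return first or peer_ip


if __name__ == "__main__":
    # Hypothetical addresses: 203.0.113.7 is the user, 10.0.0.9 is a connector.
    print(original_client_ip("203.0.113.7, 10.0.0.9", "10.0.0.9"))
    print(original_client_ip(None, "10.0.0.9"))
```

Without the header, everything behind the connector only ever sees the connector's address, which is exactly the situation described above.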
Once we understood this, it all made sense, including the logs we had seen. The Azure NLB cannot work for us unless we have a single AP, and even then session state is based on IP. Not a great story for HA with smarts.
To see the X-Forwarded-For header:
Open IIS Manager and select Logging.
AAD-AP 9,10 when navigating the site :
So what to do? We have two options, as the Azure load balancer doesn't currently support this kind of advanced session persistence:
1. Update the load balancer to check the X-Forwarded-For header, which is populated with the original client IP (F5, KEMP, etc.).
2. Move to a single-connector solution. This is of course less desirable, since you would need to build an automated system that starts the second connector once the first connector service goes down (for any reason).
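The idea behind option 1 can be sketched in a few lines of Python. This is only an illustration under assumptions, not how KEMP or F5 implement persistence internally (real appliances typically keep a persistence table with timeouts). The helper names `pick_backend` and `route` and the connector addresses are hypothetical; the backend names are the two MIM nodes from this post.

```python
import hashlib
from typing import Optional, Sequence

BACKENDS = ("CORPMIM1", "CORPMIM2")  # the two MIM portal nodes


def pick_backend(key: str, backends: Sequence[str] = BACKENDS) -> str:
    """Map a persistence key to a backend with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def route(peer_ip: str, x_forwarded_for: Optional[str] = None) -> str:
    """Choose a backend, keyed on X-Forwarded-For when present, else on the peer IP."""
    if x_forwarded_for:
        key = x_forwarded_for.split(",")[0].strip()
    else:
        key = peer_ip  # source-IP persistence: this is the connector, not the user
    return pick_backend(key)


if __name__ == "__main__":
    user = "203.0.113.7"  # hypothetical client address
    # The same user arriving through two different (hypothetical) connectors
    # still lands on one node, because the key is the header value.
    print(route("10.0.0.9", user), route("10.0.0.10", user))
```

Keyed on the peer IP alone, the choice of node follows whichever connector happened to carry the request, so a single user's session can bounce between nodes.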
Option 1 is the best solution for enterprise, so in this case I chose KEMP because it is free up to 20 Mbps.
So I deployed the free KEMP load balancer (20 Mbps may be enough); choose your flavor (F5, jetNEXUS, loadbalancer.org). It took me less than an hour.
Once it was loaded, I logged in and configured the service. We used SSL offloading with re-encryption: this lets us terminate the connection, read the X-Forwarded-For header for the persistence options, and then re-encrypt, so everything stays 100% 443/SSL.
After testing, we see that no request made from a single IP is sent to the secondary node (in other words, persistence is working).
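If you would rather check this from log data than by eye, a small Python sketch like the following could do it. It assumes you have already extracted (client IP, node) pairs from the IIS logs of both nodes. The function names and the sample records are hypothetical, not taken from the actual logs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def nodes_per_client(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group the nodes that served each client IP."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for client_ip, node in records:
        seen[client_ip].add(node)
    return dict(seen)


def persistence_violations(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return only the client IPs that were served by more than one node."""
    return {ip: nodes for ip, nodes in nodes_per_client(records).items() if len(nodes) > 1}


if __name__ == "__main__":
    # Hypothetical records: (X-Forwarded-For client IP, node that logged the request).
    sample = [
        ("203.0.113.7", "CORPMIM1"),
        ("203.0.113.7", "CORPMIM1"),
        ("198.51.100.4", "CORPMIM2"),
    ]
    print(persistence_violations(sample) or "persistence holds")
```

An empty result means every client stayed on one node for the whole capture.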
1 is CORPMIM1, 2 is CORPMIM2
Updated solution :