If you have spent any time in Azure Active Directory, chances are you have stumbled across Azure AD Conditional Access. It sits at the very center of Microsoft's Zero Trust approach. At its most basic, it evaluates every sign in to your Azure AD tenant. It takes the different signals that make up that sign in: the location a user is coming from, the health of their device, the roles they hold or the groups they are in, even the application being used to sign in. Once it has all that telemetry, it decides not only whether you are allowed into the tenant, but also which controls you must satisfy to get access. You may have to complete MFA, or your device may need to be compliant. You can block sign ins from particular locations, or require specific applications. When I first looked at Conditional Access I thought of it as a ‘firewall for identity’. While that is somewhat true, it undersells the power of Conditional Access, which can make decisions based on a lot more than a traditional firewall can.
Before we go hunting through our data, let’s take a step back. To make sense of that data, here are a couple of key points about Conditional Access.
- Many policies can apply to a sign in, and the controls from those policies are added together. For instance, say you have two policies that control access to Exchange Online: the first requires MFA and the second requires device compliance. The user must both pass MFA and have a compliant device.
- Each individual policy can have many controls within it, such as MFA and requiring an approved application. They are evaluated in the following order.
  - Multi-factor Authentication
  - Approved Client App/App Protection Policy
  - Managed Device (Compliant, Hybrid Azure AD Join)
  - Custom controls (such as Duo MFA)
  - Session controls (App Enforced, MCAS, Token Lifetime)
- A block policy overrides any allow policy, regardless of controls. If one policy says allow with MFA and another says block, the sign in is blocked.
These are important to note because when we look through our data, we will see multiple policies per sign in. To make this data easier to read, we are going to use the mv-expand operator. The guidance says it “Expands multi-value dynamic arrays or property bags into multiple records”. Well, what does that mean? Let’s look at an example using the KQL playground. This is a demo environment anyone can access. If you log on there, we can look at one sign in event.
SigninLogs
| where CorrelationId == "cadd2fee-a8b0-4daf-9ac8-cc3ae8ebe15b"
| project ConditionalAccessPolicies
We can see that many policies were evaluated. The large JSON structure lists them all, from position 0 to position 11, so 12 policies in total. The problem when hunting through this data is that the position of a policy can change. If ‘Block Access Julianl’, seen at position 10, is triggered, it would move higher up the list. So we need to make our data consistent before hunting it. Let’s use our mv-expand operator on the same sign in.
SigninLogs
| where CorrelationId == "cadd2fee-a8b0-4daf-9ac8-cc3ae8ebe15b"
| mv-expand ConditionalAccessPolicies
| project ConditionalAccessPolicies
Our mv-expand operator has expanded each of the policies into its own row. We went from one row, with our 12 policy outcomes in one JSON field, to 12 rows, with one outcome each. We don’t need to worry about the location within a JSON array now. We can query our data knowing it is consistent.
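If you don't have the demo environment handy, here is a minimal sketch of the same idea using made-up data. The user, policy names and results below are hypothetical, and the array is far smaller than a real ConditionalAccessPolicies field, but the shape is the same: one row goes in with an array of policy outcomes, and one row per policy comes out.

```kql
// Hypothetical sample data: one sign in, with three policy outcomes held in a single dynamic array
datatable(UserPrincipalName: string, ConditionalAccessPolicies: dynamic)
[
    "user@contoso.example", dynamic([
        {"displayName": "Require MFA", "result": "success"},
        {"displayName": "Require compliant device", "result": "failure"},
        {"displayName": "Block legacy auth", "result": "notApplied"}
    ])
]
// mv-expand turns the single row into one row per array element
| mv-expand ConditionalAccessPolicies
// Each row now holds a single policy, so we can address its fields by name rather than by position
| extend CAPolicyName = tostring(ConditionalAccessPolicies.displayName)
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| project UserPrincipalName, CAPolicyName, CAResult
```

Because the field names are read after the expansion, it no longer matters where in the array a policy happened to sit.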
For each policy, we will have one of three outcomes:
- Success – the controls were met. For instance, a user passed MFA on a policy requiring MFA.
- Failure – the controls failed. For instance, a user failed MFA on a policy requiring MFA.
- Not applied – the policy was not applied to this sign in. For instance, you had a policy requiring MFA for SharePoint, but this sign in was for ServiceNow, so it didn’t apply.
If you have policies in report only mode you may see those too. Report only mode lets you test policies before deploying them. So the policy will be evaluated, but none of the controls enforced. You will see these events as reportOnlySuccess, reportOnlyFailure and reportOnlyNotApplied.
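As a rough sketch, you could summarize those report only outcomes to see what a policy would have done had it been enforced. The 7 day window and the output column names are my own choices for illustration; the result values are the ones described above.

```kql
// Sketch: what would each report only policy have done over the last 7 days?
SigninLogs
| where TimeGenerated > ago(7d)
| mv-expand ConditionalAccessPolicies
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| extend CAPolicyName = tostring(ConditionalAccessPolicies.displayName)
// Keep only report only outcomes where the policy actually applied to the sign in
| where CAResult in ("reportOnlySuccess", "reportOnlyFailure")
| summarize SignIns = count(), Users = dcount(UserPrincipalName) by CAPolicyName, CAResult
| sort by CAPolicyName asc, CAResult asc
```

A large reportOnlyFailure count is worth a look before you enable the policy, as those sign ins would likely have been interrupted.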
User Sign In Insights
Now that we have the basics sorted, we can query our data. The more users and more policies you have, the more data to evaluate. If you were interested in just seeing some statistics for your policies, we can do that. You can use the evaluate operator to build a table showing all the outcomes.
//Create a pivot table showing all conditional access policy outcomes over the last 30 days
SigninLogs
| where TimeGenerated > ago(30d)
| extend CA = parse_json(ConditionalAccessPolicies)
| mv-expand bagexpansion=array CA
| evaluate bag_unpack(CA)
| extend ['CA Outcome']=tostring(column_ifexists('result', "")), ['CA Policy Name'] = column_ifexists('displayName', "")
| evaluate pivot(['CA Outcome'], count(), ['CA Policy Name'])
These are the same 12 policies we saw earlier. We now have a useful table showing the usage of each.
Using this mv-expand operator further, we can really dig in. This query looks for the users that are failing the most different policies. Is this user compromised and the attackers are trying to find a hole in your policies?
//Find which users are failing the most Conditional Access policies, retrieve the total failure count, distinct policy count and the names of the failed policies
SigninLogs
| where TimeGenerated > ago(30d)
| project TimeGenerated, ConditionalAccessPolicies, UserPrincipalName
| mv-expand ConditionalAccessPolicies
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| extend CAPolicyName = tostring(ConditionalAccessPolicies.displayName)
| where CAResult == "failure"
| summarize ['Total Conditional Access Failures']=count(), ['Distinct Policy Failure Count']=dcount(CAPolicyName), ['Policy Names']=make_set(CAPolicyName) by UserPrincipalName
| sort by ['Distinct Policy Failure Count'] desc
One query I really love running is the following. It hunts through all sign in data, and returns policies that are not in use.
//Find Azure AD conditional access policies that have no hits for 'success' or 'failure' over the last month
//Check that these policies are configured correctly or still required
SigninLogs
| where TimeGenerated > ago(30d)
| project TimeGenerated, ConditionalAccessPolicies
| mv-expand ConditionalAccessPolicies
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| extend ['Conditional Access Policy Name'] = tostring(ConditionalAccessPolicies.displayName)
| summarize ['Conditional Access Result']=make_set(CAResult) by ['Conditional Access Policy Name']
| where ['Conditional Access Result'] !has "success" and ['Conditional Access Result'] !has "failure" and ['Conditional Access Result'] !has "unknownFutureValue"
| sort by ['Conditional Access Policy Name'] asc
This query uses the summarize operator to build a set of all the outcomes seen for each policy: success, notApplied, failure. Then we exclude any policy that has a success or a failure. If we see a success or failure event, then the policy is in use. If all we see is ‘notApplied’, then no sign ins have triggered that policy. Maybe the settings aren’t right, or you have excluded too many people?
We can even use some of the more advanced functions to look for anomalies in our data. The series_decompose_anomalies function lets us hunt through time series data. From that data it flags anything it believes is an anomaly.
//Detect anomalies in the amount of conditional access failures by users in your tenant, then visualize those conditional access failures
//startdate and enddate = which period of data to look at, i.e from 21 days ago until today.
let startdate=21d;
let enddate=1d;
//Timeframe = time period to break the data up into, i.e 1 hour blocks.
let timeframe=1h;
//Sensitivity = the lower the number the more sensitive the anomaly detection is, i.e it will find more anomalies, default is 1.5
let sensitivity=2;
//Threshold = set this to tune out low count anomalies, i.e when total failures for a user doubles from 1 to 2
let threshold=5;
let outlierusers=
SigninLogs
| where TimeGenerated between (startofday(ago(startdate))..startofday(ago(enddate)))
| where ResultType == "53003"
| project TimeGenerated, ResultType, UserPrincipalName
| make-series CAFailureCount=count() on TimeGenerated from startofday(ago(startdate)) to startofday(ago(enddate)) step timeframe by UserPrincipalName
| extend outliers=series_decompose_anomalies(CAFailureCount, sensitivity)
| mv-expand TimeGenerated, CAFailureCount, outliers
| where outliers == 1 and CAFailureCount > threshold
| distinct UserPrincipalName;
//Optionally visualize the anomalies
SigninLogs
| where TimeGenerated between (startofday(ago(startdate))..startofday(ago(enddate)))
| where ResultType == "53003"
| project TimeGenerated, ResultType, UserPrincipalName
| where UserPrincipalName in (outlierusers)
| summarize CAFailures=count() by UserPrincipalName, bin(TimeGenerated, timeframe)
| render timechart with (ytitle="Failure Count",title="Anomalous Conditional Access Failures")
I am not sure I would want to alert on every Conditional Access failure. You are likely to have a lot of them. But what about users failing Conditional Access to multiple applications in a short time period? This query finds any users that get blocked by Conditional Access to 5 or more unique applications within an hour.
SigninLogs
| where TimeGenerated > ago(1d)
| project TimeGenerated, ConditionalAccessPolicies, UserPrincipalName, AppDisplayName
| mv-expand ConditionalAccessPolicies
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| extend CAPolicyName = tostring(ConditionalAccessPolicies.displayName)
| where CAResult == "failure"
| summarize ['List of Failed Application']=make_set(AppDisplayName), ['Count of Failed Application']=dcount(AppDisplayName) by UserPrincipalName, bin(TimeGenerated, 1h)
| where ['Count of Failed Application'] >= 5
The second key part of Conditional Access monitoring is auditing changes. Much like a firewall, changes to Conditional Access policies should be alerted on. Accidental or malicious changes to your policies can decrease your security posture significantly. Any changes to policies are held in the Azure Active Directory audit log table.
Events are logged under three different operation names.
- Add conditional access policy
- Update conditional access policy
- Delete conditional access policy
A simple query will return any of these actions in your environment.
AuditLogs
| where TimeGenerated > ago(7d)
| where OperationName in ("Update conditional access policy", "Add conditional access policy", "Delete conditional access policy")
You will notice one thing straight away. It is difficult to work out what has actually changed. Most of the items are stored as GUIDs buried in JSON. It is hard to tell the old setting from the new. I wouldn’t even bother trying to make sense of it. Instead let’s update our query to this.
AuditLogs
| where TimeGenerated > ago(7d)
| where OperationName in ("Update conditional access policy", "Add conditional access policy", "Delete conditional access policy")
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
// TargetResources is an array, the policy is its first element
| extend ['Policy Name'] = tostring(TargetResources[0].displayName)
| extend ['Policy Id'] = tostring(TargetResources[0].id)
| project TimeGenerated, Actor, OperationName, ['Policy Name'], ['Policy Id']
Now we are returned the name of our policy, and its Id. Then we can jump into the Azure portal and see the current settings. This is where your knowledge of your environment is key. If you know the ‘Sentinel 101 Test’ policy requires MFA for all sign ins, and someone has changed the policy, you need to investigate.
We can add some more logic to our queries. For instance, we could alert on changes made by people who have never made a change before. Has an admin been compromised? Or was someone not familiar with Conditional Access asked to make a change?
//Detects users who add, delete or update an Azure AD Conditional Access policy for the first time.
//First find users who have previously made CA policy changes, this example looks back 90 days
let knownusers=
AuditLogs
| where TimeGenerated > ago(90d) and TimeGenerated < ago(1d)
| where OperationName in ("Update conditional access policy", "Add conditional access policy", "Delete conditional access policy")
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| distinct Actor;
//Find new events from users not in the known user list
AuditLogs
| where TimeGenerated > ago(1d)
| where OperationName in ("Update conditional access policy", "Add conditional access policy", "Delete conditional access policy")
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ['Policy Name'] = tostring(TargetResources[0].displayName)
| extend ['Policy Id'] = tostring(TargetResources[0].id)
| where Actor !in (knownusers)
| project TimeGenerated, Actor, ['Policy Name'], ['Policy Id']
We can even look for actions at certain times of the day, or particular days. This query looks for changes after hours or on weekends.
//Detect changes to Azure AD Conditional Access policies on weekends or outside of business hours
let Saturday = time(6.00:00:00);
let Sunday = time(0.00:00:00);
AuditLogs
| where OperationName in ("Update conditional access policy", "Add conditional access policy", "Delete conditional access policy")
// extend LocalTime to your time zone
| extend LocalTime=TimeGenerated + 5h
// Change hours of the day to suit your company, i.e this treats hours 6 through 18 as business hours and finds changes from 7pm until 6am
| where dayofweek(LocalTime) in (Saturday, Sunday) or hourofday(LocalTime) !between (6 .. 18)
| extend ['Conditional Access Policy Name'] = tostring(TargetResources[0].displayName)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| project LocalTime, OperationName, ['Conditional Access Policy Name'], Actor
| sort by LocalTime desc
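One thing to keep in mind with a fixed offset like + 5h is that it won't follow daylight saving changes. As a possible alternative, KQL has a datetime_utc_to_local() function that takes a named time zone. Here is a small sketch on made-up events; the timestamps and the "Australia/Sydney" time zone are assumptions for illustration only, so swap in your own.

```kql
// Hypothetical audit events, timestamps are in UTC
datatable(TimeGenerated: datetime, OperationName: string)
[
    datetime(2023-01-14 03:30:00), "Update conditional access policy",
    datetime(2023-01-16 15:00:00), "Add conditional access policy"
]
// Convert UTC to local time using a named time zone rather than a fixed offset
| extend LocalTime = datetime_utc_to_local(TimeGenerated, "Australia/Sydney")
// Same weekend and business hours logic as the query above (Saturday = 6 days, Sunday = 0 days)
| extend AfterHours = (dayofweek(LocalTime) in (time(6.00:00:00), time(0.00:00:00)))
    or (hourofday(LocalTime) !between (6 .. 18))
| project TimeGenerated, LocalTime, OperationName, AfterHours
```

If your admins are spread across time zones, you may prefer to leave everything in UTC and pick hours that suit the whole team.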
Like any rules or policies in your environment, there is a chance you will need exclusions. Conditional Access policies are very granular in what you can include or exclude. You can exclude on locations, or OS types, or particular users. It is important to alert on these exclusions, and ensure they are fit for purpose. For this example I have excluded a particular group from this policy.
We can see that an ‘Update conditional access policy’ event was triggered. Again, the raw data is hard to read, so jump into the portal and check out what has been configured. Now, one very important note here. If you add a group exclusion to a policy, it will trigger an event you can track. However, if I then add users to that group, it won’t trigger a policy change event. This is because the policy itself hasn’t changed, just the membership of the group. You need visibility of both events. If your policy is changed, you would want to know. If 500 users were added to the group, you would also want to know. We can query group addition events with the below query.
AuditLogs
| where OperationName == "Add member to group"
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
// TargetResources is an array, the user being added is its first element
| extend Target = tostring(TargetResources[0].userPrincipalName)
// The group display name is held in the second modified property
| extend GroupName = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| where GroupName has "Conditional Access Exclusion"
| project TimeGenerated, Actor, Target, GroupName
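To catch the ‘500 users added’ scenario rather than reading individual events, you could count additions per group. This is only a sketch: it assumes the group display name sits at index 1 of modifiedProperties on the first target resource, that your exclusion groups share the "Conditional Access Exclusion" naming used above, and the threshold of 10 is an arbitrary number to tune.

```kql
// Sketch: flag exclusion groups that had many members added within an hour
AuditLogs
| where TimeGenerated > ago(1d)
| where OperationName == "Add member to group"
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
// Assumption: index 1 of modifiedProperties holds the group display name
| extend GroupName = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| where GroupName has "Conditional Access Exclusion"
| summarize ['Members Added'] = count(), Actors = make_set(Actor) by GroupName, bin(TimeGenerated, 1h)
// Arbitrary threshold, tune to what is normal in your environment
| where ['Members Added'] >= 10
```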
When you are creating exclusions, you want to limit those exclusions down as much as possible. We always talk about the theory of ‘least privilege’. With exclusions, I like to think of them as ‘least exclusion’. If you have a workload that needs excluding, then can we exclude a particular location, or IP, or device? This is a better security stance than a blanket exclusion of a whole policy.
You can often use two policies to achieve the best outcome. Take Exchange Online as an example, where you want to enforce MFA for everyone. But you have a service account that does some automation, and it keeps failing MFA. It signs on from one particular IP address. If you exclude it from your main policy then it is a blanket exclusion. Instead build two policies.
- Policy 1 – Require MFA for Exchange Online
  - Includes all users, excludes your service account
  - Includes all locations
  - Includes Exchange Online
  - Control is require MFA
- Policy 2 – Exclude MFA for Exchange Online
  - Includes only your service account
  - Includes all locations, excludes a single IP address
  - Includes Exchange Online
  - Control is require MFA
As mentioned at the outset, Conditional Access policies are combined. So this combined set of two policies achieves what we want. Our service account is only excluded from MFA when it signs in from our single IP address. Let’s say the credentials for that account are compromised and the attacker tries to sign in from another location. When they sign into Exchange Online, they will be prompted for MFA.
With only one policy we don’t get the same control. If we had a single policy and excluded our service account, then it would be excluded from all locations. If we had a single policy and excluded the IP address, then all users would be excluded when coming from that IP. So we need to build two policies to achieve the best outcome.
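To make the logic of the two policies concrete, here is a toy model in KQL. It does not query Conditional Access at all; the account names and IP addresses are hypothetical, and the two booleans simply mirror the include and exclude rules described above.

```kql
// Hypothetical service account and trusted IP address
let serviceAccount = "svc-automation@contoso.example";
let trustedIp = "203.0.113.10";
// Hypothetical sign ins to Exchange Online
datatable(User: string, IPAddress: string)
[
    "alice@contoso.example", "198.51.100.7",
    "svc-automation@contoso.example", "203.0.113.10",
    "svc-automation@contoso.example", "198.51.100.7"
]
// Policy 1: all users except the service account, from any location
| extend Policy1Applies = User != serviceAccount
// Policy 2: only the service account, from any location except the trusted IP
| extend Policy2Applies = User == serviceAccount and IPAddress != trustedIp
// Policies are combined, so MFA is required if either policy applies
| extend MfaRequired = Policy1Applies or Policy2Applies
```

In this model the regular user always needs MFA, the service account skips MFA only from the trusted IP, and the same service account from any other address is prompted.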
Of course we want to balance single exclusions with the overhead of managing many policies. The more policies you have, the harder it is to work out the effect of changes. Microsoft provides a ‘what-if’ tool for Conditional Access. It will let you build a ‘fake’ sign in and tell you which policies are applied.
Learning to drive and audit Conditional Access is key to securing Azure AD. Having built a lot of policies over the years, here are some of my tips.
- Never, ever lock yourself out of the Azure portal! You get a UI warning if it believes you may be doing this. Support will be able to get you back in, but it will take time. Exclude your own account as you build policies.
- Create broad policies that cover the most use cases. If your standard security stance is require MFA to access SSO apps then build one policy. Apply that policy to as many apps and users as possible. There is no need to build an individual policy for each app.
- When you create exclusions, use the principle of ‘least exclusion’. When you are building an exclusion, have a think about the flow on effect. Will it decrease security for other users or workloads? Use multiple policies where practical to keep your security tight.
- Audit any policy changes. Find the policy that was changed and review it in the Azure portal.
- Use the ‘what-if’ tool to help you build policies. Remember that multiple policies are combined, and controls within a single policy have an order of operations.
- Blocks override any allows!
- Try not to keep ‘report only’ policies in report only mode too long. Once you are happy, then enable the policy. Report only should only be there to validate your policy logic.
- If you use group exclusions, then monitor the membership of those groups. Users being added to a group that is excluded from a policy won’t trigger a policy change event. Keep on top of how many people are excluded. Once someone is in a group they tend to stay there forever. If an exclusion is temporary, make sure they are removed.
How to get an actual reason from the logs for CA failure?
Yep you can, you will need to dig right into each Conditional Access policy and its results. I hopefully have a blog coming out in the next 4-6 weeks about using mv-apply and mv-expand to do exactly that.
Any closer to this blog?
Better late than never! https://learnsentinel.blog/2023/05/15/have-a-json-headache-in-kql-try-mv-expand-or-mv-apply/