Microsoft Sentinel 101

Learning Microsoft Sentinel, one KQL error at a time

Improving your security baseline with KQL — 6th Sep 2022

Improving your security baseline with KQL

One of my favourite sayings is ‘don’t let perfect be the enemy of good’. I think in cyber security, we can all be guilty of striving for perfection. Whether that is your MFA deployment, reducing local admin privilege or whatever your project may be. The reality is, in most larger organizations you will always have exclusions to your policies. There are likely people who require a different set of rules to be applied to them. The key however is to keep making progress while trying to find solutions.

Similarly, if organizational red tape is preventing security policies being rolled out, then initially deploy to those users and systems that won’t be impacted in any way. I also really love the saying ‘analysis paralysis’ to refer to this in organizations. Organizations can get so caught up trying to overengineer solutions that solve every potential fringe use case that they end up making no progress.

Perhaps you have some edge use cases where MFA is difficult to deploy – maybe you have users who work in environments where mobile phone usage is banned. That shouldn’t prevent you from deploying MFA to the vast majority of users who do have access to their phone. That isn’t to say you forget about those users, it just doesn’t become a showstopper for any MFA deployment.

If you use Microsoft Sentinel or Advanced Hunting you probably view them as detection platforms, which they definitely are. However, they also provide us with a rich set of data which we can use as a baseline to build and target security policies. Using KQL and the data in these platforms, we can quickly see the impact of our planned policies. We can also use the same data to find especially high-risk accounts, devices or applications to prioritize.

Azure AD Identities

I am sure everyone would love to have MFA everywhere, all the time. The reality is most organizations are still working toward that. As you progress, you may want to target high risk applications first. Think of applications such as control plane management for Azure, Defender services, or VPN and remote access portals. Applications with a lot of personal or financial data are always attractive targets for threat actors too. We can use KQL to calculate the percentage of authentications to each application that are covered by MFA.

//Microsoft Sentinel query
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| summarize
    ['Total Signin Count']=count(),
    ['Total MFA Count']=countif(AuthenticationRequirement == "multiFactorAuthentication"),
    ['Total non MFA Count']=countif(AuthenticationRequirement == "singleFactorAuthentication")
    by AppDisplayName
| project
    AppDisplayName,
    ['Total Signin Count'],
    ['Total MFA Count'],
    ['Total non MFA Count'],
    MFAPercentage=(todouble(['Total MFA Count']) * 100 / todouble(['Total Signin Count']))
| sort by ['Total Signin Count'] desc, MFAPercentage asc  
//Advanced Hunting query
AADSignInEventsBeta
| where Timestamp > ago(30d)
| where ErrorCode == 0
| summarize
    ['Total Signin Count']=count(),
    ['Total MFA Count']=countif(AuthenticationRequirement == "multiFactorAuthentication"),
    ['Total non MFA Count']=countif(AuthenticationRequirement == "singleFactorAuthentication")
    by Application
| project
    Application,
    ['Total Signin Count'],
    ['Total MFA Count'],
    ['Total non MFA Count'],
    MFAPercentage=(todouble(['Total MFA Count']) * 100 / todouble(['Total Signin Count']))
| sort by ['Total Signin Count'] desc, MFAPercentage asc  

You can then filter that list on particular apps you consider risky, or look for the apps with the worst coverage and start there.
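
As a sketch of that filtering, you could add an app filter before the summarize; the display names below are just examples of Azure management apps, so substitute whichever apps you consider risky in your tenant.

//Microsoft Sentinel query
//Sketch only - filter the MFA summary to a handful of apps you consider high risk
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| where AppDisplayName in ("Azure Portal", "Microsoft Azure PowerShell", "Microsoft Azure CLI")
| summarize
    ['Total Signin Count']=count(),
    ['Total MFA Count']=countif(AuthenticationRequirement == "multiFactorAuthentication")
    by AppDisplayName
| extend MFAPercentage=(todouble(['Total MFA Count']) * 100 / todouble(['Total Signin Count']))
| sort by MFAPercentage asc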

You could alternatively look at it from an identity point of view. Maybe your broader MFA rollout will take a while, but you could enforce MFA across your privileged users straight away. You then get an immediate security benefit by enforcing those controls on your highest risk users. This query finds the MFA percentage for any users with an Azure AD role or ‘admin’ in their username.

//Microsoft Sentinel query
let privusers=
    IdentityInfo
    | where TimeGenerated > ago(21d)
    | summarize arg_max(TimeGenerated, *) by AccountUPN
//Look for users who hold a privileged role or who have admin in their username, you may need to update to your naming standards
    | where (isnotempty(AssignedRoles) and AssignedRoles != "[]") or AccountUPN contains "admin"
    | distinct AccountUPN;
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| where UserPrincipalName in~ (privusers)
| summarize
    ['Total Signin Count']=count(),
    ['Total MFA Count']=countif(AuthenticationRequirement == "multiFactorAuthentication"),
    ['Total non MFA Count']=countif(AuthenticationRequirement == "singleFactorAuthentication")
    by UserPrincipalName
| project 
    UserPrincipalName,
    ['Total Signin Count'],
    ['Total MFA Count'],
    ['Total non MFA Count'],
    MFAPercentage=(todouble(['Total MFA Count']) * 100 / todouble(['Total Signin Count']))
| sort by MFAPercentage asc    

Another improvement you can make to your identity security is to migrate from weaker MFA methods to stronger ones. There is a diagram in the Microsoft docs that illustrates this well. We know that any MFA is better than no MFA, but we also know that methods like the Authenticator app or going passwordless are even better.

With Microsoft Sentinel, if we query our Azure AD sign in data, we can find which users are only using text message. Those users are already doing some kind of MFA, so some targeted training may be all it takes to move them up to a better method. The Authenticator app or passwordless technologies have always been a really easy sell for me. In cyber security we don’t always have solutions that are both more secure and a better user experience. So, when we do run across them, like passwordless, we should embrace them. The following query (available only in Sentinel) will find those users who have only used text message as their MFA method.

//Microsoft Sentinel query
SigninLogs
| where TimeGenerated > ago(30d)
//You can exclude guests if you want, they may be harder to move to more secure methods, comment out the below line to include all users
| where UserType == "Member"
| mv-expand todynamic(AuthenticationDetails)
| extend ['Authentication Method'] = tostring(AuthenticationDetails.authenticationMethod)
| where ['Authentication Method'] !in ("Previously satisfied", "Password", "Other")
| where isnotempty(['Authentication Method'])
| summarize
    ['Count of distinct MFA Methods']=dcount(['Authentication Method']),
    ['List of MFA Methods']=make_set(['Authentication Method'])
    by UserPrincipalName
//Find users with only one method found and it is text message
| where ['Count of distinct MFA Methods'] == 1 and ['List of MFA Methods'] has "text"

Another win you can get in Azure AD is to find users who are trying to use the self-service password reset functionality but failing. The logging for SSPR is really verbose, so we get great insights from the data. For instance, we can find users who are attempting to reset their password but don’t have a phone number registered. This is a good chance to reach out to those users and get them enrolled fully – the new combined registration lets them get enrolled into MFA at the same time. Guide them through onboarding the Authenticator app rather than text message!

AuditLogs
| where LoggedByService == "Self-service Password Management"
| extend User = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ['User IP Address'] = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| sort by TimeGenerated asc 
| summarize ['SSPR Actions']=make_list(ResultReason) by CorrelationId, User, ['User IP Address']
| where ['SSPR Actions'] has "User's account has insufficient authentication methods defined. Add authentication info to resolve this"
| sort by User desc 

Another helpful SSPR query finds users who are getting stuck during the password reset flow. There is nothing more annoying for a user who is trying to do the right thing than getting stuck. This query will find users who are attempting to reset their password but failing multiple times – possibly due to password complexity requirements. If you are making progress toward deploying passwordless technologies, these users may be a good fit.

AuditLogs
| where LoggedByService == "Self-service Password Management"
| extend User = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ['User IP Address'] = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| sort by TimeGenerated asc 
| summarize ['SSPR Actions']=make_list_if(ResultReason, ResultReason has "User submitted a new password") by CorrelationId, User, ['User IP Address']
| where array_length(['SSPR Actions']) >= 3
| sort by User desc 

It wouldn’t be a post about Azure AD without a legacy authentication query. Microsoft is beginning to disable legacy auth in Exchange Online (starting October 1). However, you should still block legacy auth in Conditional Access, because it is used in places other than Exchange. The easiest place to start is to simply build a Conditional Access policy and block it for those users that have never used legacy auth. If they aren’t using it already, then don’t let them (or an attacker) start using it. You could achieve this a number of ways, but in my opinion the easiest is just to create a list of all your identities. From that, we can find those that have not used legacy auth in the last 30 days.

//Microsoft Sentinel query
let legacyauthusers=
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| where ClientAppUsed !in ("Mobile Apps and Desktop clients", "Browser")
| distinct UserPrincipalName;
IdentityInfo
| where TimeGenerated > ago(30d)
| summarize arg_max(TimeGenerated, *) by AccountCloudSID
| where UserType == "Member"
| distinct AccountUPN
| where isnotempty(AccountUPN)
| where AccountUPN !in~ (legacyauthusers)
//Advanced Hunting query
let legacyauthusers=
AADSignInEventsBeta
| where Timestamp > ago(30d)
| where ErrorCode == 0
| where ClientAppUsed !in ("Mobile Apps and Desktop clients", "Browser")
| distinct AccountUpn;
IdentityInfo
| distinct AccountUpn
| where isnotempty(AccountUpn)
| where AccountUpn !in (legacyauthusers)

Azure AD Conditional Access for workload identities allows us to control which IP addresses our Azure AD service principals connect from. Depending on the nature of your service principals, they may change IP addresses a lot, or they may be quite static. We can use both Advanced Hunting and Microsoft Sentinel to find a list of service principals that are only connecting from a single IP address. You can then use this data to build out Conditional Access policies. If one of those service principals is then compromised and a threat actor connects from elsewhere, they will be blocked. The data for this query is held in the AADSpnSignInEventsBeta table in Advanced Hunting (requires Azure AD P2) or the AADServicePrincipalSignInLogs table in Microsoft Sentinel (assuming you have the data ingesting).

//Microsoft Sentinel query
let appid=
    AADServicePrincipalSignInLogs
    | where TimeGenerated > ago (30d)
    | where ResultType == 0
    | summarize dcount(IPAddress) by AppId
    | where dcount_IPAddress == 1
    | distinct AppId;
AADServicePrincipalSignInLogs
| where TimeGenerated > ago (30d)
| where ResultType == 0
| where AppId in (appid)
| summarize ['Application Id']=make_set(AppId) by IPAddress, ServicePrincipalName
//Advanced Hunting query
let appid=
    AADSpnSignInEventsBeta
    | where Timestamp > ago (30d)
    | where ErrorCode == 0
    | where IsManagedIdentity == 0
    | summarize dcount(IPAddress) by ApplicationId
    | where dcount_IPAddress == 1
    | distinct ApplicationId;
AADSpnSignInEventsBeta
| where Timestamp > ago (30d)
| where ErrorCode == 0
| where ApplicationId in (appid)
| summarize ['Application Id']=make_set(ApplicationId) by IPAddress, ServicePrincipalName

Local Admin Access & Lateral Movement

When attackers compromise a workstation, the user they initially breach may not have a lot of privilege. A threat actor will try to move laterally and escalate privilege from that initial foothold. We can try to reduce privileged credentials being left on devices by using tools like LAPS and not using domain admin level accounts when accessing end user workstations. Unless you have some kind of privileged access management software that enforces these behaviors though, chances are privileged credentials are being left on a number of devices. We can use Defender and Sentinel data to try and target the most vulnerable devices and users.

For instance, this query will summarize logons to your devices where the user has local admin rights. From that list we sort our devices by those that have the most unique accounts signing in with local admin privilege. If an attacker were to compromise one of these devices, there is a chance they could use Mimikatz or something similar to get access to the credentials for all the users who have logged on.

//Microsoft Sentinel query
DeviceLogonEvents
| where TimeGenerated > ago(30d)
| project DeviceName, ActionType, LogonType, AdditionalFields, InitiatingProcessCommandLine, AccountName, IsLocalAdmin
| where ActionType == "LogonSuccess"
| where LogonType in ("Interactive","RemoteInteractive")
| where AdditionalFields.IsLocalLogon == true
| where InitiatingProcessCommandLine == "lsass.exe"
| summarize
    ['Local Admin Distinct User Count']=dcountif(AccountName, IsLocalAdmin == true),
    ['Local Admin User List']=make_set_if(AccountName, IsLocalAdmin == true)
    by DeviceName
| sort by ['Local Admin Distinct User Count']
//Advanced Hunting query
DeviceLogonEvents
| where Timestamp > ago(30d)
| project DeviceName, ActionType, LogonType, AdditionalFields, InitiatingProcessCommandLine, AccountName, IsLocalAdmin
| where ActionType == "LogonSuccess"
| where LogonType in ("Interactive","RemoteInteractive")
| where IsLocalAdmin == true
| where InitiatingProcessCommandLine == "lsass.exe"
| summarize
    ['Local Admin Distinct User Count']=dcountif(AccountName, IsLocalAdmin == true),
    ['Local Admin User List']=make_set_if(AccountName, IsLocalAdmin == true)
    by DeviceName
| sort by ['Local Admin Distinct User Count'] desc  

If we run the same query again, we can reverse our summary. This time we find the accounts which have logged onto the most devices as local admin. This will show us our accounts with the largest blast radius. If one of these accounts is compromised, then the attacker would also have local admin access to all the devices listed.

//Microsoft Sentinel query
DeviceLogonEvents
| where TimeGenerated > ago(30d)
| project DeviceName, ActionType, LogonType, AdditionalFields, InitiatingProcessCommandLine, AccountName, IsLocalAdmin
| where ActionType == "LogonSuccess"
| where LogonType in ("Interactive","RemoteInteractive")
| where AdditionalFields.IsLocalLogon == true
| where InitiatingProcessCommandLine == "lsass.exe"
| summarize
    ['Local Admin Distinct Device Count']=dcountif(DeviceName, IsLocalAdmin == true),
    ['Local Admin Device List']=make_set_if(DeviceName, IsLocalAdmin == true)
    by AccountName
| sort by ['Local Admin Distinct Device Count'] desc 
//Advanced Hunting query
DeviceLogonEvents
| where Timestamp > ago(30d)
| project DeviceName, ActionType, LogonType, AdditionalFields, InitiatingProcessCommandLine, AccountName, IsLocalAdmin
| where ActionType == "LogonSuccess"
| where LogonType in ("Interactive","RemoteInteractive")
| where IsLocalAdmin == true
| where InitiatingProcessCommandLine == "lsass.exe"
| summarize
    ['Local Admin Distinct Device Count']=dcountif(DeviceName, IsLocalAdmin == true),
    ['Local Admin Device List']=make_set_if(DeviceName, IsLocalAdmin == true)
    by AccountName
| sort by ['Local Admin Distinct Device Count'] desc  

You can use this same data to hunt for service accounts that are logging into devices. In a perfect world that doesn’t happen of course, but the reality is some software vendors make products where it is required. You may find that IT admins are being lazy and just using those service accounts everywhere though. They often won’t have controls like MFA and possibly have a weaker password. For an attacker, service accounts are gold, since the monitoring around them is often weak.

//Microsoft Sentinel query
DeviceLogonEvents
| where TimeGenerated > ago(30d)
| project DeviceName, ActionType, LogonType, AdditionalFields, InitiatingProcessCommandLine, AccountName, IsLocalAdmin
| where ActionType == "LogonSuccess"
| where LogonType in ("Interactive","RemoteInteractive")
| where AdditionalFields.IsLocalLogon == true
| where InitiatingProcessCommandLine == "lsass.exe"
//Search only for accounts starting with svc or containing service. You may need to substitute in your service account naming standard.
| where AccountName startswith "svc" or AccountName contains "service"
| summarize
    ['Local Admin Distinct Device Count']=dcountif(DeviceName, IsLocalAdmin == true),
    ['Local Admin Device List']=make_set_if(DeviceName, IsLocalAdmin == true)
    by AccountName
| sort by ['Local Admin Distinct Device Count'] desc 

Once you have your list, you can then start to enforce which machines they can access. If svc.sqlapp only needs to log on to two machines, then just configure that in Active Directory. You can then alert on activity outside of that which may be malicious, as shown in the sketch below.
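
A minimal sketch of what that alert could look like; svc.sqlapp and the device names are hypothetical, so substitute your own account and approved machines.

//Microsoft Sentinel query
//Sketch only - svc.sqlapp and the device names are hypothetical examples
let alloweddevices=dynamic(["sqlserver01", "sqlserver02"]);
DeviceLogonEvents
| where TimeGenerated > ago(1d)
| where ActionType == "LogonSuccess"
| where AccountName =~ "svc.sqlapp"
| where DeviceName !in~ (alloweddevices)
| project TimeGenerated, DeviceName, AccountName, LogonType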

If you don’t use Defender for Endpoint you can use the Windows security event log to achieve a similar summary. For instance, you can find the devices with the most users connecting via RDP. Then you can reverse that query and find the users connecting to the most devices. Just like our Defender data.

//Microsoft Sentinel query
SecurityEvent
| where TimeGenerated > ago(30d)
| where EventID == "4624"
| where LogonType == 10
//Extend new column that drops Account to lower case so users are correctly summarized, i.e User123 and user123 are combined
| extend AccountName=tolower(Account)
| summarize
    ['Count of Users']=dcount(AccountName),
    ['List of Users']=make_set(AccountName)
    by Computer
| sort by ['Count of Users'] desc 
//Microsoft Sentinel query
SecurityEvent
| where TimeGenerated > ago(30d)
| where EventID == "4624"
| where LogonType == 10
//Extend new column that drops Account to lower case so users are correctly summarized, i.e User123 and user123 are combined
| extend AccountName=tolower(Account)
| summarize
    ['Count of Computers']=dcount(Computer),
    ['List of Computers']=make_set(Computer)
    by AccountName
| sort by ['Count of Computers'] desc 

Attack surface reduction rules

Attack surface reduction (ASR) rules are a really great feature of Defender that helps protect your devices against certain behaviours. Instead of targeting particular malicious files (which Defender still does of course), they block based on behaviour. For instance, ASR may block a file that, when executed, attempts to connect to the internet and download further files. IT and cyber security departments are often wary of these rules impacting users negatively. There are still lots of ways to get some quick wins with ASR, without stopping users from being able to work. If you are evaluating ASR then you should absolutely put the rules into audit mode. This will write an event to Advanced Hunting and Sentinel each time a rule would have blocked a file or program if block mode was enabled. Once you have done that, you have a great set of data to start making progress.

The following query will find machines that have triggered no ASR rules over the last 30 days. These machines would be a good starting point to enable ASR in block mode. You have the data showing they haven’t triggered any rules in the last 30 days.

//Microsoft Sentinel query
//First find devices that have triggered an Attack Surface Reduction rule, either block or in audit mode.
let asrdevices=
    DeviceEvents
    | where TimeGenerated > ago (30d)
    | where ActionType startswith "Asr"
    | distinct DeviceName;
//Find all devices and exclude those that have previously triggered a rule
DeviceInfo
| where TimeGenerated > ago (30d)
| where OSPlatform startswith "Windows"
| summarize arg_max(TimeGenerated, *) by DeviceName
| where DeviceName !in (asrdevices)
| project
    ['Time Last Seen']=TimeGenerated,
    DeviceId,
    DeviceName,
    OSPlatform,
    OSVersion,
    LoggedOnUsers
//Advanced Hunting query
//First find devices that have triggered an Attack Surface Reduction rule, either block or in audit mode.
let asrdevices=
    DeviceEvents
    | where Timestamp > ago (30d)
    | where ActionType startswith "Asr"
    | distinct DeviceName;
//Find all devices and exclude those that have previously triggered a rule
DeviceInfo
| where Timestamp > ago (30d)
| where OSPlatform startswith "Windows"
| summarize arg_max(Timestamp, *) by DeviceName
| where DeviceName !in (asrdevices)
| project
    ['Time Last Seen']=Timestamp,
    DeviceId,
    DeviceName,
    OSPlatform,
    OSVersion,
    LoggedOnUsers

You can also summarize your ASR audit data. The following query will list the total count, distinct device count and the list of devices for each rule that is being triggered.

//Microsoft Sentinel query
DeviceEvents
| where TimeGenerated > ago(30d)
| where ActionType startswith "Asr"
| where isnotempty(InitiatingProcessCommandLine)
| summarize ['ASR Hit Count']=count(), ['Device Count']=dcount(DeviceName), ['Device List']=make_set(DeviceName) by ActionType, InitiatingProcessCommandLine
| sort by ['ASR Hit Count'] desc 
//Advanced Hunting query
DeviceEvents
| where Timestamp > ago(30d)
| where ActionType startswith "Asr"
| where isnotempty(InitiatingProcessCommandLine)
| summarize ['ASR Hit Count']=count(), ['Device Count']=dcount(DeviceName), ['Device List']=make_set(DeviceName) by ActionType, InitiatingProcessCommandLine
| sort by ['ASR Hit Count'] desc 

It also lists the process command line that triggered the rule. From that list you can see if you have any common software or processes across your devices triggering ASR hits. If you have a particular vendor’s software that is flagging ASR rules across all your devices, you can reach out to the vendor for an update. Alternatively, you could look at excluding that particular rule and process combination. In a perfect world, we would have no exclusions to AV or EDR, but if you are dealing with legacy software or other tech debt that may not be realistic. I would personally rather have ASR enabled with a small exclusion list, than not have it on at all. With KQL you can help build those rules out with minimal disruption to your users.
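
If you are weighing up a particular exclusion, you can drill into just that rule and process pairing first. A sketch, where the ActionType and process name are examples only; substitute the combination you are evaluating.

//Microsoft Sentinel query
//Sketch only - the rule and process below are examples
DeviceEvents
| where TimeGenerated > ago(30d)
| where ActionType == "AsrUntrustedExecutableAudited"
| where InitiatingProcessCommandLine has "legacyapp.exe"
| summarize
    ['Hit Count']=count(),
    ['First Seen']=min(TimeGenerated),
    ['Last Seen']=max(TimeGenerated),
    ['Device List']=make_set(DeviceName)
    by InitiatingProcessFolderPath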

These are just a few examples of analyzing the data you have to try and improve your security hygiene. Remember, you don’t need to be perfect, there is no such thing as 100% secure. Attacks are constantly evolving. Use the tools and data you have today to make meaningful progress toward reducing risk.

Azure AD Conditional Access Insights & Auditing with Microsoft Sentinel — 9th May 2022

Azure AD Conditional Access Insights & Auditing with Microsoft Sentinel

If you have spent any time in Azure Active Directory, chances are you have stumbled across Azure AD Conditional Access. It is at the very center of Microsoft Zero Trust. At its most basic, it evaluates every sign in to your Azure AD tenant. It takes the different signals that form that sign in. The location a user is coming from, the health of a device. It can look at the roles a user has, or the groups they are in. Even what application is being used to sign in. Once it has all that telemetry, it decides not only if you are allowed into the tenant. It also dictates the controls required for access. You must complete MFA, or your device must be compliant. You can block sign ins from particular locations, or require specific applications to be allowed in. When I first looked at Conditional Access I thought of it as a ‘firewall for identity’. While that is somewhat true, it undersells the power of Conditional Access. Conditional Access can make decisions based on a lot more than a traditional firewall can.

Before we go hunting through our data, let’s take a step back. To make sense of that data, here are a couple of key points about Conditional Access.

  • Many policies can apply to a sign in. The controls for these policies will be added together. For instance, if you have two policies that control access to Exchange Online, where the first requires MFA and the second device compliance, then the policies are added together. The user must satisfy both MFA and have a compliant device.
  • Each individual policy can have many controls within it, such as MFA and requiring an approved application. They are evaluated in the following order.
    1. Multi-factor Authentication
    2. Approved Client App/App Protection Policy
    3. Managed Device (Compliant, Hybrid Azure AD Join)
    4. Custom controls (such as Duo MFA)
    5. Session controls (App Enforced, MCAS, Token Lifetime)
  • A block policy overrides any allow policy, regardless of controls. If one policy says allow with MFA and one says block, the sign in is blocked.

These are important to note because when we look through our data, we will see multiple policies per sign in. To make this data easier to read, we are going to use the mv-expand operator. The guidance says it “Expands multi-value dynamic arrays or property bags into multiple records”. Well, what does that mean? Let’s look at an example using the KQL playground. This is a demo environment anyone can access. If you log on there, we can look at one sign in event.

SigninLogs
| where CorrelationId == "cadd2fee-a8b0-4daf-9ac8-cc3ae8ebe15b"
| project ConditionalAccessPolicies

We can see many policies evaluated. You see the large JSON structure listing them all. From position 0 to position 11. So 12 policies in total have been evaluated. The problem when hunting this data is that the position of policies can change. If ‘Block Access Julianl’, seen at position 10, is triggered, it would move up higher in the list. So we need to make our data consistent before hunting it. Let’s use our mv-expand operator on the same sign in.

SigninLogs
| where CorrelationId == "cadd2fee-a8b0-4daf-9ac8-cc3ae8ebe15b"
| mv-expand ConditionalAccessPolicies
| project ConditionalAccessPolicies

Our mv-expand operator has expanded each of the policies into its own row. We went from one row, with our 12 policy outcomes in one JSON field, to 12 rows, with one outcome each. We don’t need to worry about the location within a JSON array now. We can query our data knowing it is consistent.

For each policy, we will have one of three outcomes

  • Success – the controls were met. For instance, a user passed MFA on a policy requiring MFA.
  • Failure – the controls failed. For instance, a user failed MFA on a policy requiring MFA.
  • Not applied – the policy was not applied to this sign in. For instance, you had a policy requiring MFA for SharePoint. But this sign in was for Service Now, so it didn’t apply.

If you have policies in report only mode you may see those too. Report only mode lets you test policies before deploying them. So the policy will be evaluated, but none of the controls enforced. You will see these events as reportOnlySuccess, reportOnlyFailure and reportOnlyNotApplied.
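
Those report only outcomes are worth reviewing before you flick a policy over to enabled. A quick sketch to summarize how your report only policies would have behaved:

//Summarize the outcomes of policies still in report only mode
SigninLogs
| where TimeGenerated > ago(30d)
| mv-expand ConditionalAccessPolicies
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| extend CAPolicyName = tostring(ConditionalAccessPolicies.displayName)
| where CAResult startswith "reportOnly"
| summarize ['Outcome Count']=count() by CAPolicyName, CAResult
| sort by CAPolicyName asc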

User Sign In Insights

Now that we have the basics sorted, we can query our data. The more users and more policies you have, the more data to evaluate. If you were interested in just seeing some statistics for your policies, we can do that. You can use the evaluate operator to build a table showing all the outcomes.

//Create a pivot table showing all conditional access policy outcomes over the last 30 days
SigninLogs
| where TimeGenerated > ago(30d)
| extend CA = parse_json(ConditionalAccessPolicies)
| mv-expand bagexpansion=array CA
| evaluate bag_unpack(CA)
| extend
    ['CA Outcome']=tostring(column_ifexists('result', "")),
    ['CA Policy Name'] = column_ifexists('displayName', "")
| evaluate pivot(['CA Outcome'], count(), ['CA Policy Name'])

These are the same 12 policies we saw earlier. We now have a useful table showing the usage of each.

Using this mv-expand operator further, we can really dig in. This query looks for the users that are failing the most different policies. Is this user compromised and the attackers are trying to find a hole in your policies?

//Find which users are failing the most Conditional Access policies, retrieve the total failure count, distinct policy count and the names of the failed policies
SigninLogs
| where TimeGenerated > ago (30d)
| project TimeGenerated, ConditionalAccessPolicies, UserPrincipalName
| mv-expand ConditionalAccessPolicies
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| extend CAPolicyName = tostring(ConditionalAccessPolicies.displayName)
| where CAResult == "failure"
| summarize
    ['Total Conditional Access Failures']=count(),
    ['Distinct Policy Failure Count']=dcount(CAPolicyName),
    ['Policy Names']=make_set(CAPolicyName)
    by UserPrincipalName
| sort by ['Distinct Policy Failure Count'] desc 

One query I really love running is the following. It hunts through all sign in data, and returns policies that are not in use.

//Find Azure AD conditional access policies that have no hits for 'success' or 'failure' over the last month
//Check that these policies are configured correctly or still required
SigninLogs
| where TimeGenerated > ago (30d)
| project TimeGenerated, ConditionalAccessPolicies
| mv-expand ConditionalAccessPolicies
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| extend ['Conditional Access Policy Name'] = tostring(ConditionalAccessPolicies.displayName)
| summarize ['Conditional Access Result']=make_set(CAResult) by ['Conditional Access Policy Name']
| where ['Conditional Access Result'] !has "success"
    and ['Conditional Access Result'] !has "failure"
    and ['Conditional Access Result'] !has "unknownFutureValue"
| sort by ['Conditional Access Policy Name'] asc 

This query uses the summarize operator to build a set of all the outcomes for each policy. We create a set of all the outcomes for that policy – success, not applied, failure. Then we exclude any policy that has a success or a failure. If we see a success or failure event, then the policy is in use. If all we see is ‘not Applied’ then no sign ins have triggered that policy. Maybe the settings aren’t right, or you have excluded too many people?

We can even use some of the more advanced operators to look for anomalies in our data. The series_decompose_anomalies function lets us hunt through time series data. From that data it flags anything it believes is an anomaly.

//Detect anomalies in the amount of conditional access failures by users in your tenant, then visualize those conditional access failures
//Starttime and endtime = which period of data to look at, i.e from 21 days ago until today.
let startdate=21d;
let enddate=1d;
//Timeframe = time period to break the data up into, i.e 1 hour blocks.
let timeframe=1h;
//Sensitivity = the lower the number the more sensitive the anomaly detection is, i.e it will find more anomalies, default is 1.5
let sensitivity=2;
//Threshold = set this to tune out low count anomalies, i.e when total failures for a user doubles from 1 to 2
let threshold=5;
let outlierusers=
SigninLogs
| where TimeGenerated between (startofday(ago(startdate))..startofday(ago(enddate)))
| where ResultType == "53003"
| project TimeGenerated, ResultType, UserPrincipalName
| make-series CAFailureCount=count() on TimeGenerated from startofday(ago(startdate)) to startofday(ago(enddate)) step timeframe by UserPrincipalName 
| extend outliers=series_decompose_anomalies(CAFailureCount, sensitivity)
| mv-expand TimeGenerated, CAFailureCount, outliers
| where outliers == 1 and CAFailureCount > threshold
| distinct UserPrincipalName;
//Optionally visualize the anomalies
SigninLogs
| where TimeGenerated between (startofday(ago(startdate))..startofday(ago(enddate)))
| where ResultType == "53003"
| project TimeGenerated, ResultType, UserPrincipalName
| where UserPrincipalName in (outlierusers)
| summarize CAFailures=count() by UserPrincipalName, bin(TimeGenerated, timeframe)
| render timechart with (ytitle="Failure Count",title="Anomalous Conditional Access Failures")

I am not sure I would want to alert on every Conditional Access failure. You are likely to have a lot of them. But what about users failing Conditional Access to multiple applications, in a short time period? This query finds any users that get blocked by Conditional Access to 5 or more unique applications within an hour.

SigninLogs
| where TimeGenerated > ago (1d)
| project TimeGenerated, ConditionalAccessPolicies, UserPrincipalName, AppDisplayName
| mv-expand ConditionalAccessPolicies
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| extend CAPolicyName = tostring(ConditionalAccessPolicies.displayName)
| where CAResult == "failure"
| summarize
    ['List of Failed Application']=make_set(AppDisplayName),
    ['Count of Failed Application']=dcount(AppDisplayName)
    by UserPrincipalName, bin(TimeGenerated, 1h)
| where ['Count of Failed Application'] >= 5

Audit Insights

The second key part of Conditional Access monitoring is auditing changes. Much like a firewall, changes to Conditional Access policies should be alerted on. Accidental or malicious changes to your policies can decrease your security posture significantly. Any changes to policies are held in the Azure Active Directory audit log table.

Events are logged under three different categories.

  • Add conditional access policy
  • Update conditional access policy
  • Delete conditional access policy

A simple query will return any of these actions in your environment.

AuditLogs
| where TimeGenerated > ago(7d)
| where OperationName in ("Update conditional access policy", "Add conditional access policy", "Delete conditional access policy")

You will notice one thing straight away. It is difficult to work out what has actually changed. Most of the items are stored as GUIDs buried in JSON. It is hard to tell the old setting from the new. I wouldn’t even bother trying to make sense of it. Instead let’s update our query to this.

AuditLogs
| where TimeGenerated > ago(7d)
| where OperationName in ("Update conditional access policy", "Add conditional access policy", "Delete conditional access policy")
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ['Policy Name'] = tostring(TargetResources[0].displayName)
| extend ['Policy Id'] = tostring(TargetResources[0].id)
| project TimeGenerated, Actor, OperationName, ['Policy Name'], ['Policy Id']

Now we are returned the name of our policy, and its Id. Then we can jump into the Azure portal and see the current settings. This is where your knowledge of your environment is key. If you know the ‘Sentinel 101 Test’ policy requires MFA for all sign ins, and someone has changed the policy, you need to investigate.

We can add some more logic to our queries. For instance, we could alert on changes made by people who have never made a change before. Has an admin been compromised? Or was someone not familiar with Conditional Access asked to make a change?

//Detects users who add, delete or update a Azure AD Conditional Access policy for the first time.
//First find users who have previously made CA policy changes, this example looks back 90 days
let knownusers=
    AuditLogs
    | where TimeGenerated > ago(90d) and TimeGenerated < ago(1d)
    | where OperationName in ("Update conditional access policy", "Add conditional access policy", "Delete conditional access policy")
    | extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
    | distinct Actor;
//Find new events from users not in the known user list
AuditLogs
| where TimeGenerated > ago(1d)
| where OperationName in ("Update conditional access policy", "Add conditional access policy", "Delete conditional access policy")
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ['Policy Name'] = tostring(TargetResources[0].displayName)
| extend ['Policy Id'] = tostring(TargetResources[0].id)
| where Actor !in (knownusers)
| project TimeGenerated, Actor, ['Policy Name'], ['Policy Id']

We can even look for actions at certain times of the day, or particular days. This query looks for changes after hours or on weekends.

//Detect changes to Azure AD Conditional Access policies on weekends or outside of business hours
let Saturday = time(6.00:00:00);
let Sunday = time(0.00:00:00);
AuditLogs
| where OperationName in ("Update conditional access policy", "Add conditional access policy", "Delete conditional access policy")
// extend LocalTime to your time zone
| extend LocalTime=TimeGenerated + 5h
// Change hours of the day to suit your company, i.e this would find changes between 6pm and 6am
| where dayofweek(LocalTime) in (Saturday, Sunday) or hourofday(LocalTime) !between (6 .. 18)
| extend ['Conditional Access Policy Name'] = tostring(TargetResources[0].displayName)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| project LocalTime, 
    OperationName, 
    ['Conditional Access Policy Name'], 
    Actor
| sort by LocalTime desc 

Managing Exclusions

Like any rules or policies in your environment, there is a chance you will need exclusions. Conditional Access policies are very granular in what you can include or exclude. You can exclude on locations, or OS types, or particular users. It is important to alert on these exclusions, and ensure they are fit for purpose. For this example I have excluded a particular group from this policy.

We can see that an ‘Update conditional access policy’ event was triggered. Again, the raw data is hard to read. So jump into the portal and check out what has been configured. Now, one very important note here. If you add a group exclusion to a policy, it will trigger an event you can track. However, if I then add users to that group, it won’t trigger a policy change event. This is because the policy itself hasn’t changed, just the membership of the group. From your point of view you will need to have visibility of both events. If your policy is changed you would want to know. If 500 users were added to the group, you would also want to know. So we can query group addition events with the below query.

AuditLogs
| where OperationName == "Add member to group"
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend Target = tostring(TargetResources[0].userPrincipalName)
| extend GroupName = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| where GroupName has "Conditional Access Exclusion"
| project TimeGenerated, Actor, Target, GroupName

When you are creating exclusions, you want to limit those exclusions down as much as possible. We always talk about the theory of ‘least privilege’. With exclusions, I like to think of them as ‘least exclusion’. If you have a workload that needs excluding, then can we exclude a particular location, or IP, or device? This is a better security stance than a blanket exclusion of a whole policy.

You can often use two policies to achieve the best outcome. Think of the example of Exchange Online: you want to enforce MFA for everyone. But you have a service account that does some automation, and it keeps failing MFA. It signs on from one particular IP address. If you exclude it from your main policy then it is a blanket exclusion. Instead build two policies.

  • Policy 1 – Require MFA for Exchange Online
    • Includes all users, excludes your service account
    • Includes all locations
    • Includes Exchange Online
    • Control is require MFA
  • Policy 2 – Exclude MFA for Exchange Online
    • Includes only your service account
    • Includes all locations, excludes a single IP address
    • Includes Exchange Online
    • Control is require MFA

As mentioned at the outset, Conditional Access policies are combined. So this combined set of two policies achieves what we want. Our service account is only excluded from MFA when connecting from our single IP address. Let’s say the credentials for that account are compromised and the attacker tries to sign in from another location. When they sign into Exchange Online, they will be prompted for MFA.

If we had only one policy we don’t get the same control. If we had a single policy and excluded our service account, then it would be excluded from all locations. If we had a single policy and excluded the IP address, then all users would be excluded from that IP. So we need to build two policies to achieve the best outcome.

Of course we want to balance single exclusions with the overhead of managing many policies. The more policies you have, the harder it is to work out the effect of changes. Microsoft provides a ‘what-if’ tool for Conditional Access. It will let you build a ‘fake’ sign in and tell you which policies are applied.

Recommendations

Learning to drive and audit Conditional Access is key to securing Azure AD. Having built a lot of policies over the years, here are some of my tips.

  • Never, ever lock yourself out of the Azure portal! You get a UI warning if it believes you may be doing this. Support will be able to get you back in, but it will take time. Exclude your own account as you build policies.
  • Create broad policies that cover the most use cases. If your standard security stance is require MFA to access SSO apps then build one policy. Apply that policy to as many apps and users as possible. There is no need to build an individual policy for each app.
  • When you create exclusions, use the principle of ‘least exclusion’. When you are building an exclusion, have a think about the flow on effect. Will it decrease security for other users or workloads? Use multiple policies where practical to keep your security tight.
  • Audit any policy changes. Find the policy that was changed and review it in the Azure portal.
  • Use the ‘what-if’ tool to help you build policies. Remember that multiple policies are combined, and controls within a single policy have an order of operations.
  • Blocks override any allows!
  • Try not to keep ‘report only’ policies in report only mode too long. Once you are happy, then enable the policy. Report only should only be there to validate your policy logic.
  • If you use group exclusions, then monitor the membership of those groups. Users being added to a group that is excluded from a policy won’t trigger a policy change event. Keep on top of how many people are excluded. Once someone is in a group they tend to stay there forever. If an exclusion is temporary, make sure they are removed.

Maintaining a well managed Azure AD tenant with KQL — 16th Mar 2022

Maintaining a well managed Azure AD tenant with KQL

This article is presented as part of the #AzureSpringClean event. The idea of #AzureSpringClean is to promote well managed Azure environments. This article will focus on Azure Active Directory and how we can leverage KQL to keep things neat and tidy.

Much like on-premises Active Directory, Azure Active Directory has a tendency to grow quickly. You have new users or guests being onboarded all the time. You are configuring single sign on to apps. You may create service principals for all kinds of integrations. And again, much like on-premises Active Directory, it is in our best interest to keep on top of all these objects. If users have left the business, or we have decommissioned applications, then we also want to clean up all those artefacts.

Microsoft provide tools to help automate some of these tasks – entitlement management and access reviews. Entitlement management lets you manage identity and access at scale. You can build access packages. These access packages can contain all the access a particular role needs. You then overlay just in time access and approval workflows on top.

Access reviews are pretty self explanatory. They let you easily manage group memberships, application and role access. You can schedule access reviews to make sure people only keep the appropriate access.

So if Microsoft provide these tools, why should we dig into the data ourselves? Good question. You may not be licensed for them to start with; they are both Azure AD P2 features. You also may have use cases that fall outside the capability of those products. Using KQL and the raw data, we can find all kinds of trends in our Azure AD tenant.

First things first though, we will need that data in a workspace! You can choose which Log Analytics workspace to send the data to from the Azure Active Directory -> Diagnostic settings tab. If you use Microsoft Sentinel, you can achieve the same via the Azure Active Directory data connector.

You can pick and choose what you like. This article is going to cover these three items –

  • SignInLogs – all your normal sign ins to Azure AD.
  • AuditLogs – all the administrative activities in your tenant, like guest invites and redemptions.
  • ServicePrincipalSignInLogs – sign ins for your Service Principals.

Two things to note, you need to be Azure AD P1 to export this data and there are Log Analytics ingestion costs.
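
If you want a rough view of what those tables cost you in ingestion, the Usage table in Log Analytics tracks billable volume. A minimal sketch:

//Check ingestion volume (in MB) for these Azure AD tables over the last 30 days
Usage
| where TimeGenerated > ago(30d)
| where DataType in ("SigninLogs", "AuditLogs", "AADServicePrincipalSignInLogs")
| summarize ['Volume (MB)']=sum(Quantity) by DataType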

Let’s look at seven areas of Azure Active Directory –

  • Users and Guests
  • Service Principals
  • Enterprise Applications
  • Privileged Access
  • MFA and Passwordless
  • Legacy Auth
  • Conditional Access

And for each, write some example queries looking for interesting trends. Hopefully in your tenant they can provide some useful information. The more historical data you have, the more useful your trends will be of course. But even just having a few weeks’ worth of data is valuable.

To make things even easier, for most of these queries I have used the Log Analytics demo environment. You may not yet have a workspace of your own, but you still want to test the queries out. The demo environment is free to use for anyone. Some of the data types aren’t available in there, but I have tried to use it as much as possible.

You can access the demo tenant here. You just need to login with any Microsoft account – personal or work, and away you go.

Users and Guests

User lifecycle management can be hard work! Using Azure AD guests can add to that complexity. Guests likely work for other companies or partners. You don’t manage them fully in the way you would your own staff.

Let’s start by finding when our users last signed in. Maybe you want to know when users haven’t signed in for more than 45 days. We can even retrieve our user type at the same time. You could start by disabling these accounts.

SigninLogs
| where TimeGenerated > ago(365d)
| where ResultType == "0"
| summarize arg_max(TimeGenerated, *) by UserPrincipalName
| project TimeGenerated, UserPrincipalName, UserType, ['Days Since Last Logon']=datetime_diff("day", now(),TimeGenerated)
| where ['Days Since Last Logon'] >= 45
| sort by ['Days Since Last Logon'] desc

We use a really useful operator in this query called datetime_diff. It lets us calculate the time between two events in a way that is easier for us to read. So in this example, we calculate the difference between the last sign in and now in days. UTC time can be hard to read, so let KQL do the heavy lifting for you.
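
You can see datetime_diff in action on its own; this sketch just compares a fixed date to now, returning the gap in whole days.

//datetime_diff returns the difference between two datetimes in the unit you choose, days in this case
print ['Days Between']=datetime_diff("day", now(), datetime(2022-01-01))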

We can even visualize the trend of our last sign ins. In this example we look at when our inbound guests last signed in. Inbound guests are those from other tenants connecting to yours. To do this, we summarize our data twice. First we get the last sign in date for each guest. Then we group that data into each month.

SigninLogs
| where TimeGenerated > ago (360d)
| where UserType == "Guest"
| where AADTenantId != HomeTenantId and HomeTenantId != ResourceTenantId
| where ResultType == 0
| summarize arg_max(TimeGenerated, *) by UserPrincipalName
| project TimeGenerated, UserPrincipalName
| summarize ['Count of Last Signin']=count() by startofmonth(TimeGenerated)
| render columnchart with (title="Guest inactivity per month")

Another interesting thing with Azure AD guests is that invites never expire. So once you invite a guest the pending invite will be there forever. You can use KQL to find invites that have been sent but not redeemed.

let timerange=180d;
let timeframe=30d;
AuditLogs
| where TimeGenerated between (ago(timerange) .. ago(timeframe)) 
| where OperationName == "Invite external user"
| extend GuestUPN = tolower(tostring(TargetResources[0].userPrincipalName))
| summarize arg_max(TimeGenerated, *) by GuestUPN
| project TimeGenerated, GuestUPN
| join kind=leftanti  (
    AuditLogs
    | where TimeGenerated > ago (timerange)
    | where OperationName == "Redeem external user invite"
    | where CorrelationId <> "00000000-0000-0000-0000-000000000000"
    | extend d = tolower(tostring(TargetResources[0].displayName))
    | parse d with * "upn: " GuestUPN "," *
    | project TimeGenerated, GuestUPN)
    on GuestUPN
| project TimeGenerated, GuestUPN, ['Days Since Invite Sent']=datetime_diff("day", now(), TimeGenerated)

For this we join two queries – guest invites and guest redemptions. Then search for when there isn’t a redemption. We then re-use our datetime_diff to work out how many days since the invite was sent. For this query we also exclude invites sent in the last 30 days. Those guests may just not have gotten around to redeeming their invites yet. Once a user has been invited, the user object already exists in your tenant. It just sits there idle until they redeem the invite. If they haven’t accepted the invite in 45 days, then it is probably best to delete the user objects.

Service Principals

The great thing about KQL is once we write a query we like, we can easily re-use it. Service principals are everything in Azure AD. They control what your applications can access. Much like users, we may no longer be using service principals. Perhaps that application has been decommissioned. Maybe the integration that was in use has been retired. Much like users, if they are no longer in use, we should remove them.

Let’s re-use our inactive user query, and this time look for inactive service principals.

AADServicePrincipalSignInLogs
| where TimeGenerated > ago(365d)
| where ResultType == "0"
| summarize arg_max(TimeGenerated, *) by AppId
| project TimeGenerated, ServicePrincipalName, ['Days Since Last Logon']=datetime_diff("day", now(),TimeGenerated)
| where ['Days Since Last Logon'] >= 45
| sort by ['Days Since Last Logon'] desc

Have a look through the list and see which can be deleted.

Service principals can fail to sign in for many reasons, much like regular users. With regular users though we get an easy to read description that can help us out. With service principals, we unfortunately just get an error code. Using the case operator we can add our own friendly descriptions to help us out. We just say, when our result code is this, then provide us an easy to read description.

AADServicePrincipalSignInLogs
| where ResultType != "0"
| extend ErrorDescription = case (
    ResultType == "7000215", strcat("Invalid client secret is provided"),
    ResultType == "7000222", strcat("The provided client secret keys are expired"),
    ResultType == "700027", strcat("Client assertion failed signature validation"),
    ResultType == "700024", strcat("Client assertion is not within its valid time range"),
    ResultType == "70021", strcat("No matching federated identity record found for presented assertion"),
    ResultType == "500011", strcat("The resource principal named {name} was not found in the tenant named {tenant}"),
    ResultType == "700082", strcat("The refresh token has expired due to inactivity"),
    ResultType == "90025", strcat("Request processing has exceeded gateway allowance"),
    ResultType == "500341", strcat("The user account {identifier} has been deleted from the {tenant} directory"),
    ResultType == "100007", strcat("AAD Regional ONLY supports auth either for MSIs OR for requests from MSAL using SN+I for 1P apps or 3P apps in Microsoft infrastructure tenants"),
    ResultType == "1100000", strcat("Non-retryable error has occurred"),
    ResultType == "90033", strcat("A transient error has occurred. Please try again"),
    ResultType == "53003",strcat("Access has been blocked by Conditional Access policies. The access policy does not allow token issuance."),
    "Unknown"
    )
| project TimeGenerated, ServicePrincipalName, ServicePrincipalId, ErrorDescription, ResultType, IPAddress

You may be particularly interested in sign ins with expired or invalid secrets. Are the service principals still in use? Perhaps you can remove them. Or you may be interested in cases where Conditional Access blocks a service principal sign in.

Have the credentials for that service principal leaked? It may be worth investigating and rotating credentials if required.
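
To home in on just those events, you can filter the same table on the relevant result codes; 7000215 and 7000222 are the invalid and expired client secret codes from the query above.

//Find service principals failing to sign in due to invalid or expired client secrets
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(30d)
| where ResultType in ("7000215", "7000222")
| summarize ['Failure Count']=count(), ['Last Failure']=max(TimeGenerated) by ServicePrincipalName, ResultType
| sort by ['Failure Count'] desc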

Enterprise Applications

For applications that have had no sign in activity for a long time, that could be a sign of a couple of things. Firstly, you may have retired that application. If that is the case, then you should delete the enterprise application from your tenant.

Secondly, it may mean that people are bypassing SSO to access the application. For example, you may use a product like Confluence. You may have enabled SSO to it, but users still have the ability to sign on using ‘local’ credentials. Maybe users do that because it is more convenient to bypass conditional access. For those applications you know are still in use, but you aren’t seeing any activity you should investigate. If the applications have the ability to prevent the use of local credentials then you should enable that. Perhaps you have the ability to set the password for local accounts, you could set them to something random the users don’t know to enforce SSO.

If those technical controls don’t exist, you may need to try softer controls. You should try to get buy-in from the application owners or users and explain the risks of local credentials. A good point to highlight is that when a user leaves an organization their account is disabled. When that happens, they lose access to any SSO enforced applications. In applications that use local credentials, the lifecycle of accounts is likely poorly managed. Application owners usually don’t want ex-employees still having access to data, so that may help enforce good behaviour.

We can find apps that have had no sign ins in the last 30 days easily.

SigninLogs
| where TimeGenerated > ago (365d)
| where ResultType == 0
| summarize arg_max(TimeGenerated, *) by AppId
| project
    AppDisplayName,
    ['Last Logon Time']=TimeGenerated,
    ['Days Since Last Logon']=datetime_diff("day", now(), TimeGenerated)
| where ['Days Since Last Logon'] > 30
| sort by ['Days Since Last Logon'] desc

Maybe you are interested in application usage more generally. We can bring back some stats for each of your applications. Perhaps you want to see total sign ins to each vs distinct sign ins. Some applications may be very noisy with their sign in data. But when you look at distinct users, they aren’t as busy as you thought.

SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| summarize ['Total Signins']=count(), ['Distinct User Signins']=dcount(UserPrincipalName) by AppDisplayName
| sort by ['Distinct User Signins'] desc

You may also be interested in the breakdown of guests vs members for each application. Maybe guests are accessing something they aren’t meant to. If you notice that, you can put a group in front of that app to control access.

For this query we use the dcountif operator. Which returns a distinct count of a column where something is true. So for this example, we return a distinct user count where the UserType is a member. Then again for guests.

SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| summarize ['Distinct Member Signins']=dcountif(UserPrincipalName, UserType == "Member"), ['Distinct Guest Signins']=dcountif(UserPrincipalName, UserType == "Guest") by AppDisplayName
| sort by ['Distinct Guest Signins'] desc

Use your knowledge of your environment to make sense of the results. If you have lots of guests accessing something you didn’t expect, then investigate.

Privileged Access

As always, your privileged users deserve more scrutiny. You can detect when a user accesses particular Azure applications for the first time. This query looks back 90 days, then detects if a user accesses one of these applications for the first time.

//Detects users who have accessed Azure AD Management interfaces who have not accessed in the previous timeframe
let timeframe = startofday(ago(90d));
let applications = dynamic(["Azure Active Directory PowerShell", "Microsoft Azure PowerShell", "Graph Explorer", "ACOM Azure Website"]);
SigninLogs
| where TimeGenerated > timeframe and TimeGenerated < startofday(now())
| where AppDisplayName in (applications)
| project UserPrincipalName, AppDisplayName
| join kind=rightanti
    (
    SigninLogs
    | where TimeGenerated > startofday(now())
    | where AppDisplayName in (applications)
    )
    on UserPrincipalName, AppDisplayName
| where ResultType == 0
| project TimeGenerated, UserPrincipalName, ResultType, AppDisplayName, IPAddress, Location, UserAgent

You could expand the list to include privileged applications specific to your environment too.

If you use Azure AD Privileged Identity Management (PIM) you can keep an eye on those actions too. For example, we can find users who haven’t elevated to a role for over 30 days. If you have users with privileged roles but they aren’t actively using them then they should be removed. This query also returns you the role which they last activated.

AuditLogs
| where TimeGenerated > ago (365d)
| project TimeGenerated, OperationName, Result, TargetResources, InitiatedBy
| where OperationName == "Add member to role completed (PIM activation)"
| where Result == "success"
| extend ['Last Role Activated'] = tostring(TargetResources[0].displayName)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| summarize arg_max(TimeGenerated, *) by Actor
| project Actor, ['Last Role Activated'], ['Last Activation Time']=TimeGenerated, ['Days Since Last Activation']=datetime_diff("day", now(), TimeGenerated)
| where ['Days Since Last Activation'] >= 30
| sort by ['Days Since Last Activation'] desc

One of the biggest strengths of KQL is manipulating time. We can use that capability to add some logic to our queries. For example, we can find PIM elevation events that are outside of business hours.

let timerange=30d;
AuditLogs
// extend LocalTime to your time zone
| extend LocalTime=TimeGenerated + 5h
| where LocalTime > ago(timerange)
// Change hours of the day to suit your company, i.e this would find activations between 6pm and 6am
| where hourofday(LocalTime) !between (6 .. 18)
| where OperationName == "Add member to role completed (PIM activation)"
| extend RoleName = tostring(TargetResources[0].displayName)
| project LocalTime, OperationName, Identity, RoleName, ActivationReason=ResultReason

If this is unexpected behaviour for you then it’s worth looking at. Maybe an account has been compromised. Or it could be a sign of malicious insider activity from your admins.

MFA & Passwordless

In a perfect world we would have MFA on everything. That may not be the reality in your tenant. In fact, it’s not the reality in many tenants. For whatever reason your MFA coverage may be patchy. You could be on a roadmap to deploying it, or trying to onboard applications to SSO.

Our sign on logs provide great insight to single factor vs multi factor connections. We can summarize and visualize that data in different ways to track your MFA progress. If you want to just look across your tenant as a whole we can do that of course.

SigninLogs
| where TimeGenerated > ago (30d)
| summarize ['Single Factor Authentication']=countif(AuthenticationRequirement == "singleFactorAuthentication"), ['Multi Factor Authentication']=countif(AuthenticationRequirement == "multiFactorAuthentication") by bin(TimeGenerated, 1d)
| render timechart with (ytitle="Count", title="Single vs Multifactor Authentication last 30 days")

There is some work to be done in the demo tenant!

You can even build a table out of all your applications. From that we can count the percentage of sign ins that are covered by MFA. This may give you some direction to enabling MFA.

let timerange=30d;
SigninLogs
| where TimeGenerated > ago(timerange)
| where ResultType == 0
| summarize
    TotalCount=count(),
    MFACount=countif(AuthenticationRequirement == "multiFactorAuthentication"),
    nonMFACount=countif(AuthenticationRequirement == "singleFactorAuthentication")
    by AppDisplayName
| project AppDisplayName, TotalCount, MFACount, nonMFACount, MFAPercentage=(todouble(MFACount) * 100 / todouble(TotalCount))
| sort by MFAPercentage desc 

If that much data is too overwhelming, why not start with your most popular applications? Here we use the same logic, but first calculate our top 20 applications.

let top20apps=
    SigninLogs
    | where TimeGenerated > ago (30d)
    | summarize UserCount=dcount(UserPrincipalName)by AppDisplayName
    | sort by UserCount desc 
    | take 20
    | project AppDisplayName;
//Use that list to calculate the percentage of signins to those apps that are covered by MFA
SigninLogs
| where TimeGenerated > ago (30d)
| where AppDisplayName in (top20apps)
| summarize TotalCount=count(),
    MFACount=countif(AuthenticationRequirement == "multiFactorAuthentication"),
    nonMFACount=countif(AuthenticationRequirement == "singleFactorAuthentication")
    by AppDisplayName
| project AppDisplayName, TotalCount, MFACount, nonMFACount, MFAPercentage=(todouble(MFACount) * 100 / todouble(TotalCount))
| sort by MFAPercentage asc  

Passwordless technology has been around for a little while now, but it is only starting now to hit mainstream. Azure AD provides lots of different options. FIDO2 keys, Windows Hello for Business, phone sign in etc. You can track password vs passwordless sign ins to your tenant.

let timerange=180d;
SigninLogs
| project TimeGenerated, AuthenticationDetails
| where TimeGenerated > ago (timerange)
| extend AuthMethod = tostring(parse_json(AuthenticationDetails)[0].authenticationMethod)
| where AuthMethod != "Previously satisfied"
| summarize
    Password=countif(AuthMethod == "Password"),
    Passwordless=countif(AuthMethod in ("FIDO2 security key", "Passwordless phone sign-in", "Windows Hello for Business", "Mobile app notification","X.509 Certificate"))
    by startofweek(TimeGenerated)
| render timechart  with ( xtitle="Week", ytitle="Signin Count", title="Password vs Passwordless signins per week")

Passwordless needs a little more love in the demo tenant for sure!

You could even go one better, and track each type of passwordless technology. Then you can see what is the most favored.

let timerange=180d;
SigninLogs
| project TimeGenerated, AuthenticationDetails
| where TimeGenerated > ago (timerange)
| extend AuthMethod = tostring(parse_json(AuthenticationDetails)[0].authenticationMethod)
| where AuthMethod in ("FIDO2 security key", "Passwordless phone sign-in", "Windows Hello for Business", "Mobile app notification","X.509 Certificate")
| summarize ['Passwordless Method']=count()by AuthMethod, startofweek(TimeGenerated)
| render timechart with ( xtitle="Week", ytitle="Signin Count", title="Passwordless methods per week")

Legacy Authentication

There is no better managed Azure AD tenant than one where legacy auth is completely disabled. Legacy auth is a security issue because it isn’t MFA aware. If one of your users are compromised, they could bypass MFA policies by using legacy clients such as IMAP or ActiveSync. The only conditional access rules that work for legacy auth are allow or block. Because conditional access defaults to allow, unless you explicitly block legacy auth, those connections will be allowed.

Microsoft are looking to retire legacy auth in Exchange Online on October 1st, 2022 which is fantastic. However, legacy auth can be used for non Exchange Online workloads. We can use our sign in log data to track exactly where legacy auth is used. That way we can not only be ready for October 1, but maybe we can retire it from our tenant way before then. Win win!

The Azure AD sign in logs contain useful information about what app is being used during a legacy sign in. Let’s start there and look at all the various legacy client apps. We can also retrieve the users for each.

SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| where ClientAppUsed in ("Exchange ActiveSync", "Exchange Web Services", "AutoDiscover", "Unknown", "POP3", "IMAP4", "Other clients", "Authenticated SMTP", "MAPI Over HTTP", "Offline Address Book")
| summarize ['Count of legacy auth attempts'] = count()by ClientAppUsed, UserPrincipalName
| sort by ClientAppUsed asc, ['Count of legacy auth attempts'] desc 

This will show us each client app, such as IMAP or Activesync. For each app it lists the most active users for each. That will give you good direction to start migrating users and applications to modern auth.

If you want to visualize how you are going with disabling legacy auth, we can do that too. We can even compare that to how many legacy auth connections are blocked.

SigninLogs
| where TimeGenerated > ago(180d)
| where ResultType in ("0", "53003")
| where ClientAppUsed in ("Exchange ActiveSync", "Exchange Web Services", "AutoDiscover", "Unknown", "POP3", "IMAP4", "Other clients", "Authenticated SMTP", "MAPI Over HTTP", "Offline Address Book")
| summarize
    ['Legacy Auth Users Allowed']=dcountif(UserPrincipalName, ResultType == 0),
    ['Legacy Auth Users Blocked']=dcountif(UserPrincipalName, ResultType == 53003)
    by bin(TimeGenerated, 1d)
| render timechart with (ytitle="Count",title="Legacy auth distinct users allowed vs blocked by Conditional Access")

Hopefully your allowed connections are on the decrease.

You may be wondering why blocks aren’t increasing at the same time. That is easily explained. For instance, say you migrate from Activesync to using the Outlook app on your phone. Once you make that change, there simply won’t be legacy auth connection to block anymore. Visualizing the blocks however provides a nice baseline. If you see a sudden spike, then something out there is still trying to connect and you should investigate.

Conditional Access

Azure AD conditional access is key to security for your Azure AD tenant. It decides who is allowed in, or isn’t. It also defines the rules people must follow to be allowed in. For instance, requiring multi factor authentication. Azure AD conditional access evaluates every sign into your tenant, and decides if they are approved to enter. The detail for conditional access evaluation is held within every sign in event. If we look at a sign in event from the demo environment, we can see what this data looks like.

So Azure AD evaluated this sign in. It determined the policy ‘MeganB MCAS Proxy’ was in scope for this sign in. So then it enforced the sign in to go through Defender for Cloud Apps (previously Cloud App Security). You can imagine if you have lots of policies, and lots of sign ins, this is a huge amount of data. We can summarize this data in lots of ways. Like any security control, we should regularly review to confirm that control is in use. We can find any policies that have had either no success events (user allowed in). And also no failure events (user blocked).

SigninLogs
| where TimeGenerated > ago (30d)
| project TimeGenerated, ConditionalAccessPolicies
| mv-expand ConditionalAccessPolicies
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| extend CAPolicyName = tostring(ConditionalAccessPolicies.displayName)
| summarize CAResults=make_set(CAResult) by CAPolicyName
| where CAResults !has "success" and CAResults !has "failure"

In this test tenant we can see some policies that we have a hit on.

Some of these policies are not enabled. Some are in report only mode. Or they are simply not applying to any sign ins. You should review this list to make sure it is what you expect. If you are seeing lots of ‘notApplied’ results, make sure you have configured your policies properly.

If you wanted to focus on conditional access failures specifically, you can do that too. This query will find any policy with failures, then return the reason for the failure. This can be simply informational for you to show they are working as intended. Or if you are getting excessive failures maybe your policy needs tuning.

SigninLogs
| where TimeGenerated > ago (30d)
| project TimeGenerated, ConditionalAccessPolicies, ResultType, ResultDescription
| mv-expand ConditionalAccessPolicies
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| extend CAPolicyName = tostring(ConditionalAccessPolicies.displayName)
| where CAResult == "failure"
| summarize CAFailureCount=count()by CAPolicyName, ResultType, ResultDescription
| sort by CAFailureCount desc 

You could visualize the same failures if you wanted to look at any trends or spikes.

let start = now(-90d);
let end = now();
let timeframe= 12h;
SigninLogs
| project TimeGenerated, ResultType, ConditionalAccessPolicies
| where ResultType == 53003
| extend FailedPolicy = tostring(ConditionalAccessPolicies[0].displayName)
| make-series FailureCount = count() default=0 on TimeGenerated in range(start,end, timeframe) by FailedPolicy
| render timechart 

Summary

I have provided you with a few examples of different queries to help manage your tenant. KQL gives you the power to manipulate your data in so many ways. Have a think about what is important to you. You can then hopefully use the above examples as a starting point to find what you need.

If you are licensed for the Microsoft provided tools then definitely use them. However if there are gaps, don’t be scared of looking at the data yourself. KQL is powerful and easy to use.

There are also a number of provided workbooks in your Azure AD tenant you can use too. You can find them under ‘workbooks’ in Azure AD. They cover some queries similar to this, and plenty more.

You need to combine the Microsoft tools and your own knowledge to effectively manage your directory.

Detecting privilege escalation with Azure AD service principals in Microsoft Sentinel — 4th Jan 2022

Detecting privilege escalation with Azure AD service principals in Microsoft Sentinel

Defenders spend a lot of time worrying about the security of the user identities they manage. Trying to stop phishing attempts or deploying MFA. You want to restrict privilege, have good passphrase policies and deploy passwordless solutions. If you use Azure AD, there is another type of identity that is important to keep an eye on – Azure AD service principals.

There is an overview of service principals here. Think about your regular user account. When you want to access Office 365, you have a user principal in Azure AD. You give that user access, to SharePoint, Outlook and Teams, and when you sign in you get that access. Your applications are the same. They have a principal in Azure AD, called a service principal. These define what your applications can access.

You haven’t seen anywhere in the Azure AD portal a ‘create service principal’ button. Because there isn’t one. Yet you likely have plenty of service principals already in your tenant. So how do they get there? Well, in several ways.

So if we complete any of the following actions, we will end up with a service principal –

  1. Add an application registration – each time you register an application. For example to enable SSO for an application you are developing. Or to integrate with Microsoft Graph. You will end up with both an application object and an service principal in your tenant.
  2. Install a third party OAuth application – if you install an app to your tenant. For instance an application in Microsoft Teams. You will have a service principal created for it.
  3. Install a template SAML application from the gallery – when you setup SSO with a third party SaaS product. If you deploy their gallery application to help. Both an application object and a service principal in your tenant.
  4. Add a managed identity – each time you create a managed identity, you also create a service principal.

You may also have legacy service principals. Created before the current app registration process existed.

If you browse to Azure AD -> Enterprise applications, you can view them all. Are all these service principals a problem? Not at all, it is the way that Azure Active Directory works. It uses service principals to define access and permissions for applications. Service principals are in a lot of ways much more secure than alternatives. Take a service principal for a managed identity – it can end the need for developers to use credentials. If you want an Azure virtual machine to access to an Azure Key Vault, you can create a managed identity. This also creates a service principal in Azure AD. Then assign the service principal access to your key vault. Your virtual machine then identifies itself to the key vault. The key vault says ‘hey I know this service principal has access to this key vault’ and gives it access. Much better than handling passwords and credentials in code.

In the case of a system assigned managed identity, the lifecycle of the service principal is also managed. If you create a managed identity for a Azure virtual machine then decommission the virtual machine. The service principal, and any access it has, is also removed.

Like any identity, we can grant service principals excess privilege. You could make a service account in on premise Active Directory a domain admin, you shouldn’t, but you can. Service principals are the same, we can assign all kinds of privilege in Azure AD and to Azure resources. So how can service principals get privilege, and what kind of privilege can they have? We can build on our visualization of we created service principals. Now we add how they gain privilege.

So much like users, we can assign various access to service principals, such as –

  1. Assigned an Azure AD to role – if we add them to roles such as global or application administrator.
  2. Granted access to the Microsoft Graph or other Microsoft API – if we add permissions like Directory.ReadWrite.All or Policy.ReadWrite.ConditionalAccess from Microsoft Graph. Or other API access like Defender ATP or Dynamics 365, or your own APIs.
  3. Granted access to Azure RBAC – if we add access such as owner rights to a subscription or contributor to a resource group.
  4. Given access to specific Azure workloads – such as being able to read secrets from an Azure Key Vault.

Service principals having privilege is not an issue, in fact, they need to have privilege. If we want to be able to SSO users to Azure AD then the service principal needs that access. Or if we want to automate retrieving emails from a shared mailbox then we will need to provide that access. Like users, we can assign incorrect or excessive privilege which is then open to abuse. Explore the abuse of service principals by checking the following article from @DebugPrivilege. It shows how you can use the managed identity of a virtual machine to retrieve secrets from a key vault.

We can get visibility into any of these changes in Microsoft Sentinel. When we grant a service principal access to Azure AD or to Microsoft Graph, we use the Azure AD Audit log. Which we access via the AuditLogs table in Sentinel. For changes to Azure RBAC and specific Azure resources, we use the AzureActivity or AzureDiagnostics table.

You can add Azure AD Audit Logs to your Sentinel instance. You do this via the Azure Active Directory connector under data connectors. This is a very useful table but ingestion fees will apply.

For the sake of this blog, I have created a service principal called ‘Learn Sentinel’. I used the app registration portal in Azure AD. We will now give privilege to that service principal and then detect in Sentinel.

Adding Azure Active Directory Roles to a Service Principal

If we work through our list of how a service principal can gain privilege we will start with adding an Azure AD role. I have added the ‘Application Administrator’ role to my service principal using PowerShell. We can run the cmdlet below. Where ObjectId is the Id of the role, and RefObjectId is the Object Id of the service principal. You can get all the Ids of all the roles by first running Get-AzureADDirectoryRole first.

Add-AzureADDirectoryRoleMember -ObjectId 67513fd7-cc60-456c-9cdd-c962c884fbdc -RefObjectId a0f399db-f358-429c-a743-735ab902fcbe

We track this activity under the action ‘Add member to role’ in our Audit Log. Which is the same action you see when we add a regular user account to a role. There is a field, nested in the TargetResources data, that we can leverage to ensure our query only returns service principals –

If we complete our query, we can filter for only events where the type is “ServicePrincipal”

AuditLogs
| where OperationName == "Add member to role"
| extend ServicePrincipalType = tostring(TargetResources[0].type)
| extend ServicePrincipalObjectId = tostring(TargetResources[0].id)
| extend RoleAdded = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| extend ServicePrincipalName = tostring(TargetResources[0].displayName)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ActorIPAddress = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| where ServicePrincipalType == "ServicePrincipal"
| project TimeGenerated, OperationName, RoleAdded, ServicePrincipalName, ServicePrincipalObjectId, Actor, ActorIPAddress

If we run our query we see the activity with the details we need. When the event occurred, what role, to which service principal, and who did it.

Everyone uses Azure AD in different ways, but this should not be a very common event in most tenants. Especially with high privilege roles such as Application, Privileged Authentication or Global Administrator. You should alert on any of these events. To see how you could abuse the Application Administrator role, check out this blog post from @_wald0. It shows how you can leverage that role to escalate privilege.

Adding Microsoft Graph (or other API) access to a Service Principal

If you create service principals for integration with other Microsoft services like Azure AD or Office 365 you will need to add access to make it work. It is common for third party applications, or those you are developing in house, to request access. It is important to only grant the access required.

For this example I have added

  • Policy.ReadWrite.ConditionalAccess (ability to read & write conditional access policies)
  • User.Read.All (read users full profiles)

to our same service principal.

When we add Microsoft Graph access to an app, the Azure AD Audit Log tracks the event as “Add app role assignment to service principal”. We can parse out the relevant information we want in our query to return the specifics. You can use this as the completed query to find these events, including the user that did it.

AuditLogs
| where OperationName == "Add app role assignment to service principal"
| extend AppRoleAdded = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| extend ActorIPAddress = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ServicePrincipalObjectId = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[3].newValue)))
| extend ServicePrincipalName = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[4].newValue)))
| project TimeGenerated, OperationName, AppRoleAdded, ServicePrincipalName, ServicePrincipalObjectId,Actor, ActorIPAddress

When we run our query we see the events, even though I added both permissions together, we get two events.

Depending on how often you create service principals in your tenant, and who can grant access I would alert on all these events to ensure that service principals are not granted excessive privilege. This query also covers other Microsoft APIs such as Dynamics or Defender, and your own personal APIs you protect with Azure AD.

Adding Azure access to a Service Principal

We can grant service principals access to high level management scopes in Azure, such as subscriptions or resource groups. For instance, if you had an asset management system that you used to track your assets in Azure. It could use Azure AD for authentication and authorization. You would create a service principal for your asset management system, then give it read access your subscriptions. The asset management application could then view all your assets in those subscriptions. We track these kind of access changes in the AzureActivity log. This is a free table so you should definitely ingest it.

For this example I have added our service principal as a contributor on a subscription and a reader on a resource group.

The AzureActivity log can be quite verbose and the structure of the logs changes often. For permissions changes we are after the OperationNameValue of “MICROSOFT.AUTHORIZATION/ROLEASSIGNMENTS/WRITE”. When we look at the structure of some of the logs, we can see that we can filter on service principals. As opposed to granting users access.

We can use this query to search for all events where a service principal was given access.

AzureActivity
| where OperationNameValue == "MICROSOFT.AUTHORIZATION/ROLEASSIGNMENTS/WRITE"
| extend ServicePrincipalObjectId = tostring(parse_json(tostring(parse_json(tostring(Properties_d.requestbody)).Properties)).PrincipalId)
| extend ServicePrincipalType = tostring(parse_json(tostring(parse_json(tostring(Properties_d.requestbody)).Properties)).PrincipalType)
| extend Scope = tostring(parse_json(tostring(parse_json(tostring(Properties_d.requestbody)).Properties)).Scope)
| extend RoleAdded = tostring(parse_json(tostring(parse_json(tostring(parse_json(Properties).requestbody)).Properties)).RoleDefinitionId)
| extend Actor = tostring(Properties_d.caller)
| where ServicePrincipalType == "ServicePrincipal"
| project TimeGenerated, RoleAdded, Scope, ServicePrincipalObjectId, Actor

We see our two events. The first when I added a service principal to the subscription, then second to a resource group. You can see the target under ‘Scope’.

You will notice a couple of things. The name of role assigned (in this example, contributor and reader) isn’t returned. Instead we see the role id (the final section of the RoleAdded field). You can find the list of mappings here. We are also only returned the object id of our service principal, not the friendly name. Unfortunately the friendly name isn’t contained within the logs, but this still alerts us to investigate.

When you assign access to subscription or resource group, you may notice you have an option. Either a user, group or service principal or a managed identity.

The above query will find any events for service principals or managed identities. You won’t need a specific one for managed identities.

Adding Azure workload access to a Service Principal

We can also grant our service principals access to Azure workloads. Take for instance being able to read or write secrets into an Azure Key Vault. We will use that as our example below. I have given our service principal the ability to read and list secrets from a key vault.

We track this in the AzureDiagnostics table for Azure Key Vault. We can use the following query to track key vault changes.

AzureDiagnostics
| where ResourceType == "VAULTS"
| where OperationName == "VaultPatch"
| where ResultType == "Success"
| project-rename ServicePrincipalAdded=addedAccessPolicy_ObjectId_g, Actor=identity_claim_http_schemas_xmlsoap_org_ws_2005_05_identity_claims_name_s, AddedKeyPolicy = addedAccessPolicy_Permissions_keys_s, AddedSecretPolicy = addedAccessPolicy_Permissions_secrets_s,AddedCertPolicy = addedAccessPolicy_Permissions_certificates_s
| where isnotempty(AddedKeyPolicy) or isnotempty(AddedSecretPolicy) or isnotempty(AddedCertPolicy)
| project TimeGenerated, KeyVaultName=Resource, ServicePrincipalAdded, Actor, IPAddressofActor=CallerIPAddress, AddedSecretPolicy, AddedKeyPolicy, AddedCertPolicy

We find the service principal Id that we added, the key vault permissions added, the name of the vault and who did it.

We could add a service principal to many Azure resources. Azure Storage, Key Vault, SQL, are a few, but similar events should be available for them all.

Azure AD Service Principal Sign In Data

As well as audit data to track access changes, we can also view the sign in information for service principals and managed identities. Microsoft Sentinel logs these two types of sign ins in two separate tables. For regular service principals we query the AADServicePrincipalSignInLogs. For managed identity sign in data we look in AADManagedIdentitySignInLogs. You can enable both logs in the Azure Active Directory data connector. These should be low volume compared to regular sign in data but fees will apply.

Service principals sign in logs aren’t as detailed as your regular user sign in data. These types of sign ins are non interactive and are instead accessing resources protected by Azure AD. There are no fields for things like multifactor authentication or anything like that. This makes the data easy to make sense of. If we look at a sign in for our test service principal, you will see the information you have available to you.

AADServicePrincipalSignInLogs
| project TimeGenerated, ResultType, IPAddress, ServicePrincipalName, ServicePrincipalId, ServicePrincipalCredentialKeyId, AppId, ResourceDisplayName, ResourceIdentity

We can see we get some great information. There are other fields available but for the sake of brevity I will only show a few.

We get a ResultType, much like a regular user sign in (0 = success). The IP address, the name of the service principal, then the Id’s of pretty much everything. Even the resource the service principal was accessing. We can summarize our data to see patterns for all our service principals. For instance, by listing all the IP addresses each service principal has signed in from in the last month.


AADServicePrincipalSignInLogs
| where TimeGenerated > ago(30d)
| where ResultType == "0"
| summarize IPAddresses=make_set(IPAddress) by ServicePrincipalName, AppId

Conditional Access for workload identities was recently released for Azure AD. If your service principals log in from the same IP addresses then enforce that with conditional access. That way, if we lose client secrets or certificates, and an attacker signs in from a new IP address we will block it. Much like conditional access for users. The above query will give you your baseline of IP addresses to start building policies.

We can also summarize the resources that each service principal has accessed. If you have service principals that can access many resources such as Microsoft Graph, the Windows Defender ATP API and Azure Service Management API. Those service principals likely have a larger blast radius if compromised –

AADServicePrincipalSignInLogs
| where TimeGenerated > ago(30d)
| where ResultType == "0"
| summarize ResourcesAccessed=make_set(ResourceDisplayName) by ServicePrincipalName

We can use similar detection patterns we would use for users with service principals. For instance detecting when they sign in from a new IP address not seen for that service principal. This query alerts when a service principal signs in to a new IP address in the last week compared to the prior 180 days.

let timeframe = 180d;
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(timeframe) and TimeGenerated < ago(7d)
| distinct AppId, IPAddress
| join kind=rightanti
    (
    AADServicePrincipalSignInLogs
    | where TimeGenerated > ago(7d)
    | project TimeGenerated, AppId, IPAddress, ResultType, ServicePrincipalName
    )
    on IPAddress
| where ResultType == "0"
| distinct ServicePrincipalName, AppId, IPAddress

For managed identities we get a cut down version of the service principal sign in data. For instance we don’t get IP address information because managed identities are used ‘internally’ within Azure AD. But we can still track them in similar ways. For instance we can summarize all the resources each managed identity accesses. For instance Azure Key Vault, Azure Storage, Azure SQL. The higher the count, then the higher the blast radius.

AADManagedIdentitySignInLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| summarize ResourcesAccessed=make_set(ResourceDisplayName) by ServicePrincipalName

We can also detect when a managed identity accesses a new resource that it hadn’t before. This query will return any managed identities that access resources that they hadn’t in the prior 60 days. For example, if you have a managed identity that previously only accessed Azure Storage, then accesses an Azure Key Vault, this would find that event.


AADManagedIdentitySignInLogs
| where TimeGenerated > ago (60d) and TimeGenerated < ago(1d)
| where ResultType == "0"
| distinct ServicePrincipalId, ResourceIdentity
| join kind=rightanti (
    AADManagedIdentitySignInLogs
    | where TimeGenerated > ago (1d)
    | where ResultType == "0"
    )
    on ServicePrincipalId, ResourceIdentity
| distinct ServicePrincipalId, ServicePrincipalName, ResourceIdentity, ResourceDisplayName

Prevention, always better than detection.

As with anything, preventing issues is better than detecting them. The nature of service principals though is they are always going to have some privilege. It is about reducing risk in your environment through least privilege.

  • Get to know your Azure AD roles and Microsoft Graph permisions. Assign only what you need. Avoid using roles like Global Administrator and Application Adminstrator. Limit permissions such as Directory.Read.All and Directory.ReadWrite. All are high privilege and should not be required. Azure AD roles can also be scoped to reduce privilege to only what is required.
  • Alert when service principals are assigned roles in Azure AD or granted access to Microsoft Graph using the queries above. Investigate whether the permissions are appropriate to the workload.
  • Make sure that any access granted to Azure management scopes or workloads is fit for purpose. Owner, contributor and user access administrator are all very high privilege.
  • Leverage Azure AD Conditional Access for workload identities. If your service principals sign in from a known set of IP addresses, then enforce that in policy.
  • Don’t be afraid to push back on third parties or internal developers about the privilege required to make their application work. The Azure AD and Microsoft Graph documentation is easy to read and understand and the permissions are very granular.

Finally, some handy links from within this article and elsewhere

Using Logic Apps and Microsoft Sentinel to alert on expiring Azure AD Secrets — 1st Dec 2021

Using Logic Apps and Microsoft Sentinel to alert on expiring Azure AD Secrets

Azure AD app registrations are at the heart of the Microsoft Identity Platform, and Microsoft recommend you rotate secrets on them often. However, there is currently no native way to alert on secrets that are due to expire. An expired secret means the application will no longer authenticate, so you may have systems that fail when the secret expires. A number of people have come up with solutions, such as using Power Automate or deploying an app that you can configure notifications with. Another solution is using Logic Apps, KQL and Microsoft Sentinel to build a very low cost and light weight solution.

We will need two Logic Apps for our automation – the first will query Microsoft Graph to retrieve information (including password expiry dates) for all our applications, our Logic App will then push that data into a custom table in Sentinel. Next we will write a Kusto query to find apps with secrets due to expire shortly, then finally we will build a second Logic App to run our query on a schedule for us, and email a table with the apps that need new secrets.

Both our Logic Apps are really simple, the first looks as follows –

The first part of this app is to connect to the Microsoft Graph to retrieve a token for re-use. You can set your Logic App to run on a recurrence for whatever schedule makes sense, for my example I am running it daily, so we will get updated application data into Sentinel each day. Then we will need the ClientID, TenantID and Secret for an Azure AD App Registration with enough permission to retrieve application information using the ‘Get application‘ action. From the documentation we can see that Application.Read.All is the lowest privilege access we will need to give our app registration.

Our first 3 actions are just to retrieve those credentials from Azure Key Vault, if you don’t use Azure Key Vault you can just add them as variables (or however you handle secrets management). Then we call MS Graph to retrieve a token.

Put your TenantID in the URI and the ClientID and Secret in the body respectively. Then parse the JSON response using the following schema.

{
    "properties": {
        "access_token": {
            "type": "string"
        },
        "expires_in": {
            "type": "string"
        },
        "expires_on": {
            "type": "string"
        },
        "ext_expires_in": {
            "type": "string"
        },
        "not_before": {
            "type": "string"
        },
        "resource": {
            "type": "string"
        },
        "token_type": {
            "type": "string"
        }
    },
    "type": "object"
}

Now we have our token, we can re-use it to access the applications endpoint on Microsoft Graph to retrieve the details for all our applications. So use a HTTP action again, and this time use a GET action.

We connect to https://graph.microsoft.com/v1.0/applications?$select=displayName,appId,passwordCredentials

In this example we are just retrieving the displayname of our app, the application id and the password credentials. The applications endpoint has much more detail though if you wanted to include other data to enrich your logs.

One key note here is dealing with data paging in Microsoft Graph, MS Graph will only return a certain amount of data at a time, and also include a link to retrieve the next set of data, and so on until you have retrieved all the results – and with Azure AD apps you are almost certainly going to have too many to retrieve at once. Logic Apps can deal with this natively thankfully, on your ‘Retrieve App Details’ action, click the three dots in the top right and choose settings, then enable Pagination so that it knows to loop through until all the data is retrieved.

Then we parse the response once more, if you are just using displayname, appid and password credentials like I am, then the schema for your json is.

{
    "properties": {
        "value": {
            "items": {
                "properties": {
                    "appId": {
                        "type": "string"
                    },
                    "displayName": {
                        "type": "string"
                    },
                    "passwordCredentials": {
                        "items": {
                            "properties": {
                                "customKeyIdentifier": {},
                                "displayName": {
                                    "type": [
                                        "string",
                                        "null"
                                    ]
                                },
                                "endDateTime": {
                                    "type": [
                                        "string",
                                        "null"
                                    ]
                                },
                                "hint": {
                                    "type": [
                                        "string",
                                        "null"
                                    ]
                                },
                                "keyId": {
                                    "type": [
                                        "string",
                                        "null"
                                    ]
                                },
                                "secretText": {},
                                "startDateTime": {
                                    "type": [
                                        "string",
                                        "null"
                                    ]
                                }
                            },
                            "required": [
                                "customKeyIdentifier",
                                "displayName",
                                "endDateTime",
                                "hint",
                                "keyId",
                                "secretText",
                                "startDateTime"
                            ],
                            "type": [
                                "object",
                                "null",
                                "array"
                            ]
                        },
                        "type": "array"
                    }
                },
                "required": [
                    "displayName",
                    "appId",
                    "passwordCredentials"
                ],
                "type": "object"
            },
            "type": "array"
        }
    },
    "type": "object"
}

Then our last step of our first Logic App is to send the data using the Azure Log Analytics Data Collector to Microsoft Sentinel. So take each value from your JSON and send it to Sentinel, because you will have lots of apps, it will loop through each one. For this example the logs will be sent to the AzureADApps_CL table.

You can then query your AzureADApps_CL table like you would any data and you should see a list of your application display names, their app ids and any password credentials. If you are writing to that table for the first time, just give it 20 or so minutes to appear. So now we need some KQL to find which ones are expiring. If you followed this example along you can use the following query –

AzureADApps_CL
| where TimeGenerated > ago(7d)
| extend AppDisplayName = tostring(displayName_s)
| extend AppId = tostring(appId_g)
| summarize arg_max (TimeGenerated, *) by AppDisplayName
| extend Credentials = todynamic(passwordCredentials_s)
| project AppDisplayName, AppId, Credentials
| mv-expand Credentials
| extend x = todatetime(Credentials.endDateTime)
| project AppDisplayName, AppId, Credentials, x
| where x between (now()..ago(-30d))
| extend SecretName = tostring(Credentials.displayName)
| extend PasswordEndDate = format_datetime(x, 'dd-MM-yyyy [HH:mm:ss tt]')
| project AppDisplayName, AppId, SecretName, PasswordEndDate
| sort by AppDisplayName desc 

You should see an output if any applications that have a secret expiring in the next 30 days (if you have any).

Now we have our data, the second Logic App is simple, you just need it to run on whatever scheduled you like (say weekly), run the query for you against Sentinel (using the Azure Monitor Logs connector), build a simple HTML table and email it to whoever wants to know.

If you use the same Kusto query that I have then the schema for your Parse JSON action is.

{
    "properties": {
        "value": {
            "items": {
                "properties": {
                    "AppDisplayName": {
                        "type": "string"
                    },
                    "AppId": {
                        "type": "string"
                    },
                    "PasswordEndDate": {
                        "type": "string"
                    },
                    "SecretName": {
                        "type": [
                            "string",
                            "null"
                        ]
                    }
                },
                "required": [
                    "AppDisplayName",
                    "AppId",
                    "SecretName",
                    "PasswordEndDate"
                ],
                "type": "object"
            },
            "type": "array"
        }
    },
    "type": "object"
}

Now that you have that data in Microsoft Sentinel you could also run other queries against it, such as seeing how many apps you have created or removed each week, or if applications have expired secrets and no one has requested a new one; they may be inactive and can be deleted.

Detecting multistage attacks in Microsoft Sentinel — 25th Nov 2021

Detecting multistage attacks in Microsoft Sentinel

For defenders, it would be really amazing if every threat we faced was a single event or action that we could detect – we would know that if x happened, then we need to do y and the threat was detected and prevented. Unfortunately not every threat we face is a single event; it may be the combination of several low priority events that on their own may not raise alarms, but when combined are an indicator of more malicious activity. For instance, you probably receive a lot of identity alerts that are considered low risk, such as users accessing via a new device, or a new location – most are likely benign. If you then detected that same user accessed SharePoint from a location not seen before, that may increase the risk level, and if that user then started downloading a lot of data suddenly that may be really serious.

That pattern follows the MITRE ATT&CK framework where we may see initial access, followed by discovery then exfiltration. Thankfully we can build our own queries to hunt for these kinds of attacks. Microsoft also provide multistage protection via their fusion detections in Microsoft Sentinel.

We can send all kinds of data to Microsoft Sentinel, logs from on premise domain controllers or servers, Azure AD telemetry, logs from our endpoint devices and whatever else you think is valuable. Microsoft Sentinel and the Kusto Query Language provide the ability to look for attacks that may span across different sources. There are several ways to join datasets in KQL, this blog we are going to focus on just the join operator. At its most basic, join allows us to combine data from different tables together based on something that matches between the two tables.

For instance, if we have our Azure AD sign in data, which is sent to the SigninLogs table and our Office 365 audit logs which are sent to the OfficeActivity table, we have various options to where we may find a match between these two tables – such as usernames and IP addresses for example. So we could join the two tables based on a username, and match Azure AD sign in data with Office 365 activity data belonging to the same user. Maybe a user signed into Azure AD from a location previously not seen for them before, so then we would be interested in what actions were taken in Office 365 after that sign in event.

When we join data in Microsoft Sentinel we have a lot of options, to keep things straight forward for this post, we are just going to use ‘inner’ joins, where we look for matches between multiple tables and return the combined data. So using our Azure AD and Office 365 example, after completing an inner join, we would see the data from both tables available to us – such as location, conditional access results or user agent from the Azure AD table and actions such as downloading files from OneDrive or inviting users to Teams, from the Office 365 table. There are other types of joins, referenced in the documentation, but we will explore those in a future post. Learning to join tables was one of the things that confused me the most initially in KQL, but it provides immense value.

If we start with something simple, we can join our Azure AD sign in logs to our Azure AD Risk Events (held in the AADUserRiskEvents table), if we build a simple query and tell KQL to join the tables together, you will see it automatically tells us where there is a match in data.

The TimeGenerated, CorrelationId and UserPrincipalName fields exist in both tables. If we join on our CorrelationId, we can then see we get options to fill in our query from both tables

Where the same column exists on both sides you will see it automatically renames one, seen with ‘CorrelationId1’. We can then finish our query with data from both tables

SigninLogs
| project TimeGenerated, UserPrincipalName, AppDisplayName, ResultType, CorrelationId
| join kind=inner
(AADUserRiskEvents)
on CorrelationId
| project TimeGenerated, UserPrincipalName, CorrelationId, ResultType, DetectionTimingType, RiskState, RiskLevel

We get the TimeGenerated, UserPrincipalName, ResultType from Azure AD sign in data, and the DetectionTimingType, RiskState and RiskLevel from AADUserRiskEvents, and we use the CorrelationId to join them together.

We can use these basics as a foundation to start adding some more logic to our queries. In this next example we are looking for AADUserRiskEvents, and this time joining to our Azure AD Audit table (where Azure AD changes are tracked) looking for events where the same user who flagged a risk event also changed MFA details within a short time frame.

let starttime = 45d;
let timeframe = 4h;
AADUserRiskEvents
| where TimeGenerated > ago(starttime)
| where RiskDetail != "aiConfirmedSigninSafe"
| project RiskTime=TimeGenerated, UserPrincipalName, RiskEventType, RiskLevel, Source
| join kind=inner (
    AuditLogs
    | where OperationName in ("User registered security info", "User deleted security info")
    | where Result == "success"
    | extend UserPrincipalName = tostring(TargetResources[0].userPrincipalName)
    | project SecurityInfoTime=TimeGenerated, OperationName, UserPrincipalName, Result, ResultReason)
    on UserPrincipalName
| project RiskTime, SecurityInfoTime, UserPrincipalName, RiskEventType, RiskLevel, Source, OperationName, ResultReason
| where (SecurityInfoTime - RiskTime) between (0min .. timeframe)

This query is a little more complex but it follows the same pattern. First we set a couple of time variables, we are going to look back through 45 days of data and we want to set a time frame of four hours between our events. If a risk event is triggered initially, but then the MFA event doesn’t occur for two weeks, then it is not as likely to be linked compared to these events happening close together. Next, we look up our AADUserRiskEvents, exclude anything that Microsoft dismiss as safe and then we take the details we want to use in our second query – the UserPrincipalName, RiskEventType, RiskLevel and Source, we also take the TimeGenerated, but to make things more simple to understand we rename it to RiskTime, so that it is easy to distinguish later on.

Then to finish our our query, we again inner join, this time to our AuditLogs table, looking for MFA registration or deletion events, and we join the tables together based on UserPrincipalName, that way we know the same user who flagged the risk event also changed MFA details. We rename the time of the second event to SecurityInfoTime to make our data easy to read. Fnally, to add our time logic, we calculate the time between the two separate events and then alert only when that time is less than four hours.

We can re-use this same pattern across all kinds of data, this query follows basically the exact same format, except we are looking for a risk event followed by access to an Azure management interface. If a user flagged a risk event, then within four hours signed into Azure, we would be alerted.

let starttime = 45d;
let timeframe = 4h;
let applications = dynamic(["Azure Active Directory PowerShell", "Microsoft Azure PowerShell", "Graph Explorer", "ACOM Azure Website"]);
AADUserRiskEvents
| where TimeGenerated > ago(starttime)
| where RiskDetail != "aiConfirmedSigninSafe"
| project RiskTime=TimeGenerated, UserPrincipalName, RiskEventType, RiskLevel, Source
| join kind=inner (
    SigninLogs
    | where AppDisplayName in (applications)
    | where ResultType == "0")
    on UserPrincipalName
| project-rename AzureSigninTime=TimeGenerated
| extend TimeDelta = AzureSigninTime - RiskTime
| project RiskTime, AzureSigninTime, TimeDelta, UserPrincipalName, RiskEventType, RiskLevel, Source
| where (AzureSigninTime - RiskTime) between (0min .. timeframe)

We can even have KQL calculate the time between two events for you to easily see the time difference between the two. You do this by simply extending a new column and having it calculate it for you (| extend TimeDelta = AzureSigninTime – RiskTime )

You can extend these queries across any data that makes sense, so we can again take a risk event, but this time join it to our Office 365 activity logs to find a list of files that a user has downloaded shortly after flagging that risk event.

let starttime = 45d;
let timeframe = 4h;
AADUserRiskEvents
| where TimeGenerated > ago(starttime)
| where RiskDetail != "aiConfirmedSigninSafe"
| project RiskTime=TimeGenerated, UserPrincipalName, RiskEventType, RiskLevel, Source
| join kind=inner (
    OfficeActivity
    | where Operation in ("FileSyncDownloadedFull", "FileDownloaded"))
    on $left.UserPrincipalName == $right.UserId
| project DownloadTime=TimeGenerated, OfficeObjectId, RiskTime, UserId
| where (DownloadTime - RiskTime) between (0min .. timeframe)
| summarize RiskyDownloads=make_set(OfficeObjectId) by UserId
| where array_length( RiskyDownloads) > 10

We use much the same query structure, but there are two things to note here, the AADUserRiskEvents and OfficeActivity store username data in two different columns, so we need to manually tell Microsoft Sentinel how to join, which we do by “on $left.UserPrincipalName == $right.UserId”. We are telling KQL that the UserPrincipalName from our first table (AADUserRiskEvents) is the same as the UserId in our second table (OfficeActivity). Data coming in from different vendors, and even Microsoft themselves, is wildly inconsistent, so you will need to provide the brain power to link them together. In this example, we also summarize the list of downloads the risky user has taken, and only alert when it is greater than 10 unique files.

These kind of multistage queries don’t need to be limited to users or identity type events, you can use the same structure to query device data, or anything else that is relevant to you.

let timeframe = 48h;
SecurityAlert
| where ProviderName == "MDATP"
| project AlertTime=TimeGenerated,DeviceName=CompromisedEntity, AlertName
| join kind=inner (
DeviceLogonEvents
| project TimeGenerated, LogonType, ActionType, InitiatingProcessCommandLine, IsLocalAdmin, AccountName, DeviceName
| where LogonType in ("Interactive","RemoteInteractive")
| where ActionType == "LogonSuccess"
| where InitiatingProcessCommandLine == "lsass.exe"
) on DeviceName
| where (AlertTime - TimeGenerated) between (0min .. timeframe)
| summarize arg_max(TimeGenerated, *) by DeviceName
| project LogonTime=TimeGenerated, AlertTime, AlertName, DeviceName, AccountName, IsLocalAdmin

In this last example, we take an alert from Microsoft Defender for Endpoint, then use that first event to circle back to our DeviceLogonEvents which tracks logon event data on Windows devices, from there we can track down who was the most recent user to sign onto that device, and also determine if they are a local administrator.

Keep an eye on your Azure AD guests with Microsoft Sentinel — 4th Nov 2021

Keep an eye on your Azure AD guests with Microsoft Sentinel

Azure AD External Identities (previously Azure AD B2B) is a fantastic way to collaborate with partners, customers or other people external to your company. Previously you may have needed to onboard an Active Directory account for each user, which came with a lot of inherit privilege, or you used different authentication methods for your applications, and you ended up juggling credentials for all these different systems. By leveraging Azure AD External Identities you start to wrestle back some of that control and importantly get really strong visibility into what these guests are doing.

You invite a guest to your tenant by sending them an email from within the Azure Active Directory portal (or directly inviting them in an app like Teams), they go through the process of accepting and then you have a user account for them in your tenant – easy!

If the user you invite to your tenant belongs to a domain that is also an Azure AD tenant, they can use their own credentials from that tenant to access resources in your tenant. If it’s a personal address like gmail.com then the user will be prompted to sign up to a Microsoft account or use a one time passcode if you have configured that option.

If you browse through your Azure AD environment and already have guests, you can filter to just guest accounts. If you don’t have guests, invite your personal email and you can check out the process.

You will notice that they have a unique UserPrincipalName format, if your guests email address is test123@gmail.com then the guest object in your directory has the UserPrincipalName of test123_gmail.com#EXT#@YOURTENANT.onmicrosoft.com – this makes sense if you think about the concept of a guest account, it could belong to many different tenants so it needs to have a unique UPN in your tenant. You can also see a few more details by clicking through to a guest account. You can see if an invite has been accepted or not, a guest who hasn’t accepted is still an object in your directory, they just can’t access any resources yet.

And if you click the view more arrow, you can see if source of the account.

You can see the difference between a user coming in from another Azure AD tenant vs a personal account.

It is really easy to invite guest accounts and then kind of forget about them, or not treat them with the same scrutiny or governance you would a regular account. They also have a tendency to grow in total count very quickly, especially if you allow your staff to invite them themselves, via Teams or any other method.

Remember though these accounts all have some access to your tenant, potentially data in Teams, OneDrive or SharePoint, and likely an app or two that you have granted access to – or more worryingly apps that you haven’t specifically blocked them accessing. Guests can even be granted access to Azure AD roles, or be given access to Azure resources via Azure RBAC.

Thankfully in Microsoft (no longer Azure!) Sentinel, all the signals we get from sign-in data, or audit logs, or Office 365 logs don’t discriminate between members and guests (apart from some personal information that is hidden for guests such as device names), which makes it a really great platform to get insights to what your guests are up to (or what they are no longer up to).

Invites sent and redeemed are collected in the AuditLogs table, so if you want to quickly visualize how many invites you are sending vs those being redeemed you can.

//Visualizes the total amount of guest invites sent to those redeemed
let timerange=180d;
let timeframe=7d;
AuditLogs
| where TimeGenerated > ago (timerange)
| where OperationName in ("Redeem external user invite", "Invite external user")
| summarize
    InvitesSent=countif(OperationName == "Invite external user"),
    InvitesRedeemed=countif(OperationName == "Redeem external user invite")
    by bin(TimeGenerated, timeframe)
| render columnchart
    with (
    title="Guest Invites Sent v Guest Invites Redeemed",
    xtitle="Invites",
    kind=unstacked)

You can look for users that have been invited, but have not yet redeemed their invite. Guest invites never expire, so if a user hasn’t accepted after a couple of months it may be worth removing the invite until a time they genuinely require it. In this query we exclude invites sent in the last month, as those people may have simply not got around to redeeming their invite yet.

//Lists guests who have been invited but not yet redeemed their invites. Excludes newly invited guests (last 30 days).
let timerange=180d;
let timeframe=30d;
AuditLogs
| where TimeGenerated between (ago(timerange) .. ago(timeframe)) 
| where OperationName == "Invite external user"
| extend GuestUPN = tolower(tostring(TargetResources[0].userPrincipalName))
| project TimeGenerated, GuestUPN
| join kind=leftanti  (
    AuditLogs
    | where TimeGenerated > ago (timerange)
    | where OperationName == "Redeem external user invite"
    | where CorrelationId <> "00000000-0000-0000-0000-000000000000"
    | extend d = tolower(tostring(TargetResources[0].displayName))
    | parse d with * "upn: " GuestUPN "," *
    | project TimeGenerated, GuestUPN)
    on GuestUPN
| distinct GuestUPN

For those users that have accepted and are actively accessing applications, we can see what they are accessing just like a regular user. You could break down all your apps and have a look at the split between guests and members for each application.

//Creates a list of your applications and summarizes successful signins by members vs guests
let timerange=30d;
SigninLogs
| where TimeGenerated > ago(timerange)
| project TimeGenerated, UserType, ResultType, AppDisplayName
| where ResultType == 0
| summarize
    MemberSignins=countif(UserType == "Member"),
    GuestSignins=countif(UserType == "Guest")
    by AppDisplayName
| sort by AppDisplayName  

You can quickly see which users haven’t signed in over the last month, having signed in successfully in the preceding 6 months.

let timerange=180d;
let timeframe=30d;
SigninLogs
| where TimeGenerated > ago(timerange)
| where UserType == "Guest" or UserPrincipalName contains "#ext#"
| where ResultType == 0
| summarize arg_max(TimeGenerated, *) by UserPrincipalName
| join kind = leftanti  
    (
    SigninLogs
    | where TimeGenerated > ago(timeframe)
    | where UserType == "Guest" or UserPrincipalName contains "#ext#"
    | where ResultType == 0
    | summarize arg_max(TimeGenerated, *) by UserPrincipalName
    )
    on UserPrincipalName
| project UserPrincipalName

Or you could even summarize all your guests (who have signed in at least once) into the month they last accessed your tenant. You could then bulk disable/delete anything over 3 months or whatever your lifecycle policy is.

//Month by month breakdown of when your Azure AD guests last signed in
SigninLogs
| where TimeGenerated > ago (360d)
| where UserType == "Guest" or UserPrincipalName contains "#ext#"
| where ResultType == 0
| summarize arg_max(TimeGenerated, *) by UserPrincipalName
| project TimeGenerated, UserPrincipalName
| summarize InactiveUsers=make_set(UserPrincipalName) by startofmonth(TimeGenerated)

You could look at guests accounts that are trying to access your applications but being denied because they aren’t assigned a role, this could potentially be some reconnaissance occurring in your environment.

SigninLogs
| where UserType == "Guest"
| where ResultType == "50105"
| project TimeGenerated, UserPrincipalName, AppDisplayName, IPAddress, Location, UserAgent

We can leverage the IdentityInfo table to find any guests that have been assigned Azure AD roles. If your security controls for guests are weaker than your member accounts this is something you definitely want to avoid.

IdentityInfo
| where TimeGenerated > ago(21d)
| summarize arg_max(TimeGenerated, *) by AccountUPN
| where UserType == "Guest"
| where AssignedRoles != "[]" 
| where isnotempty(AssignedRoles)
| project AccountUPN, AssignedRoles, AccountObjectId

We can also use our IdentityInfo table again to grab a list of all our guests, then join to our OfficeActivity table to summarize download activities by each of your guests.

//Summarize the total count and the list of files downloaded by guests in your Office 365 tenant
let timeframe=30d;
IdentityInfo
| where TimeGenerated > ago(21d)
| where UserType == "Guest"
| summarize arg_max(TimeGenerated, *) by AccountUPN
| project UserId=tolower(AccountUPN)
| join kind=inner (
    OfficeActivity
    | where TimeGenerated > ago(timeframe)
    | where Operation in ("FileSyncDownloadedFull", "FileDownloaded")
    )
    on UserId
| summarize DownloadCount=count(), DownloadList=make_set(OfficeObjectId) by UserId

If you wanted to summarize which domains are downloading the most data from Office 365 then you can slightly alter the above query (thanks to Alex Verboon for this suggestion).

//Summarize the total count of files downloaded by each guest domain in your tenant
let timeframe=30d;
IdentityInfo
| where TimeGenerated > ago(21d)
| where UserType == "Guest"
| summarize arg_max(TimeGenerated, *) by AccountUPN, MailAddress
| project UserId=tolower(AccountUPN), MailAddress
| join kind=inner (
    OfficeActivity
    | where TimeGenerated > ago(timeframe)
    | where Operation in ("FileSyncDownloadedFull", "FileDownloaded")
    )
    on UserId
| extend username = tostring(split(UserId,"#")[0])
| parse MailAddress with * "@" userdomain 
| summarize count() by userdomain

You can find guests who were added to a Team then instantly started downloading data from your Office 365 tenant.

// Finds guest accounts who were added to a Team and then downloaded documents straight away. 
// startime = data to look back on, timeframe = looks for downloads for this period after being added to the Team
let starttime = 7d;
let timeframe = 2h;
let operations = dynamic(["FileSyncDownloadedFull", "FileDownloaded"]);
OfficeActivity
| where TimeGenerated > ago(starttime)
| where OfficeWorkload == "MicrosoftTeams" 
| where Operation == "MemberAdded"
| extend UserAdded = tostring(parse_json(Members)[0].UPN)
| where UserAdded contains ("#EXT#")
| project TimeAdded=TimeGenerated, UserId=tolower(UserAdded)
| join kind=inner
    (
    OfficeActivity
    | where Operation in (['operations'])
    )
    on UserId
| project DownloadTime=TimeGenerated, TimeAdded, SourceFileName, UserId
| where (DownloadTime - TimeAdded) between (0min .. timeframe)

I think the key takeaway is that basically all your threat hunting queries you write for your standard accounts are most likely relevant to guests, and in some cases more relevant. While having guests in your tenant grants us some control and visibility, it is still an account not entirely under your management. The accounts could have poor passwords, or be shared amongst people, or if coming from another Azure AD tenancy could have poor lifecycle management, i.e they could have left the other company but their account is still active.

As always, prevention is better than detection, and depending on your licensing tier there are some great tools available to govern these accounts.

You can configure guest access restrictions in the Azure Active Directory portal. Keep in mind when configuring these options the flow on effect to other apps, such as Teams. In that same portal you can configure who is allowed to send guest invites, I would particularly recommend you disallow guests inviting other guests. You can also restrict or allow specific domains that invites can be sent to.

On your enterprise applications, make sure you have assignment required set to Yes

This is crucial in my opinion, because it allows Azure AD to be the first ‘gate’ to accessing your applications. The access control in your various applications is going to vary wildly. Some may need an account setup on the application itself to allow people in, some may auto create an account on first sign on, some may have no access control at all and when it sees a sign in from Azure AD it allows the person in. If this is set to no and your applications don’t perform their own access control or RBAC then there is a good chance your guests will be allowed in, as they come through as authenticated from Azure AD much like a member account.

If you are an Azure AD P2 customer, then you have access to Access Reviews, which is an already great and constantly improving offering that lets you automate a lot of the lifecycle of your accounts, including guests. You can also look at leveraging Entitlement Management which can facilitate granting guests the access they require and nothing more.

If you have Azure AD P1 or P2, use Azure AD Conditional Access, you can target policies specifically at guest accounts from within the console.

You can enforce MFA on your guest accounts like you would all other users – if you enforce MFA on an application for guests, the first time they access it they will be redirected to the MFA registration page. You can also explicitly block guests from particular applications using conditional access.

Also unrelated, I recently kicked off a #365daysofkql challenge on my twitter, where I share a query a day for a year, we are nearly one month in so if you want to follow feel free.

Defending Azure Active Directory with Azure Sentinel — 19th Oct 2021

Defending Azure Active Directory with Azure Sentinel

Azure Active Directory doesn’t really need any introduction, it is the core of identity within Microsoft 365, used by Azure RBAC and used by millions as an identity provider. The thing about Azure Active Directory is that it isn’t much like Active Directory at all, apart from name they have little in common under the hood. There is no LDAP, no Kerberos, no OU’s. Instead we get SAML, OIDC/OAuth and Microsoft Graph. It has its own unique threats, logging and attack vectors. There are a massive amount of great articles about attacking Azure AD, such as:

The focus of this blog is looking at it from the other side, looking for how we can detect and defend against these activities.

Defending Reconnaissance

Protection against directory reconnaissance in Azure Active Directory can be quite difficult. Any user in your tenant comes with some level of privilege, mostly to be able to ‘look around’ at other objects. You can restrict access to the Azure AD administration portal to users who don’t hold a privileged role under the ‘User settings’ tab in Azure Active Directory and you can configure guest permissions if you use external identities, it won’t stop people using other techniques but it still valuable to harden that portal.

With on-premise Active Directory we get logging on services like LDAP or DNS, and we have products like Defender for Id that can trigger alerts for us – on premise Active Directory has a very strong logging capability. For Azure Active Directory however, we don’t have access to equivalent data unfortunately, we will get sign-in activity of course – so if a user connects to Azure AD PowerShell, that can be tracked. What we can’t see though is the output for any read/get operations. So once connected to PowerShell if a user runs a Get-AzureADUser command, we have no visibility on that. Once a user starts to make changes, such as changing group memberships or deleting users, then we receive log events.

Tools like Azure AD Identity Protection are helpful, but they are sign-in driven and designed to protect users from account compromise. Azure AD Identity Protection won’t detect privilege escalation in Azure AD like Defender for Id for on premise Active Directory can.

So, while that makes things difficult, looking for users signing onto Azure management portals and interfaces is a good place to start –

SigninLogs
| where AppDisplayName in ("Azure Active Directory PowerShell","Microsoft Azure PowerShell","Graph Explorer", "ACOM Azure Website")
| project TimeGenerated, UserPrincipalName, AppDisplayName, Location, IPAddress, UserAgent

These applications have legitimate use though and we don’t want alert fatigue, so to add some more logic to our query, we can look back on the last 90 days (or whatever time frame suits you), then detect users accessing these applications for the first time. This could be a sign of a compromised account being used for reconnaissance.

let timeframe = startofday(ago(60d));
let applications = dynamic(["Azure Active Directory PowerShell", "Microsoft Azure PowerShell", "Graph Explorer", "ACOM Azure Website"]);
SigninLogs
| where TimeGenerated > timeframe and TimeGenerated < startofday(now())
| where AppDisplayName in (applications)
| project UserPrincipalName, AppDisplayName
| join kind=rightanti
    (
    SigninLogs
    | where TimeGenerated > startofday(now())
    | where AppDisplayName in (applications)
    )
    on UserPrincipalName, AppDisplayName
| where ResultType == 0
| project TimeGenerated, UserPrincipalName, ResultType, AppDisplayName, IPAddress, Location, UserAgent

Defending Excessive User Permission

This one is fairly straight forward, but often the simplest things are hardest to get right. Your IT staff, or yourself, will need to manage Azure AD and that’s fine of course, but we need to make sure that roles are fit for purpose. Azure AD has a list of pre-canned and well documented roles, and you can build your own if required. Make sure that roles are being assigned that are appropriate to the job – you don’t need to be a Global Administrator to complete user administration tasks, there are better suited roles. We can detect the assignment of roles to users, if you use Azure AD PIM we can also exclude activations from our query –

AuditLogs
| where Identity <> "MS-PIM"
| where OperationName == "Add member to role"
| extend Target = tostring(TargetResources[0].userPrincipalName)
| extend RoleAdded = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ActorIPAddress = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| project TimeGenerated, OperationName, Target, RoleAdded, Actor, ActorIPAddress

If you have a lot of users being moved in and out of roles you can reduce the query down to a selected set of privileged roles if required –

let roles=dynamic(["Global Admininistrator","SharePoint Administrator","Exchange Administrator"]);
AuditLogs
| where OperationName == "Add member to role"
| where Identity <> "MS-PIM"
| extend Target = tostring(TargetResources[0].userPrincipalName)
| extend RoleAdded = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| where RoleAdded in (roles)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ActorIPAddress = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| project TimeGenerated, OperationName, Target, RoleAdded, Actor, ActorIPAddress

And if you use Azure AD PIM you can be alerted when users are assigned roles outside of the PIM platform (which you can do via Azure AD PowerShell as an example) –

AuditLogs
| where OperationName startswith "Add member to role outside of PIM"
| extend RoleAdded = tostring(TargetResources[0].displayName)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ActorIPAddress = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| extend TargetAADUserId = tostring(TargetResources[2].id)
| project TimeGenerated, OperationName, TargetAADUserId, RoleAdded, Actor

Defending Shared Identity

Unless you are a cloud native company with no on premise Active Directory footprint then you will be syncing user accounts, group objects and devices between on premise and Azure AD. Whether you sync all objects, or a subset of them will depend on your particular environment, but identity is potentially the link between AD and Azure AD. If you use accounts from on premise Active Directory to also manage Azure Active Directory, then the identity security of those accounts are crucial. Microsoft recommend you use cloud only accounts to manage Azure AD, but that may not be practical in your environment, or it’s something you are working toward.

Remember that Azure Active Directory and Active Directory essentially have no knowledge of the privilege an account has on the other system (apart from group membership more broadly). Active Directory doesn’t know that bobsmith@yourcompany.com is a Global Administrator, and Azure Active Directory doesn’t know that the same account has full control over particular OUs as an example. We can visualize this fairly simply.

In isolation each system has its own built in protections, a regular user can’t reset the password of a Domain Admin on premise and in Azure AD a User Administrator can’t reset the password of a Global Administrator. The issue is when we cross that boundary and where there is a link in identity, there is potential for abuse and escalation.

For arguments sake maybe our service desk staff have the privilege to reset the password on a Global Administrator account in Azure AD – because of inherit permissions in AD. It may be easier for an attacker to target a service desk account because they have weaker controls or may be more vulnerable to social engineering – “hey could you reset the password on bobsmith@yourcompany.com for me?”. In on premise AD that account may appear to be quite low privilege.

We can leverage the IdentityInfo table driven by Azure Sentinel UEBA to track down users who have privileged roles, then join that back to on premise SecurityEvents for password reset activity. Then filter out when a privileged Azure AD user has reset their own on premise password – we want events where someone has reset another persons privileged Azure AD account.

let timeframe=1d;
IdentityInfo
| where TimeGenerated > ago(21d)
| where isnotempty(AssignedRoles)
| where AssignedRoles != "[]"
| summarize arg_max(TimeGenerated, *) by AccountUPN
| project AccountUPN, AccountName, AccountSID
| join kind=inner (
    SecurityEvent
    | where TimeGenerated > ago(timeframe)
    | where EventID == "4724"
    | project
        TimeGenerated,
        Activity,
        SubjectAccount,
        TargetAccount,
        TargetSid,
        SubjectUserSid
    )
    on $left.AccountSID == $right.TargetSid
| where SubjectUserSid != TargetSid
| project PasswordResetTime=TimeGenerated, Activity, ActorAccountName=SubjectAccount, TargetAccountUPN=AccountUPN,TargetAccountName=TargetAccount

The reverse can be true too, you could have users with Azure AD privilege, but no or reduced access to on premise Active Directory. When an Azure AD admin resets a password it is logged as a ‘Reset password (by admin)’ action in Azure Sentinel, we can retrieve the actor, the target and the outcome –

AuditLogs
| where OperationName == "Reset password (by admin)"
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend Target = tostring(TargetResources[0].userPrincipalName)
| project TimeGenerated, OperationName, Result, Actor, Target

An attacker could go further and use a service principal to leverage Microsoft Graph to initiate a password reset in Azure AD and have it written back to on-premise. This activity is shown in the AuditLogs table –

AuditLogs
| where OperationName == "POST UserAuthMethod.ResetPasswordOnPasswordMethods"
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| project TimeGenerated, OperationName, Actor, CorrelationId
| join kind=inner
    (AuditLogs
    | where OperationName == "Reset password (by admin)"
    | extend Target = tostring(TargetResources[0].userPrincipalName)
    | where Result == "success"
    )
    on CorrelationId
| project GraphPostTime=TimeGenerated, PasswordResetTime=TimeGenerated1, Actor, Target

Not only can on premise users have privileged in Azure AD, but on premise groups may hold privilege in Azure AD. When groups are synced from on premise to Azure AD, they don’t retain any of the security information from on premise. So you may have a group called ‘ad.security.appowners’, and that group can be managed by any number of people. If that group is then given any kind of privilege in Azure AD then the members of it inherit that privilege too. If you do have any groups in your environment that fit that pattern they will be unique to your environment, but you can detect changes to groups in Azure Sentinel –

SecurityEvent
| extend Actor = Account
| extend Target = MemberName
| extend Group = TargetAccount
| where EventID in (4728,4729,4732,4733,4756,4757) and Group == "DOMAIN\\ad.security.appowners"
| project TimeGenerated, Activity, Actor, Target, Group

If you have a list of groups you want to monitor, then it’s worth adding them into a watchlist and then querying against that, then you can keep the watchlist current and your query will continue to be up to date.

let watchlist = (_GetWatchlist('PrivilegedADGroups') | project TargetAccount);
SecurityEvent
 extend Target = MemberName
| extend Group = TargetAccount
| where EventID in (4728,4729,4732,4733,4756,4757) and TargetAccount in (watchlist)
| project TimeGenerated, Activity, Actor, Target, Group

If you have these shared identities and groups, what the groups are named will be very specific to you, but you should look to harden the security on premise, monitor them or preferably de-couple the link between AD and Azure AD entirely.

Defending Service Principal Abuse

In Azure AD, we can register applications, authenticate against them (using secrets or certificates) and they can provide further access into Azure AD or any other resources in your tenant – for each application created a corresponding service principal is created too. We can add either delegated or application access to app (such as mail.readwrite.all from the MS Graph) and we can assign roles (such as Global Administrator) to the service principal. Anyone who then authenticates to the app would have the attached privilege.

Specterops posted a great article here (definitely worth reading before continuing) highlighting the privilege escalation path through service principals. The article outlines some potential weak points spots in Azure AD –

  • Application admins being able to assign new secrets (passwords) to existing service principals.
  • High privilege roles being assigned to service principals.

And we will add some additional threats that you may see

  • Admins consenting to excessive permissions.
  • Redirect URI tampering.

From the article we learnt that the Application Administrator role has the ability to add credentials (secrets or certificates) to any existing application in Azure AD. If you have a service principal that has the Global Administrator role or privilege to the MS Graph, then an Application Administrator can generate a new secret for that app and effectively be a Global Administrator and obtain that privilege.

We can view secrets generated on an app in the AuditLogs table –

AuditLogs
| where OperationName contains "Update application – Certificates and secrets management"
| extend AppId = tostring(AdditionalDetails[1].value)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend AppDisplayName = tostring(TargetResources[0].displayName)
| project TimeGenerated, OperationName, AppDisplayName, AppId, Actor

We can also detect when permissions change in Azure AD applications, much like on premise service accounts, privilege has a tendency to creep upward over time. We can detect application permission additions with –

AuditLogs
| where OperationName == "Add app role assignment to service principal"
| extend AppPermissionsAdded = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| extend AppId = tostring(TargetResources[1].id)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ActorIPAddress = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| project TimeGenerated, OperationName, AppId, AppPermissionsAdded,Actor, ActorIPAddress

And delegated permissions additions with –

AuditLogs
| where OperationName == "Add delegated permission grant"
| extend DelegatedPermissionsAdded = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[0].newValue)))
| extend AppId = tostring(TargetResources[1].id)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ActorIPAddress = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| project TimeGenerated, OperationName, AppId, DelegatedPermissionsAdded,Actor, ActorIPAddress

Some permissions are of a high enough level that Azure AD requires a global administrator to consent to them, essentially by hitting an approve button. This is definitely an action you want to audit and investigate, once a global administrator hits the consent button, the privilege has been granted. You can investigate consent actions, including the permissions that have been granted –

AuditLogs
| where OperationName contains "Consent to application"
| extend AppDisplayName = tostring(TargetResources[0].displayName)
| extend Consent = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[4].newValue)))
| parse Consent with * "Scope:" PermissionsConsentedto ']' *
| extend UserWhoConsented = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend AdminConsent = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[0].newValue)))
| extend AppType = tostring(TargetResources[0].type)
| extend AppId = tostring(TargetResources[0].id)
| project TimeGenerated, AdminConsent, AppDisplayName, AppType, AppId, PermissionsConsentedto, UserWhoConsented

From the Specterops article, one of the red flags we mentioned was Azure AD roles being assigned to service principals, we often worry about excessive privilege for users, but forget about apps & service principals. We can detect a role being added to service principals –

AuditLogs
| where OperationName == "Add member to role"
| where TargetResources[0].type == "ServicePrincipal"
| extend ServicePrincipalObjectID = tostring(TargetResources[0].id)
| extend AppDisplayName = tostring(TargetResources[0].displayName) 
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend RoleAdded = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| project TimeGenerated, Actor, RoleAdded, ServicePrincipalObjectID, AppDisplayName

For Azure AD applications you may also have configured a redirect URI, this is the location that Azure AD will redirect the user & token after authentication. So if you have an application that is used to sign people in you will be likely sending the user & token to an address like https://app.mycompany.com/auth. Applications in Azure AD can have multiple URI’s assigned, so if an attacker was to then add https://maliciouswebserver.com/auth as a target then the data would be posted there too. We can detect changes in redirect URI’s –


AuditLogs
| where OperationName contains "Update application"
| where Result == "success"
| extend UpdatedProperty = tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[0].displayName)
| where UpdatedProperty == "AppAddress"
| extend NewRedirectURI = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[0].newValue))[0].Address)
| where isnotempty( NewRedirectURI)
| project TimeGenerated, OperationName, UpdatedProperty, NewRedirectURI

Remember that Azure AD service principals are identities too, so that we can use tooling like Azure AD Conditional Access to control where they can logon from. Have an application registered in Azure AD that provides authentication for an API that is only used from a particular location? You can enforce that with conditional access much like you would user sign-ins.

Service principal sign-ins are held in the AADServicePrincipalSignInLogs table in Azure Sentinel, the structure is similar to regular sign ins so you can look in trends in the data much like interactive sign-ins and start to detect anything out of the ordinary.

AADServicePrincipalSignInLogs
| where ResultType == "0"
| project TimeGenerated, AppId, ResourceDisplayName
| summarize SPSignIn=count()by bin(TimeGenerated, 15m), ResourceDisplayName
| render timechart 

Service principals can generate errors on logons too, an error 7000215 in the AADServicePrincipalSignInLogs table is an invalid secret, or the service principal equivalent of a wrong password.

AADServicePrincipalSignInLogs
| where ResultType == "7000215"
| summarize count()by AppId, ResourceDisplayName

Stop Defending and Start Preventing

While the focus on this blog was detection, which is a valuable tool, prevention is even better.

Prevention can be straight forward, or extremely complex and what you can achieve in your environment is unique to you, but there are definitely some recommendations worth following –

  • Limit access to Azure management portals and interfaces to those that need it via Azure AD Conditional Access. For those applications that you can’t apply policy to, alert for suspicious connections.
  • Provide access to Azure AD roles following least privilege principals – don’t hand out Global Administrator for tasks that User Administrator could cover.
  • Use Azure AD PIM if licensed for it and alert on users being assigned to roles outside of PIM.
  • Limit access to roles that can manage Azure AD Applications – if a team wants to manage their applications, they can be made owners on their specific apps, not across them all.
  • Alert on privileged changes to Azure AD apps – new secrets, new redirect URI’s, added permissions or admin consent.
  • Treat access to the Microsoft Graph and Azure AD as you would on premise AD. If an application or team request directory.readwrite.all or to be a Global Admin then push back and ask what actions are they trying to perform – there is likely a much lower level of privilege that would work.
  • Don’t allow long lived secrets on Azure AD apps, this is the equivalent of ‘password never expires’.
  • If you use hybrid identity be aware of users, groups or services that can leverage privilege in Azure AD to make changes in on premise AD, or vice versa.
  • Look for anomalous activity in service principal sign in data.

The queries in this post aren’t exhaustive by any means, get to know the AuditLogs table, it is filled with plenty of operations you may find interesting – authentication methods being updated for users, PIM role setting changes, BitLocker keys being read. Line up the actions you see in the table to what is risky to you and what you want to stop. For those events, can we prevent them through policy? If not, how do we detect and respond quick enough.

Reset your on premise passwords with Azure Sentinel + Azure AD Connect writeback — 7th Oct 2021

Reset your on premise passwords with Azure Sentinel + Azure AD Connect writeback

For those who have a large on premise Active Directory environment, one of the challenges you may face is how to use Azure Sentinel to reset the passwords for on premise Active Directory accounts. There are plenty of ways to achieve this – you may have an integrated service environment that allows Logic Apps or Azure Functions to connect directly to on premise resources, like a domain controller. You can also use an Azure Automation account with a hybrid worker. There is a lesser known option though, if you have already deployed Azure AD self-service password reset (SSPR) then we can piggyback off of the password writeback that is enabled when you deployed it. When a user performs a password reset using SSPR the password is first changed in Azure AD, then written back to on premise AD to keep them in sync. If you want to use Azure Sentinel to automate password resets for compromised accounts, then we can leverage that existing connection.

To do this we are going to build a small logic app that uses two Microsoft Graph endpoints, which are

Keep in mind that both these endpoints are currently in beta, so the usual disclaimers apply.

If we have a look at the documentation, you will notice that to retrieve the the id of the password we want to reset we can use delegated or application permissions.

However to reset a password, application permissions are not supported

So we will have to sign in as an actual user with sufficient privilege (take note of the roles required), and re-use that token for our automation. Not a huge deal, we will just use a different credential flow in our Logic App. Keep in mind this flow is only designed for programmatic access and shouldn’t be used interactively, because this will be run natively in Azure and won’t be end user facing it is still suitable.

So before you build your Logic App you will need an Azure AD app registration with delegated UserAuthenticationMethod.ReadWrite.All access and then an account (most likely a service account) with either the global admin, privileged authentication admin or authentication admin role assigned. You can store the credentials for these in an Azure Key Vault and use a managed identity on your Logic App to retrieve them securely.

Create a blank Logic App and use Azure Sentinel alert as the trigger, retrieve your account entities and then add your AAD User Id to a new variable, we will need it as we go.

Next we are going to retrieve the secrets for everything we will need to authenticate and authorize ourselves against. We will need to retrieve the ClientID, TenantID and Secret from our Azure AD app registration and our service account username and password.

There is a post here on how to retrieve the appropriate token using Logic Apps for delegated access, but to repeat it here, we will post to the Microsoft Graph.

Posting to the following URI with header Content-Type application/x-www-form-urlencoded –

https://login.microsoftonline.com/TenantID/oauth2/token 

With the following body –

grant_type=password&resource=https://graph.microsoft.com&client_id=ClientID&username=serviceaccount@domain.com&password=ServiceAccountPassword&client_secret=ClientSecret

In this example we will just pass the values straight from Azure Key Vault. Be sure to click the three dots and go to settings –

Then enable secure inputs, this will stop passwords being stored in the Logic App logs.

Now we just need to parse the response from Microsoft Graph so we can re-use our token, we will also then just build a string variable to format our token ready for use.

The schema for the token is

{
    "properties": {
        "access_token": {
            "type": "string"
        },
        "expires_in": {
            "type": "string"
        },
        "expires_on": {
            "type": "string"
        },
        "ext_expires_in": {
            "type": "string"
        },
        "not_before": {
            "type": "string"
        },
        "resource": {
            "type": "string"
        },
        "token_type": {
            "type": "string"
        }
    },
    "type": "object"
}

Then just create a string variable as above, appending Bearer before the token we parsed. Note there is a single space between Bearer and the token. Now we have our token, we can retrieve the id of the password of our user and then reset the password.

We will connect to the first API to retrieve the id of the password we want to change, using our bearer token as authorization, and passing in the variable of our AAD User Id who we want to reset.

Parse the response from our GET operation using the following schema –

{
    "properties": {
        "@@odata.context": {
            "type": "string"
        },
        "value": {
            "items": {
                "properties": {
                    "createdDateTime": {},
                    "creationDateTime": {},
                    "id": {
                        "type": "string"
                    },
                    "password": {}
                },
                "required": [
                    "id",
                    "password",
                    "creationDateTime",
                    "createdDateTime"
                ],
                "type": "object"
            },
            "type": "array"
        }
    },
    "type": "object"
}

Then we need to automate what our new password will be, an easy way of doing this is to generate a guid to use as a password – it is random, complex and should pass most password policies. Also if an account is compromised, this is just an automatic password reset, the user will then need to contact your help desk, or whoever is in responsible, confirm who they are and be issued with a password to use.

Then finally we take our AAD User Id from the Sentinel alert, the password id that we retrieved, and our new password and post it back to reset the password.

We use our bearer token again for authorization, content-type is application/json and the body is our new password – make sure to enable secure inputs again. You should be able to then run this playbook against any Azure Sentinel incident where you map your AAD User Id (or UserPrincipalName) as the entity. It will take about 30-40 seconds to reset the password and you should see a successful response.

You could add some further logic to email your team, or your service desk or whoever makes sense in your case to let them know that Azure Sentinel has reset a users password.

Just a couple of quick notes, this Logic App ties back to self service password reset, so any password resets you attempt will need to conform to any configuration you have done in your environment, such as –

  • Password complexity, if you have a domain policy requiring a certain complexity of password and your Logic App doesn’t meet it, you will get a PasswordPolicyError returned in the statusDetail field, much like a user doing an interactive self service password reset would.
  • The account you use to sync users to Azure AD will need access to reset passwords on any accounts you want to via Azure Sentinel, much like SSPR itself.

Azure Sentinel and Azure AD Conditional Access = Cloud Fail2Ban — 2nd Sep 2021

Azure Sentinel and Azure AD Conditional Access = Cloud Fail2Ban

Fail2ban is a really simple but effective tool that has been around forever, it basically listens for incoming connections and then updates a firewall based on that, i.e. too many failed attempts then the IP is added to a ban list, rejecting new connections from it. If you are an Azure AD customer then Microsoft take care of some of this for you, they will ban the egregious attempts they are seeing globally. But we can use Azure Sentinel, Logic Apps and Azure AD Conditional Access to build our own cloud fail2ban which can achieve the same, but for threats unique to your tenant.

On the Azure Sentinel GitHub there is a really great query written for us here that we will leverage as the basis for our automation. The description says it all but essentially it will hunt the last 3 days of Azure AD sign in logs, look for more than 5 failures in a 20 minute period. It also excludes IP addresses in the same 20 minutes that have had more successful sign ins than failures just to account for your trusted locations – lots of users means lots of failed sign ins. You may want to adjust the timeframes to suit what works for you, but the premise remains the same.

There are a few moving parts to this automation but basically we want an Azure Sentinel Incident rule to run every so often, detect password sprays, any hit then invokes a Logic App that will update an Azure AD named location with the malicious IP addresses. That named location will be linked back to an Azure AD Conditional Access policy that denies logons.

Let’s build our Azure AD named location and CA policy first. In Azure AD Conditional Access > Named locations select ‘+ IP ranges location’, name it whatever you would like but something descriptive is best. You can’t have an empty named location so just put a placeholder IP address in there.

Next we create our Azure AD Conditional Access policy. Name is again up to you. Include all users and then it is always best practice to exclude some breakglass accounts, you don’t want to accidentally lock yourself out of your tenant or apps. We are going to also select All Cloud Apps because we want to block all access from these malicious IP addresses.

For conditions we want to configure Locations, including our named location we just created, and excluding our trusted locations (again, we don’t want to lock ourselves or our users out from known good locations). Finally our access control is ‘block access’. You can start the policy in report mode if you want to ensure your alerting and data is accurate.

Let’s grab the id of the named location we just created, since we will need it later on. Use the Graph Explorer to check for all named locations and grab the id of the one you just created.

Now we are ready to update the named location with our malicious IP. The guidance for the update action on the namedLocation endpoint is here. Which has an example of the payload we need to use and we will use Logic Apps to build it for us –

{
    "@odata.type": "#microsoft.graph.ipNamedLocation",
    "displayName": "Untrusted named location with only IPv4 address",
    "isTrusted": false,
    "ipRanges": [
        {
            "@odata.type": "#microsoft.graph.iPv4CidrRange",
            "cidrAddress": "6.5.4.3/18"
        }

    ]
}

Now we configure our Logic App, create a blank Logic App and for trigger choose either incident or alert creation, depending on whether you use the Azure Sentinel incident pane or not. After getting your IPs from the entities, create a couple of variables we will use later, and take the IP entity from the incident and append it to your NewMaliciousIP variable. We know that is the new bad IP we will want to block later.

There is no native Logic App connector for Azure AD Conditional Access, so we will just leverage Microsoft Graph to do what we need. This is a pattern that I have covered a few times, but it is one that I re-use often. We assign our Logic App a system assigned identity, use that identity to access an Azure Key Vault to retrieve a clientid, tenantid and secret for an Azure AD app registration. We then post to MS Graph and grab an access token and re-use that token as authorization to make the changes we want, in this case update a named location.

The URI value is your tenantid, then for body the client_id is your clientid and client_secret your secret. Make sure your app has enough privilege to update named locations, which is Policy.Read.All and Policy.ReadWrite.ConditionalAccess or an equivalent Azure AD role. Parse your token with the following schema and now you have a token ready to use.

{
    "properties": {
        "access_token": {
            "type": "string"
        },
        "expires_in": {
            "type": "string"
        },
        "expires_on": {
            "type": "string"
        },
        "ext_expires_in": {
            "type": "string"
        },
        "not_before": {
            "type": "string"
        },
        "resource": {
            "type": "string"
        },
        "token_type": {
            "type": "string"
        }
    },
    "type": "object"
}

To make this work, we need our Logic App to get the current list of bad IP addresses from our named location, add our new IP in and then patch it back to Microsoft Graph. If you just do a patch action with only the latest IP then all the existing ones will be removed. Over time this list could get quite large so we don’t want to lose our hard work.

We parse the JSON reponse using the following schema.

{
    "type": "object",
    "properties": {
        "@@odata.context": {
            "type": "string"
        },
        "@@odata.type": {
            "type": "string"
        },
        "id": {
            "type": "string"
        },
        "displayName": {
            "type": "string"
        },
        "modifiedDateTime": {
            "type": "string"
        },
        "createdDateTime": {
            "type": "string"
        },
        "isTrusted": {
            "type": "boolean"
        },
        "ipRanges": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "@@odata.type": {
                        "type": "string"
                    },
                    "cidrAddress": {
                        "type": "string"
                    }
                },
                "required": [
                    "@@odata.type",
                    "cidrAddress"
                ]
            }
        }
    }
}

Next we are going to grab each existing IP and append it to a string (we may have multiple IP addresses already in there so use a for-each loop to get them all), then build a final string adding our new malicious IP. You can see the format required in the Microsoft Graph documentation.

We parse string to JSON one last time because when we patch back to the Microsoft Graph it is expecting a JSON payload using the following schema.

{
    "items": {
        "properties": {
            "@@odata.type": {
                "type": "string"
            },
            "cidrAddress": {
                "type": "string"
            }
        },
        "required": [
            "@@odata.type",
            "cidrAddress"
        ],
        "type": "object"
    },
    "type": "array"
}

Then we are going to use a HTTP patch action to update our list, adding our access token to authorize ourselves and completing the format expected. In Logic Apps you need to escape @ symbols with another @. You will need to add in the id of your namedLocation which will be unique to you. The body is now in the exact format Graph expects.

The last part is to create our analytics rule and map our entities. For this example we will use the password spray query mentioned above, but really you could do any query that you generate a malicious IP from – Azure Security Centre alerts, IP addresses infected with malware etc. Just map your IP Address entity over so that our Logic App can collect it when it fires. Make sure you trigger an alert for each event as your analytics rule may return multiple hits and you want to block them all. Also be sure to run your analytics rule on a schedule that makes sense with the query you are running. If you are looking back on a days worth of data and generating alerts based off that, then you probably only want to run your analytics rule daily too. If you query 24 hours of data and run it every 20 minutes you will fire multiple alerts on the same bad IP addresses.

Then under your automated response options run the Logic App we just created. In your Logic App you could add another step at the end to let you know that Azure Sentinel has already banned the IP address for you. Now next time your Azure Sentinel analytics rule generates a hit on your query, the IP address will be blocked automatically.