Microsoft Sentinel 101

Learning Microsoft Sentinel, one KQL error at a time

Maintaining a well managed Azure AD tenant with KQL — 16th Mar 2022

This article is presented as part of the #AzureSpringClean event. The idea of #AzureSpringClean is to promote well managed Azure environments. This article will focus on Azure Active Directory and how we can leverage KQL to keep things neat and tidy.

Much like on-premises Active Directory, Azure Active Directory has a tendency to grow quickly. You have new users or guests being onboarded all the time. You are configuring single sign on to apps. You may create service principals for all kinds of integrations. And again, much like on-premises Active Directory, it is in our best interest to keep on top of all these objects. If users have left the business, or we have decommissioned applications, then we also want to clean up all those artefacts.

Microsoft provide tools to help automate some of these tasks – entitlement management and access reviews. Entitlement management lets you manage identity and access at scale. You can build access packages that contain all the access a particular role needs. You then overlay just-in-time access and approval workflows on top.

Access reviews are pretty self explanatory. They let you easily manage group memberships, application and role access. You can schedule access reviews to make sure people only keep the appropriate access.

So if Microsoft provide these tools, why should we dig into the data ourselves? Good question. You may not be licensed for them to start with – they are both Azure AD P2 features. You may also have use cases that fall outside the capability of those products. Using KQL and the raw data, we can find all kinds of trends in our Azure AD tenant.

First things first though, we will need that data in a workspace! You can choose which Log Analytics workspace to send the data to from the Azure Active Directory -> Diagnostic settings tab. If you use Microsoft Sentinel, you can achieve the same via the Azure Active Directory data connector.

You can pick and choose what you like. This article is going to cover these three items –

  • SignInLogs – all your normal sign ins to Azure AD.
  • AuditLogs – all the administrative activities in your tenant, like guest invites and redemptions.
  • ServicePrincipalSignInLogs – sign ins for your Service Principals.

Two things to note: you need Azure AD P1 licensing to export this data, and there are Log Analytics ingestion costs.
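
If you want to keep an eye on the cost side, the Usage table tracks billable ingestion per data type. A minimal sketch (Quantity is reported in MB, so we convert to GB):

Usage
| where TimeGenerated > ago(30d)
| where DataType in ("SigninLogs", "AuditLogs", "AADServicePrincipalSignInLogs")
| where IsBillable == true
| summarize ['Ingestion (GB)']=round(sum(Quantity) / 1024, 2) by DataType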

Let’s look at seven areas of Azure Active Directory –

  • Users and Guests
  • Service Principals
  • Enterprise Applications
  • Privileged Access
  • MFA and Passwordless
  • Legacy Auth
  • Conditional Access

For each, we will write some example queries looking for interesting trends. Hopefully in your tenant they can provide some useful information. The more historical data you have, the more useful your trends will be of course. But even just having a few weeks worth of data is valuable.

To make things even easier, for most of these queries I have used the Log Analytics demo environment. You may not yet have a workspace of your own, but you still want to test the queries out. The demo environment is free to use for anyone. Some of the data types aren’t available in there, but I have tried to use it as much as possible.

You can access the demo tenant here. You just need to log in with any Microsoft account – personal or work – and away you go.

Users and Guests

User lifecycle management can be hard work! Using Azure AD guests can add to that complexity. Guests likely work for other companies or partners. You don’t manage them fully in the way you would your own staff.

Let’s start by finding when our users last signed in. Maybe you want to know which users haven’t signed in for more than 45 days; you could start by disabling these accounts. We can even retrieve the user type at the same time.

SigninLogs
| where TimeGenerated > ago(365d)
| where ResultType == "0"
| summarize arg_max(TimeGenerated, *) by UserPrincipalName
| project TimeGenerated, UserPrincipalName, UserType, ['Days Since Last Logon']=datetime_diff("day", now(),TimeGenerated)
| where ['Days Since Last Logon'] >= 45
| sort by ['Days Since Last Logon'] desc

We use a really useful function in this query called datetime_diff. It lets us calculate the time between two events in a way that is easier for us to read. So in this example, we calculate the difference between the last sign in and now, in days. UTC timestamps can be hard to read, so let KQL do the heavy lifting for you.
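
If you want to see how datetime_diff behaves on its own, you can test it anywhere with a quick print statement – this contrived example should return 45:

print ['Days Since']=datetime_diff("day", now(), ago(45d))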

We can even visualize the trend of our last sign ins. In this example we look at when our guests last signed in. To do this, we summarize our data twice. First we get the last sign in date for each guest. Then we group that data into each month.

SigninLogs
| where TimeGenerated > ago (360d)
| where UserType == "Guest"
| where ResultType == 0
| summarize arg_max(TimeGenerated, *) by UserPrincipalName
| project TimeGenerated, UserPrincipalName
| summarize ['Count of Last Signin']=count() by startofmonth(TimeGenerated)
| render columnchart with (title="Guest inactivity per month")

Another interesting thing with Azure AD guests is that invites never expire. So once you invite a guest the pending invite will be there forever. You can use KQL to find invites that have been sent but not redeemed.

let timerange=180d;
let timeframe=30d;
AuditLogs
| where TimeGenerated between (ago(timerange) .. ago(timeframe)) 
| where OperationName == "Invite external user"
| extend GuestUPN = tolower(tostring(TargetResources[0].userPrincipalName))
| summarize arg_max(TimeGenerated, *) by GuestUPN
| project TimeGenerated, GuestUPN
| join kind=leftanti  (
    AuditLogs
    | where TimeGenerated > ago (timerange)
    | where OperationName == "Redeem external user invite"
    | where CorrelationId <> "00000000-0000-0000-0000-000000000000"
    | extend d = tolower(tostring(TargetResources[0].displayName))
    | parse d with * "upn: " GuestUPN "," *
    | project TimeGenerated, GuestUPN)
    on GuestUPN
| project TimeGenerated, GuestUPN, ['Days Since Invite Sent']=datetime_diff("day", now(), TimeGenerated)

For this we join two queries – guest invites and guest redemptions. Then search for when there isn’t a redemption. We then re-use our datetime_diff to work out how many days since the invite was sent. For this query we also exclude invites sent in the last 30 days. Those guests may just not have gotten around to redeeming their invites yet. Once a user has been invited, the user object already exists in your tenant. It just sits there idle until they redeem the invite. If they haven’t accepted the invite in 45 days, then it is probably best to delete the user objects.

Service Principals

The great thing about KQL is once we write a query we like, we can easily re-use it. Service principals are central to everything in Azure AD. They control what your applications can access. Much like users, we may no longer be using a service principal. Perhaps that application has been decommissioned. Maybe the integration that was in use has been retired. If they are no longer in use, we should remove them.

Let’s re-use our inactive user query, and this time look for inactive service principals.

AADServicePrincipalSignInLogs
| where TimeGenerated > ago(365d)
| where ResultType == "0"
| summarize arg_max(TimeGenerated, *) by AppId
| project TimeGenerated, ServicePrincipalName, ['Days Since Last Logon']=datetime_diff("day", now(),TimeGenerated)
| where ['Days Since Last Logon'] >= 45
| sort by ['Days Since Last Logon'] desc

Have a look through the list and see which can be deleted.

Service principals can fail to sign in for many reasons, much like regular users. With regular users though, we get an easy to read description that can help us out. With service principals, we unfortunately just get an error code. Using the case function we can add our own friendly descriptions to help us out. We just say: when our result code is this, then provide us an easy to read description.

AADServicePrincipalSignInLogs
| where TimeGenerated > ago(30d)
| where ResultType != "0"
| extend ErrorDescription = case (
    ResultType == "7000215", "Invalid client secret is provided",
    ResultType == "7000222", "The provided client secret keys are expired",
    ResultType == "700027", "Client assertion failed signature validation",
    ResultType == "700024", "Client assertion is not within its valid time range",
    ResultType == "70021", "No matching federated identity record found for presented assertion",
    ResultType == "500011", "The resource principal named {name} was not found in the tenant named {tenant}",
    ResultType == "700082", "The refresh token has expired due to inactivity",
    ResultType == "90025", "Request processing has exceeded gateway allowance",
    ResultType == "500341", "The user account {identifier} has been deleted from the {tenant} directory",
    ResultType == "100007", "AAD Regional ONLY supports auth either for MSIs OR for requests from MSAL using SN+I for 1P apps or 3P apps in Microsoft infrastructure tenants",
    ResultType == "1100000", "Non-retryable error has occurred",
    ResultType == "90033", "A transient error has occurred. Please try again",
    ResultType == "53003", "Access has been blocked by Conditional Access policies. The access policy does not allow token issuance.",
    "Unknown"
    )
| project TimeGenerated, ServicePrincipalName, ServicePrincipalId, ErrorDescription, ResultType, IPAddress

You may be particularly interested in sign ins with expired or invalid secrets. Are those service principals still in use? Perhaps you can remove them. Or you may be interested in cases where conditional access blocks a service principal sign-in.

Have the credentials for that service principal leaked? It may be worth investigating and rotating credentials if required.
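
If you want to zero in on those scenarios, here is a minimal sketch that narrows the previous query down to just those result codes (reusing the invalid secret, expired secret and conditional access block codes from the case statement above):

AADServicePrincipalSignInLogs
| where TimeGenerated > ago(30d)
| where ResultType in ("7000215", "7000222", "53003")
| summarize ['Failure Count']=count(), ['Last Failure']=max(TimeGenerated) by ServicePrincipalName, ResultType, IPAddress
| sort by ['Failure Count'] desc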

Enterprise Applications

Applications that have had no sign in activity for a long time could be a sign of a couple of things. Firstly, you may have retired that application. If that is the case, then you should delete the enterprise application from your tenant.

Secondly, it may mean that people are bypassing SSO to access the application. For example, you may use a product like Confluence. You may have enabled SSO to it, but users still have the ability to sign on using ‘local’ credentials. Maybe users do that because it is more convenient to bypass conditional access. For those applications you know are still in use but where you aren’t seeing any activity, you should investigate. If an application has the ability to prevent the use of local credentials, you should enable that. Perhaps you have the ability to set the password for local accounts; you could set them to something random the users don’t know, to enforce SSO.

If those technical controls don’t exist, you may need to try softer controls. You should try to get buy-in from the application owners or users and explain the risks of local credentials. A good point to highlight is that when a user leaves an organization, their account is disabled. When that happens, they lose access to any SSO enforced applications. In applications that use local credentials, the lifecycle of accounts is likely poorly managed. Application owners usually don’t want ex-employees still having access to data, so that may help enforce good behaviour.

We can find apps that have had no sign ins in the last 30 days easily.

SigninLogs
| where TimeGenerated > ago (365d)
| where ResultType == 0
| summarize arg_max(TimeGenerated, *) by AppId
| project
    AppDisplayName,
    ['Last Logon Time']=TimeGenerated,
    ['Days Since Last Logon']=datetime_diff("day", now(), TimeGenerated)
| where ['Days Since Last Logon'] > 30
| sort by ['Days Since Last Logon'] desc

Maybe you are interested in application usage more generally. We can bring back some stats for each of your applications. Perhaps you want to see total sign ins vs distinct user sign ins for each. Some applications may be very noisy with their sign in data, but when you look at distinct users, they aren’t as busy as you thought.

SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| summarize ['Total Signins']=count(), ['Distinct User Signins']=dcount(UserPrincipalName) by AppDisplayName
| sort by ['Distinct User Signins'] desc

You may also be interested in the breakdown of guests vs members for each application. Maybe guests are accessing something they aren’t meant to. If you notice that, you can put a group in front of that app to control access.

For this query we use the dcountif function, which returns a distinct count of a column where a condition is true. So for this example, we return a distinct count of users where the UserType is member. Then again for guests.

SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| summarize ['Distinct Member Signins']=dcountif(UserPrincipalName, UserType == "Member"), ['Distinct Guest Signins']=dcountif(UserPrincipalName, UserType == "Guest") by AppDisplayName
| sort by ['Distinct Guest Signins'] desc

Use your knowledge of your environment to make sense of the results. If you have lots of guests accessing something you didn’t expect, then investigate.

Privileged Access

As always, your privileged users deserve extra scrutiny. You can detect when a user accesses particular Azure applications for the first time. This query looks back 90 days, then detects if a user accesses one of these applications for the first time.

//Detects users who have accessed Azure AD Management interfaces who have not accessed in the previous timeframe
let timeframe = startofday(ago(90d));
let applications = dynamic(["Azure Active Directory PowerShell", "Microsoft Azure PowerShell", "Graph Explorer", "ACOM Azure Website"]);
SigninLogs
| where TimeGenerated > timeframe and TimeGenerated < startofday(now())
| where AppDisplayName in (applications)
| project UserPrincipalName, AppDisplayName
| join kind=rightanti
    (
    SigninLogs
    | where TimeGenerated > startofday(now())
    | where AppDisplayName in (applications)
    )
    on UserPrincipalName, AppDisplayName
| where ResultType == 0
| project TimeGenerated, UserPrincipalName, ResultType, AppDisplayName, IPAddress, Location, UserAgent

You could expand the list to include privileged applications specific to your environment too.
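
As a sketch, extending that list is just a matter of adding to the dynamic array – ‘Contoso PAM Portal’ here is a made up name, so substitute the AppDisplayName values you actually see in your own sign in logs:

// Extend the application list with admin portals specific to your environment
let applications = dynamic(["Azure Active Directory PowerShell", "Microsoft Azure PowerShell", "Graph Explorer", "ACOM Azure Website", "Contoso PAM Portal"]);
SigninLogs
| where TimeGenerated > ago(90d)
| where AppDisplayName in (applications)
| summarize ['Privileged App Signins']=count() by AppDisplayName, UserPrincipalName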

If you use Azure AD Privileged Identity Management (PIM) you can keep an eye on those actions too. For example, we can find users who haven’t elevated to a role for over 30 days. If you have users with privileged roles that they aren’t actively using, the roles should be removed. This query also returns the role each user last activated.

AuditLogs
| where TimeGenerated > ago (365d)
| project TimeGenerated, OperationName, Result, TargetResources, InitiatedBy
| where OperationName == "Add member to role completed (PIM activation)"
| where Result == "success"
| extend ['Last Role Activated'] = tostring(TargetResources[0].displayName)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| summarize arg_max(TimeGenerated, *) by Actor
| project Actor, ['Last Role Activated'], ['Last Activation Time']=TimeGenerated, ['Days Since Last Activation']=datetime_diff("day", now(), TimeGenerated)
| where ['Days Since Last Activation'] >= 30
| sort by ['Days Since Last Activation'] desc

One of the biggest strengths of KQL is manipulating time. We can use that capability to add some logic to our queries. For example, we can find PIM elevation events that are outside of business hours.

let timerange=30d;
AuditLogs
// extend LocalTime to your time zone
| extend LocalTime=TimeGenerated + 5h
| where LocalTime > ago(timerange)
// Change the hours of the day to suit your company, i.e. this would find activations between 7pm and 6am
| where hourofday(LocalTime) !between (6 .. 18)
| where OperationName == "Add member to role completed (PIM activation)"
| extend RoleName = tostring(TargetResources[0].displayName)
| project LocalTime, OperationName, Identity, RoleName, ActivationReason=ResultReason

If this is unexpected behaviour for you then it’s worth looking at. Maybe an account has been compromised. Or it could be a sign of malicious insider activity from your admins.

MFA & Passwordless

In a perfect world we would have MFA on everything. That may not be the reality in your tenant. In fact, it’s not the reality in many tenants. For whatever reason your MFA coverage may be patchy. You could be on a roadmap to deploying it, or trying to onboard applications to SSO.

Our sign in logs provide great insight into single factor vs multi factor connections. We can summarize and visualize that data in different ways to track your MFA progress. If you want to just look across your tenant as a whole, we can do that of course.

SigninLogs
| where TimeGenerated > ago (30d)
| summarize ['Single Factor Authentication']=countif(AuthenticationRequirement == "singleFactorAuthentication"), ['Multi Factor Authentication']=countif(AuthenticationRequirement == "multiFactorAuthentication") by bin(TimeGenerated, 1d)
| render timechart with (ytitle="Count", title="Single vs Multifactor Authentication last 30 days")

There is some work to be done in the demo tenant!

You can even build a table out of all your applications. From that we can calculate the percentage of sign ins that are covered by MFA. This may give you some direction for enabling MFA.

let timerange=30d;
SigninLogs
| where TimeGenerated > ago(timerange)
| where ResultType == 0
| summarize
    TotalCount=count(),
    MFACount=countif(AuthenticationRequirement == "multiFactorAuthentication"),
    nonMFACount=countif(AuthenticationRequirement == "singleFactorAuthentication")
    by AppDisplayName
| project AppDisplayName, TotalCount, MFACount, nonMFACount, MFAPercentage=(todouble(MFACount) * 100 / todouble(TotalCount))
| sort by MFAPercentage desc 

If that much data is too overwhelming, why not start with your most popular applications? Here we use the same logic, but first calculate our top 20 applications.

let top20apps=
    SigninLogs
    | where TimeGenerated > ago (30d)
    | summarize UserCount=dcount(UserPrincipalName)by AppDisplayName
    | sort by UserCount desc 
    | take 20
    | project AppDisplayName;
//Use that list to calculate the percentage of signins to those apps that are covered by MFA
SigninLogs
| where TimeGenerated > ago (30d)
| where AppDisplayName in (top20apps)
| summarize TotalCount=count(),
    MFACount=countif(AuthenticationRequirement == "multiFactorAuthentication"),
    nonMFACount=countif(AuthenticationRequirement == "singleFactorAuthentication")
    by AppDisplayName
| project AppDisplayName, TotalCount, MFACount, nonMFACount, MFAPercentage=(todouble(MFACount) * 100 / todouble(TotalCount))
| sort by MFAPercentage asc  

Passwordless technology has been around for a little while now, but it is only now starting to hit the mainstream. Azure AD provides lots of different options – FIDO2 keys, Windows Hello for Business, phone sign-in and so on. You can track password vs passwordless sign ins to your tenant.

let timerange=180d;
SigninLogs
| project TimeGenerated, AuthenticationDetails
| where TimeGenerated > ago (timerange)
| extend AuthMethod = tostring(parse_json(AuthenticationDetails)[0].authenticationMethod)
| where AuthMethod != "Previously satisfied"
| summarize
    Password=countif(AuthMethod == "Password"),
    Passwordless=countif(AuthMethod in ("FIDO2 security key", "Passwordless phone sign-in", "Windows Hello for Business", "Mobile app notification","X.509 Certificate"))
    by startofweek(TimeGenerated)
| render timechart  with ( xtitle="Week", ytitle="Signin Count", title="Password vs Passwordless signins per week")

Passwordless needs a little more love in the demo tenant for sure!

You could even go one better and track each type of passwordless technology. Then you can see which is the most favored.

let timerange=180d;
SigninLogs
| project TimeGenerated, AuthenticationDetails
| where TimeGenerated > ago (timerange)
| extend AuthMethod = tostring(parse_json(AuthenticationDetails)[0].authenticationMethod)
| where AuthMethod in ("FIDO2 security key", "Passwordless phone sign-in", "Windows Hello for Business", "Mobile app notification","X.509 Certificate")
| summarize ['Passwordless Method']=count()by AuthMethod, startofweek(TimeGenerated)
| render timechart with ( xtitle="Week", ytitle="Signin Count", title="Passwordless methods per week")

Legacy Authentication

There is no better managed Azure AD tenant than one where legacy auth is completely disabled. Legacy auth is a security issue because it isn’t MFA aware. If one of your users is compromised, the attacker could bypass MFA policies by using legacy clients such as IMAP or ActiveSync. The only conditional access decisions that work for legacy auth are allow or block. Because conditional access defaults to allow, unless you explicitly block legacy auth, those connections will be allowed.

Microsoft are looking to retire legacy auth in Exchange Online on October 1st, 2022, which is fantastic. However, legacy auth can be used for non-Exchange Online workloads too. We can use our sign in log data to track exactly where legacy auth is used. That way we can not only be ready for October 1, but maybe we can retire it from our tenant way before then. Win-win!

The Azure AD sign in logs contain useful information about what app is being used during a legacy sign in. Let’s start there and look at all the various legacy client apps. We can also retrieve the users for each.

SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| where ClientAppUsed in ("Exchange ActiveSync", "Exchange Web Services", "AutoDiscover", "Unknown", "POP3", "IMAP4", "Other clients", "Authenticated SMTP", "MAPI Over HTTP", "Offline Address Book")
| summarize ['Count of legacy auth attempts'] = count()by ClientAppUsed, UserPrincipalName
| sort by ClientAppUsed asc, ['Count of legacy auth attempts'] desc 

This will show us each client app, such as IMAP or ActiveSync, and list the most active users for each. That will give you good direction to start migrating users and applications to modern auth.

If you want to visualize how you are going with disabling legacy auth, we can do that too. We can even compare that to how many legacy auth connections are blocked.

SigninLogs
| where TimeGenerated > ago(180d)
| where ResultType in ("0", "53003")
| where ClientAppUsed in ("Exchange ActiveSync", "Exchange Web Services", "AutoDiscover", "Unknown", "POP3", "IMAP4", "Other clients", "Authenticated SMTP", "MAPI Over HTTP", "Offline Address Book")
| summarize
    ['Legacy Auth Users Allowed']=dcountif(UserPrincipalName, ResultType == 0),
    ['Legacy Auth Users Blocked']=dcountif(UserPrincipalName, ResultType == 53003)
    by bin(TimeGenerated, 1d)
| render timechart with (ytitle="Count",title="Legacy auth distinct users allowed vs blocked by Conditional Access")

Hopefully your allowed connections are on the decrease.

You may be wondering why blocks aren’t increasing at the same time. That is easily explained. For instance, say you migrate from ActiveSync to the Outlook app on your phone. Once you make that change, there simply won’t be legacy auth connections to block anymore. Visualizing the blocks however provides a nice baseline. If you see a sudden spike, then something out there is still trying to connect and you should investigate.
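
You could even have KQL watch for those spikes for you. Here is a sketch using the series_decompose_anomalies function over daily counts of blocked legacy auth; tune the time range and step to suit:

SigninLogs
| where TimeGenerated > ago(90d)
| where ResultType == 53003
| where ClientAppUsed in ("Exchange ActiveSync", "Exchange Web Services", "AutoDiscover", "Unknown", "POP3", "IMAP4", "Other clients", "Authenticated SMTP", "MAPI Over HTTP", "Offline Address Book")
| make-series ['Blocked Legacy Auth']=count() default=0 on TimeGenerated from ago(90d) to now() step 1d
| extend Anomalies=series_decompose_anomalies(['Blocked Legacy Auth'])
| render timechart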

Conditional Access

Azure AD conditional access is key to the security of your Azure AD tenant. It decides who is allowed in and who isn’t. It also defines the rules people must follow to be allowed in, for instance requiring multi factor authentication. Azure AD conditional access evaluates every sign in to your tenant and decides if it is approved. The detail of that evaluation is held within every sign in event. If we look at a sign in event from the demo environment, we can see what this data looks like.
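
You can pull a single sign in event and inspect the ConditionalAccessPolicies column yourself – each policy appears with its name and a result such as success, failure or notApplied:

SigninLogs
| where TimeGenerated > ago(1d)
| project TimeGenerated, UserPrincipalName, AppDisplayName, ConditionalAccessPolicies
| take 1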

So Azure AD evaluated this sign in. It determined the policy ‘MeganB MCAS Proxy’ was in scope, so it enforced the sign in to go through Defender for Cloud Apps (previously Cloud App Security). You can imagine if you have lots of policies, and lots of sign ins, this is a huge amount of data. We can summarize this data in lots of ways. Like any security control, we should regularly review it to confirm the control is in use. We can find any policies that have had no success events (user allowed in) and no failure events (user blocked).

SigninLogs
| where TimeGenerated > ago (30d)
| project TimeGenerated, ConditionalAccessPolicies
| mv-expand ConditionalAccessPolicies
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| extend CAPolicyName = tostring(ConditionalAccessPolicies.displayName)
| summarize CAResults=make_set(CAResult) by CAPolicyName
| where CAResults !has "success" and CAResults !has "failure"

In this test tenant we can see some policies that we have a hit on.

Some of these policies are not enabled. Some are in report only mode. Or they are simply not applying to any sign ins. You should review this list to make sure it is what you expect. If you are seeing lots of ‘notApplied’ results, make sure you have configured your policies properly.
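
To check how each policy is evaluating, you can count every result per policy with the same mv-expand pattern – a sketch where notApplied and report-only results will show up alongside success and failure:

SigninLogs
| where TimeGenerated > ago(30d)
| mv-expand ConditionalAccessPolicies
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| extend CAPolicyName = tostring(ConditionalAccessPolicies.displayName)
| summarize ['Result Count']=count() by CAPolicyName, CAResult
| sort by CAPolicyName asc, ['Result Count'] desc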

If you wanted to focus on conditional access failures specifically, you can do that too. This query will find any policy with failures, then return the reason for the failure. This can be simply informational for you to show they are working as intended. Or if you are getting excessive failures maybe your policy needs tuning.

SigninLogs
| where TimeGenerated > ago (30d)
| project TimeGenerated, ConditionalAccessPolicies, ResultType, ResultDescription
| mv-expand ConditionalAccessPolicies
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| extend CAPolicyName = tostring(ConditionalAccessPolicies.displayName)
| where CAResult == "failure"
| summarize CAFailureCount=count()by CAPolicyName, ResultType, ResultDescription
| sort by CAFailureCount desc 

You could visualize the same failures if you wanted to look at any trends or spikes.

let start = now(-90d);
let end = now();
let timeframe= 12h;
SigninLogs
| project TimeGenerated, ResultType, ConditionalAccessPolicies
| where ResultType == 53003
| extend FailedPolicy = tostring(ConditionalAccessPolicies[0].displayName)
| make-series FailureCount = count() default=0 on TimeGenerated in range(start,end, timeframe) by FailedPolicy
| render timechart 

Summary

I have provided you with a few examples of different queries to help manage your tenant. KQL gives you the power to manipulate your data in so many ways. Have a think about what is important to you. You can then hopefully use the above examples as a starting point to find what you need.

If you are licensed for the Microsoft provided tools then definitely use them. However if there are gaps, don’t be scared of looking at the data yourself. KQL is powerful and easy to use.

There are also a number of built-in workbooks in your Azure AD tenant you can use. You can find them under ‘Workbooks’ in Azure AD. They cover some queries similar to these, and plenty more.

You need to combine the Microsoft tools and your own knowledge to effectively manage your directory.

Using time to your advantage in Azure Sentinel — 1st Oct 2021

Adversary hunting would be a lot easier if we were always looking for a single event that we knew was malicious, but unfortunately that isn’t always the case. Often when hunting for threats, a combination of events over a certain time period may be added cause for concern, or events happening at certain times of the day may be more suspicious to you. Take, for example, a user setting up a mail forward in Outlook. That may not be inherently suspicious on its own, but if it happened not long after an abnormal sign on, that would certainly increase the severity. Perhaps particular administrative actions outside of normal business hours would be an indicator of compromise.

Azure Sentinel and KQL have an array of really great operators to help you manipulate and tune your queries to leverage time as an added resource when hunting. We can use logic such as hunting for activities before and after a particular event, look for actions only after an event, or even calculate the time between particular events, and use that as a signal. Some of the operators worth getting familiar with are – ago, between and timespan.

I always try to remember this when writing queries: Azure Sentinel/Log Analytics is highly optimized for log data, so the quicker you can filter down to the time period you care about, the faster your results will be. Searching all your data for a particular event and then filtering down to the last day will be significantly slower than filtering your data to the last day and then finding your particular event.
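
As a contrived illustration, both of these return the same rows, but the first prunes the dataset before searching it. Run them one at a time – SecurityEvent and event ID 4625 (failed logons) are just example values here:

// Fast: cut down to the time window first, then search
SecurityEvent
| where TimeGenerated > ago(1d)
| where EventID == 4625
// Slower: search everything, then cut down to the time window
SecurityEvent
| where EventID == 4625
| where TimeGenerated > ago(1d)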

We often forget in Sentinel/KQL that we can apply where clauses to time in the same way we would any other data, such as usernames, event ids, error codes or any other string data. You have probably written a thousand queries that start with something similar to this –

Sometable
| where TimeGenerated > ago (2h)

But you can filter your time data even further before writing the rest of your query. Maybe you want to look at 7 days of data, but only between midnight and 4am. So first take 7 days of data, then slice out the 4 hours you care about.

Sometable
| where TimeGenerated > ago (7d)
| where hourofday( TimeGenerated ) between (0 .. 3)

If you want to exclude instead of include particular hours then you can use !between.

Perhaps you are interested in admin staff who have activated Azure AD PIM roles after hours. Using KQL, we can leverage the hourofday function to query only between particular hours. Remember that by default Sentinel will query on UTC time, so extend a column first to create a time zone that makes sense to you. The below query will find any PIM activations that aren’t between 5am and 8pm in UTC+5.

AuditLogs
| extend LocalTime=TimeGenerated+5h
| where hourofday( LocalTime) !between (5 .. 19)
| where OperationName == "Add member to role completed (PIM activation)"

If we take our example from the start of the post, we can detect when a user is flagged for a suspicious logon, in this case via Azure AD Identity Protection, and then within two hours creates a mail forward in Office 365. This behaviour is often seen from attackers hoping to exfiltrate data or maintain a foothold in your environment.

SecurityAlert
| where ProviderName == "IPC"
| project AlertTime=TimeGenerated, CompromisedEntity
| join kind=inner 
(
OfficeActivity
| extend ForwardTime=TimeGenerated
) on $left.CompromisedEntity == $right.UserId
| where Operation == "Set-Mailbox"
| where Parameters contains "DeliverToMailboxAndForward"
| extend TimeDelta = abs(ForwardTime - AlertTime)
| where TimeDelta < 2h
| project AlertTime, ForwardTime, CompromisedEntity 

For this query we take the time the alert was generated (renamed to AlertTime) and the UserPrincipalName of the compromised entity, then join to our OfficeActivity table looking for mail forward creation events. Finally we use the abs function to calculate the time between the forward creation and the identity protection alert, and only flag when it is less than 2 hours. There are many ways to create forwards in Outlook (such as via mailbox rules), and this is just one particular method, but the example is more to drive the use of time as a detection method than to be all encompassing.

We can also use a particular event as a starting point, then retrieve data from either side of that event. Say a user triggers an ‘unfamiliar sign-in properties’ alert. We can use the time of that alert as an anchor point, and retrieve the 60 minutes of sign in data either side of it to give us some really great context. We do this by using a combination of between and timespan.

SecurityAlert
| where AlertName == "Unfamiliar sign-in properties"
| project AlertTime=TimeGenerated, UserPrincipalName=CompromisedEntity
| join kind=inner 
(
SigninLogs
) on UserPrincipalName
| where TimeGenerated between ((AlertTime-timespan(60m)).. (AlertTime+timespan(60m)))
| project UserPrincipalName, SigninTime=TimeGenerated, AlertTime, AppDisplayName, ResultType, UserAgent, IPAddress, Location

We can see both the events prior to and those after the alert time.
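
If you want that timeline to be easier to scan, you can label each row relative to the alert. This is just a small extension of the query above using iff:

SecurityAlert
| where AlertName == "Unfamiliar sign-in properties"
| project AlertTime=TimeGenerated, UserPrincipalName=CompromisedEntity
| join kind=inner 
(
SigninLogs
) on UserPrincipalName
| where TimeGenerated between ((AlertTime-timespan(60m)).. (AlertTime+timespan(60m)))
| extend Position = iff(TimeGenerated < AlertTime, "Before alert", "After alert")
| project UserPrincipalName, SigninTime=TimeGenerated, Position, AlertTime, AppDisplayName, ResultType, IPAddress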

You can use these time operators with much more detailed hunting too. If you use the anomaly detection operators, you can tune your detections to only parts of the day. Taking the example from that post, maybe we are interested in particular failed sign in activities, but only in non-regular working hours.

let starttime = 7d;
let timeframe = 1h;
let resultcodes = dynamic(["50126","53003","50105"]);
let outlierusers=
SigninLogs
| where TimeGenerated > ago(starttime)
| where hourofday( TimeGenerated) !between (6 .. 18)
| where ResultType in (resultcodes)
| project TimeGenerated, UserPrincipalName, ResultType, AppDisplayName, Location
| order by TimeGenerated
| summarize Events=count()by UserPrincipalName, bin(TimeGenerated, timeframe)
| summarize EventCount=make_list(Events),TimeGenerated=make_list(TimeGenerated) by UserPrincipalName
| extend outliers=series_decompose_anomalies(EventCount)
| mv-expand TimeGenerated, EventCount, outliers
| where outliers == 1
| distinct UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(starttime)
| where UserPrincipalName in (outlierusers)
| where ResultType != 0
| summarize LogonCount=count() by UserPrincipalName, bin(TimeGenerated, timeframe)
| render timechart 

I have some more examples of similar alerts in my GitHub repo, such as Azure Key Vault access manipulation.

Enforce PIM compliance with Azure Sentinel and Playbooks — 26th Jul 2021

Azure AD Privileged Identity Management is a really fantastic tool that lets you provide governance around access to Azure AD roles and Azure resources, by providing just-in-time access, step-up authentication, approvals and a lot of great reporting. For those with Azure AD P2 licensing, you should roll it out ASAP. There are plenty of guides on deploying PIM, so I won’t go back over those; instead I will focus on how we can leverage Azure Sentinel to make sure the rules are being followed in your environment.

PIM actions are logged to the AuditLogs table. You can find the associated operations by searching for PIM –

AuditLogs
| where OperationName contains "PIM"
| summarize count() by OperationName

If you have had PIM enabled for a while, you will see a lot of different activities. I won’t list them all here, but you will see each time someone activates a role, when they are assigned to roles, when new roles are onboarded and so on. Most of the items will just be business as usual activity, useful for auditing but nothing we need to alert on or respond to. One big gap of PIM is that users can still be assigned roles directly. Instead of having just-in-time access to a role, or requiring an MFA challenge to activate it, they are permanently assigned. This may not be an issue for some roles, like Message Center Reader, but you definitely want to avoid it for highly privileged roles like Global Administrator, Exchange Administrator, Security Administrator and whichever else you deem high risk. This could be an admin trying to get around policy, or something more sinister.

Thankfully we get an operation each time this happens, ready to act on. We can query the AuditLogs for these events, then retrieve the information about who was added to which role, and who did it in case we want to follow up with them. For this example I added our test user to the Power Platform Administrator role outside of PIM.

AuditLogs
| where OperationName startswith "Add member to role outside of PIM"
| extend AADRoleDisplayName = tostring(TargetResources[0].displayName)
| extend AADRoleId = tostring(AdditionalDetails[0].value)
| extend AADUserAdded = tostring(TargetResources[2].displayName)
| extend AADObjectId = tostring(TargetResources[2].id)
| extend UserWhoAdded = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| project TimeGenerated, OperationName, AADRoleDisplayName, AADRoleId, AADUserAdded, AADObjectId, UserWhoAdded

If you don’t want to automatically remediate all your roles, you could put the ones you want to target into a Watchlist and complete a lookup on that first. The role names and role ids are the same for all Azure AD tenants, so you can get them here. Create a Watchlist, in this example called PrivilegedAADRoles, with the names and ids of the ones you wish to monitor and remediate. Then just query on assignments to roles in your Watchlist.
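
A quick way to sanity check the Watchlist contents before wiring up any automation – note AADRoleDisplayName is an assumed column name here, so use whatever you called the name column when you built the Watchlist:

_GetWatchlist("PrivilegedAADRoles")
| project AADRoleDisplayName, AADRoleId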

Now we can include being in that Watchlist as part of the logic we use when we write our query. Keep in mind you will still get logs for any assignments outside of PIM; we are just limiting the scope here for our remediation.

let AADRoles = (_GetWatchlist("PrivilegedAADRoles")|project AADRoleId);
AuditLogs
| where OperationName startswith "Add member to role outside of PIM"
| extend AADRoleDisplayName = tostring(TargetResources[0].displayName)
| extend AADRoleId = tostring(AdditionalDetails[0].value)
| extend AADUserAdded = tostring(TargetResources[2].displayName)
| extend AADObjectId = tostring(TargetResources[2].id)
| extend UserWhoAdded = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| where AADRoleId in (AADRoles)
| project OperationName, AADRoleDisplayName, AADRoleId, AADUserAdded, AADObjectId, UserWhoAdded

Now to address these automatically. First, let’s create our playbook that will automatically remove any users who were assigned outside of PIM. You can call it whatever makes sense for you. Much like the example here, if you want to do your secrets management in an Azure Key Vault, assign the new logic app rights to read secrets. The service principal you use for this automation will need to be able to manage membership of Global Administrators, and so will need to be a Global Administrator itself – make sure you keep its credentials safe.

We want the trigger of our playbook to be ‘When Azure Sentinel incident creation rule was triggered’. The first thing we are going to do is create a couple of variables: one for the role id that was changed and one for the AAD object id of the user who was added. We will map these through using entity mapping when we create our analytics rule in Sentinel – which we will circle back and create once our playbook is built. Let’s retrieve the entities from our incident – for this example we will map RoleId to hostname and the object id for the user to AADUserID.

Then we grab our AADUserId and RoleId from the entities and append them to our variables ready to re-use them.

Next we use the Key Vault connector to grab our client id, tenant id and client secret from Key Vault. Then we POST to MS Graph to retrieve an access token, which we will re-use as authorization to remove the user.

We will need to parse the JSON response so we can re-use the token as authorization. The schema is –

{
    "properties": {
        "access_token": {
            "type": "string"
        },
        "expires_in": {
            "type": "string"
        },
        "expires_on": {
            "type": "string"
        },
        "ext_expires_in": {
            "type": "string"
        },
        "not_before": {
            "type": "string"
        },
        "resource": {
            "type": "string"
        },
        "token_type": {
            "type": "string"
        }
    },
    "type": "object"
}

Now that we have our token, we POST back to MS Graph using the ‘Remove directory role member’ action outlined here to remove the user who was added outside of PIM. As a precautionary step, we also revoke the user’s sessions, so if they had a session open with the added privileges it is now logged out. You can also add some kind of notification here – maybe raise an incident in ServiceNow, or email the user telling them they have had their role removed and informing them about your PIM policies.

To round out the solution, we create our Analytics rule in Sentinel. This is one I would run as often as possible, because you want to revoke that access ASAP. So run it every 5 minutes, looking at the last 5 minutes of data, and complete the entity mapping outlined below to match our playbook entities.
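
The rule query itself can be a sketch like this – the Watchlist query from earlier with a 5 minute lookback, keeping AADObjectId and AADRoleId in the output so they can be entity mapped to the playbook:

let AADRoles = (_GetWatchlist("PrivilegedAADRoles")|project AADRoleId);
AuditLogs
| where TimeGenerated > ago(5m)
| where OperationName startswith "Add member to role outside of PIM"
| extend AADRoleDisplayName = tostring(TargetResources[0].displayName)
| extend AADRoleId = tostring(AdditionalDetails[0].value)
| extend AADUserAdded = tostring(TargetResources[2].displayName)
| extend AADObjectId = tostring(TargetResources[2].id)
| where AADRoleId in (AADRoles)
| project TimeGenerated, AADRoleDisplayName, AADRoleId, AADUserAdded, AADObjectId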

When we get to Automated response, create a new incident automation rule that runs the playbook we just built. Then activate the analytics rule.

Now you can give it a test if you want. Add someone to a role outside of PIM, and within ~10 minutes (to allow the AuditLogs to stream to Azure Sentinel, then your Analytics rule to fire) they should be removed and logged back out of Azure AD.