Using Logic Apps and Microsoft Sentinel to alert on expiring Azure AD Secrets

Azure AD app registrations are at the heart of the Microsoft Identity Platform, and Microsoft recommend you rotate secrets on them often. However, there is currently no native way to alert on secrets that are due to expire. An expired secret means the application will no longer authenticate, so you may have systems that fail when the secret expires. A number of people have come up with solutions, such as using Power Automate or deploying an app that you can configure notifications with. Another solution is using Logic Apps, KQL and Microsoft Sentinel to build a very low cost and light weight solution.

We will need two Logic Apps for our automation – the first will query Microsoft Graph to retrieve information (including password expiry dates) for all our applications, our Logic App will then push that data into a custom table in Sentinel. Next we will write a Kusto query to find apps with secrets due to expire shortly, then finally we will build a second Logic App to run our query on a schedule for us, and email a table with the apps that need new secrets.

Both our Logic Apps are really simple, the first looks as follows –

The first part of this app is to connect to the Microsoft Graph to retrieve a token for re-use. You can set your Logic App to run on a recurrence for whatever schedule makes sense, for my example I am running it daily, so we will get updated application data into Sentinel each day. Then we will need the ClientID, TenantID and Secret for an Azure AD App Registration with enough permission to retrieve application information using the ‘Get application‘ action. From the documentation we can see that Application.Read.All is the lowest privilege access we will need to give our app registration.

Our first 3 actions are just to retrieve those credentials from Azure Key Vault, if you don’t use Azure Key Vault you can just add them as variables (or however you handle secrets management). Then we call MS Graph to retrieve a token.

Put your TenantID in the URI and the ClientID and Secret in the body respectively. Then parse the JSON response using the following schema.

{
    "properties": {
        "access_token": {
            "type": "string"
        },
        "expires_in": {
            "type": "string"
        },
        "expires_on": {
            "type": "string"
        },
        "ext_expires_in": {
            "type": "string"
        },
        "not_before": {
            "type": "string"
        },
        "resource": {
            "type": "string"
        },
        "token_type": {
            "type": "string"
        }
    },
    "type": "object"
}

Now we have our token, we can re-use it to access the applications endpoint on Microsoft Graph to retrieve the details for all our applications. So use a HTTP action again, and this time use a GET action.

We connect to https://graph.microsoft.com/v1.0/applications?$select=displayName,appId,passwordCredentials

In this example we are just retrieving the displayname of our app, the application id and the password credentials. The applications endpoint has much more detail though if you wanted to include other data to enrich your logs.

One key note here is dealing with data paging in Microsoft Graph, MS Graph will only return a certain amount of data at a time, and also include a link to retrieve the next set of data, and so on until you have retrieved all the results – and with Azure AD apps you are almost certainly going to have too many to retrieve at once. Logic Apps can deal with this natively thankfully, on your ‘Retrieve App Details’ action, click the three dots in the top right and choose settings, then enable Pagination so that it knows to loop through until all the data is retrieved.

Then we parse the response once more, if you are just using displayname, appid and password credentials like I am, then the schema for your json is.

{
    "properties": {
        "value": {
            "items": {
                "properties": {
                    "appId": {
                        "type": "string"
                    },
                    "displayName": {
                        "type": "string"
                    },
                    "passwordCredentials": {
                        "items": {
                            "properties": {
                                "customKeyIdentifier": {},
                                "displayName": {
                                    "type": [
                                        "string",
                                        "null"
                                    ]
                                },
                                "endDateTime": {
                                    "type": [
                                        "string",
                                        "null"
                                    ]
                                },
                                "hint": {
                                    "type": [
                                        "string",
                                        "null"
                                    ]
                                },
                                "keyId": {
                                    "type": [
                                        "string",
                                        "null"
                                    ]
                                },
                                "secretText": {},
                                "startDateTime": {
                                    "type": [
                                        "string",
                                        "null"
                                    ]
                                }
                            },
                            "required": [
                                "customKeyIdentifier",
                                "displayName",
                                "endDateTime",
                                "hint",
                                "keyId",
                                "secretText",
                                "startDateTime"
                            ],
                            "type": [
                                "object",
                                "null",
                                "array"
                            ]
                        },
                        "type": "array"
                    }
                },
                "required": [
                    "displayName",
                    "appId",
                    "passwordCredentials"
                ],
                "type": "object"
            },
            "type": "array"
        }
    },
    "type": "object"
}

Then our last step of our first Logic App is to send the data using the Azure Log Analytics Data Collector to Microsoft Sentinel. So take each value from your JSON and send it to Sentinel, because you will have lots of apps, it will loop through each one. For this example the logs will be sent to the AzureADApps_CL table.

You can then query your AzureADApps_CL table like you would any data and you should see a list of your application display names, their app ids and any password credentials. If you are writing to that table for the first time, just give it 20 or so minutes to appear. So now we need some KQL to find which ones are expiring. If you followed this example along you can use the following query –

AzureADApps_CL
| where TimeGenerated > ago(7d)
| extend AppDisplayName = tostring(displayName_s)
| extend AppId = tostring(appId_g)
| summarize arg_max (TimeGenerated, *) by AppDisplayName
| extend Credentials = todynamic(passwordCredentials_s)
| project AppDisplayName, AppId, Credentials
| mv-expand Credentials
| extend x = todatetime(Credentials.endDateTime)
| project AppDisplayName, AppId, Credentials, x
| where x between (now()..ago(-30d))
| extend SecretName = tostring(Credentials.displayName)
| extend PasswordEndDate = format_datetime(x, 'dd-MM-yyyy [HH:mm:ss tt]')
| project AppDisplayName, AppId, SecretName, PasswordEndDate
| sort by AppDisplayName desc 

You should see an output if any applications that have a secret expiring in the next 30 days (if you have any).

Now we have our data, the second Logic App is simple, you just need it to run on whatever scheduled you like (say weekly), run the query for you against Sentinel (using the Azure Monitor Logs connector), build a simple HTML table and email it to whoever wants to know.

If you use the same Kusto query that I have then the schema for your Parse JSON action is.

{
    "properties": {
        "value": {
            "items": {
                "properties": {
                    "AppDisplayName": {
                        "type": "string"
                    },
                    "AppId": {
                        "type": "string"
                    },
                    "PasswordEndDate": {
                        "type": "string"
                    },
                    "SecretName": {
                        "type": [
                            "string",
                            "null"
                        ]
                    }
                },
                "required": [
                    "AppDisplayName",
                    "AppId",
                    "SecretName",
                    "PasswordEndDate"
                ],
                "type": "object"
            },
            "type": "array"
        }
    },
    "type": "object"
}

Now that you have that data in Microsoft Sentinel you could also run other queries against it, such as seeing how many apps you have created or removed each week, or if applications have expired secrets and no one has requested a new one; they may be inactive and can be deleted.

Detecting multistage attacks in Microsoft Sentinel

For defenders, it would be really amazing if every threat we faced was a single event or action that we could detect – we would know that if x happened, then we need to do y and the threat was detected and prevented. Unfortunately not every threat we face is a single event; it may be the combination of several low priority events that on their own may not raise alarms, but when combined are an indicator of more malicious activity. For instance, you probably receive a lot of identity alerts that are considered low risk, such as users accessing via a new device, or a new location – most are likely benign. If you then detected that same user accessed SharePoint from a location not seen before, that may increase the risk level, and if that user then started downloading a lot of data suddenly that may be really serious.

That pattern follows the MITRE ATT&CK framework where we may see initial access, followed by discovery then exfiltration. Thankfully we can build our own queries to hunt for these kinds of attacks. Microsoft also provide multistage protection via their fusion detections in Microsoft Sentinel.

We can send all kinds of data to Microsoft Sentinel, logs from on premise domain controllers or servers, Azure AD telemetry, logs from our endpoint devices and whatever else you think is valuable. Microsoft Sentinel and the Kusto Query Language provide the ability to look for attacks that may span across different sources. There are several ways to join datasets in KQL, this blog we are going to focus on just the join operator. At its most basic, join allows us to combine data from different tables together based on something that matches between the two tables.

For instance, if we have our Azure AD sign in data, which is sent to the SigninLogs table and our Office 365 audit logs which are sent to the OfficeActivity table, we have various options to where we may find a match between these two tables – such as usernames and IP addresses for example. So we could join the two tables based on a username, and match Azure AD sign in data with Office 365 activity data belonging to the same user. Maybe a user signed into Azure AD from a location previously not seen for them before, so then we would be interested in what actions were taken in Office 365 after that sign in event.

When we join data in Microsoft Sentinel we have a lot of options, to keep things straight forward for this post, we are just going to use ‘inner’ joins, where we look for matches between multiple tables and return the combined data. So using our Azure AD and Office 365 example, after completing an inner join, we would see the data from both tables available to us – such as location, conditional access results or user agent from the Azure AD table and actions such as downloading files from OneDrive or inviting users to Teams, from the Office 365 table. There are other types of joins, referenced in the documentation, but we will explore those in a future post. Learning to join tables was one of the things that confused me the most initially in KQL, but it provides immense value.

If we start with something simple, we can join our Azure AD sign in logs to our Azure AD Risk Events (held in the AADUserRiskEvents table), if we build a simple query and tell KQL to join the tables together, you will see it automatically tells us where there is a match in data.

The TimeGenerated, CorrelationId and UserPrincipalName fields exist in both tables. If we join on our CorrelationId, we can then see we get options to fill in our query from both tables

Where the same column exists on both sides you will see it automatically renames one, seen with ‘CorrelationId1’. We can then finish our query with data from both tables

SigninLogs
| project TimeGenerated, UserPrincipalName, AppDisplayName, ResultType, CorrelationId
| join kind=inner
(AADUserRiskEvents)
on CorrelationId
| project TimeGenerated, UserPrincipalName, CorrelationId, ResultType, DetectionTimingType, RiskState, RiskLevel

We get the TimeGenerated, UserPrincipalName, ResultType from Azure AD sign in data, and the DetectionTimingType, RiskState and RiskLevel from AADUserRiskEvents, and we use the CorrelationId to join them together.

We can use these basics as a foundation to start adding some more logic to our queries. In this next example we are looking for AADUserRiskEvents, and this time joining to our Azure AD Audit table (where Azure AD changes are tracked) looking for events where the same user who flagged a risk event also changed MFA details within a short time frame.

let starttime = 45d;
let timeframe = 4h;
AADUserRiskEvents
| where TimeGenerated > ago(starttime)
| where RiskDetail != "aiConfirmedSigninSafe"
| project RiskTime=TimeGenerated, UserPrincipalName, RiskEventType, RiskLevel, Source
| join kind=inner (
    AuditLogs
    | where OperationName in ("User registered security info", "User deleted security info")
    | where Result == "success"
    | extend UserPrincipalName = tostring(TargetResources[0].userPrincipalName)
    | project SecurityInfoTime=TimeGenerated, OperationName, UserPrincipalName, Result, ResultReason)
    on UserPrincipalName
| project RiskTime, SecurityInfoTime, UserPrincipalName, RiskEventType, RiskLevel, Source, OperationName, ResultReason
| where (SecurityInfoTime - RiskTime) between (0min .. timeframe)

This query is a little more complex but it follows the same pattern. First we set a couple of time variables, we are going to look back through 45 days of data and we want to set a time frame of four hours between our events. If a risk event is triggered initially, but then the MFA event doesn’t occur for two weeks, then it is not as likely to be linked compared to these events happening close together. Next, we look up our AADUserRiskEvents, exclude anything that Microsoft dismiss as safe and then we take the details we want to use in our second query – the UserPrincipalName, RiskEventType, RiskLevel and Source, we also take the TimeGenerated, but to make things more simple to understand we rename it to RiskTime, so that it is easy to distinguish later on.

Then to finish our our query, we again inner join, this time to our AuditLogs table, looking for MFA registration or deletion events, and we join the tables together based on UserPrincipalName, that way we know the same user who flagged the risk event also changed MFA details. We rename the time of the second event to SecurityInfoTime to make our data easy to read. Fnally, to add our time logic, we calculate the time between the two separate events and then alert only when that time is less than four hours.

We can re-use this same pattern across all kinds of data, this query follows basically the exact same format, except we are looking for a risk event followed by access to an Azure management interface. If a user flagged a risk event, then within four hours signed into Azure, we would be alerted.

let starttime = 45d;
let timeframe = 4h;
let applications = dynamic(["Azure Active Directory PowerShell", "Microsoft Azure PowerShell", "Graph Explorer", "ACOM Azure Website"]);
AADUserRiskEvents
| where TimeGenerated > ago(starttime)
| where RiskDetail != "aiConfirmedSigninSafe"
| project RiskTime=TimeGenerated, UserPrincipalName, RiskEventType, RiskLevel, Source
| join kind=inner (
    SigninLogs
    | where AppDisplayName in (applications)
    | where ResultType == "0")
    on UserPrincipalName
| project-rename AzureSigninTime=TimeGenerated
| extend TimeDelta = AzureSigninTime - RiskTime
| project RiskTime, AzureSigninTime, TimeDelta, UserPrincipalName, RiskEventType, RiskLevel, Source
| where (AzureSigninTime - RiskTime) between (0min .. timeframe)

We can even have KQL calculate the time between two events for you to easily see the time difference between the two. You do this by simply extending a new column and having it calculate it for you (| extend TimeDelta = AzureSigninTime – RiskTime )

You can extend these queries across any data that makes sense, so we can again take a risk event, but this time join it to our Office 365 activity logs to find a list of files that a user has downloaded shortly after flagging that risk event.

let starttime = 45d;
let timeframe = 4h;
AADUserRiskEvents
| where TimeGenerated > ago(starttime)
| where RiskDetail != "aiConfirmedSigninSafe"
| project RiskTime=TimeGenerated, UserPrincipalName, RiskEventType, RiskLevel, Source
| join kind=inner (
    OfficeActivity
    | where Operation in ("FileSyncDownloadedFull", "FileDownloaded"))
    on $left.UserPrincipalName == $right.UserId
| project DownloadTime=TimeGenerated, OfficeObjectId, RiskTime, UserId
| where (DownloadTime - RiskTime) between (0min .. timeframe)
| summarize RiskyDownloads=make_set(OfficeObjectId) by UserId
| where array_length( RiskyDownloads) > 10

We use much the same query structure, but there are two things to note here, the AADUserRiskEvents and OfficeActivity store username data in two different columns, so we need to manually tell Microsoft Sentinel how to join, which we do by “on $left.UserPrincipalName == $right.UserId”. We are telling KQL that the UserPrincipalName from our first table (AADUserRiskEvents) is the same as the UserId in our second table (OfficeActivity). Data coming in from different vendors, and even Microsoft themselves, is wildly inconsistent, so you will need to provide the brain power to link them together. In this example, we also summarize the list of downloads the risky user has taken, and only alert when it is greater than 10 unique files.

These kind of multistage queries don’t need to be limited to users or identity type events, you can use the same structure to query device data, or anything else that is relevant to you.

let timeframe = 48h;
SecurityAlert
| where ProviderName == "MDATP"
| project AlertTime=TimeGenerated,DeviceName=CompromisedEntity, AlertName
| join kind=inner (
DeviceLogonEvents
| project TimeGenerated, LogonType, ActionType, InitiatingProcessCommandLine, IsLocalAdmin, AccountName, DeviceName
| where LogonType in ("Interactive","RemoteInteractive")
| where ActionType == "LogonSuccess"
| where InitiatingProcessCommandLine == "lsass.exe"
) on DeviceName
| where (AlertTime - TimeGenerated) between (0min .. timeframe)
| summarize arg_max(TimeGenerated, *) by DeviceName
| project LogonTime=TimeGenerated, AlertTime, AlertName, DeviceName, AccountName, IsLocalAdmin

In this last example, we take an alert from Microsoft Defender for Endpoint, then use that first event to circle back to our DeviceLogonEvents which tracks logon event data on Windows devices, from there we can track down who was the most recent user to sign onto that device, and also determine if they are a local administrator.

Keep an eye on your Azure AD guests with Microsoft Sentinel

Azure AD External Identities (previously Azure AD B2B) is a fantastic way to collaborate with partners, customers or other people external to your company. Previously you may have needed to onboard an Active Directory account for each user, which came with a lot of inherit privilege, or you used different authentication methods for your applications, and you ended up juggling credentials for all these different systems. By leveraging Azure AD External Identities you start to wrestle back some of that control and importantly get really strong visibility into what these guests are doing.

You invite a guest to your tenant by sending them an email from within the Azure Active Directory portal (or directly inviting them in an app like Teams), they go through the process of accepting and then you have a user account for them in your tenant – easy!

If the user you invite to your tenant belongs to a domain that is also an Azure AD tenant, they can use their own credentials from that tenant to access resources in your tenant. If it’s a personal address like gmail.com then the user will be prompted to sign up to a Microsoft account or use a one time passcode if you have configured that option.

If you browse through your Azure AD environment and already have guests, you can filter to just guest accounts. If you don’t have guests, invite your personal email and you can check out the process.

You will notice that they have a unique UserPrincipalName format, if your guests email address is test123@gmail.com then the guest object in your directory has the UserPrincipalName of test123_gmail.com#EXT#@YOURTENANT.onmicrosoft.com – this makes sense if you think about the concept of a guest account, it could belong to many different tenants so it needs to have a unique UPN in your tenant. You can also see a few more details by clicking through to a guest account. You can see if an invite has been accepted or not, a guest who hasn’t accepted is still an object in your directory, they just can’t access any resources yet.

And if you click the view more arrow, you can see if source of the account.

You can see the difference between a user coming in from another Azure AD tenant vs a personal account.

It is really easy to invite guest accounts and then kind of forget about them, or not treat them with the same scrutiny or governance you would a regular account. They also have a tendency to grow in total count very quickly, especially if you allow your staff to invite them themselves, via Teams or any other method.

Remember though these accounts all have some access to your tenant, potentially data in Teams, OneDrive or SharePoint, and likely an app or two that you have granted access to – or more worryingly apps that you haven’t specifically blocked them accessing. Guests can even be granted access to Azure AD roles, or be given access to Azure resources via Azure RBAC.

Thankfully in Microsoft (no longer Azure!) Sentinel, all the signals we get from sign-in data, or audit logs, or Office 365 logs don’t discriminate between members and guests (apart from some personal information that is hidden for guests such as device names), which makes it a really great platform to get insights to what your guests are up to (or what they are no longer up to).

Invites sent and redeemed are collected in the AuditLogs table, so if you want to quickly visualize how many invites you are sending vs those being redeemed you can.

//Visualizes the total amount of guest invites sent to those redeemed
let timerange=180d;
let timeframe=7d;
AuditLogs
| where TimeGenerated > ago (timerange)
| where OperationName in ("Redeem external user invite", "Invite external user")
| summarize
    InvitesSent=countif(OperationName == "Invite external user"),
    InvitesRedeemed=countif(OperationName == "Redeem external user invite")
    by bin(TimeGenerated, timeframe)
| render columnchart
    with (
    title="Guest Invites Sent v Guest Invites Redeemed",
    xtitle="Invites",
    kind=unstacked)

You can look for users that have been invited, but have not yet redeemed their invite. Guest invites never expire, so if a user hasn’t accepted after a couple of months it may be worth removing the invite until a time they genuinely require it. In this query we exclude invites sent in the last month, as those people may have simply not got around to redeeming their invite yet.

//Lists guests who have been invited but not yet redeemed their invites. Excludes newly invited guests (last 30 days).
let timerange=180d;
let timeframe=30d;
AuditLogs
| where TimeGenerated between (ago(timerange) .. ago(timeframe)) 
| where OperationName == "Invite external user"
| extend GuestUPN = tolower(tostring(TargetResources[0].userPrincipalName))
| project TimeGenerated, GuestUPN
| join kind=leftanti  (
    AuditLogs
    | where TimeGenerated > ago (timerange)
    | where OperationName == "Redeem external user invite"
    | where CorrelationId <> "00000000-0000-0000-0000-000000000000"
    | extend d = tolower(tostring(TargetResources[0].displayName))
    | parse d with * "upn: " GuestUPN "," *
    | project TimeGenerated, GuestUPN)
    on GuestUPN
| distinct GuestUPN

For those users that have accepted and are actively accessing applications, we can see what they are accessing just like a regular user. You could break down all your apps and have a look at the split between guests and members for each application.

//Creates a list of your applications and summarizes successful signins by members vs guests
let timerange=30d;
SigninLogs
| where TimeGenerated > ago(timerange)
| project TimeGenerated, UserType, ResultType, AppDisplayName
| where ResultType == 0
| summarize
    MemberSignins=countif(UserType == "Member"),
    GuestSignins=countif(UserType == "Guest")
    by AppDisplayName
| sort by AppDisplayName  

You can quickly see which users haven’t signed in over the last month, having signed in successfully in the preceding 6 months.

let timerange=180d;
let timeframe=30d;
SigninLogs
| where TimeGenerated > ago(timerange)
| where UserType == "Guest" or UserPrincipalName contains "#ext#"
| where ResultType == 0
| summarize arg_max(TimeGenerated, *) by UserPrincipalName
| join kind = leftanti  
    (
    SigninLogs
    | where TimeGenerated > ago(timeframe)
    | where UserType == "Guest" or UserPrincipalName contains "#ext#"
    | where ResultType == 0
    | summarize arg_max(TimeGenerated, *) by UserPrincipalName
    )
    on UserPrincipalName
| project UserPrincipalName

Or you could even summarize all your guests (who have signed in at least once) into the month they last accessed your tenant. You could then bulk disable/delete anything over 3 months or whatever your lifecycle policy is.

//Month by month breakdown of when your Azure AD guests last signed in
SigninLogs
| where TimeGenerated > ago (360d)
| where UserType == "Guest" or UserPrincipalName contains "#ext#"
| where ResultType == 0
| summarize arg_max(TimeGenerated, *) by UserPrincipalName
| project TimeGenerated, UserPrincipalName
| summarize InactiveUsers=make_set(UserPrincipalName) by startofmonth(TimeGenerated)

You could look at guests accounts that are trying to access your applications but being denied because they aren’t assigned a role, this could potentially be some reconnaissance occurring in your environment.

SigninLogs
| where UserType == "Guest"
| where ResultType == "50105"
| project TimeGenerated, UserPrincipalName, AppDisplayName, IPAddress, Location, UserAgent

We can leverage the IdentityInfo table to find any guests that have been assigned Azure AD roles. If your security controls for guests are weaker than your member accounts this is something you definitely want to avoid.

IdentityInfo
| where TimeGenerated > ago(21d)
| summarize arg_max(TimeGenerated, *) by AccountUPN
| where UserType == "Guest"
| where AssignedRoles != "[]" 
| where isnotempty(AssignedRoles)
| project AccountUPN, AssignedRoles, AccountObjectId

We can also use our IdentityInfo table again to grab a list of all our guests, then join to our OfficeActivity table to summarize download activities by each of your guests.

//Summarize the total count and the list of files downloaded by guests in your Office 365 tenant
let timeframe=30d;
IdentityInfo
| where TimeGenerated > ago(21d)
| where UserType == "Guest"
| summarize arg_max(TimeGenerated, *) by AccountUPN
| project UserId=tolower(AccountUPN)
| join kind=inner (
    OfficeActivity
    | where TimeGenerated > ago(timeframe)
    | where Operation in ("FileSyncDownloadedFull", "FileDownloaded")
    )
    on UserId
| summarize DownloadCount=count(), DownloadList=make_set(OfficeObjectId) by UserId

If you wanted to summarize which domains are downloading the most data from Office 365 then you can slightly alter the above query (thanks to Alex Verboon for this suggestion).

//Summarize the total count of files downloaded by each guest domain in your tenant
let timeframe=30d;
IdentityInfo
| where TimeGenerated > ago(21d)
| where UserType == "Guest"
| summarize arg_max(TimeGenerated, *) by AccountUPN, MailAddress
| project UserId=tolower(AccountUPN), MailAddress
| join kind=inner (
    OfficeActivity
    | where TimeGenerated > ago(timeframe)
    | where Operation in ("FileSyncDownloadedFull", "FileDownloaded")
    )
    on UserId
| extend username = tostring(split(UserId,"#")[0])
| parse MailAddress with * "@" userdomain 
| summarize count() by userdomain

You can find guests who were added to a Team then instantly started downloading data from your Office 365 tenant.

// Finds guest accounts who were added to a Team and then downloaded documents straight away. 
// startime = data to look back on, timeframe = looks for downloads for this period after being added to the Team
let starttime = 7d;
let timeframe = 2h;
let operations = dynamic(["FileSyncDownloadedFull", "FileDownloaded"]);
OfficeActivity
| where TimeGenerated > ago(starttime)
| where OfficeWorkload == "MicrosoftTeams" 
| where Operation == "MemberAdded"
| extend UserAdded = tostring(parse_json(Members)[0].UPN)
| where UserAdded contains ("#EXT#")
| project TimeAdded=TimeGenerated, UserId=tolower(UserAdded)
| join kind=inner
    (
    OfficeActivity
    | where Operation in (['operations'])
    )
    on UserId
| project DownloadTime=TimeGenerated, TimeAdded, SourceFileName, UserId
| where (DownloadTime - TimeAdded) between (0min .. timeframe)

I think the key takeaway is that basically all your threat hunting queries you write for your standard accounts are most likely relevant to guests, and in some cases more relevant. While having guests in your tenant grants us some control and visibility, it is still an account not entirely under your management. The accounts could have poor passwords, or be shared amongst people, or if coming from another Azure AD tenancy could have poor lifecycle management, i.e they could have left the other company but their account is still active.

As always, prevention is better than detection, and depending on your licensing tier there are some great tools available to govern these accounts.

You can configure guest access restrictions in the Azure Active Directory portal. Keep in mind when configuring these options the flow on effect to other apps, such as Teams. In that same portal you can configure who is allowed to send guest invites, I would particularly recommend you disallow guests inviting other guests. You can also restrict or allow specific domains that invites can be sent to.

On your enterprise applications, make sure you have assignment required set to Yes

This is crucial in my opinion, because it allows Azure AD to be the first ‘gate’ to accessing your applications. The access control in your various applications is going to vary wildly. Some may need an account setup on the application itself to allow people in, some may auto create an account on first sign on, some may have no access control at all and when it sees a sign in from Azure AD it allows the person in. If this is set to no and your applications don’t perform their own access control or RBAC then there is a good chance your guests will be allowed in, as they come through as authenticated from Azure AD much like a member account.

If you are an Azure AD P2 customer, then you have access to Access Reviews, which is an already great and constantly improving offering that lets you automate a lot of the lifecycle of your accounts, including guests. You can also look at leveraging Entitlement Management which can facilitate granting guests the access they require and nothing more.

If you have Azure AD P1 or P2, use Azure AD Conditional Access, you can target policies specifically at guest accounts from within the console.

You can enforce MFA on your guest accounts like you would all other users – if you enforce MFA on an application for guests, the first time they access it they will be redirected to the MFA registration page. You can also explicitly block guests from particular applications using conditional access.

Also unrelated, I recently kicked off a #365daysofkql challenge on my twitter, where I share a query a day for a year, we are nearly one month in so if you want to follow feel free.

Defending Azure Active Directory with Azure Sentinel

Azure Active Directory doesn’t really need any introduction, it is the core of identity within Microsoft 365, used by Azure RBAC and used by millions as an identity provider. The thing about Azure Active Directory is that it isn’t much like Active Directory at all, apart from name they have little in common under the hood. There is no LDAP, no Kerberos, no OU’s. Instead we get SAML, OIDC/OAuth and Microsoft Graph. It has its own unique threats, logging and attack vectors. There are a massive amount of great articles about attacking Azure AD, such as:

The focus of this blog is looking at it from the other side, looking for how we can detect and defend against these activities.

Defending Reconnaissance

Protection against directory reconnaissance in Azure Active Directory can be quite difficult. Any user in your tenant comes with some level of privilege, mostly to be able to ‘look around’ at other objects. You can restrict access to the Azure AD administration portal to users who don’t hold a privileged role under the ‘User settings’ tab in Azure Active Directory and you can configure guest permissions if you use external identities, it won’t stop people using other techniques but it still valuable to harden that portal.

With on-premise Active Directory we get logging on services like LDAP or DNS, and we have products like Defender for Id that can trigger alerts for us – on premise Active Directory has a very strong logging capability. For Azure Active Directory however, we don’t have access to equivalent data unfortunately, we will get sign-in activity of course – so if a user connects to Azure AD PowerShell, that can be tracked. What we can’t see though is the output for any read/get operations. So once connected to PowerShell if a user runs a Get-AzureADUser command, we have no visibility on that. Once a user starts to make changes, such as changing group memberships or deleting users, then we receive log events.

Tools like Azure AD Identity Protection are helpful, but they are sign-in driven and designed to protect users from account compromise. Azure AD Identity Protection won’t detect privilege escalation in Azure AD like Defender for Id for on premise Active Directory can.

So, while that makes things difficult, looking for users signing onto Azure management portals and interfaces is a good place to start –

SigninLogs
| where AppDisplayName in ("Azure Active Directory PowerShell","Microsoft Azure PowerShell","Graph Explorer", "ACOM Azure Website")
| project TimeGenerated, UserPrincipalName, AppDisplayName, Location, IPAddress, UserAgent

These applications have legitimate use though and we don’t want alert fatigue, so to add some more logic to our query, we can look back on the last 90 days (or whatever time frame suits you), then detect users accessing these applications for the first time. This could be a sign of a compromised account being used for reconnaissance.

let timeframe = startofday(ago(60d));
let applications = dynamic(["Azure Active Directory PowerShell", "Microsoft Azure PowerShell", "Graph Explorer", "ACOM Azure Website"]);
SigninLogs
| where TimeGenerated > timeframe and TimeGenerated < startofday(now())
| where AppDisplayName in (applications)
| project UserPrincipalName, AppDisplayName
| join kind=rightanti
    (
    SigninLogs
    | where TimeGenerated > startofday(now())
    | where AppDisplayName in (applications)
    )
    on UserPrincipalName, AppDisplayName
| where ResultType == 0
| project TimeGenerated, UserPrincipalName, ResultType, AppDisplayName, IPAddress, Location, UserAgent

Defending Excessive User Permission

This one is fairly straight forward, but often the simplest things are hardest to get right. Your IT staff, or yourself, will need to manage Azure AD and that’s fine of course, but we need to make sure that roles are fit for purpose. Azure AD has a list of pre-canned and well documented roles, and you can build your own if required. Make sure that roles are being assigned that are appropriate to the job – you don’t need to be a Global Administrator to complete user administration tasks, there are better suited roles. We can detect the assignment of roles to users, if you use Azure AD PIM we can also exclude activations from our query –

AuditLogs
| where Identity <> "MS-PIM"
| where OperationName == "Add member to role"
| extend Target = tostring(TargetResources[0].userPrincipalName)
| extend RoleAdded = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ActorIPAddress = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| project TimeGenerated, OperationName, Target, RoleAdded, Actor, ActorIPAddress

If you have a lot of users being moved in and out of roles you can reduce the query down to a selected set of privileged roles if required –

let roles=dynamic(["Global Admininistrator","SharePoint Administrator","Exchange Administrator"]);
AuditLogs
| where OperationName == "Add member to role"
| where Identity <> "MS-PIM"
| extend Target = tostring(TargetResources[0].userPrincipalName)
| extend RoleAdded = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| where RoleAdded in (roles)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ActorIPAddress = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| project TimeGenerated, OperationName, Target, RoleAdded, Actor, ActorIPAddress

And if you use Azure AD PIM you can be alerted when users are assigned roles outside of the PIM platform (which you can do via Azure AD PowerShell as an example) –

AuditLogs
| where OperationName startswith "Add member to role outside of PIM"
| extend RoleAdded = tostring(TargetResources[0].displayName)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ActorIPAddress = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| extend TargetAADUserId = tostring(TargetResources[2].id)
| project TimeGenerated, OperationName, TargetAADUserId, RoleAdded, Actor

Defending Shared Identity

Unless you are a cloud native company with no on premise Active Directory footprint then you will be syncing user accounts, group objects and devices between on premise and Azure AD. Whether you sync all objects, or a subset of them will depend on your particular environment, but identity is potentially the link between AD and Azure AD. If you use accounts from on premise Active Directory to also manage Azure Active Directory, then the identity security of those accounts are crucial. Microsoft recommend you use cloud only accounts to manage Azure AD, but that may not be practical in your environment, or it’s something you are working toward.

Remember that Azure Active Directory and Active Directory essentially have no knowledge of the privilege an account has on the other system (apart from group membership more broadly). Active Directory doesn’t know that bobsmith@yourcompany.com is a Global Administrator, and Azure Active Directory doesn’t know that the same account has full control over particular OUs as an example. We can visualize this fairly simply.

In isolation each system has its own built in protections, a regular user can’t reset the password of a Domain Admin on premise and in Azure AD a User Administrator can’t reset the password of a Global Administrator. The issue is when we cross that boundary and where there is a link in identity, there is potential for abuse and escalation.

For arguments sake maybe our service desk staff have the privilege to reset the password on a Global Administrator account in Azure AD – because of inherit permissions in AD. It may be easier for an attacker to target a service desk account because they have weaker controls or may be more vulnerable to social engineering – “hey could you reset the password on bobsmith@yourcompany.com for me?”. In on premise AD that account may appear to be quite low privilege.

We can leverage the IdentityInfo table driven by Azure Sentinel UEBA to track down users who have privileged roles, then join that back to on premise SecurityEvents for password reset activity. Then filter out when a privileged Azure AD user has reset their own on premise password – we want events where someone has reset another persons privileged Azure AD account.

let timeframe=1d;
IdentityInfo
| where TimeGenerated > ago(21d)
| where isnotempty(AssignedRoles)
| where AssignedRoles != "[]"
| summarize arg_max(TimeGenerated, *) by AccountUPN
| project AccountUPN, AccountName, AccountSID
| join kind=inner (
    SecurityEvent
    | where TimeGenerated > ago(timeframe)
    | where EventID == "4724"
    | project
        TimeGenerated,
        Activity,
        SubjectAccount,
        TargetAccount,
        TargetSid,
        SubjectUserSid
    )
    on $left.AccountSID == $right.TargetSid
| where SubjectUserSid != TargetSid
| project PasswordResetTime=TimeGenerated, Activity, ActorAccountName=SubjectAccount, TargetAccountUPN=AccountUPN,TargetAccountName=TargetAccount

The reverse can be true too, you could have users with Azure AD privilege, but no or reduced access to on premise Active Directory. When an Azure AD admin resets a password it is logged as a ‘Reset password (by admin)’ action in Azure Sentinel, we can retrieve the actor, the target and the outcome –

AuditLogs
| where OperationName == "Reset password (by admin)"
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend Target = tostring(TargetResources[0].userPrincipalName)
| project TimeGenerated, OperationName, Result, Actor, Target

An attacker could go further and use a service principal to leverage Microsoft Graph to initiate a password reset in Azure AD and have it written back to on-premise. This activity is shown in the AuditLogs table –

AuditLogs
| where OperationName == "POST UserAuthMethod.ResetPasswordOnPasswordMethods"
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| project TimeGenerated, OperationName, Actor, CorrelationId
| join kind=inner
    (AuditLogs
    | where OperationName == "Reset password (by admin)"
    | extend Target = tostring(TargetResources[0].userPrincipalName)
    | where Result == "success"
    )
    on CorrelationId
| project GraphPostTime=TimeGenerated, PasswordResetTime=TimeGenerated1, Actor, Target

Not only can on premise users have privileged in Azure AD, but on premise groups may hold privilege in Azure AD. When groups are synced from on premise to Azure AD, they don’t retain any of the security information from on premise. So you may have a group called ‘ad.security.appowners’, and that group can be managed by any number of people. If that group is then given any kind of privilege in Azure AD then the members of it inherit that privilege too. If you do have any groups in your environment that fit that pattern they will be unique to your environment, but you can detect changes to groups in Azure Sentinel –

SecurityEvent
| extend Actor = Account
| extend Target = MemberName
| extend Group = TargetAccount
| where EventID in (4728,4729,4732,4733,4756,4757) and Group == "DOMAIN\\ad.security.appowners"
| project TimeGenerated, Activity, Actor, Target, Group

If you have a list of groups you want to monitor, then it’s worth adding them into a watchlist and then querying against that, then you can keep the watchlist current and your query will continue to be up to date.

let watchlist = (_GetWatchlist('PrivilegedADGroups') | project TargetAccount);
SecurityEvent
 extend Target = MemberName
| extend Group = TargetAccount
| where EventID in (4728,4729,4732,4733,4756,4757) and TargetAccount in (watchlist)
| project TimeGenerated, Activity, Actor, Target, Group

If you have these shared identities and groups, what the groups are named will be very specific to you, but you should look to harden the security on premise, monitor them or preferably de-couple the link between AD and Azure AD entirely.

Defending Service Principal Abuse

In Azure AD, we can register applications, authenticate against them (using secrets or certificates) and they can provide further access into Azure AD or any other resources in your tenant – for each application created a corresponding service principal is created too. We can add either delegated or application access to app (such as mail.readwrite.all from the MS Graph) and we can assign roles (such as Global Administrator) to the service principal. Anyone who then authenticates to the app would have the attached privilege.

Specterops posted a great article here (definitely worth reading before continuing) highlighting the privilege escalation path through service principals. The article outlines some potential weak points spots in Azure AD –

  • Application admins being able to assign new secrets (passwords) to existing service principals.
  • High privilege roles being assigned to service principals.

And we will add some additional threats that you may see

  • Admins consenting to excessive permissions.
  • Redirect URI tampering.

From the article we learnt that the Application Administrator role has the ability to add credentials (secrets or certificates) to any existing application in Azure AD. If you have a service principal that has the Global Administrator role or privilege to the MS Graph, then an Application Administrator can generate a new secret for that app and effectively be a Global Administrator and obtain that privilege.

We can view secrets generated on an app in the AuditLogs table –

AuditLogs
| where OperationName contains "Update application – Certificates and secrets management"
| extend AppId = tostring(AdditionalDetails[1].value)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend AppDisplayName = tostring(TargetResources[0].displayName)
| project TimeGenerated, OperationName, AppDisplayName, AppId, Actor

We can also detect when permissions change in Azure AD applications, much like on premise service accounts, privilege has a tendency to creep upward over time. We can detect application permission additions with –

AuditLogs
| where OperationName == "Add app role assignment to service principal"
| extend AppPermissionsAdded = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| extend AppId = tostring(TargetResources[1].id)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ActorIPAddress = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| project TimeGenerated, OperationName, AppId, AppPermissionsAdded,Actor, ActorIPAddress

And delegated permissions additions with –

AuditLogs
| where OperationName == "Add delegated permission grant"
| extend DelegatedPermissionsAdded = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[0].newValue)))
| extend AppId = tostring(TargetResources[1].id)
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend ActorIPAddress = tostring(parse_json(tostring(InitiatedBy.user)).ipAddress)
| project TimeGenerated, OperationName, AppId, DelegatedPermissionsAdded,Actor, ActorIPAddress

Some permissions are of a high enough level that Azure AD requires a global administrator to consent to them, essentially by hitting an approve button. This is definitely an action you want to audit and investigate, once a global administrator hits the consent button, the privilege has been granted. You can investigate consent actions, including the permissions that have been granted –

AuditLogs
| where OperationName contains "Consent to application"
| extend AppDisplayName = tostring(TargetResources[0].displayName)
| extend Consent = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[4].newValue)))
| parse Consent with * "Scope:" PermissionsConsentedto ']' *
| extend UserWhoConsented = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend AdminConsent = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[0].newValue)))
| extend AppType = tostring(TargetResources[0].type)
| extend AppId = tostring(TargetResources[0].id)
| project TimeGenerated, AdminConsent, AppDisplayName, AppType, AppId, PermissionsConsentedto, UserWhoConsented

From the Specterops article, one of the red flags we mentioned was Azure AD roles being assigned to service principals, we often worry about excessive privilege for users, but forget about apps & service principals. We can detect a role being added to service principals –

AuditLogs
| where OperationName == "Add member to role"
| where TargetResources[0].type == "ServicePrincipal"
| extend ServicePrincipalObjectID = tostring(TargetResources[0].id)
| extend AppDisplayName = tostring(TargetResources[0].displayName) 
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
| extend RoleAdded = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)))
| project TimeGenerated, Actor, RoleAdded, ServicePrincipalObjectID, AppDisplayName

For Azure AD applications you may also have configured a redirect URI, this is the location that Azure AD will redirect the user & token after authentication. So if you have an application that is used to sign people in you will be likely sending the user & token to an address like https://app.mycompany.com/auth. Applications in Azure AD can have multiple URI’s assigned, so if an attacker was to then add https://maliciouswebserver.com/auth as a target then the data would be posted there too. We can detect changes in redirect URI’s –


AuditLogs
| where OperationName contains "Update application"
| where Result == "success"
| extend UpdatedProperty = tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[0].displayName)
| where UpdatedProperty == "AppAddress"
| extend NewRedirectURI = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[0].newValue))[0].Address)
| where isnotempty( NewRedirectURI)
| project TimeGenerated, OperationName, UpdatedProperty, NewRedirectURI

Remember that Azure AD service principals are identities too, so that we can use tooling like Azure AD Conditional Access to control where they can logon from. Have an application registered in Azure AD that provides authentication for an API that is only used from a particular location? You can enforce that with conditional access much like you would user sign-ins.

Service principal sign-ins are held in the AADServicePrincipalSignInLogs table in Azure Sentinel, the structure is similar to regular sign ins so you can look in trends in the data much like interactive sign-ins and start to detect anything out of the ordinary.

AADServicePrincipalSignInLogs
| where ResultType == "0"
| project TimeGenerated, AppId, ResourceDisplayName
| summarize SPSignIn=count()by bin(TimeGenerated, 15m), ResourceDisplayName
| render timechart 

Service principals can generate errors on logons too, an error 7000215 in the AADServicePrincipalSignInLogs table is an invalid secret, or the service principal equivalent of a wrong password.

AADServicePrincipalSignInLogs
| where ResultType == "7000215"
| summarize count()by AppId, ResourceDisplayName

Stop Defending and Start Preventing

While the focus on this blog was detection, which is a valuable tool, prevention is even better.

Prevention can be straight forward, or extremely complex and what you can achieve in your environment is unique to you, but there are definitely some recommendations worth following –

  • Limit access to Azure management portals and interfaces to those that need it via Azure AD Conditional Access. For those applications that you can’t apply policy to, alert for suspicious connections.
  • Provide access to Azure AD roles following least privilege principals – don’t hand out Global Administrator for tasks that User Administrator could cover.
  • Use Azure AD PIM if licensed for it and alert on users being assigned to roles outside of PIM.
  • Limit access to roles that can manage Azure AD Applications – if a team wants to manage their applications, they can be made owners on their specific apps, not across them all.
  • Alert on privileged changes to Azure AD apps – new secrets, new redirect URI’s, added permissions or admin consent.
  • Treat access to the Microsoft Graph and Azure AD as you would on premise AD. If an application or team request directory.readwrite.all or to be a Global Admin then push back and ask what actions are they trying to perform – there is likely a much lower level of privilege that would work.
  • Don’t allow long lived secrets on Azure AD apps, this is the equivalent of ‘password never expires’.
  • If you use hybrid identity be aware of users, groups or services that can leverage privilege in Azure AD to make changes in on premise AD, or vice versa.
  • Look for anomalous activity in service principal sign in data.

The queries in this post aren’t exhaustive by any means, get to know the AuditLogs table, it is filled with plenty of operations you may find interesting – authentication methods being updated for users, PIM role setting changes, BitLocker keys being read. Line up the actions you see in the table to what is risky to you and what you want to stop. For those events, can we prevent them through policy? If not, how do we detect and respond quick enough.

Reset your on premise passwords with Azure Sentinel + Azure AD Connect writeback

For those who have a large on premise Active Directory environment, one of the challenges you may face is how to use Azure Sentinel to reset the passwords for on premise Active Directory accounts. There are plenty of ways to achieve this – you may have an integrated service environment that allows Logic Apps or Azure Functions to connect directly to on premise resources, like a domain controller. You can also use an Azure Automation account with a hybrid worker. There is a lesser known option though, if you have already deployed Azure AD self-service password reset (SSPR) then we can piggyback off of the password writeback that is enabled when you deployed it. When a user performs a password reset using SSPR the password is first changed in Azure AD, then written back to on premise AD to keep them in sync. If you want to use Azure Sentinel to automate password resets for compromised accounts, then we can leverage that existing connection.

To do this we are going to build a small logic app that uses two Microsoft Graph endpoints, which are

Keep in mind that both these endpoints are currently in beta, so the usual disclaimers apply.

If we have a look at the documentation, you will notice that to retrieve the the id of the password we want to reset we can use delegated or application permissions.

However to reset a password, application permissions are not supported

So we will have to sign in as an actual user with sufficient privilege (take note of the roles required), and re-use that token for our automation. Not a huge deal, we will just use a different credential flow in our Logic App. Keep in mind this flow is only designed for programmatic access and shouldn’t be used interactively, because this will be run natively in Azure and won’t be end user facing it is still suitable.

So before you build your Logic App you will need an Azure AD app registration with delegated UserAuthenticationMethod.ReadWrite.All access and then an account (most likely a service account) with either the global admin, privileged authentication admin or authentication admin role assigned. You can store the credentials for these in an Azure Key Vault and use a managed identity on your Logic App to retrieve them securely.

Create a blank Logic App and use Azure Sentinel alert as the trigger, retrieve your account entities and then add your AAD User Id to a new variable, we will need it as we go.

Next we are going to retrieve the secrets for everything we will need to authenticate and authorize ourselves against. We will need to retrieve the ClientID, TenantID and Secret from our Azure AD app registration and our service account username and password.

There is a post here on how to retrieve the appropriate token using Logic Apps for delegated access, but to repeat it here, we will post to the Microsoft Graph.

Posting to the following URI with header Content-Type application/x-www-form-urlencoded –

https://login.microsoftonline.com/TenantID/oauth2/token 

With the following body –

grant_type=password&resource=https://graph.microsoft.com&client_id=ClientID&username=serviceaccount@domain.com&password=ServiceAccountPassword&client_secret=ClientSecret

In this example we will just pass the values straight from Azure Key Vault. Be sure to click the three dots and go to settings –

Then enable secure inputs, this will stop passwords being stored in the Logic App logs.

Now we just need to parse the response from Microsoft Graph so we can re-use our token, we will also then just build a string variable to format our token ready for use.

The schema for the token is

{
    "properties": {
        "access_token": {
            "type": "string"
        },
        "expires_in": {
            "type": "string"
        },
        "expires_on": {
            "type": "string"
        },
        "ext_expires_in": {
            "type": "string"
        },
        "not_before": {
            "type": "string"
        },
        "resource": {
            "type": "string"
        },
        "token_type": {
            "type": "string"
        }
    },
    "type": "object"
}

Then just create a string variable as above, appending Bearer before the token we parsed. Note there is a single space between Bearer and the token. Now we have our token, we can retrieve the id of the password of our user and then reset the password.

We will connect to the first API to retrieve the id of the password we want to change, using our bearer token as authorization, and passing in the variable of our AAD User Id who we want to reset.

Parse the response from our GET operation using the following schema –

{
    "properties": {
        "@@odata.context": {
            "type": "string"
        },
        "value": {
            "items": {
                "properties": {
                    "createdDateTime": {},
                    "creationDateTime": {},
                    "id": {
                        "type": "string"
                    },
                    "password": {}
                },
                "required": [
                    "id",
                    "password",
                    "creationDateTime",
                    "createdDateTime"
                ],
                "type": "object"
            },
            "type": "array"
        }
    },
    "type": "object"
}

Then we need to automate what our new password will be, an easy way of doing this is to generate a guid to use as a password – it is random, complex and should pass most password policies. Also if an account is compromised, this is just an automatic password reset, the user will then need to contact your help desk, or whoever is in responsible, confirm who they are and be issued with a password to use.

Then finally we take our AAD User Id from the Sentinel alert, the password id that we retrieved, and our new password and post it back to reset the password.

We use our bearer token again for authorization, content-type is application/json and the body is our new password – make sure to enable secure inputs again. You should be able to then run this playbook against any Azure Sentinel incident where you map your AAD User Id (or UserPrincipalName) as the entity. It will take about 30-40 seconds to reset the password and you should see a successful response.

You could add some further logic to email your team, or your service desk or whoever makes sense in your case to let them know that Azure Sentinel has reset a users password.

Just a couple of quick notes, this Logic App ties back to self service password reset, so any password resets you attempt will need to conform to any configuration you have done in your environment, such as –

  • Password complexity, if you have a domain policy requiring a certain complexity of password and your Logic App doesn’t meet it, you will get a PasswordPolicyError returned in the statusDetail field, much like a user doing an interactive self service password reset would.
  • The account you use to sync users to Azure AD will need access to reset passwords on any accounts you want to via Azure Sentinel, much like SSPR itself.

Using Logic Apps to version control your Azure AD Conditional Access Policies

Azure AD Conditional Access is a fantastic tool, anyone using Azure AD as an identity provider is probably familiar with it, unfortunately like a lot of Azure AD there is no native version control though, so if you change or remove a policy there is no way to roll back. You can manage conditional access entirely through code but you may not be at that level of maturity and just want a backup available should policies be changed accidentally or maliciously.

In this example we will use Azure Sentinel to detect any new policies being created, policies being updated or policies being deleted and then store the changes in an Azure storage account. First you will need to either create a storage account or use an existing, and then create a container (folder) for your policies, I have just called mine ‘conditionalaccess’. Grab an access key for the account, we will need it in Logic Apps soon.

Your Logic App will need two sets of credentials, first an account (or Azure AD application) with enough access to read Conditional Access policies, and second the Azure Storage access key from above to write to your storage location.

Our first Logic App is just going to download all our policies from Microsoft Graph and put them into our storage account. Azure Sentinel will take over when it starts alerting on new policies, or changes and deletions, but we need to grab the current state. We could do this via PowerShell or a number of other ways, but instead we will just create a Logic App we run once – that way it will be a consistent format when we use Sentinel. Create a Logic App with recurrence as the trigger (set it to 3 months or something), hopefully this doesn’t take you that long! Our entire Logic App will look this this when done.

The first 6 steps of this are to connect to Microsoft Graph to retrieve a token for re-use. Check out this post for how to post to the Microsoft Graph to retrieve an access token. To give a little more detail to the last five steps, we first want to list the ids of our existing policies. We do that by performing a GET action on the conditionalAccess endpoint and selecting only the id.

Then the next few steps we just need to manipulate our data a little to get it into a format so we can iterate through an array and get the details of all our policies.

First we parse the JSON response from Microsoft Graph using the following schema.

{
    "properties": {
        "@@odata.context": {
            "type": "string"
        },
        "value": {
            "items": {
                "properties": {
                    "id": {
                        "type": "string"
                    }
                },
                "required": [
                    "id"
                ],
                "type": "object"
            },
            "type": "array"
        }
    },
    "type": "object"
}

Then we create an array from our list of ids using the createArray function

{
    "inputs": "@createArray(body('Parse_JSON')?['value'])"
}

Then just parse the array so we can loop through it using the following schema.

{
    "items": {
        "items": {
            "properties": {
                "id": {
                    "type": "string"
                }
            },
            "required": [
                "id"
            ],
            "type": "object"
        },
        "type": "array"
    },
    "type": "array"
}

Then finally, we are going to grab each id we pulled from Microsoft Graph, get the details of the policy (using our access token as authorization) and create a blob in our storage. When you first use the Azure Blob Storage connector you will just need to use the access key from earlier.

Then trigger your Logic App and in your storage account you should see a list of containers with your policy ids and then a json file in each with the current config.

Now, we have done all that hard work we can disable or even delete that Logic App, because we will build a new one that will keep our storage up to date when changes occur to conditional access. So create a new Logic App, this time though our trigger will be an Azure Sentinel alert. We will retrieve our account entities from the alert which we are going to use further on – we will use the entity mapping to get the operation name and the id of the policy that was changed so we know what action to take. The next few steps are exactly the same as before, post to Microsoft Graph to get a token for re-use.

We will just rename our variables to ConditionalAccessOperation and ConditionalAccessID to make it clearer, rather than using the account mapping names. Then just create a variable called CurrentDateTime using the utcNow() function, we will use this to timestamp our changes.

Then the final part we will use a control condition called ‘Switch’ where we complete different actions depending on the operation type, so for a new policy created we will do one action, for an update a different action, and for a delete a different action.

So for adding an new conditional access policy we will retrieve it from Microsoft Graph, then add a new container based on the id and a new json file with the initial policy.

For an update, it is much the same except we will add a file with the updated policy

And finally for a delete action we will just add a text file with the time stamp when it was deleted, we can’t query Microsoft Graph anymore because the policy is gone.

Your Logic App should look like this (linked for big)

Lastly, we just need to get Azure Sentinel to detect on any of these changes, so create a new analytics rule, with the following query.

AuditLogs
| where OperationName contains "conditional access policy"
| extend ConditionalAccessID = tostring(TargetResources[0].id)
| sort by TimeGenerated desc 
| project OperationName, ConditionalAccessID

Set it to run every 5 minutes, looking at the last 5 minutes of data and trigger an alert for each event (in case multiple policies are changed). For entity mappings, map them to account name and AAD User ID so that our Logic App flows properly when it is triggered.

Finally on the automated response tab under ‘Alert automation’ select the Logic App you just built. Now if you perform any of those actions you should see a lifecycle of a policy in its respective folder.

Just a couple of points-

  • Sentinel can only run every 5 minutes to search for alerts, even though it can trigger multiple times within that window. If a policy is changed, then changed again in 5 minutes when the Logic App runs it will just grab the latest update/current version.
  • If a policy is changed then deleted within 5 minutes then the changes before deletion won’t be available, because when the Logic App goes to retrieve the changes from Microsoft Graph the policy would have already been removed.

These are a couple of small trade offs to have a very inexpensive Logic App and a tiny amount of storage to backup your hard work though.

Using time to your advantage in Azure Sentinel

Adversary hunting would be a lot easier if we were always looking for a single event that we knew was malicious, but unfortunately that isn’t always the case. Often when hunting for threats, a combination of events over a certain time period may be added cause for concern, or events happening at certain times of the day are more suspicious to you. Take for example a user setting up a mail forward in Outlook, that may not be inherently suspicious on its own but if it happened not too long after an abnormal sign on, then that would certainly increase the severity. Perhaps particular administrative actions outside of normal business hours would be an indicator of compromise.

Azure Sentinel and KQL have an array of really great operators to help you manipulate and tune your queries to leverage time as an added resource when hunting. We can use logic such as hunting for activities before and after a particular event, look for actions only after an event, or even calculate the time between particular events, and use that as a signal. Some of the operators worth getting familiar with are – ago, between and timespan.

I always try to remember this graphic when writing queries, Azure Sentinel/Log Analytics is highly optimized for log data, so the quicker you can filter down to the time period you care about, the faster your results will be. Searching all your data for a particular event, then filtering down to the last day will be significantly slower than filtering your data to the last day, then finding your particular event.

We often forget in Sentinel/KQL that we can apply where causes to time in the same way we would any other data, such as usernames, event ids, error codes or any other string data. You have probably written a thousand queries that start with something similar to this –

Sometable
| where TimeGenerated > ago (2h)

But you can filter your time data even before writing the rest of your queries, maybe you want to look at 7 days of data, but only between midnight and 4am. So first take 7 days of data, then slice out the 4 hours you care about.

Sometable
| where TimeGenerated > ago (7d)
| where hourofday( TimeGenerated ) between (0 .. 3)

If you want to exclude instead of include particular hours then you can use !between.

Perhaps you are interested in admin staff who have activated Azure AD PIM roles after hours, using KQL we can leverage the hourofday function to query only between particular hours. Remember that by default Sentinel will query on UTC time, so extend a column first to create a time zone that makes sense to you. The below query will find any PIM activations that aren’t between 5am and 8pm in UTC+5.

AuditLogs
| extend LocalTime=TimeGenerated+5h
| where hourofday( LocalTime) !between (5 .. 19)
| where OperationName == "Add member to role completed (PIM activation)"

If we take our example from the start of the post, we can detect when a user is flagged for a suspicious logon, in this case via Azure AD Identity Protection, and then within two hours created a mail forward in Office 365. This behaviour is often seen by attackers hoping to exfiltrate data or maintain a foothold in your environment.

SecurityAlert
| where ProviderName == "IPC"
| project AlertTime=TimeGenerated, CompromisedEntity
| join kind=inner 
(
OfficeActivity
| extend ForwardTime=TimeGenerated
) on $left.CompromisedEntity == $right.UserId
| where Operation == "Set-Mailbox"
| where Parameters contains "DeliverToMailboxAndForward"
| extend TimeDelta = abs(ForwardTime - TimeGenerated)
| where TimeDelta < 2h
| project AlertTime, ForwardTime, CompromisedEntity 

For this query we take the time the alert was generated, rename it to AlertTime, and the userprincipalname of the compromised entity, join it to our OfficeActivity table looking for mail forward creation events. Then finally we use the abs operator to calculate the time between the forward creation and the identity protection alert and only flag when it is less than 2 hours. There are many ways to create forwards in Outlook (such as via mailbox rules), this is just showing one particular method, but the example is more to drive the use of time as a detection method than being all encompassing.

We can also use a particular event as a starting point, then retrieve data from either side of that event. Say a user triggers an ‘unfamiliar sign-in properties’ event. We can use the time of that alert as an anchor point, and retrieve the 60 minutes of sign in data either side of the alert to give us some really great context. We do this by using a combination of the between and timespan operators

SecurityAlert
| where AlertName == "Unfamiliar sign-in properties"
| project AlertTime=TimeGenerated, UserPrincipalName=CompromisedEntity
| join kind=inner 
(
SigninLogs
) on UserPrincipalName
| where TimeGenerated between ((AlertTime-timespan(60m)).. (AlertTime+timespan(60m)))
| project UserPrincipalName, SigninTime=TimeGenerated, AlertTime, AppDisplayName, ResultType, UserAgent, IPAddress, Location

We can see both the events prior to and those after the alert time.

You can use these time operators with much more detailed hunting too, if you use the anomaly detection operators, you can tune your detections to only parts of the day. Taking the example from that post, maybe we are interested in particular failed sign in activities, but only in non regular working hours.

let starttime = 7d;
let timeframe = 1h;
let resultcodes = dynamic(["50126","53003","50105"]);
let outlierusers=
SigninLogs
| where TimeGenerated > ago(starttime)
| where hourofday( TimeGenerated) !between (6 .. 18)
| where ResultType in (resultcodes)
| project TimeGenerated, UserPrincipalName, ResultType, AppDisplayName, Location
| order by TimeGenerated
| summarize Events=count()by UserPrincipalName, bin(TimeGenerated, timeframe)
| summarize EventCount=make_list(Events),TimeGenerated=make_list(TimeGenerated) by UserPrincipalName
| extend outliers=series_decompose_anomalies(EventCount)
| mv-expand TimeGenerated, EventCount, outliers
| where outliers == 1
| distinct UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(starttime)
| where UserPrincipalName in (outlierusers)
| where ResultType != 0
| summarize LogonCount=count() by UserPrincipalName, bin(TimeGenerated, timeframe)
| render timechart 

I have some more examples of similar alerts in my GitHub repo, such as Azure Key Vault access manipulation.

Protecting Azure Key Vault with Azure Sentinel

Azure Key Vault is Microsoft’s cloud vault which you can use to store secrets and passwords, API keys or certificates. If you do any kind of automation with Azure Functions, or Logic Apps or any scripting more broadly in Azure then there is a good chance you use a Key Vault, its authentication and role based access is tied directly into Azure Active Directory. When we talk about Azure Key Vault security, we can group it into three categories –

  • Network Security – this is pretty straight forward, which networks can access your Key Vault.
  • Management Plane Security – the management plane is where you manage the Key Vault itself, so changing settings, or generating secrets, or updating access policies. Management plane security is controlled by Azure RBAC.
  • Data Plane Security – data plane security is the security of the data within the Key Vault, so accessing, editing or deleting secrets, keys and certificates.

Firstly, make sure you are sending diagnostic logs to Azure Sentinel which you can do on the ‘Diagnostics setting’ tab on a Key Vault, or more uniformly across all your Key Vaults through Azure Policy or Azure Security Center. Events get sent to the AzureDiagnostics table in Azure Sentinel. This table can be tricky to make your way around – because so many various Azure services send logs to it, each with varying data structures, you will notice a lot of columns will only exist for specific actions.

Let’s first look at network security, Key Vault networking isn’t too difficult to get a handle on thankfully. A Key Vault can be accessed either from anywhere on the internet or from a list of specifically allowed IP addresses and/or private endpoints. If your security stance is that Key Vaults are only to be accessed over an allowed list of IP addresses or private endpoints then you can detect when the policy is changed to allow all by default.

// Detects when an Azure Key Vault firewall is set to allow all by default
AzureDiagnostics
| where ResourceType == "VAULTS"
| where OperationName == "VaultPatch"
| where ResultType == "Success"
| project-rename ExistingACL=properties_networkAcls_defaultAction_s, VaultName=Resource
| where isnotempty(ExistingACL)
| where ExistingACL == "Deny"
| sort by TimeGenerated desc  
| project
    TimeGenerated,
    SubscriptionId,
    VaultName,
    ExistingACL
| join kind=inner
(
AzureDiagnostics
| project-rename NewACL=properties_networkAcls_defaultAction_s, VaultName=Resource
| where ResourceType == "VAULTS"
| where OperationName == "VaultPatch"
| where ResultType == "Success"
| summarize arg_max(TimeGenerated, *) by VaultName, NewACL
) 
on VaultName
| where ExistingACL != NewACL and NewACL == "Allow"
| project DetectionTime=TimeGenerated1, VaultName, ExistingACL, NewACL, SubscriptionId, IPAddressofActor=CallerIPAddress, Actor=identity_claim_http_schemas_xmlsoap_org_ws_2005_05_identity_claims_upn_s

We can see the ACL on the Key Vault firewall has flipped from Deny to Allow. Just a note about the AzureDiagnostics table, if the current ACL is set to ‘Deny’ and you complete other actions (maybe adding a secret, or changing some other settings) on the Key Vault, then that field will keep showing as ‘Deny’ on every action and every log, it doesn’t appear only when making changes to the firewall. So when we join the table in our query we look for when the ACL column has changed and the most recent record (using arg_max) is ‘Allow’.

If you have an approved group of IP ranges you allow, such as your corporate locations, you can also detect for ranges added over and above that in a similar way. This could be an adversary trying to maintain access to a Key Vault they have accessed, or a staff member circumventing policy.

// Detects when an IP address has been added to an Azure Key Vault firewall allow list
AzureDiagnostics
| where ResourceType == "VAULTS"
| where OperationName == "VaultPatch"
| where ResultType == "Success"
| where isnotempty(addedIpRule_Value_s)
| project
    TimeGenerated,
    VaultName=Resource,
    SubscriptionId,
    IPAddressofActor=CallerIPAddress,
    Actor=identity_claim_http_schemas_xmlsoap_org_ws_2005_05_identity_claims_upn_s,
    IPRangeAdded=addedIpRule_Value_s

With this detection, we aren’t changing a global firewall rule i.e. from Deny to Allow, but instead adding new ranges to an existing allow list, so we can return the new IP range that was added in our query.

For management plane security; general access to your Key Vault is going to controlled more broadly by Azure RBAC, so anyone with sufficient privilege in management groups, subscriptions, resource groups or on the Key Vault itself will be able to read or change settings – how that is controlled will be completely unique to your environment. A valuable detection in Sentinel is finding any changes to Azure Key Vault access policies however. An access policy defines what operations service principals (users, app registrations or groups) can perform on secrets, keys or certificates stored in your Key Vault. For instance you may have one set of users who can read and list secrets, but not update them, while others have additional access. The following query finds additions to those access policies.

// Detects when a service principal (user, group or app) has been granted access to Key Vault data
AzureDiagnostics
| where ResourceType == "VAULTS"
| where OperationName == "VaultPatch"
| where ResultType == "Success"
| project-rename ServicePrincipalAdded=addedAccessPolicy_ObjectId_g, Actor=identity_claim_http_schemas_xmlsoap_org_ws_2005_05_identity_claims_name_s, AddedKeyPolicy = addedAccessPolicy_Permissions_keys_s, AddedSecretPolicy = addedAccessPolicy_Permissions_secrets_s,AddedCertPolicy = addedAccessPolicy_Permissions_certificates_s
| where isnotempty(AddedKeyPolicy)
    or isnotempty(AddedSecretPolicy)
    or isnotempty(AddedCertPolicy)
| project
    TimeGenerated,
    KeyVaultName=Resource,
    ServicePrincipalAdded,
    Actor,
    IPAddressofActor=CallerIPAddress,
    AddedSecretPolicy,
    AddedKeyPolicy,
    AddedCertPolicy

We can also use some more advanced hunting techniques and detect when access was added then removed within a brief period, this may be a sign of an adversary accessing a Key Vault, retrieving the information and then covering their tracks. This is example shows when access was added then removed from a Key Vault within 10 minutes and returns the access changes.

 AzureDiagnostics
| where ResourceType == "VAULTS"
| where OperationName == "VaultPatch"
| where ResultType == "Success"
| extend UserObjectAdded = addedAccessPolicy_ObjectId_g
| extend AddedActor = identity_claim_http_schemas_xmlsoap_org_ws_2005_05_identity_claims_upn_s
| extend KeyAccessAdded = tostring(addedAccessPolicy_Permissions_keys_s)
| extend SecretAccessAdded = tostring(addedAccessPolicy_Permissions_secrets_s)
| extend CertAccessAdded = tostring(addedAccessPolicy_Permissions_certificates_s)
| where isnotempty(UserObjectAdded)
| project
    AccessAddedTime=TimeGenerated,
    ResourceType,
    OperationName,
    ResultType,
    KeyVaultName=Resource,
    AddedActor,
    UserObjectAdded,
    KeyAccessAdded,
    SecretAccessAdded,
    CertAccessAdded
| join kind=inner 
    ( 
    AzureDiagnostics
    | where ResourceType == "VAULTS"
    | where OperationName == "VaultPatch"
    | where ResultType == "Success"
    | extend RemovedActor = identity_claim_http_schemas_xmlsoap_org_ws_2005_05_identity_claims_upn_s
    | extend UserObjectRemoved = removedAccessPolicy_ObjectId_g
    | extend KeyAccessRemoved = tostring(removedAccessPolicy_Permissions_keys_s)
    | extend SecretAccessRemoved = tostring(removedAccessPolicy_Permissions_secrets_s)
    | extend CertAccessRemoved = tostring(removedAccessPolicy_Permissions_certificates_s)
    | where isnotempty(UserObjectRemoved)
    | project
        AccessRemovedTime=TimeGenerated,
        ResourceType,
        OperationName,
        ResultType,
        KeyVaultName=Resource,
        RemovedActor,
        UserObjectRemoved,
        KeyAccessRemoved,
        SecretAccessRemoved,
        CertAccessRemoved
    )
    on KeyVaultName
| extend TimeDelta = abs(AccessAddedTime - AccessRemovedTime)
| where TimeDelta < 10m
| project
    KeyVaultName,
    AccessAddedTime,
    AddedActor,
    UserObjectAdded,
    KeyAccessAdded,
    SecretAccessAdded,
    CertAccessAdded,
    AccessRemovedTime,
    RemovedActor,
    UserObjectRemoved,
    KeyAccessRemoved,
    SecretAccessRemoved,
    CertAccessRemoved,
    TimeDelta

So we have covered network access and management plane access and now we can have a look at possible threats in data plane actions. Each time an action occurs against a key, secret or certificate it is logged to the same AzureDiagnostics table. The most common action will be a retrieval of a current item, but deletions or purges or updates are all logged as well. Over time we can build up a baseline of what is normal access for a Key Vault looks like and then alert for actions outside of that. The below query looks back over 30 days, then compares that to the last day and detects for any new users accessing a Key Vault. Then it also retrieves all the actions taken by that user in the last day.

//Searches for access by users who have not previously accessed an Azure Key Vault in the last 30 days and returns all actions by those users
let operationlist = dynamic(["SecretGet", "KeyGet", "VaultGet"]);
let starttime = 30d;
let endtime = 1d;
let detection=
    AzureDiagnostics
    | where TimeGenerated between (ago(starttime) .. ago(endtime))
    | where ResourceType == "VAULTS"
    | where ResultType == "Success"
    | where OperationName in (operationlist)
    | where isnotempty(identity_claim_http_schemas_xmlsoap_org_ws_2005_05_identity_claims_upn_s)
    | project-rename KeyVaultName=Resource, UserPrincipalName=identity_claim_appid_g
    | distinct KeyVaultName, UserPrincipalName
    | join kind=rightanti  (
        AzureDiagnostics
        | where TimeGenerated > ago(endtime)
        | where ResourceType == "VAULTS"
        | where ResultType == "Success"
        | where OperationName in (operationlist)
        | where isnotempty(identity_claim_http_schemas_xmlsoap_org_ws_2005_05_identity_claims_upn_s)
        | project-rename
            KeyVaultName=Resource,
            UserPrincipalName=identity_claim_http_schemas_xmlsoap_org_ws_2005_05_identity_claims_upn_s
        | distinct KeyVaultName, UserPrincipalName)
        on KeyVaultName, UserPrincipalName;
AzureDiagnostics
| where TimeGenerated > ago(endtime)
| where ResourceType == "VAULTS"
| where ResultType == "Success"
| project-rename
    KeyVaultName=Resource,
    UserPrincipalName=identity_claim_http_schemas_xmlsoap_org_ws_2005_05_identity_claims_upn_s
| join kind=inner detection on KeyVaultName, UserPrincipalName
| project
    TimeGenerated,
    UserPrincipalName,
    ResourceGroup,
    SubscriptionId,
    KeyVaultName,
    KeyVaultTarget=id_s,
    OperationName

We can also do the same for applications instead of users.

//Searches for access by applications that have not previously accessed an Azure Key Vault in the last 30 days and returns all actions by those applications
let operationlist = dynamic(["SecretGet", "KeyGet", "VaultGet"]);
let starttime = 30d;
let endtime = 1d;
let detection=
    AzureDiagnostics
    | where TimeGenerated between (ago(starttime) .. ago(endtime))
    | where ResourceType == "VAULTS"
    | where ResultType == "Success"
    | where OperationName in (operationlist)
    | where isnotempty(identity_claim_appid_g)
    | project-rename KeyVaultName=Resource, AppId=identity_claim_appid_g
    | distinct KeyVaultName, AppId
    | join kind=rightanti  (
        AzureDiagnostics
        | where TimeGenerated > ago(endtime)
        | where ResourceType == "VAULTS"
        | where ResultType == "Success"
        | where OperationName in (operationlist)
        | where isnotempty(identity_claim_appid_g)
        | project-rename
            KeyVaultName=Resource,
            AppId=identity_claim_appid_g
        | distinct KeyVaultName, AppId)
        on KeyVaultName, AppId;
AzureDiagnostics
| where TimeGenerated > ago(endtime)
| where ResourceType == "VAULTS"
| where ResultType == "Success"
| project-rename
    KeyVaultName=Resource,
    AppId=identity_claim_appid_g
| join kind=inner detection on KeyVaultName, AppId
| project
    TimeGenerated,
    AppId,
    ResourceGroup,
    SubscriptionId,
    KeyVaultName,
    KeyVaultTarget=id_s,
    OperationName

Then finally we can also detect on operations that may be considered malicious or destructive, such as deletions, backups or purges. I have added some example operations, but there is a great list here that may have actions that are more specific to your Key Vaults.

// Detects Key Vault operations that could be malicious
let operationlist = dynamic(
    ["VaultDelete", "KeyDelete", "SecretDelete", "SecretPurge", "KeyPurge", "SecretBackup", "KeyBackup", "SecretListDeleted", "CertificateDelete", "CertificatePurge"]);
AzureDiagnostics
| where ResourceType == "VAULTS" and ResultType == "Success" 
| where OperationName in (operationlist)
| project TimeGenerated,
    ResourceGroup,
    SubscriptionId,
    KeyVaultName=Resource,
    KeyVaultTarget=id_s,
    Actor=identity_claim_upn_s,
    IPAddressofActor=CallerIPAddress,
    OperationName

There are some more queries located on the Sentinel GitHub page and the queries from this post can be found here.

Azure Sentinel and Azure AD Conditional Access = Cloud Fail2Ban

Fail2ban is a really simple but effective tool that has been around forever, it basically listens for incoming connections and then updates a firewall based on that, i.e. too many failed attempts then the IP is added to a ban list, rejecting new connections from it. If you are an Azure AD customer then Microsoft take care of some of this for you, they will ban the egregious attempts they are seeing globally. But we can use Azure Sentinel, Logic Apps and Azure AD Conditional Access to build our own cloud fail2ban which can achieve the same, but for threats unique to your tenant.

On the Azure Sentinel GitHub there is a really great query written for us here that we will leverage as the basis for our automation. The description says it all but essentially it will hunt the last 3 days of Azure AD sign in logs, look for more than 5 failures in a 20 minute period. It also excludes IP addresses in the same 20 minutes that have had more successful sign ins than failures just to account for your trusted locations – lots of users means lots of failed sign ins. You may want to adjust the timeframes to suit what works for you, but the premise remains the same.

There are a few moving parts to this automation but basically we want an Azure Sentinel Incident rule to run every so often, detect password sprays, any hit then invokes a Logic App that will update an Azure AD named location with the malicious IP addresses. That named location will be linked back to an Azure AD Conditional Access policy that denies logons.

Let’s build our Azure AD named location and CA policy first. In Azure AD Conditional Access > Named locations select ‘+ IP ranges location’, name it whatever you would like but something descriptive is best. You can’t have an empty named location so just put a placeholder IP address in there.

Next we create our Azure AD Conditional Access policy. Name is again up to you. Include all users and then it is always best practice to exclude some breakglass accounts, you don’t want to accidentally lock yourself out of your tenant or apps. We are going to also select All Cloud Apps because we want to block all access from these malicious IP addresses.

For conditions we want to configure Locations, including our named location we just created, and excluding our trusted locations (again, we don’t want to lock ourselves or our users out from known good locations). Finally our access control is ‘block access’. You can start the policy in report mode if you want to ensure your alerting and data is accurate.

Let’s grab the id of the named location we just created, since we will need it later on. Use the Graph Explorer to check for all named locations and grab the id of the one you just created.

Now we are ready to update the named location with our malicious IP. The guidance for the update action on the namedLocation endpoint is here. Which has an example of the payload we need to use and we will use Logic Apps to build it for us –

{
    "@odata.type": "#microsoft.graph.ipNamedLocation",
    "displayName": "Untrusted named location with only IPv4 address",
    "isTrusted": false,
    "ipRanges": [
        {
            "@odata.type": "#microsoft.graph.iPv4CidrRange",
            "cidrAddress": "6.5.4.3/18"
        }

    ]
}

Now we configure our Logic App, create a blank Logic App and for trigger choose either incident or alert creation, depending on whether you use the Azure Sentinel incident pane or not. After getting your IPs from the entities, create a couple of variables we will use later, and take the IP entity from the incident and append it to your NewMaliciousIP variable. We know that is the new bad IP we will want to block later.

There is no native Logic App connector for Azure AD Conditional Access, so we will just leverage Microsoft Graph to do what we need. This is a pattern that I have covered a few times, but it is one that I re-use often. We assign our Logic App a system assigned identity, use that identity to access an Azure Key Vault to retrieve a clientid, tenantid and secret for an Azure AD app registration. We then post to MS Graph and grab an access token and re-use that token as authorization to make the changes we want, in this case update a named location.

The URI value is your tenantid, then for body the client_id is your clientid and client_secret your secret. Make sure your app has enough privilege to update named locations, which is Policy.Read.All and Policy.ReadWrite.ConditionalAccess or an equivalent Azure AD role. Parse your token with the following schema and now you have a token ready to use.

{
    "properties": {
        "access_token": {
            "type": "string"
        },
        "expires_in": {
            "type": "string"
        },
        "expires_on": {
            "type": "string"
        },
        "ext_expires_in": {
            "type": "string"
        },
        "not_before": {
            "type": "string"
        },
        "resource": {
            "type": "string"
        },
        "token_type": {
            "type": "string"
        }
    },
    "type": "object"
}

To make this work, we need our Logic App to get the current list of bad IP addresses from our named location, add our new IP in and then patch it back to Microsoft Graph. If you just do a patch action with only the latest IP then all the existing ones will be removed. Over time this list could get quite large so we don’t want to lose our hard work.

We parse the JSON reponse using the following schema.

{
    "type": "object",
    "properties": {
        "@@odata.context": {
            "type": "string"
        },
        "@@odata.type": {
            "type": "string"
        },
        "id": {
            "type": "string"
        },
        "displayName": {
            "type": "string"
        },
        "modifiedDateTime": {
            "type": "string"
        },
        "createdDateTime": {
            "type": "string"
        },
        "isTrusted": {
            "type": "boolean"
        },
        "ipRanges": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "@@odata.type": {
                        "type": "string"
                    },
                    "cidrAddress": {
                        "type": "string"
                    }
                },
                "required": [
                    "@@odata.type",
                    "cidrAddress"
                ]
            }
        }
    }
}

Next we are going to grab each existing IP and append it to a string (we may have multiple IP addresses already in there so use a for-each loop to get them all), then build a final string adding our new malicious IP. You can see the format required in the Microsoft Graph documentation.

We parse string to JSON one last time because when we patch back to the Microsoft Graph it is expecting a JSON payload using the following schema.

{
    "items": {
        "properties": {
            "@@odata.type": {
                "type": "string"
            },
            "cidrAddress": {
                "type": "string"
            }
        },
        "required": [
            "@@odata.type",
            "cidrAddress"
        ],
        "type": "object"
    },
    "type": "array"
}

Then we are going to use a HTTP patch action to update our list, adding our access token to authorize ourselves and completing the format expected. In Logic Apps you need to escape @ symbols with another @. You will need to add in the id of your namedLocation which will be unique to you. The body is now in the exact format Graph expects.

The last part is to create our analytics rule and map our entities. For this example we will use the password spray query mentioned above, but really you could do any query that you generate a malicious IP from – Azure Security Centre alerts, IP addresses infected with malware etc. Just map your IP Address entity over so that our Logic App can collect it when it fires. Make sure you trigger an alert for each event as your analytics rule may return multiple hits and you want to block them all. Also be sure to run your analytics rule on a schedule that makes sense with the query you are running. If you are looking back on a days worth of data and generating alerts based off that, then you probably only want to run your analytics rule daily too. If you query 24 hours of data and run it every 20 minutes you will fire multiple alerts on the same bad IP addresses.

Then under your automated response options run the Logic App we just created. In your Logic App you could add another step at the end to let you know that Azure Sentinel has already banned the IP address for you. Now next time your Azure Sentinel analytics rule generates a hit on your query, the IP address will be blocked automatically.

Azure Sentinel and the story of a very persistent attacker

Like many of you, over the last 18 months we have seen a huge shift in how our staff are working, people are at home, people working remotely permanently, or being unable to get into their regular office. That has meant a shift in your detections, previously you had people lighting up internal firewalls, or you saw events on your internal Active Directory, now you are also interested in cloud service access, suspicious MFA events or VPN activity.

With this change we unsurprisingly noticed a dramatic uptick in identity related alerts – users connecting from new counties, or via anonymous IP addresses, unfamiliar properties and impossible travel events. When we get these kind of events (even for failed attempts), we proactively log our users our of Azure AD to make them re-authenticate + MFA, it isn’t perfect but it’s an easy automation that doesn’t annoy anyone too much and buys some time for a cyber security team member to check out the detail. For each of these events we also populate the Azure Sentinel incident with the last 10 sign-ins for the user affected (excluding any from a trusted location) for someone to investigate.

SigninLogs
| where UserPrincipalName == "attackeduser@yourdomain.com"
| where IPAddress !startswith "10.10.10"
| project TimeGenerated, AppDisplayName, ResultType, ResultDescription, IPAddress, Location, UserAgent
| order by TimeGenerated desc 

Around January of this year we noticed a number of users showing really strange behaviour, they would have one or two wrong password attempts (ResultType 50126) flagged on their account from a location the user had never logged in from, and no other risk detections, just one or two attempts and that was it. After we noticed a half dozen the same, we decided to look a bit closer and noticed the same user agent being used for all the attempts, so we dug into the data. We looked for all sign in data from that agent, and bought back the user, the result, what application was being accessed, the IP and the location.

SigninLogs
| where UserAgent contains "Mozilla/5.0 (iPhone; CPU iPhone OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"
| project UserPrincipalName, ResultType, AppDisplayName, IPAddress, Location

The data looked a bit like this – lots of attempts on different users, very rarely the same IP address or location twice in a row, locations not expected for our business, maybe two attempts at a user at most and then move on, and thankfully none successful, only wrong passwords (50126) and account locks (50053). We also noticed a second UserAgent with the same behaviour so we added that to the query and found more hits in much the same pattern.

SigninLogs
| where UserAgent contains "Mozilla/5.0 (iPhone; CPU iPhone OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148" or UserAgent contains "Outlook-iOS/723.4027091.prod.iphone (4.28.0)"
| project TimeGenerated, UserPrincipalName, ResultType, AppDisplayName, IPAddress, Location

Over a 6 month period, it was pretty low noise, peaking at 34 attempts in a single day, but usually less than 10.

We also double checked and there were no legitimate sign in activities from these UserAgents, only suspect ones. Some users were being targeted by one UserAgent, some the other and some by both, to detect those being targeted by both, you can use a simple join in KQL.

let agent1=
SigninLogs
| where UserAgent contains "Mozilla/5.0 (iPhone; CPU iPhone OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"
| distinct UserPrincipalName;
let agent2=
SigninLogs
| where UserAgent contains "Outlook-iOS/723.4027091.prod.iphone (4.28.0)"
| distinct UserPrincipalName;
agent1
| join kind=inner agent2 on UserPrincipalName
| distinct UserPrincipalName

From January until now we have had around 1500 attempts from these UserAgents, targeting around 300 staff, with about 50 being targeted by both suspicious UserAgents.

Now that data is interesting for us cyber security people, but at the end of the day any Azure AD tenant is going to get some people knocking on the door and there isn’t much you can do about it, these IP addresses change so frequently that blocking them isn’t especially practical. Of the 1500 attempts we have seen about 660 different IP addresses. What we did do is configure an Azure Sentinel analytics rule to tell us if we got a successful sign in from one of these agents. The rule is straight forward, look for the UserAgent and any successful attempts.

let successCodes = dynamic([0, 50055, 50057, 50155, 50105, 50133, 50005, 50076, 50079, 50173, 50158, 50072, 50074, 53003, 53000, 53001, 50129]);
SigninLogs
| where UserAgent contains "Outlook-iOS/723.4027091.prod.iphone (4.28.0)" or UserAgent contains "Mozilla/5.0 (iPhone; CPU iPhone OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"
| where ResultType in (successCodes)
| project UserPrincipalName

Importantly when we talk about success in Azure AD, we aren’t just interested in ResultType = 0. When we think about the flow of an Azure AD sign in, we can successfully sign in and then be blocked or stopped elsewhere. For instance 53003 means the sign on was stopped by conditional access. However, conditional access policies are applied after credentials have been validated, so in the case of an attacker, if they are blocked by conditional access it still means they have your users correct credentials. 50158 is another good example, which means an external security challenge failed (such as Ping Identity, Duo or Okta MFA), but the same logic applies – username and password are validated, then the user is directed to the third party security challenge. So for an attacker to get to that point, again they have the correct username and password. The KQL above has a list of everything that could be deemed a ‘success’.

We left this query running for around 8 months with no action, occasionally checking ourselves if the users were still be targeted, and they were. Finally last week we got a hit, a user had been phished (no one is perfect!) and the attackers signed into the account, thankfully they were stopped by a conditional access policy blocking sign ins from the particular country they tried on that attempt. We contacted the user, reset their credentials, sent them some phishing training and away they went.

In the scheme of Azure AD globally, 1500 attempts to a single tenant over the course of 8 months is not even a rounding error, Microsoft is evaluating millions of sign ins an hour and this traffic isn’t likely to flag anything special at their end. It was suspicious to our business though, and that is where your knowledge of your environment combined with the tools on offer is where you add real value.

If you are interested more generally in how often you are seeing new UserAgents for your users you can use the below query, we create a set of known UserAgent for each user over a learning period (14 days), then join against the last day (and exclude known corporate IP’s)

let successCodes = dynamic([0, 50055, 50057, 50155, 50105, 50133, 50005, 50076, 50079, 50173, 50158, 50072, 50074, 53003, 53000, 53001, 50129]);
let isGUID = "[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12}";
let lookbacktime = 14d;
let detectiontime = 1d;
let UserAgentHistory =
SigninLogs
    | project TimeGenerated, UserPrincipalName, UserAgent, ResultType, IPAddress
    | where TimeGenerated between(ago(lookbacktime)..ago(detectiontime))
    | where ResultType in (successCodes)
    | where not (UserPrincipalName matches regex isGUID)
    | where isnotempty(UserAgent)
    | summarize UserAgentHistory = count() by UserAgent, UserPrincipalName;
SigninLogs
    | where TimeGenerated > ago(detectiontime)
    | where ResultType in (successCodes)
    | where IPAddress !startswith "10.10.10"
    | where not (UserPrincipalName matches regex isGUID)
    | where isnotempty(UserAgent)
    | join kind=leftanti UserAgentHistory on UserAgent, UserPrincipalName
    | distinct UserPrincipalName, AppDisplayName, ResultType, UserAgent

UserAgents can update quite often, mobile devices getting small updates, browsers being patched, but like everything, getting to know what is normal in your environment and detecting outside of that is half the battle won.