YARBACS or Yet Another RBAC Solution

RBAC is all the rage these days (as it should be), but it’s not really new.  The concepts have been in use for many years; it’s just a matter of bringing them into modern cloud-based scenarios.  Azure continues to invest in this area so that we can increasingly lock down the control and use of the various cloud-based services it offers.  There are other “interesting” aspects to these scenarios, though, that can impact how available and useful RBAC is – things such as existing systems and identities, cross-tenant applications, etc.

It’s these interesting aspects that bring me to this post today.  At Office365Mon we are ALL about integration with Azure Active Directory and all of the goodness it provides.  From day 1 we have also allowed users from different tenants to work on a set of resources using a common security model.  In practice that means when you create Office365Mon subscriptions, you can assign anyone with an Azure AD account as a subscription administrator.  It doesn’t require Azure B2C, or Azure B2B, or any kind of trust relationship between organizations.  It all “just works”, which is exactly what you want.  IMPORTANT DISCLAIMER:  the RBAC space changes frequently.  You should check in often to see what’s provided by the service and decide for yourself what works best for your scenario.

“Just works” in this case started out as a purely binary operation – depending on who you were, you either had access or you didn’t.  There was no way to grant limited access based on one or more roles.  As we got into more complex application scenarios it became increasingly important to develop some type of RBAC support, but there really wasn’t an out of the box solution for us.  That led us to develop the fairly simple framework I call YARBACS, or “Yet Another RBAC Solution”.  It’s not a particularly complicated approach (I don’t think), so I thought I’d share the high-level details here in case it helps those of you living with the same sort of constraints we have:  a large existing user base, everyone effectively a cloud user to us, no trust relationships with other Azure AD tenants, and yet a single resource (an Office365Mon subscription) that needs to be managed by users from any number of tenants with varying sets of permissions.

With that in mind, these are the basic building blocks that we use for our YARBACS:

  1. Enable support for Roles in our Azure AD secured ASP.NET application
  2. Create application roles in our AD tenant that will be used for access control
  3. Store user and role relationships so that they can be used in our application
  4. Add role attributes to views, etc.
  5. Use a custom MVC action filter to populate users from different tenants with application roles from our Azure AD tenant

Let’s walk through each of these in a little more detail.


Enable Support for Roles

Enabling support for roles in your Azure AD secured application is something that has been explained very nicely by Dushyant Gill, who works as a PM on the Azure team.  You can find his blog explaining this process in more detail here:  http://www.dushyantgill.com/blog/2014/12/10/roles-based-access-control-in-cloud-applications-using-azure-ad/.   Rather than trying to plagiarize or restate his post here, just go read his post.  :-)   The net effect is that you will end up with code in your Startup.Auth.cs class in your ASP.NET project that looks something like this:


new OpenIdConnectAuthenticationOptions
{
    ClientId = clientId,
    Authority = Authority,
    TokenValidationParameters = new System.IdentityModel.Tokens.TokenValidationParameters
    {
        //for role support
        RoleClaimType = "roles"
    },
    … //other unrelated stuff here
}

Okay, step 1 accomplished – your application supports the standard use of ASP.NET roles now – permission demands, IsInRole, etc.


Create Application Roles in Azure AD

This next step is also covered in Dushyant’s blog post mentioned above, so I’ll just quickly recap here; see his post for complete steps and examples.  Briefly, you’ll go to the application for which you want to use YARBACS and create application roles.  Download the app manifest locally and add roles for the application – as many as you want.  Upload the manifest back to Azure AD, and now you’re ready to start using them.  For your internal users, you can navigate to the app in the Azure management portal and click on the Users tab.  You’ll see all of the users that have used your app there.  You can select whichever user(s) you want to assign to an app role and then click on the Assign button at the bottom.  The trick of course is adding users to these groups (i.e. roles) when the user is not part of your Azure AD tenant.
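
For reference, an application role in the manifest is just an entry in the appRoles collection.  Here’s a rough example of what one looks like – the id is a placeholder GUID, and the role name matches the MyAppAdmin example used later in this post:

"appRoles": [
  {
    "allowedMemberTypes": [ "User" ],
    "description": "Administrators of the application",
    "displayName": "MyAppAdmin",
    "id": "11111111-2222-3333-4444-555555555555",
    "isEnabled": true,
    "value": "MyAppAdmin"
  }
],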


Store User and Role Relationships for the Application

This is another step that requires no rocket science.  I’m not going to cover any implementation details here because they’re likely to be different for every company.  The goal here is simple though – you need something – like a SQL database – to store the user identifier and the role(s) that user is associated with.

In terms of identifying your user in code so you can look up the roles associated with them, I use the UPN.  When you’ve authenticated to Azure AD you’ll find the UPN in the ClaimsPrincipal.Current.Identity.Name property.  To be fair, it’s possible for a user’s UPN to change, so if that scenario concerns you then you should use something else.  As an alternative, I typically have code in my Startup.Auth.cs class that creates an AuthenticationResult as part of registering the application in the user’s Azure AD tenant after consent has been given.  You can always go back and do something with the AuthenticationResult’s UserInfo.UniqueId property, which is basically just a GUID that identifies the user in Azure AD and never changes, even if the UPN does.
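
Just to make the lookup concrete, here’s a rough sketch of pulling roles by UPN.  The table and column names are hypothetical – use whatever schema works for you:

//hypothetical sketch: look up the roles stored for the signed-in user by UPN
//requires: using System.Collections.Generic; using System.Data.SqlClient;
//assumes a UserRoles table with Upn and RoleName columns - adjust to your own schema
public List<string> GetRolesForCurrentUser(string connectionString)
{
    string upn = System.Security.Claims.ClaimsPrincipal.Current.Identity.Name;
    List<string> roles = new List<string>();

    using (SqlConnection cn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand(
        "SELECT RoleName FROM UserRoles WHERE Upn = @Upn", cn))
    {
        cmd.Parameters.AddWithValue("@Upn", upn);
        cn.Open();
        using (SqlDataReader rdr = cmd.ExecuteReader())
        {
            while (rdr.Read())
            {
                roles.Add(rdr.GetString(0));
            }
        }
    }

    return roles;
}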

Now that you have the information stored, when we build our MVC action filter we’ll pull this data out and map it to application roles.


Add Role Attributes to Views, Etc.

This is where the power of the ASP.NET permissions model and roles really comes into play.  In step 1 after you follow the steps in Dushyant’s blog, you are basically telling ASP.NET that anything in the claim type “roles” should be considered a role claim.  As such, you can start using it the way you would any other kind of ASP.NET role checking.  Here are a couple of examples:

  • As a PrincipalPermission demand – you can add a permission demand to a view in your controller (like an ActionResult) with a simple attribute, like this:  [PrincipalPermission(SecurityAction.Demand, Role = "MyAppAdmin")].  So if someone tries to access a view and they have not been added to the MyAppAdmin role (i.e. if they don’t have a “roles” claim with that value), then they will be denied access to that view.
  • Using IsInRole – you can use the standard ASP.NET method to determine if a user is in a particular role and then, based on that, make options available, hide UI, redirect the user, etc.  Here’s an example of that:  if (System.Security.Claims.ClaimsPrincipal.Current.IsInRole("MyAppAdmin"))…  In this case if the user has the “roles” claim with a value of MyAppAdmin then I can do whatever I need to – enable a feature, etc.

Those are just a couple of simple, common examples, but as I said, anything you can do with ASP.NET roles you can now do with this YARBACS.
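
To put those two patterns in context, here’s a short sketch of a controller using both of them.  The controller itself is made up; the role name is the MyAppAdmin example from above:

//hypothetical controller showing both patterns with the MyAppAdmin example role
//requires: using System.Security.Permissions; using System.Web.Mvc;
public class ReportsController : Controller
{
    //deny access to this view unless the user has a "roles" claim of MyAppAdmin
    [PrincipalPermission(SecurityAction.Demand, Role = "MyAppAdmin")]
    public ActionResult AdminDashboard()
    {
        return View();
    }

    public ActionResult Index()
    {
        //toggle functionality based on role membership
        bool isAdmin = System.Security.Claims.ClaimsPrincipal.Current.IsInRole("MyAppAdmin");
        ViewBag.ShowAdminLinks = isAdmin;
        return View();
    }
}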


Use a Custom Action Filter to Populate Roles for Users

This is really where the magic happens – we look at an individual user and determine what roles we want to assign to them.  To start with, we create a new class and have it inherit from ActionFilterAttribute.  Then we override the OnActionExecuting method and plug in our code to look up the roles for the user and assign them.

Here’s what the skeleton of the class looks like:

public class Office365MonSecurity : ActionFilterAttribute
{
    public override void OnActionExecuting(ActionExecutingContext filterContext)
    {
        //code here to assign roles
    }
}

Within the override, we plug in our code.  To begin with, this code isn’t going to do any good unless the user is already authenticated.  If they aren’t, I have no way to identify them and look up their roles (short of something like a cookie, which would be a really bad approach for many reasons).  So first I check to make sure the request is authenticated:

if (filterContext.RequestContext.HttpContext.Request.IsAuthenticated)
{
    //assign roles if user is authenticated
}


Inside this block, I can do my lookup and assign roles because I know the user has been authenticated and I can identify them.  So here’s a little pseudo code to demonstrate:

SqlHelper sql = new SqlHelper();
List<string> roles = sql.GetRoles(ClaimsPrincipal.Current.Identity.Name);

foreach(string role in roles)
{
    Claim roleClaim = new Claim("roles", role);
    //attach the role claim to the current user's identity
    ((ClaimsIdentity)ClaimsPrincipal.Current.Identity).AddClaim(roleClaim);
}




So that’s a pretty good and generic example of how you can implement it.  Also, unlike application roles that you define through the UI in the Azure management portal, you can assign multiple roles to a user this way.  It works because they just show up as role claims.  Here’s an example of some simple code to demonstrate:


//assign everyone to the app reader role
Claim readerClaim = new Claim("roles", "MyAppReader");
((ClaimsIdentity)ClaimsPrincipal.Current.Identity).AddClaim(readerClaim);

//add a role I just made up
Claim madeUpClaim = new Claim("roles", "BlazerFan");
((ClaimsIdentity)ClaimsPrincipal.Current.Identity).AddClaim(madeUpClaim);


This code actually demonstrates a couple of interesting things.  First, as I pointed out above, it adds multiple roles to the user.  Second, it demonstrates using a completely “made up” role.  What I mean by that is that the “BlazerFan” role does not exist in the list of application roles in Azure AD for my app.  Instead I just created it on the fly, but again it all works because it’s added as a standard Claim of type “roles”, which is what we’ve configured our application to use as a role claim.  Here’s a partial snippet of what my claims collection looks like after running through this demo code:


To actually use the action filter, I just need to add it as an attribute on the controller(s) where I’m going to use it.  Here’s an example – Office365MonSecurity is the class name of my action filter:


[Office365MonSecurity]
public class SignupController : RoutingControllerBase

There you have it.  All of the pieces to implement your own RBAC solution when the current service offering is not necessarily adequate for your scenario.  It’s pretty simple to implement and maintain, and should support a wide range of scenarios.

Using the Office 365 Batch Operations API

As I was looking around for a way to batch certain operations with the Office 365 API the other day, I stumbled upon a Preview of just such a thing, called “Batch Outlook REST Requests (Preview)” – https://msdn.microsoft.com/en-us/office/office365/api/batch-outlook-rest-requests.  The fundamentals of how it works are fairly straightforward, but it’s completely lacking implementation details for those using .NET.  So, I decided to write a small sample application that demonstrates using this new API / feature / whatever you want to call it.

First, let’s figure out why you might want to use this.  The most common reason is you are doing a bunch of operations and don’t want to go through the overhead of creating, establishing, and tearing down an HTTP session for each operation.  That can slow things down quickly and burn up a lot of resources.  Now when I was first looking at this, I was also interested in how it might impact the throttling limits that Office 365 imposes.  Turns out I had a little misunderstanding of that, but fortunately Abdel B. and Venkat A. explained Exchange throttling to me, and so now I will share it with you.

My confusion about the impact batch operations might have on throttling was born of the fact that SharePoint Online has an API throttling limit that has been somewhat ubiquitously defined as no more than 1 REST call per second over an extended time.  So…kind of specific, but also a little vague.  Exchange Online throttling is arguably even less specific, but they do have some good information about how to know when it happens and what to do about it.

In Exchange Online, different operations may have a different impact on the system, and the system may also be impacted by demands from other clients.  So when making REST API calls to Exchange Online, your code should account for getting a throttling response back.  A throttled response in Exchange Online returns a standard HTTP status code 429 (Too Many Requests).  The service also returns a Retry-After header with the number of seconds to wait before resubmitting the request.  Now that you know what a throttled response from Exchange Online looks like, you can develop your code to include a process for retry and resubmission.
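
The retry logic itself doesn’t need to be fancy.  Here’s a hedged sketch of what it could look like – the request factory delegate and the retry cap are my own additions, not anything the service requires:

//hedged sketch: retry an Exchange Online REST call when a 429 (throttled) response comes back
//'makeRequest' builds a fresh HttpRequestMessage for each attempt; maxRetries is arbitrary
private static async Task<HttpResponseMessage> SendWithRetryAsync(
    HttpClient client, Func<HttpRequestMessage> makeRequest, int maxRetries = 3)
{
    for (int attempt = 0; ; attempt++)
    {
        HttpResponseMessage response = await client.SendAsync(makeRequest());

        if ((int)response.StatusCode != 429 || attempt >= maxRetries)
        {
            return response;
        }

        //honor the Retry-After header (in seconds) if the service sent one
        int delaySeconds = 5;
        if (response.Headers.RetryAfter != null && response.Headers.RetryAfter.Delta.HasValue)
        {
            delaySeconds = (int)response.Headers.RetryAfter.Delta.Value.TotalSeconds;
        }

        await Task.Delay(TimeSpan.FromSeconds(delaySeconds));
    }
}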

The batching feature lets you work around the overhead of multiple calls by allowing you to send in up to 20 operations in a single request.  That means 1 connection to create, establish and tear down instead of 20.  This is goodness.

The basic process of doing batch operations using this feature is to create what I’ll call a “container” operation.  In it, you will put all of the individual operations you want to perform against a particular mailbox.  Note that I said mailbox – this is important to remember for two reasons:  1) the batch feature only works today with Outlook REST APIs and 2) the individual operations should all target the same mailbox.  That makes sense as well when you consider that you have to authenticate to do these operations, and since they are all wrapped up in this “container” operation, you’re doing so in the context of that operation.

The “container” operation that I’m talking about is POST’ed to the $batch endpoint in Outlook:  https://outlook.office.com/api/beta/$batch.  The Url is hard-coded to the “beta” path for now because this API is still in preview.  In order for you to POST to the $batch endpoint you need to provide an access token in the authorization header, the same way as you would if you were making each of the individual calls contained in your container operation.  I’m not going to cover the process of getting an access token in this post because it’s not really in scope, but if you’re curious you can just look at the sample code included with this post or search my blog for many posts on that type of topic.
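
Setting that authorization header is just the usual bearer token pattern.  Here’s a quick sketch – hc and BATCH_URI_BASE are the names used later in this post, and accessToken is whatever token you acquired:

//sketch: an HttpClient that carries the access token for the $batch POST shown later
const string BATCH_URI_BASE = "https://outlook.office.com/api/beta/";

HttpClient hc = new HttpClient();
hc.DefaultRequestHeaders.Authorization =
    new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", accessToken);
hc.DefaultRequestHeaders.Accept.Add(
    new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));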

While I’m not going to cover getting an access token per se, it’s important to describe one higher level aspect of your implementation, which is to create an application in your Azure Active Directory tenant.  Generally speaking, you don’t access an Office 365 REST API directly; instead, you create an application and configure it with the permissions you need to execute the various Outlook REST APIs you’ll be using.  In my case, I wanted to be able to read emails, send emails and delete emails, so in my application I selected the following permissions:


So with that background, here are the basic steps you’ll go through; I’ll include more details on each one below:

  1. If you don’t have an access token, go get one.
  2. Create your “container” operation – this is a MultipartContent POST.
  3. Create your individual operations – add each one to your MultipartContent.
  4. POST the “container” operation to the $batch endpoint.
  5. Enumerate the results for each individual operation.


Step 1 – Get an Access Token

As I described above, I’m not going to cover this in great detail here.  Suffice it to say, you’ll need to create an application in Azure Active Directory as I briefly alluded to above.  As part of that, you’ll also need to do “standard Azure apps for Office 365” stuff in order to get an access token.  Namely, you’ll need to create a client secret, i.e. “Key”, and copy it along with the client ID to your client application in order to convert the authorization code you get from Azure into an AuthenticationResult, which contains the access token.  This assumes you are using ADAL; if you are not, then you’ll have your own process to get the access token.
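
If you are using ADAL, redeeming the authorization code for an AuthenticationResult looks roughly like this.  The authority, reply URL and resource values below are placeholders – substitute the values from your own app registration:

//hedged sketch: exchange the authorization code for a token with ADAL
//requires: using Microsoft.IdentityModel.Clients.ActiveDirectory;
AuthenticationContext authContext =
    new AuthenticationContext("https://login.microsoftonline.com/common");

ClientCredential credential = new ClientCredential(clientId, clientSecret);

AuthenticationResult authResult = await authContext.AcquireTokenByAuthorizationCodeAsync(
    authorizationCode,
    new Uri("https://localhost/myapp"),        //the reply URL registered on your Azure AD app
    credential,
    "https://outlook.office.com");             //resource for the Outlook REST APIs

string accessToken = authResult.AccessToken;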


Step 2 – Create Your Container Operation

The “container” operation is really just a MultipartContent object that you’ll POST to the $batch endpoint.  Unfortunately, there is scarce documentation on how to create these, which is in large part why I wrote this post.  The code to get you started though is just this simple:


//create a new batch ID
string batchId = Guid.NewGuid().ToString();

//create the multipart content that is used for a batch process
MultipartContent mpc = new MultipartContent("mixed", "batch_" + batchId);

The main thing to note here is just that each “container” operation requires a unique batch identifier.  A Guid is perfect for this, so that’s what I’m using to identify my batch operation.


Step 3 – Create Individual Operations and Add to the Container Operation

The actual code you write here will vary somewhat, depending on what your operation is.  For example, a request to send an email message is going to be different from one to get a set of messages.  The basic set of steps, though, is similar:

  1. Create a new HttpRequestMessage. This is how you define whether the individual operation is a GET, a POST, or something else, what Url to use, etc.  Here’s the code I used for the operation to send a new email:  HttpRequestMessage rqMsg = new HttpRequestMessage(HttpMethod.Post, BATCH_URI_BASE + "me/sendmail");  It’s worth noting that you ALWAYS send your individual operations to the $batch endpoint to be included in the batch process.  For example, if you were using v2 of the Outlook API, to send a message you would use the Url https://outlook.office.com/api/v2.0/me/sendmail.  However, to use the $batch endpoint, since it’s in beta, you use the Url https://outlook.office.com/api/beta/me/sendmail.
  2. Create the content for your operation. In my case I used a custom class I created to represent a mail message, I “filled it all out”, and then I serialized it to a JSON string.  I then used that string to create the content for the operation, like this:  StringContent sc = new StringContent(msgData, Encoding.UTF8, "application/json");  So in this case I’m saying I want some string content that is encoded as UTF8 and whose content type is application/json.
  3. Add your content to the HttpRequestMessage:  rqMsg.Content = sc;
  4. Wrap up your HttpRequestMessage into an instance of the HttpMessageContent class. Note that you’ll need to add a reference to System.Net.Http.Formatting in order to use this class.  Here’s what it looks like:  HttpMessageContent hmc = new HttpMessageContent(rqMsg);  We’re doing this so that we can set the appropriate headers on this operation when it’s executed as part of the batch.
  5. Set the headers on the HttpMessageContent object:  hmc.Headers.ContentType = new MediaTypeHeaderValue("application/http"); and also hmc.Headers.Add("Content-Transfer-Encoding", "binary");  You now have a single operation that you can add to the “container” operation.
  6. Add your individual operation to the “container” operation:  mpc.Add(hmc);  That’s it – now just repeat these steps for each operation you want to execute in your batch.

Side note:  I realize some of this code may be difficult to follow when it’s intertwined with comments like I’ve done above.  If you’re getting squinty-eyed, just download the ZIP file that accompanies this post, and you can see all of the code end to end.
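
For those who would rather see it all in one place, here’s the list above condensed into a single hedged sketch.  The JSON payload is a placeholder, and mpc and BATCH_URI_BASE come from the earlier steps:

//hedged sketch: build one "send mail" operation and add it to the batch container
string msgData = "{ \"Message\": { \"Subject\": \"Test from batch\" } }";   //placeholder JSON payload

//the individual operation always targets the beta path used by $batch
HttpRequestMessage rqMsg =
    new HttpRequestMessage(HttpMethod.Post, BATCH_URI_BASE + "me/sendmail");

//attach the JSON body
StringContent sc = new StringContent(msgData, Encoding.UTF8, "application/json");
rqMsg.Content = sc;

//wrap the request so it can carry its own headers inside the multipart body
//(requires a reference to System.Net.Http.Formatting)
HttpMessageContent hmc = new HttpMessageContent(rqMsg);
hmc.Headers.ContentType = new MediaTypeHeaderValue("application/http");
hmc.Headers.Add("Content-Transfer-Encoding", "binary");

//add the wrapped operation to the container; repeat for each operation in the batch
mpc.Add(hmc);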


Step 4 – Post the Container Operation to the $Batch Endpoint

There’s not a lot to step 4.  You can just POST it now, but there’s one other point I want to make.  Your “container” operation may contain many individual operations.  There are a couple of points about that worth remembering.  First, the individual operations are not guaranteed to be performed in any specific order.  If you need them to be performed in a specific order, then either don’t do them in a batch or do them in separate batches.  Second, by default, at the point that any individual operation encounters an error, execution stops and no further operations in the batch will be executed.  However, you can override this behavior by setting a Prefer header in your “container” operation.  Here’s how you do that:

mpc.Headers.Add("Prefer", "odata.continue-on-error");

With that done (or not, depending on your requirements), you can go ahead and POST your “container” operation to the $batch endpoint, like this:

HttpResponseMessage hrm = await hc.PostAsync(BATCH_URI_BASE + "$batch", mpc);

With that done, it’s time to look at the results, which is covered in the next step.


Step 5 – Enumerate the Results for Each Individual Operation

At a high level, you can see if the overall batch operation worked the same way you would if it were just one operation:

if (hrm.IsSuccessStatusCode)

The important thing to understand though, is that even though the “container” POST may have worked without issue, one or more of the individual operations contained within may have had issues.  So how do you pull them all out to check?  Well, using the MultipartMemoryStreamProvider class is how I did it.  This is another class that requires a reference to System.Net.Http.Formatting in order to use, but you should already have it from the other steps above so that shouldn’t be a problem.

So we start out by getting all of the responses from each individual operation back like this:

MultipartMemoryStreamProvider responses = await hrm.Content.ReadAsMultipartAsync();

You can then enumerate over the array of HttpContent objects to look at the individual operations.  The code to do that looks like this:

for(int i=0; i < responses.Contents.Count; i++)
{
    string results = await responses.Contents[i].ReadAsStringAsync();
}


It’s a little different from having an HttpResponseMessage for each one in that you have to do a little parsing.  For example, in my sample batch I sent two emails and then got the list of all of the emails in the inbox.  As I enumerate over the content for each one, here’s what ReadAsStringAsync returns for sending a message:

HTTP/1.1 202 Accepted

Okay, so you get to parse the return status code…should be doable.  It can get a little more cumbersome depending on the operation type.  For example, here’s what I got back when I asked for the list of messages in the inbox as part of the batch:

HTTP/1.1 200 OK

OData-Version: 4.0

Content-Type: application/json;odata.metadata=minimal;odata.streaming=true;IEEE754Compatible=false;charset=utf-8


Okay, so I trimmed a bunch of detail out of the middle there, but the gist is this – you would have to parse out your HTTP status code that was returned, and then parse out where your data begins.  Both quite doable, I just kind of hate having to do the 21st century version of screen scraping, but it is what it is.  The net is you can at least go look at each and every individual operation you submitted and figure out if they worked, retrieve and process data, etc.
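
If it helps, here’s a hedged sketch of that parsing – splitting the status line off the top and the JSON body off the bottom of the string that ReadAsStringAsync returns:

//hedged sketch: pull the status code and JSON body out of one batch response part
//'results' is the string returned by ReadAsStringAsync for an individual operation
string[] lines = results.Split(new string[] { "\r\n" }, StringSplitOptions.None);

//the first line looks like "HTTP/1.1 200 OK" - the status code is the second token
int statusCode = int.Parse(lines[0].Split(' ')[1]);

//the body (if any) starts after the first blank line that separates headers from content
int blankLine = Array.IndexOf(lines, string.Empty);
string jsonBody = (blankLine >= 0 && blankLine < lines.Length - 1)
    ? string.Join("\r\n", lines, blankLine + 1, lines.Length - blankLine - 1)
    : string.Empty;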


That’s the short tour of using the Outlook batch API.  There are a handful of things you need to know about how it works and what its limitations are, and I’ve pointed out all of the ones I know about in this post.  The trickier part by far is understanding how to create a batch request using the .NET Framework, as well as how to parse the results, and I covered both of those aspects here as well.

As I mentioned a few times in this post, I just zipped up my entire sample project and have attached it to this post so you can download it and read through it to your heart’s content.  It does contain the details specific to my application in Azure AD, so you’ll need to create your own and then update the values in the app if you want to run this against your own tenant.  The ZIP file with the code is below:


Expanding the Office365Mon Subscription Management API

Today we’re happy to announce a slew of new APIs that have been added to our Subscription Management API tool set.  The Subscription Management API at Office365Mon has long been a differentiator from other solutions in the Office 365 monitoring space.  Our first releases allowed you to manage the basics of the core monitoring features of Office365Mon.  Based on customer demand, we have just released a significant expansion of those APIs.  Our total feature set for managing your Office365Mon subscription has gone from 28 APIs to 46.

The new API support allows you to do things like configure the cloud service for your Distributed Probes and Diagnostics deployments (https://www.office365mon.com/Configure/OnPremProbes), which allows you to issue health probes in conjunction with our cloud service from any geographic location where you have users.  You can also configure the integration with the Office 365 Service Communication API (https://www.office365mon.com/Signup/Status), which allows you to stay up to date with any changes in the status of all of the services and features you have in your Office 365 tenant.  You can also manage Office365Mon’s monitoring of the SharePoint Online search service (https://www.office365mon.com/Configure/SearchMon).  This is critical for virtually all SharePoint customers, since so much of the content is driven by search – such as Content by Query web parts, search-based navigation, etc. – in addition to being used by many, many custom applications.  With these new APIs, you can now manage virtually every single thing using our API that you can do in the browser on our site.

Today’s announcement marks another set of innovative features that were developed based on feedback from you, our customers.  We hope you’ll find these to be valuable additions to the management of your Office365Mon subscriptions.  As always, if you have other requests or ideas for features you would like to see in our service, please just send us a note at support@office365mon.com.

From Sunny Phoenix,


Analyzing and Fixing Azure Web Sites with the SCM Virtual Directory

There are so many things you do every day as part of operating and maintaining your Azure web sites.  They’re a common choice for developers because you get 10 free sites with your Azure subscription, and if you know what you’re doing you can spin that up into even more applications by using custom virtual directories, as I’ve previously explained here:  https://samlman.wordpress.com/2015/02/28/developing-and-deploying-multiple-sharepoint-2013-apps-to-a-single-azure-web-site/.  That example is specific to using them for SharePoint Apps, but you can follow the same process to use them for standard web apps as well.

Typically, you go through your publishing and management process using two out of the box tools – Visual Studio and the Azure browser management pages.  What happens though when you need to go beyond the simple deploy and configure features of these tools?  Yes, there are third-party tools out there that can help with these scenarios, but many folks don’t realize that there’s also a LOT that you can do with something that ships out of the box in Azure, which is the Kudu Services, or as I’ve called it above, the SCM virtual directory.

The SCM virtual directory is present in every Azure web site.  To access it, you merely insert “scm” between your web name and the host name.  For example, if you have an Azure web site at “contoso.azurewebsites.net”, then you would navigate to “contoso.scm.azurewebsites.net”.  Once you authenticate and get in, you’ll arrive at the home page for what they call the Kudu Services.  In this post I really just wanted to give you an overview of some of the features of the Kudu Services and how to find them, which I kind of just did.  :-)  At the end though I’ll include a link to more comprehensive documentation for Kudu.

Going back to my example, I found out about all of the tools and analysis available with the Kudu Services a few months ago when I was trying to publish an update to an Azure web site.  Try as I might, the deployment kept failing because it said a file in the deployment was being used by another process on the server.  Now of course, I don’t own the “server” in this case, because it’s an Azure server running the IIS service.  So that’s how I started down this path of “how am I gonna fix that” in Azure.  SCM came to the rescue.

To begin with, here’s a screenshot of the Kudu home page:


As you can see right off the bat, you get some basic information about the server and version on the home page.  The power of these features comes as you explore some of the other menu options available.  When you hop over to the Environment link, you get a list of the System Info, App Settings, Connection Strings, Environment variables, PATH info, HTTP headers, and the ever popular Server variables.  As a long time ASP.NET developer I will happily admit that there have been many times when I’ve done a silly little enumeration of all of the Server variables when trying to debug some issue.  Now you can find them all ready to go for you, as shown in this screenshot:


Now back to that pesky “file in use” problem I was describing above.  After trying every imaginable hack I could think of back then, I eventually used the “Debug console” in the Kudu Services.   These guys really did a nice job on this and offer both a Command prompt shell as well as a PowerShell prompt.  In my case, I popped open the Command prompt and quickly solved my issue.  Here’s an example:


One of the things that’s cool about this as well is that as I motored around the directory structure with my old school DOS skills, i.e. “cd wwwroot”, the graphical display of the directory structure was kept in sync above the command prompt.  This really worked out magnificently; I had no idea how else I was going to get that issue fixed.

Beyond the tools I’ve shown already, there are several additional tools you will find, oddly enough, under the Tools menu.  Want to get the IIS logs?  No problem, grab the Diagnostic Dump.  You can also get a log stream, a dashboard of web jobs, a set of web hooks, and the deployment script for your web site, or open a Support case.

Finally, you can also add Site Extensions to your web site.  There are actually a BUNCH of them that you can choose from.  Here’s the gallery from the Site Extensions menu:


Of course, there are many more than fit on this single screenshot.  All of the additional functionality and the ease with which you can access it is pretty cool though.  Here’s an example of the Azure Websites Event Viewer.  You can launch it from the Installed items in your gallery and it pops open right in the browser:


So that’s a quick overview of the tools.  I used them some time ago, and then when I needed them a couple of months ago I couldn’t remember the virtual directory name.  I Bing’d my brains out unsuccessfully trying to find it, until it hit me when I looked at one of my site deployment scripts – they go to the SCM vdir as well.  Since I had such a hard time finding it, I thought I would capture it here, and hopefully your favorite search engine will find enough keywords in this post to help you track it down when you need it.

Finally, for a full set of details around what’s in the Kudu Services, check out their GitHub wiki page at https://github.com/projectkudu/kudu/wiki.

New Geographic and Notification Features for Distributed Probes from Office365Mon

We’ve just released a significant update to the Office365Mon Distributed Probes and Diagnostics feature.  For those of you who aren’t familiar with this feature, it was originally released a little over a year ago.  From the beginning it was designed to do two things:

  1. Work in conjunction with the Office365Mon cloud service to issue health probes from different geographic regions where you have users. That allows you to check the availability and performance not only from our cloud service, but also from all of the locations where you have users.
  2. When there’s a problem connecting to Office 365, it runs a series of diagnostics on the local network to try and determine if there are any issues. That includes things like checking local network cards, DNS, gateway and a non-Office 365 Internet site.

In addition to the tasks above, it also allows you to set a performance threshold – for example, let me know when it takes longer than x seconds to connect to and get data back from Office 365.  You can set “x” to whatever value you want, so it allows you to set different minimum performance thresholds for each location where you have users.  One of the big reasons we did this is because we got a lot of feedback from our enterprise customers that they had situations where performance may be great in the US for example, but poor or completely down for users in another region, like Europe.

In the previous version of Distributed Probes and Diagnostics, any issues with health probes, performance, or any of the items in the diagnostics checklist were written to the local event log.  You could then monitor the event log in each location where you have it installed to find out when there are issues at a particular location.  That also proved to be pretty helpful if you had to open a support case with Microsoft because of connectivity issues.  They will typically try and triage the issue by looking to see if there are local network issues, versus an issue with the Office 365 service.  By using the Distributed Probes and Diagnostics feature, you can quickly check the event log on the machine(s) where it’s running, and if any local network issues were found, they will be logged there.  That notifies you if an issue is found, saving you a call and allowing you to focus on the real problem, or else validate with support that your local network is fine.

Our new update has all of the same features I’ve described above that you’ve come to depend upon, but we’ve also built some very important new pieces to complement them.  Now, in addition to logging data to the local event log, as long as you have a working Internet connection it also reports and sends out alerts through our cloud service.  This opens up some very interesting data points, both from a reporting perspective as well as notifications.

New Reports

When you configure the Distributed Probes and Diagnostics feature now, you are asked to enter the ZIP code where the computer on which you installed it is located.  We use that for the local and regional geographic data that feeds into the new reports that have been built for the service.  Overall we added 10 new reports to the service to accommodate this new data stream – two Basic reports and eight Advanced reports.

During the beta phase for this release we had the feature running in 8 different countries and more than a dozen locations.  From that data we can create a performance heat map across the globe from all of our customers that are running this service:


The picture above shows data from locations in the UK, India and Australia.  You can tell based on the intensity of the color around the push pin which locations are performing worse than others.  For example, Australia has the most intense colors and France has some of the lightest colors, so you can tell at a glance that you have much worse performance in Australia than France.  That’s going to be pretty important to know when supporting your Australian users.

We also create bubble maps to represent the performance in different locations for your Office365Mon subscription.  This gives you another “at a glance” snapshot of how things are going in different locations.  The key distinction here is that in the report above, you get to see what the data looks like across the globe for all Office365Mon customers; the bubble map lets you see the performance just for the locations associated with your Office365Mon subscription.  That gives you the capability to compare how others are doing in a particular region relative to your users.  If you see a negative difference between them, then that may indicate that you have problems in your network in those locations that should be addressed.

Here’s a screenshot of that report, where we’ve drilled down to see just locations in the US:


Here we can see that folks out West are getting much better performance than their counterparts in the East.

The graphical maps are a great way to get an “at a glance” view of the performance for your user base, wherever they may be located.  We also offer more traditional views of this data, though, so you can quickly compare performance on each computer where you’ve installed the Distributed Probes and Diagnostics agent, as shown here:


In our case, we had the agent installed in a LOT of locations, so you see a lot of data there.  Again, the number of locations in which it’s installed is completely up to you.

Of course just as important as performance, we definitely have seen scenarios where the service as a whole may be up, but individual regions may be down.  A good example of this is the handful of times a few months ago when there were problems with Azure Active Directory in certain European regions.  Since our cloud service currently runs out of data centers in the US, it did not have any issues connecting to the service because the regional Azure AD services it uses were working.  However, our customers that had the Distributed Probes and Diagnostics agent running in Europe were able to find out first that there was an issue over there, because the probe and authentication process occurred there, where their users are.

We also saw this occur at times during the beta for this release, and you can see that reflected in the new availability reports.  They show availability based on the agents where the Distributed Probes and Diagnostics feature is installed; here’s a screenshot of that:



New Notification Capabilities

While we’ve added a bunch of new reports, we’ve also vastly improved upon the notification capabilities.  As I was describing earlier, in the previous release of Distributed Probes and Diagnostics, all notifications went exclusively to the local event log.  We still do that, but now these events are also wired up to go out to our cloud service as long as you have a working Internet connection.  Just like you might expect, you get notifications for the same kinds of things you get from our cloud monitoring service – when outages start and end.  But now you are getting those notifications from a specific location, so you can know right away if the service overall is up, but just one or two locations are down.

We also send notifications when the performance for a health probe doesn’t meet the threshold you had defined.  So for example, you could define a threshold of 15 seconds from Melbourne, Australia and 8 seconds from Glasgow, Scotland.  If it takes longer than the threshold you’ve defined for that location, then you’ll get notifications to all of the “channels” that you’ve configured for your Office365Mon subscription – emails, text messages, and webhook data if you have that configured – that indicate the issue and where it’s occurring.  You really will have an up-to-date, around-the-world view of your users’ ability to connect to Office 365 in a reasonable time frame.

Get Started Now

This feature is available to use now for all Office365Mon customers that are either in their 90-day trial period, or that have the Enterprise Premium license.  We hope that you’ll give it a try and, as always, let us know how we can improve upon it.  The features in this update were all driven by feedback from our customers so it DOES matter when you make suggestions.

To get more information on this feature, see our original post about it here: https://samlman.wordpress.com/2015/09/28/announcing-the-availability-of-office-365-on-premise-health-probes-for-office365mon-customers/.  To get the documentation and agent, visit the Distributed Probes and Diagnostics configuration page on our site here:  https://www.office365mon.com/Configure/OnPremProbes.

Thanks from sunny Phoenix,


Preview Now Available to Monitor Skype for Business at Office365Mon

Today is a day that we’ve been waiting on for a while. In the past 15 months we’ve been building out a pretty comprehensive service centered around monitoring SharePoint Online and Exchange Online in Office 365. Thanks to some new APIs from Microsoft, we are now happy to announce that we’re adding Skype for Business (SfB) to the suite of products you can monitor with Office365Mon.

While we’re still in preview with SfB you may notice an occasional glitch here or there, but it’s been running in our labs for well over a month now and we’ve had pretty good luck with it. It fits into the same proven architecture that Office365Mon has been using since launch. That means – as always – that we don’t ask you for a username and password to monitor SfB. You simply log in through Azure Active Directory, and when you’re done it hands off an access token to us that we can use. At this time we will be providing monitoring for Skype Presence and Skype Instant Messaging. As the scope of the APIs that Microsoft has for SfB expands, we will also expand our offering into other features of the service, such as online meetings and voice.

Although we’ve always recommended a separate service account (or accounts) for monitoring Office 365, with SfB it’s really a must. Because of the way we use the APIs to check presence and instant messaging, if you try using the same account you use at work every day to monitor these services, you will likely end up with a bunch of “stuff” going on that would be quite annoying, plus it would interfere with our ability to accurately monitor the service. To that end, we recommend you use the same sort of process that we outlined in our blog post for monitoring multiple sites and mailboxes: https://samlman.wordpress.com/2015/11/02/how-to-monitor-multiple-mailboxes-and-sites-with-office365mon/. In short you will a) create a new account for monitoring, b) give it a SfB license, c) add it as an admin to your Office365Mon subscription, and d) log into Office365Mon with that account and enable Skype for Business monitoring.

Enabling monitoring for SfB is about as simple as it gets; here’s a screenshot from the configuration page:


As you can see, all you have to do is click the Enable button to get things going – that’s it. This is also in line with how we’ve built our solutions at Office365Mon – as simple as possible, with nothing to download and install. There is one thing to remember when you click Enable the first time – you may get prompted by Azure Active Directory twice instead of the normal once, to consent to allow Office365Mon to have access to Skype resources for the account you are using for monitoring. That’s okay; it’s just because of the way the Skype team designed their service.

After that you’re off and running. We’ll automatically add the data to the reports you see for things like outages and recent health checks. You’ll also see the data show up in your My Info page next to all of the other resources we’re monitoring for you:


Finally, there is one other thing worth pointing out. Because of the way the SfB service is designed, there are times when it will be unavailable for monitoring. As we deploy monitoring for it as a Preview feature, we’re continuing to work on alternatives to minimize the alerting and configuration changes that may be needed as a result of SfB changing to an unmonitorable state. This is something that we’ll continue to work on over time, as well as await changes in the SfB architecture that will eliminate these issues.

This feature is available in Preview now for all of our customers to try. Also remember that all new customers get this, along with every other feature we offer, free for 90 days. So give it a whirl and send your feedback our way.

From Sunny Phoenix,




Office 365 Search Monitoring at Office365Mon

Today we’re announcing a new feature at Office365Mon that our customers have been asking about for quite some time.  We’ve added the ability to monitor the Search service in your SharePoint Online tenant using our well-established health probe architecture.  This has been frequently requested because Search in SharePoint plays such a pivotal role in content delivery.  There are many out of the box web parts that depend on successfully executing queries to generate the content for the page.  On top of that there are many, many custom applications that are dependent upon the Search service working correctly.  From a developer perspective this has been an approach advocated by Microsoft for several years (including by myself when I was in that role) because of the reduction in load it puts on the SharePoint farm, as well as the ability to pull data from a variety of sources.

When we were first designing this feature it primarily focused on the query aspect of the Search service.  However, after we gave several of my former colleagues at Microsoft a sneak peek at what we were planning, they felt just as passionately (if not more so) that we should also see what we could do about monitoring crawl performance.  Apparently quite a few of them have had customers frustrated about not seeing the content they expected when using Search, and in many cases they found out that their content had not actually been crawled yet.  There were some challenges in managing this request, but we found a solution that we’re happy with and that causes a minimal amount of friction for our customers.  Best of all, like all our other features, I think you’ll find this extremely easy to set up and use.


What Does It Do?

Here’s a screenshot of the configuration page for the Office 365 Search Monitoring feature so you can see just how easy it is to configure what it does for you:

The Office 365 Search Monitoring feature allows you to provide a custom Keyword Query Language (KQL) query that we will execute against the site we’re monitoring for you with Office365Mon.  We have a link to the KQL guide when you go in and configure Search monitoring, and it can be something as simple as querytext='sharepoint'.  As you’ll see if you look at the KQL reference though, the beauty of this is that you can actually get quite sophisticated in your KQL.  You can do things like control how many search results are returned, select a set of properties to return, use different ranking models, enable stemming, enable phonetics, etc.  One of the big reasons why we chose to use KQL directly is because so many of you have written us about custom applications you have built on Search and you want some means to monitor them.  By letting you use KQL, you can use any query that’s relevant to what your app does, and we’ll use that as the basis for monitoring.
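
If you want to experiment with your KQL before plugging it into Office365Mon, you can run it directly against the SharePoint Online search REST endpoint.  Here’s a rough sketch – the site URL is a placeholder, and accessToken is a token you’ve acquired for SharePoint Online:

//hedged sketch: run a KQL query against the SharePoint Online search REST endpoint
string siteUrl = "https://contoso.sharepoint.com/sites/somesite";   //placeholder site
string kql = "querytext='sharepoint'";

HttpClient client = new HttpClient();
client.DefaultRequestHeaders.Authorization =
    new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", accessToken);
client.DefaultRequestHeaders.Accept.Add(
    System.Net.Http.Headers.MediaTypeWithQualityHeaderValue.Parse("application/json;odata=verbose"));

HttpResponseMessage resp = await client.GetAsync(siteUrl + "/_api/search/query?" + kql);
string json = await resp.Content.ReadAsStringAsync();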

Once you have your KQL, we take over from there.  We really monitor three things around the KQL query you’ve provided, based on the feedback and requests we’ve received the last several months:

  1. Query latency – you define a maximum query time, and if it takes longer than that for us to get results back from running your KQL, we send out a notification to all of your configured notification options. That includes emails, text messages, and of course – now – webhooks.  If you have apps that are based on the Search service it can be critically important to know when queries are running slowly – based on a latency you decide is needed – so you know whether there are issues with your app, or issues with the Search service that your app is using.
  2. Search results change – you can receive notifications when search results change. What this means is that if the set of results changes from the last time we ran your KQL, we’ll send out notifications.  If the only thing that changes is the raw rank value of the items in the search results, we will NOT send out notifications.  However, if all of the search results are the same, but the order of them changes, we DO send out notifications.  It’s also important to underscore that this reflects only on the search results themselves – it has nothing directly to do with the underlying items in the search results.  What that means is that if you change a document that’s included in the search results, we don’t detect and notify for that.  However, if that change caused a change in the search results – either the document no longer shows up, it shows up higher or lower in the search results, etc. – then we would send out notifications.

There are a couple of other things worth noting here as well.  First, we don’t store your actual search results.  This is consistent with how we do things here at Office365Mon – we don’t store usernames and passwords, and we don’t store your data.  What we DO store is a one-way hash of the search results.  When we do another search, we create the hash on the latest results and compare it to the hash we had before, and if they’re different we know the search results have changed.  It’s a one-way hash, which means that it cannot be “reverse engineered” or otherwise tampered with to rehydrate the actual search results.  (A rough sketch of this hash-and-compare idea appears right after the list below.)

The second thing worth noting here is that in our experience, you may see the search results change quite frequently, even if there have not been any changes to the site content.  This is not a bug or issue with the Office365Mon Search Monitor feature.  We find that there may be hours and sometimes days when the search results come back exactly the same.  Then there are times when you may get different search results on five or six queries in a row.  Again – using the same KQL and the site content has not changed – but the search results ordering changes.  This is something you should be aware of when developing your applications based on the Search service.  If you need to know when it happens, now you can use our Search monitor feature to be alerted to it.  To help you parse through the changes, you can also elect to check the box to Include results in notifications when results change.  When that option is selected, you’ll get the actual search results we received in both emails and webhooks – just not text messages.  Again – these results are not saved anywhere, we are simply forwarding to you what we received.

  3. No search results – the other option you have is to get notified when no search results are returned from your KQL. Again, whether you’re using out of the box web parts or a custom application, if you have content in your site that’s dependent upon getting search results, this can be a critical notification to have.
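
As promised above, here’s a rough sketch of the hash-and-compare idea.  The hash algorithm and the way the values are persisted here are illustrative only – this is not a statement about exactly how our service implements it:

//hedged sketch: detect a change by hashing the serialized search results and comparing
//'serializedResults' and 'previousHash' are placeholders for however you persist them
//requires: using System.Security.Cryptography; using System.Text;
using (SHA256 sha = SHA256.Create())
{
    byte[] hashBytes = sha.ComputeHash(Encoding.UTF8.GetBytes(serializedResults));
    string currentHash = Convert.ToBase64String(hashBytes);

    bool resultsChanged = !string.Equals(currentHash, previousHash, StringComparison.Ordinal);
    //if resultsChanged is true, send notifications and store currentHash for next time
}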

Finally, as described above, the other thing we monitor is content indexing.  With a simple checkbox, you can have us monitor your search index to see how long it is taking to crawl new content.

With all of the features described above, there are several reports that you can also use to stay on top of how things are doing.  More details regarding the reports are described below.  Also, if you want detailed step-by-step instructions for configuring Search monitoring you can get them from our site at https://www.office365mon.com/configuresearchmon.pdf.  One final point worth making – you can also do all of the configuration programmatically using our Subscription Management API.  See our latest API documentation for details and code samples.


How Does It Do It?

The Office 365 Search Monitor uses the same super-scale health probe infrastructure that Office365Mon has been using since Day One.  That enables us to issue queries against your tenant and track the responses.  To support crawl monitoring though, we had to come up with something a little different, and here’s why.  The monitoring applications we use are all defined in Azure Active Directory, and as part of that definition we describe what rights our applications need.  We always use the least invasive permissions possible to get the job done, so all of our apps are configured with the smallest amount of Read Only rights that we can get away with.  To do crawl monitoring though, we needed a way to determine how long it takes to get new content indexed – so how do we do that?  Well, we need to write a small amount of data to the site we’re monitoring, and then start issuing queries for that content.  We look at when we wrote the content into the site, and how long it takes until it starts showing up in search results, and that’s how we calculate the time it takes to index the content.

As described above though, all of our applications are configured to have only Read rights, so how can we write content to a site?  That’s where we had to add a new item to our toolbox, and what we decided to do was write a SharePoint App.  Yes, the same SharePoint Apps many of you develop to bring your site to life – we built one as well.  We wanted to limit our scope as much as possible, so the app only has rights in the current site it’s installed in – not any other site (i.e. SPWeb) in the site collection, nor any other site collection in the tenant.  When you first configure the Office 365 Search Monitoring feature, if you elect to Monitor search index freshness, the first check we do is to ensure the application is installed in the site being monitored.  If it’s not, we let you know so you can go install the app and try saving your Search monitoring configuration changes again.

The SharePoint App is in (or soon will be in) the Office Store under the name Office365Mon Search Monitor.  You can install it from there into the site being monitored.  In addition to that, since some folks turn off access to the Office Store, when you go to our Configure Office 365 Search Monitor page, we have a link you can use to download a zip file containing the SharePoint App.  If you go that route you’ll need to extract the .app file out of the zip file, upload it to the App Catalog in your tenant, and then you can install it in the monitored site.

When the app is installed and you save your search monitoring configuration, we’ll look for a custom list we use to store the data used for monitoring the search index freshness.  If it doesn’t exist, we’ll create it.  After that we use the app to create new content in the list so we can monitor how quickly it’s being indexed.
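
Conceptually, the freshness measurement boils down to writing a marker item and timing how long it takes to show up in query results.  Here’s a rough sketch of that loop – the two helper methods, the polling interval and the timeout are all hypothetical, just to show the idea:

//hedged sketch: measure how long new content takes to appear in search results
//'WriteMarkerItemAsync' and 'SearchForMarkerAsync' are hypothetical helpers around
//the list write and the KQL query; the polling interval and timeout are arbitrary
string marker = Guid.NewGuid().ToString("N");
DateTime written = DateTime.UtcNow;

await WriteMarkerItemAsync(marker);           //add an item containing the marker to the monitoring list

TimeSpan crawlLatency = TimeSpan.Zero;
while (DateTime.UtcNow - written < TimeSpan.FromHours(24))
{
    if (await SearchForMarkerAsync(marker))   //KQL query that looks for the marker text
    {
        crawlLatency = DateTime.UtcNow - written;
        break;
    }

    await Task.Delay(TimeSpan.FromMinutes(5));
}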


Reporting on Search Monitoring

We’ve built a half-dozen new reports based on the Office 365 Search Monitoring feature.  We have a very full list of Advanced Reports now with your Office365Mon subscription, as you can see in the Report Gallery here:

You can get recent data (i.e. from the last few hours) on query and crawl latency.  Here’s an example of the recent crawl latency report:

You can get daily stats on query and crawl latency – here’s an example of the daily query latency report:

Don’t be fooled by the numbers either – they show exactly why you want to monitor your query latency.  You can look at the graph and see that overall the queries are returning data in sub-second time.  However, it’s not uncommon to see this, yet still get notifications about queries that have taken 15 or 20 seconds or longer.  That’s what you want to know – your queries normally perform one way, but when they perform significantly differently it may be impacting the content and performance in your site.

We also have reports that show your monthly query and crawl latency averages, and they are overlaid on top of our service-wide averages so you can see how your performance compares to your peers that are being monitored by Office365Mon.  In addition to that, in the Basic Reports you can see data just on the average crawl and query latency across our entire service:

In addition to our out of the box reports, you can also use our Report Data API to programmatically retrieve this data via our REST endpoints.  For those of you using our Power BI integration, you will also automatically see monthly crawl and query freshness data show up in Power BI after you refresh your data in there.

How Do I Get It?

All existing Office365Mon customers always get new features free to try for 90 days, so everyone has had this feature turned on.  In addition, all new customers also always get ALL of our features free to try for 90 days.  The feature is currently in beta, but it has been rolled out so everyone can begin using it.

That’s it – a quick run-down on a feature that we believe many of you will find extremely useful.  It has great value whether you are using out of the box web parts, or you’ve developed your own custom applications built on Office 365.  As always, if you have suggestions or ideas on how to improve this or any other feature at Office365Mon, please just drop me a note at support@office365mon.com.

From Sunny Phoenix,