We introduced monitoring for the email transport for Office 365 customers back in February of this year, after strong demand from our customers (https://samlman.wordpress.com/2017/02/14/know-when-your-office-365-email-gets-stuck-in-transit-from-office365mon/). We have provided monitoring of the email service itself since Day 1, but if your mailbox is up and the messages you send don’t go out, or the messages people send you don’t make it in, then email is of pretty limited value. As we moved the email transport monitoring service into production and gathered more and more data, we started finding a few common errors that many tenants were experiencing, but which are innocuous enough that you might not even know they are happening.
We decided to create some new reports around the email transport monitoring feature to help customers understand better when these situations occur. Sometimes they point out problems that customers can resolve themselves, and other times it’s just good to understand better when and where the hiccups are in your email service. In a nutshell, as we capture more data around errors in the transport, we are bubbling up the common errors into these new reports so you can see for yourself when they happen and how frequently they happen.
If you go into the Advanced Reports gallery on our site at https://www.office365mon.com/Reports/advreports, you will find the new “Recent Email Transport Errors” and “Monthly Email Transport Errors” reports. The Recent Email Transport Errors report is a simple tabular list of the 100 most recent instances of the common errors described above. Here’s a sample report:
As you can see, we indicate whether the issue was with an inbound or outbound email, when it happened (most recent first), and what the problem was. This report has already paid off big for one of our customers: with the information in it they were able to determine that they had an old MX record in DNS that pointed to a server that was no longer available. The error showed up in our reports every time that server was selected for us to send mail to, and as a result they were able to clean that up in their environment. Some of the other common problems we see are: the Outlook API service is temporarily unavailable; the anti-spam features have incorrectly marked an outbound message as spam, and as a result it gets stuck in the Drafts folder (this happens most frequently with .onmicrosoft.com email accounts, by the way); and requests to the service are unauthorized, which could mean your access token has expired or Azure Active Directory is temporarily unavailable. Some things you may be able to fix yourself; for others, it’s just good information to have so you are aware of how well your transport features are performing.
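The stale MX record case is one you can spot-check yourself. Here is a minimal sketch, assuming you have already looked up the MX hostnames published for your domain: it tries a TCP connection to port 25 on each host and returns the ones that do not answer. The function names are ours, made up for illustration, and are not part of any Office365Mon API. Keep in mind that some networks block outbound port 25, which would make every host look unreachable.

```python
import socket


def tcp_probe(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers DNS lookup failures, refused connections, and timeouts.
        return False


def find_unreachable(mx_hosts, probe=tcp_probe):
    """Return the hosts in mx_hosts that the probe could not reach, in order.

    probe is injectable so the logic can be exercised without a network.
    """
    return [host for host in mx_hosts if not probe(host)]
```

You would call it with your own hosts, for example `find_unreachable(["mail1.example.com", "oldmail.example.com"])` (both hostnames are placeholders). Any host it returns is worth comparing against your DNS records to see whether the MX entry is still needed.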
The Monthly Email Transport Errors report helps you stay on top of this by presenting a bar chart with a count of each type of error, so you can see how frequently each type of error is occurring in your tenant each month. Here’s an example of that:
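The aggregation behind a chart like this is easy to picture in code. The sketch below is not how Office365Mon builds the report; it simply assumes you have a list of (timestamp, error type) pairs, with made-up sample values, and counts how many times each error type occurred in each month.

```python
from collections import Counter
from datetime import datetime


def monthly_error_counts(errors):
    """Count errors per (month, error type).

    errors is an iterable of (timestamp, error_type) pairs, where timestamp
    is a datetime. Returns a Counter keyed by ("YYYY-MM", error_type).
    """
    counts = Counter()
    for timestamp, error_type in errors:
        counts[(timestamp.strftime("%Y-%m"), error_type)] += 1
    return counts


# Made-up sample data for illustration only.
sample = [
    (datetime(2017, 5, 2, 9, 15), "Unauthorized"),
    (datetime(2017, 5, 9, 14, 0), "Unauthorized"),
    (datetime(2017, 5, 20, 8, 30), "Outbound marked as spam"),
    (datetime(2017, 6, 1, 11, 45), "Service unavailable"),
]

# Print one line per month and error type, sorted by month and then type.
for (month, error_type), count in sorted(monthly_error_counts(sample).items()):
    print(month, error_type, count)
```

Each printed line corresponds to one bar in a monthly chart: a month, an error type, and how often it happened.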
At Office365Mon our position is you can never have too much good information. Based on the early results of this reporting, we’re already seeing good outcomes and actionable data for our customers. If you aren’t set up for Office 365 monitoring yet, please visit our web site at https://www.office365mon.com to get started. Once you’ve configured basic monitoring, you can turn on email transport monitoring at https://www.office365mon.com/Configure/EmailTransport to get these reports yourself. We’ll be continuing to mine through the data and expand out the collection of common errors as we see them.
As always, please feel free to send us your feedback at email@example.com and thank you for all of the great ideas you’ve sent us already.
From Sunny Phoenix,