Troubleshooting Exchange Event ID 4002 from MSExchange Availability.

This blog post is about a strange incident I had with a fresh Exchange 2016 two-node DAG.

The environment was virtual, running on VMware. It was a customer case where I was hired to migrate from a working Exchange 2013 environment to a new Exchange 2016 deployment. The customer had a relatively simple setup with a single AD site and nothing more.

I installed the new Exchange servers and configured the environment accordingly, setting up the DAG, configuring mail flow and so on. I proceeded with the pilot users and did some testing to confirm the environment was OK. Everything checked out, the customer moved all users from Exchange 2013 to 2016, and the 2013 servers were decommissioned without incident.

After a couple of months we suddenly experienced free-busy problems. Users with a mailbox on one node were not able to see free-busy information for users on the other node. This started happening out of the blue, with no changes having been made in the environment. We also started to see Event ID 4002 in the logs on the server trying to do the free-busy lookup:

Process 17932: ProxyWebRequest CrossSite from S-1-5-21-1409082233-1343024091-725345543-35887 to https://dagmember02.domain.com:444/EWS/Exchange.asmx failed. Caller SIDs: NetworkCredentials. The exception returned is Microsoft.Exchange.InfoWorker.Common.Availability.ProxyWebRequestProcessingException: Proxy web request failed. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult)
   at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
   --- End of inner exception stack trace ---
   at System.Net.TlsStream.EndWrite(IAsyncResult asyncResult)
   at System.Net.ConnectStream.WriteHeadersCallback(IAsyncResult ar)
   --- End of inner exception stack trace ---
   at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
   at Microsoft.Exchange.InfoWorker.Common.Availability.Proxy.RestService.EndGetUserPhoto(IAsyncResult asyncResult)
   at Microsoft.Exchange.InfoWorker.Common.UserPhotos.UserPhotoApplication.EndProxyWebRequest(ProxyWebRequest proxyWebRequest, QueryList queryList, IService service, IAsyncResult asyncResult)
   at Microsoft.Exchange.InfoWorker.Common.Availability.ProxyWebRequest.EndInvoke(IAsyncResult asyncResult)
   at Microsoft.Exchange.InfoWorker.Common.Availability.AsyncWebRequest.EndInvokeWithErrorHandling()
   --- End of inner exception stack trace ---
. Name of the server where exception originated: dagmember01. LID: 43532. Make sure that the Active Directory site/forest that contain the user's mailbox has at least one local Exchange 2010 server running the Availability service. Turn up logging for the Availability service and test basic network connectivity.
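
The event text itself suggests testing basic network connectivity. A quick first check from the failing server could look like this in PowerShell (hostname taken from the event above; port 444 is the Exchange back end site in IIS):

    # Can dagmember01 reach EWS on the back end of dagmember02?
    Test-NetConnection -ComputerName dagmember02.domain.com -Port 444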
I immediately started to search for a possible solution to this strange behaviour, but there was no working solution to be found anywhere (I read through most of the posts on this event ID on the internet). All other services in the Exchange environment were working fine, and there were no other error messages in the logs indicating something was wrong; just this Event ID 4002 from time to time when people were trying to add someone to a meeting using the Scheduling Assistant.
After quite a while of research, and after asking a couple of colleagues, the solution suddenly appeared.

A colleague of mine asked me whether or not the customer used templates to create the VMs. After checking this with the customer, we could confirm it. He told me that he had seen similar strange behaviour on Exchange 2007 some years ago, and asked me to check whether the servers had unique SIDs. I did, and discovered that both of the new Exchange 2016 servers had identical machine SIDs. The tool I used was PsGetSid from Microsoft Sysinternals.
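
For reference, the check looks roughly like this (server names as in the event above; PsGetSid needs admin rights on the remote machines):

    # Query the machine SID of each DAG member; the two SIDs must differ
    psgetsid.exe \\dagmember01
    psgetsid.exe \\dagmember02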
It turned out that the servers had been created from a template in VMware without being sysprepped. After removing one of the servers and reinstalling it, everything started working fine again.
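
As a general precaution: if you build servers from a VMware template, generalize the template first so every clone gets a new machine SID. Run something along these lines inside the VM before converting it to a template:

    # /generalize resets the machine SID (among other things) on the next boot
    C:\Windows\System32\Sysprep\sysprep.exe /generalize /oobe /shutdown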
Bottom line is:
If your Exchange servers start acting weird and there doesn't seem to be a logical explanation for the problem, check the server SIDs. They have to be unique, or strange things can start to happen in your environment. In my case there was no obvious reason for the problems that suddenly appeared, and the server setup had been made in good faith 🙂
This might be a noob mistake, but I can imagine that someone other than me has experienced this or other strange problems with no logical explanation, so I think the tip could be useful when everything else leads nowhere.
The weird part here is that the servers functioned 100% OK for a couple of months before the problems started. I've never experienced this before, so for all I know that's just how Exchange handles this kind of misconfiguration?

Certificate missing private key.

When dealing with certificates, you sometimes run into a certificate that does not have a private key assigned to it.

In Lync, for instance, it's not possible to assign the certificate to any services when the private key is missing. The solution to this problem is rather simple and well documented on Microsoft TechNet, but I still chose to write a post about it in case someone stumbles across it and finds it useful.

Import the certificate in the MMC Certificates snap-in as you would with any other certificate for the computer account. The certificate shows up in the Personal certificate store. Then double-click the certificate in the Personal view and select the Details tab.

[Screenshot: certificate properties, Details tab, showing the serial number]

  • Copy the serial number from the certificate properties.
  • Start a command prompt with elevated rights and run the following command (see the worked example below):
    certutil.exe -repairstore my "serialnumber of the certificate"
  • Refresh the Personal certificates view, and you will see that the certificate has now been assigned a private key.
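
As a worked example, run from an elevated PowerShell prompt with a hypothetical serial number of 1a2b3c4d5e6f7890, the session would look like this:

    # Inspect the certificate first; a healthy entry lists a key container
    certutil.exe -store my "1a2b3c4d5e6f7890"

    # Re-link the certificate to its private key in the local machine store
    certutil.exe -repairstore my "1a2b3c4d5e6f7890"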

Ready to go.

Update:

Just to make it clear, as correctly pointed out by Lasse in the comments: it's not possible to restore a private key to a certificate without actually having the private key in your certificate store. The repairstore command only re-links an existing key to the certificate; it can't conjure one up.


Exchange 2013: Inbound mailflow suddenly stops.

I recently came across a problem with Exchange Server 2013 that I'd like to share with you.

I installed an Exchange 2013 CU2 server in a migration scenario, migrating from Exchange 2007. The servers coexisted during the mailbox migration, and I verified that mail routing was working as expected with the mailboxes residing on the new Exchange 2013 server. All good so far.

I proceeded with the migration and moved/recreated all connectors on the new server before disabling them on the old one. Everything seemed to work fine for a while, but then the server suddenly stopped responding to incoming SMTP mail. There was nothing in the logs indicating anything wrong with the server, but every once in a while it would stop accepting inbound SMTP, and a reboot (or a restart of the Exchange Transport service / Information Store service) was the only way to get it back on track.

After I had spent quite some time troubleshooting the issue, a colleague of mine pointed me in the direction of the receive connectors. It turned out I had missed an important detail when I created the internal SMTP receive connector.

[Screenshot: Exchange 2013 receive connectors correctly configured]

As Paul Cunningham points out in this blog post on SMTP relaying, if the server is a multi-role server the connector has to be created with the FrontendTransport role instead of the HubTransport role I had been using. My mistake… (though the post doesn't point out that failing to create the connector with the correct role makes everything else fail as well).
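
A quick way to spot the misconfiguration is to list the connectors and their transport roles from the Exchange Management Shell (the server name here is just an example):

    # A custom connector with TransportRole "HubTransport" on a
    # multi-role server is the red flag described above
    Get-ReceiveConnector -Server EX2013 | Format-Table Name, TransportRole, Bindings, RemoteIPRanges -AutoSize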

So, with this new information in mind, I recreated the internal SMTP routing connector as a FrontendTransport connector with the same IP addresses the old one had. I restarted the Transport service and the Information Store and watched as e-mail started flowing in.
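
The recreated connector looked roughly like this; the connector name, server name and IP range below are examples, not the customer's actual values:

    # Create the relay connector on the frontend transport, not the hub transport
    New-ReceiveConnector -Name "Internal SMTP Relay" -Server EX2013 `
        -TransportRole FrontendTransport -Usage Custom `
        -Bindings 0.0.0.0:25 -RemoteIPRanges 192.168.10.0/24

    # Restart transport so the change takes effect immediately
    Restart-Service MSExchangeTransport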

It turned out that even though the connector in question was only meant for internal mail routing, the fact that it had been created as a custom HubTransport connector caused a total failure of all the default connectors as well.

PS: This “bug” applies only to Exchange Server 2013 CU2. In CU1 this is not a problem as far as I know.