Troubleshooting Exchange Event ID 4002 from MSExchange Avalability.

This blogpost is about a strange incident I had with a fresh Exchange 2016 two node DAG.

The environment is a virtual VmWare environment. The case was a customer case where I was hired to migrate from a working Exchange 2013 environment to a new Exchange 2016 deployment. The customer had a relatively simple setup with a single AD site and nothing more.

I installed the new Exchange servers, and configured the environment accordingly setting up DAG and configuring mailflow etc. Proceeded with the pilot users and did some testing to confirm the environment was Ok. Everything checked out Ok, and the customer moved all users from Exchange 2013 to 2016. The 2013 servers were decommissioned and everything was Ok.

After a couple of months we suddenly experienced free-busy problems. The users with mailbox on one node would not be able to see free-busy from users on the other node.. This started happening out of the blue without no changes being done in the environment. We also started to see Event ID 4002 in the server logs on the server trying to do free-busy lookup.

Process 17932: ProxyWebRequest CrossSite from S-1-5-21-1409082233-1343024091-725345543-35887 to https://dagmember02.domain.com:444/EWS/Exchange.asmx failed. Caller SIDs: NetworkCredentials. The exception returned is Microsoft.Exchange.InfoWorker.Common.Availability.ProxyWebRequestProcessingException: Proxy web request failed. —> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. —> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. —> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult)
at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
— End of inner exception stack trace —
at System.Net.TlsStream.EndWrite(IAsyncResult asyncResult)
at System.Net.ConnectStream.WriteHeadersCallback(IAsyncResult ar)
— End of inner exception stack trace —
at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
at Microsoft.Exchange.InfoWorker.Common.Availability.Proxy.RestService.EndGetUserPhoto(IAsyncResult asyncResult)
at Microsoft.Exchange.InfoWorker.Common.UserPhotos.UserPhotoApplication.EndProxyWebRequest(ProxyWebRequest proxyWebRequest, QueryList queryList, IService service, IAsyncResult asyncResult)
at Microsoft.Exchange.InfoWorker.Common.Availability.ProxyWebRequest.EndInvoke(IAsyncResult asyncResult)
at Microsoft.Exchange.InfoWorker.Common.Availability.AsyncWebRequest.EndInvokeWithErrorHandling()
— End of inner exception stack trace —
. Name of the server where exception originated: dagmember01. LID: 43532. Make sure that the Active Directory site/forest that contain the user’s mailbox has at least one local Exchange 2010 server running the Availability service. Turn up logging for the Availability service and test basic network connectivity.
I immediately started to search for a possible solution to this strange behaviour, but there was no working solution to be found anywhere(read through most of the posts on this event ID on the internet :)) All other services on the Exchange environment were working fine, there were no other error messages in the logs indicating something wrong, just this Event ID 4002 from time to time when people where trying to add someone to a meeting using Scheduling assistant.
After quite a while of research and asking a couple of colleagues, the solution suddenly appeared.

A colleague of mine asked me wether or not the customer used templates to create the VM’s. After checking this with the customer, we could confirm this. He then told me that he had been experiencing some strange similar behaviour on Exchange 2007 some years ago, and asked me to check if the servers had unique SID’s. I did so, and discovered that both the new Exchange 2016 servers had identical SID’s. The tool used was the PsGetSID from Microsoft Sysinternals.
Turned out that the servers were created from a template in VmWare and not sysprepped. After removing one of the servers and reinstalling it, everything started working fine again.
Bottom line is:
If your Exchange servers start acting weird and there doesn’t seem to be a logical explanation to the problem, check the server SID’s. They have to be unique, or obviously, strange things can start to happen in your environment. In my case, there was no obvious reason to the problems that suddenly started to appear and the server setup was made in good faith 🙂
This might bee a noob fault, but I can imagine that someone else but me would have experienced this or other strange problems with no logical explanation, so I think the tip would be useful in case everything else leads nowhere.
The weird part here is the servers functioning 100% ok for a couple of months before the problems started. I’ve never experienced this before, so for all I know that’s how Exchange handles this kind of misconfiguration?