Sitecore xConnect: Sync Token is no longer valid error - Part 2

Hi, I'm Sumit
This is my blog on Sitecore, .Net and Azure
Back in 2020, I encountered this nagging error "Sync token is no longer valid for [Contacts] table" locally and I blogged about it here. The solution mentioned in my earlier blog is for local development and is a rather simple one (delete table rows and restart the index worker service).
Recently, I bumped into this issue once more, but now in a Production environment. As the earlier solution was to delete table rows, it cannot be applied in this case. This gave me the opportunity (and the necessary pressure) to do more holistic research.
The Error message:
This is the error message that is logged into the Index worker service logs:
2023-10-16 00:00:09.203 +02:00 [Error]
The attempt to recover from previous failure has not been successful.
There will be another attempt.
Attempts count: 59591
Sitecore.Xdb.Collection.Data.SqlServer.Exceptions.SqlDataProviderException:
*** [xdb_collection.GetContactsChanges], Line 26. Errno 50000: Sync token
is no longer valid for [Contacts] table.
---> System.Data.SqlClient.SqlException:
*** [xdb_collection.GetContactsChanges], Line 26. Errno 50000: Sync token
is no longer valid for [Contacts] table.
at System.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__180_0
(Task`1 result)
at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.
InnerInvoke()
at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
From the error message, we can see that the exception stems from SqlDataProviderException and that there's a reference to GetContactsChanges. To know more about this exception, I decided to decompile the Sitecore.Xdb.Collection.Data.SqlServer dll which could be found at the path C:\inetpub\wwwroot\[xconnect]\bin
After opening the Sitecore.Xdb.Collection.Data.SqlServer dll in dotpeek, I searched for GetContactsChanges and found it to be a SQL Stored Procedure as can be seen below:

The next obvious thing to do was to open SQL Server and go look for this Stored Procedure. I found the Stored Procedure in both the Shard Databases and also encountered the error message that is being logged

The Stored Procedure Analysis:
A brief analysis of what's going on in the Stored Procedure(SP):
The Stored Procedure uses the concept of Change Tracking in SQL Server.
It takes in two parameters:
@NumberOfChangeVersionsand@SyncVersionIt calculates the Minimum Valid Version (
@MinValidVersion) for theContactsTable using the SQL Server System function CHANGE_TRACKING_MIN_VALID_VERSIONThen it compares the calculated
@MinValidVersionwith the@SyncVersionvalue it received as a parameter. If@SyncVersionturns out to be less than@MinValidVersionthen the processing is aborted and we get the Sync Token no longer valid errorIf it manages to pass through this then it calculates the
@CurrentVersionusing the CHANGE_TRACKING_CURRENT_VERSION System function and again compares the value with the@SyncVersion. If the@CurrentVersionis less than@SyncVersionthen "Current sync version for the [Contacts] table is less than sync token." error is logged and again further processing is stopped.The
@UpBoundChangeVersionvariable is initialized based on the value of the@SyncVersionparameter. If@SyncVersionisNULL, it will be the@MinValidVersionplus@NumberOfChangeVersions. Otherwise, it'll be@SyncVersionplus@NumberOfChangeVersions.It then selects from the changes of
Contactstable during the range of versions spanned by@SyncVersionand@UpBoundChangeVersionand orders the result set bySYS_CHANGE_VERSION.It retrieves and returns the current change tracking version.
It commits the transaction if no error occurs.
In case of a run-time error, the transaction rolls back and it gathers error details and raises an error with a specific message format.
Solution and the explanation:
I found the solution to this problem in Robbert Hock's blog. As Robbert explains, there are multiple solutions and they need to be tried out one by one if the earlier one is not successful.
The first thing to try is to trigger a rebuild of the xdb and xdb_rebuild indexes by issuing the command: Sitecore.XConnectSearchIndexer.exe -rr and restart the index worker service. If this doesn't work, we must also try to rebuild the List Manager index as mentioned by João Neto in his blog. If this also fails, the last resort would be to manually delete the xdb-index-token from the Solr Dashboard.
The reasons why these solutions work (apart from the reasons mentioned by both Robbert and João) is that when we either trigger a rebuild of the index or delete the index token or for that matter, the concerned tables are dropped, the sync version value is reset, and thus we overcome the discrepancy between the calculated sync version and value passed as parameter to the Stored Procedure.
Credits:
https://www.kayee.nl/2021/09/02/sitecore-xdb-index-token-issue-analysis-from-a-different-angle/
https://getsitecore.wordpress.com/2020/09/22/sitecore-xconnect-sync-token-is-no-longer-valid-error/




