Objects continuously synchronizing after partition merge

(Last modified: 06Jan2005)

This document (10069194) is provided subject to the disclaimer at the end of this document.

fact

Novell Directory Services

Novell Directory Services 7

Novell Directory Services 8

Novell eDirectory 8.5 for All Platforms

Novell eDirectory 8.6 for All Platforms

symptom

Objects continuously synchronizing after partition merge

Objects are identical on the different replicas and do not need to be synchronized.

Objects that reside within the boundary of the old child partition synchronize over and over. In other words, the affected objects are those in the partition that was merged into its parent partition.

The problem occurs regardless of which DS versions are involved (DS6 was not tested).

cause

The cause is that current versions of DS.NLM update the Transitive Vector (TV) of the parent partition only when a replica number in the TV of the child partition does not exist in the parent's TV at the time the two partitions are merged. If a replica number exists in both TVs, the parent's time stamp is not updated, even when the child partition's time stamp for that replica is newer than the parent's. If the replica number in question is no longer active (the replica no longer exists) and survives in the TV only so that existing time stamps can be referenced, this time stamp will never be updated, even though the partition now holds objects whose time stamps for that replica number are newer than the TV entry. Because the synchronization algorithm compares the time stamps of the objects with those in the TV, these objects are synchronized over and over again.
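
The merge behaviour described above can be modelled in a few lines. The following Python sketch is illustrative only: it assumes a Transitive Vector can be represented as a dictionary mapping replica number to a (seconds, event) pair, which is not how DS.NLM stores it, and the function names merge_tv_old and merge_tv_fixed are hypothetical. The "fixed" variant simply keeps the newer time stamp for each replica number; this document does not describe how the patch actually implements the correction.

```python
from typing import Dict, Tuple

# Hypothetical, simplified model: replica number -> (seconds, event).
Timestamp = Tuple[int, int]
TransitiveVector = Dict[int, Timestamp]


def merge_tv_old(parent: TransitiveVector, child: TransitiveVector) -> TransitiveVector:
    """Behaviour described for the unpatched DS.NLM: only replica numbers
    missing from the parent's TV are copied from the child's TV."""
    merged = dict(parent)
    for replica, ts in child.items():
        if replica not in merged:
            merged[replica] = ts
        # An existing entry is left alone, even if the child's time stamp is newer.
    return merged


def merge_tv_fixed(parent: TransitiveVector, child: TransitiveVector) -> TransitiveVector:
    """Assumed corrected behaviour: keep the newer time stamp for every replica number."""
    merged = dict(parent)
    for replica, ts in child.items():
        if replica not in merged or ts > merged[replica]:
            merged[replica] = ts
    return merged


if __name__ == "__main__":
    parent = {1: (300, 3), 4: (100, 2)}   # replica 4 is stale in the parent
    child = {4: (500, 22), 8: (200, 1)}   # the child saw newer changes from replica 4
    print("old:  ", merge_tv_old(parent, child))
    print("fixed:", merge_tv_fixed(parent, child))
```

With the old merge, the entry for replica 4 stays at the parent's older value, so any object carrying a replica 4 time stamp between the two values keeps looking unsynchronized.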

fix

The problem has been resolved in eDirectory 8.6.2 with patch EDIR862SP3B.EXE (or later).

After applying the patch, run DSREPAIR -ANT on all servers in the replica ring at the same time to generate the new TV entries. This includes servers holding subordinate reference replicas, since they also hold a copy of the Transitive Vectors. If running DSREPAIR -ANT on all servers in the replica ring at the same time is not possible, disable Directory Services synchronization by entering SET DSTRACE=!D on every server in the replica ring until the DSREPAIR runs can be completed one server at a time.

You can also use the DSREPAIR option "Repair time stamps and declare a new epoch" for the partition in question. However, use this option only as a last resort, because it can be destructive. To repair time stamps and declare a new epoch, enter LOAD DSREPAIR -A on the server holding the master replica of the partition and select Advanced options menu | Replica and partition operations | (select the partition) | Repair time stamps and declare a new epoch.

note

Troubleshooting:

Enable the following DSTRACE filters on one of the servers and trace to a file.

SET DSTRACE=ON
SET DSTRACE=+S                (Enables Synchronization messages)
SET DSTRACE=+DETAIL
SET TTF=ON                    (Sends the trace output to a file)
SET DSTRACE=*M20000000        (Sets the maximum trace file size to 20 million bytes)
SET DSTRACE=*R                (Resets the trace file)
SET DSTRACE=*H                (Starts the heartbeat process immediately)

Wait for a couple of sync cycles to complete, to make sure you are not troubleshooting objects that are synchronizing legitimately, and then enter SET TTF=OFF.

Open the DSTRACE.DBG file in an editor and search for one of the objects you see synchronizing all the time. Below is an example:


3C90A679:226113976:d35933a0:209 skipping .NLS_LSP_NW51-DS7-2.Org.TREE., not in window
3C90A679:226113980:d35933a0:209 put value succeeded, .Back Link.[Attribute Definitions].[Schema Root] MTS [2002/03/14 10:57:15, 4, 22].
3C90A679:226113980:d35933a0:047 Sync - [00008090] <.O2User1.Org.TREE> [2002/03/14  9:46:06, 1, 1].


In the log above, "skipping .NLS_LSP_NW51-DS7-2.Org.TREE., not in window" on the first line means that the object does not need to be synchronized, because its modification time stamp (MTS) is older than the entry in the Transitive Vector for the replica on which the change was performed.
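
The "not in window" decision can be sketched as follows, using the simplified rule stated above: a value whose MTS is not newer than the TV entry for its replica is skipped. The Timestamp type, the TV layout and the function name in_window are hypothetical, and the real synchronization algorithm involves more than this single comparison.

```python
from datetime import datetime
from typing import Dict, NamedTuple, Tuple


class Timestamp(NamedTuple):
    """Simplified time stamp: second of the event, replica number, event counter."""
    when: datetime
    replica: int
    event: int


# Hypothetical TV layout: replica number -> (time, event).
TransitiveVector = Dict[int, Tuple[datetime, int]]


def in_window(mts: Timestamp, tv: TransitiveVector) -> bool:
    """Return True if the value still has to be sent, i.e. its MTS is newer than
    the TV entry for the replica on which the change was made."""
    seen = tv.get(mts.replica)
    if seen is None:
        # Assumption: with no TV entry for this replica, treat the change as unseen.
        return True
    return (mts.when, mts.event) > seen


if __name__ == "__main__":
    tv = {4: (datetime(2001, 1, 11, 9, 48, 10), 2)}
    backlink = Timestamp(datetime(2002, 3, 14, 10, 57, 15), 4, 22)
    print("send" if in_window(backlink, tv) else "skipping, not in window")
```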

In the DSTRACE file, look for lines containing "put value succeeded, <attribute name in schema> MTS <modification time stamp>".

With +DETAIL enabled, a DSTRACE file lists the attributes being synchronized before the object itself. In the example above, a Back Link value of the object O2User1 is being sent with an MTS of 2002/03/14 10:57:15, 4, 22. The most interesting part of the MTS is the first number after the time stamp itself, which is the replica number of the server on which the change was stored, in this case 4.
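
To pick these lines out of a large DSTRACE.DBG file, a small script can pair each "put value succeeded" line with the object named on the Sync line that follows it. This Python sketch assumes the line layout of the sample above; the regular expressions and the parse_trace function are hypothetical and may need adjusting for other DS versions.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple

# Patterns derived from the sample trace lines above (an assumption, not a spec).
PUT_VALUE = re.compile(
    r"put value succeeded, (?P<attr>.+?) MTS "
    r"\[(?P<stamp>\d{4}/\d{2}/\d{2}\s+\d{1,2}:\d{2}:\d{2}), (?P<replica>\d+), (?P<event>\d+)\]"
)
SYNC_OBJECT = re.compile(r"Sync - \[[0-9A-Fa-f]+\] <(?P<obj>[^>]+)>")


class SentValue(NamedTuple):
    obj: str
    attribute: str
    when: datetime
    replica: int
    event: int


def parse_trace(lines: Iterable[str]) -> List[SentValue]:
    """Attach each 'put value succeeded' line to the object on the next Sync line,
    since with +DETAIL the attribute values are logged before the object."""
    pending = []
    result = []
    for line in lines:
        m = PUT_VALUE.search(line)
        if m:
            # ".Back Link.[Attribute Definitions].[Schema Root]" -> "Back Link"
            attr = m.group("attr").lstrip(".").split(".")[0]
            stamp = " ".join(m.group("stamp").split())  # collapse double spaces
            when = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
            pending.append((attr, when, int(m.group("replica")), int(m.group("event"))))
            continue
        m = SYNC_OBJECT.search(line)
        if m:
            result.extend(SentValue(m.group("obj"), *p) for p in pending)
            pending = []
    return result


if __name__ == "__main__":
    with open("DSTRACE.DBG", errors="replace") as f:
        for v in parse_trace(f):
            print(v.obj, v.attribute, v.when, "replica", v.replica)
```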

Then load DSBROWSE.NLM, select Partition Browse, press <F3> on the partition in question, and select View Attributes.
Press <Enter> 5 times on Transitive Vector (TV) to display the server's own TV, which in our case looks like this:
Replica 00001:  3-15-2002   1:10:19 pm, Event 00003
Replica 00002:  3-15-2002   1:10:19 pm, Event 00002
Replica 00003:  3-13-2002  10:20:30 pm, Event 00059
Replica 00004:  1-11-2001   9:48:10 am, Event 00002
Replica 00005:  3-14-2002   9:47:44 am, Event 00001
Replica 00008:  3-14-2002  10:18:43 am, Event 00001

As you can see, replica number 4 has a time stamp of 1-11-2001 9:48:10 am, whereas the Back Link value being synchronized on the object investigated in the trace file (O2User1) has a time stamp of 2002/03/14 10:57:15, which is later than the TV entry for replica number 4. Because that TV entry is never updated, the value is sent again on every sync cycle.
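
This comparison can be automated by parsing the DSBROWSE Transitive Vector display and checking it against MTS values collected from the trace. The sketch below assumes the display format shown above (month-day-year, 12-hour clock); parse_tv and stale_entries are hypothetical helpers, and comparing (time, event) pairs is a simplification of what DS actually does.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Matches the DSBROWSE TV lines shown above (format assumed from that sample).
TV_LINE = re.compile(
    r"Replica (?P<replica>\d+):\s+(?P<date>\d{1,2}-\d{1,2}-\d{4})\s+"
    r"(?P<time>\d{1,2}:\d{2}:\d{2})\s+(?P<ampm>[ap]m), Event (?P<event>\d+)",
    re.IGNORECASE,
)


def parse_tv(lines: Iterable[str]) -> Dict[int, Tuple[datetime, int]]:
    """Build a replica number -> (time, event) map from DSBROWSE output."""
    tv = {}
    for line in lines:
        m = TV_LINE.search(line)
        if m:
            when = datetime.strptime(
                f"{m.group('date')} {m.group('time')} {m.group('ampm').upper()}",
                "%m-%d-%Y %I:%M:%S %p",
            )
            tv[int(m.group("replica"))] = (when, int(m.group("event")))
    return tv


def stale_entries(
    tv: Dict[int, Tuple[datetime, int]],
    mts_values: Iterable[Tuple[datetime, int, int]],
) -> List[int]:
    """Return replica numbers whose TV entry is older than an MTS seen in the trace.
    mts_values holds (time, replica, event) tuples from 'put value succeeded' lines."""
    stale = set()
    for when, replica, event in mts_values:
        entry = tv.get(replica)
        if entry is not None and (when, event) > entry:
            stale.add(replica)
    return sorted(stale)
```

For the example in this document, feeding in the TV display and the Back Link MTS (2002/03/14 10:57:15, replica 4, event 22) would flag replica number 4.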

About entries in the TV table for non-existing replicas.
When an object is created in eDirectory, it gets a creation timestamp (CTS). The CTS, like any other timestamp in eDirectory, contains the absolute time of the event to the second, the replica number of the server where the event occurred, and the event number within that second on that particular server. For example, a timestamp of 2002/03/14 10:57:15, 4, 22 shows that the event occurred at 10:57:15 on 14 March 2002 on replica #4, and that it was the 22nd event within that second on that server. The same applies to modification timestamps (MTS).
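
The timestamp layout just described can be modelled as an immutable record. In this Python sketch, DSTimestamp and its parse method are hypothetical; they read the textual form printed by DSTRACE and are not an eDirectory API. Making the record frozen lets instances be placed in a set, which is a convenient way to check the uniqueness requirement.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Textual form as printed in DSTRACE, e.g. "2002/03/14 10:57:15, 4, 22".
STAMP = re.compile(r"(\d{4}/\d{2}/\d{2}\s+\d{1,2}:\d{2}:\d{2}),\s*(\d+),\s*(\d+)")


@dataclass(frozen=True)
class DSTimestamp:
    """Illustrative CTS/MTS model: whole second, replica number, event in that second."""
    when: datetime
    replica: int
    event: int

    @classmethod
    def parse(cls, text: str) -> "DSTimestamp":
        m = STAMP.fullmatch(text.strip())
        if m is None:
            raise ValueError(f"not a timestamp: {text!r}")
        when = datetime.strptime(" ".join(m.group(1).split()), "%Y/%m/%d %H:%M:%S")
        return cls(when, int(m.group(2)), int(m.group(3)))

    def describe(self) -> str:
        return (f"event {self.event} within {self.when:%Y-%m-%d %H:%M:%S} "
                f"on replica #{self.replica}")


if __name__ == "__main__":
    ts = DSTimestamp.parse("2002/03/14 10:57:15, 4, 22")
    print(ts.describe())
```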
These timestamps must be unique; they are crucial to the operation of eDirectory, and many background processes within eDirectory rely on them. A CTS cannot be changed, because the processes that rely on it would break. For database integrity, the TV of a partition must always contain an entry for every replica number that appears in the CTS or MTS of any object in that partition.
So even though a replica is deleted at some point, its entry in the TV will stay as long as the partition holds a CTS/MTS with that replica number.
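
The integrity rule above can be expressed as a simple set check. The sketch assumes you have already collected, for each object, the replica numbers found in its CTS and MTS values; check_tv and its inputs are hypothetical. It reports replica numbers missing from the TV (an integrity problem) and TV entries kept only for referencing, that is, for replicas that are no longer active.

```python
from typing import Dict, Iterable, Set, Tuple


def check_tv(
    tv_replicas: Set[int],
    active_replicas: Set[int],
    object_replicas: Dict[str, Iterable[int]],
) -> Tuple[Set[int], Set[int]]:
    """Return (missing, reference_only) for one partition.

    missing:        replica numbers used in some CTS/MTS but absent from the TV,
                    which would violate the integrity rule.
    reference_only: TV entries for replicas that no longer exist but must stay
                    because some CTS/MTS still carries their replica number.
    """
    referenced: Set[int] = set()
    for replicas in object_replicas.values():
        referenced.update(replicas)
    missing = referenced - tv_replicas
    reference_only = (tv_replicas & referenced) - active_replicas
    return missing, reference_only


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    tv = {1, 2, 3, 4, 5, 8}
    active = {1, 2, 3, 5, 8}
    stamps = {".O2User1.Org.TREE": [1, 4], ".Other.Org.TREE": [2, 9]}
    print(check_tv(tv, active, stamps))
```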

document

Document Title: Objects continuously synchronizing after partition merge
Document ID: 10069194
Solution ID: NOVL74346
Creation Date: 15Mar2002
Modified Date: 06Jan2005
Novell Product Class: NetWare
Novell eDirectory

disclaimer

The Origin of this information may be internal or external to Novell. Novell makes all reasonable efforts to verify this information. However, the information provided in this document is for your information only. Novell makes no explicit or implied claims to the validity of this information.
Any trademarks referenced in this document are the property of their respective owners. Consult your product manuals for complete trademark information.