Steps to fix an inconsistent replica ring

  • 7004722
  • 22-Oct-2009
  • 24-Apr-2017

Environment

Novell Operating Systems
Novell eDirectory
Novell Directory Services

Situation

The number of objects differs between replicas held by servers in the same replica ring.
Error "-672" is reported by the Report Synchronization Status check in DSREPAIR or seen in DSTRACE.
Error "-626" is reported by the Report Synchronization Status check in DSREPAIR or seen in DSTRACE.
Objects in a partition have been renamed (#_#) because of object collisions caused by inconsistencies between the replicas on servers in the replica ring.
Server removed from replica ring.

Resolution

This process will help to ensure that all servers in your replica ring hold the same objects.
1.  Using NDS Manager, ConsoleOne, iManager or DSREPAIR, write down all servers in the replica ring.  
 
NetWare
Load DSREPAIR | Advanced Options | Replica and partition operations | <Partition in question> | View Replica ring.

 
Linux
ndsrepair -P | <# for Partition in question> | 10. View Replica ring.
 

2.  Run a simultaneous repair with a -p switch on all servers in the replica ring.  

Using your list, rconsole to each server and start a repair using the command below for that server's platform.  Once the repair is started on the first server in the replica ring, go to the next server and start the repair there.  Once the repair is running on all servers in the replica ring, go back to the first server, save the repaired database and exit the repair, then do the same on each remaining server.

NetWare
Load DSREPAIR -P | Advanced Options | Repair local database | Use the default options, but also enable Lock Local Database, then start the repair.
 
Linux
ndsrepair -R -l yes
 
NOTE:  This process sets any Unknown objects in the replica ring to a Reference state.  In the Reference state, these objects will be overwritten if a valid object is sent to that server, and the server will not synchronize the unknown object out to other servers in the replica ring.
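
The ordering in step 2 can be written out as a checklist before you begin. The following Python sketch is only an illustration: the server names are hypothetical, and the function produces instructions for an operator rather than running any repair itself.

```python
def repair_checklist(servers):
    """Return the ordered operator steps for the simultaneous locked repair."""
    steps = []
    # Phase 1: start the repair on every server, in list order.
    for name in servers:
        steps.append(f"Start locked repair on {name}")
    # Phase 2: only after every repair is running, go back to the first
    # server, then save the repaired database and exit on each server.
    for name in servers:
        steps.append(f"Save repaired database and exit repair on {name}")
    return steps


# Hypothetical ring members, as written down in step 1.
for number, step in enumerate(repair_checklist(["FS1", "FS2", "FS3"]), 1):
    print(f"{number}. {step}")
```

The point of the two phases is that no server saves and exits until every server in the ring has its repair running.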

3.  On each server in the replica ring, perform a "Send all objects to every replica in the ring" operation, one server at a time.

Using your list, rconsole to each server and start the Send all process with the following:
 
NetWare
Load DSREPAIR | Advanced options | Replica and partition operations | <Partition in question> | View Replica ring | <Select the server you are rconsoled into> | Send all objects to every replica in the ring
 
Linux
ndsrepair -P | <# for Partition in question> | 10. View Replica ring | <# for Replica in your list> | 3. Send all objects to every replica in the ring.
 
 
NOTE:  Send all objects to every replica in the ring does exactly what it says: it sends all objects in that server's replica to every other server in the replica ring.  Each receiving server will either a) discard the received information because it already has the object, b) add the object to its database because it did not previously have it, or c) overwrite an unknown object with the valid object it just received.  This process will generate a LOT OF TRAFFIC, so it is advisable to perform the SEND ALL operation after hours and to wait for it to complete on each server before starting it on the next server.  You can tell when it has completed by watching DSTRACE or NDSTRACE on the server from which you are sending the objects.
 
NetWare
SET DSTRACE=OFF, SET DSTRACE=+S, SET DSTRACE=*h, then toggle to the Directory Services trace screen.  You should see all the objects being sent out from that server.
 
Linux
ndstrace, then (at the ndstrace prompt) set ndstrace = nodebug, set ndstrace = sklk, set ndstrace=*h.  You should see the objects being sent out from the server.
 
Once the server has finished sending all the objects out, you will start seeing an ALL PROCESSED=YES message for the partition.
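
If you capture the trace screen output to text, the completion message can be checked for programmatically. The sketch below is a minimal illustration in Python. It assumes only that the captured text contains the ALL PROCESSED=YES message mentioned above. The sample lines are made up for the example and are not real trace output.

```python
def send_all_finished(trace_lines):
    """Return True once any captured line contains ALL PROCESSED=YES."""
    # Compare without regard to letter case or spacing around "=".
    marker = "ALLPROCESSED=YES"
    for line in trace_lines:
        normalized = "".join(line.upper().split())
        if marker in normalized:
            return True
    return False


# Made-up sample text, for illustration only.
captured = [
    "sending objects for the partition",
    "All processed = YES",
]
print(send_all_finished(captured))
```

Wait for the message on the sending server before starting the Send all on the next server in your list.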

4.  The final step is to view the objects in the partition and delete any unknown, renamed or other objects in the partition that should not be there.

This process will help prevent renamed objects (1_1, #_#) in that partition and help to ensure that all servers in the replica ring have exactly the same information in their replica of the partition.
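
To see why running Send all from each server in turn brings the ring into agreement, the three receive outcomes from the Send all note (discard, add, overwrite unknown) can be modeled in a few lines of Python. This is a toy model under stated assumptions, not eDirectory's actual implementation: replicas are plain dictionaries, the server and object names are hypothetical, and unknown objects are never sent out, as described in the note for step 2.

```python
UNKNOWN = "unknown"
VALID = "valid"


def receive(replica, incoming):
    """Apply objects received from one sender to a receiving replica."""
    for name, state in incoming.items():
        if name not in replica:
            replica[name] = state   # b) add an object it did not have
        elif replica[name] == UNKNOWN:
            replica[name] = state   # c) overwrite unknown with valid
        # a) otherwise discard: the receiver already has the object
    return replica


def send_all_from_each(ring):
    """Run a Send all from every server in the ring, one at a time."""
    for sender in list(ring):
        # Assumption from step 2: unknown objects are not sent out.
        outgoing = {n: s for n, s in ring[sender].items() if s == VALID}
        for receiver in ring:
            if receiver != sender:
                receive(ring[receiver], outgoing)
    return ring


# Hypothetical three-server ring with inconsistent replicas.
ring = {
    "FS1": {"alice": VALID, "bob": UNKNOWN},
    "FS2": {"bob": VALID},
    "FS3": {"alice": VALID, "carol": VALID},
}
send_all_from_each(ring)
for server, objects in ring.items():
    print(server, sorted(objects.items()))
```

In this model every replica ends up holding the same valid objects, which is the state the procedure aims for before you clean up leftover renamed objects by hand.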
 
 
Final Note:
This operation will add objects to servers that are missing them.  However, if the values of an object's attributes differ between servers, the operation may not send those values out to each server.  Verify with iMonitor that the information in question is correct on each server.  If a server is still missing attribute values, you have the following options.
 
1.  Use iMonitor with Advanced Mode enabled (click NDS iMonitor in the upper left-hand corner to enable Advanced Mode).  On the server whose copy of the object HAS all the attributes, or most of them, find the object and, under Advanced Options, select Timestamp Entry.  This will timestamp the object and all of its attributes and send them out.  You will then have to repeat the operation for the object on other replicas if they hold attributes this server did not have.
 
2.  Remove the replica from the server that is missing attribute information, rebacklink the server, wait for the backlink process to complete, and then add the replica back.
 
Rebacklink process
 
NetWare
SET DSTRACE = OFF, SET DSTRACE=+BLINK, SET DSTRACE=*B, then toggle to the Directory Services trace screen and wait for the backlink process to complete.   It will give you a message that it checked xxx backlinks.
 
Linux
ndstrace, then (at the ndstrace prompt) set ndstrace = nodebug, set ndstrace = bldt, set ndstrace=*b, then wait for the backlink process to complete.   It will give you a message that it checked xxx backlinks.

Additional Information

These problems are caused by some servers holding some, but not all, of the objects in the replica ring; that is, one server holds an object but other servers in the ring do not.

The numbers in a renamed object actually mean something: the first number (#_) is the instance of the rename, and the second number (_#) is the number of the replica that did the rename.   So if you look at the replica that created the rename, it should hold the "duplicate" user object as well.
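
As a small illustration, the two numbers can be pulled apart with a few lines of Python. This sketch assumes the renamed object's name consists only of the two numbers joined by an underscore, as in the 1_1 example above, and that either number may have more than one digit. The function name is made up for the example.

```python
import re

# Two runs of digits separated by an underscore, e.g. "1_2".
RENAME_PATTERN = re.compile(r"(\d+)_(\d+)")


def parse_collision_name(name):
    """Return (rename_instance, replica_number), or None if not a rename."""
    match = RENAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


instance, replica = parse_collision_name("1_2")
print(f"rename instance {instance}, renamed by replica number {replica}")
```

The replica number tells you which replica to inspect for the "duplicate" object.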
Formerly known as TID# 10054647
 
!!!!!WARNING!!!!!
We live in an environment with virtual servers.   Restoring images on virtual servers to a previous time frame WILL CAUSE MAJOR PROBLEMS IN YOUR REPLICA RINGS.   If an image is restored in a single-server tree, there are no material problems.   However, if you have several servers in the tree and restore a virtual server image, eDirectory WILL NOT know this and will NEVER try to fix the old information on the server you restored.   Once eDirectory sends out a change and confirms that the receiving server has the information, it will NEVER try to send that information to the receiving server again until the information changes again (that is, until the attribute timestamp is updated).
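
The effect of restoring an old image can be shown with a toy model in Python. This is a deliberately simplified sketch of the behavior described above, not eDirectory's actual synchronization code: the class, attribute names and timestamps are hypothetical, and a replica is modeled as a dictionary.

```python
class Sender:
    """Toy sender that never resends a change the receiver has confirmed."""

    def __init__(self):
        self.data = {}   # attribute -> (timestamp, value)
        self.acked = {}  # attribute -> timestamp the receiver confirmed

    def change(self, attr, timestamp, value):
        self.data[attr] = (timestamp, value)

    def sync_to(self, receiver):
        """Send only changes not yet confirmed; return what was sent."""
        sent = []
        for attr, (timestamp, value) in self.data.items():
            if self.acked.get(attr) == timestamp:
                continue  # already confirmed, so it is never sent again
            receiver[attr] = (timestamp, value)
            self.acked[attr] = timestamp
            sent.append(attr)
        return sent


sender = Sender()
receiver = {}
image = dict(receiver)           # image taken before the change arrives
sender.change("phone", 1, "555-0100")
print(sender.sync_to(receiver))  # the change is sent and confirmed
receiver = dict(image)           # restore the old image
print(sender.sync_to(receiver))  # nothing is sent to the restored server
print("phone" in receiver)       # the restored server stays out of date
```

In the model, the restored server only receives the attribute again after the sender records a newer timestamp for it.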
 
Use EXTREME CAUTION if you restore virtual servers to a prior state.   Novell DOES NOT recommend restoring virtual servers to prior states, but if you do, you MUST remove all replicas from the server IMMEDIATELY, rebacklink the server and then add the replicas back on the server you want.   This typically has to be done manually with the xk2 process, which is not covered in this document.