SunSolve Internal

Infodoc ID   Synopsis   Date
18254   How to avoid split brain syndrome and disk groups corruption?   16 Dec 1998

Description
How to avoid split brain syndrome and disk groups corruption?

* The latest version of FirstWatch provides a cleanup routine that rolls
back all the operations performed if an error occurs during a failover
operation.
I am not sure whether Qualix HA and QualixHA+ do this.  Based on what I
have read about the outages involving "partially completed" transitions,
a missing rollback seems to be what is happening.
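
The rollback idea can be sketched generically: remember an undo action for
each step that succeeds and, when a later step fails, run the undo actions
in reverse order. This is only a minimal Python sketch of the pattern, not
FirstWatch's actual cleanup routine; run_with_rollback and the step names
are hypothetical.

```python
from typing import Callable, List, Tuple

# A step is (name, do, undo); all three are supplied by the caller.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_rollback(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo the completed steps in reverse.

    Returns a log of what was done and undone.
    """
    log: List[str] = []
    done: List[Step] = []
    for step in steps:
        name, do, _undo = step
        try:
            do()
        except Exception as exc:
            log.append(f"failed {name}: {exc}")
            # Undo the most recent successful step first.
            for undo_name, _do, undo in reversed(done):
                undo()
                log.append(f"undid {undo_name}")
            return log
        done.append(step)
        log.append(f"did {name}")
    return log
```

In a failover script the steps might be "import disk group", "mount file
systems" and "start application", each paired with its inverse, so that a
failed transition does not leave the node half configured.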

* Avoid using "vxdg import" with the "-C" option from FirstWatch scripts.
Reserve it for cases where it is certain that no disk in the group is in
use by another host (for example, when the host ID is still set only
because the disk group was not cleanly deported).  In that case the "-C"
option can be used to clear the existing host ID on all disks in the disk
group as part of the import.  A host ID can also be cleared using
"vxdisk clearimport".
"vxdg import" with the "-C" option should only be used by an experienced
System Administrator.

FirstWatch does in fact use the -C option.  It is just very careful about
how and when it uses it.  Basically the user wants to make sure the other
system is DOWN.  If the disks have been cleanly deported, they will no
longer have the hostid lock on them.

* Extreme care must be taken when using "vxdg import" with the "-f" flag, since
it can cause the same disk group to be imported twice from disjoint sets of
disks, causing the disk group to become inconsistent.
Normally, a disk group will not be imported if some disks in the disk group
cannot be found by the local host. The "-f" option can be used to force
an import if, for example, one of the disks is currently unusable or
inaccessible.

The -f option forces the import of a disk group even if not all of its
disks are present, for example because one has failed.  The -f option is
not really that dangerous.
FW scripts try it BEFORE they do a -C because it is less dangerous.  It is
also less likely to accomplish anything.  If while a diskgroup was
deported, one of the disks was removed, a normal vxdg import would fail.
However, using the -f option would allow the diskgroup to be imported with
one or more missing disks.  If the missing disk was somehow protected (like
with mirroring), the volume would be startable.  If it was not protected,
that volume would be unstartable, but others might be fine.
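
The ordering described above (plain import first, then -f, and -C only as a
last resort) can be sketched as follows. This is a hedged Python
illustration, not the FirstWatch scripts themselves: import_attempts,
try_import and the peer_confirmed_down flag are hypothetical names, and how
the peer is confirmed to be down is left to the caller.

```python
import subprocess
from typing import Callable, List, Optional


def import_attempts(dgname: str, peer_confirmed_down: bool) -> List[List[str]]:
    """Return the vxdg command lines to try, least dangerous first."""
    attempts = [
        ["vxdg", "import", dgname],        # normal import
        ["vxdg", "-f", "import", dgname],  # tolerate missing or failed disks
    ]
    if peer_confirmed_down:
        # Clearing the host ID lock is only offered once the other host
        # is known to be down.
        attempts.append(["vxdg", "-C", "import", dgname])
    return attempts


def _run(cmd: List[str]) -> bool:
    """Default runner: execute the command and report success."""
    return subprocess.call(cmd) == 0


def try_import(
    dgname: str,
    peer_confirmed_down: bool,
    run: Callable[[List[str]], bool] = _run,
) -> Optional[List[str]]:
    """Try each attempt in order; return the command that worked, or None."""
    for cmd in import_attempts(dgname, peer_confirmed_down):
        if run(cmd):
            return cmd
    return None
```

Passing the runner as a parameter lets the sequence be exercised with a
stand-in function on a machine that has no Volume Manager installed.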

* As a sanity check before a disk group is actually imported, it can be
imported temporarily using "vxdg import" with the "-t" and "-n" options.
Once this has been done, utilities like "vxdg list" or "vxdisk -s list"
can be used to check the host lock flag of the disk group and make sure
that it is safe to import it permanently.

In dual connected environments (running HA or not), it is usually recommended
that the -t option be used for all imports.  A temporary import does not
permanently write the hostid onto the disks, which prevents the autoimport
that happens at boot time.
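
The host lock check can be automated by scanning saved "vxdisk -s list"
output for disks whose hostid names another machine. The sketch below is in
Python and rests on an assumption: that the output consists of "key: value"
lines with a "Disk:" line starting each record and "dgname:" and "hostid:"
fields, as in the illustrative SAMPLE text. Verify this against real output
before relying on it. The function names and host names are hypothetical.

```python
from typing import Dict, List

# Illustrative text only; the exact layout is an assumption.
SAMPLE = """\
Disk:   c1t0d0s2
type:   sliced
dgname: datadg
hostid: nodeB

Disk:   c1t1d0s2
type:   sliced
dgname: datadg
hostid:
"""


def parse_disk_records(text: str) -> List[Dict[str, str]]:
    """Collect 'key: value' lines into one dict per 'Disk:' record."""
    disks: List[Dict[str, str]] = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or unrecognised line
        key, value = key.strip().lower(), value.strip()
        if key == "disk":
            disks.append({})
        if disks:
            disks[-1][key] = value
    return disks


def foreign_locks(text: str, dgname: str, local_host: str) -> List[str]:
    """Return disks in dgname whose hostid is set to some other host."""
    return [
        d["disk"]
        for d in parse_disk_records(text)
        if d.get("dgname") == dgname
        and d.get("hostid") not in ("", None, local_host)
    ]
```

With the sample text, foreign_locks(SAMPLE, "datadg", "nodeA") reports
c1t0d0s2, which would be a reason to stop before a permanent import.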

* Another sanity check before importing a disk group is to look for the
host lock flag in the output of "vxdg list" and compare it with the output
of a previously executed "vxprint -ht", which serves as a baseline, to
make sure that it is safe to import the disk group.

Another thing to do is to run vxdg list, or look at vxprint, on the "other
machine".  If you are about to try a vxdg -C import dgname, make sure you
log into the other machine and check whether that diskgroup shows up
anywhere in vxprint.  If it does, DON'T DO IT.  In FW environments, the
general rule is that if the other machine is still up and running, the
import with -C is not done.
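
The decision described above can be written down as a small check. The
Python sketch below assumes that "vxprint -ht" prints a record line
beginning with "dg" followed by the disk group name for every imported
disk group; confirm this against output from your own systems. The names
dg_listed and may_clear_import and the strict flag are hypothetical, and
collecting the other machine's output (for example over a remote shell) is
left to the caller.

```python
def dg_listed(vxprint_output: str, dgname: str) -> bool:
    """True if a 'dg <dgname> ...' record line appears in the output."""
    for line in vxprint_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "dg" and fields[1] == dgname:
            return True
    return False


def may_clear_import(
    peer_up: bool,
    peer_vxprint_output: str,
    dgname: str,
    strict: bool = True,
) -> bool:
    """Decide whether 'vxdg -C import dgname' may be attempted.

    strict=True follows the FW rule: never use -C while the peer is up.
    strict=False allows it when the peer is up but does not list the group.
    """
    if not peer_up:
        return True
    if strict:
        return False
    return not dg_listed(peer_vxprint_output, dgname)
```

The strict default errs on the side of refusing, since a wrong "yes" here
is exactly what lets both hosts import the same disk group.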

Product Area: SunOS Unbundled
Product: Veritas Volume Manager
OS: Solaris 2.5.1
Hardware: SPARCstorage Array

Sun Proprietary/Confidential: Internal Use Only