SunSolve Internal

Infodoc ID   Synopsis   Date
14139   kernel tips: What causes a "recursive mutex_enter" panic?   10 Apr 1997

Description

What causes a "recursive mutex_enter" panic?

This panic occurs whenever a thread that already owns a mutex tries to acquire
the same mutex a second time.

A mutex is a lock used by the OS and by device drivers to gain exclusive
access to a piece of data.  Such a locking mechanism is mandatory in a
multi-threaded environment, in order to ensure that a data structure is intact
before a thread begins to use it.  It safeguards against one thread using
a data structure while another is in the middle of modifying it.

It is considered a gross programming error for a thread to try to acquire a
mutex which it already owns.  Solaris proper has been thoroughly checked for
this sort of thing, so third-party device drivers are often the cause of this
type of panic.
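
The error can be illustrated outside the kernel.  The sketch below is a
userland analogy only: it uses POSIX threads rather than the Solaris kernel
mutex routines, and every name in it (dev_lock, demo_wput, demo_intr_buggy,
demo_intr_fixed) is hypothetical.  An error-checking POSIX mutex reports the
second acquisition with EDEADLK, where the kernel would panic.  The buggy
routine calls back into a routine that takes the lock while still holding it;
the fixed routine releases the lock first.

```c
/* Userland analogy only: POSIX threads, not the Solaris kernel mutex API. */
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t dev_lock;        /* hypothetical per-device lock */

/* Create an error-checking mutex so a recursive lock is reported, not hung. */
static void dev_lock_init(void)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&dev_lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
}

/* Write-side routine: takes the lock itself, as a driver wput routine might. */
static int demo_wput(void)
{
    int rc = pthread_mutex_lock(&dev_lock);

    if (rc != 0)
        return rc;              /* EDEADLK: this thread already owns dev_lock */
    /* ... touch shared device state here ... */
    pthread_mutex_unlock(&dev_lock);
    return 0;
}

/* Buggy: calls back into demo_wput() while still holding dev_lock. */
static int demo_intr_buggy(void)
{
    int rc;

    pthread_mutex_lock(&dev_lock);
    rc = demo_wput();           /* second acquisition by the same thread */
    pthread_mutex_unlock(&dev_lock);
    return rc;
}

/* Fixed: releases dev_lock before calling back into demo_wput(). */
static int demo_intr_fixed(void)
{
    pthread_mutex_lock(&dev_lock);
    /* ... update state that needs the lock ... */
    pthread_mutex_unlock(&dev_lock);
    return demo_wput();
}
```

After dev_lock_init() has been called, demo_intr_buggy() returns EDEADLK and
demo_intr_fixed() returns 0.  Whether a given driver can safely release its
lock at that point depends on the driver, so the fixed routine shows only the
general shape of a correction.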

To verify the cause of the problem:

Examine the stack trace of the system coredump.  (Savecore must be enabled in
order to get a coredump.  Please see infodoc 6332 for 2.x or infodoc 11827 for
4.x for information on enabling savecore.)

Adb can be used to examine the coredump:

   # cd /var/crash/system_name	# Go to where the system corefiles are.
   # adb -k unix.0 vmcore.0	# Start adb.
   physmem 1e6b
   $c

A stack traceback will be displayed.  Here is a sample showing routines only:

   mutex_adaptive_enter
   mutex_enter
   mutex_enter_trace
   sio16_wput
   putnext
   idtermwput
   drain_syncq
   fill_syncq
   putnext
   qreply
   sio16_ioctl
   sio16_start
   sio16_txintr
   sio16_poll

One or more mutex entry routines (such as mutex_enter, mutex_enter_trace, and
mutex_adaptive_enter in the sample) will be close to the top of the traceback.
These will be called from the routine wanting to acquire the mutex (routine
sio16_wput() in the example above).  The module containing that routine is the
problem module, because it must already have acquired the mutex in order for
the panic to occur.  In the sample, other sio16 routines also appear lower on
the stack, which is consistent with the driver calling back into itself while
holding the mutex.
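
The rule above can be sketched in C.  This is an illustration, not a tool
shipped with Solaris: the function name first_non_mutex_frame is hypothetical,
and it assumes that every mutex routine on the stack has a name beginning with
"mutex_", which holds for the sample trace but is not guaranteed in general.
Frames are listed from the top of the stack down, as adb prints them.

```c
#include <stddef.h>
#include <string.h>

/* Routine names from the sample traceback above, top of stack first. */
static const char *const sample_trace[] = {
    "mutex_adaptive_enter", "mutex_enter", "mutex_enter_trace",
    "sio16_wput", "putnext", "idtermwput", "drain_syncq", "fill_syncq",
    "putnext", "qreply", "sio16_ioctl", "sio16_start", "sio16_txintr",
    "sio16_poll",
};

/*
 * Skip the mutex routines at the top of the stack and return the first
 * frame below them, i.e. the routine that asked for the mutex.
 * Returns NULL if every frame is a mutex routine or the trace is empty.
 * Assumption: mutex routines are recognized by the prefix "mutex_".
 */
static const char *first_non_mutex_frame(const char *const frames[], size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strncmp(frames[i], "mutex_", 6) != 0)
            return frames[i];
    }
    return NULL;
}
```

Applied to sample_trace, the function returns "sio16_wput", the routine
identified in the text.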

By convention, drivers begin each routine name with the driver name;  in the
above example, sio16_wput() would be expected to be in a module called sio16.
To identify the module, run the "modinfo" command on the running system, look
at the text on the right side of the output (to get the abbreviated device
name), and attempt to match it to the names of the routines on the stack at
the time of the crash.
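
The naming convention can be checked mechanically.  The sketch below assumes
the module names have already been copied by hand from the modinfo output;
it does not run or parse modinfo, and the function names
(routine_has_module_prefix, match_module) are hypothetical.  A routine is
taken to belong to a module when its name is the module name followed by an
underscore; if several modules qualify, the longest name wins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if routine is named "<module>_<something>", e.g. sio16_wput/sio16. */
static bool routine_has_module_prefix(const char *routine, const char *module)
{
    size_t len = strlen(module);

    return len > 0 &&
           strncmp(routine, module, len) == 0 &&
           routine[len] == '_';
}

/*
 * Return the module whose name best matches the routine's prefix, or NULL
 * if none matches.  modules[] is a hand-built list of module names.
 */
static const char *match_module(const char *routine,
                                const char *const modules[], size_t n)
{
    const char *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (routine_has_module_prefix(routine, modules[i]) &&
            (best == NULL || strlen(modules[i]) > strlen(best)))
            best = modules[i];
    }
    return best;
}
```

With a module list containing "sio16", match_module("sio16_wput", ...) returns
"sio16".  A routine such as putnext, which carries no driver prefix, matches
nothing and returns NULL.  Because the convention is not enforced, a match
should be treated as a strong hint rather than proof.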

If the problem turns out to be in a third-party driver, there is not much
that SunService can do to correct it, as it does not have the source
code for third-party drivers.  Please contact the manufacturer of the driver,
who can best fix their own driver.

Product Area: Kernel
Product: crash
OS: Solaris 2.x
Hardware: any

Sun Proprietary/Confidential: Internal Use Only