SunSolve Internal

Infodoc ID   Synopsis   Date
2164   Crash dump analysis   25 Feb 1996

Description

This tutorial introduces Sun-3 and Sun-4 kernel dump analysis to
support engineers.

What to expect
--------------
With the information here and a little experimenting, the reader
should be able to narrow a crash dump down to the major subsystem
that caused the crash, and to gather some relevant data about that
subsystem at the time of the crash. This is clearly an art form
that improves with experience. Frequently, this data will be
passed to others for more detailed analysis.




Checklist
---------
This is a checklist to ensure that you have saved all the
relevant information.
1.      Prior to a crash:
        o       Uncomment the appropriate lines in /etc/rc.local.
        o       Ensure that there is adequate primary swap device
                space to save a dump.

2.      When a crash occurs or is reported to you, determine:
        o       the machine architecture
        o       the kernel version
        o       the time of day
        o       what user(s) were logged on
        o       what windows were open
        o       what programs were running

Some dumps provide little or no useful information. Frequently,
several dumps must be analyzed to provide enough clues to identify
a particular bug. Information found in one dump may be useful in
analyzing other dumps.


Where dumps come from
---------------------
Dumps are available as files on the disk that contain a snapshot
of memory at the time the fatal error was detected. These files
are generated automatically when the kernel crashes and
subsequently reboots.

Saving dumps is optional and is disabled by default. To enable
it, uncomment the appropriate lines in /etc/rc.local. At bootup,
/etc/rc.local then runs the savecore(8) program to create the
dump files. Note that savecore creates the dump only if there is
one to be saved and there is sufficient free disk space.
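
As a quick pre-crash check, the following is a minimal sketch of
a shell function that reports whether a file still has its
savecore line commented out. The function name savecore_enabled
is made up for this illustration, and the sketch assumes a POSIX
sh and grep; it does not assume anything about the exact wording
of the lines in /etc/rc.local beyond the word "savecore".

```shell
#!/bin/sh
# savecore_enabled FILE   (hypothetical helper, not part of SunOS)
# Succeeds (exit status 0) if FILE contains a savecore line that is
# not commented out; fails otherwise.
savecore_enabled() {
    # Drop comment lines first, then look for savecore in what remains.
    grep -v '^[[:space:]]*#' "$1" | grep 'savecore' >/dev/null
}
```

A typical use would be "savecore_enabled /etc/rc.local || echo
'dumps will not be saved'", run before a crash is expected rather
than after.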

A dump is created when the panic() routine is called. This may be
done by hardware trap handlers and at various other places where
a "fatal" condition is detected. A panic may also be "forced" by
the user. To force a dump, type L1-A to get into the PROM
monitor. When you get the PROM monitor prompt, type g0 (g-zero).
On a 4C, type n at the OK prompt. On a TTY device, send a break.

A dump will only be created if the primary swap device is large
enough. If a dump is not created, suspect serious hardware
problems, such as power supply failures. Dumps may be supplied to
you on a tape, or you may have direct or dial-up access to the
crashed machine.
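
The swap-size requirement can be sketched as a simple comparison.
This is an illustration only: the function name dump_fits is
hypothetical, both sizes are supplied by hand in kilobytes, and
the sketch assumes that a dump is roughly the size of physical
memory, which is not stated above and should be confirmed for the
release in question.

```shell
#!/bin/sh
# dump_fits MEM_KB SWAP_KB   (hypothetical helper)
# Succeeds if a primary swap device of SWAP_KB kilobytes could hold
# a dump of MEM_KB kilobytes.
# Assumption: the dump is about as large as physical memory.
dump_fits() {
    [ "$2" -ge "$1" ]
}
```

For example, "dump_fits 32768 65536" succeeds, while swapping the
two arguments fails.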


How dumps are used
------------------
Dumps are analyzed with adb(1), an assembly language level
debugger. Other tools are available to support the primary use of
adb.
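
A minimal sketch of starting adb on a saved dump follows. It
assumes that savecore has written a numbered pair of files named
vmunix.N and vmcore.N, and that adb is started with the -k option
for kernel images; the helper name adb_cmd and the directory
argument are made up for this illustration. The function only
prints the command line so that it can be checked before it is
run.

```shell
#!/bin/sh
# adb_cmd N [DIR]   (hypothetical helper)
# Print the adb command line for saved dump number N in directory
# DIR (default: the current directory).
# Assumption: savecore's vmunix.N / vmcore.N naming.
adb_cmd() {
    n=$1
    dir=${2:-.}
    echo "adb -k $dir/vmunix.$n $dir/vmcore.$n"
}
```

Once adb is running, $C gives the stack backtrace and $r prints
the registers as saved by panic().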

The ease of analyzing dumps varies with the specific problem. The
stack backtrace (the $C command) gives the most helpful
information in pinpointing what the processor was doing when the
problem was detected. If the stack has been clobbered, adb will
give only a partial and sometimes incorrect stack backtrace.
Examine the backtrace to see if it makes sense. Experience will
quickly give you a basis for judgment. If the adb stack backtrace
is unusable, you can still unwind the stack the hard way: dump
the stack in hexadecimal and piece it together by careful study
of what stack frames look like. Attempt this only as a last
resort.

In addition to the stack backtrace, the registers are useful.
When the panic() routine is called, it saves the registers.
However, the route to the panic() routine is often long enough to
render this view of the registers useless. The more useful image
of the registers at the time of the fault is often found in
frames on the stack.

Several key data structures, variables, and buffers are also
worth examining. These are described later on.

Ask yourself these questions:

o What routine was the processor in when things went sour?

o What routines were on the stack before we got to this routine?

Specific header files are located in the appropriate directory
under sys. For example, the Internet-related header files are
located under sys/netinet.

grep(1)
This command is useful for searching for text. When used with
| (pipe), it can act as a filter to look for keywords.

what(1)
The what command gives source code control version numbers for
files used to build the kernel.

SOLUTION SUMMARY:
Patch ID n/a
Product Area Kernel
Product crash
OS Solaris 1.x
Release n/a
Hardware n/a


Sun Proprietary/Confidential: Internal Use Only