Infodoc ID:  2164
Synopsis:    Crash dump analysis
Date:        25 Feb 1996
This tutorial introduces Sun-3 and Sun-4 kernel dump analysis to
support engineers.
What to expect
--------------
The reader should gain enough information from this tutorial,
plus a little experimenting, to narrow a crash dump down to the
major subsystem that caused the crash and to gather relevant data
about that subsystem at the time of the crash. This is clearly an
art form that improves with experience. Frequently, this data
will be passed to others for more detailed analysis.
Checklist
---------
This is a checklist to ensure that you have saved all the
relevant information.
1. Prior to a crash:
o Uncomment the savecore lines in /etc/rc.local.
o Ensure that the primary swap device is large enough to hold
a dump.
2. When a crash occurs or is reported to you, determine:
o the machine architecture
o the kernel version
o the time of day
o what user(s) were logged on
o what windows were open
o what programs were running
Some dumps provide little or no useful information. Frequently,
several dumps must be analyzed to provide enough clues to
identify a particular bug. Information found in one dump may be
useful in analyzing other dumps.
Where dumps come from
---------------------
Dumps are files on disk that contain a snapshot of memory at the
time the fatal error was detected. When the kernel crashes, it
writes this memory image to the primary swap device; the dump
files are created from it when the system subsequently reboots.
Saving dumps is optional, and by default dumps are not saved. To
enable it, uncomment the appropriate lines in /etc/rc.local. At
boot time, /etc/rc.local then runs the savecore(8) program, which
copies the dump from the primary swap device into files. Note
that savecore creates the dump files only if there is a dump to
be saved and there is sufficient free disk space to hold it.
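The free-space test can be sketched in C as follows. This is a
minimal sketch under stated assumptions, not savecore's actual
source: it uses the POSIX statvfs() call, and the helper names
dump_has_room() and dump_fits_in_dir() and the "reserve" that
must remain free afterwards are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical helper: is there room for a dump of dump_bytes that
 * still leaves at least reserve_bytes free afterwards? Returns 1 or 0. */
int dump_has_room(uint64_t free_bytes, uint64_t dump_bytes,
                  uint64_t reserve_bytes)
{
    if (dump_bytes > free_bytes)
        return 0;
    return free_bytes - dump_bytes >= reserve_bytes;
}

/* Hypothetical helper: apply the same test to the file system that
 * holds dir (for example, the directory savecore writes into).
 * Returns 1 (room), 0 (no room), or -1 if statvfs() fails. */
int dump_fits_in_dir(const char *dir, uint64_t dump_bytes,
                     uint64_t reserve_bytes)
{
    struct statvfs sv;
    uint64_t free_bytes;

    assert(dir != NULL);
    if (statvfs(dir, &sv) != 0)
        return -1;
    /* f_bavail is the number of blocks available to unprivileged
     * users, in units of f_frsize bytes. */
    free_bytes = (uint64_t)sv.f_bavail * (uint64_t)sv.f_frsize;
    return dump_has_room(free_bytes, dump_bytes, reserve_bytes);
}
```

Splitting the pure comparison from the statvfs() query keeps the
decision easy to check by hand: for example, 100 units free, a
40-unit dump, and a 50-unit reserve leaves 60 units, so the dump
fits.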
A dump is created when the panic() routine is called. panic()
may be called by hardware trap handlers and at various other
places where a "fatal" condition is detected. A panic may also be
forced by the user. To force a dump, press L1-A to get into the
PROM monitor. When you get the PROM monitor prompt, type g0
(g-zero). On a 4C, type "n" at the ok prompt. On a TTY console,
press Break.
A dump will only be created if the primary swap device is large
enough. If the swap device is large enough and a dump is still
not created, suspect serious hardware problems, such as power
supply failures. Dumps may be supplied to you on a tape, or you
may have direct or dial-up access to the crashed machine.
How dumps are used
------------------
Dumps are analyzed with adb(1), an assembly-language-level
debugger. Other tools are available to supplement adb, the
primary tool.
How easy a dump is to analyze varies with the specific problem.
The stack backtrace (the adb $C command) gives the most helpful
information for pinpointing what the processor was doing when
the problem was detected. If the stack has been clobbered, adb
will give only a partial, and sometimes incorrect, backtrace.
Examine the backtrace to see if it makes sense; experience will
quickly give you a basis for judgment. If the adb backtrace is
unusable, you can still unwind the stack the hard way: dump the
stack in hexadecimal and piece it together by careful study of
what stack frames look like. Attempt this only as a last resort.
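Unwinding the stack by hand amounts to following the chain of
saved frame pointers. The C sketch below does this over words
transcribed from a hexadecimal stack dump. It assumes the Sun-3
(68000-family) convention produced by the link instruction: the
word at the frame pointer (a6) holds the caller's frame pointer
and the next word holds the return address. Sun-4 frames are laid
out differently. The names struct stack_image and unwind() are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a stack region transcribed from a hex dump. */
struct stack_image {
    uint32_t base;          /* address of words[0] */
    const uint32_t *words;  /* 32-bit words in ascending address order */
    size_t nwords;
};

/* Read the word at addr into *out. Returns 0 if addr is misaligned
 * or outside the transcribed region, 1 otherwise. */
static int fetch_word(const struct stack_image *img, uint32_t addr,
                      uint32_t *out)
{
    size_t idx;

    if (addr < img->base || (addr - img->base) % 4 != 0)
        return 0;
    idx = (addr - img->base) / 4;
    if (idx >= img->nwords)
        return 0;
    *out = img->words[idx];
    return 1;
}

/* Follow a 68000-style frame-pointer chain starting at fp: the word
 * at fp is the caller's saved fp, and the word at fp + 4 is the
 * return address. Stores up to max return addresses in pcs and
 * returns how many were found. The walk stops at a zero fp, when the
 * chain leaves the region, or when a saved fp does not point to a
 * higher address (the stack grows downward, so callers' frames lie
 * above), which is a common sign of a clobbered stack. */
size_t unwind(const struct stack_image *img, uint32_t fp,
              uint32_t *pcs, size_t max)
{
    size_t n = 0;

    assert(img != NULL && (pcs != NULL || max == 0));
    while (n < max && fp != 0) {
        uint32_t next_fp, ret;

        if (!fetch_word(img, fp, &next_fp) ||
            !fetch_word(img, fp + 4, &ret))
            break;
        pcs[n++] = ret;
        if (next_fp != 0 && next_fp <= fp)
            break;
        fp = next_fp;
    }
    return n;
}
```

Each return address found this way can then be looked up in the
kernel's symbol table to name the calling routine.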
In addition to the stack backtrace, the registers are useful.
When the panic() routine is called, it saves the registers.
However, the route to panic() is often long enough to render this
view of the registers useless. A more useful image of the
registers at the time of the fault is often found in frames on
the stack.
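On Sun-4 (SPARC) systems, register windows are stored on the
stack: when a window is flushed to memory, its local registers
%l0-%l7 and in registers %i0-%i7 go into the 16-word save area at
that frame's %sp. The sketch below describes that layout in C and
picks out the two saved values most useful for unwinding. It
follows the SPARC ABI layout; the type and function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 64-byte register-window save area at a SPARC frame's %sp. */
struct sparc_window {
    uint32_t l[8];  /* %l0-%l7, at %sp+0  .. %sp+28 */
    uint32_t i[8];  /* %i0-%i7, at %sp+32 .. %sp+60 */
};

/* Fill a window from 16 consecutive words taken from a hex dump of
 * the stack, starting at the frame's %sp. */
void load_window(struct sparc_window *w, const uint32_t words[16])
{
    assert(w != NULL && words != NULL);
    for (int k = 0; k < 8; k++) {
        w->l[k] = words[k];
        w->i[k] = words[8 + k];
    }
}

/* Saved %i6 (%fp) is the caller's %sp: the next save area to examine. */
uint32_t window_caller_sp(const struct sparc_window *w)
{
    return w->i[6];
}

/* Saved %i7 is the address of the call instruction; the caller
 * resumes at %i7 + 8, just past the call's delay slot. */
uint32_t window_return_pc(const struct sparc_window *w)
{
    return w->i[7] + 8;
}
```

The saved %i0-%i5 in each window are the arguments the routine
was holding, which is often exactly the register image wanted at
the time of the fault.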
Several key data structures, variables, and buffers are also
worth examining. These are described later on.
Ask yourself these questions:
o What routine was the processor in when things went sour?
o What routines were on the stack before we got to this routine?
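To answer the first question, adb reports addresses symbolically,
as the nearest preceding symbol plus an offset. The lookup behind
that can be sketched in C as a binary search over a symbol table
sorted by address. The table, its addresses, and lookup_pc() are
made up for illustration; real values come from the kernel's
symbol table (for example, as listed by nm(1)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a (hypothetical) symbol table. */
struct sym {
    uint32_t addr;
    const char *name;
};

/* Return the symbol with the greatest address <= pc, storing
 * pc - addr in *offset, or NULL if pc lies below every symbol.
 * syms must be sorted by ascending addr. */
const struct sym *lookup_pc(const struct sym *syms, size_t n,
                            uint32_t pc, uint32_t *offset)
{
    size_t lo = 0, hi = n;  /* find the first entry with addr > pc */

    assert(syms != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (syms[mid].addr <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;
    if (offset != NULL)
        *offset = pc - syms[lo - 1].addr;
    return &syms[lo - 1];
}
```

Applied to each return address from the stack, this turns a list
of hexadecimal values into a list of routine+offset pairs that
answers the second question as well.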
Header files for each kernel subsystem are located in the
corresponding directory under sys. For example, the
Internet-related header files are located under sys/netinet.
grep(1)
This command is useful for searching for text. When used with a
pipe (|), it can act as a filter to look for keywords.
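The filtering behavior is simple to picture in code. The sketch
below is not grep itself: it is a minimal C analogue of a
fixed-string search (in the manner of fgrep keyword), and the
function name filter_lines() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy every line of in that contains keyword to out, in the manner
 * of a fixed-string grep. Returns the number of lines copied.
 * Lines longer than the buffer are read, tested, and counted in
 * pieces; this sketch does not reassemble them. */
size_t filter_lines(FILE *in, FILE *out, const char *keyword)
{
    char line[1024];
    size_t matched = 0;

    assert(in != NULL && out != NULL && keyword != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        if (strstr(line, keyword) != NULL) {
            fputs(line, out);
            matched++;
        }
    }
    return matched;
}
```

Called as filter_lines(stdin, stdout, keyword), it behaves like
the last stage of a pipeline.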
what(1)
The what command gives source code control version numbers for
files used to build the kernel.
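what works by scanning a file for the SCCS marker "@(#)" and
printing the text that follows it, up to a double quote, >,
newline, backslash, or NUL. A minimal C sketch of that scan over a
buffer already read into memory follows. The function name
what_scan() is hypothetical, and the real what(1) also prints the
file name before the strings it finds.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan buf for the SCCS marker "@(#)" and write each identification
 * string that follows it to out on its own tab-indented line. A
 * string ends at '"', '>', newline, backslash, NUL, or the end of
 * the buffer. Returns the number of strings found. */
size_t what_scan(const unsigned char *buf, size_t len, FILE *out)
{
    static const char marker[] = "@(#)";
    const size_t mlen = sizeof marker - 1;
    size_t found = 0;

    assert((buf != NULL || len == 0) && out != NULL);
    for (size_t i = 0; i + mlen <= len; i++) {
        size_t start, end;

        if (memcmp(buf + i, marker, mlen) != 0)
            continue;
        start = i + mlen;
        end = start;
        while (end < len && buf[end] != '"' && buf[end] != '>' &&
               buf[end] != '\n' && buf[end] != '\\' && buf[end] != '\0')
            end++;
        fprintf(out, "\t%.*s\n", (int)(end - start),
                (const char *)buf + start);
        found++;
        i = end;  /* the loop's i++ resumes just past the terminator */
    }
    return found;
}
```

To examine a kernel image, read the file into memory (for
example with fread()) and pass the buffer to what_scan().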
Sun Proprietary/Confidential: Internal Use Only