Infodoc ID: 12936
Synopsis:   Troubleshooting system crashes
Date:       5 Mar 1996
My system has just crashed - now what? One of the most important things
to determine when looking at a crash is the set of conditions under which
it occurs. The person troubleshooting a crashing system should try to
answer the following questions.
What were the most recent changes to the system?
How often does the system crash?
Are the system crashes related to any particular activity (e.g. time of day,
running a particular application, etc.)?
Is it possible to reproduce the crash on demand?
Does this occur on more than one machine?
Are all the crashes the same kind (see below)?
Does the system have the latest kernel jumbo patch?
Are there any errors or warnings indicated in the messages file?
(If there are, then these should be looked into as a possible root
cause of the crash.)
If this is a sun4d or sun4u architecture, are there any errors
indicated in the "prtdiag -v" output? (Again, this should be
investigated as a possible root cause of the crash; see the example
commands after this list.)
If this is a 4.x system and you aren't running the GENERIC kernel, do
the crashes continue when running the GENERIC kernel?
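For the messages-file and prtdiag questions above, something along the
following lines is usually enough to spot obvious complaints (the egrep
pattern is only a rough starting point, and prtdiag may live in a
platform-specific directory on your system):
    egrep -i 'error|warning|panic' /var/adm/messages*
    prtdiag -v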
If this is the first time the system has crashed in an appreciable time,
then usually the best thing to do is make sure that savecore is enabled
and wait for another crash to occur. savecore(1M) is a program which
copies the system core file from the primary swap device (where it is
placed when the crash occurs) to the directory specified. For
instructions on enabling savecore, see SRDB 4659 for 4.x systems and
Infodoc 6332 for 2.x systems.
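For reference, on most 2.x releases enabling savecore amounts to
uncommenting a block in /etc/init.d/sysetup that looks roughly like the
following (the exact lines vary by release, so treat this as a sketch
and follow the referenced documents for the real steps):
    if [ ! -d /var/crash/`uname -n` ]
    then mkdir -p /var/crash/`uname -n`
    fi
    echo 'checking for crash dump...\c '
    savecore /var/crash/`uname -n`
    echo ''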
Part of the reason for waiting for a second crash is that sometimes
crashes are flukes, caused by a set of bizarre circumstances
that won't occur again for many, many years. This kind of crash is
impossible to debug due to its lack of reproducibility.
Another reason for waiting for the second (or more) crash is that it's
helpful to know how often the crashes are occurring and if they are all of
the same type. If the crashes are getting more and more frequent and
are of varying types, you probably have a hardware problem.
TYPES OF CRASHES
----------------
There are many different types of crashes that can occur. The type of
crash can frequently be determined by looking at the messages file.
One type of crash is a BAD TRAP. Bad traps happen when the kernel takes
an unexpected trap. Things that can cause a trap include trying to access
unaligned memory and trying to access memory which is not currently mapped.
An example of the messages from a bad trap follows:
Dec 21 03:36:49 mysun unix: BAD TRAP: type=7 rp=f0bbeb8c addr=0 mmu_fsr=0 rw=0
Dec 21 03:36:49 mysun unix: find: Memory address alignment
Dec 21 03:36:49 mysun unix: pid=916, pc=0xfc2550e4, sp=0xf0bbebd8,
psr=0x1f0000c0, context=1930
Dec 21 03:36:49 mysun unix: g1-g7: f004f51c, 8000000, f007702c, c0, fd7a1a68,
1, fcbaa020
Dec 21 03:36:49 mysun unix: panic: cross-call at high interrupt level
Dec 21 03:36:49 mysun unix: syncing file systems... 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 done
Dec 21 03:36:49 mysun unix: 14849 static and sysmap kernel pages
Dec 21 03:36:49 mysun unix: 197 dynamic kernel data pages
Dec 21 03:36:49 mysun unix: 144 kernel-pageable pages
Dec 21 03:36:49 mysun unix: 1 segkmap kernel pages
Dec 21 03:36:49 mysun unix: 0 segvn kernel pages
Dec 21 03:36:49 mysun unix: 153 current user process pages
Dec 21 03:36:49 mysun unix: 15344 total pages (15344 chunks)
Dec 21 03:36:49 mysun unix: dumping to vp fcb00734, offset 121920
In order to troubleshoot this kind of crash, it is necessary to get a stack
trace of the thread which caused the crash. This stack trace can then be
compared with traces found in bug reports to see if this is a known problem.
The following messages are also from a bad trap. These messages are from a
different machine architecture, one which gives more information. In this
case, the traceback is included in the messages and contains some symbols.
With this information, it may be possible to find a matching bug without
having to look at core files.
Jan 18 14:18:04 postbox unix: BAD TRAP: cpu_id=4 type=9 <Data fault>
addr=11001c rw=1 rp=e1bf7b74
Jan 18 14:18:04 postbox unix: MMU sfsr=0x226: ft=<Invalid address error>
at=<supv data load> level=2
Jan 18 14:18:04 postbox unix: MMU sfsr=0x226<FAV>
Jan 18 14:18:04 postbox unix: cheapserv: Data fault
Jan 18 14:18:04 postbox unix: kernel read fault at addr=0x11001c, pte=0x2
Jan 18 14:18:04 postbox unix: MMU sfsr=0x226: ft=<Invalid address error>
at=<supv data load> level=2
Jan 18 14:18:04 postbox unix: MMU sfsr=0x226<FAV>
Jan 18 14:18:04 postbox unix: WR+0x0, pid=234, pc=0xe0043100, sp=0xe1bf7bc0,
psr=0x408010c0, context=465
Jan 18 14:18:04 postbox unix: g1-g7: 0, f56b0000, f5cb4000, 10, f5cde4e0, 1,
f5cb4a00
Jan 18 14:18:04 postbox unix: Begin traceback... sp = e1bf7bc0
Jan 18 14:18:04 postbox unix: qdetach+0xbc @ 0xe0082dac, fp=0xe1bf7c20
Jan 18 14:18:04 postbox unix: args=f6e17b00 3 f54bdf00 80000000 f5e98a00
f6566400
Jan 18 14:18:04 postbox unix: strclose+0x4e0 @ 0xe007ae14, fp=0xe1bf7c80
Jan 18 14:18:04 postbox unix: args=f6e17b00 1 3 f54bdf00 e0103890 f6e17b54
Jan 18 14:18:04 postbox unix: Sysbase+0x82c04 @ 0xf5482c04, fp=0xe1bf7cf8
Jan 18 14:18:04 postbox unix: args=e00d6e00 f6a94c9a f6d14210 f6e17b00
f6a94c60 f6d14258
Jan 18 14:18:04 postbox unix: Sysbase+0x83afc @ 0xf5483afc, fp=0xe1bf7d58
Jan 18 14:18:04 postbox unix: args=f63b8b84 3 f54bdf00 4 f6472304 276e55
Jan 18 14:18:04 postbox unix: closef+0x138 @ 0xe004b420, fp=0xe1bf7db8
Jan 18 14:18:04 postbox unix: args=f63b8b84 3 f6472374 f647237c f6472374
f54bdf00
Jan 18 14:18:04 postbox unix: closeall+0x58 @ 0xe004b2b8, fp=0xe1bf7e18
Jan 18 14:18:05 postbox unix: args=f681d400 1 f681d42c f681d40a f681d428
f681d420
Jan 18 14:18:05 postbox unix: exit+0x230 @ 0xe004a18c, fp=0xe1bf7e78
Jan 18 14:18:05 postbox unix: args=1 f5ca4594 f681d420 ffffffec 10 f681d400
Jan 18 14:18:05 postbox unix: syscall+0x6cc @ 0xe002d0dc, fp=0xe1bf7ed8
Jan 18 14:18:05 postbox unix: args=2 f f5cb4000 f5cb4a00 f5ca4014 f5cb9c08
Jan 18 14:18:05 postbox unix: .syscall+0xa4 @ 0xe0005da0, fp=0xe1bf7f58
Jan 18 14:18:05 postbox unix: args=e00e0048 f5cb44d4 0 f5cb44b0 e 0
Jan 18 14:18:05 postbox unix: (unknown)+0x138bc @ 0x138bc, fp=0xdffffc60
Jan 18 14:18:05 postbox unix: args=4 dffffcd4 200 dffffdb7 4 42bf0
Jan 18 14:18:05 postbox unix: End traceback...
Jan 18 14:18:05 postbox unix: panic[cpu4]/thread=0xf5cb4a00: Data fault
Jan 18 14:18:05 postbox unix: syncing file systems... 169 169 169 169 169 169 \
169 169 169 169 169 169 169 169 169 169 169 169 169 169 done
Jan 18 14:18:05 postbox unix: 13084 static and sysmap kernel pages
Jan 18 14:18:05 postbox unix: 254 dynamic kernel data pages
Jan 18 14:18:05 postbox unix: 404 kernel-pageable pages
Jan 18 14:18:05 postbox unix: 4 segkmap kernel pages
Jan 18 14:18:05 postbox unix: 0 segvn kernel pages
A second kind of crash occurs when the machine panics. Sometimes the panic
message is sufficient to point out the problem. For instance, the following
messages indicate that the machine panicked because of filesystem
corruption. The messages even specify which filesystem (/export/u2).
Dec 5 04:15:59 mysun unix: panic: free: freeing free frag, dev = 0x1bc6d61,
block = 12, cg = 359 fs = /export/u2
Dec 5 04:15:59 mysun unix: syncing file systems...panic: panic sync timeout
Dec 5 04:15:59 mysun unix: 6724 static and sysmap kernel pages
Dec 5 04:15:59 mysun unix: 185 dynamic kernel data pages
Dec 5 04:15:59 mysun unix: 399 kernel-pageable pages
Dec 5 04:15:59 mysun unix: 0 segkmap kernel pages
Dec 5 04:15:59 mysun unix: 0 segvn kernel pages
Dec 5 04:15:59 mysun unix: 0 current user process pages
Dec 5 04:15:59 mysun unix: 7308 total pages (7308 chunks)
Dec 5 04:15:59 mysun unix: dumping to vp fca48aec, offset 186208
If this happens, fsck(1M) should be run by hand on the filesystem in question.
This normally clears the problem. If this kind of panic is happening
repeatedly, check to see if there is a problem with the disk.
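As a sketch, on a 2.x system the by-hand check would look something like
this (the device name is only an example; use the raw device that
/export/u2 is actually mounted from, and unmount it first):
    umount /export/u2
    fsck -F ufs /dev/rdsk/c0t3d0s6    # example device - substitute your own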
This next group of messages indicates a panic of type zero. This panic is
caused by trying to execute location zero. Normally, it results from a
user or system administrator typing "L1-A" (or Stop-A, depending on your
keyboard labeling) followed by typing sync at the ok prompt. The "L1-A"
causes the system to drop to the boot prom. The sync command at the boot
prom prompt causes the system to sync the filesystems (if possible) and
then forces a panic of type zero.
Dec 20 23:48:55 mysun unix: panic: zero
Dec 20 23:48:55 mysun unix: syncing file systems... H
More information about system hangs can be found in Infodoc 13039,
Troubleshooting System Hangs.
The following messages were actually read from the system core file
rather than the messages file, which is why they don't have any
timestamps on them. These messages are an example of a panic
message which by itself is not sufficient to determine what the problem is.
The message tells us that some code has tried to acquire
a lock which it already owns. Without knowing which code did
this, we don't know what the fix is. To find out what code
did this, we must look at the stack trace of the thread which caused
the panic (see below).
panic[cpu1]/thread=0xf5e99380: recursive mutex_enter. mutex
f441401c caller e00b1eac
syncing file systems...panic[cpu1]/thread=0xe0629ec0: panic
sync timeout
66584 static and sysmap kernel pages
540 dynamic kernel data pages
1047 kernel-pageable pages
0 segkmap kernel pages
0 segvn kernel pages
0 current user process pages
68171 total pages (68171 chunks)
dumping to vp f2bb27b4, offset
Once the panic string has been determined, the words from this string
can be used to search SunSolve to see if there is a bug related
to this string. Make sure not to include numeric information, as
such information does not normally match between machines. If a bug
is found and the description seems to match the situation on the
machine that is crashing, enter the bugid into SunSolve and search
for a patch that fixes the problem.
GETTING STACK TRACE INFORMATION
-------------------------------
If the panic string is not unique enough to indicate the cause of the crash,
it is normally necessary to get a trace of the stack of the process which
caused the crash. This is accomplished by running adb(1) against the system
core file generated at the time of the crash (assuming savecore was enabled
of course).
Unless someone has changed the savecore command in the /etc/init.d/sysetup
file, the core files are saved to the directory /var/crash/`uname -n`.
Placing a command in "`" (back quotes) causes the output of that command
to be used as part of the current command. The output of uname -n is the
name of the machine, so the core files are typically in the directory
/var/crash/<machine name>. In my case, the machine is called squirt and
so core files get saved to /var/crash/squirt.
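For example, on squirt a listing of that directory after one saved crash
looks something like this (bounds is a bookkeeping file savecore uses to
pick the next NUM):
    # ls /var/crash/`uname -n`
    bounds     unix.0     vmcore.0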
The adb command is as follows:
adb -k <unixfile> <corefile>
where
unixfile is unix.NUM for 2.x systems and vmunix.NUM for 4.x
systems. The NUM is a number, starting from 0, that
is used to create unique file names in the case of
multiple crashes.
corefile is vmcore.NUM. The NUM must be the same as that
specified for unixfile.
Note that by default, adb does not give a prompt. Just type commands in on
the line once adb has written the first line (physmem xxxxx).
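Putting this together, a typical session on a 2.x machine looks something
like the following (the .0 suffix and the physmem value are specific to
the machine; $q quits adb):
    # cd /var/crash/`uname -n`
    # adb -k unix.0 vmcore.0
    physmem 3ac8
    $c
    ... stack trace output as in the examples below ...
    $q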
There are three commands that can be used to look at a stack trace:
$c an adb miscellaneous command which dumps the stack backtrace
with one line per call
$<stackregs an adb macro (not available on 4.x) which dumps the stack in
frame format
$<stacktrace an adb macro which dumps the stack in frame format
Examples:
This is an example of using the $c command to get a stack trace. It is
printed one routine per line with the arguments passed to the routines in
parentheses (actually, these may not be the actual arguments but that
belongs to the realm of advanced core dump analysis).
$c
complete_panic(0xf00494d8,0xf0152a68,0xf0fd9b90,0x0,0x0,0x1) + 10c
do_panic(?) + 1c
vcmn_err(0xf0165c44,0xf0fd9d04,0xf0fd9d04,0x404000e2,0x40400ae2,0x3)
cmn_err(0x3,0xf0165c44,0xf02c429c,0x1,0xf0181a18,0xf016bf38) + 1c
page_unlock(0xf02c429c,0x0,0xe31e10ff,0x1,0xf0181a68,0x0) + 3c
segvn_lockop(0x0,0xef13d000,0x0,0xfc951624,0xf02c429c,0xfc96b624) + 370
as_ctl(0x0,0xfc92b9c0,0x0,0x5,0x0,0xfc619800) + ac
memcntl(0xfc50ecb4,0xfc619800,0xfffff000,0x545000,0xefffe000,0xfc619800) + 368
syscall(0xf0160fe8) + 3ec
The following example shows the same stack trace as the $c command above.
The stacktrace macro takes an address as input; in this case, we are using
the contents of the sp register (the stack pointer) as the starting point of
the trace.
The stacktrace macro prints each stack frame twice: the first time everything
is printed in hex, the second time everything is printed symbolically (when the
hex number corresponds to a symbol). The value on the far left hand side is the
address of the stack frame being printed; since each frame is printed twice,
the same address appears twice.
The first output of the macro is a header giving the names of the registers
being printed. Registers l0 through l7 (note that this is l as in local
registers) are used as scratch pads by the current routine. Knowing the value
of these is helpful when you have to go further than just looking at the stack
trace. Registers i0 through i5 contain the first six arguments passed
to the routine (unless they have been changed since the routine started).
Register i6 contains the stack frame pointer (where the previous stack
frame is located). Register i7 contains the return address.
Notice that the symbolic form of the stack frame does not always have
four columns (look at the second row of the first frame). This means
that you cannot look for register values positionally, you have to count.
Fortunately, the value you really care about is i7 and that is always the
last value in the frame. So below, the last routine called is do_panic,
which was called by cmn_err, which was called by page_unlock, etc.
<sp$<stacktrace
l0 l1 l2 l3
l4 l5 l6 l7
i0 i1 i2 i3
i4 i5 i6 i7
0xf0fd9b90: 0 fc8d4000 0 1
f0152400 fc63c200 fc63c224 f017dfb8
f00494d8 f0152a68 f0fd9b90 0
0 1 f0fd9bf8 f00490a4
0xf0fd9b90: 0 0xfc8d4000 0 nmap
dumpfile+0x50 kstat_devi+0x3b74 kstat_devi+0x3b98
panic_regs complete_panic+0x10c cpu
0xf0fd9b90 0 0 nmap
0xf0fd9bf8 do_panic+0x1c
(Note that in the stack frame above, the value of i0 is complete_panic+0x10c
not panic_regs. This is to point out that the formatting can cause
things to not align as shown in the layout at the top.)
0xf0fd9bf8: 404000c1 f0181c08 70 70
70 1fff 0 590c
f0165c44 f0fd9d04 f0fd9d04 404000e2
40400ae2 3 f0fd9c58 f00858d4
0xf0fd9bf8: 0x404000c1 ph_mutex LEDPATCNT+0x18 LEDPATCNT+0x18
LEDPATCNT+0x1 LEDPATCNT+0x1fa7 0
Syssize+0x1c0c 0xf0165c44 0xf0fd9d04 0xf0fd9d04
0x404000e2 0x40400ae2 ts_maxumdpri+1 0xf0fd9c58
cmn_err+0x1c
0xf0fd9c58: f017ccdc 1 0 0
0 ef13c000 0 fc92b9c8
3 f0165c44 f02c429c 1
f0181a18 f016bf38 f0fd9cb8 f00a253c
0xf0fd9c58: cpus nmap 0 0
0 0xef13c000 0 0xfc92b9c8
ts_maxumdpri+1 0xf0165c44 0xf02c429c nmap
pse_mutex sleepq_head+0x640 0xf0fd9cb8
page_unlock+0x3c
0xf0fd9cb8: fc674600 f017ccdc f0160d8c e0000000
e0000000 ffffffe0 fc5011a0 fc5011a5
f02c429c 0 e31e10ff 1
f0181a68 0 f0fd9d18 f00b6a40
0xf0fd9cb8: 0xfc674600 cpus cpr_info+0x3718 0xe0000000
0xe0000000 0xffffffe0 0xfc5011a0 0xfc5011a5
0xf02c429c 0 0xe31e10ff nmap
pse_mutex+0x50 0 0xf0fd9d18
segvn_lockop+0x370
0xf0fd9d18: fc5011b8 1 1 2bd
fc5011a8 fc958eb4 189000 624
0 ef13d000 0 fc951624
f02c429c fc96b624 f0fd9e10 f00dcf40
0xf0fd9d18: 0xfc5011b8 nmap nmap LEDPATCNT+0x265
0xfc5011a8 0xfc958eb4 0x189000 LEDPATCNT+0x5cc
0 0xef13d000 0 0xfc951624
0xf02c429c 0xfc96b624 0xf0fd9e10 as_ctl+0xac
0xf0fd9e10: fc717460 f01678c8 8000000 0
0 fc4fef00 134 fc71747c
0 fc92b9c0 0 5
0 fc619800 f0fd9e70 f009fe5c
0xf0fd9e10: 0xfc717460 0xf01678c8 0x8000000 0
0 0xfc4fef00 LEDPATCNT+0xdc 0xfc71747c
0 0xfc92b9c0 0
nfsdump_maxcount+1
0 0xfc619800 0xf0fd9e70 memcntl+0x368
0xf0fd9e70: 2000 2b 0 fc4fef00
3 0 fc717460 0
fc50ecb4 fc619800 fffff000 545000
efffe000 fc619800 f0fd9ed8 f0072054
0xf0fd9e70: LEDPATCNT+0x1fa8 LEDTICKS+0xa 0
0xfc4fef00 ts_maxumdpri+1 0 0xfc717460
0 0xfc50ecb4 0xfc619800 0xfffff000
0x545000 0xefffe000 0xfc619800 0xf0fd9ed8
syscall+0x3ec
0xf0fd9ed8: 1 0 f0156d04 0
fc962800 83 f0fd9fb4 fc50e800
f0160fe8 fc50ecd4 0 fc50ecb0
fffffffc ffffffff f0fd9f58 f0041aa0
0xf0fd9ed8: nmap 0 sysent+0x624 0
0xfc962800 LEDPATCNT+0x2b 0xf0fd9fb4 0xfc50e800
cpr_info+0x3974 0xfc50ecd4 0 0xfc50ecb0
-4 VADDR_MASK_DEBUG 0xf0fd9f58
_sys_rtt+0x4d8
0xf0fd9f58: 40400080 ef769da0 ef769da4 4
88 1 7 f0fd9f58
0 0 5 3
0 0 effff110 ef77d9b0
0xf0fd9f58: 0x40400080 0xef769da0 0xef769da4 au_auditstate
LEDPATCNT+0x30 nmap nfsdump_maxcount+3
0xf0fd9f58 0 0 nfsdump_maxcount+1
ts_maxumdpri+1 0 0 0xeffff110
0xef77d9b0
data address not found
This example (see below), again, shows the same stack trace as was shown
in the previous two examples. Like the stacktrace macro, the stackregs
macro takes an address as input. Once again, the sp register is used
as the starting point of the trace.
The stackregs macro prints each stack frame only once. All the values are
printed in hexadecimal except for the return address.
Notice that each stack frame is broken into two parts. The first part prints
the eight local registers (l0 - l7) and the second part prints the eight input
registers (i0 - i7).
Underneath each stack frame is the instruction located at the return address
found in i7. In the first stack frame, we note that the instruction at
do_panic+0x1c is "call complete_panic".
<sp$<stackregs
0xf0fd9b90: locals:
0 fc8d4000 0 1
f0152400 fc63c200 fc63c224 f017dfb8
0xf0fd9bb0: ins:
f00494d8 f0152a68 f0fd9b90 0
0 1 f0fd9bf8 do_panic+0x1c
do_panic+0x1c: call complete_panic
0xf0fd9bf8: locals:
404000c1 f0181c08 70 70
70 1fff 0 590c
0xf0fd9c18: ins:
f0165c44 f0fd9d04 f0fd9d04 404000e2
40400ae2 3 f0fd9c58 cmn_err+0x1c
cmn_err+0x1c: call vcmn_err
0xf0fd9c58: locals:
f017ccdc 1 0 0
0 ef13c000 0 fc92b9c8
0xf0fd9c78: ins:
3 f0165c44 f02c429c 1
f0181a18 f016bf38 f0fd9cb8 page_unlock+0x3c
page_unlock+0x3c: call cmn_err
0xf0fd9cb8: locals:
fc674600 f017ccdc f0160d8c e0000000
e0000000 ffffffe0 fc5011a0 fc5011a5
0xf0fd9cd8: ins:
f02c429c 0 e31e10ff 1
f0181a68 0 f0fd9d18 segvn_lockop+0x370
segvn_lockop+0x370: call page_unlock
0xf0fd9d18: locals:
fc5011b8 1 1 2bd
fc5011a8 fc958eb4 189000 624
0xf0fd9d38: ins:
0 ef13d000 0 fc951624
f02c429c fc96b624 f0fd9e10 as_ctl+0xac
as_ctl+0xac: jmpl %g1, %o7
0xf0fd9e10: locals:
fc717460 f01678c8 8000000 0
0 fc4fef00 134 fc71747c
0xf0fd9e30: ins:
0 fc92b9c0 0 5
0 fc619800 f0fd9e70 memcntl+0x368
memcntl+0x368: call as_ctl
0xf0fd9e70: locals:
2000 2b 0 fc4fef00
3 0 fc717460 0
0xf0fd9e90: ins:
fc50ecb4 fc619800 fffff000 545000
efffe000 fc619800 f0fd9ed8 syscall+0x3ec
syscall+0x3ec: jmpl %g1, %o7
0xf0fd9ed8: locals:
1 0 f0156d04 0
fc962800 83 f0fd9fb4 fc50e800
0xf0fd9ef8: ins:
f0160fe8 fc50ecd4 0 fc50ecb0
fffffffc ffffffff f0fd9f58 _sys_rtt+0x4d8
_sys_rtt+0x4d8: call syscall
0xf0fd9f58: locals:
40400080 ef769da0 ef769da4 4
88 1 7 f0fd9f58
0xf0fd9f78: ins:
0 0 5 3
0 0 effff110 0xef77d9b0
data address not found
LOOKING FOR MATCHING BUGS
-------------------------
Between the messages file and the stack trace, there should now be enough
information to look in SunSolve to see if the system crash is caused by
a known bug. If the crash is due to a panic, use the panic string to
search (you probably want to make sure it's taken as one string, not
individual words). If this brings up too many documents, add the OS
version number to narrow the search.
If the crash is due to an assertion failure, search the same way as would
be done for a panic.
If the crash is due to a BAD TRAP, information from the stack trace
needs to be used. The problem is deciding which information to use. The
following is the output of the $c command from a BAD TRAP core file.
$c
complete_panic(0xe00e8c00,0x1,0xe00d3400,0xf56b0000,0x4,0xe00e8c00) + d0
do_panic(?) + 20
vcmn_err(0xe00de850,0xe1bf7a4c,0xe1bf7a4c,0x17,0x17,0x3)
cmn_err(0x3,0xe00de850,0xe00fadd4,0x0,0x138bc,0xdffffc60) + 1c
die(0x9,0xe1bf7b74,0x11001c,0x226,0x1,0xe00de850) + e0
trap(0x9,0xe1bf7b74,0x11001c,0x226,0x1,0x0) + 534
This output is relatively typical and absolutely useless. All this output
tells us is that we got a trap and proceeded to crash. We already knew
that. What needs to be known is what routine was executing that led to
the trap. That information can be obtained by running the stackregs or
stacktrace macro.
<sp$<stackregs
0xe1bf78d8: locals:
0 1 0 d
1 e00d3000 f56b0000 1
0xe1bf78f8: ins:
e00e8c00 1 e00d3400 f56b0000
4 e00e8c00 e1bf7940 do_panic+0x20
do_panic+0x20: call complete_panic
0xe1bf7940: locals:
400010c4 e0017db4 e00164e8 44
44 44 7 e1bf78f0
0xe1bf7960: ins:
e00de850 e1bf7a4c e1bf7a4c 17
17 3 e1bf79a0 cmn_err+0x1c
cmn_err+0x1c: call vcmn_err
0xe1bf79a0: locals:
e00d3d98 408010c0 e00dec00 e00ded90
e00ded78 e00ded5c e1bf9 e1bf8
0xe1bf79c0: ins:
3 e00de850 e00fadd4 0
138bc dffffc60 e1bf7a00 die+0xe0
die+0xe0: call cmn_err
0xe1bf7a00: locals:
0 1e4218 f6d3b9c0 f6d3b954
e00d3d98 f6e17b54 456ab 4
0xe1bf7a20: ins:
9 e1bf7b74 11001c 226
1 e00de850 e1bf7a70 trap+0x534
trap+0x534: call die
0xe1bf7a70: locals:
1 f6a94c00 1 f5ca4000
0 e0103890 f5cb4000 0
0xe1bf7a90: ins:
9 e1bf7b74 11001c 226
1 0 e1bf7b18 fault+0x7c
fault+0x7c: call trap
0xe1bf7b18: locals:
408010c0 e0043100 e0043104 10
9 1 7 e1bf7b18
0xe1bf7b38: ins:
110000 432 0 1
974e f6dea700 e1bf7bc0 tcoo_close+0x44
tcoo_close+0x44: call WR
0xe1bf7bc0: locals:
f000 0 0 a
475210 1 e0102cfc 1
0xe1bf7be0: ins:
f6e17b00 3 f54bdf00 80000000
f5e98a00 f6566400 e1bf7c20 qdetach+0xbc
qdetach+0xbc: jmpl %l3, %o7
0xe1bf7c20: locals:
432 432 c0c58 f56b4e4c
94 f6d3b9a8 f6253400 e00d6e00
0xe1bf7c40: ins:
f6e17b00 1 3 f54bdf00
e0103890 f6e17b54 e1bf7c80 strclose+0x4e0
strclose+0x4e0: call qdetach
0xe1bf7c80: locals:
0 f6d14200 e0103908 f54bdf00
3 e00d6cc8 f6d14250 40000
0xe1bf7ca0: ins:
e00d6e00 f6a94c9a f6d14210 f6e17b00
f6a94c60 f6d14258 e1bf7cf8 device_close+0x64
device_close+0x64: call strclose
0xe1bf7cf8: locals:
f6d14200 f544718c e004b428 276e55
1 4 f6472304 0
0xe1bf7d18: ins:
f63b8b84 3 f54bdf00 4
f6472304 276e55 e1bf7d58 spec_close+0xd8
spec_close+0xd8: call device_close
0xe1bf7d58: locals:
f6d14200 e004b54c e004b2c0 80
4 276e55 0 f6472300
0xe1bf7d78: ins:
f63b8b84 3 f6472374 f647237c
f6472374 f54bdf00 e1bf7db8 closef+0x138
closef+0x138: jmpl %l3, %o7
0xe1bf7db8: locals:
0 10000 e009f1bc f5483a24
0 0 3 f63b8b84
0xe1bf7dd8: ins:
f681d400 1 f681d42c f681d40a
f681d428 f681d420 e1bf7e18 closeall+0x58
closeall+0x58: call closef
0xe1bf7e18: locals:
268 e00a73e0 0 0
4ba719 e00db604 f5ca4594 0
0xe1bf7e38: ins:
1 f5ca4594 f681d420 ffffffec
10 f681d400 e1bf7e78 exit+0x230
exit+0x230: call closeall
0xe1bf7e78: locals:
0 f5cb9c08 fffffeff f5ca4094
0 e00fb65c f5ca4000 f5ca4014
0xe1bf7e98: ins:
2 f f5cb4000 f5cb4a00
f5ca4014 f5cb9c08 e1bf7ed8 syscall+0x6cc
syscall+0x6cc: call psig
0xe1bf7ed8: locals:
1 0 e00d7798 4
0 3 e1bf7fb4 f5cb4000
0xe1bf7ef8: ins:
e00e0048 f5cb44d4 0 f5cb44b0
e 0 e1bf7f58 _sys_rtt+0x4d4
_sys_rtt+0x4d4: call syscall
0xe1bf7f58: locals:
40001081 df6db114 df6db118 8
88 2 7 e1bf7f58
0xe1bf7f78: ins:
4 dffffcd4 200 dffffdb7
4 42bf0 dffffc60 0x138bc
data address not found
Looking at the routines leading up to the fault, we see tcoo_close,
qdetach, strclose, and so on. These are the search keys that should be used
in SunSolve to try to find a matching bug.
If a matching bug is found, search for that bug number in SunSolve to see if
there is a patch.
If no matching bug is found, then the crash may represent a new bug. In this
case, read the next section.
INFORMATION NEEDED BY SUN
-------------------------
When SunService is asked to analyze a system crash, certain files are
always requested. These files differ slightly depending on whether they
come from a SunOS 4.x machine or a Solaris 2.x machine.
For Solaris 2.x:
The following files:
/var/crash/<system_name>/unix.X ( X = a number )
/var/crash/<system_name>/vmcore.X ( X = a number )
/var/adm/messages
/etc/system
output from the showrev -p command
output from the prtconf -vp command
And a brief history of the machine, including any recent changes,
and what the machine was doing at the time of the crash.
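As a sketch, one way to gather the 2.x files onto a tape is something like
the following (the tape device, the .0 suffix, and the two output file
names are only examples):
    # cd /var/crash/`uname -n`
    # showrev -p > showrev.out
    # prtconf -vp > prtconf.out
    # tar cvf /dev/rmt/0 unix.0 vmcore.0 showrev.out prtconf.out \
        /var/adm/messages /etc/system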
For SunOS 4.1.x:
The following files:
/var/crash/<system_name>/vmunix.X ( X = a number )
/var/crash/<system_name>/vmcore.X ( X = a number )
/var/adm/messages
output from the devinfo -vp command
kernel configuration file and the
/usr/kvm/sys/`arch -k`/<config file name>/param.c file
And a brief history of the machine, including any recent changes,
a list of installed patches, and what the machine was doing at
the time of the crash.
Note that the core files may be in a directory other than
/var/crash/<system_name> if the administrator has changed the
savecore command in the /etc/rc.local or /etc/init.d/sysetup file.
Please remember to include your service order number with the tape.