SunSolve Internal

Infodoc ID:  12936
Synopsis:    Troubleshooting system crashes
Date:        5 Mar 1996

Description

My system has just crashed - now what?  One of the most important things
to determine when looking at a crash is the set of conditions under which
it occurs.  The person troubleshooting a crashing system should try to
answer the following questions.

What were the most recent changes to the system?

How often does the system crash?

Are the system crashes related to any particular activity (e.g. time of day,
running a particular application, etc.)?

Is it possible to reproduce the crash on demand?

Does this occur on more than one machine?

Are all the crashes the same kind (see below)?

Does the system have the latest kernel jumbo patch?

Are there any errors or warnings indicated in the messages file?
(If there are, then these should be looked into as a possible root
cause of the crash.)

If this is a sun4d or sun4u architecture, are there any errors
indicated in the "prtdiag -v" output?  (Again, this should be
investigated as a possible root cause of the crash.  Example commands
for gathering this kind of information appear after this list.)

If this is a 4.x system and you aren't running the GENERIC kernel, do
the crashes continue when running the GENERIC kernel?
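
To help answer the patch, hardware, and messages-file questions above, the
following commands are a minimal sketch of where to start on a Solaris 2.x
system (exact paths and options may vary by release):

uname -a                        (OS release and kernel architecture)
showrev -p                      (list of installed patches)
prtdiag -v                      (hardware status on sun4d/sun4u)
tail -100 /var/adm/messages     (recent errors and warnings)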


If this is the first time the system has crashed in an appreciable time,
then usually the best thing to do is make sure that savecore is enabled
and wait for another crash to occur.  savecore(1M) is a program which 
copies the system core file from the primary swap device (where it is 
placed when the crash occurs) to the directory specified.  For 
instructions on enabling savecore, see SRDB 4659 for 4.x systems and 
INFODOC 6332 for 2.x systems.
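
A rough sketch of what enabling savecore involves on a Solaris 2.x system
(the exact lines vary by release; follow the INFODOC above for the supported
procedure): the savecore lines in /etc/init.d/sysetup are shipped commented
out, and enabling them amounts to uncommenting something like the following.

if [ ! -d /var/crash/`uname -n` ]
then mkdir -p /var/crash/`uname -n`
fi
savecore /var/crash/`uname -n`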

Part of the reason for waiting for a second crash is that sometimes 
crashes are flukes, caused by a set of bizarre circumstances
that won't occur again for many, many years.  This kind of crash is
impossible to debug due to its lack of reproducibility.

Another reason for waiting for the second (or more) crash is that it's
helpful to know how often the crashes are occurring and if they are all of
the same type.  If the crashes are getting more and more frequent and
are of varying types, you probably have a hardware problem.


TYPES OF CRASHES
----------------
There are many different types of crashes that can occur.  The type of
crash can frequently be determined by looking at the messages file.
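
A quick way to pull the relevant lines out of the messages file (a minimal
sketch; the file may have been rotated to /var/adm/messages.0 and so on):

egrep -i 'panic|bad trap' /var/adm/messages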

One type of crash is a BAD TRAP.  Bad traps happen when the kernel takes
an unexpected trap.  Things that can cause a trap include trying to access
unaligned memory or trying to access memory which is not currently mapped.
An example of the messages from a bad trap follows:

Dec 21 03:36:49 mysun unix: BAD TRAP: type=7 rp=f0bbeb8c addr=0 mmu_fsr=0 rw=0
Dec 21 03:36:49 mysun unix: find: Memory address alignment
Dec 21 03:36:49 mysun unix: pid=916, pc=0xfc2550e4, sp=0xf0bbebd8, 
                                psr=0x1f0000c0, context=1930
Dec 21 03:36:49 mysun unix: g1-g7: f004f51c, 8000000, f007702c, c0, fd7a1a68, 
                                1, fcbaa020
Dec 21 03:36:49 mysun unix: panic: cross-call at high interrupt level
Dec 21 03:36:49 mysun unix: syncing file systems... 3 3 3 3 3 3 3 3 3 3 3 3 3 
                                3 3 3 3 3 3 3 done
Dec 21 03:36:49 mysun unix: 14849 static and sysmap kernel pages
Dec 21 03:36:49 mysun unix:   197 dynamic kernel data pages
Dec 21 03:36:49 mysun unix:   144 kernel-pageable pages
Dec 21 03:36:49 mysun unix:     1 segkmap kernel pages
Dec 21 03:36:49 mysun unix:     0 segvn kernel pages
Dec 21 03:36:49 mysun unix:   153 current user process pages
Dec 21 03:36:49 mysun unix: 15344 total pages (15344 chunks)
Dec 21 03:36:49 mysun unix: dumping to vp fcb00734, offset 121920

In order to troubleshoot this kind of crash, it is necessary to get a stack 
trace of the thread which caused the crash.  This stack trace can then be 
compared with traces found in bug reports to see if this is a known problem.

The following messages are also from a bad trap.  These messages are from a 
different machine architecture which gives more information.  In this case, 
the traceback is included in the messages and includes some symbols.  It may 
be possible with this information to find a matching bug without having to 
look at core files.

Jan 18 14:18:04 postbox unix: BAD TRAP: cpu_id=4 type=9 <Data fault> 
                                addr=11001c rw=1 rp=e1bf7b74
Jan 18 14:18:04 postbox unix: MMU sfsr=0x226: ft=<Invalid address error> 
        at=<supv data load> level=2
Jan 18 14:18:04 postbox unix: MMU sfsr=0x226<FAV>
Jan 18 14:18:04 postbox unix: cheapserv: Data fault
Jan 18 14:18:04 postbox unix: kernel read fault at addr=0x11001c, pte=0x2
Jan 18 14:18:04 postbox unix: MMU sfsr=0x226: ft=<Invalid address error> 
        at=<supv data load> level=2
Jan 18 14:18:04 postbox unix: MMU sfsr=0x226<FAV>
Jan 18 14:18:04 postbox unix: WR+0x0, pid=234, pc=0xe0043100, sp=0xe1bf7bc0, 
        psr=0x408010c0, context=465
Jan 18 14:18:04 postbox unix: g1-g7: 0, f56b0000, f5cb4000, 10, f5cde4e0, 1, 
                                f5cb4a00
Jan 18 14:18:04 postbox unix: Begin traceback... sp = e1bf7bc0
Jan 18 14:18:04 postbox unix: qdetach+0xbc @ 0xe0082dac, fp=0xe1bf7c20
Jan 18 14:18:04 postbox unix:  args=f6e17b00 3 f54bdf00 80000000 f5e98a00 
                                f6566400
Jan 18 14:18:04 postbox unix: strclose+0x4e0 @ 0xe007ae14, fp=0xe1bf7c80
Jan 18 14:18:04 postbox unix:  args=f6e17b00 1 3 f54bdf00 e0103890 f6e17b54
Jan 18 14:18:04 postbox unix: Sysbase+0x82c04 @ 0xf5482c04, fp=0xe1bf7cf8
Jan 18 14:18:04 postbox unix:  args=e00d6e00 f6a94c9a f6d14210 f6e17b00 
                                f6a94c60 f6d14258
Jan 18 14:18:04 postbox unix: Sysbase+0x83afc @ 0xf5483afc, fp=0xe1bf7d58
Jan 18 14:18:04 postbox unix:  args=f63b8b84 3 f54bdf00 4 f6472304 276e55
Jan 18 14:18:04 postbox unix: closef+0x138 @ 0xe004b420, fp=0xe1bf7db8
Jan 18 14:18:04 postbox unix:  args=f63b8b84 3 f6472374 f647237c f6472374 
                                f54bdf00
Jan 18 14:18:04 postbox unix: closeall+0x58 @ 0xe004b2b8, fp=0xe1bf7e18
Jan 18 14:18:05 postbox unix:  args=f681d400 1 f681d42c f681d40a f681d428 
                                f681d420
Jan 18 14:18:05 postbox unix: exit+0x230 @ 0xe004a18c, fp=0xe1bf7e78
Jan 18 14:18:05 postbox unix:  args=1 f5ca4594 f681d420 ffffffec 10 f681d400
Jan 18 14:18:05 postbox unix: syscall+0x6cc @ 0xe002d0dc, fp=0xe1bf7ed8
Jan 18 14:18:05 postbox unix:  args=2 f f5cb4000 f5cb4a00 f5ca4014 f5cb9c08
Jan 18 14:18:05 postbox unix: .syscall+0xa4 @ 0xe0005da0, fp=0xe1bf7f58
Jan 18 14:18:05 postbox unix:  args=e00e0048 f5cb44d4 0 f5cb44b0 e 0
Jan 18 14:18:05 postbox unix: (unknown)+0x138bc @ 0x138bc, fp=0xdffffc60
Jan 18 14:18:05 postbox unix:  args=4 dffffcd4 200 dffffdb7 4 42bf0
Jan 18 14:18:05 postbox unix: End traceback...
Jan 18 14:18:05 postbox unix: panic[cpu4]/thread=0xf5cb4a00: Data fault
Jan 18 14:18:05 postbox unix: syncing file systems... 169 169 169 169 169 169 
                                169 169 169 169 169 169 169 169 169 169 169 
                                169 169 169 done
Jan 18 14:18:05 postbox unix: 13084 static and sysmap kernel pages
Jan 18 14:18:05 postbox unix:   254 dynamic kernel data pages
Jan 18 14:18:05 postbox unix:   404 kernel-pageable pages
Jan 18 14:18:05 postbox unix:     4 segkmap kernel pages
Jan 18 14:18:05 postbox unix:     0 segvn kernel pages


A second kind of crash occurs when the machine panics.   Sometimes the panic 
message is sufficient to point out the problem.  For instance, the following 
messages indicate that the machine panicked because of filesystem 
corruption.  The messages even specify which filesystem (/export/u2).

Dec  5 04:15:59 mysun unix: panic: free: freeing free frag, dev = 0x1bc6d61, 
block = 12, cg = 359 fs = /export/u2
Dec  5 04:15:59 mysun unix: syncing file systems...panic: panic sync timeout
Dec  5 04:15:59 mysun unix:  6724 static and sysmap kernel pages
Dec  5 04:15:59 mysun unix:   185 dynamic kernel data pages
Dec  5 04:15:59 mysun unix:   399 kernel-pageable pages
Dec  5 04:15:59 mysun unix:     0 segkmap kernel pages
Dec  5 04:15:59 mysun unix:     0 segvn kernel pages
Dec  5 04:15:59 mysun unix:     0 current user process pages
Dec  5 04:15:59 mysun unix:  7308 total pages (7308 chunks)
Dec  5 04:15:59 mysun unix: dumping to vp fca48aec, offset 186208

If this happens, fsck(1M) should be run by hand on the filesystem in question.
This normally clears the problem.  If this kind of panic happens
repeatedly, check to see if there is a problem with the disk.
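
A minimal sketch of running fsck by hand (the device name below is
hypothetical; use the raw device that actually holds the filesystem named
in the panic message, and make sure it is unmounted first):

umount /export/u2
fsck /dev/rdsk/c0t1d0s6         (raw device for /export/u2 in this example)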

This next group of messages indicates a panic of type zero.  This panic is 
caused by trying to execute location zero.  Normally, this is caused by a 
user or system administrator using "L1-A" (or Stop-A, depending on your 
keyboard labeling) followed by typing sync at the ok prompt.  The "L1-A" 
causes the system to drop to the boot prom.  The sync command at the boot 
prom prompt causes the system to sync the filesystems (if possible) and 
then causes a panic zero.
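
For reference, the console interaction that produces this kind of panic
looks roughly like this (prompt text varies by PROM revision):

<L1-A / Stop-A pressed at the console>
ok sync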

Dec 20 23:48:55 mysun unix: panic: zero
Dec 20 23:48:55 mysun unix: syncing file systems...      H

More information about system hangs can be found in Infodoc 13039, 
Troubleshooting System Hangs.


The following messages were actually read from the system core file
rather than the messages file, which is why they don't have any
timestamp on them.  These messages are an example of a panic
message which by itself is not sufficient to identify the problem.
The message is telling us that some code has tried to acquire
a lock which it already owns.  Without knowing which code did
this, we don't know what the fix is.  To find out which code
did this, we must look at the stack trace of the thread which caused
the panic (see below).

                panic[cpu1]/thread=0xf5e99380: recursive mutex_enter. mutex
                                                f441401c caller e00b1eac
                syncing file systems...panic[cpu1]/thread=0xe0629ec0: panic
                                                sync timeout
                66584 static and sysmap kernel pages
                  540 dynamic kernel data pages
                 1047 kernel-pageable pages
                    0 segkmap kernel pages
                    0 segvn kernel pages
                    0 current user process pages
                68171 total pages (68171 chunks)
                dumping to vp f2bb27b4, offset


Once the panic string has been determined, the words from this string
can be used to search SunSolve to see if there is a bug related 
to this string.  Make sure not to include numeric information, as
this information does not normally match between machines.  If a bug
is found and the description seems to match the situation on the
machine that is crashing, enter the bugid into SunSolve and search
for a patch that fixes the problem.


GETTING STACK TRACE INFORMATION
-------------------------------

If the panic string is not unique enough to indicate the cause of the crash,
it is normally necessary to get a trace of the stack of the process which
caused the crash.  This is accomplished by running adb(1) against the system
core file generated at the time of the crash (assuming savecore was enabled
of course).

Unless someone has changed the savecore command in the /etc/init.d/sysetup
file, the core files are saved to the directory /var/crash/`uname -n`.
Placing a command in "`" (back quotes) causes the output of that command
to be used as part of the current command.  The output of uname -n is the
name of the machine, so the core files are typically in the directory
/var/crash/<machine name>.  In my case, my machine is called squirt and
so core files get saved to /var/crash/squirt.
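
For example (a hypothetical listing; the file names follow the patterns
described below):

uname -n
squirt

ls /var/crash/`uname -n`
unix.0     vmcore.0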

The adb command is as follows:

adb -k <unixfile> <corefile>

where

unixfile        is unix.NUM for 2.x systems and vmunix.NUM for 4.x
                systems.  The NUM is a number, starting from 0, that
                is used to create unique file names in the case of
                multiple crashes.

corefile        is vmcore.NUM.  The NUM is the same as for unixfile (and
                must be the same as that specified for unixfile).

Note that by default, adb does not give a prompt.  Just type commands in on
the line once adb has written the first line (physmem xxxxx).
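
Putting the pieces together, a typical session for the first crash on a
Solaris 2.x machine named squirt starts roughly like this (the physmem
value is hypothetical):

cd /var/crash/squirt
adb -k unix.0 vmcore.0
physmem 3e39

The stack trace commands described below are then typed at this point;
type $q to quit adb when finished.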

There are three commands that can be used to look at a stack trace:

$c              an adb miscellaneous command which dumps the stack backtrace
                with one line per call

$<stackregs     an adb macro (not available on 4.x) which dumps the stack in
                frame format

$<stacktrace    an adb macro which dumps the stack in frame format


Examples:

This is an example of using the $c command to get a stack trace.  It is
printed one routine per line with the arguments passed to the routines in
parentheses (actually, these may not be the actual arguments but that
belongs to the realm of advanced core dump analysis).

$c
complete_panic(0xf00494d8,0xf0152a68,0xf0fd9b90,0x0,0x0,0x1) + 10c
do_panic(?) + 1c
vcmn_err(0xf0165c44,0xf0fd9d04,0xf0fd9d04,0x404000e2,0x40400ae2,0x3)
cmn_err(0x3,0xf0165c44,0xf02c429c,0x1,0xf0181a18,0xf016bf38) + 1c
page_unlock(0xf02c429c,0x0,0xe31e10ff,0x1,0xf0181a68,0x0) + 3c
segvn_lockop(0x0,0xef13d000,0x0,0xfc951624,0xf02c429c,0xfc96b624) + 370
as_ctl(0x0,0xfc92b9c0,0x0,0x5,0x0,0xfc619800) + ac
memcntl(0xfc50ecb4,0xfc619800,0xfffff000,0x545000,0xefffe000,0xfc619800) + 368
syscall(0xf0160fe8) + 3ec


The following example shows the same stack trace as the $c command above.  
The stacktrace macro takes an address as input.  In this case, we're using 
the contents of the sp register (stack pointer) as the starting point of 
the trace.

The stacktrace macro prints each stack frame twice.  The first time everything 
is printed in hex; the second time everything is printed symbolically (when the 
hex number corresponds to a symbol).  Note that the number on the far left hand 
side is the address of the stack frame being printed.  The same address will 
appear twice.

The first output of the macro is the names of the registers being printed.
Registers l0 through l7 (note that this is l as in local registers) are 
used as scratch pads by the current routine.  Knowing the value of these 
is helpful when you have to go further than just looking at the stack 
trace.  Registers i0 through i5 contain the first six arguments passed 
to the routine (unless they have been changed since the routine started).  
Register i6 contains the stack frame pointer (where the previous stack 
frame is located).  Register i7 contains the return address.

Notice that the symbolic form of the stack frame does not always have
four columns (look at the second row of the first frame).  This means
that you cannot look for register values positionally, you have to count.

Fortunately, the value you really care about is i7, and that is always the
last value in the frame.  So below, the return addresses read do_panic,
then cmn_err, then page_unlock, and so on, giving the same call chain
(most recent call first) as the $c output above.

<sp$<stacktrace
                l0              l1              l2              l3
                l4              l5              l6              l7
                i0              i1              i2              i3
                i4              i5              i6              i7

0xf0fd9b90:     0               fc8d4000        0               1
                f0152400        fc63c200        fc63c224        f017dfb8
                f00494d8        f0152a68        f0fd9b90        0
                0               1               f0fd9bf8        f00490a4

0xf0fd9b90:     0               0xfc8d4000      0             nmap
                dumpfile+0x50   kstat_devi+0x3b74             kstat_devi+0x3b98
                panic_regs      complete_panic+0x10c          cpu
                0xf0fd9b90      0               0             nmap
                0xf0fd9bf8      do_panic+0x1c


(Note that in the stack frame above, the value of i0 is complete_panic+0x10c,
not panic_regs.  This is to point out that the formatting can cause values
not to line up with the register layout shown at the top.)

0xf0fd9bf8:     404000c1        f0181c08        70              70
                70              1fff            0               590c
                f0165c44        f0fd9d04        f0fd9d04        404000e2
                40400ae2        3               f0fd9c58        f00858d4

0xf0fd9bf8:     0x404000c1      ph_mutex        LEDPATCNT+0x18  LEDPATCNT+0x18
                LEDPATCNT+0x1   LEDPATCNT+0x1fa7                0
                Syssize+0x1c0c  0xf0165c44      0xf0fd9d04      0xf0fd9d04
                0x404000e2      0x40400ae2      ts_maxumdpri+1  0xf0fd9c58
                cmn_err+0x1c

0xf0fd9c58:     f017ccdc        1               0               0
                0               ef13c000        0               fc92b9c8
                3               f0165c44        f02c429c        1
                f0181a18        f016bf38        f0fd9cb8        f00a253c

0xf0fd9c58:     cpus            nmap            0               0
                0               0xef13c000      0                0xfc92b9c8
                ts_maxumdpri+1  0xf0165c44      0xf02c429c       nmap
                pse_mutex       sleepq_head+0x640                0xf0fd9cb8
                page_unlock+0x3c

0xf0fd9cb8:     fc674600        f017ccdc        f0160d8c        e0000000
                e0000000        ffffffe0        fc5011a0        fc5011a5
                f02c429c        0               e31e10ff        1
                f0181a68        0               f0fd9d18        f00b6a40

0xf0fd9cb8:     0xfc674600      cpus          cpr_info+0x3718 0xe0000000
                0xe0000000      0xffffffe0    0xfc5011a0      0xfc5011a5
                0xf02c429c      0             0xe31e10ff      nmap
                pse_mutex+0x50  0             0xf0fd9d18     
segvn_lockop+0x370


0xf0fd9d18:     fc5011b8        1               1               2bd
                fc5011a8        fc958eb4        189000          624
                0               ef13d000        0               fc951624
                f02c429c        fc96b624        f0fd9e10        f00dcf40

0xf0fd9d18:     0xfc5011b8      nmap            nmap            LEDPATCNT+0x265
                0xfc5011a8      0xfc958eb4      0x189000        LEDPATCNT+0x5cc
                0               0xef13d000      0               0xfc951624
                0xf02c429c      0xfc96b624      0xf0fd9e10      as_ctl+0xac


0xf0fd9e10:     fc717460        f01678c8        8000000         0
                0               fc4fef00        134             fc71747c
                0               fc92b9c0        0               5
                0               fc619800        f0fd9e70        f009fe5c

0xf0fd9e10:     0xfc717460      0xf01678c8     0x8000000      0
                0               0xfc4fef00     LEDPATCNT+0xdc 0xfc71747c
                0               0xfc92b9c0     0             
nfsdump_maxcount+1
                0               0xfc619800     0xf0fd9e70     memcntl+0x368


0xf0fd9e70:     2000            2b              0               fc4fef00
                3               0               fc717460        0
                fc50ecb4        fc619800        fffff000        545000
                efffe000        fc619800        f0fd9ed8        f0072054

0xf0fd9e70:     LEDPATCNT+0x1fa8                LEDTICKS+0xa    0
                0xfc4fef00      ts_maxumdpri+1  0               0xfc717460
                0               0xfc50ecb4      0xfc619800      0xfffff000
                0x545000        0xefffe000      0xfc619800      0xf0fd9ed8
                syscall+0x3ec

0xf0fd9ed8:     1               0               f0156d04        0
                fc962800        83              f0fd9fb4        fc50e800
                f0160fe8        fc50ecd4        0               fc50ecb0
                fffffffc        ffffffff        f0fd9f58        f0041aa0

0xf0fd9ed8:     nmap            0               sysent+0x624    0
                0xfc962800      LEDPATCNT+0x2b  0xf0fd9fb4      0xfc50e800
                cpr_info+0x3974 0xfc50ecd4      0               0xfc50ecb0
                -4              VADDR_MASK_DEBUG                0xf0fd9f58
                _sys_rtt+0x4d8

0xf0fd9f58:     40400080        ef769da0        ef769da4        4
                88              1               7               f0fd9f58
                0               0               5               3
                0               0               effff110        ef77d9b0

0xf0fd9f58:     0x40400080      0xef769da0      0xef769da4    au_auditstate
                LEDPATCNT+0x30  nmap            nfsdump_maxcount+3
                0xf0fd9f58      0               0           nfsdump_maxcount+1
                ts_maxumdpri+1  0               0             0xeffff110
                0xef77d9b0


data address not found


This example (see below), again, shows the same stack trace as was shown 
in the previous two examples.  Like the stacktrace macro, the stackregs 
macro takes an address as input.  Once again, the sp register is used 
as the starting point of the trace.

The stackregs macro prints each stack frame only once.  All the values are 
printed in hexadecimal except for the return address.

Notice that each stack frame is broken into two parts.  The first part prints
the eight local registers (l0 - l7) and the second part prints the eight input 
registers (i0 - i7).

Underneath each stack frame is the instruction located at the return address
found in i7.  In the first stack frame, we note that the instruction at
do_panic+0x1c is "call complete_panic".

<sp$<stackregs
0xf0fd9b90:     locals:
                0               fc8d4000        0            1
                f0152400        fc63c200        fc63c224     f017dfb8
0xf0fd9bb0:     ins:
                f00494d8        f0152a68        f0fd9b90     0
                0               1               f0fd9bf8     do_panic+0x1c
do_panic+0x1c:  call    complete_panic
0xf0fd9bf8:     locals:
                404000c1        f0181c08        70           70
                70              1fff            0            590c
0xf0fd9c18:     ins:
                f0165c44        f0fd9d04        f0fd9d04     404000e2
                40400ae2        3               f0fd9c58     cmn_err+0x1c
cmn_err+0x1c:   call    vcmn_err
0xf0fd9c58:     locals:
                f017ccdc        1               0            0
                0               ef13c000        0            fc92b9c8
0xf0fd9c78:     ins:
                3               f0165c44        f02c429c     1
                f0181a18        f016bf38        f0fd9cb8     page_unlock+0x3c
page_unlock+0x3c:               call     cmn_err
0xf0fd9cb8:     locals:
                fc674600        f017ccdc        f0160d8c     e0000000
                e0000000        ffffffe0        fc5011a0     fc5011a5
0xf0fd9cd8:     ins:
                f02c429c        0               e31e10ff     1
                f0181a68        0               f0fd9d18     segvn_lockop+0x370
segvn_lockop+0x370:             call    page_unlock
0xf0fd9d18:     locals:
                fc5011b8        1               1            2bd
                fc5011a8        fc958eb4        189000       624
0xf0fd9d38:     ins:
                0                ef13d000       0            fc951624
                f02c429c        fc96b624        f0fd9e10     as_ctl+0xac
as_ctl+0xac:    jmpl    %g1, %o7
0xf0fd9e10:     locals:
                fc717460        f01678c8        8000000      0
                0                fc4fef00       134          fc71747c
0xf0fd9e30:     ins:
                0               fc92b9c0        0            5
                0               fc619800        f0fd9e70     memcntl+0x368
memcntl+0x368:  call   as_ctl
0xf0fd9e70:     locals:
                2000            2b              0            fc4fef00
                3               0               fc717460     0
0xf0fd9e90:     ins:
                fc50ecb4        fc619800        fffff000     545000
                efffe000        fc619800        f0fd9ed8     syscall+0x3ec
syscall+0x3ec:  jmpl    %g1, %o7
0xf0fd9ed8:     locals:
                1               0               f0156d04     0
                fc962800        83              f0fd9fb4     fc50e800
0xf0fd9ef8:     ins:
                f0160fe8        fc50ecd4        0            fc50ecb0
                fffffffc        ffffffff        f0fd9f58     _sys_rtt+0x4d8
_sys_rtt+0x4d8: call    syscall
0xf0fd9f58:     locals:
                40400080        ef769da0        ef769da4     4
                88              1               7            f0fd9f58
0xf0fd9f78:     ins:
                0               0               5            3
                0               0               effff110     0xef77d9b0

data address not found




LOOKING FOR MATCHING BUGS
-------------------------

Between the messages file and the stack trace, there should now be enough
information to look in SunSolve to see if the system crash is caused by
a known bug.  If the crash is due to a panic, use the panic string to 
search (you probably want to make sure it's taken as one string, not
individual words).  If this brings up too many documents, add the OS
version number to narrow the search.

If the crash is due to an assertion failure, search in the same way as
would be done for a panic.

If the crash is due to a BAD TRAP, information from the stack trace
needs to be used.  The problem is deciding which information to use.  The
following is the output of the $c command from a BAD TRAP core file.

$c
complete_panic(0xe00e8c00,0x1,0xe00d3400,0xf56b0000,0x4,0xe00e8c00) + d0
do_panic(?) + 20
vcmn_err(0xe00de850,0xe1bf7a4c,0xe1bf7a4c,0x17,0x17,0x3)
cmn_err(0x3,0xe00de850,0xe00fadd4,0x0,0x138bc,0xdffffc60) + 1c
die(0x9,0xe1bf7b74,0x11001c,0x226,0x1,0xe00de850) + e0
trap(0x9,0xe1bf7b74,0x11001c,0x226,0x1,0x0) + 534

This output is relatively typical and absolutely useless.  All this output
tells us is that we got a trap and proceeded to crash.  We already knew
that.  What needs to be known is what routine was executing that led to
the trap.  That information can be obtained by running the stackregs or 
stacktrace macro.

<sp$<stackregs
0xe1bf78d8:     locals:
                0             1               0            d
                1             e00d3000        f56b0000     1
0xe1bf78f8:     ins:
                e00e8c00      1               e00d3400     f56b0000
                4             e00e8c00        e1bf7940     do_panic+0x20
do_panic+0x20:  call    complete_panic
0xe1bf7940:     locals:
                400010c4      e0017db4        e00164e8     44
                44            44              7            e1bf78f0
0xe1bf7960:     ins:
                e00de850      e1bf7a4c        e1bf7a4c     17
                17            3               e1bf79a0     cmn_err+0x1c
cmn_err+0x1c:   call    vcmn_err
0xe1bf79a0:     locals:
                e00d3d98      408010c0        e00dec00     e00ded90
                e00ded78      e00ded5c        e1bf9        e1bf8
0xe1bf79c0:     ins:
                3             e00de850        e00fadd4     0
                138bc         dffffc60        e1bf7a00     die+0xe0
die+0xe0:       call    cmn_err
0xe1bf7a00:     locals:
                0             1e4218          f6d3b9c0     f6d3b954
                e00d3d98      f6e17b54        456ab        4
0xe1bf7a20:     ins:
                9             e1bf7b74        11001c       226
                1             e00de850        e1bf7a70     trap+0x534
trap+0x534:     call    die
0xe1bf7a70:     locals:
                1             f6a94c00        1            f5ca4000
                0             e0103890        f5cb4000     0
0xe1bf7a90:     ins:
                9             e1bf7b74        11001c       226
                1             0               e1bf7b18     fault+0x7c
fault+0x7c:     call    trap
0xe1bf7b18:     locals:
                408010c0      e0043100        e0043104     10
                9             1               7            e1bf7b18
0xe1bf7b38:     ins:
                110000        432             0            1
                974e          f6dea700        e1bf7bc0     tcoo_close+0x44
tcoo_close+0x44:              call     WR
0xe1bf7bc0:     locals:
                f000          0               0            a
                475210        1               e0102cfc     1
0xe1bf7be0:     ins:
                f6e17b00      3              f54bdf00      80000000
                f5e98a00      f6566400       e1bf7c20      qdetach+0xbc
qdetach+0xbc:   jmpl    %l3, %o7
0xe1bf7c20:     locals:
                432           432             c0c58        f56b4e4c
                94            f6d3b9a8        f6253400     e00d6e00
0xe1bf7c40:     ins:
                f6e17b00      1               3            f54bdf00
                e0103890      f6e17b54        e1bf7c80     strclose+0x4e0
strclose+0x4e0: call    qdetach
0xe1bf7c80:     locals:
                0             f6d14200        e0103908     f54bdf00
                3             e00d6cc8        f6d14250     40000
0xe1bf7ca0:     ins:
                e00d6e00      f6a94c9a        f6d14210     f6e17b00
                f6a94c60      f6d14258        e1bf7cf8     device_close+0x64
device_close+0x64:            call    strclose
0xe1bf7cf8:     locals:
                f6d14200      f544718c        e004b428     276e55
                1             4               f6472304     0
0xe1bf7d18:     ins:
                f63b8b84      3               f54bdf00     4
                f6472304      276e55          e1bf7d58     spec_close+0xd8
spec_close+0xd8:              call    device_close
0xe1bf7d58:     locals:
                f6d14200      e004b54c        e004b2c0     80
                4             276e55          0            f6472300
0xe1bf7d78:     ins:
                f63b8b84      3               f6472374     f647237c
                f6472374      f54bdf00        e1bf7db8     closef+0x138
closef+0x138:   jmpl    %l3, %o7
0xe1bf7db8:     locals:
                0             10000           e009f1bc     f5483a24
                0             0               3            f63b8b84
0xe1bf7dd8:     ins:
                f681d400      1               f681d42c     f681d40a
                f681d428      f681d420        e1bf7e18     closeall+0x58
closeall+0x58:  call     closef
0xe1bf7e18:     locals:
                268           e00a73e0        0            0
                4ba719        e00db604        f5ca4594     0
0xe1bf7e38:     ins:
                1             f5ca4594        f681d420     ffffffec
                10            f681d400        e1bf7e78     exit+0x230
exit+0x230:     call     closeall
0xe1bf7e78:     locals:
                0             f5cb9c08        fffffeff     f5ca4094
                0             e00fb65c        f5ca4000     f5ca4014
0xe1bf7e98:     ins:
                2             f               f5cb4000     f5cb4a00
                f5ca4014      f5cb9c08        e1bf7ed8     syscall+0x6cc
syscall+0x6cc:  call     psig
0xe1bf7ed8:     locals:
                1             0               e00d7798     4
                0             3               e1bf7fb4     f5cb4000
0xe1bf7ef8:     ins:
                e00e0048      f5cb44d4        0            f5cb44b0
                e             0               e1bf7f58     _sys_rtt+0x4d4
_sys_rtt+0x4d4: call     syscall
0xe1bf7f58:     locals:
                40001081      df6db114        df6db118     8
                88            2               7            e1bf7f58
0xe1bf7f78:     ins:
                4             dffffcd4        200          dffffdb7
                4             42bf0           dffffc60     0x138bc

data address not found

Looking at the routines leading up to the fault, there are tcoo_close, 
qdetach, strclose, etc.  These are the search keys that should be used in 
SunSolve to try to find a matching bug.

If a matching bug is found, search for that bug number in SunSolve to see if
there is a patch.

If no matching bug is found, then the crash may represent a new bug.  In this
case, read the next section.


INFORMATION NEEDED BY SUN
-------------------------

When SunService is asked to analyze a system crash, there are certain files
which are always requested.  These files differ slightly depending on 
whether they are coming from a SunOS 4.x machine or a Solaris 2.x machine.


For Solaris 2.X:

The following files:
/var/crash/<system_name>/unix.X     ( X = a number )
/var/crash/<system_name>/vmcore.X   ( X = a number )
/var/adm/messages
/etc/system
output from the showrev -p command
output from the prtconf -vp command 

And a brief history of the machine, including any recent changes,
and what the machine was doing at the time of the crash.
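
A rough sketch of collecting the Solaris 2.x files onto tape (the tape
device and crash number here are hypothetical; adjust for your system):

showrev -p > /var/tmp/showrev.out
prtconf -vp > /var/tmp/prtconf.out
cd /
tar cvf /dev/rmt/0 var/crash/`uname -n`/unix.0 \
    var/crash/`uname -n`/vmcore.0 var/adm/messages etc/system \
    var/tmp/showrev.out var/tmp/prtconf.out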

SunOS 4.1.X:

The following files:
/var/crash/<system_name>/vmunix.X   ( X = a number )
/var/crash/<system_name>/vmcore.X   ( X = a number )
/var/adm/messages
output from the devinfo -vp command
the kernel configuration file, and
the /usr/kvm/sys/`arch -k`/<config file name>/param.c file

And a brief history of the machine, including any recent changes,
a list of installed patches, and what the machine was doing at 
the time of the crash.

Note that the core files may be in a directory other than 
/var/crash/<system_name> if the administrator has changed the
savecore command in the /etc/rc.local or /etc/init.d/sysetup file.
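
If in doubt, check where savecore has been told to put the files (a
minimal sketch):

grep savecore /etc/init.d/sysetup       (Solaris 2.x)
grep savecore /etc/rc.local             (SunOS 4.x)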

Please remember to include your service order number with the tape.

SOLUTION SUMMARY:
Product Area: Kernel
Product:      crash
OS:           any
Hardware:     any


Sun Proprietary/Confidential: Internal Use Only