Kernel FAQ

Enterprise Services Kernel FAQ
kernel tips: Most Frequently Asked Questions of Frontline Kernel Technical Support Engineers
=========================================================================
=========================================================================
              Sun Kernel Technical Support Engineer Primer
=========================================================================
=========================================================================

This document is intended as a primer for new Sun Kernel Technical
Support Engineers.  It covers most of the basic questions frequently
asked by customers, delves into some of the theory and internals of Sun's
operating systems, and offers tips and techniques for solving problems.

It is intended to provide a starting point of knowledge for new
engineers, who would otherwise not know what information to study from
the vast resources of Sunsolve.  It is intended to give them an idea
of what types of questions to expect, so they know the areas to study
further.

This document is intended to be kept updated.  Hopefully, new questions
will be added, and no-longer-relevant questions will be deleted over
time.

Unless otherwise stated, the answers are for Solaris 2.4 and up.  There
are a few questions for SunOS 4.X as well;  these apply to 4.1.4, and
may apply to older versions.

The questions covered in this document are listed below.  They are
not categorized, per se, but are organized so that adjacent answers flow
into each other.

Alphabetized list of topics, with links to first mention of each topic:
adb asynchronous
boot bufhwm
chroot config
DBE deadman descriptor device dmesg dnlc dump
/etc/system
file
hang
IPC iscda
kernelmap kmem_flags
libc
major maxusers memory memory leak messages message queue module mutex
panic patch prestoserve priorities probe-SCSI process processor proctool pty
rebuild rmap Ross
savecore scheduling SCSI semaphore setuid
shared memory shutdown signal SSA sunsolve1 swap
truss tuning
ufs_ninode undelete
watchdog
year 2000
zs

1) What is a patch?  How are patches installed?  How do I view which
patches are installed on a system?

2) Introducing the /var/adm/messages file.

3) Introducing the dmesg buffer

4) Introducing the /etc/system file

5) What is a loadable kernel module?  How does it work?

6) How to make a module load without using it?

7) How to see which modules are loaded?

8) How to unload a module if it is not busy?

9) What are the various configuration files or commands to determine the
system configuration?

10) How to tune semaphore parameters

11) How to tune shared memory parameters (2.x and SunOS 4.x)

12) How to tune message queue parameters

13) I set the IPC parameters, but they have no effect.  I still see the
values as they were set before.

14) I set the IPC (semaphores, shared memory, or message queue)
parameters and rebooted, but I still see zeros when I display them using
"sysdef -i", or I see "facility not in system" when I display them using
"ipcs -a".

15) Tuning a system for a database

16) What is maxusers?  What is its default?  How is it changed?  What
other parameters are affected by changes to maxusers?

17) How to increase number of processes per user.

18) How to increase the total number of open files on a system?

19) How to increase the systemwide limit of the number of open files per
process?

20) How to increase the number of open files per process, on a process
by process basis?

21) How to change systemwide things other than file descriptor maximum?

22) How to increase number of processes systemwide.

23) How to increase number of ptys

24) What can be tuned regarding memory parameters?

25) What is dnlc?  How to tune?

26) What is ufs_ninode?  How to tune?

27) What is bufhwm?  How to tune?

28) What kernel parameters need to be modified to enable asynchronous
I/O?

29) What is a kernel rebuild?  When is it necessary to rebuild a kernel?
How to rebuild a SunOS kernel.  Where are error messages put?

30) Will 4.X run in multiple processor systems?

31) Which Ross modules are supported on which operating systems and
architectures?

32) My programs are not freeing up memory after they exit.  Vmstat free
column shows that there is very little free memory.  How can one tell
whether a system is short of memory?

33) How to tell how much memory is on a system?

34) How to tell how much swap on my system?

35) How to add secondary swapfiles to a system

36) How much memory does a process take?

37) My system is running slow.

38) What is a mutex?  What is mutex contention?

39) What is an adaptive mutex?  A spin mutex?

40) What are some other kinds of kernel locking mechanisms in addition
to mutexes?

41) What is kernelmap?

42) What is memory mapping?

43) What is DBE?  Is it necessary for Solaris?

44) What is the largest size of a pathname segment?  Total pathname?

45) How does one acquire detailed process information?

46) How can one access kernel statistics?

47) What is chroot?  How is it used?  What are its common problems?

48) What does setuid mean?

49) How come the system() system call does not execute a privileged
command when called from a program which is setuid'ed to root?

50) What causes defunct processes?  What to do about them?

51) My machine crashed.  What is a panic?

52) What causes a watchdog reset?  How can it be distinguished from a panic?

53) What is a memory leak?  How can I tell I have one?

54) How to tell why a system hung?

55) Is my system problem hardware or software?

56) What is savecore?

57) How large can a coredump be?

58) How to change the default dump device

59) How can I manually crash my system and get a coredump from it?

60) How come the system won't produce a coredump when it crashes?

61) How to get a coredump if STOP-A doesn't work?

62) How to send a tape of system configuration and corefiles in to Sun
Service for analysis?

63) How to FTP files to Sun Service?

64) How to retrieve files off of sunsolve1 for analysis.

65) What is iscda?

66) What is adb?  How is it used to examine corefiles?

67) Can adb be used on live kernels too?

68) How is a kernel patched live?

69) Can a kernel be patched permanently using adb?

70) What is a deadman kernel?

71) What is the kernel memory debugger?  How is it enabled?

72) What is truss?  How to see what's going on in a program?

73) The system returns a "data access exception" error on probe-SCSI

74) Some of my disk drives and/or tape drives are missing at boot or
have gone offline.

75) What to do if new devices time out on the SCSI bus:

76) How to restore devices in /dev directory on Solaris?  SunOS?

77) How to relate disk error in msgs file (i.e. sd0) to real devices
(cwtxdysz)?

78) What does "zs silo overflow" mean?  "zs3 ring buffer overflow"?

79) The console keyboard is in a weird state;  typing produces garbage.

80) My system keeps repeating "proc table is full" (4.X) or "out of
processes" (2.X).  What does it mean, and what is the cause?

81) Where are common error messages and return statuses listed?  Signals?

82) Processes won't run on one system but will run on others

83) The system says "out of memory" (ENOMEM) when I try to run processes,
yet there is plenty on my system.  The system has no swap configured,
though.  What's going on?

84) "Process killed" is displayed promptly on the console when execution
of a process is attempted.

85) What causes the message
    "rmallocmap: rmap overflow, lost [number,number]"
    and what can be done about it?

86) When is the international version of libc needed, and when is the
domestic version needed?

87) What is Prestoserve?

88) What has to happen at boot time in order for the Sparc Storage Array
(SSA) to work with Veritas or DiskSuite?  What order are the drivers
loaded?  What about Prestoserve and SSAs?

89) Where is a good matrix of patches and compatibility for the SSA?

90) How to make sure a file has been written out to disk?

91) How to undelete a file?

92) What is the proper way of shutting down a system?

93) What are the various ways of booting the system?

94) How to stop something from configuring (starting) at boot time.

95) When is it NOT a good idea to boot -r?

96) How to startup and shutdown processors?

97) How to bind processes to a processor?

98) How to tell processor speed?

99) What are the various software priorities?

100) What are the scheduling classes of Solaris?

101) What is the layout and dataflow of SCSI drivers?

102) What is a self-identifying device?

103) What is a major device number?  A minor device number?

104) How does the mechanism to add device software to a Solaris system
work?  What are the steps involved in adding a device?

105) How does the system know to do a reconfiguration boot without
having been passed "-r" in the boot command?

106) Recommended reading and references for Sun Kernel Tech Support
Engineers.

107) Will Solaris run properly after the year 2000?

108) Where can I get a copy of proctool?

109) What are SunOS 4.x kernel parameters ncallout, nclist, and ndquot?

=========================================================================
=========================================================================
1) What is a patch?  How are patches installed?  How do I view which
patches are installed on a system?

A patch is the means of installing new or updated software.  Patches
contain the software being installed, plus bookkeeping data to tell the
system what it will have once the patch is installed.

All patches have a README file associated with them, which describes what
the patch is for.  The README usually contains a list of bugs which have
been fixed, along with a synopsis of what those bugs are.  One can
look up the bugs on Sunsolve for more information on them.

Sun patch numbers have the following format:

		123456-01
Patch number----^^^^^^
		       ^^----Version

2.X

Patch README files also have information on how to install the patch.
Most Solaris 2.X patches install with the installpatch installation
script;  some have additional instructions such as shutting down certain
system functions beforehand, or rebooting afterward.

The Solaris "showrev -p" command outputs a list of the current patches
installed on a system.  Multiple versions of a patch may be displayed if
multiple versions have been installed.  Note, though, that only the
highest version of a patch displayed will be in use.
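
The rule that only the highest installed version of a patch is in use
can be sketched as follows.  This is an illustrative Python fragment, not
a Sun tool; the function name is hypothetical and the patch IDs in the
sample list are made up.

```python
def active_patches(installed):
    """Map each patch number to its highest installed version.

    `installed` holds patch IDs in the form NNNNNN-VV, for example
    "123456-01" (patch number 123456, version 01).
    """
    active = {}
    for patch_id in installed:
        number, version = patch_id.split("-")
        # Compare versions numerically so that "10" ranks above "09".
        if number not in active or int(version) > int(active[number]):
            active[number] = version
    return active


# Hypothetical sample: two versions of one patch, one version of another.
sample = ["123456-01", "123456-03", "654321-02"]
in_use = active_patches(sample)
```

Here in_use maps patch 123456 to version 03, even though version 01 is
also listed as installed.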

4.X

Like 2.X, the patch README files have information on how to install the
patch.  Installation, though, is manual.  One copies binaries onto the
system, renames old ones, and may have to build new executables manually.

There is no 4.X equivalent of showrev -p.  One must maintain a log of
installed patches by hand.  One can perform checksums on individual
objects and compare them to those of a patch, to see which version of
the patch is installed, if necessary.

=========================================================================
2) Introducing the /var/adm/messages file.

This file contains most error, warning and informational messages which
have been logged by the system.  It also contains output from boot
sequences and panics.  It can be used to tell system configuration from
the boot sequences, and can be used to spot budding problems or track
the progression of a problem before a panic.  It helps reconstruct a
scenario which has led up to a panic.  It is one of the most useful
diagnostic tools available.

The /var/adm directory has other files, messages.#, which are older,
saved versions of the messages file.  These can be checked in the same
way, and for the same things, as the messages file itself.

=========================================================================
3) Introducing the dmesg buffer

The dmesg buffer is a memory-resident kernel buffer containing messages
logged via syslog and the kernel cmn_err() routines.  Error messages
found in the messages file are found in the dmesg buffer, but sometimes
the dmesg buffer has more detail.  The dmesg buffer may have things in
it which never made it out to the messages file in the case of an abrupt
system shutdown.

The dmesg buffer may be dumped out from the command line using the
"dmesg" command.  It may be viewed from adb (either from a live kernel
or from a system coredump) via the $<msgbuf command.

Data in this buffer will wrap.  Once the bottom of the buffer has been
reached, messages will begin overwriting what is already present at the
top of the buffer.  The /var/adm/messages* files are a good supplement
as the messages there, although not as complete in detail, are not
overwritten by newer messages.

=========================================================================
4) Introducing the /etc/system file

The /etc/system file is the system software configuration file.  Entries 
to tune the kernel are made here.

All kernel tuning parameters have defaults, which can be adjusted with
"set" lines added to this file.  For example, the kernel configures
itself to support a maximum number of users.  This value may be changed
to a specific value by adding a line to the /etc/system file;  to set
this value to 40, the line would be:

	set maxusers = 40

A system reboot is required for any changes to this file to take effect,
because this file is read at boot time.  It does not matter if the module
which uses the parameter is loaded later.
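
As a rough illustration of the "set" line format, the following Python
sketch pulls the tunables out of /etc/system-style text.  It assumes that
comment lines begin with an asterisk and handles only simple
"set name = value" lines; the function name is hypothetical, and this is
not how the kernel itself reads the file.

```python
def parse_set_lines(text):
    """Return {name: value} for simple 'set name = value' lines."""
    tunables = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and lines starting with '*' are ignored.
        if not line or line.startswith("*"):
            continue
        if not line.startswith("set "):
            continue  # forceload and other directives are skipped here
        name, sep, value = line[4:].partition("=")
        if sep:
            # Base 0 accepts decimal as well as 0x-prefixed hex values.
            tunables[name.strip()] = int(value.strip(), 0)
    return tunables


sample = """\
* example fragment only
set maxusers = 40
forceload: /kernel/sys/msgsys
set npty=128
"""
settings = parse_set_lines(sample)
```

A script like this can be handy for recording what has been tuned on a
system before and after a change.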

Please see the man page on /etc/system for more information on this
file.

=========================================================================
5) What is a loadable kernel module?  How does it work?

The Solaris kernel is a modular operating system, where only a basic
portion of the kernel is always in memory and other parts of it are
loaded on an as-needed basis.  For example, device drivers are loadable
because all devices on a system are not always used, and the space in
memory that would be taken by an unused device can be recycled for
something else.  That modules are loadable is also helpful to module
developers, because the kernel does not need to be rebuilt each time a
module is changed.

A module is loaded when it is needed, and may be unloaded after a period
of non-use.  Some modules are not unloadable.  For example, the 2.5
version of the shared memory module, shm.c, cannot be unloaded.  A module
prevents itself from being unloaded by returning a non-zero value from
its _fini() routine.

=========================================================================
6) How to make a module load without using it?

A module may be loaded without being used.  This is done either by
executing the modload command manually, or by specifying that the module
be loaded in the /etc/system file.

The modload command, executed as root, loads a module manually. In its
easiest form, the command is:

	modload <full-path-name-of-module>

For example, to load the shared memory module, the command is

	modload /kernel/sys/shmsys

The most frequently manually loaded modules tend to be in a subdirectory
of /kernel, usually /kernel/drv or /kernel/sys.

A module can be forceloaded at boot time as part of the bootup sequence,
by specifying it in a "forceload" command inside the /etc/system file.
For example, to load the message queue module at boot time, the
/etc/system file would be augmented at the bottom to contain the
following:

	forceload: /kernel/sys/msgsys

Forceloading is a technique to get drivers to be loaded in a certain
order.  This is required, for example, when some drivers need to be
loaded before other drivers which are layered on top of them.
(DiskSuite drivers must be loaded to present an intact meta-filesystem
before Prestoserve drivers can be loaded, for example).

=========================================================================
7) How to see which modules are loaded?

The modinfo command displays a list of all loaded modules.  This list
can change over time, as modules sit idle and are automatically unloaded
by the kernel, and as new modules are loaded to satisfy new functionality
requested.  The command, executable as a normal user, is

	modinfo

This will display a list similar to the following format:

 Id Loadaddr  Size Info Rev Module Name
  1 fc07c000  3b84   1   1  specfs (filesystem for specfs)
  2 fc088000  1ab0   -   1  swapgeneric (root and swap configuration)
  3 fc08f000  2850   1   1  TS (time sharing sched class)
  4 fc08e568   49c   -   1  TS_DPTBL (Time sharing dispatch table)
  5 fc099000 1e618   2   1  ufs (filesystem for ufs)
  6 fc0c9558   a74   1   1  rootnex (sun4m root nexus)
  7 fc0cddf8   170  57   1  options (options driver)
  8 fc0d1ab0   4d8  62   1  dma (Direct Memory Access driver)
  9 fc0d3560   a58  59   1  sbus (SBus nexus driver)
 10 fc0d8aa8  24bc  76   1  iommu (iommu nexus driver)

except probably much larger.

=========================================================================
8) How to unload a module if it is not busy?

A request to unload a module can be made as root with the modunload
command.  In its easiest form,

	modunload -i 0

unloads all unloadable modules except those which have been loaded
manually with the modload command.  To request an unload of a single
module, specify its module ID.  For example, per the above list, to
request an unload of ufs, the command would be

	modunload -i 5

Modules which are active, or otherwise not unloadable, will not be
unloaded.  The modunload command will say when it cannot unload a
module.

Module 5 of the list above, of course, would not likely be unloadable
because ufs is always busy, being the module supporting the primary file
system type.
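
The module ID that modunload expects can be picked out of modinfo output
mechanically.  The Python sketch below assumes the column layout shown in
the modinfo listing above; the function name is hypothetical and the
sample text is copied from that listing.

```python
def module_id(modinfo_output, name):
    """Return the Id of the named module, or None if it is not listed."""
    for line in modinfo_output.splitlines():
        fields = line.split(None, 5)
        # Expect: Id Loadaddr Size Info Rev "name (description)"
        if len(fields) < 6 or not fields[0].isdigit():
            continue  # header or malformed line
        if fields[5].split()[0] == name:
            return int(fields[0])
    return None


sample = """\
 Id Loadaddr  Size Info Rev Module Name
  3 fc08f000  2850   1   1  TS (time sharing sched class)
  4 fc08e568   49c   -   1  TS_DPTBL (Time sharing dispatch table)
  5 fc099000 1e618   2   1  ufs (filesystem for ufs)
"""
ufs_id = module_id(sample, "ufs")
```

With this sample, ufs_id is the 5 used in the modunload example above.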

=========================================================================
9) What are the various configuration files or commands to determine the
system configuration?

The following files are used to tell system configuration:

2.X

Software:
  /etc/system			- tunable parameters
  output of showrev -p		- installed patches
  modinfo			- List of loaded modules (w/version #s).

Hardware:
  prtconf -v			- hardware configuration
  prtdiag (sun4d and sun4u)	- hardware configuration

4.X

Software:
  /sys/`/usr/bin/arch -k`/conf/<kernelname>	- kernel config file
  /sys/`/usr/bin/arch -k`/<kernelname>/param.c	- kernel config file
  List of patches (manually maintained)

Hardware:
  /usr/bin/devinfo -pv		- hardware configuration


If there are unbundled products, such as Solstice Disk Suite, which have
configurations of their own, this information helps give a more complete
picture.

=========================================================================
The answers to questions 10-14 have been moved to the IPC page
=========================================================================

15) Tuning a system for a database

Please see the answer for tuning shared memory parameters for a list of
suggested shared memory parameter settings for databases.  Additionally,
it is suggested that the fsflush parameters be modified as follows:

  set tune_t_fsflushr = 50
  set autoup = 300

These parameters set fsflush to take 300 seconds to flush all file
systems on the system, and to wake up every 50 seconds to flush out
50/300 = 1/6 of them during that iteration.
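
The arithmetic behind these two settings can be written out as a small
Python sketch.  The function name is hypothetical, and the sketch assumes
autoup is a whole multiple of tune_t_fsflushr, as it is in the example
above.

```python
from fractions import Fraction


def fsflush_schedule(tune_t_fsflushr, autoup):
    """Return (fraction flushed per wakeup, wakeups per full cycle)."""
    per_wakeup = Fraction(tune_t_fsflushr, autoup)
    wakeups = autoup // tune_t_fsflushr
    return per_wakeup, wakeups


# The settings suggested above: wake every 50 seconds, 300 second cycle.
per_wakeup, wakeups = fsflush_schedule(50, 300)
```

For the suggested values this gives 1/6 of the work per wakeup and 6
wakeups per complete cycle.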

These can be adjusted higher but this is not recommended.  Bear in mind
that the longer it takes to complete the cycle, the more likely there
will be unflushed data at the time of a system hang or power loss which
could cause file system corruption.

Please see Cockcroft's "Sun Performance and Tuning" book, page 210 for
more info on fsflush parameters.

=========================================================================
16) What is maxusers?  What is its default?  How is it changed?  What
other parameters are affected by changes to maxusers?

Maxusers is the number of users the kernel is set up to support.  It is
sized automatically by default, based on the amount of memory on the
system, in megabytes.

  Maxusers = number of megabytes of memory - 2

The automatically sized value can be overridden by an entry in the
/etc/system file.  The automatic sizing of maxusers based on memory is
limited to the range of 8 minimum to 1024 maximum, although it can be
manually set as high as 2048.
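
The automatic sizing rule, including its limits, can be sketched in
Python as follows.  The function name is hypothetical; the formula and
the 8 to 1024 range are the ones stated above.

```python
def default_maxusers(memory_mb):
    """Automatic maxusers for a machine with memory_mb megabytes of RAM."""
    # Megabytes of memory minus 2, held within the 8..1024 range.
    return max(8, min(1024, memory_mb - 2))


small = default_maxusers(8)       # 6 would be too low, so the floor applies
typical = default_maxusers(32)    # 32 - 2
large = default_maxusers(4096)    # the ceiling applies
```

Values above 1024 are reached only through an explicit entry in
/etc/system.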

Maxusers may be adjusted manually by modifying the /etc/system file.
To set maxusers to 40, for example, the entry would be

	set maxusers = 40

Changing the value of maxusers also changes the following parameters:

parameter   value                               What it represents
----------  ----------------------------------  ------------------------------

max_nprocs  10 + (16 * maxusers)                Maximum number of processes
                                                systemwide (limited to 30000
                                                on 2.5 and 2.5.1)

maxuprc     max_nprocs - 5                      Maximum number of processes
                                                per user.

ufs_ninode  max_nprocs + 80 + maxusers          Number of inodes cached in
                                                memory ready for use.  One of
                                                these is used for each open
                                                ufs file.

ncsize      max_nprocs + 80 + maxusers          Size of the Directory Name
                                                Lookup Cache, in entries.
                                                One of these is used for each
                                                segment of the pathname of
                                                each file opened, to make
                                                future openings of that file
                                                that much faster.

ndquot      (maxusers * NMOUNT)/4 + max_nprocs  Number of disk quota
                                                structures.  One of these is
                                                used per user per file system
                                                with quotas enabled.
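
The formulas in this table can be checked with a short Python sketch.
The function name is hypothetical, and ndquot is left out because the
value of NMOUNT is not given here.

```python
def derived_from_maxusers(maxusers):
    """Apply the table's formulas to a given maxusers value."""
    max_nprocs = 10 + 16 * maxusers
    return {
        "max_nprocs": max_nprocs,
        "maxuprc": max_nprocs - 5,
        "ufs_ninode": max_nprocs + 80 + maxusers,
        "ncsize": max_nprocs + 80 + maxusers,
    }


values = derived_from_maxusers(30)
```

With maxusers at 30 this yields max_nprocs = 490 and maxuprc = 485, which
match the "proc" and "maxup" fields in the adb listing under question 22.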

Please see Adrian Cockcroft's book, "Sun Performance and Tuning," the
section entitled "Basic Sizing with Maxusers," page 186, for more info
on these parameters and on maxusers.

=========================================================================
17) How to increase number of processes per user.

The easiest way of increasing the number of processes per user is to
increase maxusers (see above) as this will adjust other parameters
requiring adjustment as well.

Maxuprc may be adjusted independent of maxusers.  Note, however, that
it needs to be less than max_nprocs.  Since max_nprocs is not much
larger than maxuprc, it will need to be increased as well, and
then ufs_ninode and ncsize will be affected, almost as much as if
maxusers were modified.  The moral of the story is: just modify
maxusers.

To change the value under SunOS 4.x, edit the value of NPROC, as in
srdb 14177.

=========================================================================
18) How to increase the total number of open files on a system?

On Solaris 2.X, there is no limit on the total number of open files on
the system, thus there is nothing to tune.

=========================================================================
19) How to increase the systemwide limit of the number of open files per
process?

For Solaris 2.4 or later, one method is to modify the /etc/system file.
Add the following to the /etc/system file, for example, to implement a
limit of 512:

  set rlim_fd_cur = 512

If the hard limit is to be changed from its default, add a line similar
to the following to /etc/system.  (The default is 1024.)

  set rlim_fd_max = 512

This modifies the area in the kernel from where processwide limits are
initialized.  The first value corresponds to the soft limit;  the second
to the hard limit.  (The soft and hard limits are also settable from the
"limit" command in csh or the "ulimit" command in sh or ksh.)

Here are two gotchas the customer should know about if they decide
to set rlim_fd_* higher than 1024 (from srdb 11112):

1. Stdio routines are limited to using file descriptors 0 - 255.  Even
   though you can set the limit higher than 256, if fopen() cannot get a
   file descriptor lower than 256, the fopen() fails.  This can be a
   problem if you have other routines using open() directly.  For example,
   if you open 256 files with open() without closing any, you won't be able
   to open any files at all with fopen(), because all the low-numbered file 
   descriptors have been used up.

2. It is somewhat dangerous to set the fd limits higher than 1024.
   Some structures defined in the system headers (like fd_set) assume
   that the maximum fd is 1023.  If the program uses an fd larger than
   this with the macros and routines that access this structure (like
   FD_SET()), it will corrupt its memory space because it will modify
   memory outside the bounds of the structure.  This structure is used
   specifically by the select() routine, and indirectly by many library
   calls that use select().
   
For issue #2, the workaround would be to use the poll() system call.
Check out srdb 15998 for details.
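
The poll() approach can be illustrated with a minimal sketch.  Python is
used here only for brevity (a C program would call poll(2) directly), and
the helper name is hypothetical.  The point is that poll takes a list of
descriptors rather than a fixed-size bit set, so there is no fd_set to
overrun.

```python
import os
import select


def wait_readable(fds, timeout_ms):
    """Return the descriptors in fds that are ready for reading."""
    poller = select.poll()
    for fd in fds:
        # Each descriptor is registered by number; no fixed-size set.
        poller.register(fd, select.POLLIN)
    return [fd for fd, events in poller.poll(timeout_ms)
            if events & select.POLLIN]


# Demonstration on a pipe: one byte is written, so the read end is ready.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"x")
ready = wait_readable([read_fd], 1000)
os.close(read_fd)
os.close(write_fd)
```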

For Solaris 2.3 and older releases, the kernel must be patched with adb.
Here are two documents that describe how to do this:
Solaris 2.3 descriptors
faqs 1001

An alternative to setting the value in the /etc/system file or patching
the kernel is to add a line in the systemwide /etc/profile (for ksh and
sh users) or systemwide /etc/.login file (for csh users) which is the
equivalent of a user-entered command.

  For ksh or sh users, the command to set the number of descriptors to
  512 in the .profile is:

    ulimit -n 512

  For csh users, the command in the .login file is

    limit descriptors 512

Whether the hard limit (rlim_fd_max above) is specified in /etc/system,
or not (default is 1024), the above "limit" commands can set the soft
limit up to whatever the hard limit is.

=========================================================================
20) How to increase the number of open files per process, on a process
by process basis?

Use the limit command for csh or the ulimit command for ksh or sh.
The soft limit can be set with normal user privilege, to any value up
through the hard limit, as set by rlim_fd_max (or 1024 if not modified
from the default).  The user would either execute manually, or put in
the .login file:

  limit descriptors 512

to raise the descriptor limit to 512, from the default value of 64 for
2.4 and 2.5.
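
A program can make the same adjustment for itself through the
getrlimit/setrlimit interface.  The following is a minimal Python sketch
of raising the soft descriptor limit without exceeding the hard limit;
the function name is hypothetical, and 512 is simply the value used in
the example above.

```python
import resource


def raise_fd_soft_limit(wanted):
    """Raise the soft open-file limit toward `wanted`; return the result."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    # An unprivileged process may not set the soft limit above the hard one.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


new_soft = raise_fd_soft_limit(512)
```

The limit is inherited by child processes, which is why setting it in a
login script affects everything started from that shell.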

=========================================================================
21) How to change systemwide things other than file descriptor maximum?

Only the file descriptor maximum is settable from the /etc/system file.
Other parameters specified by the limit or ulimit shell commands can be
set using the appropriate command in the systemwide /etc/.login file for
csh users, or /etc/profile file for ksh or sh users.  Only the soft
limits can be changed in this way, and they must stay within the system
default hard limits.

System default hard limits can be changed only by modifying the kernel
using adb.  Please see Cockcroft's "Sun Performance and Tuning," page
222, for information on how to do this.  Doing this is not recommended,
because there is no way to enforce documentation of the changes;  that
is, the system may break and there will be no obvious indication of why.

=========================================================================
22) How to increase number of processes systemwide.

If the message "Proc: table is full" is displayed,  this means that the
system cannot start any more processes.  The maximum number of processes 
on the system at any time is determined, by default, by the value of
maxusers.  Maxusers, by default on Solaris 2.3 and higher, is determined
by the amount of memory on the system.

View the current maximum number of processes with adb, as root:

  # adb -k /dev/ksyms /dev/mem		<<--- Type this
  physmem xxxx				<<--- System prints this
  v$<v					<<--- Type this
  v:
  v:		buf		call		proc
		100		0		490   <<--- 490 is max
  v+0xc:	globpri		maxsyspri
		110		99
  v+0x1c:	maxup		hbuf		hmask
		485		64		3f
  v+0x28:	pbuf		maxpmem		autoup
		0		0		30
  v+0x38:	bufhwm
		620

The value of "proc" listed is the maximum number of processes supported
on the system.

The best way of changing this is to change maxusers.  See the relevant
question on maxusers for more information on how to do this.

=========================================================================
23) How to increase number of ptys

By default, Solaris is configured with 48 5.x (System V) pseudo-ttys
(pts), with device files /dev/pts/0, /dev/pts/1, etc., and 48 4.x (BSD)
pseudo-ttys, with device files /dev/ptyp0, /dev/ptyp1, etc.  Two kernel
variables can be modified to change these defaults.

 npty    -> Number of 4.x pseudo_ttys
 pt_cnt  -> Number of 5.x pseudo_ttys

To set these values edit the /etc/system file.  The following lines
change the pseudo_tty counts to 128:

 set npty=128
 set pt_cnt=128

The BSD count is limited to a maximum of 176 device files.  The System V
count can be set as high as 3000 under 2.4.  Boot -r after changing
these, to have the changes take effect.

For more details see:
Internal Info Doc 7314
For help with ptys on SunOS 4.x see:
Internal Info Doc 4332
The MAKEDEV script (in the /dev directory) is also useful in answering
questions on ptys for SunOS 4.x.

=========================================================================
24) What can be tuned regarding memory parameters?

Solaris systems will begin paging when the number of free pages drops
below the tunable threshold value "lotsfree".

They will swap whole processes out if the 30 second average of the number
of free pages is below the tunable threshold value "desfree".

They will swap whole processes out if the 5 second average of the number
of free pages is below the tunable threshold value "minfree".

The rate the page daemon scans for pages to remove from memory, when the
number of pages on the freelist (freemem) drops to just below "lotsfree",
is the tunable value "slowscan".

The rate the page daemon scans for pages to remove from memory, when the
number of pages on the freelist (freemem) drops to zero, is the tunable
value "fastscan".
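
The relationship between freemem and the scan rate can be sketched as
follows.  The text above gives only the two endpoints (slowscan just
below lotsfree, fastscan at zero); the straight-line ramp between them is
an assumption made for this Python sketch, and the function name and
sample numbers are hypothetical.

```python
def scan_rate(freemem, lotsfree, slowscan, fastscan):
    """Pages scanned per second for a given freelist size (sketch)."""
    if freemem >= lotsfree:
        return 0.0  # enough free memory: the page daemon is idle
    # Assumption: linear ramp from slowscan (freemem near lotsfree)
    # up to fastscan (freemem at zero).
    return fastscan - (fastscan - slowscan) * freemem / lotsfree


# Made-up tunable values, for illustration only.
idle = scan_rate(256, lotsfree=256, slowscan=100, fastscan=1000)
halfway = scan_rate(128, lotsfree=256, slowscan=100, fastscan=1000)
empty = scan_rate(0, lotsfree=256, slowscan=100, fastscan=1000)
```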

The maximum number of page out I/O operations scheduled by the system
per second is "maxpgio".  This parameter purposely bottlenecks the
paging operations to keep some disk bandwidth available for normal
operations.  It is set normally to (disk revolutions per second X 2/3).
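
The rule of thumb for maxpgio works out as shown in this short Python
sketch.  The function name is hypothetical, and the disk speeds are
examples rather than recommendations.

```python
def suggested_maxpgio(rpm):
    """Disk revolutions per second times 2/3, per the rule above."""
    revolutions_per_second = rpm // 60
    return revolutions_per_second * 2 // 3


for_3600_rpm = suggested_maxpgio(3600)  # 60 revolutions per second
for_5400_rpm = suggested_maxpgio(5400)  # 90 revolutions per second
```

A 3600 rpm disk gives 40 and a 5400 rpm disk gives 60.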

Please consult with Cockcroft's "Sun Performance and Tuning" book,
chapter 11, for a full description and tuning tips.

=========================================================================
25) What is dnlc?  How to tune?

Dnlc stands for "Directory Name Lookup Cache."  This is a cache which
holds pathname segments of the most recently opened files.  Only segments
which are 32 characters or smaller are cached.

For example, if the file /usr/openwin/bin/cm were open, there would
be a cache entry for the "usr" part, one for "openwin", one for "bin",
and one for "cm".  The cache stores the inodes corresponding to each
part of the pathname (/usr, /usr/openwin, /usr/openwin/bin, and
/usr/openwin/bin/cm).  This makes future lookups of inodes corresponding
to the parts of the pathname faster, as the inode would not have to be
read off the disk again.
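
The way a pathname breaks down into cache entries can be sketched in
Python.  The function name is hypothetical; the 32 character limit is the
one stated above, and the sketch models only which names qualify, not the
kernel's actual cache structure.

```python
def dnlc_candidates(path, max_len=32):
    """List (segment, full path so far) pairs that qualify for caching."""
    entries = []
    prefix = ""
    for segment in path.strip("/").split("/"):
        prefix = prefix + "/" + segment
        # Segments longer than the limit are looked up but never cached.
        if len(segment) <= max_len:
            entries.append((segment, prefix))
    return entries


entries = dnlc_candidates("/usr/openwin/bin/cm")
```

For this pathname the result has four entries, one per segment, each
paired with the path whose inode would be remembered.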

The size of the dnlc is tunable.  The command placed in the /etc/system
file, to set it to 5000 is:

  set ncsize = 5000

The default is set automatically based on maxusers:

  ncsize = max_nprocs + 80 + maxusers

and may need to be increased for large nfs servers which have lots of
clients and lots of files to open.

Determine whether this value needs increasing by issuing a "vmstat -s"
command and looking at the hit percentage of the "total name lookups"
line.  If it is less than 90%, increase the value with an entry in
/etc/system.
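
The check can be scripted.  The Python sketch below assumes the line of
interest looks like "NNN total name lookups (cache hits NN%)"; that
format, the sample line, and the function name are assumptions made for
illustration, so verify them against real "vmstat -s" output before
relying on this.

```python
import re


def dnlc_hit_percent(vmstat_s_output):
    """Return the name lookup cache hit percentage, or None if absent."""
    for line in vmstat_s_output.splitlines():
        if "total name lookups" in line:
            match = re.search(r"cache hits\s+(\d+)%", line)
            if match:
                return int(match.group(1))
    return None


# Made-up sample line in the assumed format.
sample = "  1234567 total name lookups (cache hits 86%)\n"
hit = dnlc_hit_percent(sample)
needs_tuning = hit is not None and hit < 90
```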

Please see page 189 of Cockcroft's "Sun Performance and Tuning" for more
information.

=========================================================================
26) What is ufs_ninode?  How to tune?

ufs_ninode is the size of the ufs inode cache.  Ufs is the most used file
system type for locally mounted disks.  An inode is a file header of
sorts, storing modification times, file size, and other things in
addition to pointers to data blocks.  If an inode is kept in memory, it
does not have to be read repeatedly from disk when the same file is
opened again and again.  This helps large nfs servers or other file
servers which open the same files over and over operate more efficiently.

Please consult pages 191-193 of Cockcroft's "Sun Performance and Tuning"
book for more information.

=========================================================================
27) What is bufhwm?  How to tune?

This is the amount of memory allocated to the buffer cache.  The buffer
cache in 2.X is used to cache inode-, indirect block- and cylinder
group-related disk I/O.

This value is automatically adjusted to be 2% of the size of RAM.  This
can be a problem, though, since virtual space for it is allocated from
the kernelmap pool which is fixed in size and may not be large enough to
support the 2% default.  Cockcroft recommends setting bufhwm to 8000 in
the /etc/system file, if it is too large:

  set bufhwm = 8000

It can be checked by going into adb as follows:

  # adb -k /dev/ksyms /dev/mem		<<--- Type this
  physmem xxxx				<<--- System prints this
  v$<v					<<--- Type this
  v:
  v:		buf		call		proc
		100		0		490
  v+0xc:	globpri		maxsyspri
		110		99
  v+0x1c:	maxup		hbuf		hmask
		485		64		3f
  v+0x28:	pbuf		maxpmem		autoup
		0		0		30
  v+0x38:	bufhwm
		620		<<------------- This is the value.

Please consult page 189 of Cockcroft's "Sun Performance and Tuning" for
more information.

=========================================================================
28) What kernel parameters need to be modified to enable asynchronous
I/O?

Disk block writes and read-aheads are asynchronous by default, because of
the way Unix buffers disk I/O.  When done in this fashion, though, there
is no way for a program to tell when they have completed.

The aioread(3) and aiowrite(3) calls may be used to perform asynchronous
operations explicitly, and these provide a mechanism to tell when the
operations have completed.  There is nothing to tune in the kernel for
this.

If running 2.4, please make sure that the kernel jumbo patch 101945-27
or higher rev, is installed.  2.5 and higher needs no patch for this.

=========================================================================
29) What is a kernel rebuild?  When is it necessary to rebuild a kernel?
How to rebuild a SunOS kernel.  Where are error messages put?

To rebuild a kernel is to create a new bootable executable which runs
the computer.  This executable is called vmunix on SunOS 4.X.

Solaris 2.X kernels never need to be rebuilt, as new devices are
dynamically configured into the kernel.  Adding new features or devices
to SunOS 4.X kernels requires a kernel rebuild.

How to build a kernel on 4.X

I) Add the new features to the kernel.  This may involve different
things, depending on what is being added.

  The following files define system software configuration and the
  hardware that the software expects to find:

  /sys/`/usr/bin/arch -k`/conf/<kernelname>

    (The "factory" file on a sun4m architecture, for example, would be
    /sys/sun4m/conf/GENERIC.)  This file specifies the hardware
    configuration the OS expects to find, or what it will support.
    Modify this file if new hardware is added, or pare it down to make
    the kernel smaller by removing unneeded devices.  This file is also
    where software features are enabled and tuned, such as the
    InterProcess Communication (IPC) system services of shared memory,
    semaphores and message queues.  Some tuning of general kernel
    parameters, such as the maximum number of users supported, MAXUSERS,
    is done here as well.

    It is a good idea to start with a COPY of the GENERIC kernel, rather
    than modifying the original copy.  It is also a good idea to change
    the name on the "ident" line in the file to reflect the name of the
    file;  that way the kernel will identify itself with the name of the
    file when booted.  For the purposes of this document, call the file
    SYS_NAME.

  /sys/`/usr/bin/arch -k`/<kernelname>/param.c

    This file provides more exact tuning of certain kernel parameters.
    For example, the process table may be increased without increasing
    MAXUSERS by modifying this file.

  /sys/`/usr/bin/arch -k`/conf/files

    New devices added to the kernel must have their driver entered into
    this file.  The general form of a new line added to this file for a
    user-written or 3rd party driver is:

      sundev/<driver_file>.c	optional <driver> device-driver

    where <driver_file> is the driver source and <driver> is the 2 or 3
    letter abbreviated driver name.
  
  /sys/sun/conf.c

    New devices added to the kernel must have an entry in this file,
    to tell the OS what to call when the new device is accessed.  This
    file contains two large arrays of structures of device driver entry
    points.

  xx.h

    Note that each device references an include file.  The include file
    contains one line which specifies the number of devices to be
    supported by the driver.  For example, the include file for a
    (hypothetical) abc driver supporting one device would be called
    abc.h and would contain the following line:

      #define NABC 1

II) Build the kernel.

  # cd /sys/`/usr/bin/arch -k`/conf
  # config SYS_NAME

  # cd ../SYS_NAME

  # make
 
  NOTE: if there are "make" errors, look for a file called "makedeperrs"
  in the current directory to give a better description of what happened.

III) Move the new kernel into place.

  For a stand alone system or server:

    # mv /vmunix /vmunix.orig
    # cp vmunix /vmunix

  For a diskless client:

    # mv /export/root/SYS_NAME/vmunix /export/root/SYS_NAME/vmunix.orig
    # cp vmunix /export/root/SYS_NAME/vmunix

  For a dataless client:

    # mv /vmunix /vmunix.orig
    # cp /usr/kvm/sys/sun[3,3x,4,4c]/SYS_NAME/vmunix /vmunix

IV) Boot with the new kernel.

    # /etc/reboot

V) If there are problems booting with the new kernel, boot off the old
one:

    > boot vmunix.orig -s
    # mv /vmunix /vmunix.new
    # ln /vmunix.orig /vmunix
    # reboot

More information on this topic:
infodoc 2160
infodoc 11263
infodoc 11819

=========================================================================
30) Will 4.X run in multiple processor systems?

Only the sparc 600 MP is supported to run SunOS 4.X with multiple
processors.  SunOS will run on non-supported multiple processor systems,
but will crash from time to time.

SunOS has a single lock around the kernel, allowing only one process at
a time into the kernel proper.  The OS will support multiple processes
running in user context, but processes will block in system service
calls if another process is executing in the kernel already.

=========================================================================
31) Which Ross modules are supported on which operating systems and
architectures?

Only Ross modules sold by Sun, and run on the architectures listed below
with the operating systems listed below, are supported by Sun Service.
Ross modules sold by Sun are called "hyperSparc" modules.

  a) SunOS 4.1.4 - uniprocessor SW/HW support, multiprocessor HW support

  b) Solaris 2.3 and higher - uniprocessor and multiprocessor SW/HW
     support.

  c) Customers running 4.1.3 or 4.1.3U1 with the kernel patch, or 4.1.4
     with multiple processors are supported by Ross Technology, not Sun
     Service.  The phone number for Ross Technology is 1-800-ROSS-YES.

  Architecture        Min PROM Rev       Supported       # processors
  (Motherboard MAL)   (FRU part #)       OS
  -----------------   ---------------    -------------   ------------
  SparcStation 10     PROM Rev 2.19.3    4.1.4           1
   (None)             (370-2013-01)      2.3 or higher   1 or more
  -----------------   ---------------    -------------   ------------
  SparcStation 20     PROM Rev 2.19.3    4.1.4           1
   (None)             (370-2013-01)      2.3 or higher   1 or more
  -----------------   ---------------    -------------   ------------
  SparcServer 600     PROM Rev 2.14.0    4.1.4           1
   (501-2055-04)      (370-2015-01)      2.3 or higher   1 or more
  -----------------   ---------------    -------------   ------------

=========================================================================
32) My programs are not freeing up memory after they exit.  Vmstat free
column shows that there is very little free memory.  How can one tell
whether a system is short of memory?

What the vmstat "free memory" column represents.

The vmstat free memory column and the sar -r "freemem" column indicate
the size of the freelist, which is a list of pages which have been marked
free by the page daemon.  This, however, is a misleading figure, because
"free" memory not on the freelist is reclaimed from exited or idle
processes on demand.

The page daemon runs mainly when there is a shortage of memory. The pages
it puts onto the freelist will be quickly used by the process causing the
shortage in the first place.  They are not on the freelist long enough to
show up there in a vmstat command.

The size of the freelist will appear to shrink to a very small value
(determined by the tunable parameter "lotsfree"), and will remain near
that value.  The page daemon will kick in and look for more memory to
reclaim from exited and idle processes when the amount on the freelist
drops below this threshold.  There is no way for the value to grow much
above the threshold, because there is no way to get the page daemon to
work to reclaim memory beyond the threshold.

What all of this means, is that the size of the freelist is no indication
of how much free memory there really is.  There may be a great amount of
unused memory which has yet to be reclaimed by the page daemon, because
the page daemon has had no need to reclaim it.

A better way to tell if a system is short of memory.

Take a look at the "sr" column of vmstat, or the sar -g "pgscan/s" value.
This indicates how quickly the page daemon is looking to find unused
memory pages to reclaim for use by needy processes (in units of pages per
second).  Systems which are short of memory will do lots of paging and
possibly swapping.  The shorter the system is of memory, the faster it
will look for pages to swap out, and the higher the value.  When there
is no memory shortage, this value will sit at zero.  When new processes
first startup, this value may jump and then return to zero.  It may be a
steady small value, which indicates that the system page daemon is active
all the time.  The experts say that if this value is steadily over 200,
though, more memory is definitely needed.

It is normal for this value to spike up briefly, as when a new program
is being started and the system scrambles for pages to accommodate it.
It is also possible to tune the kernel so that it keeps a larger reserve
on the freelist so that such spikes do not occur as radically, providing
better interactive response.
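
The rules of thumb above can be sketched as a small classifier over a
series of "sr" samples.  The function name, the category labels and the
reading of "steadily" as "every sample in the series" are assumptions
made for illustration.

```python
def classify_scan_rate(samples, limit=200):
    """Classify a series of vmstat "sr" samples (pages scanned per second)."""
    if not samples or all(s == 0 for s in samples):
        return "no shortage"
    if all(s > limit for s in samples):
        return "more memory needed"          # steadily over the limit
    if all(s > 0 for s in samples):
        return "page daemon always active"   # steady, smaller value
    return "transient spikes"                # jumps, then returns to zero


print(classify_scan_rate([0, 0, 0, 0]))
print(classify_scan_rate([0, 350, 0, 0]))
print(classify_scan_rate([5, 12, 8, 9]))
print(classify_scan_rate([250, 300, 410]))
```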

For more information:

Please see chapter 11 of the "Sun Performance and Tuning" book by Adrian
Cockcroft (1995, SunSoft Press) for details on how paging and memory
management work, and on how to tune memory parameters for optimal
performance.

=========================================================================
33) How to tell how much memory is on a system?

  I)	/usr/sbin/prtconf displays this near the top of its output.

  II)	/usr/platform/sun4d/sbin/prtdiag (sun4d only, 2.5)
				or
        /usr/kvm/prtdiag (sun4d only, 2.3, 2.4)

  III)	Look at the messages files (/var/adm/messages*) or the output of
	the "dmesg" command, during boot messages, toward the start of a
	boot sequence.  (Search for "Release" to get to the start of a
        boot sequence.)  This displays the amount of memory in bytes.

=========================================================================
34) How to tell how much swap is on a system?

2.X

  I)	/usr/sbin/swap -l tells the number of 512 byte blocks configured,
 	and the number of blocks free, on a swapfile by swapfile basis.
	It does not include any swap space in RAM.

  II)	/usr/sbin/swap -s
	This reports reserved swap (swap space which has been set aside
	for a particular process, but has not been used yet), available
	swap (swap space not assigned to a particular process), and
	allocated swap (swap which is in use).  The space reported
	includes all swapfiles plus swap space in RAM.

  III)	sar -r freeswap
	This reports swap -s available + swap -s reserved

  IV)	vmstat swap
	This shows available swap (same item as sar -r freeswap, but) in
	kilobytes.

  V)	The /usr/openwin/bin/wsinfo command, run from within
	openwindows, displays the same things that swap -s does, but in
	graphical form.
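
Since swap -l reports 512 byte blocks while vmstat reports kilobytes, a
small conversion helps when comparing the two.  A minimal sketch;  the
function names and the sample figure are illustrative only.

```python
BLOCK_SIZE = 512  # bytes per block, as reported by swap -l


def blocks_to_kilobytes(blocks):
    """Convert a count of 512 byte blocks to kilobytes."""
    return blocks * BLOCK_SIZE / 1024


def blocks_to_megabytes(blocks):
    """Convert a count of 512 byte blocks to megabytes."""
    return blocks * BLOCK_SIZE / (1024 * 1024)


# A swapfile listed by swap -l as 204800 blocks:
print(blocks_to_kilobytes(204800), "KB")
print(blocks_to_megabytes(204800), "MB")
```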

4.X

	pstat -s
	This reports reserved swap (swap space which has been set aside
	for a particular process, but has not been used yet), available
	swap (swap space not assigned to a particular process), and
	allocated swap (swap which is in use).  The space reported
	includes all swapfiles.  Unlike 2.X, there is no RAM set aside
	to be used with the swap mechanism.

=========================================================================
35) How to add secondary swapfiles to a system

  I)	Find a local partition other than /tmp, where there is enough
	diskspace.

  II)	cd to that directory

  III)	Use the mkfile(1M) command to create the swapfile.  To create,
	for example, a 50 megabyte file, the command would be:

	  mkfile 50m new_swap_file

  IV)	Add the new swapfile to the system:

	  swap -a new_swap_file

  V)	Add the new swap file to the /etc/vfstab file if it is to
	be added each time the system is booted.  For example, to add
	the above swap file, assuming it exists in /export, the vfstab
	entry would look like:

	#device			device	mount	FS	fsck	mount	mount
	#to mount		to fsck	point	type	pass	at boot	options
	#
	/export/new_swap_file	-	-	swap	-	no	-

	Swapfiles can be on metadevices;  the dump file, though, best
	resides on a regular disk.  The dump file is typically the
	primary swapfile.

=========================================================================
36) How much memory does a process take?

Both flavors of the Solaris ps command can tell.

  /usr/ucb/ps -aux

    The SZ column contains kilobytes of swap space (combined RAM and disk
    swap).

    The RSS column contains the amount of RAM in kilobytes.

    The %MEM column contains the percentage of RAM used by the process.

  /usr/bin/ps -efl

    The SZ column contains pages of swap space (combined RAM and disk
    swap).


Troubleshooting: problems and solutions

=========================================================================
37) My system is running slow.

This is a very general statement, and could have lots of answers
depending on configuration and the logistics of when the system runs
slow.  Generally, the system runs slow when it is bottlenecked by
something;  the whole system is as slow as the slowest link in its
chain.  Generally, the bottlenecks in a system are:

a) CPU shortage: If a system's CPUs are maxed out, they are the
bottleneck.  To tell whether CPUs are maxed out, simply bring up the
perfmeter.  Another way of measuring this is with vmstat, where the far
right column of its output, "id" (for "idle"), will be zero;  sar -u
also displays CPU usage.

Checking the vmstat "faults" and "cpu" columns may indicate what the CPU
is doing.  Under "faults", a large "in" column indicates that the system
is getting many interrupts from external devices, "sy" indicates the
number of system service calls, and "cs" indicates the number of context
switches, that is, the number of times one process stops and another is
resumed;  this takes overhead.  Under "cpu", the "sy" column indicates
the percentage of system-context cpu time used (time spent executing
system service calls, interrupts or context switches), and the "us"
column indicates the percentage of user-context cpu time used (that is,
non system-context time).

b) Memory shortage: A system may be slow because it is spending too much
time paging and swapping (a phenomenon called "thrashing").  This will
happen if there is not enough main memory to maintain all processes and
their data inside RAM.  Check the vmstat "sr" (scanrate) column;  there
is a memory shortage if it is a sustained nonzero value.  The experts
say there is a problem if it is 200 or more sustained.

Spikes in the "sr" column are OK, but may indicate that the system needs
to be tuned to get a better interactive response.  The system's memory
management parameters may be tuned to keep more of a memory reserve on
the freelist so that the transient severe shortages indicated by the
spikes will not occur, and the paging process will itself not become a
bottleneck at these times.  The tradeoff for this, though, is that there
will be less memory available to run programs with if it is sitting on
the freelist, so do this only if there is an adequate amount of memory
to start with.

Please check the other questions in this document regarding memory, for
more information.

c) Disk bottlenecks: Disks are slow, and could cause the rest of the
system to back up.  Disk bottlenecks can be spotted using sar -d;  large
(over 50ms) avwait times for reads, or large (over 50ms) avserv times
for writes indicate a bottleneck.  %busy is the percentage of the sample
time the device was busy, and avque indicates the average number of
queued outstanding requests for that device during the sample period.
If the busy time is greater than 65%, the disk is too burdened.
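
The sar -d thresholds above can be collected into a simple check for one
device sample.  This is only a sketch:  the function name and warning
texts are made up, the sample figures are invented, and the read/write
distinction mentioned above is not modeled.

```python
def disk_warnings(busy_pct, avwait_ms, avserv_ms):
    """Return warnings for one sar -d sample of one device."""
    warnings = []
    if busy_pct > 65:
        warnings.append("disk too burdened (%busy over 65)")
    if avwait_ms > 50:
        warnings.append("long queue wait (avwait over 50ms)")
    if avserv_ms > 50:
        warnings.append("slow service (avserv over 50ms)")
    return warnings


# Invented figures: a busy, slow disk and a lightly loaded one.
print(disk_warnings(busy_pct=80, avwait_ms=12.0, avserv_ms=61.5))
print(disk_warnings(busy_pct=20, avwait_ms=0.0, avserv_ms=15.0))
```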

The solution to disk bottlenecks may be to shuffle disks around so that
the more heavily used disks are the faster ones, or divide the most
commonly accessed stuff among several disks.  If swapping is heavy,
divide large swapfiles into several smaller ones on different disks, to
spread the load.

d) Network bottlenecks: Lots of traffic on a network makes it more
likely that data sent from or destined for the slow system will take
longer to get to where it's going.  Packets on a busy network will need
to be investigated by every system on it, even though each packet may
be destined for a different system.  Add subnets to remove unnecessary
network traffic from the system's view.

e) Certain operations are known to take a long time, and tunable kernel
parameters which exist to speed up their procedures may need to be
adjusted.  For example, large file servers may need to have their
directory name lookup cache expanded to speed up file lookup times.

A great tool exists to aid in spotting bottlenecks.  It is called
"Virtual Adrian" and is available off the web.  The latest release
(as of 6/98) is at http://www.Sun.COM/sun-on-net/performance/se3/.
It can display on the screen, or log in a logfile, shortages of
resources as they occur, and in some cases recommend what to tune.

Another reference is "Sun Performance and Tuning" by Adrian Cockcroft.
Incidentally, Adrian Cockcroft is the one who wrote "Virtual Adrian."

=========================================================================
38) What is a mutex?  What is mutex contention?

Mutexes, as used in the kernel, are locks placed around certain kernel
data structures to limit to one at a time the number of kernel threads
which can change those data structures.  They ensure that when a thread
does gain access to a data structure, that data structure will be intact.

Mutex contention is when several access attempts are made to get a
particular mutex while it is owned.  This becomes a problem when the
mutex owner is slow to do what it must while owning the mutex (as when a
mutex is locked while allocating a resource which is scarce), or when
too many threads try to access a particular mutex.

The latter case can often be helped at the application level.
Applications which make the same system service call many times, or
which call many system services in the same family (for example, any or
all of the several shared memory system service calls), can cause mutex
contention while trying to gain access to the common data structures
used by those system service calls.  Taking the shared memory example,
if an application can allocate a larger chunk of shared memory with each
system service call, it will need to make fewer calls, causing a
reduction in mutex contention.  In general, restructuring an application
to make each system service call count more so that fewer calls are
needed, will help mutex contention.

Incidentally, making fewer system service calls will also help
performance in that fewer context switches (from user mode to kernel
mode) will be needed.  This cuts down on overhead, leaving more time to
do useful work.  It also reduces the number of accesses needed for any
mutex locks associated with the context switching mechanism.

=========================================================================
39) What is an adaptive mutex?  A spin mutex?

An adaptive mutex is one which normally goes to sleep while waiting for
the lock to become available.  (It is "adaptive" because, on a
multi-processor system, it will spin briefly instead if the thread
owning the lock is currently running on another processor, since the
lock is then likely to be released soon.)  A spin mutex always busywaits
while waiting for the lock to become available.

Adaptive mutexes are used in both single- and multi-processor systems.
There is some overhead associated with putting the waiting thread to
sleep, but then that thread's processor is available to do useful work
on behalf of another thread.

Spin mutexes can be used only in multi-processor systems, because if the
sole processor in a single processor system were spinning, the thread
holding the lock would never get the opportunity to run and release it.
Spin mutexes are used in cases where the resource being waited for is
expected to become free very soon;  they are used when it would take
more time to put a thread to sleep and wake it up again than to just let
it spin for a few cycles instead.

=========================================================================
40) What are some other kinds of kernel locking mechanisms in addition
to mutexes?

The kernel has four kinds of locking mechanisms of which mutexes are
one.  The other mechanisms are:

  multiple reader, single writer locks.  These locks allow many readers
    at a time, or one writer at a time.  These prevent a writer from
    modifying a data structure until there are no more readers using the
    data structure.  When a writer requests the lock, readers requesting
    the lock afterward are made to wait until after the writer gets and
    releases the lock.

  semaphore locks.  (These are NOT to be confused with the semaphore IPC
    system service calls.)  These locks are used when there is more than
    one of a resource, but the resource is limited.  These locks allow
    up to a certain number of accessors at a time.  Once that number is
    reached, additional accessors are made to wait until an existing
    accessor releases the lock.

  condition variables.  Threads wait on this type of lock for an event
    to occur, at which time they are awakened.  This is opposed to the
    other kinds of locks, which are for waiting for resources.
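
The semaphore behavior described above can be illustrated at user level.
This sketch uses Python's threading module, NOT the kernel's own locking
primitives;  the limit of two accessors, the thread count and all names
are arbitrary choices for the example.

```python
import threading
import time

MAX_ACCESSORS = 2                        # "a certain number of accessors"
slots = threading.Semaphore(MAX_ACCESSORS)
state_lock = threading.Lock()            # plays the role of a mutex
active = 0                               # accessors inside right now
peak = 0                                 # most accessors seen at once


def use_resource():
    global active, peak
    with slots:                          # waits once MAX_ACCESSORS are inside
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)                 # pretend to use the resource
        with state_lock:
            active -= 1


threads = [threading.Thread(target=use_resource) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrent accessors:", peak)
```

However many threads are started, the peak never exceeds MAX_ACCESSORS;
the extra threads wait until an existing accessor releases its slot.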

=========================================================================
41) What is kernelmap?

Kernelmap is a generic resource map of page segments to which memory or
devices can be mapped.  It is NOT memory, but rather a place to which
memory can be mapped.  A list of free kernelmap pages is maintained by
the kernel and may be viewed via the "map kernelmap" command of the
"crash" program.  Systems which are hung are often out of kernelmap;
the "map kernelmap" command will show in this case either no free
segments, or else only a few small free segments.

Kernelmap segments mapped to memory may be viewed via the "kmastat"
command of the "crash" program.  Mapped kernel memory is divided into
pools of different sizes.  Memory to satisfy a request is taken from the
pool managing that sized memory chunk.

Kernelmap is fixed in size, and varies from architecture to
architecture.  The kernelmap is defined in source module startup.c.
To see the kernel memory limits for various os revs and architectures,
check: Internal Info Doc 13900

=========================================================================
42) What is memory mapping?

Memory mapping is a mechanism by which files or devices (which are seen
as files) may be accessed from within a program as if they were mapped
in as variables.  The mmap() system service call sets this up.

Suppose, for example, that a data file exists, and is to be accessed by
a program.  The program might mmap() the file into a character pointer
variable, then access the file's data by indexing into the character
pointer variable as if it were an ordinary character array.
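
The idea can be sketched with Python's mmap module, which wraps the
mmap() call.  The scratch file and its contents are made up for the
example;  a C program would index a char pointer in the same way.

```python
import mmap
import os
import tempfile


def read_and_patch(path):
    """Map a file, read its first five bytes, then overwrite them in place."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as view:   # length 0 maps the whole file
            first = view[0:5]                    # index it like an array
            view[0:5] = b"HELLO"                 # a store modifies the file
            view.flush()
    return first


# Create a scratch data file, map it, then read it back the ordinary way.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello, mapped world\n")
os.close(fd)
before = read_and_patch(path)
with open(path, "rb") as f:
    after = f.read()
os.remove(path)
print(before, after)
```

No read() or write() call is made on the mapped data;  the change shows
up in the file because the array and the file are the same pages.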

Devices are often mmapped for access.  Memory mapping is ideally suited
for devices with large buffers, such as video cards which can have
buffers of several megabytes.  Memory mapping the devices saves having
to transfer megabytes of data from memory to the device, by putting the
data there in the first place.

=========================================================================
43) What is DBE?  Is it necessary for Solaris?

DBE is a product which changes a 4.X kernel for use with databases.  It
raises the number of file descriptors per process from 256 to 2048,
and provides other kernel enhancements and optimizations, such as
asynchronous I/O support, as well.

DBE must be purchased separately for 4.1.3, but it comes on the 4.1.3U1
and 4.1.4 installation CD-ROMs as a separately-installable patch.

The features of DBE are incorporated into Solaris 2.X, and so a separate
DBE product for 2.X is not needed.  (The number of file descriptors per
process for Solaris, though, goes up to 1024, not 2048.)

=========================================================================
44) What is the largest size of a pathname segment?  Total pathname?

The largest size of a pathname segment is 256 characters.  The largest
size of a total pathname is 1024.  These are defined in
/usr/include/sys/param.h as MAXNAMELEN and MAXPATHLEN.  Both sizes
include the terminating null character.
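
The limits in effect for a particular filesystem can also be queried at
run time through pathconf(2).  A minimal sketch using Python's wrapper
for that call;  the values printed depend on the system and filesystem,
so they may differ from the param.h constants, and the function name is
made up.

```python
import os


def path_limits(directory="/"):
    """Return (longest name, longest path) as reported for `directory`."""
    return (os.pathconf(directory, "PC_NAME_MAX"),
            os.pathconf(directory, "PC_PATH_MAX"))


name_max, path_max = path_limits()
print("longest pathname segment:", name_max)
print("longest total pathname:  ", path_max)
```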

=========================================================================
45) How does one acquire detailed process information?

Programs such as the "ps" command acquire process information via the
proc(4) interface.  The proc(4) interface allows one to open a door into
the kernel in the form of a file in the /proc directory.  Every process
on the system has an entry there, with the PID as the filename.  Once
opened, that file descriptor may be passed to an ioctl(2) call to
extract information about that process.

Please see the man page on proc(4) for more information.  Do a truss on
the ps command to see the proc interface in action.

=========================================================================
46) How can one access kernel statistics?

Kernel statistics are maintained in data structures inside the kernel.
Programs such as vmstat and sar use the kvm interface to read these.
There are man pages on kvm_open(3K) and on other kvm routines (all in
section 3K) which describe how to use them.  Do a truss on the vmstat
command to see the kvm interface in action.

=========================================================================
47) What is chroot?  How is it used?  What are its common problems?

chroot causes a command to be executed relative to a new root directory.
Command syntax is

  /usr/sbin/chroot newroot command

It is often used on secure ftp sites to prevent outsiders from being
able to access anything more than where they are supposed to be, by
making where they are supposed to be the "root" directory that they
see.

In order for this command to work properly all special device files used
by programs operating within this context need to be available.  Devices
such as /dev/zero, which are used by nearly everything, will not show up
as they are outside the new root directory's scope.  Links cannot be
made to them either, because the links would point outside the new root
directory's scope.  New devices must be created with mknod, the utility
by which special device files are created.  They must be created in a
directory in the same relative location to the new root as they were to
the old root;  for example, a new device analogous to /dev/zero would be
created in a dev directory (note, "dev" not "/dev") hanging from the new
root.

Programs which fail in a chroot'ed context but which succeed otherwise
may be trussed to see which devices or files they cannot find.  These
devices or files can then be created under the new root.

=========================================================================
48) What does setuid mean?

setuid() is a system service call which changes the user ID under which
a process runs.  A program whose setuid file protection bit is set on
acts as if its owner is running it, rather than the user who started it.
(Use chmod u+s <filename> to set this protection bit on.)

Programs which need to be run by normal users, but with special
privilege, use this mechanism.  Such programs are made owned by root, and
have their setuid file protection bit set on.  They can then call
setuid() from within the program, and can execute privileged system
service call functions.

setgid() is a system service call which makes a program act as if
someone in its group is running it.  It works the same way as setuid()
does, except the program's setgid file protection bit must be set on.
(Use chmod g+s <filename> to set this protection bit on.)
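
Whether a file has these protection bits set can be checked from a
program as well as with ls -l.  This sketch sets the setuid bit on a
scratch file (a stand-in for a program, never executed) and tests for
it;  the helper names are made up for illustration.

```python
import os
import stat
import tempfile


def has_setuid_bit(path):
    """True if the file's setuid protection bit is on."""
    return bool(os.stat(path).st_mode & stat.S_ISUID)


def has_setgid_bit(path):
    """True if the file's setgid protection bit is on."""
    return bool(os.stat(path).st_mode & stat.S_ISGID)


# Scratch file standing in for a program.
fd, scratch = tempfile.mkstemp()
os.close(fd)
before = has_setuid_bit(scratch)
os.chmod(scratch, 0o4755)                # same effect as: chmod 4755 <filename>
after = has_setuid_bit(scratch)
os.remove(scratch)

print("setuid bit before/after chmod:", before, after)
# In a setuid program these two differ: the real uid is the user who
# started the program, the effective uid is the owner of the file.
print("real uid:", os.getuid(), "effective uid:", os.geteuid())
```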

=========================================================================
49) How come the system() library routine does not execute a privileged
command when called from a program which is setuid'ed to root?

The system() routine allows one to execute a command from within a
running program.  (It is a C library routine, not a true system service
call.)  It works by forking and execing a new shell process
to execute the command, while the original process waits for the new one
to complete.

setuid privileges are not propagated through system().  Programs called
through that mechanism will only run with privilege if they, themselves,
have their setuid file protection bit set, are owned by root, and call
setuid() inside.

This is a security feature. It prevents the exec() system service call
from being able to run anything arbitrarily, with privilege.

=========================================================================
50) What causes defunct processes?  What to do about them?

When processes terminate or are terminated, they clean themselves up,
but wait for the parent to pick up the exit status.  If the parent does
not pick up the status, the child will remain a zombie (defunct) for as
long as the parent is alive.

Defunct processes take up no resources except a process table slot.
Their memory has been freed.

Processes will not become zombies if their parent is set up to not wait
for children status.  Parent processes can register that their SIGCLD
signal be ignored, or sigaction can be called by the parent (with the
SA_NOCLDWAIT flag) to configure itself so that the child processes it
creates have the SNOWAIT flag set.
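
The ordinary way to avoid zombies is for the parent to pick up the exit
status itself.  A minimal sketch of that handshake, using Python's
wrappers for fork(2) and waitpid(2);  the function name and exit code
are arbitrary.

```python
import os


def run_child_and_reap(exit_code):
    """Fork a child which exits at once, then collect its exit status."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)              # child: terminate immediately
    # Parent: from the child's exit until waitpid() returns, the child
    # shows up as <defunct> in ps.  Picking up the status frees its slot.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


print("child exit status:", run_child_and_reap(7))
```

A parent which has no interest in the status can instead ignore the
signal, for example with signal.signal(signal.SIGCHLD, signal.SIG_IGN)
in Python, as described above.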

Please see FAQ 1849 or SRDB 6348 for more information.

=========================================================================
51) My machine crashed.  What is a panic?

A panic is what the kernel does when it has detected a fatal error and
cannot continue execution.  It shuts down the system in as graceful a
way as it can, while going down very quickly.  Sometimes panics occur
because the operating system has detected an inconsistency in its data
structures and shuts down the system to prevent corruption.  Other
times, panics occur because an unexpected event occurs, such as a divide
by zero or trying to reference something at an address near zero.

Panics always say why the system is going down.  A system is not
panicking if it leaves no indication of what is happening (i.e. if the
system just stops).

Panics attempt to sync the filesystems before halting the system;  this
is to prevent filesystem corruption.  Syncing the filesystems writes out
all pages modified by the operating system which have not made it back
out to the disk.  Without this operation, the disk may be in an
inconsistent state, as some of the pages may have been written out and
others may not have been.

Panics can be caused by either hardware or software.  Inconsistencies
may be due to software bugs, or to uncooperative hardware.  The cause
must be determined on a case by case basis, usually by looking at a
system corefile.  If, for example, one finds inconsistencies in the
system data structures when examining a core file, this usually
indicates a software problem; finding a stuck bit in a CPU register
which leads to following a bad pointer indicates a hardware problem.
It all depends on what is found.

One place to start, in examining a corefile, is to get a stack traceback
and to try to correlate it with a bug or a patch using Sunsolve.  If
the stack traceback in the bug report is the same or almost the same as
the traceback in the corefile, have the customer install the relevant
patch, if one exists.  Two adb commands to get a stack traceback are:

  sp$<stacktrace		(long and accurate)
  $c				(short but possibly inaccurate)

=========================================================================
52) What causes a watchdog reset?  How can it be distinguished from a panic?

A watchdog reset is when a system takes a trap in the middle of handling
another trap, while it has its trap handling disabled.  There is a
small window, immediately after the system takes a trap, when initial
processing must be done and another trap cannot be accepted.  The
system cannot do anything with the new trap because trap handling is
disabled, so the system quits.

Watchdog resets leave the system at the OK prompt without panicking or
syncing the file systems.  Typing "sync" can be done;  the likelihood of
it working is small.  One can issue a few commands to dump registers and
state to the screen (so they can be copied down onto paper), or just
reboot the machine.

The commands to dump state from the OK prompt on newer systems are:

.registers - Dumps kernel internal CPU registers.

.locals - Dumps registers in the register window at the time of the
   crash.

.psr - Dumps the processor status register.

ctrace - Displays the stack, similar to $c in adb.

wd-dump - (on sun4d architectures) Displays the PC, which contains the
  location of the instruction which caused the crash.

If the system will boot, try adb'ing the live kernel and plugging in the
addresses returned from ctrace and wd-dump.  They may or may not make
sense, because the symbol table of the new boot may be different from
the table in effect when the watchdog reset occurred.  If they make
sense, though, use them as a guide to figuring out in which routine the
system stopped, and whether it stopped in the same place each time.

If there are several watchdogs, consider their frequency and
randomness to help determine whether it is a hardware or a software
problem.  If they started happening out of the blue, they are probably
due to a hardware problem.  If the ctrace of each is different from the
next, then it is probably hardware.  If the ctrace of several is the
same, then it could be software, etc.  (Please see the question "Is my
system problem hardware or software?" for more information on determining
whether a problem is hardware or software.)

Additional information: Watchdog FAQ

=========================================================================
53) What is a memory leak?  How can I tell I have one?

A memory leak is present whenever a program loses track of the memory it
has allocated and allocates more to replace memory it already has.  A
common example would be a C program calling malloc() to allocate memory,
then the same program calling malloc() to allocate memory to the same
variable again without freeing what it had malloc()'ed in the first
place:

	x = malloc (SOME_AMT_OF_MEMORY);	/* first chunk */
	x = malloc (SOME_AMT_OF_MEMORY);	/* first chunk is now lost */

The first allocation would be lost because the pointer to the second
chunk of memory would overwrite the pointer to the first chunk.

Memory leaks present themselves as systems running out of swap or out of
kernelmap, depending on what type of memory leak it is.

Applications or non-kernel code (including daemons) which have memory
leaks will eventually use up all swap space.  The /tmp directory will
shrink to almost nothing and will show full, because the primary swap
area is used for both the /tmp directory and swapspace with swapspace
taking precedence.  Programs will bomb with "out of memory" (ENOMEM)
errors.  The system may run slower and slower until it comes to a stop;
all processes requiring swap to continue running will wait for it
forever.

Both ps commands (/usr/ucb/ps -aux or /usr/bin/ps -efl) show a SZ
column, which displays the amount of virtual space consumed.  A program
with a memory leak will show more and more virtual space consumed as
time goes on.
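
To watch for this growth, SZ can be sampled periodically and the samples
compared.  The following is a sketch for a POSIX shell;  the function
names sz_from_ps and is_growing are made up for the example, and
sz_from_ps assumes that no column before SZ is blank on the process
line.

```shell
# sz_from_ps: read "ps -l -p PID" output on stdin and print the SZ
# value of the process.  The column is located by name in the header.
sz_from_ps() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "SZ") col = i; next }
         col     { print $col; exit }'
}

# is_growing: succeed if the SZ samples given as arguments never
# decrease and the last sample is larger than the first.
is_growing() {
    first=$1
    prev=$1
    for sz in "$@"; do
        [ "$sz" -ge "$prev" ] || return 1
        prev=$sz
    done
    [ "$prev" -gt "$first" ]
}

# Typical use: take a sample every few minutes with
#   ps -l -p 1234 | sz_from_ps
# and then pass the collected samples to is_growing.
```

Steady growth alone is not proof of a leak, since a program may simply
be doing more work, but it identifies which process to look at first.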

Memory leaks in the kernel manifest themselves in the same way: a system
which runs slower and slower until it comes to a stop.  A coredump of a
kernel which is out of memory will show threads waiting for memory (adb
$<threadlist command), and will show little or no kernelmap (the crash
map/kernelmap command).  There will be lots of errors showing up in the
crash kmastat command.  Crash kmastat output can help pinpoint the
problem: look for lines corresponding to memory pools which have large
memory-in-use values and perhaps lots of allocation attempts (both
successful and unsuccessful).

The best way to debug a memory leak is to turn on kmem_flags in the
memory allocator.  See the question "What is the kernel memory
debugger?  How is it enabled?" in this document for information on how
to do this.

=========================================================================
54) How to tell why a system hung?

Systems usually hang when they run out of a resource, such as kernelmap.
A more general case is when one process does not release a lock, as when
waiting for a resource, and other processes wait either for that lock
or for other locks held by that process.

Systems which are hung in the proper sense have no processes running, as
all are waiting for a resource or for each other.  Processes waiting for
something will have a WCHAN specified in the output of the "ps -efl"
command.  With adb, using the address in the ADDR field of the ps
output, get the proc structure, then the list of threads of that
process.  Dump the stack of each thread, to see what it is waiting for;
this will provide a clue as to what WCHAN represents (mutex, condition
variable, etc).  Find the owner (thread) of whatever it is (the mutex
owner, etc);  then dump the stack of that owner thread to see what it is
waiting on.  Follow this up the chain until the thread being checked is
waiting on a resource instead of a lock, or until a deadlock is detected.
(Such a deadlock might be one thread waiting for the lock held by
another thread, while the second thread is waiting for a lock held by
the first thread.)

For more details on hang analysis, please consult infodoc 14138, "kernel
tips: Use of the ps command in corefile analysis"

=========================================================================
55) Is my system problem hardware or software?

Knowing when the system started crashing, the frequency of the crashes,
what is going on when the system crashes, plus the consistency (or
randomness) of the crashes, can help determine whether hardware or
software is to blame.

The circumstances during which (or after which) the system started to
crash are important in determining whether the problem is hardware
related.  Hardware problems tend to be more random, change in frequency
and can start out of nowhere.  Software problems tend to be more
predictable and methodical.

Hardware problems often occur after some trauma to the hardware.  This
includes power failures, hardware modifications, hardware additions, and
improper handling.  A problem which surfaces after no hardware or
software changes is also a good bet to be hardware related, as hardware
wears out.

The frequency, consistency and timing of crashes are telltale signs of
whether the problem is hardware or software related.  Randomness,
increasing frequency, and/or correlation to temperature or other
conditions are signs of hardware problems.

The trait of a software problem is consistency.  Software problems are
usually reproducible, and start with a reason (i.e. a change to
software, as opposed to "out of the blue" which is a trait of a
hardware problem).

Logfiles are a big help in determining problems.

The /var/adm/messages file is where errors get logged.  Many a panic or
problem condition is prefaced by messages in the messages file.  If
there are panics with ufs (such as "freeing free block"), and there are
disk errors in the messages file, suspect the hardware before the
software.  The software cannot work without cooperative hardware.  If
there are messages that the system is running out of memory before a
hang, perhaps there are messages from the routines needing the memory
which can provide a clue as to who is hogging it.

For more information, please see infodoc 14133, "Kernel Tips: Is System
Crash Due to Hardware or Software?"

=========================================================================
56) What is savecore?

A system saves a copy of its memory when it goes down in a panic.  This
information is saved so that it can be used in determining the state of
the system during the panic, to diagnose the problem which caused it to
go down.  The system saves the memory copy in an area which will be
overwritten once the system is rebooted and is fully operational again.
Savecore is a program which runs at boottime to salvage this system
coredump information before it would be overwritten, format it and save
it as a set of regular files in a form usable by adb.

Savecore is not enabled by default, because coredumps can be very
large.  Savecore is enabled by uncommenting the lines which run it from
the boot scripts.

On 2.X, the file containing the savecore command is /etc/init.d/sysetup.
There is a soft link to this file from /etc/rc2.d/Sysetup.  Uncomment the
savecore command at the bottom of the file, by removing the "#" from
the bottom 6 lines:

  ##
  ## Default is to not do a savecore
  ##
  #if [ ! -d /var/crash/`uname -n` ]
  #then mkdir -m 0700 -p /var/crash/`uname -n`
  #fi
  #                echo 'checking for crash dump...\c '
  #savecore /var/crash/`uname -n`
  #                echo ''

For 4.X, the file is /etc/rc.local.  Uncomment the last 4 lines of this
cluster:

  #
  # Default is to not do a savecore
  #
  # mkdir -p /var/crash/`hostname`
  # echo -n 'checking for crash dump... '
  # intr savecore /var/crash/`hostname`
  # echo ''

Make sure, in both cases, that there is adequate room in the partition
where /var is located.  If not, find any locally mounted partition other
than /tmp, which has room.  Generally, a coredump can take up to 35% of
the size of main memory (RAM).

Note: there is no need to reboot after enabling this.
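
The 35% rule of thumb can be turned into a quick space check.  This is
a sketch for a POSIX shell;  the function names dump_estimate_mb and
has_room are made up for the example, and the free space figure is
expected to be supplied by hand (for instance from "df -k" output for
the crash directory, converted to megabytes).

```shell
# dump_estimate_mb: estimate the coredump size in MB as 35% of the
# given amount of RAM in MB (integer arithmetic, rounded down).
dump_estimate_mb() {
    echo $(( $1 * 35 / 100 ))
}

# has_room: succeed if AVAIL_MB of free space covers the estimate
# for a machine with RAM_MB of memory.  Usage: has_room RAM_MB AVAIL_MB
has_room() {
    [ "$2" -ge "$(dump_estimate_mb "$1")" ]
}

# Example: a machine with 512 MB of RAM
dump_estimate_mb 512	# prints 179
```

Since the figure is only an estimate, it is wise to leave some margin
beyond the number printed.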

=========================================================================
57) How large can a coredump be?

Generally a coredump is estimated at 35% of the size of RAM.  There are
limits as to how big they can be, though, since only the kernel memory
and resident process information is saved.  The size of kernel memory is
limited as follows:

Solaris 2.4:
    sun4c 33MB
    sun4m 61MB
    sun4d 139MB

Solaris 2.5:
    sun4c 33MB
    sun4m 100MB
    sun4d 251MB
    sun4u 2525MB

This information was taken from SRDB 13900.  Please consult this SRDB
for more up-to-date information.

=========================================================================
58) How to change the default dump device

The kernel configures the default dump device to be the first configured
swap device.  This is usually the primary swap partition.

Sometimes the primary swap partition can be unusable as a dump device.
It may be too small, or may be a metadevice (a virtual device made up of
many actual disk drives, as with DiskSuite or Veritas).

Normally, the first configured swap device is the primary swap device as
designated in /etc/vfstab.  If the desired dump area is added as a swap
device with the "swap -a" command before the vfstab file is consulted,
the kernel will be fooled into setting up the dump area using the
desired location, as it is technically the first configured swap device.
The desired dump area can then be unconfigured as a swap device using
swap -d, before the /etc/vfstab file is consulted for the **real**
primary swap file.
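
The sequence above might be scripted as follows.  This is only a sketch
for a POSIX shell:  the function name claim_dump_device, the RUN
dry-run variable and the device path in the usage line are made up for
the example, and whichever boot script calls it must run before swap is
added from /etc/vfstab.

```shell
# claim_dump_device: add DEVICE as swap so that it becomes the first
# configured swap device (and therefore the dump device), then remove
# it from swap again.  Set RUN=echo to print the commands instead of
# executing them.
claim_dump_device() {
    dev=$1
    ${RUN:-} swap -a "$dev" && ${RUN:-} swap -d "$dev"
}

# Dry run with a hypothetical device:
#   RUN=echo claim_dump_device /dev/dsk/c0t1d0s1
```

Trying the dry run first shows exactly which swap commands would be
issued, without touching the running system.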

Please see intsrdb 11964 for further information.
=========================================================================
59) How can I manually crash my system and get a coredump from it?

A system with a Sun console keyboard can often be panicked manually by
depressing the STOP key (usually near the upper-left corner of the
keyboard) while depressing the "A" key at the same time, to get to an
OK prompt, and then typing "sync" at the OK prompt to panic the system.
(Systems with a dumb terminal as a console can hit the "break" key in
lieu of STOP-A.)

This is done when a system is hung (that is, when it is unresponsive to
any input -- except, hopefully, STOP-A).  The panic string, which is
printed on the console, in the messages file, and in the dmesg buffer,
will give "zero" as the reason for the panic in this case.  This is a
signature by the system that the system was panicked in this way.  It is
a safe bet that the user of the system thought the system was hung when
he/she brought it down in this way.

=========================================================================
60) How come the system won't produce a coredump when it crashes?

Systems sometimes hang when going down in a panic, when trying to
produce a coredump.  Here is a list of potential obstacles to getting a
coredump:

a) The swapfile is too small to hold the coredump.  Coredumps are
usually ~35% of the size of main memory.  Modify the primary swap
partition to be larger, or change the default dump device to be a
different, larger area.  (See the question on modifying the default dump
device for more information.)

b) The dump area (usually primary swap file) is too big.  It must be
less than 2 GB in order for the dumping mechanism to work.

c) The dump area is not on a regular disk.  If it is on a
meta-filesystem (managed by DiskSuite or Veritas) which is a logical
disk made up of many physical disks, the software which coordinates the
logical disk may be hosed at the time of the panic, which would prevent
the coredump from taking place.

d) The dump area is not a local disk.  NFS may be hosed by the time a
panic occurs, preventing a core from being dumped to a remote system.

e) A problem exists with the hardware involved in producing a coredump.
For example, the disk drive onto which the coredump would be saved, or
the SCSI bus to which it is connected, is locked up or nonfunctional.

f) The area where savecore saves the corefiles is too small or cannot be
accessed.  Make sure it is large enough, and is on a partition listed
in vfstab.

g) Savecore is not enabled.  See the question on how to enable
Savecore.

=========================================================================
61) How to get a coredump if STOP-A doesn't work?

STOP-A sends a low-priority interrupt (level 1) to the system.  What if
a higher-priority interrupt is what is locking up the system?  One can
unplug then replug the keyboard of the newer systems (sun4m, sun4d,
sun4u) to send the system a higher-priority interrupt (level 12) to get
the system's attention and drop it to the OK prompt.  If that does not
work, one can disconnect the fans on sun4d systems, which will send a
(top-priority) level 15 interrupt.

Once at the OK prompt, one can type "sync" to panic the system.

Note that if the screen is completely black, chances are that a
hardware problem has so completely hosed up the system that all that can
be done is to power-cycle it.

=========================================================================
62) How to send a tape of system configuration and corefiles in to Sun
Service for analysis?

Place all files to submit into a single directory.  The directory where
the corefiles are already is a good choice, so that they won't need to
be moved.  (They are huge and there might not be room elsewhere on the
system.)

  cd /var/crash/my_system	# or wherever the corefiles are
  cp /var/adm/messages .
  showrev -p > showrev.out
  # ...and so on for any other files to submit

From there, tar them onto a tape:

  tar cvf /dev/rmt/0l *

Mail the tape to

  Support Service Group
  SunService
  Mailstop MTV07-203
  2700 Coast Ave
  Mountain View, CA 94043

if the engineer assisting you is on the west coast (phone
number is in the 415 area code), or

  Support Service Group
  SunService
  Mailstop UCHL02-105
  2 Omni Way
  Chelmsford, MA 01824

if the engineer assisting you is on the east coast (phone
number is in the 508 area code).

Do not submit a tape before logging a service order with Sun Service.

=========================================================================
63) How to FTP files to Sun Service?

Place all files to submit into a single directory.  The directory where
the corefiles are already is a good choice, so that they won't need to
be moved.  (They are huge and there might not be room elsewhere on the
system.)

  cd /var/crash/my_system	# or wherever the corefiles are
  cp /var/adm/messages .
  showrev -p > showrev.out
  # ...and so on for any other files to submit

Create a tar file from these files.  Call the tar file by the service
order number.  (Do not FTP files to Sun Service before logging a service
call.)

  tar cvf 2345678.tar * 

Then compress the tar file.  Use gzip if available, otherwise use
compress.

  gzip 2345678.tar
or
  compress 2345678.tar

gzip will generate a file 2345678.tar.gz;  compress will generate a file
2345678.tar.Z.
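
The tar and compress steps can be wrapped in one small function.  This
is a sketch for a POSIX shell;  the function name pack_for_sun is made
up for the example, and "command -v" is used to test whether gzip is
installed.

```shell
# pack_for_sun: tar everything in the current directory into
# ORDER.tar, compress it with gzip if available (otherwise with
# compress), and print the name of the resulting file.
pack_for_sun() {
    order=$1
    tar cf "$order.tar" * || return 1
    if command -v gzip >/dev/null 2>&1; then
        gzip "$order.tar" && echo "$order.tar.gz"
    else
        compress "$order.tar" && echo "$order.tar.Z"
    fi
}

# Usage, from the directory holding the files to submit:
#   pack_for_sun 2345678
```

The name printed is the file to transfer, which avoids sending a .Z
name when gzip actually produced a .gz file.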

Then log into sunsolve1 and place the compressed tarfile in the cores
directory.

$ ftp sunsolve1.sun.com
login: anonymous
password: your_email_address
ftp> cd cores
ftp> bin
ftp> put 2345678.tar.Z		(or 2345678.tar.gz if gzip was used)
ftp> quit

Do not ftp any files to Sun Service before logging a service order with
them.

Infodoc 14230 (enabling savecore and sending corefiles)
Infodoc 11835 (what to send)

=========================================================================
64) How to retrieve files off of sunsolve1 for analysis.

From inside Sun:

  cd to the directory where the corefiles will be placed.
  $ Iftp sunsolve1.sun.com
  login: <Sun Service Personnel Login Name>
  password: <Sun Service Personnel Login Password>

At this point, the engineer will be in the directory where the corefiles
were placed by the customer.  An ls -l will work to display the files
there.  (No ls can be done by the anonymous login, as a security
measure.)  Retrieve the files:

ftp> bin
ftp> get 2345678.tar.Z (or whatever)
ftp> quit

Then uncompress and untar the file.  Verify its soundness before
deleting the file off of sunsolve1.  (Perhaps do a /usr/bin/sum on the
tarfile, have the customer do the same, and compare them.)  Files left
in the cores directory of sunsolve1 will be deleted automatically after 3
days.  They may be deleted manually as:

ftp> del 2345678.tar.Z (or whatever)

Note that customers cannot overwrite or delete files on sunsolve1, nor
can they look to see what those files are.
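
The checksum comparison suggested above can be sketched as follows, for
a POSIX shell.  The function names sum_of and same_file are made up for
the example.  Both sides should run the same sum program, since
different implementations of sum may use different algorithms.

```shell
# sum_of: print the checksum and block count of FILE, without the
# file name, so that results from different machines compare cleanly.
sum_of() {
    sum "$1" | awk '{ print $1, $2 }'
}

# same_file: succeed if the checksum of FILE matches EXPECTED, the
# string the customer reported for their copy of the file.
same_file() {
    [ "$(sum_of "$1")" = "$2" ]
}

# Usage (the expected string is whatever the customer reports):
#   same_file 2345678.tar.Z "<checksum> <blocks>" && echo "transfer OK"
```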

=========================================================================
65) What is iscda?

Iscda is a script which runs a preliminary analysis on a corefile.  It
issues common commands used during corefile analysis, and gathers other
information requested by kernel engineers to aid them in getting a full
picture of the system.

Running Iscda is useful at secure sites where classified information may
prevent a customer from submitting a core file proper.  Iscda usually
gives enough information to at least get started on an analysis;  in the
case of a classified site, it can often provide the kernel engineer with
enough information to know which commands to have the customer run next.

Iscda is available for 2.X only.  It is available on the Sunsolve CD,
and from the web at http://sunsolve1.sun.com/sunsolve/freeinfo.html
(as of 9/96).

=========================================================================
66) What is adb?  How is it used to examine corefiles?

Adb is the debugger used to read corefiles.  It requires both the
symbol table and a corefile proper.  It is invoked on a system
crashdump normally as:

# adb -k unix.# vmcore.#

It does not provide a prompt.  It will leave the cursor on a blank line
after displaying the amount of physical memory present.  From there,
commands to dump kernel data structures and variables can be issued.

Generally, commands to dump variables and data structures are of the
format:

  variable,qty/format

where qty is a number of things to dump, beginning from "variable," and
format is how the output is displayed.

For example, 

  maxuprc,2/D

dumps 2 longwords beginning with maxuprc in decimal format:

maxuprc,2/D
maxuprc:
maxuprc:	485
phys_msgbuf:	8192

The most common format specifiers are:

  D - Long decimal
  d - short decimal
  X - Long hexadecimal
  x - short hexadecimal
  s - string
  Y - Long date format
  i - Machine instruction

A bunch of instructions starting at a particular location may be
dumped.  The following dumps 5 machine instructions starting at the
beginning of the fork() routine:

fork,5/ai
fork:		save	%sp, -0x60, %sp
fork+4:		clr	%i2
fork+8:		clr	%i3
fork+0xc:	call	cfork
fork+0x10:	restore

Data structures may be dumped in this way too.  The header files in
/usr/include... may be used along with the dump to figure out whether
information dumped is reasonable or is invalid (i.e. a NULL pointer,
etc).

Adb has macro support also.  The macros are kept in a set directory,
whose location depends on the system architecture.  Macros
often print fieldnames of structures as well as their values.  Macros
are invoked as $<macro.  For example:

$<utsname
utsname:
utsname:	sys	SunOS
utsname+0x101:	node	kimosabe
utsname+0x202:	release	5.4
utsname+0x303:	version	Generic_101945-37
utsname+0x404:	machine	sun4m

prints the utsname structure, including field names and values.
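
Because adb reads its commands from standard input, the commands shown
above can be collected and fed to it in one pass.  The sketch below, for
a POSIX shell, only prints such a command list;  the function name
adb_first_look and the choice of commands are illustrative, not a
standard procedure.

```shell
# adb_first_look: print a short list of adb commands for a first
# look at a crash dump: system identification, two kernel variables,
# a stack traceback, and quit.
adb_first_look() {
    cat <<'EOF'
$<utsname
maxuprc,2/D
$c
$q
EOF
}

# Typical use against a saved crash dump:
#   adb_first_look | adb -k unix.0 vmcore.0 > first_look.out
```

Keeping the list in a function makes it easy to run the same commands
against several dumps and compare the results.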

=========================================================================
67) Can adb be used on live kernels too?

Yes.  Invoke adb on a 2.X live kernel as root, as:

  # adb -k /dev/ksyms /dev/mem

or on a 4.X live kernel as root, as:

  # adb -k /vmunix /dev/mem

=========================================================================
68) How is a kernel patched live?

Variables may be changed on a live kernel using adb.  Adb must be
started with the "-w" flag in order to enable writing.  On a 2.X live
kernel as root:

  # adb -w -k /dev/ksyms /dev/mem

or on a 4.X live kernel as root, as:

  # adb -w -k /vmunix /dev/mem

The command to write the live kernel has the following format:

  variable/Whex

where "hex" is a hex number.  For example, typing this:

  maxusers/W20

produces output to show that maxusers was changed from 0x1e to 0x20:

  maxusers:	0x1e		=	0x20

This can be used **sometimes** to test changes before making them
permanent.  However, some variables may be changed with no effect, and
changing others may cause the system to panic.
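
Since the value written is given in hex, a decimal setting has to be
converted first;  typing maxusers/W32 would deposit 0x32 (decimal 50),
not 32.  The following sketch for a POSIX shell builds the command
from a decimal value.  The function name adb_write_cmd is made up for
the example.

```shell
# adb_write_cmd: print the adb command which writes the decimal
# VALUE into kernel variable NAME, converting the value to hex.
adb_write_cmd() {
    printf '%s/W%x\n' "$1" "$2"
}

adb_write_cmd maxusers 32	# prints maxusers/W20
```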

=========================================================================
69) Can a kernel be patched permanently using adb?

Yes, but this is NOT recommended, because nothing forces the change to
be documented.  A change made in this way could cause something else in
the kernel to break, and since there is no obvious place where the
change would be recorded, no one would think to look for it as the
cause.  This type of thing could create headaches for both the
customer and tech support.

To do this, though, invoke adb with the "-w" flag as above, and issue a
command of the following format:

  variable?Whex

where "hex" is a hex number.

=========================================================================
70) What is a deadman kernel?

A deadman kernel is one which can detect that it itself is hung, and will
shut itself down.  It is used when neither a STOP-A nor an unplugging of
the keyboard will drop a hung system to an OK prompt.  Without an OK
prompt, one cannot get a coredump by typing "sync".

You can enable the deadman kernel on sun4m, sun4d and sun4u systems
by adding the following line to the /etc/system file, and rebooting:

  set snooping = 1

This only works if you have a minimum kernel patch installed as follows:

  Solaris 2.4   101945-45
  Solaris 2.5   103093-08
  Solaris 2.5.1 103640-04

Infodoc 13258 contains more information.
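
Whether the running kernel meets the minimum can be checked from the
kernel version string, which has the form Generic_101945-37 in the
utsname output shown earlier.  This is a sketch for a POSIX shell;  the
function name kernel_patch_ok is made up for the example, and it
assumes the version string reported by "uname -v" carries the installed
kernel patch revision.

```shell
# kernel_patch_ok: succeed if VERSION (e.g. from "uname -v") names
# kernel patch PATCH at revision MIN_REV or later.
# Usage: kernel_patch_ok VERSION PATCH MIN_REV
kernel_patch_ok() {
    version=$1
    patch=$2
    min_rev=$3
    case $version in
        Generic_"$patch"-*) rev=${version##*-} ;;
        *) return 1 ;;
    esac
    [ "$rev" -ge "$min_rev" ]
}

# Example for Solaris 2.4, which needs 101945-45 or later:
#   kernel_patch_ok "`uname -v`" 101945 45 || echo "patch too old"
```

With the version string from the utsname example (Generic_101945-37)
the check fails, since revision 37 is below the required 45.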

=========================================================================
71) What is the kernel memory debugger?  How is it enabled?

The kernel memory debugger is built into Solaris 2.4 and higher.  It
detects memory corruption and logs information relevant for determining
where memory leaks are.

Corruption detection.

When it detects corruption, it will stop the system.  Corruption
detection includes use of freed memory, freeing of bogus addresses, and
redzone checking (allocating an extra page at the end, which is not
mapped, so that writing to it forces a bus error).

Memory leak debugging.

Each time a call is made to kmem_alloc() or a related memory allocation
routine, the debugging code records the stacktrace.  It maintains a list
of the number of times a particular stacktrace has been called, and the
number is decremented when kmem_free() is called.  Memory leaks are
debugged by panicking the system with STOP-A and sync, (or unplugging
the keyboard, or with a deadman kernel) when the system runs out of
memory and hangs.  The kma_users command in the "crash" program can be
used to print out the stacktrace information gathered by the kernel
memory debugger.  (The kma_users command is available with the 2.5 or
higher version of crash.  2.4 versions with this command do exist within
Sun, but must be requested through Sun Service.)

How to enable the kernel memory debugger.

It can be enabled through kadb on a boot-by-boot basis (see infodoc
12172):

  # halt
  ok boot kadb -d
  kadb:	(type return)
  kadb[0]: kmem_flags/W 1f
  kadb[0]: :c

For 2.5, deposit "f" instead of "1f" into kmem_flags;  that is:

  ok boot kadb -d
  kadb: (type return)
  kadb(0)> startup:b
  kadb(0)> :c
  kadb(0)> kmem_flags/W f
  kadb(0)> :c

What the flags mean on 2.4:

  0x01	KMF_AUDIT	turn on transaction logging/auditing
  0x02	KMF_DEADBEEF	overwrite free'd memory, verify on allocation
  0x04	KMF_REDZONE	detect writes past end of buffer
  0x08	KMF_UNMAP	deallocate arenas when all buffers are free'd
  0x10	KMF_VERIFY	verify that free'd addr was allocated

What the flags mean on 2.5:

  0x01	KMF_AUDIT	turn on transaction logging/auditing
  0x02	KMF_DEADBEEF	overwrite free'd memory, verify on allocation
  0x04	KMF_REDZONE	detect writes past end of buffer
  0x08	KMF_VERIFY	verify that free'd addr was allocated
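
The two tables above can be applied mechanically to decode a kmem_flags
value.  The following is a sketch for a POSIX shell;  the function name
decode_kmem_flags is made up for the example, and the bit assignments
are exactly those listed in the tables.

```shell
# decode_kmem_flags: print the names of the flags set in HEXVALUE
# (given without a 0x prefix) for RELEASE, which is 2.4 or 2.5.
# Usage: decode_kmem_flags HEXVALUE RELEASE
decode_kmem_flags() {
    flags=$(( 0x$1 ))
    case $2 in
        2.4) names="KMF_AUDIT KMF_DEADBEEF KMF_REDZONE KMF_UNMAP KMF_VERIFY" ;;
        2.5) names="KMF_AUDIT KMF_DEADBEEF KMF_REDZONE KMF_VERIFY" ;;
        *)   echo "unknown release: $2" >&2; return 1 ;;
    esac
    bit=1
    out=
    # The names are listed in bit order, starting at 0x01.
    for name in $names; do
        if [ $(( $flags & $bit )) -ne 0 ]; then
            out="$out $name"
        fi
        bit=$(( $bit * 2 ))
    done
    echo $out
}

decode_kmem_flags 5 2.5		# prints KMF_AUDIT KMF_REDZONE
```

Note that 0x08 means KMF_UNMAP on 2.4 but KMF_VERIFY on 2.5, which is
why the release must be given.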


It can be enabled by patching the kernel (though not a live kernel):

  # cp /kernel/unix /kernel/unix.orig
  # adb -w /kernel/unix
  kmem_flags?W 1f	(or f if 2.5)
  $q
  # reboot	(booting kadb not necessary)

Patching the kernel in this way will enable debugging as long as this
kernel is the one booted.  To disable debugging, move the original
/kernel/unix.orig back to /kernel/unix and reboot with it.

=========================================================================
72) What is truss?  How to see what's going on in a program?

Truss is a debugging tool which provides insight into how a program
operates, by printing out system service calls, along with their
arguments and return statuses;  faults; and signals.  As such it is
extremely useful for debugging errors and for figuring out how programs
work.  (The command is called "trace" in 4.X.)

Truss is easy to use:  just prepend the word "truss" to any command
(including arguments) in its simplest form, and output abounds.

How truss can be used to debug program execution.

Generally, when a program terminates abnormally but does not crash, it
is because a system service call returned an error.  This may be due to
a programming error, or may be due to circumstance.  Truss can help
determine the cause of the problem.

Truss can also help debug where a program crashed by providing a trace
of the system service calls up until the crash.  This output, plus
source code, can help determine where the program crashed.

Debugging a program with a circumstantial error.

For an example of a program with a circumstantial error, suppose a
program exits with an error condition of "No such file or directory."
Truss can show which file the program was looking for by listing all of
the open() system service calls made by the program, including those
which failed.
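
Picking the failed open() calls out of a long truss listing can be
sketched as follows, for a POSIX shell.  The function name failed_opens
and the program name myprog are made up for the example.  The filter
relies on truss marking a failed call with "Err#", followed by the
error number and name, at the end of the line.

```shell
# failed_opens: read truss output on stdin and print the path name
# and error name of every open() call which returned an error.
failed_opens() {
    awk '/open\(".*Err#/ { split($0, part, "\""); print part[2], $NF }'
}

# Typical use (myprog stands for the failing program):
#   truss -o /tmp/truss.out myprog
#   failed_opens < /tmp/truss.out
```

Not every failed open() is a problem, since many programs probe for
optional files;  the one of interest is usually the last failure before
the program gives up.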

Debugging a program with a programming error.
 
Programming errors may be found by first searching truss output for
error return statuses, and then checking the arguments to the system
service calls made.  System service calls bomb when their arguments are
incorrect or unexpected.  Most system calls bomb gracefully, that is,
without crashing the program.  When this happens, though, they may leave
the variables they affect in unexpected states which can cause the
program to crash.  Trying, for example, to malloc() an extremely large
chunk of memory may force malloc to return a NULL pointer instead of a
pointer to memory.  If that NULL pointer is accessed later on in the
program, the program will crash.  Malloc() is a library routine rather
than a system service call, so it does not itself appear in the truss
output;  what does appear is the failing call it makes to grow the heap
(typically brk(), returning ENOMEM) shortly before the crash, which
indicates what happened.

Getting insight into how factory Solaris programs work.
 
Ever wonder how ps works?  Trussing it will tell how.  Trussing programs
which do certain things can help one discover new system calls which
accomplish those tasks.  Truss can therefore be a very valuable
educational tool.

All system service call return statuses reported by truss are listed in
/usr/include/sys/errno.h.  All signal definitions are in
/usr/include/sys/signal.h.

Please see infodoc 14141 for more information, and examples of how to
use truss.

=========================================================================
73) The system returns a "data access exception" error on probe-SCSI

This can be caused by the following (as taken from SRDB 5034):

  1. The user typed "probe SCSI" (with a space) instead of probe-SCSI
  (with a "-")

  2. CPU

  3. External device or cable

  4. Error in low memory (replace SIMMs, particularly those in bank 0)

  5. NVRAM

  6. SBus device

=========================================================================
74) Some of my disk drives and/or tape drives are missing at boot or
have gone offline.

If the disk drives are metadevices under Solstice DiskSuite control,
verify that the disks have not been moved around.  DiskSuite uses hard
device links (the /devices/... tree) to access disks.  Proceed to the
next step if this is not the case:

Verify the SCSI bus is OK by doing the following:

  a) Shut down the system in as normal a fashion as possible.

  b) Power off all components on the SCSI bus containing the "missing"
  devices, including the computer.  (Shut down the computer first.)

  c) Reseat all SCSI cables between devices.  Make sure they are all
  tight.

  d) Make sure the bus is terminated properly.  A properly terminated
  bus has one terminator at the end of its bus, and no more.  A bus
  which has internally-terminated devices in the middle of it can be as
  problematic as a bus with no termination.

  e) Power on all components powered off in (b), starting with the
  device at the end of the bus, turning the computer on last.

  f) Do a "probe-SCSI" command if there is only one SCSI bus.  Output
  should list each device on the bus, with no duplicates of any kind.

  g) On systems with more than one SCSI bus, do the following from the
  OK prompt:

    OK printenv auto-boot?
    OK setenv auto-boot? false
    OK reset
    OK probe-SCSI-all

    Output should list each device on each bus, with no duplicates.
    Once this is verified, reenable "auto-boot?" if it was enabled
    before, then reset and reboot.

If all devices show up with probe-SCSI but do not show up on the booted
system, try a boot -r.

=========================================================================
75) What to do if new devices time out on the SCSI bus:

First verify that the cables are seated properly in the devices, that
they are not too long (6 meters for non-differential SCSI, 25 meters for
differential SCSI), and that they are properly terminated (including
verifying that multiple termination does not exist on the bus due to
internal termination on some devices).

The next thing to do is to install the appropriate SCSI driver patches:
those for the esp and isp adapter drivers, for the sd disk driver (for
disk drives), or for the st tape driver (for tape drives).

If this does not take care of the problem, consider tweaking the
configuration file for the esp or isp adapter device driver.

The esp and isp are the SCSI adapter device drivers for 2.X.  Each of
these has a configuration file.  Drivers and configuration files are
located in /kernel/drv.  Configuration files are optional for these
drivers;  if a configuration file does not exist for the driver, the
driver takes defaults.  Defaults may be changed via /etc/system;  please
see srdb 10254 for info on how to do this.

Option bits are defined in /usr/include/sys/scsi/conf/autoconf.h.

There are several option bits settable in esp.conf and isp.conf,
including fast, sync and tagged queueing.  If the problems happen under
load, then disabling one or more of these may make the problem go away,
but at a performance cost.  Some SCSI options are:

a) Fast SCSI supports transfer rates of between 5 and 10 MHz.

b) Synchronous support refers to a modification to base SCSI protocol
timing allowing for faster transfer rates.

c) Tagged queueing supports multiple commands sent to a given device.
The device manages all requests and can order them for optimal
performance, returning a tag to the computer to let it know which
command has completed.

Please see the man pages for esp(7), isp(7), and infodoc 10254 for more
information.

=========================================================================
76) How to restore devices in /dev directory on Solaris?  SunOS?

Solaris 2.X

Almost all devices in the /dev directory are links to special files in
the /devices directory.  The easiest way to regenerate these links is to
"boot -r" or run devlinks(1M).  If a special file in the /devices tree
is missing, do a "boot -r" or run drvconfig to configure the particular
missing device.

Please see the man pages on drvconfig and devlinks for more information,
and for where else to look for more information.

SunOS 4.X

A script called MAKEDEV exists in the /dev directory of SunOS 4.X
systems.  This script will invoke the mknod command to create all
special devices in the /dev directory so that their corresponding
devices can be used.  The command "MAKEDEV std" is used to create all
standard devices;  there will be error messages for all devices which
exist already, but these are harmless.  Invoking MAKEDEV for a
particular device is described at the top of the MAKEDEV file.

Please see the man page for MAKEDEV(8) for more information.

=========================================================================
77) How to relate disk errors in /var/adm/messages file (e.g. sd0) to
real devices (cwtxdysz)?

Look at a bootup sequence in the /var/adm/messages file, messages* file
or dmesg output to relate the disk in the error message to a device in
the /devices tree (which will end in a name with an "@", for example
sd@3,0).

Go to the /dev/rdsk directory and do an "ls -l".  Look for the entries
there which point to the proper device in the devices tree:

cd /dev/rdsk
ls -l | grep sd@3,0

There may be several matches, but all will have the same cwtxdy part of
their name.
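
The lookup above can be scripted.  The following is a minimal sketch,
not a supported tool: the function name ctd_for_node and the sample
listing are made up for illustration, and it assumes that "ls -l"
prints the link name, an arrow, and the /devices path as the last three
fields of each line.

```shell
# ctd_for_node NODE
# Reads "ls -l /dev/rdsk" style output on stdin and prints the cwtxdy
# name (slice suffix removed) of every link that points at NODE.
# The function name is hypothetical, not a Solaris command.
ctd_for_node() {
    awk -v node="$1" '
        index($NF, "/" node ":") {
            name = $(NF - 2)              # link name sits before the "->"
            sub(/s[0-9]+$/, "", name)     # strip the slice (sz) part
            if (!(name in seen)) { seen[name] = 1; print name }
        }'
}

# Illustrative listing only; real /devices paths vary by platform.
ctd_for_node 'sd@3,0' <<'EOF'
lrwxrwxrwx 1 root root 50 Aug 26 1996 c0t3d0s0 -> ../../devices/sbus@1,f8000000/esp@0,800000/sd@3,0:a,raw
lrwxrwxrwx 1 root root 50 Aug 26 1996 c0t3d0s2 -> ../../devices/sbus@1,f8000000/esp@0,800000/sd@3,0:c,raw
lrwxrwxrwx 1 root root 50 Aug 26 1996 c0t1d0s0 -> ../../devices/sbus@1,f8000000/esp@0,800000/sd@1,0:a,raw
EOF
```

On a live system the same function would be fed from the real
directory, as in: ls -l /dev/rdsk | ctd_for_node 'sd@3,0'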

Veritas virtual file systems can be matched to cwtxdysz devices with
the vxprint -ht command.  DiskSuite metadevices can be matched with the
metastat command.

The df command may be used to relate either metadevices or cwtxdysz
devices to filesystem names.

=========================================================================
78) What does "zs silo overflow" mean?  "zs3 ring buffer overflow"?

The zs device manages keyboard and ttya/b I/O.  Very small hardware
buffers exist for each device, and it is up to the OS to respond
immediately and move data out of the hardware buffer into a larger
software counterpart.

A silo overflow occurs when the hardware buffer is not emptied quickly
enough.  Silo overflows usually indicate that something else is wrong
with a system, because the CPU is so overloaded that it cannot get
around to processing keyboard interrupts fast enough to keep the
hardware buffer empty.

A ring buffer overflow occurs when a write to a software buffer (an mblk
structure) is attempted when it is already full.  Like the silo
overflow, this error usually indicates an overloaded system which
cannot process its serial port input in a timely fashion.

See the man page on zs(7) for more information.

=========================================================================
79) The console keyboard is in a weird state;  typing produces garbage.

Try issuing a "kbd_mode -a" command from another terminal to reset it.

=========================================================================
80) My system keeps repeating "proc table is full" (4.X) or "out of
processes" (2.X).  What does it mean, and what is the cause?

If the message is not continuous and there is only a small amount of
memory on the system, it is possible that the process table just filled
up, and needs to be extended.  To do this, edit the /etc/system file and
increase the value for maxusers.  Please see the question on maxusers
for more information.

If the message is continuous...

This situation is caused by a runaway process which is continuously
creating new processes.  The key here is that the message keeps
repeating itself, indicating that new processes are CONTINUOUSLY being
created.

First confirm that there is a runaway process by executing in the
background a quick, simple command a few times in rapid succession.
For example,

$ date &
[1] 3976
Mon Aug 26 09:51:08 PDT 1996
$
$ date &
[1] 3977
Mon Aug 26 09:51:18 PDT 1996

This will print a process ID number then execute the "date" command.
Do this a few times and notice whether or not the process IDs between
two date commands are (approximately) consecutive, or have a great
disparity between them.  If the latter case is true, there are lots of
other processes being created between the consecutive executions of the
"date" command.

(In the example above, the two process IDs, 3976 and 3977, are
consecutive, indicating there are no runaway fork processes because no
other processes were created between the executions of the "date"
command.)

If there is a runaway process...

Use the ps command to find the runaway process.  Do it as root, because
the OS reserves a process for root use only, even while the process
table is full for normal users.

For 4.X, do

  # ps -alx | more

For 2.X do

  # ps -ef | more

and look for many processes with the same parent PID (PPID column).
Their common parent may be the runaway culprit process.  A supporting
observation is if many of the processes have consecutive PIDs.
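
Counting children per parent by eye is tedious on a full process table.
The sketch below tallies them from 2.X "ps -ef" output;  the function
name children_per_ppid is invented here, and it assumes a header line
followed by rows with the PPID in the third column.  The sample rows
are illustrative only.

```shell
# children_per_ppid
# Reads "ps -ef" output on stdin (header line first, PPID in column 3)
# and prints "count ppid" pairs, busiest parent first.
# Hypothetical helper name, not a system command.
children_per_ppid() {
    awk 'NR > 1 { count[$3]++ }
         END { for (p in count) print count[p], p }' | sort -rn
}

# Illustrative sample; PID 412 is the parent of three processes.
children_per_ppid <<'EOF'
     UID   PID  PPID  C    STIME TTY      TIME CMD
    root     1     0  0 09:40:01 ?        0:01 /etc/init -
     joe   412     1  0 09:41:12 pts/3    0:00 ./spawner
     joe  3976   412  0 09:51:08 pts/3    0:00 ./spawner
     joe  3977   412  0 09:51:08 pts/3    0:00 ./spawner
     joe  3978   412  0 09:51:08 pts/3    0:00 ./spawner
EOF
```

On a live system: ps -ef | children_per_ppid | head -5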

Kill the suspected culprit process and see if the messages go away.  If
they do, the problem process has been found.

Why the culprit process acted as it did will depend on the program.
Each one will be different.  Investigate as appropriate.

=========================================================================
81) Where are common error messages and return statuses listed?  Signals?

All system service call return statuses are listed in
/usr/include/sys/errno.h.  All signal definitions are in
/usr/include/sys/signal.h.

These are useful, for example, when going through truss output.
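
When only a numeric status is at hand, the header can be searched
mechanically.  This is a small sketch under stated assumptions: the
function name errno_name is made up, the header is assumed to use
"#define NAME number" lines, and the two sample lines are illustrative
rather than copied from any particular release.

```shell
# errno_name NUMBER
# Reads an errno.h style header on stdin and prints the symbolic
# name(s) defined as NUMBER.  Hypothetical helper name.
errno_name() {
    awk -v n="$1" '$1 == "#define" && $3 == n { print $2 }'
}

# Illustrative header fragment.
errno_name 12 <<'EOF'
#define EPERM   1       /* Not super-user */
#define ENOMEM  12      /* Not enough core */
EOF
```

On a live system: errno_name 12 < /usr/include/sys/errno.h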

As a side note, there is a good infodoc, 11371, which contains the error
outputs of many commands.

=========================================================================
82) Processes won't run on one system but will run on others

The total virtual space, which is a combination of RAM and swap space,
dictates how many things can run on a system.  Programs which allocate
huge arrays will use lots of virtual memory.  Systems must have that
much memory to support such programs (in addition to other memory for
supporting any other programs running at the same time);  if a system
does not, such programs may be reported as "killed" shortly after they
are started, or will bomb with "out of memory" (ENOMEM) errors.  Add
more swap space.  (Please see the question on swap for more information
on how to do this.)

=========================================================================
83) The system says "out of memory" (ENOMEM) when I try to run
processes, yet there is plenty on my system.  The system has no swap
configured, though.  What's going on?

Systems which run with no configured swap space may pose another
problem.  Sometimes programs will not run, complaining of being out of
memory, although the system has plenty.  Some applications may call
memcntl(2), a system service which can mandate that a backing store
(a.k.a. swapfile) be written when its analogous main memory area is. If
there is no swapfile, there is no backing store, so the application
complains.

The solution in this case is to create a temporary swap file.  This can
be done with the mkfile(1M) and swap(1M) commands.
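
As a hedged illustration of the two commands working together, the
sketch below only prints the command sequence so it can be reviewed
before anything is run as root.  The function name swapfile_commands,
the 64m size and the /var/tmp/swapfile path are all assumptions chosen
for the example, and the sketch itself insists on an absolute path.

```shell
# swapfile_commands SIZE PATH
# Prints (does not run) the commands to create a swap file and add it.
# SIZE uses mkfile syntax, for example 64m.  Hypothetical helper name.
swapfile_commands() {
    case "$2" in
        /*) ;;
        *)  echo "swapfile_commands: path must be absolute" >&2
            return 1 ;;
    esac
    echo "mkfile $1 $2"     # create the file at the requested size
    echo "swap -a $2"       # add it as swap space
    echo "swap -l"          # list swap areas to confirm
}

# Size and path are examples only.
swapfile_commands 64m /var/tmp/swapfile
```

Remember that a swap file added this way is temporary;  see the
question on swap for making swap space permanent.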

=========================================================================
84) "Process killed" is displayed promptly on the console when
execution of a process is attempted.

Virtual space includes both swap space and main memory (RAM).  Programs
will run only when enough virtual space exists.  The message shows when
the system knows at the outset that there is not enough virtual space to
run the program.  Remedy the situation by adding more swap or more RAM,
or by reducing the size of the program.

Program size can be reduced, for example, by reducing the size of its
arrays.  Note that the size of the executable, as shown with an "ls -l"
command, will be smaller than the size the program occupies in virtual
space while running.

=========================================================================
85) What causes the message
    "rmallocmap: rmap overflow, lost [number,number]"
    and what can be done about it?

If semaphores are being used, try increasing the semaphore allocation map
(semmap parameter).  Each block of free semaphores uses one entry in the
map.  If the map is too fragmented, there will not be enough entries in
the table to hold all fragments, and some fragments will be dropped.

Please see the question on how to tune semaphores, or infodoc 2270 for
more information.

=========================================================================
86) When is the international version of libc needed, and when is the
domestic version needed?

All systems shipped from Sun come with the international version of
libc.  The difference between the two versions is an encryption kit.
Domestic customers can order from Sales the encryption kit (which costs
about $50) if they so desire.  Systems which have the encryption kit
installed will need the domestic version of libc.

One way to tell if the encryption kit is installed is to execute
the following command:

  nm /usr/lib/libc.so* | grep crypt

and see if a function _cbc_crypt exists in that library (it shows up
in the output).  The encryption kit is not installed if that symbol
is missing from that library.

NOTE: vi -x is not a good test to see if the encryption kit is
installed.  It will work even if the encryption kit is not installed;
it will use the same encryption code which passwd uses on all systems,
in this case.

If it complains about an invalid option, the encryption kit is not
installed.

=========================================================================
87) What is Prestoserve?

Prestoserve is a non-volatile RAM buffer used to cache disk and NFS
accesses.  It comes in two varieties: SBus and VME.

When a system crashes, it must be rebooted and the filesystems
Prestoserve manages must be intact before the residual data left in the
Prestoserve buffer is flushed out to disk.  It is for this reason that
great care must be taken when using Prestoserve with meta-devices
managed by DiskSuite or Veritas;  these are constructed from software
as well as hardware components, and all components have to be put
together at boot time in proper sequence for everything to work
properly.

Please see infodoc 11245 for a good paper on Prestoserve.

=========================================================================
88) What has to happen at boot time in order for the Sparc Storage Array
(SSA) to work with Veritas or DiskSuite?  What order are the drivers
loaded?  What about Prestoserve and SSAs?

There are three drivers for the SSA itself.  These are:
  a) soc (Serial Optical Controller)
  b) pln (Pluto Host Adapter)
  c) ssd (Sparc Storage Disk)

There are the Disk Suite (ODS/SDS) drivers.  These have names of the
form md*.

There are the Veritas drivers, which have names of the form vx*.

The order in which the drivers are loaded is:
  a) soc
  b) pln
  c) ssd
  d) Veritas or DiskSuite drivers.

This is because the soc must be loaded to establish the communication
bridge to the pln which resides on the SSA.  The pln driver is then
loaded before the ssd driver to establish the communication to the
individual disks before the ssd driver can establish control over the
basic physical disks.  Finally, the metadevice drivers (either DiskSuite
or Veritas) may be loaded after the complete hardware path to the
physical disk drives has been established.  The load order is ensured
by forceloading the drivers in the /etc/system file.
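
To check what a given system actually forceloads, and in what order,
the forceload lines can be pulled out of /etc/system.  This is a
minimal sketch: the function name list_forceloads is invented, the
sample fragment is illustrative (it shows only the three SSA drivers
named above), and it assumes entries of the form "forceload: drv/name"
with "*" starting a comment line.

```shell
# list_forceloads
# Reads an /etc/system style file on stdin and prints the module named
# on each forceload line, in file order.  Hypothetical helper name.
list_forceloads() {
    awk '$1 == "forceload:" { print $2 }'
}

# Illustrative /etc/system fragment.
list_forceloads <<'EOF'
* sample entries for an SSA
forceload: drv/soc
forceload: drv/pln
forceload: drv/ssd
EOF
```

On a live system: list_forceloads < /etc/system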

Prestoserve is not recommended for use with SSAs.  Prestoserve is
Non-Volatile RAM (NVRAM) which caches data;  this serves no useful
purpose with an SSA because the SSA contains its own NVRAM.  Enabling
both NVRAM implementations for the same file systems is only asking for
trouble in the form of hardware conflicts.  SSA NVRAM is enabled as
"fastwrite" capability.

Be sure that firmware version 3.9 or greater is used if fastwrites are
enabled.

=========================================================================
89) Where is a good matrix of patches and compatibility for the SSA?

Check out infodoc 12317, "SPARCstorage Array Software Configuration
Matrix," and check out the Sparc Storage Array Info Pages at
http://sunsolve.uk/FAQ/ssa.html (internal only).

=========================================================================
90) How to make sure a file has been written out to disk?

From within a program:

a) fsync(3C) is a system call which ensures all modified parts of a file
(data and status) have been written out to disk before it returns.

b) fdatasync(3C) is a system call which ensures all modified data of a
file has been written out to disk before it returns.

c) One can set flags in an open file descriptor using fcntl(2) to effect
an ongoing fsync(3C) treatment (the flag is O_SYNC) or to effect an
ongoing fdatasync(3C) treatment (O_DSYNC).

From the command line level, a sync(1M) will schedule for write all
modified blocks on a disk (all modified file data, inodes, superblock,
etc).  It will, however, return before the writes have completed.  Wait
several seconds for the disk activity to settle down before assuming the
disk has stabilized.

=========================================================================
91) How to undelete a file?

Solaris proper provides NO way to undelete a file.  Some third party
products, which run on top of Solaris, are available.

=========================================================================
92) What is the proper way of shutting down a system?

There are several proper ways of shutting down a 2.X system:

# init 0
This is the no-frills, clean shutdown.

# shutdown
This warns users to log out, and then does a clean shutdown.  It
ultimately invokes init 0.  This is the frilly, clean shutdown.

# halt
This kills processes, then halts the system.  This is not as clean as
init 0.

If no command prompt is available to shut down from, press the STOP and
A keys at the same time to get to the "OK" prom monitor prompt, then
type "sync" from there.  (Send a break from a dumb terminal console in
lieu of a STOP-A.)  If that does not work, see the question on getting a
coredump without a STOP-A for more information.

DO NOT power off or reboot the system without shutting down, unless
there is absolutely no other recourse, as file corruption can result.

=========================================================================
93) What are the various ways of booting the system?

One can pass flags as arguments while booting a system.  No flags passed
boots the system to multiuser mode, taking default startup parameters.

The following flags do the following things at boottime:

-s
This boots the system in standalone mode.  Normally, only / and /usr
file systems are mounted.  The network is not exported.  The scripts in
the /etc/rc2.d directory are run, but not those in /etc/rc3.d.

-a
This allows the user to specify paths to key modules used at boottime.
The system prompts for these as it boots, then loads the modules it
finds where it is told to look.

This flag is good for avoiding modules which crash the system while it
boots, for example, a buggy device driver under development.  Such a
driver would go in /usr/kernel/drv, which is a place normally searched
for device drivers.  However, one could specify the module path without
/usr/kernel (only /kernel, for example), so drivers would be checked
only at /kernel/drv and not /usr/kernel/drv;  the buggy driver would not
be found, would not be loaded, and would therefore not cause a problem.

Things prompted for include:
  a) The kernel name (/kernel/unix)
  b) The system configuration file (/etc/system)
  c) The default directory path for modules (normally /kernel /usr/kernel)
  d) The device instance number file (normally /etc/path_to_inst)
  e) The root file system type (normally ufs)
  f) The physical (/devices...) name of the root device

Note: the defaults shown above are for 2.4.  They are different for
2.5.

-r
This flag requests a reconfiguration during bootup.  This rebuilds the
devices tree, and the links from the /dev directory to it.

-v
This flag specifies verbose startup messages, such as which drivers were
loaded successfully and which ones were not, etc.

Flags can be combined under a single dash, for example "boot -rsv".

=========================================================================
94) How to stop something from configuring (starting) at boot time.

The OS, when it boots, runs all the files beginning with an "S" in
certain directories (/etc/rcx.d), to bring up daemons and to finish
configuring the user environment.

The OS can be brought up to run levels 2 or 3.  Level 2 is multiuser
mode without NFS resources shared;  level 3 is multiuser mode with NFS
resources shared.  By default, the system comes up to level 3.

The /etc/inittab file dictates which directories' files get run for
which run level.  The first thing in each line of that file corresponds
to the directory level.  For example, having "s2" at the beginning of
the line means that line regards the /etc/rc2.d directory.  The second
item in each line states which run levels that directory's files get run
for.  The "s2" line, for example, has "23" for its second item, meaning
that /etc/rc2.d's files get run for run levels 2 and 3.  The "s3" line
has only "3" for its second item, meaning that /etc/rc3.d's files get
run only while bringing up run level 3.
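
The two-field reading of inittab described above can be sketched in a
few lines.  The function name rc_entries_for_level is made up, the two
sample lines are illustrative, and the sketch deliberately ignores
special cases such as entries with an empty run level field.

```shell
# rc_entries_for_level LEVEL
# Reads inittab style lines (id:rstate:action:process) on stdin and
# prints the id of every entry whose rstate field lists LEVEL.
# Hypothetical helper name.
rc_entries_for_level() {
    awk -F: -v lvl="$1" '!/^#/ && index($2, lvl) { print $1 }'
}

# Illustrative inittab lines for the rc2 and rc3 entries.
rc_entries_for_level 3 <<'EOF'
s2:23:wait:/sbin/rc2 > /dev/console 2>&1 < /dev/console
s3:3:wait:/sbin/rc3 > /dev/console 2>&1 < /dev/console
EOF
```

For level 3 both ids are listed;  for level 2 only the "s2" entry is.
On a live system: rc_entries_for_level 3 < /etc/inittab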

Stopping something from configuring at boot time entails either booting
to a run level which does not run the scripts in the directory that the
script file for that item is in, or renaming that script file to a name
which does not begin with an "S".  For example, to bring the system up
without the nfs server daemon running, either rename
/etc/rc3.d/S15nfs.server, the file containing the commands to start the
nfs server daemons, to /etc/rc3.d/xS15nfs.server, which does not begin
with an "S"; or boot to level 2 to avoid executing any files in the
/etc/rc3.d directory.
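
The rename can be wrapped in a small helper so that the "x" prefix is
applied consistently and is easy to undo.  This is a sketch only;  the
function name disable_rc_script is invented, and it refuses any file
whose name does not begin with "S".

```shell
# disable_rc_script PATH
# Renames a start script such as /etc/rc3.d/S15nfs.server to the same
# name with an "x" prefix, so it is skipped at boot.
# Hypothetical helper name; run as root on a real rc directory.
disable_rc_script() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    case "$base" in
        S*) mv "$1" "$dir/x$base" ;;
        *)  echo "disable_rc_script: not a start script: $1" >&2
            return 1 ;;
    esac
}
```

For the example above: disable_rc_script /etc/rc3.d/S15nfs.server
To re-enable the script later, rename it back to its original name.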

=========================================================================
95) When is it NOT a good idea to boot -r?

Be very careful doing a boot -r on a system with disks under Solstice
Disk Suite control.

Disks running under Solstice Disk Suite control are expected to be in a
certain place.  SDS uses the hardware device tree (/devices/...) to find
its devices.  If the controller of those disks has been moved to another
slot then SDS will not find its disks.

Sbus cards, even disk controllers, can be added to sbus slots numbered
either below or above that of the first one.  There is no problem as
long as the /etc/path_to_inst file remains intact.  The main
issue is whether or not the controller of the SDS-controlled disks
appears in the same slot and is found in the same order as before
(the first controller found before is still the first controller found,
etc).  The order will be the same if the path_to_inst file is left
intact, because the system scans for new devices only, and will not
remove or reorder devices that it already knows exist.

If the controller card to SDS-controlled disks must be moved or the
path_to_inst file must be removed for some reason, the internal tables
used by SDS to find its disks must be regenerated in order to prevent
data corruption.  Disks must be removed with the "metaclear" command
and readded (using their new locations) with the "metainit" command in
the same order as they were in the tables before.

Disks under Veritas control pose no problem, because Veritas configures
itself automatically, going out and checking each disk for placement,
each time it is started.  If disks are moved around, Veritas will find
each of them wherever it is, by looking everywhere for it.

"Regular" disks (not metadevices) are not a problem with boot -r.
However, as with any file systems, the vfstab may need to be updated with
proper device numbers to mount moved disks at their proper place
in the file system tree.

=========================================================================
96) How to start up and shut down processors?

Systems running Solaris 2.X with multiple processors by default start up
with all processors running.  Processors may be taken offline and
brought back online again with the psradm command.  To take a processor
offline:

  /usr/sbin/psradm -f <processor number> [...<processor number>]

and to bring back online:

  /usr/sbin/psradm -n <processor number> [...<processor number>]

Specify "-a" instead of a processor number to affect as many processors
as possible (i.e. take all but one processor offline, or put all
processors online again).

Processors taken offline will not be given any scheduled task to run by
the scheduler, but can still take interrupts.

Please see the man page on psradm(1M) for more information.  Note also
that there are system service calls which perform the same tasks as
these commands.

=========================================================================
97) How to bind processes to a processor?

The pbind command binds all LWPs of a process (that is, every part
of a process) to a given CPU:

  /usr/sbin/pbind -b <processor_id> <process_id>

pbind can be used also to "unbind" a process with

  /usr/sbin/pbind -u <process_id>

and to query status of process bindings:

  /usr/sbin/pbind -q <process_id>

Please see the man page on pbind(1M) for more details.

=========================================================================
98) How to tell processor speed?

On Solaris 2.X:

  I)	/usr/sbin/psrinfo -v

  II)	/usr/platform/sun4d/sbin/prtdiag (sun4d only, 2.5)
				or
        /usr/kvm/prtdiag (sun4d only, 2.3, 2.4)

  III)	Look at the messages files (/var/adm/messages*) or the output of
	the "dmesg" command, during boot messages, toward the start of a
	boot sequence.  (Search for "Release" to get to the start of a
	boot sequence.)

=========================================================================
99) What are the various software priorities?

Dispatch priorities are divided into four main groups:

Priority
0-59		user mode timesharing
60-99		kernel
100-159		realtime
160-169		interrupt

Note that the scheduler is part of the kernel, and runs at a higher
priority than user processes do.  Realtime processes, however, run
higher than the scheduler, so they will not be preempted by the kernel.
Realtime processes run whenever they can, sleeping only when they are
waiting for a resource or for I/O to complete.

Note that if there are no realtime threads, the interrupt priorities
become 100-109.
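
As a quick illustration of the table above, the sketch below maps a
global dispatch priority to its group.  The function name
priority_group is made up, and the ranges are the ones listed above for
a system that has realtime threads (without them, interrupts occupy
100-109 instead).

```shell
# priority_group PRIORITY
# Prints the priority group for a global dispatch priority, using the
# ranges from the table above (realtime threads assumed present).
# Hypothetical helper name.
priority_group() {
    p=$1
    if   [ "$p" -lt 0 ];   then echo "out of range"; return 1
    elif [ "$p" -le 59 ];  then echo "user mode timesharing"
    elif [ "$p" -le 99 ];  then echo "kernel"
    elif [ "$p" -le 159 ]; then echo "realtime"
    elif [ "$p" -le 169 ]; then echo "interrupt"
    else echo "out of range"; return 1
    fi
}

priority_group 105    # falls in the 100-159 realtime band
```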

=========================================================================
100) What are the scheduling classes of Solaris?

Scheduling classes each have parameters which govern how the threads in
that class get scheduled.  There are three scheduling classes:

  TS - timesharing: this includes user mode and kernel mode.
  SYS - system threads, such as the clock thread and swapper.
  RT - Realtime threads, which run at a priority above the scheduler.

IA, or interactive, is sometimes viewed as another class and sometimes
as a part of the timesharing class.  It gets a priority boost when
running in the foreground.

These classes roughly correspond to the priority groupings as follows:

Class		Priority group
---------------	--------------
		user mode timeshare
TS, IA		--------------
---------------	kernel
SYS
---------------	--------------
RT		realtime
---------------	--------------
(no sched)	interrupt
---------------	--------------

User mode timeshare and kernel system calls which sleep on resources are
bound by timesharing scheduling.  Some kernel threads are in the SYS
class.  All realtime threads run at realtime priority and are in the
realtime scheduling class.

Interrupts are not scheduled, and so they do not have a scheduling
class.

These classes each have parameters governing their scheduling.  For
example, the timesharing class has the following parameters:

  ts_quantum: the time quantum (in units of 1/RES as displayed by
  dispadmin(1M)) a thread can execute, given its priority.

  ts_tqexp: priority given to a thread after it is bumped due to a time
  quantum expiration.

  ts_slpret: priority given to a thread after it wakes up from having
  waited on a resource or for I/O.

  ts_maxwait: Number of seconds a thread has to use its time quantum
  before being assigned a new priority ts_lwait.

  ts_lwait: New user mode priority assigned to a thread if it has not
  used up its time quantum in ts_maxwait seconds.

The realtime class has only one parameter dependent on priority:

  quantum: the time quantum (in units of 1/RES as displayed by
  dispadmin(1M)) a thread can execute before being bumped potentially by
  a thread running at the same priority.

Use the dispadmin(1M) command to modify these parameters by...

  a) getting the old parameters with the -g option:
       # dispadmin -c RT -g >rt_disp_settings

  b) modifying them by hand:
       # edit rt_disp_settings

  c) reloading them with the -s option:
       # dispadmin -c RT -s rt_disp_settings

The above changed the Realtime class ("-c RT").  The same can be done
for timesharing (TS) class. 
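
Because quanta are expressed in units of 1/RES seconds, it helps to
convert them to milliseconds before editing a table.  The sketch below
does only that arithmetic;  the function name quantum_ms is invented,
and the RES and quantum values in the example are illustrative rather
than defaults of any release.

```shell
# quantum_ms RES QUANTUM
# Converts a time quantum given in units of 1/RES seconds into
# milliseconds:  ms = QUANTUM * 1000 / RES.  Hypothetical helper name.
quantum_ms() {
    awk -v res="$1" -v q="$2" 'BEGIN { printf "%g\n", q * 1000 / res }'
}

quantum_ms 1000 100   # RES=1000, quantum 100: prints 100 (milliseconds)
```

With RES=100, the same quantum of 100 would instead be a full second,
so always check the RES line at the top of the dispadmin output first.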

=========================================================================
101) What is the layout and dataflow of SCSI drivers?

Both Solaris and SunOS adhere to SCSA (the Sun Common SCSI
Architecture), a Sun standard which defines the layering of SCSI device
drivers.  All SCSI devices are connected to a SCSI bus which is managed
by a host adapter.  Sun host adapter drivers for Solaris are the esp or
isp drivers.  SCSI devices provided by Sun at this time include disks
(hard and CD-ROM) and tapes.
The disk driver is called sd, and the tape driver is called st.
The sd and st drivers, as well as any other SCSI device drivers, must
communicate with their respective devices via their bus and its host
adapter.  They therefore do this by calling routines in their host
adapter driver.

The system configures itself so that it knows which host adapter driver
(and which host adapter) to use for a given device.  The disk and tape
drivers make generic calls, which are mapped automatically to the proper
driver.

The SCSI device drivers will call scsi_transport(), for example, to
send a command down to their respective devices.  scsi_transport() will
map to the proper routine in the proper host adapter driver.

=========================================================================
102) What is a self-identifying device?

Self-identifying devices have many device properties encoded in their
prom.  Solaris can extract these properties automatically.  Such devices
may also have a configuration file with additional properties.
Non-self-identifying devices require a configuration file to contain all
essential properties.

=========================================================================
103) What is a major device number?  A minor device number?

Every device driver has a corresponding major number.  This number is
used by the operating system to key into the proper device driver
whenever a device special file corresponding to one of the devices it
manages is opened.

All devices managed by a given device driver each have a unique minor
number.  Some drivers of pseudo-devices (software entities set up to
look like devices) create new minor devices on demand.

Together, the major and minor numbers define uniquely a device and its
device driver.

Device special files have a unique output when listed with the "ls -l"
command, which shows major and minor numbers:

cd /devices/pseudo
ls -l pts@0:1*

crw--w----   1 schwartz tty       24,  1 Sep 11 14:46 pts@0:1
crw--w----   1 schwartz tty       24, 10 Nov  9  1995 pts@0:10
crw--w----   1 schwartz tty       24, 11 Nov  8  1995 pts@0:11
crw-r--r--   1 root     sys       24, 12 Sep  9  1995 pts@0:12
crw-r--r--   1 root     sys       24, 13 Sep  9  1995 pts@0:13
crw-r--r--   1 root     sys       24, 14 Sep  9  1995 pts@0:14
crw-r--r--   1 root     sys       24, 15 Sep  9  1995 pts@0:15
crw-r--r--   1 root     sys       24, 16 Sep  9  1995 pts@0:16
crw-r--r--   1 root     sys       24, 17 Sep  9  1995 pts@0:17
crw-r--r--   1 root     sys       24, 18 Sep  9  1995 pts@0:18
crw-r--r--   1 root     sys       24, 19 Sep  9  1995 pts@0:19

		major number -----^^  ^^
				      ||----- minor number

All pts are managed by the pts driver, which is major number 24 in this
example.  Minor numbers are listed after the comma.
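
The major and minor numbers can also be pulled out of such a listing
mechanically.  This is a minimal sketch with an invented function name,
major_minor;  it assumes the "ls -l" layout shown above, where the
major number (with its trailing comma) is the fifth field and the minor
number is the sixth, separated by at least one space.

```shell
# major_minor
# Reads "ls -l" output on stdin and, for each character or block
# special file, prints "name major minor".  Hypothetical helper name.
major_minor() {
    awk '$1 ~ /^[bc]/ {
             maj = $5
             sub(/,$/, "", maj)      # drop the comma after the major
             print $NF, maj, $6
         }'
}

# Two lines taken from the listing above.
major_minor <<'EOF'
crw--w----   1 schwartz tty       24,  1 Sep 11 14:46 pts@0:1
crw-r--r--   1 root     sys       24, 12 Sep  9  1995 pts@0:12
EOF
```

On a live system: ls -l /devices/pseudo | major_minor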

=========================================================================
104) How does the mechanism to add device software to a Solaris system
work?  What are the steps involved in adding a device?

Drivers for third-party devices are placed in /usr/kernel/drv, along
with any configuration file that the device may have.  Drivers and
configuration files may be placed in /kernel/drv instead, but this is
not recommended as this causes headaches if the driver crashes the
system on bootup.  Faulty drivers which crash the system but which are
in /usr/kernel/drv can be excluded from the boot sequence, if the system
is booted with the "-a" option (interactive boot) and /usr/kernel is
excluded from the module load path.

After installing the device driver and configuration file (if required),
the add_drv(1M) command is run to make the system aware that the device
is present.  It updates the following files:

  /etc/name_to_major - This file contains driver name to major number
  mapping.  Every driver has a major number;  the instances of the
  devices it manages each have their own unique minor number.

  /etc/minor_perm - This file contains permission, owner and group
  information used by drivers when creating new /devices entries (as when
  a device is accessed for the first time).

  /etc/driver_aliases - This file contains alternate names for device
  drivers.

  /etc/driver_classes - This file contains classes for device drivers
  (sbus, vme, SCSI, etc.).

If the system is not a diskless client, add_drv then invokes the
drvconfig(1M) command to configure the driver, and then calls
devlinks(1M) to create any device links in the /dev directory.
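
The driver name to major number mapping described above can be read back
with a few lines of code.  This is a minimal sketch, assuming each line
of /etc/name_to_major holds a driver name and a major number separated by
whitespace.  The function names are hypothetical, the "mydrv 105" entry
in the usage note is invented, and the handling of "#" comments is a
defensive assumption rather than documented file syntax.

```python
def parse_name_to_major(text):
    """Build a {driver name: major number} table from name_to_major-style text."""
    table = {}
    for line in text.splitlines():
        # Assumption: ignore anything after '#', and skip blank lines
        line = line.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) < 2:
            continue
        table[fields[0]] = int(fields[1])
    return table

def driver_for_major(table, major):
    """Reverse lookup: return the driver name for a major number, or None."""
    for name, number in table.items():
        if number == major:
            return name
    return None
```

For example, parse_name_to_major("pts 24\nmydrv 105\n") yields a table in
which "pts" maps to 24, matching the pts listing in the previous answer.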

NOTE: devices already existing on a system will not be rearranged when
new devices are added, even if new devices are added to sbus slots
numerically lower than those occupied by existing devices;  the
/etc/path_to_inst file, which maintains this information, is appended
to, not rewritten, when new devices are added.
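
The append-only behavior can be pictured with a toy model.  This sketch
is not how the kernel implements it;  it assumes, for illustration only,
that a new device receives the lowest instance number not yet used by
its driver.  The device paths in the usage note are invented.

```python
def add_device(entries, path, driver):
    """Record a device in an append-only list of (path, instance, driver).

    Existing entries are never renumbered or reordered.
    """
    for known_path, instance, _ in entries:
        if known_path == path:
            return instance  # already recorded: keep its instance number
    used = {inst for _, inst, drv in entries if drv == driver}
    instance = 0
    while instance in used:  # assumption: lowest unused number per driver
        instance += 1
    entries.append((path, instance, driver))
    return instance
```

If entries already holds ("/sbus@1/esp@2", 0, "esp") and a second esp
device is later added at "/sbus@1/esp@1", a numerically lower slot, the
new device becomes instance 1 and the existing device stays instance 0.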

A reconfiguration boot will be done automatically the next time the
system is booted.

=========================================================================
105) How does the system know to do a reconfiguration boot without
having been passed "-r" in the boot command?

The system will perform a reconfiguration boot automatically if it finds
a file called /reconfigure in its root directory.
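
The flag-file convention can be illustrated as follows.  This is only a
sketch of the idea, not the actual startup code;  the function names and
the root parameter are hypothetical, the latter added so the example can
be tried against a scratch directory instead of the real root.

```python
import os

def request_reconfigure(root="/"):
    """Create an empty 'reconfigure' flag file under root, like touch(1)."""
    flag = os.path.join(root, "reconfigure")
    with open(flag, "a"):
        pass  # creating the file is enough; its contents do not matter
    return flag

def reconfigure_requested(root="/"):
    """Report whether the 'reconfigure' flag file is present under root."""
    return os.path.exists(os.path.join(root, "reconfigure"))
```

On a live system the usual way to request this by hand is to create the
file as root (for example with "touch /reconfigure") before rebooting.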

=========================================================================
106) Recommended reading and references for Sun Kernel Tech Support
Engineers.

Jeff Bonwick, "The Slab Allocator: An Object-Caching Kernel Memory
Allocator", Sun Microsystems, Mountain View, CA, 1993

Adrian Cockcroft, "Sun Performance and Tuning", Sunsoft Press (Prentice
Hall), Mountain View, CA, 1995

Chris Drake and Kimberly Brown, "Panic! UNIX System Crash Dump Analysis",
Sunsoft Press (Prentice Hall), Mountain View, CA, 1995

W. Richard Stevens, "Advanced Programming in the Unix Environment",
Addison-Wesley, Reading, MA, 1992

The Solaris online man pages

Sunsolve infodocs, srdbs, faqs

=========================================================================
107) Will Solaris run properly after the year 2000?

Here is Sun's external web site about year 2000 issues:
Sun's Year 2000 Information Site

=========================================================================
108) Where can I get a copy of proctool?

proctool can be obtained by anonymous ftp from opcom.sun.ca in the
directory /pub/binaries/proctool

Created September, 1996 from Internal Info Docs document 14269
Most recent edit: 10 October 2000 by Scott Shurr
Sun Proprietary/Confidential: Internal Use Only