SunSolve Internal

Infodoc ID   Synopsis   Date
23148   Basics Using the truss Command   14 Jul 2000

Description

Contents
	
1. Introduction
 	A. Things truss Can Help With
	B. Things truss May Not Help With

2. Terminology Review 
	A. What are System Calls?
	B. What are signals?
	C. What are faults?

3. Using the truss Command
	A. Common truss Syntax
	B. Common truss Options
	C. Large file (64) System Calls	
	
4. System Call Return Values

5. Troubleshooting Examples
	A. EINVAL Example from truss
	B. SIGSEGV Example from truss
	C. poll System Call Example from truss
      
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

1. Introduction      
	/usr/bin/truss is a utility that reports the following information
	about a process:
	
	- system calls performed, their arguments and return values
	- signals received by the process
	- machine faults incurred by the process

	It can be run on a command as it starts (a binary or a shell
	script), or attached to an already running process by its process
	ID (PID). It is best to run it as root, because setuid/setgid
	programs that a command starts behind the scenes can only be
	traced by root.
    	
    	Notes: 
    	
    	For tracing shared library procedure calls, use "sotruss", available
    	in Solaris 2.5.1 and later, and "apptrace", which is new in Solaris 8.
    	
    	The "trace" command in SunOS 4.x is similar in function to truss.
    	
    	
	A. Things truss can help with:

	- truss is useful in debugging problems where a program terminates
	  abnormally (with or without an error message) or hangs.

	- Application processes often open temporary error or log files.
	  The open() calls in the truss output show the names of these
	  files, and their contents can be very helpful.

	- Using truss on "factory" Solaris programs is another way of
	  learning about system calls. It can be a valuable educational tool.


	B. Things truss may not help with:


	- Using truss is only the first step in debugging an application
	  problem. Debugging the program source code is usually also necessary.

	- A thread list for the program is often needed for debugging as well,
	  especially if the process is stuck in a loop and not making system
	  calls. Procedures for obtaining thread lists are beyond the scope
	  of this article and may require the help of a Solution Center
	  Technical Support Engineer.

	- truss changes the timing of a program, because it stops and
	  restarts the process at each traced event. This can change, or
	  even hide, the problem being investigated, particularly race
	  conditions and other timing-related problems.

	- Errors are common in truss output and are often harmless. For
	  example, a program that searches for a file along a list of
	  directories gets an error for every directory that does not
	  contain it; or the underlying code expects not to find a file,
	  so the error is "normal". For these reasons (and others) truss
	  output can be difficult to interpret, and it may have only
	  limited value in debugging a problem.
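	As a minimal sketch of the "search along a list of directories" case,
	the C function below tries each directory in turn until open(2)
	succeeds. Every directory that lacks the file costs one failed
	open(), which truss would report as Err#2 ENOENT even though nothing
	is wrong. The function name open_in_path and the directory list are
	hypothetical, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: try "dir/name" for each directory in dirs[]
 * until open(2) succeeds. Each directory that does not contain the
 * file produces one failed open() (ENOENT), which is expected here.
 * Returns the open descriptor, or -1. *misses receives the number
 * of ENOENT failures seen along the way.
 */
int open_in_path(const char *const dirs[], int ndirs, const char *name,
                 int *misses)
{
    char path[1024];

    *misses = 0;
    for (int i = 0; i < ndirs; i++) {
        snprintf(path, sizeof path, "%s/%s", dirs[i], name);
        int fd = open(path, O_RDONLY);
        if (fd >= 0)
            return fd;          /* found it: stop searching */
        if (errno != ENOENT)
            return -1;          /* a real error, not just "not here" */
        (*misses)++;            /* harmless miss: try the next directory */
    }
    return -1;
}
```

	Under truss, only the final successful open() matters; the ENOENT
	lines before it are the search itself.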
  	  
---------------------------------------  
2. Terminology Review 
                  
	A. What are System Calls?
         
	  System calls are also known as "system service calls". 
	  
	  "While the kernel is shared by all processes, system space is
	  protected from user-mode access. Processes cannot directly access the kernel,
	  and must instead use the system call interface. When a process makes a "system call",
	  it executes a special sequence of instructions to put the system
	  in kernel mode (this is called a "mode switch") and transfer control
	  to the kernel, which handles the operation on behalf of the process.
	  After the system call is complete, the kernel executes another set of
	  instructions that returns the system to user mode (another mode switch)
	  and transfers control back to the process." (From the book UNIX Internals, The
	  New Frontiers, by Uresh Vahalia 1996)
	 	   
	  System calls and their associated numbers can be found
	  in /usr/include/sys/syscall.h. These numbers can change between
	  OS versions.
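	  The sketch below asks the kernel for the process ID twice: once
	  through the usual libc wrapper getpid(), and once by system call
	  number through the generic syscall() entry point, using SYS_getpid
	  from <sys/syscall.h>. Both reach the same system call, which is what
	  truss reports. It assumes a Linux/glibc system (where the numbers
	  differ from Solaris, and _GNU_SOURCE is needed to declare syscall());
	  the helper names are hypothetical.

```c
#define _GNU_SOURCE           /* glibc: declare syscall() in <unistd.h> */
#include <sys/syscall.h>      /* SYS_* system call numbers */
#include <unistd.h>

/* Request the process ID through the normal libc wrapper. */
long pid_via_wrapper(void)
{
    return (long)getpid();
}

/* Request the same kernel service by its system call number. */
long pid_via_number(void)
{
    return syscall(SYS_getpid);
}
```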
	 
	  See these man pages:
	  
	     % man -s2 intro
	
	     % man -s2 <system_call> or % man <system_call>.2
	     
   	B. What are signals?
   	
	"UNIX uses "signals" to inform a process of asynchronous events, and
	to handle exceptions. For example, when a user types control-C at the
	terminal, a SIGINT signal is sent to the foreground process. Likewise,
	when a process terminates, a SIGCHLD signal is sent to its parent..."
	(From the book UNIX Internals, The New Frontiers, by Uresh Vahalia 1996)
	
	A hung process may not respond to interrupt or termination signals:
	it may be blocking or ignoring them, or be stuck waiting inside the
	kernel. The reasons vary, which is why the truss command is useful.
	 	
	For more information see /usr/include/sys/signal.h
	and the man pages for signal(3C) and signal(5).
	  
	Here are examples for looking up the definition of signals:
	  
		% grep SIGINT /usr/include/sys/signal.h
		#define	SIGINT	2	/* interrupt (rubout) */

		% grep SIGCHLD /usr/include/sys/signal.h
		#define	SIGCHLD	18	/* child status change alias (POSIX) */
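	A minimal sketch, assuming a POSIX system: the function below
	installs a SIGINT handler with sigaction(2) and then sends SIGINT to
	itself with raise(3C). Under truss, the delivery should appear as a
	received SIGINT whose disposition is a handler, rather than [default]
	as in the SIGSEGV example in section 5. The names catch_own_sigint
	and on_sigint are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t got_sigint = 0;

static void on_sigint(int signo)
{
    (void)signo;
    got_sigint = 1;
}

/*
 * Install a SIGINT handler, then send SIGINT to this process.
 * Returns 1 if the handler ran, -1 if setup failed.
 */
int catch_own_sigint(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return -1;

    got_sigint = 0;
    if (raise(SIGINT) != 0)   /* the handler runs before raise() returns */
        return -1;
    return got_sigint;
}
```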

	C. What are faults?
	
	Faults are analogous to signals, but they correspond to hardware
	faults, such as an invalid memory access. For more information see
	/usr/include/sys/fault.h and proc(4) (% man -s4 proc).
		    
   	
   	
---------------------------------------
3. Using the truss Command

A. Common truss Syntax

	Here are two common ways of using truss and its options:

		truss -aef -o /tmp/truss.out <command>
		
		and
		
		truss -aef -o /tmp/truss.out -p PID
		
		
B. Common truss Options
		
	The first example runs truss on a command from the beginning of
	its execution. The second example attaches truss to an existing
	process.
		
		
	Option -p PID:

		This option attaches truss to a currently running process,
		using the process ID (PID), typically obtained from the ps
		command. If you interrupt a truss attached to a process with
		the -p option, only truss is stopped and the process keeps
		running. If instead you interrupt a truss that started the
		process itself, as in the first example above, the process is
		stopped as well.
	
   
	Option -o <file>:

		Saves the output to the given file (/tmp/truss.out in the
		examples above). By default truss writes its output to
		stderr, so a plain pipe to grep does not see it. Either save
		it with -o, or redirect stderr into the pipe ("2>&1 | grep"
		in sh/ksh, "|& grep" in csh).

	Option -f:

		This option follows all child processes created by the fork
		or vfork system calls. By default truss traces only the
		initial process. Child processes that were already running
		before truss started or attached are not followed.
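		To see what -f adds, consider this minimal sketch (POSIX;
		run_child is a hypothetical name): the parent forks a child
		that exits at once with a given status, then waits for it.
		Without -f, truss shows only the parent's fork and wait; with
		-f, the child's own system calls are traced too, each line
		prefixed by its PID as in the examples in section 5.

```c
#define _POSIX_C_SOURCE 200112L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork one child that exits with 'code'; the parent waits for it.
 * Returns the child's exit status, or -1 on error.
 */
int run_child(int code)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: leave immediately */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```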
       		
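	Option -e:

		This option shows the environment strings passed in on every
		"exec" system call, that is, the environment the new program
		runs under.

	Option -a:

		This option shows all the arguments passed to each "exec"
		system call. This is helpful for seeing exactly which programs
		were run and with which options, for example when a shell
		script starts other commands.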
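		As a minimal sketch of what -a and -e display, the function
		below forks and runs /bin/sh through execve(2) with an explicit
		argument list and environment (POSIX; exec_with_env and the
		variable DEMO_CODE are hypothetical). Because the exec happens
		in the child, it would be traced with truss -f -a -e; the argv
		and envp strings are what those two options print.

```c
#define _POSIX_C_SOURCE 200112L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run /bin/sh with an explicit argument vector and a two-entry
 * environment. Returns the shell's exit status, or -1 on error.
 */
int exec_with_env(void)
{
    char *const argv[] = { "sh", "-c", "exit $DEMO_CODE", NULL };
    char *const envp[] = { "DEMO_CODE=3", "LANG=C", NULL };

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", argv, envp);
        _exit(127);             /* only reached if execve() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```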
       		
       		
	Option -c:

		"truss -c <command>" counts the system calls, signals, and
		faults, and displays them in a summary report. This is useful
		for seeing which system calls a program spends most of its
		time in. The error count is usually not meaningful on its own
		(see "Things truss may not help with" above) and can usually
		be ignored.

		For example:

		% truss -c ls
		syscall      seconds   calls  errors
		_exit            .00       1
		open             .02      19     13
		close            .00       6
		time             .00       1
		brk              .00       4
		lseek            .00       1
		fstat            .00       5
		ioctl            .00       2
		execve           .00       1
		fcntl            .00       1
		getdents         .00       2
		lstat            .01       1
		mmap             .00      12
		munmap           .00       3
		                ----     ---    ---
		sys totals:      .03      59     13
		usr time:        .01
		elapsed:         .15
		
		
C. Large file (64) System Calls
		
		There are no separate man pages for the large-file (64-bit)
		versions of system calls. For example, for information on
		fstat64, refer to the man page for fstat.

		The truss options -v, -x, and -t treat fstat and fstat64 as
		the same name. For example, the following commands produce
		the same output, and both give verbose information about
		fstat and fstat64:

			% truss -v fstat more /etc/motd

			% truss -v fstat64 more /etc/motd

		The difference between a 64-bit and a non-64-bit system call
		is the ability to handle large files (2 GB or larger): the
		64-bit versions use 64-bit file offsets and sizes, so their
		input or output structures or values (for example, the
		struct stat64 filled in by fstat64) may differ.
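		A minimal sketch of how a program ends up calling fstat64,
		assuming the large-file compilation environment described in
		lfcompile(5): defining _FILE_OFFSET_BITS as 64 before any
		header makes a 32-bit build map fstat() and off_t to their
		64-bit forms, so truss would be expected to show fstat64 for
		the call below. A 64-bit build already uses 64-bit offsets.
		file_size is a hypothetical helper name.

```c
#define _FILE_OFFSET_BITS 64    /* must precede every #include */
#define _POSIX_C_SOURCE 200112L
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Return the size of the file at 'path', or -1 on error.
 * In a 32-bit large-file build, off_t is 64 bits wide and fstat()
 * is mapped to its 64-bit counterpart.
 */
long long file_size(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    return rc == -1 ? -1 : (long long)st.st_size;
}
```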
		
		
---------------------------------------
4. System Call Return Values
   	
	System calls almost always return -1 on error (library functions
	such as fopen() return NULL instead) and set the external variable
	errno to the error value. errno is not cleared by successful calls,
	so programmers should check its value only after the return value
	shows that a call has failed:
		
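	Wrong:

			...
			fd = open(filename, O_RDONLY);
			/* errno may still hold a stale value from an
			   earlier failed call, even if open() succeeded */
			if (errno != 0)
			{
				perror("open");
				exit(1);
			}


	Correct:

			...
			fd = open(filename, O_RDONLY);
			/* test the return value first; errno is
			   meaningful only after a failure */
			if (fd < 0)
			{
				perror("open");
				exit(1);
			}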
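	The same rule as a complete function, in a minimal sketch
	(open_checked is a hypothetical name): the return value is tested
	first, and errno is saved and reported only after a failure. The
	number it prints is the N that truss shows as Err#N for the
	failing call.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Open 'path' read-only. On failure, save errno before any other
 * call can change it, print its number and text, and return -1.
 * *err is 0 on success, otherwise the saved errno value.
 */
int open_checked(const char *path, int *err)
{
    *err = 0;
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        *err = errno;               /* save it first */
        fprintf(stderr, "open %s: %s (errno %d)\n",
                path, strerror(*err), *err);
        return -1;
    }
    return fd;
}
```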
			      
5. Troubleshooting Examples


	System call error numbers are defined in /usr/include/sys/errno.h,
	signals in /usr/include/sys/signal.h, and faults in
	/usr/include/sys/fault.h.

	These definitions are usually too generic; the man page for each
	system call should be consulted for the specific meaning of an
	error in that call.
	  	  
	  	     
	A.  EINVAL Example from truss
	
			...
		5811:   so_socket(1, 2, 0, "", 1)                       = 5
		5811:   getpid()                                        = 5811 [5810]
		5811:   access("/var/tmp/.app3/s#5811.1", 0)            Err#2  ENOENT
		5811:   bind(5, 0xEFFFAB94, 110)                        Err#22 EINVAL
	
	The access(2) call above returns ENOENT because the file does not
	exist. In this case the error can be ignored, because an
	examination of the source code shows that if the file existed, it
	would be deleted anyway.

		% grep ENOENT /usr/include/sys/errno.h
		#define	ENOENT	2	/* No such file or directory		*/

	The failure of the bind(2) system call is what causes the
	program to exit.
	
		% grep EINVAL /usr/include/sys/errno.h
		#define	EINVAL	22	/* Invalid argument			*/
	
	From the bind(2) man page for Solaris 2.6:

        EINVAL      The socket is already bound to an address,
                    and the protocol does not support binding to
                    a new address; or the socket has been shut
                    down.

        So the resolution to this problem requires further debugging in the
        application source code. 
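	The EINVAL case described in the man page can be reproduced with a
	minimal sketch (POSIX sockets; second_bind_errno and the /tmp paths
	are hypothetical): an AF_UNIX socket is bound to one path and then
	bound again to a second path. The second bind(2) fails with EINVAL
	because the socket is already bound. This is one possible cause to
	look for in the application source, not necessarily the cause in
	the trace above. On Solaris, programs using sockets are linked with
	-lsocket -lnsl.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in an AF_UNIX address for 'path'. */
static void unix_addr(struct sockaddr_un *sa, const char *path)
{
    memset(sa, 0, sizeof *sa);
    sa->sun_family = AF_UNIX;
    strncpy(sa->sun_path, path, sizeof sa->sun_path - 1);
}

/*
 * Bind one socket to a first path, then try to bind the same socket
 * to a second path. Returns the errno from the second bind(), 0 if it
 * unexpectedly succeeded, or -1 if setup failed.
 */
int second_bind_errno(void)
{
    char p1[64], p2[64];
    snprintf(p1, sizeof p1, "/tmp/truss_demo_%ld.1", (long)getpid());
    snprintf(p2, sizeof p2, "/tmp/truss_demo_%ld.2", (long)getpid());
    unlink(p1);                 /* remove stale files, as in the trace */
    unlink(p2);

    int s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s == -1)
        return -1;

    struct sockaddr_un sa;
    unix_addr(&sa, p1);
    if (bind(s, (struct sockaddr *)&sa, sizeof sa) == -1) {
        close(s);
        return -1;
    }

    unix_addr(&sa, p2);
    int result = 0;
    if (bind(s, (struct sockaddr *)&sa, sizeof sa) == -1)
        result = errno;         /* expected: EINVAL, already bound */

    close(s);
    unlink(p1);
    unlink(p2);
    return result;
}
```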


	B. SIGSEGV Example from truss

	This is an example of poor programming. A process that dies with
	SIGSEGV has typically dereferenced a NULL (or otherwise invalid)
	pointer.

		% grep SIGSEGV /usr/include/sys/signal.h
		#define	SIGSEGV	11	/* segmentation violation */


		5345:   getpid()                                        = 5345 [1]
		5345:   mkdir("/usrdata/home/Gateway2.06", 0755) Err#17 EEXIST
		5345:   mkdir("/usrdata/home/Gateway2.06/log", 0755) Err#17 EEXIST
		5345:   mkdir("/usrdata/home/Gateway2.06/log", 0755) Err#17 EEXIST
		5345:   open("/usrdata/home/Gateway2.06/log/MAGwdog.log", O_WRONLY|O_APPEND|O_CREAT, 0666) Err#13 EACCES
		5345:       Incurred fault #6, FLTBOUNDS  %pc = 0x0001CD20
		5345:         siginfo: SIGSEGV SEGV_MAPERR addr=0x0000000D
		5345:       Received signal #11, SIGSEGV [default]
		5345:         siginfo: SIGSEGV SEGV_MAPERR addr=0x0000000D
		5345:           *** process killed ***

	- Notice the program dies right after the unsuccessful open.

	- If opening this file is critical to the running of the program,
	  the program should have exited or at least printed an error.

	- It appears the program went ahead and used a FILE pointer after
	  the unsuccessful open (fopen() returns NULL on failure). The
	  fault address of 0x0000000D looks like a small offset (13 bytes)
	  from a NULL pointer, as when a structure member is accessed
	  through a NULL pointer.

	- It is worth noting that the FLTBOUNDS fault resulted in a SIGSEGV.

		% grep FLTBOUNDS /usr/include/sys/fault.h
		#define	FLTBOUNDS	6	/* Memory bounds (invalid address) */
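	A sketch of the check the program appears to be missing (append_log
	is a hypothetical name; the real program's source is not available
	here): fopen() in append mode opens the file with
	O_WRONLY|O_APPEND|O_CREAT and mode 0666, matching the open() in the
	trace, and the returned FILE pointer is tested before it is used.

```c
#include <stdio.h>

/*
 * Append one line to a log file. A failed open (EACCES in the trace
 * above) becomes an error message and a -1 return instead of a
 * NULL-pointer fault. Returns 0 on success, -1 on failure.
 */
int append_log(const char *path, const char *msg)
{
    FILE *fp = fopen(path, "a");   /* O_WRONLY|O_APPEND|O_CREAT, 0666 */
    if (fp == NULL) {
        perror(path);              /* e.g. "Permission denied" */
        return -1;
    }
    fprintf(fp, "%s\n", msg);
    return fclose(fp) == 0 ? 0 : -1;
}
```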

C. poll() System Call Example from truss

int poll(struct pollfd fds[], nfds_t nfds, int timeout);

The poll(2) system call is used to monitor input/output (I/O) on a number
of file descriptors (file descriptors could be open files, sockets,
devices, etc.). The first argument is an array of pollfd structures, each
naming a file descriptor and the events to watch for. The second argument
is the number of entries in the array. The third argument is the time in
milliseconds to wait for I/O on the file descriptors. If the third
argument is 0, poll(2) checks for I/O and returns immediately. If the
third argument is INFTIM (or -1), poll(2) waits until I/O occurs or the
poll is interrupted by a signal.
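A minimal sketch of the call described above, assuming a POSIX system
(poll_pipe is a hypothetical name): it watches the read end of a pipe for
POLLIN. With a timeout of 0 it returns at once: 1 if a byte was written
first, 0 otherwise. With a timeout of -1 and nothing written, it would
block indefinitely, which truss shows as "(sleeping...)".

```c
#define _POSIX_C_SOURCE 200112L
#include <poll.h>
#include <unistd.h>

/*
 * Watch the read end of a pipe with poll(2). If 'write_first' is
 * nonzero, one byte is written first so the descriptor is readable.
 * Returns poll()'s result: 1 if ready, 0 on timeout, -1 on error.
 */
int poll_pipe(int write_first, int timeout_ms)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    if (write_first && write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    struct pollfd pfd;
    pfd.fd = fds[0];            /* descriptor to watch */
    pfd.events = POLLIN;        /* interested in data to read */
    pfd.revents = 0;

    int n = poll(&pfd, 1, timeout_ms);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```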

It is not unusual to see a process looping through poll(2) calls endlessly,
or sleeping in a single poll(2) call. In this case the problem is likely not
with the process being trussed but with the process that is supposed to
supply the I/O. The following is an example of truss of inetd(1M):

% truss -p 137
poll(0xEFFFD900, 43, -1)      (sleeping...)
     
inetd, PID 137, is sleeping because it is waiting for I/O from other program(s).


      

Applies To OS Kernel
Attachments (none)


Sun Proprietary/Confidential: Internal Use Only