SunSolve Internal
Infodoc ID: 23148
Synopsis:   Basics Using the truss Command
Date:       14 Jul 2000

Description
Contents

1. Introduction
   A. Things truss Can Help With
   B. Things truss May Not Help With
2. Terminology Review
   A. What are System Calls?
   B. What are signals?
   C. What are faults?
3. Using the truss Command
   A. Common truss Syntax
   B. Common truss Options
   C. Large file (64) System Calls
4. System Call Return Values
5. Troubleshooting Examples
   A. EINVAL Example from truss
   B. SIGSEGV Example from truss
   C. poll System Call Example from truss

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

1. Introduction
/usr/bin/truss is a utility that reports the following information about a process:

  - system calls performed, with their arguments and return values
  - signals received by the process
  - machine faults incurred by the process

It can be run against any executable command (binary or shell script), or attached to a currently running process by its process ID (PID). It is best to run it as root, because setuid/setgid programs started behind the scenes can only be traced by root.

Notes: To trace shared library procedure calls, use "sotruss", available on Solaris 2.5.1 and higher, or "apptrace", which is new in Solaris 8. The "trace" command in SunOS 4.x is similar in function to truss.

A. Things truss can help with:

  - truss is useful for debugging a program that terminates abnormally (with or without an error message) or hangs.
  - Application processes often open a temporary error message file. Knowing about these temporary log files can be very helpful.
  - Running truss on "factory" Solaris programs is another way of learning about system calls, which makes it a valuable educational tool.

B. Things truss may not help with:

  - Using truss is only the first step in debugging an application problem. Debugging the program source code is also necessary.
  - A thread list for the program is often necessary for debugging, in particular when the process is stuck in a loop and not making system calls. Procedures for obtaining thread lists are beyond the scope of this article and may require the help of a Solution Center Technical Support Engineer.
  - truss changes the timing of a program by repeatedly stopping and starting its execution, so it can change the nature of the problem being investigated. This is the case for race conditions and other timing-related problems.
  - Error messages are common in truss output, especially when opening files, because a program often searches a list of directories until it finds the file it is looking for, or because the underlying code expects not to find a file and the error is "normal". For these reasons (and others) truss output can be difficult to interpret, and it may have only limited value in debugging a problem.

---------------------------------------

2. Terminology Review

A. What are System Calls?

System calls are also known as "system service calls".

  "While the kernel is shared by all processes, system space is protected from user-mode access. Processes cannot directly access the kernel, and must instead use the system call interface. When a process makes a 'system call', it executes a special sequence of instructions to put the system in kernel mode (this is called a 'mode switch') and transfer control to the kernel, which handles the operation on behalf of the process. After the system call is complete, the kernel executes another set of instructions that returns the system to user mode (another mode switch) and transfers control back to the process."
  (From the book UNIX Internals: The New Frontiers, by Uresh Vahalia, 1996)

System calls and their associated numbers can be found in /usr/include/sys/syscall.h. The numbers can change between OS versions. See these man pages:

  % man -s2 intro
  % man -s2 <system_call>
  or
  % man <system_call>.2

B. What are signals?

  "UNIX uses 'signals' to inform a process of asynchronous events, and to handle exceptions. For example, when a user types control-C at the terminal, a SIGINT signal is sent to the foreground process. Likewise, when a process terminates, a SIGCHLD signal is sent to its parent..."
  (From the book UNIX Internals: The New Frontiers, by Uresh Vahalia, 1996)

A hung process may not respond to interrupt or kill signals. The reasons vary, which is why the truss command is useful.
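The signal description above can be tried out with a small program that catches SIGINT. The following is a minimal sketch, not taken from the original article: it installs a handler with sigaction(2) and sends SIGINT to itself with raise(3C), the same signal a control-C at the terminal would deliver. The names on_sigint and catch_one_sigint are hypothetical, and the sketch assumes a POSIX-conforming compilation environment.

```c
#define _POSIX_C_SOURCE 200112L
#include <signal.h>
#include <string.h>

/* Set by the handler; volatile sig_atomic_t is the type that is
   safe to write from an asynchronous signal handler. */
static volatile sig_atomic_t got_sigint = 0;

/* Hypothetical handler: only records that SIGINT arrived. */
static void on_sigint(int signo)
{
    (void)signo;
    got_sigint = 1;
}

/* Hypothetical helper: installs a SIGINT handler, sends SIGINT to
   this process, restores the previous disposition, and returns 1 if
   the handler ran, 0 if it did not, or -1 on a setup error. */
int catch_one_sigint(void)
{
    struct sigaction sa, old;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);          /* block no extra signals in handler */
    if (sigaction(SIGINT, &sa, &old) == -1)
        return -1;

    got_sigint = 0;
    rc = raise(SIGINT);                /* handler runs before raise returns */
    sigaction(SIGINT, &old, NULL);     /* put the old disposition back */

    if (rc != 0)
        return -1;
    return got_sigint ? 1 : 0;
}
```

Called from a small main() and run under truss, the trace should report SIGINT as received and caught by a handler, rather than handled by the [default] action as in the SIGSEGV example in section 5, where the program installed no handler.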
For more information, see /usr/include/sys/signal.h and the man pages for signal(3C) and signal(5). Here are examples of looking up the definition of a signal:

  % grep SIGINT /usr/include/sys/signal.h
  #define SIGINT  2       /* interrupt (rubout) */
  % grep SIGCHLD /usr/include/sys/signal.h
  #define SIGCHLD 18      /* child status change alias (POSIX) */

C. What are faults?

Faults are analogous to signals, but they correspond to hardware faults. For details, see /usr/include/sys/fault.h and proc(4) (% man -s4 proc).

---------------------------------------

3. Using the truss Command

A. Common truss Syntax

Here are two common ways of using truss and its options:

  truss -aef -o /tmp/truss.out <command>

and

  truss -aef -o /tmp/truss.out -p PID

The first example runs truss against a command from the beginning of its execution. The second attaches truss to an existing process.

B. Common truss Options

Option -p PID:
  Attaches truss to a currently running process, using the process ID (PID), typically obtained from the ps command. If you interrupt a truss that is attached to a process with the -p option, only truss is stopped. If you interrupt a process that was started by truss, as in the first example above, the process is stopped as well.

Option -o <file>:
  Saves the output to the given file (/tmp/truss.out in the examples above). By default truss writes its output to stderr, which is why piping it to grep (for example) does not work the way it does for commands that write to stdout. Either use -o, or redirect stderr into the pipe (for example "truss ls 2>&1 | grep open" in sh or ksh).

Option -f:
  Follows all child processes created by the fork or vfork system calls. By default truss follows only the initial process. Child processes started before truss was run are not followed.

Option -e:
  Shows the environment strings passed in on every exec system call, that is, the environment the process is running under.

Option -a:
  Shows all the arguments to each exec system call.
  This is helpful for seeing exactly how each program was invoked.

Option -c:
  Counts the system calls, signals, and faults, and displays the counts in a summary report instead of tracing each call line by line. This is useful for seeing which system calls a program spends most of its time in. The errors column is usually not helpful and can generally be ignored. For example:

  % truss -c ls
  syscall      seconds   calls  errors
  _exit            .00       1
  open             .02      19      13
  close            .00       6
  time             .00       1
  brk              .00       4
  lseek            .00       1
  fstat            .00       5
  ioctl            .00       2
  execve           .00       1
  fcntl            .00       1
  getdents         .00       2
  lstat            .01       1
  mmap             .00      12
  munmap           .00       3
                  ----     ---     ---
  sys totals:      .03      59      13
  usr time:        .01
  elapsed:         .15

C. Large file (64) System Calls

There are no separate man pages for the large file (64) versions of system calls. For example, for information on fstat64, refer to the fstat man page. The truss options -v, -x, and -t treat fstat and fstat64 as the same call. For example, the following commands produce the same output, and both give verbose information about fstat and fstat64:

  % truss -v fstat more /etc/motd
  % truss -v fstat64 more /etc/motd

The difference between a large file (64) system call and its ordinary counterpart is the ability to handle large files, so the input structures, the output structures, or both may differ.

---------------------------------------

4. System Call Return Values

System calls almost always return -1 on error (library functions that return pointers, such as fopen(), return NULL instead) and set the external variable errno to the error value. errno is not cleared by successful calls, so programmers should check its value only after a call has failed:

Wrong:

  ...
  fd = open(filename, O_RDONLY);
  if (errno != 0) {          /* errno may be left over from an earlier failure */
      perror("open");
      exit(1);
  }

Correct:

  ...
  fd = open(filename, O_RDONLY);
  if (fd < 0) {              /* open(2) returns -1 on failure */
      perror("open");
      exit(1);
  }
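The "Correct" pattern above can be turned into a complete, compilable sketch covering both conventions described in this section: open(2) reporting failure with -1, and fopen() reporting failure with NULL. The helper names open_checked and count_bytes are hypothetical, and the sketch assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: opens path read-only and returns the file
   descriptor, or -errno on failure. The failure test uses open()'s
   return value; errno is read only after open() has returned -1. */
int open_checked(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        int saved = errno;   /* save before fprintf can change errno */
        fprintf(stderr, "open %s: %s\n", path, strerror(saved));
        return -saved;
    }
    return fd;
}

/* Hypothetical helper, stdio version: fopen() returns NULL on
   failure, and the FILE pointer must never be used after that.
   Returns the number of bytes in the file, or -1 if it cannot be
   opened. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "r");
    long n = 0;

    if (fp == NULL) {
        perror(path);        /* report and stop; do not touch fp */
        return -1;
    }
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

Run under truss, a call such as open_checked("/nonexistent/truss-demo") should appear as an open line ending in Err#2 ENOENT, after which the program reports the error instead of going on to use an invalid descriptor or FILE pointer.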
---------------------------------------

5. Troubleshooting Examples

Error numbers, signals, and faults are defined in /usr/include/sys/errno.h, signal.h, and fault.h respectively. These definitions are usually too generic, so the man page for the specific system call should be consulted for what an error means in that context.

A. EINVAL Example from truss

  ...
  5811: so_socket(1, 2, 0, "", 1)                  = 5
  5811: getpid()                                   = 5811 [5810]
  5811: access("/var/tmp/.app3/s#5811.1", 0)       Err#2 ENOENT
  5811: bind(5, 0xEFFFAB94, 110)                   Err#22 EINVAL

The access(2) call above returns ENOENT. In this case it can be ignored, because examination of the source code shows that if the file existed, it would be deleted anyway.

  % grep ENOENT /usr/include/sys/errno.h
  #define ENOENT  2       /* No such file or directory */

The failed bind(2) system call is what causes the program to exit.

  % grep EINVAL /usr/include/sys/errno.h
  #define EINVAL  22      /* Invalid argument */

From the bind(2) man page for Solaris 2.6:

  EINVAL  The socket is already bound to an address, and the protocol does not support binding to a new address; or the socket has been shut down.

So resolving this problem requires further debugging in the application source code.

B. SIGSEGV Example from truss

This is an example of poor programming. A process that is killed by SIGSEGV has typically dereferenced a NULL (or otherwise invalid) pointer.
  % grep SIGSEGV /usr/include/sys/signal.h
  #define SIGSEGV 11      /* segmentation violation */

  5345: getpid()                                        = 5345 [1]
  5345: mkdir("/usrdata/home/Gateway2.06", 0755)        Err#17 EEXIST
  5345: mkdir("/usrdata/home/Gateway2.06/log", 0755)    Err#17 EEXIST
  5345: mkdir("/usrdata/home/Gateway2.06/log", 0755)    Err#17 EEXIST
  5345: open("/usrdata/home/Gateway2.06/log/MAGwdog.log", O_WRONLY|O_APPEND|O_CREAT, 0666) Err#13 EACCES
  5345: Incurred fault #6, FLTBOUNDS %pc = 0x0001CD20
  5345: siginfo: SIGSEGV SEGV_MAPERR addr=0x0000000D
  5345: Received signal #11, SIGSEGV [default]
  5345: siginfo: SIGSEGV SEGV_MAPERR addr=0x0000000D
  5345: *** process killed ***

  - Notice that the program dies right after an unsuccessful open.
  - If opening this file is critical to running the program, the program should have exited, or at least printed an error.
  - It appears the program went ahead and referenced a FILE pointer after an unsuccessful open. The fault address of 0x0000000D looks like an offset from a NULL pointer.
  - Note that the FLTBOUNDS fault resulted in a SIGSEGV:

  % grep FLTBOUNDS /usr/include/sys/fault.h
  #define FLTBOUNDS 6     /* Memory bounds (invalid address) */

C. poll() System Call Example from truss

  int poll(struct pollfd fds[], nfds_t nfds, int timeout);

The poll(2) system call is used to monitor input/output (I/O) on a number of file descriptors (open files, sockets, devices, etc.). The first argument is an array of pollfd structures, each naming an open file descriptor and the events to watch for. The second argument is the number of entries in the array. The third argument is the time in milliseconds to wait for I/O on the file descriptors.
If the third argument is 0, poll(2) checks for I/O and returns immediately. If the third argument is INFTIM (or -1), poll(2) waits until I/O occurs or the call is interrupted by a signal.

It is not unusual to see a process looping through poll(2) calls endlessly. In this case the problem is likely not with the process being trussed but with the process that is supposed to supply the I/O.

The following is an example of running truss on inetd(1M):

  % truss -p 137
  poll(0xEFFFD900, 43, -1)        (sleeping...)

inetd, PID 137, is sleeping because it is waiting for I/O from other programs.
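The timeout behavior described above (0 returns immediately, -1 waits for I/O) can be checked with a small sketch on a pipe, where the program supplies its own I/O. The function name poll_pipe_demo is hypothetical; the sketch uses the literal -1 rather than INFTIM, because INFTIM is not defined by every <poll.h>.

```c
#define _POSIX_C_SOURCE 200112L
#include <poll.h>
#include <unistd.h>

/* Hypothetical demo: polls the read end of a new pipe twice, once
   while it is empty and once after one byte has been written.
   Stores poll()'s return values in *before and *after.
   Returns 0 on success, -1 if the pipe could not be set up. */
int poll_pipe_demo(int *before, int *after)
{
    int fds[2];
    struct pollfd pfd;

    if (pipe(fds) == -1)
        return -1;

    pfd.fd = fds[0];          /* one descriptor to watch ... */
    pfd.events = POLLIN;      /* ... for readable data */
    pfd.revents = 0;

    /* timeout 0: check and return at once; nothing is ready yet */
    *before = poll(&pfd, 1, 0);

    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* data is waiting, so even an infinite (-1) timeout returns now */
    *after = poll(&pfd, 1, -1);

    close(fds[0]);
    close(fds[1]);
    return 0;
}
```

If the write() were removed, the second poll() would block forever, and truss on that process would show something like a poll(..., 1, -1) line marked (sleeping...), much as in the inetd example: the real problem would lie with whatever was supposed to write to the pipe.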
Applies To:  OS Kernel
Attachments: (none)
Sun Proprietary/Confidential: Internal Use Only