Capturing truss data for padmin CLI

Question

This technote describes how to capture truss data to further investigate failures of padmin CLI on a Virtual I/O Server partition at the request of VIOS SupportLine.
This applies to VIOS 2.x.

Answer

Log in to the VIOS as padmin:

$ oem_setup_env
# mkdir /tmp/testcase
# cd /tmp/testcase
# script -a /tmp/testcase/<PMR#.Branch#>.out
# uname -L
# /usr/ios/cli/ioscli ioslevel

Run the offending padmin command to reproduce the error, as follows:
# truss -fealo truss.out /usr/ios/cli/ioscli <failing_padmin_command>

For example, if the offending command is ioslevel, the syntax would be:
# truss -fealo truss.out /usr/ios/cli/ioscli ioslevel

# ls -la truss.out =>ensure the file has a valid (non-zero) size
# su - padmin
$ snap =>this will create /home/padmin/snap.pax.Z
$ exit (padmin)
# exit (script)
# cp /home/padmin/snap.pax.Z /tmp/testcase
# pax -wf <PMR#.Branch#.CountryCode#>.pax ./*
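
The size check on truss.out can also be scripted, so that an empty trace is caught before the archive is built. The following is only a sketch: the function name trace_is_usable is hypothetical, the demonstration runs on throwaway files rather than a real trace, and it assumes mktemp is available on the system.

```shell
#!/bin/sh
# Hypothetical helper: confirm that a trace file exists and is not empty
# before it is packaged. The function name is illustrative only.
trace_is_usable() {
    # -s is true only when the file exists and its size is greater than zero
    if [ -s "$1" ]; then
        echo "ok: $1 ($(wc -c < "$1" | tr -d ' ') bytes)"
    else
        echo "empty or missing: $1" >&2
        return 1
    fi
}

# Demonstration on throwaway files, not on a real truss.out
demo_dir=$(mktemp -d)
printf 'execve("ioscli", ...)\n' > "$demo_dir/truss.out"
: > "$demo_dir/empty.out"

trace_is_usable "$demo_dir/truss.out"
trace_is_usable "$demo_dir/empty.out" || echo "would stop before running pax"
rm -rf "$demo_dir"
```

If the check fails, reproducing the problem under truss again is likely more useful than sending an empty file.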

Where to send the file

ftp testcase.software.ibm.com
login: anonymous
password: <your email address>
ftp> cd /toibm/aix
ftp> prompt
ftp> binary
ftp> put <PMR#.Branch#.CountryCode#>.pax
ftp> quit

Solve application problems with tracing

Systems administrators often wonder what’s going on inside an application. You know the one: The application seems to start but then stops, or maybe it hangs without any output. Neither the logs nor the documentation provide anything helpful. Application tracing is your next course of action.

Application tracing displays the calls that an application makes to external libraries and the kernel. These calls give the application access to the network, the file system, and the display. By watching the calls and their results, you can get some idea of what the application “expects”, which can lead to a solution.

Each UNIX® system provides its own commands for tracing. This article introduces you to truss, which Solaris and AIX® support. On Linux®, you perform tracing with the strace command. On other UNIX flavors, application tracing might go by the names ptrace, ktrace, trace, and tusc, and the command-line parameters might be slightly different.

A classic file permissions problem

One class of problems that plagues systems administrators is file permissions. An application likely has to open certain files to do its work. If the open operation fails, the application should let the administrator know. However, developers often forget to check the result of functions or, to add to the confusion, perform the check but don’t adequately handle the error. For example, here’s the output of an application that’s failing to open a file:

$ ./openapp
This should never happen!


After running the fictitious openapp application, I received the unhelpful (and false) error message, This should never happen!. This is a perfect time to introduce truss. Listing 1 shows the same application run under the truss command, which shows all the system calls that the program made.


Listing 1. Openapp run under truss

$ truss ./openapp
execve("openapp", 0xFFBFFDEC, 0xFFBFFDF4)  argc = 1
getcwd("/export/home/sean", 1015)               = 0
stat("/export/home/sean/openapp", 0xFFBFFBC8)   = 0
open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
stat("/opt/csw/lib/libc.so.1", 0xFFBFF6F8)      Err#2 ENOENT
stat("/lib/libc.so.1", 0xFFBFF6F8)              = 0
resolvepath("/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
open("/lib/libc.so.1", O_RDONLY)                = 3
memcntl(0xFF280000, 139692, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3)                                        = 0
getcontext(0xFFBFF8C0)
getrlimit(RLIMIT_STACK, 0xFFBFF8A0)             = 0
getpid()                                        = 7895 [7894]
setustack(0xFF3A2088)
open("/etc/configfile", O_RDONLY)               Err#13 EACCES [file_dac_read]
ioctl(1, TCGETA, 0xFFBFEF14)                    = 0
fstat64(1, 0xFFBFEE30)                          = 0
stat("/platform/SUNW,Sun-Blade-100/lib/libc_psr.so.1", 0xFFBFEAB0) = 0
open("/platform/SUNW,Sun-Blade-100/lib/libc_psr.so.1", O_RDONLY) = 3
close(3)                                        = 0
This should never happen!
write(1, " T h i s   s h o u l d  ".., 26)      = 26
_exit(3)


Each line of the output represents a system call that the application made along with the return value, if applicable. (You don’t need to know each call, but for more information, you can call up the man page for it, such as with the command man open.) To find the call that is potentially causing the problem, it’s often easiest to start at the end (or as close as possible to where the problems start). For example, you know that the application outputs This should never happen!, which appears near the end of the output. Chances are that if you find this message and work your way up through the truss command output, you’ll come across the problem.

Scrolling up from the error message, notice the line beginning with open("/etc/configfile"..., which not only looks relevant but also seems to return an error of Err#13 EACCES. Looking at the man page for the open() function (with man open), it’s evident that the purpose of the function is to open a file — in this case, /etc/configfile — and that a return value of EACCES means that the problem is related to permissions. Sure enough, a look at /etc/configfile shows that the user doesn’t have permissions to read the file. A quick chmod later, and the application is running properly.
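
On a longer trace, it can be quicker to save the output to a file (for example, with the -o option of truss) and list only the calls that failed. The sketch below assumes that failures are marked with Err#, as they are in Listing 1. The function name failed_calls is hypothetical, and a few lines copied from Listing 1 stand in for a real trace file.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): print only the calls that truss
# reported as failed, identified by the "Err#" marker in the result column.
failed_calls() {
    grep 'Err#' "$1"
}

# A few lines copied from Listing 1 stand in for a saved trace file.
trace=$(mktemp)
cat > "$trace" <<'EOF'
stat("/lib/libc.so.1", 0xFFBFF6F8)              = 0
open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
open("/etc/configfile", O_RDONLY)               Err#13 EACCES [file_dac_read]
close(3)                                        = 0
EOF

failed_calls "$trace"
rm -f "$trace"
```

Not every line that this prints is a real problem, so the list is a starting point for reading the trace rather than a diagnosis.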

The output of Listing 1 shows two other calls, open() and stat(), that return an error. Many of the calls toward the beginning of the trace, including these two errors, are made by the operating system’s runtime linker as it starts the application, not by the application’s own code. Only experience will tell when the errors are benign and when they aren’t. In this case, the two errors and the three lines that follow them are trying to find the location of libc.so.1, which they eventually do. You’ll see more about shared library problems later.


The application doesn’t start

Sometimes, an application fails to start properly; but rather than exiting, it just hangs. This behavior is often a symptom of contention for a resource (such as two processes competing for a file lock), or the application is looking for something that is not coming back. This latter class of problems could be almost anything, such as a name lookup that’s taking a long time to resolve, or a file that should be found in a certain spot but isn’t there. In any case, watching the application under truss should reveal the culprit.

While the first code example showed an obvious link between the system call causing the problem and the file, the example you’re about to see requires a bit more sleuthing. Listing 2 shows a misbehaving application called getlock run under truss.


Listing 2. Getlock run under truss

$ truss ./getlock
execve("getlock", 0xFFBFFDFC, 0xFFBFFE04)  argc = 1
getcwd("/export/home/sean", 1015)               = 0
resolvepath("/export/home/sean/getlock", "/export/home/sean/getlock", 1023) = 25
resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
stat("/export/home/sean/getlock", 0xFFBFFBD8)   = 0
open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
stat("/opt/csw/lib/libc.so.1", 0xFFBFF708)      Err#2 ENOENT
stat("/lib/libc.so.1", 0xFFBFF708)              = 0
resolvepath("/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
open("/lib/libc.so.1", O_RDONLY)                = 3
close(3)                                        = 0
getcontext(0xFFBFF8D0)
getrlimit(RLIMIT_STACK, 0xFFBFF8B0)             = 0
getpid()                                        = 10715 [10714]
setustack(0xFF3A2088)
open("/tmp/lockfile", O_WRONLY|O_CREAT, 0755)   = 3
getpid()                                        = 10715 [10714]
fcntl(3, F_SETLKW, 0xFFBFFD60)  (sleeping...)


The final call, fcntl(), is marked as sleeping, because the function is blocking. This means that the function is waiting for something to happen, and the kernel has put the process to sleep until the event occurs. To determine what the event is, you must look at fcntl().

The man page for fcntl() (man fcntl) describes the function simply as “file control” on Solaris and “manipulate file descriptor” on Linux. In all cases, fcntl() requires three things: a file descriptor, which is an integer describing a file the process has opened; a command that specifies the action to be taken on the file descriptor; and any arguments required for that command. In the example in Listing 2, the file descriptor is 3, and the command is F_SETLKW. (The 0xFFBFFD60 is a pointer to a data structure, which doesn’t concern us now.) Digging further, the man page states that F_SETLKW sets a lock on the file and waits until the lock can be obtained.

From the first example involving the open() system call, you saw that a successful call returns a file descriptor. In the truss output of Listing 2, open() returns 3 in two places. Because file descriptors are reused after they are closed, the relevant open() is the one just above fcntl(), which is for /tmp/lockfile. The application is therefore waiting for another process to release its lock on that file. A utility like lsof lists any processes holding a file open. Failing that, you could trace through /proc to find the process with the open file. However, as is usually the case, a file is locked for a good reason, such as limiting the number of instances of the application or configuring the application to run in a user-specific directory.
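
Matching a file descriptor back to its file by eye gets tedious on a long trace. The following sketch does the bookkeeping with awk: it remembers the path from each successful open(), forgets it on close(), and reports the path behind any call marked as sleeping. The function name blocked_fd_path is hypothetical, the parsing assumes output laid out like Listing 2, and the sample lines are copied from that listing.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): report which file is behind the
# descriptor used by a call that truss marks as "(sleeping...)".
blocked_fd_path() {
    awk '
        # Successful open(): remember which path the returned descriptor maps to
        /^open\(/ && /= [0-9]+$/ {
            split($0, q, "\"")      # q[2] is the text between the first quotes
            path[$NF] = q[2]        # $NF is the descriptor after the "="
        }
        # close(): the descriptor number may be reused, so forget the old path
        /^close\(/ {
            fd = $0
            sub(/^close\(/, "", fd)
            sub(/\).*/, "", fd)
            delete path[fd]
        }
        # Blocked call: its first argument is the descriptor being waited on
        /\(sleeping\.\.\.\)/ {
            fd = $0
            sub(/^[a-z_0-9]+\(/, "", fd)
            sub(/,.*/, "", fd)
            print fd, path[fd]
        }
    ' "$1"
}

# Lines copied from Listing 2 stand in for a saved trace file.
trace=$(mktemp)
cat > "$trace" <<'EOF'
open("/lib/libc.so.1", O_RDONLY)                = 3
close(3)                                        = 0
open("/tmp/lockfile", O_WRONLY|O_CREAT, 0755)   = 3
getpid()                                        = 10715 [10714]
fcntl(3, F_SETLKW, 0xFFBFFD60)  (sleeping...)
EOF

blocked_fd_path "$trace"
rm -f "$trace"
```

Tracking close() is what keeps the earlier open() of libc.so.1 from being reported, which is the same reasoning as picking the open() just above fcntl().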


Attaching to a running process

Sometimes, an application is already running when a problem occurs. Being able to attach truss to an already-running process would be helpful. For example, suppose that the output of the top application shows a certain process consuming 95 percent of the CPU for quite some time, as shown in Listing 3.


Listing 3. Top output showing a CPU-intensive process

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 11063 sean       1   0    0 1872K  952K run     87.9H 94.68% udpsend


The -p option to truss allows the owner of the process, or root, to attach to a running process and view the system call activity. The process ID (PID) is required. In the example shown in Listing 3, the PID is 11063. Listing 4 shows the system call activity of the application in question.


Listing 4. truss output after attaching to a running process

$ truss -p 11063
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
... repeats ...


The sendto() function’s man page (man sendto) shows that this function is used to send a message from a socket — typically, a network connection. The output of truss shows the file descriptor (the first 3) and the data being sent (abc). Indeed, capturing a sample of network traffic with the snoop or tcpdump tool shows a large amount of traffic being directed to a particular host, which is likely not the result of a properly behaving application.

Note that truss was not able to show the creation of file descriptor 3, because you attached after the descriptor was created. This is one limitation of attaching to a running process and the reason why you should gather other information using a tool such as a packet analyzer before jumping to conclusions.
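
When the output of an attached trace scrolls past too quickly to read, a tally of calls by name can show whether a single call dominates. The sketch below counts calls in a saved trace. The function name count_calls is hypothetical, and the sample lines are copied from Listing 4.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): count traced calls by name,
# most frequent first. Lines without "(" (program output, notes) are skipped.
count_calls() {
    awk -F'(' 'NF > 1 { n[$1]++ } END { for (c in n) print n[c], c }' "$1" |
        sort -rn
}

# Lines copied from Listing 4 stand in for a saved trace file.
trace=$(mktemp)
cat > "$trace" <<'EOF'
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
EOF

count_calls "$trace"
rm -f "$trace"
```

A trace that consists almost entirely of one call, as here, suggests a tight loop, which fits a process that is pinned at high CPU usage.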

This example might seem somewhat contrived (and technically it was, because I wrote the udpsend application to demonstrate how to use truss), but it is based on a real situation. I was investigating a process running on a UNIX-based appliance that had a CPU-bound process. Tracing the application showed the same packet activity. Tracing with a network analyzer showed the packets were being directed to a host on the Internet. After escalating with the vendor, I determined that the problem was their application failing to perform proper error checking on a binary configuration file. The file had somehow become corrupted. As a result, the application interpreted the file incorrectly and repeatedly hammered a random IP address with User Datagram Protocol (UDP) datagrams. After I replaced the file, the process behaved as expected.


Filtering output

After a while, you’ll get the knack of what to look for. While it’s possible to use the grep command to go through the output, it’s easier to configure truss to focus only on certain calls. This practice is common if you’re trying to determine how an application works, such as which configuration files the application is using. In this case, the open() and stat() system calls point to any files the application is trying to open.

You use open() to open a file, but you use stat() to find information about a file. Often, an application looks for a file with a series of stat() calls, and then opens the file it wants.

For truss, you select the system calls to trace with the -t option. For strace under Linux, you use -e. In either case, you pass a comma-separated list of system calls to be shown on the command line. If you prefix the list with an exclamation mark (!), the given calls are filtered out of the output instead. Listing 5 shows a fictitious application looking for a configuration file.


Listing 5. truss output filtered to show only stat() and open() functions


$ truss -tstat,open ./app
stat("/export/home/sean/app", 0xFFBFFBD0)   = 0
open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
stat("/opt/csw/lib/libc.so.1", 0xFFBFF700)      Err#2 ENOENT
stat("/lib/libc.so.1", 0xFFBFF700)              = 0
open("/lib/libc.so.1", O_RDONLY)                = 3
stat("/export/home/sean/.config", 0xFFBFFCF0)   Err#2 ENOENT
stat("/etc/app/configfile", 0xFFBFFCF0)         Err#2 ENOENT
stat("/etc/configfile", 0xFFBFFCF0)             = 0
open("/etc/configfile", O_RDONLY)               = 3


The final four lines are the key here. The stat() function for /export/home/sean/.config results in ENOENT, which means that the file wasn’t found. The code then tries /etc/app/configfile before it finds the correct information in /etc/configfile. The significance of first checking in the user’s home directory is that each user can override the system-wide configuration.
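
The search order can also be condensed into a short report from a saved trace. The sketch below prints each path that stat() or open() touched, in order, labeled as found or missing. The function name probe_order is hypothetical, the parsing assumes output laid out like Listing 5, and the sample is the final four lines of that listing.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): list the paths an application
# probed with stat() or open(), in order, and whether each one existed.
probe_order() {
    # With a double quote as field separator, $2 is the quoted path
    awk -F'"' '
        /^(stat|open)\(/ {
            status = ($0 ~ /ENOENT/) ? "missing" : "found"
            print status, $2
        }
    ' "$1"
}

# The final four lines of Listing 5 stand in for a saved trace file.
trace=$(mktemp)
cat > "$trace" <<'EOF'
stat("/export/home/sean/.config", 0xFFBFFCF0)   Err#2 ENOENT
stat("/etc/app/configfile", 0xFFBFFCF0)         Err#2 ENOENT
stat("/etc/configfile", 0xFFBFFCF0)             = 0
open("/etc/configfile", O_RDONLY)               = 3
EOF

probe_order "$trace"
rm -f "$trace"
```

Reading the missing entries from top to bottom gives the locations where a per-user or application-specific configuration file could be placed.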


Final thoughts

Whether your operating system uses truss, strace, trace, or something else, the ability to peer into an application’s behavior is a powerful tool for problem solving. The methodology can be summed up as follows:

  1. Describe the problem.
  2. Trace the application.
  3. Start at the spot at which the problem occurs and work backward through the system calls to identify the problem. Use the man pages for help on interpreting the system calls.
  4. Correct the behavior and test.

Tracing application behavior is a powerful troubleshooting tool, because you’re observing the system calls that the application makes to the operating system. When the usual problem-solving methods fail, turn to application tracing.
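
Step 3 of the methodology, starting where the problem shows up and working backward, can be sketched for a saved trace as follows. The function name calls_before is hypothetical. It prints the lines that immediately precede the first occurrence of a known message, using awk rather than a grep context option so that it does not depend on any particular grep implementation. The sample lines are copied from Listing 1.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): print the N trace lines that
# come immediately before the first line containing a given message.
# $1 = saved trace file, $2 = text to look for, $3 = number of lines (>= 1)
calls_before() {
    awk -v pat="$2" -v n="$3" '
        index($0, pat) {
            start = NR - n
            if (start < 1) start = 1
            # Replay the buffered lines in their original order, then stop
            for (i = start; i < NR; i++) print buf[i % n]
            exit
        }
        # Keep only the most recent n lines in a ring buffer
        { buf[NR % n] = $0 }
    ' "$1"
}

# Lines copied from the end of Listing 1 stand in for a saved trace file.
trace=$(mktemp)
cat > "$trace" <<'EOF'
setustack(0xFF3A2088)
open("/etc/configfile", O_RDONLY)               Err#13 EACCES [file_dac_read]
ioctl(1, TCGETA, 0xFFBFEF14)                    = 0
fstat64(1, 0xFFBFEE30)                          = 0
close(3)                                        = 0
This should never happen!
EOF

calls_before "$trace" "This should never happen" 4
rm -f "$trace"
```

How many lines to look back is a judgment call. Widening the window until a failing call appears is usually enough.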



Resources

Learn

  • “AIX 5.2 performance tools update, Part 2” (developerWorks, Nov 2003) provides an outline of the options available for truss on AIX.

  • Comparison of truss and DTrace: Solaris 10 provides a tool called DTrace that goes even further than truss. Check out this good comparison of truss and DTrace, which uses both tools in a performance-tuning situation.


  • Manipulating Files and Directories in UNIX: This tutorial on UNIX file handling is a fairly friendly (to non-programmers, that is) walk-through of the system calls that have to do with file handling. Because a good part of the troubleshooting you’ll do with application tracing concerns files, the tutorial provides a solid background to the calls involved.


  • UNIX Network Programming: There’s no finer book in the world about UNIX than UNIX Network Programming by the late Richard Stevens (Prentice Hall, 1990). It thoroughly explains signals and system calls, sockets and files, and is a wonderful companion to anyone who wants to learn in more depth than the man page can explain. The original version has since been updated and expanded into two volumes.


About the author

Sean Walberg has been working with Linux and UNIX systems since 1994 in academic, corporate, and Internet service provider environments. He has written extensively about systems administration over the past several years. You can contact him at sean@ertw.com.

How to truss a SUID process

 Technote (FAQ)
 
Question
Truss is a useful command for tracking where a process is failing. It doesn't give you the overall system picture in the way that the system trace facility does, but it lets you consider only the output from the process in question and its children, and it can be run over an extended period of time without gathering too much extra information.

However, truss will only allow you to attach to a process if you have permission. For the most part this is fine. However, if you are investigating a command that runs as another user under SUID, you will not be allowed to attach to the process, because the system identifies it as not belonging to your user.

For example:

# ls -l prog
-rwsr-xr-x 1 root system 6692 29 Aug 08:34 prog

# su - some_user

$ truss -deaf -o truss.out prog

truss: 0915-015 Cannot create subject process.
wait4all: i: 0, status: 32512, pid: 311360, created: 0
 
Answer
So, how can we truss some_user's commands?
1. Log in as the user you need to investigate and find the PID of your shell using the ps command. For example:

$ ps -f
UID PID PPID C STIME TTY TIME CMD
some_user 159852 372742 0 10:33:54 pts/3 0:00 -ksh
some_user 421946 159852 3 10:36:18 pts/3 0:00 ps -f

2. Start a new session as root and truss the shell session from Step 1:

# truss -deaf -o /tmp/truss.out -p 159852

3. This new session will now log all the activity in the original shell. Run the failing command and then stop the truss. The truss.out file can be investigated to find the failure.
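
Before using this workaround, it can be worth confirming that the program really is SUID, since the same truss error could have other causes. The sketch below uses the standard -u file test. The function name is_setuid is hypothetical, and the demonstration sets the bit on a throwaway file instead of checking a real program.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): succeed only when the file has
# its set-user-ID bit set (the "s" in -rwsr-xr-x).
is_setuid() {
    [ -u "$1" ]
}

# Demonstration on a throwaway file, not on a real program
demo=$(mktemp)
chmod 4755 "$demo"
if is_setuid "$demo"; then
    echo "SUID bit is set: trace the parent shell as root instead"
else
    echo "SUID bit is not set: truss can be run on the command directly"
fi
rm -f "$demo"
```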
 
 
 