Mental Jetsam

By Peter Finch

Simple MDB Debugger Primer

Posted by pcfinch on April 23, 2008

I have been developing applications on Solaris for many years and, as with most Unix systems, you get “core” dumps now and then when applications fail. Most of the time this is due to a “segmentation fault” (illegal memory access, null pointer dereference, etc.), and this type of fault is very difficult to debug when you do not have the source code (sadly, not everything is open source). This post is a very simple primer on using the “mdb” debugger on Solaris to examine a core file.

First start “mdb” with the application and the core file as the parameters.

# mdb progname /var/cores/core_host_progname_60001_60001_1208924574_29027
Loading modules: []
>
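As an aside, the core file name itself can carry useful context. The Python sketch below assumes the name was produced by a coreadm pattern of the form core_%n_%f_%u_%g_%t_%p (node name, executable name, user ID, group ID, timestamp, process ID). That pattern is only inferred from the name above, and the helper name parse_core_name is made up for illustration.

```python
from datetime import datetime, timezone

def parse_core_name(name):
    """Split a core file name ASSUMED to follow core_%n_%f_%u_%g_%t_%p."""
    # Split from the right first so a program name containing "_" stays whole.
    # (A node name containing "_" would still confuse this simple split.)
    head, uid, gid, stamp, pid = name.rsplit("_", 4)
    _prefix, node, program = head.split("_", 2)
    return {
        "node": node,
        "program": program,
        "uid": int(uid),
        "gid": int(gid),
        # %t is taken to be seconds since the Unix epoch.
        "time": datetime.fromtimestamp(int(stamp), tz=timezone.utc),
        "pid": int(pid),
    }

info = parse_core_name("core_host_progname_60001_60001_1208924574_29027")
print(info["program"], info["pid"], info["time"].isoformat())
```

If the assumption holds, the timestamp tells you when the process died, which is handy when matching a core file against application logs.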

If mdb starts successfully you’ll get the debugger prompt “>”. You can then look at the registers by typing “$r”.

> $r
%g0 = 0x00000000                 %l0 = 0xfecc0608 libc.so.1`__malloc_lock
%g1 = 0x006828c0                 %l1 = 0x000e9b30
%g2 = 0x00000000                 %l2 = 0x00000000
%g3 = 0xfdd27980                 %l3 = 0x00000001
%g4 = 0x00000007                 %l4 = 0x00000005
%g5 = 0x00000000                 %l5 = 0x00000000
%g6 = 0x00000000                 %l6 = 0x01442278
%g7 = 0xfd7d1d98                 %l7 = 0xfeb8a604

%o0 = 0x01111e10                 %i0 = 0x01111e10
%o1 = 0x00730065                 %i1 = 0x0007970c
%o2 = 0x00000000                 %i2 = 0x010e5eec
%o3 = 0xfeb11a58 libs2hsp.so`__0fQFlobotRedirectorPsetActiveFlobotP6GFlobot+0x80 %i3 = 0xfeb1873c libs2hsp.so`__0fNEbtBookFlobotI_cleanUpv+0xc0
%o4 = 0x018e3578                 %i4 = 0x012b21a0
%o5 = 0x00000000                 %i5 = 0x00000000
%o6 = 0xfd7d13e8                 %i6 = 0xfd7d1448
%o7 = 0xfeb11a74 libs2hsp.so`__0fQFlobotRedirectorPsetActiveFlobotP6GFlobot+0x9c %i7 = 0xfec42948 libc.so.1`t_delete+0x50

 %psr = 0xfe001003 impl=0xf ver=0xe icc=nzvc
                   ec=0 ef=4096 pil=0 s=0 ps=0 et=0 cwp=0x3
   %y = 0x00000000
  %pc = 0xfec42ae4 libc.so.1`t_splay+0x18
 %npc = 0xfec42ae8 libc.so.1`t_splay+0x1c
  %sp = 0xfd7d13e8
  %fp = 0xfd7d1448

 %wim = 0x00000000
 %tbr = 0x00000000
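If you save the register listing to a text file, it is easy to pick values out of it with a script. The following Python sketch parses a few lines copied from the output above into a dictionary; the regular expression and the name parse_registers are my own illustration, not part of mdb. It also confirms that on SPARC %sp is another name for %o6 and %fp for %i6.

```python
import re

# A few lines copied from the $r output above.
SAMPLE = """\
%o6 = 0xfd7d13e8                 %i6 = 0xfd7d1448
%o7 = 0xfeb11a74 libs2hsp.so`__0fQFlobotRedirectorPsetActiveFlobotP6GFlobot+0x9c %i7 = 0xfec42948 libc.so.1`t_delete+0x50
  %pc = 0xfec42ae4 libc.so.1`t_splay+0x18
  %sp = 0xfd7d13e8
  %fp = 0xfd7d1448
"""

# Matches "%name = 0xVALUE"; symbol text such as "+0x9c" is skipped
# because it is not preceded by "%name =".
REGISTER = re.compile(r"%(\w+)\s*=\s*0x([0-9a-fA-F]+)")

def parse_registers(text):
    """Return a {register name: integer value} mapping."""
    return {name: int(value, 16) for name, value in REGISTER.findall(text)}

regs = parse_registers(SAMPLE)
print(hex(regs["pc"]))
print(regs["sp"] == regs["o6"], regs["fp"] == regs["i6"])
```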

Next, have a look at the stack trace by using “$c”. This can often give you a lot of insight into where the error occurred.

> $c
libc.so.1`_malloc_unlocked+0xf0(1000, 0, fecbc008, 1000, 1414e28, 0)
libc.so.1`malloc+0x20(1000, fecc27dc, 5, 4, 11, 50)
libdwtcl.so`Tcl_Eval+0x594(1, 0, ff36bb, 1, dd9818,0)
httpd.so`native_pool_thread_mainPv+0x88(0, 0, ff12c000, 10, 0, 0)
0xff167f84(2a79e0, 8, 2, 2a7a30, 2a7a30, ff16e000)
progname`_thread_start+0x40(2a79e0, 0, 0, 0, 0, 0)
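Each line of the trace has the shape module`function+offset(arguments), with the arguments printed in hexadecimal. As a rough sketch, the Python below splits some of the frames shown above into their parts so that an interesting argument can be picked out; the regular expression and the name parse_frame are assumptions made for this example and only cover the line shapes seen here.

```python
import re

# Frames copied from the $c output above.
FRAMES = """\
libc.so.1`_malloc_unlocked+0xf0(1000, 0, fecbc008, 1000, 1414e28, 0)
libc.so.1`malloc+0x20(1000, fecc27dc, 5, 4, 11, 50)
libdwtcl.so`Tcl_Eval+0x594(1, 0, ff36bb, 1, dd9818,0)
0xff167f84(2a79e0, 8, 2, 2a7a30, 2a7a30, ff16e000)
"""

# The module and the +offset part are optional: a frame with no symbol
# is printed as a bare address.
FRAME = re.compile(
    r"^(?:(?P<module>[^`(]+)`)?"
    r"(?P<func>[^+(]+)"
    r"(?:\+0x(?P<offset>[0-9a-f]+))?"
    r"\((?P<args>[^)]*)\)$"
)

def parse_frame(line):
    """Return a dict describing one frame, or None if the line does not match."""
    match = FRAME.match(line.strip())
    if match is None:
        return None
    args = [int(a.strip(), 16) for a in match.group("args").split(",") if a.strip()]
    offset = match.group("offset")
    return {
        "module": match.group("module"),
        "func": match.group("func"),
        "offset": int(offset, 16) if offset else None,
        "args": args,
    }

frames = [parse_frame(line) for line in FRAMES.splitlines()]
tcl_eval = next(f for f in frames if f["func"] == "Tcl_Eval")
print(format(tcl_eval["args"][2], "x"))
```

The last line prints the third argument of the Tcl_Eval frame in hex, which is an address worth dumping.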

If you do not have the source code, it’s very difficult to debug an aborted application. However, it is sometimes possible to work out what the program was doing at the time from the function names, and it is often possible to dump out some of the function parameters to see what was being passed. For example, to dump the contents of memory location 0xff36bb, which was passed to the “Tcl_Eval” function, use the “::dump” command.

> ff36bb,32::dump
             0  1  2  3  4  5  6  7   8  9  a  \/  c  d  e  f  0123456789avcdef
     ff36b0  20 73 65 74 20 74 65 78  74 20 5b 64 77 47 65 74  .set.text.[dwGet
     ff36c0  50 61 67 65 54 65 78 74  5d 0a 20 20 20 20 20 20  PageText].......
     ff36d0  64 77 53 65 6e 64 20 24  74 65 78 74 0a 20 20 20  dwSend.$text....
     ff36e0  20 20 20 0a 20 20 20 20  20 20 23 23 20 61 64 64  ..........##.add

From this dump you could assume that the program was calling the Tcl command dwGetPageText. Hey… it’s a long shot, but it might help!
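If you want to double-check the ASCII column, or recover the text with its real spaces and newlines, the hex bytes can be decoded directly. The Python sketch below uses only the address and hex columns copied from the dump above; the name decode_dump is made up for this example, and the script assumes every line carries exactly 16 bytes. Read purely from the hex bytes, the command name begins with “dw”.

```python
# Address and hex columns copied from the ::dump output above.
DUMP = """\
ff36b0  20 73 65 74 20 74 65 78  74 20 5b 64 77 47 65 74
ff36c0  50 61 67 65 54 65 78 74  5d 0a 20 20 20 20 20 20
ff36d0  64 77 53 65 6e 64 20 24  74 65 78 74 0a 20 20 20
ff36e0  20 20 20 0a 20 20 20 20  20 20 23 23 20 61 64 64
"""

def decode_dump(text, start):
    """Decode the dumped bytes as ASCII, beginning at address `start`."""
    data = bytearray()
    base = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if base is None:
            base = int(fields[0], 16)  # address of the first dumped byte
        # Fields 1..16 are the hex bytes for this line.
        data += bytes(int(byte, 16) for byte in fields[1:17])
    return data[start - base:].decode("ascii", errors="replace")

# 0xff36bb is the address that was passed to Tcl_Eval.
print(decode_dump(DUMP, 0xff36bb))
```

Decoding from 0xff36b0 instead shows the whole line of the script, including the “set text [” part that precedes the address.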
