I have been developing applications on Solaris for many years and, as on most Unix systems, applications that fail now and then leave behind “core” dumps. Most of the time this is due to a “segmentation fault” (an illegal memory access, such as dereferencing a null pointer), and this type of fault is very difficult to debug when you do not have the source code (sadly, not everything is open source). This “post” is a very simple primer on using the “mdb” debugger on Solaris to debug a core file.
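If you want a core file to practise on, here is a minimal C sketch of the kind of fault described above: a forked child writes through a NULL pointer and is killed by SIGSEGV. The names run_crashing_child and crash_with_null_pointer are hypothetical. Whether a core file is actually written, and where, depends on your core size limit (ulimit -c) and, on Solaris, the coreadm configuration. The file name used below appears to come from a coreadm pattern along the lines of core_%n_%f_%u_%g_%t_%p, but that is an assumption.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write through a NULL pointer.
   The pointer variable itself is volatile, so the compiler cannot
   assume it is NULL and must emit a real store, which faults. */
static void crash_with_null_pointer(void)
{
    int *volatile p = NULL;
    *p = 42; /* illegal memory access -> SIGSEGV */
}

/* Fork a child that crashes and return the signal that killed it,
   or -1 if fork/waitpid failed or the child exited normally.
   If want_core is 0 the child's core size limit is set to 0, so no
   core file is written (useful when testing). */
int run_crashing_child(int want_core)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (!want_core) {
            struct rlimit rl = { .rlim_cur = 0, .rlim_max = 0 };
            (void)setrlimit(RLIMIT_CORE, &rl);
        }
        crash_with_null_pointer();
        _exit(0); /* only reached if the store did not fault */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling run_crashing_child(1) from a small main() and then pointing mdb at that binary and its core should let you try $r and $c on a fault whose cause you already know.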
First, start “mdb” with the application binary and the core file as arguments.
# mdb progname /var/cores/core_host_progname_60001_60001_1208924574_29027
Loading modules: []
>
If this works, you’ll get the debugger prompt “>”. Then you can look at the registers by typing “$r”:
> $r
%g0 = 0x00000000   %l0 = 0xfecc0608 libc.so.1`__malloc_lock
%g1 = 0x006828c0   %l1 = 0x000e9b30
%g2 = 0x00000000   %l2 = 0x00000000
%g3 = 0xfdd27980   %l3 = 0x00000001
%g4 = 0x00000007   %l4 = 0x00000005
%g5 = 0x00000000   %l5 = 0x00000000
%g6 = 0x00000000   %l6 = 0x01442278
%g7 = 0xfd7d1d98   %l7 = 0xfeb8a604
%o0 = 0x01111e10   %i0 = 0x01111e10
%o1 = 0x00730065   %i1 = 0x0007970c
%o2 = 0x00000000   %i2 = 0x010e5eec
%o3 = 0xfeb11a58 libs2hsp.so`__0fQFlobotRedirectorPsetActiveFlobotP6GFlobot+0x80   %i3 = 0xfeb1873c libs2hsp.so`__0fNEbtBookFlobotI_cleanUpv+0xc0
%o4 = 0x018e3578   %i4 = 0x012b21a0
%o5 = 0x00000000   %i5 = 0x00000000
%o6 = 0xfd7d13e8   %i6 = 0xfd7d1448
%o7 = 0xfeb11a74 libs2hsp.so`__0fQFlobotRedirectorPsetActiveFlobotP6GFlobot+0x9c   %i7 = 0xfec42948 libc.so.1`t_delete+0x50

%psr = 0xfe001003 impl=0xf ver=0xe icc=nzvc ec=0 ef=4096 pil=0 s=0 ps=0 et=0 cwp=0x3
%y = 0x00000000
%pc = 0xfec42ae4 libc.so.1`t_splay+0x18
%npc = 0xfec42ae8 libc.so.1`t_splay+0x1c
%sp = 0xfd7d13e8
%fp = 0xfd7d1448
%wim = 0x00000000
%tbr = 0x00000000
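Register values that fall inside a known function are annotated as module`symbol+offset: roughly, the debugger finds the nearest symbol at or below the address and prints the distance from its start. For example, %pc = 0xfec42ae4 is shown as t_splay+0x18, which implies t_splay starts at 0xfec42acc. A minimal sketch of that lookup, assuming a symbol table already sorted by address; struct sym and lookup_symbol are hypothetical, and real debuggers also use each symbol's size, which this sketch ignores.

```c
#include <stddef.h>

struct sym {
    const char *name;   /* function name, e.g. "t_splay" */
    unsigned long addr; /* start address of the function */
};

/* Binary search for the last symbol whose start is <= addr in a table
   sorted by ascending address. On success stores addr - start in
   *offset and returns the symbol's index; returns -1 if addr lies
   below every symbol. */
int lookup_symbol(const struct sym *tab, size_t n, unsigned long addr,
                  unsigned long *offset)
{
    size_t lo = 0, hi = n; /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= addr)
            lo = mid + 1; /* tab[mid] starts at or below addr */
        else
            hi = mid;
    }
    if (lo == 0)
        return -1;
    *offset = addr - tab[lo - 1].addr;
    return (int)(lo - 1);
}
```

With a hypothetical two-entry table whose addresses are derived from the offsets above (t_delete at 0xfec428f8, t_splay at 0xfec42acc, not read from libc), looking up 0xfec42ae4 gives t_splay with offset 0x18, and 0xfec42948 gives t_delete with offset 0x50, matching %pc and %i7.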
Next, have a look at the stack trace by using “$c”. This can often give you a lot of insight into where the error occurred.
> $c
libc.so.1`_malloc_unlocked+0xf0(1000, 0, fecbc008, 1000, 1414e28, 0)
libc.so.1`malloc+0x20(1000, fecc27dc, 5, 4, 11, 50)
libdwtcl.so`Tcl_Eval+0x594(1, 0, ff36bb, 1, dd9818, 0)
httpd.so`native_pool_thread_mainPv+0x88(0, 0, ff12c000, 10, 0, 0)
0xff167f84(2a79e0, 8, 2, 2a7a30, 2a7a30, ff16e000)
progname`_thread_start+0x40(2a79e0, 0, 0, 0, 0, 0)
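The most recent call is at the top, and each line has the form module`function+offset(values). On SPARC the six values in parentheses are the frame's %i0 to %i5 input registers, which usually hold the first six arguments; they are shown whether or not the function takes six arguments, and may already have been overwritten by the time of the crash. A frame with no symbol (0xff167f84 above) is printed as a bare address. A minimal C sketch that splits such a line into its parts; struct frame and parse_frame are hypothetical names, and the sketch assumes the values are printed in hex without a 0x prefix, as they are here.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_FRAME_ARGS 6

/* One line of "$c" output, e.g.
   libc.so.1`malloc+0x20(1000, fecc27dc, 5, 4, 11, 50) */
struct frame {
    char module[64];      /* "libc.so.1"; empty if there is no backtick */
    char function[128];   /* "malloc", or a bare address like "0xff167f84" */
    unsigned long offset; /* 0x20: byte offset into the function */
    unsigned long args[MAX_FRAME_ARGS];
    int nargs;
};

/* Copy the bytes in [start, end) into dst; fail if they do not fit. */
static int copy_span(char *dst, size_t size, const char *start, const char *end)
{
    size_t len = (size_t)(end - start);
    if (len >= size)
        return -1;
    memcpy(dst, start, len);
    dst[len] = '\0';
    return 0;
}

/* Split a $c line into module, function, offset and values.
   Returns 0 on success, -1 if the line does not look like a frame. */
int parse_frame(const char *line, struct frame *f)
{
    memset(f, 0, sizeof *f);

    const char *paren = strchr(line, '(');
    if (paren == NULL)
        return -1;

    /* "module`name": the module is everything before the backtick. */
    const char *name = line;
    const char *tick = memchr(line, '`', (size_t)(paren - line));
    if (tick != NULL) {
        if (copy_span(f->module, sizeof f->module, line, tick) != 0)
            return -1;
        name = tick + 1;
    }

    /* "name+0xNN": the offset follows the plus sign, if present. */
    const char *plus = memchr(name, '+', (size_t)(paren - name));
    if (copy_span(f->function, sizeof f->function, name,
                  plus != NULL ? plus : paren) != 0)
        return -1;
    if (plus != NULL)
        f->offset = strtoul(plus + 1, NULL, 16); /* base 16 accepts "0x" */

    /* Values: hex numbers separated by commas and optional spaces. */
    const char *p = paren + 1;
    while (f->nargs < MAX_FRAME_ARGS && *p != ')' && *p != '\0') {
        char *end;
        unsigned long v = strtoul(p, &end, 16);
        if (end == p)
            return -1;
        f->args[f->nargs++] = v;
        p = end;
        while (*p == ',' || *p == ' ')
            p++;
    }
    return 0;
}
```

Applied to the Tcl_Eval line, this yields module libdwtcl.so, function Tcl_Eval, offset 0x594 and a third value of 0xff36bb, the address examined with ::dump below.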
If you do not have the source code, it’s very difficult to debug an aborted application. However, you can sometimes work out what the program was doing from the function names, and it is often possible to dump some of the function parameters to see what was being passed. For example, to dump the memory at 0xff36bb, which was passed as the third argument to “Tcl_Eval”, use the “::dump” command with an address,count prefix. Note that mdb reads numbers as hexadecimal by default, so a count of 32 means 0x32 (50) bytes.
> ff36bb,32::dump
        0  1  2  3  4  5  6  7  8  9  a \/  c  d  e  f  0123456789avcdef
ff36b0 20 73 65 74 20 74 65 78 74 20 5b 64 77 47 65 74  .set.text.[dwGet
ff36c0 50 61 67 65 54 65 78 74 5d 0a 20 20 20 20 20 20  PageText].......
ff36d0 64 77 53 65 6e 64 20 24 74 65 78 74 0a 20 20 20  dwSend.$text....
ff36e0 20 20 20 0a 20 20 20 20 20 20 23 23 20 61 64 64  ..........##.add
From this dump you could assume that we were calling the Tcl command dwGetPageText (the bytes starting at 0xff36bb spell out “dwGetPageText]”). Hey… it’s a long shot, but it might help!
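A note on reading the ::dump layout: the rows are aligned to 16-byte boundaries, so a dump starting at 0xff36bb begins on the 0xff36b0 row, and the “\/” and “v” in the header appear to mark the column of the starting byte (0xbb, which holds 0x64, “d”). Since the count 0x32 ends the range at 0xff36ec, four rows are printed. In the text column above, spaces and newlines are shown as “.”. A minimal C sketch that reproduces this layout; dump_row_count and format_dump_row are hypothetical helpers, and the spacing is an approximation of the output shown here, not mdb’s exact format.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define DUMP_WIDTH 16

/* Number of 16-byte, 16-aligned rows needed to cover [addr, addr + count). */
unsigned long dump_row_count(unsigned long addr, unsigned long count)
{
    if (count == 0)
        return 0;
    unsigned long first = addr & ~(unsigned long)(DUMP_WIDTH - 1);
    unsigned long last = (addr + count - 1) | (DUMP_WIDTH - 1);
    return (last - first + 1) / DUMP_WIDTH;
}

/* Format one row as "address hex-bytes  text", showing any byte that is
   not a printable, non-space character as '.'.
   Returns 0 on success, -1 if out is too small. */
int format_dump_row(unsigned long addr, const unsigned char row[DUMP_WIDTH],
                    char *out, size_t size)
{
    char text[DUMP_WIDTH + 1];
    for (int i = 0; i < DUMP_WIDTH; i++)
        text[i] = isgraph(row[i]) ? (char)row[i] : '.';
    text[DUMP_WIDTH] = '\0';

    int len = snprintf(out, size, "%lx", addr);
    for (int i = 0; i < DUMP_WIDTH && len >= 0 && (size_t)len < size; i++)
        len += snprintf(out + len, size - (size_t)len, " %02x", row[i]);
    if (len >= 0 && (size_t)len < size)
        len += snprintf(out + len, size - (size_t)len, "  %s", text);
    return (len >= 0 && (size_t)len < size) ? 0 : -1;
}
```

For example, dump_row_count(0xff36bb, 0x32) is 4, and formatting the sixteen bytes at 0xff36c0 produces the “PageText].......” row.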