another arlad crash on netbsd
Ken Raeburn
raeburn at raeburn.org
Sun Jan 3 09:22:32 CET 1999
I was running "du" across a modem line (ppp) that was probably carrying
a bunch of other traffic as well (mail and news downloads, X11). When
I went to look at the output, after the sizes for a good many
directories, I saw a lot of "Network is down" messages for individual
files, then:
du: ./.mh/save/1610: Network is down
du: ./.mh/save/1616: Network is down
du: ./.mh/save/.mh_sequences: Network is down
751 ./.mh/save
du: ./.mh/Zephyr: Operation not supported by device
du: ./.mh/ANSI_C: Not a directory
du: ./.mh/tcl: Not a directory
The "not a directory" stuff seems to come up when arlad isn't running,
so I'm guessing that that's the point when it crashed, and the
"network is down" came from having a heavy load on the ppp link, but
I'm just guessing.
The crash was different this time:
#0 0x9734 in throw_entry (entry=0x118ae0) at ../../arlad/fcache.c:455
455 ret = RXAFS_GiveUpCallBacks (conn->connection, &fids, &cbs);
(gdb) p conn
$1 = (ConnCacheEntry *) 0x0
(gdb) bt
#0 0x9734 in throw_entry (entry=0x118ae0) at ../../arlad/fcache.c:455
#1 0x9c51 in cleaner (arg=0x0) at ../../arlad/fcache.c:567
#2 0x3a2fd in Create_Process_Part2 () at ../../lwp/lwp.c:629
#3 0xfffefdfc in ?? ()
#4 0x1 in ?? ()
#5 0x3a09a in LWP_MwaitProcess (wcount=1, evlist=0xefbfce6c)
at ../../lwp/lwp.c:567
#6 0x3a140 in LWP_WaitProcess (event=0x4c60) at ../../lwp/lwp.c:585
#7 0x5345 in main (argc=0, argv=0xefbfd8b8) at ../../arlad/arla.c:910
(gdb)
Looks like conn_get returned NULL, which means that either internal_get
returned NULL, or e->parent was NULL and e->flags.alivep was zero. A
NULL return from internal_get should only happen if connected_mode is
DISCONNECTED, but gdb shows it as being CONNECTED. I'm guessing that
the "network is down" messages imply that the connection's alivep flag
may have been zero.
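
To make that reasoning concrete, here is a minimal sketch of the failure
path and of the kind of NULL check that would avoid the crash. It is not
arlad's code: the struct layout, model_conn_get, stub_give_up_callbacks
and give_up_callbacks_checked are all hypothetical stand-ins, and the
choice of ENETDOWN as the error is my assumption. Only the two NULL
conditions come from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for arlad's ConnCacheEntry; the real layout differs. */
typedef struct ConnCacheEntry {
    struct ConnCacheEntry *parent;
    int alivep;        /* stands in for e->flags.alivep */
    void *connection;  /* stands in for the rx connection handle */
} ConnCacheEntry;

/*
 * Model of the lookup as described above: the result is NULL if the
 * internal lookup found nothing, or if the entry has no parent and is
 * not marked alive.
 */
ConnCacheEntry *model_conn_get(ConnCacheEntry *found)
{
    if (found == NULL)
        return NULL;
    if (found->parent == NULL && !found->alivep)
        return NULL;
    return found;
}

/* Stub for the RPC; like the real call, it needs a valid connection. */
int stub_give_up_callbacks(void *connection)
{
    assert(connection != NULL);
    return 0;
}

/* Check the lookup result before using conn->connection. */
int give_up_callbacks_checked(ConnCacheEntry *found)
{
    ConnCacheEntry *conn = model_conn_get(found);

    if (conn == NULL)
        return ENETDOWN;  /* assumed error code; no RPC is attempted */
    return stub_give_up_callbacks(conn->connection);
}
```

With an entry that has no parent and alivep == 0, the checked version
returns an error instead of dereferencing a NULL pointer the way frame
#0 in the backtrace did.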
And entry->host does correspond to the host holding the volume I was
examining. (However, using Transarc "fs whereis" on the volume after
restarting arlad, I get a backwards IP address printed out,
"30.0.185.18" when it should presumably be "18.185.0.30" or
"cronos.mit.edu". Perhaps AFS and Arla are using different byte
orders for that datum.)
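
The reversed address is what a byte-order mismatch on a 32-bit IPv4
address would look like. A small sketch, assuming the address is held
in a uint32_t and one side skips the byte swap; swap32, ipv4_octets and
print_ipv4 are illustrative names, not anything from Arla or Transarc
AFS:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Split an address into dotted-quad octets, most significant byte first. */
void ipv4_octets(uint32_t addr, unsigned out[4])
{
    assert(out != NULL);
    out[0] = (addr >> 24) & 0xFFu;
    out[1] = (addr >> 16) & 0xFFu;
    out[2] = (addr >> 8) & 0xFFu;
    out[3] = addr & 0xFFu;
}

/* Print an address in dotted-quad form. */
void print_ipv4(uint32_t addr)
{
    unsigned o[4];

    ipv4_octets(addr, o);
    printf("%u.%u.%u.%u\n", o[0], o[1], o[2], o[3]);
}
```

Calling print_ipv4(0x12B9001Eu) prints 18.185.0.30, while
print_ipv4(swap32(0x12B9001Eu)) prints 30.0.185.18, which matches the
address "fs whereis" showed.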
Unfortunately, I didn't have debug logging turned on after recently
rebooting my system.
Ken