another arlad crash on netbsd

Ken Raeburn raeburn at raeburn.org
Sun Jan 3 09:22:32 CET 1999


I was running "du" across a modem (PPP) link that was probably carrying
a lot of other traffic as well (mail and news downloads, X11). When I
went to look at the output, after sizes for a good many directories I
saw a lot of "Network is down" messages for individual files, then:

    du: ./.mh/save/1610: Network is down
    du: ./.mh/save/1616: Network is down
    du: ./.mh/save/.mh_sequences: Network is down
    751     ./.mh/save
    du: ./.mh/Zephyr: Operation not supported by device
    du: ./.mh/ANSI_C: Not a directory
    du: ./.mh/tcl: Not a directory

The "Not a directory" messages seem to show up when arlad isn't
running, so I'm guessing that's the point where it crashed, and that
the "Network is down" messages came from heavy load on the PPP link,
but I'm only guessing.

The crash was different this time:

#0  0x9734 in throw_entry (entry=0x118ae0) at ../../arlad/fcache.c:455
455             ret = RXAFS_GiveUpCallBacks (conn->connection, &fids, &cbs);
(gdb) p conn
$1 = (ConnCacheEntry *) 0x0

(gdb) bt
#0  0x9734 in throw_entry (entry=0x118ae0) at ../../arlad/fcache.c:455
#1  0x9c51 in cleaner (arg=0x0) at ../../arlad/fcache.c:567
#2  0x3a2fd in Create_Process_Part2 () at ../../lwp/lwp.c:629
#3  0xfffefdfc in ?? ()
#4  0x1 in ?? ()
#5  0x3a09a in LWP_MwaitProcess (wcount=1, evlist=0xefbfce6c)
    at ../../lwp/lwp.c:567
#6  0x3a140 in LWP_WaitProcess (event=0x4c60) at ../../lwp/lwp.c:585
#7  0x5345 in main (argc=0, argv=0xefbfd8b8) at ../../arlad/arla.c:910
(gdb) 

It looks like conn_get returned NULL, which means that either
internal_get returned NULL, or e->parent was NULL and e->flags.alivep
was zero. A NULL return from internal_get should only happen if
connected_mode is DISCONNECTED, but gdb shows it as CONNECTED. My
guess is that the "Network is down" messages mean the connection's
alivep flag may have been zero.

And entry->host does correspond to the host holding the volume I was
examining. (However, when I run Transarc's "fs whereis" on the volume
after restarting arlad, it prints the IP address backwards:
"30.0.185.18", when it should presumably be "18.185.0.30" or
"cronos.mit.edu". Perhaps AFS and Arla use different byte orders for
that datum.)

Unfortunately, I didn't have debug logging turned on after recently
rebooting my system.

Ken





More information about the Arla-drinkers mailing list