odd behavior when server is "down"

Ken Raeburn raeburn at raeburn.org
Sun Jan 10 21:59:26 CET 1999


As I've done before to exercise arla, I was running "du" in a cell at
MIT over a slow modem link.

After some time -- almost half an hour -- I noticed it wasn't getting
anywhere.  I checked the system logs, and found a "lost connection"
message from about ten minutes after I started.  The du itself was
reporting nothing.

I ran "fs checks", and then (1) an "up again" message got logged, and
(2) du reported a whole bunch of "network is down" errors for various
files.  (The "up again" message was logged again about 30 seconds
later.)

IMHO, either the "network is down" error should be returned sooner, or
once the server is known to be up again, the user process should get
the data.  Getting "network is down" only at the point when the server
is known to be available again seems strange.  Is there some reason to
want it this way that's not occurring to me?
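
To make the two alternatives concrete, here is a toy sketch in Python.
It is not arla code and says nothing about how arla is structured
internally; `wait_for_data`, `probe`, `fetch`, and `deadline` are
hypothetical names, and the clock is just a counter.  The point is only
that the caller either gets an error once a deadline passes, or gets
the data as soon as the server answers, and never gets the error at the
moment the server comes back.

```python
def wait_for_data(probe, fetch, deadline, step=1):
    """Toy model of the behavior suggested above (hypothetical, not arla).

    probe(t)  -> True if the server answers at simulated time t
    fetch()   -> returns the data once the server is reachable
    deadline  -> last simulated time at which we still probe
    """
    now = 0
    while now <= deadline:
        if probe(now):
            # Server is known to be up: hand the data to the user process.
            return ("data", fetch())
        now += step
    # Deadline passed with no answer: report the failure now, not later.
    return ("network is down", None)


# Simulated server that comes back at t = 5.
server_up_at = 5
probe = lambda t: t >= server_up_at
fetch = lambda: "payload"

patient = wait_for_data(probe, fetch, deadline=10)   # waits, then gets data
impatient = wait_for_data(probe, fetch, deadline=3)  # gives up early
```

With a deadline of 10 the caller gets the data; with a deadline of 3 it
gets the error before the server returns.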

Jan 10 15:16:49 kr-pc arla[15120]: Lost connection to 18.185.0.35/afs3-fileserver
Jan 10 15:34:11 kr-pc arla[15120]: Server 18.185.0.35/afs3-fileserver up again
Jan 10 15:34:43 kr-pc arla[15120]: Server 18.185.0.35/afs3-fileserver up again
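
For reference, the gaps between those three log entries can be worked
out with a short Python sketch.  The year is an assumption taken from
the date of this message (syslog timestamps do not carry one), and
`parse_events` and `gaps_in_seconds` are names made up for this
illustration.

```python
from datetime import datetime

LOG = """\
Jan 10 15:16:49 kr-pc arla[15120]: Lost connection to 18.185.0.35/afs3-fileserver
Jan 10 15:34:11 kr-pc arla[15120]: Server 18.185.0.35/afs3-fileserver up again
Jan 10 15:34:43 kr-pc arla[15120]: Server 18.185.0.35/afs3-fileserver up again
"""


def parse_events(text, year=1999):
    """Return (timestamp, kind) pairs for arla connection messages."""
    events = []
    for line in text.splitlines():
        if "arla[" not in line:
            continue
        # The first 15 characters of a syslog line are "Mon DD HH:MM:SS".
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if "Lost connection" in line:
            events.append((stamp, "lost"))
        elif "up again" in line:
            events.append((stamp, "up"))
    return events


def gaps_in_seconds(events):
    """Seconds elapsed between each pair of consecutive events."""
    return [int((b[0] - a[0]).total_seconds())
            for a, b in zip(events, events[1:])]


events = parse_events(LOG)
gaps = gaps_in_seconds(events)
```

Here `gaps` is `[1042, 32]`: 17 minutes 22 seconds between the "lost
connection" message and the first "up again", and 32 seconds between
the two "up again" messages.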

Ken

P.S.  I noticed some of the "lost connection"/"up again" message pairs
are only a second or two apart, occasionally four; is there an easy way
to raise the timeout a little so the connection is declared lost less
often in the first place?

More information about the Arla-drinkers mailing list