Debugging arla on FreeBSD 4.0

Robert Ricci ricci at siren.eng.utah.edu
Sat Apr 1 05:24:46 CEST 2000


Well, I traced the function calls back to process_message(), and
discovered that:

At the point where xfs_message_recieve() was called (leading to
the eventual assertion failure), header->size was 40, but
msg_len was only 16. To me this would suggest that arlad was
somehow passed an invalid message by the kernel module. Looking
at the calling stack frame, process_message() was called with a
msg_len of 65536. Coincidentally, this is MAX_XMSG_SIZE.
Hmm....

Looks to me like the queue for the channel filled up, with more
than 2^16 bytes of messages to send. The message that caused the
core dump was sent by xfs_reclaim, so maybe this happened during
a period of cache cleaning?
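
The numbers fit a read that was cut off at the buffer limit:
65536 = 1638 * 40 + 16. So if the buffer held nothing but 40-byte
messages (an assumption; the core dump only shows the size of the last
one), the final message would have exactly 16 bytes left. Below is a
minimal sketch in C of that arithmetic and of the bounds check that
would catch it. trailing_bytes(), message_fits() and
SKETCH_MAX_XMSG_SIZE are hypothetical names for illustration, not
arlad's code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: MAX_XMSG_SIZE equals the msg_len seen in the stack frame. */
#define SKETCH_MAX_XMSG_SIZE 65536u

/*
 * Hypothetical stand-in for the message loop: walk a buffer of buf_len
 * bytes holding back-to-back messages of msg_size bytes each, and
 * return how many bytes are left over for a final, truncated message
 * (0 if the buffer ends exactly on a message boundary).
 */
size_t
trailing_bytes(size_t buf_len, size_t msg_size)
{
    size_t remaining = buf_len;

    assert(msg_size > 0);
    while (remaining >= msg_size)
        remaining -= msg_size;  /* one complete message consumed */
    return remaining;
}

/*
 * The check the crash suggests: a message should only be dispatched
 * if the size in its header fits in what is left of the buffer.
 * Returns 1 if the message is complete, 0 if it is truncated.
 */
int
message_fits(size_t header_size, size_t remaining)
{
    return header_size <= remaining;
}
```

With these definitions, trailing_bytes(SKETCH_MAX_XMSG_SIZE, 40)
returns 16, and message_fits(40, 16) returns 0, which matches the
header->size and msg_len values seen at the failing call.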

I'm still looking at the code, so I'll let you know if I have
any other insights.

Thus spake Assar Westerlund on Sat, Apr 01, 2000 at 01:47:34AM +0200:
> Robert P Ricci <ricci at eng.utah.edu> writes:
> > I've had arlad 0.32 crash on me a few times in the past weeks.
> > After examining a core dump from the latest crash, I was able
> > to track the problem down to a failed assertion in volcache.c,
> > line 588:
> > 
> > if (db_servers == NULL || num_db_servers == 0) {
> >         arla_warnx (ADEBWARN,
> >                     "Cannot find any db servers in cell %d(%s) while "
> >                     "getting data for volume `%s'",
> >                     cell, cell_num2name(cell), name);
> > ----->  assert (cell_is_sanep (cell)); <-----
> >         return ENOENT;
> >     }
> > }
> > 
> > Any ideas on how arlad could get into this state, or what code I
> > should look at to investigate it further? Thanks!
> 
> I assume that `cell' (and the rest of the parameters) are garbage?  If
> that's the case, I'm afraid that my theory is that something is
> sending down a bogus Fid to the volume cache and that's why it's
> crashing there.  Can you tell us from where the bogus information has
> been propagated, i.e. what is the source of the bogus information
> that get_info_loop() has gotten?
> 
> /assar

-- 
/-----------------------------------------------------------
| Robert Ricci - <ricci at eng.utah.edu>
| University of Utah - CADE Lab operator
| "Boredom comes to those who wait" - The Pietasters
\-----------------------------------------------------------




