Another debug run with the arla lockup problem
Neulinger, Nathan R.
nneul at umr.edu
Wed Feb 17 18:11:17 CET 1999
I can get this particular problem to reproduce itself 100% reliably.
Sequence of actions:
1. Boot machine
2. Clear cache completely
3. Start arla
4. Enable arla and xfs debugging
5. Start web server
(the above is all in rc startup stuff)
6. Log in as root and run ~websrch/bin/run-build
Symptom: run-build sits there eating cpu, but not doing anything
Contents of run-build:
----------------------
#!/bin/csh
set echo
setenv LOG /local/logs/build-`date +%Y%m%d`.log
find /local/logs -mtime +4 -exec rm -f {} \;
rm -f $LOG
touch $LOG
/umr/s/websrch/bin/build >& $LOG </dev/null &
----------------------
build is another script. No output is ever written to LOG and it appears
that build is never started, even though run-build does exit and continue in
the background.
As indicated in the debug trace
(http://www.umr.edu/~nneul/debug-traces/arla-webindex-19990217.gz), arla
seems to get into a state where it spins doing getnode/installnode with no
xfs logging activity. That is around line 1000 of the log. It looks like the
last xfs activity taking place is an xfs_node_find.
This is running the 2-13 snapshot of arla with my setgroups patch.
Oh, BTW, something else: if I leave this machine running, at some point it
seems to get into a state where it spins doing clear_all_childs. (I can't get
into the machine to capture the debug output, and the console is scrolling
too fast to get any more details.)
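
If a syslog capture of such a spin does become available, a rough way to see which message dominates is to tally the first word of each message over the tail of the log. This is only a sketch: the helper names (leading_name, dominant_messages) and the default window of 200 lines are assumptions made for illustration, and it relies on the "host source: message" layout seen in the trace excerpt in this post.

```python
from collections import Counter


def leading_name(line):
    """Return the first word of the message part of a syslog line, or None.

    The message part is taken to be everything after the first ": ",
    which skips the timestamp, host, and "kernel" / "arla[pid]" tag.
    """
    _, sep, msg = line.partition(": ")
    if not sep:
        return None  # wrapped continuation line or unrelated text
    words = msg.replace(":", " ").split()
    return words[0] if words else None


def dominant_messages(lines, tail=200, top=3):
    """Tally leading message words over the last `tail` lines of a list."""
    names = [n for n in map(leading_name, lines[-tail:]) if n]
    return Counter(names).most_common(top)
```

A spin should show up as one or two names accounting for nearly all of the window.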
993 Feb 17 10:25:15 webindex kernel: xfs_node_find
994 Feb 17 10:25:15 webindex kernel: xfs_message_installnode: dp: c35226a8
995 Feb 17 10:25:15 webindex kernel: xfs_message_installnode: fetching new node
996 Feb 17 10:25:15 webindex kernel: new_xfs_node 0.536956207.219.1050
997 Feb 17 10:25:15 webindex kernel: xfs_node_find
998 Feb 17 10:25:15 webindex arla[350]: worker 0: processing
999 Feb 17 10:25:15 webindex arla[350]: Rec message: opcode = 4 (getnode), s
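
For anyone who wants to locate the same point in the full trace without paging through it, the following is a minimal sketch that parses lines shaped like the excerpt above and reports the last one logged by the kernel. The names LINE_RE, parse_line, and last_kernel_message are made up for this example, and the pattern assumes the leading line number is optional, since it may not be present in the raw log.

```python
import re

# Matches lines shaped like the excerpt, for example
# "997 Feb 17 10:25:15 webindex kernel: xfs_node_find".
# The leading line number is treated as optional (an assumption).
LINE_RE = re.compile(
    r"^\s*(?:(?P<num>\d+)\s+)?"
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d)\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<source>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?:\s*"
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Split one log line into fields; return None if it does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


def last_kernel_message(lines):
    """Return the parsed fields of the last line whose source is 'kernel'."""
    last = None
    for line in lines:
        rec = parse_line(line)
        if rec and rec["source"] == "kernel":
            last = rec
    return last
```

The function accepts any iterable of text lines, so a gzipped trace can be passed in directly as a file object opened with gzip.open(path, "rt"). Wrapped continuation lines do not match the pattern and are skipped.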
Looking at the code, it looks like it never gets past new_xfs_node in the
following.
    XFSDEB(XDEBMSG, ("xfs_message_installnode: fetching new node\n"));
    n = new_xfs_node(&xfs[fd], &message->node); /* VN_HOLD's */
    XFSDEB(XDEBMSG, ("xfs_message_installnode: inode: %p aliases: ",
                     XNODE_TO_VNODE(n)));
Now, if xfs_node_find were to crash or never return - would arlad get
confused?
I'm thinking of adding a bunch of debug statements scattered around that
area of the code to see if I find anything.
-- Nathan
------------------------------------------------------------
Nathan Neulinger EMail: nneul at umr.edu
University of Missouri - Rolla Phone: (573) 341-4841
Computing Services Fax: (573) 341-4216