Bonnie confuses arla
Tomas Olsson
tol at stacken.kth.se
Wed Dec 8 18:28:23 CET 2004
Friedrich Delgado Friedrichs <delgado at dfn-cert.de> writes:
> delgado at count:/afs/dfn-cert.de/public/TEST$ bonnie -s 2000 -v 3
> Bonnie 1.4: File './Bonnie.12312', size: 2097152000, volumes: 3
> Writing with putc()... Bonnie: drastic I/O error (putc): No space left on device
>
(snip)
> The cache partition is full, but arla reports low cache use (with fs
> getcacheparms). There are no bonnie files in the test directory.
>
> /dev/sda12 1012M 960M 168K 100% /var/cache/afs
>
According to traditional AFS semantics, files are written to the file
servers upon close() and fsync(), and at that time the full length of the
file is written in a single transaction. Also, to keep things "simple",
Arla caches files as normal files on local disk. This means that if you
want to write a 2GB file in AFS, you need a 2GB+ cache size.
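As a rough illustration of that size requirement, here is a minimal sketch that checks whether a file of a given size fits in the free space of the cache partition. The function names are made up for this example and are not part of Arla. The numbers are the ones from the report above.

```python
import shutil


def cache_can_hold(cache_free_bytes: int, file_size_bytes: int) -> bool:
    """Return True if the whole file fits in the free cache space.

    The whole file has to fit because it is only written back to the
    file server on close()/fsync().
    """
    return file_size_bytes <= cache_free_bytes


def check_cache_dir(cache_dir: str, file_size_bytes: int) -> bool:
    """Same check, using the free space of a real directory (hypothetical helper)."""
    return cache_can_hold(shutil.disk_usage(cache_dir).free, file_size_bytes)


# Figures from the report: a 1012M cache partition and a 2097152000-byte file.
# Even a completely empty cache partition of that size is too small.
print(cache_can_hold(1012 * 2**20, 2097152000))  # False
```

On a live system one would call something like check_cache_dir("/var/cache/afs", 2097152000) before starting the benchmark.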
Most of the logic (such as cache size calculations and trimming) is in
arlad, but (IIRC) to avoid the cost of nnpfs-arlad communication, arlad
isn't currently informed of user file size updates until the file is closed
or fsynced or such.
So bonnie basically writes directly to the cache file, which is fast. The
cache partition then fills up before arlad is informed of the file size
change or has a chance to write the data back to the file server. Hence the
error message and the full partition.
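The sequence of events can be sketched with a toy model. This is not Arla code, and the names are invented for illustration. It assumes that the writer appends straight to the cache file and that the daemon's idea of cache usage is only updated when the file is closed.

```python
def simulate_write(partition_bytes, file_bytes, chunk_bytes=1 << 20):
    """Toy model of writing one file through the cache.

    Returns (status, bytes_on_disk, bytes_known_to_daemon).
    """
    on_disk = 0      # bytes actually sitting in the cache file
    daemon_view = 0  # bytes the daemon believes the cache is using
    while on_disk < file_bytes:
        step = min(chunk_bytes, file_bytes - on_disk)
        if on_disk + step > partition_bytes:
            # The partition is full, but the daemon was never told
            # about the growing file, so it still reports low usage.
            return ("ENOSPC", on_disk, daemon_view)
        on_disk += step
    # close(): only now does the daemon learn the final size.
    daemon_view = on_disk
    return ("OK", on_disk, daemon_view)


# Figures from the report: 1012M partition, 2097152000-byte file.
print(simulate_write(1012 * 2**20, 2097152000))
```

In this model the write fails with the partition completely full while the daemon's view of cache usage is still zero, which matches the low usage reported by fs getcacheparms.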
> ls: /afs: No such device
>
> arlad dies in this case and can't be restarted, since the listening
> socket is still in use. Reboot is the only thing that helps.
>
The "No such device" error usually means that arlad has died. But I would
expect the sockets to be released when arlad crashes. If they aren't, we
should probably figure out why. Is it easy to reproduce? Does it always
happen when arlad dies?
> Unfortunately it's not possible to backtrace arlad with gdb:
>
gdb used to be unhappy about our lwp-threads. The easiest thing to do is
probably to compile Arla with pthreads instead. There is also some old info
at http://www.stacken.kth.se/project/arla/gdb.html
> If you need any further information, debug output, whatever, please
> feel free to ask.
>
Well, if you manage to find a good test case, you can use
'fs arladebug almost-all' and 'fs nnpfsdebug almost-all' to turn on a _lot_
of debug logging. The syntax is a bit strange: use -almost-all to remove
flags. There are more selective options, too (try -help).
Arla comes with its own test suite. Go into your build tree, cd tests, set
$WORKDIR and run ./run-tests -all -fast. Skip -fast if you want it to take
some time, or select individual tests to narrow things down.
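If you want to script this, a minimal sketch could look like the following. The helper names, the build tree path and the WORKDIR value are placeholders. Only ./run-tests, -all, -fast and the WORKDIR variable come from the description above.

```python
import os
import subprocess


def run_tests_invocation(workdir, fast=True):
    """Build (argv, env) for the test run; nothing is executed here."""
    argv = ["./run-tests", "-all"]
    if fast:
        argv.append("-fast")
    # Copy the current environment and set WORKDIR for the test suite.
    env = dict(os.environ, WORKDIR=workdir)
    return argv, env


def run_tests(build_tree, workdir, fast=True):
    """Run the suite from <build_tree>/tests and return its exit status."""
    argv, env = run_tests_invocation(workdir, fast)
    tests_dir = os.path.join(build_tree, "tests")
    return subprocess.run(argv, cwd=tests_dir, env=env).returncode


# Example (paths are hypothetical):
# status = run_tests("/usr/src/arla-build", "/afs/example.org/tmp/work")
```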
If errors are easy to reproduce, one can always run the test (or bonnie) in
gdb or under strace to see what part of the test fails.
We really should try to get rid of those oopses.
thanks
/Tomas