arla-0.38 on FreeBSD/amd64

Mon Jan 10 20:51:54 CET 2005

<<On Sat, 08 Jan 2005 14:44:22 +0100, Love <lha at stacken.kth.se> said:

> What types of benchmarks are you running?  For the last few years we
> have concentrated on maintenance, bugfixes, and functionality.  I have
> no doubt that performance could be improved.

I've done a few different sorts of benchmarks: simple
microbenchmarks with `dd' and more complicated benchmarks like
bonnie++.  Here's an interesting test of the `dd' variety:

wollman@wollman-random-testing(37)$ pwd
/afs/csail.mit.edu/u/w/wollman
wollman@wollman-random-testing(38)$ rm foo
rm: foo: No such file or directory
wollman@wollman-random-testing(39)$ time dd if=/dev/zero of=foo bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 2.429002 secs (43169004 bytes/sec)

real    0m22.322s
user    0m0.000s
sys     0m0.543s

[We can see here that the kernel side of arla is doing exactly the
right thing and passing the writes through to the underlying cache
file as fast as it can; the reported speed is very close to the speed
of writing to a local file.  Then we wait 20 seconds at close for the
write-back to the server.  I waited about ten to twenty seconds after
this before executing the next command.]
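As a rough check on that reading, the sketch below redoes the arithmetic from the transcript.  The rate dd reports covers only the writes into the local cache file, while the wall-clock time from time(1) also includes the wait at close.  The variable names are illustrative, and attributing the whole difference to write-back at close is the interpretation given above, not something the script measures.

```python
# Figures copied from the write test above.
NBYTES = 104857600    # 100 records of 1024k each
DD_SECS = 2.429002    # elapsed time reported by dd itself
REAL_SECS = 22.322    # "real" time reported by time(1), which includes close()


def rate(nbytes, secs):
    """Return throughput in bytes per second."""
    return nbytes / secs


dd_rate = rate(NBYTES, DD_SECS)          # speed of writing into the cache file
effective_rate = rate(NBYTES, REAL_SECS)  # speed as seen by the whole command
close_wait = REAL_SECS - DD_SECS          # time not accounted for by dd's writes

print(f"dd-reported rate: {dd_rate / 1e6:.1f} MB/s")
print(f"end-to-end rate:  {effective_rate / 1e6:.1f} MB/s")
print(f"wait at close:    {close_wait:.1f} s")
```

The end-to-end figure, not the one dd prints, is the one to compare against the read tests that follow.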

wollman@wollman-random-testing(40)$ time dd if=foo of=/dev/null bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 30.001047 secs (3495131 bytes/sec)

real    0m30.013s
user    0m0.000s
sys     0m0.162s

[Why is it so slow?  It is clear from network traces that arla had
already turned in its callback on this file -- and did so within a few
seconds of the last close.]

wollman@wollman-random-testing(41)$ time dd if=foo of=/dev/null bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 0.140142 secs (748224137 bytes/sec)

real    0m0.150s
user    0m0.000s
sys     0m0.149s

[But if we give it less than a second from close to open, we still
have the callback and don't have to drag the whole file over the
network again.]

wollman@wollman-random-testing(42)$ time dd if=foo of=/dev/null bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 29.780244 secs (3521046 bytes/sec)

real    0m29.791s
user    0m0.000s
sys     0m0.162s

[Oops, waited too long and had to drag the whole file back over the
network again!]
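To pin down how long the cached copy stays usable, the manual runs above could be automated along the following lines.  This is only a sketch: `timed_read` and `probe` are hypothetical helper names, the delays are whatever the tester chooses, and the script only measures read times.  It cannot tell by itself whether a slow re-read was caused by a returned callback or by something else.

```python
import time


def timed_read(path, bs=1024 * 1024):
    """Read the whole file in bs-sized chunks; return (bytes_read, seconds)."""
    start = time.monotonic()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bs)
            if not chunk:
                break
            total += len(chunk)
    return total, time.monotonic() - start


def probe(path, delays):
    """For each idle delay: read once to warm the cache, sleep, time a re-read.

    Returns a list of (delay, bytes_read, seconds) tuples, one per delay.
    """
    results = []
    for delay in delays:
        timed_read(path)      # warm-up read; its timing is discarded
        time.sleep(delay)     # idle period between close and the next open
        nbytes, secs = timed_read(path)
        results.append((delay, nbytes, secs))
    return results
```

Calling `probe("foo", [0, 1, 5, 10, 20])` in the directory used above and looking for the delay at which the re-read time jumps would give an estimate of how long the file survives in the cache after close.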

wollman@wollman-random-testing(47)$ /usr/local/arla/bin/fs getcacheparms
Arla is using 26 of the cache's available 1851392 1K byte blocks
(and 2 of the cache's available 10000 vnodes)

I don't have an OpenAFS machine with comparable cache parameters to
test against, so this may be the server's fault.  My feeling,
though, is that Arla is a little too eager to return callbacks,
particularly for large files, which are expensive to drag across the
network.

>> Any advice as to whether lwp or pthreads is preferred, going forward?
>> pthreads is at least SMP-capable.

> The pthread code in arla exists to emulate lwp threads, so it has the
> same problem as LWP threads, even when used on an SMP machine.

I don't quite understand this.  If you are using pthreads,
pthread_create() will give you multiple real threads; if you are using
"true" LWP, the best you can do is multiple simulated threads inside a
single real thread.
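
The distinction can be illustrated outside of arla with a small sketch.  It is written in Python rather than C, and it says nothing about how arla's lwp emulation is actually built: generators stand in for simulated threads driven by one scheduler loop, and `threading.Thread` stands in for threads created with pthread_create().  The function names are hypothetical.  Note that CPython's global interpreter lock keeps these threads from running Python code in parallel, so the sketch only shows how each model maps onto kernel threads.

```python
import threading


def run_cooperative(n, steps=3):
    """Run n simulated threads in one real thread; return the kernel ids seen."""
    seen = set()

    def task():
        for _ in range(steps):
            seen.add(threading.get_native_id())
            yield  # voluntarily hand control back to the scheduler loop

    ready = [task() for _ in range(n)]
    while ready:  # simple round-robin scheduler
        t = ready.pop(0)
        try:
            next(t)
            ready.append(t)
        except StopIteration:
            pass
    return seen


def run_kernel_threads(n):
    """Run n real threads; return the set of kernel thread ids seen."""
    seen = set()
    lock = threading.Lock()
    barrier = threading.Barrier(n)

    def worker():
        barrier.wait()  # all n threads exist at once, so ids cannot be reused
        with lock:
            seen.add(threading.get_native_id())

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


print("cooperative tasks use", len(run_cooperative(4)), "kernel thread(s)")
print("real threads use", len(run_kernel_threads(4)), "kernel thread(s)")
```

However many simulated threads the first function schedules, the kernel sees only one, which is why that model cannot spread work across processors.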

-GAWollman