Problems with many open files using Arla and Mosix

Jacob Gorm Hansen jg at ioi.dk
Fri Jun 8 12:56:29 CEST 2001


On Fri, Jun 08, 2001 at 02:40:59AM +0200, Magnus Ahltorp wrote:
> Excuse me for not knowing anything about Mosix, but where are you
> running it? On the server or on the clients, or both? The "AFS
> clients" that you say cannot allocate more memory, is that Arla, or do
> you mean tar? What error message do you get?
> 
> Do you mean that the client opens that many files? tar shouldn't open
> lots of files, just one at a time (except for directories).

I had some sleep and did some more testing.

Right now it seems to boil down to this:

If Mosix is not patched into the kernel, everything is fine. As soon as Mosix
goes in, problems start to occur - even though actual process migration has
not been enabled at all.

If the client (not necessarily also the server) has Mosix, performing lots of
file operations in succession (like rm -Rf on the whole kernel tree) will
fail after a while with a 'Cannot allocate memory' error.
After this, Arla fails with the same memory error when trying to 'ls' any
directory not in its cache (strace shows that lstat64() returns -1).
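
For anyone who wants to trigger this without a full kernel tree, here is a
minimal sketch in Python of the kind of load described above: create, lstat
and remove many small files in succession and stop at the first error. The
names stress_file_ops and is_enomem, the file naming scheme and the counts
are made up for illustration; the temporary directory is only a stand-in and
would have to be replaced by a directory under /afs to exercise Arla.

```python
import errno
import os
import tempfile


def stress_file_ops(root, count=1000):
    """Create, lstat and remove `count` small files under `root`.

    Returns None if every operation succeeds, otherwise a tuple
    (index, operation, OSError) describing the first failure.
    """
    for i in range(count):
        path = os.path.join(root, "stress-%06d" % i)
        op = "open"
        try:
            with open(path, "w") as f:
                f.write("x")
            op = "lstat"
            os.lstat(path)
            op = "unlink"
            os.unlink(path)
        except OSError as e:
            return (i, op, e)
    return None


def is_enomem(result):
    """True if the recorded failure is 'Cannot allocate memory' (ENOMEM)."""
    return result is not None and result[2].errno == errno.ENOMEM


if __name__ == "__main__":
    # Stand-in directory (assumption); use a path under /afs for a real test.
    with tempfile.TemporaryDirectory() as d:
        r = stress_file_ops(d, 100)
        if r is None:
            print("no failure")
        else:
            print("failed at file %d during %s: %s" % r)
```

If is_enomem() returns True for the result, the run hit the same error that
rm and ls report here.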

Unmounting /afs, killing arlad and removing the Arla xfs.o kernel module,
and then restarting and remounting, makes the problem go away, so the
'broken' state seems to be held by Arla, even though I suspect Mosix to be
the original cause of trouble.

I no longer suspect the number of open files to be the culprit, since
this number stays stable on the client (the OpenAFS server still uses way too
many IMHO but never mind that for now).
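
One way to watch that number on a Linux client is to count the entries in
/proc/<pid>/fd for the process of interest (arlad, for instance). A minimal
sketch in Python, assuming /proc is mounted and readable for that pid; the
function name count_open_fds and the proc_root parameter are made up for
illustration:

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Return the number of open file descriptors of process `pid`.

    Each open descriptor shows up as one entry in <proc_root>/<pid>/fd.
    Raises OSError if the directory is missing or not readable.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    return len(os.listdir(fd_dir))
```

Calling this repeatedly while the file operations run shows whether the
count grows or stays flat.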

It should be possible to reproduce the problem by building a 2.4.5 kernel
patched with Mosix 1.0.3, building Arla 0.35.4 for it, and then untarring the
Linux kernel into /afs space. We currently do not use Kerberos, so everything
is done with system:anyuser rights, but the problem has also been seen when
properly logged in. The server need not run Mosix for things to go wrong.

I see great potential in combining Mosix and AFS, especially for distributed
makes, due to the caching advantages AFS holds over NFS or Mosix' MFS.

Any help will be appreciated (we have to turn in a paper on this next week ;-)).

Best,
Jacob
-- 
always always avoid redundancy

More information about the Arla-drinkers mailing list