setgroups() problem w/ arla module loaded

Steven N. Hirsch shirsch at adelphia.net
Tue Jun 22 13:23:10 CEST 2004


On Tue, 22 Jun 2004, Harald Barth wrote:

> 
> > Nice work on the latest arla!  It seems to work properly under SuSE 9.1 
> > with the 2.6.4 kernel.
> 
> Almost, as you noticed, but more below.
> 
> > However, with /afs active (arlad and kernel module loaded), postfix is 
> > going down with this in the system logs:
> > 
> > Jun 21 07:51:48 monarch postfix/master[5288]: fatal: setgroups(1, &0): Bad address
> > 
> > If I turn the service off, it runs fine.
> > 
> > Is this a known/expected issue?  Something to do with how you manage PAG 
> > based credentials?
> 
> Right. We (*) noticed the problem directly when we finally tried to
> build on SuSE 9.1. The nnpfs kernel module broke setgroups(). It _is_
> a bug, but it hides quite well if you use the kernel.org kernels which
> we used first. The differences between an x.y.z from kernel.org and an
> x.y.z from SuSE and an x.y.z from Fedora2 can be quite significant.
> After some really frustrating days (at least for me) we finally found
> the bug and it resulted in the stuff you find here:
> 
> file:///afs/stacken.kth.se/ftp/pub/arla/distributions/SuSE-9.1/
>           ftp://ftp.stacken.kth.se/arla/distributions/SuSE-9.1/

Slight correction: That should be:

ftp://ftp.stacken.kth.se/pub/arla/distributions/SuSE-9.1/
                         ^^^

> The bug fixes in the SRPM will be merged or are already merged into
> the main repository.

Ok, I see the fix for the setgroups() issue - thanks.
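
For anyone who wants to check a rebuilt module without restarting postfix, 
here is a minimal sketch in Python that makes the same kind of call as the 
one in my log (one group, gid 0).  The helper name try_setgroups is my own, 
and it has to run as root to get past the kernel's permission check.  It is 
a quick probe, not a thorough test.

```python
import errno
import os


def try_setgroups(groups):
    """Call setgroups(2) with the given list of gids.

    Returns (True, None) on success, or (False, name) on failure, where
    name is the symbolic errno such as "EPERM" or "EFAULT".
    """
    try:
        os.setgroups(groups)
    except OSError as exc:
        # Map the numeric errno to its symbolic name where one is known.
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    return True, None


if __name__ == "__main__":
    # Same shape as the failing postfix call: one group, gid 0.
    ok, err = try_setgroups([0])
    print("setgroups(1, &0):", "ok" if ok else "failed with " + err)
```

With the broken module loaded I would expect this to report EFAULT, which 
is the "Bad address" from the log.  Run without root privileges, it should 
normally report EPERM instead, which says nothing about the module.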

> The arla-0.36-4 RPM has nnpfs kernel modules for the following
> kernels:

Actually, it does not have prebuilt modules for the 2.6.5-7* kernels. Not 
a big deal, since I can build them myself...

> kernel-default-2.6.5-7.75
> kernel-syms-2.6.5-7.75
> kernel-smp-2.6.5-7.75

> If you use another kernel (all days are "new-kernel" days), you need a
> build environment and to use the SRPM. 
> 
> Known issues:
> 
>       * You need an ext[23] partition mounted at /var/cache/arla

Thanks for the warning!  The only misbehavior I saw was that the 'iozone' 
benchmark would simply hang.

>       Reiserfs has mangled my cache several times :-( As ext2 is
>       faster than ext3 and there is no valuable data in there,
>       I use mostly ext2. There are checks in /etc/init.d/arla 
>       to prevent you from doing otherwise. If you know what you
>       are doing, you can modify the script :-)

>       * On SMP, there might still be some locking issues lurking;
>       sometimes processes hang. But that can be solved by
>       interrupting the hanging process with Control-C. Tell
>       us if you see these or even know how to reproduce them.

This actually could be what I'm seeing with 'iozone'.  If so, you can 
reproduce it by grabbing a copy of the older 'iozone' (NOT iozone++) and 
running:

$ iozone auto

in an AFS volume.
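
To make the hang easier to spot and report, the run can be wrapped in a 
timeout.  A rough sketch in Python; the name run_with_timeout and whatever 
timeout you pick are my own choices, not anything from arla or iozone.

```python
import subprocess


def run_with_timeout(cmd, timeout_s, cwd=None):
    """Run cmd and report whether it finished within timeout_s seconds.

    Returns ("exited", returncode) or ("hung", None).  On timeout the
    child is killed; a process stuck in uninterruptible kernel sleep
    would not die, and this call would then block as well.
    """
    try:
        proc = subprocess.run(
            cmd,
            cwd=cwd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "hung", None
    return "exited", proc.returncode
```

Calling run_with_timeout(["iozone", "auto"], 600, cwd=some_afs_dir), with 
some_afs_dir set to a directory inside an AFS volume on your machine, would 
return ("hung", None) if the benchmark is still running after ten minutes.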

> We haven't had a chance to test this stuff over a longer period, so
> surprises might still be lurking. It hasn't blown up my files, but
> that is no guarantee that it won't blow up in your face. Backups are
> a good thing.

You are still ahead of the OpenAFS folks.  Although I'm told they and the 
Linux kernel developers have made peace, there still is nothing usable 
available for the 2.6 kernels.  Your solution for managing PAGs seems 
quite simple.  Is there some reason that hooking the syscall wouldn't work 
for OpenAFS?
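
To spell out what I mean by hooking the syscall: as I understand the 
traditional approach (an assumption on my part, not something I have read 
in the arla source), the PAG rides along as marker entries in the 
process's supplementary group list, so setgroups() has to be wrapped to 
keep those entries when a program replaces its groups.  A toy model in 
Python, with a made-up marker range and made-up names throughout:

```python
# Hypothetical marker range, chosen only for this toy model.
PAG_GID_MIN = 0x7F00
PAG_GID_MAX = 0x7FFF


def is_pag_gid(gid):
    """True if gid falls in the (hypothetical) PAG marker range."""
    return PAG_GID_MIN <= gid <= PAG_GID_MAX


def hooked_setgroups(current, requested):
    """Return the group list a PAG-preserving wrapper would install.

    PAG markers already on the process are kept, markers smuggled in by
    the caller are dropped, and the caller's ordinary gids follow.
    """
    pag = [g for g in current if is_pag_gid(g)]
    plain = [g for g in requested if not is_pag_gid(g)]
    return pag + plain
```

In this model a process holding markers 0x7F00 and 0x7F01 that asks for 
groups [0] ends up with both markers followed by 0, and a process with no 
PAG just gets what it asked for.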

Steve





