is cache-only-prefixes an nnpfs limitation?

Tomas Olsson tol at stacken.kth.se
Sun Apr 9 22:00:09 CEST 2006


Adam Megacz <megacz at cs.berkeley.edu> writes:
> I often store large media files in AFS and seek around in them (i.e. access
> starting at random locations in the file).  Many of these files would fit
> in my disk cache, but moving the entire file over the network or crowding
> (almost) everything else out of my local cache would be bad.
>
True. ID3 tags and PDF indices are popular examples too.

> If I understand correctly, when used with arlad, nnpfs never gets file
> data directly from arlad -- it just asks arlad to place the file data in
> a particular file and then let the kernel know that it's there
>
Yup.

>   - Do remote-filesystem files have to map 1:1 exactly to local files?
>     Or do remote blocks map to local files?  Or do remote blocks map
>     to local blocks?
>
Traditionally it's 1:1 for data, plus possibly some lookup sugar. nnpfs
just has one "cache file" handle per fid.

For the current block prototype we use the same idea, but simply split the
data into fixed-size chunks in the simplest possible way and use those cache
files the way we use the single file today.  The chunk size is configurable
but assumed to be a power of 2.  For some reason I'm also assuming a 1:1
mapping between (fid, offset) and cache "block" file.
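
The power-of-2 assumption is what keeps the offset arithmetic cheap.  A
minimal sketch of it follows; the function names are hypothetical and not
taken from the nnpfs sources, and 64 KB is just an example chunk size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; names are illustrative, not from nnpfs. */

/* True if x is a nonzero power of two. */
static inline int is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* File offset at which the block containing `off` starts. */
static inline uint64_t block_start(uint64_t off, uint64_t blocksize)
{
    assert(is_pow2(blocksize));
    return off & ~(blocksize - 1);
}

/* Position of `off` inside its block, i.e. inside the cache block file. */
static inline uint64_t block_rest(uint64_t off, uint64_t blocksize)
{
    assert(is_pow2(blocksize));
    return off & (blocksize - 1);
}

/* Ordinal number of the block containing `off`. */
static inline uint64_t block_index(uint64_t off, uint64_t blocksize)
{
    return block_start(off, blocksize) / blocksize;
}
```

With a 64 KB chunk size, a read at file offset 70000 lands in the block that
starts at 65536, at position 4464 within that block's cache file.  The pair
(fid, block_start) is then enough to name the cache block.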

>   - If it is known that a remote file will only be read once, is it
>     possible to avoid writing it to the localfs disk?
>
Currently, not really.  One could fairly easily add code to fetch data
using the pioctl/wakeup_data buffers, but I'm not sure how efficient that
would be.  Once the block handling is in place and stable, one could start
thinking about other ways to store cache data, like anonymous mmap space.
Later.

The HPC folks are almost ready to kill for this (and cacheless writes).
Right now they'll have to resort to linking against arlad to do it.

BTW, what's the current status of ways for the user to communicate such
hints to the fs on various OSes?  I just love Windows in this respect; they
have tons of flags.  They even have FILE_OPEN_FOR_FREE_SPACE_QUERY.  I
wonder if it's ever used.

> I tried reading ./nnpfs/include/nnpfs/nnpfs_message.h from the block
> branch, and it seems to me that when the userland replies with an
> INSTALLDATA message, it tells the kernel not only what localfs file
> contains the requested block, but also *at what offset* in the localfs
> file the requested remotefs block is located.
>
The install messages aren't connected to the requests; there's a separate
WAKEUP to say "it's done" or to signal an error.  So most messages are
standalone and can be sent by the daemon without nnpfs requesting them.
That's useful for readahead of data and stat nodes.  INSTALLDATA includes
the offset of the
block in the fid (not cache file) and enough information to find the cache
file.  For now, we put one block in each cache file for simplicity.

>   - Does this mean that the nnpfs kernel module maintains a big
>     complex mapping of (localfs_inode,offset)<->(remotefs_vnode,block#)?
>
The offset "is" the block#, I'd say:
  (cache block handle)<->(fid,offset)

We're not yet clear about how to implement the cache policies (LRU?), but
the bookkeeping looks like it will be scary.  Finding the data at offset X
in some file needs to be fast, so we'll keep some map/list/tree in the
nnpfs node or maybe a huge global hash.  We need some kind of LRU to do
cache replacement.  We need some sort of reverse mapping to update the node
on eviction.

All in wired kernel memory.  Yikes.  Bright ideas are welcome.
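
To make the three pieces concrete (fast lookup, LRU order, reverse mapping),
here is a minimal user-space sketch.  Everything in it is an assumption for
illustration: the struct and function names are hypothetical, a global hash
is only one of the options mentioned above, and real kernel code would need
locking and a wired-memory allocator instead of calloc/free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 64                 /* hypothetical; must be a power of 2 */

struct node {                       /* stands in for the per-fid nnpfs node */
    unsigned nblocks;               /* cached blocks belonging to this fid */
};

struct block {
    struct node *owner;             /* reverse mapping, used on eviction */
    uint64_t offset;                /* block-aligned offset within the fid */
    struct block *hnext;            /* hash chain */
    struct block *prev, *next;      /* LRU list; head is most recently used */
};

struct cache {
    struct block *bucket[NBUCKETS];
    struct block *head, *tail;
};

static unsigned hash(const struct node *n, uint64_t off)
{
    uint64_t h = (uint64_t)(uintptr_t)n ^ (off >> 12);
    return (unsigned)((h >> 4) ^ h) & (NBUCKETS - 1);
}

static void lru_unlink(struct cache *c, struct block *b)
{
    if (b->prev) b->prev->next = b->next; else c->head = b->next;
    if (b->next) b->next->prev = b->prev; else c->tail = b->prev;
    b->prev = b->next = NULL;
}

static void lru_push_front(struct cache *c, struct block *b)
{
    b->prev = NULL;
    b->next = c->head;
    if (c->head) c->head->prev = b; else c->tail = b;
    c->head = b;
}

/* Find the block for (n, off); a hit also marks it most recently used. */
struct block *cache_lookup(struct cache *c, struct node *n, uint64_t off)
{
    struct block *b;

    for (b = c->bucket[hash(n, off)]; b != NULL; b = b->hnext)
        if (b->owner == n && b->offset == off) {
            lru_unlink(c, b);
            lru_push_front(c, b);
            return b;
        }
    return NULL;
}

/* Record a newly installed block; the caller has checked it is not cached. */
struct block *cache_insert(struct cache *c, struct node *n, uint64_t off)
{
    struct block *b = calloc(1, sizeof(*b));
    unsigned h = hash(n, off);

    if (b == NULL)
        return NULL;
    b->owner = n;
    b->offset = off;
    b->hnext = c->bucket[h];
    c->bucket[h] = b;
    lru_push_front(c, b);
    n->nblocks++;
    return b;
}

/* Evict the least recently used block and update its owning node.
 * Returns 0 and reports (node, offset), or -1 if the cache is empty. */
int cache_evict(struct cache *c, struct node **n, uint64_t *off)
{
    struct block *b = c->tail, **pp;

    if (b == NULL)
        return -1;
    for (pp = &c->bucket[hash(b->owner, b->offset)]; *pp != b;
         pp = &(*pp)->hnext)
        assert(*pp != NULL);        /* every LRU entry must be hashed */
    *pp = b->hnext;
    lru_unlink(c, b);
    b->owner->nblocks--;            /* the reverse mapping at work */
    *n = b->owner;
    *off = b->offset;
    free(b);
    return 0;
}
```

Each cached block costs a handful of pointers here, which is exactly the
per-block overhead that gets uncomfortable once it all has to live in wired
memory.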

> Lastly, is there a separate nnpfs mailing list, or is this the right
> place to discuss these questions?
>
It's the right place.

If y'all want to split nnpfs discussions off to a new list, convince me.  I
just haven't seen a need yet.

/t
