Replicated servers, milko?

Sat Jun 19 00:12:35 CEST 1999

On Friday, June 18, 1999, 5:51 PM -0400 Lyle Seaman
<LSeaman at stormsystems.com> wrote:

> It depends on how stringent you want your synchronization semantics to be.
> If you want guaranteed read-write replication, the client has to keep
> track of changes until the changes become "durable", in whatever form
> that takes. At present, "durable" simply means resident in the sole
> server's buffer cache.  If you use a quorum model of durability, then the
> client has to keep track of all the changes until the quorum is reached.

Or you can do what Ubik does - let the master server be responsible for
contacting a quorum, and simply return an error to the client if an
operation fails without a quorum being reached.  The advantage to this
approach is that it could probably be done in such a way that unmodified
clients would continue to work (of course, clients that want to take full
advantage of R/W replication would have to understand the semantics of
multiple R/W sites in the VLDB).
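
A minimal sketch of that master-driven approach, in Python. All names
here (Replica, Master, QuorumNotReached) are hypothetical and are not
Ubik's actual interface; the point is only that the master, not the
client, counts acknowledgements and reports failure.

```python
class QuorumNotReached(Exception):
    """Raised when too few sites acknowledge a write."""


class Replica:
    """Hypothetical secondary R/W site."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def apply(self, key, value):
        # An unreachable replica cannot acknowledge the write.
        if not self.up:
            return False
        self.data[key] = value
        return True


class Master:
    """Hypothetical master that contacts the quorum on the client's behalf."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.data = {}

    def write(self, key, value):
        # The master counts itself as one of the sites.
        total = len(self.replicas) + 1
        needed = total // 2 + 1  # strict majority
        acks = 1 + sum(1 for r in self.replicas if r.apply(key, value))
        if acks < needed:
            # Assumption: undoing writes on replicas that did
            # acknowledge is left out of this sketch.
            raise QuorumNotReached(f"{acks} of {total} sites acknowledged")
        self.data[key] = value
```

An unmodified client only ever sees write() succeed or fail; it never
has to track pending changes itself.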

> Yes, that's the Sprite and DFS model. The problem with doing only that in
> AFS is that the maximum callback interval is 4.5 hours.  Forcing servers
> to be effectively read-only for the first 4+ hours of uptime is untenable.
> Even with the DFS model, the server goes through a "token recovery period"
> for a few minutes during which it is less than completely useful.  For
> maximal availability, that recovery period needs to be minimal.

Yes, I agree.  I wasn't aware that the maximum callback interval was quite
that large.  The other thing you could do is have the new master
reinitialize the callback state on each client it hears from.  This would
have the same effect as if a single server crashed, recovered, and took an
update, all between the time a long callback was granted and when the
client in question tried to talk to the server again.
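
A rough sketch of that idea, in Python. The Client and NewMaster
classes and their methods are hypothetical, not the real cache manager
or file server interfaces: the new master remembers which clients it
has heard from since taking over, and tells each one to drop its old
callbacks on first contact.

```python
class Client:
    """Hypothetical cache manager holding callback promises."""

    def __init__(self, name, callbacks=()):
        self.name = name
        self.callbacks = set(callbacks)

    def init_callback_state(self):
        # Discard every callback promise; cached data must be
        # revalidated against the server before it is used again.
        self.callbacks.clear()


class NewMaster:
    """Hypothetical server that has just taken over as master."""

    def __init__(self):
        # Clients heard from since the takeover.
        self.seen = set()

    def handle_request(self, client, fid):
        if client.name not in self.seen:
            # First contact: callbacks granted by the old master
            # cannot be trusted, so reset the client's state once.
            client.init_callback_state()
            self.seen.add(client.name)
        # Grant a fresh callback for the file being accessed.
        client.callbacks.add(fid)
```

Only the first request from each client pays the cost of the reset, so
the new master can accept updates immediately instead of waiting out
the longest outstanding callback.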

-- Jeff