two bugs and a patch

Nickolai Zeldovich kolya at mit.edu
Fri Jan 5 05:41:16 CET 2001


It looks like rxi_DecongestionEvent decrements the refcount on its
rx_peer too early, so the peer can be recycled while the event handler
is still using it (arlad crashed on me once earlier today, and the
rx_peer was filled with 0xAA, which my osi_Free memsets all freed
memory to). Attached below is a patch which should fix this.
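
To illustrate the ordering problem, here is a minimal standalone sketch,
not the actual rx code. The struct, reap_peer, decongestion_event and
demo_peer_refcount are hypothetical stand-ins, and it assumes that a
separate reaper frees any peer whose refCount has reached zero. The
handler drops its reference only as its final step, so a reaper that
runs in the middle leaves the peer alone.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct rx_peer; only the fields used here. */
struct peer {
    int refCount;
    int burst;
    int burstSize;
};

/* Hypothetical reaper: frees the peer only when nothing references it.
 * The 0xAA fill mimics a debugging osi_Free. Returns 1 if freed. */
static int reap_peer(struct peer **slot)
{
    if (*slot == NULL || (*slot)->refCount > 0)
        return 0;
    memset(*slot, 0xAA, sizeof(**slot));
    free(*slot);
    *slot = NULL;
    return 1;
}

/* Event handler; the code that scheduled it already bumped refCount.
 * Returns 1 if the reaper managed to free the peer mid-handler. */
static int decongestion_event(struct peer **slot, int nPackets)
{
    struct peer *p = *slot;
    int reaped;

    p->burst += nPackets;
    if (p->burst > p->burstSize)
        p->burst = p->burstSize;

    /* Simulate the reaper getting to run while the handler is active. */
    reaped = reap_peer(slot);

    if (!reaped)
        p->refCount--;   /* drop the reference only as the last step */
    return reaped;
}

/* Usage: returns 1 if the peer survived the handler and was reaped after. */
static int demo_peer_refcount(void)
{
    struct peer *p = calloc(1, sizeof(*p));
    if (p == NULL)
        return -1;
    p->refCount = 1;     /* reference taken when the event was scheduled */
    p->burstSize = 4;

    if (decongestion_event(&p, 10) != 0)
        return 0;        /* peer was freed while still in use */
    if (p == NULL || p->burst != 4 || p->refCount != 0)
        return 0;
    return reap_peer(&p) == 1 && p == NULL;
}
```

If the decrement were moved to the top of decongestion_event, the
simulated reaper would free the peer and every later access to p would
touch freed memory, which matches the 0xAA pattern described above.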

I also seem to have run into a bug with arla on FreeBSD 4.2, where
the contents of a directory don't seem to be revalidated, even after
the callback from the fileserver has been broken. It doesn't seem to
be perfectly deterministic, but at the moment I'm seeing this:

  freebsd% cd /afs/zepa.net/user/kolya
  freebsd% ls | grep -c -w Q
  0
  aix% cd /afs/zepa.net/user/kolya
  aix% touch Q ; ls | grep -c -w Q
  1
  freebsd% ls | grep -c -w Q ; ls Q
  0
  Q
  freebsd% cd / ; ls /afs/zepa.net/user/kolya | grep -c -w Q
  1

Another bug I've run into so far is a concurrency problem on the
rx_connHashTable, in particular in rxi_ReapConnections. Here's some
information on the crash (as I mentioned above, my osi_Free memsets
everything to 0xAA):

(gdb) bt
#0  0x8086b79 in rxi_CheckCall (call=0xaaaaaaaa) at rx.c:3237
#1  0x8087669 in rxi_ReapConnections () at rx.c:3596
#2  0x808b80b in rxevent_RaiseEvents (next=0x80f5f10) at rx_event.c:213
#3  0x808bc34 in rxi_Listener () at rx_user.c:283
[...]
(gdb) frame 1
#1  0x8087669 in rxi_ReapConnections () at rx.c:3596
3596                            rxi_CheckCall(conn->call[i]);
(gdb) print *conn
$1 = {next = 0xaaaaaaaa, peer = 0xaaaaaaaa, ...

Should there be some per-entry locking on rx_connHashTable, and maybe
some other hash tables in arla as well? It looks like LWP is preemptive
so something of this sort would be required, unless I'm confused.
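
As a rough sketch of one discipline that might help, and not the rx
implementation: the reaper below skips any entry that is still
referenced, and unlinks an entry from its bucket before freeing it, so
the traversal never follows a pointer into freed memory. All names
(struct conn, conn_table, conn_add, reap_connections, demo_reap) are
hypothetical. It assumes a single thread of control; if LWP really can
preempt, the table would additionally need a lock around these
operations.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct rx_connection. */
struct conn {
    struct conn *next;
    int refCount;   /* > 0 while some other code is using the entry */
    int idle;       /* nonzero once the connection may be reaped */
};

#define HASH_SIZE 4
static struct conn *conn_table[HASH_SIZE];

/* Prepend a new connection to a bucket; returns NULL on failure. */
static struct conn *conn_add(unsigned bucket, int idle)
{
    struct conn *c = calloc(1, sizeof(*c));
    if (c == NULL)
        return NULL;
    c->idle = idle;
    c->next = conn_table[bucket % HASH_SIZE];
    conn_table[bucket % HASH_SIZE] = c;
    return c;
}

/* Free every idle, unreferenced connection. Returns the number freed. */
static int reap_connections(void)
{
    int freed = 0;
    unsigned i;

    for (i = 0; i < HASH_SIZE; i++) {
        struct conn **link = &conn_table[i];
        while (*link != NULL) {
            struct conn *c = *link;
            if (c->idle && c->refCount == 0) {
                *link = c->next;   /* unlink first, then free */
                free(c);
                freed++;
            } else {
                link = &c->next;   /* keep the entry, move on */
            }
        }
    }
    return freed;
}

/* Usage: returns 1 if pinned entries survive until they are released. */
static int demo_reap(void)
{
    struct conn *a = conn_add(0, 1);   /* idle, unreferenced */
    struct conn *b = conn_add(0, 1);   /* idle, but pinned below */
    struct conn *c = conn_add(1, 0);   /* still active */
    int first, second, third;

    if (a == NULL || b == NULL || c == NULL)
        return -1;
    b->refCount++;
    first = reap_connections();        /* frees only a */
    b->refCount--;
    second = reap_connections();       /* now frees b */
    c->idle = 1;
    third = reap_connections();        /* finally frees c */
    return first == 1 && second == 1 && third == 1;
}
```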

-- kolya

--- rx.c	2000/11/25 22:36:28	1.16
+++ rx.c	2001/01/04 18:16:26
@@ -3398,7 +3402,6 @@
     struct rx_call *call;
     struct rx_call *nxcall;   /* Next pointer for queue_Scan */
 
-    peer->refCount--;		       /* It was bumped by the callee */
     peer->burst += nPackets;
     if (peer->burst > peer->burstSize)
 	peer->burst = peer->burstSize;
@@ -3415,8 +3418,10 @@
 	 */
 	rxi_Start((struct rxevent *) 0, call);
 	if (!peer->burst)
-	    return;
+	    goto done;
     }
+done:
+    peer->refCount--;		       /* It was bumped by the callee */
 }
 
 /*




