During integration my project with Ehcache (with BlockingCache decorator, which is internally using ReentrantLock) I found some strange behaviour on one machine. From time to time, threads that are waiting for a lock acquisition via ReentrantLock.tryLock(timeout,TimeUnit)-call for time greater than TIMEOUT are not being timeouted (with "false" result). JVM left them behind, and they are block until this lock is released by another thread. In short: from time to time ReentrantLock.tryLock(long timeout, TimeUnit unit) behaves like ReentrantLock.lock().
This situation occurs only on one machine, with a spec:
- windows server 2003 R2 (5.2.3790 SP2 build 3790)
- vmware esx 3.5
- xeon x5675 (2x cpu x86 6 Model 44 Stepping 2 GenuineIntel)
I ran it on jre 1.6(1.6.0_31, 1.6.0_37)
I tested it on 2 other machines, and it works perfectly (this strange issue can not be reproduced there).
Here is my test code: It starts second thread which holds lock for 5seconds. Main thread tries to acquire it (with timout=50ms). From time to time (it breaks about 14 times for 1000 iterations on the machine) main thread does not timeout after 50ms but waits until second thread releases the lock (=5s).
@Test
public void test() throws Exception {
System.out.println("debug___");
int failCount = 0;
final int REPEAT_TIMES = 50 * 1000;
for (int i = 0; i < REPEAT_TIMES; i++) {
System.out.println("");
try {
testOne(i);
System.out.println("debug___" + i + ":ok");
} catch (Throwable e) {
failCount++;
System.err.println(i + ":" + e.getMessage());
System.out.println(i + ":" + e.getMessage());
System.out.println("debug___" + i + ":fail");
}
}
System.out.println("fails:" + failCount);
Assert.assertEquals("Failed " + failCount + "/" + REPEAT_TIMES, 0, failCount);
}
public void testOne(final int i) throws Exception{
final StringBuilder sb = new StringBuilder();
try {
final ReentrantLock lock = new ReentrantLock();
final Semaphore s = new Semaphore(0);
Thread t1 = new Thread() {
@Override
public void run() {
try {
sb.append(i + ":" + 1).append("\n");
boolean locked = lock.tryLock(1000, TimeUnit.MILLISECONDS);
sb.append(i + ":" + 2 + " " + locked).append("\n");
s.release();
sb.append(i + ":" + 3).append("\n");
Thread.sleep(5000); // !!! release lock after 5s
sb.append(i + ":" + 4 + " \n");
lock.unlock();
Thread.sleep(2000);
} catch (Throwable e) {
e.printStackTrace();
}
}
};
t1.start();
sb.append(i + ":" + "m1").append("\n");
s.acquire();
sb.append(i + ":" + "m2").append("\n");
long start = System.currentTimeMillis();
boolean tryLock = lock.tryLock(50, TimeUnit.MILLISECONDS); // try acquire lock for 50ms
long stop = System.currentTimeMillis();
long diff = stop - start;
if (tryLock)
throw new IllegalStateException("it is really bad.... blocked for a [ms]:" + diff);
} finally {
System.out.print(sb.toString());
}
}
Thanks for any suggestions!
UPDATE
Here's a second tests that blocks the machine:
import java.util.concurrent.locks.*;
import java.util.concurrent.*;
public class TestLock2 {
final ReentrantLock lock = new ReentrantLock();
volatile int inx = 0;
public static void main(String[] a) throws Exception {
TestLock2 t = new TestLock2();
t.test();
}
public void log(String s) {
System.out.println(s);
System.err.println(s);
}
public void test() throws Exception {
Thread t1 = new Thread() {
@Override
public void run() {
lock.lock();
System.out.println("ok");
while (true) {
try {
Thread.sleep(60 * 1000);
log(inx
+ " "
+ new java.util.Date());
} catch (Exception e) {
}
}
}
};
t1.start();
Thread.sleep(500); // wait for lock to be locked
System.out.println("debug___");
int failCount = 0;
final int REPEAT_TIMES = 20000;
for (int i = 0; i < REPEAT_TIMES; i++) {
try {
inx = i;
testOne(i);
} catch (Throwable e) {
failCount++;
log(i + ":" + e.getLocalizedMessage());
log("debug___" + i + ":fail");
}
}
log("Failed " + failCount + "/" + REPEAT_TIMES);
}
public void testOne(final int i) throws Exception {
long start = System.currentTimeMillis();
try {
boolean tryLock = lock.tryLock(200, TimeUnit.MILLISECONDS);
} finally {
long diff = System.currentTimeMillis() - start;
log(i + ":" + diff + "ms");
}
}
}
and the output is
0:203ms 1:203ms 2:203ms 3:203ms 4:203ms 5
Sun May 26 10:36:22 EDT 2013 5
Sun May 26 10:37:22 EDT 2013 5
Sun May 26 10:38:22 EDT 2013 ..................... // 20 hours later (log was cut here) 5
Mon May 27 06:00:26 EDT 2013 5
Mon May 27 06:01:26 EDT 2013 5
Mon May 27 06:02:26 EDT 2013 5
Mon May 27 06:03:26 EDT 2013
and Thread Dump:
2013-05-27 04:51:51 Full thread dump Java HotSpot(TM) Client VM (20.6-b01 mixed mode, sharing):
"Thread-0" prio=6 tid=0x02b78400 nid=0xdf0 waiting on condition [0x02ebf000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at TestLock2$1.run(TestLock2.java:26)
"Low Memory Detector" daemon prio=6 tid=0x02b5a400 nid=0x218c runnable [0x00000000] java.lang.Thread.State: RUNNABLE
"C1 CompilerThread0" daemon prio=10 tid=0x02b55400 nid=0x7a4 waiting on condition [0x00000000] java.lang.Thread.State: RUNNABLE
"Attach Listener" daemon prio=10 tid=0x02b52c00 nid=0x2450 waiting on condition [0x00000000] java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x02b51800 nid=0x11d4 runnable [0x00000000] java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=8 tid=0x02b4c000 nid=0x13bc in Object.wait() [0x02cdf000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x22991148> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) - locked <0x22991148> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
"Reference Handler" daemon prio=10 tid=0x02b47400 nid=0x2634 in Object.wait() [0x02c8f000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x22991048> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <0x22991048> (a java.lang.ref.Reference$Lock)
"main" prio=6 tid=0x002a6c00 nid=0x1fd4 waiting on condition [0x0090f000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x229c10d8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:905) at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1224) at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:416) at TestLock2.testOne(TestLock2.java:58) at TestLock2.test(TestLock2.java:44) at TestLock2.main(TestLock2.java:10)
"VM Thread" prio=10 tid=0x02b0a800 nid=0x11d0 runnable
"VM Periodic Task Thread" prio=10 tid=0x02b65400 nid=0x1efc waiting on condition
JNI global references: 975
Any ideas?