2
votes

During integration my project with Ehcache (with BlockingCache decorator, which is internally using ReentrantLock) I found some strange behaviour on one machine. From time to time, threads that are waiting for a lock acquisition via ReentrantLock.tryLock(timeout,TimeUnit)-call for time greater than TIMEOUT are not being timeouted (with "false" result). JVM left them behind, and they are block until this lock is released by another thread. In short: from time to time ReentrantLock.tryLock(long timeout, TimeUnit unit) behaves like ReentrantLock.lock().

This situation occurs only on one machine, with a spec:

  • windows server 2003 R2 (5.2.3790 SP2 build 3790)
  • vmware esx 3.5
  • xeon x5675 (2x cpu x86 6 Model 44 Stepping 2 GenuineIntel)

I ran it on jre 1.6(1.6.0_31, 1.6.0_37)

I tested it on 2 other machines, and it works perfectly (this strange issue can not be reproduced there).

Here is my test code: It starts second thread which holds lock for 5seconds. Main thread tries to acquire it (with timout=50ms). From time to time (it breaks about 14 times for 1000 iterations on the machine) main thread does not timeout after 50ms but waits until second thread releases the lock (=5s).

@Test
public void test() throws Exception {
    System.out.println("debug___");
    int failCount = 0;
    final int REPEAT_TIMES = 50 * 1000;
    for (int i = 0; i < REPEAT_TIMES; i++) {
        System.out.println("");
        try {
            testOne(i);
            System.out.println("debug___" + i + ":ok");
        } catch (Throwable e) {
            failCount++;
            System.err.println(i + ":" + e.getMessage());
            System.out.println(i + ":" + e.getMessage());
            System.out.println("debug___" + i + ":fail");
        }

    }

    System.out.println("fails:" + failCount);
    Assert.assertEquals("Failed " + failCount + "/" + REPEAT_TIMES, 0, failCount);
}

public void testOne(final int i)  throws Exception{
    final StringBuilder sb = new StringBuilder();
    try {
        final ReentrantLock lock = new ReentrantLock();
        final Semaphore s = new Semaphore(0);

        Thread t1 = new Thread() {
            @Override
            public void run() {

                try {
                    sb.append(i + ":" + 1).append("\n");
                    boolean locked = lock.tryLock(1000, TimeUnit.MILLISECONDS);
                    sb.append(i + ":" + 2 + " " + locked).append("\n");
                    s.release();
                    sb.append(i + ":" + 3).append("\n");
                    Thread.sleep(5000); // !!! release lock after 5s
                    sb.append(i + ":" + 4 + " \n");
                    lock.unlock();
                    Thread.sleep(2000);
                } catch (Throwable e) {
                    e.printStackTrace();
                }

            }
        };

        t1.start();
        sb.append(i + ":" + "m1").append("\n");
        s.acquire();
        sb.append(i + ":" + "m2").append("\n");

        long start = System.currentTimeMillis();

        boolean tryLock = lock.tryLock(50, TimeUnit.MILLISECONDS); // try acquire lock for 50ms

        long stop = System.currentTimeMillis();
        long diff = stop - start;

        if (tryLock)
            throw new IllegalStateException("it is really bad.... blocked for a [ms]:" + diff);
    } finally {
        System.out.print(sb.toString());
    }

}

Thanks for any suggestions!

UPDATE

Here's a second tests that blocks the machine:

import java.util.concurrent.locks.*;
import java.util.concurrent.*;

public class TestLock2 {
final ReentrantLock lock = new ReentrantLock();
volatile int inx = 0;

public static void main(String[] a) throws Exception {
    TestLock2 t = new TestLock2();
    t.test();
}

public void log(String s) {
    System.out.println(s);
    System.err.println(s);
}

public void test() throws Exception {
    Thread t1 = new Thread() {
        @Override
        public void run() {
            lock.lock();
            System.out.println("ok");
            while (true) {
                try {
                    Thread.sleep(60 * 1000);
                    log(inx
                            + "                                                     "
                            + new java.util.Date());
                } catch (Exception e) {
                }
            }
        }
    };
    t1.start();
    Thread.sleep(500); // wait for lock to be locked

    System.out.println("debug___");
    int failCount = 0;
    final int REPEAT_TIMES = 20000;
    for (int i = 0; i < REPEAT_TIMES; i++) {
        try {
            inx = i;
            testOne(i);
        } catch (Throwable e) {
            failCount++;
            log(i + ":" + e.getLocalizedMessage());
            log("debug___" + i + ":fail");
        }
    }
    log("Failed " + failCount + "/" + REPEAT_TIMES);
}

public void testOne(final int i) throws Exception {
    long start = System.currentTimeMillis();

    try {
        boolean tryLock = lock.tryLock(200, TimeUnit.MILLISECONDS);
    } finally {
        long diff = System.currentTimeMillis() - start;
        log(i + ":" + diff + "ms");
    }
}

}

and the output is

0:203ms 1:203ms 2:203ms 3:203ms 4:203ms 5
Sun May 26 10:36:22 EDT 2013 5
Sun May 26 10:37:22 EDT 2013 5
Sun May 26 10:38:22 EDT 2013 ..................... // 20 hours later (log was cut here) 5
Mon May 27 06:00:26 EDT 2013 5
Mon May 27 06:01:26 EDT 2013 5
Mon May 27 06:02:26 EDT 2013 5
Mon May 27 06:03:26 EDT 2013

and Thread Dump:

2013-05-27 04:51:51 Full thread dump Java HotSpot(TM) Client VM (20.6-b01 mixed mode, sharing):

"Thread-0" prio=6 tid=0x02b78400 nid=0xdf0 waiting on condition [0x02ebf000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at TestLock2$1.run(TestLock2.java:26)

"Low Memory Detector" daemon prio=6 tid=0x02b5a400 nid=0x218c runnable [0x00000000] java.lang.Thread.State: RUNNABLE

"C1 CompilerThread0" daemon prio=10 tid=0x02b55400 nid=0x7a4 waiting on condition [0x00000000] java.lang.Thread.State: RUNNABLE

"Attach Listener" daemon prio=10 tid=0x02b52c00 nid=0x2450 waiting on condition [0x00000000] java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x02b51800 nid=0x11d4 runnable [0x00000000] java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=8 tid=0x02b4c000 nid=0x13bc in Object.wait() [0x02cdf000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x22991148> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) - locked <0x22991148> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x02b47400 nid=0x2634 in Object.wait() [0x02c8f000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x22991048> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <0x22991048> (a java.lang.ref.Reference$Lock)

"main" prio=6 tid=0x002a6c00 nid=0x1fd4 waiting on condition [0x0090f000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x229c10d8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:905) at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1224) at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:416) at TestLock2.testOne(TestLock2.java:58) at TestLock2.test(TestLock2.java:44) at TestLock2.main(TestLock2.java:10)

"VM Thread" prio=10 tid=0x02b0a800 nid=0x11d0 runnable

"VM Periodic Task Thread" prio=10 tid=0x02b65400 nid=0x1efc waiting on condition

JNI global references: 975

Any ideas?

1

1 Answers

0
votes

A possible explanation can be that after the main process gets the system time and the thread will start to sleep, the system process scheduler removes your process from execution. After 5 seconds, continues execution: thread already slept 5 seconds and the main threads gets the lock. It's kinda strange, though.

Anyway I recommend you to use a profiler, that will let you to see what really is happening.