In fact, the test I described above, with the connect timeout set to 1000 ms and the response timeout to 2000 ms, was run with a single thread (user).
So the socket error was probably due to a connect timeout that was too low.
I changed those parameters and set the connect timeout to 60 000 ms (1 min) and the response timeout to 360 000 ms (6 min). Occasionally a request never gets a response, and we cap those at 5 minutes; this is very rare, but it was blocking the scenario.
I removed the following from the JMeter.bat file:
set NEW=-XX:NewSize=128m -XX:MaxNewSize=128m
set SURVIVOR=-XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=50%
set TENURING=-XX:MaxTenuringThreshold=2
set RMIGC=-Dsun.rmi.dgc.client.gcInterval=600000 -Dsun.rmi.dgc.server.gcInterval=600000
set PERM=-XX:PermSize=64m -XX:MaxPermSize=64m
I ran my scenario in batch mode with 50 users. We no longer have blocked threads. Unfortunately, we saw the following for most of our users: a request is played, the server responds quickly (in less than one second), and the next request is played an hour later and gets an HTTP 500 error.
Example: take the thread "Groupe d'unités 1-6". The following was played and written to the JTL file:
<httpSample t="13" lt="13" ts="1410856270124" s="true" lb="/hopex/service.aspx?data=generationType-standard|generator-E98AEA3A4F717715" rc="200" rm="OK" tn="Groupe d'unités 1-6" dt="text" by="412">
<java.net.URL>http://172.16.1.23/hopex/service.aspx?data=generationType-standard|generator-E98AEA3A4F717715</java.net.URL>
</httpSample>
**played at 16/09/2014 10:31:10**
<httpSample t="0" lt="0" ts="1410856270138" s="true" lb="/hopex/statesessionprovider.aspx" rc="200" rm="OK" tn="Groupe d'unités 1-6" dt="text" by="238">
<java.net.URL>http://172.16.1.23/hopex/statesessionprovider.aspx</java.net.URL>
</httpSample>
**played at 16/09/2014 10:31:10**
<sample t="0" lt="0" ts="1410856274818" s="true" lb="Timer between steps" rc="200" rm="OK" tn="Groupe d'unités 1-6" dt="text" by="1478"/>
**played at 16/09/2014 10:31:15**
<httpSample t="3" lt="3" ts="1410860493293" s="false" lb="/Hopex/service.aspx?data=generationType-standard|generator-E98AEA3A4F717715" rc="500" rm="Internal Server Error" tn="Groupe d'unités 1-6" dt="text" by="298">
<java.net.URL>http://172.16.1.23/Hopex/service.aspx?data=generationType-standard|generator-E98AEA3A4F717715</java.net.URL>
</httpSample>
**played at 16/09/2014 11:41:33**
Most of the time, the problems occur just after timers. Here, you can see that the last request was played more than one hour after the previous sample (which was a JMeter timer).
Our application log shows that the last request never reached the application.
So it means JMeter paused for more than one hour before sending the request.
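For anyone who wants to check the pause from the JTL file directly, here is a minimal Python sketch that converts the `ts` values to local time and reports long gaps between consecutive samples of the same thread. The function names, the 60-second threshold and the UTC+2 offset are my own assumptions (the offset is inferred from the "played at" times above), and the sample attributes and labels are trimmed for brevity. The elapsed times here are only a few milliseconds, so comparing the `ts` values alone is enough.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Assumption: the test machine was on UTC+2 (inferred from the "played at" times).
LOCAL_TZ = timezone(timedelta(hours=2))

# Last three samples from the excerpt above, attributes and labels trimmed.
JTL_FRAGMENT = """<testResults>
<httpSample t="0" ts="1410856270138" lb="/hopex/statesessionprovider.aspx" rc="200" tn="Groupe d'unités 1-6"/>
<sample t="0" ts="1410856274818" lb="Timer between steps" rc="200" tn="Groupe d'unités 1-6"/>
<httpSample t="3" ts="1410860493293" lb="/Hopex/service.aspx" rc="500" tn="Groupe d'unités 1-6"/>
</testResults>"""


def to_local(ts_ms):
    """Format a JTL timestamp (milliseconds since the epoch) as local time."""
    moment = datetime.fromtimestamp(ts_ms / 1000, tz=LOCAL_TZ)
    return moment.strftime("%d/%m/%Y %H:%M:%S")


def find_gaps(jtl_xml, threshold_ms=60_000):
    """Return (thread, previous label, next label, gap in ms) for long pauses."""
    previous = {}  # thread name -> (label, ts) of the last sample seen
    gaps = []
    for element in ET.fromstring(jtl_xml):
        if element.tag not in ("sample", "httpSample"):
            continue
        thread = element.get("tn")
        label = element.get("lb")
        ts = int(element.get("ts"))
        if thread in previous:
            prev_label, prev_ts = previous[thread]
            if ts - prev_ts > threshold_ms:
                gaps.append((thread, prev_label, label, ts - prev_ts))
        previous[thread] = (label, ts)
    return gaps


for thread, before, after, gap_ms in find_gaps(JTL_FRAGMENT):
    print(f"{thread}: {gap_ms / 60000:.1f} min between '{before}' and '{after}'")
```

On the three samples above this reports a single gap of about 70 minutes, between the timer and the failing request.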
Note that if we remove the while statement from our scenario, it works.
Note also that the errors do not occur near a while statement.
Since you suspected that the server was overloaded, I recorded Windows counters with Performance Monitor.
The average CPU usage during the test was around 10% (probably because most of the threads stop). At 10:31, the CPU does not go over 30%.
As for memory, 20 GB of RAM were available when the problem occurred.
So I think the server is not overloaded.
I retrieved the following from the JMeter logs. The problem seems to come from JMeter itself, with a stack overflow. I do not know how to solve it. I tried changing the JMeter.bat parameters, but that had side effects.
Here is part of the JMeter log:
2014/09/16 10:30:49 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:30:49 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:30:49 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:30:51 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:31:00 INFO - jmeter.reporters.Summariser: summary + 196 in 30s = 6.5/s Avg: 154 Min: 0 Max: 11347 Err: 0 (0.00%) Active: 50 Started: 50 Finished: 0
2014/09/16 10:31:00 INFO - jmeter.reporters.Summariser: summary = 5974 in 1103s = 5.4/s Avg: 406 Min: 0 Max: 47864 Err: 0 (0.00%)
2014/09/16 10:31:01 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:31:32 INFO - jmeter.reporters.Summariser: summary + 154 in 32s = 4.9/s Avg: 94 Min: 0 Max: 10982 Err: 0 (0.00%) Active: 50 Started: 50 Finished: 0
2014/09/16 10:31:32 INFO - jmeter.reporters.Summariser: summary = 6128 in 1135s = 5.4/s Avg: 399 Min: 0 Max: 47864 Err: 0 (0.00%)
2014/09/16 10:31:37 WARN - jmeter.control.GenericController: StackOverflowError detected
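To see whether these warnings cluster around the moment the threads stall, they can be counted per minute. Below is a minimal Python sketch, assuming the log line format shown above; the function name and the regular expression are my own and not part of JMeter.

```python
import re
from collections import Counter

# Excerpt of the log above: the six warnings plus one summary line.
LOG_EXCERPT = """\
2014/09/16 10:30:49 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:30:49 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:30:49 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:30:51 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:31:00 INFO - jmeter.reporters.Summariser: summary = 5974 in 1103s = 5.4/s Avg: 406 Min: 0 Max: 47864 Err: 0 (0.00%)
2014/09/16 10:31:01 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:31:37 WARN - jmeter.control.GenericController: StackOverflowError detected
"""

# Captures the date and the hour:minute part of each matching warning line.
WARNING_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} WARN .*StackOverflowError detected"
)


def overflows_per_minute(log_text):
    """Count 'StackOverflowError detected' warnings for each minute of the log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


for minute, count in sorted(overflows_per_minute(LOG_EXCERPT).items()):
    print(minute, count)
```

Running the same function over the full jmeter.log and comparing the busiest minutes with the gaps in the JTL file would show whether the two are related.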
I have been working on this problem for a month now and I do not know how to solve it.
If you have any ideas, I would really appreciate it.
Regards
Sylvie