I encountered an issue with the Mule ESB FTP Transport: when polling, the thread running the client would hang indefinitely without throwing an error. This causes FTP poll to stop completely. Mule uses Apache Commons Net FTPClient.
Looking further into the code, I think it is caused by the SocketTimeout of the FTPClient not being set, sometime causing infinite hanging when reading lines from the FTPClient's socket.
We can clearly see the problem in these stacks retrieved with jstack when the problem occured. The __getReply() function seems to be the more direct link to the problem.
This one hanging on connect() call when creating a new FTPClient:
receiver.172 prio=10 tid=0x00007f23e43c8800 nid=0x2d5 runnable [0x00007f24c32f1000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
- locked <0x00000007817a9578> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.readLine(BufferedReader.java:317)
- locked <0x00000007817a9578> (a java.io.InputStreamReader)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:294)
at org.apache.commons.net.ftp.FTP._connectAction_(FTP.java:364)
at org.apache.commons.net.ftp.FTPClient._connectAction_(FTPClient.java:540)
at org.apache.commons.net.SocketClient.connect(SocketClient.java:178)
at org.mule.transport.ftp.FtpConnectionFactory.makeObject(FtpConnectionFactory.java:33)
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1188)
at org.mule.transport.ftp.FtpConnector.getFtp(FtpConnector.java:172)
at org.mule.transport.ftp.FtpConnector.createFtpClient(FtpConnector.java:637)
at org.mule.transport.ftp.FtpMessageReceiver.listFiles(FtpMessageReceiver.java:134)
at org.mule.transport.ftp.FtpMessageReceiver.poll(FtpMessageReceiver.java:94)
at org.mule.transport.AbstractPollingMessageReceiver.performPoll(AbstractPollingMessageReceiver.java:216)
at org.mule.transport.PollingReceiverWorker.poll(PollingReceiverWorker.java:80)
at org.mule.transport.PollingReceiverWorker.run(PollingReceiverWorker.java:49)
at org.mule.transport.TrackingWorkManager$TrackeableWork.run(TrackingWorkManager.java:267)
at org.mule.work.WorkerContext.run(WorkerContext.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x00000007817a3540> (a java.util.concurrent.ThreadPoolExecutor$Worker)
And the other hanging on pasv() call when using listFiles():
receiver.137" prio=10 tid=0x00007f23e433b000 nid=0x7c06 runnable [0x00007f24c2fee000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
- locked <0x0000000788847ed0> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.readLine(BufferedReader.java:317)
- locked <0x0000000788847ed0> (a java.io.InputStreamReader)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:294)
at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:490)
at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:534)
at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:583)
at org.apache.commons.net.ftp.FTP.pasv(FTP.java:882)
at org.apache.commons.net.ftp.FTPClient._openDataConnection_(FTPClient.java:497)
at org.apache.commons.net.ftp.FTPClient.initiateListParsing(FTPClient.java:2296)
at org.apache.commons.net.ftp.FTPClient.initiateListParsing(FTPClient.java:2269)
at org.apache.commons.net.ftp.FTPClient.initiateListParsing(FTPClient.java:2189)
at org.apache.commons.net.ftp.FTPClient.initiateListParsing(FTPClient.java:2132)
at org.mule.transport.ftp.FtpMessageReceiver.listFiles(FtpMessageReceiver.java:135)
at org.mule.transport.ftp.FtpMessageReceiver.poll(FtpMessageReceiver.java:94)
at org.mule.transport.AbstractPollingMessageReceiver.performPoll(AbstractPollingMessageReceiver.java:216)
at org.mule.transport.PollingReceiverWorker.poll(PollingReceiverWorker.java:80)
at org.mule.transport.PollingReceiverWorker.run(PollingReceiverWorker.java:49)
at org.mule.transport.TrackingWorkManager$TrackeableWork.run(TrackingWorkManager.java:267)
at org.mule.work.WorkerContext.run(WorkerContext.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x0000000788832180> (a java.util.concurrent.ThreadPoolExecutor$Worker)
I think the problem is caused by the use of the default FTPClient constructor (extending SocketClient) in Mule default FtpConnectionFactory.
Note the setConnectTimeout() values seems to be used only when calling socket.connect(), but ignored on other operations using the same socket:
protected FTPClient createFtpClient()
{
FTPClient ftpClient = new FTPClient();
ftpClient.setConnectTimeout(connectionTimeout);
return ftpClient;
}
It uses the FTPClient() constructor, itself using SocketClient with a 0 timeout, defined when creating the socket.
public SocketClient()
{
...
_timeout_ = 0;
...
}
And then we call connec(), which calls _ connectAction()_.
In SocketClient:
protected void _connectAction_() throws IOException
{
...
_socket_.setSoTimeout(_timeout_);
...
}
In FTP, a new Reader is instanciated with our everlasting socket:
protected _connectAction_(){
...
_controlInput_ =
new BufferedReader(new InputStreamReader(_socket_.getInputStream(),
getControlEncoding()));
...
}
Then, when calling __getReply() function, we use this Reader-with-everlasting-socket:
private void __getReply() throws IOException
{
...
String line = _controlInput_.readLine();
...
}
Sorry for the long post, but I think this required correct explanations. A solution may be to call setSoTimeout() just after connect(), to define a Socket Timeout.
Having a default timeout does not seem an acceptable solution, as each users may have different needs and a default is not suitable in any case. https://issues.apache.org/jira/browse/NET-35
Finally, this raises 2 questions:
- It seems like a bug to me, as it will completely stops FTP polling without giving error. What do you think?
- What could be an easy way to avoid such situation? Calling setSoTimeout() with a custom FtpConnectionFactory? Am I missing a configuration or parameter somewhere?
Thanks by advance.
EDIT: I am using Mule CE Standalone 3.5.0, which seems to use Apache Commons Net 2.0. But looking in the code, Mule CE Standalone 3.7 with Commons Net 2.2 does not seem different. Here are the source codes involved: