3
votes

We recently developed an application that runs a query against DB2 and sends an email to the corresponding recipient. It works well on our local systems and in the QA region, but in production a few queries fail, though rarely (about once a week), with the exception below.

Exception InnerDetails:

ERROR [40003] [IBM][CLI Driver] SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "111.111.111.111". Communication function detecting the error: "recv". Protocol specific error code(s): "10004", "", "". SQLSTATE=08001

Since the error occurs only in production, and not very often, we are not sure whether the cause is the code or a configuration setting. Do you have any idea?

3
Did you ever get to a resolution on this? We have this problem intermittently when accessing mainframe data, and usually we just end up with retry logic. - Paul G
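
For readers wondering what such retry logic might look like, here is a minimal C# sketch. It is not the commenter's actual code: the Retry class, its parameters, and the RunQuery method in the usage comment are hypothetical names chosen for illustration, and matching on "SQL30081N" in the exception message is only one possible way to decide that a failure is worth retrying.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of any IBM library.
public static class Retry
{
    // Runs the operation and retries it when isTransient returns true for the
    // thrown exception, up to maxAttempts attempts in total.
    public static T Execute<T>(
        Func<T> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,
        TimeSpan? delay = null)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan wait = delay ?? TimeSpan.FromSeconds(2);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            // The filter lets the exception propagate unchanged on the last
            // attempt or when the failure is not considered transient.
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                Thread.Sleep(wait);
            }
        }
    }
}

// Usage (RunQuery is a placeholder for the code that opens the connection
// and executes the DB2 query):
//
//   var rows = Retry.Execute(
//       () => RunQuery(),
//       ex => ex.Message.Contains("SQL30081N"));
```

Retrying only hides the symptom, so it is probably best combined with logging each failed attempt so the underlying cause can still be investigated.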

3 Answers

3
votes

We recently discussed this issue with our IBM rep. After looking in their internal knowledge base, he suggested we add "Interrupt=0" to our connection string, based on recommendations given to other customers that had the same problem.

The default value for Interrupt was 1 before v10.5 FP2 and still is for most connections. They changed the default value to 2 for connections to z/OS (mainframe) in FP2.

We're using C# and the connection string properties for the IBM Data Server Driver for .Net can be found here. I'm sure there is a similar property for their drivers for other languages.
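
For illustration, a connection string with the property added might look like the sketch below. The host, port, database name, and credentials are placeholders rather than values from our setup, and the other keywords should be checked against the provider documentation for your driver version; only "Interrupt=0" is the setting the IBM rep suggested.

```csharp
// Placeholder server, database, and credentials (assumptions for illustration).
// Only Interrupt=0 is the change discussed in this answer.
string connectionString =
    "Server=db2host.example.com:50000;Database=SAMPLE;UID=appuser;PWD=secret;Interrupt=0;";
```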

This page from the IBM docs goes into a bit more detail about the setting.

We haven't seen the issue since we recently added the property, but it was always intermittent so I can't yet confidently say that the problem is fixed. Time will tell...

2
votes

That particular error (SQL30081N) is just a generic message that indicates a network issue between your DB2 client and the server. In this case, you want to look at the Protocol specific error code(s). Here, it looks like you're on Windows, and that particular code (10004) isn't given in the IBM documentation.

So, if you google "windows network error codes", you'll find this page, which says:

WSAEINTR

10004

Interrupted function call.

A blocking operation was interrupted by a call to WSACancelBlockingCall.

Which links to this page with more information on that specific function (emphasis mine):

The WSACancelBlockingCall function has been removed in compliance with the Windows Sockets 2 specification, revision 2.2.0.

The function is not exported directly by WS2_32.DLL and Windows Sockets 2 applications should not use this function. Windows Sockets 1.1 applications that call this function are still supported through the WINSOCK.DLL and WSOCK32.DLL.

Blocking hooks are generally used to keep a single-threaded GUI application responsive during calls to blocking functions. Instead of using blocking hooks, an application should use a separate thread (separate from the main GUI thread) for network activity.

I'm guessing that your application blocks for longer in production than in your other environments, and something along the way is causing the interrupt.

Hopefully this leads you down the right path...

0
votes

I spent hours on the same problem before fixing it. I use a Windows executable (written in C#/.NET) to run a SELECT query against a DB2 database, and I sometimes got this error. I finally realized that my problem was a timeout. The error with protocol code "10004" sometimes occurs when query execution takes longer than 30 seconds, which is the default timeout value. Perhaps the interruption described on the "Windows Socket Error Codes" page is triggered by the timeout mechanism. I added a line to set an acceptable timeout value and got rid of this annoying error. I hope it helps others. Here is my code fix:

 ... 
 connDb.Open(); 
 DB2Command cmdDb = new DB2Command(QueryText,connDb); 
 cmdDb.CommandTimeout = 300; //I added this line. 
 using (DB2DataReader readerDb = cmdDb.ExecuteReader()) 
 {
 ...