3 votes

I am trying out the Apache HttpClient 4.3.6 connection pool manager to increase the throughput of my HTTP calls. My assumption is that HttpClient implementations in general use persistent connections. The result of my test code (included at the end), however, shows that multiple concurrent HTTP connections using JDK URLConnection perform better.

  1. How do I make HttpClient fast?
  2. Does HttpClient use the same HTTP connection for http://localhost:9000/user/123 and http://localhost:9000/user/456?

Thanks

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class FooTest {

    public static void main(String[] args) throws Exception {
        runWithConnectionPool();
    }

    private static String extract(BufferedReader reader) throws Exception {
        StringBuilder buffer = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            buffer.append(line);
        }
        return buffer.toString();
    }

    private static void runWithConnectionPool() throws Exception {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(1);

        // Note: setMaxConnTotal/setMaxConnPerRoute are ignored when an explicit
        // connection manager is supplied; the manager's own limits apply instead.
        CloseableHttpClient httpClient = HttpClients.custom()
            .setConnectionManager(cm)
            .setMaxConnTotal(100)
            .setMaxConnPerRoute(100)
            .build();

        long start = System.currentTimeMillis();

        HttpGet getReq = new HttpGet("http://www.google.com");

        /*
          Option A: Apache HttpClient with a connection pool
          Option B: individual JDK URLConnection per thread
        */
//      Thread[] workers = generateAndStart(10, httpClient, getReq, 0);      // (A)
        Thread[] workers = generateAndStart(10, getReq.getURI().toURL(), 0); // (B)

        for (int i = 0; i < workers.length; i++) {
            workers[i].join();
        }

        System.out.println("Elapsed: " + (System.currentTimeMillis() - start));
    }

    private static Thread[] generateAndStart(int num, URL url, long delay) {
        Thread[] workers = new Thread[num];
        for (int i = 0; i < num; i++) {
            System.out.println("Starting worker: " + i);
            final int j = i;
            workers[i] = new Thread(() -> connect(url, delay, j));
            workers[i].start();
        }
        return workers;
    }

    private static void connect(URL url, long delay, int ndx) {
        try {
            System.out.println(url.toURI().toString() + " started.");
        } catch (Exception e) {
            e.printStackTrace();
        }

        try {
            URLConnection connection = url.openConnection();
            connection.addRequestProperty("Accept", "application/json");
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(connection.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
                if (delay > 0) {
                    System.out.println("Delayed.");
                    sleep(delay);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Thread[] generateAndStart(int num, CloseableHttpClient httpClient, HttpGet getReq, long delay) {
        Thread[] workers = new Thread[num];
        for (int i = 0; i < num; i++) {
            System.out.println("Starting worker: " + i);
            final int j = i;
            workers[i] = new Thread(() -> connect(httpClient, getReq, delay, j));
            workers[i].start();
        }
        return workers;
    }

    private static void connect(CloseableHttpClient httpClient, HttpGet request, long delay, int ndx) {
        System.out.println(request.getURI().toString() + " started.");

        try (CloseableHttpResponse response = httpClient.execute(request, HttpClientContext.create());
             BufferedReader reader = new BufferedReader(
                 new InputStreamReader(response.getEntity().getContent()))) {

            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
            if (delay > 0) {
                System.out.println("Delayed.");
                sleep(delay);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void sleep(long delay) {
        try {
            Thread.sleep(delay);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

Update 1 (28 March 2017)

I have made several observations and conclusions:

  1. JDK java.net.URLConnection does not make a connection until URLConnection.getInputStream() is called.
  2. java.net.URLConnection closes the current socket if a bad connection happens, e.g. an HTTP error, and creates a new socket.
  3. Using java.net.URLConnection instances created from the same java.net.URL instance in a multi-threaded environment will create multiple sockets to the server. Initially I thought that, for simplicity, URL.openConnection() should be invoked in a synchronized block, and that not every call to URL.openConnection() creates a new socket because URL regulates this. I was wrong: the number of sockets created corresponds to the number of threads that invoke URL.openConnection().
  4. It is mentioned in many places, and I mention it again: closing/disconnecting the URLConnection does not close the socket.
  5. Connecting to a different path on the same server does not create another socket. In other words, a persistent connection is usable across different paths.
  6. Apache HttpClient is in general easier to use and more intuitive. It supports persistent connections (it reuses the same socket) in a multi-threaded environment without the user's intervention.
  7. I could not get URL to comply with http.maxConnections and http.keepAlive. For example, including -Dhttp.keepAlive=false at runtime does not prevent Connection: keep-alive from being included in the HTTP headers.
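Regarding observation 7, a possible factor worth checking: the JDK's built-in HTTP protocol handler reads http.keepAlive and http.maxConnections when it is first used, so setting them after the first connection has already been opened has no effect. A minimal sketch of setting them programmatically before any connection is made (the property names are the standard JDK networking properties; no Apache HttpClient is involved):

```java
public class KeepAliveDemo {
    public static void main(String[] args) {
        // Set JDK networking properties BEFORE the first HTTP connection is
        // opened (equivalent to -Dhttp.keepAlive=false -Dhttp.maxConnections=5
        // on the command line). The protocol handler reads them on first use.
        System.setProperty("http.keepAlive", "false");   // request Connection: close
        System.setProperty("http.maxConnections", "5");  // keep-alive cache size per destination

        System.out.println("keepAlive=" + System.getProperty("http.keepAlive"));
        System.out.println("maxConnections=" + System.getProperty("http.maxConnections"));
    }
}
```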

My observations come from the examples I pasted here. They are better examples than the code pasted above.

1  As you defined the pool to manage up to 100 connections, I'm not sure you are already taking advantage of the pool, as for your test the pool may create a new connection for each worker. A pool is most useful when many workers have to share a limited number of connections (e.g. DB connections, which are sometimes limited to 5 concurrent accesses, or the 8 concurrent HTTP connections to the same domain for a single HTTP 1.1 web application). – Roman Vottner
I found out from the connection tool nettop that it only used one connection instead of ten. – thlim
Are you aware that java.net.HttpURLConnection also does connection pooling? There may be no point to this. – user207421
I am not aware of that. I just discovered URLConnection.connect(). I am testing my stack again. Thanks. <rant> There are so many ways to "connect", e.g. URL.openConnection(), URLConnection.getInputStream() and URLConnection.connect() </rant> – thlim
If you stopped ranting and started reading the Javadoc you would see that connect() is called automatically. You don't need to call it at all. – user207421

1 Answer

2 votes

I found my answers after experimenting with JDK URLConnection and Apache HTTPClient.

  1. URLConnection is fast because it opens a new socket for each connection made by each thread to the server, whereas Apache HttpClient controls the number of open sockets according to its settings in a multi-threaded environment. When the sockets are limited to a few, the total connection time is about the same for both HTTP libraries.
  2. Apache HttpClient uses a persistent connection to the same server for different URLs.
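A minimal sketch of the pool configuration these two points imply, assuming HttpClient 4.3+: the pool's own limits (not the builder's setMaxConnTotal) cap the number of sockets, and all paths on the same host:port route share those pooled persistent connections. The limits of 2 shown here are illustrative.

```java
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class PoolLimitDemo {
    public static void main(String[] args) throws Exception {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(2);            // at most 2 sockets across the whole pool
        cm.setDefaultMaxPerRoute(2);  // at most 2 sockets per host:port route

        // Pooled persistent connections are shared by all requests to the same
        // route, e.g. /user/123 and /user/456 on one host reuse the same sockets.
        try (CloseableHttpClient client = HttpClients.custom()
                .setConnectionManager(cm)
                .build()) {
            System.out.println("maxTotal=" + cm.getMaxTotal());
            System.out.println("maxPerRoute=" + cm.getDefaultMaxPerRoute());
        }
    }
}
```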

mitmproxy is a good and easy-to-use tool for verifying HTTP connections.