I'm writing a Java server that processes HTTP requests using only the Socket class, since my professor said we can't use the HTTP libraries (the goal is to learn HTTP itself). I decided to parse the requests with regex. The code first reads each line of the request and joins the lines into a single string, which I then match against a pattern. I only need to handle GET, POST, PUT, HEAD and DELETE. I'm testing the program with Postman, a Google Chrome extension. Here are some examples of requests coming from Postman after being joined into a single string:

GET:

GET / HTTP/1.1 Host: 127.0.0.1:15000 Connection: keep-alive Cache-Control: no-cache User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.101 Safari/537.36 Postman-Token: dd87e652-2b21-3632-30ad-ace26581d369 Accept: */* Accept-Encoding: gzip, deflate, sdch Accept-Language: en-US,en;q=0.8

POST without body:

POST / HTTP/1.1 Host: 127.0.0.1:15000 Connection: keep-alive Content-Length: 0 Cache-Control: no-cache Origin: chrome-extension://fhbjgbiflinjbdggehcddcbncdddomop User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.101 Safari/537.36 Postman-Token: 8094b5ce-4b3d-cee7-2d10-f5dd2bc6b7b2 Accept: */* Accept-Encoding: gzip, deflate Accept-Language: en-US,en;q=0.8

POST with body:

POST / HTTP/1.1 Host: 127.0.0.1:15000 Connection: keep-alive Content-Length: 9 Postman-Token: 3fb2f5e0-2df1-5af4-7853-e9de84648dd5 Cache-Control: no-cache Origin: chrome-extension://fhbjgbiflinjbdggehcddcbncdddomop User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.101 Safari/537.36 Content-Type: text/plain;charset=UTF-8 Accept: */* Accept-Encoding: gzip, deflate Accept-Language: en-US,en;q=0.8

Etc...

The pattern I wrote is:

    String somethingPattern = "(.*)?";

    String ipPattern = "(((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))\\.((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))\\.((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))\\.((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))|"+somethingPattern+")((:)\\d{3,})?"; // regex for an IP from 0.0.0.0 to 255.255.255.255, or any string, optionally followed by : and a port number
    String objetoPattern = "([/?a-zA-Z0-9\\.\\-_]+)"; // regex for the requested path: letters, digits and the characters / ? . - _

    String connectionPattern = "(connection:\\s*"+somethingPattern+")?";
    String contentLenPattern = "(content-length:\\s*([0-9]+))?";
    String postmanTokenPattern = "(postman-token:\\s*"+somethingPattern+")?";
    String cacheControlPattern = "(cache-control:\\s*"+somethingPattern+")?";
    String originPattern = "(origin:\\s*"+somethingPattern+")?";
    String userAgentPattern = "(user-agent:\\s*"+somethingPattern+")?";
    String charsetPattern = "(charset="+somethingPattern+")?";
    String contentTypePattern = "(content-type:\\s*"+somethingPattern+";"+charsetPattern+")?";
    String acceptPattern = "(accept:\\s*"+somethingPattern+")?";
    String acceptEncodingPattern = "(accept-encoding:\\s*"+somethingPattern+")?";
    String acceptLanguagePattern = "(accept-language:\\s*"+somethingPattern+")?";


    // (?i) makes the match case-insensitive, so get, Get, GET, etc. are all accepted
    String pattern = "^(?i)(get|put|head|post|delete)\\s+?" + objetoPattern + "\\s+?HTTP/1.1\\s+?host:\\s+?" + ipPattern + "\\s+?" + connectionPattern + "\\s+?" + contentLenPattern + "\\s+?" + postmanTokenPattern + "\\s+?" + cacheControlPattern + "\\s+?" + originPattern + "\\s+?" + userAgentPattern + "\\s+?" + contentTypePattern + "\\s+?" + acceptPattern + "\\s+?" + acceptEncodingPattern + "\\s+?" + acceptLanguagePattern + "\\s+?$";

The regex matches and groups correctly for most requests, but not for GET, HEAD, or a POST without a body, and I don't know why. I put a ? at the end of each header pattern to cover the case where a header such as Origin or Content-Length is not present in the request, but those cases still don't match. The matching code is:

    Pattern r = Pattern.compile(pattern);
    Matcher m = r.matcher(in); // in is the whole request joined into a single-line string

    if (m.find()) {
        // ......
    } else {
        System.out.println("Input didn't match");
    }

EDIT: The part of the code that processes the input from the Socket:

    bufferedReader = new BufferedReader(new InputStreamReader(socket.getInputStream()));

    String in = "";
    while ((msgDoSocket = bufferedReader.readLine()) != null) {
        try {
            in += msgDoSocket + " ";
            if (msgDoSocket.isEmpty()) {
                processaInput(in); // this calls the part that processes the regex
            }
        } catch (Exception ex) {
            Logger.getLogger(ServerThread.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

1 Answer

The header lines are separated by newlines (CRLF), and the header section is separated from the body (if present) by an empty line, that is, two consecutive newlines. Rather than matching the whole request with one regex, you should use a Scanner: its nextLine() method returns the input one line at a time, so you can simply iterate over the lines, which is much easier than using a Matcher. Once you have the header lines, split each one at the first ':' and store the name and value in a Map, instead of keeping a separate variable for every possible header. Then you can simply look up the keys you care about and check their values against what you sent.
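
As a rough illustration of the Scanner and Map idea (a sketch, not a full HTTP parser): the class and method names below (`RequestHeadParser`, `parseHeaders`) are made up for this example, and header names are lower-cased on the assumption that lookups should be case-insensitive. Because headers can arrive in any order and any of them can be absent, a map keyed by header name does not depend on a fixed order.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper class, named for this example only.
class RequestHeadParser {

    /** Reads "Name: value" lines until the first empty line and returns them as a map. */
    static Map<String, String> parseHeaders(Scanner scanner) {
        Map<String, String> headers = new LinkedHashMap<>();
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            if (line.isEmpty()) {
                break; // empty line: end of the header section
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "Name: value" line; skipped in this sketch
            }
            // Split at the FIRST colon only, so "Host: 127.0.0.1:15000" keeps its port.
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        // Sample input modeled on the POST request from the question.
        String raw = "POST / HTTP/1.1\r\n"
                + "Host: 127.0.0.1:15000\r\n"
                + "Content-Length: 9\r\n"
                + "Content-Type: text/plain;charset=UTF-8\r\n"
                + "\r\n";
        Scanner scanner = new Scanner(raw);
        // The first line is the request line: method, path, protocol version.
        String[] requestLine = scanner.nextLine().split(" ");
        Map<String, String> headers = parseHeaders(scanner);
        System.out.println(requestLine[0] + " " + requestLine[1]); // POST /
        System.out.println(headers.get("host"));                   // 127.0.0.1:15000
        System.out.println(headers.get("content-length"));         // 9
    }
}
```

A header that was not sent simply yields `null` from `headers.get(...)`, so GET, HEAD and a POST without a body need no special cases at the parsing stage.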

You could also use Fiddler or Wireshark to see the raw request sent by Postman.

This answer uses a reader and does the same thing you want.
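
For reference, a reader-based variant might look roughly like the sketch below. It is not the code from the linked answer; `RequestBodyReader` and `readBody` are hypothetical names. It reads header lines up to the empty line, then reads the body using Content-Length, since the body is not terminated by a newline and `readLine()` would wait for one. It assumes a single-byte body encoding (Content-Length counts bytes, while a Reader counts characters) and a well-formed Content-Length value.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, named for this example only.
class RequestBodyReader {

    /** Consumes the request line and headers, then returns the body ("" if there is none). */
    static String readBody(BufferedReader reader) throws IOException {
        reader.readLine(); // request line, e.g. "POST / HTTP/1.1"; ignored in this sketch

        Map<String, String> headers = new HashMap<>();
        String line;
        // Header section ends at the first empty line (or at end of stream).
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                            line.substring(colon + 1).trim());
            }
        }

        // Assumption: a missing Content-Length means there is no body.
        int length = Integer.parseInt(headers.getOrDefault("content-length", "0"));
        char[] body = new char[length];
        int read = 0;
        while (read < length) {
            int n = reader.read(body, read, length - read);
            if (n < 0) {
                break; // stream ended before the announced length was reached
            }
            read += n;
        }
        return new String(body, 0, read);
    }

    public static void main(String[] args) throws IOException {
        String raw = "POST / HTTP/1.1\r\nHost: 127.0.0.1:15000\r\nContent-Length: 9\r\n\r\nhello you";
        System.out.println(readBody(new BufferedReader(new StringReader(raw)))); // hello you
    }
}
```

With a real socket, the `StringReader` would be replaced by `new InputStreamReader(socket.getInputStream())`.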