I'm writing a Java server that processes HTTP requests using only the Socket class, since my professor said we couldn't use HTTP libraries (because the goal is to learn how HTTP works). So I decided to parse the requests with regex. The first thing my code does is read each line of the request and join the lines into a single string, which I then match against patterns. I only need to handle GET, POST, PUT, HEAD and DELETE. I'm testing my program with Postman, a Google Chrome extension. Here are some example requests from Postman after joining them into a single string:
GET:
GET / HTTP/1.1 Host: 127.0.0.1:15000 Connection: keep-alive Cache-Control: no-cache User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.101 Safari/537.36 Postman-Token: dd87e652-2b21-3632-30ad-ace26581d369 Accept: */* Accept-Encoding: gzip, deflate, sdch Accept-Language: en-US,en;q=0.8
POST without a body:
POST / HTTP/1.1 Host: 127.0.0.1:15000 Connection: keep-alive Content-Length: 0 Cache-Control: no-cache Origin: chrome-extension://fhbjgbiflinjbdggehcddcbncdddomop User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.101 Safari/537.36 Postman-Token: 8094b5ce-4b3d-cee7-2d10-f5dd2bc6b7b2 Accept: */* Accept-Encoding: gzip, deflate Accept-Language: en-US,en;q=0.8
POST with a body:
POST / HTTP/1.1 Host: 127.0.0.1:15000 Connection: keep-alive Content-Length: 9 Postman-Token: 3fb2f5e0-2df1-5af4-7853-e9de84648dd5 Cache-Control: no-cache Origin: chrome-extension://fhbjgbiflinjbdggehcddcbncdddomop User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.101 Safari/537.36 Content-Type: text/plain;charset=UTF-8 Accept: */* Accept-Encoding: gzip, deflate Accept-Language: en-US,en;q=0.8
Etc...
The pattern I wrote is:
String somethingPattern = "(.*)?";
// matches an IP from 0.0.0.0 to 255.255.255.255 (or any string), optionally followed by ':' and a port number of at least 3 digits
String ipPattern = "(((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))\\.((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))\\.((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))\\.((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))|"+somethingPattern+")((:)\\d{3,})?";
// matches a Linux file path containing only letters, digits, '/', '?', '.', '-' and '_'
String objetoPattern = "([/?a-zA-Z0-9\\.\\-_]+)";
String connectionPattern = "(connection:\\s*"+somethingPattern+")?";
String contentLenPattern = "(content-length:\\s*([0-9]+))?";
String postmanTokenPattern = "(postman-token:\\s*"+somethingPattern+")?";
String cacheControlPattern = "(cache-control:\\s*"+somethingPattern+")?";
String originPattern = "(origin:\\s*"+somethingPattern+")?";
String userAgentPattern = "(user-agent:\\s*"+somethingPattern+")?";
String charsetPattern = "(charset="+somethingPattern+")?";
String contentTypePattern = "(content-type:\\s*"+somethingPattern+";"+charsetPattern+")?";
String acceptPattern = "(accept:\\s*"+somethingPattern+")?";
String acceptEncodingPattern = "(accept-encoding:\\s*"+somethingPattern+")?";
String acceptLanguagePattern = "(accept-language:\\s*"+somethingPattern+")?";
// (?i) makes the match case-insensitive, so get, Get, GET, etc. all match
String pattern = "^(?i)(get|put|head|post|delete)\\s+?" + objetoPattern
        + "\\s+?HTTP/1.1\\s+?host:\\s+?" + ipPattern
        + "\\s+?" + connectionPattern
        + "\\s+?" + contentLenPattern
        + "\\s+?" + postmanTokenPattern
        + "\\s+?" + cacheControlPattern
        + "\\s+?" + originPattern
        + "\\s+?" + userAgentPattern
        + "\\s+?" + contentTypePattern
        + "\\s+?" + acceptPattern
        + "\\s+?" + acceptEncodingPattern
        + "\\s+?" + acceptLanguagePattern
        + "\\s+?$";
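For comparison, here is a sketch of a different way to apply regex: match the request line and each header line separately, as they come out of readLine(), and store the headers in a map keyed by name. This is not the approach used above; the class and method names (RequestHeadParser, parse) are hypothetical, and it assumes the lines have not yet been joined into one string.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: applies one small regex per line instead of one large
// regex to the whole request joined into a single string.
public class RequestHeadParser {

    // Request line, e.g. "GET / HTTP/1.1"; (?i) makes the match case-insensitive.
    private static final Pattern REQUEST_LINE =
            Pattern.compile("(?i)(GET|POST|PUT|HEAD|DELETE)\\s+(\\S+)\\s+HTTP/1\\.1");

    // Header line, e.g. "Content-Length: 9": name up to the first colon, then the value.
    private static final Pattern HEADER_LINE = Pattern.compile("([^:\\s]+):\\s*(.*)");

    public static final class RequestHead {
        public String method;
        public String path;
        // Names are stored lower-cased so lookups don't depend on the client's casing.
        public final Map<String, String> headers = new LinkedHashMap<>();
    }

    /** Parses lines as returned by BufferedReader.readLine(), stopping at the blank line. */
    public static RequestHead parse(Iterable<String> lines) {
        RequestHead head = null;
        for (String line : lines) {
            if (head == null) {
                // The first line must be the request line.
                Matcher m = REQUEST_LINE.matcher(line);
                if (!m.matches()) {
                    throw new IllegalArgumentException("Bad request line: " + line);
                }
                head = new RequestHead();
                head.method = m.group(1).toUpperCase(Locale.ROOT);
                head.path = m.group(2);
            } else if (line.isEmpty()) {
                break; // an empty line ends the header section
            } else {
                Matcher m = HEADER_LINE.matcher(line);
                if (m.matches()) { // lines that don't look like "Name: value" are skipped
                    head.headers.put(m.group(1).toLowerCase(Locale.ROOT), m.group(2).trim());
                }
            }
        }
        if (head == null) {
            throw new IllegalArgumentException("Empty request");
        }
        return head;
    }

    public static void main(String[] args) {
        RequestHead h = parse(Arrays.asList(
                "GET / HTTP/1.1", "Host: 127.0.0.1:15000", "Connection: keep-alive", ""));
        System.out.println(h.method + " " + h.path + " " + h.headers);
    }
}
```

Because each header is looked up by name, a missing header simply makes headers.get(...) return null, and the order in which Postman sends the headers no longer matters.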
The regex matches and groups correctly for most requests, but not for GET, HEAD or a POST without a body, and I don't know why. I put a ? at the end of each header pattern so that it still matches when a header such as origin or content-length is not present in the request. Even so, these cases don't match. The matching part of the code is:
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(in); // in is the input string: the whole request joined into a single line
if (m.find()) {
    // ......
} else {
    System.out.println("Input didn't match");
}
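A reduced version of the same structure may help isolate the failure. In the full pattern each header group is optional, but the \\s+? on either side of it is not, so when a header is absent the two separators appear to end up adjacent and require at least two consecutive whitespace characters, while the joined request string has only one space between headers. In the examples above, the GET request has no Content-Length, Origin or Content-Type, whereas the POST with a body contains every header in exactly the order the pattern lists them. The sketch below (class name hypothetical) reproduces this with placeholder tokens A, B and C, and shows one variant that moves the separator inside the optional group.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the structure used in the full pattern, with A, B, C as placeholders.
public class OptionalGroupDemo {

    // Mirrors "\\s+?" + "(header)?" + "\\s+?": the group is optional, the separators are not.
    public static final String SEPARATORS_REQUIRED = "^A\\s+?(B)?\\s+?C$";

    // The separator after B is inside the optional group, so an absent B also drops its space.
    public static final String SEPARATOR_IN_GROUP = "^A\\s+(?:(B)\\s+)?C$";

    // Uses find(), like the matching code in the question.
    public static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matches(SEPARATORS_REQUIRED, "A B C")); // true
        System.out.println(matches(SEPARATORS_REQUIRED, "A C"));   // false: needs two whitespace chars between A and C
        System.out.println(matches(SEPARATOR_IN_GROUP, "A B C"));  // true
        System.out.println(matches(SEPARATOR_IN_GROUP, "A C"));    // true
    }
}
```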
EDIT: The part of the code that processes the input from the Socket:
bufferedReader = new BufferedReader(new InputStreamReader(socket.getInputStream()));
String in = "";
String msgDoSocket;
while ((msgDoSocket = bufferedReader.readLine()) != null) {
    try {
        in += msgDoSocket + " ";
        // an empty line marks the end of the request headers
        if (msgDoSocket.isEmpty()) {
            processaInput(in); // this calls the part that processes the regex
            in = ""; // start collecting the next request on this connection
        }
    } catch (Exception ex) {
        Logger.getLogger(ServerThread.class.getName()).log(Level.SEVERE, null, ex);
    }
}
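The loop above collects lines until the blank line, but readLine() only returns a line once it sees a line terminator (or the end of the stream), and a request body such as the 9-byte one in the POST example is usually not followed by one. That may also be why the body text does not appear in the joined string. One common approach is to read the headers line by line and then read exactly Content-Length characters. This is a minimal sketch; RequestReader and readRequest are hypothetical names, and it assumes an ASCII body, since Content-Length counts bytes while a Reader counts chars.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;

// Hypothetical helper: reads the header lines of one request, then its body.
public class RequestReader {

    /**
     * Appends each header line (request line included) to headerLines and returns
     * the body, or "" when there is no Content-Length header.
     * Assumes an ASCII body: Content-Length counts bytes, read() counts chars.
     */
    public static String readRequest(BufferedReader reader, StringBuilder headerLines)
            throws IOException {
        int contentLength = 0;
        String line;
        // The header section ends at the first empty line (or at end of stream).
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            headerLines.append(line).append('\n');
            if (line.toLowerCase(Locale.ROOT).startsWith("content-length:")) {
                contentLength = Integer.parseInt(
                        line.substring("content-length:".length()).trim());
            }
        }
        // The body may have no line terminator, so read exactly contentLength chars
        // instead of calling readLine(), which would wait for one.
        char[] body = new char[contentLength];
        int read = 0;
        while (read < contentLength) {
            int n = reader.read(body, read, contentLength - read);
            if (n == -1) {
                break; // the client closed the stream early
            }
            read += n;
        }
        return new String(body, 0, read);
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(
                "POST / HTTP/1.1\r\nHost: 127.0.0.1:15000\r\nContent-Length: 9\r\n\r\nsome text"));
        StringBuilder headerLines = new StringBuilder();
        String body = readRequest(reader, headerLines);
        System.out.println(headerLines + "body=" + body);
    }
}
```

On a keep-alive connection, readRequest could be called repeatedly on the same BufferedReader, once per request, with the collected header text handed on to the existing processing code.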