1
votes

Sorry for my bad English.

I have a log file from a Web server with 120,000 lines.

Example of input file:

10.160.0.10;16.11.2011 12:56;/;-;"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"
10.160.0.100;14.11.2011 7:22;/;-;"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"
10.160.0.100;14.11.2011 10:45;/;-;"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"
10.160.0.100;14.11.2011 10:53;/;-;"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)"

I need to compare the IP address on each line with the IP address on the next line, and at the same time compare the last field, which contains the web browser version (the User-Agent), with the one on the next line: the first line with the second, the second with the third, and so on.

If the IP address and the browser version are both the same as on the other line, the same tag is appended to the end of the line, for example #1 (meaning it is the first user).

If the IP address or the browser version is different, the next tag is appended instead, for example #2 (the second user).

This identifies users based on the IP address and the User-Agent field (that is, on different versions of a web browser).

Example of output file:

10.160.0.10;16.11.2011 12:56;/;-;"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0";#1
10.160.0.100;14.11.2011 7:22;/;-;"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0";#2
10.160.0.100;14.11.2011 10:45;/;-;"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0";#2
10.160.0.100;14.11.2011 10:53;/;-;"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)";#3

Do you have any idea how to do this?

Which method to use?

Thank you for your help.

2
Look up "Regular Expression". This is one way to do this, and it would be my preferred way. - DThought
Unrelated, but doing something like this in a scripting language like Perl would be a cinch. Some 3-4 lines of code perhaps. - Anirudh Ramanathan
I think that Microsoft Log Parser (technet.microsoft.com/en-us/scriptcenter/dd919274.aspx) would be an excellent tool for this kind of job. - Henrik Aasted Sørensen
You can use substring() to get the IP address if you are new to "Regular Expression". - Chaitanya K
I'm a bit confused about the And so I identify the users phrase. Do you want to parse the file like this for every user? You should go for a DB solution then. Use regexp or whatever to load the data into the DB once and be happy. - svz

2 Answers

2
votes

This is neither complete nor anywhere near optimal, but it is basically everything you need.

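import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// "file" is assumed to be a java.io.File pointing at the log.
List<String> list = new ArrayList<>();   // identifiers in order of first appearance
Scanner in = new Scanner(file);
while (in.hasNextLine()) {
    String line = in.nextLine();
    // Limit of 5 so the semicolons inside the User-Agent are not split.
    String[] splitLine = line.split(";", 5);
    String identifier = splitLine[0] + splitLine[4];   // IP + User-Agent
    if (list.contains(identifier)) {
        // Known user: reuse its 1-based position as the user number.
        line = line + ";#" + (list.indexOf(identifier) + 1);
    }
    else {
        // New user: assign the next number and remember the identifier.
        line = line + ";#" + (list.size() + 1);
        list.add(identifier);
    }
    System.out.println(line);
}
in.close();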
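
The loop above finds each identifier with list.contains and list.indexOf, which scan the whole list on every line. One possible refinement, sketched here under a few assumptions, is to keep the identifiers in a HashMap so that each lookup is a single hash probe. The class name UserTagger and the method tag are made up for illustration, the lines are taken from an in-memory list rather than from the file, and lines with fewer than five fields are assumed to be passed through unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the name and the method are illustrative only.
class UserTagger {

    // Appends ";#n" to each line, where n is the 1-based order in which
    // the (IP, User-Agent) pair was first seen.
    static List<String> tag(List<String> lines) {
        Map<String, Integer> userIds = new HashMap<>();
        List<String> tagged = new ArrayList<>();
        for (String line : lines) {
            // Limit of 5 keeps the semicolons inside the User-Agent intact.
            String[] fields = line.split(";", 5);
            if (fields.length < 5) {
                // Assumption: malformed lines are passed through untouched.
                tagged.add(line);
                continue;
            }
            String key = fields[0] + ";" + fields[4];   // IP + User-Agent
            Integer id = userIds.get(key);
            if (id == null) {
                // First time this pair is seen: give it the next number.
                id = userIds.size() + 1;
                userIds.put(key, id);
            }
            tagged.add(line + ";#" + id);
        }
        return tagged;
    }

    public static void main(String[] args) {
        // The four sample lines from the question.
        List<String> sample = Arrays.asList(
            "10.160.0.10;16.11.2011 12:56;/;-;\"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0\"",
            "10.160.0.100;14.11.2011 7:22;/;-;\"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0\"",
            "10.160.0.100;14.11.2011 10:45;/;-;\"Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0\"",
            "10.160.0.100;14.11.2011 10:53;/;-;\"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)\"");
        for (String line : tag(sample)) {
            System.out.println(line);
        }
    }
}
```

On the four sample lines from the question this prints the tags #1, #2, #2 and #3, matching the expected output file. With 120,000 lines either version should be workable; the map mainly helps when there are many distinct users.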
0
votes

Use the String.split method, with ; as the delimiter.
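
One caveat with splitting on ; is that the User-Agent field in the sample log contains semicolons of its own. The short sketch below, with a made-up class name SplitDemo and helper fields, shows the difference between a plain split and the two-argument String.split(regex, limit) with a limit of 5, which stops splitting after the fourth separator and so keeps the User-Agent in one piece. It assumes every line has the five fields shown in the question.

```java
// Hypothetical demo class; the names are illustrative only.
class SplitDemo {

    // Splits a log line into at most 5 fields: IP, date, path, referrer
    // (assumed meaning of "-"), and the quoted User-Agent.
    static String[] fields(String line) {
        return line.split(";", 5);
    }

    public static void main(String[] args) {
        String line = "10.160.0.100;14.11.2011 10:53;/;-;"
            + "\"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)\"";

        // A plain split also cuts at the semicolons inside the User-Agent.
        System.out.println(line.split(";").length);   // prints 8

        // With the limit, the fifth element is the whole User-Agent.
        String[] parts = fields(line);
        System.out.println(parts.length);             // prints 5
        System.out.println(parts[0]);                 // the IP address
        System.out.println(parts[4]);                 // the quoted User-Agent
    }
}
```

The IP address is then parts[0] and the browser string is parts[4], which are the two values the question wants to compare.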