1
votes

I have a table in which I store a log of every request to a web site. Every time a page is requested, a record is inserted. I now want to analyze the data in the log to detect possible automated (non-human) requests. The criteria I need to use is x number of requests within y seconds by an individual user.

So, the data looks like this:

| Page          | UserId | Date                           |
| /Page1.htm | 001      | 2014-06-02 11:03 AM |
| /Page2.htm | 001      | 2014-06-02 11:03 AM |
| /Page1.htm | 002      | 2014-06-02 11:04 AM |
| /Page3.htm | 001      | 2014-06-02 11:04 AM |
| /Page2.htm | 002      | 2014-06-02 11:05 AM |
| /Page4.htm | 001      | 2014-06-02 11:05 AM |
| /Page5.htm | 001      | 2014-06-02 11:07 AM |
| /Page3.htm | 002      | 2014-06-02 11:15 AM |

So, I wanted to get all UserIDs that made 5 or more requests within any 5 second timespan. How can I get that? Is this even possible with SQL alone?

I don't have access to the web server logs or anything else other than the SQL Server database.

3
What is the version of your SQL Server? - PM 77-1

3 Answers

2
votes

Here is the query which you are looking for:

SELECT
    T1.Page, 
    T1.UserId, 
    T1.Date, 
    MIN(T2.Date) AS Date2, 
    DATEDIFF(minute, T1.Date, MIN(T2.Date)) AS DaysDiff,
    COUNT(*) RequestCount
FROM
    [STO24541450] T1 LEFT JOIN [STO24541450] T2
    ON T1.UserId = T2.UserId AND T2.Date > T1.Date
GROUP BY
    T1.Page, T1.UserId, T1.Date
HAVING 
    DATEDIFF(minute, T1.Date, MIN(T2.Date)) >= 5 AND COUNT(*) >= 5;
0
votes

I would probably group by the time range and UserId and grab any with a count greater than 5.

select count(*),
       UserId,
       dateadd(SECOND, DATEDIFF(SECOND, '01-jan-1970', [date])/5*5, '01-jan-1970') 
from   [LogTable]
group by UserId, DATEDIFF(SECOND, '01-jan-1970', [date])/5
having count(1) > 5

The above will return the same UserId for each period in which the user has made more than 5 requests. If you're only interested in the userId's and not when or how many times they breached the conditions you could simplify the above to

select distinct(UserId)
from   [LogTable]
group by UserId, DATEDIFF(SECOND, '01-jan-1970', [date])/5
having count(1) > 5
-1
votes

This is untested but should hopefully give you an idea on how to proceed.

SELECT UserId, COUNT(UserId) as AccessCount
     (SELECT UserId FROM AccessLogTable WHERE Date BETWEEN startDate AND endDate) a
GROUP BY UserId HAVING COUNT(UserId) > x

What we're doing here is using a subquery to first select the subset of records (your y criteria) that are of interest. This will then be encapsulated in an external query where you would do a group by UserId having count(userid) > x.