
We run some web services.

We use ModSecurity for the Apache web server with the OWASP Core Rule Set (CRS).

We have problems with Greek and Russian requests because they contain Greek and Cyrillic letters.

The OWASP CRS rules contain patterns like

"(^[\"'´’‘;]+|[\"'´’‘;]+$)"

In the ModSecurity log, non-ASCII characters appear as escaped UTF-8 code units (\x..) rather than as Unicode characters. All ASCII letters are shown as characters, as they should be.

Example:

[Matched Data: \x85 2 \xce\xb7\xce\xbb\xce\xb9\xce\xbf\xcf\x85\xcf\x80\xce found within ARGS:q: 163 45 \xcf\x83\xce\xbf\xcf\x85\xce\xbd\xce\xb9\xce\xbf\xcf\x85 2 \xce\xb7\xce\xbb\xce\xb9\xce\xbf\xcf\x85\xcf\x80\xce\xbf\xce\xbb\xce\xb7]

[Pattern match "(?i:(?:[\"'\\xc2\\xb4\\xe2\\x80\\x99\\xe2\\x80\\x98]\\\\s*?(x?or|div|like|between|and)\\\\s*?[\\"'\xc2\xb4\xe2\x80\x99\xe2\x80\x98]?\\d)|(?:\\\\x(?:23|27|3d))|(?:^.?[\"'\\xc2\\xb4\\xe2\\x80\\x99\\xe2\\x80\\x98]$)|(?:(?:^[\\"'\xc2\xb4\xe2\x80\x99\xe2\x80\x98\\\\]*?(?:[\\ ..."]

Now we know that it was triggered by a request in Greek: σουνιου ηλιουπολη (a street in Athens). Identifying the request is not our problem; we can figure that out.

The problem is that the byte x80 is part of the character ’ (e2 80 99) and is also part of a Greek letter (π is cf 80), which is why we get a false positive.
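The collision can be reproduced outside ModSecurity. This is a minimal Python sketch, assuming the regex engine treats both pattern and input as raw bytes (which is what the \x.. escapes in the log suggest). Python's `re` in bytes mode only stands in for that behavior; it is not what ModSecurity itself runs.

```python
import re

# The request value from the log, as the UTF-8 bytes the WAF would see.
value = "σουνιου ηλιουπολη".encode("utf-8")

# Byte-mode class: each byte of ´ (c2 b4), ’ (e2 80 99) and ‘ (e2 80 98)
# becomes an independent member of the class.
byte_class = re.compile(b"[\"'\xc2\xb4\xe2\x80\x99\xe2\x80\x98]")

# Character-mode class for comparison: ´ ’ ‘ are single code points here.
char_class = re.compile("[\"'´’‘]")

hit = byte_class.search(value)
# The matched byte, and the Greek letter it was cut out of.
culprit = value[hit.start() - 1:hit.start() + 1].decode("utf-8")
print(hit.group(), culprit)

# The same class applied to decoded text finds nothing.
print(char_class.search(value.decode("utf-8")))
```

The byte-mode class matches the lone x80 that belongs to π, while the character-mode class does not match the Greek text at all.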

The actual rule that was triggered:

SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|!REQUEST_COOKIES:/_pk_ref/|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* "(?i:(?:[\"'´’‘]\s*?(x?or|div|like|between|and)\s*?[\"'´’‘]?\d)|(?:\\x(?:23|27|3d))|(?:^.?[\"'´’‘]$)|(?:(?:^[\"'´’‘\\]?(?:[\d\"'´’‘]+|[^\"'´’‘]+[\"'´’‘]))+\s*?(?:n?and|x?x?or|div|like|between|and|not|\|\||\&\&)\s*?[\w\"'´’‘][+&!@(),.-])|(?:[^\w\s]\w+\s?[|-]\s*?[\"'´’‘]\s*?\w)|(?:@\w+\s+(and|x?or|div|like|between|and)\s*?[\"'´’‘\d]+)|(?:@[\w-]+\s(and|x?or|div|like|between|and)\s*?[^\w\s])|(?:[^\w\s:]\s*?\d\W+[^\w\s]\s*?[\"'`´’‘].)|(?:\Winformation_schema|table_name\W))" "phase:2,capture,t:none,t:urlDecodeUni,block,msg:'Detects classic SQL injection probings 1/2',id:'981242',tag:'OWASP_CRS/WEB_ATTACK/SQL_INJECTION',logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}',severity:'2',setvar:'tx.msg=%{rule.id}-%{rule.msg}',setvar:tx.sql_injection_score=+1,setvar:tx.anomaly_score=+%{tx.critical_anomaly_score},setvar:'tx.%{tx.msg}-OWASP_CRS/WEB_ATTACK/SQLI-%{matched_var_name}=%{tx.0}'"

As a workaround, we changed some patterns like [\"'`´’‘] to (\"|'|`|\xc2\xb4|\xe2\x80\x99|\xe2\x80\x98), so that they match the actual sequences of UTF-8 code units that make up each character. We could do this for all 55 SQL injection rules of the Core Rule Set, but that is a very time-consuming task.
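The rewrite from a character class to an alternation of whole byte sequences is mechanical, so it could be scripted. A minimal Python sketch follows; `utf8_alternation` is a hypothetical helper name, and the byte-mode `re` engine is again only an assumed stand-in for how the rules are evaluated.

```python
import re

def utf8_alternation(chars: str) -> bytes:
    """Build a byte-level group matching any of `chars` as a whole UTF-8 sequence."""
    parts = (re.escape(c.encode("utf-8")) for c in chars)
    return b"(?:" + b"|".join(parts) + b")"

# Quote characters taken from the class quoted above; extend as needed.
quotes = re.compile(utf8_alternation("\"'´’‘"))

greek = "σουνιου ηλιουπολη".encode("utf-8")
smart = "it’s".encode("utf-8")

print(quotes.search(greek))          # no match on the Greek street name
print(quotes.search(smart).group())  # the three bytes of ’
```

Because each alternative is a complete byte sequence, the lone x80 inside π no longer matches, while a real ’ still does.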

We wonder whether this is just a misconfiguration of the decoding in Apache or ModSecurity. We know that web browsers percent-encode all non-ASCII characters, and some ASCII characters as well, as UTF-8 bytes.
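To illustrate what arrives on the wire, here is a small Python sketch using the standard library's `urllib.parse`. It assumes the browser encodes the query as UTF-8 and only shows the encoding and decoding round trip, not ModSecurity's own `t:urlDecodeUni` transformation.

```python
from urllib.parse import quote, unquote_to_bytes

query = "σουνιου ηλιουπολη"

# Percent-encode as UTF-8, roughly as a browser would.
encoded = quote(query)
print(encoded)

# URL decoding yields the raw UTF-8 bytes again, including the x80 of π.
decoded = unquote_to_bytes(encoded)
print(decoded == query.encode("utf-8"))
print(quote("π"))
```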

The OWASP CRS mailing list (lists.owasp.org/mailman/listinfo/…) is quite good for this sort of thing. Post the answer here if they help you figure it out. - Barry Pollard

1 Answer


I don't think it's a decoding problem; that looks as expected to me. Your (annoyingly verbose) fix is fine if it is known that the application you are protecting treats all its URL input as UTF-8. (It wouldn't be ‘right’ for something that used Windows-1252, for example, as it would start to let the smart quotes through again.)
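A quick Python sketch of the Windows-1252 caveat, under the same assumption that byte-mode `re` approximates how the rule sees the input: the UTF-8 byte alternation catches ’ only when it is encoded as UTF-8.

```python
import re

# The UTF-8-specific alternation from the workaround.
utf8_quotes = re.compile(b"(?:\"|'|\xc2\xb4|\xe2\x80\x99|\xe2\x80\x98)")

as_utf8 = "it’s".encode("utf-8")     # ’ becomes e2 80 99
as_cp1252 = "it’s".encode("cp1252")  # ’ becomes the single byte 92

print(as_cp1252)
print(bool(utf8_quotes.search(as_utf8)))    # caught
print(bool(utf8_quotes.search(as_cp1252)))  # slips through
```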

Alternatively you could remove the smart-quote filtering entirely, assuming you are not trying to protect an application specifically known to have SQL-injection issues as well as poor Unicode handling. The smart quotes are in there because if an application flattens them to ASCII using a platform function which maps non-ASCII characters to ASCII, like Windows's misguided ‘best fit’ mappings, they could get converted to single quotes, thus evading a preceding WAF filter that tried to remove those. (It seems to me the rule fails to include some other characters that would get flattened to quotes, such as U+02B9, U+02BC, U+02C8, U+2032 and U+FF07, so it's probably already not watertight in any case.)
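To make the flattening risk concrete, here is a hedged Python sketch. It does not reproduce Windows's best-fit mapping; Unicode NFKC normalization is used purely as a stand-in for "some flattening step in the application", and which characters actually turn into quotes depends on the mapping in use.

```python
import unicodedata

# The quote-like code points mentioned above.
candidates = ["\u02b9", "\u02bc", "\u02c8", "\u2032", "\uff07"]

for ch in candidates:
    # NFKC is an assumed stand-in, not the Windows best-fit table.
    flattened = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {flattened!r}")

# A filter looking only for ASCII and smart quotes never sees this one,
# yet after normalization the application receives a plain apostrophe.
fullwidth_flattened = unicodedata.normalize("NFKC", "\uff07")
```

Under NFKC, the fullwidth apostrophe U+FF07 becomes a plain ', which is the kind of conversion that lets input past a filter applied before the flattening.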

TBH this is par for the course for mod_security CRS rules; especially for sites that use arbitrary strings in path parts you get lots of false positives, and the larger part of deploying tools like this is configuring them to avoid the worst of the damage.

IMO: WAFs are fundamentally flawed in principle (as it's impossible to define what input might constitute an attack vs a valid request), and the default CRS is more flawed than most. They're useful as a tactical measure to block known attacks against software you can't immediately fix at source, but as a general-purpose input filter they typically cause more problems than they fix.