Background
I am using Nginx with mod_zip (http://wiki.nginx.org/NgxZip, also on GitHub) as a proxy to stream files from Amazon S3 into a zip archive.
mod_zip takes a manifest in which each line describes one file as space-separated fields. Newlines separate files.
The format for each line is:
CRC[- for unknown] size location filename
Example of a two-line manifest:
- 4 /pro-core.com/prostore/9228407_foobar.txt?AWSAccessKeyId=key&Expires=sometime&Signature=signed foo/foobar.txt
- 288134 /pro-core.com/prostore/9228400_38.png?AWSAccessKeyId=key&Expires=soon&Signature=signed bar/38.png
This would create an archive with 2 directories:
|- foo
| |- foobar.txt
|- bar
| |- 38.png
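For illustration, here is a minimal Python sketch of how an app could assemble a manifest in the format above. The function names `manifest_line` and `build_manifest` are hypothetical and not part of mod_zip; the locations are assumed to be already URL-escaped and signed, and lines are joined with a plain newline as described above.

```python
def manifest_line(size, location, filename, crc=None):
    """Return one manifest line: CRC (or '-' if unknown), size, location, filename."""
    crc_field = "-" if crc is None else crc
    return f"{crc_field} {size} {location} {filename}"


def build_manifest(entries):
    """Join (size, location, filename) tuples into a newline-delimited manifest."""
    return "\n".join(manifest_line(size, loc, name) for size, loc, name in entries)


# The two files from the example manifest above.
entries = [
    (4,
     "/pro-core.com/prostore/9228407_foobar.txt"
     "?AWSAccessKeyId=key&Expires=sometime&Signature=signed",
     "foo/foobar.txt"),
    (288134,
     "/pro-core.com/prostore/9228400_38.png"
     "?AWSAccessKeyId=key&Expires=soon&Signature=signed",
     "bar/38.png"),
]

manifest = build_manifest(entries)
print(manifest)
```

Since the location field is delimited by spaces, it must not contain a raw space, which is why the S3 key has to be percent-encoded before it goes into the manifest.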
My nginx.conf file:
user nginx;
worker_processes 1;
error_log /usr/local/nginx/logs/error.log debug;
events { worker_connections 1024; }
http {
    include mime.types;
    default_type application/octet-stream;
    #access_log logs/access.log main;
    keepalive_timeout 0;
    sendfile off;
    gzip off;

    server {
        listen 8008;
        server_name localhost;
        #charset koi8-r;
        #access_log logs/host.access.log main;
        root html;

        location / {
            internal;
        }

        location /pro-core.com/ {
            internal;
            proxy_pass http://s3.amazonaws.com;
            proxy_buffering off;
            proxy_buffers 2 4m;
            proxy_buffer_size 4m;
            proxy_busy_buffers_size 4m;
        }

        location /download/ {
            proxy_pass http://localhost:3000/utilities/s3_manifest_for_nginx_to_zip_and_stream/;
            proxy_redirect off;
            proxy_buffering off;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header Accept-Encoding identity;
        }
    }
}
Problem
mod_zip seems to do something funny with parentheses. Some of my S3 keys have parentheses in them, like 9228403_foobar (1).txt. I cannot change the S3 keys. In the manifest, I have a line that looks like:
- 4 /pro-core.com/prostore/9228403_foobar%20%281%29.txt?AWSAccessKeyId=key&Expires=time&Signature=signed foo/foobar _1_.txt
Notice that the location URL is fully escaped. When I try to open the resulting archive, it is corrupt. Sad panda. Looking at the nginx error log, I see a 403 from S3 for the file with parens in the key:
From nginx log:
"GET /pro-core.com/prostore/9228403_foobar%20(1).txt?AWSAccessKeyId=key&Expires=time&Signature=signed HTTP/1.0
...
[debug] : *31 http proxy status 403 "403 Forbidden"
Notice that the parens in the location URL are no longer escaped. I verified that the URL was the problem by performing a vanilla "GET" via curl.
$ curl -v http://s3.amazonaws.com/..._foobar%20(1).txt?...
=> 403 Forbidden -- SignatureDoesNotMatch
$ curl -v http://s3.amazonaws.com/..._foobar%20%281%29.txt?...
=> 200 OK -- contents of foobar (1).txt
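To make the difference between the two requests concrete, here is a small Python sketch using only the standard library. It shows that the fully escaped path and the partially escaped path from the nginx log decode to the same S3 key but are different strings on the wire. The variable names are illustrative, and the sketch does not model how S3 computes or checks the signature.

```python
from urllib.parse import quote, unquote

key = "9228403_foobar (1).txt"

# quote() percent-encodes the space and both parentheses.
fully_escaped = quote(key)

# The form that shows up in the nginx log: space escaped, parens not.
partially_escaped = "9228403_foobar%20(1).txt"

print(fully_escaped)
print(unquote(fully_escaped) == unquote(partially_escaped))  # same key after decoding
print(fully_escaped == partially_escaped)                    # different request paths
```

Both forms name the same object after decoding, yet per the curl results above only the fully escaped form is accepted with this signature.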
Question
Is there any way to change something in my app, or tell nginx or mod_zip to not un-escape my URLs?
proxy_pass http://s3.amazonaws.com;
With this directive, nginx should pass the original $request_uri to the upstream server, so the most likely cause is that the bad un-escaping happens in your app server. - Chuan Ma