I have a static site with the following file/folder structure:

  • index.html
  • /foobar/
    • index.html
    • bob.html
    • alice.html

I'd like to achieve the following:

  • remove all .html extensions. ✔ works
  • remove index.html (resp. index). ✔ works
  • I want files to end without trailing slash. ✔ works
    • if someone adds a trailing slash, redirect to URL without trailing slash. ✘ doesn't work
  • I want "folders" (actually index.html files inside a folder) to end without trailing slash. ✘ doesn't work
    • if someone adds a trailing slash, redirect to URL without trailing slash. ✘ doesn't work

So the following URLs should work:

  • example.com/ (actually: /index.html)
  • example.com/foobar (actually: /foobar/index.html)
  • example.com/foobar/bob (actually: /foobar/bob.html)
  • example.com/foobar/alice (actually: /foobar/alice.html)

The following requests should redirect (301):

  • example.com/foobar/ (redirects to: example.com/foobar)
  • example.com/foobar/bob/ (redirects to: example.com/foobar/bob)
  • example.com/foobar/alice/ (redirects to: example.com/foobar/alice)

I see that this would create a problem when a file /foobar.html exists: when someone visits /foobar, it is not clear whether the directory or the file is requested. However, I will make sure that this never happens.


At the moment, I have this .htaccess:

# Turn MultiViews off. (MultiViews on causes /abc to go to /abc.ext.) 
Options +FollowSymLinks -MultiViews

# The IfModule wrapper stops DirectorySlash from being processed if mod_rewrite isn't available.
<IfModule mod_rewrite.c>

    # Disable mod_dir adding missing trailing slashes to directory requests.
    DirectorySlash Off

    RewriteEngine On

    # If it's a request to index(.html) 
    RewriteCond %{THE_REQUEST} \ /(.+/)?index(\.html)?(\?.*)?\  [NC]
    # Remove it. 
    RewriteRule ^(.+/)?index(\.html)?$ /%1 [R=301,L]

    # Add missing trailing slashes to directories if a matching .html does not exist. 
    # If it's a request to a directory. 
    RewriteCond %{SCRIPT_FILENAME}/ -d
    # And a HTML file does not (!) exist.
    RewriteCond %{SCRIPT_FILENAME}.html !-f
    # And there is no trailing slash, redirect to add it.
    RewriteRule [^/]$ %{REQUEST_URI}/ [R=301,L]

    # Remove HTML extensions. 
    # If it's a request from a browser, not an internal request by Apache/mod_rewrite. 
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    # And the request has a HTML extension. Redirect to remove it. 
    RewriteRule ^(.+)\.html$ /$1 [R=301,L]

    # If the request exists with a .html extension. 
    RewriteCond %{SCRIPT_FILENAME}.html -f
    # And there is no trailing slash, rewrite to add the .html extension. 
    RewriteRule [^/]$ %{REQUEST_URI}.html [QSA,L]

</IfModule>

What would I have to change/remove/add in my .htaccess? I don't understand much of it. I tried to remove the block commented "Add missing trailing slashes to directories if a matching .html does not exist", but this didn't help.

FWIW, I asked over at Code Review for possible improvements: codereview.stackexchange.com/q/18440/16414 (unor)

1 Answer

Right above your "# Add missing trailing slashes to directories if a matching .html does not exist." rule, try adding this rule. It redirects when the request has a trailing slash, is NOT a directory, AND a matching .html file exists:

# if request has a trailing slash
RewriteCond %{REQUEST_URI} ^/(.*)/$
# but it isn't a directory
RewriteCond %{DOCUMENT_ROOT}/%1 !-d
# and if the trailing slash is removed and a .html appended to the end, it IS a file
RewriteCond %{DOCUMENT_ROOT}/%1.html -f
# redirect without trailing slash
RewriteRule ^ /%1 [L,R=301]

This shouldn't conflict with the redirect rule following it because its conditions check for the exact opposite.


EDIT:

To handle directories that contain an index.html, you need to change this rule that you have, which appends the trailing slashes:

# Add missing trailing slashes to directories if a matching .html does not exist. 
# If it's a request to a directory. 
RewriteCond %{SCRIPT_FILENAME}/ -d
# And a HTML file does not (!) exist.
RewriteCond %{SCRIPT_FILENAME}.html !-f
# And there is no trailing slash, redirect to add it.
RewriteRule [^/]$ %{REQUEST_URI}/ [R=301,L]

To:

# Add missing trailing slashes to directories that do not contain an index.html.
# If it's a request to a directory.
RewriteCond %{REQUEST_FILENAME}/ -d
# And the directory does not (!) contain an index.html.
RewriteCond %{REQUEST_FILENAME}/index.html !-f
# And there is no trailing slash, redirect to add it.
RewriteRule [^/]$ %{REQUEST_URI}/ [R=301,L]

This checks that index.html is missing from the directory before adding the trailing slash. You must keep this rule because of an information disclosure issue: with DirectorySlash Off, a request for a directory without the trailing slash can expose a listing of the directory's contents (when mod_autoindex is enabled) instead of serving its index file. Now, add these rules to remove the trailing slash when there's an index.html:

# If it's a request to a directory.
RewriteCond %{REQUEST_FILENAME} -d
# And the directory contains an index.html.
RewriteCond %{REQUEST_FILENAME}/index.html -f
# And there is a trailing slash, redirect to remove it.
RewriteRule ^(.*?)/$ /$1 [R=301,L]

Now add these rules right after to explicitly serve the index.html when there is no trailing slash (note there is no R=301 in the rule's flags, so this is an internal rewrite and the URL in the browser doesn't change):

# If it's a request to a directory.
RewriteCond %{REQUEST_FILENAME} -d
# And the directory contains an index.html.
RewriteCond %{REQUEST_FILENAME}/index.html -f
# And there is no trailing slash, show the index.html.
RewriteRule [^/]$ %{REQUEST_URI}/index.html [L]
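
For reference, here is a sketch of how the whole .htaccess might look once these pieces are put together. It only reassembles the rules from your question and from this answer in the order described above. I'm assuming the two new index.html blocks go directly after the modified "add trailing slash" block, and I haven't tested the combined file, so treat it as a starting point:

```apache
# Turn MultiViews off. (MultiViews on causes /abc to go to /abc.ext.)
Options +FollowSymLinks -MultiViews

<IfModule mod_rewrite.c>

    # Disable mod_dir adding missing trailing slashes to directory requests.
    DirectorySlash Off

    RewriteEngine On

    # 1. Direct requests to index(.html): redirect to the containing path.
    RewriteCond %{THE_REQUEST} \ /(.+/)?index(\.html)?(\?.*)?\  [NC]
    RewriteRule ^(.+/)?index(\.html)?$ /%1 [R=301,L]

    # 2. Trailing slash on something that is not a directory but exists
    #    as a .html file: redirect to remove the slash.
    RewriteCond %{REQUEST_URI} ^/(.*)/$
    RewriteCond %{DOCUMENT_ROOT}/%1 !-d
    RewriteCond %{DOCUMENT_ROOT}/%1.html -f
    RewriteRule ^ /%1 [L,R=301]

    # 3. Directory WITHOUT an index.html and no trailing slash:
    #    redirect to add the slash (avoids the directory listing issue).
    RewriteCond %{REQUEST_FILENAME}/ -d
    RewriteCond %{REQUEST_FILENAME}/index.html !-f
    RewriteRule [^/]$ %{REQUEST_URI}/ [R=301,L]

    # 4. Directory WITH an index.html and a trailing slash:
    #    redirect to remove the slash.
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteCond %{REQUEST_FILENAME}/index.html -f
    RewriteRule ^(.*?)/$ /$1 [R=301,L]

    # 5. Directory WITH an index.html and no trailing slash:
    #    serve the index.html internally (no redirect).
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteCond %{REQUEST_FILENAME}/index.html -f
    RewriteRule [^/]$ %{REQUEST_URI}/index.html [L]

    # 6. Direct (non-internal) requests with a .html extension:
    #    redirect to remove it.
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteRule ^(.+)\.html$ /$1 [R=301,L]

    # 7. Extensionless request that exists as a .html file:
    #    rewrite internally to add the extension.
    RewriteCond %{SCRIPT_FILENAME}.html -f
    RewriteRule [^/]$ %{REQUEST_URI}.html [QSA,L]

</IfModule>
```

With your example layout, a request for /foobar/ should hit block 4 and redirect to /foobar, which block 5 then serves from /foobar/index.html. A request for /foobar/bob/ should hit block 2 and redirect to /foobar/bob, which block 7 maps to /foobar/bob.html.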