David Kitchen


Just another SharePoint developer blogging

mod_rewrite + mod_proxy + spaces in URI = boom!

So I have a piece of .htaccess magic on one of the sites that I run.

What this particular set of instructions does is let me run a second web server of a different version behind my Apache 1.3 installation, while it still appears to the end user as if I run only one web server. Basically, it’s a proxy. I like this, this is good.

The .htaccess rules I set up are for Trac, specifically the tracd daemon, and they look like this:
RewriteEngine On
RewriteRule ^trac_common/(.*)$ http://127.0.0.1:8080/trac_common/$1 [P]
RewriteRule ^projects/?(.*)$ http://127.0.0.1:8080/$1 [P]
RewriteRule ^trac(.*)$ http://127.0.0.1:8080/trac$1 [P]

The [P] flag at the end of each of those lines tells mod_rewrite to hand the request to mod_proxy, which forwards it internally to the web server running on the high port (8080 here).

Nice and simple: if a request comes in with the URL /projects/, it goes to the top Trac page.
If a request comes in for the project named trac, with the URL /trac/, it goes to that project’s Trac site.

This works OK, except that the code browser within Trac is accessing the Subversion repository, and within SVN some of the directory paths and files have spaces in them.

What happens is that under the hood a request ends up looking like this:
GET /trac/browser/root/path with spaces in HTTP/1.1

That’s wrong, as only two spaces can exist in the request line of a GET request: they delimit the request method, the request URI and the protocol version.

So somewhere within mod_rewrite and mod_proxy the spaces are not being escaped as they should, because the request should look like this:
GET /trac/browser/root/path%20with%20spaces%20in HTTP/1.1
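To see why the unescaped form breaks, here is a minimal Python sketch that splits both request lines on spaces, the way a strict parser of the request line would. The helper name split_request_line is made up for this illustration, and Python's urllib.parse.quote stands in for the escaping that should have happened; it is not what Apache runs internally.

```python
from urllib.parse import quote


def split_request_line(line):
    # A well-formed request line is "METHOD SP request-URI SP HTTP-version",
    # so splitting on single spaces should give exactly three tokens.
    return line.split(" ")


raw_path = "/trac/browser/root/path with spaces in"

# Unescaped: the spaces inside the path create extra tokens.
broken = split_request_line(f"GET {raw_path} HTTP/1.1")

# Percent-encoded: quote() turns each space into %20 and leaves "/" alone.
fixed = split_request_line(f"GET {quote(raw_path)} HTTP/1.1")

print(len(broken))
print(len(fixed))
print(fixed[1])
```

The broken line yields more than three tokens, so the backend cannot tell where the URI ends and the protocol version begins.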

So, how do you go about replacing spaces using mod_rewrite?

Much googling will tell you that mod_rewrite does not allow you to perform regexp or string replacements.

However, you don’t need to. mod_rewrite already has a RewriteMap feature and we can use that.

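If we take our rewrite rules out of the .htaccess file and place them in our httpd.conf file (RewriteMap is only allowed in the server config or virtual host context, not in .htaccess), we can use the built-in escape function. Note that in httpd.conf the patterns match the full URL path, so each one gains a leading slash: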
RewriteMap escape int:escape
RewriteEngine On
RewriteRule ^/trac_common/(.*)$ http://127.0.0.1:8080/trac_common/${escape:$1} [P]
RewriteRule ^/projects/?(.*)$ http://127.0.0.1:8080/${escape:$1} [P]
RewriteRule ^/trac(.*)$ http://127.0.0.1:8080/trac${escape:$1} [P]

That now correctly escapes spaces in the matched URLs, and the source browser connected to SVN now works perfectly.
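For anyone who wants to reason about what the three rules produce without restarting Apache, here is a rough Python model of them. It is only a sketch: the names BACKEND, RULES and proxy_target are invented for this example, and urllib.parse.quote merely approximates int:escape (the exact set of characters Apache escapes may differ).

```python
import re
from urllib.parse import quote

BACKEND = "http://127.0.0.1:8080"

# (pattern, target template) pairs mirroring the three RewriteRule lines,
# in the same order as in httpd.conf.
RULES = [
    (re.compile(r"^/trac_common/(.*)$"), BACKEND + "/trac_common/{}"),
    (re.compile(r"^/projects/?(.*)$"), BACKEND + "/{}"),
    (re.compile(r"^/trac(.*)$"), BACKEND + "/trac{}"),
]


def proxy_target(path):
    """Return the backend URL for a request path, or None if no rule matches."""
    for pattern, template in RULES:
        match = pattern.match(path)
        if match:
            # Stand-in for ${escape:$1}: percent-encode the captured group.
            # The first matching rule wins, as [P] stops further rewriting.
            return template.format(quote(match.group(1)))
    return None


print(proxy_target("/trac/browser/root/path with spaces in"))
print(proxy_target("/projects/"))
```

Feeding it the problem path from earlier shows the spaces arriving at the backend as %20, which is what tracd needs.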

Hope this saves someone else some time.

4 Comments

  1. Peter

    That is nicely explained. Of course I only have access to .htaccess so I couldn’t actually use it. I guess the issue is unsolvable with only .htaccess files.

  2. Bloutiouf

    In fact the issue is solvable if you add a PHP file on your first server. Let me explain:

    My site uses “Search Engine Optimization” URIs, i.e. “http://site/foo/bar” and not “http://site/index.php?firstarg=foo&secondarg=bar”. You can easily turn the second syntax into the first one. I mainly use the .htaccess file Joomla provides:

    RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_URI} !^/index.php
    RewriteRule (.*) index.php/$1

    In the file “index.php”, you can access the extra path through the variable $_SERVER['PATH_INFO']. For the example above it would look like “/foo/bar”. This string is not escaped, but you can escape it with the function urlencode. Then (I didn’t try this) you can either use header('Location: http://newserveur'.urlencode($_SERVER['PATH_INFO'])) or readfile('http://newserveur'.urlencode($_SERVER['PATH_INFO'])). I don’t know which one is correct; I think it depends on whether your SVN software follows Location redirects.

    There it is. I hope this works :)

  3. I’m pretty sure this is the only website anywhere that details how to fix this problem. Thank you very much.

    Separately, do you have any resources online that you consult for other tricks using mod_rewrite? I’m having trouble writing a rule allowing Splunk on the side of my website.

    Best…

  4. Thank you for this article. I found it very helpful.
