Thoughts From My Life
Dec 05

Using Mod Rewrite - Search Engines and Dynamic Links

Written by Neil Galloway
 

I learned my first big lesson yesterday about how search engines work (or at least how Google works). I was happy because Google had finally indexed my site. Unfortunately, when I searched for it and checked some Google statistics, it looked like the site had only one page.

The link thoughtsfrommylife.com/index.php was the only one showing up. There were no direct links to any of my articles; even when I ran a really specific Google search, it still showed only the home page link. So I had to do some investigating.

Google and Dynamic Links

So I found out that Google (along with other search engines) does not index dynamic links. What is a dynamic link? It is a link to a web page that includes request parameters at the end of the URL, which the page uses to do something special (hence "dynamic"). For example, my website was using URLs like this:

http://thoughtsfrommylife.com/viewarticle.php?articleid=5&title=My Favorite Article

So you can see from my link that the file viewarticle.php was being given the request parameter articleid. I was then using the articleid (in this case "5") to load article number 5 and display it. I also threw the title of the article in there because I thought it would improve my PageRank with Google. It did anything but.
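
To make the idea of request parameters concrete, here is a small sketch that splits the URL above into its file and its parameters. It is written in Python purely for illustration (my site itself uses PHP), and the function name request_params is made up for this example.

```python
from urllib.parse import urlsplit, parse_qs

def request_params(url):
    """Return the requested path and a dict of its request parameters."""
    parts = urlsplit(url)
    # parse_qs returns a list of values per name; keep the first of each
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return parts.path, params

path, params = request_params(
    "http://thoughtsfrommylife.com/viewarticle.php?articleid=5&title=My Favorite Article"
)
print(path)    # the one file that serves every article
print(params)  # the values that change what the page displays
```

Everything after the ? is what makes the link dynamic: the same file is requested every time, and only the parameters differ.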

Dynamic links make for simple programming. You just have to create one page and then add a bit of code that loads different text each time, so that each link looks like its own page. Unfortunately, these links do not work well with Google.

Let me clarify as well. Google indexed the pages behind these links, but it did not index the links themselves. So my site was showing up in searches, but the Google result would only link to the homepage and not to the pages beneath.

Solution

The fix is to give the pages links that do not have request parameters, but I still wanted to use my viewarticle.php file. The answer is an Apache module called mod_rewrite. It lets you intercept a link that someone is trying to access on your server and apply rules that change it into the link you actually want to handle. For example, my site now converts a link like this:

http://thoughtsfrommylife.com/article-5-My_Favorite_Article/
to
http://thoughtsfrommylife.com/viewarticle.php?articleid=5&title=My_Favorite_Article

As you can see, the links on my site now look like they are accessing a folder (with the title sitting nicely in there), but when someone clicks one, the server turns it back into the kind of URL I was using originally. I really have one page that displays all the articles for my site, but now it looks like there are 50 different pages.

How Do I Do This?

First, make sure mod_rewrite is available from your hosting provider. It is a module for the Apache web server, so make sure that is what they are running. If they are using IIS, I believe there are other solutions (but I'm not sure).

See Apache's documentation page for the mod_rewrite module.

Second, add the following line to the .htaccess file in the root web directory of your site.

RewriteEngine On

Third, create a rule to rewrite your links. It uses a form of regular expressions to match your URLs. Basically, write out the pattern you want: type the text that never changes as-is, put the parts that do change inside round brackets, and use a regular expression inside the brackets to define what is allowed there. Here is an example for the links I used above.

RewriteRule ^article-([0-9]*)-([A-Za-z0-9_-]*) viewarticle.php?articleid=$1&title=$2 [L,NC]

This looks confusing, but bear with me. We are defining a rewrite rule (you can have as many of these as you want). The rule has three parts separated by spaces: the pattern to match, the link to rewrite to, and the flags in square brackets. The pattern is a regular expression; you can read up on the syntax on the mod_rewrite page at Apache's website.

  • The ^ at the beginning means "at the beginning of the line".
  • article- will show up at the front of all matching URLs. This means right after the hostname (http://thoughtsfrommylife.com/article-...).
  • ([0-9]*) This expression is in round brackets, so whatever it matches is captured for later use. The square brackets define the set of characters allowed for a single character, here any digit from 0 to 9. The * immediately after means this can repeat as many times as necessary (including zero times). Basically, this expression matches any non-negative whole number.
  • At the end of the pattern there is -([A-Za-z0-9_-]*). This means there will be a hyphen (-) followed by whatever the bracketed expression matches. Inside the square brackets, the ranges and characters are listed one after another with no separator: a | in there would not mean OR, it would be matched as a literal | character. The hyphen goes last so that it is treated as a plain hyphen rather than as part of a range. You can read it as "match anything between uppercase A and Z, lowercase a and z, 0 and 9, _, or -". The * at the end means as many of these as are strung together. Basically, I was just trying to cover whatever kind of titles I would have for my articles.
  • You will also notice $1 and $2. These write out the first and second captured matches, so the rule fills in the article id and the title for me.
  • The L and NC on the end are flags that tell mod_rewrite how to process the rule. L means this is the "last" rule if it matches: don't try to match any more. NC means "no case", or case insensitive: uppercase and lowercase do not matter.
Test your new links to make sure they are getting rewritten correctly.
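
Before relying on the server, the rule can also be tried out offline. Below is a minimal sketch in Python rather than Apache configuration, assuming Python's re module behaves like mod_rewrite's regular expressions for a pattern this simple. The function name rewrite and the sample paths are made up for illustration, and the character class is written as [A-Za-z0-9_-] so that hyphens are accepted in titles.

```python
import re

# Pattern and target mirroring the RewriteRule. In .htaccess the leading "/"
# has already been stripped from the path before the pattern is applied.
PATTERN = re.compile(r"^article-([0-9]*)-([A-Za-z0-9_-]*)", re.IGNORECASE)  # NC flag
TARGET = r"viewarticle.php?articleid=\1&title=\2"

def rewrite(path):
    """Return the rewritten path, or None if the rule would not apply."""
    match = PATTERN.search(path)
    if match is None:
        return None
    # \1 and \2 play the role of $1 and $2 in the RewriteRule
    return match.expand(TARGET)

print(rewrite("article-5-My_Favorite_Article/"))
# viewarticle.php?articleid=5&title=My_Favorite_Article
print(rewrite("index.php"))
# None
```

This only checks the pattern. It says nothing about whether mod_rewrite is enabled on your host, so still click through the real links.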

Last, you need to make sure the links on your web page are correct. Remove any of the dynamic links and use your new replacement link instead.
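
If your pages are generated by code, it is easier to build the new links in one place than to edit them by hand. Here is a rough sketch of such a helper, in Python for illustration only (my site is PHP, where preg_replace could do the same job). The function name article_link is hypothetical, and replacing unsupported characters with underscores is my assumption about how titles should be cleaned up.

```python
import re

def article_link(article_id, title):
    """Build a friendly link such as /article-5-My_Favorite_Article/."""
    # Replace every run of characters outside letters, digits, "_" and "-"
    # with a single underscore, then trim underscores from both ends.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", title).strip("_")
    return f"/article-{article_id}-{slug}/"

print(article_link(5, "My Favorite Article"))
# /article-5-My_Favorite_Article/
```

Keeping the title to characters the rewrite pattern accepts means every generated link is one the rule can match.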

You should be done. Just regenerate your Google sitemap and wait for Google to reindex now.

I'll post an update in a couple days letting you know my own results.



