Let’s get started.
As smart as the Google spider is, it can still miss pages on your site. Maybe you’ve got an orphaned page that isn’t in your navigation anymore. Or perhaps you have moved a link to a piece of content so that it’s not easily accessible. It’s also possible that your site is so big that Google can’t crawl it all without consuming all of your server’s resources. Not pretty!
The solution is a sitemap.
In 2005, Google started supporting XML sitemaps. Soon after, Yahoo came out with their own standard and other search engines started to follow suit. Fortunately, in 2006, Google, Yahoo, Microsoft, and a handful of smaller players all got together and decided to support the same sitemap specification. That made it much easier for site owners to make sure every page of their web site is crawled and added to the search engine index. They published their specification at http://sitemaps.org. Shortly thereafter, the Drupal community stepped up and created a module called (surprise!) the XML sitemap module. This module automatically generates an XML sitemap containing every node and taxonomy term on your Drupal site. The original module was written by Matthew Loar as part of the Google Summer of Code. The Drupal 6 version of the module was developed by Kiam LaLuno. Finally, in mid-2009, Dave Reid began working on version 2.0 of the module to address performance, scalability, and reliability issues. Thanks, guys!
According to www.sitemaps.org:
Sitemaps are an easy way for Webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata.
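To make that description concrete, here is a minimal sketch of what such a file contains, built with Python's standard library. It is only an illustration of the sitemaps.org format, not how the Drupal module generates its sitemap; the build_sitemap function and the example.com URLs are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return a sitemap XML string for a list of page dicts (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child of <url>.
        ET.SubElement(url, "loc").text = page["loc"]
        # Optional metadata: last update, change frequency, relative importance.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


# Example data (assumed, not taken from a real site).
pages = [
    {"loc": "http://www.example.com/", "lastmod": "2009-06-01",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.example.com/about", "priority": 0.5},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Each url element carries one page address plus whatever optional metadata you supply; a page with no metadata is still a valid entry.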
Using a sitemap does not guarantee that every page will be included in the search engines. Rather, it helps the search engine crawlers find more of your pages. In my experience, submitting an XML sitemap to Google greatly increases the number of pages that show up when you do a site: search.
A site: keyword search shows you how many pages of your site are included in the search engine index, as shown in the following screenshot:
Setting up the XML Sitemap module
The XML Sitemap module creates a sitemap that conforms to the sitemaps.org specification.
Which XML Sitemap module should you use?
There are two versions of the XML Sitemap module for Drupal 6. The 1.x version is, as of this writing, considered the stable release and should be used for production sites. However, if you have a site with more than about 2000 nodes, you should probably consider using the 2.x version. From www.drupal.org: ‘The 6.x-2.x branch is a complete refactoring with considerations for performance, scalability, and reliability. Once the 6.x-2.x branch is tested and upgradeable, the 6.x-1.x branch will no longer be supported.’ What this means is that in the next few months (quite possibly by the time you’re reading this) everyone should be using the 2.x version of this module. That’s the beauty of open source software: there are always improvements coming that make your Drupal site better optimized for search engines.
Carry out the following steps to set up the XML Sitemap module:
- Download the XML Sitemap module and install it just like a normal Drupal module. When you go to turn on the module, you’ll be presented with a list that looks similar to the following screenshot:
Now that you have the XML sitemap module properly installed and configured, you can start defining the priority of the content on your site. By default, the priority is 0.5. However, there are times when you may want Google to visit some content more often and other times when you may not want your content in the sitemap at all (like the comment or contact us submission forms).
Each node now has an XML sitemap section that looks like the following screenshot:
Before you turn on any included modules, consider what pieces of content on your site you want to show up in the search engines and only turn on the modules you need.
- The XML sitemap module is required. Turn it on.
- XML sitemap custom allows you to add your own customized links to the sitemap. Turn it on.
- XML sitemap engines will automatically submit your sitemap to the search engines each time it changes. This is not necessary and there are better ways to submit your sitemap. However, it does a nice job of helping you verify your site with each search engine. Turn it on.
- XML sitemap menu adds your menu items to the sitemap. This is probably a good idea. Turn it on.
- XML sitemap node adds all your nodes. That’s usually the bulk of your content so this is a must-have. Turn it on.
- XML sitemap taxonomy adds all your taxonomy term pages to the sitemap. Generally a good idea but some might not want this listed. Term pages are good category pages so I recommend it. Turn it on.
- Don’t forget to click Save configuration.
- Go to http://www.yourDrupalsite.com/admin/settings/xmlsitemap or go to your admin screen and click on the Administer | Site Configuration | XML sitemap link. You’ll be able to see the XML sitemap, as shown in the following screenshot:
- Click on Settings and you’ll see a few options, as shown in the following screenshot:
- Minimum sitemap lifetime: Determines the minimum amount of time that the module will wait before regenerating the sitemap. Use this feature if you have an enormous sitemap that is taking too many server resources. Most sites should leave this set on No minimum.
- Include a stylesheet in the: The module will generate a simple CSS file to include with the sitemap. It’s not necessary for the search engines but very helpful for troubleshooting or if any humans are going to view the sitemap. Leave it checked.
- Generate sitemaps for the following languages: In the future, this option will allow you to specify sitemaps for different languages. This is very important for international sites that want to show up in localized search engines. For now, English is the only option and should remain checked.
- Click the Advanced settings drop-down and you’ll see several additional options.
- Number of links in each sitemap page allows you to specify how many links to pages on your web site will be in each sitemap. Leave it on Automatic unless you are having trouble with the search engines accepting the sitemap.
- Maximum number of sitemap links to process at once sets the number of additional links that the module will add to your sitemap each time cron runs. This highlights one of the biggest differences between the new XML sitemap and the old one. The new module only processes new nodes and updates the existing sitemap instead of reprocessing everything each time the sitemap is accessed. Leave this setting alone unless you notice that cron is timing out.
- Sitemap cache directory allows you to set where the sitemap data will be stored. This is data that is not shown to the search engines or users; it’s only used by the module.
- Base URL is the base URL of your site and generally should be left as it is.
- Click on the Front page drop-down and set these options:
- Front page priority: 1.0 is the highest setting you can give a page in the XML sitemap. On most web sites, the front page is the single most important part of the site, so this setting should probably be left at 1.0.
- Front page change frequency: Tells the search engines how often they should revisit your front page. Adjust this setting to reflect how often the front page of your site changes.
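A note on the Number of links in each sitemap page setting covered in the steps above: the sitemaps.org protocol allows at most 50,000 URLs in a single sitemap file, so larger sites have to spread their links across several files. The following is a rough sketch of that splitting in Python. It is not the module's own code; the split_into_pages function and the example URLs are assumptions for illustration.

```python
# Limit from the sitemaps.org protocol: no more than 50,000 URLs per sitemap file.
MAX_LINKS_PER_SITEMAP = 50000


def split_into_pages(urls, links_per_page=MAX_LINKS_PER_SITEMAP):
    """Split a list of URLs into sitemap pages (hypothetical helper)."""
    if not 1 <= links_per_page <= MAX_LINKS_PER_SITEMAP:
        raise ValueError("links_per_page must be between 1 and 50000")
    return [
        urls[start:start + links_per_page]
        for start in range(0, len(urls), links_per_page)
    ]


# Example: a made-up site with 120,000 nodes.
urls = ["http://www.example.com/node/%d" % n for n in range(1, 120001)]
pages = split_into_pages(urls)
print(len(pages), [len(page) for page in pages])
```

With 120,000 links this yields three sitemap pages. Choosing a smaller number of links per page simply produces more, smaller files, which is one thing to try if a search engine has trouble accepting a large sitemap.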
What is priority and how does it work?
Priority is an often-misunderstood part of a sitemap. The priority is only used to compare pages within your own site; you cannot increase your ranking in the search engine results pages (SERPs) by increasing the priority of your pages. However, it does let the search engines know which pages of your site you feel are more important. They could use this information to choose between two different pages on your site when deciding which one to show to a search engine user.
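The relative nature of priority can be shown with a small Python sketch. This is purely hypothetical: search engines do not publish how (or whether) they use priority, and the preferred_page function and example.com URLs are invented for illustration. The only facts taken from the sitemaps.org protocol are the 0.0 to 1.0 range and the 0.5 default.

```python
from urllib.parse import urlparse

# Default priority in the sitemaps.org protocol when none is given.
DEFAULT_PRIORITY = 0.5


def preferred_page(entries):
    """Pick the highest-priority URL among pages of ONE site (hypothetical helper)."""
    hosts = {urlparse(entry["loc"]).netloc for entry in entries}
    if len(hosts) != 1:
        # Priority says nothing about how a page compares to other sites.
        raise ValueError("priority is only comparable within a single site")
    for entry in entries:
        priority = entry.get("priority", DEFAULT_PRIORITY)
        if not 0.0 <= priority <= 1.0:
            raise ValueError("priority must be between 0.0 and 1.0")
    best = max(entries, key=lambda entry: entry.get("priority", DEFAULT_PRIORITY))
    return best["loc"]


# Two made-up pages on the same site; the second falls back to the default.
entries = [
    {"loc": "http://www.example.com/products", "priority": 0.8},
    {"loc": "http://www.example.com/products/archive"},
]
print(preferred_page(entries))
```

Notice that if every page were set to 1.0, the comparison would tell the search engine nothing, which is why raising all of your priorities does not help.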