

The importance of search engine optimization

Every day, web crawlers scrape the Internet for new and updated content to feed their associated search engines. Most people find web pages by entering a query into a search engine and selecting one of the first few results. Search engine optimization (SEO) is a set of practices used to maintain and improve search result rankings over time.

Item 1 – using keywords effectively

In order to provide information to web crawlers, websites supply keywords in their HTML meta tags and content. An effective procedure for choosing keywords is to:

  • Come up with a set of keywords that are pertinent to your topic

  • Research common search keywords related to your website

  • Take an intersection of these two sets of keywords and preemptively use them across the website

Once this final set of keywords is determined, it is important to spread them across your website’s content whenever possible. For instance, a ski resort in California should ensure that its website includes terms such as California, skiing, snowboarding, and rentals. These are all terms that individuals would look up via a search engine when they are interested in a weekend at a ski resort. Contrary to popular belief, the keywords meta tag creates little value for site owners, as many search engines treat it as a deprecated signal of search relevance. The reasoning goes back many years, to when websites would clutter their keywords meta tag with irrelevant filler words to bait users into visiting their sites. Today, most of the top search engines have decided that content itself is a much more powerful indicator of search relevance and concentrate on that instead.

However, other meta tags, such as description, are still used to display website content in search results. These should be brief but compelling passages that pull users from the search page to your website.

Item 2 – header tags are powerful

Header tags (also known as h-tags) are often used by web crawlers to determine the main topic of a given web page or section. It is generally recommended to use a single h1 tag to identify the primary purpose of the web page, and any number of the other header tags (h2, h3, and so on) to identify section headings.

Item 3 – make sure to have alternative attributes for images

Despite recent advances in image recognition technology, web crawlers today do not have the resources to parse every image on the Internet for content. As a result, it is advisable to provide an alt attribute for search engines to parse while they scrape your web page. For instance, let us suppose you were the webmaster of the Seattle Water Sanitation Plant and wished to upload a flow chart of the plant’s sanitation process to your website.

Since web crawlers make use of the alt attribute while sifting through images, you would ideally upload the image using the following code:

<img src="flow_chart.png" alt="Seattle Water Sanitation Process Flow Chart" />

This leaves the image’s content in the form of a keyword phrase that can contribute to the relevancy of your web page in search results.

Item 4 – enforcing clean URLs

While creating web pages, you’ll often find the need to identify them with a URL ID. The simplest approach is often to use a number or symbol that maps to your data for easy information retrieval. The problem is that a number or symbol does nothing to identify the content for web crawlers or for your end users.

The solution to this is to use clean URLs. By adding a topic name or phrase to the URL, you give web crawlers more keywords to index. Additionally, end users who receive the link can better evaluate the content before clicking, since the topic of the page is visible in the URL itself. A simple way to integrate clean URLs while retaining the number or symbol identifier is to append a readable slug, which describes the topic, after the identifier. Then, apply a regular expression to parse out the identifier for your own use; for instance, take a look at the following sample URL:

http://www.example.com/post/24/golden-dragon-review

The number 24, when parsed out, helps your server easily identify the blog post in question. The slug, golden-dragon-review, communicates the topic at hand to both web crawlers and users.

While creating the slug, the best practice is usually to remove all non-alphanumeric characters and replace all spaces with dashes. Contractions such as can’t, don’t, or won’t can become cant, dont, or wont, because search engines can easily infer their intended meaning. It is also important to realize that spaces should not be replaced by underscores, as underscores are not treated as word separators by web crawlers.
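Both halves of this approach are easy to express in code. The following is a minimal sketch in Node.js, assuming a blog-style /post/<id>/<slug> URL scheme like the one above; the function names are illustrative rather than taken from any particular library:

// Build a slug: drop apostrophes so contractions collapse ("can't" -> "cant"),
// strip the remaining non-alphanumeric characters, and turn spaces into dashes.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/['’]/g, '')           // contractions lose their apostrophes
    .replace(/[^a-z0-9\s-]/g, '')   // remove other non-alphanumeric characters
    .trim()
    .replace(/\s+/g, '-');          // spaces become dashes, never underscores
}

// Parse the numeric identifier back out of a clean URL, ignoring the slug.
function parsePostId(url) {
  var match = /\/post\/(\d+)(?:\/|$)/.exec(url);
  return match ? Number(match[1]) : null;
}

console.log(slugify("Golden Dragon Review"));              // "golden-dragon-review"
console.log(parsePostId("/post/24/golden-dragon-review")); // 24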

Item 5 – backlink whenever safe and possible

Search rankings are influenced by your website’s standing on sites that search engines deem trustworthy. For instance, due to the restricted access to .edu or .gov domains, websites that use these domains are deemed trustworthy and given a higher level of authority when it comes to search rankings. As a result, any website that is backlinked on these trustworthy sites is valued more highly in turn.

Thus, it is worth seeking backlinks on relevant websites whose users would actively be interested in your content. If you choose to backlink irrelevantly, there are consequences: the practice can be caught automatically by web crawlers that compare the keywords of your link against those of the backlink host.

Item 6 – handling HTTP status codes properly

HTTP status codes help the client and server communicate the state of page requests in a clean and consistent manner. The following table reviews the most important status codes and their effect on SEO:

| Status code | Alias | Effect on SEO |
| --- | --- | --- |
| 200 | Success | This loads the page and the content contributes to SEO |
| 301 | Permanent redirect | This redirects the page and the redirected content contributes to SEO |
| 302 | Temporary redirect | This redirects the page and the redirected content doesn’t contribute to SEO |
| 404 | Client error (not found) | This loads the page and the content does not contribute to SEO |
| 500 | Server error | This does not load the page and there is no content to contribute to SEO |

In an ideal world, all pages would return the 200 status code. Unfortunately, URLs get misspelled, servers throw exceptions, and old pages get moved, which leads to the need for other status codes. Thus, it is important that each situation be handled to maximize communication to both web crawlers and users and minimize damage to one’s search ranking.

When a URL gets misspelled, it is important to provide a 301 redirect to a close match or another popular web page. This can be accomplished by using a clean URL and parsing out an identifier, regardless of the slug that follows it. This way, there exists content that contributes directly to the search ranking instead of just leaving a 404 page.
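As an illustration, a route handler along these lines might look as follows. This sketch uses Express and an in-memory posts object purely as assumptions for the example; the text does not prescribe a specific framework or storage:

var express = require('express');
var app = express();

// Stand-in for whatever storage the site actually uses.
var posts = { 24: { slug: 'golden-dragon-review', title: 'Golden Dragon Review' } };

// Match on the identifier only; whatever slug follows it is ignored.
app.get('/post/:id/:slug?', function (req, res) {
  var post = posts[req.params.id];
  if (!post) {
    return res.status(404).send('Post not found');
  }
  // A missing or misspelled slug triggers a 301 to the canonical URL,
  // so search engines consolidate ranking onto a single address.
  if (req.params.slug !== post.slug) {
    return res.redirect(301, '/post/' + req.params.id + '/' + post.slug);
  }
  res.send('<h1>' + post.title + '</h1>');
});

app.listen(3000);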

Server errors should be handled as soon as possible. When a page does not load, it harms the experience for both users and web crawlers and, over an extended period of time, can cause that page’s ranking to expire.

Lastly, 404 pages should be developed with your users in mind. When you choose not to redirect them to the most relevant link, it is important to offer suggested web pages or a search box to keep them engaged with your content.
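Continuing the hypothetical Express sketch above, a catch-all 404 handler in that spirit could look like this (the suggested link and the /search endpoint are placeholders):

// Registered after all other routes, so anything unmatched lands here.
app.use(function (req, res) {
  res.status(404).send(
    '<h1>Page not found</h1>' +
    '<p>You might be looking for one of these:</p>' +
    '<ul><li><a href="/post/24/golden-dragon-review">Golden Dragon Review</a></li></ul>' +
    '<form action="/search"><input name="q" placeholder="Search this site"></form>'
  );
});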

The connect-rest-test Grunt plugin can be a healthy addition to any software project to test the status codes and responses from a RESTful API. You can find it at https://www.npmjs.org/package/connect-rest-test.

Alternatively, while testing pages outside of your RESTful API, you may be interested in considering grunt-http-verify to ensure that status codes are returned properly. You can find it at https://www.npmjs.org/package/grunt-http-verify.

Item 7 – making use of your robots.txt and site map files

Often, there exist directories in a website that are available to the public but should not be indexed by a search engine. The robots.txt file, when placed in your website’s root, helps to define exclusion rules for web crawling and prevent a user-defined set of search engines from entering certain directories.

For instance, the following example disallows all search engines that choose to parse your robots.txt file from visiting the music directory on a website:

User-agent: *
Disallow: /music/

While building navigation with dynamic content such as JavaScript libraries or Adobe Flash widgets, it’s important to understand that web crawlers have limited capability to scrape these. Site maps help to define the relational mapping between web pages when crawlers cannot heuristically infer it themselves. Whereas the robots.txt file defines a set of search engine exclusion rules, the sitemap.xml file, also located in a website’s root, defines a set of search engine inclusion rules. The following XML snippet is a brief example of a site map that defines these attributes:

<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <lastmod>2014-11-24</lastmod>
    <changefreq>always</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://example.com/post/24/golden-dragon-review</loc>
    <lastmod>2014-07-13</lastmod>
    <changefreq>never</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>

The attributes mentioned in the preceding code are explained in the following table:

| Attribute | Meaning |
| --- | --- |
| loc | This stands for the URL of the web page to be crawled |
| lastmod | This indicates the date on which the web page was last modified |
| changefreq | This indicates how frequently the page is expected to change and, accordingly, how often crawlers should revisit it |
| priority | This indicates the web page’s priority in comparison to the other pages on the site |
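When the list of pages already lives in application code, the sitemap file can be generated rather than maintained by hand. The following Node.js fragment is a rough sketch of that idea; the pages array and output path are placeholders for illustration:

var fs = require('fs');

var pages = [
  { loc: 'http://example.com/', lastmod: '2014-11-24', changefreq: 'always', priority: 0.8 },
  { loc: 'http://example.com/post/24/golden-dragon-review', lastmod: '2014-07-13', changefreq: 'never', priority: 0.5 }
];

// Assemble the XML document from the page entries.
var xml = '<?xml version="1.0" encoding="utf-8"?>\n' +
  '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
  pages.map(function (page) {
    return '  <url>\n' +
      '    <loc>' + page.loc + '</loc>\n' +
      '    <lastmod>' + page.lastmod + '</lastmod>\n' +
      '    <changefreq>' + page.changefreq + '</changefreq>\n' +
      '    <priority>' + page.priority + '</priority>\n' +
      '  </url>';
  }).join('\n') +
  '\n</urlset>\n';

// Write the result to the website root so crawlers can find it.
fs.writeFileSync('sitemap.xml', xml);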

Using Grunt to reinforce SEO practices

With the rising popularity of client-side web applications, SEO practices often go unmet because page content and links do not exist until JavaScript runs. Certain Grunt plugins provide a workaround by loading the web pages, waiting a set amount of time for the dynamic content to load, and taking an HTML snapshot. These snapshots are then served to web crawlers for search engine purposes, while the user-facing dynamic web application is excluded from scraping entirely.
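To make the serving side of this concrete, here is a rough sketch of routing crawlers to static snapshots while regular visitors receive the dynamic application. It uses Express, a simplistic user-agent check, and a snapshots directory, all of which are assumptions for illustration rather than part of any specific plugin:

var express = require('express');
var path = require('path');
var app = express();

// A simplistic crawler check; real projects usually match a longer list.
function isCrawler(userAgent) {
  return /googlebot|bingbot|yandex|baiduspider/i.test(userAgent || '');
}

app.get('/post/:id/:slug?', function (req, res) {
  if (isCrawler(req.get('User-Agent'))) {
    // Serve the HTML snapshot captured after the dynamic content loaded.
    return res.sendFile(path.join(__dirname, 'snapshots', 'post-' + req.params.id + '.html'));
  }
  // Regular visitors get the JavaScript-driven single-page application.
  res.sendFile(path.join(__dirname, 'public', 'index.html'));
});

app.listen(3000);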

Some examples of Grunt plugins that accomplish this need are:
