Subscription

Explore Products

Best Sellers

New Releases

Books

Videos

Audiobooks

Learning Hub

Newsletter Hub

Free Learning

WordPress: Avoiding the Black Hat Techniques

Packt

600 min read
2011-04-26 00:00:00

0 Likes
0 Comments

WordPress 3 Search Engine Optimization

wordpress-avoiding-black-hat-techniques-img-0

Optimize your website for popularity with search engines

Read more about this book

(For more resources on WordPress, see here.)

Typical black hat techniques

There is a wide range of black hat techniques fully available to all webmasters. Some techniques can improve rankings in short term, but generally not to the extent that legitimate web development would, if pursued with the same effort. The risk of black hat techniques is that they are routinely detected and punished. Black hat is never the way to go for a legitimate business, and pursuing black hat techniques can get your site (or sites) permanently banned and will also require you to build an entirely new website with an entirely new domain name. We will examine a few black hat techniques to help you avoid them.

Hidden text on web pages

Hidden text is the text that through either coding or coloring does not appear to users, but appears to search engines. Hidden text is a commonly-used technique, and would be better described as gray hat. It tends not to be severely punished when detected. One technique relies on the coloring of elements. When the color of a text element is set to the same color as the background (either through CSS or HTML coding), then the text disappears from human readers while still visible to search spiders. Unfortunately, for webmasters employing this technique, it's entirely detectible by Google.

More easily detectible is the use of the CSS property display: none. In the language of CSS, this directs browsers to not display the text that is defined by that element. This technique is easily detectible by search engines. There is an obvious alternative to employing hidden text: Simply use your desired keywords in the text of your content and display the text to both users and search spiders.

Spider detection, cloaking, redirection, and doorway pages

Cloaking and spider detection are related techniques. Cloaking is a black hat SEO technique whereby the content presented to search engine spiders (via search spider detection) differs from the content presented to users. Who would employ such a technique? Cloaking is employed principally by sellers of products typically promoted by spam, such as pharmaceutics, adult sites, and gambling sites. Since legitimate search traffic is difficult to obtain in these niches, the purveyors of these products employ cloaking to gain visitors.

Traditional cloaking relies upon spider detection. When a search spider visits a website, the headers accompanying a page view request identify the spider by names such as Goolgebot (Google's spider) or Slurp (Inktomi's spider). Conversely, an ordinary web browser (presumably with a human operator) will identify itself as Mozilla, Internet Explorer, or Safari, as the case may be. With simple JavaScript or with server configuration, it is quite easy to identify the requesting browser and deliver one version of a page to search spiders and another version of the page to human browsers. All you really need is to know the names of the spiders, which are publicly known.

A variation of cloaking is a doorway page. A doorway page is a page through which human visitors are quickly redirected (through a meta refresh or JavaScript) to a destination page. Search spiders, however, index the doorway page, and not the destination page. Although the technique differs in execution, the effect is the same: Human visitors see one page, and the search engines see another.

The potential harm from cloaking goes beyond search engine manipulation. More often than not, the true destination pages in a cloaking scheme are used for the transmission of malware, viruses, and Trojans. Because the search engines aren't necessarily reading the true destination pages, the malicious code isn't detected. Any type of cloaking, when reported or detected, is almost certain to result in a severe Google penalty, such as removal of a site from the search engine indexes.

Linking to bad neighborhoods and link farms

A bad neighborhood is a website or a network of websites that either earns inbound links through illegitimate means or employs other "black hat on-page" techniques such as cloaking, and redirects them. A link farm is a website that offers almost no content, but serves solely for the purpose of listing links. Link farms, in turn, offer links to other websites to increase the rankings of these sites.

A wide range of black hat techniques can get a website labeled as a bad neighborhood. A quick test you can employ to determine if a site is a bad neighborhood is by entering the domain name as a part of the specialized Google search query, "site:the-website-domain.com" to see if Google displays any pages of that website in its index. If Google returns no results, the website is either brand new or has been removed from Google's index—a possible indicator that it has been labeled a bad neighborhood. Another quick test is to check the site's PageRank and compare the figure to the number of inbound links pointing to the site. If a site has a large number of backlinks but has a PageRank of zero, which would tend to indicate that its PageRank has been manually adjusted downwards due to a violation of Google's Webmaster Guidelines.

If both of the previous tests are either positive or inconclusive, you would still be wise to give the site a "smell test". Here are some questions to ask when determining if a site might be deemed as a bad neighborhood:

Does the site offer meaningful content?

Did you detect any redirection while visiting the site?

Did you get any virus warning while visiting the site?

Is the site a little more than lists of links or text polluted with high numbers of links?

Check the website's backlink profile. Are the links solely low-value inbound links?

If it isn't a site you would engage with when visiting, don't link to it.

Google Webmaster Guidelines

Google Webmaster Guidelines are a set of written rules and prohibitions that outline recommended and forbidden website practices. You can find these webmaster guidelines at: http://www.google.com/support/webmasters/bin/ answer.py?hl=en&answer=35769, though you'll find it easier to search for "Google Webmaster Guidelines" and click on the top search result.

You should read through the Google Webmaster Guidelines and refer to them occasionally. The guidelines are divided into design and content guidelines, technical guidelines, and quality guidelines.

Google Webmaster Guidelines in a nutshell

At their core, Google Webmaster Guidelines aim for quality in the technology underlying websites in their index, high-quality content, and also discourage manipulation of search results through deceptive techniques. All search engines have webmaster guidelines, but if you follow Google's dictates, you will not run afoul of any of the other search engines. Here, we'll discuss only the Google's rules.

Google's design and content guidelines instruct that your site should have a clear navigational hierarchy with text links rather than image links. The guidelines specifically note that each page "should be reachable from at least one static text link". Because WordPress builds text-based, hierarchical navigation naturally, your site will also meet that rule naturally. The guidelines continue by instructing that your site should load quickly and display consistently among different browsers.

The warnings come in Google's quality guidelines; in this section, you'll see how Google warns against a wide range of black hat techniques such as the following:

Using hidden text or hidden links, elements that through coloring, font size, or CSS display properties to show to the search engines but do not show them to the users.

The use of cloaking or "sneaky redirects". Cloaking means a script that detects search engine spiders and displays one version of a website to users and displays an alternate version to the search engines.

The use of repetitive, automated queries to Google. Some unscrupulous software vendors (Google mentions one by name, WebPosition Gold, which is still in the market, luring unsuspecting webmasters) sell software and services that repeatedly query Google to determine website rankings. Google does allow such queries in some instances through their AJAX Search API Key—but you need to apply for one and abide by the terms of its use.

The creation of multiple sites or pages that consist solely of duplicate content that appears on other web properties.

The posting or installation of scripts that behave maliciously towards users, such as with viruses, trojans, browser interceptors, or other badware.

Participation in link schemes. Google is quite public that it values inbound links as a measure of site quality, so it is ever vigilant to detect and punish illegitimate link programs.

Linking to bad neighborhoods. A bad neighborhood means a website that uses illegitimate, forbidden techniques to earn inbound links or traffic.

Stuffing keywords onto pages in order to fool search spiders. Keyword stuffing is "the oldest trick in the book". It's not only forbidden, but also highly ineffective at influencing search results and highly annoying to visitors.

When Google detects violations of its guidelines

Google, which is nearly an entirely automated system, is surprisingly capable of detecting violations of its guidelines. Google encourages user-reporting of spam websites, cloaked pages, and hidden text (through their page here: https://www. google.com/webmasters/tools/spamreport). They maintain an active antispam department that is fully engaged in an ongoing improvement in both, manual punishments for offending sites, and algorithmic improvements for detecting violations.

When paid link abuses are detected, Google will nearly always punish the linking site, not necessarily the site receiving the link—even though the receiving site is the one earning a ranking benefit. At first glance, this may seem counter-intuitive, but there is a reason. If Google punished the site receiving a forbidden paid link, then any site owner could knock a competitor's website by buying a forbidden link, pointing to the competitor, and then reporting the link as spam.

When an on-page black hat or gray hat element is detected, the penalty will be imposed upon the offending site. The penalties range from a ranking adjustment to an outright ban from search engine results. Generally, the penalty matches the crime; the more egregious penalties flow from more egregious violations.

We need to draw a distinction, however, between a Google ban, penalty, and algorithmic filtering. Algorithmic filtering is simply an adjustment to the rankings or indexing of a site. If you publish content that is a duplicate of the other content on the Web, and Google doesn't rank or index that page, that's not a penalty, it's simply the search engine algorithm operating properly. If all of your pages are removed from the search index, that is most likely a ban. If the highest ranking you can achieve is position 40 for any search phrase, that could potentially be a penalty called a "-40 penalty". All search engines can impose discipline upon websites, but Google is the most strict and imposes far more penalties than the other search engines, so we will largely discuss Google here.

Filtering is not a penalty; it is an adjustment that can be remedied by undoing the condition that led to the it. Filtering can occur for a variety of reasons but is often imposed following over optimization. For example, if your backlink profile comprises links of which 80% use the same anchor text, you might trigger a filter. The effect of a penalty or filter is the same: decreased rankings and traffic. In the following section, we'll look at a wide variety of known Google filters and penalties, and learn how to address them.