11 min read

(For more resources on Ruby, see here.)

We start off with an easy application, a simple yet very useful Internet application, URL shorteners. We will take a quick tour of URL shorteners before jumping into the design of a simple URL shortener, followed by an in-depth discussion of how we clone our own URL shortener, Tinyclone.

All about URL shorteners

Internet applications don’t always need to be full of features or cover all aspects of your Internet life to be successful. Sometimes it’s ok to be simple and just focus on providing a single feature. It doesn’t even need to be earth-shatteringly important—it should be just useful enough for its target users. The archetypical and probably most extreme example of this is the URL shortening application or URL shortener.

This service offers a very simple but surprisingly useful feature. It provides a shorter URL that represents a normally longer URL. When a user goes to the short URL, he will be redirected to the original URL. For this simple feature, top three most popular URL shortening services (TinyURL, bit.ly, and is.gd) collectively had about 11 million unique visitors, 110 million page views and a reach of about one percent of the Internet in June 2009. In 2008, the most popular URL shortener at that time, TinyURL, was made one of Time Magazine’s Top 50 Best Websites.

The idea to shorten long and unwieldy URLs into shorter, more manageable ones has been around for some time. One of the earlier attempts to make it a public service is Make A Shorter Link (MASL), which appeared around July 2001. MASL did just that, though the usefulness was debatable as the domain name was long and the shortened URL could potentially be longer than the original.

However, the pioneering site that popularized this concept (and subsequently bought over MASL and a few other similar sites) is TinyURL. TinyURL was launched in January 2002 by Kevin Gilbertson to help him to link directly to newsgroup postings which frequently had long URLs. It rapidly became one of the most popular URL shorteners around. In 2008, an estimated 100 similar services came to existence in various forms.

URLs or Uniform Resource Locators are resource identifiers that specify where identified resources are available and how they can be retrieved. A popular term for URL is a Web address. Every URL is made up of the following:

<resource type>://<username>:<password>@<domain>:<port
>/<file path name>?<query string>#<anchor>

Not all parts of the URL are required by a browser, if the resource type is missing, it is normally assumed to be http, if the port is missing, it is normally assumed to be 80 (for http). The username, password, query string and anchor components are optional.

Initially, TinyURL and similar types of URL shorteners focused on simply providing a short representative URL to their users. Naturally the competitive breadth for shortening URLs was rather well, short. Many chose TinyURL over MASL because TinyURL had a shorter and easier to remember domain name (http://tinyurl.com over http://makeashorterlink.com)

Subsequent competition over this space intensified and extended to providing various other features, including custom short URLs (TinyURL, bit.ly), analysis of click-through statistics (bit.ly), advertisements (Adjix, Linkbee), preview pages (TinyURL, is.gd) and so on.

The explosive growth of Twitter (from June 2008 to June 2009, Twitter grew 1,164%) opened a new chapter for URL shorteners. Twitter chose a limit of 140 characters for each tweet to accommodate the 160 characters in an SMS message (Twitter was invented as a service for people to use SMS to tell small groups what they are doing). With Twitter’s popularity skyrocketing, came the need for users to shorten URLs to fit into the 140 characters limit. Originally Twitter used TinyURL as its default URL shortener and this triggered a steep climb in the usage of TinyURL during the early days of Twitter.

However, in May 2009, bit.ly replaced TinyURL as Twitter’s default URL shortener and the impact was immediate. For the first time in that period, TinyURL recorded a drop in the number of users in May 2009, dropping from 6.1 million to 5.3 million unique users, while bit.ly jumped from 1.8 million to 2.9 million almost overnight. That’s not the end of the story though. In April 2010 during Twitter’s Chirp conference, Twitter announced its own URL shortener (twt.tl). As of writing it is still unclear the market share will pan out but it’s clear that URL shorteners have good value and everyone is jumping into this market. In December 2009, Google came up with its own two URL shorteners goo.gl and youtu.be. Amazon.com (amzn.to), Facebook (fb.me) and WordPress (wp.me) all have their own URL shorteners as well.

Next, let’s do a quick review of why URLs shorteners are so popular and why they attract criticism as well.

Here’s a quick summary of the benefits:

  • Create short and easy to remember URLs
  • Allow passing of links in character-limited services such as Twitter
  • Create vanity URLs for marketing purposes
  • Can verbally pass URLs

The most obvious benefit of having a shortened URL is that it’s, well, short. A typical example of an URL gone bad is a link to a location in Google Maps:

http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=singapore +flyer&vps=1&jsv=169c&sll=1.352083,103.819836&sspn=0.68645,1.382904&g =singapore&ie=UTF8&latlng=8354962237652576151&ei=Shh3SsSRDpb4vAPsxLS3 BQ&cd=1&usq=Singapore+Flyer

Such URLs are meant to be clicked on as it is virtually impossible to pass it around verbally. It might be justifiable if the URL is cut and pasted on documents, but sometimes certain applications will truncate parts of the URL while processing. This makes a long URL difficult to click on and even produces erroneous links. In fact, this was the main motivation in creating most of the earlier URL shorteners—older email clients tend to truncate URLs when they are more than 80 characters.

Short links are of course crucial in character-limited message passing systems like Twitter, Plurk, and SMS. Passing long URLs is impossible without URL shorteners.

Short URLs are very useful in cases of vanity URLs where for example, the Google Maps link above could be shortened to http://tinyurl.com/singapore-flyer. Such vanity URLs are useful when passing from one person to another, or even when using in a mass marketing way. Sticking to the maps theme in our examples, if you want to give a Google Maps link to your restaurant and put it up in catalogs and brochures, you will not want to give the long URL. Instead you would want a nice, descriptive and short URL.

Short URLs are also useful in cases of accessibility. For example, reading out the Google Maps link above is almost impossible, but reading out the TinyURL link (vanity or otherwise) is much easier in comparison.

Many popular URL shorteners also provide some form of statistics and analytics on the usage of the links. This feature allows you to track your short URLs to see how many clicks it received and what kind of patterns can be derived from the clicks. Although the metrics are usually not advanced, they do provide basic usefulness.

On the other hand, URL shorteners have it fair share of criticisms as well. Here is a summary of the bad side of URL shorteners:

  • Provides opportunity to spammers because it hide original URLs
  • Could be unreliable if dependent on it for redirection
  • Possible undesirable or vulgar short URLs

URL shorteners have security issues. When a URL shortener creates a short URL, it effectively hides the original link and this provides opportunity for spammers or other abusers to redirect users to their sites. One relatively mild form of such attack is ‘rickrolling’. Rickrolling uses a classic bait-and-switch trick to redirect users to a Rick Astley music video of Never Gonna Give You Up. For example, you might feel that the URL http://tinyurl.com/singapore-flyer goes to Google Map, but when you click on it, you might be rickrolled and redirected to that Rick Astley music video instead.

Also, because most short URLs are not customized, it is quite difficult to see if the link is genuine or not just from the URL. Many prominent websites and applications have such concerns, including MySpace, Flickr and even Microsoft Live Messenger, and have one time or another banned or restricted usage of TinyURL because of this problem. To combat spammers and fraud, URL shortening services have come up with the idea of link previews, which allows users to preview a short URL before it redirects the user to the long URL. For example TinyURL will show the user the long URL on a preview page and requires the user to explicitly go to the long URL.

Another problem is performance and reliability. When you access a website, your browser goes to a few DNS servers to resolve the address, but the URL shortener adds another layer of indirection. While DNS servers have redundancy and failsafe measures, there is no such assurance from URL shorteners. If the traffic to a particular link becomes too high, will the shortening service provider be able to add more servers to improve performance or even prevent a meltdown altogether? The problem of course lies in over-dependency on the shortening service.

Finally, a negative side effect of random or even customized short URLs is that undesirable, vulgar or embarrassing short URLs can be created. Earlier on TinyURL short URLs were predictable and it was exploited, such as embarrassing short URLs that were made to redirect to the White House websites of then U.S. Vice President Dick Cheney and Second Lady Lynne Cheney.

We have just covered significant ground on URL shorteners. If you are a programmer you might be wondering, “Why do I need to know such information? I am really interested in the programming bits, the others are just fluff to me.”

Background information on the application we want to clone is very important. It tells us what why that application exists in the first place and gives us an idea what are the main features (what makes it popular). It also tells us what problems it faces such that we are aware of the problem while programming it, or even avoid it altogether. This is important when we come to the design of the application. Finally it gives us better appreciation of the application and the motivations and issues faced by the product and technical people behind the application we wish to clone.

Main features

Next, let’s list down the features of a URL shortener. The intention in this section is to distill the basic features of the application, features that define the service. Features listed here will be features that make the application what it is.

However, as much as possible we want to also explore some additional features that extend the application and are provided by many of its competitors. Most importantly, the features here are mostly features of the most popular and definitive web application in the category. In this article, this will be TinyURL.

These are the main features of a URL shortener:

  • Users can create a short URL that represents a long URL
  • Users who visit the short URL will be redirected to the long URL
  • Users can preview a short URL to enable them to see what the long URL is
  • Users can provide a custom URL to represent the long URL
  • Undesirable words are not allowed in the short URL
  • Users are able to view various statistics involving the short URL, including the number of clicks and where the clicks come from (optional, not in TinyURL)

URL shorteners are simple web applications and the one that we will design and build will also be simple.

Designing the clone

Cloning TinyURL is relatively simple but there is some thought behind the design of the application. We will be building a clone of TinyURL called Tinyclone, which will be hosted at the domain http://tinyclone.saush.com.

Creating a short URL for each long URL

The domain of the short URL is fixed. What’s left is the file path name. We need to represent the long URL with a unique file path name (a key), one for each long URL. This means we need to persist the relationship between the key and the URL.

One of the ways we can associate the long URL with a unique key is to hash the long URL and use the resulting hash as the unique key. However, the resulting hash might be long and hashing functions could be slow.

The faster and easier way is to use a relational database’s auto-incremented row ID as the unique key. The database will help ensure the uniqueness of the ID. However, the running row ID number is base 10. To represent a million URLs would already require 7 characters, to represent 1 billion would take up 9 characters. In order to keep the number of characters smaller, we will need a larger base numbering system.

In this clone we will use base 36, which is 26 characters of the alphabet (case insensitive) and 10 numbers. Using this system, we will only need 5 characters to represent 1 million URLs:

1,000,000 base 36 = lfls

And 1 billion URLs can be represented in just six characters:

1,000,000,000 base 36 = gjdgxs

LEAVE A REPLY

Please enter your comment!
Please enter your name here