Amazon SimpleDB versus RDBMS

June 8, 2010

We have all used a Relational Database Management System (RDBMS) at some point in our careers. Relational databases are ubiquitous, available from vendors such as Oracle, Microsoft, and IBM, and they have served our application needs well. However, a new breed of applications is coming to the forefront in today’s Internet-driven, socially networked economy. These applications must scale to meet demand peaks that can quickly reach massive levels. That is hard to satisfy with a traditional relational database, because the hardware and software needed to serve such peaks are difficult to predict and provision in advance, and scaling a conventional RDBMS out to hundreds or thousands of nodes is a complex undertaking. For these applications, that complexity can make an RDBMS impractical. SimpleDB offers an alternative that addresses many of these problems. To do so, however, it makes design choices and trade-offs that you need to understand in order to make an informed decision about data storage for your application domain.

No normalization

Normalization is the process of organizing data efficiently in a relational database by eliminating redundant data while ensuring that data dependencies make sense. SimpleDB data models do not follow any of the normal forms and tend to be fully denormalized. Because SimpleDB does not require normalization, you have a great deal of flexibility in your model and can take advantage of multi-valued attributes in your data.

Let’s look at a simple example, starting with a basic spreadsheet structure and then designing it for an RDBMS and for SimpleDB. In this example, we will create a simple contact database, with contact information as the raw data.

ID	First_Name	Last_Name	Phone_Num
101	John	Smith	555-845-7854
101	John	Smith	555-854-9885
101	John	Smith	555-695-7485
102	Bill	Jones	555-748-7854
102	Bill	Jones	555-874-8654

The obvious issue is the repetition of the name data. The table stores names redundantly, and every update requires care to keep the name data in sync. Finding a person by phone number, however, is easy:

SELECT * FROM Contact_Info WHERE Phone_Num = '555-854-9885'
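To make these trade-offs concrete, the sketch below loads the raw table into an in-memory SQLite database through Python’s standard sqlite3 module. SQLite is simply a convenient stand-in; the article does not tie the example to any particular RDBMS. The surname change to 'Smyth' is a hypothetical edit, included only to show how repeated name data drifts out of sync when an update misses some rows.

```python
import sqlite3

# Raw, spreadsheet-style design: one row per phone number, name repeated on each row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Contact_Info "
    "(ID INTEGER, First_Name TEXT, Last_Name TEXT, Phone_Num TEXT)"
)
conn.executemany(
    "INSERT INTO Contact_Info VALUES (?, ?, ?, ?)",
    [
        (101, "John", "Smith", "555-845-7854"),
        (101, "John", "Smith", "555-854-9885"),
        (101, "John", "Smith", "555-695-7485"),
        (102, "Bill", "Jones", "555-748-7854"),
        (102, "Bill", "Jones", "555-874-8654"),
    ],
)

# Strength: a lookup by phone number is a plain equality test.
match = conn.execute(
    "SELECT * FROM Contact_Info WHERE Phone_Num = ?", ("555-854-9885",)
).fetchall()
print(match)  # [(101, 'John', 'Smith', '555-854-9885')]

# Weakness: an update that touches only one of John's rows (hypothetical
# surname change) leaves the repeated name data inconsistent.
conn.execute(
    "UPDATE Contact_Info SET Last_Name = 'Smyth' WHERE Phone_Num = ?",
    ("555-854-9885",),
)
last_names_101 = sorted(
    {row[0] for row in conn.execute(
        "SELECT Last_Name FROM Contact_Info WHERE ID = 101"
    )}
)
print(last_names_101)  # ['Smith', 'Smyth']
```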

So let’s analyze the strengths and weaknesses of this database design.

SCORE-Raw data	Strength	Weakness
Efficient storage		No
Efficient search by phone number	Yes
Efficient search by name		No
Easy to add another phone number	Yes

The design is simple, but because the name data is repeated, every change to a name must be applied to all of that person’s rows. If the names get out of sync, searching for phone numbers by name becomes unreliable.

To improve the design, we can rationalize the data. One approach is to create multiple phone number fields, as follows. While this is a simple solution, it limits each contact to three phone numbers. Add e-mail addresses and Twitter handles, and the table grows wider and wider.

ID	First_Name	Last_Name	Phone_Num_1	Phone_Num_2	Phone_Num_3
101	John	Smith	555-845-7854	555-854-9885	555-695-7485
102	Bill	Jones	555-748-7854	555-874-8654

Finding a person by phone number is now awkward, because every phone column must be checked:

SELECT * FROM Contact_Info WHERE Phone_Num_1 = '555-854-9885'
OR Phone_Num_2 = '555-854-9885'
OR Phone_Num_3 = '555-854-9885'

Now let’s analyze the strengths and weaknesses of this database design.

SCORE-Rationalize data	Strength	Weakness
Efficient storage	Yes
Efficient search by phone number		No
Efficient search by name	Yes
Easy to add another phone number		No

The design is simple, but the phone numbers are limited to three, and searching by phone number involves three separate index searches (assuming each phone column is indexed).
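The same SQLite stand-in can illustrate both weaknesses of the fixed-column design: every lookup must test all three phone columns, and a fourth number has nowhere to go without changing the schema. The index names and the extra number 555-123-4567 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Contact_Info (ID INTEGER PRIMARY KEY, First_Name TEXT, "
    "Last_Name TEXT, Phone_Num_1 TEXT, Phone_Num_2 TEXT, Phone_Num_3 TEXT)"
)
conn.executemany(
    "INSERT INTO Contact_Info VALUES (?, ?, ?, ?, ?, ?)",
    [
        (101, "John", "Smith", "555-845-7854", "555-854-9885", "555-695-7485"),
        (102, "Bill", "Jones", "555-748-7854", "555-874-8654", None),  # empty slot
    ],
)
# One index per phone column (hypothetical names), so each OR branch has its own index.
for col in ("Phone_Num_1", "Phone_Num_2", "Phone_Num_3"):
    conn.execute(f"CREATE INDEX idx_{col} ON Contact_Info ({col})")

# Every lookup has to check all three columns.
phone = "555-854-9885"
match = conn.execute(
    "SELECT ID, First_Name, Last_Name FROM Contact_Info "
    "WHERE Phone_Num_1 = ? OR Phone_Num_2 = ? OR Phone_Num_3 = ?",
    (phone, phone, phone),
).fetchall()
print(match)  # [(101, 'John', 'Smith')]

# A fourth number for John does not fit: there is no Phone_Num_4 column,
# so the schema (and every query like the one above) would have to change.
try:
    conn.execute(
        "UPDATE Contact_Info SET Phone_Num_4 = ? WHERE ID = 101", ("555-123-4567",)
    )
    fourth_number_stored = True
except sqlite3.OperationalError as exc:
    fourth_number_stored = False
    print(exc)
```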

Another approach would be to use a delimited list for the phone number as follows:

ID	First_Name	Last_Name	Phone_Nums
101	John	Smith	555-845-7854;555-854-9885;555-695-7485
102	Bill	Jones	555-748-7854;555-874-8654

This approach avoids data repetition and is easy to maintain, compact, and extendable, but the only way to find a record by phone number is with a substring search:

SELECT * FROM Contact_Info WHERE Phone_Nums LIKE '%555-854-9885%'

Because the pattern begins with a wildcard, the database cannot use an index on Phone_Nums and must scan the entire table. With a small table no one will notice, but on a large database with millions of records, performance will suffer.
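SQLite’s EXPLAIN QUERY PLAN offers one way to see this for yourself. In the sketch below (the index name is hypothetical), the delimited column is indexed, yet the planner still reports a SCAN for the leading-wildcard LIKE, because an index on Phone_Nums is ordered by the whole string and cannot help find a substring in the middle of it. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Contact_Info (ID INTEGER PRIMARY KEY, First_Name TEXT, "
    "Last_Name TEXT, Phone_Nums TEXT)"
)
conn.executemany(
    "INSERT INTO Contact_Info VALUES (?, ?, ?, ?)",
    [
        (101, "John", "Smith", "555-845-7854;555-854-9885;555-695-7485"),
        (102, "Bill", "Jones", "555-748-7854;555-874-8654"),
    ],
)
# An index on the delimited column does not help substring searches.
conn.execute("CREATE INDEX idx_phone_nums ON Contact_Info (Phone_Nums)")

query = "SELECT * FROM Contact_Info WHERE Phone_Nums LIKE ?"
params = ("%555-854-9885%",)  # leading and trailing wildcards
match = conn.execute(query, params).fetchall()
print(match)

# Each EXPLAIN QUERY PLAN row ends with a human-readable detail string;
# "SCAN" there means SQLite reads every row instead of searching an index.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]
print(plan)
```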

SCORE-Delimited Data	Strength	Weakness
Efficient storage	Yes
Efficient search by phone number		No
Efficient search by name	Yes
Easy to add another phone number	Yes

A delimited field works well for data that is all of one type and will only ever be retrieved, not searched.

Normalizing a relational database splits your data into separate tables that are related to one another by keys. A join is the operation that recombines the data from those tables when you query it.

Let’s first normalize this data.

This is the Person_Info table:

ID	First_Name	Last_Name
101	John	Smith
102	Bill	Jones

And this is the Phone_Info table:

ID	Phone_Num
101	555-845-7854
101	555-854-9885
101	555-695-7485
102	555-748-7854
102	555-874-8654

Now a join of the Person_Info table with the Phone_Info table can retrieve a person’s name together with the list of phone numbers. The table structure is clean, and other than the ID key that links the two tables, no data is duplicated. Provided Phone_Num is indexed, retrieving a contact by phone number is efficient.

SELECT First_Name, Last_Name, Phone_Num, Person_Info.ID
FROM Person_Info JOIN Phone_Info
ON Person_Info.ID = Phone_Info.ID
WHERE Phone_Num = '555-854-9885'
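Here is the normalized design in the same SQLite sketch, with an index on Phone_Info.Phone_Num to meet the "provided Phone_Num is indexed" condition (the index name is hypothetical). The printed query plan should show Phone_Info being searched through that index rather than scanned; the plan’s wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Person_Info (
        ID INTEGER PRIMARY KEY,
        First_Name TEXT,
        Last_Name TEXT
    );
    CREATE TABLE Phone_Info (
        ID INTEGER REFERENCES Person_Info (ID),
        Phone_Num TEXT
    );
    -- The "provided Phone_Num is indexed" condition from the text.
    CREATE INDEX idx_phone_num ON Phone_Info (Phone_Num);
    """
)
conn.executemany(
    "INSERT INTO Person_Info VALUES (?, ?, ?)",
    [(101, "John", "Smith"), (102, "Bill", "Jones")],
)
conn.executemany(
    "INSERT INTO Phone_Info VALUES (?, ?)",
    [
        (101, "555-845-7854"),
        (101, "555-854-9885"),
        (101, "555-695-7485"),
        (102, "555-748-7854"),
        (102, "555-874-8654"),
    ],
)

query = """
    SELECT First_Name, Last_Name, Phone_Num, Person_Info.ID
    FROM Person_Info JOIN Phone_Info ON Person_Info.ID = Phone_Info.ID
    WHERE Phone_Num = ?
"""
params = ("555-854-9885",)
match = conn.execute(query, params).fetchall()
print(match)  # [('John', 'Smith', '555-854-9885', 101)]

# The detail column of each plan row should mention idx_phone_num for Phone_Info.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]
for detail in plan:
    print(detail)
```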

So if we analyze the strengths and weaknesses of this database design, we get:

SCORE-Relational Data	Strength	Weakness
Efficient storage	Yes
Efficient search by phone number	Yes
Efficient search by name	Yes
Easy to add another phone number	Yes

While this is an efficient relational model, SimpleDB has no join command. Using two tables (domains, in SimpleDB terms) would force two separate selects to retrieve the complete contact information. Let’s look at how this would be done using SimpleDB principles.
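Without joins, the stitching moves into application code. The sketch below keeps the two normalized tables in SQLite but deliberately avoids JOIN: the hypothetical get_contact function issues one SELECT per table and merges the results in Python, which mirrors the two round trips a two-domain SimpleDB design would require. It illustrates the pattern only and does not use the SimpleDB API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Person_Info (ID INTEGER PRIMARY KEY, First_Name TEXT, Last_Name TEXT);
    CREATE TABLE Phone_Info (ID INTEGER, Phone_Num TEXT);
    INSERT INTO Person_Info VALUES (101, 'John', 'Smith'), (102, 'Bill', 'Jones');
    INSERT INTO Phone_Info VALUES
        (101, '555-845-7854'), (101, '555-854-9885'), (101, '555-695-7485'),
        (102, '555-748-7854'), (102, '555-874-8654');
    """
)


def get_contact(person_id):
    """Hypothetical helper: assemble one contact from two separate selects."""
    # Select 1: the name, from Person_Info.
    person = conn.execute(
        "SELECT First_Name, Last_Name FROM Person_Info WHERE ID = ?", (person_id,)
    ).fetchone()
    if person is None:
        return None
    # Select 2: the phone numbers, from Phone_Info (rowid keeps insertion order).
    phones = [
        row[0]
        for row in conn.execute(
            "SELECT Phone_Num FROM Phone_Info WHERE ID = ? ORDER BY rowid",
            (person_id,),
        )
    ]
    # The "join" happens here, in application code.
    return {"ID": person_id, "First_Name": person[0], "Last_Name": person[1],
            "Phone_Num": phones}


contact = get_contact(101)
print(contact)
```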

No joins

SimpleDB does not support the concept of joins. Instead, SimpleDB lets you store multiple values for an attribute, so you do not need a join to retrieve all the values.

ID	Attribute/value pairs
101	First_Name=John	Last_Name=Smith	Phone_Num=555-845-7854	Phone_Num=555-854-9885	Phone_Num=555-695-7485
102	First_Name=Bill	Last_Name=Jones	Phone_Num=555-748-7854	Phone_Num=555-874-8654

In SimpleDB, each record is stored as an item made up of attribute/value pairs. The difference here is that the Phone_Num attribute has multiple values. Unlike a delimited list field, SimpleDB indexes every value, so each one can be searched efficiently.

SELECT * FROM Contact_Info WHERE Phone_Num = '555-854-9885'
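The idea behind multi-valued attributes can be modeled in a few lines of Python. The ToyDomain class below is a hypothetical, in-memory illustration, not the SimpleDB API or its actual storage engine: each item maps an attribute name to a list of values, and an inverted index maps every (attribute, value) pair to the items that contain it, so an equality search on Phone_Num finds a match regardless of which of an item’s values it is.

```python
from collections import defaultdict


class ToyDomain:
    """Hypothetical in-memory stand-in for a domain of multi-valued items."""

    def __init__(self):
        self.items = {}                # item name -> {attribute: [values]}
        self.index = defaultdict(set)  # (attribute, value) -> item names

    def put(self, item_name, attribute, value):
        # Add a value alongside any existing ones; repeated pairs are ignored.
        values = self.items.setdefault(item_name, {}).setdefault(attribute, [])
        if value not in values:
            values.append(value)
        self.index[(attribute, value)].add(item_name)

    def select_eq(self, attribute, value):
        # One index lookup, no matter how many values each item holds.
        return sorted(self.index.get((attribute, value), set()))


contacts = ToyDomain()
for item_name, first, last, phones in [
    ("101", "John", "Smith", ["555-845-7854", "555-854-9885", "555-695-7485"]),
    ("102", "Bill", "Jones", ["555-748-7854", "555-874-8654"]),
]:
    contacts.put(item_name, "First_Name", first)
    contacts.put(item_name, "Last_Name", last)
    for phone in phones:
        contacts.put(item_name, "Phone_Num", phone)

print(contacts.select_eq("Phone_Num", "555-854-9885"))  # ['101']
print(contacts.items["101"]["Phone_Num"])
# ['555-845-7854', '555-854-9885', '555-695-7485']
```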

Because every value is indexed, this SELECT is quick and efficient. You can also reference Phone_Num more than once in the same query:

SELECT * FROM Contact_Info WHERE Phone_Num = '555-854-9885'
OR Phone_Num = '555-748-7854'

Let’s analyze the strengths and weaknesses of this approach:

SCORE-SimpleDB Data	Strength	Weakness
Efficient storage	Yes
Efficient search by phone number	Yes
Efficient search by name	Yes
Easy to add another phone number	Yes
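Two entries in this scorecard can be checked with a similar hypothetical in-memory model (again an illustration, not the SimpleDB API): the OR query over Phone_Num becomes a union of index lookups, and adding another phone number is just another value on the item, with no schema change. The number 555-123-4567 is made up for the example.

```python
from collections import defaultdict

# Hypothetical toy model: item name -> attribute -> set of values,
# plus an index from each (attribute, value) pair to the items holding it.
items = defaultdict(lambda: defaultdict(set))
index = defaultdict(set)


def put(item_name, attribute, value):
    items[item_name][attribute].add(value)
    index[(attribute, value)].add(item_name)


def select_or(attribute, *values):
    # "attr = v1 OR attr = v2 ..." as the union of one index lookup per value.
    matches = set()
    for value in values:
        matches |= index.get((attribute, value), set())
    return sorted(matches)


for phone in ("555-845-7854", "555-854-9885", "555-695-7485"):
    put("101", "Phone_Num", phone)
for phone in ("555-748-7854", "555-874-8654"):
    put("102", "Phone_Num", phone)

print(select_or("Phone_Num", "555-854-9885", "555-748-7854"))  # ['101', '102']

# Adding a third number for Bill needs no new column or schema change.
put("102", "Phone_Num", "555-123-4567")
print(select_or("Phone_Num", "555-123-4567"))  # ['102']
print(sorted(items["102"]["Phone_Num"]))
# ['555-123-4567', '555-748-7854', '555-874-8654']
```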
