We now have all of our source structures defined in the Warehouse Builder. But before we can do anything with them, we need to design what our target data warehouse structure is going to look like. When we have that figured out, we can start mapping data from the source to the target. So, let’s design our target structure. First, we’re going to look at some data warehouse design topics that differ from what we would use if we were designing a regular relational database.
Data Warehouse Design
When it comes to the design of a data warehouse, there is basically one option that makes the most sense for how we will structure our database, and that is the dimensional model. This is a way of looking at the data from a business perspective that makes the data simple, understandable, and easy for the business end user to query. Users don’t need a database administrator to retrieve data from it.
We already know the normalized method of modelling a database. A normalized model removes redundancies in data by storing information in discrete tables and then referencing those tables when needed. This is an advantage for a transactional system because information needs to be entered in only one place in the database, without duplicating any information already entered. For example, in the ACME Toys and Gizmos transactional database, each time a transaction is recorded for the sale of an item at a register, a record needs to be added only to the transactions table. That record does not need to repeat the details identifying the register, the item, or the employee who processed the transaction, because that information is already stored in separate tables. The main transaction record just needs to be entered with references to all that other information.
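To make the normalized approach concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The tables (registers, items, employees, transactions), their columns, and the sample rows are hypothetical simplifications, not the actual ACME schema. Recording a sale writes a single row of references, while reading it back in business terms takes one join per referenced table.

```python
import sqlite3

# Hypothetical, simplified normalized schema loosely modelled on the ACME
# Toys and Gizmos example; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE registers (register_id INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE items     (item_id     INTEGER PRIMARY KEY, item_name  TEXT, price REAL);
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, emp_name   TEXT);
-- Each sale stores only references (foreign keys) to the descriptive tables.
CREATE TABLE transactions (
    txn_id      INTEGER PRIMARY KEY,
    register_id INTEGER REFERENCES registers(register_id),
    item_id     INTEGER REFERENCES items(item_id),
    employee_id INTEGER REFERENCES employees(employee_id),
    quantity    INTEGER
);
"""

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Descriptive data is entered once, in its own table.
    conn.execute("INSERT INTO registers VALUES (1, 'Store 101')")
    conn.execute("INSERT INTO items VALUES (10, 'Gizmo', 19.99)")
    conn.execute("INSERT INTO employees VALUES (100, 'Pat')")
    return conn

def record_sale(conn, txn_id, register_id, item_id, employee_id, quantity):
    # Recording a sale touches only the transactions table; no register,
    # item, or employee details are repeated here.
    conn.execute(
        "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
        (txn_id, register_id, item_id, employee_id, quantity),
    )

def sale_report(conn):
    # Reading the data back in business terms requires joining every table.
    return conn.execute(
        """
        SELECT r.store_name, i.item_name, e.emp_name, t.quantity
        FROM transactions t
        JOIN registers r ON r.register_id = t.register_id
        JOIN items     i ON i.item_id     = t.item_id
        JOIN employees e ON e.employee_id = t.employee_id
        ORDER BY t.txn_id
        """
    ).fetchall()

conn = build_db()
record_sale(conn, 1, 1, 10, 100, 2)
record_sale(conn, 2, 1, 10, 100, 1)
print(sale_report(conn))
```

Even in this tiny schema, the report query already needs three joins; a real transactional schema multiplies that, which is the "spider web" problem for end users.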
This works extremely well for a transactional type of system concerned with daily operational processing where the focus is on getting data into the system. However, it does not work well for a data warehouse whose focus is on getting data out of the system. Users do not want to navigate through the spider web of tables that compose a normalized database model to extract the information they need. Therefore, dimensional models were introduced to provide the end user with a flattened structure of easily queried tables that he or she can understand from a business perspective.
A dimensional model takes the business rules of our organization and represents them in the database in a more understandable way. A business manager looking at sales data is naturally going to think more along the lines of “how many gizmos did I sell last month in all stores in the south and how does that compare to how many I sold in the same month last year?” Managers just want to know what the result is, and don’t want to worry about how many tables need to be joined in a complex query to get that result. A dimensional model removes the complexity and represents the data in a way that end users can relate to more easily from a business perspective.
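A dimensional model is commonly laid out as a star schema: a central fact table of measures surrounded by flat, descriptive dimension tables. The sketch below, again using sqlite3, answers a simplified version of the manager’s question (gizmo units sold in the South in the same month of two different years). All table names, columns, and data values are made-up assumptions for illustration, and the sketch is not meant to represent what Warehouse Builder itself creates.

```python
import sqlite3

# Hypothetical star schema: one fact table holding the measure (units_sold),
# with each dimension stored as a single flat table.
SCHEMA = """
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    units_sold  INTEGER
);
"""

def build_star():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "Gizmo"), (2, "Toy Robot")])
    conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?)",
                     [(1, "Store A", "South"), (2, "Store B", "South"),
                      (3, "Store C", "North")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2023, 5), (2, 2024, 5)])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
        (1, 1, 1, 40), (1, 2, 1, 10),  # Gizmos in the South, May 2023
        (1, 1, 2, 55), (1, 2, 2, 20),  # Gizmos in the South, May 2024
        (1, 3, 2, 99),                 # North store: excluded by the region filter
        (2, 1, 2, 7),                  # Different product: excluded
    ])
    return conn

def units_by_year(conn, product, region, month):
    # Every dimension is exactly one join away from the fact table.
    return conn.execute(
        """
        SELECT d.year, SUM(f.units_sold)
        FROM sales_fact f
        JOIN dim_product p ON p.product_key = f.product_key
        JOIN dim_store   s ON s.store_key   = f.store_key
        JOIN dim_date    d ON d.date_key    = f.date_key
        WHERE p.product_name = ? AND s.region = ? AND d.month = ?
        GROUP BY d.year
        ORDER BY d.year
        """,
        (product, region, month),
    ).fetchall()

conn = build_star()
print(units_by_year(conn, "Gizmo", "South", 5))  # [(2023, 50), (2024, 75)]
```

The query reads almost like the business question itself: filter by product, region, and month, then compare the years.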
Users can intuitively think of the data for the above question as a cube, with the edges (or dimensions) of the cube labeled stores, products, and time frame. So let’s take a look at this concept of a cube with dimensions and how we can use it to represent our data.
Cube and Dimensions
The dimensions become the business characteristics of the sales, for example:
- A time dimension—users can look back in time and check various time periods
- A store dimension—information can be retrieved by store and location
- A product dimension—various products for sale can be broken out
Think of the dimensions as the edges of a cube, and the intersection of the dimensions as the measure we are interested in for that particular combination of time, store, and product. A picture is worth a thousand words, so let’s look at what we’re talking about in the following image:
Notice what this cube looks like. How about a Rubik’s Cube? We’re building a data warehouse for a toy store company, so we ought to know what a Rubik’s Cube is! If you have one, maybe you should go get it now, because it models exactly what we’re talking about. Think of the width of the cube, or a row going across, as the product dimension. Every piece of information or measure in the same row refers to the same product, so there are as many rows in the cube as there are products. Think of the height of the cube, or a column going up and down, as the store dimension. Every piece of information in a column represents one single store, so there are as many columns as there are stores. Finally, think of the depth of the cube as the time dimension, so any piece of information in the rows and columns at the same depth represents the same point in time. The intersection of these three dimensions locates a single small cube within the big cube, and that small cube holds the measure amount we’re interested in. In this case, it’s dollar sales for a single product in a single store at a single point in time.
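The Rubik’s Cube picture maps naturally onto a dictionary keyed by (product, store, period) tuples. This sketch assumes three hypothetical members per dimension, so the cube has 3 × 3 × 3 = 27 cells, just like a Rubik’s Cube, and each cell holds a dollar-sales measure. The member names and sales records are invented for illustration.

```python
import itertools

# Hypothetical dimension members; a real warehouse would read these from
# its dimension tables.
PRODUCTS = ["Gizmo", "Toy Robot", "Puzzle"]
STORES = ["Store A", "Store B", "Store C"]
PERIODS = ["2024-01", "2024-02", "2024-03"]

def build_cube(sales_records):
    """Return a dense 3-D cube: every (product, store, period) cell holds a
    dollar-sales measure, starting at 0.0 when nothing was sold."""
    cube = {cell: 0.0 for cell in itertools.product(PRODUCTS, STORES, PERIODS)}
    for prod, store, period, dollars in sales_records:
        # Unknown dimension members raise KeyError instead of adding a cell.
        cube[(prod, store, period)] += dollars
    return cube

records = [
    ("Gizmo", "Store A", "2024-01", 120.0),
    ("Gizmo", "Store A", "2024-01", 30.0),   # second sale lands in the same cell
    ("Puzzle", "Store C", "2024-03", 45.5),
]
cube = build_cube(records)
print(len(cube))                                  # 27
print(cube[("Gizmo", "Store A", "2024-01")])      # 150.0
print(cube[("Toy Robot", "Store B", "2024-02")])  # 0.0
```

Looking up one cell is exactly "the intersection of the three dimensions". Real implementations usually store only the cells that contain data, since most product/store/period combinations in a large warehouse have no sales.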
But one might wonder whether we are restricted to just three dimensions with this model. After all, a cube has only three dimensions: length, width, and depth. Well, the answer is no. We can have many more dimensions than just three. In our ACME example, we might want to know the sales each employee made during the day. This would mean we would need a fourth dimension for employees. But what about our visualization above using a cube? How is this fourth dimension going to be modelled? And no, the answer is not that we’re entering the Twilight Zone here with that “dimension not only of sight and sound but of mind…” We can think of additional dimensions as being cubes within a cube. If we think of an individual intersection of the three dimensions of the cube as being another cube, we can see that we’ve just opened up another three dimensions to use: the three for that inner cube. The Rubik’s Cube example used above is good because it is literally a cube of cubes and illustrates exactly what we’re talking about.
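In code, adding the employee dimension changes nothing fundamental: each cell key simply gains a fourth member. The sketch below is a sparse variant that stores only populated cells, plus a hypothetical `roll_up` helper that sums the measure over one dimension, collapsing the four-dimensional cube back to product, store, and period. The dimension names, helper names, and data are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical dimension order for a four-dimensional cube.
DIMENSIONS = ("product", "store", "period", "employee")

def build_sparse_cube(records):
    """Sparse N-dimensional cube: keys are tuples with one member per
    dimension, and only cells that actually have sales are stored."""
    cube = defaultdict(float)
    for *members, dollars in records:
        if len(members) != len(DIMENSIONS):
            raise ValueError(f"expected {len(DIMENSIONS)} members, got {len(members)}")
        cube[tuple(members)] += dollars
    return dict(cube)

def roll_up(cube, drop):
    """Sum the measure over one dimension, giving a cube with one fewer dimension."""
    idx = DIMENSIONS.index(drop)
    result = defaultdict(float)
    for key, dollars in cube.items():
        result[key[:idx] + key[idx + 1:]] += dollars
    return dict(result)

records = [
    ("Gizmo", "Store A", "2024-01", "Pat", 120.0),
    ("Gizmo", "Store A", "2024-01", "Lee", 30.0),
    ("Puzzle", "Store C", "2024-03", "Pat", 45.5),
]
cube4 = build_sparse_cube(records)
print(cube4[("Gizmo", "Store A", "2024-01", "Lee")])  # 30.0

# Summing out the employee dimension gives back the product/store/period view.
cube3 = roll_up(cube4, "employee")
print(cube3[("Gizmo", "Store A", "2024-01")])  # 150.0
```

Because a key is just a tuple, a fifth or sixth dimension only means longer keys; nothing about the lookup or the roll-up logic has to change.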
We do not need to model additional cubes, though. The concept of cubes within cubes is just a way to visualize further dimensions. We model only our main cube, add as many dimensions as we need to describe the measures, and leave the rest to the implementation.
This is a very intuitive way for users to look at the design of the data warehouse. When it’s implemented in a database, it becomes easy for users to query the information from it.