Normalizing Dimensional Model

3 min read

First Normal Form Violation in Dimension table

Let’s revisit the author problem in the book dimension. The AUTHOR column contains multiple authors, a first-normal-form violation, which prevents us from querying book or sales by author.

BOOK dimension table

BOOK_SK

TITLE

AUTHOR

PUBLISHER

CATEGORY

SUB-CATEGORY

1

Programming in Java

King, Chan

Pac

Programming

Java

2

Learning Python

Simpson

Pac

Programming

Python

3

Introduction to BIRT

Chan, Gupta, Simpson (Editor)

Apes

Reporting

BIRT

4

Advanced Java

King, Chan

Apes

Programming

Java

Normalizing and spinning off the authors into a dimension and adding an artificial BOOK AUTHOR fact solves the problem; it is an artificial fact as it does not contain any real business measure.  Note that the Editor which is an author’s role is also “normalized” into its column in the AUTHOR table (It is related to author, but not actually an author’s name).

AUTHOR table

AUTHOR_SK

AUTHOR_NAME

1

King

2

Chan

3

Simpson

4

Gupta



BOOK AUTHOR table

BOOK_SK

AUTHOR_SK

ROLE

COUNT

1

1

Co-author

1

1

2

Co-author

1

2

3

Author

1

3

2

Co-author

1

3

3

Editor

1

3

4

Co-author

1

4

1

Co-author

1

4

2

Co-author

1

Note the artificial COUNT measure which facilitates aggregation always has a value of numeric 1.

SELECT name, SUM(COUNT) 
FROM book_author ba, book_dim b, author_dim a
WHERE ba.book_sk = b.book_sk AND ba.author_sk = a.author_sk
GROUP BY name

You might need to query sales by author, which you can do so by combining the queries of each of the two stars (the two facts) on their common dimension (BOOK dimension), producing daily book sales by author.

SELECT dt, title, name, role, sales_amt FROM
(SELECT book_sk, dt, title, sales_amt FROM sales_fact s, date_dim d, book_dim b
WHERE s.book_sk = b.book_sk AND s.date_sk = d.date_sk) sales,
(SELECT b.book_sk, name, role FROM book_author ba, book_dim b, author_dim a
WHERE ba.book_sk = b.book_sk AND ba.author_sk = a.author_sk) author
WHERE sales.book_sk = author.book_sk

Single Column with Repeating Value in Dimension table

Columns like the PUBLISHER, though not violating any normal form, is also good to get normalized, which we accomplish by adding an artificial fact, PUBLISHED BOOK fact, and its own dimension, PUBLISHER dimension.

This normalization is not exactly the same as that in normalizing first-normal-form violation; the PUBLISHER dimension can correctly be linked to the SALES fact, the publisher surrogate key must be added though in the SALES fact.

PUBLISHER table

 

PUBLISHER_SK

PUBLISHER

1

Pac

2

Apes

BOOK PUBLISHER table

BOOK_SK

PUBLISHER_SK

COUNT

1

1

1

1

2

1

2

3

1

3

2

1

3

3

1

3

4

1

4

1

1

4

2

1

Related Columns with Repeating Value in Dimension table

CATEGORY and SUB-CATEGORY columns are related, they form a hierarchy. Each of them can be normalized into its own dimension, but they need to be all linked into one artificial fact.

Non-Measure Column in Fact table

The ROLE column inside the BOOK AUTHOR fact is not a measure; it violates the dimensional modeling norm; to resolve we just need to spin it off into its own dimension, effectively normalizing the fact table.

ROLE dimension table and sample rows

 

ROLE_SK

ROLE

1

Author

2

Co-Author

3

Editor

BOOK AUTHOR table with normalized ROLE

BOOK_SK

AUTHOR_SK

ROLE_SK

COUNT

1

1

2

1

1

2

2

1

2

3

1

1

3

2

2

1

3

3

3

1

3

4

2

1

4

1

2

1

4

2

2

1

 

Summary

This article shows that both dimensional table and fact table in a dimensional model can be normalized without violating its modeling norm.

If you have read this article, you may be interested to view :

Packt

Share
Published by
Packt

Recent Posts

Top life hacks for prepping for your IT certification exam

I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…

3 years ago

Learn Transformers for Natural Language Processing with Denis Rothman

Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…

3 years ago

Learning Essential Linux Commands for Navigating the Shell Effectively

Once we learn how to deploy an Ubuntu server, how to manage users, and how…

3 years ago

Clean Coding in Python with Mariano Anaya

Key-takeaways:   Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…

3 years ago

Exploring Forms in Angular – types, benefits and differences   

While developing a web application, or setting dynamic pages and meta tags we need to deal with…

3 years ago

Gain Practical Expertise with the Latest Edition of Software Architecture with C# 9 and .NET 5

Software architecture is one of the most discussed topics in the software industry today, and…

3 years ago