We will use a simplified book sales dimensional model as an example to demonstrate our bridge solution.
Our book sales model initially has the SALES_FACT fact table and two dimension tables: BOOK_DIM and DATE_DIM. The granularity of the model is sales amount by date (daily) and by book.
Assume the BOOK_DIM table has five rows:
BOOK_SK | TITLE | AUTHOR |
1 | Programming in Java | King, Chan |
3 | Learning Python | Simpson |
2 | Introduction to BIRT | Chan, Gupta, Simpson (Editor) |
4 | Advanced Java | King, Chan |
5 | Beginning XML | King, Chan (Foreword) |
The DATE_DIM has two rows:
DATE_SK | DT |
1 | 11-DEC-2009 |
2 | 12-DEC-2009 |
3 | 13-DEC-2009 |
And, the SALES_FACT table has ten rows:
DATE_SK | BOOK_SK | SALES_AMT |
1 | 1 | 1000 |
1 | 2 | 2000 |
1 | 3 | 3000 |
1 | 4 | 4000 |
2 | 2 | 2500 |
2 | 3 | 3500 |
2 | 4 | 4500 |
2 | 5 | 5500 |
3 | 3 | 8000 |
3 | 4 | 8500 |
Note that:
The columns with _sk suffixes in the dimension tables are surrogate keys of the dimension tables; these surrogate keys relate the rows of the fact table to the rows in the dimension tables.
King and Chan have collaborated in three books; two as co-authors, while in the “Beginning XML” Chan’s contribution is writing its foreword. Chan also co-authors the “Introduction to BIRT”.
Simpson singly writes the “Learning Python” and is an editor for “Introduction to BIRT”.
To analyze daily book sales, you simply run a query, joining the dimension tables to the fact table:
SELECT dt, title, sales_amt
FROM sales_fact s, date_dim d, book_dim b
WHERE s.date_sk = d.date_sk
AND s.book_sk = b.book_sk
This query produces the result showing the daily sales amount of every book that has a sale:
DT | TITLE | SALES_AMT |
11-DEC-09 | Advanced Java | 4000 |
11-DEC-09 | Introduction to BIRT | 2000 |
11-DEC-09 | Learning Python | 3000 |
11-DEC-09 | Programming in Java | 1000 |
12-DEC-09 | Advanced Java | 4500 |
12-DEC-09 | Beginning XML | 5500 |
12-DEC-09 | Introduction to BIRT | 2500 |
12-DEC-09 | Learning Python | 3500 |
13-DEC-09 | Advanced Java | 8500 |
13-DEC-09 | Learning Python | 8000 |
You will notice that the model does not allow you to readily analyze the sales by individual writer—the AUTHOR column is multi-value, not normalized, which violates the dimension modeling rule (we can resolve this by creating a view to “bundle” the AUTHOR_DIM with the SALES_FACT tables such that the AUTHORtable connects to the view as a normal dimension. We will create the view a bit later in this section). We can solve this issue by adding an AUTHOR_DIM and its AUTHOR_GROUP bridge table.
The AUTHOR_DIM must contain all individual contributors, which you will have to extract from the books and enter into the table. In our example we have four authors.
AUTHOR_SK | NAME |
1 | Chan |
2 | King |
3 | Gupta |
4 | Simpson |
The weighting_factor column in the AUTHOR_GROUP bridge table contains a fractional numeric value that determines the contribution of an author to a book. Typically the authors have equal contribution to the book they write, but you might want to have different weighting_factor for different roles; for example, an editor and a foreword writer have smaller weighting_factors than that of an author. The total weighting_factors for a book must always equal to 1.
The AUTHOR_GROUP bridge table has one surrogate key for every group of authors (a single author is considered a group that has one author only), and as many rows with that surrogate key for every contributor in the group.
I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…
Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…
Once we learn how to deploy an Ubuntu server, how to manage users, and how…
Key-takeaways: Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…
While developing a web application, or setting dynamic pages and meta tags we need to deal with…
Software architecture is one of the most discussed topics in the software industry today, and…