In this article by Iwan Shomnikov, author of the book SAP Data Services 4.x Cookbook, you will learn that the Pivot transform belongs to the Data Integrator group of transform objects in Data Services, which are usually all about the generation or transformation (meaning change in the structure) of data.
Simply inserting the Pivot transform allows you to convert columns into rows. Pivot transformation increases the amount of rows in a dataset as for every column converted into a row, an extra row is created for every key (the non-pivoted column) pair. Converted columns are called pivot columns.
Pivoting rows to columns or columns to rows is quite a common transformation operation in data migration tasks, and traditionally, the simplest way to perform it with the standard SQL language is to use the decode() function inside your SELECT statements. Depending on the complexity of the source and target datasets before and after pivoting, the SELECT statement can be extremely heavy and difficult to understand.
Data Services provide a simple and flexible way of pivoting data inside the Data Services ETL code using the Pivot and Reverse_Pivot dataflow object transforms.
The following steps show how exactly you can create, configure, and use these transforms in Data Services in order to pivot your data.
(For more resources related to this topic, see here.)
We will use the SQL Server database for the source and target objects, which will be used to demonstrate the Pivot transform available in Data Services. The steps in this section describe the preparation of a source table and the data required in it for a demonstration of the Pivot transform in the Data Services development environment:
create table Sales.AccountBalance (
[AccountID] integer,
[AccountNumber] integer,
[Year] integer,
[Q1] decimal(10,2),
[Q2] decimal(10,2),
[Q3] decimal(10,2),
[Q4] decimal(10,2));
-- Row 1
insert into Sales.AccountBalance
([AccountID],[AccountNumber],[Year],[Q1],[Q2],[Q3],[Q4])
values (1,100,2015,100.00,150.00,120.00,300.00);
-- Row 2
insert into Sales.AccountBalance
([AccountID],[AccountNumber],[Year],[Q1],[Q2],[Q3],[Q4])
values (2,100,2015,50.00,350.00,620.00,180.00);
-- Row 3
insert into Sales.AccountBalance
([AccountID],[AccountNumber],[Year],[Q1],[Q2],[Q3],[Q4])
values (3,200,2015,333.33,440.00,12.00,105.50);
So, the source table would look similar to the one in the following figure:
This section describes the ETL development process, which takes place in the Data Services Designer application. We will not create any workflow or script object in our test jobs; we will keep things simple and have only one batch job object with a dataflow object inside it performing the migration and pivoting of data from the ACCOUNTBALANCE source table of our OLTP database. Here are the steps to do this:
Pivot columns are columns whose values are merged in one column after the pivoting operation, thus producing an extra row for every pivoted column.
Non-pivot columns are columns that are not affected by the pivot operation.
As you can see, the pivoting operation denormalizes the dataset, generating more rows. This is why ACCOUNTID does not define the uniqueness of the record anymore, and we have to specify the extra key column, PIVOT_SEQ.
So, you may wonder, why pivot? Why don’t we just use data as it is and perform the required operation on data from the columns Q1-Q4?
The answer, in the given example, is very simple—it is much more difficult to perform an aggregation when the amounts are spread across different columns. Instead of summarizing using a single column with the sum(AMOUNT) function, we would have to write the sum(Q1 + Q2 + Q3 + Q4) expression every time. Quarters are not the worst part yet; imagine a situation where a table has huge amounts of data stored in columns defining month periods and you have to filter data by these time periods.
Of course, the opposite case exists as well; storing data across multiple columns instead of just in one is justified. In this case, if your data structure is not similar to this, you can use the Reverse_Pivot transform, which does exactly the opposite—it converts rows into columns. Look at the following example of a Reverse_Pivot configuration:
Reverse pivoting or transformation of rows into columns leads us to introduce another term—Pivot axis column. This is a column that holds categories defining different columns after a reverse pivot operation. It corresponds to the Header column option in the Pivot transform configuration.
As you noted in this article, the Pivot and Reverse_Pivot transform objects available in Data Services Designer are a simple and easily configurable way to pivot data of any complexity. The GUI of the Designer tool makes maintaining the ETL process developed in Data Services easy and keeps it readable.
If you make any changes to the pivot configuration options, Data Services automatically regenerates the output schema in pivot transforms accordingly.
Further resources on this subject:
I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…
Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…
Once we learn how to deploy an Ubuntu server, how to manage users, and how…
Key-takeaways: Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…
While developing a web application, or setting dynamic pages and meta tags we need to deal with…
Software architecture is one of the most discussed topics in the software industry today, and…