[box type=”note” align=”” class=”” width=””]The following extract is taken from the book IBM SPSS Modeler Essentials, written by Keith McCormick and Jesus Salcedo. SPSS Modeler is one of the most widely used enterprise tools for data mining and predictive analytics. [/box]
In this article, we will explore how SPSS Modeler can be effectively used to combine different file types for efficient data modeling.
In many organizations, different pieces of information for the same individuals are held in separate locations. To be able to analyze such information within Modeler, the data files must be combined into one single file. The Merge node joins two or more data sources so that information held for an individual in different locations can be analyzed collectively. The following diagram shows how the Merge node can be used to combine two separate data files that contain different types of information:
Like the Append node, the Merge node is found in the Record Ops palette. This node takes multiple data sources and creates a single source containing all or some of the input fields.
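To make the idea concrete outside of Modeler, here is a minimal Python sketch of a key-based merge. It is only an illustration of the concept, not how the Merge node is implemented; the field names (`Age`, `Spend`), the sample values, and the helper `merge_by_key` are assumptions made up for this example, and it assumes each ID appears at most once in the second source.

```python
# Two hypothetical sources holding different information about the same people.
demographics = [
    {"ID": 1, "Age": 34},
    {"ID": 2, "Age": 51},
    {"ID": 3, "Age": 29},  # no matching purchase record
]
purchases = [
    {"ID": 1, "Spend": 120.0},
    {"ID": 2, "Spend": 75.5},
]


def merge_by_key(left, right, key):
    """Join two lists of records on `key`, keeping only matching records."""
    # Index the second source by key (assumes key values are unique there).
    right_index = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_index.get(row[key])
        if match is not None:
            # Fields of the first source come first, then the second source's.
            merged.append({**row, **match})
    return merged


combined = merge_by_key(demographics, purchases, "ID")
```

Each record in `combined` now carries both the demographic and the purchase fields for one individual, which is the kind of single combined source the Merge node produces.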
Let’s go through an example of how to use the Merge node to combine data files:
The Merge stream contains the files we previously appended, as well as the main data file we were working with in earlier chapters.
2. Place a Merge node from the Record Ops palette on the canvas.
3. Connect the last Reclassify node to the Merge node.
4. Connect the Filter node to the Merge node.
[box type=”info” align=”” class=”” width=””]Like the Append node, the order in which data sources are connected to the Merge node impacts the order in which the sources are displayed. The fields of the first source connected to the Merge node will appear first, followed by the fields of the second source connected to the Merge node, and so on.[/box]
5. Connect the Merge node to a Table node:
6. Edit the Merge node.
Since the Merge node can cope with a variety of different situations, the Merge tab allows you to specify the merging method.
There are four methods for merging:
Let’s combine these files. To do this:
Fields contained in all input sources appear in the Possible keys list. To identify one or more fields as the key field(s), move the selected field(s) into the Keys for merge list. In our case, there are two fields that appear in both files, ID and Year.
2. Select ID in the Possible keys list and place it into the Keys for merge list:
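The Possible keys list can be thought of as the set of field names shared by every input. The short Python sketch below illustrates that idea; `possible_keys` is a hypothetical helper, and every field name other than ID and Year is invented for the example.

```python
def possible_keys(*sources_fields):
    """Return the field names present in every source, in first-source order."""
    first, *rest = sources_fields
    return [name for name in first if all(name in other for other in rest)]


# Hypothetical field lists; only ID and Year come from the example in the text.
main_fields = ["ID", "Year", "Age", "Income"]
extra_fields = ["ID", "Year", "Region"]

candidates = possible_keys(main_fields, extra_fields)

# Choosing ID alone mirrors moving ID into the Keys for merge list.
keys_for_merge = [name for name in candidates if name == "ID"]
```

Here `candidates` contains ID and Year, and only ID is carried forward as the merge key, so Year remains an ordinary field in both inputs.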
There are five major methods of merging using a key field:
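As a general illustration of how key-based joins can differ in which records they keep, the Python sketch below contrasts three common join styles: inner, full outer, and anti-join. The function names and sample values are assumptions for this example; they are not Modeler's option labels or API.

```python
# Hypothetical records indexed by key value.
left = {1: {"Age": 34}, 2: {"Age": 51}, 3: {"Age": 27}}
right = {2: {"Spend": 75.5}, 3: {"Spend": 10.0}, 4: {"Spend": 99.0}}


def inner_join(a, b):
    """Keep only keys found in both inputs."""
    return {k: {**a[k], **b[k]} for k in a if k in b}


def full_outer_join(a, b):
    """Keep every key from either input; missing fields are simply absent."""
    keys = list(a) + [k for k in b if k not in a]
    return {k: {**a.get(k, {}), **b.get(k, {})} for k in keys}


def anti_join(a, b):
    """Keep keys from the first input that have no match in the second."""
    return {k: a[k] for k in a if k not in b}


inner = inner_join(left, right)
outer = full_outer_join(left, right)
unmatched = anti_join(left, right)
```

The three results differ only in which key values survive: the inner join keeps the overlap, the full outer join keeps everything, and the anti-join keeps what is unique to the first input.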
The Filter tab lists the data sources involved in the merge, and the ordering of the sources determines the field ordering of the merged data. Here, you can rename and remove fields. Earlier, we saw that the field Year appeared in both datasets; here we can remove one version of this field (we could also rename one version of the field to keep both):
The second Year field will no longer appear in the combined data file.
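The effect of the Filter tab can be sketched in Python as follows. This is a simplified illustration for a single pair of matched records; `merge_fields` and its parameters are hypothetical, and the `Age`, `Region`, and `Year_2` names are invented for the example.

```python
def merge_fields(first, second, key, drop_from_second=(), rename_in_second=None):
    """Combine two matched records, first source's fields first.

    Fields named in `drop_from_second` are removed from the second source,
    and fields in `rename_in_second` are renamed before being added.
    """
    rename_in_second = rename_in_second or {}
    out = dict(first)
    for name, value in second.items():
        if name == key or name in drop_from_second:
            continue  # the key is already present; dropped fields are skipped
        out[rename_in_second.get(name, name)] = value
    return out


main = {"ID": 7, "Year": 2001, "Age": 40}
extra = {"ID": 7, "Year": 2001, "Region": "West"}

# Option 1: remove the second copy of Year.
dropped = merge_fields(main, extra, "ID", drop_from_second={"Year"})

# Option 2: keep both copies by renaming the second one.
renamed = merge_fields(main, extra, "ID", rename_in_second={"Year": "Year_2"})
```

In both results the fields from the first source appear before those from the second, mirroring how source order determines field order in the merged data.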
The Optimization tab provides two options that allow you to merge data more efficiently when one input dataset is significantly larger than the other datasets, or when the data is already presorted by all or some of the key fields that you are using to merge:
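To see why presorted data can make a merge cheaper, consider the generic sorted-merge technique sketched below in Python. It is not a description of Modeler's internals; it assumes both inputs are already sorted by a key with unique values, and `sorted_merge_join` and the sample fields are invented for the example.

```python
def sorted_merge_join(left, right, key):
    """Inner-join two lists of records already sorted by unique `key` values."""
    i = j = 0
    out = []
    # Walk both inputs once, advancing whichever side has the smaller key.
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key == right_key:
            out.append({**left[i], **right[j]})
            i += 1
            j += 1
        elif left_key < right_key:
            i += 1
        else:
            j += 1
    return out


left_rows = [{"ID": 1, "Age": 34}, {"ID": 2, "Age": 51}, {"ID": 4, "Age": 60}]
right_rows = [{"ID": 2, "Spend": 75.5}, {"ID": 3, "Spend": 10.0}, {"ID": 4, "Spend": 5.0}]

joined = sorted_merge_join(left_rows, right_rows, "ID")
```

Because each input is read only once from start to finish, no sorting or lookup table is needed, which is the general reason presorted inputs allow a more efficient merge.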
All of these files have now been combined. The resulting table should have 44 fields and 143,531 records.
We saw how the Merge node is used to join data files that contain different information for the same records.
If you found this post useful, make sure to check out IBM SPSS Modeler Essentials for more information on leveraging SPSS Modeler to get faster, more efficient insights from your data.