
Note: The following excerpt is taken from the book IBM SPSS Modeler Essentials, co-authored by Keith McCormick and Jesus Salcedo. This book gives you a quick overview of the fundamental concepts of data mining and how to put them to practical use with the help of SPSS Modeler.

SPSS Modeler allows users to mine data visually on the stream canvas. This means that you will not be writing code for your data mining projects; instead you will be placing nodes on the stream canvas. Remember that nodes represent operations to be carried out on the data. So once nodes have been placed on the stream canvas, they need to be linked together to form a stream. A stream represents the flow of data going through a number of operations (nodes). The following diagram is an example of nodes on the canvas, as well as a stream:

[Figure: Nodes on canvas]

Given that you will spend a lot of time building streams, in this section you will learn the most efficient ways of manipulating nodes to create a stream.
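Modeler builds streams visually, so none of this requires code. Still, the idea of a stream as data flowing through a series of operations can be sketched in a few lines of Python. The example below is only an illustration and does not use Modeler itself: the function names `var_file_node`, `table_node`, and `run_stream`, as well as the sample data, are hypothetical stand-ins for a source node that reads delimited text and an output node that shows a table.

```python
import csv
import io

# Hypothetical sample data standing in for a delimited text file.
SAMPLE = "id,age\n1,34\n2,51\n"


def var_file_node(text):
    """Source operation: read delimited text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def table_node(rows):
    """Output operation: render the rows as a tab-separated table."""
    header = list(rows[0].keys())
    lines = ["\t".join(header)]
    lines += ["\t".join(row[col] for col in header) for row in rows]
    return "\n".join(lines)


def run_stream(data, nodes):
    """Pass the data through each operation in order, like a stream."""
    for node in nodes:
        data = node(data)
    return data


result = run_stream(SAMPLE, [var_file_node, table_node])
print(result)
```

Each function plays the role of one node, and the list passed to `run_stream` plays the role of the connections that fix the order in which data moves from one node to the next.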

Mouse buttons

Mouse buttons are used extensively when building streams: they bring nodes onto the canvas, connect them, edit them, and so on. Within Modeler, the buttons are used in the following ways:

  • The left button is used for selecting, placing, and positioning nodes on the stream canvas
  • The right button is used for invoking context (pop-up) menus that allow for editing, connecting, renaming, deleting, and running nodes
  • The middle button (optional) is used for connecting and disconnecting nodes

Adding nodes

To begin a new stream, a node from the Sources palette needs to be placed on the stream canvas. There are three ways to add nodes to a stream from a palette:

Method one: Click on palette and then on stream:

  1. Click on the Sources palette.
  2. Click on the Var. File node.

This will cause the icon to be highlighted.

  3. Move the cursor over the stream canvas.
  4. Click anywhere in the stream canvas.

A copy of the selected node now appears on the stream canvas. This node represents the action of reading data into Modeler from a delimited text data file. If you wish to move the node within the stream canvas, select it by clicking on the node, and while holding the left mouse button down, drag the node to a new position.

Method two: Drag and drop:

  1. Now go back to the Sources palette.
  2. Click on the Statistics File node and drag and drop this node onto the canvas.

The Statistics File node represents the action of reading data into Modeler from an IBM SPSS Statistics data file.

Method three: Double-click:

  1. Go back to the Sources palette one more time.
  2. Double-click on the Database node.

The Database node represents the action of reading data into Modeler from an ODBC-compliant database.

Editing nodes

Once a node has been brought onto the stream canvas, you will typically want to edit it so that you can specify which fields, cases, or files the node applies to. There are two ways to edit a node.

Method one: Right-click on a node:

  1. Right-click on the Var. File node:

[Figure: Editing nodes]

Notice that there are many things you can do within this context menu. You can edit, add comments, copy, delete a node, connect nodes, and so on. Most often you will probably either edit the node or connect nodes.

Method two: Double-click on a node:

  1. Double-click on the Var. File node.

This bypasses the context menu we saw previously, and goes directly into the node itself so we can edit it.

Deleting nodes

There will be times when you will want to delete a node that you have on the stream canvas. There are two ways to delete a node.

Method one: Right-click on a node:

  1. Right-click on the Database node.
  2. Select Delete.

The node is now deleted.

Method two: Use the Delete key on the keyboard:

  1. Click on the Statistics File node.
  2. Press the Delete key on the keyboard.

Building a stream

When two or more nodes have been placed on the stream canvas, they need to be connected to produce a stream. This can be thought of as representing the flow of data through the nodes.

To demonstrate this, we will place a Table node on the stream canvas next to the Var. File node. The Table node presents data in a table format.

  1. Click the Output palette to activate it.
  2. Click on the Table node.
  3. Place this node to the right of the Var. File node by clicking in the stream canvas:

[Figure: Var. File node and Table node on the stream canvas]

At this point, we have two nodes on the stream canvas. However, we technically do not have a stream, because the nodes are not speaking to each other (that is, they are not connected).

Connecting nodes

In order for nodes to work together, they must be connected. Connecting nodes allows you to bring data into Modeler, explore the data, manipulate the data (to either clean it up or create additional fields), build a model, evaluate the model, and ultimately score the data. There are three main ways to connect nodes to form a stream: double-clicking, connecting manually, or using the middle mouse button.

[Figure: Connecting nodes]

Method one: Double-click.

The simplest way to form a stream is to double-click on nodes on a palette. This method automatically connects the new node to the currently selected node on the stream canvas:

  1. Select the Var. File node that is on the stream canvas.
  2. Double-click the Table node from the Output palette.

This action automatically connects the Table node to the existing Var. File node, and a connecting arrow appears between the nodes. The head of the arrow indicates the direction of the data flow.
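The double-click behavior can be pictured with a small Python sketch. This is a toy model written for illustration, not Modeler's own API: the `Canvas` class and its `place`, `select`, and `double_click_from_palette` methods are hypothetical names. Each link is stored as a (source, target) pair, which mirrors the arrow pointing in the direction of the data flow.

```python
class Canvas:
    """Toy model of the stream canvas; not Modeler's actual API."""

    def __init__(self):
        self.nodes = []       # node names in the order they were placed
        self.links = []       # (source, target) pairs; data flows source -> target
        self.selected = None  # name of the currently selected node, if any

    def place(self, name):
        """Put a node on the canvas without connecting it to anything."""
        self.nodes.append(name)

    def select(self, name):
        """Mark a node that is already on the canvas as selected."""
        if name not in self.nodes:
            raise ValueError(f"{name!r} is not on the canvas")
        self.selected = name

    def double_click_from_palette(self, name):
        """Add a node and link it downstream of the selected node."""
        self.nodes.append(name)
        if self.selected is not None:
            self.links.append((self.selected, name))
        # This sketch leaves the selection unchanged; that is an
        # assumption made here, and Modeler's behavior may differ.


canvas = Canvas()
canvas.place("Var. File")
canvas.select("Var. File")
canvas.double_click_from_palette("Table")
print(canvas.links)  # [('Var. File', 'Table')]
```

With nothing selected, the same call would simply add the node without creating a link, which matches the state of having nodes on the canvas but no stream.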

Method two: Manually. To manually connect two nodes:

  1. Bring a Table node onto the canvas.
  2. Right-click on the Var. File node.
  3. Select Connect from the context menu.
  4. Click the Table node.

Method three: Middle mouse button. To use the middle mouse button:

  1. Bring a Table node onto the canvas.
  2. Use the middle mouse button to click on the Var. File node.
  3. While holding the middle mouse button down, drag the cursor over to the Table node.
  4. Release the middle mouse button.

Deleting connections

When you know that you are no longer going to use a node, you can delete it. Often, though, you may not want to delete a node; instead you might want to delete a connection. Deleting a node completely gets rid of the node. Deleting a connection allows you to keep a node with all the edits you have done, but for now the unconnected node will not be part of the stream. Nodes can be disconnected in several ways:

[Figure: Delete connections]

Method one: Delete the connecting arrow:

  1. Right-click on the connecting arrow.
  2. Click Delete Connection.

Method two: Right-click on a node:

  1. Right-click on one of the nodes that has a connection.
  2. Select Disconnect from the context menu.

Method three: Double-click with the middle mouse button:

  1. Double-click with the middle mouse button on a node that has a connection.

All connections to this node will be severed, but the neighboring nodes' connections to other nodes will remain intact. As we have seen, it is fairly easy to build and manage data streams in SPSS Modeler.
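To make the difference between deleting a connection, disconnecting a node, and deleting a node concrete, here is a minimal Python sketch. It is a toy model rather than anything Modeler exposes: the `Stream` class, its method names, and the node names "Table A" and "Table B" are hypothetical and chosen only for illustration.

```python
class Stream:
    """Toy model of nodes and connections; not Modeler's actual API."""

    def __init__(self, nodes, links):
        self.nodes = set(nodes)
        self.links = set(links)  # (source, target) pairs

    def delete_connection(self, source, target):
        """Remove one arrow; both nodes stay on the canvas."""
        self.links.discard((source, target))

    def disconnect(self, node):
        """Sever every connection touching the node, but keep the node."""
        self.links = {link for link in self.links if node not in link}

    def delete_node(self, node):
        """Remove the node itself, along with any connections to it."""
        self.disconnect(node)
        self.nodes.discard(node)


stream = Stream(
    nodes=["Var. File", "Table A", "Table B"],
    links=[("Var. File", "Table A"), ("Var. File", "Table B")],
)

# Disconnecting Table A keeps the node but removes its only connection;
# the link between Var. File and Table B is untouched.
stream.disconnect("Table A")
print(sorted(stream.nodes))  # ['Table A', 'Table B', 'Var. File']
print(sorted(stream.links))  # [('Var. File', 'Table B')]

# Deleting Table B removes the node and its connection.
stream.delete_node("Table B")
print(sorted(stream.nodes))  # ['Table A', 'Var. File']
print(sorted(stream.links))  # []
```

A disconnected node stays in `nodes`, so it can be linked again later, whereas a deleted node is gone and would have to be added and edited from scratch.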

If you found the above excerpt useful, make sure to check out our book IBM SPSS Modeler Essentials for more tips and tricks on effective data mining.
