Visualize This!


In this article, Michael Phillips, the author of the book TIBCO Spotfire: A Comprehensive Primer, discusses how human beings are fundamentally visual in the way they process information. The invention of writing was as much about visually representing our thoughts to others as it was about record keeping and accountancy. In the modern world, we are bombarded with formalized visual representations of information, from the ubiquitous opinion poll pie chart to clever and sophisticated infographics. The website http://data-art.net/resources/history_of_vis.php provides an informative and entertaining quick history of data visualization. If you want a truly breathtaking demonstration of the power of data visualization, seek out Hans Rosling’s The best stats you’ve ever seen at http://ted.com.


We will spend time getting to know some of Spotfire’s data capabilities. It’s important that you continue to think about data: how it’s structured, how it’s related, and where it comes from. Building good visualizations requires visual imagination, but it also requires data literacy.

This article is all about getting you to think about the visualization of information and empowering you to use Spotfire to do so. Apart from learning the basic features and properties of the various Spotfire visualization types, there is much more to learn about the seamless interactivity that Spotfire allows you to build into your analyses.

We will take a close look at 7 of the 16 visualization types provided by Spotfire; these 7 are the most commonly used.

We will cover the following topics:

Displaying information quickly in tabular form
Enriching your visualizations with color categorization
Visualizing categorical information using bar charts
Dividing a visualization across a trellis grid
Key Spotfire concept—marking
Visualizing trends using line charts
Visualizing proportions using pie charts
Visualizing relationships using scatter plots
Visualizing hierarchical relationships using treemaps
Key Spotfire concept—filters
Enhancing tabular presentations using graphical tables

Now let’s have some fun!

Displaying information quickly in tabular form

While working through the data examples, we used the Spotfire Table visualization, but now we’re going to take a closer look. People will nearly always want to see the “underlying data”, the details behind any visualization you create. The Table visualization meets this need.

It’s very important not to confuse a table in the general data sense with the Spotfire Table visualization; the underlying data table remains immutable and complete in the background. The Table visualization is a highly manipulatable view of the underlying data table and should be treated as a visualization, not a data table.

The data used here is BaseballPlayerData.xls.

There is always more than one way to do the same thing in Spotfire, and this is particularly true for the manipulation of visualizations. Let’s start with some very quick manipulations:

First, insert a table visualization by going to the Insert menu, selecting New Visualization, and then Table.
To move a column, left-click on the column name, hold, and drag it.
To sort by a column, left-click on the column name. To sort by more than one column, left-click on the first column name and then press Shift + left-click on the subsequent columns in order of sort precedence.
To widen or narrow a column, hover the mouse over the right-hand edge of the column title until you see the cursor change to a two-way arrow, and then click and drag it.

These and other properties of the Table visualization are also accessed via visualization properties. As you work through the various Spotfire visualizations, you’ll notice that some types have more options than others, but there are common trends and an overall consistency in conventions.

Visualization properties can be opened in a number of ways:

By right-clicking on the visualization, a table in this case, and selecting Properties.
By going to the Edit menu and selecting Visualization Properties.
By clicking on the Visualization Properties icon, as shown in the following screenshot, in the icon tray below the main menu bar.

It’s beyond the scope of this book to explore every property and option. The context-sensitive help provided by Spotfire is excellent and explains all the options in glorious detail.

I’d like to highlight four important properties of the Table visualization:

The General property allows you to change the table visualization title, not the name of the underlying data table. It also allows you to hide the title altogether.
The Data property allows you to switch the underlying data table, if you have more than one table loaded into your analysis.
The Columns property allows you to hide columns and order the columns you do want to show.
The Show/Hide Items property allows you to limit what is shown by a rule you define, such as top five hitters. After clicking on the Add button, you select the relevant column from a dropdown list, choose Rule type (Top), and finally, choose Value for the rule (5). The resulting visualization will only show the rows of data that meet the rule you defined.
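Conceptually, a Top rule amounts to sorting the rows by the chosen column and keeping the first few. The following is only a minimal sketch of that idea in plain Python, not Spotfire's own API; the player rows, the Hits column, and the helper name top_n are all invented for illustration.

```python
from operator import itemgetter

# Hypothetical sample rows; names and values are invented for illustration.
players = [
    {"Player Name": "Player 1", "Hits": 150},
    {"Player Name": "Player 2", "Hits": 98},
    {"Player Name": "Player 3", "Hits": 201},
    {"Player Name": "Player 4", "Hits": 175},
    {"Player Name": "Player 5", "Hits": 64},
    {"Player Name": "Player 6", "Hits": 188},
    {"Player Name": "Player 7", "Hits": 120},
]


def top_n(rows, column, n):
    """Return the n rows with the largest values in the given column."""
    return sorted(rows, key=itemgetter(column), reverse=True)[:n]


# Equivalent in spirit to: column = Hits, Rule type = Top, Value = 5.
top_hitters = top_n(players, "Hits", 5)
for row in top_hitters:
    print(row["Player Name"], row["Hits"])
```

The underlying list of players is left untouched, which mirrors the point made earlier: the rule limits what the visualization shows, not the data table behind it.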

Enriching your visualizations with color categorization

Color is a strong feature in Spotfire and an important visualization tool, often underestimated by report creators. It can be seen as merely a nice-to-have customization, but paying attention to color can be the difference between a stimulating, intuitive data visualization and an uninspiring or even confusing corporate report. Take some pride and care in the visual aesthetics of your analytics creations!

Let’s take a look at the color properties of the Table visualization.

Open the Table visualization properties again, select Colors, and then Add the column Runs.
Now, you can play with a color gradient, adding points by clicking on the Add Point button and customizing the colors. It’s as easy as left-clicking on any color box and then selecting from a prebuilt palette or going into a full RGB selection dialog by choosing More Colors….
The result is a heatmap type effect for runs scored, with yellow representing low run totals, transitioning to green as the run total approaches the average value in the data, and becoming blue for the highest run totals.
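To make the gradient idea concrete, here is a rough Python sketch of how a value could be mapped onto a three-point color gradient anchored at the minimum, average, and maximum. It assumes simple linear interpolation in RGB, which may not match exactly how Spotfire blends colors; the run totals and the function names are invented for illustration.

```python
def lerp_color(c1, c2, t):
    """Linearly interpolate between two RGB tuples for t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))


def gradient_color(value, lo, mid, hi, colors):
    """Map a value onto a three-point gradient anchored at lo, mid and hi."""
    c_lo, c_mid, c_hi = colors
    if value <= mid:
        span = mid - lo
        t = 0.0 if span == 0 else (value - lo) / span
        return lerp_color(c_lo, c_mid, t)
    span = hi - mid
    t = 1.0 if span == 0 else (value - mid) / span
    return lerp_color(c_mid, c_hi, t)


YELLOW, GREEN, BLUE = (255, 255, 0), (0, 128, 0), (0, 0, 255)

runs = [12, 40, 55, 71, 102]  # invented run totals
lo, hi = min(runs), max(runs)
avg = sum(runs) / len(runs)

# Low totals come out yellow, totals near the average green, high totals blue.
for r in runs:
    print(r, gradient_color(r, lo, avg, hi, (YELLOW, GREEN, BLUE)))
```

Each point you add with Add Point plays the role of another anchor like mid: it pins a specific color to a specific value and the colors in between are blended.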

Visualizing categorical information using bar charts

We saw how the Table visualization is perfect for showing and ordering detailed information. It’s quite similar to a spreadsheet. The Bar Chart visualization is very good for visualizing categorical information, that is, where you have categories with supporting hard numbers—sales by region, for example. The region is the category, whereas the sales is the hard number or fact.

Bar charts are typically used to show a distribution. Depending on your data or your analytic requirement, the bars can be ordered by value, placed side by side, stacked on top of each other, or arranged vertically or horizontally.

There is a special case of the category and value combination and that is where you want to plot the frequencies of a set of numerical values. This type of bar chart is referred to as a histogram, and although it is number against number, it is still, in essence, a distribution plot. It is very common in fact to transform the continuous number range in such cases into a set of discrete bins or categories for the plot. For example, you could take some demographic data and plot age as the category and the number of people at that age as the value (the frequency) on a bar chart. The result, for a general population, would approach a bell-shaped curve.
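The binning step described above can be sketched in a few lines of Python. This is only an illustration of the idea: the ages and the bin width of 10 are invented, and Spotfire provides its own binning options that are not shown here.

```python
from collections import Counter

ages = [23, 27, 31, 34, 35, 38, 41, 42, 44, 47, 52, 58, 63]  # invented sample
BIN_WIDTH = 10  # assumed bin width


def bin_start(age, width=BIN_WIDTH):
    """Return the lower edge of the bin that an age falls into."""
    return (age // width) * width


# Category = age bin, value = number of people in that bin (the frequency).
frequencies = Counter(bin_start(a) for a in ages)

for start in sorted(frequencies):
    label = f"{start}-{start + BIN_WIDTH - 1}"
    print(f"{label:>6} | {'#' * frequencies[start]}")
```

Each printed row corresponds to one bar of the histogram, with the continuous age range turned into discrete categories.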

Let’s create a bar chart using the baseball data. The data we will use is BaseballPlayerData.xls, which you can download from http://www.insidespotfire.com.

Create a new page by right-clicking on any page tab and selecting New Page. You can also select New Page from the Insert menu or click on the new page icon in the icon bar below the main menu.
Create a Bar Chart visualization by left-clicking on the bar chart icon or by selecting New Visualization and then Bar Chart from the Insert menu.
Spotfire will automatically create a default chart that is rarely exactly what you want, so the next step is to configure the chart.
Two distributions might be interesting to look at: the distribution of home runs across all the teams and the distribution of player salaries across all the teams.
The axes are easy to change; simply use the axes selectors.
If the bars are vertical, it means that the category—Team, in our case—should be on the horizontal axis, with the value—Home Runs or Salary—on the vertical axis, representing the height of the bars.
We’re going to pick Home Runs from the vertical axis selector and then choose an appropriate aggregation from the aggregation dropdown, which is highlighted in red in the screenshot. Sum would be a valid option, but let’s go with Avg (Average). Similarly, select Team from the horizontal axis dropdown selector.

The vertical, or value, axis must be an aggregation because there is more than one home run value for each category. You must decide if you want a sum, an average, a minimum, and so on.
You can modify the visualization properties just as you did for the Table visualization. Some of the options are the same; some are specific to the bar chart. We’re going to select the Sort bars by value option in the Appearance property. This will order the bars in descending order of value. We’re also going to check the option Vertically under Scale labels | Show labels for the Category Axis property.
There are two more actions to perform: create an identical bar chart except with average salary as the value axis, and give each bar chart an appropriate title (Visualization Properties|General|Title:).

To copy an existing visualization, simply right-click on it and select Duplicate Visualization.
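The reason the value axis needs an aggregation can be illustrated outside Spotfire. The sketch below, in plain Python, groups several values per category and collapses each group with a chosen function, then orders the bars by value. The team names and home run numbers are invented, and the helper name aggregate is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Invented (team, home runs) rows: several players per team.
rows = [
    ("Team A", 30), ("Team A", 10),
    ("Team B", 25), ("Team B", 35),
    ("Team C", 5),
]


def aggregate(rows, agg):
    """Collapse the values of each category into a single number."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return {category: agg(values) for category, values in groups.items()}


avg_home_runs = aggregate(rows, mean)   # like choosing Avg
sum_home_runs = aggregate(rows, sum)    # like choosing Sum

# Like checking "Sort bars by value": descending order of bar height.
bars = sorted(avg_home_runs.items(), key=lambda kv: kv[1], reverse=True)
for team, value in bars:
    print(team, value)
```

Swapping mean for sum, min, or max changes the height of every bar, which is why the choice of aggregation is part of the analysis rather than a cosmetic setting.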

We can now compare the distribution of home run average and salary average across all the baseball teams, but there’s a better way to do this in a single visualization using color.

Close the salary distribution bar chart by left-clicking on X in the upper right-hand corner of the visualization (X appears when you hover the mouse) or right-clicking on the visualization and selecting Close.
Now, open the home run bar chart visualization properties, go to the Colors property, and color by Avg(Salary).
Select a Gradient color mode, and add a median point by clicking on the Add Point button and selecting Median from the dropdown list of options on the added point.
Finally, choose a suitable heat map range of colors; something like blue (min) through pale yellow (median) through red (max).
You will still see the distribution of home runs across the baseball teams, but now you will have a superimposed salary heat map. Texas and Cleveland appear to be getting much more bang for their buck than the NY Yankees.

Dividing a visualization across a trellis grid

Trellising, whereby you divide a series of visualizations into individual panels, is a useful technique when you want to subdivide your analysis. In the example we’ve been working with, we might, for instance, want to split the visualization by league.

Open the visualization properties for the home runs distribution bar chart colored by salary and select the Trellis property.
Go to Panels and split by League (use the dropdown column selector).

Spotfire allows you to build layers of information with even basic visualizations such as the bar chart. In one chart, we see the home run distribution by team, salary distribution by team, and breakdown by league.
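One way to picture a trellis is as the same aggregation repeated once per panel. The following Python sketch partitions rows by league and computes the per-team average inside each panel; the league codes, team names, and values are invented for illustration, and the function name trellis is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Invented (league, team, home runs) rows.
rows = [
    ("AL", "Team A", 30), ("AL", "Team A", 10), ("AL", "Team B", 20),
    ("NL", "Team C", 15), ("NL", "Team D", 25), ("NL", "Team D", 35),
]


def trellis(rows):
    """Split rows into one panel per league, then average per team."""
    grouped = defaultdict(lambda: defaultdict(list))
    for league, team, home_runs in rows:
        grouped[league][team].append(home_runs)
    return {
        league: {team: mean(values) for team, values in teams.items()}
        for league, teams in grouped.items()
    }


panels = trellis(rows)
for league, bars in panels.items():
    print(league, bars)
```

Each key of panels corresponds to one trellis panel, and each inner dictionary to the bars drawn inside it.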

Key Spotfire concept – marking

It’s time to introduce one of the most important Spotfire concepts, called marking, which is central to the interactivity that makes Spotfire such a powerful analysis tool. Marking refers to the action of selecting data in a visualization. Every element you see is selectable, or markable: for example, a single row or multiple rows in a table, or a single bar or multiple bars in a bar chart.

You need to understand two aspects of marking. First, there is the visual effect, or color(s) you see, when you mark (select) visualization elements. Second, there is the behavior that follows marking: what happens to data and the display of data when you mark something.

How to change the marking color

From Spotfire v5.5 onward, you can choose, on a visualization-by-visualization basis, two distinct visual effects for marking:

Use a separate color for marked items: all marked items are uniformly colored with the marking color, and all unmarked items retain their existing color.
Keep existing color attributes and fade out unmarked items: all marked items keep their existing color, and all unmarked items also keep their existing color but with a high degree of color fade applied, leaving the marked items strongly highlighted.

The second option is not available in versions older than v5.5 but is the default from v5.5 onward.

The setting is made in the visualization’s Appearance property by checking or unchecking the option Use separate color for marked items. The default color when using a separate color for marked items is dark green, but this can be changed by going to Edit|Document Properties|Markings|Edit. The fade-out option has the advantage of retaining any underlying coloring you defined, but you might not like how the rest of the chart is washed out. Which approach you choose depends on what information you think is critical for your particular situation.

When you create a new analysis, a default marking is created and applied to every visualization you create by default. You can change the color of the marking in Document Properties, which is found in the Edit menu. Just open Document Properties, click on the Markings tab, select the marking, click on the Edit button, and change the color.

You can also create as many markings as you need, giving them convenient names for reference purposes, but we’ll just focus on using one for now.

How to set the marking behavior of a visualization

Marking behavior depends fundamentally on data relationships. The data within a single data table is intrinsically related; the data in separate data tables must be explicitly related before you configure marking behavior for visualizations based on separate datasets.

When you mark something in a visualization, five things can happen depending on the data involved and how you configured your visualizations:

Conditions	Behavior
Two visualizations with the same underlying data table (they can be on different pages in the analysis file) and the same marking scheme applied.	Marking data on one visualization will automatically mark the same data on the other.
Two visualizations with related underlying data tables and the same marking scheme applied.	The same as the previous condition’s behavior, but subject to differences in data granularity. For example, marking a baseball team in one visualization will mark all the team’s players in another visualization that is based on a more detailed table related by team.
Two visualizations with the same or related data tables where one has been configured with data dependency on the marking in the other.	Nothing will display in the marking-dependent visualization other than what is marked in the reference visualization.
Visualizations with unrelated underlying data tables.	No marking interaction will occur, and the visualizations will mark completely independently of one another.
Two visualizations with the same underlying data table or related data tables and with different marking schemes applied.	Marking data on one visualization will not show on the other because the marking schemes are different.

Here’s how we set these behaviors:

Open the visualization properties of the bar chart we have been working with and navigate to the Data property.
You’ll notice that two settings refer to marking: Marking and Limit data using markings.
Use the dropdown under Marking to select the marking to be used for the visualization. Having no marking is an option. Visualizations with the same marking will display synchronous selection, subject to the data relation conditions described earlier.
The options under Limit data using markings determine how the visualization will be limited by markings made elsewhere in the analysis. The default here is no dependency. If you select a marking, then the visualization will only display data selected elsewhere with that marking.

Avoid using the same marking for both Marking and Limit data using markings. If you are using the limit data setting, either select no marking under Marking or create a second marking and select that one.

You’re possibly a bit confused by now. Fortunately, marking is much harder to describe than to use! Let’s build a tangible example.

We’ll start a new analysis, so close any analysis you have open and create a new one, loading the player-level baseball data (BaseballPlayerData.xls).
Add two bar charts and a table. You can rearrange the layout by left-clicking on the title bar of a visualization, holding, and dragging it. Position the visualizations any way you wish; one option is to place the two bar charts side by side, with the table below them spanning both.

Save your analysis file at this point and at regular intervals. It’s good practice to save regularly as you build an analysis. It will save you a lot of grief if your PC fails in any way. There is no autosave function in Spotfire.

For the first bar chart, set the following visualization properties:

Property	Value
General | Title	Home Runs
Data | Marking	Marking
Data | Limit data using markings	Nothing checked
Appearance | Orientation	Vertical bars
Appearance | Sort bars by value	Check
Category Axis | Columns	Team
Value Axis | Columns	Avg(Home Runs)
Colors | Columns	Avg(Salary)
Colors | Color mode	Gradient; Add Point for median; Max = strong red; Median = pale yellow; Min = strong blue
Labels | Show labels for	Marked Rows
Labels | Types of labels | Complete bar	Check

For the second bar chart, set the following visualization properties:

Property	Value
General | Title	Roster
Data | Marking	Marking
Data | Limit data using markings	Nothing checked
Appearance | Orientation	Horizontal bars
Appearance | Sort bars by value	Check
Category Axis | Columns	Team
Value Axis | Columns	Count(Player Name)
Colors | Columns	Position
Colors | Color mode	Categorical

For the table, set the following visualization properties:

Property	Value
General | Title	Details
Data | Marking	(None)
Data | Limit data using markings	Check Marking
Columns	Team, Player Name, Games Played, Home Runs, Salary, Position

Now start selecting visualization elements with your mouse. You can click on elements such as bars or segments of bars, or you can click and drag a rectangular block around multiple elements.

When you select a bar on the Home Runs bar chart, the corresponding team bar is automatically selected on the Roster bar chart, and details for all the players in that team display in the Details table. When you select a bar segment on the Roster bar chart, the corresponding team bar is automatically selected on the Home Runs bar chart, and only players in the selected position for the selected team appear in the Details table.
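If it helps, the behavior just described can be modeled as a shared set of marked rows. The Python sketch below is a simplified mental model only, not how Spotfire is implemented or scripted: the Marking class, the limited_to helper, and the sample rows are all invented for illustration.

```python
# Invented player-level rows standing in for the baseball data table.
rows = [
    {"Team": "Team A", "Player Name": "P1", "Position": "Pitcher"},
    {"Team": "Team A", "Player Name": "P2", "Position": "Catcher"},
    {"Team": "Team B", "Player Name": "P3", "Position": "Pitcher"},
]


class Marking:
    """A named set of marked row indices shared between visualizations."""

    def __init__(self, name):
        self.name = name
        self.marked = set()

    def mark(self, rows, criteria):
        """Mark every row whose columns match all of the given criteria."""
        self.marked = {
            i for i, row in enumerate(rows)
            if all(row[col] == val for col, val in criteria.items())
        }


def limited_to(rows, marking):
    """Rows shown by a visualization whose data is limited by a marking."""
    return [row for i, row in enumerate(rows) if i in marking.marked]


marking = Marking("Marking")

# Clicking a team bar on the Home Runs chart marks all rows for that team.
marking.mark(rows, {"Team": "Team A"})
details = limited_to(rows, marking)

# Clicking a bar segment on the Roster chart marks one position in one team.
marking.mark(rows, {"Team": "Team A", "Position": "Pitcher"})
details_segment = limited_to(rows, marking)

print([r["Player Name"] for r in details])
print([r["Player Name"] for r in details_segment])
```

In this model, the two bar charts both write to and read from the same marking, while the Details table only reads from it, which is why it uses (None) under Marking and checks Marking under Limit data using markings.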

There are some very useful additional functions associated with marking, and you can access these by right-clicking on a marked item. They are Unmark, Invert, Delete, Filter To, and Filter Out. You can also unmark by left-clicking on any blank space in the visualization.

Play with this analysis file until you are comfortable with the marking concept and functionality.

Summary

This article is a small taste of the book TIBCO Spotfire: A Comprehensive Primer. You’ve seen how the Table visualization is an easy and traditional way to display detailed information in tabular form and how the Bar Chart visualization is excellent for visualizing categorical information, such as distributions.

You’ve learned how to enrich visualizations with color categorization and how to divide a visualization across a trellis grid. You’ve also been introduced to the key Spotfire concept of marking.

Apart from gaining a functional understanding of these Spotfire concepts and techniques, you should have gained some insight into the science and art of data visualization.