9 min read

In this article by Shweta Savale, the author of the book Tableau Cookbook- Recipes for Data Visualization, we will cover how you need to install My Tableau Repository and connecting to the sample data source.

(For more resources related to this topic, see here.)

Introduction to My Tableau Repository and connecting to the sample data source

Tableau is a very versatile tool and it is used across various industries, businesses, and organizations, such as government and non-profit organizations, BFSI sector, consulting, construction, education, healthcare, manufacturing, retail, FMCG, software and technology, telecommunications, and many more. The good thing about Tableau is that it is industry and business vertical agnostic, and hence as long as we have data, we can analyze and visualize it.

Tableau can connect to a wide variety of data sources and many of the data sources are implemented as native connections in Tableau. This ensures that the connections are as robust as possible.

In order to view the comprehensive list of data sources that Tableau connects to, we can visit the technical specification page on the Tableau website by clicking on the following link:

http://www.tableau.com/products/desktop?qt-product_tableau_desktop=1#qt-product_tableau_desktops.

Getting ready

Tableau provides some sample datasets with the Desktop edition. In this article, we will frequently be using the sample datasets that have been provided by Tableau. We can find these datasets in the Data sources directory in the My Tableau Repository folder, which gets created in our Documents folder when Tableau Desktop is installed on our machine. We can look for these data sources in the repository or we can quickly download it from the link mentioned and save it in a new folder called Tableau Cookbook data under Documents/My Tableau Repository/Datasources.

The link for downloading the sample datasets is as follows: https://1drv.ms/f/s!Av5QCoyLTBpngihFyZaH55JpI5BN

There are two files that have been uploaded. They are as follows:

  • Microsoft Excel data called Sample – Superstore.xls
  • Microsoft Access data called Sample – Coffee Chain.mdb

In the following section, we will see how to connect to the sample data source. We will be connecting to the Excel data called Sample – Superstore.xls.

This Excel file contains transactional data for a retail store. There are three worksheets in this Excel workbook. The first sheet, which is called the Orders sheet, contains the transaction details; the Returns sheet contains the status of returned orders, and the People sheet contains the region names and the names of managers associated with those regions. Refer to the following screenshot to get a glimpse of how the Excel data is structured:

Now that we have taken a look at the Excel data, let us see how to connect to this Excel data in the following recipe. To begin with, we will work on the Orders sheet of the Sample – Superstore.xls data. This worksheet contains the order details in terms of the products purchased, the name of the customer, Sales, Profits, Discounts offered, day of purchase, order shipment date, among many other transactional details.

How to do it…

  1. Let’s open Tableau Desktop by double-clicking on the Tableau 10.0 icon on our Desktop. We can also right-click on the icon and select Open. We will see the start page of Tableau, as shown in the following screenshot:

  1. We will select the Excel option from under the Connect header on the left-hand side of the screen.
  2. Once we do that, we will have to browse the Excel file called Sample – Superstore.xls, which is saved in Documents/My Tableau Repository/Datasources/Tableau Cookbook data.
  3. Once we are able to establish a connection to the referred Excel file, we will get a view as shown in the following screenshot:

  1. Annotation 1 in the preceding screenshot is the data that we have connected to, and annotation 2 is the list of worksheets/tables/views in our data.
  2. Double-click on the Orders sheet or drag and drop the Orders sheet from the left-hand side section into the blank space that says Drag sheets here. Refer to annotation 3 in the preceding screenshot.
  3. Once we have selected the Orders sheet, we will get to see the preview of our data, as highlighted in annotation 1 in the following screenshot. We will see the column headers, their data type (#, Abc, and so on), and the individual rows of data:

  1. While connecting to a data source, we can also read data from multiple tables/sheets from that data source. However, this is something that we will explore a little later.
  2. Further moving ahead, we will need to specify what type of connection we wish to maintain with the data source. Do we wish to connect to our data directly and maintain a Live connectivity with it, or do we wish to import the data into Tableau’s data engine by creating an Extract? Refer to annotation 2 in the preceding screenshot. We will understand these options in detail in the next section. However, to begin with, we will select the Live option.
  3. Next, in order to get to our Tableau workspace where we can start building our visualizations, we will click on the Go to Worksheet option/ Sheet 1. Refer to annotation 3 in the preceding screenshot.
  4. This is how we can connect to data in Tableau. In case we have a database to connect to, then we can select the relevant data source from the list and fill in the necessary information in terms of server name, username, password, and so on. Refer to the following screenshot to see what options we get when we connect to Microsoft SQL Server:

How it works…

Before we connect to any data, we need to make sure that our data is clean and in the right format. The Excel file that we connected to was stored in a tabular format where the first row of the sheet contains all the column headers and every other row is basically a single transaction in the data. This is the ideal data structure for making the best use of Tableau. Typically, when we connect to databases, we would get columnar/tabular data. However, flat files such as Excel can have data even in cross-tab formats. Although Tableau can read cross-tab data, we may face certain limitations in terms of options for viewing, aggregating, and slicing and dicing our data in Tableau.

Having said that, there may be situations where we have to deal with such cross-tab or pre-formatted Excel files. These files will essentially need cleaning up before we pull into Tableau. Refer to the following article to understand more about how we can clean up these files and make them Tableau ready:

http://onlinehelp.tableau.com/current/pro/desktop/en-us/help.htm#data_tips.html

In case it is a cross-tab file, then we will have to pivot it into normalized columns either at the data level or on the fly at Tableau level. We can do so by selecting multiple columns that we wish to pivot and then selecting the Pivot option from the dropdown that appears when we hover over any of the columns. Refer to the following screenshot:

If the format of the data in our Excel file is not suitable for analysis in Tableau, then we can turn on the Data Interpreter option, which becomes available when Tableau detects any unique formatting or any extra information in our Excel file. For example, the Excel data may include some empty rows and columns, or extra headers and footers. Refer to the following screenshot:

Data Interpreter can remove that extra information to help prepare our Tableau data source for analysis. Refer to the following screenshot:

When we enable the Data Interpreter, the preceding view will change to what is shown in the following screenshot:

This is how the Data Interpreter works in Tableau.

Now many a times, there may also be situations where our data fields are compounded or clubbed in a single column. Refer to the following screenshot:

In the preceding screenshot, the highlighted column is basically a concatenated field that has the Country, City, and State. For our analysis, we may want to break these and analyze each geographic level separately. In order to do so, we simply need to use the Split or Custom Split…option in Tableau. Refer to the following screenshot:

Once we do that, our view would be as shown in the following screenshot:

When preparing data for analysis, at times a list of fields may be easy to consume as against the preview of our data. The Metadata grid in Tableau allows us to do the same along with many other quick functions such as renaming fields, hiding columns, changing data types, changing aliases, creating calculations, splitting fields, merging fields, and also pivoting the data. Refer to the following screenshot:

After having established the initial connectivity by pointing to the right data source, we need to specify as to how we wish to maintain that connectivity. We can choose between the Live option and Extract option.

The Live option helps us connect to our data directly and maintains a live connection with the data source. Using this option allows Tableau to leverage the capabilities of our data source and in this case, the speed of our data source will determine the performance of our analysis.

The Extract option on the other hand, helps us import the entire data source into Tableau’s fast data engine as an extract. This option basically creates a .tde file, which stands for Tableau Data Extract. In case we wish to extract only a subset of our data, then we can select the Edit option, as highlighted in the following screenshot. The Add link in the right corner helps us add filters while fetching the data into Tableau. Refer to the following screenshot:

A point to remember about Extract is that it is a snapshot of our data stored in a Tableau proprietary format and as opposed to a Live connection, the changes in the original data won’t be reflected in our dashboard unless and until the extract is updated.

Please note that we will have to decide between Live and Extract on a case to case basis. Please refer to the following article for more clarity:

http://www.tableausoftware.com/learn/whitepapers/memory-or-live-data

Summary

This article thus helps us to install and connect to sample data sources which is very helpful to create effective dashboards in business environment for statistical purpose.

Resources for Article:


Further resources on this subject:


LEAVE A REPLY

Please enter your comment!
Please enter your name here