Chapter 2. Summarizing Your Data for Dashboards
In this chapter, we will cover:
- Arithmetic – the queen of mathematics!
- Dashboards and dates
- Grouping your data with calculations
- Correlations with calculations
- Using cross-tabs flexibly
- Simplifying your business rules with custom calculations
Introduction
It isn't enough just to make data look beautiful; we know that we can do that with Tableau. The data has to be accurate as well in terms of the business rules of an organization.
Calculations make things easier for the business user. The idea of "intelligent laziness" is often ascribed to Napoleon Bonaparte. The core idea is that people put effort where it is most needed rather than wasting time and resources. Calculations can help you in many ways, such as removing the implementation of repeatable calculations through automation. They also allow you to implement your business calculations so that they are consistently used in your dashboards.
Arithmetic – the queen of mathematics!
This recipe explains how to use simple descriptive statistics and arithmetic as the first step toward analyzing your data. We will also look at ways to import data into Tableau. When we import data, Tableau will create a Tableau Data Extract file behind the scenes. This is also known as a TDE file for short.
Descriptive statistics are a great starting point when analyzing data. They are very helpful in delivering an initial overview of the data to help you interpret it. We can glean information about the spread of the distribution of the data, measures of variability around the mean, and measures of deviation from the normal curve. Descriptive statistics have a variety of uses, for example, to help you identify outliers, which are unusual cases in the data that may warrant further investigation. Missing data points are important as well because missing data can mislead our analysis, and data visualization can help us to profile the data to check for potential instances of missing data.
How do we calculate descriptive statistics? Once the FactInternetSales
data has been imported, we will calculate its mean, median, and mode. These are measures of central tendency that allow us to see the shape of the data. Many business questions are quite simple: what are my average sales? What are my total costs? How well can a dataset be summarized by one number?
We can look at the data in order to see how well it can be described by a single number; this is called the measure of central tendency. Often, when we talk about business questions, people listening to us would want to know the average of something. However, when we look at the average, things become more complex. The average may be skewed, for example, by an outlier. We need to know whether our average is a representative of a dataset.
The average, median, and mode tell us about the symmetry of our data in terms of its distribution. They give us an initial picture of the data, a simple summary. Further, knowing about the symmetry of the data can help us look at important factors such as probability, which may form part of your analysis.
As you might expect, there are many ways to perform descriptive statistics in Tableau. This recipe will show you how to perform simple and quick descriptive statistics that will help you begin analyzing your data; this will be useful in understanding whether the average is effective in describing the data overall.
In this recipe, we will import some data and look at using some descriptive statistics to describe our data. Firstly, we will calculate the average, which is the most well-known measure of central tendency. Then, we will look at the median and mode.
Getting ready
For this recipe and future recipe, you will need to download a mix of Excel and CSV files. Perform the following steps:
- Set up a folder on your computer where you can store the files. As an example, you could call it
TableauCookbookData
and locate it onD:
. The path for the folder would beD:\TableauCookbookData
. - Go to http://bit.ly/TableauDashboardCookbookSampleData and download the ZIP file.
- Right-click on the ZIP file and select Extract To.
- Extract the files to your folder. So, in our example, you would extract the files to
D:\TableauCookbookData
.
How to do it...
- Open up Tableau and navigate to File | New.
- Save the file as
Chapter 2
. - We will connect to the data and import it into Tableau's internal data store mechanism. To do this, click on Connect to Data.
- Then, select the link Text File and a file browser will appear. Navigate to the folder where you stored the CSV files.
- Navigate to the
FactInternetSales.csv
file and select it to open it. - Tableau will ask you to save the connection and give it a name. You can see this in the dialog box named Text File Connection, as shown in the following screenshot:
- If you look at the name in Step 4 of the Text File Connection dialog box, you will see that it is not very user-friendly. It reads FactInternetSales#csv (FactInternetSales.csv). Let's rename the connection to
Connection_FactInternetSales
and then click on OK. - Tableau will then ask you how you would like to connect to the data, or whether you'd like to import some or all of the data. We will import all of the data as per the Data Connection dialog box that is shown in the following screenshot. Select the Import All Data option.
- Once the data is imported, Tableau will ask where you would like to store the Tableau Data Extract file.
Tip
If you store the TDE file in a location that is synchronized with SkyDrive, you will need to rename the file to remove the
#
file since SkyDrive will not save the file due to the presence of this character. - Firstly, we will calculate the average of the sales amount. Take SalesAmount and drag it to the Rows shelf. You will see that Tableau immediately visualizes the data and turns it into a bar chart with one vertical bar. You can see this in the following screenshot:
- We would like to see the actual figure, so we will turn the data into a table. To do this, go to the Show Me panel and select the table. The SalesAmount pill will disappear from the Rows shelf and will reappear in the textbox in the Marks shelf, since Tableau is now showing a number.
The first visualization that Tableau selects is a sum of SalesAmount. However, since we are interested in the measures of central tendency, we are interested in the average, median, and mode. We will calculate these values to perform a quick summary of the data and also to understand how much we can rely on one number, the mean, to summarize the data.
- To calculate the average, simply click on the SalesAmount lozenge in the Rows shelf and click on the down arrow to show the menu. You can see this in the following screenshot:
- The average sales amount is $486.09.
- The average is calculated and visualized in a table. However, we would like to see the median as well. To do this, make sure that the SalesAmount pill is on the Rows shelf.
- Right-click on the pill and select Median. The median SalesAmount is given as 29.99.
- Next, remove SalesAmount so that we have a clean canvas again.
- Then, we need to calculate the mode, which can be defined as the number that occurs most frequently. To calculate the mode, we need to make a copy of SalesAmount and make it a dimension. This is a workaround to help the Tableau user to work out how many times each price occurred. To do this, right-click on the SalesAmount column in the Measures window and select Duplicate. You can see the menu item in the following screenshot:
- Once you have duplicated the SalesAmount figure, you will see that the duplicate is called SalesAmount (copy).
- Next, drag SalesAmount (copy) to the Dimensions shelf in the Data sidebar so that the table works properly.
- Drag the SalesAmount (copy) item to the Rows shelf.
- Next, you can go back to the Measures pane and move the Number of Records pill to the canvas area, next to the SalesAmount (copy) column.
- This will give us the number of records for each sales amount. However, it is quite difficult to work out the most commonly-occurring sales amount from the table simply by looking at it. What we can do instead is sort the data so that the maximum is at the top, and the top item provides us with the mean.
- To sort the data, go to the top of the Tableau interface and look for the horizontal bar chart symbol with the downward arrow. This will sort the data by the number of records in descending order. You can see this under the Server menu item, as highlighted in the following screenshot:
- Once you've done this, drag the Number of Records item on to the Color shelf of the Marks shelf. Once this is done, your screen should resemble the following screenshot:
- The mode is actually the topmost number; the highest frequently occurring SalesAmount is $4.99.
How it works…
To summarize, in this section, we learned the following:
- Basic calculations
- Changing the color
- Basic sorting
We can see that the average and median values are quite different. The mode is $4.99, but the median is $29.99. Since the average is much higher than the median and the mode is different from the other two numbers, the data is not symmetrical. Often, when business analysts look at data, they try to find out whether the data is close to a normal curve or not. The average, median, and mode help us to determine whether the data is close to a normal distribution or whether the data is shaped. Therefore, we can't just simply use the average to summarize the data; we need to know about the other items too. This helps us understand the skewedness of the data, or how far it is from the central measures.
There's more…
If you are interested in learning more about analyzing data and the normal curve, then you can take a look at http://en.wikipedia.org/wiki/Normal_curve.
If statistics interest you, why not look at doing a Khan Academy course? This is a free facility for learning statistics yourself online; refer to https://www.khanacademy.org/.
Dashboards and dates
In Business Intelligence, dates are an essential part of analysis, and they are an important part of Business Intelligence projects. Data warehouses, for example, have a Date dimension as a way of helping business users to describe their data by date. People's business questions often include a when element. Additionally, dashboards will often reference dates.
Comparison is fundamental to analysis. Time is a fundamental part of comparison. Dashboards will often display comparisons between periods of time, so time is an essential part of the dashboard display. It is easy to envisage the following business questions that involve time:
- What are my numbers compared to last year?
- How did my sales region perform this month as compared to the last month?
- When will my department reach its target?
This recipe explains how to use dates in order to analyze your data using the Dates functionality in Tableau. In the last recipe, we imported the FactInternetSales
table. Once the FactInternetSales
data has been imported, we will also do the same for the DimDate
table. To analyze our data, we will perform the following actions:
- Join data together
- Activate links in relationships
- Date analysis in Tableau
Getting ready
We will continue to use the same Tableau workbook we set up in the first recipe. It was called Chapter 2
.
How to do it...
- We will connect to an Excel file called
DimDate.xls
and import it into Tableau's internal data store mechanism. To do this, navigate to Data | Connect to Data. - Then, select the link Microsoft Excel and a file browser will appear. Navigate to the folder where you stored the downloaded Excel and CSV files.
- Navigate to the
DimDate.xls
file and select it. - In the Excel Workbook Connection dialog box, change the Step 4 name so that it reads
Connection_DimDate
. - We have two data sources which will need to be joined together so that the data can be analyzed by date. We can join the
DateKey
andOrderDateKey
columns together for theDimDate
andFactInternetSales
tables respectively. Make sure that theDateKey
andOrderDateKey
columns are both dimensions, not measures. So, ensure that both fields are contained in the Dimensions field; if not, drag them from the Measures pane to the Dimensions pane. If they are measures, then we cannot join them. - To do this, we edit the relationship between the two tables. Ensure that you have selected the
Connection_DimDate
data source in the Data pane in the sidebar. - Next, go to the Data menu item and select Edit Relationships. You can see this illustrated in the following screenshot:
- You can now see the Relationships dialog box. Click on Custom and then click on Add.
- Next, you will see two columns: one for the
DimDate
columns and the other for theFactInternetSales
columns. Each column represents a table, and the items in the list are source fields. - Select DateKey on the left-hand side, OrderDateKey on the right-hand side, and then click on the OK button. You can see this in the following screenshot:
- Let's create a new sheet by going to the tab at the foot of the worksheet with the worksheet name on it. Right-click to select Duplicate Sheet. Let's make the canvas clean again by removing all of the pills from the Columns, Rows, and Marks shelves.
- Let's use the new worksheet to proceed.
- In the Data pane, select the Connection_DimDate dimension. Select Calendar Year and drag it into the Rows shelf. Make sure it is set to the String data type by right-clicking on Calendar Year on the Data pane, then navigating to Change Data Type, and finally selecting String.
- Next, click on the Connection_FactInternetSales connection and you will see a broken link symbol next to OrderDateKey. You can see this in the following screenshot:
- We need to make the relationship between the data sources active. If we don't, then the data sources will not be related to one another and our analysis will not make sense. To activate the link, click on the broken chain; it will go red and change to a whole link. The following is an example:
- Staying in the
FactInternetSales
table, drag the SalesAmount column from the Measures pane to the Canvas shelf. You can see this in the following screenshot: - If you break the link by clicking on it again, then you will notice that the figures change. Each calendar year is now set to 29,358,677, rather than the figures shown in the previous screenshot. This is because the link between the tables is now broken, so Tableau cannot sum the data according to the years; instead, it issues a query to both data sources and simply returns the product of the total.
- Let's look at what happens if we use a different aggregation, count distinct. This returns a count of the distinct SalesAmount values. When we link the
OrderDateKey
column and select Count Distinct for the SalesAmount figure, Tableau turns the SalesAmount pill to red and grays out the whole screen. You can see an example of this in the following screenshot: - In Tableau 8, the aggregation is considered not valid, so Tableau saves the user from themselves by graying out the screen until it is fixed. You have the facility to undo the last step. To do this, press Ctrl + Z.
How it works…
Tableau allows business users to enrich their data through the addition of calculations that get stored as part of the workbook. This is useful for data analysis, since we can look and see how simple steps can quickly affect the data.
Why is this the case? Tableau issues separate queries to each data source and joins the two data sources together. Then, it conducts the aggregation on the joined data sources at the lowest level of detail in the view of the data from the Tableau interface.
Unfortunately, if the level of detail in the underlying query is different from the level of detail in the view, then the calculation will not be correct.
The moral of the story is to keep the relevant dimensions in the Tableau view, that is, put their features in the Columns and Rows shelves or in the Marks shelf. By putting more elements in the view, these shelves will move the query towards serving up data that can be used for matching the data sources. Then, the user can try to incorporate the query by including as many dimensions as possible, which in turn will produce as much detail in the query as possible in order to facilitate matching between the tables.
There's more…
Dashboards use a lot of calculations to summarize data. Research by specialists such as Ben Shneiderman shows that people tend to want to see the summary first, followed by zooming and filtering the data, and then finally see the details on demand. This is a very natural way of engaging with data. Shneiderman calls this the "Visual Information Seeking Mantra".
If you are interested in the psychology of how individuals interact with data, then Ben Shneiderman's paper The Eyes Have It (1996), which you can find at http://dl.acm.org/citation.cfm?id=834354, will be of help.
Grouping your data with calculations
In the first recipe, we specified communication as one of the key features of a dashboard. We need to be able to share the right information with the right audience, at the right time, to the right people, and in the right format.
Sometimes, data needs to be translated so that it matches the business rules and the business understanding of the organization. Tableau offers a number of different ways that help you translate data into something that the business decision makers will understand.
Grouping is one way of making data meaningful to the business. In this recipe, we will look at grouping some dimension members into a single member. Rolling up some of the members in one dimension is a good way of summarizing data for dashboards.
In this recipe, we will look at grouping dimension members; then, we will look at more complex grouping of calculations. The business question is an investigation into the characteristics of customers, for example, those who have children, and those who do not. We will group the NumberChildrenAtHome dimension members into the group of customers who have children and those who do not.
Then, we will look at a more advanced example of grouping the data by measure rather than dimension. To do this, we can create a calculation that will distinguish the values that are below the average sales amount and above the average. Results that are classified as above average are labelled Above or Equal to Average, and below average sales are labelled Below Average. We can then use this calculation to convey a visual message to the business user; for example, we could color the above average sales in one color and the below average sales in another in order to make the distinction easily identifiable.
Getting ready
We will need to add in a new data source for this recipe.
How to do it...
- We will connect to an Excel data file called
DimCustomer.xls
and import it into Tableau's internal data store mechanism. To do this, navigate to Data | Connect to Data. - Then, select the link Text File and a file browser will appear. Navigate to the folder where you stored the downloaded Excel and CSV files.
- Navigate to the
DimCustomer.csv
file and select it. - In the Excel Workbook Connection dialog box, change the Step 4 name so that it reads
Connection_DimDate
. - You can now see the new data source connection on the Data pane.
- Click on Ctrl + M to get a new worksheet in Tableau. Alternatively, go to the Worksheet menu item and select New Worksheet.
- Click on the Connection_DimDate source and drag Number of Records from the Measures pane to the Rows shelf.
- Drag the NumberChildrenAtHome dimension attribute to the Tableau canvas, which is to the left of the
NumberOfRecords
column. You can see an example in the next screenshot. - When we look at the NumberChildrenAtHome dimension, we see the following members and the number of customer records associated with each member:
- We will group the dimension members so it is easier to see which customers have children and which do not.
- Go to the dimension called NumberChildrenAtHome, right-click on it, and select the Create Group option as shown:
- This produces the Create Group dialog box, which you can see in the following screenshot:
- In the Field Name field, enter the name of the field.
- Multiselect the numbers 1 through to 5 by holding the Shift key and clicking to select more than one number at a time.
- Click on Group, rename the group to
CustomersWithChildren
, and click on OK. - You can then see the new group on the left-hand side on the Dimensions pane. Drag your new grouping to the Rows shelf.
- Then, choose Number of Records from the Measures pane and put it into the Columns shelf. You can see that the table now only has two rows in it: one that consists of a zero and another that has CustomersWithChildren.
- In order to make things clear, click on the zero and select Edit Alias.
- Rename the zero to
Customers with No Children
. - You can also rename the Group alias to
Customers With Children
so that it matches the format used elsewhere. To do this, select the numbers 1 to 4 in the table and right-click on them. - In the pop-up menu, select Group.
- Right-click on the new group and select Edit Alias.
- In the textbox, enter
Customers With Children
and click on OK. - It is now clear from the table that over 11,000 customers have no children, whereas just over 7,000 customers do have children. We can visualize this information in a better way, and we will do this for the rest of the exercise.
- To do this, change the visualization to a heat map using the Show Me panel.
- We can then change the colors by dragging Number of Records to the Color panel on the Marks shelf.
- In order to show the difference between the customers who do and do not have children, the custom diverging color palette has been selected here.
- Two diverging colors of orange and blue are selected. Orange is selected for Customers With No Children because there is a greater number of customers who do not have children; brighter and more intense colors are often used in order to denote higher values, as shown in the following screenshot:
- To do this, stay in the
Chapter 2
workbook and continue to work in the existing worksheet. - Right-click on the SalesAmount measure.
- Select the measure Create Calculated Field that is illustrated in the following screenshot:
- When you select this option, you will get the following dialog box:
- In the Name field, enter the name of the calculated field:
SalesAboveOrBelowAverage
. - In the Formula field, we will put in a formula that will calculate whether or not the sales amount is above or below the average amount. The formula is as follows:
IF ( SUM([SalesAmount]) - WINDOW_AVG(SUM([SalesAmount]), First(), Last() ) < 0 ) Then 'Below Average' Else 'Above or Equal To Average' END
Tip
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
- Once you've placed the formula into the calculation editor and clicked on OK, you will be returned to the main Tableau interface. You will see your new calculation on the left-hand side in the Measures pane.
- Drag the SUM(SalesAmount) measure to the Columns shelf.
- Drag NumberCarsOwned from the Dimension pane to the Rows shelf.
- Drag your new calculation SalesAboveOrBelowAverage to the Color button on the Marks shelf.
- Your screen should now look like the following:
To summarize, we have created a calculation that is meaningful to a business user. It provides the color display of the measure, which helps the business user understand things more efficiently. To summarize, it is simple and effective to conduct a grouping of dimension members into a binary grouping. This is useful for dashboards in order to provide an "at a glance" metric visualization that shows the organization has more customers who do not have children than those who do.
Essentially, this formula uses the WINDOW_AVG
function to work out the average of the values that are in the Tableau view of the data. Basically, this average works out the value of the data that is viewable in the Tableau canvas and does not include data that has been filtered.
It uses First()
and Last()
to work out the average of all the data from the first row right until the last row. The calculation takes the current SalesAmount value and compares it with the average SalesAmount value.
How it works...
Tableau allows you to group data together by simply arranging fields of your data source on a Tableau worksheet.
When you group fields in a worksheet, Tableau queries the data using standard drivers and query languages (such as SQL and MDX). It then groups data together wherever necessary. Finally, it presents a visual analysis of the data.
Correlation with calculations
Data visualization is all about communicating a message using data. In the first chapter, we specified communication as one of the key features of a dashboard.
The problem with data visualization is that people don't always like what visualization tells them about their business. It can defy commonly held assumptions about a business that goes against the grain of what people believe. This can be particularly uncomfortable if people have been with an organization for a long time and perhaps have not changed their perspective as the business moved ahead.
In this recipe, we will look at how we can use correlation to test a hypothesis and then display this information so that the message of the data is easy to understand.
Getting ready
We will continue to use the same worksheet as we have used in the previous recipes of this chapter.
How to do it...
In this recipe, we will use a real-life example where a business analyst wants to know if people with more cars spend less on bikes. We will test whether there is a correlation between the number of cars owned by a customer and how much money is spent on bikes. Since the AdventureWorks store is selling bikes, there is an assumption that people who own fewer cars will spend more on bikes. However, this hypothesis would need to be tested, and data visualization can help us "sense-check" the data and see patterns easily. Perform the following steps to create correlations in Tableau:
- Open up the
Chapter 2
workbook and create a new worksheet by pressing Ctrl + M, or by going to the Worksheet menu item and then selecting New Worksheet. - Rename the tab
Correlation with Calculations
. - Take the dimension named NumberCarsOwned from the
DimCustomer
table and drag it on to the Rows shelf. Change the measure to Count. - Take the SalesAmount measure from the FactInternetSales table and place it on the Columns shelf.
- Take the NumberCarsOwned member from the DimCustomer dimension and place it on the Color attribute in the Marks shelf.
- Next, we will add a trend line order to convey the relationship between the sales amount and the number of sales. To do this, go to the Analysis menu item and the Trend Lines menu item and select Show Trend Lines. You can see this in the following screenshot:
- The visualization now appears as follows:
- The image needs some more detail in order to clarify the message of the data.
- We can add labels to the data points that will help us identify the data points more clearly. To do this, we can drag the NumberCarsOwned dimension to the Label option in the Marks shelf.
- The black trend line looks quite harsh, and we can soften it by turning it into a light gray color. This would still get the message of the data across, but without attracting too much attention away from other pieces of the image.
- To change the image, right-click on the trend line and you will get the following menu:
- Select Format and the following menu item will appear:
- We can soften the trend line by changing the width. To do this, click on the arrow and the following panel appears. This allows us to soften the trend line.
- The resulting visualization appears as you can see in the next screenshot:
- We can learn more about the trend line by selecting Describe Trend Model from the Trend Lines menu. This provides us with the following information:
- We can see the results of the analysis in the resulting window.
How it works...
Tableau provides a way to test hypotheses quickly and help business users to identify the relationships between variables. Tableau does this by simplifying the process of creating visualizations quickly and on-the-fly.
The visualization shows us that a linear correlation is produced, which shows how the sum of transactions or Number of Records is related to the sales amount.
We can see that the R-squared value is 0.92, and when we find the square root of this number, we will find that R = 0.95. This shows there is a very strong relationship between the two variables. We could then proceed to do some more analysis, but this would provide us with a great starting point.
There's more...
From a dashboard perspective, Tableau also helps business processes by allowing analysts to test assumptions that are perhaps long-held and subtle. When a visualization like this is placed with other visualizations that tell a similar story, then the dashboard becomes a valuable tool for promoting real business change.
Using cross-tabs flexibly
When we create dashboards, we are conveying a unified message of data that can be made up of moving parts. The context of the data can help to ensure that data pieces move together and are consistent with one another. This can mean that showing an aggregated number is not the most meaningful form of data. Instead, percentages or differences can be more meaningful.
In Tableau, we can create very simple calculations using table calculations. What is a table calculation? Table calculations are easy calculations that are provided by default as part of the Tableau interface. Table calculations are quick to create and are a powerful tool that can help to enhance your understanding of the data. From a dashboard perspective, we need to maximize the amount of information in a small space, and table calculations will help you to compress the message of the data while helping the data to be meaningful.
In this recipe, we will create a table calculation and see how it adds to the comprehension of the data in a small space. In this recipe, we will create a chart that shows the percentage of sales attributed to each region, rather than the number.
Getting ready
Open up the Tableau Chapter 2
workbook and start a new sheet by selecting Ctrl + M.
How to do it...
- To do this, go to the DimGeography dimension in the Dimensions pane, select the CountryRegionCode member, and place it on the Rows shelf.
- Then, go to FactInternetSales in the Dimensions pane, select SalesAmount, and drag it to the Columns shelf.
- Right-click on the SalesAmount blue pill on the Columns shelf and navigate to Quick Table Calculation. You can see the following menu item appear:
- For our example, we will choose Percent of Total. The SalesAmount figure will change, and you will see a small triangle appear on the right-hand side. This feature is illustrated as follows:
- Take the SalesAboveOrBelow Average calculation we created earlier and place it in the Color section.
- For the visualization, choose a horizontal bar chart from the Show Me panel. Your screen should be the same as the following screenshot:
- Now, it is important to note that the x axis labeling has changed to show % of Total SalesAmount. This is more accurate, but how could we make it more obvious to the data consumer that they are in fact looking at percentages without cluttering the "real estate" on the dashboard? We will look at these steps next.
- We can set up a variation on a heat map. From the Show Me shelf, select the heatmap option.
- We will use a label to convey the percentage. To do this, take SalesAmount from the Measures pane and drag it to the Label option. Make sure that you change the calculation so that it uses the Percentage of Total option, as we did previously in this recipe. The label will now read the percentage of the total rather than the actual value.
- Then, go back to the Marks shelf and select the drop-down list so that it shows Circle; after this, space the heat map so that it looks more even. The following is an example of the resulting chart:
To summarize, the data is shown in a very compressed way, which provides details as well as a visual message about the data in question. It's clear that the US and Australia regions have above average sales; if more detail is required, the percentages are also given.
How it works...
Tableau allows us to tell a story simply in a small space! The use of table calculations is a very simple way to enhance the data, both visually and also in terms of details. Sometimes, however, using the default table calculations are not enough to meet the business needs and custom calculations are required. This is the subject of our next recipe.
There's more...
The author of the Le Petit Prince books, Antoine de Saint-Exupéry, was once quoted as saying the following:
"A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away."
The simplicity of a design is often an asset in data visualization, and is better than adding more detail that deviates from the purpose of data visualization.
Edward Tufte coined a term called chartjunk, which he defined as follows:
"Chartjunk refers to all visual elements in charts and graphs that are not necessary to comprehend the information represented on the graph, or that distract the viewer from this information."
This is an important aspect of data visualization. When we are creating dashboards, we have to be careful about adding nonessential items. This is particularly important in situations where we have a small screen space, such as a dashboard or a mobile device.
Table calculations are useful because they help us maximize the space on the screen by providing more enhanced information and encoding business rules effectively. They help us get across essential business information, which is key to the visualization. If you are interested in reading more about chartjunk, it is recommended that you read The Visual Display of Quantitative Information, Edward Tufte (1983).
Simplifying your business rules with customer calculations
Dashboarding concentrates on representing data in a small space while still getting across key concepts in the data. It is important to use the space effectively. There should be a balance between representing too much information and representing not enough information.
Some people circumvent the issue altogether by avoiding data visualization or perhaps simply not requiring it for their roles. For example, you might run up against individuals who are not interested in the "pretty pictures" and want to see the numbers. How do you combine this requirement with other people's requirements to have a visual representation of the data?
Fortunately, there are a number of ways in which we can use data visualization techniques to convey the message of the data in a table. In itself, a table is a valid data visualization technique because tables are very good at representing detail. In particular environments, such as finance, it is the minute details within the table that make all the difference. In data visualization, we also have to cater to the people who just want the numbers as well as those who have a more visual requirement.
In this recipe, we will create a crosstab that has a custom calculation in it, which allows us to enhance the representation of the data for those who prefer a more visual approach; in addition to this, we'll also provision numbers for those who simply want a table.
In this example, we will use the calculation Index()
. This ranks members depending on their value. We can use the ranking calculation in order to restrict our data visualization so that it only shows the top three best performers. This is a concept that is key to many dashboards, and we will look at implementing this element in this recipe.
Getting ready
We will continue to work on the Tableau Chapter 2
workbook and start a new sheet by selecting Ctrl + M.
How to do it...
- Open up the Tableau
Chapter 2
workbook and duplicate the worksheet called Using Crosstabs Flexibly. Rename it toCustom Calculations
. - Click on the Connection_FactInternetSales data view and duplicate the SalesAmount measure by right-clicking on it and selecting Duplicate.
- Rename it to
Rank
and then right-click on it to see a pop-up menu. Select the option called Create Calculated Field. You can see an example in the following screenshot: - Once you have renamed the Calculated Field to
Rank
, click on it and choose Edit. - You will see the Calculated Field dialog box; an example of this dialog box is shown in the following screenshot:
- To use this calculation as a rank, simply type
Index()
in the Formula field. - Fortunately, Tableau provides a helpful guide to show you whether the formula is correct. Here, we can see that the calculation is valid. Once you have typed
Index()
in, click on OK and you are returned to the main Tableau interface. - Next, change the visualization to a table using the Show Me panel.
- Now drag the Rank calculated field to the Columns shelf. You will see that Rank appears as a number with two decimal points. To change the format, simply go back to the Measures pane and right-click on Rank. Under the heading Default Properties, you will see an option specified as Number Format. You can see this in the following screenshot:
- We will change the number format to an integer. When we select Number Format, we get the Default Number Format dialog box.
- Select the option Number (Custom), reduce the decimal places down to zero, and then click on OK.
- The Rank number will appear as an integer, and our data visualization appears as illustrated in the following screenshot:
Now, let's take the scenario whereby we want to display only the top three performers by region.
- Simply drag Rank from the Measures pane through to the Filters pane. When you do this, you will get the following dialog box:
Make sure you enter the number
3
on the right-hand side rather than6
. Although there are six options, this filter means that Tableau will only display the top three best-selling regions.
You can see the final data visualization in the following screenshot:
How it works…
Tableau offers us many interesting ways to compact the data down to its minimum design while still helping us to show the message of the data. Although the main visualization is very small, it helps us to use this element in a later dashboard because we have packaged a lot of information into a very small space.
The visualization shows the following:
- Top three performers
- The third performer is actually below average
- The overall percentage of sales as a number in order to provide numerical detail as well as some ways of visualizing the data so it provides "at a glance" information too
To summarize, making data tell a story is a challenge, particularly when there is not much space to play with. However, making data tell a story is fun with Tableau.
There's more…
Often, the column names in the data source are not meaningful to business users. They may need to be translated into something that is useful. Fortunately, Tableau allows you to use aliases to translate dimension names into something else.