People are the only active element of data visualization, and as such, they are the most important. We briefly describe the roles of several people who participate in our project, but we mainly focus on the person who is going to analyze and visualize the data.
After the meeting, we get together with our colleague, Samantha, the analyst who supports the sales and executive teams. She currently manages a series of highly personalized Excel workbooks that she creates from standard reports generated within the customer invoice and project management system. Her audience ranges from the CEO down to sales managers. She is not a pushover, but she is open to trying new techniques, especially given that the sponsor of this project is the CEO of QDataViz, Inc.
As a data discovery user, Samantha possesses the following traits:
She has a stake in the project’s success or failure. She, along with the company, stands to grow as a result of this project, and most importantly, she is aware of this opportunity.
She is focused on grasping what we teach her and is self-motivated to continue learning after the project is finished. The cause of her drive is unimportant as long as she remains honest.
She understands that data is a passive element that is open to diverse interpretations by different people. She resists basing her arguments on deceptive visualization techniques or data omission.
She does not endanger her job and company results by following every technological fad or whimsical idea. However, she realizes that technology does change and that a new approach can foment breakthroughs.
She loves finding anomalies in the data and being the reason that action is taken to improve QDataViz, Inc. As a means to achieve what she loves, she understands how to apply functions and methods to manipulate data.
She is familiar with the company’s data, and she understands the indicators needed to analyze its performance. Additionally, she serves as a data source and gives context to analysis.
She respects the roles of her colleagues and holds them accountable. In turn, she demands respect and is also obliged to meet her responsibilities.
Our next meeting involves Samantha and Ivan, our Information Technology (IT) Director. While Ivan explains the data available in the customer invoice and project management system’s well-defined databases, Samantha adds that she has vital data in Microsoft Excel that is missing from those databases. One Excel file contains the sales budget and another contains an additional customer grouping; both files are necessary to present information to the CEO.
We take advantage of this discussion to highlight the following characteristics that make data easy to analyze.
Ivan is going to document the origin of the tables and fields, which increases Samantha’s confidence in the data. He is also going to perform a basic data cleansing and eliminate duplicate records whose only difference is a period, two transposed letters, or an abbreviation.
Once the system is operational, Ivan will consider the impact any change in the customer invoice and project management system may have on the data. He will also verify that the data is continually updated while Samantha helps confirm the data’s validity.
Ivan will preserve as much detail as possible. If he is unable to handle large volumes of data as a whole, he will segment the detailed data by month and reduce the detail of a year’s data in a consistent fashion. Conversely, he will consider adding detail by prorating payments between the products of paid invoices in order to maintain a consistent level of detail between invoices and payments.
An Excel file as a data source is a short-term solution. While Ivan respects its temporary use to allow for a quick, first release of the data visualization project, he takes responsibility for finding a more stable medium- to long-term solution. In the span of a few months, he will consider modifying the invoice system, investing in additional software, or creating a simple portal to upload Excel files to a database.
Ivan will not prevent progress solely for bureaucratic reasons. Samantha respects that Ivan’s goal is to make data more standardized, secure, and recoverable. However, Ivan knows that if he does not move as quickly as business does, he will become irrelevant as Samantha and others create their own black market of company data.
Ivan is going to make available manifold perspectives of QDataViz, Inc. He will maintain history, budgets, and forecasts by customers, salespersons, divisions, states, and projects. Additionally, he will support segmenting these dimensions into multiple groups, subgroups, classes, and types.
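Two of the commitments above, removing near-duplicate records and prorating payments across the products of an invoice, are concrete enough to sketch in code. The following Python is only an illustration under stated assumptions, not the project's actual cleansing or load logic (which would normally live in the data discovery tool's load script). The abbreviation map, the function names, and the choice to give the rounding remainder to the last invoice line are all hypothetical.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical abbreviation map; a real one would come from reviewing the data.
ABBREVIATIONS = {"inc": "incorporated", "corp": "corporation", "co": "company"}


def normalize(name):
    """Lowercase, drop punctuation such as periods, and expand known abbreviations."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def is_transposition(a, b):
    """Return True when b equals a with exactly one pair of adjacent characters swapped."""
    if len(a) != len(b):
        return False
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diff) == 2
        and diff[1] == diff[0] + 1
        and a[diff[0]] == b[diff[1]]
        and a[diff[1]] == b[diff[0]]
    )


def same_customer(a, b):
    """Treat two names as duplicates if they match after normalization
    or differ only by one adjacent transposition."""
    x, y = normalize(a), normalize(b)
    return x == y or is_transposition(x, y)


def prorate_payment(payment, invoice_lines):
    """Split a payment across invoice lines in proportion to each line's amount.

    invoice_lines is a list of (product, amount) pairs with Decimal amounts;
    products are assumed to be unique within the invoice. Shares are rounded
    to cents, and the last line absorbs the rounding remainder so that the
    shares always add up to the payment.
    """
    total = sum(amount for _, amount in invoice_lines)
    if not invoice_lines or total == 0:
        raise ValueError("invoice must have lines with a non-zero total")
    shares = {}
    allocated = Decimal("0")
    for product, amount in invoice_lines[:-1]:
        share = (payment * amount / total).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP
        )
        shares[product] = share
        allocated += share
    shares[invoice_lines[-1][0]] = payment - allocated
    return shares
```

With this sketch, `same_customer("Acme Inc.", "Acme Incorporated")` is treated as a match, and `prorate_payment` returns one payment share per product, so payments can be analyzed at the same level of detail as invoice lines. Candidate duplicates found this way would still deserve a review by Samantha before any records are merged.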
We continue our meeting with Ivan and Samantha, but we now change our focus to what tool we will use to foster great data visualization and analysis. We create the following list of basic features we expect from this tool:
We should be able to learn the tool quickly and be able to deliver a first version of our data visualization project within a matter of weeks. In this fashion, we start receiving a return on our investment within a short period of time.
Samantha should be able to continue her analysis with little help from us. Also, her audience should be able to easily perform their own lightweight analysis and follow up on the decisions made.
Ivan should be able to maintain hundreds or thousands of users and data volumes that exceed 100 million rows. He should also be able to restrict access to certain data to certain users. Finally, he needs to have the confidence that the tools will remain available even if a server fails.
Based on these expectations, we talk about data discovery tools, which are increasingly becoming part of the architecture of many organizations. Samantha can use these tools for self-service data analysis. In other words, she can create her own data visualizations without having to depend on pre-built graphs or reports. At the same time, Ivan can be reassured that the tool does not interfere with his goal of providing an enterprise solution that offers scalability, security, and high availability.
The data discovery tool we are going to use is QlikView, and the following diagram shows the overall architecture we will build and where this article focuses its attention:
In this article, we learned about people, data, and tools, which are essential parts of creating great data visualization and analysis.