Importance of data visualization
Many people would say that knowledge sharing is one of the noblest things any human being can do. Aside from helping other people grow and be better, by sharing their knowledge people become happier, develop new professional or private connections and bring more purpose to their life. This is why the author of this blog post has always admired teachers and professors who are the most obvious examples of knowledge sharing. In a way, being a data scientist is all about knowledge sharing as well. Put in simple terms, data scientists extract the knowledge from raw data and share it with the world.
Knowledge sharing is certainly a complex concept, and it is anything but a universal and straightforward process. There are many different theories as to how children, and people in general, learn most easily and quickly. While the process undoubtedly varies from individual to individual, there is an idea that the use of visualizations in the learning process helps students to learn faster and better. Check out the following articles for a word or two about this: “The Use of Visualization in Teaching and Learning Process for Developing Critical Thinking of Students” and “Enhancing Learning with Visualization Techniques”. Speaking from our own experience, visualizations definitely bring more fun and diversity to the learning process. We would also say that they leave stronger traces in our brains than the usual learning methods do, which is why the resulting conclusions are often easier to recall afterward.
Humans are highly visual creatures: sight is the sense they rely on the most, and it brings them the largest share of information about the world around them. Because of this, it makes sense that interesting and concise visualizations could speed up the learning process. It’s also very important to keep visualizations as simple as possible. The idea is to let them tell the story on their own, not to confuse observers with unnecessary and redundant information. This is summed up well in a quote by Ben Shneiderman: “The purpose of visualization is insight, not pictures”.
Now that we’ve stated how important visualization is, it’s time to introduce some context to this blog post. This is the newest post in the series about our Retail Business Intelligence Platform project. After finishing up the streaming part of this project, about which you can read in our posts about Kafka and Spark, the next step was to generate appealing visuals in order to present our findings. We decided to go with two widely used visualization tools: Tableau and PowerBI. But before we show you some visualizations, a few words need to be said about data visualization in general.
About data visualization
Data visualization can be more or less effective, depending on the results being presented and the types of visuals used. To get the most out of it, one should know which visual to use in which situation. It is also important to note that one of the main perks of visualization is the opportunity to easily send a desired message to someone without the use of words. As we all know, words don’t always come easily and can be tiresome compared to nice colorful visuals.
However, sending a message with the help of visualization tools is anything but easy. In order to do it almost flawlessly, one would need to perfectly understand the data, the results, the human mind, and the laws of perception. Many factors should be taken into consideration when constructing visuals and, naturally, the time and thought that have to be invested in the process grow with the complexity of the visuals and the intricacy of the message one is trying to send.
If you’re more inclined to look at visuals than to create them, you need to be aware of how easily you can be manipulated by someone who shows you only the visuals that support his or her agenda. This shouldn’t surprise anyone who knows that statistics and data science are closely related, considering that statistics itself is very prone to manipulation. The famous American novelist Mark Twain popularized a quote about the manipulative nature of statistics and attributed it to the Victorian-era British prime minister Benjamin Disraeli. Disraeli allegedly said that there are three kinds of lies: “lies, damned lies, and statistics.” This is a particularly sensitive subject, as manipulation and lying don’t always have to be deliberate. Statisticians and data scientists can be subconsciously biased and can affect the final results and visuals unintentionally.
On the other hand, for those who create visuals, one of the key pieces of advice is to keep it simple. People can get lost in trying to say too much with a single visual, which can result in the viewer understanding nothing or getting a message that wasn’t intended. Of course, it’s not easy to make things simple, and it takes a lot of experience and understanding. But it’s definitely worth the effort, because sticking to simplicity keeps you from going astray by overcomplicating. Interactivity is very welcome as well. It changes the role of the viewer from a passive observer to an active participant. Essentially, it turns the process into an experience for viewers, which makes it much more memorable and easier to understand.
Just a quick reminder, our project included the unit sales data of more than three thousand products in ten different Walmart stores throughout three different states. We’ve tried many different visuals and charts and here we offer you a glimpse of the few most interesting ones.
Figure 1 shows the 5 items with the highest average price for each department.
Figure 1: Top Five Most Expensive Items for Each Department
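For readers who would like to reproduce a ranking like the one in Figure 1 outside of Tableau, here is a minimal Python sketch. It is not how the figure itself was built. The function name, the row layout, and the sample rows are our own assumptions, and the prices are made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def top_items_by_avg_price(rows, n=5):
    """Return {department: [(item, avg_price), ...]} with the n priciest items."""
    # Collect every observed price per (department, item) pair.
    prices = defaultdict(list)
    for department, item, price in rows:
        prices[(department, item)].append(price)

    # Average the prices and group the items by department.
    by_department = defaultdict(list)
    for (department, item), values in prices.items():
        by_department[department].append((item, mean(values)))

    # Sort each department by average price, highest first, and keep n items.
    return {
        department: sorted(items, key=lambda pair: pair[1], reverse=True)[:n]
        for department, items in by_department.items()
    }


# Made-up sample rows (department, item, price), not taken from the real dataset.
sample_rows = [
    ("HOBBIES_1", "HOBBIES_1_001", 8.0),
    ("HOBBIES_1", "HOBBIES_1_001", 10.0),
    ("HOBBIES_1", "HOBBIES_1_002", 4.0),
    ("FOODS_1", "FOODS_1_001", 2.0),
    ("FOODS_1", "FOODS_1_002", 3.0),
]

top = top_items_by_avg_price(sample_rows, n=1)
```

With `n=5`, the same function would return the five most expensive items per department, which is the grouping the chart displays.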
Figure 2 gives you two pieces of information at once. It shows the number of items sold and the total revenue for a certain item. Here we’ve picked item HOBBIES_1_345.
Figure 2: Total Items Sold and Total Revenue for item HOBBIES_1_345
The next visual is Figure 3, which gives an insight into the minimum, average, and maximum price of the item FOODS_1_033 across the days of the week. Notice that a cell’s color depends on the average price for that day. Specifically, the days with lower average prices have a lighter shade, while the days with higher average prices have a darker shade.
Figure 3: Min, Avg, and Max Price by Item
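The idea behind Figure 3 can be sketched in a few lines of Python: summarize the price per weekday, then turn each average into a shade intensity. Tableau handles the coloring on its own, so the linear 0-to-1 scaling below is only our assumption about how such conditional formatting could work. The function names and the sample prices are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def price_summary_by_weekday(observations):
    """Return {weekday: {"min": ..., "avg": ..., "max": ...}}."""
    grouped = defaultdict(list)
    for weekday, price in observations:
        grouped[weekday].append(price)
    return {
        weekday: {"min": min(prices), "avg": mean(prices), "max": max(prices)}
        for weekday, prices in grouped.items()
    }


def shade_intensity(summary):
    """Scale each weekday's average price to 0.0 (lightest) .. 1.0 (darkest)."""
    averages = [stats["avg"] for stats in summary.values()]
    low, high = min(averages), max(averages)
    span = high - low
    # If every day has the same average, use the lightest shade everywhere.
    return {
        weekday: 0.0 if span == 0 else (stats["avg"] - low) / span
        for weekday, stats in summary.items()
    }


# Made-up (weekday, price) observations for a single item.
sample_prices = [
    ("Monday", 2.0),
    ("Monday", 3.0),
    ("Tuesday", 3.0),
    ("Tuesday", 5.0),
    ("Wednesday", 3.5),
]

summary = price_summary_by_weekday(sample_prices)
shades = shade_intensity(summary)
```

Here the day with the lowest average price gets intensity 0.0 and the day with the highest gets 1.0, matching the lighter-to-darker reading of the figure.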
Figure 4 shows quarterly revenue by the department and it is also conditionally formatted. In other words, cells with higher revenue will be colored with a darker shade.
Figure 4: Quarterly Revenue by Department
Figure 5 is a map showing how much revenue each of the states in the dataset generated. Conditional formatting is applied here as well.
Figure 5: Revenue by States
Finally, Figure 6 shows a stacked bar chart where you can see the number of items sold yearly by store.
Figure 6: Number of Items Sold Yearly by Store
To conclude, these are some of the most widely used and most popular Tableau visuals. Of course, there is the possibility of using customized and modified visuals. Keep in mind these are just screenshots of individual sheets that can be combined into larger dashboards where charts can interact with each other. If you want to see how awesome some of these visuals end up being, check out the official Viz of the Day Tableau site where every day a new cool visual is shared with the community.
Just like with Tableau, we’ve tried a lot of visuals in PowerBI, a data visualization tool from Microsoft. There are some differences between Tableau and PowerBI, but overall the two tools are very similar. In most cases, you wouldn’t make a mistake going with either one of them.
The first visual we’ve generated with the help of PowerBI, a stacked bar chart that shows revenue by quarter and year, is shown in Figure 7.
Figure 7: Revenue by Quarter and Year
The next one is an interesting visual that shows the average price for each category available in the dataset. Not only does it show the average price per category, but it also shows the ratio of these categories with respect to each other. It can be seen in Figure 8.
Figure 8: Average Price by Category
Figure 9 shows the usual line chart representing the number of items sold by month and year.
Figure 9: Number of Items Sold by Month and Year
Sales growth is an interesting key performance indicator (KPI) that shows how revenue has changed over a fixed period of time. Figure 10 shows yearly sales growth for Valentine’s Day, a holiday that occurs every year on the 14th of February.
Figure 10: Yearly Sales Growth for Valentine’s Day
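To make the sales growth KPI more concrete, here is a small Python sketch. We assume one common definition, the percentage change in revenue compared with the previous period, which may differ from the exact measure behind the PowerBI chart. The function name and the revenue figures are made up for illustration.

```python
def yearly_sales_growth(revenue_by_year):
    """Return {year: growth in percent versus the previous year in the data}."""
    years = sorted(revenue_by_year)
    growth = {}
    for previous, current in zip(years, years[1:]):
        base = revenue_by_year[previous]
        if base == 0:
            # Growth is undefined when the previous year had no revenue.
            growth[current] = None
            continue
        growth[current] = (revenue_by_year[current] - base) * 100 / base
    return growth


# Made-up revenue for the same holiday in three consecutive years.
sample_revenue = {2012: 1000.0, 2013: 1100.0, 2014: 990.0}
growth = yearly_sales_growth(sample_revenue)
```

The first year has no entry in the result because there is no earlier year to compare it with.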
Certain days can cause a spike in sales for certain items. Figure 11 shows the 5 items with the highest revenue on the day of the Super Bowl, the final playoff game of the NFL.
Figure 11: Items with the Highest Revenue for Super Bowl
Figure 12 shows the top 5 best-selling stores for item FOODS_1_308 in terms of the number of items sold. You can also see a dashed vertical line that marks the average number of items sold per store for the chosen item.
Figure 12: Top 5 Best Selling Stores for item FOODS_1_308
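The two ingredients of Figure 12, the ranking and the reference line, can be sketched in Python as follows. We assume here that the average is taken over all stores and not only over the top five, which the chart itself may or may not do. The function name, the store identifiers, and the unit counts are hypothetical.

```python
from statistics import mean


def top_stores_with_average(units_by_store, n=5):
    """Return (top n stores by units sold, average units sold over all stores)."""
    ranked = sorted(units_by_store.items(), key=lambda pair: pair[1], reverse=True)
    average = mean(list(units_by_store.values()))
    return ranked[:n], average


# Made-up units sold per store for a single item.
sample_units = {
    "CA_1": 120,
    "CA_2": 80,
    "TX_1": 100,
    "TX_2": 90,
    "WI_1": 60,
    "WI_2": 30,
}

top_stores, average_units = top_stores_with_average(sample_units)
```

In a chart, `top_stores` would supply the bars and `average_units` the position of the dashed line.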
Finally, the last chart, which can be seen in Figure 13, shows sales growth, a KPI we’ve already mentioned. This time, however, it doesn’t show sales growth for a certain date or holiday. Instead, it shows sales growth across the entire period for a single item, FOODS_1_308.
Figure 13: Sales Growth for item FOODS_1_308
Just like with Tableau, keep in mind these are only individual visuals that can be combined to create dashboards. These dashboards can be interactive and, again just like with Tableau, individual visuals can interact with each other.
Step by step, visual by visual, we’ve come to the end of this brief visualization overview. We wanted to show you these 13 charts, 6 made in Tableau and 7 made in PowerBI. Hopefully, you’ve got an idea of how to use these and similar charts, but also an inspiration to create and generate wonderful and interesting new charts.
In this blog post we’ve talked about the importance of knowledge sharing and the part that visualization plays in it. Furthermore, we’ve said a few words about data visualization: what should be kept in mind when creating and looking at visuals. In the rest of the blog post, we’ve shown you 13 unique charts, each of them sending a different message.