There are many kinds of data out there. How can we make sense of them? A useful exercise is devising a categorization system for datasets. By identifying general categories and patterns of data, we can systematically apply appropriate visualization techniques to datasets.
For instance, we can categorize the main types of data as follows:
Let's start with probably the simplest type of data: a bunch of numbers. This type is very common. For instance, if you measure a quantity across a population, such as height, weight, or income, you get a bunch of numbers.
What is the most direct way to visualize this type of data? Probably a scatter plot, with the values laid out along a single axis, which shows every data point directly.
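What's the problem with this method? When there are many points, they overlap, so the scatter plot can hide a lot of information, such as how many points share nearly the same value. There are several ways to alleviate this problem: (1) use the vertical axis to add random noise (jitter) to each point, spreading the points out vertically; (2) use empty (hollow) symbols, so overlapping markers stay distinguishable; or (3) use transparency, so dense regions appear darker.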
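The jittering idea in (1) can be sketched as below. This is a minimal illustration, not a reference implementation: the function name `jitter`, its `width` and `seed` parameters, and the sample heights are all hypothetical. Each value keeps its horizontal position, and only the vertical coordinate receives uniform random noise.

```python
import random


def jitter(values, width=0.1, seed=None):
    """Return (x, y) pairs for a one-axis scatter plot with vertical jitter.

    Hypothetical helper: x is the original value; y is uniform noise in
    [-width, width], so points with equal or close values spread apart.
    """
    rng = random.Random(seed)  # seeded generator for reproducible jitter
    return [(v, rng.uniform(-width, width)) for v in values]


heights = [170, 170, 171, 171, 171, 172, 180]  # hypothetical heights in cm
points = jitter(heights, width=0.1, seed=0)
for x, y in points:
    print(f"x={x}, y={y:+.3f}")
```

The jittered pairs could then be drawn with, for instance, matplotlib's `scatter`, where the `alpha` argument sets transparency (option 3) and `facecolors='none'` gives hollow markers (option 2).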
However, none of these methods fixes the underlying problem: a plot can show only a limited number of points.
The most common way to solve this problem is to segment and summarize the data, which is what histograms and box plots do.
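As a rough sketch of "segmenting and summarizing", the code below bins values into equal-width intervals (the counts a histogram draws) and computes the five-number summary (minimum, lower quartile, median, upper quartile, maximum) that a basic box plot is built from. The helper names and sample data are hypothetical, and tools use different quartile conventions, so other software may report slightly different quartiles; here they come from Python's `statistics.quantiles` with its default method.

```python
import statistics


def histogram(values, bins=5):
    """Count values into `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land at index `bins`; clamp it into the last bin.
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, counts


def five_number_summary(values):
    """Min, Q1, median, Q3, max: the numbers a basic box plot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # needs at least 2 values
    return min(values), q1, median, q3, max(values)


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # hypothetical sample
edges, counts = histogram(data, bins=4)
print(edges, counts)                # [1.0, 3.0, 5.0, 7.0, 9.0] [3, 5, 1, 1]
print(five_number_summary(data))    # (1, 2.0, 3.0, 4.25, 9)
```

Either way, ten raw points shrink to a handful of numbers that stay readable no matter how large the dataset grows.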