# Data (10/8)

Slides

## Reading materials for this class

Required:

Recommended:

## Types of Data

There are many kinds of data out there. How can we make sense of
them? A useful exercise is devising a categorization system for
datasets. By identifying general categories and patterns of data, we
can systematically apply appropriate visualization techniques to
datasets.

For instance, we can categorize the main types of data as
follows:

- Ordered
  - Ordinals
  - Quantities
  - Ratios
  - ...
- Categorical (nominal)

Or we can categorize data along other dimensions, such as:

- Temporal
- Spatial
- Topical
- ...

We can also think about the relationships in the dataset:

- Tabular (relational)
- Networks
- ...

Often, one can choose appropriate, effective visualization techniques
based on the types of data.
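
As a rough illustration of why these categories matter, the sketch below applies a different operation to each kind of value: counting for categorical data, sorting for ordinal data, and averaging for quantitative data. The records, the field meanings, and the `SIZE_ORDER` mapping are made up for this example.

```python
from collections import Counter
from statistics import mean

# Hypothetical records: (favorite fruit, shirt size, height in cm)
records = [
    ("apple", "M", 172.0),
    ("pear", "S", 160.5),
    ("apple", "L", 181.0),
    ("kiwi", "M", 168.5),
]

# Categorical (nominal): only equality is meaningful, so we count.
fruit_counts = Counter(fruit for fruit, _, _ in records)

# Ordinal: values have an order, but distances between them are undefined.
# The order must be stated explicitly; alphabetical order would be wrong.
SIZE_ORDER = {"S": 0, "M": 1, "L": 2}
sizes_sorted = sorted((size for _, size, _ in records), key=SIZE_ORDER.get)

# Quantitative: arithmetic such as averaging is meaningful.
mean_height = mean(height for _, _, height in records)

print(fruit_counts.most_common(1))
print(sizes_sorted)
print(mean_height)
```

The same distinction carries over to visual encodings: a position or length suits a quantity, while a set of distinct colors suits categories.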

## One-dimensional quantitative data

Let's start with what is probably the simplest type of data: a bunch
of numbers. This type is very common. For instance, if we measure a
quantity for a given population, such as height, weight, or income, we
get a bunch of numbers.

What is the most direct way to visualize this type of data?
Probably a scatter plot along a single axis, which shows every data
point directly.

What's the problem with this method? A scatter plot can hide a lot of
information when there are many points, because the points overlap.
There are several ways to alleviate this problem: (1) using the
vertical axis to add random noise (jitter) to each point, spreading
the points out vertically; (2) using empty (hollow) symbols; or
(3) using transparency.
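
A minimal sketch of these three remedies, assuming Python with matplotlib available for the drawing step. The helper names `jitter` and `plot_strip`, the `spread` and `seed` parameters, and the sample of heights are all hypothetical and chosen only for illustration.

```python
import random


def jitter(values, spread=0.2, seed=0):
    """Pair each value with a random vertical offset in [-spread, spread]."""
    rng = random.Random(seed)
    return [(x, rng.uniform(-spread, spread)) for x in values]


def plot_strip(values, spread=0.2, seed=0):
    """Draw the jittered points with hollow, semi-transparent markers."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    points = jitter(values, spread, seed)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    fig, ax = plt.subplots(figsize=(6, 1.5))
    # facecolors="none" gives empty symbols; alpha adds transparency.
    ax.scatter(xs, ys, facecolors="none", edgecolors="tab:blue", alpha=0.5)
    ax.set_ylim(-1, 1)
    ax.set_yticks([])  # the vertical position carries no information
    return fig


# Hypothetical sample: 500 heights in cm drawn from a normal distribution.
sample_rng = random.Random(42)
heights = [sample_rng.gauss(170, 10) for _ in range(500)]
points = jitter(heights)
```

Calling `plot_strip(heights)` returns a figure that combines all three remedies. The jitter only changes the vertical position, so the horizontal position of each point still shows the measured value.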

However, none of these methods is an ultimate fix for the
underlying problem: *we can show only a limited number of
points.*

The most common way to solve this problem is to segment and
summarize the data, as *histograms* and *box
plots* do.
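
To make "segmenting and summarizing" concrete, here is a small sketch using only the Python standard library. It computes the bin counts behind a histogram and the five-number summary that a basic box plot is built from. The function names, the equal-width binning, and the choice of the `inclusive` quantile method are assumptions of this example; plotting libraries may use different defaults.

```python
import random
from statistics import quantiles


def histogram(values, bins=10):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land in bin `bins`; clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical sample: 500 heights in cm drawn from a normal distribution.
rng = random.Random(42)
heights = [rng.gauss(170, 10) for _ in range(500)]

print(histogram(heights, bins=8))
print(five_number_summary(heights))
```

Both functions reduce 500 points to a handful of numbers, so the size of the picture no longer grows with the size of the dataset. The price is that individual points are no longer visible.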

## References