
Data Viz Learning Group

  1. Create a visualization using the Cereal Data file.
  2. Create a visualization using some of the other data.

Steps in creating a visualization:

1. Examine the Data

  • Do you have all the data you need? Does it include all the variables that you are interested in?
  • Are there any obvious errors in your data? Is there any data that is missing?
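A quick way to answer "is any data missing?" is to count the empty or placeholder cells in each column. The sketch below is a minimal example that uses Python's standard `csv` module. The sample rows are invented, and it assumes that missing values appear either as blank cells or as a sentinel such as `-1`. Check how your own file marks gaps before you rely on this.

```python
import csv
import io

# Invented sample rows for illustration only; these are NOT real cereal values.
SAMPLE_CSV = """name,mfr,calories,sugars,potass
Cereal A,K,110,9,35
Cereal B,G,,6,-1
Cereal C,Q,100,-1,
"""

# Assumption: blank cells and the sentinel -1 both mean "no data".
MISSING_MARKERS = {"", "-1"}


def count_missing(csv_text):
    """Return {column: number of cells that look missing}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for column, value in row.items():
            # DictReader fills absent trailing fields with None, so guard for it.
            if (value or "").strip() in MISSING_MARKERS:
                counts[column] += 1
    return counts


print(count_missing(SAMPLE_CSV))
```

A column with many missing cells may need cleaning in step 3. It may also be too incomplete to build a story on.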

2. Understand the Data Types

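  • What type of data have you acquired? For example, is each variable numerical (counts, measurements), categorical (names, codes), a date, or free text?
  • What is the range of values for each variable?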
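The two questions above can be answered together by trying to read every value in a column as a number. If every value converts, the sketch reports the column's minimum and maximum. Otherwise it treats the column as categorical and lists the distinct values. This is a simple heuristic, and the sample rows are invented.

```python
import csv
import io

# Invented sample rows for illustration only.
SAMPLE_CSV = """name,mfr,type,calories,shelf
Cereal A,K,C,110,3
Cereal B,G,C,70,1
Cereal C,Q,H,100,2
"""


def describe_columns(csv_text):
    """Classify each column as numeric or categorical and report its range or categories."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            # At least one value is not a number, so treat the column as categorical.
            summary[column] = ("categorical", sorted(set(values)))
        else:
            summary[column] = ("numeric", (min(numbers), max(numbers)))
    return summary


for column, info in describe_columns(SAMPLE_CSV).items():
    print(column, info)
```

A heuristic like this cannot tell a coded category from a true measurement. The display shelf is stored as 1, 2, or 3, but it is a category, so averaging it would be meaningless.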

3. Transform for Quality

  • Do you need to clean up your data? Do you need to fix any errors or fill in any gaps in your data?
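As one illustration of cleaning, the sketch below first turns an assumed missing-value sentinel (`-1`) into an explicit gap. It then fills each gap with the median of the known values. Median filling is only one possible choice. Dropping the row or leaving the gap visible in the chart are often more honest. The sugar values are invented.

```python
import statistics

# Invented sugar values (grams); None and -1 both stand for missing entries.
raw_sugars = [6, None, 12, -1, 3, 9]


def clean(values, sentinel=-1):
    """Replace the assumed missing-value sentinel with None."""
    return [None if v == sentinel else v for v in values]


def fill_with_median(values):
    """Fill None gaps with the median of the known values (one simple imputation choice)."""
    known = [v for v in values if v is not None]
    median = statistics.median(known)
    return [median if v is None else v for v in values]


cleaned = clean(raw_sugars)
print(cleaned)                    # [6, None, 12, None, 3, 9]
print(fill_with_median(cleaned))  # [6, 7.5, 12, 7.5, 3, 9]
```

Whatever you choose, keep a note of which values were filled in, so that the visualization does not present guesses as measurements.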

Transform for Analysis

  • Parsing (splitting up) variables, such as extracting the year from a date value
  • Merging variables to form new ones, such as creating a full name out of title, forename, and surname
  • Converting qualitative data/free-text into coded values or keywords
  • Deriving new values out of others, such as gender from title or a sentiment out of some qualitative data
  • Creating calculations for use in analysis, such as percentage proportions
  • Removing redundant data for which you have no planned use (be careful though!) 
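Several of the transformations in the list above amount to one short function each. The sketch below covers parsing, merging, calculating percentages, and coding free text into keywords. The date format (ISO `YYYY-MM-DD`), the keyword list, and the counts are assumptions made for illustration.

```python
from datetime import date


def year_of(iso_date):
    """Parsing: extract the year from an ISO-formatted date string such as '2019-03-14'."""
    return date.fromisoformat(iso_date).year


def full_name(title, forename, surname):
    """Merging: combine separate name fields into one, skipping empty parts."""
    return " ".join(part for part in (title, forename, surname) if part)


def percentages(counts):
    """Calculating: turn raw counts into percentage proportions of the total."""
    total = sum(counts.values())
    return {key: 100 * value / total for key, value in counts.items()}


def code_text(comment, keywords):
    """Coding: return the (hypothetical) keywords that appear in a free-text comment."""
    text = comment.lower()
    return sorted(k for k in keywords if k in text)


print(year_of("2019-03-14"))                                          # 2019
print(full_name("Dr", "Ada", "Lovelace"))                             # Dr Ada Lovelace
print(percentages({"cold": 30, "hot": 10}))                           # {'cold': 75.0, 'hot': 25.0}
print(code_text("Too sweet but crunchy", ["sweet", "crunchy", "stale"]))  # ['crunchy', 'sweet']
```

Simple substring matching like `code_text` will miss synonyms and misspellings. Check coded values against a sample of the original text before you trust them.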

4. Find a Story

Below is a list of different types of data stories (and examples of each) that you can convey using your data:

"1. Measurement  (The simplest story — counting or totaling something)
‘Local councils across the country spent a total of $x billion on paper clips last year’

2. Proportion
‘Last year local councils spent two-thirds of their stationery budget on paper clips’

3. Internal comparison
‘Local councils spend more on paper clips than on providing meals-on-wheels for the elderly’

4. External comparison
‘Council spending on paper clips last year was twice the nation’s overseas aid budget’
Or there are other ways of exploring the data in a contextual or comparative way:

5. Change over time
‘Council spending on paper clips has trebled in the past four years’

6. ‘League tables’
These are often geographical or by institution, and you must make sure the basis for comparison is fair, e.g. taking into account the size of the local population.

‘Example Council spends more on paper clips for each member of staff than any other local authority, at a rate four times the national average’

Or you can divide the data subjects into groups:

7. Analysis by categories
‘Councils run by the Purple Party spend 50% more on paper clips than those controlled by the Yellow Party’

Or you can relate factors numerically:

8. Association
‘Councils run by politicians who have received donations from stationery companies spend more on paper clips, with spending increasing on average by $100 for each dollar donated’


But, of course, always remember that correlation and causation are not the same thing.
So if you’re investigating paper clip spending, are you also getting the following figures:
 

  •     Total spending to provide context?
  •     Geographical/historical/other breakdowns to provide comparative data?
  •     The additional data you need to ensure comparisons are fair, such as population size?
  •     Other data which might provide interesting analysis to compare or relate the spending to?"


This information was copied from http://datajournalismhandbook.org/1.0/en/understanding_data_5.html and is shared under a Creative Commons Attribution-ShareAlike license.
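Most of the story types in the quoted list come down to a small calculation. The sketch below applies them to four councils whose figures are entirely invented. It follows the handbook's advice on fairness by ranking councils on spend per member of staff rather than on raw spend. The association is a least-squares slope. As the handbook warns, a slope shows correlation, not causation.

```python
from collections import defaultdict

# Invented figures for illustration only; they are NOT real council data.
councils = [
    {"name": "North", "party": "Purple", "clips": 200.0, "stationery": 400.0, "staff": 50, "donations": 10.0},
    {"name": "South", "party": "Yellow", "clips": 100.0, "stationery": 300.0, "staff": 100, "donations": 0.0},
    {"name": "East", "party": "Purple", "clips": 300.0, "stationery": 450.0, "staff": 60, "donations": 20.0},
    {"name": "West", "party": "Yellow", "clips": 160.0, "stationery": 500.0, "staff": 75, "donations": 5.0},
]


def total(rows, field):
    """Measurement: add up one field across all rows."""
    return sum(r[field] for r in rows)


def proportion(rows, part, whole):
    """Proportion: the part's share of the whole, as a fraction between 0 and 1."""
    return total(rows, part) / total(rows, whole)


def percent_change(earlier, later):
    """Change over time: percentage change between two totals."""
    return 100 * (later - earlier) / earlier


def league_table(rows):
    """League table: rank by spend per member of staff so council size is accounted for."""
    return sorted(rows, key=lambda r: r["clips"] / r["staff"], reverse=True)


def mean_by_category(rows, category, field):
    """Analysis by categories: average of `field` within each group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[category]].append(r[field])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def slope(rows, x_field, y_field):
    """Association: least-squares slope of y on x (says nothing about causation)."""
    n = len(rows)
    mean_x = total(rows, x_field) / n
    mean_y = total(rows, y_field) / n
    sxy = sum((r[x_field] - mean_x) * (r[y_field] - mean_y) for r in rows)
    sxx = sum((r[x_field] - mean_x) ** 2 for r in rows)
    return sxy / sxx


print(total(councils, "clips"))                         # 760.0
print([r["name"] for r in league_table(councils)])      # ['East', 'North', 'West', 'South']
print(mean_by_category(councils, "party", "clips"))     # {'Purple': 250.0, 'Yellow': 130.0}
```

With the invented numbers, a trebling such as `percent_change(100, 300)` is a 200% increase. It is not a 300% increase, and this is a common slip in written stories.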

Breakfast Cereal

This dataset contains nutritional information for 77 different breakfast cereals. It was used for the 1993 Statistical Graphics Exposition as a challenge data set. We retrieved this data from StatLib at Carnegie Mellon University (CMU). The values come from the cereals' nutrition labels, and the file is in CSV format.

The variables are:

  • cereal name;
  • manufacturer (e.g., Kellogg’s);
  • type (cold/hot);
  • calories (number);
  • protein (g);
  • fat (g);
  • sodium (mg);
  • dietary fiber (g);
  • complex carbohydrates (g);
  • sugars (g);
  • display shelf (1, 2, or 3, counting from the floor);
  • potassium (mg);
  • vitamins and minerals (0, 25, or 100, indicating the typical percentage of the FDA recommended daily value);
  • weight (in ounces) of one serving (serving size);
  • cups per serving.
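Several variables in the list above may only take a few documented values. That makes it easy to check for obvious errors (step 1). The sketch below flags any value outside the allowed set. It assumes the CSV stores the cold/hot type as the letters `C` and `H`, so confirm this against the actual file. The two sample rows are invented, and the second contains deliberate errors.

```python
# Allowed codes from the variable list; the "C"/"H" coding for type is an assumption.
ALLOWED = {
    "type": {"C", "H"},
    "shelf": {"1", "2", "3"},
    "vitamins": {"0", "25", "100"},
}


def find_bad_codes(rows):
    """Return (row index, column, value) for every value outside its allowed set."""
    problems = []
    for i, row in enumerate(rows):
        for column, allowed in ALLOWED.items():
            value = (row.get(column) or "").strip()
            if value not in allowed:
                problems.append((i, column, value))
    return problems


# Invented rows; the second one contains two deliberate errors.
rows = [
    {"name": "Cereal A", "type": "C", "shelf": "3", "vitamins": "25"},
    {"name": "Cereal B", "type": "X", "shelf": "4", "vitamins": "25"},
]
print(find_bad_codes(rows))  # [(1, 'type', 'X'), (1, 'shelf', '4')]
```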

Manufacturers are represented by their first initial: A=American Home Food Products, G=General Mills, K=Kellogg’s, N=Nabisco, P=Post, Q=Quaker Oats, R=Ralston Purina.
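As a starting point for the cereal exercise, the sketch below expands the manufacturer initials into full names and averages sugars per manufacturer. It then prints a plain-text bar chart. A real visualization would normally use a plotting library, and the text bars only keep the sketch free of dependencies. The (manufacturer, sugars) pairs are invented rather than taken from the dataset.

```python
from collections import defaultdict

# Manufacturer codes as listed in the dataset description.
MANUFACTURERS = {
    "A": "American Home Food Products",
    "G": "General Mills",
    "K": "Kellogg's",
    "N": "Nabisco",
    "P": "Post",
    "Q": "Quaker Oats",
    "R": "Ralston Purina",
}

# Invented (manufacturer code, sugars in g) pairs for illustration only.
rows = [("K", 12), ("K", 4), ("G", 9), ("G", 11), ("N", 0), ("Q", 6)]


def mean_sugar_by_maker(rows):
    """Average sugars per manufacturer, keyed by full manufacturer name."""
    groups = defaultdict(list)
    for code, sugars in rows:
        groups[MANUFACTURERS[code]].append(sugars)
    return {name: sum(v) / len(v) for name, v in groups.items()}


def text_bar_chart(values, scale=2):
    """A plain-text bar chart: one '#' per `scale` units, longest bar first."""
    lines = []
    for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"{name:<15} {'#' * round(value / scale)} {value:.1f}")
    return "\n".join(lines)


print(text_bar_chart(mean_sugar_by_maker(rows)))
```

Sorting the bars from longest to shortest makes the ranking readable at a glance. This is the same idea as the "league table" story type.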

More difficult tools:
