Thinking about numerical data

While the arts don’t lend themselves to numerical data enabling international comparisons, the societies that produce those arts can be quantified. You can’t rely on one number to characterize a whole society. But if you have enough numbers, you can start to see patterns from which you can make inferences about those societies.

On the sidebar menus for Parts II, III, and IV of this course, the last menu item takes you to a page with tables of comparable quantitative data. If you want to include some of this data in your final essay, you can’t just copy the data and let it speak for itself. You have to explain what it tells us.

Use this spreadsheet to analyze the data.

First – outliers and extremes

An outlier is a data point that is distant from the others. Some apparent outliers are errors left in uncleaned (“dirty”) data. If the data point is clean but not too distant, it can be considered the extreme end of the range. The data about the countries in these tables are not time series (as accounting data are, for example). Look at each column:

  • Which countries have the highest and lowest numbers or scores?
  • Which countries cluster together?
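
If you would rather check a column outside the spreadsheet, the two questions above can be sketched in a few lines of Python. This is only a minimal sketch: the country names and scores are invented placeholders, not course data, and the "more than 3 median absolute deviations from the median" cut-off is one common rule of thumb that I am assuming here, not a rule from this course.

```python
from statistics import median

# Invented placeholder scores for six hypothetical countries (not course data).
scores = {
    "Country A": 52,
    "Country B": 55,
    "Country C": 49,
    "Country D": 51,
    "Country E": 95,
    "Country F": 53,
}


def extremes(column):
    """Return the (name, value) pairs with the lowest and highest values."""
    lowest = min(column.items(), key=lambda item: item[1])
    highest = max(column.items(), key=lambda item: item[1])
    return lowest, highest


def possible_outliers(column, k=3.0):
    """Names whose value lies more than k median absolute deviations from the median."""
    middle = median(column.values())
    spread = median(abs(value - middle) for value in column.values())
    if spread == 0:
        return []  # the rule cannot be applied when the typical spread is zero
    return sorted(name for name, value in column.items()
                  if abs(value - middle) > k * spread)


print(extremes(scores))
print(possible_outliers(scores))
```

A flagged value is only a candidate. You still have to decide whether it is dirty data or a genuine extreme end of the range.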

Second – patterns

    Make some line graphs of individual rows (countries) and use a different color for each country. What visual patterns do you see?

  • Look for lines (countries) that have similar shapes.
  • Look for lines that have opposite or mirrored shapes.
  • Look for lines that are close together.
  • Look for lines that are far apart.

The default sort is column A: country name. Sort on every other column to get the rank order of the eleven countries that we are studying. Is your country more toward the top or bottom or middle of these column sorts? What about the other countries?
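
The column sorts described above can also be sketched in Python, in case you want to check your spreadsheet results. The table, the column names (`income`, `life_expectancy`), and the values are invented placeholders, and sorting from highest to lowest is my assumption about what "top" means.

```python
# Invented placeholder table: rows are hypothetical countries, columns are measures.
table = {
    "Country A": {"income": 30, "life_expectancy": 78},
    "Country B": {"income": 12, "life_expectancy": 81},
    "Country C": {"income": 45, "life_expectancy": 74},
    "Country D": {"income": 21, "life_expectancy": 69},
}


def rank_order(rows, column):
    """Country names sorted from the highest to the lowest value in one column."""
    return sorted(rows, key=lambda name: rows[name][column], reverse=True)


def position(rows, column, country):
    """1-based rank of a country in that column's sort (1 = highest)."""
    return rank_order(rows, column).index(country) + 1


# Repeat the same sort on every column and see where one country lands each time.
for column in ("income", "life_expectancy"):
    print(column, rank_order(table, column), position(table, column, "Country A"))
```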

Third – correlations

Before you can write about this data in an essay, you need to have a set of correlated data points that tell your story.

  • Dependencies: Which numbers depend on each other? That is, a change in one goes with a change in another or others.
  • Trade-offs: An improvement in one number goes with a decline in another. Are countries that are high on one measure low on another?
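
One way to put a number on a dependency or a trade-off is the Pearson correlation coefficient, which runs from -1 (the columns move in opposite directions) through 0 (no straight-line relationship) to +1 (they move together). The sketch below is an illustration under assumptions: the three columns are invented placeholder values, and you are not required to compute this for your essay.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented placeholder columns, one value per hypothetical country.
measure_a = [10, 20, 30, 40, 50]
measure_b = [12, 24, 33, 41, 55]  # tends to rise with measure_a (a dependency)
measure_c = [90, 72, 65, 40, 31]  # tends to fall as measure_a rises (a trade-off)

print(round(pearson(measure_a, measure_b), 2))
print(round(pearson(measure_a, measure_c), 2))
```

A value near +1 or -1 tells you the two columns move together or in opposition. It does not tell you why.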

Discussing these correlations will be the bulk of your analysis and characterization of these countries and aspects of their society. As you see the correlations, you begin to see the story of your country. Some correlations will be irrelevant to that story. Others will tell it in pictures, that is, graphs or charts. The text of your essay will explain those pictures.

Don’t look at the graphs and charts as an extra, optional add-on. Look at them as THE way to tell your story.

Examples: What correlates with wealth? Rich countries are also …… What correlates with health? Healthy countries are also ….. What correlates with treatment of women and children?

These will rarely be one-cause, one-effect correlations. Countries are complex systems.

Fourth – correlations in subsets of data

    How are you going to structure your story? That is, in what order are you going to discuss the correlations?

  • By rows? That is, by country or cluster of countries? For example, the rich countries and the poor countries?
  • By columns? That is, by demographic measure or group of measures?

After you decide the structure, as with the patterns above:

  • Make a separate spreadsheet with only the data for each subset by copying your original onto a new sheet.
  • Rename the sheet for the sort.
  • Run the same sorts on each group.
  • Make the new set of graphs, that is, run all your correlations on each subset.
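
The subset steps above can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the table is invented placeholder data, the column names are hypothetical, and the income threshold of 20 is an arbitrary cut, not a definition of "rich" or "poor".

```python
# Invented placeholder table for hypothetical countries (not course data).
table = {
    "Country A": {"income": 30, "literacy": 88},
    "Country B": {"income": 12, "literacy": 95},
    "Country C": {"income": 45, "literacy": 99},
    "Country D": {"income": 8, "literacy": 61},
}


def split_rows(rows, column, threshold):
    """Split rows into two subsets: values at or above the threshold, and below it."""
    high = {name: row for name, row in rows.items() if row[column] >= threshold}
    low = {name: row for name, row in rows.items() if row[column] < threshold}
    return high, low


def sort_subset(rows, column):
    """Apply the same highest-to-lowest sort to any subset."""
    return sorted(rows, key=lambda name: rows[name][column], reverse=True)


# The threshold of 20 is an arbitrary cut chosen for this sketch.
richer, poorer = split_rows(table, "income", threshold=20)
sheets = {"richer": richer, "poorer": poorer}  # one named "sheet" per subset
for sheet_name, rows in sheets.items():
    print(sheet_name, sort_subset(rows, "literacy"))
```

Keeping each subset under its own name mirrors the advice to rename each sheet for its sort, so you always know which group a graph came from.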

Don’t be confused by the terms positive and negative in the graphs on the right. They simply describe the slope and say nothing about what might be good or bad for these countries or their people.
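
To see that positive and negative only name the direction of the slope, here is a small Python sketch that fits a straight line by least squares and reports the sign. The numbers are invented placeholders and the function names are hypothetical.

```python
def slope(xs, ys):
    """Least-squares slope of the straight line fitted through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def direction(xs, ys):
    """Name the sign of the slope; the label says nothing about good or bad."""
    s = slope(xs, ys)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "flat"


# Invented placeholder values.
xs = [1, 2, 3, 4]
print(direction(xs, [2, 4, 5, 9]))  # the line rises from left to right
print(direction(xs, [9, 5, 4, 2]))  # the line falls from left to right
```

A negative slope between, say, income and infant mortality would be good news for a country, which is why the sign alone carries no judgment.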

Tips: Save, save, save. Make new spreadsheets rather than piling up too many graphs on any one sheet.

Fifth – causation


Example of a correlation that is not a cause

This is what you’re after. What caused your country to be the way it is? Don’t look for a single bullet — one cause. Look for a complex set of causes. Some of them will show up in the tables. Others can’t be quantified, which is why you’re writing an essay rather than a report. Based on the data — and adding other things you’re learning about these countries — what caused your country to be what it is?

The idea here is that if you know the cause-and-effect relationships, you can more accurately and vividly characterize your country.

Clearly expressed cause-and-effect relationships and valid inferences from them will go a long way to helping your readers understand what you have learned.

The most famous dictum in statistics: correlation is not causation. In other words, just because you can find a relationship between two variables does not mean that the one that came first in time caused the second. Look at the graph on the right. Yes, as the number of pirates has decreased since 1820, average global temperatures have increased. Does it follow that if we increased the number of pirates to, say, the 1860 number, we would also cool Earth? Of course not. Learn more: Wikipedia’s “Correlation does not imply causation.”
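
The same trap is easy to reproduce with made-up numbers. In the sketch below, both series are invented placeholders that I built directly from the year, so neither one influences the other, yet their correlation comes out very close to -1. The series names and values are assumptions for illustration only and are not the pirate or temperature figures from the graph.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented placeholder series: each one is driven by the year, not by the other.
years = list(range(10))
series_falling = [100 - 9 * year for year in years]   # declines over time
series_rising = [14.0 + 0.1 * year for year in years]  # climbs over time

r = pearson(series_falling, series_rising)
print(round(r, 3))
```

The strong correlation comes entirely from the shared driver, time. When two of your columns correlate, ask what third factor might be moving both.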

Certainly, any two variables in a cause-and-effect relationship are correlated. Thus, correlation can show you potential causation. But to prove causation, you must do more than just correlate. This is not a statistics course, however, so I’m not asking you to run formal statistical tests on your data. Use sentences and paragraphs to write out your ideas.