Chapter 11
Univariate Analysis
Modified: 2008-04-10
- I. Descriptive versus inferential statistics
- Descriptive
- A descriptive statistic is a number, numbers, or graph that conveys a characteristic of a sample or population. Descriptive statistics of samples carry an intuitive degree of uncertainty with them, but they do not provide a measure of that uncertainty.
- Inferential
- Inferential statistics are techniques that use sample data and probability to arrive at conclusions about populations. Inferential statistics help generate conclusions, and they also provide a measure of the uncertainty that goes with the conclusion.
- II. Types of statistics
- A. Univariate--distributions consisting of one variable
- B. Multivariate--distributions consisting of two or more variables
- 1. Bivariate--distributions consisting of ONLY two variables (special case of multivariate)
- III. Computer software
- A. Common packages
- B. Spreadsheets
- C. Database management system--way to organize large amounts of information using a computer program (database)
- 1. Relational database--links files together
- 2. Flat database--does not link files
- 3. Common packages
- D. Geographic information system--relational database that can show geographical relationships
- 1. Analytic mapping--ability to relate database fields to maps
- 2. Common packages
- IV. Presenting data--Tufte and chartjunk (excess graphical elements); keep it simple (K.I.S.S.)
- A. Tabular presentation
- Tables and figures may be placed anywhere in a research report, but they are often most appropriate in the Results section. The APA Manual recommends that tables be used for “crucial data that are directly related to the content of your article and to trim text that would otherwise be dense with numbers” (p. 147).
- According to the APA Publication Manual, a figure is any “chart, graph, photograph, drawing, or other depiction” (p. 176). Figures should be used to display global patterns of results, statistical interactions, nonlinear relationships, and conceptual details that are difficult to capture using words. Figures may be prepared with computer software or by hand. Regardless of how figures are prepared, they should be camera ready.
- 1. Array--do not usually appear except as appendices
- 2. Frequency distribution
- a. Relative frequency
- b. Class intervals
- 3. Cumulative frequency distribution
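- The frequency, relative frequency, and cumulative frequency distributions listed above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table`, the class-interval width of 10, and the exam scores are invented for the example, and the grouping assumes whole-number scores.

```python
from collections import Counter

def frequency_table(scores, width):
    """Group whole-number scores into class intervals of the given width and
    tabulate frequency, relative frequency, and cumulative frequency."""
    # Map each score to the lower bound of its class interval, then count.
    counts = Counter((s // width) * width for s in scores)
    n = len(scores)
    rows = []
    cumulative = 0
    for lower in sorted(counts):
        f = counts[lower]
        cumulative += f
        rows.append({
            "interval": (lower, lower + width - 1),
            "frequency": f,
            "relative": f / n,         # proportion of all scores
            "cumulative": cumulative,  # running total of frequencies
        })
    return rows

# Hypothetical exam scores, invented for illustration.
scores = [52, 57, 61, 64, 65, 68, 70, 71, 73, 75, 75, 78, 82, 84, 91]
table = frequency_table(scores, width=10)
for row in table:
    low, high = row["interval"]
    print(f"{low}-{high}: f={row['frequency']}, "
          f"rel={row['relative']:.2f}, cum={row['cumulative']}")
```

- The last cumulative frequency always equals the number of scores, and the relative frequencies sum to 1, which makes a quick check on any table built this way.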
- B. Visual presentation of data
- 1. Elements
- a. Show the data
- b. Entice the reader to think about the information
- c. Avoid distortion
- d. Make large datasets coherent
- e. Encourage the eye to compare pieces of data
- f. Serve a clear purpose
- g. Enhance statistical and verbal descriptions of the data
- 2. Formats
- a. Tables
- b. Bar graphs (space between bars) and histograms (no space between bars)
- The dependent variable (DV) is usually on the Y-axis but can be displayed on the X-axis
- c. Pie charts
- Use pie charts sparingly because they require that users interpret both angles and areas displayed, a more difficult task than interpreting the two-dimensional position of a point in a scatterplot (Pittenger, 1995). However, flat pie charts are useful for conveying information about proportions and can be very clear.
- d. Line graphs
- Most graphs are created with statistical software packages that include 3-D tools and shading. Be careful. Those tools can lead to graphical overkill and actually make your graph less understandable. Avoid the temptation to add computer-graphic elements such as shading and color. Tufte (2001) calls additions that hinder interpretation “chartjunk.” He advises that you keep your graphs simple and elegant. Here are some rules for making graphs, based on the recommendations of Pittenger (1995) and Cleveland and McGill (1985):
- Summary of Graphing Rules
- Line graphs are favored over other types (but not with discontinuous data).
- Graphs with common scales are favored over graphs with multiple scales.
- Two-dimensional graphs are favored over three-dimensional graphs.
- Bar graphs can become confusing when multiple independent variables are used.
- Avoid pie charts and area graphs.
- Avoid the temptation of adding computer-graphic elements such as shading and color.
- It is tempting to crowd as much data as possible on one graph. Better to make more graphs, each with its own story to tell. A simple graph depicting one dependent variable is clearer than a graph that shows multiple dependent variables.
- Line graphs are the easiest to read and to interpret. However, line graphs are not appropriate for category, qualitative, or dichotomous data. Use bar graphs instead.
- (1). Frequency polygon--similar to a histogram, except that it uses lines instead of bars
- (2). Time series--variable(s) are displayed as line(s) over time variable
- V. Quantitative measures
- A. Percentage or relative frequency distribution--very useful for comparing groups
- B. Proportion--like percentage, but not multiplied by 100
- C. Percent change
- percent change = ((N2 - N1)/N1) x 100
- where N1 is the earlier value and N2 is the later value
- D. Ratio--frequency of one variable compared to frequency of another (e.g., deaths per 1000 of population)
- E. Rates--useful for comparison
- rate = (N1/N2) x (base number)
- 1. Special uses of rates: health care
- Morbidity--measures disease frequency
- Incidence--number of new cases of a disease over a period of time (e.g., a year). Longitudinal, one year only
- Prevalence--includes ALL people affected. Cross-sectional, total of all affected individuals
- Mortality--deaths per population unit over time
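- The measures in this section reduce to simple arithmetic, sketched here in Python. The function names and all figures are hypothetical. In particular, the outline does not define N1 and N2 for the rate formula, so treating N1 as the number of events and N2 as the population is an assumption made for this sketch.

```python
def proportion(count, total):
    """Share of the total, not multiplied by 100."""
    return count / total

def percentage(count, total):
    """Proportion multiplied by 100."""
    return proportion(count, total) * 100

def percent_change(n1, n2):
    """((N2 - N1) / N1) x 100, where n1 is the earlier value and n2 the later."""
    return (n2 - n1) / n1 * 100

def rate(events, population, base=1000):
    """(N1 / N2) x base number; N1 = events and N2 = population is assumed here."""
    return events / population * base

# Hypothetical figures, invented for illustration.
print(percentage(30, 120))        # 30 of 120 respondents, as a percentage
print(percent_change(200, 250))   # enrollment rising from 200 to 250
print(rate(45, 15000))            # 45 deaths in a population of 15,000, per 1,000

# Incidence counts only new cases in the period; prevalence counts all cases.
new_cases, existing_cases, population = 12, 90, 6000
incidence = rate(new_cases, population)
prevalence = rate(new_cases + existing_cases, population)
print(f"incidence {incidence:.1f} vs. prevalence {prevalence:.1f} per 1,000")
```

- Expressing both groups per the same base number (per 1,000 here) is what makes rates useful for comparison across populations of different sizes.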
- VII. Characteristics of a distribution
- A. Measures of central tendency
- 1. Defined--descriptive statistic which summarizes the centrality of a distribution
- 2. Mode
- The mode is the score that occurs most frequently
- 3. Median
- The median is a point that divides a distribution into an upper half of larger scores and a lower half of smaller scores. Stated another way, the median is the 50th percentile.
- 4. Arithmetic mean
- The mean (which is also known as the arithmetic average) is the sum of the numbers divided by the number of numbers.
- 5. Geometric mean
- 6. Selecting appropriate measure of central tendency--use medians for skewed distributions (why?)
- a. Level of measurement
- b. Nature of distribution
- c. Information desired
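- One way to see why the median is often preferred for skewed distributions is to compute each measure on a small skewed sample. The sketch below uses Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later); the income figures are invented for illustration.

```python
import statistics

# Hypothetical incomes in thousands; one extreme value skews the distribution.
incomes = [22, 25, 25, 28, 30, 34, 41, 250]

mode = statistics.mode(incomes)                # most frequently occurring score
median = statistics.median(incomes)            # 50th percentile
mean = statistics.mean(incomes)                # sum of scores / number of scores
geo_mean = statistics.geometric_mean(incomes)  # nth root of the product of n scores

print(f"mode={mode}, median={median}, mean={mean}, geometric mean={geo_mean:.1f}")
```

- The single extreme score pulls the mean well above most of the scores, while the median stays near the middle of the bulk of the data.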
- B. Measures of variation and dispersion
- 1. Defined--descriptive statistic which describes the dispersion of a distribution
- 2. Range
- The simplest measure of variability is the range, which is the distance from the highest score to the lowest score.
- 3. Interquartile range
- The interquartile range (IQR) is also a statistic of two numbers. Between these two numbers lie the middle 50 percent of the scores. That is, half the scores are in the interquartile range. The upper bound (the larger number) of the IQR is the 75th percentile score and the lower bound (smaller number) is the 25th percentile score.
- 4. Standard deviation and variance
- SD is the square root of the variance
- SD is the most commonly used measure of variation (see normal curve, below)
- 5. Average deviation--not used much anymore
- 6. Median absolute deviation--used in some cases (median of the absolute deviations of cases from the median)
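- The measures of variation listed so far can be computed as follows. This is a minimal sketch with invented scores. It assumes sample (n - 1) formulas for the variance and SD, and it uses the default quartile method of `statistics.quantiles`; other software may compute quartiles slightly differently.

```python
import statistics

# Hypothetical scores, invented for illustration.
scores = [2, 4, 4, 5, 7, 9, 11, 14]

score_range = max(scores) - min(scores)          # highest minus lowest score
q1, q2, q3 = statistics.quantiles(scores, n=4)   # 25th, 50th, 75th percentiles
iqr = q3 - q1                                    # spread of the middle 50 percent
variance = statistics.variance(scores)           # sample variance (divides by n - 1)
sd = statistics.stdev(scores)                    # square root of the variance

mean = statistics.mean(scores)
median = statistics.median(scores)
# Average deviation: mean of the absolute deviations from the mean.
average_deviation = statistics.mean([abs(x - mean) for x in scores])
# Median absolute deviation: median of the absolute deviations from the median.
mad = statistics.median([abs(x - median) for x in scores])

print(f"range={score_range}, IQR={iqr}, variance={variance:.2f}, SD={sd:.2f}")
print(f"average deviation={average_deviation}, median absolute deviation={mad}")
```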
- 7. Standard deviation and the normal curve
- The mean of the normal curve is the score that corresponds to the peak of the curve. Of course, the peak of the curve indicates the most frequently occurring score, so the mode is the same score as the mean. Note that the normal curve is symmetrical; half of it is to the right of the mean and half to the left. Thus, the median is the same score as the mean and the mode.
- The value of knowing about SD units on the normal curve is that there is a mathematical relationship between the SD units and the proportions of the curve. Thus, for a score that is 2 SDs below the mean, only about 2.5 percent of the participants have lower scores. Similarly, a score 1 SD above the mean has 84 percent of the participants with lower scores (50% + 34% = 84%).
- There are two percentages of the normal curve that you will encounter frequently. One is that about 95 percent of the scores lie between a score that is 2 SD below the mean and one that is 2 SD above the mean. Thus, about 2.5% of the curve is above +2 SD and about 2.5% is below -2 SD. (The exact number of standard deviations that separate the most extreme 2.5 percent in each tail of a normal curve from the rest is 1.96.) The other commonly encountered percentage is that 68 percent (about 2/3) of normally distributed measures fall between -1 SD and +1 SD.
- Relation of the Normal Distribution and the Standard Deviation

- In any normal distribution there is a fixed relationship between distance from the mean, measured in standard deviations, and the proportion of cases that fall within that distance. (See the graphic, The Normal Curve, to see the normal curve and the relationship between it and the standard deviation.) Here is that relationship:
- + or -1 standard deviation--68.27%
- + or -2 standard deviations--95.45%
- + or -3 standard deviations--99.73%
- So, if you can assume a normal distribution, and if you know the standard deviation and the mean, you can get a good idea of the degree of variability within that distribution.
- a. Normal curve defined
- b. Standard scores (z-scores)
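- The proportions of the normal curve discussed above can be checked with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). This is a sketch rather than a required method. The helper names `within` and `z_score` and the example score are invented for illustration, and values taken from printed tables may differ from the computed ones in the last digit because of rounding.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

def within(k):
    """Proportion of a normal distribution within + or - k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within + or - {k} SD: {within(k):.2%}")

print(f"below -2 SD: {standard_normal.cdf(-2):.2%}")
print(f"below +1 SD: {standard_normal.cdf(1):.2%}")
# Number of SDs that cuts off the most extreme 2.5 percent in one tail.
print(f"cutoff for the top 2.5%: {standard_normal.inv_cdf(0.975):.2f}")

def z_score(x, mean, sd):
    """Standard score: distance of x from the mean in SD units."""
    return (x - mean) / sd

# Hypothetical test with mean 100 and SD 15.
z = z_score(130, mean=100, sd=15)
print(f"z = {z}, proportion with lower scores = {standard_normal.cdf(z):.2%}")
```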
- VIII. Exploratory data analysis
- Explore the data using frequency distributions and graphs
- Calculate descriptive statistics
- Calculate an effect size index
- Analyze the data with appropriate NHST techniques
- At every step analyze the experiment for uncontrolled extraneous variables
- A. Trimmed means--be careful with these!
- B. Box plot
- A boxplot presents precise information about central tendency, variability, and skew in one graph. A basic boxplot shows the median, range, interquartile range, and skew. Boxplots can also show the mean, outliers and other characteristics of a distribution.

- Outliers--very high or very low scores that are not representative of the distribution (usually defined as scores more than 1.5 IQR below the 25th percentile or more than 1.5 IQR above the 75th percentile)
- C. Tukey five-number summary
- minimum
- maximum
- median
- first quartile
- third quartile
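- The five-number summary and the 1.5 IQR rule for outliers can be sketched together, since both rest on the quartiles. The function names and data are invented for illustration. The quartiles here use the "inclusive" method of `statistics.quantiles`, which is one of several conventions, so a statistics package may report slightly different quartiles for the same data.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

def outliers(data, k=1.5):
    """Scores more than k IQRs below the first or above the third quartile."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical scores with one extreme value, invented for illustration.
data = [3, 5, 7, 8, 9, 10, 11, 12, 40]
print(five_number_summary(data))
print(outliers(data))
```

- These five numbers are what a basic boxplot draws: the box spans the quartiles, the line inside marks the median, and the whiskers reach toward the minimum and maximum.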
- IX. Chapter summary
- X. Appendix 11.1: Statistical calculations
- XI. Appendix 11.2: Data preparation
Web Pages
Statistical Primer for Psychology Students
Introduction to statistics covers descriptive, correlational, and inferential statistics.
http://www.mhhe.com/socscience/intro/cafe/common/stat/
Exploratory Data Analysis
Site from National Institute of Standards explains exploratory data analysis in detail.
http://www.itl.nist.gov/div898/handbook/eda/eda.htm
Wikipedia on Outliers
Wikipedia’s page discusses outliers and their computation.
http://en.wikipedia.org/wiki/Outlier
Central Tendency
Presents the mean, median, and mode and suggests when to use each properly.
http://www.quickmba.com/stats/centralten/
Dispersion
Presents the range, average deviation, and standard deviation and suggests when to use each properly.
http://www.quickmba.com/stats/dispersion/
Wikipedia on Standard Deviation
Wikipedia’s page on standard deviation explains concept and gives examples.
http://en.wikipedia.org/wiki/Standard_deviation
Interquartile Range
Short page defines the interquartile range and links to methods of graphing distributions.
http://mathworld.wolfram.com/InterquartileRange.html
Standard Error of the Mean
Gives the formula for calculating the standard error of the mean.
http://davidmlane.com/hyperstat/A103735.html
Confidence Intervals
Interactive Java applet displays 95% and 99% confidence intervals from a population with a mean of 50 and a standard deviation of 10 for 100 samples with sample sizes of 10, 15, or 20.
http://www.ruf.rice.edu/~lane/stat_sim/conf_interval/
Frequency Polygons
Introduces frequency polygons and shows creation and use.
http://cnx.org/content/m10214/latest/
Histogram
Interactive page displays histograms for several user-selectable distributions.
www.shodor.org/interactivate/activities/histogram/
Skewness
Detailed page on skewness defines the concept and displays formulas for calculating skew in a set of distributions.
http://mathworld.wolfram.com/Skewness.html
Boxplots
Explains the creation and interpretation of boxplots.
http://www.netmba.com/statistics/plot/box/
The Link Between Error Bars and Statistical Significance
Brief tutorial explains how to estimate statistical significance graphically when error bars for the standard error of the mean are added to graphs.
http://www.graphpad.com/articles/errorbars.htm
Create A Graph
Site for kids from the National Center for Education Statistics allows users to create bar, line, area, pie, and XY graphs.
http://nces.ed.gov/nceskids/createagraph/
Line Graphs and Scatterplots
Short tutorial covers scatterplots, line graphs, and provides hints on creating them using Excel.
http://www.ncsu.edu/labwrite/res/gh/gh-linegraph.html
Wikipedia on the Normal Curve
Wikipedia’s page on the normal curve details the history, properties, and uses of the normal curve.
http://en.wikipedia.org/wiki/Normal_distribution
Sampling Distribution of the Mean
Tutorial module covers the sampling distribution of the mean and graphically shows how the distribution of sample means approaches a normal distribution.
http://cnx.org/content/m11131/latest/
Effect Statistics
Tutorial covers effect statistics or ways of measuring differences between means.
http://www.sportsci.org/resource/stats/index.html
Effect Size Calculators
Page calculates Cohen’s d and the effect size correlation using means and standard deviations or for the independent groups t test.
http://web.uccs.edu/lbecker/Psy590/escalc3.htm
Correlation Coefficient Calculation
Page provides example of how to graph and calculate a correlation coefficient.
http://helios.bto.ed.ac.uk/bto/statistics/tress11.html
Exploring and Describing Categorical Variables
Web pages
Displaying Tables of Percentages
Page shows how to convert tables of raw data into tables of percentages and provides guidelines.
http://www.childrens-mercy.org/stats/model/descriptive/percentage.asp
Bar Graphs
Interactive page displays bar graphs for several user-selectable datasets.
http://www.shodor.org/interactivate/activities/BarGraph/
Phi Coefficient
Defines the phi coefficient and gives examples of how to calculate it.
http://www.childrens-mercy.org/stats/definitions/phi.htm
Exploring and Describing Ranked Data
Article
Thompson, G. L. (1992). Exploratory graphical techniques for ranked data, Defense Technical Information Center. [Abstract]. Retrieved December 12, 2006, from http://www.childrens-mercy.org/stats/definitions/phi.htm