Research hypothesis. A research hypothesis is a working expectation of what the data will show. It is sometimes referred to as a working hypothesis or the experimental hypothesis.
Null hypothesis. Usually a hypothesis that there is no relationship or that population means are equal.
Null Hypothesis Statistical Testing (NHST) Logic
In this section, we explain null hypothesis statistical testing (NHST) using a general vocabulary that applies to all NHST tests and not just chi square. NHST begins with two hypotheses about the population the data are from and a criterion for deciding between the two. One hypothesis is the null hypothesis (symbolized H0) and the other is the alternative hypothesis (symbolized H1). The criterion is a probability value (symbolized alpha, α). The sample data are analyzed with a statistical test that produces a probability figure, p. If the probability p is equal to or less than α, H0 is rejected and H1 is accepted. Once H0 is rejected, researchers write a conclusion about the relationship between (or among) the variables, basing this conclusion on descriptive statistics from the samples.
If the sample data result in rejecting H0, the data are said to be statistically significant. Statistically significant means that the null hypothesis was rejected when an NHST test was performed on the sample data.
The null hypothesis (H0) is always a statement about independence or equality in populations. H0 is rejected when the observed sample data are unlikely if H0 is true. But what probabilities qualify as unlikely? That is, what value should α have? The consequences of rejecting H0 or not rejecting H0 differ greatly in different kinds of research and some researchers can choose α based on considerations peculiar to their own research. Nevertheless, the general rule is that probabilities of .05 or less are considered unlikely and only those probabilities lead to rejection of H0. A phrase such as “p < .05” (or some lower probability figure) generally accompanies conclusions that are supported by an NHST test.
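The decision rule described above can be sketched in a few lines of Python. This is only an illustration; the function name `nhst_decision` is hypothetical and is not part of any statistics package.

```python
def nhst_decision(p: float, alpha: float = 0.05) -> str:
    """Return the NHST decision for a test's probability p and the criterion alpha."""
    if not (0.0 <= p <= 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("p must be in [0, 1] and alpha must be in (0, 1)")
    # H0 is rejected when p is equal to or less than alpha.
    return "reject H0" if p <= alpha else "retain H0"


print(nhst_decision(0.036))        # reject H0
print(nhst_decision(0.036, 0.01))  # retain H0
print(nhst_decision(0.05))         # reject H0 (p equal to alpha)
```

The same probability (.036) leads to different decisions under α = .05 and α = .01, which is why α must be chosen before the data are analyzed.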
Type I Errors An NHST test with α = .05 assures us that some 5 percent of samples that actually come from a null hypothesis population will lead to a conclusion to reject H0. Of course, rejecting H0 in this situation is a mistake. Such a mistake is called a Type I error. Type I errors occur only when H0 is true. The probability of a Type I error is symbolized α. As indicated earlier, researchers typically set α at .05, although sometimes it is set at a smaller figure such as .01.
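A small simulation can make the 5 percent figure concrete. The sketch below draws both samples from the same population (so H0 is true) and counts how often an independent-samples t test rejects H0. The population values (mean 100, SD 15), the group size of 6, and the function names are assumptions chosen for illustration; 2.228 is the two-tailed .05 critical value of t for 10 degrees of freedom from standard t tables.

```python
import random
import statistics


def independent_t(sample1, sample2):
    """t for two equal-N samples: mean difference / standard error of a difference."""
    n = len(sample1)
    sem1_sq = statistics.variance(sample1) / n  # (SEM)^2 for sample 1
    sem2_sq = statistics.variance(sample2) / n  # (SEM)^2 for sample 2
    mean_diff = statistics.mean(sample1) - statistics.mean(sample2)
    return mean_diff / (sem1_sq + sem2_sq) ** 0.5


def type_i_error_rate(n_experiments=20000, n_per_group=6, critical_t=2.228, seed=1):
    """Proportion of true-H0 experiments in which H0 is (wrongly) rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # Both groups come from the SAME population, so H0 is true.
        g1 = [rng.gauss(100, 15) for _ in range(n_per_group)]
        g2 = [rng.gauss(100, 15) for _ in range(n_per_group)]
        if abs(independent_t(g1, g2)) >= critical_t:
            rejections += 1
    return rejections / n_experiments


rate = type_i_error_rate()
print(f"Proportion of true-H0 experiments that rejected H0: {rate:.3f}")
```

The printed proportion should land close to .05, the value of α built into the critical value.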
Type II Errors In many cases of research, the samples do not come from the null hypothesis population. In all of these cases, it is not possible to make a Type I error. If the samples don’t come from the null hypothesis population, the correct thing to do is to reject H0. That is, it would be a mistake if you did not reject H0. However, if the data produce a probability greater than .05, H0 will be retained. Such a mistake is called a Type II error. Type II errors occur only when H0 is false. The probability of a Type II error is symbolized with a Greek beta (β). Several factors control the value of β. We return to the topic of β in the section on power near the end of this chapter.
Table 6.7 Illustration of the types of errors that can occur in NHST testing
| The decision based on sample data | True situation in the population: H0 true | True situation in the population: H0 false |
| --- | --- | --- |
| Reject H0 | Type I error | Correct decision |
| Retain H0 | Correct decision | Type II error |
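The four cells of Table 6.7 can also be written as a lookup, which some readers find easier to remember. The function name `classify_outcome` is hypothetical and serves only to illustrate the table.

```python
def classify_outcome(h0_is_true: bool, h0_rejected: bool) -> str:
    """Map the true situation and the decision onto the cells of Table 6.7."""
    if h0_rejected:
        # Rejecting H0 is a mistake only when H0 is actually true.
        return "Type I error" if h0_is_true else "Correct decision"
    # Retaining H0 is a mistake only when H0 is actually false.
    return "Correct decision" if h0_is_true else "Type II error"


for h0_is_true in (True, False):
    for h0_rejected in (True, False):
        truth = "H0 true" if h0_is_true else "H0 false"
        decision = "Reject H0" if h0_rejected else "Retain H0"
        print(f"{truth:8} | {decision:9} | {classify_outcome(h0_is_true, h0_rejected)}")
```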
In The Know Ronald A. Fisher introduced the null hypothesis concept in 1925 in an influential book written for practical research workers. His goal was to give researchers guidelines and the .05 cutoff was proposed as a guideline. Unfortunately, later writers and researchers treated it as a rule. The exclusive reliance on NHST techniques for analyzing quantitative data is under attack these days. In the 1990s there was a move to ban its use. The American Psychological Association (APA) assembled a task force that recommended that NHST not be banned, but that researchers not rely exclusively on NHST. The task force mentioned exploratory data analysis and confidence intervals as examples of alternatives to NHST. The APA report can be viewed at http://www.apa.org/science/bsaweb-tfsi.html. Other accessible explanations of the controversy and its outcome are in Dillon (1999) and Spatz (2000).
Power
Statistical power is about rejecting a false null hypothesis. Of course, rejecting H0 is usually the goal of researchers, so power is a topic of importance. If the populations are actually different (the null hypothesis is false), the sample data may lead the researcher to reject H0 (a correct decision) or retain H0 (a Type II error, the probability of which is β). Thus, the probability of a correct decision is 1 – β. Mathematically, power is defined as 1 - β. This means that the greater the power of a test, the higher the probability that it will detect that H0 is false. To put this in terms of t test values, if power is high the chance of a large t test value is high; if power is low, you shouldn’t be surprised at a small t test value.
How much power should a statistical test have? The generally accepted rule of thumb is that a probability of .80 is adequate power. However, unlike α, which is set by the researcher, power is the result of a combination of factors. Unfortunately, researchers can control only some of the factors. Four of the factors that control the power of an NHST test are:
1. Amount of difference between the populations. The greater the difference between the populations the samples come from, the greater the power of the statistical test to detect the difference. This factor is sometimes referred to as the degree of falseness of H0. In the context of an experiment, the greater the effect of the treatment on the experimental group population, the easier it is to detect that the treatment has an effect. Of course, you never actually know how different the population means are. After the experiment, however, you do have sample data and sample means. The difference between M1 and M2 is an estimate of the difference between the two population means.
To use this factor to your advantage in your research, choose levels of the IV that are far apart. For example, an experiment that compares strenuous exercisers with couch potatoes will probably be more powerful than one that compares moderate exercisers with occasional exercisers. Likewise, an experiment that compares a high dose with a placebo will be more powerful than one that compares a medium dose with a low dose.
2. Sample size. The larger the sample, the greater the power of the test.
To see how increasing N leads to larger values of t and thus to a better chance of rejecting H0, look again at the introduction to the independent-samples t test. The denominator, the standard error of a difference, consists of two (SEM)² terms, and

$$SEM = \frac{SD}{\sqrt{N}}$$
Thus, as N increases, SEM decreases. Smaller values of SEM produce larger t values, and the larger the t value, the more likely you are to reject H0. You can use this factor to your advantage by increasing sample sizes.
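The effect of N on the standard error of the mean can be checked with a few lines of arithmetic. The sketch assumes the usual formula, SEM = SD divided by the square root of N, and a hypothetical SD of 10.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """SEM = SD / sqrt(N)."""
    return sd / math.sqrt(n)


# Hypothetical SD of 10; only N changes from line to line.
for n in (4, 16, 64, 256):
    print(f"N = {n:3d}  SEM = {standard_error_of_mean(10.0, n):.3f}")
```

Note that quadrupling N only halves SEM, so each additional gain in precision costs more participants than the last.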
3. Sample variability. Anything that reduces the variability in the data increases power. Look at the formula for the standard error of the mean above. Reducing the size of SD leads to a smaller SEM and thus, to larger t values. Researchers reduce the variability in their data a number of ways, including using exactly the same procedure with all participants, recruiting homogeneous participants, and using reliable tests. A sometimes overlooked method of reducing sample variability is to ensure that all scores are recorded correctly.
4. Alpha. The larger the value of α, the greater the power of the test. This is a relationship that is fairly easy to understand. Look at Figure 6.3. If the rejection region is increased from .01 to .05, the t values between 3.17 and 2.23 (and between -3.17 and -2.23) will all lead to rejection of the null hypothesis. Thus, increasing α from .01 to .05 increases the chance of rejecting H0. Because an α of .05 is usually the largest level of significance that others will accept, researchers should not increase α beyond .05 to increase the power of their tests.
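The point about the two rejection regions can be illustrated with the critical values quoted above (2.23 for α = .05 and 3.17 for α = .01, which match the two-tailed values for 10 degrees of freedom in standard t tables). The t values tested below are hypothetical, as is the function name.

```python
# Two-tailed critical values of t quoted in the text (Figure 6.3).
CRITICAL_T = {0.05: 2.23, 0.01: 3.17}


def rejects_h0(t_value: float, alpha: float) -> bool:
    """True if |t| falls in the rejection region for the given alpha."""
    return abs(t_value) >= CRITICAL_T[alpha]


for t in (1.50, 2.50, -2.80, 3.50):
    print(f"t = {t:5.2f}  alpha .05: {rejects_h0(t, 0.05)}  alpha .01: {rejects_h0(t, 0.01)}")
```

The t values of 2.50 and -2.80 lead to rejection only under α = .05, which is the sense in which a larger α gives a more powerful test.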
The relationships among the five factors (power, degree of falseness of H0, N, SD, and α) are such that once four factors are set, the fifth is determined. A power analysis consists of setting four values and solving for the fifth. The most common uses of a power analysis are to solve for sample size before the experiment is conducted and to solve for power after an experiment has produced differences that are not significant.
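A rough power analysis can be sketched with the normal distribution. This is an approximation, not the exact t-based calculation used in published power tables, and it rests on several assumptions: two groups of equal size, a two-tailed test, and a standardized effect size d (the difference between the population means divided by SD, which combines the first and third factors in the list above). The function names are hypothetical.

```python
from statistics import NormalDist


def approximate_power(effect_size_d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-tailed, two-group test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # e.g., about 1.96 for alpha = .05
    shift = effect_size_d * (n_per_group / 2) ** 0.5  # expected size of the test statistic
    # Probability of landing in either tail of the rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def n_per_group_for_power(effect_size_d: float, target_power: float = 0.80,
                          alpha: float = 0.05) -> int:
    """Smallest per-group N whose approximate power reaches the target."""
    n = 2
    while approximate_power(effect_size_d, n, alpha) < target_power:
        n += 1
    return n


print(f"Power, d = 0.5, N = 64 per group: {approximate_power(0.5, 64):.2f}")
print(f"Power, d = 0.5, N = 20 per group: {approximate_power(0.5, 20):.2f}")
print(f"Power, d = 0.5, N = 64, alpha .01: {approximate_power(0.5, 64, 0.01):.2f}")
print(f"N per group for power .80, d = 0.5: {n_per_group_for_power(0.5)}")
```

Because the normal curve has thinner tails than the t distribution, this shortcut gives sample sizes slightly smaller than exact tables do. It is best read as a way to see how the five factors trade off, not as a substitute for a full power analysis.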
Cohen’s (1992) five-page primer discusses power and provides a table of sample sizes required for eight different NHST tests. Explanations of how to calculate power can be found in Spatz (2005, Chapter 16 on the CD), Howell (2004, Chapter 15), and Aron and Aron (2003, Chapter 8). For a short textbook on statistical power analysis, see Murphy and Myors (2004).
The question of which statistical test to use for a particular set of data depends on a number of considerations. One consideration is the kind of data. Category data, scaled data, and ranked data all require different tests. In addition, various characteristics of experimental design such as the number of independent variables and dependent variables, number of levels of the independent variable, and other considerations lead to different tests. Although the tests differ, every NHST test of sample data results in a probability figure. We turn now to the source of these probability figures.
Statistical significance of χ² You studied the logic of chi square in an earlier section (pages XXX-XXX). Near the end of that section, we said that the NHST probability associated with the data in Table 6.1 was .036, a probability that is statistically significant. In this section we explain the formula that takes you from the data to its probability. The symbol for chi square is χ². The formula is
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where: O = observed frequencies (the observed data)
E = expected frequencies (statistically independent values)
The formula for χ² has two elements, O and E. The observed values (O) come from the data, Table 6.1. The expected values (E) are those that are expected if the two variables are statistically independent. Calculation of the E values was shown in Table 6.4. The conventional way to combine these two elements to produce a χ² value is with a table that begins with O and E values and proceeds through the arithmetic steps that lead to a χ² value. Table 6.8 is an example. Work your way across each of the columns in Table 6.8. Note that the sum of the right-hand column is 4.40. One way to get to the probability figure that is associated with χ² = 4.40 is to determine df and then use Table C in Appendix C. The df of any contingency table is determined by the formula
df = (R-1)(C-1)
where: R is the number of rows
C is the number of columns
In a 2 x 2 table, df = (R-1)(C-1) = (2-1)(2-1) = 1. Thus, every chi square 2 x 2 contingency table has one degree of freedom.
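The arithmetic of a chi square test can be sketched in Python as follows. The observed frequencies are hypothetical (they are not the data of Table 6.1), expected frequencies are computed in the standard way as row total times column total divided by N, and the probability function works only for df = 1, where the chi square probability can be obtained from the complementary error function. The function names are assumptions for illustration.

```python
import math


def chi_square_independence(observed):
    """Return (chi square, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency under independence
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # df = (R-1)(C-1)
    return chi2, df


def p_value_df1(chi2: float) -> float:
    """Probability of a chi square this large or larger when df = 1 (only)."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical 2 x 2 table of observed frequencies.
chi2, df = chi_square_independence([[30, 20], [20, 30]])
print(f"chi square = {chi2:.2f}, df = {df}, p = {p_value_df1(chi2):.3f}")

# The chi square value from the text, with df = 1.
print(f"chi square = 4.40, df = 1, p = {p_value_df1(4.40):.3f}")
```

For χ² = 4.40 with one degree of freedom, the probability comes out at about .036, the figure cited in the text. Tables with more than one degree of freedom need a chi square table or a statistics package.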
Independent-samples t test
The kinds of experimental designs that call for an independent-samples t test are explained in Chapter 7. As with all NHST tests, an independent-samples t test gives you the probability of the difference that was observed, if the treatment has no effect on the scores. The formula for an independent-samples t test for two groups with equal Ns is
$$t = \frac{M_1 - M_2}{\sqrt{(SEM_1)^2 + (SEM_2)^2}}$$

where M1 – M2 is called the mean difference and $\sqrt{(SEM_1)^2 + (SEM_2)^2}$ is called the standard error of a difference.
Elements of the independent-samples t test formula The numerator of the t test is fairly simple; the mean of one sample is subtracted from the mean of the other sample. Thus, the numerator becomes the difference between the two sample means. This difference may reflect a difference in the two populations the samples come from.
The standard error of a difference, the denominator, is more complicated; it is the square root of the sum of two standard errors that have been squared. As a reminder about standard errors of the mean, the formula is
$$SEM = \frac{SD}{\sqrt{N}}$$
Thus, a standard error of the mean depends on the variability of the sample scores (SD) and the size of the sample (N). Small values of SEM occur when there is little variability in the sample data or when the sample size is large (or both). The standard error of a difference is a measure of the variability in the data.
To summarize, the numerator of the independent-samples t test is a measure of the variability between the means and the denominator is a measure of the variability within the samples and of N. The concept of variability between means divided by variability within the samples is a recurring one in NHST tests. This concept is at work in all t tests and all analysis of variance tests.
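The pieces of the independent-samples t test can be put together in a short sketch. The scores are hypothetical, the function names are assumptions, and SD is computed here with N - 1 in the denominator (Python's `statistics.stdev`), which is one common convention and may differ from the one used elsewhere in this text.

```python
import math
import statistics


def standard_error_of_mean(scores):
    """SEM = SD / sqrt(N), with SD computed using N - 1 (an assumption)."""
    return statistics.stdev(scores) / math.sqrt(len(scores))


def independent_samples_t(group1, group2):
    """t = mean difference / standard error of a difference (equal Ns)."""
    if len(group1) != len(group2):
        raise ValueError("this form of the formula assumes equal Ns")
    mean_difference = statistics.mean(group1) - statistics.mean(group2)
    se_difference = math.sqrt(standard_error_of_mean(group1) ** 2
                              + standard_error_of_mean(group2) ** 2)
    return mean_difference / se_difference


# Hypothetical scores for two groups of three participants each.
t = independent_samples_t([5, 7, 9], [1, 3, 5])
print(f"t = {t:.2f}")  # t = 2.45
```

In this example the mean difference is 4 and each SEM is about 1.15, so the standard error of a difference is about 1.63 and t is about 2.45.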
Paired t test
A paired t test is appropriate for two-group designs in which scores in one group are paired with scores in the other group. The kinds of experimental designs that call for a paired t test are explained more fully in Chapter 8, but we note here that
1) scores are paired if they belong to the same person, and
2) scores are paired if participants are matched for some reason such as being in the same family, the same income group, or because they had similar scores on a pretest.
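For readers who want to see the arithmetic now, the sketch below uses the standard form of the paired t test: the mean of the difference scores divided by the standard error of those differences, with df equal to the number of pairs minus 1. This section does not give the formula, so treat the sketch as an assumption about the usual calculation; the scores and the function name are hypothetical.

```python
import math
import statistics


def paired_t(scores1, scores2):
    """Return (t, df) for paired scores; scores1[i] is paired with scores2[i]."""
    if len(scores1) != len(scores2):
        raise ValueError("paired scores require the same number of scores in each group")
    differences = [a - b for a, b in zip(scores1, scores2)]
    n_pairs = len(differences)
    # Standard error of the mean of the difference scores (SD uses N - 1).
    se_of_differences = statistics.stdev(differences) / math.sqrt(n_pairs)
    return statistics.mean(differences) / se_of_differences, n_pairs - 1


# Hypothetical scores for the same four people measured twice.
t, df = paired_t([10, 12, 9, 14], [8, 9, 8, 10])
print(f"t = {t:.2f}, df = {df}")  # t = 3.87, df = 3
```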
Web Resources
Sampling Fluctuation
Web page
Sampling Fluctuation
Definition of sampling fluctuation.
http://davidmlane.com/hyperstat/A49797.html
A Negative Inference Logic Problem
Web page
Wikipedia on Modus Tollens
Wikipedia’s page on modus tollens explains the logic of negative inference.
http://en.wikipedia.org/wiki/Modus_tollens
Chi Square Logic
Web pages
Statistical Independence
Discusses the concept of statistical independence and gives examples.
http://www.andrew.cmu.edu/user/scheines/tutor/independence.html
Chi Square Statistic
Explains the chi square for independence and discusses data types, contingency tables, and has a Java applet for calculating user-supplied data.
http://math.hws.edu/javamath/ryan/ChiSquare.html
Null Hypothesis Statistical Testing (NHST) Logic
Web pages
Wikipedia on Null Hypothesis
Wikipedia’s page explains the null hypothesis.
http://en.wikipedia.org/wiki/Null_hypothesis
Null Hypothesis
Short tutorial on null hypothesis.
http://davidmlane.com/hyperstat/A29337.html
Alternative Hypothesis
Definition of alternative hypothesis.
http://davidmlane.com/hyperstat/A8108.html
Developing a Research Hypothesis
Answers the question of why one should develop a research hypothesis and the steps that should follow.
http://www.childrens-mercy.org/stats/plan/hypo.asp
Hypothesis Testing
Tutorial on hypothesis testing covers null and alternative hypotheses, Type I and Type II errors, critical values, power, one and two tailed tests, and more.
http://www.stats.gla.ac.uk/steps/glossary/hypothesis_testing.html
Sampling Distributions
Interactive page displays four sampling distributions including chi square based on user input.
http://www.stat.berkeley.edu/~stark/Java/Html/SampleDist.htm
Why is p = 0.05?
Explains the history behind the choice of p = 0.05 or less as a criterion value.
http://www.tufts.edu/~gdallal/p05.htm
Rejection Region
Definition of rejection region in a sampling distribution.
http://eksl.cs.umass.edu/eis/pages/glossary/entries/rejection-region.html
Wikipedia on Type I and Type II Errors
Wikipedia’s page explains Type I and Type II errors and gives a number of examples.
http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
Type I and Type II Errors-Making Mistakes in the Justice System
Uses the justice system and its decisions to illustrate the difference between Type I and Type II errors.
http://www.intuitor.com/statistics/T1T2Errors.html
Statistical Significance
Shows the difference between the everyday and statistical use of the word “significance” and gives examples.
http://www.surveysystem.com/signif.htm
Statistical Pitfalls
Gives advice on good and bad practices using statistics.
http://www.zoology.ubc.ca/~whitlock/bio300/LectureNotes/Pitfalls/Pitfalls.html
Gallery of Distributions
Shows 19 theoretical distributions including the normal, t, F, and chi square.
http://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm
Degree of Freedom
Defines the concept of degree of freedom.
http://pespmc1.vub.ac.be/ASC/DEGREE_FREED.html
Effect Size Calculators
Page calculates Cohen’s d and the effect size correlation using means and standard deviations or for the independent groups t test.
http://web.uccs.edu/lbecker/Psy590/escalc3.htm
NHST Techniques
Web pages
Chi Square Calculator
Calculates chi square from user-provided cell values.
http://www.georgetown.edu/faculty/ballc/webtools/web_chi.html
Distribution Tables
Provides statistical tables for Z, t, chi-square, and F distributions.
http://www.statsoft.com/textbook/sttable.html
Correlation
Discusses correlation and how to test the significance of correlations.
http://www.surveysystem.com/correlation.htm
t test Calculator
Calculates independent and paired t tests from user-supplied data.
http://www.graphpad.com/quickcalcs/ttest1.cfm
Gallery of Distributions
Shows 19 theoretical distributions including the normal, t, F, and chi square.
http://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm
Statistical Tests for Ranked Data
Web pages
Mann-Whitney Calculator
Calculates Mann-Whitney statistic. User must first input N for each group.
faculty.vassar.edu/lowry/utest.html
Spearman Rank Order Correlation Calculator
Calculates Spearman Rank Order Correlation. User must first input N for each group.
faculty.vassar.edu/lowry/corr_rank.html
Confidence Intervals
Web pages
Wikipedia on Confidence Intervals
Wikipedia’s page explains confidence intervals in detail.
http://en.wikipedia.org/wiki/Confidence_interval
Confidence Intervals
Interactive Java applet displays 95% and 99% confidence intervals from a population with a mean of 50 and a standard deviation of 10 for 100 samples with sample sizes of 10, 15, or 20.
http://www.ruf.rice.edu/~lane/stat_sim/conf_interval/
A Student’s Guide to Analyzing Data
Web pages
Exploratory Data Analysis
Site summarizes techniques and assumptions of exploratory data analysis.
http://www.itl.nist.gov/div898/handbook/eda/eda.htm
Wikipedia on Exploratory Data Analysis
Wikipedia’s page on exploratory data analysis covers history and software used.
http://en.wikipedia.org/wiki/Exploratory_data_analysis
Power
Web pages
What is Power Analysis
Summary of factors which contribute to statistical power.
http://www.power-analysis.com/power_analysis.htm
Wikipedia on Statistical Power
Wikipedia’s page covers the concept of statistical power.
http://en.wikipedia.org/wiki/Statistical_power
Important Factors in Designing Statistical Power Analysis Studies
Lists factors which help determine the power of a statistical test.
http://cc.uoregon.edu/cnews/summer2000/statpower.html
Meta-analysis
Web pages
Wikipedia on Meta-analysis
Wikipedia’s page explains the concept of meta-analysis.
http://en.wikipedia.org/wiki/Meta-analysis
The Meta-analysis of Research Studies
Short tutorial on the reasons for conducting a meta-analysis.
http://edres.org/meta/
Meta-analysis at 25
On-line article by Gene Glass reviews the history of meta-analysis from 1975 to 2000.
http://glass.ed.asu.edu/gene/papers/meta25.html