Concept: Multiple Comparisons

    When performing κ independent significance tests, each at level α, the probability of making at least one Type I error (rejecting a null hypothesis that is actually true) is $1-(1-\alpha)^{\kappa}$. For example, with κ = 10 and α = 0.05, this is $1-0.95^{10} \approx 0.40$: there is a 40% chance that at least one of the ten tests will be declared significant even when every null hypothesis is true.
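
    The formula is easy to check numerically. The following is a minimal Python sketch; the function name `familywise_error_rate` is hypothetical, and the formula assumes the κ tests are independent and that every null hypothesis is true.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one Type I error among k independent tests,
    each run at level alpha, when every null hypothesis is true."""
    # Each test avoids a false rejection with probability (1 - alpha);
    # independence lets us multiply these k probabilities together.
    return 1.0 - (1.0 - alpha) ** k


print(f"{familywise_error_rate(0.05, 10):.4f}")  # 0.4013
```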

    So, when you see a significant result among the ten tests, how confident can you be that it is "really" significant? Because there is a 40% chance that at least one test comes out significant by chance alone, your effective group-wise (familywise) Type I error rate is 40%, a far cry from the 5% you may have assumed.
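
    The 40% figure can also be confirmed by simulation. This sketch assumes continuous test statistics, so that each p-value is uniformly distributed on [0, 1] when its null hypothesis is true; the function name, seed, and number of trials are illustrative choices, not part of the original text.

```python
import random


def simulate_familywise_error(alpha: float, k: int, n_trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the chance that at least one of k independent
    tests is declared significant when all null hypotheses are true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Under a true null, a p-value is Uniform(0, 1), so one draw per test
        # stands in for running the test; count the trial if any p < alpha.
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / n_trials


print(simulate_familywise_error(0.05, 10, 100_000))  # roughly 0.40
```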

Bonferroni's Solution

    One very simple method, due to Bonferroni (1936), is to divide the test-wise significance level by the number of tests:

    $\alpha_{\beta} = \alpha / \kappa$

    In our example, $\alpha_{\beta} = 0.05 / 10 = 0.005$. If we apply a significance level of 0.005 to each of the ten tests, the chance that any of them is declared significant under the null hypothesis is at most 5% (for ten independent tests it is $1-0.995^{10} \approx 0.049$).
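
    A minimal Python sketch of the correction follows. The function names and the ten p-values are hypothetical, chosen only for illustration; a test is treated as significant when its p-value is at most α/κ.

```python
from typing import List, Sequence


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / k


def bonferroni_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag each test as significant if its p-value is at most alpha / k."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical p-values from ten tests (illustrative only, not real data).
p_values = [0.001, 0.004, 0.006, 0.012, 0.03, 0.049, 0.11, 0.27, 0.52, 0.88]

uncorrected = [p <= 0.05 for p in p_values]
corrected = bonferroni_reject(p_values, alpha=0.05)
print(sum(uncorrected), sum(corrected))  # 6 2

# Familywise error rate for ten independent tests, each at the corrected level.
per_test = bonferroni_alpha(0.05, 10)
print(f"{1 - (1 - per_test) ** 10:.4f}")  # 0.0489
```

    With the correction, two of the ten hypothetical p-values remain significant instead of six.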


    In spite of its simplicity (or perhaps because of it), the Bonferroni correction has attracted some criticism. Its biggest problem is that it is too conservative: because it controls the group-wise error rate, each individual test is held to a much stricter standard. This increases the probability of a Type II error, making it more likely that genuine effects will fail to be detected.
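
    The loss of power can be made concrete with a simple case. This sketch assumes a one-sided z-test and a hypothetical true effect of 2.5 standard errors; neither the test nor the effect size comes from the original text. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist


def one_sided_z_power(alpha: float, effect: float) -> float:
    """Power of a one-sided z-test at level alpha when the true mean shift
    equals `effect` standard errors (an illustrative assumption)."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha)  # rejection cutoff for the z statistic
    return 1.0 - std_normal.cdf(critical - effect)


# Compare the unadjusted level with the Bonferroni level for ten tests.
for level in (0.05, 0.05 / 10):
    power = one_sided_z_power(level, effect=2.5)
    print(f"alpha={level:.3f}  power={power:.3f}  Type II error={1 - power:.3f}")
```

    Under these assumptions, power falls from about 0.80 at α = 0.05 to about 0.47 at the Bonferroni level of 0.005, so the Type II error rate more than doubles.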

    A brief discussion of the shortcomings of the method may be found in Perneger (1998).

Alternative Methods

SAS Programming

    In SAS, these methods can be performed using the MULTTEST procedure. Legendre & Legendre (1998) contains a discussion of these methods.

