As stated in the Two sample t-Test article, you can apply the t-test if the following assumptions are met:
- That the two samples are independently and randomly drawn from the source population(s).
- That the scale of measurement for both samples has the properties of an equal interval scale.
- That the source population(s) can be reasonably supposed to have a normal distribution.
Sometimes, however, your data fail to meet the second and/or third requirement. For example, there is nothing to indicate that the source population has a normal distribution, or you do not have an equal interval scale – that is, the spacing between adjacent values cannot be assumed to be constant. But you still want to find out whether the difference between the two samples is significant. In such cases, you can use the Mann–Whitney U test, a non-parametric alternative to the t-test.
In statistics, the Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon (MWW), Wilcoxon rank-sum test, or Wilcoxon–Mann–Whitney (WMW) test) is a non-parametric test of the null hypothesis that it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample; that is, P(X > Y) = P(Y > X) [1]. However, it is also used as a substitute for the independent groups t-test, with the null hypothesis that the two population medians are equal.
BTW, there are actually two tests – the Mann–Whitney U test and the Wilcoxon rank-sum test. They were developed independently and use different measures, but are statistically equivalent.
The assumptions of the Mann–Whitney test are:
- That the two samples are randomly and independently drawn;
- That the dependent variable is intrinsically continuous – capable in principle, if not in practice, of producing measures carried out to the nth decimal place;
- That the measures within the two samples have the properties of at least an ordinal scale of measurement, so that it is meaningful to speak of "greater than", "less than", and "equal to" [2].
As you can see, this non-parametric test does not assume (or require) samples from normally distributed populations. Such tests are also called distribution-free tests.
Word of caution
It has been known for some time that the Wilcoxon–Mann–Whitney test is adversely affected by heterogeneity of variance when the sample sizes are unequal. However, even when sample sizes are equal, very small differences between the population variances cause the large-sample Wilcoxon–Mann–Whitney test to become too liberal; that is, its actual Type I error rate increases as the sample size increases [3].
Hence you must remember that this test is valid only if the two population distributions are identical (including homogeneity of variance) apart from a possible shift in location.
The method replaces raw values with their corresponding ranks. With this, some results can be achieved using simple math. For example, the total sum of ranks is already known from the total size alone: it is N(N + 1)/2. Hence, the average rank is (N + 1)/2.
The general idea is that if the null hypothesis is true and the samples aren't significantly different, then the ranks are roughly balanced between A and B: the average rank within each sample should approximate the overall average rank, and the rank sums should approximate n_A(N + 1)/2 and n_B(N + 1)/2, respectively.
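To make the rank bookkeeping concrete, here is a minimal Python sketch (the sample values are invented for illustration; only the sizes matter here) confirming that the ranks 1..N always sum to N(N + 1)/2:

```python
# Hypothetical samples A and B; only their combined size matters here.
sample_a = [12, 15, 11]
sample_b = [14, 18, 13, 16]

n = len(sample_a) + len(sample_b)       # N = 7
total_ranks = sum(range(1, n + 1))      # 1 + 2 + ... + N
assert total_ranks == n * (n + 1) // 2  # closed form N(N + 1)/2
average_rank = (n + 1) / 2
print(total_ranks, average_rank)        # 28 4.0
```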
To perform the test, first you need to calculate a measure known as U for each sample.
You start by combining all values from both samples into a single set, sorting them by value, and assigning a rank to each value (in case of ties, each tied value receives the average of the ranks it spans). Ranks go from 1 to N, where N = n_A + n_B is the combined sample size. Then you calculate the sums of ranks for the values of each sample, R_A and R_B.
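The ranking step, including average ranks for ties, can be sketched in Python (the function name `ranks` is mine, not from the article):

```python
def ranks(values):
    """Rank values 1..N; tied values share the average of the ranks they span."""
    positions = {}  # value -> list of 1-based positions in the sorted order
    for i, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(i)
    return [sum(positions[v]) / len(positions[v]) for v in values]

combined = [3, 5, 5, 8]   # the two 5s tie for ranks 2 and 3
print(ranks(combined))    # [1.0, 2.5, 2.5, 4.0]
```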
Now you can calculate U for each sample as
U_A = R_A - n_A(n_A + 1)/2
U_B = R_B - n_B(n_B + 1)/2
Note that U_A + U_B = n_A·n_B, so in practice you only need to compute one of them directly.
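The U computation from rank sums can be sketched as follows (helper names are mine; the data are invented):

```python
def ranks(values):
    """Rank values 1..N; tied values share the average rank."""
    positions = {}
    for i, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(i)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def u_statistics(sample_a, sample_b):
    """Return (U_A, U_B) computed from the rank sums of the combined data."""
    n_a, n_b = len(sample_a), len(sample_b)
    r = ranks(sample_a + sample_b)
    r_a, r_b = sum(r[:n_a]), sum(r[n_a:])
    u_a = r_a - n_a * (n_a + 1) / 2
    u_b = r_b - n_b * (n_b + 1) / 2
    return u_a, u_b

u_a, u_b = u_statistics([3, 4, 2, 6], [9, 7, 5, 10])
print(u_a, u_b)            # 1.0 15.0
assert u_a + u_b == 4 * 4  # U_A + U_B = n_A * n_B
```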
For small sample sizes you can use tabulated values. You take the smaller of the two U values and compare it with the critical value corresponding to the sample sizes and the chosen significance level. Statistics textbooks usually list critical values in tables for sample sizes up to 20.
For large sample sizes you can use the z-test. It has been shown that U is approximately normally distributed if both sample sizes are equal to or greater than 5 (some sources say 4). The z-statistic is
z = (U - m_U) / σ_U,
where the mean and standard deviation of U under the null hypothesis are
m_U = n_A·n_B / 2
σ_U = sqrt( n_A·n_B·(N + 1) / 12 )
In case of ties, the formula for the standard deviation becomes
σ_U = sqrt( (n_A·n_B / 12) · ( (N + 1) - Σ (t_j³ - t_j) / (N(N - 1)) ) ),
where the sum runs over the g groups of tied values and t_j is the number of tied ranks in group j.
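Putting the normal approximation together with the tie correction, a Python sketch (function names are mine; the data are invented):

```python
from math import sqrt
from collections import Counter

def ranks(values):
    """Rank values 1..N; tied values share the average rank."""
    positions = {}
    for i, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(i)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def z_statistic(sample_a, sample_b):
    """z for U_A under the normal approximation, with tie correction."""
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    r = ranks(sample_a + sample_b)
    u_a = sum(r[:n_a]) - n_a * (n_a + 1) / 2
    m_u = n_a * n_b / 2
    # sum of (t^3 - t) over groups of tied values (zero when there are no ties)
    tie_term = sum(t**3 - t for t in Counter(sample_a + sample_b).values())
    sigma_u = sqrt(n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    return (u_a - m_u) / sigma_u

z = z_statistic([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])  # no ties in this data
print(round(z, 3))  # -2.611
```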
The calculator below uses the z-test. Of course, this imposes a limitation on sample sizes (both should be equal to or greater than 5), but that is probably not much of a limitation for real cases.
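For completeness, here is a self-contained Python sketch of the whole large-sample procedure, ending with a two-sided p-value from the normal CDF (the data and function names are illustrative, not taken from the calculator):

```python
from math import sqrt
from collections import Counter
from statistics import NormalDist

def mann_whitney_z_test(sample_a, sample_b):
    """Return (U_A, z, two-sided p) via the tie-corrected normal approximation."""
    # Average ranks over the combined data, shared by tied values.
    positions = {}
    for i, v in enumerate(sorted(sample_a + sample_b), start=1):
        positions.setdefault(v, []).append(i)
    rank = {v: sum(p) / len(p) for v, p in positions.items()}

    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    u_a = sum(rank[v] for v in sample_a) - n_a * (n_a + 1) / 2
    m_u = n_a * n_b / 2
    ties = sum(t**3 - t for t in Counter(sample_a + sample_b).values())
    sigma_u = sqrt(n_a * n_b / 12 * ((n + 1) - ties / (n * (n - 1))))
    z = (u_a - m_u) / sigma_u
    p = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value
    return u_a, z, p

u, z, p = mann_whitney_z_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(u, round(z, 3), round(p, 4))
```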