Among the many famous statisticians that you have probably never heard of is one William Sealy Gosset. If you have ever enjoyed the cool refreshing taste of a good beer, you have Gosset to thank for that. Gosset worked for the Guinness Brewery in Dublin in the early 1900s. He was responsible for monitoring the brewing process to ensure that each batch was of a consistently high quality. Gosset struggled to interpret the results because he had a relatively small number of samples to analyze, while the statistics of the day required a much larger number of samples. The obvious solution, especially to beer lovers, is “take more samples”. However, it wasn’t that easy because Gosset had to monitor every step of the process, from barley to beer, and the analysis itself was much more complicated than just knocking back a pint.
After struggling with this problem for a few years, our intrepid statistician worked out the mathematics of what we call the t-test (always spelled with a lower-case ‘t’). This statistical tool allows hypotheses to be tested with relatively small sample sizes. Gosset recognized the importance of his achievement and wanted to publish it in a journal, but the brewery wouldn’t let him publish under his own name. Guinness viewed the new quality control process Gosset established as a business advantage and did not want the competition to know that it had a statistician on staff. In the end, this huge advance in statistics was published under the pseudonym “Student”. While Gosset’s authorship was later acknowledged, it is known as Student’s t-test to this day.
The t-test is very popular: millions of t-tests are performed every day in the market research industry alone. However, many of these tests fail to meet the standards Gosset had in mind for his new statistical tool. The t-test was designed to test previously developed hypotheses. For example, it can be used to test whether two batches of beer are equally good. But today t-tests are often used to compare every possible pairing among large numbers of sample groups. This practice, commonly called data fishing, is very different from the original purpose of the t-test because you are no longer testing a hypothesis that was devised on independent grounds. Blindly generating large numbers of t-tests can produce misleading results by chance alone, simply because of the sheer number of combinations being tested.
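To put rough numbers on that last point, here is a minimal sketch of the arithmetic. It assumes a significance level of 0.05 and no real differences between any of the groups. The function names are made up for this illustration, and the "at least one" figure additionally assumes the tests are independent, which pairwise comparisons are not, so treat it as a ballpark.

```python
import math


def pairwise_test_count(num_groups):
    """Number of distinct two-group comparisons among num_groups groups."""
    return math.comb(num_groups, 2)


def expected_false_positives(num_tests, alpha=0.05):
    """Average number of 'significant' results when no real differences exist."""
    return num_tests * alpha


def chance_of_any_false_positive(num_tests, alpha=0.05):
    """Chance of at least one false positive, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** num_tests


for num_groups in (2, 5, 10, 20):
    num_tests = pairwise_test_count(num_groups)
    print(
        f"{num_groups:>2} groups -> {num_tests:>3} tests, "
        f"about {expected_false_positives(num_tests):.1f} false positives expected, "
        f"chance of at least one: {chance_of_any_false_positive(num_tests):.0%}"
    )
```

With 20 groups there are 190 possible pairs, so at the 0.05 level you would expect between nine and ten "significant" differences even if every group were drawn from the same population.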
The most common varieties of the t-test used today are:
- The one-sample t-test. Used to test whether or not the population average (mean) has a pre-determined value. For example, a company may have specified that all new concepts need to achieve a score of at least 50 before they can proceed to the next phase of testing. We use the one-sample t-test to determine if any new concepts test significantly below this standard.
- The two-sample t-test. This is perhaps the most commonly used (and misused) form of the t-test. It tests for significant differences in the means of two distinct populations. For example, we can use this test to see if there are significant differences in how men and women score the new concept.
- The paired t-test. When we have two measurements coming from the same source, we can use this test to see if there is a difference between the averages (means) of the two measures. For example, if we know how much each respondent liked concept A and how much they liked concept B, we can use a paired t-test to determine whether or not there is a significant difference in these preferences.
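For readers who like to see the mechanics, the three varieties above can be sketched in a few lines of Python using only the standard library. This is an illustration rather than a production implementation: the scores are invented, the function names are hypothetical, and the two-sample version assumes equal variances in both groups (the classic pooled form). Each function returns the t statistic and its degrees of freedom. Turning those into a p-value requires the t distribution, which a statistics package such as SciPy provides (`scipy.stats.ttest_1samp`, `ttest_ind`, and `ttest_rel`).

```python
import math
import statistics


def one_sample_t(scores, target):
    """t statistic and degrees of freedom for H0: population mean equals target."""
    n = len(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    return (statistics.mean(scores) - target) / standard_error, n - 1


def two_sample_t(group_a, group_b):
    """Pooled-variance t statistic for two independent groups (assumes equal variances)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    mean_difference = statistics.mean(group_a) - statistics.mean(group_b)
    return mean_difference / standard_error, df


def paired_t(first, second):
    """Paired t statistic: a one-sample test on the per-respondent differences."""
    differences = [a - b for a, b in zip(first, second)]
    return one_sample_t(differences, 0)


# Hypothetical data, invented purely for illustration.
concept_scores = [46, 52, 44, 49, 47, 43, 50, 45]  # one concept vs. the target of 50
men = [46, 52, 44, 49]                             # two separate groups of respondents
women = [47, 43, 50, 45]
liked_a = [7, 8, 6, 9, 7, 8]                       # the same respondents rating A and B
liked_b = [6, 8, 5, 7, 7, 6]

print("one-sample:", one_sample_t(concept_scores, 50))
print("two-sample:", two_sample_t(men, women))
print("paired:    ", paired_t(liked_a, liked_b))
```

Note that the paired test is simply a one-sample test applied to the differences, which is why it needs both measurements to come from the same respondent.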
Many popular statistical software packages come with an optional ‘dragnet’ tool, which applies the same t-test to all data tables. This tool may, for example, calculate a two-sample t-test for all possible pairs of data columns (called ‘banners’ in market research) regardless of how those columns are constructed. It also applies a t-test to all the rows (called ‘stubs’) that are the result of a proportion or mean calculation. Blindly using this tool not only leads to inappropriate uses of the t-test; the resulting abundance of false positives is a headache in its own right.
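A small simulation shows what such a dragnet turns up when there is nothing to find. This is only a sketch under stated assumptions: the `dragnet` function and the `banner_` column names are hypothetical and do not mimic any particular package, every column is pure noise drawn from the same distribution, and with 200 respondents per column the test compares the statistic against a normal critical value as a large-sample stand-in for the t distribution.

```python
import itertools
import math
import random
import statistics


def t_statistic(column_a, column_b):
    """Unequal-variance two-sample t statistic for two data columns."""
    standard_error = math.sqrt(
        statistics.variance(column_a) / len(column_a)
        + statistics.variance(column_b) / len(column_b)
    )
    return (statistics.mean(column_a) - statistics.mean(column_b)) / standard_error


def dragnet(columns, alpha=0.05):
    """Test every pair of columns and return the pairs flagged as 'significant'.

    Uses a normal critical value, a large-sample approximation to the t distribution.
    """
    critical_value = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    flagged = []
    for (name_a, col_a), (name_b, col_b) in itertools.combinations(columns.items(), 2):
        if abs(t_statistic(col_a, col_b)) > critical_value:
            flagged.append((name_a, name_b))
    return flagged


# Fifteen hypothetical banner columns, all drawn from the SAME population,
# so every flagged difference is a false positive.
rng = random.Random(2024)
columns = {
    f"banner_{i}": [rng.gauss(50, 10) for _ in range(200)] for i in range(15)
}

flagged = dragnet(columns)
total_pairs = math.comb(len(columns), 2)
print(f"{len(flagged)} of {total_pairs} comparisons flagged as significant")
```

On average, roughly 5% of the 105 comparisons will be flagged even though no column truly differs from any other. The exact count varies with the random seed.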
In the end, t-tests are, in some ways, like drinking beer. A good thing when used appropriately, but make sure you pick the right one and don’t be surprised if there are some unfortunate consequences from overuse and misuse.