Parametric hypothesis test for normally distributed unpaired samples
Overview
In the current post, we present the \(t\)
-test as a parametric hypothesis test for normally distributed unpaired samples with equal variance. We state necessary conditions for the application of this test and we analyze an example data set.
Parametric hypothesis test for normally distributed unpaired samples of equal variance — Principles and an illustrative data analysis in Gnu R
Objective and scope of application
The \(t\)-test for unpaired samples is used to compare the locations of the means of two independent data series. In this way, differences between the samples under consideration can be analyzed and the statistical significance of these differences can be assessed.
The \(t\)-test can be applied one-sided or two-sided. In the two-sided test, only the equality or inequality of the mean values of the two samples is analyzed. The null hypothesis \((H_0)\) of the two-sided test is that the means of the two data series \(\mu_1\) and \(\mu_2\) are equal, \(H_0: \mu_1 = \mu_2\). The corresponding alternative hypothesis \((H_1)\) is that the two means \(\mu_1\) and \(\mu_2\) are not equal, \(H_1: \mu_1 \ne \mu_2\). In the one-sided test, the direction of the inequality (greater than, less than) of the means is also considered in the analysis. The two possible null hypotheses of the one-sided tests are \(H_0: \mu_1 \le \mu_2\) and \(H_0: \mu_1 \ge \mu_2\) with the alternative hypotheses \(H_1: \mu_1 > \mu_2\) and \(H_1: \mu_1 < \mu_2\), respectively.
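As a small, self-contained illustration (not part of the study analysis below), the following sketch shows how the two-sided and one-sided variants are selected in Gnu R via the alternative argument of the base function t.test; the two data series are simulated here purely for demonstration:

# Two simulated, independent, normally distributed data series
set.seed(42)
x1 <- rnorm(40, mean = 10, sd = 2)
x2 <- rnorm(40, mean = 11, sd = 2)

# Two-sided test: H0: mu1 = mu2 against H1: mu1 != mu2
t.test(x1, x2, alternative = "two.sided", var.equal = TRUE)

# One-sided test: H0: mu1 >= mu2 against H1: mu1 < mu2
t.test(x1, x2, alternative = "less", var.equal = TRUE)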
Requirements
The \(t\)-test can only be used if the two data series to be examined fulfill certain requirements:
- The values of the two data series to be analyzed must be normally distributed. This can be checked using the Shapiro-Wilk test or visually using a quantile-quantile plot (Q-Q plot). If the sample size of the data series to be analyzed is greater than 30, proof of normal distribution is usually not necessary; in this case the central limit theorem (Lindeberg-Lévy) comes into effect.
- The variances of the two data series to be analyzed must be equal; the \(F\) test or the Levene test can be used for the verification (a brief illustration of both requirement checks follows this list).
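As a brief, self-contained sketch of these two requirement checks (again with simulated data, independent of the study data analysed below), the base Gnu R functions shapiro.test, qqnorm and var.test can be used as follows:

# Simulated example data for the requirement checks
set.seed(1)
a <- rnorm(35, mean = 5, sd = 1)
b <- rnorm(35, mean = 5.5, sd = 1)

# Requirement 1: normal distribution (Shapiro-Wilk test and Q-Q plot)
shapiro.test(a)
shapiro.test(b)
qqnorm(a); qqline(a)

# Requirement 2: equality of variances (F test)
var.test(a, b)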
Motivating example
Within the context of a study, the upper arm length (UL) of female and male subjects, aged between 50 and 52 years, is to be compared. The null hypothesis is: the mean values of the upper arm lengths are the same for both groups of subjects, $$H_0: \mu_{\mathrm{UL, f}} = \mu_{\mathrm{UL, m}}$$
and correspondingly the alternative hypothesis $$H_1: \mu_{\mathrm{UL, f}} \ne \mu_{\mathrm{UL, m}} . $$ The data used for the analysis are from the United States Department of Health and Human Services. Centers for Disease Control and Prevention. National Center for Health Statistics. National Health and Nutrition Examination Survey (NHANES), 2007-2008. Inter-university Consortium for Political and Social Research [distributor], 2012-02-22. doi.org/10.3886/ICPSR25505.v3.
Analysis script in Gnu R
Gnu R toolboxes used
For data import, analysis and visualization of the results, we use existing toolboxes for the Gnu R system. The toolboxes in use are described further down in the post. They are loaded with the following code fragment:
library(tidyverse)    # use tidy data
library(haven)        # import SPSS files
library(rstatix)      # pipe-friendly statistics
library(RColorBrewer) # color maps
library(kableExtra)   # table output
library(ggplot2)      # high quality plots
library(ggstatsplot)  # statistical plots
library(pander)       # rendering R objects into Pandoc markdown
library(latex2exp)    # mathematical expressions in plots
Import of the data and first exploratory analysis
First, the data to be analysed are imported into the Gnu R analysis environment. The data are available in SAV format. We use the function read_sav of the toolbox haven for the data import.
# Import the data set in SAV format
dataIn <- read_sav("25505-0012-Data.sav")
The imported dataset contains several variables. We select the two variables that will be considered in the analysis, the gender and the upper arm length of the subjects. We also remove all dataset entries with missing values (NA). Then, using the glimpse(dataAnalysis)
statement, we output some information about the data selected for further analysis.
dataAnalysis <- dataIn %>%
  dplyr::filter(RIDAGEYR >= 50 & RIDAGEYR <= 52) %>% # Age between 50 and 52
  select(RIAGENDR, BMXARML) %>%                      # Selection of the two variables
  mutate(RIAGENDR = as_factor(RIAGENDR)) %>%         # Convert to factor
  na.omit()                                          # Remove all entries with missing values (NA)

# Overview of the selected data set
glimpse(dataAnalysis)
## Rows: 316
## Columns: 2
## $ RIAGENDR <fct> Male, Male, Male, Female, Male, Male, Female, Male, Male, Mal…
## $ BMXARML <dbl> 34.1, 37.2, 37.7, 35.7, 40.5, 36.0, 38.5, 40.0, 42.6, 45.7, 3…
For exploratory purposes, we report a few statistical parameters for both groups of subjects.
dataAnalysis %>%
  group_by(RIAGENDR) %>%
  get_summary_stats() %>%
  kable(caption = "Statistical parameters for the upper arm lengths of the two subject groups")
RIAGENDR | variable | n | min | max | median | q1 | q3 | iqr | mad | mean | sd | se | ci |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Male | BMXARML | 172 | 34.0 | 45.7 | 39.0 | 37.3 | 40.925 | 3.625 | 2.669 | 39.045 | 2.350 | 0.179 | 0.354 |
Female | BMXARML | 144 | 29.5 | 42.4 | 35.7 | 34.2 | 37.150 | 2.950 | 2.224 | 35.933 | 2.354 | 0.196 | 0.388 |
Table 1: Statistical parameters for the upper arm lengths of the two subject groups
The dataset includes 316 complete records of subjects between the ages of 50 and 52; 144 of these subjects are female and 172 are male. The sample sizes of the two groups of subjects are visualized below by a bar chart. We use the functions ggplot and geom_bar of the toolbox ggplot2 for the visualization.
ggplot(dataAnalysis, aes(x = RIAGENDR)) +
  geom_bar(stat = "count", width = 0.7, fill = "steelblue") +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = 2, size = 10) +
  labs(x = "Gender", y = "Number") +
  theme_bw() +
  theme(text = element_text(size = 16))
In the next step, we generate an overview of the distribution of the data to be statistically analysed. We combine box-whisker and violin plots for this purpose. The box-whisker plot shows the median as well as the lower and upper quartiles. The violin plot shows the distribution of the recorded values for both groups of subjects. We also mark the mean values in the plots using a red diamond.
ggplot(dataAnalysis, aes(x = RIAGENDR,
                         y = BMXARML)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  stat_summary(fun = mean, colour = "darkred", geom = "point", shape = 18,
               size = 3, show.legend = FALSE) +
  stat_summary(fun = mean, colour = "red", geom = "text", show.legend = FALSE,
               vjust = -0.7, aes(label = round(after_stat(y), digits = 1))) +
  labs(x = "Gender", y = "Upper arm length / cm") +
  theme_bw() +
  theme(text = element_text(size = 16))
Verify requirements for the \(t\)-test
Requirement: Normal distribution
The sample sizes of the two groups of subjects are larger than 30. Due to the large sample sizes, proof of the normal distribution is actually not required. To illustrate the procedure, we nevertheless perform the verification. We use the Shapiro-Wilk test to check the normal distribution of the data of the two groups of subjects.
dataAnalysis %>%
  group_by(RIAGENDR) %>%
  shapiro_test(BMXARML) %>%
  add_significance("p") %>%
  kable(caption = "Results of the Shapiro-Wilk test")
RIAGENDR | variable | statistic | p | p.signif |
---|---|---|---|---|
Male | BMXARML | 0.9892286 | 0.2164096 | ns |
Female | BMXARML | 0.9870128 | 0.1962956 | ns |
Table 2: Results of the Shapiro-Wilk test
The \(p\) value of the Shapiro-Wilk test is \(p > 0.05\) for the data of both groups of subjects considered in the study. Thus, the distribution of the present data does not differ significantly from the normal distribution, and there is no evidence against the assumption that the data are normally distributed.
The Q-Q plot provides a visual way to qualitatively check whether data follow a normal distribution. In the Q-Q plot, the quantiles of the empirical distribution of the collected data are plotted against the quantiles of the normal distribution. The solid line in the Q-Q plot represents the normal distribution. If the samples of a subject group (points in the plot) follow a normal distribution, then the points should lie close to this line.
ggplot(dataAnalysis, aes(sample = BMXARML, color = RIAGENDR)) +
  geom_qq() +
  geom_qq_line(linewidth = 1.5) +
  labs(x = "Theoretical quantiles", y = "Upper arm length / cm") +
  theme_bw() +
  theme(text = element_text(size = 16)) +
  scale_color_brewer(palette = "Paired", name = "Gender")
Requirement: Equality of variance
The second necessary condition for comparing two samples by means of the \(t\)-test is the equality of the variances of the two samples. This can be checked using the Levene test.
dataAnalysis %>%
  levene_test(BMXARML ~ RIAGENDR) %>%
  add_significance("p") %>%
  kable(caption = "Results of *Levene* test")
df1 | df2 | statistic | p | p.signif |
---|---|---|---|---|
1 | 314 | 0.6347787 | 0.4262099 | ns |
Table 3: Results of Levene test
Hypothesis testing by means of the \(t\)-test
The \(p\) value of the Levene test is greater than 0.05, so the equality of the variances cannot be rejected. The requirements for the parametric hypothesis test are thus fulfilled. We now apply the \(t\)-test to test the \(H_0\) hypothesis.
dataAnalysis %>%
  t_test(BMXARML ~ RIAGENDR) %>%
  add_significance("p") %>%
  kable(caption = "Results of the $t$-test")
.y. | group1 | group2 | n1 | n2 | statistic | df | p | p.signif |
---|---|---|---|---|---|---|---|---|
BMXARML | Male | Female | 172 | 144 | 11.7147 | 304.1166 | 0 | **** |
Table 4: Results of the \(t\)-test
The \(p\) value of the \(t\)-test is considerably below the significance level of 0.05. The \(H_0\) hypothesis is rejected and the alternative hypothesis \(H_1\) is accepted. The mean values of the upper arm lengths of the two subject groups differ in a statistically highly significant way.
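One remark on the call above: by default, rstatix::t_test (like the underlying stats::t.test) uses Welch's test with var.equal = FALSE, which is why the degrees of freedom reported in Table 4 are not an integer. Since the equality of the variances has been verified, the pooled-variance test described in this post can be requested explicitly via the var.equal argument; with the nearly identical group standard deviations in Table 1, both variants lead to the same conclusion. A minimal sketch:

# Classical (pooled-variance) t-test, assuming equal variances
dataAnalysis %>%
  t_test(BMXARML ~ RIAGENDR, var.equal = TRUE) %>%
  add_significance("p") %>%
  kable(caption = "Results of the pooled-variance $t$-test")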
Visual representation of the results of the analysis
Finally, we visually present the results of the statistical analysis. The distribution of the data of the two groups of subjects is illustrated by a combination of box-whisker and violin plots, the individual measured values by coloured dots. The mean values are marked by labelled red dots. The number of all samples in the database is given at the top right of the graph \((n_{\mathrm{obs}} = 316)\); the sample sizes of the individual groups of subjects are in brackets on the x-axis. The results of the \(t\)-test are shown at the top left of the graph: the statistical test used, the parameter, the calculated statistic and the significance. They are followed by information on the effect size. We will go into more detail about the properties and the calculation of the effect size in one of the next posts in this blog.
ggbetweenstats(
  data = dataAnalysis,
  x = RIAGENDR,
  y = BMXARML,
  ggtheme = ggplot2::theme_bw(),
  bf.message = FALSE,
  title = "Gender-specific upper arm length differences, age 50 to 52 years"
) +
  labs(x = "Gender", y = "Upper arm length / cm")
Gnu R toolboxes that we used
For the statistical analysis of the data and the visualisation of results and intermediate results, we used a number of toolboxes. Below is a list of these toolboxes with a short description and links to more detailed information:
- tidyverse — This toolbox includes a number of Gnu R software packages that use tidy data structures and in which the design philosophy and grammar of tidy data are implemented, e.g. pipes. Information on tidy data: H. Wickham, Tidy data, The Journal of Statistical Software, Vol. 59, 2014. Information on the toolbox: https://www.tidyverse.org/.
- haven — The toolbox haven enables Gnu R to read and write various data formats used by other statistical software systems. The following formats are supported: SAS, SPSS and Stata. Information on the toolbox: https://haven.tidyverse.org/.
- rstatix — The rstatix toolbox offers a simple, intuitive framework, coherent with the "tidyverse" design philosophy, for performing basic statistical tests, e.g. t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses. Information on the toolbox: https://rpkgs.datanovia.com/rstatix/.
- RColorBrewer — Provides colour schemes designed by Cynthia Brewer, see http://colorbrewer2.org. Information on the toolbox: https://cran.r-project.org/web/packages/RColorBrewer/index.html.
- kableExtra — This toolbox offers simple methods to create tables from tidy data structures. Information on the toolbox: https://github.com/haozhu233/kableExtra.
- ggplot2 — In the ggplot2 toolbox a system for the declarative creation of graphics is realised, based on 'The Grammar of Graphics'. Information on the toolbox: https://ggplot2.tidyverse.org/.
- ggstatsplot — This toolbox is an extension of ggplot2 that creates graphs and augments them with the results of statistical tests. Information on the toolbox: https://indrajeetpatil.github.io/ggstatsplot/.
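If any of these toolboxes are not yet available in the local Gnu R installation, they can be installed from CRAN, for example with:

# One-time installation of the toolboxes used in this post
install.packages(c("tidyverse", "haven", "rstatix", "RColorBrewer",
                   "kableExtra", "ggplot2", "ggstatsplot", "pander", "latex2exp"))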
Information about the hard- and software configuration of the computer on which this post was authored:
pander(sessionInfo())
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
locale: LC_COLLATE=German_Austria.utf8, LC_CTYPE=German_Austria.utf8, LC_MONETARY=German_Austria.utf8, LC_NUMERIC=C and LC_TIME=German_Austria.utf8
attached base packages: stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: latex2exp(v.0.9.6), pander(v.0.6.5), ggstatsplot(v.0.12.4), kableExtra(v.1.4.0), RColorBrewer(v.1.1-3), rstatix(v.0.7.2), haven(v.2.5.4), lubridate(v.1.9.3), forcats(v.1.0.0), stringr(v.1.5.1), dplyr(v.1.1.4), purrr(v.1.0.2), readr(v.2.1.5), tidyr(v.1.3.1), tibble(v.3.2.1), ggplot2(v.3.5.1) and tidyverse(v.2.0.0)
loaded via a namespace (and not attached): tidyselect(v.1.2.1), viridisLite(v.0.4.2), farver(v.2.1.2), statsExpressions(v.1.6.0), fastmap(v.1.2.0), TH.data(v.1.1-2), blogdown(v.1.19), bayestestR(v.0.14.0), digest(v.0.6.37), timechange(v.0.3.0), estimability(v.1.5.1), lifecycle(v.1.0.4), survival(v.3.6-4), magrittr(v.2.0.3), compiler(v.4.4.1), rlang(v.1.1.4), sass(v.0.4.9), tools(v.4.4.1), utf8(v.1.2.4), yaml(v.2.3.10), knitr(v.1.48), labeling(v.0.4.3), xml2(v.1.3.6), abind(v.1.4-8), multcomp(v.1.4-26), withr(v.3.0.1), grid(v.4.4.1), datawizard(v.0.13.0), fansi(v.1.0.6), xtable(v.1.8-4), colorspace(v.2.1-1), paletteer(v.1.6.0), emmeans(v.1.10.4), scales(v.1.3.0), MASS(v.7.3-60.2), zeallot(v.0.1.0), insight(v.0.20.5), cli(v.3.6.3), mvtnorm(v.1.3-1), rmarkdown(v.2.28), generics(v.0.1.3), rstudioapi(v.0.16.0), tzdb(v.0.4.0), parameters(v.0.22.2), cachem(v.1.1.0), splines(v.4.4.1), effectsize(v.0.8.9), vctrs(v.0.6.5), Matrix(v.1.7-0), sandwich(v.3.1-1), jsonlite(v.1.8.9), carData(v.3.0-5), bookdown(v.0.40), car(v.3.1-3), patchwork(v.1.3.0), hms(v.1.1.3), ggrepel(v.0.9.6), Formula(v.1.2-5), correlation(v.0.8.5), systemfonts(v.1.1.0), jquerylib(v.0.1.4), glue(v.1.8.0), rematch2(v.2.1.2), codetools(v.0.2-20), stringi(v.1.8.4), gtable(v.0.3.5), prismatic(v.1.1.2), munsell(v.0.5.1), pillar(v.1.9.0), htmltools(v.0.5.8.1), R6(v.2.5.1), evaluate(v.1.0.0), lattice(v.0.22-6), highr(v.0.11), backports(v.1.5.0), broom(v.1.0.7), bslib(v.0.8.0), Rcpp(v.1.0.13), svglite(v.2.1.3), coda(v.0.19-4.1), xfun(v.0.48), zoo(v.1.8-12) and pkgconfig(v.2.0.3)
Theoretical foundations of the \(t\)-test
Requirements and fundamental principles of the test
The \(t\)-test can be applied if, in both cohorts of a sample, the variable under consideration is normally distributed and the variances are equal.
In this test, the difference of the mean values of the two cohorts is analysed. Under the null hypothesis that the means are the same in both cohorts under consideration, this difference is equal to zero. A test statistic based on the \(t\)-distribution is used for the analysis; it is computed from the difference of the two sample means and measures how far this difference deviates from the value of zero expected under the null hypothesis.
Notation and calculation of the \(t\)-test statistic
The sample sizes of the two analysed cohorts are \(n_1\) and \(n_2\). The sample means are \(\bar{x}_1\) and \(\bar{x}_2\), the sample standard deviations are \(s_1\) and \(s_2\).
The first step is to estimate the pooled standard deviation \(s\)
$$ s = \sqrt{\frac{(n_1 - 1)\,s^2_1 + (n_2 - 1)\,s^2_2}{n_1 + n_2 - 2}}\,.$$
Then the test statistic is determined
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\,,$$
which follows a \(t\) distribution with \((n_1 + n_2 - 2)\) degrees of freedom. The value \(t\) calculated using the test statistic is compared with a value of the \(t\) distribution determined by the degrees of freedom \(df\) and the significance level \(\alpha\), see the following figure.
The respective values can be taken from tables in the books listed below or calculated with the help of the Gnu R function qt, e.g. for \(df = 304\) and \(\alpha = 0.05\):
df <- 304     # degrees of freedom
alpha <- 0.05 # level of significance
qt(1 - alpha / 2, df) # two-sided test -> alpha / 2
## [1] 1.967798
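As a worked check of the formulas above, the pooled standard deviation, the test statistic and the two-sided \(p\) value can also be computed directly from the rounded summary statistics in Table 1; the resulting test statistic should be close to the value reported in Table 4 (about 11.7):

# Summary statistics from Table 1 (rounded values)
n1 <- 172; xbar1 <- 39.045; s1 <- 2.350 # male
n2 <- 144; xbar2 <- 35.933; s2 <- 2.354 # female

# Pooled standard deviation
s <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Test statistic and degrees of freedom
t_value <- (xbar1 - xbar2) / (s * sqrt(1 / n1 + 1 / n2))
df <- n1 + n2 - 2

# Two-sided p value from the t distribution
p_value <- 2 * pt(-abs(t_value), df)
c(s = s, t = t_value, df = df, p = p_value)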
Further reading
Ott, R. L., & Longnecker, M. (2010). An introduction to statistical methods and data analysis (6th ed.). Brooks/Cole, Cengage Learning.
Petrie, A., & Sabin, C. (2005). Medical statistics at a glance (2nd ed.). Blackwell Publishing.
Rosner, B. (2016). Fundamentals of biostatistics (8th ed.). Cengage Learning.