Hypothesis test for paired samples

Uwe Graichen · uwe.graichen@kl.ac.at

Overview

In this blog post, we present the procedure for a hypothesis test when two paired samples with continuous values are analysed. We give two possible tests and the necessary conditions for their application. We will check step by step which conditions are met and perform a hypothesis test using a sample data set.

Hypothesis Testing for Paired Samples - Principles and an illustrative data analysis using Gnu R

Objective and scope of application

Paired sample hypothesis testing is used to determine whether there is a significant difference between two related groups. This type of analysis is often used when the same subjects are measured before and after a treatment, or when pairs are matched in a meaningful way. The scope of paired sample hypothesis testing includes medical studies, psychological research, and any scenario where repeated measures or matched subjects are analysed.

Motivating example

Based on data published by Drew (1951), in this blog post we analyse the influence of visuomotor tasks on the blink rate and determine whether the blink rate at rest differs significantly from the blink rate during a simple visuomotor task (following a straight line with a pencil). We use data from 12 subjects who performed this visuomotor task. The average number of blinks per minute was recorded during the task and, as a reference, under resting conditions.

Analysis script in Gnu R

Used Gnu R toolboxes

In addition to the core functionality of Gnu R, this analysis script uses several toolboxes for data import, data selection and pre-processing, data visualisation and data analysis. They are loaded with the following code fragment:

library(tidyverse)     # use of pipes and tibbles, data manipulation, pre-processing
library(labelled)      # handle data labels
library(gt)            # nicely formatted tables
library(ggplot2)       # plots and graphics (also attached by tidyverse)
library(qqplotr)       # Q-Q plot extension for ggplot2
library(rstatix)       # pipe-friendly statistics
library(PairedData)    # source of the sample data
library(effectsize)    # a large number of options for calculating effect sizes
library(pander)        # render R objects, here the session information
# summarytools must be installed as well; it is called below as summarytools::descr()

Data

Data origin and data specification

The data analysed in this blog post are provided by the Gnu R package PairedData and were originally published by Drew (1951). The data set contains the following variables:

  • Subject: respondent ID
  • Resting: blink rate per minute in the pre-experimental (resting) condition
  • Straight: blink rate per minute in the 1st condition, moving a pencil along a straight line
  • Oscillating: blink rate per minute in the 2nd condition, moving a pencil along an oscillating line

Data import

The first step is to import the Blink2 data set provided by the PairedData toolbox, using the data function. The imported data are stored in the object dataAnalysis.

# import the data set Blink2, provided by the toolbox PairedData
data("Blink2")

# assign the data to the object dataAnalysis
dataAnalysis <- Blink2

For reference, we determine the dimensions of the data set (number of rows and columns).

# number of data items (rows) and variables (columns)
dim(dataAnalysis)
## [1] 12  4

The data to be analyzed are presented in tabular form below. Three measurements (arranged in columns) were performed for each of the 12 subjects (arranged in rows). The blink rate is given in blinks per minute.

# output the data of dataAnalysis
dataAnalysis %>%
  # format the output using the command gt()
  gt()
Subject Resting Straight Oscillating
S01 28 19.0 19.3
S02 24 16.7 9.0
S03 23 2.7 1.1
S04 18 6.6 2.0
S05 17 12.0 1.9
S06 11 7.0 10.2
S07 10 6.0 1.9
S08 10 4.1 1.5
S09 6 3.0 0.5
S10 5 11.3 5.9
S11 4 5.9 4.5
S12 3 3.1 1.2

Selecting an appropriate hypothesis test

Data requirements for applying each hypothesis test

Based on the research question, a null hypothesis \(H_0\) and an alternative hypothesis \(H_a\) are formulated. A hypothesis test decides between them while keeping the probability of a type I error, i.e. rejecting \(H_0\) although it is true, at or below a threshold \(\alpha\). This threshold is called the significance level; \(\alpha = 0.05\) is commonly used.

Various hypothesis testing methods are available for interval or ratio scaled paired samples. Each method makes certain assumptions about the samples, and its results are only reliable if these assumptions are met.

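The following list gives common hypothesis testing procedures, their requirements on the data and the respective commands of the rstatix toolbox:

  • Student t-test for paired samples: the differences between the pairs are approximately normally distributed and contain no marked outliers; the pairs are independent of one another; rstatix command t_test() with paired = TRUE
  • Wilcoxon signed-rank test: the data are at least ordinally scaled; the pairs are independent of one another; the distribution of the differences is approximately symmetric; rstatix command wilcox_test() with paired = TRUE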

In the following, we use methods of exploratory data analysis and statistical tests to check which requirements are fulfilled and, hence, which statistical test can be used.

Distribution of differences between pairs, outliers in the differences between pairs

To gain an understanding of the data being analysed, we use visualisation techniques familiar from exploratory data analysis.

# use data selected for analysis
dataAnalysis %>%
  # select the columns Subject, Resting, Straight
  dplyr::select(Subject, Resting, Straight) %>%
  # create a new pivot table, with the columns 'Subject', 'condition' and 'eyeblinkrate'
  pivot_longer(
    # merge the two columns 'Resting' and 'Straight'
    cols = c(Resting, Straight),
    # former column headings 'Resting' and 'Straight' as labels in the new column 'condition'
    names_to = "condition",
    # values of 'Resting' and 'Straight' in the new column 'eyeblinkrate'
    values_to = "eyeblinkrate"
  ) %>%
  # initialize ggplot object, x-axis 'condition', y-axis 'eyeblinkrate'
  ggplot(aes(x=condition, y=eyeblinkrate)) +
  # add a violin plot, with a semi-transparent background
  geom_violin(alpha=0.2) +
  # add a small box plot
  geom_boxplot(width=0.1, color="darkgrey", alpha=0.2, linewidth=.7) +
  # add points for each measurement, use different colors for conditions
  geom_point(aes(fill=condition, group=Subject), size=4, shape=21, alpha=0.5) +
  # connect paired measurements by dashed lines
  geom_line(aes(group=Subject), linetype=2, linewidth=0.7) +
  # use color palette 'Dark2' from https://colorbrewer2.org
  scale_fill_brewer(palette="Dark2") +
  # set x-, y-label and legend heading
  labs(x = "Condition", y = "Eye blink rate per minute", fill = "Condition") +
  # select theme for the plot
  theme_bw()

The measurements of the subjects in the two conditions ‘Resting’ and ‘Straight’ are shown as coloured dots. Dots from the same subject are connected by dashed lines. The distribution of the data within each condition is visualised by box plots and violin plots.

A closer look at the visualised data shows that the blink rate during the visuomotor task decreased compared to the resting state in nine of the twelve subjects (dashed lines sloping downward from left to right). It increased in two subjects and remained almost unchanged in one subject. In the further analysis we investigate whether this decrease in blink rate during the visuomotor task is significant compared to the resting state.

For paired data, an appropriate hypothesis test is chosen based on the properties of the difference between the pairs of data. Therefore, we first calculate this difference and output the table with the additional column containing these differences (column heading ‘Res - Str’).

# create a new data set containing the difference
dataAnalysisDiff <- dataAnalysis %>%
  # select variables Subject, Resting, Straight
  dplyr::select(Subject, Resting, Straight) %>%
  # compute the difference and store it in the column DiffRS
  mutate(DiffRS = Resting - Straight)

# assign the label 'Res - Str' to the new column DiffRS
var_label(dataAnalysisDiff$DiffRS) <- 'Res - Str'

# output the data set 'dataAnalysisDiff'
dataAnalysisDiff %>%
  gt()
Subject Resting Straight Res - Str
S01 28 19.0 9.0
S02 24 16.7 7.3
S03 23 2.7 20.3
S04 18 6.6 11.4
S05 17 12.0 5.0
S06 11 7.0 4.0
S07 10 6.0 4.0
S08 10 4.1 5.9
S09 6 3.0 3.0
S10 5 11.3 -6.3
S11 4 5.9 -1.9
S12 3 3.1 -0.1

In the next step, we visualise the blink rates at rest and during the simple visual-motor task, as well as the difference between them, using bar plots.

# use data selected for analysis, including the differences
dataAnalysisDiff %>%
  # select the columns Subject, Resting, Straight, DiffRS
  dplyr::select(Subject, Resting, Straight, DiffRS) %>%
  # reshape to a long table with the columns 'Subject', 'Condition' and 'Values'
  pivot_longer(
    # merge the three columns 'Resting', 'Straight' and 'DiffRS'
    cols = c(Resting, Straight, DiffRS),
    # former column headings as labels in the new column 'Condition'
    names_to = "Condition",
    # values of 'Resting', 'Straight' and 'DiffRS' in the new column 'Values'
    values_to = "Values"
  ) %>%
  # order the conditions for the plot
  mutate(Condition=factor(Condition, levels=c("Resting", "Straight", "DiffRS"),
                          ordered=FALSE)) %>%
  # initialize ggplot object, x-axis 'Subject', y-axis 'Values'
  ggplot(aes(x=Subject, y=Values, fill=Condition)) +
  # add a bar plot, bars of each subject side by side
  geom_bar(stat="identity", color="black", width=0.7, position=position_dodge()) +
  # set y-label and legend heading
  labs(y = "Eye blink rate per minute", fill = "Condition") +
  theme_bw()

The properties of the differences are relevant for selecting an appropriate hypothesis test. These are shown as blue bars in the plot above. Some statistical parameters of these differences are given below.

dataAnalysisDiff %>%
  # select the column DiffRS
  dplyr::select(DiffRS) %>%
  # compute descriptive statistics, measures of central tendency and dispersion
  summarytools::descr(
    stats = c("mean", "sd", "min", "q1", "med", "q3", "max", "iqr"),
    transpose = TRUE) %>%
  # output as table
  gt() %>%
  # format all numeric values with 2 decimals
  fmt_number(
    decimals = 2
  )
Mean Std.Dev Min Q1 Median Q3 Max IQR
5.13 6.77 −6.30 1.45 4.50 8.15 20.30 5.50
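The table reports an IQR of 5.50, although Q3 − Q1 = 8.15 − 1.45 = 6.70. A minimal base R sketch suggests the reason: the reported quartiles are reproduced by R's quantile type 2, while the IQR matches stats::IQR with its default type 7, the definition ggplot2 box plots also use. That summarytools uses exactly these definitions is our inference from the numbers, not a documented statement; the vector d simply re-enters the ‘Res - Str’ column of the table above.

```r
# Differences Resting - Straight of the 12 subjects, copied from the table above
d <- c(9.0, 7.3, 20.3, 11.4, 5.0, 4.0, 4.0, 5.9, 3.0, -6.3, -1.9, -0.1)

# Quartiles with quantile type 2 (averaging at discontinuities)
q_type2 <- quantile(d, probs = c(0.25, 0.75), type = 2, names = FALSE)

# Quartiles with R's default quantile type 7 (linear interpolation)
q_type7 <- quantile(d, probs = c(0.25, 0.75), type = 7, names = FALSE)

print(q_type2)        # 1.45 8.15, the Q1 and Q3 of the table
print(diff(q_type2))  # 6.7
print(q_type7)        # 2.225 7.725
print(IQR(d))         # 5.5, the IQR of the table (IQR() uses type 7)
```

Both definitions are legitimate; with only 12 values they simply differ noticeably, so it is worth stating which one a table uses.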

First, let’s look at outliers in the differences. Following the convention of box plots and of the rstatix toolbox, values more than 1.5 × IQR below the first quartile or above the third quartile (i.e. beyond the maximum reach of the whiskers) are outliers; values more than 3 × IQR beyond the quartiles are extreme outliers. Outliers can be spotted visually in a box plot. We use a box plot and the identify_outliers command from the rstatix toolbox.

# use data selected for analysis, including the differences
dataAnalysisDiff %>%
  # initialize ggplot object, x-axis "", y-axis 'DiffRS'
  ggplot(aes(x="", y=DiffRS)) +
  # add a box plot
  geom_boxplot(linewidth=1) +
  # add points for the differences of each subject
  geom_point(fill="green", size=4, shape=21, alpha=0.5) +
  # set x-, y-label
  labs(x = "Difference between conditions Resting and Straight (Resting - Straight)",
       y = "Diff Eye blink rate per minute") +
  # select theme for the plot
  theme_bw()

In the visualisation above we can identify two outliers.

Using the identify_outliers command of the rstatix toolbox, the outliers can also be assigned to the subjects and graded as outliers or extreme outliers, see the following chunk of code.

# use data selected for analysis, including the differences
dataAnalysisDiff %>%
  # select subject IDs and differences
  dplyr::select(Subject, 'DiffRS') %>%
  # find the outliers
  identify_outliers('DiffRS') %>%
  # output the result as table
  gt()
Subject Res - Str is.outlier is.extreme
S03 20.3 TRUE FALSE
S10 -6.3 TRUE FALSE
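The grading can be reproduced by hand. A minimal base R sketch, assuming the rule described in the rstatix documentation for identify_outliers: values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are outliers, values below Q1 − 3 × IQR or above Q3 + 3 × IQR are extreme. The quartiles use R's default quantile definition (type 7), and the differences are re-entered from the table above.

```r
# Differences Resting - Straight, copied from the table above, named by subject
d <- c(9.0, 7.3, 20.3, 11.4, 5.0, 4.0, 4.0, 5.9, 3.0, -6.3, -1.9, -0.1)
names(d) <- sprintf("S%02d", seq_along(d))

q <- quantile(d, probs = c(0.25, 0.75), names = FALSE)  # Q1 and Q3 (type 7)
iqr <- q[2] - q[1]

# Fences at 1.5 x IQR (outliers) and 3 x IQR (extreme outliers)
is_outlier <- d < q[1] - 1.5 * iqr | d > q[2] + 1.5 * iqr
is_extreme <- d < q[1] - 3 * iqr | d > q[2] + 3 * iqr

print(data.frame(DiffRS = d, is.outlier = is_outlier, is.extreme = is_extreme)[is_outlier, ])
```

With Q1 = 2.225 and Q3 = 7.725 the outlier fences lie at −6.025 and 15.975, so S03 (20.3) and S10 (−6.3) are flagged; the extreme fences at −14.275 and 24.225 are not crossed.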

The next step is to check whether the differences are normally distributed. Both an exploratory method (Q-Q plot) and a statistical test (Shapiro-Wilk test) are available for this purpose.

Let’s start with the exploratory approach, the Q-Q plot. In a Q-Q plot, the quantiles of the empirical distribution of the data (y-axis) are plotted against the theoretical quantiles of the normal distribution (x-axis). If the data are normally distributed, the points lie approximately on a straight line. In the Q-Q plot below, we have inserted a reference line on which the points would lie for normally distributed data, and highlighted the 95% confidence band in light blue.

# use data selected for analysis
dataAnalysisDiff %>%
  # initialize ggplot object for a Q-Q plot of the differences 'DiffRS'
  ggplot(aes(sample = DiffRS)) +
  # Q-Q plot of the differences
  stat_qq_point() +
  # insert reference line for normal distribution
  stat_qq_line(distribution = "norm") +
  # add a 95% confidence band
  stat_qq_band(conf = 0.95, alpha = 0.5, fill = "lightblue", color = "black") +
  # set axis labels
  labs(x="Normal theoretical quantiles",
       y="Difference eye blink rate, resting - straight, per minute") +
  # set theme
  theme_bw()

Seven of the twelve differences lie very close to the reference line for a normal distribution; the values in the tails of the distribution deviate more. Eleven values lie within the 95% confidence band and one lies outside it.
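To see what a Q-Q plot displays, its coordinates can be computed directly. A minimal base R sketch using qqnorm(), which pairs the sorted data with normal quantiles at the plotting positions ppoints(n); we assume qqplotr uses comparable plotting positions, so the sketch shows the principle rather than the exact figure. For normal data the points scatter around a line with intercept close to the mean and slope close to the standard deviation, which is why the reference line is generally not the diagonal.

```r
# Differences Resting - Straight, copied from the table above
d <- c(9.0, 7.3, 20.3, 11.4, 5.0, 4.0, 4.0, 5.9, 3.0, -6.3, -1.9, -0.1)

# Theoretical normal quantiles (x) paired with the observed values (y)
qq <- qqnorm(d, plot.it = FALSE)

# The same theoretical quantiles, computed explicitly from the plotting positions
theoretical <- qnorm(ppoints(length(d)))

# Sorted pairs: these are the points of the Q-Q plot
qq_points <- data.frame(theoretical = sort(qq$x), sample = sort(qq$y))
print(round(qq_points, 2))

# Under normality the points scatter around y = mean(d) + sd(d) * x
print(c(intercept = mean(d), slope = sd(d)))
```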

In addition to visual approaches, statistical tests such as the Shapiro-Wilk test can be used to check whether data are normally distributed. The null hypothesis \(H_0\) of the Shapiro-Wilk test is that the data come from a normally distributed population; the alternative hypothesis \(H_a\) is that they do not.

# use data selected for analysis including differences
dataAnalysisDiff %>%
  # check by means of the Shapiro-Wilk test whether the differences are normally distributed
  shapiro_test(DiffRS) %>%
  # add star coding for significance
  add_significance("p") %>%
  # output as table
  gt()
variable statistic p p.signif
DiffRS 0.9605808 0.7920899 ns

The \(p\) value of the Shapiro-Wilk test (0.79) is greater than 0.05, so the null hypothesis \(H_0\) of a normal distribution is not rejected: the differences are consistent with a normal distribution. Note that with only 12 observations the test has limited power to detect departures from normality.
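The shapiro_test command of rstatix is a pipe-friendly wrapper around base R's stats::shapiro.test, so the result can be cross-checked without additional packages. A minimal sketch, re-entering the differences from the table above:

```r
# Differences Resting - Straight, copied from the table above
d <- c(9.0, 7.3, 20.3, 11.4, 5.0, 4.0, 4.0, 5.9, 3.0, -6.3, -1.9, -0.1)

sw <- shapiro.test(d)
alpha <- 0.05

print(sw$statistic)  # W
print(sw$p.value)

# H0: the differences come from a normally distributed population
if (sw$p.value > alpha) {
  message("H0 not rejected: no evidence against normality")
} else {
  message("H0 rejected: the differences deviate from normality")
}
```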

Summary and selection of the statistical test

The exploratory review of the data led to the following conclusion:

  • The blink rate data (per minute) are ratio scaled.
  • The Shapiro-Wilk normality test did not reject normality of the differences, and the Q-Q plot shows only moderate deviations in the tails.
  • The sample size is small (12 subjects), and the differences contain two outliers.

Although the normality check does not rule out the paired t-test, the small sample and the outliers lead us to choose the more robust Wilcoxon signed-rank test for hypothesis testing.

Performing the statistical test

In the exploratory data analysis, we observed that the blink rate tends to decrease during the visuomotor task compared to the resting state. We use the Wilcoxon signed-rank test for paired data to check whether this decrease is significant. Its hypotheses are formulated in terms of the differences between the two conditions, resting minus straight:

  • \(H_0\): The median of the differences in blink rate, ‘resting’ minus ‘straight’, is less than or equal to zero.
  • \(H_a\): The median of the differences in blink rate, ‘resting’ minus ‘straight’, is greater than zero.

The Wilcoxon signed-rank test is then performed. Two remarks:

  • The reference in our test is the resting condition, see the ‘ref.group’ parameter in the command; the differences are therefore computed as Resting − Straight.
  • In the command wilcox_test we have to specify the alternative hypothesis \(H_a\) with the parameter ‘alternative’.

# use data selected for analysis
resWilcoxTest <- dataAnalysis %>%
  # select the columns Subject, Resting, Straight
  dplyr::select(Subject, Resting, Straight) %>%
  # create a new pivot table, with the columns 'Subject', 'condition' and 'eyeblinkrate'
  pivot_longer(
    # merge the two columns 'Resting' and 'Straight'
    cols = c(Resting, Straight),
    # former column headings 'Resting' and 'Straight' as labels in the new column 'condition'
    names_to = "condition",
    # values of 'Resting' and 'Straight' in the new column 'eyeblinkrate'
    values_to = "eyeblinkrate"
  ) %>%
  # performing Wilcoxon signed-rank test
  wilcox_test(
    # compare eye blink rate of the conditions 'Resting' and 'Straight'
    eyeblinkrate ~ condition,
    # paired values
    paired = TRUE,
    # reference is the condition 'Resting'
    ref.group = "Resting",
    # alternative hypothesis is greater
    alternative = "greater",
    # detailed output of test results
    detailed = TRUE) %>%
  # add significance star code
  add_significance()

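The results of the test are shown in the table below. The column estimate contains the (pseudo)median estimate of the differences computed by wilcox.test, which is why it differs from the sample median of the differences (4.50).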

resWilcoxTest %>%
  gt() %>%
  # format the columns estimate, p, conf.low and conf.high with 3 decimals
  fmt_number(
    columns = c(estimate, p, conf.low, conf.high),
    decimals = 3
  ) %>%
  # output '<0.001' for values smaller than 0.001
  sub_small_vals(threshold = 0.001)
estimate .y. group1 group2 n1 n2 statistic p conf.low conf.high method alternative p.signif
4.817 eyeblinkrate Resting Straight 12 12 67 0.015 1.550 Inf Wilcoxon greater *

The result of the hypothesis test can also be output as a string using the get_test_label command from the rstatix toolbox, as shown below.

resWilcoxTest %>%
  get_test_label(detailed = TRUE, type = "text")
## [1] "Wilcoxon test, V = 67, p = 0.015, n = 12"
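The same one-sided test can be run with base R's stats::wilcox.test, on which the rstatix command builds. A minimal sketch with the data re-entered from the table: for two paired vectors x and y, wilcox.test tests the differences x − y, so passing the resting values first corresponds to ref.group = "Resting" above. Because two differences are tied, R cannot compute an exact p-value; we therefore request the normal approximation with continuity correction explicitly (exact = FALSE), which R otherwise falls back to with a warning.

```r
# Blink rates per minute of the 12 subjects, copied from the data table
resting  <- c(28, 24, 23, 18, 17, 11, 10, 10, 6, 5, 4, 3)
straight <- c(19.0, 16.7, 2.7, 6.6, 12.0, 7.0, 6.0, 4.1, 3.0, 11.3, 5.9, 3.1)

# One-sided paired Wilcoxon signed-rank test on resting - straight
# H0: median difference <= 0, Ha: median difference > 0
wt <- wilcox.test(resting, straight, paired = TRUE,
                  alternative = "greater",
                  exact = FALSE,    # ties present: use the normal approximation
                  correct = TRUE)   # with continuity correction (R's default)

print(wt$statistic)  # V = 67
print(wt$p.value)    # about 0.0155
```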

Finally, we determine the effect size of this statistical test. The effect size quantifies the magnitude of an empirical effect and indicates the practical relevance of a statistically significant result. For the Wilcoxon signed-rank test we calculate the effect size \(r = Z/\sqrt{N}\), where \(Z\) is the standardised test statistic and \(N\) the number of pairs. We use the wilcox_effsize command of the rstatix toolbox and output the result as a table using gt.

# use data selected for analysis
dataAnalysis %>%
  # select the columns Subject, Resting, Straight
  dplyr::select(Subject, Resting, Straight) %>%
  # create a new pivot table, with the columns 'Subject', 'condition' and 'eyeblinkrate'
  pivot_longer(
    # merge the two columns 'Resting' and 'Straight'
    cols = c(Resting, Straight),
    # former column headings 'Resting' and 'Straight' as labels in the new column 'condition'
    names_to = "condition",
    # values of 'Resting' and 'Straight' in the new column 'eyeblinkrate'
    values_to = "eyeblinkrate"
  ) %>%
  # estimate the effect size
  wilcox_effsize(
    # compare eye blink rate of the conditions 'Resting' and 'Straight'
    eyeblinkrate ~ condition,
    # paired values
    paired = TRUE,
    # reference is the condition 'Resting'
    ref.group = "Resting",
    # alternative hypothesis is greater
    alternative = "greater") %>%
  # output as table
  gt()
.y. group1 group2 effsize n1 n2 magnitude
eyeblinkrate Resting Straight 0.6343192 12 12 large
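The value of \(r\) can be traced back to the test statistic. A minimal base R sketch, assuming that \(Z\) is the normal approximation of the signed-rank statistic without continuity correction and with the tie-corrected variance (sum of squared ranks divided by 4), and that \(N\) is the number of pairs; under these assumptions it reproduces \(r \approx 0.634\) from the table above. The thresholds 0.1, 0.3 and 0.5 for small, moderate and large effects are the ones commonly cited for \(r\); the label "negligible" below 0.1 is our own choice.

```r
# Blink rates per minute of the 12 subjects, copied from the data table
resting  <- c(28, 24, 23, 18, 17, 11, 10, 10, 6, 5, 4, 3)
straight <- c(19.0, 16.7, 2.7, 6.6, 12.0, 7.0, 6.0, 4.1, 3.0, 11.3, 5.9, 3.1)

d <- resting - straight
d <- d[d != 0]                   # zero differences are discarded
n <- length(d)                   # number of pairs N

ranks <- rank(abs(d))            # mid-ranks for tied absolute differences
v <- sum(ranks[d > 0])           # signed-rank statistic V = T+

mu    <- n * (n + 1) / 4         # expected value of V under H0
sigma <- sqrt(sum(ranks^2) / 4)  # standard deviation of V under H0, tie-corrected
z <- (v - mu) / sigma            # no continuity correction

r <- abs(z) / sqrt(n)
magnitude <- cut(r, breaks = c(0, 0.1, 0.3, 0.5, Inf),
                 labels = c("negligible", "small", "moderate", "large"),
                 right = FALSE)

print(c(V = v, z = z, r = r))
print(as.character(magnitude))   # "large"
```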

Reporting the results of the statistical analysis

In our study we analysed the influence of visual-motor tasks on the eye blink rate and investigated whether there is a significant difference between the eye blink rate in the resting state and the eye blink rate during a simple visual-motor task. We retrospectively analysed the data of twelve subjects whose eye blink rate was measured under different conditions, at rest and during a simple visual-motor task (straight line).

As we were analysing paired data, we examined the characteristics of the differences between the two conditions in order to select an appropriate hypothesis test. A Shapiro-Wilk normality test showed no evidence of non-normality \((W = 0.96, p = 0.79)\). In the Q-Q plot, one data point lay outside the 95% confidence band. The sample size is small \((n = 12)\), and the differences between the two conditions contain two outliers. Based on this preliminary analysis, we chose the Wilcoxon signed-rank test to test the null hypothesis \(H_0\): the median difference in blink rate, resting state minus visuomotor task, is less than or equal to zero.

The test rejected the null hypothesis in favour of the alternative hypothesis \((V_{\mathrm{Wilcoxon}} = 67\), \(p = 0.0155\), effect size \(r = 0.634)\). We conclude that the blink rate is significantly reduced during simple visual-motor tasks compared to the resting state.

# create the line with statistical information inserted at the top of the figure
statExp <- expression(paste(italic("V")[Wilcoxon], "", " = ", "67, ",
                            italic("p"), " = ", "0.015", "",
                            paste(", ", italic("r"), " = ", "0.634"),
                            paste(", ", italic("n")[pairs], " = ", "12")))

# use data selected for analysis
dataAnalysis %>%
  # select the columns Subject, Resting, Straight
  dplyr::select(Subject, Resting, Straight) %>%
  # create a new pivot table, with the columns 'Subject', 'condition' and 'eyeblinkrate'
  pivot_longer(
    # merge the two columns 'Resting' and 'Straight'
    cols = c(Resting, Straight),
    # former column headings 'Resting' and 'Straight' as labels in the new column 'condition'
    names_to = "condition",
    # values of 'Resting' and 'Straight' in the new column 'eyeblinkrate'
    values_to = "eyeblinkrate"
  ) %>%
  # initialize ggplot object, x-axis 'condition', y-axis 'eyeblinkrate'
  ggplot(aes(x=condition, y=eyeblinkrate)) +
  # add a violin plot, with a semi-transparent background
  geom_violin(alpha=0.2) +
  # add a small box plot
  geom_boxplot(width=0.1, color="darkgrey", alpha=0.2, linewidth=.7) +
  # add points for each measurement, use different colors for conditions
  geom_point(aes(fill=condition, group=Subject), size=4, shape=21, alpha=0.5) +
  # connect paired measurements by dashed lines
  geom_line(aes(group=Subject), linetype=2, linewidth=0.7) +
  # use color palette 'Dark2' from https://colorbrewer2.org
  scale_fill_brewer(palette="Dark2") +
  # set x-, y-label and legend heading
  labs(x = "Condition", y = "Eye blink rate per minute", fill = "Condition") +
  # add the statistical information at the top of the plot
  annotate("text", 1.5, 31, label = statExp, parse = TRUE, size = 4) +
  # select theme for the plot
  theme_bw()

Gnu R toolboxes that we used

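We used a number of toolboxes for the statistical analysis of the data and for visualising results and interim results. Below is a list of these toolboxes with a brief description:

  • tidyverse: pipes and tibbles, data manipulation and pre-processing
  • labelled: handling of variable labels
  • gt: formatted tables
  • ggplot2: plots and graphics
  • qqplotr: Q-Q plot extension for ggplot2
  • rstatix: pipe-friendly statistical tests and effect sizes
  • PairedData: source of the sample data (Blink2)
  • effectsize: a large number of options for calculating effect sizes
  • summarytools: descriptive statistics (called as summarytools::descr)
  • pander: rendering of R objects, here the session information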

Information about the hard- and software configuration of the computer on which this post was authored:

pander(sessionInfo())

R version 4.4.0 (2024-04-24 ucrt)

Platform: x86_64-w64-mingw32/x64

locale: LC_COLLATE=German_Austria.utf8, LC_CTYPE=German_Austria.utf8, LC_MONETARY=German_Austria.utf8, LC_NUMERIC=C and LC_TIME=German_Austria.utf8

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: pander(v.0.6.5), effectsize(v.0.8.8), PairedData(v.1.1.1), lattice(v.0.22-6), mvtnorm(v.1.2-5), gld(v.2.6.6), MASS(v.7.3-60.2), rstatix(v.0.7.2), qqplotr(v.0.0.6), gt(v.0.10.1), labelled(v.2.13.0), lubridate(v.1.9.3), forcats(v.1.0.0), stringr(v.1.5.1), dplyr(v.1.1.4), purrr(v.1.0.2), readr(v.2.1.5), tidyr(v.1.3.1), tibble(v.3.2.1), ggplot2(v.3.5.1) and tidyverse(v.2.0.0)

loaded via a namespace (and not attached): bitops(v.1.0-7), tcltk(v.4.4.0), sandwich(v.3.1-0), rlang(v.1.1.3), magrittr(v.2.0.3), multcomp(v.1.4-25), qqconf(v.1.3.2), matrixStats(v.1.3.0), e1071(v.1.7-14), compiler(v.4.4.0), reshape2(v.1.4.4), pbmcapply(v.1.5.1), vctrs(v.0.6.5), summarytools(v.1.0.1), pkgconfig(v.2.0.3), fastmap(v.1.2.0), magick(v.2.8.3), backports(v.1.4.1), labeling(v.0.4.3), caTools(v.1.18.2), utf8(v.1.2.4), rmarkdown(v.2.27), tzdb(v.0.4.0), pracma(v.2.4.4), haven(v.2.5.4), modeltools(v.0.2-23), xfun(v.0.44), cachem(v.1.1.0), jsonlite(v.1.8.8), highr(v.0.10), pryr(v.0.1.6), broom(v.1.0.6), parallel(v.4.4.0), R6(v.2.5.1), coin(v.1.4-3), bslib(v.0.7.0), stringi(v.1.8.4), RColorBrewer(v.1.1-3), car(v.3.1-2), jquerylib(v.0.1.4), estimability(v.1.5.1), Rcpp(v.1.0.12), bookdown(v.0.39), iterators(v.1.0.14), knitr(v.1.46), zoo(v.1.8-12), base64enc(v.0.1-3), parameters(v.0.21.7), Matrix(v.1.7-0), splines(v.4.4.0), timechange(v.0.3.0), tidyselect(v.1.2.1), rstudioapi(v.0.16.0), abind(v.1.4-5), yaml(v.2.3.8), doParallel(v.1.0.17), codetools(v.0.2-20), blogdown(v.1.19), plyr(v.1.8.9), opdisDownsampling(v.1.0.1), withr(v.3.0.0), bayestestR(v.0.13.2), coda(v.0.19-4.1), evaluate(v.0.23), survival(v.3.5-8), proxy(v.0.4-27), xml2(v.1.3.6), pillar(v.1.9.0), carData(v.3.0-5), stats4(v.4.4.0), checkmate(v.2.3.1), foreach(v.1.5.2), insight(v.0.19.11), generics(v.0.1.3), hms(v.1.1.3), munsell(v.0.5.1), scales(v.1.3.0), xtable(v.1.8-4), class(v.7.3-22), glue(v.1.7.0), emmeans(v.1.10.2), lmom(v.3.0), tools(v.4.4.0), robustbase(v.0.99-2), rapportools(v.1.1), grid(v.4.4.0), libcoin(v.1.0-10), datawizard(v.0.10.0), colorspace(v.2.1-0), cli(v.3.6.2), twosamples(v.2.0.1), fansi(v.1.0.6), gtable(v.0.3.5), DEoptimR(v.1.1-3), sass(v.0.4.9), digest(v.0.6.35), TH.data(v.1.1-2), farver(v.2.1.2), htmltools(v.0.5.8.1) and lifecycle(v.1.0.4)

Theoretical foundations of paired sample hypothesis tests

Paired samples

Two samples are said to be paired if each data point in the first sample is uniquely assigned to a data point in the second sample. In medical research, paired samples occur, for example, in intervention studies. Data is collected from the same subject before and after an intervention.

Suppose we have collected data from \(n\) subjects. The two measurements of the \(i\)-th subject are labelled \(y_{1i}\) and \(y_{2i}\), and the \(i\)-th difference between the two measurements is \(d_i = y_{1i} - y_{2i}\).

Student \(t\)-test for paired samples

Prerequisites

The prerequisites for using the \(t\)-test for paired samples are:

  • The differences \(d_i\) between the paired measurements come from an (approximately) normally distributed population.
  • The \(n\) pairs of measured values are independent of one another.

If the requirements are met, we determine the sample mean \(\bar{d}\) and the sample standard deviation \(sd\) of \(d_i\). To formulate the hypotheses, we use the difference between the population means: \(\mu_d = \mu_1 - \mu_2\).

Hypotheses

The null and alternative hypotheses for the one-sided and two-sided variants of the paired samples \(t\)-test are as follows:

  1. Two-sided test (two-tailed test)
    • \(H_0\): \(\mu_d = 0\)
    • \(H_a\): \(\mu_d \ne 0\)
  2. One-sided test (one-tailed test), case 1
    • \(H_0\): \(\mu_d \le 0\)
    • \(H_a\): \(\mu_d > 0\)
  3. One-sided test (one-tailed test), case 2
    • \(H_0\): \(\mu_d \ge 0\)
    • \(H_a\): \(\mu_d < 0\)

Test statistic

Using the sample mean \(\bar{d}\) and the sample standard deviation \(sd\) of \(d_i\), we calculate the test statistic as the quotient of the sample mean of the differences of the measured pairs and the standard error of the mean $$t = \frac{\bar{d}}{sd/\sqrt{n}}\,,$$ with the standard error of the mean \(SEM = sd/\sqrt{n}\).
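The formula can be checked numerically. A minimal base R sketch that applies it to the resting and straight blink rates of the example (for illustration only; the analysis above used the Wilcoxon test) and compares the result with stats::t.test:

```r
# Blink rates per minute of the 12 subjects, copied from the data table
resting  <- c(28, 24, 23, 18, 17, 11, 10, 10, 6, 5, 4, 3)
straight <- c(19.0, 16.7, 2.7, 6.6, 12.0, 7.0, 6.0, 4.1, 3.0, 11.3, 5.9, 3.1)

d <- resting - straight           # pairwise differences d_i
n <- length(d)

d_bar  <- mean(d)                 # sample mean of the differences
sem    <- sd(d) / sqrt(n)         # standard error of the mean, SEM = sd / sqrt(n)
t_stat <- d_bar / sem             # t = d_bar / (sd / sqrt(n))

# Reference value from base R's paired t-test
t_ref <- unname(t.test(resting, straight, paired = TRUE)$statistic)

print(c(manual = t_stat, t.test = t_ref))  # both about 2.63
```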

Rejection region

For a specified probability \(\alpha\) of a type I error and \(df = n - 1\) degrees of freedom, we determine the critical value \(t_{\alpha}\), the upper \(\alpha\) quantile of the \(t\) distribution, i.e. \(P(T \ge t_{\alpha}) = \alpha\). We then compare \(t\) with \(t_{\alpha}\). The null hypothesis \(H_0\) (see corresponding hypotheses above) is rejected if:

  1. Two-sided test (two-tailed test): \(|t| \ge t_{\alpha/2}\)
  2. One-sided test (one-tailed test), case 1: \(t \ge t_{\alpha}\)
  3. One-sided test (one-tailed test), case 2: \(t \le - t_{\alpha}\)
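The three rejection rules translate directly into code. A minimal sketch; the helper reject_paired_t() is hypothetical (not part of any package) and obtains the critical value \(t_{\alpha}\) as the upper \(\alpha\) quantile of the \(t\) distribution with qt(1 - alpha, df).

```r
# Hypothetical helper: is H0 of the paired t-test rejected?
reject_paired_t <- function(t, n, alpha = 0.05,
                            alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  df <- n - 1
  switch(alternative,
         two.sided = abs(t) >= qt(1 - alpha / 2, df),  # |t| >= t_{alpha/2}
         greater   = t >= qt(1 - alpha, df),           # case 1: t >= t_alpha
         less      = t <= -qt(1 - alpha, df))          # case 2: t <= -t_alpha
}

# Example: differences resting - straight from the blink-rate data
d <- c(9.0, 7.3, 20.3, 11.4, 5.0, 4.0, 4.0, 5.9, 3.0, -6.3, -1.9, -0.1)
n <- length(d)
t_stat <- mean(d) / (sd(d) / sqrt(n))

print(qt(1 - 0.05, n - 1))                                  # t_alpha for df = 11
print(reject_paired_t(t_stat, n, alternative = "greater"))  # TRUE
```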

Wilcoxon signed rank test

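Prerequisites

The prerequisites for using the Wilcoxon signed rank test are:

  • The sample data is at least ordinally scaled.
  • The \(n\) pairs of measured values are independent of one another.
  • For the hypotheses to be statements about the median, the distribution of the differences \(d_i\) is symmetric.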

If the requirements are met, we determine some values using the differences between the measurements \(d_i\):

  • the number \(n^{\prime}\) of pairs of observations with a non-zero difference \(d_i \ne 0\); pairs with \(d_i = 0\) are discarded
  • the ranks of the \(n^{\prime}\) non-zero differences by their absolute values \(|d_i|\), from \(1\) to \(n^{\prime}\); tied absolute values receive the mean of the ranks they occupy
  • \(T_{+}\), the sum of the ranks of the positive differences; if there are no positive differences, \(T_{+} = 0\)
  • \(T_{-}\), the sum of the ranks of the negative differences; if there are no negative differences, \(T_{-} = 0\).
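For the blink-rate example these quantities can be computed in a few lines of base R. A minimal sketch with the data re-entered from the table; rank() assigns mid-ranks to tied absolute differences (here the two differences of 4.0).

```r
# Blink rates per minute of the 12 subjects, copied from the data table
resting  <- c(28, 24, 23, 18, 17, 11, 10, 10, 6, 5, 4, 3)
straight <- c(19.0, 16.7, 2.7, 6.6, 12.0, 7.0, 6.0, 4.1, 3.0, 11.3, 5.9, 3.1)

d <- resting - straight
d <- d[d != 0]                    # keep only non-zero differences
n_prime <- length(d)              # n'

ranks   <- rank(abs(d))           # ranks 1..n' of |d_i|, mid-ranks for ties
T_plus  <- sum(ranks[d > 0])      # sum of ranks of positive differences
T_minus <- sum(ranks[d < 0])      # sum of ranks of negative differences

print(c(n_prime = n_prime, T_plus = T_plus, T_minus = T_minus))  # 12, 67, 11
```

As a check, \(T_{+} + T_{-} = n^{\prime}(n^{\prime} + 1)/2 = 78\); \(T_{+} = 67\) is the statistic \(V\) reported by wilcox_test.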

Hypotheses

The median difference \(M\) in the population is used to formulate the hypotheses. The null and alternative hypotheses for the one-sided and two-sided variants of the Wilcoxon signed rank test are as follows:

  1. Two-sided test (two-tailed test)
    • \(H_0\): \(M = 0\)
    • \(H_a\): \(M \ne 0\)
  2. One-sided test (one-tailed test), case 1
    • \(H_0\): \(M \le 0\)
    • \(H_a\): \(M > 0\)
  3. One-sided test (one-tailed test), case 2
    • \(H_0\): \(M \ge 0\)
    • \(H_a\): \(M < 0\)

Case distinction, \(n^{\prime} \le 25\)

Test statistic

For the test statistic \(T\), use the following values:

  1. Two-sided test (two-tailed test): \(T = \min(T_-, T_+)\)
  2. One-sided test (one-tailed test), case 1: \(T = T_-\)
  3. One-sided test (one-tailed test), case 2: \(T = T_+\)

Rejection region

The \(H_0\) hypothesis is rejected for a given probability \(\alpha\) of a Type I error and a number \(n^{\prime} \le 25\) of pairs of observations with a non-zero difference if \(T\) is less than or equal to the corresponding entry in a table, e.g. (Ott & Longnecker, 2010, Table 6).
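Instead of a table lookup, the critical value can be derived from the exact null distribution of the signed-rank statistic, which base R provides as psignrank(). A minimal sketch; the helper signrank_critical() is hypothetical and returns the largest value \(T_{crit}\) with \(P(T \le T_{crit}) \le \alpha\) (use \(\alpha/2\) for the two-sided test). The exact distribution assumes no ties, so for the example with one tie it is an approximation.

```r
# Hypothetical helper: largest T_crit with P(T <= T_crit) <= alpha under H0 (no ties)
signrank_critical <- function(n_prime, alpha) {
  t_values <- 0:(n_prime * (n_prime + 1) / 2)
  admissible <- t_values[psignrank(t_values, n_prime) <= alpha]
  if (length(admissible) == 0) NA_integer_ else max(admissible)
}

# Example: differences resting - straight, one-sided test case 1 (Ha: M > 0), T = T_minus
d <- c(9.0, 7.3, 20.3, 11.4, 5.0, 4.0, 4.0, 5.9, 3.0, -6.3, -1.9, -0.1)
d <- d[d != 0]
ranks <- rank(abs(d))
T_minus <- sum(ranks[d < 0])      # 11

t_crit <- signrank_critical(length(d), alpha = 0.05)
print(t_crit)                     # 17
print(T_minus <= t_crit)          # TRUE: H0 is rejected
```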

Case distinction, \(n^{\prime} > 25\)

Test statistic

In the case of \(n^{\prime} > 25\), the test statistic is calculated as follows $$z = \frac{T - \frac{n^{\prime}(n^{\prime} + 1)}{4}}{\sqrt{\frac{n^{\prime}(n^{\prime} + 1)(2n^{\prime} + 1)}{24}}}$$

Rejection region

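For a specified probability \(\alpha\) of a type I error we determine the critical value \(z_{\alpha}\), the upper \(\alpha\) quantile of the standard normal distribution, e.g. (Ott & Longnecker, 2010, Table 1). We then compare \(z\) with \(-z_{\alpha}\) (or \(-z_{\alpha/2}\) for the two-sided test). The null hypothesis \(H_0\) (see corresponding hypotheses above) is rejected if: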

  1. Two-sided test (two-tailed test): \(z < -z_{\alpha/2}\)
  2. One-sided test (one-tailed test), case 1: \(z < -z_{\alpha}\)
  3. One-sided test (one-tailed test), case 2: \(z < -z_{\alpha}\)
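The large-sample rule can be sketched in the same way. For illustration we apply it to the example even though \(n^{\prime} = 12 \le 25\), where the table-based rule above is the appropriate one. The helper wilcoxon_z() is hypothetical and implements the formula above without tie or continuity correction; \(z_{\alpha}\) is obtained with qnorm(1 - alpha).

```r
# Hypothetical helper: normal approximation of the signed-rank statistic
wilcoxon_z <- function(T_stat, n_prime) {
  mu    <- n_prime * (n_prime + 1) / 4
  sigma <- sqrt(n_prime * (n_prime + 1) * (2 * n_prime + 1) / 24)
  (T_stat - mu) / sigma
}

alpha   <- 0.05
z_alpha <- qnorm(1 - alpha)      # upper alpha quantile of N(0, 1)

# Example: case 1 (Ha: M > 0) uses T = T_minus = 11 of the blink-rate data, n' = 12
z <- wilcoxon_z(T_stat = 11, n_prime = 12)
print(z)                         # about -2.20
print(z < -z_alpha)              # TRUE: H0 is rejected
```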

Further reading

Drew, G. C. (1951). Variations in reflex blink-rate during visual-motor tasks. Quarterly Journal of Experimental Psychology, 3(2), 73–88. https://doi.org/10.1080/17470215108416776

Ott, R. L., & Longnecker, M. (2010). An introduction to statistical methods and data analysis (6th ed.). Brooks/Cole, Cengage Learning.

Petrie, A., & Sabin, C. (2005). Medical statistics at a glance (2nd ed.). Blackwell Publishing.

Rosner, B. (2016). Fundamentals of biostatistics (8th ed.). Cengage Learning.