# Hypothesis test for paired samples

## Overview

In this blog post, we present the procedure for a hypothesis test when two paired samples with continuous values are analysed. We give two possible tests and the necessary conditions for their application. We will check step by step which conditions are met and perform a hypothesis test using a sample data set.

## Objective and scope of application

Paired sample hypothesis testing is used to determine whether there is a significant difference between two related groups. This type of analysis is often used when the same subjects are measured before and after a treatment, or when pairs are matched in a meaningful way. The scope of paired sample hypothesis testing includes medical studies, psychological research, and any scenario where repeated measures or matched subjects are analysed.

## Motivating example

Based on data published by Drew (1951), we analyse the influence of visuomotor tasks on the blink rate and determine whether the blink rate at rest differs significantly from the blink rate during a simple visuomotor task (following a straight line with a pencil). We use data from 12 subjects who performed this visuomotor task. The average number of blinks per minute was recorded during the task. In addition, the blink rate under resting conditions was recorded as a reference.

## Analysis script in Gnu R

### Used Gnu R toolboxes

This analysis script uses several toolboxes in addition to the core functionality of GNU R. They are used for data import, data selection and pre-processing, data visualisation and data analysis. The following code fragment loads these toolboxes:

```
library(tidyverse)   # use of pipes and tibbles, data manipulation, pre-processing
library(labelled)    # handle data labels
library(gt)          # nicely designed tables
library(ggplot2)     # plots and graphics
library(qqplotr)     # Q-Q plot extension for ggplot2
library(rstatix)     # pipe friendly statistics
library(PairedData)  # source of the sample data
library(effectsize)  # a large number of options for calculating effect sizes
library(pander)      # formatted output of the session information
```

### Data

#### Data origin and data specification

The data we analyse in this blog post is provided by the GNU R package `PairedData`. It was originally published by Drew (1951). The following variables are included in this data set:

- `Subject`: respondent ID
- `Resting`: blink rate per minute in the pre-experimental condition
- `Straight`: blink rate per minute in the 1st condition, moving a pencil along a straight line
- `Oscillating`: blink rate per minute in the 2nd condition, moving a pencil along an oscillating line

#### Data import

The first step is to import the `Blink2` data set provided by the `PairedData` toolbox. We use the `data` function to do this. The imported data is stored in the `dataAnalysis` data structure.

```
# import the data, provided by the toolbox PairedData
data("Blink2")

# assign the data to dataAnalysis
dataAnalysis <- Blink2
```

Just for information, we determine the size of the data to be analyzed.

```
# number of data items and variables
dim(dataAnalysis)
## [1] 12 4
```

The data to be analyzed are presented in tabular form below. Three measurements (arranged in columns) were performed for each of the 12 subjects (arranged in rows). The blink rate is given in blinks per minute.

```
# output the data of dataAnalysis
dataAnalysis %>%
  # format the output using the command gt()
  gt()
```

| Subject | Resting | Straight | Oscillating |
|---|---|---|---|
| S01 | 28 | 19.0 | 19.3 |
| S02 | 24 | 16.7 | 9.0 |
| S03 | 23 | 2.7 | 1.1 |
| S04 | 18 | 6.6 | 2.0 |
| S05 | 17 | 12.0 | 1.9 |
| S06 | 11 | 7.0 | 10.2 |
| S07 | 10 | 6.0 | 1.9 |
| S08 | 10 | 4.1 | 1.5 |
| S09 | 6 | 3.0 | 0.5 |
| S10 | 5 | 11.3 | 5.9 |
| S11 | 4 | 5.9 | 4.5 |
| S12 | 3 | 3.1 | 1.2 |

### Selecting an appropriate hypothesis test

#### Data requirements for applying each hypothesis test

Based on the research question, the hypotheses `\(H_0\)` and `\(H_a\)` are formulated. A hypothesis test then decides between them while keeping the probability of a type I error at or below a threshold `\(\alpha\)`, which is referred to as the significance level. A significance level of `\(\alpha = 0.05\)` is commonly used.

Various hypothesis testing methods are available for hypotheses based on interval or ratio scaled paired samples. To ensure correct results in hypothesis testing methods, certain requirements of the samples to be analyzed must be met.

The following list shows common hypothesis testing procedures, their data sample requirements and the respective commands of the `rstatix` toolbox:

- **Student `\(t\)`-test, paired samples**
  - requirements
    - data is *interval or ratio scaled*
    - two variables that have matched pairs (e.g. measuring the same subject/patient twice or more)
    - individual differences between the pairs should be *normally distributed*
    - no outliers in these differences
  - `rstatix` commands
    - hypothesis test: `t_test()`, https://rpkgs.datanovia.com/rstatix/reference/t_test.html
    - effect size: `cohens_d()`, https://rpkgs.datanovia.com/rstatix/reference/cohens_d.html
- **Wilcoxon signed rank test**
  - requirements
    - data is *interval or ratio scaled*
    - two variables that have matched pairs (e.g. measuring the same subject/patient twice or more)
    - a *normal distribution of the differences is not a necessary* prerequisite
  - `rstatix` commands
    - hypothesis test: `wilcox_test()`, https://rpkgs.datanovia.com/rstatix/reference/wilcox_test.html
    - effect size: `wilcox_effsize()`, https://rpkgs.datanovia.com/rstatix/reference/wilcox_effsize.html

In the following, we use methods of explorative analysis and statistical tests to check which requirements are fulfilled and which statistical test can be used.

#### Distribution of differences between pairs, outliers in the differences between pairs

To gain an understanding of the data being analysed, we use visualisation techniques familiar from exploratory data analysis.

```
# use data selected for analysis
dataAnalysis %>%
  # select the columns Subject, Resting, Straight
  dplyr::select(Subject, Resting, Straight) %>%
  # reshape to long format with the columns 'Subject', 'condition' and 'eyeblinkrate'
  pivot_longer(
    # merge the two columns 'Resting' and 'Straight'
    cols = c(Resting, Straight),
    # former column headings 'Resting' and 'Straight' as labels in the new column 'condition'
    names_to = "condition",
    # values of 'Resting' and 'Straight' in the new column 'eyeblinkrate'
    values_to = "eyeblinkrate"
  ) %>%
  # initialize ggplot object, x-axis 'condition', y-axis 'eyeblinkrate'
  ggplot(aes(x = condition, y = eyeblinkrate)) +
  # add a violin plot, with a semi-transparent background
  geom_violin(alpha = 0.2) +
  # add a small box plot
  geom_boxplot(width = 0.1, color = "darkgrey", alpha = 0.2, linewidth = 0.7) +
  # add points for each measurement, use different colors for conditions
  geom_point(aes(fill = condition, group = Subject), size = 4, shape = 21, alpha = 0.5) +
  # connect paired measurements by dashed lines
  geom_line(aes(group = Subject), linetype = 2, linewidth = 0.7) +
  # use color palette 'Dark2' from https://colorbrewer2.org
  scale_fill_brewer(palette = "Dark2") +
  # set x-, y-label and legend heading
  labs(x = "Condition", y = "Eye blink rate per minute", fill = "Condition") +
  # select theme for the plot
  theme_bw()
```

The measurements of the two conditions ‘Resting’ and ‘Straight’ are shown as coloured dots. Dots from the same subject are connected by dashed lines. The distribution of the data within the conditions is visualised by box plots and violin plots.

A closer look at the visualised data shows that the blink rate in the visuomotor task decreased compared to the resting state in nine of the twelve subjects (dashed lines sloping downwards from left to right). It increased in two subjects and remained almost unchanged in one subject. In the further analysis we investigate whether the decrease in blink rate during the visuomotor task is a significant change compared to the resting state.

**For paired data, an appropriate hypothesis test is chosen based on the properties of the difference between the pairs of data.** Therefore, we first calculate this difference and output the table with the additional column containing these differences (column heading ‘Res - Str’).

```
# create a new data set containing the difference
dataAnalysisDiff <- dataAnalysis %>%
  # select variables Subject, Resting, Straight
  dplyr::select(Subject, Resting, Straight) %>%
  # compute the difference and store it in the column DiffRS
  mutate(DiffRS = Resting - Straight)

# assign the label 'Res - Str' to the new column DiffRS
var_label(dataAnalysisDiff$DiffRS) <- 'Res - Str'

# output the data set 'dataAnalysisDiff'
dataAnalysisDiff %>%
  gt()
```

| Subject | Resting | Straight | Res - Str |
|---|---|---|---|
| S01 | 28 | 19.0 | 9.0 |
| S02 | 24 | 16.7 | 7.3 |
| S03 | 23 | 2.7 | 20.3 |
| S04 | 18 | 6.6 | 11.4 |
| S05 | 17 | 12.0 | 5.0 |
| S06 | 11 | 7.0 | 4.0 |
| S07 | 10 | 6.0 | 4.0 |
| S08 | 10 | 4.1 | 5.9 |
| S09 | 6 | 3.0 | 3.0 |
| S10 | 5 | 11.3 | -6.3 |
| S11 | 4 | 5.9 | -1.9 |
| S12 | 3 | 3.1 | -0.1 |

In the next step, we visualise the blink rates at rest and during the simple visual-motor task, as well as the difference between them, using bar plots.

```
# use data selected for analysis, including the differences
dataAnalysisDiff %>%
  # select the columns Subject, Resting, Straight, DiffRS
  dplyr::select(Subject, Resting, Straight, DiffRS) %>%
  # reshape to long format with the columns 'Subject', 'Condition' and 'Values'
  pivot_longer(
    # merge the three columns 'Resting', 'Straight' and 'DiffRS'
    cols = c(Resting, Straight, DiffRS),
    # former column headings as labels in the new column 'Condition'
    names_to = "Condition",
    # values of the three columns in the new column 'Values'
    values_to = "Values"
  ) %>%
  # order the conditions for the plot
  mutate(Condition = factor(Condition, levels = c("Resting", "Straight", "DiffRS"),
                            ordered = FALSE)) %>%
  # initialize ggplot object, x-axis 'Subject', y-axis 'Values'
  ggplot(aes(x = Subject, y = Values, fill = Condition)) +
  # add a grouped bar plot
  geom_bar(stat = "identity", color = "black", width = 0.7, position = position_dodge()) +
  # set y-label and legend heading
  labs(y = "Eye blink rate per minute", fill = "Condition") +
  theme_bw()
```

The properties of the differences are relevant for selecting an appropriate hypothesis test. These are shown as blue bars in the plot above. Some statistical parameters of these differences are given below.

```
dataAnalysisDiff %>%
  # select the column DiffRS
  dplyr::select(DiffRS) %>%
  # compute descriptive statistics, measures of central tendency and dispersion
  summarytools::descr(
    stats = c("mean", "sd", "min", "q1", "med", "q3", "max", "iqr"),
    transpose = TRUE) %>%
  # output as table
  gt() %>%
  # format all numeric values with 2 decimals
  fmt_number(
    decimals = 2
  )
```

| Mean | Std.Dev | Min | Q1 | Median | Q3 | Max | IQR |
|---|---|---|---|---|---|---|---|
| 5.13 | 6.77 | −6.30 | 1.45 | 4.50 | 8.15 | 20.30 | 5.50 |

First, let’s look at the outliers in the differences. Outliers are data points that lie beyond the whiskers of the box plot, i.e. more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. Extreme outliers lie more than 3 times the IQR beyond the quartiles, i.e. twice the maximum whisker length. Whether the data contains outliers can be checked visually using a box plot. We use a box plot and the `identify_outliers` command from the `rstatix` toolbox.

```
# use data selected for analysis, including the differences
dataAnalysisDiff %>%
  # initialize ggplot object, x-axis "", y-axis 'DiffRS'
  ggplot(aes(x = "", y = DiffRS)) +
  # add a box plot
  geom_boxplot(linewidth = 1) +
  # add points for the differences of each subject
  geom_point(fill = "green", size = 4, shape = 21, alpha = 0.5) +
  # use color palette 'Dark2' from https://colorbrewer2.org
  scale_fill_brewer(palette = "Dark2") +
  # set x-, y-label
  labs(x = "Difference between conditions Resting and Straight",
       y = "Diff Eye blink rate per minute") +
  # select theme for the plot
  theme_bw()
```

In the visualisation above we can identify two outliers.

Using the `identify_outliers` command of the `rstatix` toolbox, it is also possible to assign the outliers to the subjects and to grade them, see the following chunk of code.

```
# use data selected for analysis, including the differences
dataAnalysisDiff %>%
  # select subject IDs and differences
  dplyr::select(Subject, 'DiffRS') %>%
  # find the outliers
  identify_outliers('DiffRS') %>%
  # output the result as table
  gt()
```

| Subject | Res - Str | is.outlier | is.extreme |
|---|---|---|---|
| S03 | 20.3 | TRUE | FALSE |
| S10 | -6.3 | TRUE | FALSE |
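The two flags can be cross-checked by hand. The following base R sketch assumes the usual box plot rule (outliers beyond 1.5 times the IQR from the quartiles, extreme outliers beyond 3 times the IQR) and R's default quantile definition. The differences are typed in from the table of differences, and the variable names are only illustrative.

```r
# differences Resting - Straight for S01 ... S12, copied from the table of differences
diff_rs <- c(9.0, 7.3, 20.3, 11.4, 5.0, 4.0, 4.0, 5.9, 3.0, -6.3, -1.9, -0.1)

# first and third quartile (R default, quantile type 7) and interquartile range
q1  <- unname(quantile(diff_rs, 0.25))
q3  <- unname(quantile(diff_rs, 0.75))
iqr <- q3 - q1

# fences: 1.5 x IQR for outliers, 3 x IQR for extreme outliers
is_outlier <- diff_rs < q1 - 1.5 * iqr | diff_rs > q3 + 1.5 * iqr
is_extreme <- diff_rs < q1 - 3 * iqr   | diff_rs > q3 + 3 * iqr

diff_rs[is_outlier]   # 20.3 and -6.3, i.e. subjects S03 and S10
any(is_extreme)       # FALSE, no extreme outliers
```

With these data the quartiles are 2.225 and 7.725, so the outlier fences lie at -6.025 and 15.975. The value -6.3 is only just outside the lower fence.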

The next step is to check whether the data to be analysed is normally distributed. Both exploratory statistical methods (Q-Q plot) and statistical tests (Shapiro-Wilk test) are available for this purpose.

Let’s start with the exploratory approach, the Q-Q plot. In a Q-Q plot, the quantiles of the empirical distribution of the data being analysed (Y-axis) are plotted against the theoretical quantiles of the normal distribution (X-axis). When the empirical and theoretical quantiles are approximately equal, the points lie approximately on a diagonal. In the Q-Q plot below, we have inserted a reference line on which the data points would lie if they were normally distributed. We have also highlighted the 95% confidence band in light blue.

```
# use data selected for analysis
dataAnalysisDiff %>%
  # initialize ggplot object for a Q-Q plot of differences 'DiffRS'
  ggplot(aes(sample = DiffRS)) +
  # Q-Q plot of the differences
  stat_qq_point() +
  # insert reference line for normal distribution
  stat_qq_line(distribution = "norm") +
  # add a 95% confidence band
  stat_qq_band(conf = 0.95, alpha = 0.5, fill = "lightblue", color = "black") +
  # set color palette
  scale_colour_brewer(palette = "Dark2") +
  # set axis labels
  labs(x = "Normal theoretical quantiles",
       y = "Difference eye blink rate, resting - straight, per minute") +
  # set theme
  theme_bw()
```

Seven of the twelve differences are very close to the line that marks an ideal normal distribution. The values at the ‘tails’ of the distribution are more distant from this ideal. Eleven values are within the 95% confidence band and one is outside.

In addition to visual approaches, statistical tests are available to check the normal distribution of collected data, such as the Shapiro-Wilk test of normality. The null hypothesis `\(H_0\)` of the Shapiro-Wilk normality test is that the population data follow a normal distribution. The corresponding alternative hypothesis `\(H_a\)` is that the population data do not follow a normal distribution.

```
# use data selected for analysis, including the differences
dataAnalysisDiff %>%
  # check by means of the Shapiro-Wilk test whether the differences are normally distributed
  shapiro_test(DiffRS) %>%
  # add star coding for significance
  add_significance("p") %>%
  # output as table
  gt()
```

| variable | statistic | p | p.signif |
|---|---|---|---|
| DiffRS | 0.9605808 | 0.7920899 | ns |

The `\(p\)` value obtained by the Shapiro-Wilk test is greater than 0.05. Therefore, the null hypothesis `\(H_0\)` of a normal distribution is not rejected. The differences are consistent with a normal distribution.

#### Summary and selection of the statistical test

The exploratory review of the data led to the following conclusion:

- The blink rate data (per minute) are ratio scaled.
- The Shapiro-Wilk normality test gave no evidence against a normal distribution of the differences.
- The sample size is small (12 subjects) and there are outliers in the differences.

For these reasons, we choose the **Wilcoxon signed-rank test** for hypothesis testing.

#### Performing the statistical test

In the exploratory data analysis, we observed that the blink rate tends to decrease during visual-motor tasks compared to the resting state. We use the selected hypothesis test to test the significance of this decrease. The hypothesis test we use is the Wilcoxon signed-rank test for paired data. Therefore, we have to formulate the hypotheses using the differences between the two conditions (resting, straight) to be analysed.

- `\(H_0\)`: The median difference between the blink rates of the two conditions ‘resting’ and ‘straight’ is less than or equal to zero.
- `\(H_a\)`: The median difference between the blink rates of the two conditions ‘resting’ and ‘straight’ is greater than zero.

The Wilcoxon signed-rank test is then performed. Two remarks:

- The reference in our test is the resting condition, see the `ref.group` parameter in the command.
- In the command `wilcox_test` we have to specify the alternative hypothesis `\(H_a\)` via the `alternative` parameter.

```
# use data selected for analysis
resWilcoxTest <- dataAnalysis %>%
  # select the columns Subject, Resting, Straight
  dplyr::select(Subject, Resting, Straight) %>%
  # reshape to long format with the columns 'Subject', 'condition' and 'eyeblinkrate'
  pivot_longer(
    # merge the two columns 'Resting' and 'Straight'
    cols = c(Resting, Straight),
    # former column headings 'Resting' and 'Straight' as labels in the new column 'condition'
    names_to = "condition",
    # values of 'Resting' and 'Straight' in the new column 'eyeblinkrate'
    values_to = "eyeblinkrate"
  ) %>%
  # perform the Wilcoxon signed-rank test
  wilcox_test(
    # compare eye blink rate of the conditions 'Resting' and 'Straight'
    eyeblinkrate ~ condition,
    # paired values
    paired = TRUE,
    # reference is the condition 'Resting'
    ref.group = "Resting",
    # alternative hypothesis is greater
    alternative = "greater",
    # detailed output of test results
    detailed = TRUE) %>%
  # add significance star code
  add_significance()
```

The results of the test are shown in the table below.

```
resWilcoxTest %>%
  gt() %>%
  # format the values of the columns estimate, p, conf.low and conf.high, 3 decimals
  fmt_number(
    columns = c(estimate, p, conf.low, conf.high),
    decimals = 3
  ) %>%
  # output '<0.001' for values smaller than 0.001
  sub_small_vals(threshold = 0.001)
```

| estimate | .y. | group1 | group2 | n1 | n2 | statistic | p | conf.low | conf.high | method | alternative | p.signif |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4.817 | eyeblinkrate | Resting | Straight | 12 | 12 | 67 | 0.015 | 1.550 | Inf | Wilcoxon | greater | * |

The result of the hypothesis test can also be output as a string using the `get_test_label` command from the `rstatix` toolbox, as shown below.

```
resWilcoxTest %>%
  get_test_label(detailed = TRUE, type = "text")
## [1] "Wilcoxon test, V = 67, p = 0.015, n = 12"
```

Finally, we determine the effect size of this statistical test. The effect size is a quantifiable measure of an empirical effect and is used to illustrate the practical relevance of the results of statistical tests. For the Wilcoxon signed-rank test we use the effect size `\(r\)`, which is the standardised test statistic `\(Z\)` divided by the square root of the number of pairs. We use the `wilcox_effsize` command of the `rstatix` toolbox and output the result as a table using `gt`.

```
# use data selected for analysis
dataAnalysis %>%
  # select the columns Subject, Resting, Straight
  dplyr::select(Subject, Resting, Straight) %>%
  # reshape to long format with the columns 'Subject', 'condition' and 'eyeblinkrate'
  pivot_longer(
    # merge the two columns 'Resting' and 'Straight'
    cols = c(Resting, Straight),
    # former column headings 'Resting' and 'Straight' as labels in the new column 'condition'
    names_to = "condition",
    # values of 'Resting' and 'Straight' in the new column 'eyeblinkrate'
    values_to = "eyeblinkrate"
  ) %>%
  # estimate the effect size
  wilcox_effsize(
    # compare eye blink rate of the conditions 'Resting' and 'Straight'
    eyeblinkrate ~ condition,
    # paired values
    paired = TRUE,
    # reference is the condition 'Resting'
    ref.group = "Resting",
    # alternative hypothesis is greater
    alternative = "greater") %>%
  # output as table
  gt()
```

| .y. | group1 | group2 | effsize | n1 | n2 | magnitude |
|---|---|---|---|---|---|---|
| eyeblinkrate | Resting | Straight | 0.6343192 | 12 | 12 | large |
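The reported effect size can be approximately reproduced from the test output. The sketch below is a plausibility check under stated assumptions: it takes `\(r = Z/\sqrt{n}\)`, computes `\(Z\)` from the normal approximation of the signed-rank statistic with a correction for tied ranks and without continuity correction, and uses the fact that the absolute differences of S06 and S07 are tied (both 4.0). Whether `wilcox_effsize` uses exactly this calculation internally is an assumption, and the variable names are illustrative.

```r
n <- 12            # number of pairs
v <- 67            # test statistic V reported by the Wilcoxon test
tie_sizes <- c(2)  # one group of two tied absolute differences (S06, S07)

# mean and standard deviation of V under H0 (normal approximation, tie-corrected)
mu_v    <- n * (n + 1) / 4
sigma_v <- sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(tie_sizes^3 - tie_sizes) / 48)

# standardised statistic and effect size
z <- (v - mu_v) / sigma_v
r <- z / sqrt(n)

round(c(z = z, r = r), 3)   # z = 2.197, r = 0.634
```

The result agrees with the tabulated effect size to three decimals, which supports reading `\(r\)` as a rescaled `\(Z\)` statistic.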

### Reporting the results of the statistical analysis

In our study we analysed the influence of visual-motor tasks on the eye blink rate and investigated whether there is a significant difference between the eye blink rate in the resting state and the eye blink rate during a simple visual-motor task. We retrospectively analysed the data of twelve subjects whose eye blink rate was measured under different conditions, at rest and during a simple visual-motor task (straight line).

As we were analysing paired data, we examined the characteristics of the differences between the two conditions in order to select an appropriate hypothesis test. A Shapiro-Wilk normality test showed no evidence of non-normality `\((W = 0.96, p = 0.79)\)`. In the exploratory analysis using the Q-Q plot, one sample point lay outside the 95% confidence band. The number of samples to be analysed is small `\((n = 12)\)`. The differences between the two conditions contain two outliers. Based on this preliminary analysis, we chose the **Wilcoxon signed-rank test** to test the `\(H_0\)` hypothesis: The median difference between the blink rates in the resting state and during the visuomotor task (resting minus task) is less than or equal to zero.

The null hypothesis was rejected by the test `\((V_{\mathrm{Wilcoxon}} = 67\)`, `\(p = 0.0155\)`, effect size `\(r = 0.634)\)` in favour of the alternative hypothesis. We conclude that the blink rate is significantly reduced during simple visual-motor tasks compared to the resting state.

```
# create the line with statistical information inserted at the top of the figure
statExp <- expression(paste(italic("V")[Wilcoxon], "", " = ", "67, ",
                            italic("p"), " = ", "0.015", "",
                            paste(", ", italic("r"), " = ", "0.634"),
                            paste(", ", italic("n")[pairs], " = ", "12")))

# use data selected for analysis
dataAnalysis %>%
  # select the columns Subject, Resting, Straight
  dplyr::select(Subject, Resting, Straight) %>%
  # reshape to long format with the columns 'Subject', 'condition' and 'eyeblinkrate'
  pivot_longer(
    # merge the two columns 'Resting' and 'Straight'
    cols = c(Resting, Straight),
    # former column headings 'Resting' and 'Straight' as labels in the new column 'condition'
    names_to = "condition",
    # values of 'Resting' and 'Straight' in the new column 'eyeblinkrate'
    values_to = "eyeblinkrate"
  ) %>%
  # initialize ggplot object, x-axis 'condition', y-axis 'eyeblinkrate'
  ggplot(aes(x = condition, y = eyeblinkrate)) +
  # add a violin plot, with a semi-transparent background
  geom_violin(alpha = 0.2) +
  # add a small box plot
  geom_boxplot(width = 0.1, color = "darkgrey", alpha = 0.2, linewidth = 0.7) +
  # add points for each measurement, use different colors for conditions
  geom_point(aes(fill = condition, group = Subject), size = 4, shape = 21, alpha = 0.5) +
  # connect paired measurements by dashed lines
  geom_line(aes(group = Subject), linetype = 2, linewidth = 0.7) +
  # use color palette 'Dark2' from https://colorbrewer2.org
  scale_fill_brewer(palette = "Dark2") +
  # set x-, y-label and legend heading
  labs(x = "Condition", y = "Eye blink rate per minute", fill = "Condition") +
  # add the line with the test results above the data
  annotate("text", 1.5, 31, label = statExp, parse = TRUE, size = 4) +
  # select theme for the plot
  theme_bw()
```

## Gnu R toolboxes that we used

We used a number of toolboxes for the statistical analysis of the data and the visualisation of the results and interim results. Below is a list of these toolboxes with a brief description and links to more detailed information:

- `tidyverse`: This toolbox includes a number of GNU R software packages that use **tidy data** data structures and in which the design philosophy and grammar of **tidy data** are implemented, e.g. pipes. Information on **tidy data**: H. Wickham, *Tidy data*, The Journal of Statistical Software, Vol. 59, 2014, information on the toolbox: https://www.tidyverse.org/.
- `labelled`: This package provides functions for working with labelled data, information on the toolbox: http://larmarange.github.io/labelled/index.html.
- `gt`: With the `gt` package, it is possible to create beautiful looking tables using R, information on the toolbox: https://gt.rstudio.com/.
- `ggplot2`: In the `ggplot2` toolbox a system for the declarative creation of graphics is realised, which is based on ‘The Grammar of Graphics’, information on the toolbox: https://ggplot2.tidyverse.org/.
- `qqplotr`: This package adds Q-Q plotting functionality to the `ggplot2` package, information on the toolbox: https://github.com/aloy/qqplotr.
- `rstatix`: This package provides a simple, intuitive and pipe-friendly framework for performing statistical tests, information on the toolbox: https://rpkgs.datanovia.com/rstatix/.
- `PairedData`: Provides example data sets, information on the toolbox: https://cran.r-project.org/web/packages/PairedData/index.html.
- `effectsize`: Functions for estimating a range of effect size indices are provided in this package, information on the toolbox: https://easystats.github.io/effectsize/.

Information about the hard- and software configuration of the computer on which this post was authored:

```
pander(sessionInfo())
```

**R version 4.4.0 (2024-04-24 ucrt)**

**Platform:** x86_64-w64-mingw32/x64

**locale:**
*LC_COLLATE=German_Austria.utf8*, *LC_CTYPE=German_Austria.utf8*, *LC_MONETARY=German_Austria.utf8*, *LC_NUMERIC=C* and *LC_TIME=German_Austria.utf8*

**attached base packages:**
*stats*, *graphics*, *grDevices*, *utils*, *datasets*, *methods* and *base*

**other attached packages:**
*pander(v.0.6.5)*, *effectsize(v.0.8.8)*, *PairedData(v.1.1.1)*, *lattice(v.0.22-6)*, *mvtnorm(v.1.2-5)*, *gld(v.2.6.6)*, *MASS(v.7.3-60.2)*, *rstatix(v.0.7.2)*, *qqplotr(v.0.0.6)*, *gt(v.0.10.1)*, *labelled(v.2.13.0)*, *lubridate(v.1.9.3)*, *forcats(v.1.0.0)*, *stringr(v.1.5.1)*, *dplyr(v.1.1.4)*, *purrr(v.1.0.2)*, *readr(v.2.1.5)*, *tidyr(v.1.3.1)*, *tibble(v.3.2.1)*, *ggplot2(v.3.5.1)* and *tidyverse(v.2.0.0)*

**loaded via a namespace (and not attached):**
*bitops(v.1.0-7)*, *tcltk(v.4.4.0)*, *sandwich(v.3.1-0)*, *rlang(v.1.1.3)*, *magrittr(v.2.0.3)*, *multcomp(v.1.4-25)*, *qqconf(v.1.3.2)*, *matrixStats(v.1.3.0)*, *e1071(v.1.7-14)*, *compiler(v.4.4.0)*, *reshape2(v.1.4.4)*, *pbmcapply(v.1.5.1)*, *vctrs(v.0.6.5)*, *summarytools(v.1.0.1)*, *pkgconfig(v.2.0.3)*, *fastmap(v.1.2.0)*, *magick(v.2.8.3)*, *backports(v.1.4.1)*, *labeling(v.0.4.3)*, *caTools(v.1.18.2)*, *utf8(v.1.2.4)*, *rmarkdown(v.2.27)*, *tzdb(v.0.4.0)*, *pracma(v.2.4.4)*, *haven(v.2.5.4)*, *modeltools(v.0.2-23)*, *xfun(v.0.44)*, *cachem(v.1.1.0)*, *jsonlite(v.1.8.8)*, *highr(v.0.10)*, *pryr(v.0.1.6)*, *broom(v.1.0.6)*, *parallel(v.4.4.0)*, *R6(v.2.5.1)*, *coin(v.1.4-3)*, *bslib(v.0.7.0)*, *stringi(v.1.8.4)*, *RColorBrewer(v.1.1-3)*, *car(v.3.1-2)*, *jquerylib(v.0.1.4)*, *estimability(v.1.5.1)*, *Rcpp(v.1.0.12)*, *bookdown(v.0.39)*, *iterators(v.1.0.14)*, *knitr(v.1.46)*, *zoo(v.1.8-12)*, *base64enc(v.0.1-3)*, *parameters(v.0.21.7)*, *Matrix(v.1.7-0)*, *splines(v.4.4.0)*, *timechange(v.0.3.0)*, *tidyselect(v.1.2.1)*, *rstudioapi(v.0.16.0)*, *abind(v.1.4-5)*, *yaml(v.2.3.8)*, *doParallel(v.1.0.17)*, *codetools(v.0.2-20)*, *blogdown(v.1.19)*, *plyr(v.1.8.9)*, *opdisDownsampling(v.1.0.1)*, *withr(v.3.0.0)*, *bayestestR(v.0.13.2)*, *coda(v.0.19-4.1)*, *evaluate(v.0.23)*, *survival(v.3.5-8)*, *proxy(v.0.4-27)*, *xml2(v.1.3.6)*, *pillar(v.1.9.0)*, *carData(v.3.0-5)*, *stats4(v.4.4.0)*, *checkmate(v.2.3.1)*, *foreach(v.1.5.2)*, *insight(v.0.19.11)*, *generics(v.0.1.3)*, *hms(v.1.1.3)*, *munsell(v.0.5.1)*, *scales(v.1.3.0)*, *xtable(v.1.8-4)*, *class(v.7.3-22)*, *glue(v.1.7.0)*, *emmeans(v.1.10.2)*, *lmom(v.3.0)*, *tools(v.4.4.0)*, *robustbase(v.0.99-2)*, *rapportools(v.1.1)*, *grid(v.4.4.0)*, *libcoin(v.1.0-10)*, *datawizard(v.0.10.0)*, *colorspace(v.2.1-0)*, *cli(v.3.6.2)*, *twosamples(v.2.0.1)*, *fansi(v.1.0.6)*, *gtable(v.0.3.5)*, *DEoptimR(v.1.1-3)*, *sass(v.0.4.9)*, *digest(v.0.6.35)*, *TH.data(v.1.1-2)*, *farver(v.2.1.2)*, *htmltools(v.0.5.8.1)* and 
*lifecycle(v.1.0.4)*

## Theoretical foundations of paired sample hypothesis tests

### Paired samples

Two samples are said to be paired if each data point in the first sample is uniquely assigned to a data point in the second sample. In medical research, paired samples occur, for example, in intervention studies. Data is collected from the same subject before and after an intervention.

Suppose we have collected data from `\(n\)` subjects. The two measurements of the `\(i\)`-th subject are labelled `\(y_{1i}\)` and `\(y_{2i}\)`, and the `\(i\)`-th difference between the two measurements is `\(d_i = y_{1i} - y_{2i}\)`.

### Student `\(t\)`-test for paired samples

#### Prerequisites

The prerequisites for using the `\(t\)`-test for paired samples are:

- The sampling distribution of the pairwise differences `\(d_i\)` between the measurements is normal.
- The `\(n\)` pairs of measured values are independent of each other.

If the requirements are met, we determine the sample mean `\(\bar{d}\)` and the sample standard deviation `\(sd\)` of `\(d_i\)`. To formulate the hypotheses, we use the difference between the population means: `\(\mu_d = \mu_1 - \mu_2\)`.

#### Hypotheses

The null and alternative hypotheses for the one-sided and two-sided variants of the paired samples `\(t\)`-test are as follows:

- Two-sided test (two-tailed test)
  - `\(H_0\)`: `\(\mu_d = 0\)`
  - `\(H_a\)`: `\(\mu_d \ne 0\)`
- One-sided test (one-tailed test), case 1
  - `\(H_0\)`: `\(\mu_d \le 0\)`
  - `\(H_a\)`: `\(\mu_d > 0\)`
- One-sided test (one-tailed test), case 2
  - `\(H_0\)`: `\(\mu_d \ge 0\)`
  - `\(H_a\)`: `\(\mu_d < 0\)`

#### Test statistic

Using the sample mean `\(\bar{d}\)` and the sample standard deviation `\(sd\)` of `\(d_i\)`, we calculate the test statistic as the quotient of the sample mean of the differences of the measured pairs and the standard error of the mean

`$$t = \frac{\bar{d}}{sd/\sqrt{n}}\,,$$`

with the standard error of the mean `\(SEM = sd/\sqrt{n}\)`.
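To make the formula concrete, the following base R sketch evaluates it for the blink rate data of this post (columns `Resting` and `Straight`, typed in by hand). It only illustrates the arithmetic and is not the test selected in the analysis above. The variable names are illustrative, and `t.test` is used only as a cross-check.

```r
# blink rates per minute for S01 ... S12, copied from the data table of this post
resting  <- c(28, 24, 23, 18, 17, 11, 10, 10, 6, 5, 4, 3)
straight <- c(19.0, 16.7, 2.7, 6.6, 12.0, 7.0, 6.0, 4.1, 3.0, 11.3, 5.9, 3.1)

# pairwise differences d_i = y_1i - y_2i
d <- resting - straight
n <- length(d)

# sample mean, sample standard deviation and standard error of the mean
d_bar <- mean(d)
s_d   <- sd(d)
sem   <- s_d / sqrt(n)

# test statistic t = d_bar / (sd / sqrt(n))
t_stat <- d_bar / sem

# cross-check against the built-in paired t-test
t_check <- unname(t.test(resting, straight, paired = TRUE)$statistic)

round(c(d_bar = d_bar, sd = s_d, t = t_stat, t_check = t_check), 2)
```

The mean and standard deviation agree with the descriptive statistics of the differences reported earlier (5.13 and 6.77), and the hand-computed `\(t\)` coincides with the value returned by `t.test`.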

#### Rejection region

For a specified probability `\(\alpha\)` of a type I error and `\(df = n - 1\)` degrees of freedom, we determine the critical value `\(t_{\alpha}\)` (or `\(t_{\alpha/2}\)` for the two-sided test) of the `\(t\)` distribution. We then compare `\(t\)` with this critical value. The null hypothesis `\(H_0\)` (see corresponding hypotheses above) is rejected if:

- Two-sided test (two-tailed test): `\(|t| \ge t_{\alpha/2}\)`
- One-sided test (one-tailed test), case 1: `\(t \ge t_{\alpha}\)`
- One-sided test (one-tailed test), case 2: `\(t \le - t_{\alpha}\)`
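Instead of a printed table, the critical values can be obtained from the quantile function `qt` of the `\(t\)` distribution. The helper `t_rejects_h0` below is a hypothetical name introduced for this sketch, and the value `2.63` in the usage lines is only an illustrative test statistic.

```r
# decision rule of the paired t-test for a given test statistic and sample size
t_rejects_h0 <- function(t_stat, n, alpha = 0.05,
                         alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  df <- n - 1
  switch(alternative,
    # two-sided test: |t| >= t_{alpha/2}
    two.sided = abs(t_stat) >= qt(1 - alpha / 2, df),
    # one-sided test, case 1 (H_a: mu_d > 0): t >= t_alpha
    greater   = t_stat >= qt(1 - alpha, df),
    # one-sided test, case 2 (H_a: mu_d < 0): t <= -t_alpha
    less      = t_stat <= -qt(1 - alpha, df)
  )
}

# critical value t_alpha for alpha = 0.05 and df = 11
qt(0.95, df = 11)

# illustrative statistic with n = 12 pairs, one-sided test, case 1
t_rejects_h0(2.63, n = 12, alternative = "greater")
```

For 11 degrees of freedom the one-sided critical value is about 1.80, so an illustrative statistic of 2.63 would lead to rejection of `\(H_0\)` in case 1.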

### Wilcoxon signed rank test

#### Prerequisites

The prerequisites for using the Wilcoxon signed rank test are:

- The sample data is at least *ordinally scaled*.
- The `\(n\)` pairs of measured values are independent of each other.

If the requirements are met, we determine the following values from the differences `\(d_i\)` between the measurements:

- `\(n^{\prime}\)` is the number of pairs of observations with a non-zero difference `\(d_i \ne 0\)`.
- All `\(n^{\prime}\)` non-zero differences are ranked by their absolute values, i.e. they are assigned the ranks `\(1, \ldots, n^{\prime}\)`.
- `\(T_{+}\)` is the sum of the ranks of the positive differences; if there are no positive differences, `\(T_{+} = 0\)`.
- `\(T_{-}\)` is the sum of the ranks of the negative differences; if there are no negative differences, `\(T_{-} = 0\)`.
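These quantities can be computed in a few lines of base R. The sketch below uses the blink rate data of this post (columns `Resting` and `Straight`, typed in by hand) and assumes that tied absolute differences receive the average of their ranks, which is the default behaviour of `rank`. The variable names are illustrative.

```r
# blink rates per minute for S01 ... S12, copied from the data table of this post
resting  <- c(28, 24, 23, 18, 17, 11, 10, 10, 6, 5, 4, 3)
straight <- c(19.0, 16.7, 2.7, 6.6, 12.0, 7.0, 6.0, 4.1, 3.0, 11.3, 5.9, 3.1)

# differences, zero differences are dropped
d         <- resting - straight
d_nonzero <- d[d != 0]
n_prime   <- length(d_nonzero)

# ranks of the absolute differences, ties get the average rank
ranks <- rank(abs(d_nonzero))

# rank sums of the positive and the negative differences
t_plus  <- sum(ranks[d_nonzero > 0])
t_minus <- sum(ranks[d_nonzero < 0])

c(n_prime = n_prime, T_plus = t_plus, T_minus = t_minus)   # 12, 67, 11
```

`\(T_{+} = 67\)` is the value reported as statistic `V` in the analysis above, and `\(T_{+} + T_{-} = n^{\prime}(n^{\prime} + 1)/2 = 78\)` serves as a quick consistency check.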

#### Hypotheses

The median difference `\(M\)` in the population is used to formulate the hypotheses. The null and alternative hypotheses for the one-sided and two-sided variants of the Wilcoxon signed rank test are as follows:

- Two-sided test (two-tailed test)
  - `\(H_0\)`: `\(M = 0\)`
  - `\(H_a\)`: `\(M \ne 0\)`
- One-sided test (one-tailed test), case 1
  - `\(H_0\)`: `\(M \le 0\)`
  - `\(H_a\)`: `\(M > 0\)`
- One-sided test (one-tailed test), case 2
  - `\(H_0\)`: `\(M \ge 0\)`
  - `\(H_a\)`: `\(M < 0\)`

#### Case distinction, `\(n^{\prime} \le 25\)`

##### Test statistic

For the test statistic `\(T\)`, use the following values:

- Two-sided test (two-tailed test): `\(T = \min(T_-, T_+)\)`
- One-sided test (one-tailed test), case 1: `\(T = T_-\)`
- One-sided test (one-tailed test), case 2: `\(T = T_+\)`

##### Rejection region

The `\(H_0\)` hypothesis is rejected for a given probability `\(\alpha\)` of a type I error and a number `\(n^{\prime} \le 25\)` of pairs of observations with a non-zero difference if `\(T\)` is less than or equal to the corresponding entry in a table, e.g. (Ott & Longnecker, 2010, Table 6).
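Where no printed table is at hand, a critical value of this kind can be derived from the exact null distribution of the signed-rank statistic, which base R provides as `psignrank`. The helper `signrank_critical` is a hypothetical name introduced for this sketch. It assumes no tied ranks and may differ from a printed table where the table rounds differently.

```r
# largest value c with P(T <= c) <= alpha under H0, for n_prime non-zero differences
signrank_critical <- function(n_prime, alpha) {
  t_values <- 0:(n_prime * (n_prime + 1) / 2)
  admissible <- psignrank(t_values, n_prime) <= alpha
  if (!any(admissible)) return(NA_integer_)
  max(t_values[admissible])
}

# one-sided test with alpha = 0.05 and n' = 12
signrank_critical(12, alpha = 0.05)   # 17

# for a two-sided test, alpha / 2 is used in each tail
signrank_critical(12, alpha = 0.05 / 2)
```

For the blink rate data, case 1 uses `\(T = T_{-} = 11\)`, which is below the one-sided critical value of 17 for `\(n^{\prime} = 12\)`. This agrees with the rejection of `\(H_0\)` found in the analysis, with the caveat that the data contain one pair of tied ranks.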

#### Case distinction, `\(n^{\prime} > 25\)`

##### Test statistic

In the case of `\(n^{\prime} > 25\)`, the test statistic is calculated from `\(T\)`, chosen as in the case `\(n^{\prime} \le 25\)`, as follows

`$$z = \frac{T - \frac{n^{\prime}(n^{\prime} + 1)}{4}}{\sqrt{\frac{n^{\prime}(n^{\prime} + 1)(2n^{\prime} + 1)}{24}}}$$`

##### Rejection region

For a specified probability `\(\alpha\)` of a type I error we determine `\(z_{\alpha}\)`, e.g. (Ott & Longnecker, 2010, Table 1). We then compare `\(z\)` and `\(z_{\alpha}\)`. The null hypothesis `\(H_0\)` (see corresponding hypotheses above) is rejected if:

- Two-sided test (two-tailed test): `\(z < -z_{\alpha/2}\)`
- One-sided test (one-tailed test), case 1: `\(z < -z_{\alpha}\)`
- One-sided test (one-tailed test), case 2: `\(z < -z_{\alpha}\)`

The criterion is the same in both one-sided cases because `\(T\)` is chosen so that small values speak against `\(H_0\)`.
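The large-sample procedure can be sketched in base R as follows. The function name `wilcoxon_z` and the numbers in the usage example (`\(n^{\prime} = 30\)`, `\(T = 100\)`) are hypothetical and serve only to illustrate the formula. No correction for tied ranks is applied, in line with the formula given above.

```r
# standardised signed-rank statistic for n_prime > 25
wilcoxon_z <- function(t_stat, n_prime) {
  mu    <- n_prime * (n_prime + 1) / 4
  sigma <- sqrt(n_prime * (n_prime + 1) * (2 * n_prime + 1) / 24)
  (t_stat - mu) / sigma
}

# hypothetical example: 30 non-zero differences, T = 100
z <- wilcoxon_z(100, 30)

# critical values of the standard normal distribution for alpha = 0.05
alpha      <- 0.05
z_alpha    <- qnorm(1 - alpha)       # one-sided tests
z_alpha_2  <- qnorm(1 - alpha / 2)   # two-sided test

round(z, 3)      # -2.725
z < -z_alpha     # TRUE, H0 is rejected in the one-sided test
z < -z_alpha_2   # TRUE, H0 is rejected in the two-sided test
```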

## Further reading

Drew, G. C. (1951). Variations in reflex blink-rate during visual-motor tasks. *Quarterly Journal of Experimental Psychology*, *3*(2), 73–88. https://doi.org/10.1080/17470215108416776

Ott, R. L., & Longnecker, M. (2010). *An introduction to statistical methods and data analysis* (6th ed.). Brooks/Cole, Cengage Learning.

Petrie, A., & Sabin, C. (2005). *Medical statistics at a glance* (2nd ed.). Blackwell Publishing.

Rosner, B. (2016). *Fundamentals of biostatistics* (8th ed.). Cengage Learning.