This R package implements several non-parametric tests from chapters 1-5 of Higgins (2004), including tests for one sample, two samples, k samples, paired comparisons, blocked designs, trends and association. Built with Rcpp for efficiency and R6 for flexible, object-oriented design, it provides a unified framework for performing permutation tests and for creating custom ones.
Install the stable version from CRAN:
install.packages("LearnNonparam")
Install the development version from GitHub:
# install.packages("remotes")
remotes::install_github("qddyy/LearnNonparam")
library(LearnNonparam)
Construct a test object
t <- Wilcoxon$new(n_permu = 1e6)
Alternatively, use the pmt (permutation test) wrapper
# recommended for a unified API
t <- pmt("twosample.wilcoxon", n_permu = 1e6)
Provide it with samples
set.seed(-1)
t$test(rnorm(10, 1), rnorm(10, 0))
Check the results
t$statistic
t$p_value
options(digits = 3)
t$print()
ggplot2::theme_set(ggplot2::theme_minimal())
t$plot(style = "ggplot2", binwidth = 1) # or ggplot2::autoplot(t, binwidth = 1)
Modify some settings and observe the change
t$type <- "asymp"
t$p_value
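To make the difference between a permutation p-value and an asymptotic one concrete, here is a minimal base-R sketch of a two-sample rank-sum test. It does not use LearnNonparam and is not the package's internal implementation. The sample sizes, the number of permutations and the two-sided p-value definition (distance from the null expectation of the rank sum) are assumptions chosen for illustration.

```r
# Illustrative only: a hand-rolled rank-sum permutation test in base R.
set.seed(1)
x <- rnorm(10, mean = 1)
y <- rnorm(10, mean = 0)

# Rank the pooled sample once; permuting group labels permutes these ranks.
ranks <- rank(c(x, y))
m <- length(x)

# Observed statistic: sum of the ranks belonging to the first sample.
observed <- sum(ranks[seq_len(m)])

# Monte Carlo permutation distribution of the rank sum.
n_permu <- 1e4
permuted <- replicate(n_permu, sum(sample(ranks, m)))

# Two-sided p-value (assumed definition): share of permuted rank sums at
# least as far from the null expectation as the observed one.
center <- m * (length(ranks) + 1) / 2
p_permu <- mean(abs(permuted - center) >= abs(observed - center))

# Normal approximation from base R, for comparison.
p_asymp <- wilcox.test(x, y, exact = FALSE, correct = FALSE)$p.value

c(permutation = p_permu, asymptotic = p_asymp)
```

With samples this small the two values can differ noticeably, which is one reason to compare both types before settling on one.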
See pmts() for tests implemented in this package.
pmts()
| key | class | test | 
|---|---|---|
| onesample.quantile | Quantile | Quantile Test | 
| onesample.cdf | CDF | Inference on Cumulative Distribution Function | 
| twosample.difference | Difference | Two-Sample Test Based on Mean or Median | 
| twosample.wilcoxon | Wilcoxon | Two-Sample Wilcoxon Test | 
| twosample.ansari | AnsariBradley | Ansari-Bradley Test | 
| twosample.siegel | SiegelTukey | Siegel-Tukey Test | 
| twosample.rmd | RatioMeanDeviance | Ratio Mean Deviance Test | 
| distribution.ks | KolmogorovSmirnov | Two-Sample Kolmogorov-Smirnov Test | 
| distribution.kuiper | Kuiper | Two-Sample Kuiper Test | 
| distribution.cvm | CramerVonMises | Two-Sample Cramer-Von Mises Test | 
| distribution.ad | AndersonDarling | Two-Sample Anderson-Darling Test | 
| association.corr | Correlation | Test for Association Between Paired Samples | 
| paired.sign | Sign | Two-Sample Sign Test | 
| paired.difference | PairedDifference | Paired Comparison Based on Differences | 
| ksample.oneway | OneWay | One-Way Test for Equal Means | 
| ksample.kw | KruskalWallis | Kruskal-Wallis Test | 
| ksample.jt | JonckheereTerpstra | Jonckheere-Terpstra Test | 
| multcomp.studentized | Studentized | Multiple Comparison Based on Studentized Statistic | 
| rcbd.oneway | RCBDOneWay | One-Way Test for Equal Means in RCBD | 
| rcbd.friedman | Friedman | Friedman Test | 
| rcbd.page | Page | Page Test | 
| table.chisq | ChiSquare | Chi-Square Test on Contingency Table | 
define_pmt allows users to define new permutation tests.
Take the two-sample Wilcoxon test as an example:
t_custom <- define_pmt(
    # this is a two-sample permutation test
    method = "twosample",
    statistic = function(x, y) {
        # (optional) pre-calculate certain constants that remain invariant during permutation
        m <- length(x)
        n <- length(y)
        # return a closure to calculate the test statistic
        function(x, y) sum(x) / m - sum(y) / n
    },
    # reject the null hypothesis when the test statistic is too large or too small
    rejection = "<>", n_permu = 1e5
)
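The two-level shape of the statistic argument (an outer function that returns a closure) may be easier to follow with a driver loop next to it. The sketch below is base R only. permute_twosample is a hypothetical helper written for illustration, not a function of this package, and its two-sided p-value (comparing absolute values) is an assumption that may differ from what the package computes.

```r
# Statistic factory in the same shape as the one passed to define_pmt:
# called once with the original samples, it returns the function that is
# evaluated on every permutation.
make_statistic <- function(x, y) {
    m <- length(x)
    n <- length(y)
    function(x, y) sum(x) / m - sum(y) / n
}

# Hypothetical driver (an assumption about how such a factory could be
# consumed, not the package's actual code).
permute_twosample <- function(x, y, factory, n_permu = 1000) {
    statistic <- factory(x, y)  # constants m and n are computed once here
    pooled <- c(x, y)
    m <- length(x)
    observed <- statistic(x, y)
    permuted <- replicate(n_permu, {
        # draw the indices of the first group, the rest form the second
        idx <- sample(length(pooled), m)
        statistic(pooled[idx], pooled[-idx])
    })
    list(
        statistic = observed,
        p_value = mean(abs(permuted) >= abs(observed))
    )
}

set.seed(0)
result <- permute_twosample(rnorm(10, 0), rnorm(10, 5), make_statistic)
result
```

Because group sizes never change under permutation, m and n are captured once by the closure instead of being recomputed on every call.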
For R >= 4.4.0, the quickr package can
be used to accelerate the statistic. However, this results in
repeated crossings of the R-Fortran boundary and makes pre-calculation
of constants impossible.
t_quickr <- define_pmt(
    method = "twosample", rejection = "<>", n_permu = 1e5,
    statistic = function(x, y) {
        sum(x) / length(x) - sum(y) / length(y)
    },
    quickr = TRUE
)
When both pre-calculation and computational efficiency are required, the statistic can be written in C++. Thanks to Rcpp sugar and C++14 features, the R version needs only minor modifications to become valid C++.
t_cpp <- define_pmt(
    method = "twosample", rejection = "<>", n_permu = 1e5,
    statistic = "[](const auto& x, const auto& y) {
        auto m = x.length();
        auto n = y.length();
        return [=](const auto& x, const auto& y) {
            return sum(x) / m - sum(y) / n;
        };
    }"
)
It’s easy to check that t_custom, t_quickr
and t_cpp are equivalent:
x <- rnorm(10, mean = 0)
y <- rnorm(10, mean = 5)
set.seed(0)
t_custom$test(x, y)$print()
set.seed(0)
t_quickr$test(x, y)$print()
set.seed(0)
t_cpp$test(x, y)$print()
coin is a commonly used R package for performing permutation tests. Below is a benchmark:
library(coin)
data <- c(x, y)
group <- factor(c(rep("x", length(x)), rep("y", length(y))))
options(LearnNonparam.pmt_progress = FALSE)
benchmark <- microbenchmark::microbenchmark(
    pure_R = t_custom$test(x, y),
    quickr = t_quickr$test(x, y),
    Rcpp = t_cpp$test(x, y),
    coin = wilcox_test(data ~ group, distribution = approximate(nresample = 1e5, parallel = "no"))
)
benchmark
The C++ statistic performs significantly better than the pure R one, and it even surpasses the coin package when coin runs without parallelization. However, all tests in this package are currently written in pure R, with no plans to migrate them to C++. The primary goal of this package is not to maximize performance but to offer a flexible framework for permutation tests.