Statistics-Tool-Box

This is a single-header library, inspired by stb.h by Sean Barrett, that collects a range of useful statistical functions.

============================================================================

 You MUST

		#define STB_STATS_DEFINE

 in EXACTLY _one_ C or C++ file that includes this header, BEFORE the
 include, like this:

		#define STB_STATS_DEFINE
		#include "stb_stats.h"

 All other files should just #include "stb_stats.h" without the #define.

============================================================================

Repository Structure

stb_stats.h - Main header file with statistical functions
examples/ - Example programs demonstrating library usage:
- dim_reduce.c - Dimensionality reduction (PCA, t-SNE, UMAP)
- deseq2_example.c - DESeq2-style differential expression analysis
- spearman.c - Spearman's rank correlation calculator
test_stb_stats.c - Comprehensive test suite
test_isolated.c - Isolated tests for specific functions

Functions included are:

stb_tsne (t-SNE: t-Distributed Stochastic Neighbor Embedding with Barnes-Hut approximation)
stb_umap (UMAP: Uniform Manifold Approximation and Projection)
stb_kdtree KD-tree data structure for efficient nearest neighbor search (used by t-SNE and UMAP)
stb_adjust_pvalues_bh (apply Benjamini-Hochberg FDR correction to array of p-values), stb_log2_fold_change
stb_moderated_ttest, stb_cosine_similarity, RLE normalization (stb_calc_geometric_scaling_factors and stb_meanvar_counts_to_common_scale)
stb_shannon (Shannon's diversity index, Pielou evenness), stb_simpson (Simpson's diversity index), stb_jaccard (Jaccard similarity index), stb_bray_curtis (Bray–Curtis dissimilarity) and stb_create_htable (a simple, basic hash table)
stb_pdf_hypgeo (hypergeometric distribution probability density function), stb_log_factorial (log factorial, sped up using a lookup table)
stb_fisher2x2 (simple Fisher's exact test for 2x2 contingency tables)
stb_pdf_binom and stb_pdf_pois, the binomial and Poisson probability density functions
stb_polygamma, stb_trigamma_inverse (gamma-related functions) and stb_fit_f_dist for moment estimation of the scaled F-distribution
stb_qnorm and stb_qnorm_with_reference (also matrix variants) quantile normalization between columns with and without a reference
stb_neugas Neural gas clustering algorithm
stb_pca Principal Component Analysis
stb_csm (confidence sequence method) for Monte Carlo simulations
stb_kmeans k-means++ classical data clustering
stb_qsort (Quicksort), could be used to replace current sorting method
stb_cdf_gumbel, stb_pdf_gumbel, stb_icdf_gumbel and stb_est_gumbel, the (inverse) cumulative/probability density functions for the Gumbel distribution and the ML estimator of the Gumbel parameters
stb_kendall (Kendall's Rank correlation)
stb_jenks Initial port of O(k×n×log(n)) Jenks-Fisher algorithm originally created by Maarten Hilferink
stb_logistic_regression_L2 simple L2-regularized logistic regression
stb_spearman (Spearman's Rank correlation)
stb_invert_matrix, stb_transpose_matrix, stb_matrix_multiply, ..., stb_multi_linear_regression and stb_multi_logistic_regression
stb_ksample_anderson_darling, stb_2sample_anderson_darling, (one sample) stb_anderson_darling
stb_expfit (Exponential fitting), stb_polyfit (Polynomial fitting), stb_powfit (Power curve fitting), stb_linfit (Linear fitting)
stb_trap, stb_trapezoidal (returns the integral (area under the curve) of a given function over a given interval)
stb_lagrange (polynomial interpolation), stb_sum (Neumaier summation algorithm)
stb_mann_whitney, stb_kruskal_wallis (Unfinished, needs a better way to handle Dunn's post-hoc test)
stb_combinations
stb_allocmat (simple allocation of 2d array, but might not work on all systems?!)
stb_fgetln, stb_fgetlns
stb_pcg32 (PCG-XSH-RR) and stb_xoshiro512 (xoshiro512**) Pseudo Random Number Generators
stb_anova (One-Way ANOVA with Tukey HSD test and Scheffe T-statistics method (post-hoc) (Unfinished))
stb_quartiles
stb_histogram (very simple histogram), stb_print_histogram, ...
stb_factorial
stb_meanvar
stb_ttest, stb_uttest
stb_ftest
stb_benjamini_hochberg
stb_chisqr, stb_chisqr_matrix, stb_gtest, stb_gtest_matrix
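
The list above describes stb_sum as using the Neumaier summation algorithm. As an illustration of that technique, here is a minimal standalone sketch of compensated summation. The name sketch_neumaier_sum and its signature are assumptions made for this example; they are not taken from stb_stats.h, and the library's own implementation may differ.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not the stb_sum signature): Neumaier's variant of
 * Kahan compensated summation. A running correction term collects the
 * low-order bits that are lost each time a value is added to the sum. */
double sketch_neumaier_sum(const double *x, size_t n)
{
    assert(x != NULL || n == 0);

    double sum = 0.0;   /* plain running sum */
    double comp = 0.0;  /* accumulated rounding error */

    for (size_t i = 0; i < n; i++) {
        double t = sum + x[i];
        if (fabs(sum) >= fabs(x[i]))
            comp += (sum - t) + x[i];  /* low-order bits of x[i] were lost */
        else
            comp += (x[i] - t) + sum;  /* low-order bits of sum were lost */
        sum = t;
    }
    return sum + comp;  /* apply the correction once at the end */
}
```

For the input {1.0, 1e100, 1.0, -1e100}, a naive loop returns 0.0 under standard IEEE 754 double arithmetic, while the compensated version recovers 2.0.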

Example Programs

deseq2_example - DESeq2-style Differential Expression Analysis

A sample C program demonstrating DESeq2-style differential expression analysis using stb_stats.h for normalization and statistical testing.

Features:

RLE (Relative Log Expression) normalization using geometric means
Dispersion estimation using stb_fit_f_dist
Moderated t-test for differential expression
Multiple testing correction using Benjamini-Hochberg FDR
Log2 fold change calculation

Input format:

TAB-delimited count matrix (rows x columns format)
First row: number of rows and columns
Subsequent rows: count data (genes as rows, samples as columns)

Quick start:

make
# Example with first 3 samples as group 1 (columns 0-2) and next 3 as group 2 (columns 3-5)
./deseq2_example sample_counts.txt --g1-start 0 --g1-count 3 --g2-start 3 --g2-count 3 -o results.txt

Options:

--g1-start N: Starting column index for group 1 (default: 1)
--g1-count N: Number of samples in group 1 (default: 3)
--g2-start N: Starting column index for group 2 (default: 4)
--g2-count N: Number of samples in group 2 (default: 3)
--fdr FLOAT: False discovery rate threshold (default: 0.05)
-o FILE: Output file (default: stdout)

Output columns:

Gene: Gene identifier
baseMean: Average expression across all samples
log2FoldChange: Log2 fold change between groups
stat: Test statistic (moderated t-statistic)
pvalue: P-value from statistical test
padj: Adjusted p-value (Benjamini-Hochberg FDR)
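
The padj column is produced by a Benjamini-Hochberg adjustment. As a rough sketch of how that adjustment works, the code below sorts the p-values, scales the k-th smallest by n/k, and enforces monotonicity from the largest p-value downwards. The function name sketch_bh_adjust and its signature are hypothetical; they are not the interface of stb_adjust_pvalues_bh or stb_benjamini_hochberg.

```c
#include <assert.h>
#include <stdlib.h>

/* A p-value paired with its position in the input array. */
typedef struct { double p; int idx; } sketch_pv;

static int sketch_cmp_pv(const void *a, const void *b)
{
    double x = ((const sketch_pv *)a)->p, y = ((const sketch_pv *)b)->p;
    return (x > y) - (x < y);
}

/* Hypothetical helper: writes Benjamini-Hochberg adjusted p-values to
 * padj in the same order as the input. Returns 0 on success, -1 if
 * allocation fails. */
int sketch_bh_adjust(const double *p, int n, double *padj)
{
    assert(p && padj && n > 0);

    sketch_pv *s = malloc((size_t)n * sizeof *s);
    if (!s) return -1;

    for (int i = 0; i < n; i++) { s[i].p = p[i]; s[i].idx = i; }
    qsort(s, (size_t)n, sizeof *s, sketch_cmp_pv);

    /* Walk from the largest p-value down, keeping a running minimum so
     * adjusted values never decrease with rank and never exceed 1. */
    double running = 1.0;
    for (int k = n - 1; k >= 0; k--) {
        double v = s[k].p * n / (k + 1);
        if (v < running) running = v;
        padj[s[k].idx] = running;
    }

    free(s);
    return 0;
}
```

A gene would then be reported as significant when its padj value falls below the threshold passed with --fdr.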

dim_reduce - Generic Dimension Reduction Tool

A flexible C program demonstrating the usage of PCA, t-SNE, and UMAP for dimensionality reduction on tabular data.

Features:

Support for PCA, t-SNE, and UMAP algorithms
TAB-delimited and gzipped file support
Row-major and column-major data orientations
Automatic Z-score normalization
Pre-PCA option for high-dimensional data

See DIM_REDUCE_README.md for detailed usage instructions.

Quick start:

make
./dim_reduce data.txt -a pca -o output.txt
./dim_reduce data.txt -a tsne --perplexity 30 -o output.txt
./dim_reduce data.txt -a umap --neighbors 15 -o output.txt

spearman - Spearman's Rank Correlation Calculator

A sample C program for calculating Spearman's rank correlation coefficient between two data files.

Features:

Reads tab-separated data files (e.g., HTSeq count files)
Supports standard HTSeq format: gene_ID, gene_name, counts
Optional header skipping
Optional filtering by minimum value threshold
Uses stb_spearman for efficient rank correlation calculation

Input format:

TAB-delimited files with three columns: gene_ID, gene_name, and count values
Commonly used for RNA-seq data (HTSeq count files)

Example format:

ENSG00000000003	TSPAN6	1234
ENSG00000000005	TNMD	567

Quick start:

make
./spearman sample1.txt sample2.txt
./spearman sample1.txt sample2.txt -s           # Skip header
./spearman sample1.txt sample2.txt -m 10        # Filter with min value 10
./spearman sample1.txt sample2.txt -s -m 10     # Both options

Options:

-s, --skip-header: Skip the first line (header) in both files
-m, --min-val VALUE: Minimum value threshold for filtering (keeps pairs where at least one value meets the threshold)
-h, --help: Show help message

Output:

TAB-delimited format: file1\tfile2\tcorrelation_coefficient

CITATION

If you use this Tool-Box in a publication, please reference:

Voshol, G.P. (2024). STB: A simple Statistics Tool Box (Version 1.26) [Software]. Available from https://github.com/gerbenvoshol/Statistics-Tool-Box
