All Tools

Reduction Quality Bench

Standardised quality reports for dimensionality reduction. How much should you trust your map?

Python
NumPy / SciPy
scikit-learn
UMAP
t-SNE
PCA

The Problem

Modern research and engineering increasingly depend on compressing high-dimensional data into low-dimensional representations that humans can see and reason about. When a biologist visualises 20,000 genes as a scatter plot, when a language engineer maps a million-word vocabulary into a 2D terrain, or when a materials scientist projects crystal properties into a navigable space, they are using dimensionality reduction.

But every reduction distorts something. Nearby points may be pulled apart. Distant points may be pushed together. Entire regions may be compressed or inflated. In practice, most people evaluate their reductions by looking at the resulting scatter plot — an approach roughly equivalent to checking a city’s financial records by glancing at the skyline.

The mathematical tools to measure reduction quality already exist, scattered across academic papers and research code. RQB assembles those measurements into a clear, readable report.

Metrics

Local Metrics

Trustworthiness: Are the neighbours you see genuine?

Continuity: Are real neighbours still visible?

kNN Preservation (Jaccard): How much neighbourhood structure survived?

Global Metrics

Shepard Goodness: Is the distance ranking preserved?

Normalised Stress: How much total distortion?

Distance Correlation: Are raw distances linearly related?

Usage

# Install
pip install -e ".[dev]"

# Run
rqb --high embeddings.npy --low umap_2d.npy --k 10

# With cluster labels
rqb --high data.npy --low projection.npy --labels clusters.npy

No network callsFully local — no data leaves your machine
DeterministicFixed seeds — results reproduce identically
Minimal dependenciesNumPy, SciPy, scikit-learn
Secure by defaultNo shell calls, no eval, no network

Validation Results

RQB was tested against five datasets with known structure, using three reduction methods: PCA (linear baseline), t-SNE (nonlinear), and a random projection (negative control).

Validation results across five datasets and three reduction methods
Dataset	Method	Trust.	Cont.	kNN	Shepard	Stress
Swiss Roll	PCA	0.940	0.982	0.205	0.852	0.260
Swiss Roll	t-SNE	1.000	0.997	0.760	0.394	2.212
Swiss Roll	Random	−0.000	−0.002	0.002	−0.014	0.900
Concentric Circles	PCA	0.677	0.875	0.043	0.781	0.401
Concentric Circles	t-SNE	0.889	0.853	0.239	0.711	22.106
Clustered Gaussians	PCA	0.877	0.906	0.039	0.896	0.337
Clustered Gaussians	t-SNE	0.943	0.931	0.195	0.689	0.420
MNIST Digits	PCA	0.475	0.843	0.026	0.490	0.647
MNIST Digits	t-SNE	0.960	0.928	0.301	0.414	0.976

Green = good (≥0.8, or ≤0.2 for stress) Yellow = moderate Red = poor.

Key Findings

The negative control works. The random projection scored near zero on every metric for every structured dataset. The metrics are not producing false positives.

Methods separate correctly on nonlinear structure. On the Swiss Roll and Concentric Circles, t-SNE outperforms PCA on trustworthiness (1.000 vs 0.940, and 0.889 vs 0.677).

MNIST results match published benchmarks. t-SNE trustworthiness on the 5,000-point MNIST subset scored 0.960. Published results typically fall in the 0.85–0.95 range.

Local and global metrics tell different stories. t-SNE scores perfectly on local metrics for the Swiss Roll but poorly on global distance preservation. PCA does the reverse. This reflects genuine properties of the methods — neither metric alone tells the whole story.

kNN Preservation (Jaccard) is the harshest metric. Even good reductions often score below 0.30. Use it as a relative comparison between methods, not as an absolute threshold.

Verification Checklist

Criterion	Result
Random baseline scores lowest on every structured dataset	Pass
t-SNE outperforms PCA on Swiss Roll (nonlinear manifold)	Pass
t-SNE outperforms PCA on Concentric Circles	Pass
Both methods score well on Clustered Gaussians (linear structure)	Pass
MNIST t-SNE trustworthiness within published range (0.85–0.95)	Pass (0.960)
All methods score poorly on random baseline (no structure)	Pass
Report is self-contained HTML, no JavaScript, no external resources	Pass
All computations deterministic (fixed seeds)	Pass
Total runtime under 5 minutes	Pass

Latent Language Explorer V2 — uses UMAP reduction that RQB can evaluate
All Tools