Reduction Quality Bench
Standardised quality reports for dimensionality reduction. How much should you trust your map?
Reduction Quality Bench on GITHUB https://github.com/Datasculptures/reduction-quality-bench
The Problem
Modern research and engineering increasingly depend on compressing high-dimensional data into low-dimensional representations that humans can see and reason about. When a biologist visualises 20,000 genes as a scatter plot, when a language engineer maps a million-word vocabulary into a 2D terrain, or when a materials scientist projects crystal properties into a navigable space, they are using dimensionality reduction.
But every reduction distorts something. Nearby points may be pulled apart. Distant points may be pushed together. Entire regions may be compressed or inflated. In practice, most people evaluate their reductions by looking at the resulting scatter plot — an approach roughly equivalent to checking a city’s financial records by glancing at the skyline.
The mathematical tools to measure reduction quality already exist, scattered across academic papers and research code. RQB assembles those measurements into a clear, readable report.
Metrics
Local Metrics
Trustworthiness: Are the neighbours you see genuine?
Continuity: Are real neighbours still visible?
kNN Preservation (Jaccard): How much neighbourhood structure survived?
Global Metrics
Shepard Goodness: Is the distance ranking preserved?
Normalised Stress: How much total distortion?
Distance Correlation: Are raw distances linearly related?
Usage
# Install
pip install -e ".[dev]"
# Run
rqb --high embeddings.npy --low umap_2d.npy --k 10
# With cluster labels
rqb --high data.npy --low projection.npy --labels clusters.npy
- No network callsFully local — no data leaves your machine
- DeterministicFixed seeds — results reproduce identically
- Minimal dependenciesNumPy, SciPy, scikit-learn
- Secure by defaultNo shell calls, no eval, no network
Validation Results
RQB was tested against five datasets with known structure, using three reduction methods: PCA (linear baseline), t-SNE (nonlinear), and a random projection (negative control).
| Dataset | Method | Trust. | Cont. | kNN | Shepard | Stress |
|---|---|---|---|---|---|---|
| Swiss Roll | PCA | 0.940 | 0.982 | 0.205 | 0.852 | 0.260 |
| Swiss Roll | t-SNE | 1.000 | 0.997 | 0.760 | 0.394 | 2.212 |
| Swiss Roll | Random | −0.000 | −0.002 | 0.002 | −0.014 | 0.900 |
| Concentric Circles | PCA | 0.677 | 0.875 | 0.043 | 0.781 | 0.401 |
| Concentric Circles | t-SNE | 0.889 | 0.853 | 0.239 | 0.711 | 22.106 |
| Clustered Gaussians | PCA | 0.877 | 0.906 | 0.039 | 0.896 | 0.337 |
| Clustered Gaussians | t-SNE | 0.943 | 0.931 | 0.195 | 0.689 | 0.420 |
| MNIST Digits | PCA | 0.475 | 0.843 | 0.026 | 0.490 | 0.647 |
| MNIST Digits | t-SNE | 0.960 | 0.928 | 0.301 | 0.414 | 0.976 |
Green = good (≥0.8, or ≤0.2 for stress) Yellow = moderate Red = poor.
Key Findings
Verification Checklist
| Criterion | Result |
|---|---|
| Random baseline scores lowest on every structured dataset | Pass |
| t-SNE outperforms PCA on Swiss Roll (nonlinear manifold) | Pass |
| t-SNE outperforms PCA on Concentric Circles | Pass |
| Both methods score well on Clustered Gaussians (linear structure) | Pass |
| MNIST t-SNE trustworthiness within published range (0.85–0.95) | Pass (0.960) |
| All methods score poorly on random baseline (no structure) | Pass |
| Report is self-contained HTML, no JavaScript, no external resources | Pass |
| All computations deterministic (fixed seeds) | Pass |
| Total runtime under 5 minutes | Pass |
Related
- Latent Language Explorer V2 — uses UMAP reduction that RQB can evaluate
- All Tools