Latent Language Explorer V2
A navigable terrain built from 36,125 concepts in 384-dimensional embedding space — finding and measuring the unnamed gaps between words.
Latent Language Explorer on GITHUB https://github.com/Datasculptures/Latent-Language-Explorer-v2
What It Does
There are ideas that exist but do not have words. Not because they are vague or contested, but because the structure of language left them unnamed — concepts that sit in the gaps between the categories our vocabulary happens to cover. LLE V2 finds them, measures them, and describes them.
The terrain is a navigable map of the embedding space of 36,125 concepts organized by Roget’s Thesaurus (1911). Peaks are dense clusters of named meaning. Valleys are transitions. The deserts — shallow fractures between conceptual territories — are where the embedding model encodes something real that has no syntactic representation in natural language.
Companion to the oil painting Are there deserts in vector space?
Quick Start
# Install & start (first run)
.\start.ps1 -Install # Windows
./start.sh --install # Mac/Linux
# Start both servers
.\start.ps1 # Windows
./start.sh # Mac/Linux
# Backend API: http://localhost:8000/api/docs
# Frontend: http://localhost:3000
Run the Pipeline
# Full pipeline from scratch
.\run_pipeline.ps1 # Windows
./run_pipeline.sh # Mac/Linux
# Downstream only (skip vocab rebuild)
.\run_pipeline.ps1 -Downstream # Windows
./run_pipeline.sh --downstream # Mac/Linux
The terrain data is not committed to the repo (large binary files). Run the full pipeline to generate it from scratch.
Architecture
Frontend
TypeScript + React + Vite. Single Three.js renderer shared between the Landscape and Discovery pages.
Backend
FastAPI Python backend. all-MiniLM-L6-v2 sentence-transformer embeddings. SQLite journal with atomic writes.
The Terrain
KDE density as terrain height — not a third UMAP dimension. 2D UMAP layout (seed 42). Never change the seed after the first embedding run.
Probe Deserts
Measured in 384-dimensional L2 space. Interior probe steps only (α 0.10–0.90). Deepest found: 0.9329 (chairperson vs composure).
V1 → V2
| Dimension | V1 | V2 |
|---|---|---|
| Vocabulary size | 8,735 | 36,125 |
| Embedding model | GloVe 300d (static) | all-MiniLM-L6-v2 (384d) |
| Taxonomy | 9 flat domains | 6 classes, 991 categories |
| Journal entries | 14 | 548 |
| Max desert depth | 0.076 (different scale) | 0.9329 |
| Architecture | Vanilla JS, two canvases | TypeScript, React, single renderer |
Key Discoveries
The unnamed quality of presiding-without-reacting — the composure required to chair.
Deserved standing — authority that derives from character rather than appointment.
Transformation at a scale smaller than observation — change without an observable mechanism.
The credential as a form of passage — the login as a kind of navigation.
Pair Selection & Desert Thresholds
- Gate thresholdProbe desert ≥ 0.50 (L2 on unit sphere)
- Shallow thresholdProbe desert ≥ 0.70
- Zipf frequency filter≥ 3.0 — excludes archaic/rare terms
- Shared neighbourhood≥ 1 common neighbour in top-20
- Cosine similarity≤ 0.85 — excludes near-synonyms
- Single words onlyNo hyphens or underscores
Environment
ANTHROPIC_API_KEY Required for generative decoding
PORT_BACKEND Default: 8000
PORT_FRONTEND Default: 3000
# Copy .env.example to .env — never commit .env