Latent Language Explorer V2

A navigable terrain built from 36,125 concepts in 384-dimensional embedding space — finding and measuring the unnamed gaps between words.

Latent Language Explorer on GitHub: https://github.com/Datasculptures/Latent-Language-Explorer-v2

  • Python
  • TypeScript / React
  • FastAPI
  • Three.js
  • sentence-transformers
  • UMAP
  • SQLite
[Figure: Terrain view. KDE density as height, UMAP 2D layout, desert regions in low-density zones.]
[Figure: Desert probe results. Measuring the depth and character of unnamed conceptual gaps.]

What It Does

There are ideas that exist but do not have words. Not because they are vague or contested, but because the structure of language left them unnamed — concepts that sit in the gaps between the categories our vocabulary happens to cover. LLE V2 finds them, measures them, and describes them.

The terrain is a navigable map of the embedding space of 36,125 concepts organized by Roget’s Thesaurus (1911). Peaks are dense clusters of named meaning. Valleys are transitions. The deserts — shallow fractures between conceptual territories — are where the embedding model encodes something real that has no syntactic representation in natural language.

Companion to the oil painting "Are there deserts in vector space?"

Quick Start

# Install & start (first run)
.\start.ps1 -Install       # Windows
./start.sh --install        # Mac/Linux

# Start both servers
.\start.ps1                 # Windows
./start.sh                  # Mac/Linux

# Backend API:  http://localhost:8000/api/docs
# Frontend:     http://localhost:3000

Run the Pipeline

# Full pipeline from scratch
.\run_pipeline.ps1                  # Windows
./run_pipeline.sh                   # Mac/Linux

# Downstream only (skip vocab rebuild)
.\run_pipeline.ps1 -Downstream      # Windows
./run_pipeline.sh --downstream      # Mac/Linux

The terrain data is not committed to the repo (large binary files). Run the full pipeline to generate it from scratch.

Architecture

Frontend

TypeScript + React + Vite. Single Three.js renderer shared between the Landscape and Discovery pages.

Backend

FastAPI Python backend. all-MiniLM-L6-v2 sentence-transformer embeddings. SQLite journal with atomic writes.
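
One way to get atomic journal writes with SQLite is to wrap each batch of inserts in a single transaction. The sketch below uses Python's standard `sqlite3` module. The table name, column names, and the `open_journal` / `add_entries` helpers are hypothetical and are not taken from the project's schema.

```python
import sqlite3

def open_journal(path=":memory:"):
    """Open (or create) a journal database. The schema is an illustrative assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS journal ("
        " id INTEGER PRIMARY KEY,"
        " term_a TEXT NOT NULL,"
        " term_b TEXT NOT NULL,"
        " depth REAL NOT NULL,"
        " description TEXT)"
    )
    return conn

def add_entries(conn, entries):
    """Insert all entries in one transaction: every row lands, or none does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT INTO journal (term_a, term_b, depth, description)"
            " VALUES (?, ?, ?, ?)",
            entries,
        )

conn = open_journal()
add_entries(conn, [
    ("chairperson", "composure", 0.9329, "presiding without reacting"),
])
```

If any row in a batch violates a constraint, the `with conn:` block rolls the whole batch back, so a failed write never leaves the journal half-updated.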

The Terrain

KDE density as terrain height — not a third UMAP dimension. 2D UMAP layout (seed 42). Never change the seed after the first embedding run.
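
To illustrate density as height, the sketch below evaluates a Gaussian kernel density estimate over a handful of 2D points. It is a minimal stand-in under stated assumptions: the function names, the bandwidth of 0.5, and the toy coordinates are all illustrative, and in the real pipeline the points would come from the UMAP layout.

```python
import math

def kde_height(x, y, points, bandwidth=0.5):
    """Gaussian KDE at (x, y): the average kernel value over all layout points."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return norm * total / len(points)

def height_grid(points, xs, ys, bandwidth=0.5):
    """Height map indexed as grid[row][col], rows over ys and columns over xs."""
    return [[kde_height(x, y, points, bandwidth) for x in xs] for y in ys]

# Toy stand-in for a 2D layout: a tight cluster plus one distant point.
layout = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
peak = kde_height(0.0, 0.0, layout)      # inside the cluster: high terrain
desert = kde_height(1.5, 1.5, layout)    # between clusters: low terrain
grid = height_grid(layout, xs=[0.0, 1.5, 3.0], ys=[0.0, 1.5, 3.0])
```

Because height comes from density rather than from a third UMAP axis, the same 2D layout always produces the same terrain for a given bandwidth.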

Probe Deserts

Deserts are measured by L2 distance in the full 384-dimensional embedding space, not in the 2D layout. Only interior probe steps are used (α from 0.10 to 0.90). Deepest found: 0.9329 (chairperson vs composure).
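
The sketch below shows one plausible reading of an interior probe, and it rests on assumptions the text above does not confirm: that each step interpolates linearly between the two concept vectors, re-normalizes onto the unit sphere, and takes the L2 distance to the nearest vocabulary vector as the depth at that step. The step size of 0.10 and the function names are also assumptions. Toy 3-dimensional vectors stand in for the 384-dimensional embeddings.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def probe_pair(a, b, vocab, steps=9):
    """Return (depth, alpha) for the deepest interior step between a and b.

    Assumption: depth at a step is the L2 distance from the interpolated,
    re-normalized point to its nearest vocabulary vector.
    """
    best_depth, best_alpha = 0.0, None
    for i in range(1, steps + 1):
        alpha = i / (steps + 1)  # 0.10, 0.20, ..., 0.90 when steps=9
        point = normalize([(1 - alpha) * x + alpha * y for x, y in zip(a, b)])
        depth = min(l2(point, v) for v in vocab)
        if depth > best_depth:
            best_depth, best_alpha = depth, alpha
    return best_depth, best_alpha

# Toy unit vectors standing in for concept embeddings.
vocab = [normalize(v) for v in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]
depth, alpha = probe_pair(vocab[0], vocab[1], vocab)
```

Skipping α = 0 and α = 1 keeps the endpoints out of the measurement, since the distance from a concept to itself is always zero.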

V1 → V2

Dimension          V1                         V2
Vocabulary size    8,735                      36,125
Embedding model    GloVe 300d (static)        all-MiniLM-L6-v2 (384d)
Taxonomy           9 flat domains             6 classes, 991 categories
Journal entries14548
Max desert depth   0.076 (different scale)    0.9329
Architecture       Vanilla JS, two canvases   TypeScript, React, single renderer

Key Discoveries

chairperson vs composure (depth 0.9329)

The unnamed quality of presiding-without-reacting — the composure required to chair.

dean vs valiant (depth 0.9199)

Deserved standing — authority that derives from character rather than appointment.

magician vs molded (depth 0.8948)

Transformation at a scale smaller than observation — change without an observable mechanism.

navigator vs password (depth 0.8015)

The credential as a form of passage — the login as a kind of navigation.

Pair Selection & Desert Thresholds

  • Gate threshold: probe desert ≥ 0.50 (L2 on unit sphere)
  • Shallow threshold: probe desert ≥ 0.70
  • Zipf frequency filter: ≥ 3.0 (excludes archaic/rare terms)
  • Shared neighbourhood: ≥ 1 common neighbour in top-20
  • Cosine similarity: ≤ 0.85 (excludes near-synonyms)
  • Single words only: no hyphens or underscores
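
The pair-selection rules above can be sketched as a single predicate. This is an illustration under assumptions, not the project's code: the `Candidate` record, the helper names, and the toy vectors and Zipf values are made up, and Zipf frequencies and top-20 neighbour sets are expected to be supplied by the caller. Treating the two depth thresholds as tiers is likewise an assumed interpretation.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    word: str
    vector: list       # embedding vector
    zipf: float        # Zipf frequency, supplied by the caller
    neighbours: set    # top-20 nearest-neighbour words, supplied by the caller

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def is_single_word(word):
    """Reject hyphenated, underscored, or multi-word terms."""
    return bool(word) and not any(ch in word for ch in "-_ ")

def eligible_pair(a, b, min_zipf=3.0, max_cosine=0.85, min_shared=1):
    """Apply the word-form, frequency, neighbourhood, and similarity filters."""
    if not (is_single_word(a.word) and is_single_word(b.word)):
        return False
    if a.zipf < min_zipf or b.zipf < min_zipf:
        return False
    if len(a.neighbours & b.neighbours) < min_shared:
        return False
    # On the unit sphere, cosine 0.85 corresponds to an L2 distance of
    # sqrt(2 - 2 * 0.85), roughly 0.548.
    return cosine(a.vector, b.vector) <= max_cosine

def depth_tier(depth, gate=0.50, shallow=0.70):
    """Label a probe depth by the highest threshold it meets (assumed tiers)."""
    if depth >= shallow:
        return "shallow"
    if depth >= gate:
        return "gate"
    return None

# Toy candidates; the vectors, frequencies, and neighbours are invented.
dean = Candidate("dean", [1.0, 0.0], 4.2, {"college", "rector"})
valiant = Candidate("valiant", [0.0, 1.0], 3.5, {"rector", "brave"})
hyphenated = Candidate("well-being", [0.0, 1.0], 4.0, {"rector"})
rare = Candidate("eftsoons", [0.0, 1.0], 1.2, {"rector"})
```

Ordering the cheap string and frequency checks before the cosine computation keeps the filter inexpensive when it runs over many candidate pairs.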

Environment

ANTHROPIC_API_KEY    Required for generative decoding
PORT_BACKEND         Default: 8000
PORT_FRONTEND        Default: 3000

# Copy .env.example to .env — never commit .env
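
For reference, a backend could read these variables roughly as follows. This is a minimal sketch using only the standard library. The `Settings` class and `load_settings` function are hypothetical names, and the project's actual configuration loading may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class Settings:
    anthropic_api_key: Optional[str]   # None disables generative decoding
    port_backend: int
    port_frontend: int

def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    return Settings(
        anthropic_api_key=env.get("ANTHROPIC_API_KEY") or None,
        port_backend=int(env.get("PORT_BACKEND", "8000")),
        port_frontend=int(env.get("PORT_FRONTEND", "3000")),
    )

# Example with an explicit mapping instead of the real environment.
settings = load_settings({"PORT_BACKEND": "9000"})
```

Passing the mapping explicitly makes the defaults easy to check without touching the real environment or a committed `.env` file.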