# Dataset Topology Sculptor
Reads a dataset, extracts its relational structure, and produces a physically constructible sculpture design — complete with a PDF fabrication document you can take into a workshop.
## Design Philosophy
The sculpture IS the data. Node positions, edge lengths, and overall topology are derived directly from the data — not aesthetically adjusted after the fact. No manual position overrides are permitted.
DTS is a fabrication planner, not a visualization toy. Every output — regardless of input source, manipulation history, or material choice — must be physically buildable. Geometry serves construction.
## Install & Run

```bash
pip install -e .

# CSV to sculpture PDF
dts --input data/my_dataset.csv --mode similarity
```

Full options:

```
dts --input <path>                  # Required: input CSV
    --mode similarity|graph|cluster
    --output <dir>                  # Default: ./output
    --max-nodes <int>               # 3–100, default: 100
    --version
    -h, --help
```
Exit codes:
- Exit 0: Success
- Exit 1: Argument / validation error
- Exit 3: Constraint violation (no files written)
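A sketch of how a caller might branch on these documented exit codes; the `run_dts` helper and its message strings are illustrative, not part of the tool.

```python
import subprocess

# Exit codes documented by the dts CLI (see the list above).
EXIT_MESSAGES = {
    0: "success",
    1: "argument / validation error",
    3: "constraint violation (no files written)",
}

def run_dts(args: list[str]) -> tuple[int, str]:
    """Invoke the dts CLI and map its exit code to a human-readable status."""
    proc = subprocess.run(["dts", *args], capture_output=True, text=True)
    return proc.returncode, EXIT_MESSAGES.get(proc.returncode, "unexpected exit code")
```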
## Pipeline
CSV input → security checks & parsing → topology graph → 3D layout with physical constraints → PDF + nodes.csv + edges.csv + manifest.json
## Topology Modes
### Similarity (Phase 3)
Input: CSV with numeric columns. Relationship: pairwise Euclidean / cosine distance. Produces a dense cloud where nodes cluster by feature proximity.
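A minimal sketch of the similarity step, assuming Euclidean distance and an illustrative distance-to-weight mapping (the tool's actual weighting function is not specified here):

```python
import numpy as np

def similarity_edges(X: np.ndarray) -> list[tuple[int, int, float]]:
    """Return (i, j, weight) for every node pair; closer rows get higher weight."""
    diffs = X[:, None, :] - X[None, :, :]      # (n, n, d) pairwise differences
    dist = np.sqrt((diffs ** 2).sum(axis=-1))  # Euclidean distance matrix
    weight = 1.0 / (1.0 + dist)                # monotone map: small distance -> high weight
    n = len(X)
    return [(i, j, float(weight[i, j])) for i in range(n) for j in range(i + 1, n)]

# Three rows: the first two are close, the third is far away.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
edges = similarity_edges(X)
```

Cosine distance would swap only the `dist` computation; the edge construction is identical.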
### Graph (Phase 6)
Input: explicit edge list (CSV or JSON). Uses Fruchterman-Reingold force-directed layout.
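NetworkX's `spring_layout` implements Fruchterman-Reingold; a sketch of laying out an explicit weighted edge list in 3D (the edge list values here are placeholders):

```python
import networkx as nx

# Explicit weighted edge list, as it might arrive from a CSV or JSON input.
edge_list = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 0.5)]
G = nx.Graph()
G.add_weighted_edges_from(edge_list)

# dim=3 for 3D coordinates; a fixed seed keeps the layout reproducible.
pos = nx.spring_layout(G, dim=3, seed=42, weight="weight")
```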
### Cluster (Phase 7)
Input: CSV with numeric columns. Uses k-means. Centroid nodes connected to member nodes — constellation-like structure.
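A sketch of the constellation construction using scikit-learn's k-means; the node naming scheme (`centroid_*`, `node_*`) is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def constellation(X: np.ndarray, k: int):
    """Cluster rows with k-means and connect each member node to its centroid node."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    edges = [(f"centroid_{c}", f"node_{i}") for i, c in enumerate(km.labels_)]
    return km.cluster_centers_, edges

# Two well-separated blobs of five points each.
X = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 10])
centers, edges = constellation(X, k=2)
```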
## Edge Pruning
- Threshold: keep edges whose weight exceeds a percentile cutoff.
- k-nearest: for each node, keep only the k strongest connections (default k=3).
## Physical Constraints
These constraints are enforced before any output is written. Violations trigger automatic resolution (merge, prune, tighten) and are logged in the PDF.
## Output Files
- PDF: cover, design summary, node coordinate table, connection list, cut list, materials summary, build notes, data provenance
- nodes.csv: node ID, label, X (mm), Y (mm), Z (mm), degree
- edges.csv: edge ID, source, target, length_mm, weight
- manifest.json: full BOM, parameters, version, SHA-256 of input

All output is written to a timestamped directory: `output/YYYYMMDD_HHMMSS_<dataset-name>/`
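A sketch of the manifest step, assuming illustrative key names (`version`, `parameters`, `input_sha256`) and the timestamped directory convention above:

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path

def write_manifest(input_path: Path, params: dict, version: str = "0.1.0") -> Path:
    """Write manifest.json into a timestamped output directory; return its path."""
    digest = hashlib.sha256(input_path.read_bytes()).hexdigest()  # SHA-256 of raw input
    out_dir = Path("output") / f"{time.strftime('%Y%m%d_%H%M%S')}_{input_path.stem}"
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {"version": version, "parameters": params, "input_sha256": digest}
    path = out_dir / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path

# Usage with a throwaway input file:
src = Path(tempfile.mkdtemp()) / "my_dataset.csv"
src.write_text("a,b\n1,2\n")
manifest_path = write_manifest(src, {"mode": "similarity", "max_nodes": 100})
```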
## Security Design
Immutable rules:
- No `eval()`, `exec()`, or `shell=True` — ever
- No output written outside the designated output directory
- Error messages contain the filename only — never the full path
- All input files are treated as untrusted
- File size is capped at 10 MB, checked before the file is read
- Paths resolved via `pathlib.Path.resolve()` plus an allowlist directory check
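A minimal sketch of the validation gate these rules imply; the function name, the `data/` allowlist entry, and the error wording are assumptions:

```python
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024               # 10 MB cap, checked before reading
ALLOWED_DIRS = [Path("data").resolve()]    # allowlist; illustrative

def validate_input(raw: str) -> Path:
    """Resolve a user-supplied path and reject it before any bytes are read."""
    path = Path(raw).resolve()  # collapses ../ tricks before the allowlist check
    if not any(path.is_relative_to(d) for d in ALLOWED_DIRS):
        raise ValueError(f"{path.name}: outside allowed directories")  # filename only
    if path.stat().st_size > MAX_BYTES:
        raise ValueError(f"{path.name}: exceeds 10 MB limit")
    return path
```

The allowlist check runs before `stat()`, so a path outside the allowlist is rejected without touching the filesystem entry at all.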
## Phased Implementation
| # | Name | Deliverable |
|---|---|---|
| 1 | Project Scaffold | dts --help runs |
| 2 | CSV Parser + Security | All security checks; SECURITY.md written |
| 3 | Similarity Topology | Dataset → weighted NetworkX Graph |
| 4 | 3D Layout + Constraints | Valid 3D coords within 1 m³ |
| 5 | PDF Renderer | Full pipeline CSV → PDF · v0.1.0 |
| 6 | Graph Mode | JSON + edge list with F-R layout |
| 7 | Cluster Mode | k-means topology · v0.2.0 |
| 8 | Polish & Audit | >80% coverage, mypy clean · v1.0.0 |
## Related
- Data Sculptures — the physical artworks DTS is designed to generate plans for
- Reduction Quality Bench — evaluates the quality of the PCA reduction step
- All Tools