
Dataset Topology Sculptor

Reads a dataset, extracts its relational structure, and produces a physically constructable sculpture design — complete with a PDF fabrication document you can take into a workshop.

  • Python
  • NumPy
  • scikit-learn
  • NetworkX
  • SciPy
  • ReportLab

Design Philosophy

The sculpture IS the data. Node positions, edge lengths, and overall topology are derived directly from the data — not aesthetically adjusted after the fact. No manual position overrides are permitted.

DTS is a fabrication planner, not a visualization toy. Every output — regardless of input source, manipulation history, or material choice — must be physically buildable. Geometry serves construction.

Install & Run

pip install -e .

# CSV to sculpture PDF
dts --input data/my_dataset.csv --mode similarity

# Full options
dts --input <path>           # Required: input CSV
    --mode similarity|graph|cluster
    --output <dir>            # Default: ./output
    --max-nodes <int>         # 3–100, default: 100
    --version
    -h, --help
  • Exit 0: Success
  • Exit 1: Argument / validation error
  • Exit 3: Constraint violation (no files written)
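The exit codes above differ from argparse's default, which exits with status 2 on a bad argument. A minimal sketch of one way to get the documented behaviour, assuming the CLI is built on argparse (the source does not say which parser DTS uses). The class and helper names are hypothetical, and the default for --mode is an assumption.

```python
import argparse
import sys

EXIT_OK = 0
EXIT_ARGUMENT_ERROR = 1
EXIT_CONSTRAINT_VIOLATION = 3


class DtsArgumentParser(argparse.ArgumentParser):
    """argparse exits with status 2 on bad input; remap that to 1."""

    def error(self, message):
        self.print_usage(sys.stderr)
        self.exit(EXIT_ARGUMENT_ERROR, f"{self.prog}: error: {message}\n")


def bounded_node_count(text):
    """Accept only integers in the documented 3-100 range."""
    value = int(text)
    if not 3 <= value <= 100:
        raise argparse.ArgumentTypeError("--max-nodes must be between 3 and 100")
    return value


def build_parser():
    parser = DtsArgumentParser(prog="dts")
    parser.add_argument("--input", required=True)
    # Assumption: similarity is the default mode; the source does not state one.
    parser.add_argument("--mode", choices=["similarity", "graph", "cluster"],
                        default="similarity")
    parser.add_argument("--output", default="./output")
    parser.add_argument("--max-nodes", type=bounded_node_count, default=100)
    return parser
```

With this sketch, `build_parser().parse_args(["--input", "x.csv", "--max-nodes", "2"])` prints a usage error and exits with status 1 rather than 2.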

Pipeline

CSV → CSVParser → Dataset (validated, sanitized, size-capped at 10 MB)
Dataset → SimilarityEngine → Weighted NetworkX Graph (pairwise Euclidean / cosine)
Graph → PCA + ConstraintSolver → 3D coordinates within 1 m³
Sculpture → PDFRenderer → fabrication document + nodes.csv + edges.csv + manifest.json
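The layout stage can be illustrated with a short sketch. It projects feature rows onto their top three principal axes and scales the result uniformly into a 1000 mm cube. It uses NumPy's SVD as a stand-in for a PCA implementation and does not attempt what ConstraintSolver does. The function name is hypothetical, and the input is assumed to have at least three rows and three numeric columns.

```python
import numpy as np

VOLUME_MM = 1000.0


def pca_layout_mm(features, volume_mm=VOLUME_MM):
    """Project rows onto three principal axes, then fit them in the cube."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:3].T
    # Shift so every coordinate is non-negative.
    coords -= coords.min(axis=0)
    # One uniform scale factor keeps relative distances proportional.
    span = coords.max()
    if span > 0:
        coords *= volume_mm / span
    return coords
```

A single scale factor for all three axes is a deliberate choice here: scaling each axis separately would fill the cube but distort the distances the data implies.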

Topology Modes

Similarity (Phase 3)

Input: CSV with numeric columns. Relationship: pairwise Euclidean / cosine distance. Produces a dense cloud where nodes cluster by feature proximity.
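As a rough illustration of the similarity mode, the sketch below turns pairwise Euclidean distances into a complete weighted NetworkX graph. The function name and the distance-to-weight mapping 1 / (1 + d) are assumptions. The source only says edges are weighted by pairwise distance.

```python
import networkx as nx
import numpy as np


def similarity_graph(features, labels):
    """Build a complete weighted graph from pairwise Euclidean distances."""
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumed mapping: weight 1 for identical rows, falling towards 0 with distance.
    weight = 1.0 / (1.0 + dist)
    graph = nx.Graph()
    graph.add_nodes_from(labels)
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            graph.add_edge(labels[i], labels[j],
                           weight=float(weight[i, j]),
                           distance=float(dist[i, j]))
    return graph
```

A complete graph on n nodes has n(n-1)/2 edges, so 100 nodes yield 4950 edges, far above the 200-edge limit. That is why pruning is needed before layout.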

Graph (Phase 6)

Input: explicit edge list (CSV or JSON). Uses Fruchterman-Reingold force-directed layout.
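NetworkX exposes the Fruchterman-Reingold algorithm as `spring_layout`, so a 3D version of this mode might look like the sketch below. The function name and the mapping from the layout's [-1, 1] range onto a 0 to 1000 mm cube are assumptions, not the project's confirmed approach.

```python
import networkx as nx


def layout_mm(graph, volume_mm=1000.0, seed=7):
    """Run a seeded 3D Fruchterman-Reingold layout and convert it to millimetres."""
    # spring_layout rescales positions so every coordinate lies in [-1, 1].
    pos = nx.spring_layout(graph, dim=3, seed=seed)
    half = volume_mm / 2.0
    return {node: (xyz + 1.0) * half for node, xyz in pos.items()}


# Hypothetical edge list for illustration.
graph = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
coords_mm = layout_mm(graph)
```

Fixing the seed makes the layout reproducible between runs. The result fits the volume but is not guaranteed to respect rod-length limits, which would still need a separate check.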

Cluster (Phase 7)

Input: CSV with numeric columns. Uses k-means; each centroid node is connected to its member nodes, giving a constellation-like structure.
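The centroid-to-member structure can be sketched with scikit-learn's KMeans as follows. The function name, node naming scheme, and attribute names are hypothetical, and the number of clusters is passed in rather than chosen automatically.

```python
import networkx as nx
import numpy as np
from sklearn.cluster import KMeans


def cluster_graph(features, n_clusters, seed=0):
    """Connect each k-means centroid to the rows assigned to it."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    graph = nx.Graph()
    for c, centre in enumerate(model.cluster_centers_):
        graph.add_node(f"centroid_{c}", kind="centroid", position=centre)
    for i, c in enumerate(model.labels_):
        graph.add_node(f"row_{i}", kind="member", position=features[i])
        graph.add_edge(f"centroid_{c}", f"row_{i}")
    return graph
```

Each member has exactly one edge, so the graph is a set of stars. A cluster with more than eight members would exceed the maximum node degree, so some further resolution step would be needed in that case.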

Edge Pruning

Threshold: keep edges where weight exceeds a percentile cutoff.
k-nearest: for each node, keep only the k strongest connections (default k=3).
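Both pruning strategies can be sketched on a weighted NetworkX graph. The function names are hypothetical, and the sketch assumes a higher weight means a stronger connection.

```python
import networkx as nx
import numpy as np


def _with_edges(graph, edges):
    """Copy all nodes, but only the listed edges, into a new graph."""
    pruned = nx.Graph()
    pruned.add_nodes_from(graph.nodes(data=True))
    pruned.add_edges_from((u, v, graph[u][v]) for u, v in edges)
    return pruned


def prune_by_threshold(graph, percentile):
    """Keep edges whose weight is strictly above the given percentile."""
    weights = [w for _, _, w in graph.edges(data="weight")]
    cutoff = np.percentile(weights, percentile)
    keep = [(u, v) for u, v, w in graph.edges(data="weight") if w > cutoff]
    return _with_edges(graph, keep)


def prune_k_nearest(graph, k=3):
    """Keep an edge if either endpoint ranks it among its k heaviest."""
    keep = set()
    for node in graph:
        ranked = sorted(graph[node].items(),
                        key=lambda item: item[1]["weight"], reverse=True)
        keep.update(frozenset((node, nbr)) for nbr, _ in ranked[:k])
    return _with_edges(graph, (tuple(pair) for pair in keep))
```

In the k-nearest variant a node can end up with more than k edges, because a neighbour may select it even when it did not select that neighbour. Requiring both endpoints to agree would be a stricter alternative.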

Physical Constraints

These constraints are enforced before any output is written. Violations trigger automatic resolution (merge, prune, tighten) and are logged in the PDF.

Volume           1000 × 1000 × 1000 mm
Min rod length   30 mm
Max rod length   950 mm
Max node degree  8 connections
Max edges        200 total
Node count       3 to 100 nodes
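A detection-only sketch of these limits, using just the standard library. It reports violations and does not perform the automatic resolution (merge, prune, tighten) described above. The function name, the rule labels, and the choice of an axis-aligned cube starting at the origin are assumptions.

```python
import math
from collections import Counter

VOLUME_MM = 1000.0
MIN_ROD_MM, MAX_ROD_MM = 30.0, 950.0
MAX_DEGREE, MAX_EDGES = 8, 200
MIN_NODES, MAX_NODES = 3, 100


def find_violations(coords_mm, edges):
    """coords_mm maps node -> (x, y, z) in mm; edges is a list of (u, v) pairs."""
    problems = []
    if not MIN_NODES <= len(coords_mm) <= MAX_NODES:
        problems.append(("node_count", len(coords_mm)))
    if len(edges) > MAX_EDGES:
        problems.append(("max_edges", len(edges)))
    for node, xyz in coords_mm.items():
        if any(not 0.0 <= c <= VOLUME_MM for c in xyz):
            problems.append(("volume", node))
    degree = Counter()
    for u, v in edges:
        # Rod length is the straight-line distance between the two nodes.
        length = math.dist(coords_mm[u], coords_mm[v])
        if length < MIN_ROD_MM:
            problems.append(("min_rod_length", (u, v)))
        elif length > MAX_ROD_MM:
            problems.append(("max_rod_length", (u, v)))
        degree[u] += 1
        degree[v] += 1
    problems.extend(("max_degree", n) for n, d in degree.items() if d > MAX_DEGREE)
    return problems
```

An empty list means the design passes every check in the table. Anything else would have to be resolved, or the run aborted, before files are written.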

Output Files

  • PDF: Cover, design summary, node coordinate table, connection list, cut list, materials summary, build notes, data provenance
  • nodes.csv: Node ID, label, X (mm), Y (mm), Z (mm), degree
  • edges.csv: Edge ID, source, target, length_mm, weight
  • manifest.json: Full BOM, parameters, version, SHA-256 of input

All output written to a timestamped directory: output/YYYYMMDD_HHMMSS_<dataset-name>/
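The directory naming scheme and the input hash recorded in manifest.json could be produced along these lines. Both function names are hypothetical, and replacing unusual characters in the dataset name with underscores is an assumed policy.

```python
import hashlib
import re
from datetime import datetime
from pathlib import Path


def run_directory(output_root, input_path, now=None):
    """Return output_root/YYYYMMDD_HHMMSS_<dataset-name> for this run."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    # Assumed policy: keep only filesystem-safe characters from the file stem.
    name = re.sub(r"[^A-Za-z0-9_-]", "_", Path(input_path).stem)
    return Path(output_root) / f"{stamp}_{name}"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, an input of data/my_dataset.csv processed at 03:04:05 on 2 January 2024 would map to output/20240102_030405_my_dataset.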

Security Design

# Immutable rules
No eval(), exec(), or shell=True — ever
No output written outside the designated output directory
Error messages contain filename only — never full path
All input files treated as untrusted
File size capped at 10 MB before reading
Paths resolved via pathlib.resolve() + allowlist directory check
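A minimal sketch of the path and size rules, assuming a single allowlisted directory. The function and exception names are hypothetical, and reading 10 MB as 10 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

MAX_INPUT_BYTES = 10 * 1024 * 1024  # assumed interpretation of the 10 MB cap


class InputRejected(ValueError):
    """Carries a message that names the file but never its full path."""


def validate_input(path, allowed_dir):
    """Resolve the path, then check location and size before any read."""
    candidate = Path(path).resolve()
    root = Path(allowed_dir).resolve()
    # After resolve(), ".." segments and symlinks can no longer hide the real location.
    if root not in candidate.parents:
        raise InputRejected(f"{candidate.name}: outside the allowed directory")
    if not candidate.is_file():
        raise InputRejected(f"{candidate.name}: not a regular file")
    # stat() reports the size without reading the file contents.
    if candidate.stat().st_size > MAX_INPUT_BYTES:
        raise InputRejected(f"{candidate.name}: larger than 10 MB")
    return candidate
```

Checking the resolved path rather than the raw string is what defeats inputs such as `allowed/../secret.csv`, and using only `candidate.name` in messages keeps directory layout out of error output.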

Phased Implementation

#  Name                     Deliverable
1  Project Scaffold         dts --help runs
2  CSV Parser + Security    All security checks; SECURITY.md written
3  Similarity Topology      Dataset → weighted NetworkX Graph
4  3D Layout + Constraints  Valid 3D coords within 1 m³
5  PDF Renderer             Full pipeline CSV → PDF · v0.1.0
6  Graph Mode               JSON + edge list with F-R layout
7  Cluster Mode             k-means topology · v0.2.0
8  Polish & Audit           >80% coverage, mypy clean · v1.0.0