Inka khipus are knotted textile devices that served as the primary recording medium of the Inka Empire (~1400–1532 CE). Despite 619 catalogued specimens in the Open Khipu Repository, the system remains undeciphered. No prior work had applied modern unsupervised learning to characterize structural groupings, nor used SHAP interpretability to identify which physical features predict geographic provenance.
A reproducible ML pipeline in 4 stages: (1) Feature engineering extracting 27 structural attributes per khipu — cord structure, twist direction, knot patterns, color entropy. (2) UMAP dimensionality reduction (27D → 2D) + HDBSCAN clustering. (3) XGBoost classification of geographic provenance with 5-fold stratified cross-validation. (4) SHAP TreeExplainer for interpretability.
UMAP+HDBSCAN yielded 3 structurally distinct clusters with silhouette = 0.769 and 0 noise points. Cluster 0 (17 khipus): Inka Late Horizon imperial style — completely isolated in UMAP space. XGBoost achieved F1 = 0.80 for Inka Late Horizon, reflecting high manufacturing standardization. SHAP identified knot Z-direction as the top discriminative feature — consistent with known Inka standardization conventions.
Cluster 2 (160 khipus) is dominated by European museum collections — Museum für Völkerkunde Berlin (41 specimens), University of Pennsylvania (14). This suggests 19th-century colonial acquisition patterns produce structurally distinct specimens, encoding colonial collection bias directly into the corpus. This has significant implications for any computational work that treats the OKR as a representative sample.
“The colonial gaze is structurally encoded in the data. That's not just a historical note — it's a machine learning problem.”