t1k:graphify
| Field | Value |
|---|---|
| Module | t1k-maintainer |
| Version | 2.18.3 |
| Effort | high |
| Tools | — |
Keywords: ast, code-analysis, codebase-understanding, graphify, knowledge-graph, tree-sitter
How to invoke
`/t1k:graphify [path] [--mcp|--report|--watch]`

Graphify: Knowledge Graph Builder
Turn any folder of code, docs, papers, or images into a queryable knowledge graph. Uses tree-sitter AST for code (20 languages), Whisper for audio/video, and LLM subagents for documents.
Pre-flight Step 0 — Fuzzy plan/path arg resolution (MANDATORY)
If the user provides a fuzzy plan/path/phase arg (e.g. `chaosforge-demo`, `plans/chaosforge-demo`, `phase-3`), an empty arg, or a natural-language reference such as “active plan”, “current plan”, or “this plan”, run the Fuzzy Plan / Path Resolution Protocol at `skills/t1k-cook/references/fuzzy-plan-resolution.md` BEFORE bailing out. The skill MUST NOT emit “no path matching” or “exact path required” until that protocol has been applied and Step 6 has been reached.
Prerequisites
Python 3.10+ required. This is an optional skill that integrates with the third-party `graphifyy` package.
Install:

    pip install graphifyy
    graphify install  # downloads tree-sitter grammars

Optional extras:

    pip install 'graphifyy[mcp]'     # MCP server mode
    pip install 'graphifyy[all]'     # full install (PDF, video, office, Leiden)
    pip install 'graphifyy[neo4j]'   # Neo4j graph database integration
    pip install 'graphifyy[leiden]'  # Leiden community detection algorithm

Note: The PyPI package is `graphifyy` (double-y). Other `graphify*` packages on PyPI are unaffiliated.
When to Use
- Understanding unfamiliar codebase architecture before planning
- Discovering cross-file relationships and dependency chains
- Finding “god nodes” (most-connected concepts) in large projects
- Navigating by structure instead of grepping every file
- Preparing context-efficient codebase representation (71.5x fewer tokens vs raw files)
Typically precedes: /t1k:plan (understand architecture before planning)
Related: /t1k:scout (quick file search), /t1k:repomix (full context dump)
Quick Start

    # Build knowledge graph from current directory
    graphify .

    # Build from specific path
    graphify /path/to/project

    # Watch mode (auto-rebuild on file changes)
    graphify . --watch

Output Artifacts
| File | Purpose |
|---|---|
| `graphify-out/graph.html` | Interactive visualization with search + community filtering |
| `graphify-out/GRAPH_REPORT.md` | God nodes, surprising connections, suggested questions |
| `graphify-out/graph.json` | Persistent graph for queries across sessions |
| `graphify-out/cache/` | SHA256-based incremental updates (only reprocesses changed files) |
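
As a rough illustration of how the persisted `graph.json` could be inspected outside the MCP server, the sketch below ranks nodes by degree to surface "god nodes". The schema is an assumption: it presumes a node-link layout with top-level `nodes` (each with an `id`) and `links` (each with `source` and `target`). The file that graphifyy actually writes may be shaped differently, so check it before relying on these keys. `god_nodes` and `load_graph` are hypothetical helper names.

```python
import json
from collections import Counter
from pathlib import Path


def god_nodes(graph: dict, top_n: int = 5) -> list[tuple[str, int]]:
    """Return the top_n most-connected node ids with their degree.

    ASSUMPTION: `graph` uses a node-link layout with "nodes" and "links"
    keys; this is not confirmed by the graphifyy documentation.
    """
    # Start every known node at degree 0 so isolated nodes are counted too.
    degree = Counter({node["id"]: 0 for node in graph.get("nodes", [])})
    for link in graph.get("links", []):
        degree[link["source"]] += 1
        degree[link["target"]] += 1
    return degree.most_common(top_n)


def load_graph(path: str = "graphify-out/graph.json") -> dict:
    """Read the persisted graph from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Tiny in-memory example so the sketch runs without a real build.
sample = {
    "nodes": [{"id": "auth"}, {"id": "db"}, {"id": "api"}, {"id": "cli"}],
    "links": [
        {"source": "api", "target": "auth"},
        {"source": "api", "target": "db"},
        {"source": "cli", "target": "api"},
    ],
}
top = god_nodes(sample, top_n=2)
print(top)
```

On a real build, `god_nodes(load_graph())` would be the entry point, provided the assumed keys match the file on disk.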
MCP Server Mode
Expose the graph as an MCP server for Claude to query directly:

    python -m graphify.serve graphify-out/graph.json

MCP Tools Available
| Tool | Purpose |
|---|---|
| `query_graph` | Search for concepts and relationships |
| `get_node` | Get details of a specific node |
| `get_neighbors` | Find related concepts |
| `shortest_path` | Find connection path between two concepts |
Claude Code MCP Setup
Add to `.claude/.mcp.json`:

    {
      "mcpServers": {
        "graphify": {
          "command": "python",
          "args": ["-m", "graphify.serve", "graphify-out/graph.json"]
        }
      }
    }

Three-Pass Architecture
- AST extraction (local, no API): tree-sitter parses code in 20 languages deterministically
- Audio/video transcription (local): Whisper runs on-device for media files
- Semantic extraction (API): LLM subagents process docs, papers, images in parallel
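
The three passes above can be pictured as a routing decision per file. The sketch below is only an illustration of that idea: `pick_pass` and both extension sets are hypothetical and are not taken from graphifyy, whose real routing logic may differ.

```python
from pathlib import Path

# ASSUMPTION: illustrative extension sets, not graphifyy's actual routing table.
CODE_EXTS = {".py", ".js", ".ts", ".go", ".rs", ".java", ".c", ".cpp", ".rb"}
MEDIA_EXTS = {".mp3", ".wav", ".mp4", ".mov"}


def pick_pass(path: str) -> str:
    """Map a file to one of the three passes by its extension."""
    ext = Path(path).suffix.lower()
    if ext in CODE_EXTS:
        return "ast"         # pass 1: local tree-sitter parse, no API call
    if ext in MEDIA_EXTS:
        return "transcribe"  # pass 2: local Whisper transcription
    return "semantic"        # pass 3: LLM subagents, requires an API call


plan = {p: pick_pass(p) for p in ["src/app.py", "talk.mp4", "docs/design.pdf"]}
print(plan)
```

Only files that land in the `semantic` bucket would involve a model provider, which matches the local/API split listed above.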
Supported Languages (tree-sitter)
Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, Ruby, C#, Kotlin, Scala, PHP, Swift, Lua, Zig, PowerShell, Elixir, Objective-C, Julia
Confidence Tagging
Relationships in the graph are tagged by provenance:
| Tag | Meaning |
|---|---|
| `EXTRACTED` | Directly from AST (imports, function calls, class inheritance) |
| `INFERRED` | LLM-derived with confidence score |
| `AMBIGUOUS` | Uncertain; needs human verification |
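
One way a consumer might use these tags is to keep AST-derived edges unconditionally and admit LLM-derived edges only above a threshold. The sketch below assumes each edge is a dict with `source`, `target`, `tag`, and `confidence` fields. Those field names, the `trusted_edges` helper, and the 0.8 default are all hypothetical, not part of graphifyy's documented output.

```python
from typing import TypedDict


class Edge(TypedDict):
    source: str
    target: str
    tag: str           # "EXTRACTED", "INFERRED", or "AMBIGUOUS"
    confidence: float  # ASSUMPTION: only meaningful for INFERRED edges


def trusted_edges(edges: list[Edge], min_confidence: float = 0.8) -> list[Edge]:
    """Keep AST-derived edges plus INFERRED edges at or above a threshold."""
    kept: list[Edge] = []
    for edge in edges:
        if edge["tag"] == "EXTRACTED":
            kept.append(edge)
        elif edge["tag"] == "INFERRED" and edge["confidence"] >= min_confidence:
            kept.append(edge)
        # AMBIGUOUS edges are dropped here; they need human verification.
    return kept


edges: list[Edge] = [
    {"source": "api", "target": "auth", "tag": "EXTRACTED", "confidence": 1.0},
    {"source": "api", "target": "billing", "tag": "INFERRED", "confidence": 0.9},
    {"source": "api", "target": "legacy", "tag": "INFERRED", "confidence": 0.4},
    {"source": "cli", "target": "auth", "tag": "AMBIGUOUS", "confidence": 0.0},
]
kept_targets = [e["target"] for e in trusted_edges(edges)]
print(kept_targets)
```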
Workflow Integration
Before Planning

    # Build graph first, then plan with context
    graphify .
    # Claude reads GRAPH_REPORT.md → understands architecture → better plans

With Scout

    # Graph for high-level structure, scout for specific files
    graphify .                  # build graph
    /t1k:scout "auth module"    # find specific files

Incremental Updates
Graph rebuilds are incremental: only changed files get reprocessed. The cache at `graphify-out/cache/` tracks file hashes.
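
Hash-based change detection of this kind can be sketched in a few lines. The example below is a generic illustration using SHA256 digests stored in a JSON manifest. The manifest layout and the `changed_files` helper are assumptions made for the sketch and say nothing about how graphifyy structures its cache directory.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(files: list[Path], manifest: Path) -> list[Path]:
    """Return files whose digest differs from the manifest, then refresh it."""
    old = json.loads(manifest.read_text()) if manifest.exists() else {}
    new = {str(f): file_digest(f) for f in files}
    manifest.write_text(json.dumps(new, indent=2))
    return [f for f in files if old.get(str(f)) != new[str(f)]]


# Demo in a throwaway directory: two builds, then one edit.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    a, b = root / "a.py", root / "b.py"
    a.write_text("x = 1\n")
    b.write_text("y = 2\n")
    manifest = root / "hashes.json"

    first = [f.name for f in changed_files([a, b], manifest)]   # everything is new
    second = [f.name for f in changed_files([a, b], manifest)]  # nothing changed
    b.write_text("y = 3\n")
    third = [f.name for f in changed_files([a, b], manifest)]   # only b.py changed

print(first, second, third)
```

Comparing content hashes rather than modification times means a file that is touched but not edited is not reprocessed.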
Privacy
- Code: Processed locally via tree-sitter AST. No file contents leave your machine.
- Audio/Video: Transcribed locally via Whisper.
- Docs/Images: Sent to your configured model provider (Claude/OpenAI) for semantic extraction.
Limitations
- First build on large codebases can be slow (AST parsing + LLM calls)
- Semantic extraction quality depends on the underlying model
- Neo4j integration requires separate setup (`pip install 'graphifyy[neo4j]'`)
- Leiden community detection requires `pip install 'graphifyy[leiden]'`
- Beta status: This skill depends on a third-party package (`graphifyy`). API surface may change.
Tool Size Caps (E6)
When invoking the graphify MCP server’s `query_graph` and `get_neighbors` tools, this skill MUST pass an explicit `maxResultSizeChars` cap to prevent context blow-up on large graphs:
- `query_graph`: cap at `maxResultSizeChars: 200_000` (~50k tokens). Larger results break Claude’s working memory; instead, paginate via cursor.
- `get_neighbors`: cap at `maxResultSizeChars: 50_000` (~12k tokens). Neighbor expansion can fan out exponentially in dense graphs.
- Always include `limit` AND `maxResultSizeChars`: `limit` bounds nodes, but a single node with megabyte-sized properties still blows the budget.
If a query exceeds the cap, the MCP server returns truncated: true; the skill MUST surface this to the user with a hint to refine the query, NOT silently deliver a partial result as if it were complete.
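
The rules above could be enforced with a small wrapper around tool-call arguments. The sketch below is a minimal illustration: the cap values, `limit`, `maxResultSizeChars`, and the `truncated` flag come from this section, while `build_args`, `present`, the `query` parameter, and the `nodes` result key are hypothetical names chosen for the example.

```python
import json

# Caps taken from the rules above.
SIZE_CAPS = {"query_graph": 200_000, "get_neighbors": 50_000}


def build_args(tool: str, limit: int, **params) -> dict:
    """Attach both limit and maxResultSizeChars to a tool call's arguments."""
    if tool not in SIZE_CAPS:
        raise ValueError(f"no size cap defined for tool {tool!r}")
    return {**params, "limit": limit, "maxResultSizeChars": SIZE_CAPS[tool]}


def present(result: dict) -> str:
    """Render a result, surfacing truncation instead of hiding it.

    ASSUMPTION: results carry their payload under a "nodes" key.
    """
    body = json.dumps(result.get("nodes", []))
    if result.get("truncated"):
        return body + "\n[truncated: refine the query or paginate via cursor]"
    return body


args = build_args("get_neighbors", limit=25, query="auth")
partial = present({"nodes": ["auth", "session"], "truncated": True})
complete = present({"nodes": ["auth"]})
print(args)
print(partial)
```

Raising on an unknown tool name keeps a new tool from being called without a cap by accident.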