# 0. Tree-Sitter Introspector - System Design

> **Priority**: 🔴 FOUNDATIONAL (build this first)
> **Complexity**: Medium
> **Effort Estimate**: 6-10 hours

---

## Overview

The Tree-Sitter Introspector parses Python and TypeScript codebases locally with tree-sitter and extracts structured metadata (functions, classes, imports) **without** sending raw code to Claude. Local parsing saves tokens, is faster, and produces reliable structured data.

---

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                       Introspector Module                       │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │ File Scanner │───▶│ Tree-Sitter  │───▶│  Summarizer  │       │
│  │  (glob/git)  │    │   Parsers    │    │              │       │
│  └──────────────┘    └──────────────┘    └──────────────┘       │
│         │                   │                   │               │
│         ▼                   ▼                   ▼               │
│    File list +        Per-file AST         RepoSummary          │
│    metadata           extracts             JSON                 │
└─────────────────────────────────────────────────────────────────┘
```

---

## Core Types

```typescript
interface RepoSummary {
  languages: ('python' | 'typescript')[];
  root: string;
  analyzedAt: string;
  files: FileInfo[];
  modules: ModuleInfo[];
  config: ConfigInfo;
  git?: GitInfo;
}

interface ModuleInfo {
  path: string;
  exports: ExportInfo[];
  imports: string[];
  complexity: 'low' | 'medium' | 'high';
}

interface ExportInfo {
  name: string;
  kind: 'function' | 'class' | 'constant' | 'type';
  signature?: string;
  docstring?: string;
  lineNumber: number;
  isAsync?: boolean;
}
```

---

## Key Implementation Details

### Tree-Sitter Queries (Python)

```typescript
const FUNCTION_QUERY = `
  (function_definition
    name: (identifier) @name
    parameters: (parameters) @params
    return_type: (type)? @return_type
  ) @func
`;

const CLASS_QUERY = `
  (class_definition
    name: (identifier) @name
    body: (block) @body
  ) @class
`;
```

### Git-Aware Incremental

```typescript
// Assumes `exec` is a promisified child_process.exec and `since` is a trusted git ref.
async function getChangedFiles(since: string): Promise<string[]> {
  const { stdout } = await exec(`git diff --name-only ${since}`);
  return stdout.split('\n').filter(f => /\.(py|ts|tsx)$/.test(f));
}
```

---

## File Structure

```
src/introspector/
├── index.ts              # Main entry point
├── types.ts              # TypeScript interfaces
├── scanner.ts            # File discovery
├── parsers/
│   ├── python.ts         # Python tree-sitter queries
│   └── typescript.ts     # TS tree-sitter queries
├── git.ts                # Git integration
└── summarizer.ts         # Combine into RepoSummary
```

---

## Dependencies

```json
{
  "tree-sitter": "^0.21.0",
  "tree-sitter-python": "^0.21.0",
  "tree-sitter-typescript": "^0.21.0",
  "glob": "^10.3.0"
}
```

---

## Success Criteria

- [ ] Parses Python files (functions, classes, imports)
- [ ] Parses TypeScript files (functions, classes, imports)
- [ ] Handles 1000+ file repos in <10 seconds
- [ ] Incremental mode only parses changed files
- [ ] Gracefully handles syntax errors
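
To make the "incremental mode only parses changed files" criterion more concrete, here is a minimal sketch of how raw `git diff --name-only` output could be routed to the two parsers and used to fill the `languages` field of `RepoSummary`. The helper names `languageOf`, `groupChangedFiles`, and `detectLanguages` are hypothetical and not part of the design above; the extension rules mirror the `/\.(py|ts|tsx)$/` filter in `getChangedFiles`.

```typescript
// Hypothetical helpers for illustration only; names are not part of the design.
type Language = 'python' | 'typescript';

// Map a file path to a supported language, or null if the scanner should skip it.
function languageOf(path: string): Language | null {
  if (/\.py$/.test(path)) return 'python';
  if (/\.(ts|tsx)$/.test(path)) return 'typescript';
  return null;
}

// Turn raw `git diff --name-only` output into a per-language work list.
function groupChangedFiles(stdout: string): Record<Language, string[]> {
  const groups: Record<Language, string[]> = { python: [], typescript: [] };
  for (const line of stdout.split('\n')) {
    const path = line.trim();
    if (path === '') continue; // skip the trailing blank line git emits
    const lang = languageOf(path);
    if (lang !== null) groups[lang].push(path);
  }
  return groups;
}

// Derive the `languages` field of RepoSummary from the grouped files.
function detectLanguages(groups: Record<Language, string[]>): Language[] {
  return (Object.keys(groups) as Language[]).filter(l => groups[l].length > 0);
}
```

Grouping by language before parsing means each tree-sitter parser can be handed its own batch of files, and a change set with no matching files naturally results in no parsing work.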