chore: Mark AI package implementation task as complete

- Moved completed AI package implementation task to done folder - Task successfully implemented the unified AI API (@mariozechner/pi-ai) - Package renamed, documentation improved, and published as v0.5.12
2026-04-17 09:02:08 +00:00 · 2025-08-30 21:53:23 +02:00 · 2025-08-30 21:53:23 +02:00 · 8b12312d72
commit 8b12312d72
parent 889208dccd
11 changed files with 25 additions and 86 deletions
--- a/todos/done/20250817-202050-ai-implementation-plan-analysis.md
+++ b/todos/done/20250817-202050-ai-implementation-plan-analysis.md
@ -0,0 +1,402 @@
+# AI Package Implementation Analysis
+
+## Overview
+Based on the comprehensive plan in `packages/ai/plan.md` and detailed API documentation for OpenAI, Anthropic, and Gemini SDKs, the AI package needs to provide a unified API that abstracts over these three providers while maintaining their unique capabilities.
+
+## OpenAI Responses API Investigation
+
+### API Structure
+The OpenAI SDK includes a separate Responses API (`client.responses`) alongside the Chat Completions API. This API is designed for models with reasoning capabilities (o1/o3) and provides access to thinking/reasoning content.
+
+### Key Differences from Chat Completions API
+
+1. **Input Format**: Uses `input` array instead of `messages`
+   - Supports `EasyInputMessage` type with roles: `user`, `assistant`, `system`, `developer`
+   - Content can be text, image, audio, or file references
+   - More structured approach with explicit types for each input type
+
+2. **Streaming Events**: Rich set of events for detailed streaming
+   - `ResponseReasoningTextDeltaEvent` - Incremental reasoning/thinking text
+   - `ResponseReasoningTextDoneEvent` - Complete reasoning text
+   - `ResponseTextDeltaEvent` - Main response text deltas
+   - `ResponseFunctionCallArgumentsDeltaEvent` - Tool call argument streaming
+   - `ResponseCompletedEvent` - Final completion with usage stats
+
+3. **Response Structure**: More complex response object
+   - `output` array containing various output items
+   - Explicit reasoning items with content
+   - Tool calls as part of output items
+   - Usage tracking with detailed token breakdowns
+
+### Implementation Examples
+
+#### Basic Responses API Usage
+
+```typescript
+// Creating a response with streaming
+const stream = await client.responses.create({
+  model: "o1-preview",
+  input: [
+    {
+      role: "developer", // or "system" for non-reasoning models
+      content: "You are a helpful assistant"
+    },
+    {
+      role: "user",
+      content: "Explain quantum computing step by step"
+    }
+  ],
+  stream: true,
+  temperature: 0.7,
+  max_completion_tokens: 2000
+});
+
+// Process streaming events
+for await (const event of stream) {
+  switch (event.type) {
+    case 'response.reasoning_text.delta':
+      // Thinking/reasoning content
+      console.log('[THINKING]', event.delta);
+      break;
+    
+    case 'response.text.delta':
+      // Main response text
+      console.log('[RESPONSE]', event.delta);
+      break;
+    
+    case 'response.function_call_arguments.delta':
+      // Tool call arguments being built
+      console.log('[TOOL ARGS]', event.delta);
+      break;
+    
+    case 'response.completed':
+      // Final response with usage
+      console.log('Usage:', event.usage);
+      break;
+  }
+}
+```
+
+#### Using ResponseStream Helper
+
+```typescript
+// The SDK provides a ResponseStream helper for easier streaming
+const responseStream = client.responses.stream({
+  model: "o1-preview",
+  input: [
+    { role: "user", content: "Solve this math problem..." }
+  ],
+  tools: [
+    {
+      type: "function",
+      function: {
+        name: "calculate",
+        description: "Perform calculations",
+        parameters: { /* JSON Schema */ }
+      }
+    }
+  ]
+});
+
+// Get final response after streaming
+const finalResponse = await responseStream.finalResponse();
+console.log('Output:', finalResponse.output);
+console.log('Usage:', finalResponse.usage);
+```
+
+#### Converting Messages for Responses API
+
+```typescript
+private convertToResponsesInput(messages: Message[], systemPrompt?: string): ResponseInputItem[] {
+  const input: ResponseInputItem[] = [];
+  
+  // Add system/developer prompt
+  if (systemPrompt) {
+    input.push({
+      type: "message",
+      role: this.isReasoningModel() ? "developer" : "system",
+      content: systemPrompt
+    });
+  }
+  
+  // Convert messages
+  for (const msg of messages) {
+    if (msg.role === "user") {
+      input.push({
+        type: "message",
+        role: "user",
+        content: msg.content
+      });
+    } else if (msg.role === "assistant") {
+      // Assistant messages with potential tool calls
+      const outputMessage: ResponseOutputMessage = {
+        type: "message",
+        role: "assistant",
+        content: []
+      };
+      
+      if (msg.content) {
+        outputMessage.content.push({
+          type: "text",
+          text: msg.content
+        });
+      }
+      
+      if (msg.toolCalls) {
+        // Tool calls need to be added as separate output items
+        for (const toolCall of msg.toolCalls) {
+          input.push({
+            type: "function_call",
+            id: toolCall.id,
+            name: toolCall.name,
+            arguments: JSON.stringify(toolCall.arguments)
+          });
+        }
+      }
+      
+      input.push(outputMessage);
+    } else if (msg.role === "toolResult") {
+      // Tool results as function call outputs
+      input.push({
+        type: "function_call_output",
+        call_id: msg.toolCallId,
+        output: msg.content
+      });
+    }
+  }
+  
+  return input;
+}
+```
+
+#### Processing Responses API Events
+
+```typescript
+private async completeWithResponsesAPI(request: Request, options?: OpenAIOptions): Promise<AssistantMessage> {
+  try {
+    const input = this.convertToResponsesInput(request.messages, request.systemPrompt);
+    
+    const stream = await this.client.responses.create({
+      model: this.model,
+      input,
+      stream: true,
+      max_completion_tokens: request.maxTokens,
+      temperature: request.temperature,
+      tools: request.tools ? this.convertTools(request.tools) : undefined,
+      tool_choice: options?.toolChoice
+    });
+    
+    let content = "";
+    let thinking = "";
+    const toolCalls: ToolCall[] = [];
+    let usage: TokenUsage = { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 };
+    let finishReason: string = "stop";
+    
+    for await (const event of stream) {
+      switch (event.type) {
+        case 'response.reasoning_text.delta':
+          thinking += event.delta;
+          request.onThinking?.(event.delta);
+          break;
+        
+        case 'response.reasoning_text.done':
+          // Complete reasoning text available
+          thinking = event.text;
+          break;
+        
+        case 'response.text.delta':
+          content += event.delta;
+          request.onText?.(event.delta);
+          break;
+        
+        case 'response.function_call_arguments.delta':
+          // Build up tool calls incrementally
+          // event.item_id identifies which tool call
+          // event.arguments contains the delta
+          break;
+        
+        case 'response.function_call_arguments.done':
+          // Complete tool call
+          toolCalls.push({
+            id: event.item_id,
+            name: event.name,
+            arguments: JSON.parse(event.arguments)
+          });
+          break;
+        
+        case 'response.completed':
+          // Final event with complete response and usage
+          usage = {
+            input: event.usage.input_tokens,
+            output: event.usage.output_tokens,
+            cacheRead: event.usage.input_tokens_details?.cached_tokens || 0,
+            cacheWrite: 0
+          };
+          finishReason = event.stop_reason || "stop";
+          break;
+        
+        case 'response.error':
+          throw new Error(event.error.message);
+      }
+    }
+    
+    return {
+      role: "assistant",
+      content: content || undefined,
+      thinking: thinking || undefined,
+      toolCalls: toolCalls.length > 0 ? toolCalls : undefined,
+      model: this.model,
+      usage,
+      stopReason: this.mapStopReason(finishReason)
+    };
+  } catch (error) {
+    // Error handling...
+  }
+}
+```
+
+### Important Notes
+
+1. **"[Thinking: X tokens]" Issue**: The current implementation shows a placeholder for thinking tokens in Chat Completions API. This should only show actual thinking content from Responses API or omit the field entirely.
+
+2. **Tool Calling Differences**: Responses API handles tool calls differently, with separate events for arguments delta and completion.
+
+3. **Usage Tracking**: Responses API provides more detailed usage information including reasoning tokens in a different structure.
+
+4. **Stream vs Iterator**: The Responses API returns an async iterable that can be used with `for await...of` directly.
+
+## Existing Codebase Context
+
+### Current Structure
+- Monorepo using npm workspaces with packages in `packages/` directory
+- Existing packages: `tui`, `agent`, `pods`
+- TypeScript/ESM modules with Node.js ≥20.0.0
+- Biome for linting and formatting
+- Lockstep versioning at 0.5.8
+
+### Package Location
+The AI package should be created at `packages/ai/` following the existing pattern.
+
+## Key Implementation Requirements
+
+### Core Features
+1. **Unified Client API** - Single interface for all providers
+2. **Streaming First** - All providers support streaming, non-streaming is collected events
+3. **Provider Adapters** - OpenAI, Anthropic, Gemini adapters
+4. **Event Normalization** - Consistent event types across providers
+5. **Tool/Function Calling** - Unified interface for tools across providers
+6. **Thinking/Reasoning** - Support for reasoning models (o1/o3, Claude thinking, Gemini thinking)
+7. **Token Tracking** - Usage and cost calculation
+8. **Abort Support** - Request cancellation via AbortController
+9. **Error Mapping** - Normalized error handling
+10. **Caching** - Automatic caching strategies per provider
+
+### Provider-Specific Handling
+
+#### OpenAI
+- Dual APIs: Chat Completions vs Responses API
+- Responses API for o1/o3 reasoning content
+- Developer role for o1/o3 system prompts
+- Stream options for token usage
+
+#### Anthropic  
+- Content blocks always arrays
+- Separate system parameter
+- Tool results as user messages
+- Explicit thinking budget allocation
+- Cache control per block
+
+#### Gemini
+- Parts-based content system
+- Separate systemInstruction parameter
+- Model role instead of assistant
+- Thinking via part.thought flag
+- Function calls in parts array
+
+## Implementation Structure
+
+```
+packages/ai/
+├── src/
+│   ├── index.ts           # Main exports
+│   ├── types.ts           # Unified type definitions
+│   ├── client.ts          # Main AI client class
+│   ├── adapters/
+│   │   ├── base.ts        # Base adapter interface
+│   │   ├── openai.ts      # OpenAI adapter
+│   │   ├── anthropic.ts   # Anthropic adapter
+│   │   └── gemini.ts      # Gemini adapter
+│   ├── models/
+│   │   ├── models.ts      # Model info lookup
+│   │   └── models-data.ts # Generated models database
+│   ├── errors.ts          # Error mapping
+│   ├── events.ts          # Event stream handling
+│   ├── costs.ts           # Cost tracking
+│   └── utils.ts           # Utility functions
+├── test/
+│   ├── openai.test.ts
+│   ├── anthropic.test.ts
+│   └── gemini.test.ts
+├── scripts/
+│   └── update-models.ts   # Update models database
+├── package.json
+├── tsconfig.build.json
+└── README.md
+```
+
+## Dependencies
+- `openai`: ^5.12.2 (for OpenAI SDK)
+- `@anthropic-ai/sdk`: Latest
+- `@google/genai`: Latest
+
+## Files to Create/Modify
+
+### New Files in packages/ai/
+1. `package.json` - Package configuration
+2. `tsconfig.build.json` - TypeScript build config
+3. `src/index.ts` - Main exports
+4. `src/types.ts` - Type definitions
+5. `src/client.ts` - Main AI class
+6. `src/adapters/base.ts` - Base adapter
+7. `src/adapters/openai.ts` - OpenAI implementation
+8. `src/adapters/anthropic.ts` - Anthropic implementation
+9. `src/adapters/gemini.ts` - Gemini implementation
+10. `src/models/models.ts` - Model info
+11. `src/errors.ts` - Error handling
+12. `src/events.ts` - Event streaming
+13. `src/costs.ts` - Cost tracking
+14. `README.md` - Package documentation
+
+### Files to Modify
+1. Root `tsconfig.json` - Add path mapping for @mariozechner/pi-ai
+2. Root `package.json` - Add to build script order
+
+## Implementation Strategy
+
+### Phase 1: Core Structure
+- Create package structure and configuration
+- Define unified types and interfaces
+- Implement base adapter interface
+
+### Phase 2: Provider Adapters
+- Implement OpenAI adapter (both APIs)
+- Implement Anthropic adapter
+- Implement Gemini adapter
+
+### Phase 3: Features
+- Add streaming support
+- Implement tool calling
+- Add thinking/reasoning support
+- Implement token tracking
+
+### Phase 4: Polish
+- Error mapping and handling
+- Cost calculation
+- Model information database
+- Documentation and examples
+
+## Testing Approach
+- Unit tests for each adapter
+- Integration tests with mock responses
+- Example scripts for manual testing
+- Verify streaming, tools, thinking for each provider
--- a/todos/done/20250817-202050-ai-implementation-plan.md
+++ b/todos/done/20250817-202050-ai-implementation-plan.md
@ -0,0 +1,43 @@
+# AI Package Implementation Plan
+**Status:** Done
+**Agent PID:** 54145
+
+## Original Todo
+ai: create an implementation plan based on packages/ai/plan.md and implement it
+
+## Description
+Implement the unified AI API as designed in packages/ai/plan.md. Create a single interface that works with OpenAI, Anthropic, and Gemini SDKs, handling their differences internally while exposing unified streaming events, tool calling, thinking/reasoning, and caching capabilities.
+
+*Read [analysis.md](./analysis.md) in full for detailed codebase research and context*
+*Read [plan.md](packages/ai/docs/plan.md) in full for the complete API design and implementation details*
+*Read API documentation: [anthropic-api.md](packages/ai/docs/anthropic-api.md), [openai-api.md](packages/ai/docs/openai-api.md), [gemini-api.md](packages/ai/docs/gemini-api.md)*
+
+## Implementation Plan
+- [x] Define unified types in src/types.ts based on plan.md interfaces (AIConfig, Message, Request, Event, TokenUsage, ModelInfo)
+- [x] Implement OpenAI provider in src/providers/openai.ts with both Chat Completions and Responses API support
+- [x] Implement Anthropic provider in src/providers/anthropic.ts with MessageStream and content blocks handling
+- [ ] Implement Gemini provider in src/providers/gemini.ts with parts system and thinking extraction
+- [ ] Create main AI class in src/index.ts that selects and uses appropriate adapter
+- [ ] Implement models database in src/models.ts with model information and cost data
+- [ ] Add cost calculation integrated into each adapter's token tracking
+- [ ] Create comprehensive test suite in test/ai.test.ts using Node.js test framework
+- [ ] Test: Model database lookup and capabilities detection
+- [ ] Test: Basic completion (non-streaming) for all providers (OpenAI, Anthropic, Gemini, OpenRouter, Groq)
+- [ ] Test: Streaming responses with event normalization across all providers
+- [ ] Test: Thinking/reasoning extraction (o1 via Responses API, Claude thinking, Gemini thinking)
+- [ ] Test: Tool calling flow with execution and continuation across providers
+- [ ] Test: Automatic caching (Anthropic explicit, OpenAI/Gemini automatic)
+- [ ] Test: Message serialization/deserialization with full conversation history
+- [ ] Test: Cross-provider conversation continuation (start with one provider, continue with another)
+- [ ] Test: Abort/cancellation via AbortController
+- [ ] Test: Error handling and retry logic for each provider
+- [ ] Test: Cost tracking accuracy with known token counts
+- [ ] Update root tsconfig.json paths to include @mariozechner/pi-ai
+- [ ] Update root package.json build script to include AI package
+
+## Notes
+- Package structure already exists at packages/ai with dependencies installed
+- Each adapter handles its own event normalization internally
+- Tests use Node.js built-in test framework as per project conventions
+- Available API keys: OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, GROQ_API_KEY, OPENROUTER_API_KEY
+- **IMPORTANT**: Always run `npm run check` in the root directory before asking for approval to ensure code compiles and passes linting