# AI Package Implementation Analysis

## Overview
Based on the comprehensive plan in `packages/ai/plan.md` and the detailed API documentation for the OpenAI, Anthropic, and Gemini SDKs, the AI package needs to provide a unified API that abstracts over these three providers while maintaining their unique capabilities.
## OpenAI Responses API Investigation

### API Structure
The OpenAI SDK includes a separate Responses API (`client.responses`) alongside the Chat Completions API. This API is designed for models with reasoning capabilities (o1/o3) and provides access to thinking/reasoning content.
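For orientation, a minimal non-streaming call looks like the sketch below (the model name is illustrative; `output_text` is the SDK's convenience accessor that aggregates the text parts of the `output` array):

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Minimal non-streaming Responses API call; `input` also accepts a plain string
const response = await client.responses.create({
	model: "o1-preview", // illustrative model name
	input: "Summarize the Responses API in one sentence."
});

console.log(response.output_text);
```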
### Key Differences from Chat Completions API
- **Input Format**: Uses an `input` array instead of `messages`
  - Supports the `EasyInputMessage` type with roles `user`, `assistant`, `system`, and `developer`
  - Content can be text, image, audio, or file references
  - More structured approach, with explicit types for each kind of input
- **Streaming Events**: Rich set of events for detailed streaming
  - `ResponseReasoningTextDeltaEvent` - incremental reasoning/thinking text
  - `ResponseReasoningTextDoneEvent` - complete reasoning text
  - `ResponseTextDeltaEvent` - main response text deltas
  - `ResponseFunctionCallArgumentsDeltaEvent` - tool call argument streaming
  - `ResponseCompletedEvent` - final completion with usage stats
- **Response Structure**: More complex response object
  - `output` array containing various output items
  - Explicit reasoning items with content
  - Tool calls as part of output items
  - Usage tracking with detailed token breakdowns
### Implementation Examples

#### Basic Responses API Usage
```typescript
// Creating a response with streaming
const stream = await client.responses.create({
	model: "o1-preview",
	input: [
		{
			role: "developer", // or "system" for non-reasoning models
			content: "You are a helpful assistant"
		},
		{
			role: "user",
			content: "Explain quantum computing step by step"
		}
	],
	stream: true,
	temperature: 0.7, // note: some reasoning models reject temperature
	max_output_tokens: 2000 // Responses API name (not max_completion_tokens)
});

// Process streaming events
for await (const event of stream) {
	switch (event.type) {
		case "response.reasoning_text.delta":
			// Thinking/reasoning content
			console.log("[THINKING]", event.delta);
			break;
		case "response.output_text.delta":
			// Main response text
			console.log("[RESPONSE]", event.delta);
			break;
		case "response.function_call_arguments.delta":
			// Tool call arguments being built
			console.log("[TOOL ARGS]", event.delta);
			break;
		case "response.completed":
			// Final event wraps the full Response object, including usage
			console.log("Usage:", event.response.usage);
			break;
	}
}
```
#### Using the ResponseStream Helper
```typescript
// The SDK provides a ResponseStream helper for easier streaming
const responseStream = client.responses.stream({
	model: "o1-preview",
	input: [{ role: "user", content: "Solve this math problem..." }],
	tools: [
		{
			// Responses API function tools are flat (no nested `function` object)
			type: "function",
			name: "calculate",
			description: "Perform calculations",
			parameters: { /* JSON Schema */ }
		}
	]
});

// Get the final response after streaming completes
const finalResponse = await responseStream.finalResponse();
console.log("Output:", finalResponse.output);
console.log("Usage:", finalResponse.usage);
```
#### Converting Messages for the Responses API
```typescript
private convertToResponsesInput(messages: Message[], systemPrompt?: string): ResponseInputItem[] {
	const input: ResponseInputItem[] = [];

	// Add system/developer prompt
	if (systemPrompt) {
		input.push({
			type: "message",
			role: this.isReasoningModel() ? "developer" : "system",
			content: systemPrompt
		});
	}

	// Convert messages
	for (const msg of messages) {
		if (msg.role === "user") {
			input.push({
				type: "message",
				role: "user",
				content: msg.content
			});
		} else if (msg.role === "assistant") {
			// Assistant text becomes an output message...
			const outputMessage: ResponseOutputMessage = {
				type: "message",
				role: "assistant",
				content: []
			};
			if (msg.content) {
				outputMessage.content.push({
					type: "output_text", // assistant content uses output_text parts
					text: msg.content
				});
			}
			if (outputMessage.content.length > 0) {
				input.push(outputMessage);
			}
			// ...followed by its tool calls as separate function_call items
			if (msg.toolCalls) {
				for (const toolCall of msg.toolCalls) {
					input.push({
						type: "function_call",
						call_id: toolCall.id, // function_call_output references this id
						name: toolCall.name,
						arguments: JSON.stringify(toolCall.arguments)
					});
				}
			}
		} else if (msg.role === "toolResult") {
			// Tool results as function call outputs
			input.push({
				type: "function_call_output",
				call_id: msg.toolCallId,
				output: msg.content
			});
		}
	}
	return input;
}
```
#### Processing Responses API Events
```typescript
private async completeWithResponsesAPI(request: Request, options?: OpenAIOptions): Promise<AssistantMessage> {
	try {
		const input = this.convertToResponsesInput(request.messages, request.systemPrompt);
		const stream = await this.client.responses.create({
			model: this.model,
			input,
			stream: true,
			max_output_tokens: request.maxTokens,
			temperature: request.temperature,
			tools: request.tools ? this.convertTools(request.tools) : undefined,
			tool_choice: options?.toolChoice
		});

		let content = "";
		let thinking = "";
		const toolCalls: ToolCall[] = [];
		let usage: TokenUsage = { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 };
		let finishReason = "stop";

		for await (const event of stream) {
			switch (event.type) {
				case "response.reasoning_text.delta":
					thinking += event.delta;
					request.onThinking?.(event.delta);
					break;
				case "response.reasoning_text.done":
					// Complete reasoning text available
					thinking = event.text;
					break;
				case "response.output_text.delta":
					content += event.delta;
					request.onText?.(event.delta);
					break;
				case "response.function_call_arguments.delta":
					// Tool call arguments are built incrementally;
					// event.item_id identifies the call, event.delta carries the fragment
					break;
				case "response.output_item.done":
					// The completed function_call item carries both name and arguments;
					// the *.arguments.done event alone does not include the tool name
					if (event.item.type === "function_call") {
						toolCalls.push({
							id: event.item.call_id,
							name: event.item.name,
							arguments: JSON.parse(event.item.arguments)
						});
					}
					break;
				case "response.completed":
					// Final event wraps the complete Response object, including usage
					usage = {
						input: event.response.usage?.input_tokens ?? 0,
						output: event.response.usage?.output_tokens ?? 0,
						cacheRead: event.response.usage?.input_tokens_details?.cached_tokens ?? 0,
						cacheWrite: 0
					};
					finishReason = event.response.status ?? "completed";
					break;
				case "error":
					throw new Error(event.message);
			}
		}

		return {
			role: "assistant",
			content: content || undefined,
			thinking: thinking || undefined,
			toolCalls: toolCalls.length > 0 ? toolCalls : undefined,
			model: this.model,
			usage,
			stopReason: this.mapStopReason(finishReason)
		};
	} catch (error) {
		// Error handling...
	}
}
```
### Important Notes
- **"[Thinking: X tokens]" Issue**: The current implementation shows a placeholder for thinking tokens when using the Chat Completions API. It should either show actual thinking content from the Responses API or omit the field entirely.
- **Tool Calling Differences**: The Responses API handles tool calls differently, with separate events for argument deltas and completion.
- **Usage Tracking**: The Responses API provides more detailed usage information, including reasoning tokens, in a different structure.
- **Stream vs. Iterator**: The Responses API returns an async iterable that can be consumed directly with `for await...of`.
## Existing Codebase Context

### Current Structure
- Monorepo using npm workspaces with packages in the `packages/` directory
- Existing packages: `tui`, `agent`, `pods`
- TypeScript/ESM modules with Node.js ≥ 20.0.0
- Biome for linting and formatting
- Lockstep versioning at 0.5.8
### Package Location
The AI package should be created at `packages/ai/`, following the existing pattern.
## Key Implementation Requirements

### Core Features
- **Unified Client API** - single interface for all providers (see the type sketch after this list)
- **Streaming First** - all providers support streaming; non-streaming results are assembled by collecting streamed events
- **Provider Adapters** - OpenAI, Anthropic, and Gemini adapters
- **Event Normalization** - consistent event types across providers
- **Tool/Function Calling** - unified interface for tools across providers
- **Thinking/Reasoning** - support for reasoning models (o1/o3, Claude thinking, Gemini thinking)
- **Token Tracking** - usage and cost calculation
- **Abort Support** - request cancellation via AbortController
- **Error Mapping** - normalized error handling
- **Caching** - automatic caching strategies per provider
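Nothing here is implemented yet; as a working assumption, the unified types behind this list could look roughly like the following sketch (every name is a proposal derived from the converter code above, not a final API):

```typescript
// Proposed unified types (working assumptions, not final API)
export interface ToolCall {
	id: string;
	name: string;
	arguments: Record<string, unknown>;
}

export interface TokenUsage {
	input: number;
	output: number;
	cacheRead: number;
	cacheWrite: number;
}

export type Message =
	| { role: "user"; content: string }
	| { role: "assistant"; content?: string; toolCalls?: ToolCall[] }
	| { role: "toolResult"; toolCallId: string; content: string };

export interface ToolDefinition {
	name: string;
	description: string;
	parameters: Record<string, unknown>; // JSON Schema
}

export interface Request {
	messages: Message[];
	systemPrompt?: string;
	maxTokens?: number;
	temperature?: number;
	tools?: ToolDefinition[];
	signal?: AbortSignal; // abort support via AbortController
	onText?: (delta: string) => void; // streaming callback for response text
	onThinking?: (delta: string) => void; // streaming callback for reasoning text
}

export interface AssistantMessage {
	role: "assistant";
	content?: string;
	thinking?: string;
	toolCalls?: ToolCall[];
	model: string;
	usage: TokenUsage;
	stopReason: "stop" | "length" | "toolUse" | "aborted" | "error";
}
```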
### Provider-Specific Handling

#### OpenAI
- Dual APIs: Chat Completions vs Responses API
- Responses API for o1/o3 reasoning content
- Developer role for o1/o3 system prompts
- Stream options for token usage
#### Anthropic
- Content blocks always arrays
- Separate system parameter
- Tool results as user messages
- Explicit thinking budget allocation
- Cache control per block
#### Gemini
- Parts-based content system
- Separate systemInstruction parameter
- Model role instead of assistant
- Thinking via part.thought flag
- Function calls in parts array
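To make the divergence concrete, here is how the same system prompt and user turn map onto each provider's request shape (request bodies are sketched for comparison only; model names are illustrative and exact SDK call signatures differ):

```typescript
const systemPrompt = "You are terse.";
const userText = "List three primes.";

// OpenAI Chat Completions: the system prompt is just another message
const openaiBody = {
	model: "gpt-4o", // illustrative
	messages: [
		{ role: "system", content: systemPrompt },
		{ role: "user", content: userText }
	]
};

// Anthropic: separate top-level `system` parameter; content blocks are arrays
const anthropicBody = {
	model: "claude-3-5-sonnet-latest", // illustrative
	max_tokens: 1024,
	system: systemPrompt,
	messages: [{ role: "user", content: [{ type: "text", text: userText }] }]
};

// Gemini: separate `systemInstruction`; parts-based content; replies use the
// "model" role rather than "assistant"
const geminiBody = {
	systemInstruction: { parts: [{ text: systemPrompt }] },
	contents: [{ role: "user", parts: [{ text: userText }] }]
};
```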
## Implementation Structure
```
packages/ai/
├── src/
│   ├── index.ts            # Main exports
│   ├── types.ts            # Unified type definitions
│   ├── client.ts           # Main AI client class
│   ├── adapters/
│   │   ├── base.ts         # Base adapter interface
│   │   ├── openai.ts       # OpenAI adapter
│   │   ├── anthropic.ts    # Anthropic adapter
│   │   └── gemini.ts       # Gemini adapter
│   ├── models/
│   │   ├── models.ts       # Model info lookup
│   │   └── models-data.ts  # Generated models database
│   ├── errors.ts           # Error mapping
│   ├── events.ts           # Event stream handling
│   ├── costs.ts            # Cost tracking
│   └── utils.ts            # Utility functions
├── test/
│   ├── openai.test.ts
│   ├── anthropic.test.ts
│   └── gemini.test.ts
├── scripts/
│   └── update-models.ts    # Update models database
├── package.json
├── tsconfig.build.json
└── README.md
```
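As a working assumption, the base adapter in `src/adapters/base.ts` could define a contract along these lines (a sketch using the unified type names proposed above):

```typescript
import type { AssistantMessage, Request } from "../types.js";

// Sketch of the base adapter contract (a proposal, not implemented code)
export interface ProviderAdapter {
	readonly provider: "openai" | "anthropic" | "gemini";
	readonly model: string;

	// Streaming-first: drives the provider's stream internally (emitting the
	// onText/onThinking callbacks from the Request) and resolves with the
	// fully assembled assistant message.
	complete(request: Request): Promise<AssistantMessage>;
}
```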
## Dependencies
- `openai` ^5.12.2 (the OpenAI SDK)
- `@anthropic-ai/sdk` (latest)
- `@google/genai` (latest)
## Files to Create/Modify

### New Files in `packages/ai/`
- `package.json` - package configuration
- `tsconfig.build.json` - TypeScript build config
- `src/index.ts` - main exports
- `src/types.ts` - type definitions
- `src/client.ts` - main AI class
- `src/adapters/base.ts` - base adapter
- `src/adapters/openai.ts` - OpenAI implementation
- `src/adapters/anthropic.ts` - Anthropic implementation
- `src/adapters/gemini.ts` - Gemini implementation
- `src/models/models.ts` - model info
- `src/errors.ts` - error handling
- `src/events.ts` - event streaming
- `src/costs.ts` - cost tracking
- `README.md` - package documentation
### Files to Modify
- Root `tsconfig.json` - add a path mapping for `@mariozechner/pi-ai`
- Root `package.json` - add the package to the build script order
## Implementation Strategy

### Phase 1: Core Structure
- Create package structure and configuration
- Define unified types and interfaces
- Implement base adapter interface
### Phase 2: Provider Adapters
- Implement OpenAI adapter (both APIs)
- Implement Anthropic adapter
- Implement Gemini adapter
### Phase 3: Features
- Add streaming support
- Implement tool calling
- Add thinking/reasoning support
- Implement token tracking
### Phase 4: Polish
- Error mapping and handling
- Cost calculation
- Model information database
- Documentation and examples
## Testing Approach
- Unit tests for each adapter
- Integration tests with mock responses (one possible shape is sketched below)
- Example scripts for manual testing
- Verify streaming, tools, thinking for each provider
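For the mock-response tests, one possible shape using Node's built-in test runner (the repo already targets Node ≥ 20); `assembleAssistantMessage` and the normalized event shapes here are hypothetical names following the sketches above:

```typescript
import assert from "node:assert";
import { test } from "node:test";

// Hypothetical helper from src/events.ts that folds a normalized event
// stream into an AssistantMessage, shared by all adapters
import { assembleAssistantMessage } from "../src/events.js";

// Minimal async-iterable wrapper for canned events
async function* toAsyncIterable<T>(items: T[]): AsyncIterable<T> {
	for (const item of items) yield item;
}

test("assembles text, tool calls, and usage from a streamed event sequence", async () => {
	const events = [
		{ type: "text", delta: "Hello " },
		{ type: "text", delta: "world" },
		{ type: "toolCall", id: "call_1", name: "calculate", arguments: { a: 1 } },
		{ type: "done", usage: { input: 10, output: 5, cacheRead: 0, cacheWrite: 0 } }
	];

	const message = await assembleAssistantMessage(toAsyncIterable(events));

	assert.equal(message.content, "Hello world");
	assert.equal(message.toolCalls?.[0].name, "calculate");
	assert.equal(message.usage.output, 5);
});
```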