co-mono/packages/ai
Mario Zechner d073953ef7 feat(ai): Add zAI provider support
- Add 'zai' as a KnownProvider type
- Add ZAI_API_KEY environment variable mapping
- Generate 4 zAI models (glm-4.5-air, glm-4.5v, etc.) using anthropic-messages API
- Add comprehensive test coverage for zAI provider in generate.test.ts and empty.test.ts
- Models support reasoning/thinking capabilities and tool calling
2025-09-07 00:09:15 +02:00
..
scripts feat(ai): Add zAI provider support 2025-09-07 00:09:15 +02:00
src feat(ai): Add zAI provider support 2025-09-07 00:09:15 +02:00
test feat(ai): Add zAI provider support 2025-09-07 00:09:15 +02:00
package.json chore: bump version to 0.5.30 2025-09-04 12:42:18 +02:00
README.md refactor(ai): Simplify API with new streaming interface and model management 2025-09-03 01:25:19 +02:00
tsconfig.build.json feat(ai): Create unified AI package with OpenAI, Anthropic, and Gemini support 2025-08-17 20:18:45 +02:00
vitest.config.ts feat(ai): Migrate tests to Vitest and add provider test coverage 2025-08-29 21:32:45 +02:00

@mariozechner/pi-ai

Unified LLM API with automatic model discovery, provider configuration, token and cost tracking, and simple context persistence and hand-off to other models mid-session.

Note: This library only includes models that support tool calling (function calling), as this is essential for agentic workflows.

Supported Providers

  • OpenAI
  • Anthropic
  • Google
  • Groq
  • Cerebras
  • xAI
  • OpenRouter
  • Any OpenAI-compatible API: Ollama, vLLM, LM Studio, etc.

Installation

npm install @mariozechner/pi-ai

Quick Start

import { getModel, stream, complete, Context, Tool } from '@mariozechner/pi-ai';

// Fully typed with auto-complete support for both providers and models
const model = getModel('openai', 'gpt-4o-mini');

// Define tools
const tools: Tool[] = [{
  name: 'get_time',
  description: 'Get the current time',
  parameters: {
    type: 'object',
    properties: {},
    required: []
  }
}];

// Build a conversation context (easily serializable and transferable between models)
const context: Context = {
  systemPrompt: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'What time is it?' }],
  tools
};

// Option 1: Streaming with all event types
const s = stream(model, context);

for await (const event of s) {
  switch (event.type) {
    case 'start':
      console.log(`Starting with ${event.partial.model}`);
      break;
    case 'text_start':
      console.log('\n[Text started]');
      break;
    case 'text_delta':
      process.stdout.write(event.delta);
      break;
    case 'text_end':
      console.log('\n[Text ended]');
      break;
    case 'thinking_start':
      console.log('[Model is thinking...]');
      break;
    case 'thinking_delta':
      process.stdout.write(event.delta);
      break;
    case 'thinking_end':
      console.log('[Thinking complete]');
      break;
    case 'toolCall':
      console.log(`\nTool called: ${event.toolCall.name}`);
      break;
    case 'done':
      console.log(`\nFinished: ${event.reason}`);
      break;
    case 'error':
      console.error(`Error: ${event.error}`);
      break;
  }
}

// Get the final message after streaming, add it to the context
const finalMessage = await s.finalMessage();
context.messages.push(finalMessage);

// Handle tool calls if any
const toolCalls = finalMessage.content.filter(b => b.type === 'toolCall');
for (const call of toolCalls) {
  // Execute the tool
  const result = call.name === 'get_time'
    ? new Date().toISOString()
    : 'Unknown tool';

  // Add tool result to context
  context.messages.push({
    role: 'toolResult',
    toolCallId: call.id,
    toolName: call.name,
    content: result,
    isError: false
  });
}

// Continue if there were tool calls
if (toolCalls.length > 0) {
  const continuation = await complete(model, context);
  context.messages.push(continuation);
  console.log('After tool execution:', continuation.content);
}

console.log(`Total tokens: ${finalMessage.usage.input} in, ${finalMessage.usage.output} out`);
console.log(`Cost: $${finalMessage.usage.cost.total.toFixed(4)}`);

// Option 2: Get complete response without streaming
const response = await complete(model, context);

for (const block of response.content) {
  if (block.type === 'text') {
    console.log(block.text);
  } else if (block.type === 'toolCall') {
    console.log(`Tool: ${block.name}(${JSON.stringify(block.arguments)})`);
  }
}

Image Input

Models with vision capabilities can process images. You can check if a model supports images via the input property. If you pass images to a non-vision model, they are silently ignored.

import { readFileSync } from 'fs';
import { getModel, complete } from '@mariozechner/pi-ai';

const model = getModel('openai', 'gpt-4o-mini');

// Check if model supports images
if (model.input.includes('image')) {
  console.log('Model supports vision');
}

const imageBuffer = readFileSync('image.png');
const base64Image = imageBuffer.toString('base64');

const response = await complete(model, {
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'What is in this image?' },
      { type: 'image', data: base64Image, mimeType: 'image/png' }
    ]
  }]
});

// Access the response
for (const block of response.content) {
  if (block.type === 'text') {
    console.log(block.text);
  }
}

Thinking/Reasoning

Many models support thinking/reasoning capabilities where they can show their internal thought process. You can check if a model supports reasoning via the reasoning property. If you pass reasoning options to a non-reasoning model, they are silently ignored.

Unified Interface (streamSimple/completeSimple)

import { getModel, streamSimple, completeSimple } from '@mariozechner/pi-ai';

// Many models across providers support thinking/reasoning
const model = getModel('anthropic', 'claude-sonnet-4-20250514');
// or getModel('openai', 'gpt-5-mini');
// or getModel('google', 'gemini-2.5-flash');
// or getModel('xai', 'grok-code-fast-1');
// or getModel('groq', 'openai/gpt-oss-20b');
// or getModel('cerebras', 'gpt-oss-120b');
// or getModel('openrouter', 'z-ai/glm-4.5v');

// Check if model supports reasoning
if (model.reasoning) {
  console.log('Model supports reasoning/thinking');
}

// Use the simplified reasoning option
const response = await completeSimple(model, {
  messages: [{ role: 'user', content: 'Solve: 2x + 5 = 13' }]
}, {
  reasoning: 'medium'  // 'minimal' | 'low' | 'medium' | 'high'
});

// Access thinking and text blocks
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log('Thinking:', block.thinking);
  } else if (block.type === 'text') {
    console.log('Response:', block.text);
  }
}

Provider-Specific Options (stream/complete)

For fine-grained control, use the provider-specific options:

import { getModel, complete } from '@mariozechner/pi-ai';

// OpenAI Reasoning (o1, o3, gpt-5)
const openaiModel = getModel('openai', 'gpt-5-mini');
await complete(openaiModel, context, {
  reasoningEffort: 'medium',
  reasoningSummary: 'detailed'  // OpenAI Responses API only
});

// Anthropic Thinking (Claude Sonnet 4)
const anthropicModel = getModel('anthropic', 'claude-sonnet-4-20250514');
await complete(anthropicModel, context, {
  thinkingEnabled: true,
  thinkingBudgetTokens: 8192  // Optional token limit
});

// Google Gemini Thinking
const googleModel = getModel('google', 'gemini-2.5-flash');
await complete(googleModel, context, {
  thinking: {
    enabled: true,
    budgetTokens: 8192  // -1 for dynamic, 0 to disable
  }
});

Streaming Thinking Content

When streaming, thinking content is delivered through specific events:

const s = streamSimple(model, context, { reasoning: 'high' });

for await (const event of s) {
  switch (event.type) {
    case 'thinking_start':
      console.log('[Model started thinking]');
      break;
    case 'thinking_delta':
      process.stdout.write(event.delta);  // Stream thinking content
      break;
    case 'thinking_end':
      console.log('\n[Thinking complete]');
      break;
  }
}

Errors & Abort Signal

When a request ends with an error (including aborts), the API returns an AssistantMessage with:

  • stopReason: 'error' - Indicates the request ended with an error
  • error: string - Error message describing what happened
  • content: array - Partial content accumulated before the error
  • usage: Usage - Token counts and costs (may be incomplete depending on when error occurred)

Aborting

The abort signal allows you to cancel in-progress requests. Aborted requests return an AssistantMessage with stopReason === 'error'.

import { getModel, stream } from '@mariozechner/pi-ai';

const model = getModel('openai', 'gpt-4o-mini');
const controller = new AbortController();

// Abort after 2 seconds
setTimeout(() => controller.abort(), 2000);

const s = stream(model, {
  messages: [{ role: 'user', content: 'Write a long story' }]
}, {
  signal: controller.signal
});

for await (const event of s) {
  if (event.type === 'text_delta') {
    process.stdout.write(event.delta);
  } else if (event.type === 'error') {
    console.log('Error:', event.error);
  }
}

// Get results (may be partial if aborted)
const response = await s.finalMessage();
if (response.stopReason === 'error') {
  console.log('Error:', response.error);
  console.log('Partial content received:', response.content);
  console.log('Tokens used:', response.usage);
}

Continuing After Abort

Aborted messages can be added to the conversation context and continued in subsequent requests:

const context = {
  messages: [
    { role: 'user', content: 'Explain quantum computing in detail' }
  ]
};

// First request gets aborted after 2 seconds
const controller1 = new AbortController();
setTimeout(() => controller1.abort(), 2000);

const partial = await complete(model, context, { signal: controller1.signal });

// Add the partial response to context
context.messages.push(partial);
context.messages.push({ role: 'user', content: 'Please continue' });

// Continue the conversation
const continuation = await complete(model, context);

APIs, Models, and Providers

The library implements 4 API interfaces, each with its own streaming function and options:

  • anthropic-messages: Anthropic's Messages API (streamAnthropic, AnthropicOptions)
  • google-generative-ai: Google's Generative AI API (streamGoogle, GoogleOptions)
  • openai-completions: OpenAI's Chat Completions API (streamOpenAICompletions, OpenAICompletionsOptions)
  • openai-responses: OpenAI's Responses API (streamOpenAIResponses, OpenAIResponsesOptions)

Providers and Models

A provider offers models through a specific API. For example:

  • Anthropic models use the anthropic-messages API
  • Google models use the google-generative-ai API
  • OpenAI models use the openai-responses API
  • xAI, Cerebras, Groq, etc. models use the openai-completions API (OpenAI-compatible)

Querying Providers and Models

import { getProviders, getModels, getModel } from '@mariozechner/pi-ai';

// Get all available providers
const providers = getProviders();
console.log(providers); // ['openai', 'anthropic', 'google', 'xai', 'groq', ...]

// Get all models from a provider (fully typed)
const anthropicModels = getModels('anthropic');
for (const model of anthropicModels) {
  console.log(`${model.id}: ${model.name}`);
  console.log(`  API: ${model.api}`); // 'anthropic-messages'
  console.log(`  Context: ${model.contextWindow} tokens`);
  console.log(`  Vision: ${model.input.includes('image')}`);
  console.log(`  Reasoning: ${model.reasoning}`);
}

// Get a specific model (both provider and model ID are auto-completed in IDEs)
const model = getModel('openai', 'gpt-4o-mini');
console.log(`Using ${model.name} via ${model.api} API`);

Custom Models

You can create custom models for local inference servers or custom endpoints:

import { Model, stream } from '@mariozechner/pi-ai';

// Example: Ollama using OpenAI-compatible API
const ollamaModel: Model<'openai-completions'> = {
  id: 'llama-3.1-8b',
  name: 'Llama 3.1 8B (Ollama)',
  api: 'openai-completions',
  provider: 'ollama',
  baseUrl: 'http://localhost:11434/v1',
  reasoning: false,
  input: ['text'],
  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
  contextWindow: 128000,
  maxTokens: 32000
};

// Use the custom model
const response = await stream(ollamaModel, context, {
  apiKey: 'dummy' // Ollama doesn't need a real key
});

Type Safety

Models are typed by their API, ensuring type-safe options:

// TypeScript knows this is an Anthropic model
const claude = getModel('anthropic', 'claude-sonnet-4-20250514');

// So these options are type-checked for AnthropicOptions
await stream(claude, context, {
  thinkingEnabled: true,      // ✓ Valid for anthropic-messages
  thinkingBudgetTokens: 2048, // ✓ Valid for anthropic-messages
  // reasoningEffort: 'high'  // ✗ TypeScript error: not valid for anthropic-messages
});

Cross-Provider Handoffs

The library supports seamless handoffs between different LLM providers within the same conversation. This allows you to switch models mid-conversation while preserving context, including thinking blocks, tool calls, and tool results.

How It Works

When messages from one provider are sent to a different provider, the library automatically transforms them for compatibility:

  • User and tool result messages are passed through unchanged
  • Assistant messages from the same provider/API are preserved as-is
  • Assistant messages from different providers have their thinking blocks converted to text with <thinking> tags
  • Tool calls and regular text are preserved unchanged

Example: Multi-Provider Conversation

import { getModel, complete, Context } from '@mariozechner/pi-ai';

// Start with Claude
const claude = getModel('anthropic', 'claude-sonnet-4-20250514');
const context: Context = {
  messages: []
};

context.messages.push({ role: 'user', content: 'What is 25 * 18?' });
const claudeResponse = await complete(claude, context, {
  thinkingEnabled: true
});
context.messages.push(claudeResponse);

// Switch to GPT-5 - it will see Claude's thinking as <thinking> tagged text
const gpt5 = getModel('openai', 'gpt-5-mini');
context.messages.push({ role: 'user', content: 'Is that calculation correct?' });
const gptResponse = await complete(gpt5, context);
context.messages.push(gptResponse);

// Switch to Gemini
const gemini = getModel('google', 'gemini-2.5-flash');
context.messages.push({ role: 'user', content: 'What was the original question?' });
const geminiResponse = await complete(gemini, context);

Provider Compatibility

All providers can handle messages from other providers, including:

  • Text content
  • Tool calls and tool results
  • Thinking/reasoning blocks (transformed to tagged text for cross-provider compatibility)
  • Aborted messages with partial content

This enables flexible workflows where you can:

  • Start with a fast model for initial responses
  • Switch to a more capable model for complex reasoning
  • Use specialized models for specific tasks
  • Maintain conversation continuity across provider outages

Context Serialization

The Context object can be easily serialized and deserialized using standard JSON methods, making it simple to persist conversations, implement chat history, or transfer contexts between services:

import { Context, getModel, complete } from '@mariozechner/pi-ai';

// Create and use a context
const context: Context = {
  systemPrompt: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'What is TypeScript?' }
  ]
};

const model = getModel('openai', 'gpt-4o-mini');
const response = await complete(model, context);
context.messages.push(response);

// Serialize the entire context
const serialized = JSON.stringify(context);
console.log('Serialized context size:', serialized.length, 'bytes');

// Save to database, localStorage, file, etc.
localStorage.setItem('conversation', serialized);

// Later: deserialize and continue the conversation
const restored: Context = JSON.parse(localStorage.getItem('conversation')!);
restored.messages.push({ role: 'user', content: 'Tell me more about its type system' });

// Continue with any model
const newModel = getModel('anthropic', 'claude-3-5-haiku-20241022');
const continuation = await complete(newModel, restored);

Note

: If the context contains images (encoded as base64 as shown in the Image Input section), those will also be serialized.

Browser Usage

The library supports browser environments. You must pass the API key explicitly since environment variables are not available in browsers:

import { getModel, complete } from '@mariozechner/pi-ai';

// API key must be passed explicitly in browser
const model = getModel('anthropic', 'claude-3-5-haiku-20241022');

const response = await complete(model, {
  messages: [{ role: 'user', content: 'Hello!' }]
}, {
  apiKey: 'your-api-key'
});

Security Warning: Exposing API keys in frontend code is dangerous. Anyone can extract and abuse your keys. Only use this approach for internal tools or demos. For production applications, use a backend proxy that keeps your API keys secure.

Environment Variables (Node.js only)

In Node.js environments, you can set environment variables to avoid passing API keys:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
GROQ_API_KEY=gsk_...
CEREBRAS_API_KEY=csk-...
XAI_API_KEY=xai-...
OPENROUTER_API_KEY=sk-or-...

When set, the library automatically uses these keys:

// Uses OPENAI_API_KEY from environment
const model = getModel('openai', 'gpt-4o-mini');
const response = await complete(model, context);

// Or override with explicit key
const response = await complete(model, context, {
  apiKey: 'sk-different-key'
});

License

MIT