co-mono/packages/coding-agent/docs/gemini.md
Mario Zechner 99b4b1aca0 Add Mistral as AI provider
2025-12-10 20:36:19 +01:00

Gemini OAuth Integration Guide

This document analyzes how OAuth authentication could be implemented for Google Gemini in the pi coding-agent. It draws on two references: the existing Anthropic OAuth implementation and the approach taken by the Gemini CLI.

Table of Contents

  1. Current Anthropic OAuth Implementation
  2. Gemini CLI Authentication Analysis
  3. Gemini API Capabilities
  4. Gemini API Endpoints
  5. Implementation Plan

Current Anthropic OAuth Implementation

The pi coding-agent implements OAuth for Anthropic with the following architecture:

Key Components

  1. OAuth Flow (packages/coding-agent/src/core/oauth/anthropic.ts):

    • Uses PKCE (Proof Key for Code Exchange) flow for security
    • Client ID: 9d1c250a-e61b-44d9-88ed-5944d1962f5e
    • Authorization URL: https://claude.ai/oauth/authorize
    • Token URL: https://console.anthropic.com/v1/oauth/token
    • Scopes: org:create_api_key user:profile user:inference
  2. Token Storage (packages/coding-agent/src/core/oauth/storage.ts):

    • Stores credentials in ~/.pi/agent/oauth.json
    • File permissions set to 0600 (owner read/write only)
    • Format: { provider: { type: "oauth", refresh: string, access: string, expires: number } }
  3. Token Management (packages/coding-agent/src/core/oauth/index.ts):

    • Auto-refresh tokens when expired (with 5-minute buffer)
    • Supports multiple providers through SupportedOAuthProvider type
    • Provider info includes id, name, and availability status
  4. Model Integration (packages/coding-agent/src/core/model-config.ts):

    • Checks OAuth tokens first, then environment variables
    • OAuth status cached to avoid repeated file reads
    • Maps providers to OAuth providers via providerToOAuthProvider
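
The refresh rule from the Token Management item can be sketched as follows. This is a minimal illustration, not the actual code in index.ts: the credential shape comes from the storage format above, while `REFRESH_BUFFER_MS`, `needsRefresh`, and `getAccessToken` are hypothetical names, and `expires` is assumed to be a Unix timestamp in milliseconds.

```typescript
// Shape of one stored entry, following the oauth.json format described above.
interface OAuthCredentials {
  type: "oauth";
  refresh: string;
  access: string;
  expires: number; // assumption: Unix timestamp in milliseconds
}

// Hypothetical constant: treat a token as expired 5 minutes before its recorded expiry.
const REFRESH_BUFFER_MS = 5 * 60 * 1000;

// True when the access token has expired or will expire within the buffer.
function needsRefresh(creds: OAuthCredentials, now: number = Date.now()): boolean {
  return now >= creds.expires - REFRESH_BUFFER_MS;
}

// Hypothetical helper: return a usable access token, refreshing first if needed.
// The provider-specific refresh function is injected by the caller.
async function getAccessToken(
  creds: OAuthCredentials,
  refresh: (refreshToken: string) => Promise<OAuthCredentials>,
  now: number = Date.now(),
): Promise<string> {
  const current = needsRefresh(creds, now) ? await refresh(creds.refresh) : creds;
  return current.access;
}
```

Injecting the refresh function keeps the expiry check provider-agnostic, which is the property a second provider such as Gemini would rely on.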

Authentication Flow

  1. User initiates login with pi auth login
  2. Authorization URL is generated with PKCE challenge
  3. User opens URL in browser and authorizes
  4. User copies authorization code (format: code#state)
  5. Code is exchanged for access/refresh tokens
  6. Tokens are saved with their expiry time (to ~/.pi/agent/oauth.json, protected by 0600 file permissions)
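
Steps 2 and 4 of this flow can be sketched with Node's built-in crypto module. The PKCE derivation follows RFC 7636 (S256 method); the function names are hypothetical and not taken from anthropic.ts.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Derive the S256 code challenge for a verifier: BASE64URL(SHA-256(verifier)).
function challengeFor(verifier: string): string {
  return createHash("sha256").update(verifier).digest("base64url");
}

// Generate a fresh verifier/challenge pair. 32 random bytes encode to a
// 43-character base64url string, the minimum verifier length RFC 7636 allows.
function generatePkce(): { verifier: string; challenge: string } {
  const verifier = randomBytes(32).toString("base64url");
  return { verifier, challenge: challengeFor(verifier) };
}

// Split the pasted "code#state" string into its two parts.
function parseAuthorizationCode(input: string): { code: string; state: string } {
  const [code, state, ...rest] = input.trim().split("#");
  if (!code || !state || rest.length > 0) {
    throw new Error('Expected an authorization code in the form "code#state"');
  }
  return { code, state };
}
```

The challenge goes into the authorization URL, while the verifier stays local and is sent only in the token exchange. A caller would typically also compare the parsed state against the value it sent before exchanging the code.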

Gemini CLI Authentication Analysis

The Gemini CLI uses a more complex OAuth implementation with several key differences:

Authentication Methods

The Gemini CLI supports multiple authentication types:

  • LOGIN_WITH_GOOGLE (OAuth personal account)
  • USE_GEMINI (API key)
  • USE_VERTEX_AI (Vertex AI)
  • COMPUTE_ADC (Application Default Credentials)

OAuth Implementation Details

  1. OAuth Configuration:

    • Client ID and Secret: See google-gemini/gemini-cli oauth2.ts (public for installed apps per Google's OAuth docs)
    • Scopes:
      • https://www.googleapis.com/auth/cloud-platform
      • https://www.googleapis.com/auth/userinfo.email
      • https://www.googleapis.com/auth/userinfo.profile
  2. Authentication Flows:

    • Web Flow: Opens browser, runs local HTTP server for callback
    • User Code Flow: For environments without a browser (NO_BROWSER=true)
    • Uses Google's google-auth-library for OAuth handling
  3. Token Storage:

    • Supports encrypted storage via OAuthCredentialStorage
    • Falls back to plain JSON storage
    • Stores user info (email) separately
  4. API Integration:

    • Uses CodeAssistServer for API calls
    • Endpoint: https://cloudcode-pa.googleapis.com
    • Includes user tier information (FREE, STANDARD, etc.)
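
To make the OAuth configuration concrete, the sketch below builds a Google authorization URL by hand from the scopes listed above. It is an illustration only: the Gemini CLI delegates this to google-auth-library, the client ID is left as a parameter (the real value lives in the upstream oauth2.ts), and `buildGoogleAuthUrl` and the redirect URI in the usage note are hypothetical.

```typescript
// Google's OAuth 2.0 authorization endpoint.
const GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth";

// Scopes requested by the Gemini CLI, as listed above.
const GEMINI_SCOPES = [
  "https://www.googleapis.com/auth/cloud-platform",
  "https://www.googleapis.com/auth/userinfo.email",
  "https://www.googleapis.com/auth/userinfo.profile",
];

// Hypothetical helper: build the URL the user opens in the browser.
function buildGoogleAuthUrl(opts: {
  clientId: string; // placeholder, see upstream oauth2.ts for the real value
  redirectUri: string; // e.g. the local HTTP callback server used by the web flow
  state: string; // random value echoed back on the callback
}): string {
  const url = new URL(GOOGLE_AUTH_ENDPOINT);
  url.searchParams.set("client_id", opts.clientId);
  url.searchParams.set("redirect_uri", opts.redirectUri);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("scope", GEMINI_SCOPES.join(" "));
  url.searchParams.set("access_type", "offline"); // ask Google for a refresh token
  url.searchParams.set("state", opts.state);
  return url.toString();
}
```

For the web flow, `redirectUri` would point at the local callback server, for example `http://localhost:<port>/<path>` with whatever port and path that server listens on.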

Gemini API Capabilities

Based on the Gemini CLI analysis:

System Prompts

Yes, Gemini supports system prompts

  • Implemented via getCoreSystemPrompt() in the codebase
  • System instructions are part of the GenerateContentParameters

Tools/Function Calling

Yes, Gemini supports tools and function calling

  • Uses the Tool type from @google/genai
  • Extensive tool support including:
    • File system operations (read, write, edit)
    • Web search and fetch
    • MCP (Model Context Protocol) tools
    • Custom tool registration
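
As a rough illustration of how a system prompt and a tool declaration travel together in one request, the sketch below uses simplified local types. They are assumptions modeled loosely on the request shape in @google/genai, not the library's actual type definitions, and the `read_file` tool and `buildRequest` helper are hypothetical.

```typescript
// Simplified stand-ins for the @google/genai request types (assumption).
interface FunctionDeclaration {
  name: string;
  description: string;
  parameters?: Record<string, unknown>;
}

interface Tool {
  functionDeclarations: FunctionDeclaration[];
}

interface Content {
  role: "user" | "model";
  parts: { text: string }[];
}

interface GenerateContentRequest {
  model: string;
  contents: Content[];
  config?: { systemInstruction?: string; tools?: Tool[] };
}

// Hypothetical helper: combine system prompt, user message, and tools.
function buildRequest(
  model: string,
  systemPrompt: string,
  userText: string,
  tools: Tool[],
): GenerateContentRequest {
  return {
    model,
    contents: [{ role: "user", parts: [{ text: userText }] }],
    config: { systemInstruction: systemPrompt, tools },
  };
}

// Hypothetical tool declaration for illustration.
const readFileTool: Tool = {
  functionDeclarations: [
    {
      name: "read_file",
      description: "Read a file from the working directory",
      parameters: {
        type: "OBJECT",
        properties: { path: { type: "STRING" } },
        required: ["path"],
      },
    },
  ],
};

const exampleRequest = buildRequest(
  "gemini-2.5-pro",
  "You are a coding assistant.",
  "Show me package.json",
  [readFileTool],
);
```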

Content Generation

  • Supports streaming and non-streaming generation
  • Token counting capabilities
  • Embedding support
  • Context compression for long conversations

Gemini API Endpoints

When using OAuth tokens, the Gemini CLI talks to:

Primary Endpoint

  • Base URL: https://cloudcode-pa.googleapis.com
  • API Version: v1internal

Key Methods

  • generateContent - Non-streaming content generation
  • streamGenerateContent - Streaming content generation
  • countTokens - Token counting
  • embedContent - Text embeddings
  • loadCodeAssist - User setup and tier information
  • onboardUser - User onboarding

Authentication

  • OAuth tokens are passed via AuthClient from google-auth-library
  • Tokens are automatically refreshed by the library
  • Project ID and session ID included in requests
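
A minimal sketch of how these pieces might combine into a request target. The `<base>/<version>:<method>` URL layout is an assumption based on how the Gemini CLI's CodeAssistServer addresses methods, and should be checked against the upstream source; `methodUrl` and `authHeaders` are hypothetical helpers.

```typescript
const CODE_ASSIST_ENDPOINT = "https://cloudcode-pa.googleapis.com";
const CODE_ASSIST_API_VERSION = "v1internal";

// The methods listed above.
type CodeAssistMethod =
  | "generateContent"
  | "streamGenerateContent"
  | "countTokens"
  | "embedContent"
  | "loadCodeAssist"
  | "onboardUser";

// Assumption: methods are addressed as "<base>/<version>:<method>".
function methodUrl(method: CodeAssistMethod): string {
  return `${CODE_ASSIST_ENDPOINT}/${CODE_ASSIST_API_VERSION}:${method}`;
}

// Standard OAuth 2.0 bearer-token headers for a JSON request.
function authHeaders(accessToken: string): Record<string, string> {
  return {
    Authorization: `Bearer ${accessToken}`,
    "Content-Type": "application/json",
  };
}
```

The request body is left out on purpose: how the project ID and session ID are laid out in it is not described here and would need to be taken from the Gemini CLI source.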

Implementation Plan

1. Add Gemini OAuth Provider Support

File: packages/coding-agent/src/core/oauth/gemini.ts

import { OAuth2Client } from "google-auth-library";
import { type OAuthCredentials, saveOAuthCredentials } from "./storage.js";

// OAuth credentials from google-gemini/gemini-cli:
// https://github.com/google-gemini/gemini-cli/blob/main/packages/core/src/code_assist/oauth2.ts
const SCOPES = [
  "https://www.googleapis.com/auth/cloud-platform",
  "https://www.googleapis.com/auth/userinfo.email",
  "https://www.googleapis.com/auth/userinfo.profile"
];

export async function loginGemini(
  onAuthUrl: (url: string) => void,
  onPromptCode: () => Promise<string>,
): Promise<void> {
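  // Same overall flow as the Anthropic login, but built on google-auth-library.
  throw new Error("loginGemini is not implemented yet");
}

export async function refreshGeminiToken(refreshToken: string): Promise<OAuthCredentials> {
  // Use google-auth-library to exchange the refresh token for new credentials.
  throw new Error("refreshGeminiToken is not implemented yet");
}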

2. Update OAuth Index

File: packages/coding-agent/src/core/oauth/index.ts

export type SupportedOAuthProvider = "anthropic" | "github-copilot" | "gemini";

// Add Gemini to provider list
{
  id: "gemini",
  name: "Google Gemini (Code Assist)",
  available: true,
}

// Add cases for Gemini in login/refresh functions

3. Create Gemini API Client

File: packages/ai/src/providers/gemini-oauth.ts

export class GeminiOAuthProvider implements Provider {
  // Implement Provider interface
  // Use CodeAssistServer approach from Gemini CLI
  // Map to standard pi-ai API format
}

4. Update Model Configuration

File: packages/coding-agent/src/core/model-config.ts

// Add to providerToOAuthProvider mapping
gemini: "gemini",

// Add Gemini OAuth token check
if (model.provider === "gemini") {
  const oauthToken = await getOAuthToken("gemini");
  if (oauthToken) return oauthToken;
  const oauthEnv = process.env.GEMINI_OAUTH_TOKEN;
  if (oauthEnv) return oauthEnv;
}

5. Dependencies

Add to package.json:

{
  "dependencies": {
    "google-auth-library": "^9.0.0"
  }
}

6. Environment Variables

Support these environment variables:

  • GEMINI_OAUTH_TOKEN - Manual OAuth token
  • GOOGLE_CLOUD_PROJECT - For project-specific features
  • NO_BROWSER - Force user code flow
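
One way these variables could be read is sketched below. `GeminiEnvConfig`, `readGeminiEnv`, and `chooseFlow` are hypothetical names, and treating only the literal string "true" as enabling NO_BROWSER is an assumption based on the NO_BROWSER=true example earlier in this document.

```typescript
interface GeminiEnvConfig {
  oauthToken?: string; // GEMINI_OAUTH_TOKEN
  project?: string; // GOOGLE_CLOUD_PROJECT
  noBrowser: boolean; // NO_BROWSER
}

// Read the three variables from an environment map. Empty strings count as unset.
function readGeminiEnv(env: Record<string, string | undefined>): GeminiEnvConfig {
  return {
    oauthToken: env["GEMINI_OAUTH_TOKEN"] || undefined,
    project: env["GOOGLE_CLOUD_PROJECT"] || undefined,
    noBrowser: env["NO_BROWSER"] === "true",
  };
}

// Pick the login flow: user code flow when no browser is available, web flow otherwise.
function chooseFlow(config: GeminiEnvConfig): "web" | "user-code" {
  return config.noBrowser ? "user-code" : "web";
}
```

In the agent this would be called as `readGeminiEnv(process.env)`. Taking the environment as a parameter keeps the function easy to test.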

Key Differences from Anthropic Implementation

  1. Authentication Library: Use google-auth-library instead of manual OAuth
  2. Multiple Auth Types: Support OAuth, API key, and ADC
  3. User Info: Fetch and cache user email/profile
  4. Project Context: Include project ID in API calls
  5. Tier Management: Handle user tier (FREE/STANDARD) responses

Challenges and Considerations

  1. API Access: The Code Assist API (cloudcode-pa.googleapis.com) might require special access or be in preview
  2. Model Naming: Need to map Gemini model names to Code Assist equivalents
  3. Rate Limits: Handle tier-based rate limits
  4. Error Handling: Map Google-specific errors to pi error types
  5. Token Scopes: Ensure scopes are sufficient for all operations

Testing Plan

  1. Test OAuth flow (browser and NO_BROWSER modes)
  2. Test token refresh
  3. Test API calls with OAuth tokens
  4. Test fallback to API keys
  5. Test error scenarios (invalid tokens, network errors)
  6. Test model switching and tier limits

Migration Path

  1. Users with GEMINI_API_KEY continue working unchanged
  2. New pi auth login gemini command for OAuth
  3. OAuth takes precedence over API keys when available
  4. Communicate the benefits of OAuth login clearly (higher limits, better features)
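
The precedence described in points 1 and 3 can be sketched as a single lookup. This is an illustration under assumptions: `resolveGeminiCredential` and the `CredentialSource` labels are hypothetical, and the stored OAuth token is passed in rather than read from oauth.json.

```typescript
type CredentialSource = "oauth" | "oauth-env" | "api-key";

interface ResolvedCredential {
  source: CredentialSource;
  value: string;
}

// Resolve credentials in priority order:
// stored OAuth token, then GEMINI_OAUTH_TOKEN, then GEMINI_API_KEY.
function resolveGeminiCredential(
  storedOAuthToken: string | undefined,
  env: Record<string, string | undefined>,
): ResolvedCredential | undefined {
  if (storedOAuthToken) return { source: "oauth", value: storedOAuthToken };
  const envToken = env["GEMINI_OAUTH_TOKEN"];
  if (envToken) return { source: "oauth-env", value: envToken };
  const apiKey = env["GEMINI_API_KEY"];
  if (apiKey) return { source: "api-key", value: apiKey };
  return undefined; // nothing configured
}
```

Because the API key is only consulted last, users who never log in with OAuth see no change in behavior.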