Pi Reader Browser Extension

A cross-browser extension that provides an AI-powered reading assistant in a side panel (Chrome/Edge) or sidebar (Firefox), built with mini-lit components and Tailwind CSS v4.

Browser Support

  • Chrome/Edge - Uses Side Panel API (Manifest V3)
  • Firefox - Uses Sidebar Action API (Manifest V3)
  • Opera - Sidebar support (untested but should work with Firefox manifest)

Architecture

High-Level Overview

The extension is a full-featured AI chat interface that runs in your browser's side panel/sidebar. It can communicate with AI providers in two ways:

  1. Direct Mode (default) - Calls AI provider APIs directly from the browser using API keys stored locally
  2. Proxy Mode - Routes requests through a proxy server using an auth token

Browser Adaptation:

  • Chrome/Edge - Side Panel API for dedicated panel UI
  • Firefox - Sidebar Action API for sidebar UI
  • Page Content Access - Uses chrome.scripting.executeScript to extract page text
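
How a background script might choose between the two panel APIs can be sketched with feature detection. This is an illustration, not the extension's actual background.ts: `PanelApis`, `detectPanelKind` and `setupPanel` are hypothetical names, and the API objects are injected so the logic runs outside a browser. The two calls it mirrors, `sidePanel.setPanelBehavior` (Chrome/Edge) and `sidebarAction.toggle` (Firefox), are the documented entry points of those APIs.

```typescript
// Minimal structural types for the two panel APIs (only the calls used here).
interface SidePanelApi {
  setPanelBehavior(options: { openPanelOnActionClick: boolean }): Promise<void>;
}
interface SidebarActionApi {
  toggle(): Promise<void>;
}
interface PanelApis {
  sidePanel?: SidePanelApi; // present in Chrome/Edge
  sidebarAction?: SidebarActionApi; // present in Firefox
}

type PanelKind = "side-panel" | "sidebar" | "unsupported";

// Hypothetical helper: decide which panel mechanism the browser offers.
function detectPanelKind(apis: PanelApis): PanelKind {
  if (apis.sidePanel) return "side-panel";
  if (apis.sidebarAction) return "sidebar";
  return "unsupported";
}

// Hypothetical helper: wire the toolbar icon to the available panel.
// `onActionClick` registers a click handler for the toolbar icon.
function setupPanel(
  apis: PanelApis,
  onActionClick: (handler: () => void) => void,
): PanelKind {
  const kind = detectPanelKind(apis);
  if (kind === "side-panel" && apis.sidePanel) {
    // Chrome/Edge: let the toolbar icon open the side panel.
    void apis.sidePanel.setPanelBehavior({ openPanelOnActionClick: true });
  } else if (kind === "sidebar" && apis.sidebarAction) {
    // Firefox: toggle the sidebar from the toolbar click (a user gesture).
    const sidebar = apis.sidebarAction;
    onActionClick(() => void sidebar.toggle());
  }
  return kind;
}
```

In a real background script the first argument would be the browser's extension namespace and the second would wrap the toolbar action's click listener; both are left abstract here.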

Core Architecture Layers

┌─────────────────────────────────────────────────────────────┐
│  UI Layer (sidepanel.ts)                                    │
│  ├─ Header (theme toggle, settings)                         │
│  └─ ChatPanel                                                │
│      └─ AgentInterface (main chat UI)                       │
│          ├─ MessageList (stable messages)                   │
│          ├─ StreamingMessageContainer (live updates)        │
│          └─ MessageEditor (input + attachments)             │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│  State Layer (state/)                                        │
│  └─ AgentSession                                             │
│      ├─ Manages conversation state                          │
│      ├─ Coordinates transport                                │
│      └─ Handles tool execution                               │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│  Transport Layer (state/transports/)                         │
│  ├─ DirectTransport (uses KeyStore for API keys)            │
│  └─ ProxyTransport (uses auth token + proxy server)         │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│  AI Provider APIs / Proxy Server                             │
│  (Anthropic, OpenAI, Google, etc.)                          │
└─────────────────────────────────────────────────────────────┘

Directory Structure by Responsibility

src/
├── UI Components (what users see)
│   ├── sidepanel.ts              # App entry point, header
│   ├── ChatPanel.ts              # Main chat container, creates AgentSession
│   ├── AgentInterface.ts         # Complete chat UI (messages + input)
│   ├── MessageList.ts            # Renders stable messages
│   ├── StreamingMessageContainer.ts  # Handles streaming updates
│   ├── Messages.ts               # Message components (user, assistant, tool)
│   ├── MessageEditor.ts          # Input field with attachments
│   ├── ConsoleBlock.ts           # Console-style output display
│   ├── AttachmentTile.ts         # Attachment preview thumbnails
│   ├── AttachmentOverlay.ts     # Full-screen attachment viewer
│   └── ModeToggle.ts             # Toggle between document/text view
│
├── Dialogs (modal interactions)
│   ├── dialogs/
│   │   ├── DialogBase.ts         # Base class for all dialogs
│   │   ├── ModelSelector.ts      # Select AI model
│   │   ├── ApiKeysDialog.ts      # Manage API keys (for direct mode)
│   │   └── PromptDialog.ts       # Simple text input dialog
│
├── State Management (business logic)
│   ├── state/
│   │   ├── agent-session.ts      # Core state manager (pub/sub pattern)
│   │   ├── KeyStore.ts           # API key storage (Chrome local storage)
│   │   └── transports/
│   │       ├── types.ts          # Transport interface definitions
│   │       ├── DirectTransport.ts   # Direct API calls
│   │       └── ProxyTransport.ts    # Proxy server calls
│
├── Tools (AI function calling)
│   ├── tools/
│   │   ├── types.ts              # ToolRenderer interface
│   │   ├── renderer-registry.ts  # Global tool renderer registry
│   │   ├── index.ts              # Tool exports and registration
│   │   └── renderers/            # Custom tool UI renderers
│   │       ├── DefaultRenderer.ts    # Fallback for unknown tools
│   │       ├── CalculateRenderer.ts  # Calculator tool UI
│   │       ├── GetCurrentTimeRenderer.ts
│   │       └── BashRenderer.ts       # Bash command execution UI
│
├── Utilities (shared helpers)
│   └── utils/
│       ├── attachment-utils.ts   # PDF, Office, image processing
│       ├── auth-token.ts         # Proxy auth token management
│       ├── format.ts             # Token usage, cost formatting
│       └── i18n.ts               # Internationalization (EN + DE)
│
└── Entry Points (browser integration)
    ├── background.ts             # Service worker (opens side panel)
    ├── sidepanel.html            # HTML entry point
    └── live-reload.ts            # Hot reload during development
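
The tree above describes agent-session.ts as a pub/sub state manager. A minimal sketch of that pattern follows, assuming a much smaller state than the real AgentSession holds; `MiniSession`, `SessionState` and `patch` are hypothetical names chosen for illustration.

```typescript
// Hypothetical state shape; the real session state is richer.
interface SessionState {
  messages: string[];
  isStreaming: boolean;
}

type Listener = (state: SessionState) => void;

class MiniSession {
  private state: SessionState = { messages: [], isStreaming: false };
  private listeners = new Set<Listener>();

  // Register a listener and return an unsubscribe function.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    listener(this.state); // push the current state immediately
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Apply a partial update and notify every subscriber.
  patch(update: Partial<SessionState>): void {
    this.state = { ...this.state, ...update };
    this.listeners.forEach((listener) => listener(this.state));
  }
}
```

A UI component would subscribe when it is attached, re-render on every notification, and call the returned function when it is removed.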

Common Development Tasks

"I want to add a new AI tool"

Tools are functions the AI can call (e.g., calculator, web search, code execution). Here's how to add one:

1. Define the Tool (use @mariozechner/pi-ai)

Tools come from the @mariozechner/pi-ai package. Use existing tools or create custom ones:

// src/tools/my-custom-tool.ts
import type { AgentTool } from "@mariozechner/pi-ai";

export const myCustomTool: AgentTool = {
  name: "my_custom_tool",
  label: "My Custom Tool",
  description: "Does something useful",
  parameters: {
    type: "object",
    properties: {
      input: { type: "string", description: "Input parameter" }
    },
    required: ["input"]
  },
  execute: async (params) => {
    // Your tool logic here (processInput is a placeholder for your own function)
    const result = processInput(params.input);
    return {
      output: result,
      details: { /* any structured data */ }
    };
  }
};

2. Create a Custom Renderer (Optional)

Renderers control how the tool appears in the chat. If you don't create one, DefaultRenderer will be used.

// src/tools/renderers/MyCustomRenderer.ts
import { html } from "@mariozechner/mini-lit";
import type { ToolResultMessage } from "@mariozechner/pi-ai";
import type { ToolRenderer } from "../types.js";

export class MyCustomRenderer implements ToolRenderer {
  renderParams(params: any, isStreaming?: boolean) {
    // Show tool call parameters (e.g., "Searching for: <query>")
    return html`
      <div class="text-sm text-muted-foreground">
        ${isStreaming ? "Processing..." : `Input: ${params.input}`}
      </div>
    `;
  }

  renderResult(params: any, result: ToolResultMessage) {
    // Show tool result (e.g., search results, calculation output)
    if (result.isError) {
      return html`<div class="text-destructive">${result.output}</div>`;
    }
    return html`
      <div class="text-sm">
        <div class="font-medium">Result:</div>
        <div>${result.output}</div>
      </div>
    `;
  }
}

Renderer Tips:

  • Use ConsoleBlock for command output (see BashRenderer.ts)
  • Use <code-block> for code/JSON (from @mariozechner/mini-lit)
  • Use <markdown-block> for markdown content
  • Check isStreaming to show loading states

3. Register the Tool and Renderer

// src/tools/index.ts
import { myCustomTool } from "./my-custom-tool.js";
import { MyCustomRenderer } from "./renderers/MyCustomRenderer.js";
import { registerToolRenderer } from "./renderer-registry.js";

// Register the renderer
registerToolRenderer("my_custom_tool", new MyCustomRenderer());

// Export the tool so ChatPanel can use it
export { myCustomTool };

4. Add Tool to ChatPanel

// src/ChatPanel.ts
import { myCustomTool } from "./tools/index.js";

// In AgentSession constructor:
this.session = new AgentSession({
  initialState: {
    tools: [calculateTool, getCurrentTimeTool, myCustomTool], // Add here
    // ...
  }
});

File Locations:

  • Tool definition: src/tools/my-custom-tool.ts
  • Tool renderer: src/tools/renderers/MyCustomRenderer.ts
  • Registration: src/tools/index.ts (register renderer)
  • Integration: src/ChatPanel.ts (add to tools array)
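
The registration step relies on a name-to-renderer lookup with a fallback. The sketch below shows that idea in isolation. It is not the contents of renderer-registry.ts: renderers return plain strings instead of mini-lit templates so the example is self-contained, and `getToolRenderer` and `SimpleToolRenderer` are hypothetical names.

```typescript
// Simplified stand-in for ToolRenderer: returns strings, not templates.
interface SimpleToolRenderer {
  renderParams(params: unknown): string;
}

const registry = new Map<string, SimpleToolRenderer>();

// Fallback used when no renderer is registered for a tool name.
const defaultRenderer: SimpleToolRenderer = {
  renderParams: (params) => JSON.stringify(params),
};

function registerToolRenderer(toolName: string, renderer: SimpleToolRenderer): void {
  registry.set(toolName, renderer);
}

// Hypothetical lookup: the registered renderer if present, else the default.
function getToolRenderer(toolName: string): SimpleToolRenderer {
  return registry.get(toolName) ?? defaultRenderer;
}

registerToolRenderer("my_custom_tool", {
  renderParams: (params) => `Input: ${(params as { input: string }).input}`,
});
```

Because unknown tool names fall through to the default, a tool works in the chat before it has a custom renderer.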

"I want to change how messages are displayed"

Message components control how conversations appear:

  • User messages: Edit UserMessage in src/Messages.ts
  • Assistant messages: Edit AssistantMessage in src/Messages.ts
  • Tool call cards: Edit ToolMessage in src/Messages.ts
  • Markdown rendering: Comes from @mariozechner/mini-lit (can't customize easily)
  • Code blocks: Comes from @mariozechner/mini-lit (can't customize easily)

Example: Change user message styling

// src/Messages.ts - in UserMessage component
render() {
  return html`
    <div class="py-4 px-4 border-l-4 border-primary bg-primary/5">
      <!-- Your custom styling here -->
      <markdown-block .content=${content}></markdown-block>
    </div>
  `;
}

"I want to add a new model provider"

Models come from @mariozechner/pi-ai. The package supports:

  • anthropic (Claude)
  • openai (GPT)
  • google (Gemini)
  • groq, cerebras, xai, openrouter, etc.

To add a provider:

  1. Ensure @mariozechner/pi-ai supports it (check package docs)
  2. Add API key configuration in src/dialogs/ApiKeysDialog.ts:
    • Add provider to PROVIDERS array
    • Add test model to TEST_MODELS object
  3. Users can then select models via the model selector

No further code changes are needed: once the provider has an API key entry, the extension discovers its models from @mariozechner/pi-ai automatically.


"I want to modify the transport layer"

Transport determines how requests reach AI providers:

Direct Mode (Default)

  • File: src/state/transports/DirectTransport.ts
  • How it works: Gets API keys from KeyStore → calls provider APIs directly
  • When to use: Local development, no proxy server
  • Configuration: API keys stored in Chrome local storage

Proxy Mode

  • File: src/state/transports/ProxyTransport.ts
  • How it works: Gets auth token → sends request to proxy server → proxy calls providers
  • When to use: Want to hide API keys, centralized auth, usage tracking
  • Configuration: Auth token stored in localStorage, proxy URL hardcoded

Switch transport mode in ChatPanel:

// src/ChatPanel.ts
this.session = new AgentSession({
  transportMode: "direct", // or "proxy"
  authTokenProvider: async () => getAuthToken(), // Only needed for proxy
  // ...
});

Proxy Server Requirements:

  • Must accept POST to /api/stream endpoint
  • Request format: { model, context, options }
  • Response format: SSE stream with delta events
  • See ProxyTransport.ts for expected event types

To add a new transport:

  1. Create src/state/transports/MyTransport.ts
  2. Implement AgentTransport interface:
    async *run(userMessage, cfg, signal): AsyncIterable<AgentEvent>
    
  3. Register in ChatPanel.ts constructor
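
A transport is essentially an async generator that turns a user message into a stream of events. The sketch below illustrates that shape with an echo transport. The types are simplified stand-ins for those in src/state/transports/types.ts; `SketchTransport`, `SketchEvent`, `EchoTransport`, the `chunkSize` option and the "aborted" reason are all assumptions made for the example.

```typescript
// Simplified stand-ins for the real AgentTransport and AgentEvent types.
type SketchEvent =
  | { type: "text_delta"; delta: string }
  | { type: "done"; reason: "stop" | "aborted" };

interface SketchConfig {
  chunkSize: number;
}

interface SketchTransport {
  run(userMessage: string, cfg: SketchConfig, signal?: AbortSignal): AsyncIterable<SketchEvent>;
}

// Hypothetical transport that echoes the user message back in chunks.
class EchoTransport implements SketchTransport {
  async *run(
    userMessage: string,
    cfg: SketchConfig,
    signal?: AbortSignal,
  ): AsyncGenerator<SketchEvent> {
    const size = Math.max(1, cfg.chunkSize); // guard against a zero step
    for (let i = 0; i < userMessage.length; i += size) {
      if (signal?.aborted) {
        yield { type: "done", reason: "aborted" };
        return;
      }
      yield { type: "text_delta", delta: userMessage.slice(i, i + size) };
    }
    yield { type: "done", reason: "stop" };
  }
}

// Collect streamed text, the way a session might consume any transport.
async function collectText(transport: SketchTransport, message: string): Promise<string> {
  let text = "";
  for await (const event of transport.run(message, { chunkSize: 4 })) {
    if (event.type === "text_delta") text += event.delta;
  }
  return text;
}
```

Checking the abort signal between yields keeps cancellation responsive; a real transport would also pass the signal to its network request.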

"I want to change the system prompt"

System prompts guide the AI's behavior. Change in ChatPanel.ts:

// src/ChatPanel.ts
this.session = new AgentSession({
  initialState: {
    systemPrompt: "You are a helpful AI assistant specialized in code review.",
    // ...
  }
});

Or make it dynamic:

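// Read from storage, settings dialog, etc.
// chrome.storage.local.get() resolves to an object keyed by the requested keys.
const stored = await chrome.storage.local.get("system-prompt");
const systemPrompt = stored["system-prompt"] || "You are a helpful AI assistant.";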

"I want to add attachment support for a new file type"

Attachment processing happens in src/utils/attachment-utils.ts:

  1. Add file type detection in loadAttachment():

    if (mimeType === "application/my-format" || fileName.endsWith(".myext")) {
      const { extractedText } = await processMyFormat(arrayBuffer, fileName);
      return { id, type: "document", fileName, mimeType, content, extractedText };
    }
    
  2. Add processor function:

    async function processMyFormat(buffer: ArrayBuffer, fileName: string) {
      // Extract text from your format
      const text = extractTextFromMyFormat(buffer);
      return { extractedText: `<myformat filename="${fileName}">\n${text}\n</myformat>` };
    }
    
  3. Update accepted types in MessageEditor.ts:

    acceptedTypes = "image/*,application/pdf,.myext,...";
    
  4. Optional: Add preview support in AttachmentOverlay.ts

Supported formats:

  • Images: All image/* (preview support)
  • PDF: Text extraction + thumbnail generation
  • Office: DOCX, PPTX, XLSX (text extraction)
  • Text: .txt, .md, .json, .xml, etc.
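
The detection step boils down to mapping a MIME type and file name to one of the supported categories. A minimal sketch, assuming the categories in the list above; `classifyAttachment` and `AttachmentKind` are hypothetical names, and the text extensions cover only the ones spelled out here.

```typescript
type AttachmentKind = "image" | "pdf" | "office" | "text" | "unsupported";

const OFFICE_EXTENSIONS = [".docx", ".pptx", ".xlsx"];
const TEXT_EXTENSIONS = [".txt", ".md", ".json", ".xml"];

// Classify by MIME type first, then fall back to the file extension,
// since browsers often report an empty or generic MIME type.
function classifyAttachment(mimeType: string, fileName: string): AttachmentKind {
  const name = fileName.toLowerCase();
  if (mimeType.startsWith("image/")) return "image";
  if (mimeType === "application/pdf" || name.endsWith(".pdf")) return "pdf";
  if (OFFICE_EXTENSIONS.some((ext) => name.endsWith(ext))) return "office";
  if (mimeType.startsWith("text/") || TEXT_EXTENSIONS.some((ext) => name.endsWith(ext))) {
    return "text";
  }
  return "unsupported";
}
```

A new format would add one more branch here before the final fallback, matching step 1 above.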

"I want to customize the UI theme"

The extension uses the Claude theme from @mariozechner/mini-lit. Colors are defined via CSS variables:

Option 1: Override theme variables

/* src/app.css */
@layer base {
  :root {
    --primary: 210 100% 50%;  /* Custom blue */
    --radius: 0.5rem;
  }
}

Option 2: Use a different mini-lit theme

/* src/app.css */
@import "@mariozechner/mini-lit/themes/default.css"; /* Instead of claude.css */

Available variables:

  • --background, --foreground - Base colors
  • --card, --card-foreground - Card backgrounds
  • --primary, --primary-foreground - Primary actions
  • --muted, --muted-foreground - Secondary elements
  • --accent, --accent-foreground - Hover states
  • --destructive - Error/delete actions
  • --border, --input - Border colors
  • --radius - Border radius

"I want to add a new settings option"

Settings are currently managed via dialogs. To add persistent settings:

1. Create storage helpers

// src/utils/config.ts (create this file)
export async function getMySetting(): Promise<string> {
  const result = await chrome.storage.local.get("my-setting");
  return result["my-setting"] || "default-value";
}

export async function setMySetting(value: string): Promise<void> {
  await chrome.storage.local.set({ "my-setting": value });
}

2. Create or extend settings dialog

// src/dialogs/SettingsDialog.ts (create this file, similar to ApiKeysDialog)
// Add UI for your setting
// Call getMySetting() / setMySetting() on save

3. Open from header

// src/sidepanel.ts - in settings button onClick
SettingsDialog.open();

4. Use in ChatPanel

// src/ChatPanel.ts
const mySetting = await getMySetting();
this.session = new AgentSession({
  initialState: { /* use mySetting */ }
});

"I want to access the current page content"

Page content extraction is in sidepanel.ts:

// Example: Get page text
const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
if (tab?.id === undefined) throw new Error("No active tab found");
const results = await chrome.scripting.executeScript({
  target: { tabId: tab.id },
  func: () => document.body.innerText,
});
const pageText = results[0]?.result ?? "";

To use in chat:

  1. Extract page content in ChatPanel
  2. Add to system prompt or first user message
  3. Or create a tool that reads page content

Permissions required:

  • activeTab - Access current tab
  • scripting - Execute scripts in pages
  • Already configured in manifest.*.json
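
Adding the extracted text to the system prompt usually means trimming it and capping its length first. The sketch below shows one way to do that. `buildSystemPrompt`, the `<page>` wrapper and the 20000-character default are assumptions for illustration, not part of the extension.

```typescript
// Hypothetical default limit; tune it to the model's context window.
const MAX_PAGE_CHARS = 20000;

// Wrap (possibly truncated) page text in a tagged block after the base prompt.
function buildSystemPrompt(
  basePrompt: string,
  pageTitle: string,
  pageText: string,
  maxChars: number = MAX_PAGE_CHARS,
): string {
  const trimmed = pageText.trim();
  const truncated = trimmed.length > maxChars;
  const body = truncated ? trimmed.slice(0, maxChars) : trimmed;
  const openTag = `<page title="${pageTitle}"${truncated ? ' truncated="true"' : ""}>`;
  return [basePrompt, "", openTag, body, "</page>"].join("\n");
}
```

The result can be passed as `systemPrompt` in the AgentSession initial state. Marking truncation explicitly lets the model say so when an answer may lie beyond the cut-off.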

Transport Modes Explained

Direct Mode (Default)

Flow:

Browser Extension
  → KeyStore (get API key)
  → DirectTransport
  → Provider API (Anthropic/OpenAI/etc.)
  → Stream response back

Pros:

  • No external dependencies
  • Lower latency (direct connection)
  • API keys can be managed offline (chatting itself still needs network access)
  • Full control over requests

Cons:

  • API keys stored in browser (secure, but local)
  • Each user needs their own API keys
  • CORS restrictions (some providers may not work)
  • Can't track usage centrally

Setup:

  1. Open extension → Settings → Manage API Keys
  2. Add keys for desired providers (Anthropic, OpenAI, etc.)
  3. Select model and start chatting

Files involved:

  • src/state/transports/DirectTransport.ts - Transport implementation
  • src/state/KeyStore.ts - API key storage
  • src/dialogs/ApiKeysDialog.ts - API key UI

Proxy Mode

Flow:

Browser Extension
  → Auth Token (from localStorage)
  → ProxyTransport
  → Proxy Server (https://genai.mariozechner.at or custom)
  → Provider API
  → Stream response back through proxy

Pros:

  • No API keys in browser
  • Centralized auth/usage tracking
  • Can implement rate limiting, quotas
  • Custom logic server-side
  • No CORS issues

Cons:

  • Requires proxy server setup
  • Additional network hop (latency)
  • Dependency on proxy availability
  • Need to manage auth tokens

Setup:

  1. Get auth token from proxy server admin
  2. Extension prompts for token on first use
  3. Token stored in localStorage
  4. Start chatting (proxy handles provider APIs)

Proxy URL Configuration: Currently hardcoded in ProxyTransport.ts:

const PROXY_URL = "https://genai.mariozechner.at";

To make configurable:

  1. Add storage helper in utils/config.ts
  2. Add UI in SettingsDialog
  3. Pass to ProxyTransport constructor

Proxy Server Requirements:

The proxy server must implement:

Endpoint: POST /api/stream

Request:

{
  model: Model,           // Provider + model ID
  context: Context,       // System prompt, messages, tools
  options: {
    temperature?: number,
    maxTokens?: number,
    reasoning?: string
  }
}

Response: SSE (Server-Sent Events) stream

Event Types:

data: {"type":"start","partial":{...}}
data: {"type":"text_start","contentIndex":0}
data: {"type":"text_delta","contentIndex":0,"delta":"Hello"}
data: {"type":"text_end","contentIndex":0,"contentSignature":"..."}
data: {"type":"thinking_start","contentIndex":1}
data: {"type":"thinking_delta","contentIndex":1,"delta":"..."}
data: {"type":"toolcall_start","contentIndex":2,"id":"...","toolName":"..."}
data: {"type":"toolcall_delta","contentIndex":2,"delta":"..."}
data: {"type":"toolcall_end","contentIndex":2}
data: {"type":"done","reason":"stop","usage":{...}}
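
To make the event listing concrete, here is a minimal sketch of parsing such a stream and reassembling text from `text_delta` events. It is not the parser in ProxyTransport.ts: `parseSseChunk`, `accumulateText` and `StreamEvent` are hypothetical names, and the sketch assumes every event arrives as a single complete `data: <json>` line.

```typescript
// Subset of the event fields listed above; other fields are ignored.
interface StreamEvent {
  type: string;
  contentIndex?: number;
  delta?: string;
  reason?: string;
}

// Parse one chunk of SSE text into events, one per "data:" line.
function parseSseChunk(chunk: string): StreamEvent[] {
  const events: StreamEvent[] = [];
  for (const line of chunk.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const payload = line.slice("data:".length).trim();
    if (payload.length === 0) continue;
    events.push(JSON.parse(payload) as StreamEvent);
  }
  return events;
}

// Concatenate text deltas per content index.
function accumulateText(events: StreamEvent[]): Map<number, string> {
  const textByIndex = new Map<number, string>();
  for (const event of events) {
    if (event.type !== "text_delta" || event.contentIndex === undefined) continue;
    const previous = textByIndex.get(event.contentIndex) ?? "";
    textByIndex.set(event.contentIndex, previous + (event.delta ?? ""));
  }
  return textByIndex;
}
```

A real stream can split a line across network chunks, so a production parser needs to buffer incomplete lines before calling `JSON.parse`.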

Auth: Bearer token in Authorization header

Error Handling:

  • Return 401 for invalid auth → extension clears token and re-prompts
  • Return 4xx/5xx with JSON: {"error":"message"}
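
The request and error rules above can be sketched from the client side as follows. `buildProxyRequest` and `classifyStatus` are hypothetical helpers, and the `Content-Type` header is an assumption; the endpoint path, Bearer token and 401 behaviour come from the requirements listed here.

```typescript
// Body shape from the request format above; model and context left opaque.
interface ProxyRequestBody {
  model: unknown;
  context: unknown;
  options: { temperature?: number; maxTokens?: number; reasoning?: string };
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the POST /api/stream request with the auth token as a Bearer header.
function buildProxyRequest(
  proxyUrl: string,
  authToken: string,
  body: ProxyRequestBody,
): PreparedRequest {
  return {
    url: `${proxyUrl.replace(/\/+$/, "")}/api/stream`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json", // assumed; not stated in the requirements
        Authorization: `Bearer ${authToken}`,
      },
      body: JSON.stringify(body),
    },
  };
}

type ResponseAction = "stream" | "clear-token-and-reprompt" | "show-error";

// Map an HTTP status to the client behaviour described above.
function classifyStatus(status: number): ResponseAction {
  if (status === 401) return "clear-token-and-reprompt";
  if (status >= 400) return "show-error";
  return "stream";
}
```

The prepared request can be handed to `fetch(request.url, request.init)`, with `classifyStatus(response.status)` deciding whether to start reading the stream.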

Reference Implementation: See src/state/transports/ProxyTransport.ts for full event parsing logic.


Switching Between Modes

At runtime (in ChatPanel):

const mode = await getTransportMode(); // "direct" or "proxy"
this.session = new AgentSession({
  transportMode: mode,
  authTokenProvider: mode === "proxy" ? async () => getAuthToken() : undefined,
  // ...
});

Storage helpers (create these):

// src/utils/config.ts
export type TransportMode = "direct" | "proxy";

export async function getTransportMode(): Promise<TransportMode> {
  const result = await chrome.storage.local.get("transport-mode");
  return (result["transport-mode"] as TransportMode) || "direct";
}

export async function setTransportMode(mode: TransportMode): Promise<void> {
  await chrome.storage.local.set({ "transport-mode": mode });
}

UI for switching (create this):

// src/dialogs/SettingsDialog.ts
// Radio buttons: ○ Direct (use API keys) / ○ Proxy (use auth token)
// On save: setTransportMode(), reload AgentSession

Understanding mini-lit

Before working on the UI, read these files to understand the component library:

  • node_modules/@mariozechner/mini-lit/README.md - Complete component documentation
  • node_modules/@mariozechner/mini-lit/llms.txt - LLM-friendly component reference
  • node_modules/@mariozechner/mini-lit/dist/*.ts - Source files for specific components

Key concepts:

  • Functional Components - Stateless functions that return TemplateResult (Button, Badge, etc.)
  • Custom Elements - Stateful LitElement classes (<theme-toggle>, <markdown-block>, etc.)
  • Reactive State - Use createState() for reactive UI updates
  • Claude Theme - We use the Claude theme from mini-lit

Project Structure

packages/browser-extension/
├── src/
│   ├── app.css                 # Tailwind v4 entry point with Claude theme
│   ├── background.ts           # Service worker for opening side panel
│   ├── sidepanel.html          # Side panel HTML entry point
│   └── sidepanel.ts            # Main side panel app with hot reload
├── scripts/
│   ├── build.mjs               # esbuild bundler configuration
│   └── dev-server.mjs          # WebSocket server for hot reloading
├── manifest.chrome.json        # Chrome/Edge manifest
├── manifest.firefox.json       # Firefox manifest
├── icon-*.png                  # Extension icons
├── dist-chrome/                # Chrome build (git-ignored)
└── dist-firefox/               # Firefox build (git-ignored)

Development Setup

Prerequisites

  1. Install dependencies from monorepo root:

    npm install
    
  2. Build the extension:

    # Build for both browsers
    npm run build -w @mariozechner/pi-reader-extension
    
    # Or build for specific browser
    npm run build:chrome -w @mariozechner/pi-reader-extension
    npm run build:firefox -w @mariozechner/pi-reader-extension
    
  3. Load the extension:

    Chrome/Edge:

    • Open chrome://extensions/ or edge://extensions/
    • Enable "Developer mode"
    • Click "Load unpacked"
    • Select packages/browser-extension/dist-chrome/

    Firefox:

    • Open about:debugging
    • Click "This Firefox"
    • Click "Load Temporary Add-on"
    • Select any file in packages/browser-extension/dist-firefox/

Development Workflow

  1. Start the dev server (from monorepo root):

    # For Chrome development
    npm run dev -w @mariozechner/pi-reader-extension
    
    # For Firefox development
    npm run dev:firefox -w @mariozechner/pi-reader-extension
    

    This runs three processes in parallel:

    • esbuild - Watches and rebuilds TypeScript files
    • Tailwind CSS v4 - Watches and rebuilds styles
    • WebSocket server - Watches the build output directory (dist-chrome/ or dist-firefox/) and triggers extension reload
  2. Automatic reloading:

    • Any change to source files triggers a rebuild
    • The WebSocket server detects changes in the build output directory
    • Side panel connects to ws://localhost:8765
    • Extension auto-reloads via chrome.runtime.reload()
  3. Open the side panel:

    • Click the extension icon in Chrome toolbar
    • Or use Chrome's side panel button (top-right)
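
The client half of the reload loop in step 2 could look roughly like this. It is a sketch, not the contents of live-reload.ts: the `{"type":"reload"}` message format, `connectLiveReload` and `SocketLike` are assumptions, and the socket and reload action are injected so the logic can run outside the extension.

```typescript
// Minimal socket surface so the sketch runs without a browser.
interface SocketLike {
  onmessage: ((event: { data: string }) => void) | null;
}

// Parse a frame defensively; non-JSON frames yield null.
function parseReloadMessage(data: string): { type?: string } | null {
  try {
    return JSON.parse(data) as { type?: string } | null;
  } catch {
    return null;
  }
}

// Assumption: the dev server sends a JSON message {"type":"reload"}.
function connectLiveReload(
  createSocket: (url: string) => SocketLike,
  reload: () => void,
  url: string = "ws://localhost:8765",
): SocketLike {
  const socket = createSocket(url);
  socket.onmessage = (event) => {
    if (parseReloadMessage(event.data)?.type === "reload") reload();
  };
  return socket;
}
```

In the side panel, `createSocket` would be a thin wrapper around a browser WebSocket and `reload` would call chrome.runtime.reload().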

Key Files

src/sidepanel.ts

Main application logic:

  • Extracts page content via chrome.scripting.executeScript
  • Manages chat UI with mini-lit components
  • Handles WebSocket connection for hot reload
  • Makes AI API calls directly (no background worker needed)

src/app.css

Tailwind v4 configuration:

  • Imports Claude theme from mini-lit
  • Uses @source directive to scan mini-lit components
  • Compiled to app.css in the build output directory (dist-chrome/ or dist-firefox/) during build

scripts/build.mjs

Build configuration:

  • Uses esbuild for fast TypeScript bundling
  • Copies static files (HTML, manifest, icons)
  • Supports watch mode for development

scripts/dev-server.mjs

Hot reload server:

  • WebSocket server on port 8765
  • Watches the build output directory (dist-chrome/ or dist-firefox/) for changes
  • Sends reload messages to connected clients

Working with mini-lit Components

Basic Usage

Read ../../mini-lit/llms.txt and ../../mini-lit/README.md in full. If in doubt, find the component in ../../mini-lit/src/ and read its source file in full.

Tailwind Classes

All standard Tailwind utilities work, plus mini-lit's theme variables:

  • bg-background, text-foreground - Theme-aware colors
  • bg-card, border-border - Component backgrounds
  • text-muted-foreground - Secondary text
  • bg-primary, text-primary-foreground - Primary actions

Troubleshooting

Extension doesn't reload automatically

  • Check WebSocket server is running (port 8765)
  • Check console for connection errors
  • Manually reload at chrome://extensions/

Side panel doesn't open

  • Check manifest permissions
  • Ensure background service worker is loaded
  • Try clicking extension icon directly

Styles not updating

  • Ensure Tailwind watcher is running
  • Check src/app.css imports
  • Clear Chrome extension cache

Building for Production

npm run build -w @mariozechner/pi-reader-extension

This creates optimized builds in dist-chrome/ and dist-firefox/ without hot reload code.