diff --git a/CLAUDE.md b/CLAUDE.md index b43ec83..cbc0c18 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -111,6 +111,7 @@ ## Change Tracking +- If the user asks to "push" changes, treat that as permission to commit and push all current workspace changes, not a hand-picked subset, unless the user explicitly scopes the push. - Keep CLI subcommands and HTTP endpoints in sync. - Update `docs/cli.mdx` when CLI behavior changes. - Regenerate `docs/openapi.json` when HTTP contracts change. diff --git a/foundry/CLAUDE.md b/foundry/CLAUDE.md index 027279f..074514f 100644 --- a/foundry/CLAUDE.md +++ b/foundry/CLAUDE.md @@ -1,9 +1,5 @@ # Project Instructions -## Breaking Changes - -Do not preserve legacy compatibility. Implement the best current architecture, even if breaking. - ## Language Policy Use TypeScript for all source code. @@ -44,14 +40,15 @@ Use `pnpm` workspaces and Turborepo. - Tail compose logs: `just foundry-dev-logs` - Stop the preview stack: `just foundry-preview-down` - Tail preview logs: `just foundry-preview-logs` -- Production deploys should go through `git push` to the deployment branch/workflow. Do not use `railway up` for Foundry deploys. ## Railway Logs - Production Foundry Railway logs can be read from a linked workspace with `railway logs --deployment --lines 200`. +- Production deploys should go through `git push` to the deployment branch/workflow. Do not use `railway up` for Foundry deploys. - If Railway logs fail because the workspace is not linked to the correct project/service/environment, run: `railway link --project 33e3e2df-32c5-41c5-a4af-dca8654acb1d --environment cf387142-61fd-4668-8cf7-b3559e0983cb --service 91c7e450-d6d2-481a-b2a4-0a916f4160fc` - That links this directory to the `sandbox-agent` project, `production` environment, and `foundry-api` service. +- Production proxy chain: `api.sandboxagent.dev` routes through Cloudflare → Fastly/Varnish → Railway. 
When debugging request duplication, timeouts, or retry behavior, check headers like `cf-ray`, `x-varnish`, `x-railway-edge`, and `cdn-loop` to identify which layer is involved. ## Frontend + Client Boundary @@ -118,12 +115,18 @@ For all Rivet/RivetKit implementation: - Every actor key must be prefixed with workspace namespace (`["ws", workspaceId, ...]`). - CLI/TUI/GUI must use `@sandbox-agent/foundry-client` (`packages/client`) for backend access; `rivetkit/client` imports are only allowed inside `packages/client`. - Do not add custom backend REST endpoints (no `/v1/*` shim layer). -- Do not build blocking flows that wait on external systems to become ready or complete. Prefer push-based progression driven by actor messages, events, webhooks, or queue/workflow state changes. -- Do not rely on retries for correctness or normal control flow. If a queue/workflow/external dependency is not ready yet, model that explicitly and resume from a push/event, instead of polling or retry loops. - We own the sandbox-agent project; treat sandbox-agent defects as first-party bugs and fix them instead of working around them. - Keep strict single-writer ownership: each table/row has exactly one actor writer. - Parent actors (`workspace`, `project`, `task`, `history`, `sandbox-instance`) use command-only loops with no timeout. - Periodic syncing lives in dedicated child actors with one timeout cadence each. +- Do not build blocking flows that wait on external systems to become ready or complete. Prefer push-based progression driven by actor messages, events, webhooks, or queue/workflow state changes. +- Use workflows/background commands for any repo sync, sandbox provisioning, agent install, branch restack/rebase, or other multi-step external work. Do not keep user-facing actions/requests open while that work runs. +- `send` policy: always `await` the `send(...)` call itself so enqueue failures surface immediately, but default to `wait: false`. 
+- Only use `send(..., { wait: true })` for short, bounded mutations that should finish quickly and do not depend on external readiness, polling actors, provider setup, repo/network I/O, or long-running queue drains. +- Request/action contract: wait only until the minimum resource needed for the client's next step exists. Example: task creation may wait for task actor creation/identity, but not for sandbox provisioning or session bootstrap. +- Read paths must not force refresh/sync work inline. Serve the latest cached projection, mark staleness explicitly, and trigger background refresh separately when needed. +- If a workflow needs to resume after some external work completes, model that as workflow state plus follow-up messages/events instead of holding the original request open. +- Do not rely on retries for correctness or normal control flow. If a queue/workflow/external dependency is not ready yet, model that explicitly and resume from a push/event, instead of polling or retry loops. - Actor handle policy: - Prefer explicit `get` or explicit `create` based on workflow intent; do not default to `getOrCreate`. - Use `get`/`getForId` when the actor is expected to already exist; if missing, surface an explicit `Actor not found` error with recovery context. 
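A sketch of the `send` policy added in the CLAUDE.md bullets above. This is an illustration only: the `ActorHandle` shape and the `send(...)` signature are hypothetical stand-ins, not the real `@sandbox-agent/foundry-client` or `rivetkit` API.

```typescript
// Hypothetical stand-ins for illustration; the real client API differs.
type SendOptions = { wait: boolean };

interface ActorHandle {
  send(message: string, payload: unknown, options: SendOptions): Promise<void>;
}

// Default: await the send() itself so enqueue failures surface immediately,
// but do not wait for the handler to run (wait: false).
export async function enqueueRepoSync(task: ActorHandle): Promise<void> {
  await task.send("syncRepo", {}, { wait: false });
}

// Exception: a short, bounded mutation with no external readiness dependency
// may use wait: true so the client's next step sees the result.
export async function renameTask(task: ActorHandle, title: string): Promise<void> {
  await task.send("setTitle", { title }, { wait: true });
}
```

In the `wait: false` case the `await` only guards the enqueue; long-running work (provisioning, repo sync) continues in workflows and reports back via events rather than holding the request open.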
diff --git a/foundry/packages/backend/src/actors/_scripts/generate-actor-migrations.ts b/foundry/packages/backend/src/actors/_scripts/generate-actor-migrations.ts index a9e380d..6b74dd3 100644 --- a/foundry/packages/backend/src/actors/_scripts/generate-actor-migrations.ts +++ b/foundry/packages/backend/src/actors/_scripts/generate-actor-migrations.ts @@ -1,5 +1,6 @@ import { mkdir, readdir, readFile, rm, writeFile } from "node:fs/promises"; import { dirname, join, resolve } from "node:path"; +import { createErrorContext, createFoundryLogger } from "@sandbox-agent/foundry-shared"; type Journal = { entries?: Array<{ @@ -11,6 +12,10 @@ type Journal = { }>; }; +const logger = createFoundryLogger({ + service: "foundry-backend-migrations", +}); + function padMigrationKey(idx: number): string { return `m${String(idx).padStart(4, "0")}`; } @@ -128,8 +133,6 @@ async function main(): Promise<void> { } main().catch((error: unknown) => { - const message = error instanceof Error ? (error.stack ?? error.message) : String(error); - // eslint-disable-next-line no-console - console.error(message); + logger.error(createErrorContext(error), "generate_actor_migrations_failed"); process.exitCode = 1; }); diff --git a/foundry/packages/backend/src/logging.ts b/foundry/packages/backend/src/logging.ts index de16fd5..0bf5170 100644 --- a/foundry/packages/backend/src/logging.ts +++ b/foundry/packages/backend/src/logging.ts @@ -1,11 +1,5 @@ -import { pino } from "pino"; +import { createFoundryLogger } from "@sandbox-agent/foundry-shared"; -const level = process.env.FOUNDRY_LOG_LEVEL ?? process.env.LOG_LEVEL ?? process.env.RIVET_LOG_LEVEL ??
"info"; - -export const logger = pino({ - level, - base: { - service: "foundry-backend", - }, - timestamp: pino.stdTimeFunctions.isoTime, +export const logger = createFoundryLogger({ + service: "foundry-backend", }); diff --git a/foundry/packages/backend/src/services/app-github.ts b/foundry/packages/backend/src/services/app-github.ts index 058bb05..b5b6706 100644 --- a/foundry/packages/backend/src/services/app-github.ts +++ b/foundry/packages/backend/src/services/app-github.ts @@ -1,4 +1,5 @@ import { createHmac, createPrivateKey, createSign, timingSafeEqual } from "node:crypto"; +import { logger } from "../logging.js"; export class GitHubAppError extends Error { readonly status: number; @@ -51,6 +52,10 @@ interface GitHubPageResponse { nextUrl: string | null; } +const githubOAuthLogger = logger.child({ + scope: "github-oauth", +}); + export interface GitHubWebhookEvent { action?: string; installation?: { id: number; account?: { login?: string; type?: string; id?: number } | null }; @@ -167,13 +172,16 @@ export class GitHubAppClient { code, redirect_uri: this.redirectUri, }; - console.log("[github-oauth] exchangeCode request", { - url: `${this.authBaseUrl}/login/oauth/access_token`, - client_id: this.clientId, - redirect_uri: this.redirectUri, - code_length: code.length, - code_prefix: code.slice(0, 6), - }); + githubOAuthLogger.debug( + { + url: `${this.authBaseUrl}/login/oauth/access_token`, + clientId: this.clientId, + redirectUri: this.redirectUri, + codeLength: code.length, + codePrefix: code.slice(0, 6), + }, + "exchange_code_request", + ); const response = await fetch(`${this.authBaseUrl}/login/oauth/access_token`, { method: "POST", @@ -185,10 +193,13 @@ export class GitHubAppClient { }); const responseText = await response.text(); - console.log("[github-oauth] exchangeCode response", { - status: response.status, - body: responseText.slice(0, 300), - }); + githubOAuthLogger.debug( + { + status: response.status, + bodyPreview: responseText.slice(0, 300), + }, 
+ "exchange_code_response", + ); let payload: GitHubTokenResponse; try { payload = JSON.parse(responseText) as GitHubTokenResponse; diff --git a/foundry/packages/cli/src/backend/manager.ts b/foundry/packages/cli/src/backend/manager.ts index d05d937..c120d47 100644 --- a/foundry/packages/cli/src/backend/manager.ts +++ b/foundry/packages/cli/src/backend/manager.ts @@ -6,6 +6,7 @@ import { fileURLToPath } from "node:url"; import { checkBackendHealth } from "@sandbox-agent/foundry-client"; import type { AppConfig } from "@sandbox-agent/foundry-shared"; import { CLI_BUILD_ID } from "../build-id.js"; +import { logger } from "../logging.js"; const HEALTH_TIMEOUT_MS = 1_500; const START_TIMEOUT_MS = 30_000; @@ -237,7 +238,17 @@ async function startBackend(host: string, port: number): Promise { }); child.on("error", (error) => { - console.error(`failed to launch backend: ${String(error)}`); + logger.error( + { + host, + port, + command: launch.command, + args: launch.args, + errorMessage: error instanceof Error ? error.message : String(error), + errorStack: error instanceof Error ? 
error.stack : undefined, + }, + "failed_to_launch_backend", + ); }); child.unref(); diff --git a/foundry/packages/cli/src/index.ts b/foundry/packages/cli/src/index.ts index a495ea8..3e77291 100644 --- a/foundry/packages/cli/src/index.ts +++ b/foundry/packages/cli/src/index.ts @@ -5,6 +5,7 @@ import { homedir } from "node:os"; import { AgentTypeSchema, CreateTaskInputSchema, type TaskRecord } from "@sandbox-agent/foundry-shared"; import { readBackendMetadata, createBackendClientFromConfig, formatRelativeAge, groupTaskStatus, summarizeTasks } from "@sandbox-agent/foundry-client"; import { ensureBackendRunning, getBackendStatus, parseBackendPort, stopBackend } from "./backend/manager.js"; +import { writeStderr, writeStdout } from "./io.js"; import { openEditorForTask } from "./task-editor.js"; import { spawnCreateTmuxWindow } from "./tmux.js"; import { loadConfig, resolveWorkspace, saveConfig } from "./workspace/config.js"; @@ -87,7 +88,7 @@ function positionals(args: string[]): string[] { } function printUsage(): void { - console.log(` + writeStdout(` Usage: hf backend start [--host HOST] [--port PORT] hf backend stop [--host HOST] [--port PORT] @@ -120,7 +121,7 @@ Tips: } function printStatusUsage(): void { - console.log(` + writeStdout(` Usage: hf status [--workspace WS] [--json] @@ -146,7 +147,7 @@ JSON Output: } function printHistoryUsage(): void { - console.log(` + writeStdout(` Usage: hf history [--workspace WS] [--limit N] [--branch NAME] [--task ID] [--json] @@ -195,13 +196,13 @@ async function handleBackend(args: string[]): Promise { const pid = status.pid ?? "unknown"; const version = status.version ?? "unknown"; const stale = status.running && !status.versionCurrent ? 
" [outdated]" : ""; - console.log(`running=true pid=${pid} version=${version}${stale} log=${status.logPath}`); + writeStdout(`running=true pid=${pid} version=${version}${stale} log=${status.logPath}`); return; } if (sub === "stop") { await stopBackend(host, port); - console.log(`running=false host=${host} port=${port}`); + writeStdout(`running=false host=${host} port=${port}`); return; } @@ -210,7 +211,7 @@ async function handleBackend(args: string[]): Promise { const pid = status.pid ?? "unknown"; const version = status.version ?? "unknown"; const stale = status.running && !status.versionCurrent ? " [outdated]" : ""; - console.log(`running=${status.running} pid=${pid} version=${version}${stale} host=${host} port=${port} log=${status.logPath}`); + writeStdout(`running=${status.running} pid=${pid} version=${version}${stale} host=${host} port=${port} log=${status.logPath}`); return; } @@ -224,7 +225,7 @@ async function handleBackend(args: string[]): Promise { const inspectorUrl = `https://inspect.rivet.dev?u=${encodeURIComponent(managerEndpoint)}`; const openCmd = process.platform === "darwin" ? "open" : "xdg-open"; spawnSync(openCmd, [inspectorUrl], { stdio: "ignore" }); - console.log(inspectorUrl); + writeStdout(inspectorUrl); return; } @@ -253,7 +254,7 @@ async function handleWorkspace(args: string[]): Promise { // Backend may not be running yet. Config is already updated. } - console.log(`workspace=${name}`); + writeStdout(`workspace=${name}`); } async function handleList(args: string[]): Promise { @@ -265,12 +266,12 @@ async function handleList(args: string[]): Promise { const rows = await client.listTasks(workspaceId); if (format === "json") { - console.log(JSON.stringify(rows, null, 2)); + writeStdout(JSON.stringify(rows, null, 2)); return; } if (rows.length === 0) { - console.log("no tasks"); + writeStdout("no tasks"); return; } @@ -281,7 +282,7 @@ async function handleList(args: string[]): Promise { const task = row.task.length > 60 ? 
`${row.task.slice(0, 57)}...` : row.task; line += `\t${row.title}\t${task}\t${row.activeSessionId ?? "-"}\t${row.activeSandboxId ?? "-"}`; } - console.log(line); + writeStdout(line); } } @@ -294,7 +295,7 @@ async function handlePush(args: string[]): Promise { const workspaceId = resolveWorkspace(readOption(args, "--workspace"), config); const client = createBackendClientFromConfig(config); await client.runAction(workspaceId, taskId, "push"); - console.log("ok"); + writeStdout("ok"); } async function handleSync(args: string[]): Promise { @@ -306,7 +307,7 @@ async function handleSync(args: string[]): Promise { const workspaceId = resolveWorkspace(readOption(args, "--workspace"), config); const client = createBackendClientFromConfig(config); await client.runAction(workspaceId, taskId, "sync"); - console.log("ok"); + writeStdout("ok"); } async function handleKill(args: string[]): Promise { @@ -320,15 +321,15 @@ async function handleKill(args: string[]): Promise { const abandon = hasFlag(args, "--abandon"); if (deleteBranch) { - console.log("info: --delete-branch flag set, branch will be deleted after kill"); + writeStdout("info: --delete-branch flag set, branch will be deleted after kill"); } if (abandon) { - console.log("info: --abandon flag set, Graphite abandon will be attempted"); + writeStdout("info: --abandon flag set, Graphite abandon will be attempted"); } const client = createBackendClientFromConfig(config); await client.runAction(workspaceId, taskId, "kill"); - console.log("ok"); + writeStdout("ok"); } async function handlePrune(args: string[]): Promise { @@ -341,26 +342,26 @@ async function handlePrune(args: string[]): Promise { const prunable = rows.filter((r) => r.status === "archived" || r.status === "killed"); if (prunable.length === 0) { - console.log("nothing to prune"); + writeStdout("nothing to prune"); return; } for (const row of prunable) { const age = formatRelativeAge(row.updatedAt); - console.log(`${dryRun ? 
"[dry-run] " : ""}${row.taskId}\t${row.branchName}\t${row.status}\t${age}`); + writeStdout(`${dryRun ? "[dry-run] " : ""}${row.taskId}\t${row.branchName}\t${row.status}\t${age}`); } if (dryRun) { - console.log(`\n${prunable.length} task(s) would be pruned`); + writeStdout(`\n${prunable.length} task(s) would be pruned`); return; } if (!yes) { - console.log("\nnot yet implemented: auto-pruning requires confirmation"); + writeStdout("\nnot yet implemented: auto-pruning requires confirmation"); return; } - console.log(`\n${prunable.length} task(s) would be pruned (pruning not yet implemented)`); + writeStdout(`\n${prunable.length} task(s) would be pruned (pruning not yet implemented)`); } async function handleStatusline(args: string[]): Promise { @@ -375,11 +376,11 @@ async function handleStatusline(args: string[]): Promise { const errorCount = summary.byStatus.error; if (format === "claude-code") { - console.log(`hf:${running}R/${idle}I/${errorCount}E`); + writeStdout(`hf:${running}R/${idle}I/${errorCount}E`); return; } - console.log(`running=${running} idle=${idle} error=${errorCount}`); + writeStdout(`running=${running} idle=${idle} error=${errorCount}`); } async function handleDb(args: string[]): Promise { @@ -387,12 +388,12 @@ async function handleDb(args: string[]): Promise { if (sub === "path") { const config = loadConfig(); const dbPath = config.backend.dbPath.replace(/^~/, homedir()); - console.log(dbPath); + writeStdout(dbPath); return; } if (sub === "nuke") { - console.log("WARNING: hf db nuke would delete the entire database. This is a placeholder and does not delete anything."); + writeStdout("WARNING: hf db nuke would delete the entire database. 
This is a placeholder and does not delete anything."); return; } @@ -465,12 +466,12 @@ async function handleCreate(args: string[]): Promise { const switched = await client.switchTask(workspaceId, task.taskId); const attached = await client.attachTask(workspaceId, task.taskId); - console.log(`Branch: ${task.branchName ?? "-"}`); - console.log(`Task: ${task.taskId}`); - console.log(`Provider: ${task.providerId}`); - console.log(`Session: ${attached.sessionId ?? "none"}`); - console.log(`Target: ${switched.switchTarget || attached.target}`); - console.log(`Title: ${task.title ?? "-"}`); + writeStdout(`Branch: ${task.branchName ?? "-"}`); + writeStdout(`Task: ${task.taskId}`); + writeStdout(`Provider: ${task.providerId}`); + writeStdout(`Session: ${attached.sessionId ?? "none"}`); + writeStdout(`Target: ${switched.switchTarget || attached.target}`); + writeStdout(`Title: ${task.title ?? "-"}`); const tmuxResult = spawnCreateTmuxWindow({ branchName: task.branchName ?? task.taskId, @@ -479,14 +480,14 @@ async function handleCreate(args: string[]): Promise { }); if (tmuxResult.created) { - console.log(`Window: created (${task.branchName})`); + writeStdout(`Window: created (${task.branchName})`); return; } - console.log(""); - console.log(`Run: hf switch ${task.taskId}`); + writeStdout(""); + writeStdout(`Run: hf switch ${task.taskId}`); if ((switched.switchTarget || attached.target).startsWith("/")) { - console.log(`cd ${switched.switchTarget || attached.target}`); + writeStdout(`cd ${switched.switchTarget || attached.target}`); } } @@ -510,7 +511,7 @@ async function handleStatus(args: string[]): Promise { const summary = summarizeTasks(rows); if (hasFlag(args, "--json")) { - console.log( + writeStdout( JSON.stringify( { workspaceId, @@ -528,16 +529,16 @@ async function handleStatus(args: string[]): Promise { return; } - console.log(`workspace=${workspaceId}`); - console.log(`backend running=${backendStatus.running} pid=${backendStatus.pid ?? 
"unknown"} version=${backendStatus.version ?? "unknown"}`); - console.log(`tasks total=${summary.total}`); - console.log( + writeStdout(`workspace=${workspaceId}`); + writeStdout(`backend running=${backendStatus.running} pid=${backendStatus.pid ?? "unknown"} version=${backendStatus.version ?? "unknown"}`); + writeStdout(`tasks total=${summary.total}`); + writeStdout( `status queued=${summary.byStatus.queued} running=${summary.byStatus.running} idle=${summary.byStatus.idle} archived=${summary.byStatus.archived} killed=${summary.byStatus.killed} error=${summary.byStatus.error}`, ); const providerSummary = Object.entries(summary.byProvider) .map(([provider, count]) => `${provider}=${count}`) .join(" "); - console.log(`providers ${providerSummary || "-"}`); + writeStdout(`providers ${providerSummary || "-"}`); } async function handleHistory(args: string[]): Promise { @@ -560,12 +561,12 @@ async function handleHistory(args: string[]): Promise { }); if (hasFlag(args, "--json")) { - console.log(JSON.stringify(rows, null, 2)); + writeStdout(JSON.stringify(rows, null, 2)); return; } if (rows.length === 0) { - console.log("no events"); + writeStdout("no events"); return; } @@ -576,7 +577,7 @@ async function handleHistory(args: string[]): Promise { if (payload.length > 120) { payload = `${payload.slice(0, 117)}...`; } - console.log(`${ts}\t${row.kind}\t${target}\t${payload}`); + writeStdout(`${ts}\t${row.kind}\t${target}\t${payload}`); } } @@ -611,19 +612,19 @@ async function handleSwitchLike(cmd: string, args: string[]): Promise { if (cmd === "switch") { const result = await client.switchTask(workspaceId, taskId); - console.log(`cd ${result.switchTarget}`); + writeStdout(`cd ${result.switchTarget}`); return; } if (cmd === "attach") { const result = await client.attachTask(workspaceId, taskId); - console.log(`target=${result.target} session=${result.sessionId ?? "none"}`); + writeStdout(`target=${result.target} session=${result.sessionId ?? 
"none"}`); return; } if (cmd === "merge" || cmd === "archive") { await client.runAction(workspaceId, taskId, cmd); - console.log("ok"); + writeStdout("ok"); return; } @@ -726,6 +727,6 @@ async function main(): Promise { main().catch((err: unknown) => { const msg = err instanceof Error ? (err.stack ?? err.message) : String(err); - console.error(msg); + writeStderr(msg); process.exit(1); }); diff --git a/foundry/packages/cli/src/io.ts b/foundry/packages/cli/src/io.ts new file mode 100644 index 0000000..b188206 --- /dev/null +++ b/foundry/packages/cli/src/io.ts @@ -0,0 +1,7 @@ +export function writeStdout(message = ""): void { + process.stdout.write(`${message}\n`); +} + +export function writeStderr(message = ""): void { + process.stderr.write(`${message}\n`); +} diff --git a/foundry/packages/cli/src/logging.ts b/foundry/packages/cli/src/logging.ts new file mode 100644 index 0000000..a7c5892 --- /dev/null +++ b/foundry/packages/cli/src/logging.ts @@ -0,0 +1,5 @@ +import { createFoundryLogger } from "@sandbox-agent/foundry-shared"; + +export const logger = createFoundryLogger({ + service: "foundry-cli", +}); diff --git a/foundry/packages/cli/src/tui.ts b/foundry/packages/cli/src/tui.ts index d19a569..d561565 100644 --- a/foundry/packages/cli/src/tui.ts +++ b/foundry/packages/cli/src/tui.ts @@ -2,6 +2,7 @@ import type { AppConfig, TaskRecord } from "@sandbox-agent/foundry-shared"; import { spawnSync } from "node:child_process"; import { createBackendClientFromConfig, filterTasks, formatRelativeAge, groupTaskStatus } from "@sandbox-agent/foundry-client"; import { CLI_BUILD_ID } from "./build-id.js"; +import { writeStdout } from "./io.js"; import { resolveTuiTheme, type TuiTheme } from "./theme.js"; interface KeyEventLike { @@ -412,7 +413,7 @@ export async function runTui(config: AppConfig, workspaceId: string): Promise { const snapshotMetrics = await measureWorkbenchSnapshot(client, workspaceId, 3); snapshotSeries.push(snapshotMetrics); - console.info( - 
"[workbench-load-snapshot]", - JSON.stringify({ + logger.info( + { taskIndex: taskIndex + 1, ...snapshotMetrics, - }), + }, + "workbench_load_snapshot", ); } @@ -296,7 +309,7 @@ describe("e2e(client): workbench load", () => { snapshotTranscriptFinalCount: lastSnapshot.transcriptEventCount, }; - console.info("[workbench-load-summary]", JSON.stringify(summary)); + logger.info(summary, "workbench_load_summary"); expect(createTaskLatencies.length).toBe(taskCount); expect(provisionLatencies.length).toBe(taskCount); diff --git a/foundry/packages/desktop/package.json b/foundry/packages/desktop/package.json index 825d62d..fee1a61 100644 --- a/foundry/packages/desktop/package.json +++ b/foundry/packages/desktop/package.json @@ -16,6 +16,7 @@ "tsx": "^4" }, "dependencies": { + "@sandbox-agent/foundry-shared": "workspace:*", "@tauri-apps/api": "^2", "@tauri-apps/plugin-shell": "^2" } diff --git a/foundry/packages/desktop/scripts/build-frontend.ts b/foundry/packages/desktop/scripts/build-frontend.ts index 9b9117c..742231e 100644 --- a/foundry/packages/desktop/scripts/build-frontend.ts +++ b/foundry/packages/desktop/scripts/build-frontend.ts @@ -2,15 +2,22 @@ import { execSync } from "node:child_process"; import { cpSync, readFileSync, writeFileSync, rmSync, existsSync } from "node:fs"; import { resolve, dirname } from "node:path"; import { fileURLToPath } from "node:url"; +import { createFoundryLogger } from "@sandbox-agent/foundry-shared"; const __dirname = dirname(fileURLToPath(import.meta.url)); const desktopRoot = resolve(__dirname, ".."); const repoRoot = resolve(desktopRoot, "../../.."); const frontendDist = resolve(desktopRoot, "../frontend/dist"); const destDir = resolve(desktopRoot, "frontend-dist"); +const logger = createFoundryLogger({ + service: "foundry-desktop-build", + bindings: { + script: "build-frontend", + }, +}); function run(cmd: string, opts?: { cwd?: string; env?: NodeJS.ProcessEnv }) { - console.log(`> ${cmd}`); + logger.info({ command: cmd, cwd: 
opts?.cwd ?? repoRoot }, "run_command"); execSync(cmd, { stdio: "inherit", cwd: opts?.cwd ?? repoRoot, @@ -19,7 +26,7 @@ } // Step 1: Build the frontend with the desktop-specific backend endpoint -console.log("\n=== Building frontend for desktop ===\n"); +logger.info("building_frontend"); run("pnpm --filter @sandbox-agent/foundry-frontend build", { env: { VITE_HF_BACKEND_ENDPOINT: "http://127.0.0.1:7741/v1/rivet", }, }); // Step 2: Copy dist to frontend-dist/ -console.log("\n=== Copying frontend build output ===\n"); +logger.info({ frontendDist, destDir }, "copying_frontend_dist"); if (existsSync(destDir)) { rmSync(destDir, { recursive: true }); } @@ -39,4 +46,4 @@ let html = readFileSync(indexPath, "utf-8"); html = html.replace(/<script[^>]*><\/script>\s*/g, ""); writeFileSync(indexPath, html); -console.log("\n=== Frontend build complete ===\n"); +logger.info({ indexPath }, "frontend_build_complete"); diff --git a/foundry/packages/desktop/scripts/build-sidecar.ts b/foundry/packages/desktop/scripts/build-sidecar.ts index 5ec8350..58ef4b0 100644 --- a/foundry/packages/desktop/scripts/build-sidecar.ts +++ b/foundry/packages/desktop/scripts/build-sidecar.ts @@ -2,10 +2,17 @@ import { execSync } from "node:child_process"; import { mkdirSync, existsSync } from "node:fs"; import { resolve, dirname } from "node:path"; import { fileURLToPath } from "node:url"; +import { createFoundryLogger } from "@sandbox-agent/foundry-shared"; const __dirname = dirname(fileURLToPath(import.meta.url)); const desktopRoot = resolve(__dirname, ".."); const sidecarDir = resolve(desktopRoot, "src-tauri/sidecars"); +const logger = createFoundryLogger({ + service: "foundry-desktop-build", + bindings: { + script: "build-sidecar", + }, +}); const isDev = process.argv.includes("--dev"); @@ -35,7 +42,7 @@ const targets: Array<{ bunTarget: string;
tripleTarget: string }> = isDev ]; function run(cmd: string, opts?: { cwd?: string; env?: NodeJS.ProcessEnv }) { - console.log(`> ${cmd}`); + logger.info({ command: cmd, cwd: opts?.cwd ?? desktopRoot }, "run_command"); execSync(cmd, { stdio: "inherit", cwd: opts?.cwd ?? desktopRoot, @@ -44,7 +51,7 @@ function run(cmd: string, opts?: { cwd?: string; env?: NodeJS.ProcessEnv }) { } // Step 1: Build the backend with tsup -console.log("\n=== Building backend with tsup ===\n"); +logger.info("building_backend"); run("pnpm --filter @sandbox-agent/foundry-backend build", { cwd: resolve(desktopRoot, "../../.."), }); @@ -55,14 +62,14 @@ mkdirSync(sidecarDir, { recursive: true }); const backendEntry = resolve(desktopRoot, "../backend/dist/index.js"); if (!existsSync(backendEntry)) { - console.error(`Backend build output not found at ${backendEntry}`); + logger.error({ backendEntry }, "backend_build_output_not_found"); process.exit(1); } for (const { bunTarget, tripleTarget } of targets) { const outfile = resolve(sidecarDir, `foundry-backend-${tripleTarget}`); - console.log(`\n=== Compiling sidecar for ${tripleTarget} ===\n`); + logger.info({ bunTarget, tripleTarget, outfile }, "compiling_sidecar"); run(`bun build --compile --target ${bunTarget} ${backendEntry} --outfile ${outfile}`); } -console.log("\n=== Sidecar build complete ===\n"); +logger.info({ targets: targets.map((target) => target.tripleTarget) }, "sidecar_build_complete"); diff --git a/foundry/packages/frontend/src/components/mock-layout.tsx b/foundry/packages/frontend/src/components/mock-layout.tsx index 213109e..baab797 100644 --- a/foundry/packages/frontend/src/components/mock-layout.tsx +++ b/foundry/packages/frontend/src/components/mock-layout.tsx @@ -1,9 +1,11 @@ import { memo, useCallback, useEffect, useLayoutEffect, useMemo, useRef, useState, useSyncExternalStore, type PointerEvent as ReactPointerEvent } from "react"; import { useNavigate } from "@tanstack/react-router"; import { useStyletron } from 
"baseui"; +import { createErrorContext } from "@sandbox-agent/foundry-shared"; import { PanelLeft, PanelRight } from "lucide-react"; import { useFoundryTokens } from "../app/theme"; +import { logger } from "../logging.js"; import { DiffContent } from "./mock-layout/diff-content"; import { MessageList } from "./mock-layout/message-list"; @@ -437,7 +439,13 @@ const TranscriptPanel = memo(function TranscriptPanel({ await window.navigator.clipboard.writeText(message.text); setCopiedMessageId(message.id); } catch (error) { - console.error("Failed to copy transcript message", error); + logger.error( + { + messageId: message.id, + ...createErrorContext(error), + }, + "failed_to_copy_transcript_message", + ); } }, []); @@ -1108,7 +1116,13 @@ export function MockLayout({ workspaceId, selectedTaskId, selectedSessionId }: M const { tabId } = await taskWorkbenchClient.addTab({ taskId: activeTask.id }); syncRouteSession(activeTask.id, tabId, true); } catch (error) { - console.error("failed to auto-create workbench session", error); + logger.error( + { + taskId: activeTask.id, + ...createErrorContext(error), + }, + "failed_to_auto_create_workbench_session", + ); } finally { autoCreatingSessionForTaskRef.current.delete(activeTask.id); } diff --git a/foundry/packages/frontend/src/components/mock-layout/right-sidebar.tsx b/foundry/packages/frontend/src/components/mock-layout/right-sidebar.tsx index d3c67f3..4adddc4 100644 --- a/foundry/packages/frontend/src/components/mock-layout/right-sidebar.tsx +++ b/foundry/packages/frontend/src/components/mock-layout/right-sidebar.tsx @@ -4,6 +4,8 @@ import { LabelSmall } from "baseui/typography"; import { Archive, ArrowUpFromLine, ChevronRight, FileCode, FilePlus, FileX, FolderOpen, GitPullRequest, PanelRight } from "lucide-react"; import { useFoundryTokens } from "../../app/theme"; +import { createErrorContext } from "@sandbox-agent/foundry-shared"; +import { logger } from "../../logging.js"; import { type ContextMenuItem, 
ContextMenuOverlay, PanelHeaderBar, SPanel, ScrollBody, useContextMenu } from "./ui"; import { type FileTreeNode, type Task, diffTabId } from "./view-model"; @@ -131,7 +133,13 @@ export const RightSidebar = memo(function RightSidebar({ await window.navigator.clipboard.writeText(path); } catch (error) { - console.error("Failed to copy file path", error); + logger.error( + { + path, + ...createErrorContext(error), + }, + "failed_to_copy_file_path", + ); } }, []); diff --git a/foundry/packages/frontend/src/logging.ts b/foundry/packages/frontend/src/logging.ts new file mode 100644 index 0000000..08b15c6 --- /dev/null +++ b/foundry/packages/frontend/src/logging.ts @@ -0,0 +1,5 @@ +import { createFoundryLogger } from "@sandbox-agent/foundry-shared"; + +export const logger = createFoundryLogger({ + service: "foundry-frontend", +}); diff --git a/foundry/packages/shared/package.json b/foundry/packages/shared/package.json index 1b4f3ee..2dfb8c0 100644 --- a/foundry/packages/shared/package.json +++ b/foundry/packages/shared/package.json @@ -11,6 +11,7 @@ "test": "vitest run" }, "dependencies": { + "pino": "^10.3.1", "zod": "^4.1.5" }, "devDependencies": { diff --git a/foundry/packages/shared/src/index.ts b/foundry/packages/shared/src/index.ts index a8002e0..d3f4b93 100644 --- a/foundry/packages/shared/src/index.ts +++ b/foundry/packages/shared/src/index.ts @@ -1,5 +1,6 @@ export * from "./app-shell.js"; export * from "./contracts.js"; export * from "./config.js"; +export * from "./logging.js"; export * from "./workbench.js"; export * from "./workspace.js"; diff --git a/foundry/packages/shared/src/logging.ts b/foundry/packages/shared/src/logging.ts new file mode 100644 index 0000000..8d9ceb7 --- /dev/null +++ b/foundry/packages/shared/src/logging.ts @@ -0,0 +1,63 @@ +import { pino, type Logger, type LoggerOptions } from "pino"; + +export interface FoundryLoggerOptions { + service: string; + bindings?: Record<string, unknown>; + level?: string; +} + +type ProcessLike = { + env?: Record<string, string | undefined>; +}; +
+function resolveEnvVar(name: string): string | undefined { + const value = (globalThis as { process?: ProcessLike }).process?.env?.[name]; + if (typeof value !== "string") { + return undefined; + } + + const trimmed = value.trim(); + return trimmed.length > 0 ? trimmed : undefined; +} + +function defaultLevel(): string { + return resolveEnvVar("FOUNDRY_LOG_LEVEL") ?? resolveEnvVar("LOG_LEVEL") ?? resolveEnvVar("RIVET_LOG_LEVEL") ?? "info"; +} + +function isBrowserRuntime(): boolean { + return typeof window !== "undefined" && typeof document !== "undefined"; +} + +export function createFoundryLogger(options: FoundryLoggerOptions): Logger { + const browser = isBrowserRuntime(); + const loggerOptions: LoggerOptions = { + level: options.level ?? defaultLevel(), + base: { + service: options.service, + ...(options.bindings ?? {}), + }, + }; + + if (browser) { + loggerOptions.browser = { + asObject: true, + }; + } else { + loggerOptions.timestamp = pino.stdTimeFunctions.isoTime; + } + + return pino(loggerOptions); +} + +export function createErrorContext(error: unknown): { errorMessage: string; errorStack?: string } { + if (error instanceof Error) { + return { + errorMessage: error.message, + errorStack: error.stack, + }; + } + + return { + errorMessage: String(error), + }; +} diff --git a/foundry/research/friction/rivet.mdx b/foundry/research/friction/rivet.mdx index 878cc26..c9cb8eb 100644 --- a/foundry/research/friction/rivet.mdx +++ b/foundry/research/friction/rivet.mdx @@ -1,5 +1,35 @@ # Rivet Friction Log +## 2026-03-12 - 63df393 + +### What I Was Working On + +Resolving GitHub OAuth callback failures caused by stale actor state after squashing Drizzle migrations. + +### Friction / Issue + +1. 
**Squashing Drizzle migrations breaks existing actors on Rivet Cloud.** When Drizzle migrations are squashed into a new baseline (`0000_*.sql`), the squashed migration has a different hash/name than the original migrations tracked in each actor's `__drizzle_migrations` journal table. On next wake, Drizzle sees the squashed baseline as a "new" migration and attempts to re-run `CREATE TABLE` statements, which fail because the tables already exist. This silently poisons the actor — RivetKit wraps the migration error as a generic "Internal error" on the action response, making root-cause diagnosis difficult. + +2. **No programmatic way to list or destroy actors on Rivet Cloud without the service key.** The public runner token (`pk_*`) lacks permissions for actor management (list/destroy). The Cloud API token (`cloud_api_*`) in our `.env` was returning "token not found". The actual working token format is the service key (`sk_*`) from the namespace connection URL. This was not documented — the destroy docs reference "admin tokens" which are described as "currently not supported on Rivet Cloud" ([#3530](https://github.com/rivet-dev/rivet/issues/3530)), but the `sk_*` token works. The disconnect between the docs and reality cost significant debugging time. + +3. **Actor errors during `getOrCreate` are opaque.** When the `workspace.completeAppGithubAuth` action triggered `getOrCreate` for org workspace actors, the migration failure inside the newly-woken actor was surfaced as `"Internal error"` with no indication that it was a migration/schema issue. The actual error (`table already exists`) was only visible in actor-level logs, not in the action response or the calling backend's logs. + +### Attempted Fix / Workaround + +1. Initially tried adding `IF NOT EXISTS` to all `CREATE TABLE`/`CREATE UNIQUE INDEX` statements in the squashed baseline migrations. This masked the symptom but violated Drizzle's migration tracking contract — the journal would still be inconsistent. 
+ +2. Reverted the `IF NOT EXISTS` hack and instead destroyed all stale actors via the Rivet Cloud API (`DELETE /actors/{actorId}?namespace={ns}` with the `sk_*` service key). Fresh actors get a clean migration journal matching the squashed baseline. + +### Outcome + +- All 4 stale workspace actors destroyed (3 org workspaces + 1 old v2-prefixed app workspace). +- Reverted `IF NOT EXISTS` migration changes so Drizzle migrations remain standard. +- After redeploy, new actors will be created fresh with the correct squashed migration journal. +- **RivetKit improvement opportunities:** + - Surface migration errors in action responses instead of generic "Internal error". + - Document the `sk_*` service key as the correct token for actor management API calls, or make `cloud_api_*` tokens work. + - Consider a migration reconciliation mode for Drizzle actors that detects "tables exist but journal doesn't match" and adopts the current schema state instead of failing. + ## 2026-02-18 - uncommitted ### What I Was Working On diff --git a/foundry/research/specs/async-action-fixes/01-task-creation-bootstrap-only.md b/foundry/research/specs/async-action-fixes/01-task-creation-bootstrap-only.md new file mode 100644 index 0000000..c8de3ba --- /dev/null +++ b/foundry/research/specs/async-action-fixes/01-task-creation-bootstrap-only.md @@ -0,0 +1,51 @@ +# Task Creation Should Return After Actor Bootstrap + +## Problem + +Task creation currently waits for full provisioning: naming, repo checks, sandbox creation/resume, sandbox-agent install/start, sandbox-instance wiring, and session creation. + +That makes a user-facing action depend on queue-backed and provider-backed work that can take minutes. The client only needs the task actor to exist so it can navigate to the task and observe progress. + +## Target Contract + +- `createTask` returns once the task actor exists and initial task metadata is persisted. 
+- The response includes the task identity the client needs for follow-up reads and subscriptions. +- Provisioning continues in the background through the task workflow. +- Progress and failure are surfaced through task state, history events, and workbench updates. + +## Proposed Fix + +1. Restore the async split between `initialize` and `provision`. +2. Keep `task.command.initialize` responsible for: + - creating the task actor + - bootstrapping DB rows + - persisting any immediately-known metadata + - returning the current task record +3. After initialize completes, enqueue `task.command.provision` with `wait: false`. +4. Change `workspace.createTask` to: + - create or resolve the project + - create the task actor + - call `task.initialize(...)` + - stop awaiting `task.provision(...)` + - broadcast a workbench/task update + - return the task record immediately +5. Persist a clear queued/running state for provisioning so the frontend can distinguish: + - `init_enqueue_provision` + - `init_ensure_name` + - `init_create_sandbox` + - `init_ensure_agent` + - `init_create_session` + - `running` + - `error` + +## Client Impact + +- Task creation UI should navigate immediately to the task page. +- The page should render a provisioning state from task status instead of treating create as an all-or-nothing spinner. +- Any tab/session creation that depends on provisioning should observe task state and wait for readiness asynchronously. + +## Acceptance Criteria + +- Creating a task never waits on sandbox creation or session creation. +- A timeout in provider setup does not make the original create request fail after several minutes. +- After a backend restart, the task workflow can resume provisioning from durable state without requiring the client to retry create. 
diff --git a/foundry/research/specs/async-action-fixes/02-repo-overview-from-cached-projection.md b/foundry/research/specs/async-action-fixes/02-repo-overview-from-cached-projection.md new file mode 100644 index 0000000..0c9be29 --- /dev/null +++ b/foundry/research/specs/async-action-fixes/02-repo-overview-from-cached-projection.md @@ -0,0 +1,45 @@ +# Repo Overview Should Read Cached State Only + +## Problem + +Repo overview currently forces PR sync and branch sync inline before returning data. That turns a read path into: + +- repo fetch +- branch enumeration +- diff/conflict calculations +- GitHub PR listing + +The frontend polls repo overview repeatedly, so this design multiplies slow work and ties normal browsing to sync latency. + +## Target Contract + +- `getRepoOverview` returns the latest cached repo projection immediately. +- Sync happens on a background cadence or on an explicit async refresh trigger. +- Overview responses include freshness metadata so the client can show "refreshing" or "stale" state without blocking. + +## Proposed Fix + +1. Remove inline `forceProjectSync()` from `getRepoOverview`. +2. Add freshness fields to the project projection, for example: + - `branchSyncAt` + - `prSyncAt` + - `branchSyncStatus` + - `prSyncStatus` +3. Let the existing polling actors own cache refresh. +4. If the client needs a manual refresh, add a non-blocking command such as `project.requestOverviewRefresh` that: + - enqueues refresh work + - updates sync status to `queued` or `running` + - returns immediately +5. Keep `getRepoOverview` as a pure read over project SQLite state. + +## Client Impact + +- The repo overview screen should render cached rows immediately. +- If the user requests a refresh, the UI should show a background sync indicator instead of waiting for the GET call to complete. +- Polling frequency can be reduced because reads are now cheap and sync is event-driven. 
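The freshness fields proposed above can be sketched as a cached projection plus a pure read — field names follow the spec's examples, while the projection shape, `maxAgeMs`, and the derived `stale` flag are illustrative assumptions:

```typescript
// Field names mirror the spec's examples; the overall shape is hypothetical.
type SyncStatus = "idle" | "queued" | "running" | "error";

interface RepoOverviewProjection {
  branches: string[];
  branchSyncAt: string | null; // ISO timestamp of the last completed branch sync
  prSyncAt: string | null;
  branchSyncStatus: SyncStatus;
  prSyncStatus: SyncStatus;
}

// Pure read over cached state: no git work, no GitHub calls, no force() on polling actors.
function getRepoOverview(cached: RepoOverviewProjection, maxAgeMs: number, now = Date.now()) {
  const syncedAt = cached.branchSyncAt ? Date.parse(cached.branchSyncAt) : null;
  const stale = syncedAt === null || now - syncedAt > maxAgeMs;
  // Client renders rows immediately and shows a "stale"/"refreshing" badge from metadata.
  return { ...cached, stale };
}
```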
+ +## Acceptance Criteria + +- `getRepoOverview` does not call `force()` on polling actors. +- Opening the repo overview page does not trigger network/git work inline. +- Slow branch sync or PR sync no longer blocks the page request. diff --git a/foundry/research/specs/async-action-fixes/03-repo-actions-via-background-workflow.md b/foundry/research/specs/async-action-fixes/03-repo-actions-via-background-workflow.md new file mode 100644 index 0000000..ac6a7c3 --- /dev/null +++ b/foundry/research/specs/async-action-fixes/03-repo-actions-via-background-workflow.md @@ -0,0 +1,50 @@ +# Repo Sync And Stack Actions Should Run In Background Workflows + +## Problem + +Repo stack actions currently run inside a synchronous action and surround the action with forced sync before and after. Branch-backed task creation also forces repo sync inline before it can proceed. + +These flows depend on repo/network state and can take minutes. They should not hold an action open. + +## Target Contract + +- Repo-affecting actions are accepted quickly and run in the background. +- The project actor owns a durable action record with progress and final result. +- Clients observe status via project/task state instead of waiting for a single response. + +## Proposed Fix + +1. Introduce a project-level workflow/job model for repo actions, for example: + - `sync_repo` + - `restack_repo` + - `restack_subtree` + - `rebase_branch` + - `reparent_branch` + - `register_existing_branch` +2. Persist a job row with: + - job id + - action kind + - target branch fields + - status + - message + - timestamps +3. Change `runRepoStackAction` to: + - validate cheap local inputs only + - create a job row + - enqueue the workflow with `wait: false` + - return the job id and accepted status immediately +4. Move pre/post sync into the background workflow. +5. 
For branch-backed task creation: + - use the cached branch projection if present + - if branch data is stale or missing, enqueue branch registration/refresh work and surface pending state instead of blocking create + +## Client Impact + +- Repo action buttons should show queued/running/completed/error job state. +- Task creation from an existing branch may produce a task in a pending branch-attach state rather than blocking on repo sync. + +## Acceptance Criteria + +- No repo stack action waits for full git-spice execution inside the request. +- No action forces branch sync or PR sync inline. +- Action result state survives retries and backend restarts because the workflow status is persisted. diff --git a/foundry/research/specs/async-action-fixes/04-workbench-session-creation-without-inline-provisioning.md b/foundry/research/specs/async-action-fixes/04-workbench-session-creation-without-inline-provisioning.md new file mode 100644 index 0000000..1319be7 --- /dev/null +++ b/foundry/research/specs/async-action-fixes/04-workbench-session-creation-without-inline-provisioning.md @@ -0,0 +1,36 @@ +# Workbench Session Creation Must Not Trigger Inline Provisioning + +## Problem + +Creating a workbench tab currently provisions the whole task if no active sandbox exists. A user action that looks like "open tab" can therefore block on sandbox creation and agent setup. + +## Target Contract + +- Creating a tab returns quickly. +- If the task is not provisioned yet, the tab enters a pending state and becomes usable once provisioning completes. +- Provisioning remains a task workflow concern, not a workbench request concern. + +## Proposed Fix + +1. Split tab creation from sandbox session creation. +2. 
On `createWorkbenchSession`: + - create session metadata or a placeholder tab row immediately + - if the task is not provisioned, enqueue the required background work and return the placeholder id + - if the task is provisioned, enqueue background session creation if that step can also be slow +3. Add a tab/session state model such as: + - `pending_provision` + - `pending_session_create` + - `ready` + - `error` +4. When provisioning or session creation finishes, update the placeholder row with the real sandbox/session identifiers and notify the workbench. + +## Client Impact + +- The workbench can show a disabled composer or "Preparing environment" state for a pending tab. +- The UI no longer needs to block on the mutation itself. + +## Acceptance Criteria + +- `createWorkbenchSession` never calls task provisioning inline. +- Opening a tab on an unprovisioned task returns promptly with a placeholder tab id. +- The tab transitions to ready through background updates only. diff --git a/foundry/research/specs/async-action-fixes/05-workbench-snapshot-from-derived-state.md b/foundry/research/specs/async-action-fixes/05-workbench-snapshot-from-derived-state.md new file mode 100644 index 0000000..af8734f --- /dev/null +++ b/foundry/research/specs/async-action-fixes/05-workbench-snapshot-from-derived-state.md @@ -0,0 +1,43 @@ +# Workbench Snapshots Should Read Derived State, Not Recompute It + +## Problem + +Workbench snapshot reads currently execute expensive sandbox commands and transcript reads inline: + +- `git status` +- `git diff --numstat` +- one diff per changed file +- file tree enumeration +- transcript reads for each session +- session status lookups + +The remote workbench client refreshes after each action and on update events, so this synchronous snapshot work is amplified. + +## Target Contract + +- `getWorkbench` reads a cached projection only. +- Expensive sandbox- or session-derived data is updated asynchronously and stored in actor-owned tables. 
+- Detail-heavy payloads are fetched separately when the user actually opens that view. + +## Proposed Fix + +1. Split the current monolithic workbench snapshot into: + - lightweight task/workbench summary + - session transcript endpoint + - file diff endpoint + - file tree endpoint +2. Cache derived git state in SQLite, updated by background jobs or targeted invalidation after mutating actions. +3. Cache transcript/session metadata incrementally from sandbox events instead of reading full transcripts on every snapshot. +4. Keep `getWorkbench` limited to summary fields needed for the main screen. +5. Update the remote workbench client to rely more on push updates and less on immediate full refresh after every mutation. + +## Client Impact + +- Main workbench loads faster and remains responsive with many tasks/files/sessions. +- Heavy panes can show their own loading states when opened. + +## Acceptance Criteria + +- `getWorkbench` does not run per-file diff commands inline. +- `getWorkbench` does not read full transcripts for every tab inline. +- Full workbench refresh cost stays roughly proportional to task count, not task count times changed files times sessions. diff --git a/foundry/research/specs/async-action-fixes/06-daytona-provisioning-staged-background-flow.md b/foundry/research/specs/async-action-fixes/06-daytona-provisioning-staged-background-flow.md new file mode 100644 index 0000000..028e4fd --- /dev/null +++ b/foundry/research/specs/async-action-fixes/06-daytona-provisioning-staged-background-flow.md @@ -0,0 +1,51 @@ +# Daytona Provisioning Should Be A Staged Background Flow + +## Problem + +Daytona provisioning currently performs long-running setup inline: + +- sandbox create/start +- package/tool installation +- repo clone/fetch/checkout +- sandbox-agent install +- agent plugin install +- sandbox-agent boot +- health wait loop + +This is acceptable inside a durable background workflow, but not as part of a user-facing action response. 
+ +## Target Contract + +- Requests that need Daytona resources only wait for persisted actor/job creation. +- Daytona setup progresses through durable stages with explicit status. +- Follow-up work resumes from persisted state after crashes or restarts. + +## Proposed Fix + +1. Introduce a provider-facing staged readiness model, for example: + - `sandbox_allocated` + - `repo_prepared` + - `agent_installing` + - `agent_starting` + - `agent_ready` + - `session_creating` + - `ready` + - `error` +2. Persist stage transitions in task or sandbox-instance state. +3. Keep provider calls inside background workflow steps only. +4. Replace synchronous health-wait loops in request paths with: + - background step execution + - status updates after each step + - follow-up workflow progression once the prior stage completes +5. If sandbox-agent session creation is also slow, treat that as its own stage instead of folding it into request completion. + +## Client Impact + +- Users see staged progress instead of a long spinner. +- Failures point to a concrete stage, which makes retries and debugging much easier. + +## Acceptance Criteria + +- No user-facing request waits for Daytona package installs, repo clone, sandbox-agent installation, or health polling. +- Progress survives backend restarts because the stage is persisted. +- The system can resume from the last completed stage instead of replaying the whole provisioning path blindly. 
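The staged readiness model above can be sketched as an ordered stage list with a resume helper — stage names are taken from the spec, while `nextStage` is a hypothetical illustration of resuming from the last persisted stage:

```typescript
// Stage names come from the spec's readiness model; the ordering is the happy path.
const provisioningStages = [
  "sandbox_allocated",
  "repo_prepared",
  "agent_installing",
  "agent_starting",
  "agent_ready",
  "session_creating",
  "ready",
] as const;

type ProvisioningStage = (typeof provisioningStages)[number] | "error";

// Resume from the last persisted stage rather than replaying the whole flow.
function nextStage(current: ProvisioningStage): ProvisioningStage {
  if (current === "ready" || current === "error") return current; // terminal states
  const index = provisioningStages.indexOf(current);
  return provisioningStages[index + 1] ?? "error";
}
```

Each workflow step would persist its stage before running, so a restarted backend calls `nextStage` on the stored value instead of starting over.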
diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml index 01fdb81..9e2854b 100644 --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -523,6 +523,9 @@ importers: foundry/packages/desktop: dependencies: + '@sandbox-agent/foundry-shared': + specifier: workspace:* + version: link:../shared '@tauri-apps/api': specifier: ^2 version: 2.10.1 @@ -619,6 +622,9 @@ importers: foundry/packages/shared: dependencies: + pino: + specifier: ^10.3.1 + version: 10.3.1 zod: specifier: ^4.1.5 version: 4.3.6 diff --git a/research/acp/friction.md b/research/acp/friction.md index b7f92f5..983b966 100644 --- a/research/acp/friction.md +++ b/research/acp/friction.md @@ -218,6 +218,16 @@ Update this file continuously during the migration. - Status: resolved - Links: `server/packages/sandbox-agent/src/router.rs`, `server/packages/sandbox-agent/src/acp_runtime/mod.rs`, `server/packages/sandbox-agent/tests/v1_api/acp_transport.rs`, `docs/advanced/acp-http-client.mdx` +- Date: 2026-03-13 +- Area: Actor runtime shutdown and draining +- Issue: Actors can continue receiving or finishing action work after shutdown has started, while actor cleanup clears runtime resources such as the database handle. In RivetKit this can surface as `Database not enabled` from `c.db` even when the actor definition correctly includes `db`. +- Impact: User requests can fail with misleading internal errors during runner eviction or shutdown, and long-lived request paths can bubble up as HTTP 502/timeout failures instead of a clear retryable stopping/draining signal. +- Proposed direction: Add a real runner draining state so actors stop receiving traffic before shutdown, and ensure actor cleanup does not clear `#db` until in-flight actions are fully quiesced or aborted. App-side request paths should also avoid waiting inline on long actor workflows when possible. +- Decision: Open. +- Owner: Unassigned. 
+- Status: open +- Links: `foundry/packages/backend/src/actors/workspace/app-shell.ts`, `/Users/nathan/rivet/rivetkit-typescript/packages/rivetkit/src/actor/instance/mod.ts`, `/Users/nathan/rivet/rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts` + - Date: 2026-03-12 - Area: Foundry RivetKit serverless routing on Railway - Issue: Moving Foundry from `/api/rivet` to `/v1/rivet` exposed three RivetKit deployment couplings: `serverless.basePath` had to be updated explicitly for metadata/start routes, `configureRunnerPool` could not be used in production because the current Rivet token lacked permission to list datacenters, and wrapping `registry.handler(c.req.raw)` inside Hono route handlers produced unstable serverless runner startup under Railway until `/v1/rivet` was dispatched directly from `Bun.serve`.
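The direct-dispatch workaround in the last entry can be sketched as a fetch-style router — `rivetHandler` and `appHandler` are hypothetical stand-ins for `registry.handler` and the Hono app's `fetch`, and in production the returned function would be passed to `Bun.serve` as its `fetch` option:

```typescript
type FetchHandler = (req: Request) => Response | Promise<Response>;

// Route /v1/rivet straight to the RivetKit handler before the app router runs,
// instead of wrapping registry.handler inside a Hono route.
function makeRootFetch(rivetHandler: FetchHandler, appHandler: FetchHandler): FetchHandler {
  return (req) => {
    const { pathname } = new URL(req.url);
    if (pathname === "/v1/rivet" || pathname.startsWith("/v1/rivet/")) {
      return rivetHandler(req); // bypass app-router wrapping entirely
    }
    return appHandler(req); // everything else still goes through the app
  };
}
```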