sandbox-agent

381 commits 182 branches 37 tags 106 MiB

Author	SHA1	Message	Date
Nathan Flurry	e7b9ac6854	fix(foundry): move Better Auth operations from queues to actions to fix production auth timeout The org actor's workflow queue is shared with GitHub sync, webhooks, task mutations, and billing (20+ queue names processed sequentially). During OAuth callback, auth operations would time out waiting behind long-running queue handlers, causing Better Auth's parseState to redirect to ?error=please_restart_the_process. Auth operations are simple SQLite reads/writes with no cross-actor side effects, so they are safe to run as actions that execute immediately without competing in the queue. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 21:26:13 -07:00
Nathan Flurry	a171956298	feat(foundry): revert actions to queue/workflow pattern with direct sends Revert actor communication from direct action calls to queue/workflow-based patterns for better observability (workflow history in RivetKit inspector), replay/recovery semantics, and idiomatic RivetKit usage. - Add queue/workflow infrastructure to all actors: organization, task, user, github-data, sandbox, and audit-log - Mutations route through named queues processed by workflow command loops with ctx.step() wrapping for c.state/c.db access and observability - Remove command action wrappers (~460 lines) — callers use .send() directly to queue names with expectQueueResponse() for wait:true results - Keep sendPrompt and runProcess as direct sandbox actions (long-running / large responses that would block the workflow loop or exceed 128KB limit) - Fix workspace fire-and-forget calls (enqueueWorkspaceEnsureSession, enqueueWorkspaceRefresh) to self-send to task queue instead of calling directly outside workflow step context Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-16 18:46:53 -07:00
Nathan Flurry	f45a467484	chore(foundry): migrate to actions (#262) * feat(foundry): checkpoint actor and workspace refactor * docs(foundry): add agent handoff context * wip(foundry): continue actor refactor * wip(foundry): capture remaining local changes * Complete Foundry refactor checklist * Fix Foundry validation fallout * wip * wip: convert all actors from workflow to plain run handlers Workaround for RivetKit bug where c.queue.iter() never yields messages for actors created via getOrCreate from another actor's context. The queue accepts messages (visible in inspector) but the iterator hangs. Sleep/wake fixes it, but actors with active connections never sleep. Converted organization, github-data, task, and user actors from run: workflow(...) to plain run: async (c) => { for await ... }. Also fixes: - Missing auth tables in org migration (auth_verification etc) - default_model NOT NULL constraint on org profile upsert - Nested workflow step in github-data (HistoryDivergedError) - Removed --force from frontend Dockerfile pnpm install Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Convert all actors from queues/workflows to direct actions, lazy task creation Major refactor replacing all queue-based workflow communication with direct RivetKit action calls across all actors. This works around a RivetKit bug where c.queue.iter() deadlocks for actors created from another actor's context. Key changes: - All actors (organization, task, user, audit-log, github-data) converted from run: workflow(...) to actions-only (no run handler, no queues) - PR sync creates virtual task entries in org local DB instead of spawning task actors — prevents OOM from 200+ actors created simultaneously - Task actors created lazily on first user interaction via getOrCreate, self-initialize from org's getTaskIndexEntry data - Removed requireRepoExists cross-actor call (caused 500s), replaced with local resolveTaskRepoId from org's taskIndex table - Fixed getOrganizationContext to thread overrides through all sync phases - Fixed sandbox repo path (/home/user/repo for E2B compatibility) - Fixed buildSessionDetail to skip transcript fetch for pending sessions - Added process crash protection (uncaughtException/unhandledRejection) - Fixed React infinite render loop in mock-layout useEffect dependencies - Added sandbox listProcesses error handling for expired E2B sandboxes - Set E2B sandbox timeout to 1 hour (was 5 min default) - Updated CLAUDE.md with lazy task creation rules, no-silent-catch policy, React hook dependency safety rules Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix E2B sandbox timeout comment, frontend stability, and create-flow improvements - Add TEMPORARY comment on E2B timeoutMs with pointer to rivetkit sandbox resilience proposal for when autoPause lands - Fix React useEffect dependency stability in mock-layout and organization-dashboard to prevent infinite re-render loops - Fix terminal-pane ref handling - Improve create-flow service and tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-16 15:23:59 -07:00
Nathan Flurry	57a07f6a0a	wip (#256)	2026-03-14 23:47:43 -07:00
Nathan Flurry	99abb9d42e	chore(foundry): workbench action responsiveness (#254) * wip * wip	2026-03-14 20:42:18 -07:00
Nathan Flurry	ae191d1ae1	Refactor Foundry GitHub state and sandbox runtime (#247) * Move Foundry HTTP APIs out of /api/rivet * Move Foundry HTTP APIs onto /v1 * Fix Foundry Rivet base path and frontend endpoint fallback * Configure Foundry Rivet runner pool for /v1 * Remove Foundry Rivet runner override * Serve Foundry Rivet routes directly from Bun * Log Foundry RivetKit deployment friction * Add actor display metadata * Tighten actor schema constraints * Reset actor persistence baseline * Remove temporary actor key version prefix Railway has no persistent volumes so stale actors are wiped on each deploy. The v2 key rotation is no longer needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Cache app workspace actor handle across requests Every request was calling getOrCreate on the Rivet engine API to resolve the workspace actor, even though it's always the same actor. Cache the handle and invalidate on error so retries re-resolve. This eliminates redundant cross-region round-trips to api.rivet.dev on every request. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add temporary debug logging to GitHub OAuth exchange Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Make squashed baseline migrations idempotent Use CREATE TABLE IF NOT EXISTS and CREATE UNIQUE INDEX IF NOT EXISTS so the squashed baseline can run against actors that already have tables from the pre-squash migration sequence. This fixes the "table already exists" error when org workspace actors wake up with stale migration journals. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Revert "Make squashed baseline migrations idempotent" This reverts commit `356c146035`. * Fix GitHub OAuth callback by removing retry wrapper OAuth authorization codes are single-use. The appWorkspaceAction wrapper retries failed calls up to 20 times, but if the code exchange succeeds and a later step fails, every retry sends the already-consumed code, producing "bad_verification_code" from GitHub. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add runner versioning to RivetKit registry Uses Date.now() so each process start gets a unique version. This ensures Rivet Cloud migrates actors to the new runner on deploy instead of routing requests to stale runners. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add backend request and workspace logging * Log callback request headers * Make GitHub OAuth callback idempotent against duplicate requests Clear oauthState before exchangeCode so duplicate callback requests fail the state check instead of hitting GitHub with a consumed code. Marked as HACK — root cause of duplicate HTTP requests is unknown. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add temporary header dump on GitHub OAuth callback Log all request headers on the callback endpoint to diagnose the source of duplicate requests (Railway proxy, Cloudflare, browser). Remove once root cause is identified. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Defer slow GitHub org sync to workflow queue for fast OAuth callback Split syncGithubSessionFromToken into a fast path (initGithubSession: exchange code, get viewer, store token+identity) and a slow path (syncGithubOrganizations: list orgs/installations, sync workspaces). completeAppGithubAuth now returns the 302 redirect in ~2s instead of ~18s by enqueuing the org sync to the workspace workflow queue (fire-and-forget). This eliminates the proxy timeout window that was causing duplicate callback requests. bootstrapAppGithubSession (dev-only) still calls the full synchronous sync since proxy timeouts are not a concern and it needs the session fully populated before returning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * foundry: async app repo import on org select * foundry: parallelize app snapshot org reads * repo: push all current workspace changes * foundry: update runner version and snapshot logging * Refactor Foundry GitHub state and sandbox runtime Refactors Foundry around organization/repository ownership and adds an organization-scoped GitHub state actor plus a user-scoped GitHub auth actor, removing the old project PR/branch sync actors and repo PR cache. Updates sandbox provisioning to rely on sandbox-agent for in-sandbox work, hardens Daytona startup and image-build behavior, and surfaces runtime and task-startup errors more clearly in the UI. Extends workbench and GitHub state handling to track merged PR state, adds runtime-issue tracking, refreshes client/test/config wiring, and documents the main live Foundry test flow plus actor coordination rules. Also updates the remaining Sandbox Agent install-version references in docs/examples to the current pinned minor channel. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 02:45:07 -07:00
Nathan Flurry	d75e8c31d1	Rename Foundry handoffs to tasks (#239) * Restore foundry onboarding stack * Consolidate foundry rename * Create foundry tasks without prompts * Rename Foundry handoffs to tasks	2026-03-11 13:23:54 -07:00
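
Note on ae191d1ae1: the "clear oauthState before exchangeCode" fix can be pictured with the sketch below. It is an illustration only, not the Foundry implementation. OAuthSession, handleCallback, begin, and fakeExchange are hypothetical names, the store is in-memory, and the exchange is synchronous to keep the example short (the real exchange is a network call to GitHub). Only the ordering comes from the commit message: consume the state first, so a duplicate callback fails the state check and never resends the single-use code.

```typescript
// Hypothetical stand-in for the provider's code exchange (not a real GitHub API call).
type ExchangeFn = (code: string) => string;

// Hypothetical in-memory session; the commit message does not say where oauthState lives.
class OAuthSession {
  private pendingState: string | null = null;
  public token: string | null = null;

  // Called when the login redirect is issued.
  begin(state: string): void {
    this.pendingState = state;
  }

  // Returns true only for the first callback that carries the expected state.
  handleCallback(state: string, code: string, exchangeCode: ExchangeFn): boolean {
    if (this.pendingState === null || this.pendingState !== state) {
      return false; // duplicate or unexpected callback: the exchange is never attempted
    }
    // Clear the state BEFORE exchanging, so a duplicate request fails the check above
    // instead of sending an already-consumed authorization code.
    this.pendingState = null;
    this.token = exchangeCode(code);
    return true;
  }
}

// Usage: the same callback arrives twice, as in the duplicate-request scenario.
let exchangeCalls = 0;
const fakeExchange: ExchangeFn = (code) => {
  exchangeCalls += 1;
  return `token-for-${code}`;
};

const session = new OAuthSession();
session.begin("state-123");
const first = session.handleCallback("state-123", "code-abc", fakeExchange);
const second = session.handleCallback("state-123", "code-abc", fakeExchange);
console.log(first, second, exchangeCalls);
```

One consequence of this ordering is that a failed exchange also leaves the state consumed, so the user has to restart the login rather than retry the callback. That matches the commit's reasoning that authorization codes are single-use.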

Renamed from factory/packages/backend/CLAUDE.md

7 commits