mirror of https://github.com/harivansh-afk/sandbox-agent.git synced 2026-04-15 14:03:52 +00:00

Fix Foundry UI bugs: org names, sessions, and repo selection (#250)

* Fix Foundry auth: migrate to Better Auth adapter, fix access token retrieval

- Remove @ts-nocheck from better-auth.ts, auth-user/index.ts, app-shell.ts
  and fix all type errors
- Fix getAccessTokenForSession: read GitHub token directly from account
  record instead of calling Better Auth's internal /get-access-token
  endpoint which returns 403 on server-side calls
- Re-implement workspaceAuth helper functions (workspaceAuthColumn,
  normalizeAuthValue, workspaceAuthClause, workspaceAuthWhere) that were
  accidentally deleted
- Remove all retry logic (withRetries, isRetryableAppActorError)
- Implement CORS origin allowlist from configured environment
- Document cachedAppWorkspace singleton pattern
- Add inline org sync fallback in buildAppSnapshot for post-OAuth flow
- Add no-retry rule to CLAUDE.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add Foundry dev panel from fix-git-data branch

Port the dev panel component that was left out when PR #243 was replaced
by PR #247. Adapted to remove runtime/mock-debug references that don't
exist on the current branch.

- Toggle with Shift+D, persists visibility to localStorage
- Shows context, session, GitHub sync status sections
- Dev-only (import.meta.env.DEV)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add full Docker image defaults, fix actor deadlocks, and improve dev experience

- Add Dockerfile.full and --all flag to install-agent CLI for pre-built images
- Centralize Docker image constant (FULL_IMAGE) pinned to 0.3.1-full
- Remove examples/shared/Dockerfile{,.dev} and daytona snapshot example
- Expand Docker docs with full runnable Dockerfile
- Fix self-deadlock in createWorkbenchSession (fire-and-forget provisioning)
- Audit and convert 12 task actions from wait:true to wait:false
- Add bun --hot for dev backend hot reload
- Remove --force from pnpm install in dev Dockerfile for faster startup
- Add env_file support to compose.dev.yaml for automatic credential loading
- Add mock frontend compose config and dev panel
- Update CLAUDE.md with wait:true policy and dev environment setup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* WIP: async action fixes and interest manager

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix Foundry UI bugs: org names, hanging sessions, and wrong repo creation

- Fix org display name using GitHub description instead of name field
- Fix createWorkbenchSession hanging when sandbox is provisioning
- Fix auto-session creation retry storm on errors
- Fix task creation using wrong repo due to React state race conditions
- Remove Bun hot-reload from backend Dockerfile (causes port drift)
- Add GitHub sync/install status to dev panel

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-13 20:48:22 -07:00


Project Instructions

Language Policy

Use TypeScript for all source code.

Never add raw JavaScript source files (.js, .mjs, .cjs).
Prefer .ts/.tsx for runtime code, scripts, tests, and tooling.
If touching old JavaScript, migrate it to TypeScript instead of extending it.

Monorepo + Tooling

Use pnpm workspaces and Turborepo.

Workspace root uses pnpm-workspace.yaml and turbo.json.
Packages live in packages/*.
core is renamed to shared.
packages/cli is disabled and excluded from active workspace validation.
Integrations and providers live under packages/backend/src/{integrations,providers}.

CLI Status

packages/cli is fully disabled for active development.
Do not implement new behavior in packages/cli unless explicitly requested.
Frontend is the primary product surface; prioritize packages/frontend and the supporting packages/client and packages/backend.
Workspace build, typecheck, and test intentionally exclude @sandbox-agent/foundry-cli.
pnpm-workspace.yaml excludes packages/cli from workspace package resolution.

Common Commands

Foundry is the canonical name for this product tree. Do not introduce or preserve legacy pre-Foundry naming in code, docs, commands, or runtime paths.
Install deps: pnpm install
Full active-workspace validation: pnpm -w typecheck, pnpm -w build, pnpm -w test
Start the full dev stack (real backend + frontend): just foundry-dev — frontend on port 4173, backend on port 7741 (Docker via compose.dev.yaml)
Start the mock frontend stack (no backend): just foundry-mock — mock frontend on port 4174 (Docker via compose.mock.yaml)
Start the local production-build preview stack: just foundry-preview
Start only the backend locally: just foundry-backend-start
Start only the frontend locally: pnpm --filter @sandbox-agent/foundry-frontend dev
Start the mock frontend locally (no Docker): just foundry-dev-mock — mock frontend on port 4174
Dev and mock stacks can run simultaneously on different ports (4173 and 4174).
Stop the compose dev stack: just foundry-dev-down
Tail compose dev logs: just foundry-dev-logs
Stop the mock stack: just foundry-mock-down
Tail mock logs: just foundry-mock-logs
Stop the preview stack: just foundry-preview-down
Tail preview logs: just foundry-preview-logs

Dev Environment Setup

compose.dev.yaml loads foundry/.env (optional) for credentials needed by the backend (GitHub OAuth, Stripe, Daytona, API keys, etc.).
The canonical source for these credentials is ~/misc/the-foundry.env. If foundry/.env does not exist, copy it: cp ~/misc/the-foundry.env foundry/.env
foundry/.env is gitignored and must never be committed.
The backend does not hot reload. Bun's --hot flag causes the server to re-bind on a different port (e.g. 6421 instead of 6420), breaking all client connections while the container still exposes the original port. After backend code changes, restart the backend container: just foundry-dev-down && just foundry-dev.

Railway Logs

Production Foundry Railway logs can be read from a linked workspace with railway logs --deployment --lines 200 or railway logs <deployment-id> --deployment --lines 200.
Production deploys should go through git push to the deployment branch/workflow. Do not use railway up for Foundry deploys.
If Railway logs fail because the workspace is not linked to the correct project/service/environment, run: railway link --project 33e3e2df-32c5-41c5-a4af-dca8654acb1d --environment cf387142-61fd-4668-8cf7-b3559e0983cb --service 91c7e450-d6d2-481a-b2a4-0a916f4160fc
That links this directory to the sandbox-agent project, production environment, and foundry-api service.
Production proxy chain: api.sandboxagent.dev routes through Cloudflare → Fastly/Varnish → Railway. When debugging request duplication, timeouts, or retry behavior, check headers like cf-ray, x-varnish, x-railway-edge, and cdn-loop to identify which layer is involved.

Frontend + Client Boundary

Keep a browser-friendly GUI implementation aligned with the TUI interaction model wherever possible.
Do not import rivetkit directly in CLI or GUI packages. RivetKit client access must stay isolated inside packages/client.
All backend interaction (actor calls, metadata/health checks, backend HTTP endpoint access) must go through the dedicated client library in packages/client.
Outside packages/client, do not call backend endpoints directly (for example fetch(.../v1/rivet...)), except in black-box E2E tests that intentionally exercise raw transport behavior.
GUI state should update in realtime (no manual refresh buttons). Prefer RivetKit push reactivity and actor-driven events; do not add polling/refetch for normal product flows.
Keep the mock workbench types and mock client in packages/shared + packages/client up to date with the frontend contract. The mock is the UI testing reference implementation while backend functionality catches up.
Keep frontend route/state coverage current in code and tests; there is no separate page-inventory doc to maintain.
If Foundry uses a shared component from @sandbox-agent/react, make changes in sdks/react instead of copying or forking that component into Foundry.
When changing shared React components in sdks/react for Foundry, verify they still work in the Sandbox Agent Inspector before finishing.
When making UI changes, verify the live flow with agent-browser, take screenshots of the updated UI, and offer to open those screenshots in Preview when you finish.
When asked for screenshots, capture all relevant affected screens and modal states, not just a single viewport. Include empty, populated, success, and blocked/error states when they are part of the changed flow.
If a screenshot catches a transition frame, blank modal, or otherwise misleading state, retake it before reporting it.

Realtime Data Architecture

All client data flows follow the same pattern:

Connect to the actor via WebSocket.
Fetch initial state via an action call to get the current materialized snapshot.
Subscribe to events on the connection. Events carry full replacement payloads for the changed entity (not empty notifications, not patches — the complete new state of the thing that changed).
Unsubscribe after a 30-second grace period when interest ends (screen navigation, component unmount). The grace period prevents thrashing during screen transitions and React double-renders.

Do not use polling (refetchInterval), empty "go re-fetch" broadcast events, or full-snapshot re-fetches on every mutation. Every mutation broadcasts the new absolute state of the changed entity to connected clients.
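As a rough illustration of the "full replacement payload" rule, the sketch below shows how a client might fold an event into its cached snapshot without re-fetching. The type names (`TaskSummary`, `WorkspaceSnapshot`, `WorkspaceEvent`), the event variants, and `applyWorkspaceEvent` are hypothetical and are not taken from the Foundry codebase.

```typescript
// Hypothetical shapes for illustration only; the real contract lives in packages/shared.
interface TaskSummary {
  id: string;
  title: string;
  status: string;
  updatedAt: number;
}

interface WorkspaceSnapshot {
  taskSummaries: TaskSummary[];
}

// Each event carries the complete new state of the one entity that changed,
// never a patch and never an empty "go re-fetch" signal.
type WorkspaceEvent =
  | { type: "taskSummaryUpdated"; taskSummary: TaskSummary }
  | { type: "taskRemoved"; taskId: string };

// Pure reducer: replaces (or appends) the changed entity in the cached snapshot.
function applyWorkspaceEvent(
  snapshot: WorkspaceSnapshot,
  event: WorkspaceEvent,
): WorkspaceSnapshot {
  switch (event.type) {
    case "taskSummaryUpdated": {
      const incoming = event.taskSummary;
      const exists = snapshot.taskSummaries.some((t) => t.id === incoming.id);
      const taskSummaries = exists
        ? snapshot.taskSummaries.map((t) => (t.id === incoming.id ? incoming : t))
        : [...snapshot.taskSummaries, incoming];
      return { ...snapshot, taskSummaries };
    }
    case "taskRemoved":
      return {
        ...snapshot,
        taskSummaries: snapshot.taskSummaries.filter((t) => t.id !== event.taskId),
      };
  }
}
```

Because the payload is the absolute new state, the reducer never needs to merge fields or ask the server what changed.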

Materialized state in coordinator actors

Workspace actor materializes sidebar-level data in its own SQLite: repo catalog, task summaries (title, status, branch, PR, updatedAt), repo summaries (overview/branch state), and session summaries (id, name, status, unread, model — no transcript). Task actors push summary changes to the workspace actor when they mutate. The workspace actor broadcasts the updated entity to connected clients. getWorkspaceSummary reads from local tables only — no fan-out to child actors.
Task actor materializes its own detail state (session summaries, sandbox info, diffs, file tree). getTaskDetail reads from the task actor's own SQLite. The task actor broadcasts updates directly to clients connected to it.
Session data lives on the task actor but is a separate subscription topic. The task topic includes sessions_summary (list without content). The session topic provides full transcript and draft state. Clients subscribe to the session topic for whichever session tab is active, and filter sessionUpdated events by session ID (ignoring events for other sessions on the same actor).
The expensive fan-out (querying every project/task actor) only exists as a background reconciliation/rebuild path, never on the hot read path.

Interest manager

The interest manager (packages/client) is a global singleton that manages WebSocket connections, cached state, and subscriptions for all topics. It:

Deduplicates — multiple subscribers to the same topic share one connection and one cached state.
Grace period (30s) — when the last subscriber leaves, the connection and state stay alive for 30 seconds before teardown. This keeps data warm for back-navigation and prevents thrashing.
Exposes a single hook — useInterest(topicKey, params) returns { data, status, error }. Null params = no subscription (conditional interest).
Shared harness, separate implementations — the InterestManager interface is shared between mock and remote implementations. The mock implementation uses in-memory state. The remote implementation uses WebSocket connections. The API/client exposure is identical for both.
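The deduplication and grace-period behavior described above can be sketched as a small reference-counting tracker. This is a minimal sketch under stated assumptions, not the actual packages/client implementation: `InterestTracker`, `Schedule`, and the `connect` callback are hypothetical names, and cached state and the `useInterest` hook are left out.

```typescript
// Hypothetical sketch of dedup + grace-period teardown; not the real interest manager.
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

interface TopicEntry {
  subscribers: number;
  cancelTeardown: Cancel | null;
  disconnect: () => void;
}

// Default scheduler backed by setTimeout; tests can inject a fake one.
const timeoutSchedule: Schedule = (fn, ms) => {
  const handle = setTimeout(fn, ms);
  return () => clearTimeout(handle);
};

class InterestTracker {
  private readonly entries = new Map<string, TopicEntry>();
  private readonly connect: (topicKey: string) => () => void;
  private readonly schedule: Schedule;
  private readonly graceMs: number;

  constructor(
    connect: (topicKey: string) => () => void,
    schedule: Schedule = timeoutSchedule,
    graceMs: number = 30_000,
  ) {
    this.connect = connect;
    this.schedule = schedule;
    this.graceMs = graceMs;
  }

  // Returns an unsubscribe function. Subscribers to the same topic share one connection.
  subscribe(topicKey: string): () => void {
    let entry = this.entries.get(topicKey);
    if (!entry) {
      entry = { subscribers: 0, cancelTeardown: null, disconnect: this.connect(topicKey) };
      this.entries.set(topicKey, entry);
    }
    // A subscriber arriving during the grace period keeps the connection warm.
    entry.cancelTeardown?.();
    entry.cancelTeardown = null;
    entry.subscribers += 1;

    let released = false;
    return () => {
      if (released) return; // tolerate double-unsubscribe (e.g. React double-renders)
      released = true;
      this.release(topicKey);
    };
  }

  private release(topicKey: string): void {
    const entry = this.entries.get(topicKey);
    if (!entry) throw new Error(`release called for unknown topic "${topicKey}"`);
    entry.subscribers -= 1;
    if (entry.subscribers > 0) return;
    // Last subscriber left: tear down only after the grace period elapses.
    entry.cancelTeardown = this.schedule(() => {
      entry.disconnect();
      this.entries.delete(topicKey);
    }, this.graceMs);
  }
}
```

Injecting the scheduler is a design choice made here for testability; it lets the grace period be exercised without waiting 30 seconds.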

Topics

Each topic maps to one actor connection and one event stream:

| Topic | Actor | Event | Data |
| --- | --- | --- | --- |
| `app` | Workspace `"app"` | `appUpdated` | Auth, orgs, onboarding |
| `workspace` | Workspace `{workspaceId}` | `workspaceUpdated` | Repo catalog, task summaries, repo summaries |
| `task` | Task `{workspaceId, repoId, taskId}` | `taskUpdated` | Session summaries, sandbox info, diffs, file tree |
| `session` | Task `{workspaceId, repoId, taskId}` (filtered by sessionId) | `sessionUpdated` | Transcript, draft state |
| `sandboxProcesses` | SandboxInstance | `processesUpdated` | Process list |

The client subscribes to app always, workspace when entering a workspace, task when viewing a task, and session when viewing a specific session tab. At most 4 actor connections at a time (app + workspace + task + sandbox if terminal is open). The session topic reuses the task actor connection and filters by session ID.
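Since the session topic shares the task actor connection, every `sessionUpdated` event for that task arrives on the same stream. One possible way to express the filtering is a small handler wrapper; the `SessionUpdatedEvent` shape and the `forSession` helper below are assumptions made for illustration.

```typescript
// Hypothetical event shape; the real payload is defined by the Foundry client contract.
interface SessionUpdatedEvent {
  sessionId: string;
  transcript: string[];
  draft: string;
}

// Wraps a handler so it only sees events for the active session tab.
// Events for other sessions on the same task actor are ignored.
function forSession(
  sessionId: string,
  onUpdate: (event: SessionUpdatedEvent) => void,
): (event: SessionUpdatedEvent) => void {
  return (event) => {
    if (event.sessionId === sessionId) onUpdate(event);
  };
}
```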

Rules

Do not add useQuery with refetchInterval for data that should be push-based.
Do not broadcast empty notification events. Events must carry the full new state of the changed entity.
Do not re-fetch full snapshots after mutations. The mutation triggers a server-side broadcast with the new entity state; the client replaces it in local state.
All event subscriptions go through the interest manager. Do not create ad-hoc handle.connect() + conn.on() patterns.
Backend mutations that affect sidebar data (task title, status, branch, PR state) must push the updated summary to the parent workspace actor, which broadcasts to workspace subscribers.
Comment architecture-related code: add doc comments explaining the materialized state pattern, why deltas flow the way they do, and the relationship between parent/child actor broadcasts. New contributors should understand the data flow from comments alone.

UI System

Foundry's base UI system is BaseUI with Styletron, plus Foundry-specific theme/tokens on top. Treat that as the default UI foundation.
The full BaseUI reference for available components and guidance on animations, customization, composition, and forms is at https://base-ui.com/llms.txt.
Prefer existing BaseUI components and composition patterns whenever possible instead of building custom controls from scratch.
Reuse the established Foundry theme/token layer for colors, typography, spacing, and surfaces instead of introducing ad hoc visual values.
If the same UI pattern is shared with the Inspector or other consumers, prefer extracting or reusing it through @sandbox-agent/react rather than duplicating it in Foundry.
If a requested UI cannot be implemented cleanly with an existing BaseUI component, stop and ask the user whether they are sure they want to diverge from the system.
In that case, recommend the closest existing BaseUI components or compositions that could satisfy the need before proposing custom UI work.
Only introduce custom UI primitives when BaseUI and existing Foundry patterns are not sufficient, or when the user explicitly confirms they want the divergence.

Runtime Policy

Runtime is Bun-native.
Use Bun for CLI/backend execution paths and process spawning.
Do not add Node compatibility fallbacks for OpenTUI/runtime execution.

Defensive Error Handling

Write code defensively: validate assumptions at boundaries and state transitions.
If the system reaches an unexpected state, raise an explicit error with actionable context.
Do not fail silently, swallow errors, or auto-ignore inconsistent data.
Prefer fail-fast behavior over hidden degradation when correctness is uncertain.
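A minimal sketch of what "raise an explicit error with actionable context" could look like at a boundary follows. The `TaskStatus` values and `parseTaskStatus` are hypothetical and only illustrate the pattern.

```typescript
// Hypothetical status set, for illustration only.
type TaskStatus = "provisioning" | "running" | "archived";

// Validates an untrusted value at a boundary (for example a row read from storage).
// Unknown values fail fast instead of being coerced to a default.
function parseTaskStatus(raw: unknown, taskId: string): TaskStatus {
  if (raw === "provisioning" || raw === "running" || raw === "archived") {
    return raw;
  }
  throw new Error(
    `Task ${taskId} has unexpected status ${JSON.stringify(raw)}; ` +
      `expected one of: provisioning, running, archived`,
  );
}
```

The error names the entity, the offending value, and the accepted values, so whoever hits it can act without reproducing the failure first.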

RivetKit Dependency Policy

For all Rivet/RivetKit implementation:

Use SQLite + Drizzle for persistent state.
SQLite is per actor instance (per actor key), not a shared backend-global database:
- Each actor instance gets its own SQLite DB.
- Schema design should assume a single actor instance owns the entire DB.
- Do not add workspaceId/repoId/taskId columns just to "namespace" rows for a given actor instance; use actor state and/or the actor key instead.
- Example: the task actor instance already represents (workspaceId, repoId, taskId), so its SQLite tables should not need those columns for primary keys.
Do not use backend-global SQLite singletons; database access must go through actor db providers (c.db).
The default dependency source for RivetKit is the published rivetkit package so workspace installs and CI remain self-contained.
When working on coordinated RivetKit changes, you may temporarily relink to a local checkout instead of the published package.
- Dedicated local checkout for this workspace: /Users/nathan/conductor/workspaces/task/rivet-checkout
- Preferred local link target: ../rivet-checkout/rivetkit-typescript/packages/rivetkit
- Sub-packages (@rivetkit/sqlite-vfs, etc.) resolve transitively from the RivetKit workspace when using the local checkout.

Before using a local checkout, build RivetKit in the rivet repo:

cd ../rivet-checkout/rivetkit-typescript
pnpm install
pnpm build -F rivetkit

Rivet Routing

Mount RivetKit directly on /v1/rivet via registry.handler(c.req.raw).
Do not add an extra proxy or manager-specific route layer in the backend.
Let RivetKit own metadata/public endpoint behavior for /v1/rivet.

Workspace + Actor Rules

Everything is scoped to a workspace.
Workspace resolution order: --workspace flag -> config default -> "default".
ControlPlaneActor is replaced by WorkspaceActor (workspace coordinator).
Every actor key must be prefixed with workspace namespace (["ws", workspaceId, ...]).
CLI/TUI/GUI must use @sandbox-agent/foundry-client (packages/client) for backend access; rivetkit/client imports are only allowed inside packages/client.
Do not add custom backend REST endpoints (no /v1/* shim layer).
We own the sandbox-agent project; treat sandbox-agent defects as first-party bugs and fix them instead of working around them.
Keep strict single-writer ownership: each table/row has exactly one actor writer.
Parent actors (workspace, project, task, history, sandbox-instance) use command-only loops with no timeout.
Periodic syncing lives in dedicated child actors with one timeout cadence each.
Do not build blocking flows that wait on external systems to become ready or complete. Prefer push-based progression driven by actor messages, events, webhooks, or queue/workflow state changes.
Use workflows/background commands for any repo sync, sandbox provisioning, agent install, branch restack/rebase, or other multi-step external work. Do not keep user-facing actions/requests open while that work runs.
send policy: always await the send(...) call itself so enqueue failures surface immediately, but default to wait: false.
Only use send(..., { wait: true }) for short, bounded local mutations (e.g. a DB write that returns a result the caller needs). Never use wait: true for operations that depend on external readiness, polling actors, provider setup, repo/network I/O, sandbox sessions, GitHub API calls, or long-running queue drains.
Never self-send with wait: true from inside a workflow handler — the workflow processes one message at a time, so the handler would deadlock waiting for the new message to be dequeued.
When an action is void-returning and triggers external work, use wait: false and let the UI react to state changes pushed by the workflow.
Request/action contract: wait only until the minimum resource needed for the client's next step exists. Example: task creation may wait for task actor creation/identity, but not for sandbox provisioning or session bootstrap.
Read paths must not force refresh/sync work inline. Serve the latest cached projection, mark staleness explicitly, and trigger background refresh separately when needed.
If a workflow needs to resume after some external work completes, model that as workflow state plus follow-up messages/events instead of holding the original request open.
No retries: never add retry loops (withRetries, setTimeout retry, exponential backoff) anywhere in the codebase. If an operation fails, surface the error immediately. If a dependency is not ready yet, model that explicitly with workflow state and resume from a push/event instead of polling or retry loops.
Actor handle policy:
- Prefer explicit get or explicit create based on workflow intent; do not default to getOrCreate.
- Use get/getForId when the actor is expected to already exist; if missing, surface an explicit Actor not found error with recovery context.
- Use create semantics only on explicit provisioning/create paths where creating a new actor instance is intended.
- getOrCreate is a last resort for create paths when an explicit create API is unavailable; never use it in read/command paths.
- For long-lived cross-actor links (for example sandbox/session runtime access), persist actor identity (actorId) and keep a fallback lookup path by actor id.
Docker dev: compose.dev.yaml mounts a named volume at /root/.local/share/foundry/repos to persist backend-managed git clones across restarts. Code must still work if this volume is not present (create directories as needed).
RivetKit actor c.state is durable, but in Docker it is stored under /root/.local/share/rivetkit. If that path is not persisted, actor state-derived indexes (for example, in project actor state) can be lost after container recreation even when other data still exists.
Workflow history divergence policy:
- Production: never auto-delete actor state to resolve HistoryDivergedError; ship explicit workflow migrations (ctx.removed(...), step compatibility).
- Development: manual local state reset is allowed as an operator recovery path when migrations are not yet available.
Storage rule of thumb:
- Put simple metadata in c.state (KV state): small scalars and identifiers like { taskId }, { repoId }, booleans, counters, timestamps, status strings.
- If it grows beyond trivial (arrays, maps, histories, query/filter needs, relational consistency), use SQLite + Drizzle in c.db.
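Two of the rules above, the workspace resolution order and the workspace-prefixed actor key, are simple enough to sketch directly. The helper names `resolveWorkspaceId` and `actorKey`, and the trailing key segments used in the example, are hypothetical; only the resolution order and the `["ws", workspaceId, ...]` prefix come from the rules.

```typescript
// Resolution order: --workspace flag -> config default -> "default".
// This sketch treats only undefined as "not provided".
function resolveWorkspaceId(
  flagValue: string | undefined,
  configDefault: string | undefined,
): string {
  return flagValue ?? configDefault ?? "default";
}

// Every actor key is prefixed with the workspace namespace.
// The segments after the workspace id are illustrative, not the real key layout.
function actorKey(workspaceId: string, ...rest: string[]): string[] {
  if (workspaceId.length === 0) {
    throw new Error("actorKey requires a non-empty workspaceId; resolve the workspace first");
  }
  return ["ws", workspaceId, ...rest];
}

// Example usage with made-up segment names:
const exampleKey = actorKey(resolveWorkspaceId(undefined, "acme"), "task", "t1");
```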

Testing Policy

Never use vitest mocks (vi.mock, vi.spyOn, vi.fn). Instead, define driver interfaces for external I/O and pass test implementations via the actor runtime context.
All external service calls (git CLI, GitHub CLI, sandbox-agent HTTP, tmux) must go through the BackendDriver interface on the runtime context.
Integration tests use setupTest() from rivetkit/test and are gated behind HF_ENABLE_ACTOR_INTEGRATION_TESTS=1.
End-to-end testing must run against the dev backend started via docker compose -f compose.dev.yaml up (host -> container). Do not run E2E against an in-process test runtime.
- E2E tests should talk to the backend over HTTP (default http://127.0.0.1:7741/v1/rivet) and use real GitHub repos/PRs.
- For Foundry live verification, use rivet-dev/sandbox-agent-testing as the default testing repo unless the task explicitly says otherwise.
- Secrets (e.g. OPENAI_API_KEY, GITHUB_TOKEN/GH_TOKEN) must be provided via environment variables, never hardcoded in the repo.
- ~/misc/env.txt and ~/misc/the-foundry.env contain the expected local OpenAI + GitHub OAuth/App config for dev.
- Do not assume gh auth token is sufficient for Foundry task provisioning against private repos. Sandbox/bootstrap git clone, push, and PR flows require a repo-capable GITHUB_TOKEN/GH_TOKEN in the backend container.
- Preferred product behavior for org workspaces is to mint a GitHub App installation token from the workspace installation and inject it into backend/sandbox git operations. Do not rely on an operator's ambient CLI auth as the long-term solution.
Treat client E2E tests in packages/client/test as the primary end-to-end source of truth for product behavior.
Keep backend tests small and targeted. Only retain backend-only tests for invariants or persistence rules that are not well-covered through client E2E.
Do not keep large browser E2E suites around in a broken state. If a frontend browser E2E is not maintained and producing signal, remove it until it can be replaced with a reliable test.
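The driver-interface approach to testing can be sketched as follows. This is a trimmed-down illustration, not the real `BackendDriver`: the `git.currentBranch` method, `describeRepo`, and `createTestDriver` are hypothetical, and the actor runtime context is reduced to a plain function argument.

```typescript
// Hypothetical, trimmed-down driver surface. The real BackendDriver covers
// git CLI, GitHub CLI, sandbox-agent HTTP, and tmux.
interface BackendDriver {
  git: {
    currentBranch(repoPath: string): Promise<string>;
  };
}

// Code under test receives the driver explicitly instead of importing I/O modules,
// so tests never need vi.mock, vi.spyOn, or vi.fn.
async function describeRepo(driver: BackendDriver, repoPath: string): Promise<string> {
  const branch = await driver.git.currentBranch(repoPath);
  if (branch.length === 0) {
    throw new Error(`git reported an empty branch name for ${repoPath}`);
  }
  return `${repoPath}@${branch}`;
}

// Test implementation: plain object backed by in-memory data, recording its calls.
function createTestDriver(
  branches: Record<string, string>,
): { driver: BackendDriver; calls: string[] } {
  const calls: string[] = [];
  const driver: BackendDriver = {
    git: {
      async currentBranch(repoPath) {
        calls.push(`git.currentBranch:${repoPath}`);
        const branch = branches[repoPath];
        if (branch === undefined) {
          throw new Error(`test driver has no branch configured for ${repoPath}`);
        }
        return branch;
      },
    },
  };
  return { driver, calls };
}
```

The test driver fails loudly on unconfigured input, which keeps it in line with the defensive error handling policy.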

Config

Keep config path at ~/.config/foundry/config.toml.
Evolve properties in place; do not move config location.

Project Guidance

Project-specific guidance lives in README.md, CONTRIBUTING.md, and the relevant files under research/.

Keep those updated when:

Commands change
Configuration options change
Architecture changes
Plugins/providers change
Actor ownership changes

Friction Logs

Track friction at:

research/friction/rivet.mdx
research/friction/sandbox-agent.mdx
research/friction/sandboxes.mdx
research/friction/general.mdx

Category mapping:

rivet: Rivet/RivetKit runtime, actor model, queues, keys
sandbox-agent: sandbox-agent SDK/API behavior
sandboxes: provider implementations (worktree/daytona/etc)
general: everything else

Each entry must include:

Date (YYYY-MM-DD)
Commit SHA (or uncommitted)
What you were implementing
Friction/issue
Attempted fix/workaround and outcome

History Events

Log notable workflow changes to events so hf history remains complete:

create
attach
push/sync/merge
archive/kill
status transitions
PR state transitions

Validation After Changes

Always run and fix failures:

pnpm -w typecheck
pnpm -w build
pnpm -w test

After making code changes, always update the dev server before declaring the work complete. If the dev stack is running through Docker Compose, restart or recreate the relevant dev services so the running app reflects the latest code.
