mirror of https://github.com/harivansh-afk/sandbox-agent.git synced 2026-04-20 13:02:18 +00:00

Nathan Flurry 29e5821fef wip: convert all actors from workflow to plain run handlers

Workaround for RivetKit bug where c.queue.iter() never yields messages
for actors created via getOrCreate from another actor's context. The
queue accepts messages (visible in inspector) but the iterator hangs.
Sleep/wake fixes it, but actors with active connections never sleep.

Converted organization, github-data, task, and user actors from
run: workflow(...) to plain run: async (c) => { for await ... }.

Also fixes:
- Missing auth tables in org migration (auth_verification etc)
- default_model NOT NULL constraint on org profile upsert
- Nested workflow step in github-data (HistoryDivergedError)
- Removed --force from frontend Dockerfile pnpm install

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-16 14:22:37 -07:00

9.8 KiB

Backend Notes

Actor Hierarchy

Keep the backend actor tree aligned with this shape unless we explicitly decide to change it:

OrganizationActor (direct coordinator for tasks)
├─ AuditLogActor (organization-scoped global feed)
├─ GithubDataActor
├─ TaskActor(task)
│  ├─ taskSessions      → session metadata/transcripts
│  └─ taskSandboxes     → sandbox instance index
└─ SandboxInstanceActor(sandboxProviderId, sandboxId) × N

Coordinator Pattern

Actors follow a coordinator pattern where each coordinator is responsible for:

Index tables — keeping a local SQLite index/summary of its child actors' data
Create/destroy — handling lifecycle of child actors
Routing — resolving lookups to the correct child actor

Children push updates up to their direct coordinator only. Coordinators broadcast changes to connected clients. This keeps the read path local (no fan-out to children).
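The push-up/read-local flow described above can be sketched as follows. This is a minimal in-memory illustration, not the real actor code: `CoordinatorIndex`, `TaskSummary`, and the stale-update check are hypothetical, and a `Map` stands in for the coordinator's SQLite index table.

```typescript
// Hypothetical shapes for illustration; the real coordinators persist this in SQLite.
interface TaskSummary {
  taskId: string;
  title: string;
  status: "idle" | "running" | "done";
  updatedAt: number;
}

type Listener = (summary: TaskSummary) => void;

class CoordinatorIndex {
  // Stand-in for a coordinator-owned index table such as taskSummaries.
  private summaries = new Map<string, TaskSummary>();
  private listeners = new Set<Listener>();

  // Write path: a child pushes its latest summary to its direct coordinator.
  applyChildUpdate(summary: TaskSummary): void {
    const existing = this.summaries.get(summary.taskId);
    // Assumption: drop stale pushes so out-of-order updates cannot roll the index back.
    if (existing && existing.updatedAt > summary.updatedAt) return;
    this.summaries.set(summary.taskId, summary);
    // Broadcast the change to connected clients.
    this.listeners.forEach((listener) => listener(summary));
  }

  // Read path: answered from the local index only, with no calls to children.
  listSummaries(): TaskSummary[] {
    return Array.from(this.summaries.values()).sort((a, b) => b.updatedAt - a.updatedAt);
  }

  // Connected clients subscribe here; the returned function unsubscribes.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}
```

The point of the shape is that `listSummaries()` never touches a child: if a read needs a field that is not in the index, the fix belongs in `applyChildUpdate`, not in the read.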

Coordinator hierarchy and index tables

OrganizationActor (coordinator for tasks + auth users)
│
│  Index tables:
│  ├─ taskIndex          → TaskActor index (taskId → repoId + branchName)
│  ├─ taskSummaries      → TaskActor materialized sidebar projection
│  ├─ authSessionIndex   → UserActor index (session token → userId)
│  ├─ authEmailIndex     → UserActor index (email → userId)
│  └─ authAccountIndex   → UserActor index (OAuth account → userId)
│
├─ TaskActor (coordinator for sessions + sandboxes)
│  │
│  │  Index tables:
│  │  ├─ taskWorkspaceSessions → Session index (session metadata + transcript)
│  │  └─ taskSandboxes         → SandboxInstanceActor index (sandbox history)
│  │
│  └─ SandboxInstanceActor (leaf)
│
├─ AuditLogActor (organization-scoped audit log, not a coordinator)
└─ GithubDataActor (GitHub API cache, not a coordinator)

When adding a new index table, annotate it in the schema file with a doc comment identifying it as a coordinator index and which child actor it indexes (see existing examples).

Ownership Rules

OrganizationActor is the organization coordinator, direct coordinator for tasks, and lookup/index owner. It owns the task index, task summaries, and repo catalog.
AuditLogActor is organization-scoped. There is one organization-level audit log feed.
Each TaskActor maps to exactly one branch. Treat 1 task = 1 branch once branch assignment is finalized.
TaskActor can have many sessions.
TaskActor can reference many sandbox instances historically, but should have only one active sandbox/session at a time.
Session unread state and draft prompts are backend-owned workspace state, not frontend-local state.
Branch names are immutable after task creation. Do not implement branch-rename flows.
SandboxInstanceActor stays separate from TaskActor; tasks/sessions reference it by identity.
The backend stores no local git state. No clones, no refs, no working trees, and no git-spice. Repository metadata comes from GitHub API data and webhook events. Any working-tree git operation runs inside a sandbox via executeInSandbox().
When a backend request path must aggregate multiple independent actor calls or reads, prefer bounded parallelism over sequential fan-out when correctness permits. Do not serialize independent work by default.
Only a coordinator creates/destroys its children. Do not create child actors from outside the coordinator.
Children push state changes up to their direct coordinator only. Task actors push summary updates directly to the organization actor.
Read paths must use the coordinator's local index tables. Do not fan out to child actors on the hot read path.
Never build "enriched" read actions that chain through multiple actors (e.g., coordinator → child actor → sibling actor). If data from multiple actors is needed for a read, it should already be materialized in the coordinator's index tables via push updates. If it's not there, fix the write path to push it — do not add a fan-out read path.
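For the bounded-parallelism rule above, one possible helper looks like the sketch below. The name `mapWithConcurrency` and its signature are hypothetical and not part of the codebase. It runs at most `limit` calls at a time and preserves input order in the result.

```typescript
// Hypothetical helper: run fn over items with at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (true) {
      const i = next++;
      if (i >= items.length) return;
      results[i] = await fn(items[i], i);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

If any call rejects, the returned promise rejects with that error, while calls already in flight still run to completion. This helper is meant for write-side or background aggregation of independent calls. It does not justify fan-out on the hot read path, which should still be served from the coordinator's index tables.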

Drizzle Migration Maintenance

After changing any actor's db/schema.ts, you must regenerate the corresponding migration so the runtime creates tables that match the schema. Forgetting this step causes "no such table" errors at runtime.

Generate a new drizzle migration. Run from packages/backend:
```
npx drizzle-kit generate --config=./src/actors/<actor>/db/drizzle.config.ts
```
If the interactive prompt is unavailable (e.g. in a non-TTY), manually create a new .sql file under ./src/actors/<actor>/db/drizzle/ and add the corresponding entry to meta/_journal.json.

Regenerate the compiled migrations.ts. Run from the foundry root:

npx tsx packages/backend/src/actors/_scripts/generate-actor-migrations.ts

Verify insert/upsert calls. Every column with .notNull() (and no .default(...)) must be provided a value in all insert() and onConflictDoUpdate() calls. Missing a NOT NULL column causes a runtime constraint violation, not a type error.

Nuke RivetKit state in dev after migration changes to start fresh:

docker compose -f compose.dev.yaml down
docker volume rm foundry_foundry_rivetkit_storage
docker compose -f compose.dev.yaml up -d

Actors with drizzle migrations: organization, audit-log, task. Other actors (user, github-data) use inline migrations without drizzle.

Workflow Step Nesting — FORBIDDEN

Never call c.step() / ctx.step() from inside another step's run callback. RivetKit workflow steps cannot be nested. Doing so causes the runtime error: "Cannot start a new workflow entry while another is in progress."

This means:

Functions called from within a step run callback must NOT use c.step(), c.loop(), c.sleep(), or c.queue.next().
If a mutation function needs to be called both from a step and standalone, it must only do plain DB/API work — no workflow primitives. The workflow step wrapping belongs in the workflow file, not in the mutation.
Helper wrappers that conditionally call c.step() (like a runSyncStep pattern) are dangerous — if the caller is already inside a step, the nested c.step() will crash at runtime with no compile-time warning.

Rule of thumb: Workflow primitives (step, loop, sleep, queue.next) may only appear at the top level of a workflow function or inside a loop callback — never inside a step's run.
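The rule can be illustrated with a toy model. `ToyWorkflowCtx` below is a hypothetical stand-in written for this note and is not the RivetKit API. It only reproduces the no-nesting behavior and the error message quoted above, and `renameTaskTitle` is an invented mutation.

```typescript
// Toy stand-in for a workflow context. NOT the RivetKit API: it only models the no-nesting rule.
class ToyWorkflowCtx {
  private inStep = false;
  readonly log: string[] = [];

  async step<T>(name: string, run: () => Promise<T>): Promise<T> {
    if (this.inStep) {
      throw new Error("Cannot start a new workflow entry while another is in progress.");
    }
    this.inStep = true;
    try {
      const result = await run();
      this.log.push(name);
      return result;
    } finally {
      this.inStep = false;
    }
  }
}

// Plain mutation: DB/API work only, no workflow primitives, so it is safe
// to call from inside a step and standalone.
async function renameTaskTitle(store: Map<string, string>, taskId: string, title: string): Promise<void> {
  store.set(taskId, title);
}

// Workflow file: the step wrapping lives here, at the top level.
async function goodWorkflow(c: ToyWorkflowCtx, store: Map<string, string>): Promise<void> {
  await c.step("rename", () => renameTaskTitle(store, "task-1", "New title"));
  await c.step("audit", async () => {
    store.set("audit", "renamed task-1");
  });
}

// Anti-pattern: opening a step while the caller is already inside one.
async function badWorkflow(c: ToyWorkflowCtx, store: Map<string, string>): Promise<void> {
  await c.step("outer", async () => {
    // Throws at runtime; nothing flags this at compile time.
    await c.step("inner", () => renameTaskTitle(store, "task-1", "x"));
  });
}
```

In this model `goodWorkflow` completes, while `badWorkflow` rejects with the nested-entry error before the inner mutation runs.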

SQLite Constraints

Single-row tables must use an integer primary key with CHECK (id = 1) to enforce the singleton invariant at the database level.
Follow the task actor pattern for metadata/profile rows and keep the fixed row id in code as 1, not a string sentinel.
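A sketch of the singleton pattern, written as SQL strings held in TypeScript constants. The table name `organization_profile` and the `display_name` column are hypothetical; only the `CHECK (id = 1)` constraint and the numeric row id come from the rules above.

```typescript
// Fixed row id kept in code as the number 1, not a string sentinel.
const SINGLETON_ROW_ID = 1;

// Hypothetical table: the CHECK constraint makes a second row impossible.
const createOrgProfileTable = `
CREATE TABLE IF NOT EXISTS organization_profile (
  id INTEGER PRIMARY KEY CHECK (id = 1),
  display_name TEXT NOT NULL,
  default_model TEXT NOT NULL
)`;

// The upsert always targets the fixed row and supplies every NOT NULL column.
const upsertOrgProfile = `
INSERT INTO organization_profile (id, display_name, default_model)
VALUES (?, ?, ?)
ON CONFLICT (id) DO UPDATE SET
  display_name = excluded.display_name,
  default_model = excluded.default_model`;

// Builds the positional parameters for upsertOrgProfile.
function upsertParams(displayName: string, defaultModel: string): [number, string, string] {
  return [SINGLETON_ROW_ID, displayName, defaultModel];
}
```

With this shape, an insert that uses any id other than 1 fails the CHECK constraint at the database level instead of silently creating a second profile row.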

Multiplayer Correctness

Per-user UI state must live on the user actor, not on shared task/session actors. This is critical for multiplayer — multiple users may view the same task simultaneously with different active sessions, unread states, and in-progress drafts.

Per-user state (user actor): active session tab, unread counts, draft text, draft attachments. Keyed by (userId, taskId, sessionId).

Task-global state (task actor): session transcript, session model, session runtime status, sandbox identity, task status, branch name, PR state. These are shared across all users viewing the task — that is correct behavior.

Do not store per-user preferences, selections, or ephemeral UI state on shared actors. If a field's value should differ between two users looking at the same task, it belongs on the user actor.
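The split can be sketched as below. `UserUiStateStore` and its method names are hypothetical. In the real system this state lives on each user's actor, and only the (userId, taskId, sessionId) key comes from the notes above.

```typescript
// Hypothetical per-user UI state; none of this belongs on task or session actors.
interface SessionUiState {
  unreadCount: number;
  draftText: string;
  draftAttachments: string[];
}

class UserUiStateStore {
  private sessions = new Map<string, SessionUiState>();
  private activeSession = new Map<string, string>();

  // JSON-encode the key parts so ids containing separators cannot collide.
  private static key(...parts: string[]): string {
    return JSON.stringify(parts);
  }

  setActiveSession(userId: string, taskId: string, sessionId: string): void {
    this.activeSession.set(UserUiStateStore.key(userId, taskId), sessionId);
  }

  getActiveSession(userId: string, taskId: string): string | undefined {
    return this.activeSession.get(UserUiStateStore.key(userId, taskId));
  }

  // Returns stored state, or an empty default if the user has none for this session.
  get(userId: string, taskId: string, sessionId: string): SessionUiState {
    const stored = this.sessions.get(UserUiStateStore.key(userId, taskId, sessionId));
    return stored ?? { unreadCount: 0, draftText: "", draftAttachments: [] };
  }

  saveDraft(userId: string, taskId: string, sessionId: string, draftText: string): void {
    const current = this.get(userId, taskId, sessionId);
    this.sessions.set(UserUiStateStore.key(userId, taskId, sessionId), { ...current, draftText });
  }
}
```

Two users viewing the same task and session get independent drafts and active tabs, while the transcript and task status stay on the shared task actor.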

Audit Log Maintenance

Every new action or command handler that represents a user-visible or workflow-significant event must append to the audit log actor. The audit log must remain a comprehensive record of significant operations.

Debugging Actors

RivetKit Inspector UI

The RivetKit inspector UI at http://localhost:6420/ui/ is the most reliable way to debug actor state in local development. The inspector HTTP API (/inspector/workflow-history) has a known bug where it may return an empty history ({"history":{}}) even when the workflow has entries, so always cross-check with the UI.

Useful inspector URL pattern:

http://localhost:6420/ui/?u=http%3A%2F%2F127.0.0.1%3A6420&ns=default&r=default&n=[%22<actor-name>%22]&actorId=<actor-id>&tab=<tab>

Tabs: workflow, database, state, queue, connections, metadata.
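A small helper can assemble the URL pattern above. `inspectorUrl` is a hypothetical convenience function, and the default hosts are the local-development values shown in this section.

```typescript
type InspectorTab = "workflow" | "database" | "state" | "queue" | "connections" | "metadata";

// Hypothetical helper: fills in the inspector URL pattern for one actor and tab.
function inspectorUrl(
  actorName: string,
  actorId: string,
  tab: InspectorTab,
  endpoint = "http://127.0.0.1:6420",
  uiBase = "http://localhost:6420",
): string {
  // The n parameter is a JSON-style array with a single quoted name: ["<actor-name>"].
  const name = `[%22${encodeURIComponent(actorName)}%22]`;
  return (
    `${uiBase}/ui/?u=${encodeURIComponent(endpoint)}` +
    `&ns=default&r=default&n=${name}` +
    `&actorId=${encodeURIComponent(actorId)}&tab=${tab}`
  );
}
```

For example, `inspectorUrl("organization", "<actor-id>", "database")` yields a link to that actor's database tab.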

To find actor IDs:

curl -s 'http://127.0.0.1:6420/actors?name=organization'

To query actor DB via bun (inside container):

docker compose -f compose.dev.yaml exec -T backend bun -e '
  var { Database } = require("bun:sqlite");
  var db = new Database("/root/.local/share/foundry/rivetkit/databases/<actor-id>.db", { readonly: true });
  console.log(JSON.stringify(db.query("SELECT name FROM sqlite_master WHERE type=?").all("table")));
'

To call actor actions via inspector:

curl -s -X POST 'http://127.0.0.1:6420/gateway/<actor-id>/inspector/action/<actionName>' \
  -H 'Content-Type: application/json' -d '{"args":[{}]}'
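
The same call can be prepared from a script. The sketch below only builds the request that the curl command above sends. `buildInspectorActionRequest` is a hypothetical helper, and the default `args` value mirrors the `{"args":[{}]}` body shown above.

```typescript
interface InspectorActionRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: mirrors the curl invocation for an inspector action call.
function buildInspectorActionRequest(
  actorId: string,
  actionName: string,
  args: unknown[] = [{}],
  baseUrl = "http://127.0.0.1:6420",
): InspectorActionRequest {
  return {
    url: `${baseUrl}/gateway/${encodeURIComponent(actorId)}/inspector/action/${encodeURIComponent(actionName)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ args }),
    },
  };
}
```

In a runtime with a global `fetch`, the result can be passed straight through: `const { url, init } = buildInspectorActionRequest(actorId, actionName); const res = await fetch(url, init);`.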

Known inspector API bugs

GET /inspector/workflow-history may return {"history":{}} even when workflow has run. Use the UI's Workflow tab instead.
GET /inspector/queue is reliable for checking pending messages.
GET /inspector/state is reliable for checking actor state.

Maintenance

Keep this file up to date whenever actor ownership, hierarchy, or lifecycle responsibilities change.
If the real actor tree diverges from this document, update this document in the same change.
When adding, removing, or renaming coordinator index tables, update the hierarchy diagram above in the same change.
When adding a new coordinator index table in a schema file, add a doc comment identifying which child actor it indexes (pattern: /** Coordinator index of {ChildActor} instances. ... */).
