diff --git a/foundry/research/specs/async-action-fixes/00-end-to-end-async-realtime-plan.md b/foundry/research/specs/async-action-fixes/00-end-to-end-async-realtime-plan.md new file mode 100644 index 0000000..cd9dcbf --- /dev/null +++ b/foundry/research/specs/async-action-fixes/00-end-to-end-async-realtime-plan.md @@ -0,0 +1,308 @@ +# End-To-End Async + Realtime Plan + +## Purpose + +This is the umbrella plan for the Foundry issues we traced across app shell, workbench, and actor runtime behavior: + +- long-running work still sits inline in request/action paths +- monolithic snapshot reads fan out across too many actors +- the client uses polling and full refreshes where it should use realtime subscriptions +- websocket subscriptions reconnect too aggressively +- actor shutdown can race in-flight actions and clear `c.db` underneath them + +The goal is not just to make individual endpoints faster. The goal is to move Foundry to a model where: + +- request paths only validate, create minimal state, and enqueue background work +- list views read actor-owned projections instead of recomputing deep state +- detail views connect directly to the actor that owns the visible state +- polling is replaced by actor events and bounded bootstrap fetches +- actor shutdown drains active work before cleaning up resources + +## Problem Summary + +### App shell + +- `getAppSnapshot` still rebuilds app shell state by reading the app session row and fanning out to every eligible organization actor. +- `RemoteFoundryAppStore` still polls every `500ms` while any org is `syncing`. +- Org sync/import is now off the select path, but the steady-state read path is still snapshot-based instead of subscription-based. + +### Workbench + +- `getWorkbench` still represents a monolithic workspace read that aggregates repo, project, and task state. +- The remote workbench store still responds to every event by pulling a full fresh snapshot. 
+- Some task/workbench detail is still too expensive to compute inline and too broad to refresh after every mutation. + +### Realtime transport + +- `subscribeWorkbench` and related connection helpers keep one connection per shared key, but the client contract still treats the socket as an invalidation channel for a later snapshot pull. +- Reconnect/error handling is weak, so connection churn amplifies backend load instead of settling into long-lived subscriptions. + +### Runtime + +- RivetKit currently lets shutdown proceed far enough to clean up actor resources while actions can still be in flight or still be routed to the actor. +- That creates the `Database not enabled` / missing `c.db` failure mode under stop/replay pressure. + +## Target Architecture + +### Request-path rule + +Every request/action should do only one of these: + +1. return actor-owned cached state +2. persist a cheap mutation +3. enqueue or signal background work + +Requests should not block on provider calls, repo sync, sandbox provisioning, transcript enumeration, or deep cross-actor fan-out unless the UI cannot render at all without the result. + +### View-model rule + +- App shell view connects to app/session state and only the org actors visible on screen. +- Workspace/task-list view connects to a workspace-owned summary projection. +- Task detail view connects directly to the selected task actor. +- Sandbox/session detail connects only when the user opens that detail. + +Do not replace one monolith with one connection per row. List screens should still come from actor-owned projections. + +### Runtime rule + +Stopping actors must stop accepting new work and must not clear actor resources until active actions and requests have drained or been cancelled. + +## Workstreams + +### 1. Runtime hardening first + +This is the only workstream that is not Foundry-only. It should start immediately because it is the only direct fix for the `c.db` shutdown race. + +#### Changes + +1. 
Add active action/request accounting in RivetKit actor instances. +2. Mark actors as draining before cleanup starts. +3. Reject or reroute new requests/actions once draining begins. +4. Wait for active actions to finish or abort before `#cleanupDatabase()` runs. +5. Delay clearing `#db` until no active actions remain. +6. Add actor stop logs with: + - actor id + - active action count + - active request count + - drain start/end timestamps + - cleanup start/end timestamps + +#### Acceptance criteria + +- No action can successfully enter user code after actor draining begins. +- `Database not enabled` cannot be produced by an in-flight action after stop has begun. +- Stop logs make it obvious whether shutdown delay is run-handler time, active-action drain time, background promise time, or routing delay. + +### 2. App shell moves from snapshot polling to subscriptions + +The app shell should stop using `/app/snapshot` as the steady-state read model. + +#### Changes + +1. Introduce a small app-shell projection owned by the app workspace actor: + - auth status + - current user summary + - active org id + - visible org ids + - per-org lightweight status summary +2. Add app actor events, for example: + - `appSessionUpdated` + - `activeOrganizationChanged` + - `organizationSyncStatusChanged` +3. Expose connection helpers from the backend client for: + - app actor subscription + - organization actor subscription by id +4. Update `RemoteFoundryAppStore` so it: + - does one bootstrap fetch on first subscribe + - connects to the app actor for ongoing updates + - connects only to the org actors needed for the current view + - disposes org subscriptions when they are no longer visible +5. Remove `scheduleSyncPollingIfNeeded()` and the `500ms` refresh loop. 
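The subscription management in steps 3 and 4 can be sketched as follows. This is an illustrative sketch only: `SubscribeOrg` and `OrgStatus` are stand-in shapes, not the real backend client API.

```typescript
// Stand-in shapes for the org-actor subscription helpers (assumptions,
// not the real backend client interfaces).
type OrgStatus = { orgId: string; syncStatus: "idle" | "syncing" | "error" };
type Unsubscribe = () => void;
type SubscribeOrg = (orgId: string, onStatus: (s: OrgStatus) => void) => Unsubscribe;

class AppShellStore {
  private orgSubs = new Map<string, Unsubscribe>();
  readonly statuses = new Map<string, OrgStatus>();

  constructor(private subscribeOrg: SubscribeOrg) {}

  // Connect only to the org actors needed for the current view, and dispose
  // subscriptions for orgs that are no longer visible. There is no polling
  // loop here: updates arrive through the per-org subscriptions.
  setVisibleOrgs(orgIds: string[]): void {
    for (const [orgId, dispose] of this.orgSubs) {
      if (!orgIds.includes(orgId)) {
        dispose();
        this.orgSubs.delete(orgId);
      }
    }
    for (const orgId of orgIds) {
      if (!this.orgSubs.has(orgId)) {
        const unsub = this.subscribeOrg(orgId, (s) => this.statuses.set(s.orgId, s));
        this.orgSubs.set(orgId, unsub);
      }
    }
  }

  subscriptionCount(): number {
    return this.orgSubs.size;
  }
}
```

The same reference-counted disposal pattern is what the shared subscription manager in workstream 6 would generalize.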
+ +#### Likely files + +- `foundry/packages/backend/src/actors/workspace/app-shell.ts` +- `foundry/packages/client/src/backend-client.ts` +- `foundry/packages/client/src/remote/app-client.ts` +- `foundry/packages/shared/src/app-shell.ts` +- app shell frontend consumers + +#### Acceptance criteria + +- No app shell polling loop remains. +- Selecting an org returns quickly and the UI updates from actor events. +- App shell refresh cost is bounded by visible state, not every eligible organization on every poll. + +### 3. Workspace summary becomes a projection, not a full snapshot + +The task list should read a workspace-owned summary projection instead of calling into every task actor on each refresh. + +#### Changes + +1. Define a durable workspace summary model with only list-screen fields: + - repo summary + - project summary + - task summary + - selected/open task ids + - unread/session status summary + - coarse git/PR state summary +2. Update workspace actor workflows so task/project changes incrementally update this projection. +3. Change `getWorkbench` to return the projection only. +4. Change `workbenchUpdated` from "invalidate and refetch everything" to "here is the updated projection version or changed entity ids". +5. Remove task-actor fan-out from the default list read path. + +#### Likely files + +- `foundry/packages/backend/src/actors/workspace/actions.ts` +- `foundry/packages/backend/src/actors/project/actions.ts` +- `foundry/packages/backend/src/actors/task/index.ts` +- `foundry/packages/backend/src/actors/task/workbench.ts` +- task/workspace DB schema and migrations +- `foundry/packages/client/src/remote/workbench-client.ts` + +#### Acceptance criteria + +- Workbench list refresh does not call every task actor. +- A websocket event does not force a full cross-actor rebuild. +- Initial task-list load time scales roughly with workspace summary size, not repo count times task count times detail reads. + +### 4. 
Task detail moves to direct actor reads and events + +Heavy task detail should move out of the workspace summary and into the selected task actor. + +#### Changes + +1. Split task detail into focused reads/subscriptions: + - task header/meta + - tabs/session summary + - transcript stream + - diff/file tree + - sandbox process state +2. Open a task actor connection only for the selected task. +3. Open sandbox/session subscriptions only for the active tab/pane. +4. Dispose those subscriptions when the user changes selection. +5. Keep expensive derived state cached in actor-owned tables and update it from background jobs or event ingestion. + +#### Acceptance criteria + +- Opening the task list does not open connections to every task actor. +- Opening a task shows staged loading for heavy panes instead of blocking the whole workbench snapshot. +- Transcript, diff, and file-tree reads are not recomputed for unrelated tasks. + +### 5. Finish moving long-running mutations to background workflows + +This extends and completes the existing async-action briefs in this folder. + +#### Existing briefs to implement under this workstream + +1. `01-task-creation-bootstrap-only.md` +2. `02-repo-overview-from-cached-projection.md` +3. `03-repo-actions-via-background-workflow.md` +4. `04-workbench-session-creation-without-inline-provisioning.md` +5. `05-workbench-snapshot-from-derived-state.md` +6. `06-daytona-provisioning-staged-background-flow.md` + +#### Additional rule + +Every workflow-backed mutation should leave behind durable status rows or events that realtime clients can observe without polling. + +### 6. Subscription lifecycle and reconnect behavior need one shared model + +The current client-side connection pattern is too ad hoc. It needs a single lifecycle policy so sockets are long-lived and bounded. + +#### Changes + +1. 
Create one shared subscription manager in the client for: + - reference counting + - connection reuse + - reconnect backoff + - connection state events + - clean disposal +2. Make invalidation optional. Prefer payload-bearing events or projection version updates. +3. Add structured logs/metrics in the client for: + - connection created/disposed + - reconnect attempts + - subscription count per actor key + - refresh triggered by event vs bootstrap vs mutation +4. Stop calling full `refresh()` after every mutation when the mutation result or follow-up event already contains enough state to update locally. + +#### Acceptance criteria + +- Idle screens maintain stable websocket counts. +- Transient socket failures do not create refresh storms. +- The client can explain why any given refresh happened. + +### 7. Clean up HTTP surface after realtime migration + +Do not delete bootstrap endpoints first. Shrink them after the subscription model is working. + +#### Changes + +1. Keep one-shot bootstrap/read endpoints only where they still add value: + - initial app load + - initial workbench load + - deep-link fallback +2. Remove or de-emphasize monolithic snapshot endpoints for steady-state use. +3. Keep HTTP for control-plane and external integrations. + +#### Acceptance criteria + +- Main interactive screens do not depend on polling. +- Snapshot endpoints are bootstrap/fallback paths, not the primary UI contract. + +## Suggested Implementation Order + +1. Runtime hardening in RivetKit +2. `01-task-creation-bootstrap-only.md` +3. `03-repo-actions-via-background-workflow.md` +4. `06-daytona-provisioning-staged-background-flow.md` +5. App shell realtime subscription model +6. `02-repo-overview-from-cached-projection.md` +7. Workspace summary projection +8. `04-workbench-session-creation-without-inline-provisioning.md` +9. `05-workbench-snapshot-from-derived-state.md` +10. Task-detail direct actor reads/subscriptions +11. Client subscription lifecycle cleanup +12. 
`07-auth-identity-simplification.md` + +## Why This Order + +- Runtime hardening removes the most dangerous correctness bug before more UI load shifts onto actor connections. +- The first async workflow items reduce the biggest user-visible stalls quickly. +- App shell realtime is smaller and lower-risk than the workbench migration, and it removes the current polling loop. +- Workspace summary and task-detail split should happen after the async workflow moves so the projection model does not encode old synchronous assumptions. +- Auth simplification is valuable but not required to remove the current refresh/polling/runtime problems. + +## Observability Requirements + +Before or alongside implementation, add metrics/logs for: + +- app snapshot bootstrap duration +- workbench bootstrap duration +- actor connection count by actor type and view +- reconnect count by actor key +- projection rebuild/update duration +- workflow queue latency +- actor drain duration and active-action counts during stop + +Each log line should include a request id or actor/event correlation id where possible. + +## Rollout Strategy + +1. Ship runtime hardening and observability first. +2. Ship app-shell realtime behind a client flag while keeping snapshot bootstrap. +3. Ship workspace summary projection behind a separate flag. +4. Migrate one heavy detail pane at a time off the monolithic workbench payload. +5. Remove polling once the matching event path is proven stable. +6. Only then remove or demote the old snapshot-heavy steady-state flows. 
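Since runtime hardening ships first in this order, it is worth pinning down the drain semantics from workstream 1 concretely before rollout begins. The gate below is an illustrative sketch, not RivetKit's actual internals: it counts active actions, rejects new ones once draining begins, and defers cleanup (the step that would clear `#db`) until the active count reaches zero.

```typescript
// Illustrative drain gate; RivetKit's real shutdown path will differ.
class DrainGate {
  private active = 0;
  private draining = false;
  private idleResolvers: Array<() => void> = [];

  // Called at action entry. Throws once draining has begun, so no action
  // can enter user code after stop starts.
  beginAction(): void {
    if (this.draining) throw new Error("actor is draining");
    this.active += 1;
  }

  // Called when an action settles; wakes the drain waiter at zero.
  endAction(): void {
    this.active -= 1;
    if (this.active === 0) {
      for (const resolve of this.idleResolvers.splice(0)) resolve();
    }
  }

  // Mark the actor as draining, wait for in-flight actions to finish,
  // and only then run cleanup (e.g. `#cleanupDatabase()`).
  async drainThen(cleanup: () => void): Promise<void> {
    this.draining = true;
    if (this.active > 0) {
      await new Promise<void>((resolve) => this.idleResolvers.push(resolve));
    }
    cleanup();
  }
}
```

The drain start/end timestamps and active-action counts called out under Observability Requirements would be logged around `drainThen`.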
+ +## Done Means + +This initiative is done when all of the following are true: + +- no user-visible screen depends on `500ms` polling +- no list view recomputes deep task/session/diff state inline on every refresh +- long-running repo/provider/sandbox work always runs in durable background workflows +- the client connects only to actors relevant to the current view and disposes them when the view changes +- websocket counts stay stable on idle screens +- actor shutdown cannot invalidate `c.db` underneath active actions diff --git a/foundry/research/specs/async-action-fixes/01-task-creation-bootstrap-only.md b/foundry/research/specs/async-action-fixes/01-task-creation-bootstrap-only.md index c8de3ba..2aa9f50 100644 --- a/foundry/research/specs/async-action-fixes/01-task-creation-bootstrap-only.md +++ b/foundry/research/specs/async-action-fixes/01-task-creation-bootstrap-only.md @@ -1,11 +1,22 @@ # Task Creation Should Return After Actor Bootstrap +Read `00-end-to-end-async-realtime-plan.md` first for the governing migration order, runtime constraints, and realtime client model this brief assumes. + ## Problem Task creation currently waits for full provisioning: naming, repo checks, sandbox creation/resume, sandbox-agent install/start, sandbox-instance wiring, and session creation. That makes a user-facing action depend on queue-backed and provider-backed work that can take minutes. The client only needs the task actor to exist so it can navigate to the task and observe progress. 
+## Current Code Context + +- Workspace entry point: `foundry/packages/backend/src/actors/workspace/actions.ts` +- Project task creation path: `foundry/packages/backend/src/actors/project/actions.ts` +- Task action surface: `foundry/packages/backend/src/actors/task/index.ts` +- Task workflow: `foundry/packages/backend/src/actors/task/workflow/index.ts` +- Task init/provision steps: `foundry/packages/backend/src/actors/task/workflow/init.ts` +- Provider-backed long steps currently happen inside the task provision workflow. + ## Target Contract - `createTask` returns once the task actor exists and initial task metadata is persisted. @@ -38,6 +49,16 @@ That makes a user-facing action depend on queue-backed and provider-backed work - `running` - `error` +## Files Likely To Change + +- `foundry/packages/backend/src/actors/workspace/actions.ts` +- `foundry/packages/backend/src/actors/project/actions.ts` +- `foundry/packages/backend/src/actors/task/index.ts` +- `foundry/packages/backend/src/actors/task/workflow/index.ts` +- `foundry/packages/backend/src/actors/task/workflow/init.ts` +- `foundry/packages/frontend/src/components/workspace-dashboard.tsx` +- `foundry/packages/client/src/remote/workbench-client.ts` + ## Client Impact - Task creation UI should navigate immediately to the task page. @@ -49,3 +70,9 @@ That makes a user-facing action depend on queue-backed and provider-backed work - Creating a task never waits on sandbox creation or session creation. - A timeout in provider setup does not make the original create request fail after several minutes. - After a backend restart, the task workflow can resume provisioning from durable state without requiring the client to retry create. + +## Implementation Notes + +- Preserve the existing task actor as the single writer for task runtime state. +- Do not introduce a second creator path for task actors; keep one create/bootstrap path and one background provision path. 
+- Fresh-agent check: verify that `createWorkbenchTask` and any dashboard create flow still have enough data to navigate immediately after this change. diff --git a/foundry/research/specs/async-action-fixes/02-repo-overview-from-cached-projection.md b/foundry/research/specs/async-action-fixes/02-repo-overview-from-cached-projection.md index 0c9be29..27afad5 100644 --- a/foundry/research/specs/async-action-fixes/02-repo-overview-from-cached-projection.md +++ b/foundry/research/specs/async-action-fixes/02-repo-overview-from-cached-projection.md @@ -1,5 +1,7 @@ # Repo Overview Should Read Cached State Only +Read `00-end-to-end-async-realtime-plan.md` first for the governing migration order, runtime constraints, and realtime client model this brief assumes. + ## Problem Repo overview currently forces PR sync and branch sync inline before returning data. That turns a read path into: @@ -11,6 +13,14 @@ Repo overview currently forces PR sync and branch sync inline before returning d The frontend polls repo overview repeatedly, so this design multiplies slow work and ties normal browsing to sync latency. +## Current Code Context + +- Workspace overview entry point: `foundry/packages/backend/src/actors/workspace/actions.ts` +- Project overview implementation: `foundry/packages/backend/src/actors/project/actions.ts` +- Branch sync poller: `foundry/packages/backend/src/actors/project-branch-sync/index.ts` +- PR sync poller: `foundry/packages/backend/src/actors/project-pr-sync/index.ts` +- Repo overview client polling: `foundry/packages/frontend/src/components/workspace-dashboard.tsx` + ## Target Contract - `getRepoOverview` returns the latest cached repo projection immediately. @@ -32,6 +42,16 @@ The frontend polls repo overview repeatedly, so this design multiplies slow work - returns immediately 5. Keep `getRepoOverview` as a pure read over project SQLite state. 
+## Files Likely To Change + +- `foundry/packages/backend/src/actors/workspace/actions.ts` +- `foundry/packages/backend/src/actors/project/actions.ts` +- `foundry/packages/backend/src/actors/project/db/schema.ts` +- `foundry/packages/backend/src/actors/project/db/migrations.ts` +- `foundry/packages/backend/src/actors/project-branch-sync/index.ts` +- `foundry/packages/backend/src/actors/project-pr-sync/index.ts` +- `foundry/packages/frontend/src/components/workspace-dashboard.tsx` + ## Client Impact - The repo overview screen should render cached rows immediately. @@ -43,3 +63,9 @@ The frontend polls repo overview repeatedly, so this design multiplies slow work - `getRepoOverview` does not call `force()` on polling actors. - Opening the repo overview page does not trigger network/git work inline. - Slow branch sync or PR sync no longer blocks the page request. + +## Implementation Notes + +- Favor adding explicit freshness metadata over implicit timing assumptions in the frontend. +- The overview query should remain safe to call frequently even if the UI still polls during the transition. +- Fresh-agent check: confirm no other read paths call `forceProjectSync()` inline after this change. diff --git a/foundry/research/specs/async-action-fixes/03-repo-actions-via-background-workflow.md b/foundry/research/specs/async-action-fixes/03-repo-actions-via-background-workflow.md index ac6a7c3..2c1738c 100644 --- a/foundry/research/specs/async-action-fixes/03-repo-actions-via-background-workflow.md +++ b/foundry/research/specs/async-action-fixes/03-repo-actions-via-background-workflow.md @@ -1,11 +1,20 @@ # Repo Sync And Stack Actions Should Run In Background Workflows +Read `00-end-to-end-async-realtime-plan.md` first for the governing migration order, runtime constraints, and realtime client model this brief assumes. + ## Problem Repo stack actions currently run inside a synchronous action and surround the action with forced sync before and after. 
Branch-backed task creation also forces repo sync inline before it can proceed. These flows depend on repo/network state and can take minutes. They should not hold an action open. +## Current Code Context + +- Workspace repo action entry point: `foundry/packages/backend/src/actors/workspace/actions.ts` +- Project repo action implementation: `foundry/packages/backend/src/actors/project/actions.ts` +- Branch/task index state lives in the project actor SQLite DB. +- Current forced sync uses the PR and branch polling actors before and after the action. + ## Target Contract - Repo-affecting actions are accepted quickly and run in the background. @@ -38,6 +47,15 @@ These flows depend on repo/network state and can take minutes. They should not h - use the cached branch projection if present - if branch data is stale or missing, enqueue branch registration/refresh work and surface pending state instead of blocking create +## Files Likely To Change + +- `foundry/packages/backend/src/actors/workspace/actions.ts` +- `foundry/packages/backend/src/actors/project/actions.ts` +- `foundry/packages/backend/src/actors/project/db/schema.ts` +- `foundry/packages/backend/src/actors/project/db/migrations.ts` +- `foundry/packages/frontend/src/components/workspace-dashboard.tsx` +- Any shared types in `foundry/packages/shared/src` + ## Client Impact - Repo action buttons should show queued/running/completed/error job state. @@ -48,3 +66,9 @@ These flows depend on repo/network state and can take minutes. They should not h - No repo stack action waits for full git-spice execution inside the request. - No action forces branch sync or PR sync inline. - Action result state survives retries and backend restarts because the workflow status is persisted. + +## Implementation Notes + +- Keep validation cheap in the request path; expensive repo inspection belongs in the workflow. 
+- If job rows are added, decide whether they are project-owned only or also mirrored into history events for UI consumption. +- Fresh-agent check: branch-backed task creation and explicit repo stack actions should use the same background job/status vocabulary where possible. diff --git a/foundry/research/specs/async-action-fixes/04-workbench-session-creation-without-inline-provisioning.md b/foundry/research/specs/async-action-fixes/04-workbench-session-creation-without-inline-provisioning.md index 1319be7..9221780 100644 --- a/foundry/research/specs/async-action-fixes/04-workbench-session-creation-without-inline-provisioning.md +++ b/foundry/research/specs/async-action-fixes/04-workbench-session-creation-without-inline-provisioning.md @@ -1,9 +1,19 @@ # Workbench Session Creation Must Not Trigger Inline Provisioning +Read `00-end-to-end-async-realtime-plan.md` first for the governing migration order, runtime constraints, and realtime client model this brief assumes. + ## Problem Creating a workbench tab currently provisions the whole task if no active sandbox exists. A user action that looks like "open tab" can therefore block on sandbox creation and agent setup. +## Current Code Context + +- Workspace workbench action entry point: `foundry/packages/backend/src/actors/workspace/actions.ts` +- Task workbench behavior: `foundry/packages/backend/src/actors/task/workbench.ts` +- Task provision action: `foundry/packages/backend/src/actors/task/index.ts` +- Sandbox session creation path: `foundry/packages/backend/src/actors/sandbox-instance/index.ts` +- Remote workbench refresh behavior: `foundry/packages/client/src/remote/workbench-client.ts` + ## Target Contract - Creating a tab returns quickly. @@ -24,6 +34,16 @@ Creating a workbench tab currently provisions the whole task if no active sandbo - `error` 4. When provisioning or session creation finishes, update the placeholder row with the real sandbox/session identifiers and notify the workbench. 
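One way the placeholder lifecycle above could look. The statuses echo this brief; the row shape and function names are illustrative, not the real task schema:

```typescript
// Illustrative placeholder-tab model; field names are assumptions.
type TabStatus = "pending" | "provisioning" | "starting" | "ready" | "error";

interface WorkbenchTab {
  tabId: string;       // durable synthetic id, stable across the transition
  status: TabStatus;
  sandboxId?: string;  // filled in once provisioning finishes
  sessionId?: string;  // filled in once the real session exists
}

// Request path: persist a cheap pending row and return it immediately.
function createWorkbenchTab(tabId: string): WorkbenchTab {
  return { tabId, status: "pending" };
}

// Background path: update the same row in place with the real identifiers
// once provisioning and session creation complete, then notify the workbench.
function completeTab(tab: WorkbenchTab, sandboxId: string, sessionId: string): WorkbenchTab {
  if (tab.status === "error") throw new Error("cannot complete a failed tab");
  return { ...tab, status: "ready", sandboxId, sessionId };
}
```

This sketch assumes the "pending row updated in place" answer to the placeholder-identity question raised in the Implementation Notes; a durable-synthetic-id design would keep the same `tabId` contract.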
+## Files Likely To Change + +- `foundry/packages/backend/src/actors/workspace/actions.ts` +- `foundry/packages/backend/src/actors/task/workbench.ts` +- `foundry/packages/backend/src/actors/task/index.ts` +- `foundry/packages/backend/src/actors/task/db/schema.ts` +- `foundry/packages/backend/src/actors/task/db/migrations.ts` +- `foundry/packages/client/src/remote/workbench-client.ts` +- `foundry/packages/frontend/src/components/mock-layout.tsx` + ## Client Impact - The workbench can show a disabled composer or "Preparing environment" state for a pending tab. @@ -34,3 +54,9 @@ Creating a workbench tab currently provisions the whole task if no active sandbo - `createWorkbenchSession` never calls task provisioning inline. - Opening a tab on an unprovisioned task returns promptly with a placeholder tab id. - The tab transitions to ready through background updates only. + +## Implementation Notes + +- The main design choice here is placeholder identity. Decide early whether placeholder tab ids are durable synthetic ids or whether a pending row can be updated in place once a real session exists. +- Avoid coupling this design to Daytona specifically; it should work for local and remote providers. +- Fresh-agent check: confirm composer, unread state, and tab close behavior all handle pending/error tabs cleanly. diff --git a/foundry/research/specs/async-action-fixes/05-workbench-snapshot-from-derived-state.md b/foundry/research/specs/async-action-fixes/05-workbench-snapshot-from-derived-state.md index af8734f..55401a7 100644 --- a/foundry/research/specs/async-action-fixes/05-workbench-snapshot-from-derived-state.md +++ b/foundry/research/specs/async-action-fixes/05-workbench-snapshot-from-derived-state.md @@ -1,5 +1,7 @@ # Workbench Snapshots Should Read Derived State, Not Recompute It +Read `00-end-to-end-async-realtime-plan.md` first for the governing migration order, runtime constraints, and realtime client model this brief assumes. 
+ ## Problem Workbench snapshot reads currently execute expensive sandbox commands and transcript reads inline: @@ -13,6 +15,14 @@ Workbench snapshot reads currently execute expensive sandbox commands and transc The remote workbench client refreshes after each action and on update events, so this synchronous snapshot work is amplified. +## Current Code Context + +- Workspace workbench snapshot builder: `foundry/packages/backend/src/actors/workspace/actions.ts` +- Task workbench snapshot builder: `foundry/packages/backend/src/actors/task/workbench.ts` +- Sandbox session event persistence: `foundry/packages/backend/src/actors/sandbox-instance/persist.ts` +- Remote workbench client refresh loop: `foundry/packages/client/src/remote/workbench-client.ts` +- Mock layout consumer: `foundry/packages/frontend/src/components/mock-layout.tsx` + ## Target Contract - `getWorkbench` reads a cached projection only. @@ -31,6 +41,16 @@ The remote workbench client refreshes after each action and on update events, so 4. Keep `getWorkbench` limited to summary fields needed for the main screen. 5. Update the remote workbench client to rely more on push updates and less on immediate full refresh after every mutation. +## Files Likely To Change + +- `foundry/packages/backend/src/actors/workspace/actions.ts` +- `foundry/packages/backend/src/actors/task/workbench.ts` +- `foundry/packages/backend/src/actors/task/db/schema.ts` +- `foundry/packages/backend/src/actors/task/db/migrations.ts` +- `foundry/packages/client/src/remote/workbench-client.ts` +- `foundry/packages/shared/src` +- `foundry/packages/frontend/src/components/mock-layout.tsx` + ## Client Impact - Main workbench loads faster and remains responsive with many tasks/files/sessions. @@ -41,3 +61,9 @@ The remote workbench client refreshes after each action and on update events, so - `getWorkbench` does not run per-file diff commands inline. - `getWorkbench` does not read full transcripts for every tab inline. 
- Full workbench refresh cost stays roughly proportional to task count, not task count times changed files times sessions. + +## Implementation Notes + +- This is the broadest UI-facing refactor in the set. +- Prefer introducing lighter cached summary fields first, then moving heavy detail into separate reads. +- Fresh-agent check: define the final snapshot contract before changing frontend consumers, otherwise the refactor will sprawl. diff --git a/foundry/research/specs/async-action-fixes/06-daytona-provisioning-staged-background-flow.md b/foundry/research/specs/async-action-fixes/06-daytona-provisioning-staged-background-flow.md index 028e4fd..287d9fa 100644 --- a/foundry/research/specs/async-action-fixes/06-daytona-provisioning-staged-background-flow.md +++ b/foundry/research/specs/async-action-fixes/06-daytona-provisioning-staged-background-flow.md @@ -1,5 +1,7 @@ # Daytona Provisioning Should Be A Staged Background Flow +Read `00-end-to-end-async-realtime-plan.md` first for the governing migration order, runtime constraints, and realtime client model this brief assumes. + ## Problem Daytona provisioning currently performs long-running setup inline: @@ -14,6 +16,14 @@ Daytona provisioning currently performs long-running setup inline: This is acceptable inside a durable background workflow, but not as part of a user-facing action response. +## Current Code Context + +- Daytona provider implementation: `foundry/packages/backend/src/providers/daytona/index.ts` +- Task provisioning workflow: `foundry/packages/backend/src/actors/task/workflow/index.ts` +- Task init activities: `foundry/packages/backend/src/actors/task/workflow/init.ts` +- Sandbox-instance actor: `foundry/packages/backend/src/actors/sandbox-instance/index.ts` +- Provider registry/runtime context: `foundry/packages/backend/src/providers/index.ts` and `foundry/packages/backend/src/actors/context.ts` + ## Target Contract - Requests that need Daytona resources only wait for persisted actor/job creation. 
@@ -39,6 +49,16 @@ This is acceptable inside a durable background workflow, but not as part of a us - follow-up workflow progression once the prior stage completes 5. If sandbox-agent session creation is also slow, treat that as its own stage instead of folding it into request completion. +## Files Likely To Change + +- `foundry/packages/backend/src/providers/daytona/index.ts` +- `foundry/packages/backend/src/actors/task/workflow/index.ts` +- `foundry/packages/backend/src/actors/task/workflow/init.ts` +- `foundry/packages/backend/src/actors/task/db/schema.ts` +- `foundry/packages/backend/src/actors/task/db/migrations.ts` +- `foundry/packages/backend/src/actors/sandbox-instance/index.ts` +- Potentially shared provider types in `foundry/packages/backend/src/providers/provider-api/index.ts` + ## Client Impact - Users see staged progress instead of a long spinner. @@ -49,3 +69,9 @@ This is acceptable inside a durable background workflow, but not as part of a us - No user-facing request waits for Daytona package installs, repo clone, sandbox-agent installation, or health polling. - Progress survives backend restarts because the stage is persisted. - The system can resume from the last completed stage instead of replaying the whole provisioning path blindly. + +## Implementation Notes + +- If this is implemented after item 1, much of the user-facing pain disappears immediately; this item then becomes about reliability and clearer progress reporting. +- Keep the stage model provider-agnostic where possible so local and future providers can share the same task runtime semantics. +- Fresh-agent check: decide whether stage ownership lives on the task actor, sandbox-instance actor, or both before changing schema. 
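The resume behavior in the acceptance criteria can be sketched with an ordered, persisted stage list. Stage names here are illustrative, not a final schema:

```typescript
// Illustrative provider-agnostic stage model; names are assumptions.
const STAGES = [
  "created",        // persisted actor/job row exists; request already returned
  "sandboxReady",   // Daytona sandbox created or resumed
  "repoCloned",     // repo clone finished inside the sandbox
  "agentInstalled", // sandbox-agent installed and healthy
  "sessionReady",   // sandbox-agent session created
] as const;
type Stage = (typeof STAGES)[number];

// Given the last durably persisted stage, return the stages that still need
// to run, so the workflow resumes instead of replaying the whole path.
function remainingStages(lastCompleted: Stage | null): Stage[] {
  const idx = lastCompleted === null ? -1 : STAGES.indexOf(lastCompleted);
  return STAGES.slice(idx + 1);
}
```

Persisting only the last completed stage keeps the model cheap while still giving the staged progress reporting this brief asks for.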
diff --git a/foundry/research/specs/async-action-fixes/07-auth-identity-simplification.md b/foundry/research/specs/async-action-fixes/07-auth-identity-simplification.md new file mode 100644 index 0000000..50f3b56 --- /dev/null +++ b/foundry/research/specs/async-action-fixes/07-auth-identity-simplification.md @@ -0,0 +1,328 @@ +# Auth & Identity Simplification: Adopt BetterAuth + Extract User Model + +Read `00-end-to-end-async-realtime-plan.md` first for the governing migration order, runtime constraints, and realtime client model this brief assumes. + +## Problem + +Authentication and user identity are conflated into a single `appSessions` table that serves as the session store, user record, OAuth credential store, navigation state, and onboarding tracker simultaneously. There is no canonical user record — identity fields are denormalized into every session row. BetterAuth env vars exist (`BETTER_AUTH_URL`, `BETTER_AUTH_SECRET`) but the library is not used; all OAuth and session handling is hand-rolled. + +### Specific issues + +1. **No user table.** The same GitHub user in two browsers yields two independent copies of identity fields with no shared record. Org membership, onboarding state, and role are per-session instead of per-user. +2. **Unsigned session tokens.** Session IDs are plain UUIDs in `localStorage`, sent via the `x-foundry-session` header. The backend trusts them at face value — no signature verification. +3. **Unstable user IDs.** User ID is `user-${slugify(viewer.login)}`, which breaks on GitHub username renames. GitHub numeric `id` is available from the API but not used as the stable key. +4. **Dead BetterAuth references.** `BETTER_AUTH_URL` is used as a URL alias in `app-shell-runtime.ts:65`. `BETTER_AUTH_SECRET` is documented but never read. This creates confusion about what auth system is actually in use. +5.
**Overloaded session row.** `appSessions` has 15+ columns mixing auth credentials, user identity, org navigation, onboarding state, and transient OAuth flow state. + +## Current Code Context + +- Custom OAuth flow: `foundry/packages/backend/src/services/app-github.ts` (`buildAuthorizeUrl`, `exchangeCode`, `getViewer`) +- Session + identity management: `foundry/packages/backend/src/actors/workspace/app-shell.ts` (`ensureAppSession`, `updateAppSession`, `initGithubSession`, `syncGithubOrganizations`) +- Session schema: `foundry/packages/backend/src/actors/workspace/db/schema.ts` (`appSessions` table) +- Shared types: `foundry/packages/shared/src/app-shell.ts` (`FoundryUser`, `FoundryAppSnapshot`) +- HTTP routes: `foundry/packages/backend/src/index.ts` (`resolveSessionId`, `/v1/auth/github/*`, all `/v1/app/*` routes) +- Frontend session persistence: `foundry/packages/client/src/backend-client.ts` (`persistAppSessionId`, `x-foundry-session` header, `foundrySession` URL param extraction) +- Runtime config: `foundry/packages/backend/src/services/app-shell-runtime.ts` (`BETTER_AUTH_URL` fallback) +- Compose config: `foundry/compose.dev.yaml` (`BETTER_AUTH_URL`, `BETTER_AUTH_SECRET` env vars) +- Self-hosting docs: `docs/deploy/foundry-self-hosting.mdx` (documents both env vars) + +## Target State + +### BetterAuth owns auth plumbing + +- BetterAuth handles GitHub OAuth (authorize URL, code exchange, CSRF state, token storage). +- BetterAuth manages session lifecycle (signed tokens, expiration, revocation). +- BetterAuth creates and maintains `user`, `session`, and `account` tables with proper FKs. +- `BETTER_AUTH_SECRET` is actually used for session signing. +- `BETTER_AUTH_URL` is actually used as the auth callback base URL. + +### Custom actor-routed adapter + +- BetterAuth uses a custom adapter that routes all DB operations through RivetKit actors. +- Each user has their own actor. 
BetterAuth's `user`, `session`, and `account` tables live in the per-user actor's SQLite via `c.db`. +- The adapter resolves which actor to target based on the primary key BetterAuth passes for each operation (user ID, session ID, account ID). +- A lightweight **session index** on the app-shell workspace actor maps session tokens → user actor identity, so inbound requests can be routed to the correct user actor without knowing the user ID upfront. + +### Canonical user record + +- Users are identified by GitHub numeric account ID (immutable across renames). +- BetterAuth's `user` table in the per-user actor is the single source of truth for identity. +- App-specific user fields (`eligibleOrganizationIds`, `starterRepoStatus`, `roleLabel`) live in a `userProfiles` table in the same per-user actor, keyed by user ID, not duplicated per session. + +### Thin sessions + +- Sessions reference a user ID (FK) instead of duplicating identity fields. +- App-specific session state (`activeOrganizationId`) lives in a `sessionState` table in the per-user actor or as BetterAuth session additional fields. +- Transient OAuth flow state (`oauthState`, `oauthStateExpiresAt`) is handled by BetterAuth internally. + +### Snapshot projection unchanged + +- `FoundryAppSnapshot` and `FoundryUser` types remain the same — they're already the right shape. +- The snapshot builder reads from the user actor's BetterAuth tables + `userProfiles` instead of reading everything from `appSessions`. + +## Architecture: Custom Actor-Routed BetterAuth Adapter + +### Why a custom adapter + +BetterAuth expects a single database. Foundry uses per-actor SQLite — each actor instance gets its own `c.db`. Users each have their own actor, so BetterAuth's `user`, `session`, and `account` records must live inside the correct user actor's database. The adapter must route each BetterAuth DB operation to the right actor based on the primary key. 
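As a small illustration of routing by primary key, the user-actor key can be derived directly from the GitHub numeric account ID. The helper name is hypothetical; the key shape follows this brief's `["ws", workspaceId, "user", githubNumericId]` convention.

```typescript
// Illustrative only: derive the user actor key from the GitHub numeric
// account ID, which is stable across username renames.
function userActorKey(workspaceId: string, githubNumericId: number): string[] {
  if (!Number.isInteger(githubNumericId) || githubNumericId <= 0) {
    throw new Error(`invalid GitHub account id: ${githubNumericId}`);
  }
  return ["ws", workspaceId, "user", String(githubNumericId)];
}
```

Operations that already carry a user ID can route with this derivation alone; only token- and email-shaped lookups need an index hop first.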
+ +### Routing challenge: session → user actor + +When an HTTP request arrives, the backend has a session token but doesn't know the user ID yet. BetterAuth calls adapter methods like `findSession(sessionId)` to resolve this. But which actor holds that session row? + +**Solution: session index on the app-shell workspace actor.** + +The app-shell workspace actor (which already handles auth routing) maintains a lightweight index table: + +``` +sessionIndex +├── sessionId (text, PK) +├── userActorKey (text) — actor key for the user actor that owns this session +├── createdAt (integer) +``` + +The adapter flow for session lookup: +1. BetterAuth calls `findSession(sessionId)`. +2. Adapter queries `sessionIndex` on the workspace actor to resolve `userActorKey`. +3. Adapter gets the user actor handle and queries BetterAuth's `session` table in that actor's `c.db`. + +The adapter flow for user creation (OAuth callback): +1. BetterAuth calls `createUser(userData)`. +2. Adapter resolves the GitHub numeric ID from the user data. +3. Adapter creates/gets the user actor keyed by GitHub ID. +4. Adapter inserts into BetterAuth's `user` table in that actor's `c.db`. +5. When `createSession` follows, adapter writes to the user actor's `session` table AND inserts into the workspace actor's `sessionIndex`. + +### User actor shape + +```text +UserActor (key: ["ws", workspaceId, "user", githubNumericId]) +├── BetterAuth tables: user, session, account (managed by BetterAuth schema) +├── userProfiles (app-specific: eligibleOrganizationIds, starterRepoStatus, roleLabel) +└── sessionState (app-specific: activeOrganizationId per session) +``` + +### BetterAuth adapter interface (concrete) + +BetterAuth uses `createAdapterFactory` from `"better-auth/adapters"`. The adapter is **model-based, not entity-based** — it receives a `model` string (`"user"`, `"session"`, `"account"`, `"verification"`) and generic CRUD parameters. All methods are **async** and return Promises. 
The adapter can do arbitrary async work including actor handle resolution and cross-actor messages.
+
+```typescript
+// Adapter methods (all async, all receive model name + generic params):
+create: ({ model, data, select? }) => Promise<Row>
+findOne: ({ model, where, select?, join? }) => Promise<Row | null>
+findMany: ({ model, where, limit?, offset?, sortBy?, join? }) => Promise<Row[]>
+update: ({ model, where, update }) => Promise<Row | null>
+updateMany: ({ model, where, update }) => Promise<number>
+delete: ({ model, where }) => Promise<void>
+deleteMany: ({ model, where }) => Promise<number>
+count: ({ model, where }) => Promise<number>
+```
+
+The `where` clauses use `{ field, value, operator?, connector? }` objects (operators: `eq`, `ne`, `in`, `contains`, etc.).
+
+#### Routing logic inside the adapter
+
+The adapter must inspect `model` and `where` to determine the target actor:
+
+| Model | Routing strategy |
+|-------|-----------------|
+| `user` (by id) | User actor key derived directly from user ID |
+| `user` (by email) | `emailIndex` on workspace actor → user actor key |
+| `session` (by token) | `sessionIndex` on workspace actor → user actor key |
+| `session` (by id) | `sessionIndex` on workspace actor → user actor key |
+| `session` (by userId) | User actor key derived directly from userId |
+| `account` | Always has `userId` in where or data → user actor key |
+| `verification` | Workspace actor (not user-scoped — used for email verification, password reset) |
+
+On `create` for `session` model: write to user actor's `session` table AND insert into workspace actor's `sessionIndex`.
+On `delete` for `session` model: delete from user actor's `session` table AND remove from workspace actor's `sessionIndex`.
+
+#### Adapter construction
+
+The adapter is instantiated at BetterAuth init time with a closure over the RivetKit registry. It does **not** depend on an ambient actor context — it resolves actor handles on demand via the registry.
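Before the factory sketch, here is a minimal sketch of the routing decision itself as a pure helper. The `Where` shape mirrors the clause format above; the index-lookup callbacks stand in for queries against the workspace actor's `sessionIndex`/`emailIndex` tables, and the returned strings stand in for real actor keys.

```typescript
// Hypothetical helper implementing the routing table above.
type Where = { field: string; value: unknown };
type IndexLookup = (key: string) => Promise<string | undefined>;

async function resolveActorKeyForQuery(
  model: string,
  where: Where[],
  sessionIndex: IndexLookup,
  emailIndex: IndexLookup,
): Promise<string> {
  const get = (field: string) => where.find((w) => w.field === field)?.value;
  switch (model) {
    case "user": {
      const id = get("id");
      if (id !== undefined) return `user:${id}`; // key derived directly from user ID
      const viaEmail = await emailIndex(String(get("email")));
      if (viaEmail !== undefined) return viaEmail;
      throw new Error("unroutable user query");
    }
    case "session": {
      const userId = get("userId");
      if (userId !== undefined) return `user:${userId}`;
      // by token or id: resolve through the workspace actor's session index
      const viaIndex = await sessionIndex(String(get("token") ?? get("id")));
      if (viaIndex !== undefined) return viaIndex;
      throw new Error("session not found in index");
    }
    case "account":
      return `user:${get("userId")}`; // account queries always carry userId
    case "verification":
      return "workspace"; // verification rows live on the workspace actor
    default:
      throw new Error(`unroutable model: ${model}`);
  }
}
```

The real adapter would replace the string keys with actor handles, but the branch structure is the whole routing story: direct derivation when a user ID is present, one index hop otherwise.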
+ +```typescript +import { createAdapterFactory } from "better-auth/adapters"; + +const actorRoutedAdapter = (registry: Registry) => { + return createAdapterFactory({ + config: { + adapterId: "rivetkit-actor", + adapterName: "RivetKit Actor Adapter", + supportsJSON: false, // SQLite — auto-serialize JSON + supportsDates: false, // SQLite — ISO string conversion + supportsBooleans: false, // SQLite — 0/1 conversion + }, + adapter: ({ getModelName, transformInput, transformOutput, transformWhereClause }) => ({ + create: async ({ model, data }) => { + const actorKey = resolveActorKeyForCreate(model, data); + const actor = await registry.get("user", actorKey); + // delegate insert to actor's c.db + // if model === "session", also write sessionIndex + }, + findOne: async ({ model, where }) => { + const actorKey = await resolveActorKeyForQuery(model, where); + // ... + }, + // ... remaining methods + }), + }); +}; +``` + +#### BetterAuth session tokens + +BetterAuth uses **opaque session tokens** stored in the `session` table's `token` column. By default, the token is set as a cookie (`better-auth.session_token`). On every request, BetterAuth looks up the session in the DB by token and checks `expiresAt`. + +**Cookie caching** can be enabled to reduce DB lookups: the session data is signed (HMAC-SHA256) or encrypted (AES-256) and embedded in the cookie. When the cache is fresh (configurable `maxAge`, e.g., 5 minutes), BetterAuth validates the signature locally without hitting the adapter. This **eliminates the hot-path actor lookup for most requests** — the adapter is only called when the cache expires or on write operations. 
+ +```typescript +session: { + cookieCache: { + enabled: true, + maxAge: 5 * 60, // 5 minutes — most requests skip the adapter entirely + strategy: "compact", // HMAC-signed, minimal size + }, +} +``` + +#### BetterAuth core tables + +Four tables, all in the per-user actor's SQLite (except `verification` which goes on workspace actor): + +**`user`**: `id`, `name`, `email`, `emailVerified`, `image`, `createdAt`, `updatedAt` +**`session`**: `id`, `token`, `userId`, `expiresAt`, `ipAddress?`, `userAgent?`, `createdAt`, `updatedAt` +**`account`**: `id`, `userId`, `accountId` (GitHub numeric ID), `providerId` ("github"), `accessToken?`, `refreshToken?`, `scope?`, `createdAt`, `updatedAt` +**`verification`**: `id`, `identifier`, `value`, `expiresAt`, `createdAt`, `updatedAt` + +For `findUserByEmail`, a secondary index (email → user actor key) is needed on the workspace actor alongside `sessionIndex`. + +## Implementation Plan + +### Phase 0: Spike — custom adapter feasibility + +Research confirms: +- BetterAuth adapter methods are **fully async** (`Promise`-based). Arbitrary async work (actor handle resolution, cross-actor messages) is allowed. +- The adapter is instantiated at BetterAuth init time and receives no request context — it's a plain object of async functions. This means the adapter can close over a RivetKit registry reference and resolve actor handles on demand. +- Cookie caching (`cookieCache.enabled: true`) eliminates the adapter hot-path for most read requests — the session is validated from the signed cookie, and the adapter is only called when the cache expires or on writes. + +**Remaining spike work:** + +1. **Prototype the adapter + user actor end-to-end** — wire up `createAdapterFactory` with a minimal actor-routed implementation. Confirm that BetterAuth's GitHub OAuth flow completes successfully with user/session/account records landing in the correct per-user actor's SQLite. +2. 
**Verify `findOne` for session model** — confirm the `where` clause BetterAuth passes for session lookup includes the `token` field (not just `id`), so the adapter can route via `sessionIndex` keyed by token. +3. **Measure cookie-cached vs uncached request latency** — confirm that with cookie caching enabled, the adapter is not called on every request, and that the uncached fallback (workspace actor index → user actor → session table) is acceptable. + +### Phase 1: User actor + adapter infrastructure (no behavior change) + +1. **Install `better-auth` package** in `packages/backend`. +2. **Define `UserActor`** with actor key `["ws", workspaceId, "user", githubNumericId]`. Include BetterAuth's required tables (`user`, `session`, `account`) plus app-specific tables in its schema. +3. **Create `userProfiles` table** in user actor schema: + ``` + userProfiles + ├── userId (text, PK) — GitHub numeric account ID (string form) + ├── githubLogin (text) + ├── roleLabel (text) + ├── eligibleOrganizationIdsJson (text) + ├── starterRepoStatus (text) + ├── starterRepoStarredAt (integer, nullable) + ├── starterRepoSkippedAt (integer, nullable) + ├── createdAt (integer) + ├── updatedAt (integer) + ``` +4. **Create `sessionState` table** in user actor schema: + ``` + sessionState + ├── sessionId (text, PK) — references BetterAuth session ID + ├── activeOrganizationId (text, nullable) + ├── createdAt (integer) + ├── updatedAt (integer) + ``` +5. **Create `sessionIndex` and `emailIndex` tables** on the app-shell workspace actor: + ``` + sessionIndex + ├── sessionId (text, PK) + ├── userActorKey (text) + ├── createdAt (integer) + + emailIndex + ├── email (text, PK) + ├── userActorKey (text) + ├── updatedAt (integer) + ``` +6. **Implement the custom BetterAuth adapter** that routes operations through the index tables and user actors. +7. **Configure BetterAuth** with GitHub OAuth provider using existing `GITHUB_CLIENT_ID`, `GITHUB_CLIENT_SECRET` env vars. 
Wire `BETTER_AUTH_SECRET` for session signing and `BETTER_AUTH_URL` as the auth base URL. +8. **Keep `appSessions` table operational** — no reads/writes change yet. + +### Phase 2: Migrate OAuth flow to BetterAuth + +1. **Replace `startAppGithubAuth`** — delegate to BetterAuth's GitHub OAuth initiation instead of hand-rolling `buildAuthorizeUrl` + `oauthState` + `oauthStateExpiresAt`. +2. **Replace `completeAppGithubAuth`** — delegate to BetterAuth's callback handler. BetterAuth creates/updates the user record in the user actor and creates a signed session. The adapter writes to `sessionIndex` on the workspace actor. +3. **After BetterAuth callback completes**, populate `userProfiles` in the user actor with app-specific fields and enqueue the slow org sync (same background workflow pattern as today). +4. **Replace `signOutApp`** — delegate to BetterAuth session invalidation. Adapter removes entry from `sessionIndex`. +5. **Update `resolveSessionId`** in `index.ts` — validate the session via BetterAuth (which routes through the adapter → `sessionIndex` → user actor). BetterAuth verifies the signature and checks expiration. +6. **Keep `bootstrapAppGithubSession`** (dev-only) — adapt it to create a BetterAuth session from a raw token for local development. + +### Phase 3: Migrate reads to new tables + +1. **Update `getAppSnapshot`** — read user identity from BetterAuth's user table in the user actor, app-specific fields from `userProfiles`, and active org from `sessionState`. +2. **Update `selectOrganization`** — write to `sessionState` in the user actor instead of `appSessions`. +3. **Update `syncGithubOrganizations`** — write `eligibleOrganizationIds` to `userProfiles` in the user actor instead of `appSessions`. This fixes the multi-session divergence bug. +4. **Update onboarding actions** (`skipAppStarterRepo`, `starAppStarterRepo`) — write to `userProfiles` in the user actor instead of `appSessions`. +5. 
**Update `FoundryUser.id`** — use GitHub numeric ID (from BetterAuth's `account.accountId`) instead of `user-${slugify(login)}`.
+
+### Phase 4: Frontend migration
+
+1. **Replace `x-foundry-session` header** with BetterAuth's session mechanism (likely a signed cookie or Authorization header, depending on BetterAuth config).
+2. **Remove `foundrySession` URL param extraction** from `backend-client.ts` — BetterAuth handles post-OAuth session establishment via cookies.
+3. **Remove `localStorage` session persistence** — BetterAuth manages this via HTTP-only cookies.
+4. **Update `signInWithGithub`** — redirect to BetterAuth's auth endpoint instead of `/v1/auth/github/start`.
+
+### Phase 5: Cleanup
+
+1. **Drop `appSessions` table** (migration).
+2. **Remove hand-rolled OAuth functions** from `app-shell.ts`: `ensureAppSession`, `updateAppSession`, `initGithubSession`, `encodeOauthState`, `decodeOauthState`, `requireAppSessionRow`, `requireSignedInSession`.
+3. **Remove `buildAuthorizeUrl` and `exchangeCode`** from `GitHubAppClient` (keep `getViewer`, installation token methods, webhook verification).
+4. **Update `foundry-self-hosting.mdx`** — document `BETTER_AUTH_SECRET` as required for session signing (already documented, now actually true).
+5. **Remove `BETTER_AUTH_URL` fallback** from `app-shell-runtime.ts` — BetterAuth reads it directly.
+
+## Constraints
+
+- **Actor-routed adapter.** BetterAuth does not natively support per-user actor databases. The custom adapter must route every DB operation to the correct actor. This adds a layer of indirection and latency (actor handle resolution + message) on adapter calls.
+- **Session index cost is mitigated by cookie caching.** With `cookieCache` enabled, BetterAuth validates sessions from a signed cookie on most requests — the adapter (and thus the `sessionIndex` lookup + user actor round-trip) is only called when the cache expires or on writes.
Without caching, every authenticated request would hit the workspace actor's `sessionIndex` table then the user actor. +- **Two-actor write on session create/destroy.** Creating or destroying a session requires writing to both the user actor (BetterAuth's `session` table) and the workspace actor (`sessionIndex`). These must be consistent — if the user actor write succeeds but the index write fails, the session exists but is unreachable. +- **Background org sync pattern must be preserved.** The fast-path/slow-path split (`initGithubSession` returns immediately, `syncGithubOrganizations` runs in workflow queue) is critical for avoiding proxy timeout retries. BetterAuth handles the OAuth exchange, but the org sync stays as a background workflow. +- **`GitHubAppClient` is still needed.** BetterAuth replaces the OAuth user-auth flow, but installation tokens, webhook verification, repo listing, and org listing are GitHub App operations that BetterAuth does not cover. +- **User ID migration.** Changing user IDs from `user-${slugify(login)}` to GitHub numeric IDs affects `organizationMembers`, `seatAssignments`, and any cross-actor references to user IDs. Existing data needs a migration path. +- **`findUserByEmail` requires a secondary index.** BetterAuth sometimes looks up users by email (e.g., account linking). An `emailIndex` table on the workspace actor is needed. This must be kept in sync with the user actor's email field. + +## Risk Assessment + +- **Adapter call context — RESOLVED.** Research confirms BetterAuth adapter methods are plain async functions with no request context dependency. The adapter closes over the RivetKit registry at init time and resolves actor handles on demand. No ambient `c` context needed. +- **Hot-path latency — MITIGATED.** Cookie caching (`cookieCache` with `strategy: "compact"`) means most authenticated requests validate the session from a signed cookie without calling the adapter at all. 
The adapter (and thus the actor round-trip) is only hit when the cache expires (configurable, e.g., every 5 minutes) or on writes. This makes the session index + user actor lookup acceptable. +- **Two-actor consistency.** Session create/destroy touches two actors (user actor + workspace index). If either write fails, the system is in an inconsistent state. Recommended: write index first, then user actor. A dangling index entry pointing to a nonexistent session is benign — BetterAuth treats it as "session not found" and the user just re-authenticates. +- **Cookie vs header auth.** BetterAuth defaults to HTTP-only cookies (`better-auth.session_token`). The current system uses a custom `x-foundry-session` header with `localStorage`. BetterAuth supports `bearer` token mode for programmatic clients via its `bearer` plugin. Enable both for browser + API access. +- **Dev bootstrap flow.** `bootstrapAppGithubSession` bypasses the normal OAuth flow for local development. BetterAuth supports programmatic session creation via its internal adapter — the dev path can call the adapter's `create` method directly for the `session` and `account` models. +- **Actor lifecycle for users.** User actors are long-lived but low-traffic. RivetKit will idle/unload them. With cookie caching, cold-start only happens when the cache expires — not on every request. Acceptable. + +## Suggested Implementation Order + +1. **Phase 0 spike** — confirm adapter feasibility (go/no-go gate) +2. Phase 1 (user actor + adapter infrastructure, no behavior change) +3. Phase 2 (OAuth migration) +4. Phase 3 (read path migration) +5. Phase 4 (frontend migration) +6. Phase 5 (cleanup) + +Phases 2-4 can be deployed incrementally. Each phase should leave the system fully functional — no big-bang cutover. + +## Alternative: Fix Without BetterAuth + +If the BetterAuth + actor SQLite spike fails, the same goals can be achieved without BetterAuth: + +1. 
Extract `userProfiles` and `sessionState` tables (same as Phase 1). +2. Sign session tokens with HMAC using `BETTER_AUTH_SECRET` (rename to `SESSION_SECRET`). +3. Use GitHub numeric ID as user PK. +4. Keep the custom OAuth flow but thin it out. +5. Drop `appSessions` once migration is complete. + +This is more code to maintain but avoids the BetterAuth integration risk. diff --git a/foundry/research/specs/async-action-fixes/README.md b/foundry/research/specs/async-action-fixes/README.md new file mode 100644 index 0000000..1dae650 --- /dev/null +++ b/foundry/research/specs/async-action-fixes/README.md @@ -0,0 +1,56 @@ +# Async Action Fixes Handoff + +## Purpose + +This folder contains implementation briefs for removing long-running synchronous waits from Foundry request and action paths. + +Start with `00-end-to-end-async-realtime-plan.md`. It is the umbrella plan for the broader migration away from monolithic snapshots and polling, and it adds the missing runtime hardening and subscription-lifecycle work that the numbered implementation briefs did not previously cover. + +The governing policy now lives in `foundry/CLAUDE.md`: + +- always await `send(...)` +- default to `wait: false` +- only use `wait: true` for short, bounded mutations +- do not force repo/provider sync in read paths +- only block until the minimum client-needed resource exists + +## Shared Context + +- Backend actor entry points live under `foundry/packages/backend/src/actors`. +- Provider-backed long-running work lives under `foundry/packages/backend/src/providers`. +- The main UI consumers are: + - `foundry/packages/frontend/src/components/workspace-dashboard.tsx` + - `foundry/packages/frontend/src/components/mock-layout.tsx` + - `foundry/packages/client/src/remote/workbench-client.ts` +- Existing non-blocking examples already exist in app-shell GitHub auth/import flows. Use those as the reference pattern for request returns plus background completion. + +## Suggested Implementation Order + +1. 
`00-end-to-end-async-realtime-plan.md` +2. `01-task-creation-bootstrap-only.md` +3. `03-repo-actions-via-background-workflow.md` +4. `06-daytona-provisioning-staged-background-flow.md` +5. App shell realtime subscription work from `00-end-to-end-async-realtime-plan.md` +6. `02-repo-overview-from-cached-projection.md` +7. Workspace summary projection work from `00-end-to-end-async-realtime-plan.md` +8. `04-workbench-session-creation-without-inline-provisioning.md` +9. `05-workbench-snapshot-from-derived-state.md` +10. Task-detail direct subscription work from `00-end-to-end-async-realtime-plan.md` +11. `07-auth-identity-simplification.md` + +## Why This Order + +- Runtime hardening and the first async workflow items remove the highest-risk correctness and timeout issues first. +- App shell realtime is a smaller migration than the workbench and removes the current polling loop early. +- Workspace summary and task-detail subscription work are easier once long-running mutations already report durable background state. +- Auth simplification is important, but it should not block the snapshot/polling/runtime fixes. + +## Fresh Agent Checklist + +Before implementing any item: + +1. Read `foundry/CLAUDE.md` runtime and actor rules. +2. Read the specific item doc in this folder. +3. Confirm the current code paths named in that doc still match the repo. +4. Preserve actor single-writer ownership. +5. Prefer workflow status and push updates over synchronous completion.
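Checklist item 5 can be sketched with plain in-memory structures (no RivetKit APIs; all names here are illustrative): the request path records minimal durable status and returns immediately, while background work advances a status that clients observe instead of blocking on completion.

```typescript
// Illustrative enqueue-then-report pattern. In Foundry this status would be
// actor-owned, persisted state pushed to subscribers, not an in-memory map.
type JobStatus = "queued" | "running" | "done";

class BackgroundQueue {
  private statuses = new Map<string, JobStatus>();

  // Request path: persist minimal state and return without waiting.
  enqueue(jobId: string): JobStatus {
    this.statuses.set(jobId, "queued");
    return "queued";
  }

  // Background worker: advance the durable status as stages complete.
  advance(jobId: string, status: JobStatus): void {
    this.statuses.set(jobId, status);
  }

  // Clients read (or subscribe to) status instead of awaiting completion.
  status(jobId: string): JobStatus | undefined {
    return this.statuses.get(jobId);
  }
}
```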