getcompanion-ai/co-mono

Fork 0

mirror of https://github.com/getcompanion-ai/co-mono.git synced 2026-04-15 08:03:39 +00:00

Mario Zechner 2a0f239288 mom: remove dynamic timestamp from system prompt for better cache hits

2025-12-11 12:26:59 +01:00

16 KiB

Raw Blame History

Compaction Research & Redesign

Current Pi Compaction Implementation

Settings (defaults)

reserveTokens: 16384 - Buffer to leave for new responses
keepRecentTokens: 20000 - How many tokens of recent messages to keep

Trigger Conditions

Threshold: After each turn, if contextTokens > contextWindow - reserveTokens
Overflow: If LLM returns context overflow error, compact and retry

Current Process

Find cut point by walking backwards until keepRecentTokens accumulated
Generate single summary of everything before cut point
If cutting mid-turn, also generate "turn prefix summary"
Save CompactionEntry with summary and firstKeptEntryIndex

Current Prompt

You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary for another LLM that will resume the task.

Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences
- Absolute file paths of any relevant files that were read or modified
- What remains to be done (clear next steps)
- Any critical data, examples, or references needed to continue

Be concise, structured, and focused on helping the next LLM seamlessly continue the work.

maxTokens for Summarization

History summary: 0.8 * reserveTokens (≈13K tokens)
Turn prefix summary: 0.5 * reserveTokens (≈8K tokens)

Claude Code's Approach

Key Differences

Much more structured, detailed prompt
Uses <analysis> tags for chain-of-thought before summary
Uses <summary> tags for structured output
9-section format with explicit requirements
Supports custom summarization instructions via user input

Full Prompt (reconstructed from cli.js)

Your task is to create a detailed summary of the conversation so far, paying close attention to the user's explicit requests and your previous actions.
This summary should be thorough in capturing technical details, code patterns, and architectural decisions that would be essential for continuing development work without losing context.

Before providing your final summary, wrap your analysis in <analysis> tags to organize your thoughts and ensure you've covered all necessary points. In your analysis process:

1. Chronologically analyze each message and section of the conversation. For each section thoroughly identify:
   - The user's explicit requests and intents
   - Your approach to addressing the user's requests
   - Key decisions, technical concepts and code patterns
   - Specific details like:
     - file names
     - full code snippets
     - function signatures
     - file edits
   - Errors that you ran into and how you fixed them
   - Pay special attention to specific user feedback that you received, especially if the user told you to do something differently.
2. Double-check for technical accuracy and completeness, addressing each required element thoroughly.

Your summary should include the following sections:

1. Primary Request and Intent: Capture all of the user's explicit requests and intents in detail
2. Key Technical Concepts: List all important technical concepts, technologies, and frameworks discussed.
3. Files and Code Sections: Enumerate specific files and code sections examined, modified, or created. Pay special attention to the most recent messages and include full code snippets where applicable and include a summary of why this file read or edit is important.
4. Errors and fixes: List all errors that you ran into, and how you fixed them. Pay special attention to specific user feedback that you received, especially if the user told you to do something differently.
5. Problem Solving: Document problems solved and any ongoing troubleshooting efforts.
6. All user messages: List ALL user messages that are not tool results. These are critical for understanding the users' feedback and changing intent.
7. Pending Tasks: Outline any pending tasks that you have explicitly been asked to work on.
8. Current Work: Describe in detail precisely what was being worked on immediately before this summary request, paying special attention to the most recent messages from both user and assistant. Include file names and code snippets where applicable.
9. Optional Next Step: List the next step that you will take that is related to the most recent work you were doing. IMPORTANT: ensure that this step is DIRECTLY in line with the user's most recent explicit requests, and the task you were working on immediately before this summary request. If your last task was concluded, then only list next steps if they are explicitly in line with the users request. Do not start on tangential requests or really old requests that were already completed without confirming with the user first.
   If there is a next step, include direct quotes from the most recent conversation showing exactly what task you were working on and where you left off. This should be verbatim to ensure there's no drift in task interpretation.

<example>
<analysis>
[Your thought process, ensuring all points are covered thoroughly and accurately]
</analysis>

<summary>
1. Primary Request and Intent:
   [Detailed description]

2. Key Technical Concepts:
   - [Concept 1]
   - [Concept 2]
   - [...]

3. Files and Code Sections:
   - [File Name 1]
      - [Summary of why this file is important]
      - [Summary of the changes made to this file, if any]
      - [Important Code Snippet]
   - [...]

4. Errors and fixes:
    - [Detailed description of error 1]:
      - [How you fixed the error]
      - [User feedback on the error if any]
    - [...]

5. Problem Solving:
   [Description of solved problems and ongoing troubleshooting]

6. All user messages: 
    - [Detailed non tool use user message]
    - [...]

7. Pending Tasks:
   - [Task 1]
   - [Task 2]
   - [...]

8. Current Work:
   [Precise description of current work]

9. Optional Next Step:
   [Optional Next step to take]

</summary>
</example>

There may be additional summarization instructions provided in the included context. If so, remember to follow these instructions when creating the above summary. Examples of instructions include:
<example>
## Compact Instructions
When summarizing the conversation focus on typescript code changes and also remember the mistakes you made and how you fixed them.
</example>

Additional Features

Supports custom instructions: When you are using compact - please focus on test output and code changes. Include file reads verbatim.
Post-processes to extract <analysis> and <summary> sections
Has "microcompact" for tool results (abbreviated tool outputs)

OpenAI Codex's Approach

Compaction Prompt (`codex-rs/core/templates/compact/prompt.md`)

You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary for another LLM that will resume the task.

Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences
- What remains to be done (clear next steps)
- Any critical data, examples, or references needed to continue

Be concise, structured, and focused on helping the next LLM seamlessly continue the work.

Summary Prefix (`codex-rs/core/templates/compact/summary_prefix.md`)

Another language model started to solve this problem and produced a summary of its thinking process. You also have access to the state of the tools that were used by that language model. Use this to build on the work that has already been done and avoid duplicating work. Here is the summary produced by the other language model, use the information in this summary to assist with your own analysis:

Notes

Very similar to our current prompt (likely we derived from same source)
Supports custom compact_prompt override in config
Has experimental_compact_prompt_file for loading from file

SST OpenCode's Approach

Compaction System Prompt (`session/prompt/compaction.txt`)

You are a helpful AI assistant tasked with summarizing conversations.

When asked to summarize, provide a detailed but concise summary of the conversation. 
Focus on information that would be helpful for continuing the conversation, including:
- What was done
- What is currently being worked on
- Which files are being modified
- What needs to be done next
- Key user requests, constraints, or preferences that should persist
- Important technical decisions and why they were made

Your summary should be comprehensive enough to provide context but concise enough to be quickly understood.

User Message for Compaction

Provide a detailed prompt for continuing our conversation above. Focus on information that would be helpful for continuing the conversation, including what we did, what we're doing, which files we're working on, and what we're going to do next considering new session will not have access to our conversation.

Short Summary Prompt (`session/prompt/summarize.txt`)

Summarize the following conversation into 2 sentences MAX explaining what the assistant did and why
Do not explain the user's input.
Do not speak in the third person about the assistant.

Additional Features

Pruning: Goes backwards through parts, after 40K tokens of tool calls, erases output of older tool calls
Prune thresholds: PRUNE_MINIMUM = 20_000, PRUNE_PROTECT = 40_000
Marks tool outputs as compacted with timestamp to avoid re-pruning

Factory Droid's Approach (from binary strings)

Scratchpad Feature

From extracted strings:

Edit the session scratchpad using multiple operations in a single call. Operations can be str_replace, insert, or overwrite commands and are applied in order. The scratchpad is working memory that persists when conversation history is compacted or summarized.

Summary Guidance

Once you are done with the task, you can summarize the changes you made in a 1-4 sentences, don't go into too much detail.

Compaction Model

Uses external model for summarization with configurable providers (Anthropic, OpenAI, generic chat completion API).

Proposed Slice-Based Compaction

Concept

Instead of summarizing the entire history in one call:

Segment session into slices (possibly overlapping)
Summarize each slice with budget = 1/10th of slice token count
Stitch slice summaries together into unified summary

Benefits

More parallelizable (summarize slices concurrently)
Less risk of losing detail in long sessions
Better "compression ratio" control per slice
Overlapping slices can preserve continuity/context
Can prioritize recent slices with larger budgets

Proposed Algorithm

interface SliceConfig {
  sliceTokens: number;       // Target tokens per slice (e.g., 20K)
  overlapTokens: number;     // Overlap between slices (e.g., 2K)
  compressionRatio: number;  // Summary budget as fraction of slice (e.g., 0.1)
  recentBoost: number;       // Multiplier for most recent slice budget (e.g., 2.0)
}

async function sliceBasedCompaction(
  messages: Message[],
  config: SliceConfig
): Promise<string> {
  // 1. Segment into slices
  const slices = segmentIntoSlices(messages, config.sliceTokens, config.overlapTokens);
  
  // 2. Calculate budget per slice
  const budgets = slices.map((slice, i) => {
    const base = estimateTokens(slice) * config.compressionRatio;
    // Boost recent slices
    const isRecent = i >= slices.length - 2;
    return Math.floor(isRecent ? base * config.recentBoost : base);
  });
  
  // 3. Summarize slices in parallel
  const summaries = await Promise.all(
    slices.map((slice, i) => summarizeSlice(slice, budgets[i], i, slices.length))
  );
  
  // 4. Stitch summaries together
  return stitchSummaries(summaries);
}

Slice Summarization Prompt (per slice)

You are summarizing slice ${sliceIndex + 1} of ${totalSlices} from a coding session.

${sliceIndex === 0 ? 'This is the BEGINNING of the session.' : ''}
${sliceIndex === totalSlices - 1 ? 'This is the MOST RECENT activity.' : ''}

Summarize the key information in this slice:
- User requests and intent changes
- Files read, created, or modified (with paths)
- Key code changes or patterns
- Errors encountered and how they were resolved
- Decisions made and their rationale

${sliceIndex === totalSlices - 1 ? `
For the most recent slice, also include:
- Current work in progress
- Exact state of any pending tasks
- Next steps that were planned
` : ''}

Be precise and technical. Preserve file paths and important code snippets.
Budget: approximately ${budget} tokens.

Stitching Prompt

You have ${summaries.length} chronological slice summaries from a coding session.
Combine them into a single coherent handoff summary for another LLM.

Requirements:
- Preserve chronological flow
- Deduplicate information that appears in overlapping sections  
- Emphasize the most recent work and next steps
- Keep all file paths and critical code snippets
- Total budget: ${totalBudget} tokens

Slice summaries:
${summaries.map((s, i) => `--- Slice ${i + 1} ---\n${s}`).join('\n\n')}

Comparison Table

Feature	Pi (Current)	Claude Code	OpenAI Codex	SST OpenCode
Prompt detail	Basic	Very detailed	Basic	Medium
Structured output	No	Yes ( )	No	No
Chain-of-thought	No	Yes ()	No	No
Custom instructions	Yes	Yes	Yes (config)	No
Tool output pruning	No	Yes (microcompact)	No	Yes
Parallel summarization	No	No	No	No
Scratchpad/persistent memory	No	No	No	No

Test Harness

A CLI test tool is available at test/compaction-strategies.ts to compare strategies:

npx tsx test/compaction-strategies.ts before-compaction
npx tsx test/compaction-strategies.ts large-session

This outputs results to test/compaction-results/[fixture]-[strategy].md and a comparison file.

Implemented Strategies

single-shot: Current approach, one LLM call with full transcript
parallel-stitch: Slice into chunks, summarize in parallel, LLM-merge results
sequential-accumulated: Slice into chunks, summarize each with all previous summaries as context
sequential-rolling: Slice into chunks, each call updates/rewrites the running summary

Example Results (30K token session, 4 slices)

Strategy	Input Tokens	Output Tokens	API Calls	Time (ms)
single-shot	35706	1284	1	31914
parallel-stitch	37850	3087	5	34010
sequential-accumulated	39136	2996	4	66907
sequential-rolling	38873	4557	4	98032

Observations:

single-shot: Fastest, simplest, but entire context in one call
parallel-stitch: Similar wall-clock (parallel), needs extra stitch call
sequential-accumulated: 2x time, but each slice knows full prior context
sequential-rolling: Slowest, most output (rewrites summary each time)

Recommendations

Short Term

Improve prompt: Adopt Claude Code's structured format with sections
Add pruning: Implement tool output pruning like OpenCode (mark old outputs as compacted)
Better token estimation: Use actual tokenizer instead of chars/4 heuristic

Medium Term

Slice-based compaction: Implement parallel slice summarization
Persistent scratchpad: Add working memory that survives compaction
Custom instructions: Support user-provided compaction focus

Long Term

Semantic chunking: Use embeddings to find natural break points
Importance scoring: Weight messages by relevance to current task
Incremental compaction: Compact older portions while keeping recent detailed

16 KiB Raw Blame History

Compaction Research & Redesign

Current Pi Compaction Implementation

Settings (defaults)

Trigger Conditions

Current Process

Current Prompt

maxTokens for Summarization

Claude Code's Approach

Key Differences

Full Prompt (reconstructed from cli.js)

Additional Features

OpenAI Codex's Approach

Compaction Prompt (codex-rs/core/templates/compact/prompt.md)

Summary Prefix (codex-rs/core/templates/compact/summary_prefix.md)

Notes

SST OpenCode's Approach

Compaction System Prompt (session/prompt/compaction.txt)

User Message for Compaction

Short Summary Prompt (session/prompt/summarize.txt)

Additional Features

Factory Droid's Approach (from binary strings)

Scratchpad Feature

Summary Guidance

Compaction Model

Proposed Slice-Based Compaction

Concept

Benefits

Proposed Algorithm

Slice Summarization Prompt (per slice)

Stitching Prompt

Comparison Table

Test Harness

Implemented Strategies

Example Results (30K token session, 4 slices)

Recommendations

Short Term

Medium Term

Long Term

16 KiB

Raw Blame History

Compaction Prompt (`codex-rs/core/templates/compact/prompt.md`)

Summary Prefix (`codex-rs/core/templates/compact/summary_prefix.md`)

Compaction System Prompt (`session/prompt/compaction.txt`)

Short Summary Prompt (`session/prompt/summarize.txt`)