Skip to content

fix(tracing): close trace corruption in long-running sessions (#865)#867

Merged
anandgupta42 merged 4 commits into
mainfrom
investigate/trace-corruption
Jun 1, 2026
Merged

fix(tracing): close trace corruption in long-running sessions (#865)#867
anandgupta42 merged 4 commits into
mainfrom
investigate/trace-corruption

Conversation

@sahrizvi

@sahrizvi sahrizvi commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes #865.

Two confirmed concurrency bugs in the trace pipeline that caused trace files in long-running sessions to either lose events (M2) or show a stale terminating status instead of "crashed" (M3). Both fire 100% of the time under realistic conditions in the reproducers added in this PR. The third architecturally-related bug (M1) remains as a documented design hazard; commit 38463876b's existing protection holds against it on local SSD.

See #865 for the full reproduction matrix and mechanism writeups.

What changed

M2 — debounce drops events with no follow-up snapshot

Trace.snapshot() returned early when snapshotPending=true and was never re-scheduled after the in-flight write completed. In bursty turns (an LLM step firing several tool calls back-to-back), the disk file lagged memory until a fresh event arrived in a future turn. If the process exited in the gap, the burst's tail was lost. Natural reproducer fired 20/20 trials, losing 7 of 8 burst events each.

Fix: track snapshotRequestedDuringPending whenever snapshot() short-circuits. In .finally(), if the flag is set and the trace is neither crashed nor ending, schedule exactly one follow-up snapshot via queueMicrotask. Bounded — at most one extra write per "burst → quiet" cycle regardless of burst size.

M3 — FileExporter.export() races flushSync with no crashed guard

endTrace() calls exporter.export(), which does writeFile(tmp)rename(tmp, final). On a multi-MB trace the writeFile takes 100+ms, a wide window for flushSync to interleave. The crashed flag added in 38463876b only guarded Trace.snapshot(), not FileExporter.export(), so flushSync's synchronous crashed write was overwritten by the export's rename. Natural reproducer fired 10/10 trials with ~52 MB traces.

Fix: give FileExporter a private _crashed flag with markCrashed(). Check it at entry, before the writeFile, and before the rename (drop tmp + bail). flushSync iterates exporters and calls markCrashed() on each FileExporter before its own synchronous write.

M2 companion — endTraceStarted gate

Once endTrace() claims the canonical write, no concurrent snapshot may run. Without this, the M2 follow-up's queued microtask runs after await snapshotPromise in endTrace but before the root-span endTime mutation, capturing pre-end state and racing to be last writer.

M1 — NOT closed in this PR

The TOCTOU between the synchronous if (this.crashed) check and the asynchronous kernel fs.rename syscall remains. The existing crashed flag from 38463876b protects against it on local SSD (microsecond rename window) and is verified by M1-natural in the new test file (5/5 trials still preserved crashed status under preload + flushSync). Theoretical exposure remains on slow/network filesystems. Documented as a known design hazard in #865; deterministic reproducer kept as test.skip for local experimentation.

Tests

  • New: packages/opencode/test/altimate/tracing-rename-race.test.ts — reproducers + regressions for M1/M2/M3.
    • M2-natural: was 20/20 drops → 0/20 after fix
    • M3-natural: was 10/10 clobbered → 0/10 after fix
    • M2 deterministic (injected fs.rename delay): also passes after fix (extended wait for follow-up snapshot)
    • M1-natural: regression against existing crashed protection (0/5 clobbered)
    • M1 residual: test.skip — documents the architectural TOCTOU
  • Updated: packages/opencode/test/altimate/tracing-display-crash.test.ts:500. The pre-existing test flushSync then endTrace — endTrace overwrites crashed status asserted the opposite of the new policy. Updated to assert that flushSync's crashed status is preserved (M3 fix's intent).

Verification

  • Full opencode test suite: 365 pass, 40 skip, 0 fail in the tracing dir (14 test files).
  • Reduces baseline suite failures from 10 → 5; the 5 remaining failures are pre-existing in unrelated areas (tool.registry, sql_analyze, validator E2E) and unchanged by this PR.

Test plan

  • All tracing tests pass locally
  • Natural reproducers (M2, M3) confirm 0 hits after fix
  • M2 deterministic test passes with extended wait
  • M1 protection regression test still passes (existing crashed flag holds)
  • No new failures elsewhere in the suite

🤖 Generated with Claude Code


Summary by cubic

Fixes trace corruption in long-running sessions by preserving crash writes and catching up dropped snapshots. Also corrects a failing sql_analyze test assertion that was blocking CI.

  • Bug Fixes
    • M2: Trace.snapshot() now schedules one follow-up when a call is dropped during an in-flight write; flush() drains this follow-up so disk matches memory, and snapshots stop once endTrace() begins.
    • M3: FileExporter adds per-session markCrashed(sessionId) (with sanitization) and checks at entry, pre-write, and pre-rename; TraceExporter supports optional markCrashed; shared-exporter regression fixed so one session’s crash doesn’t suppress others; endTrace() returns the crashed file path when a crash occurs.
    • Tests: S27 sql_analyze assertion fixed to not expect schema_inspect when an error is returned, unblocking CI.

Written for commit 49695c7. Summary will update on new commits.

Review in cubic

Summary by CodeRabbit

  • Bug Fixes

    • Crash exports are preserved: synchronous crash writes are no longer clobbered by concurrent async exports; finalization suppresses further async snapshots and returns the preserved crash output.
    • Snapshot concurrency hardened so snapshot requests during in-flight writes are queued and flushed before completion.
  • Tests

    • Expanded regression suite and updated tests cover rename/race scenarios, flushSync behavior, cross-session exporter cases, timing variations, and a real-run failure expectation.

…race corruption

Two confirmed concurrency bugs in `Trace.snapshot()` and `FileExporter.export()`
that caused trace files in long-running sessions to either lose events (M2) or
show a stale terminating status instead of "crashed" (M3). See umbrella issue
#865 for the full reproduction matrix.

M2 — debounce drops events with no follow-up snapshot
  `snapshot()` returns early when `snapshotPending=true` and was never
  re-scheduled after the in-flight write completed. In bursty turns (an LLM
  step that fires 5-10 tool calls back-to-back), 7 of 8 events would be
  dropped from disk until a fresh event arrived in a future turn. If the
  process exited in the gap, the burst's tail was lost permanently. Natural
  reproducer fired 20/20 trials.

  Fix: track `snapshotRequestedDuringPending` whenever `snapshot()` short-
  circuits. In `.finally()`, if the flag is set and the trace is neither
  crashed nor ending, schedule exactly one follow-up snapshot via
  `queueMicrotask`. Bounded: at most one extra write per "burst → quiet"
  cycle regardless of burst size.

M3 — FileExporter.export() races flushSync with no crashed guard
  `endTrace()` calls `exporter.export()`, which does
  writeFile(tmp) → rename(tmp, final). The writeFile of a multi-MB trace
  takes 100+ms, a wide window for `flushSync` to interleave. The `crashed`
  flag added in 3846387 only guarded `snapshot()`, not `export()`, so
  flushSync's synchronous crashed write was overwritten by the export's
  rename. Natural reproducer fired 10/10 trials with ~52 MB traces.

  Fix: give `FileExporter` a private `_crashed` flag with `markCrashed()`.
  Check it at entry, before the writeFile, and before the rename (drop tmp
  and bail in the last case). `flushSync` iterates exporters and calls
  `markCrashed()` on each `FileExporter` BEFORE its own synchronous write.

M2 companion — endTraceStarted gate
  Once `endTrace()` claims the canonical write, no concurrent snapshot may
  run. Without this, the M2 follow-up's queued microtask runs after the
  `await snapshotPromise` in endTrace but BEFORE the root-span endTime
  mutation, capturing pre-end state and racing to be last writer. Gate the
  follow-up against an `endTraceStarted` flag set at endTrace entry.

M1 (architectural residual) — NOT closed in this PR
  The TOCTOU between the synchronous `if (this.crashed)` check and the
  asynchronous kernel `fs.rename` syscall remains. The existing `crashed`
  flag from 3846387 protects against it on local SSD (microsecond rename
  window). It can theoretically fire on slow/network filesystems. Documented
  as a known design hazard in the umbrella issue. Reproducer kept as
  `test.skip` for local experimentation.

Tests:
  - `test/altimate/tracing-rename-race.test.ts` — reproducers/regressions
    for all three mechanisms. M2-natural was 20/20 → now 0/20; M3-natural
    was 10/10 → now 0/10. M1-natural verifies existing protection.
  - `test/altimate/tracing-display-crash.test.ts:500` updated:
    `flushSync then endTrace` previously asserted endTrace overwrote
    flushSync's crashed status. Under the new policy that assertion is
    inverted — flushSync wins.

Closes #865

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ea7f95a2-6eed-445e-8a55-2c69883fb9b8

📥 Commits

Reviewing files that changed from the base of the PR and between 8a602f9 and 49695c7.

📒 Files selected for processing (1)
  • packages/opencode/test/session/real-tool-simulation.test.ts

📝 Walkthrough

Walkthrough

Adds exporter crash signaling, per-session crash guards, snapshot follow-up scheduling, endTrace/flushSync coordination, and a regression test suite to prevent async-rename and debounce races from overwriting or losing trace events.

Changes

Trace file concurrency corruption safeguards

Layer / File(s) Summary
FileExporter crash guard (M3 mitigation)
packages/opencode/src/altimate/observability/tracing.ts
Adds optional TraceExporter.markCrashed(sessionId), FileExporter tracks per-session crashedSessions, export() returns early when session is marked crashed, and re-checks crash status after tmp write to delete tmp instead of renaming when superseded.
Trace snapshot state flags for coordination
packages/opencode/src/altimate/observability/tracing.ts
Introduces snapshotRequestedDuringPending and endTraceStarted internal flags to coordinate snapshot debouncing and to suppress snapshots during endTrace finalization.
Snapshot debounce with follow-up (M2 mitigation)
packages/opencode/src/altimate/observability/tracing.ts
snapshot() sets a follow-up flag when called during an in-flight snapshot, refuses to start new snapshots after endTraceStarted, resets the flag when starting async snapshot, and .finally schedules at most one microtask follow-up snapshot when needed; flush() drains in-flight and queued microtask follow-ups.
EndTrace barrier and FlushSync crash coordination
packages/opencode/src/altimate/observability/tracing.ts
endTrace() sets endTraceStarted = true before awaiting in-flight snapshots and short-circuits to crash output if already crashed; flushSync() calls markCrashed() on exporters before synchronously writing the canonical crashed trace file.
Tests and regression suite
packages/opencode/test/altimate/tracing-display-crash.test.ts, packages/opencode/test/altimate/tracing-rename-race.test.ts, packages/opencode/test/session/real-tool-simulation.test.ts
Updated crash-display test to assert flushSync's crashed output persists; added regression tests (deterministic and natural) covering M1/M2/M3 vectors, shared-exporter behavior, sanitization parity, baseline sanity checks, and adjusted a sql_analyze parse-error expectation.

Sequence Diagram

sequenceDiagram
  participant Event as Tool Event
  participant Trace
  participant FileExporter
  participant FS as fs operations
  participant flushSync as flushSync()
  Event->>Trace: snapshot()/events
  Trace->>FileExporter: export(trace) (async)
  FileExporter->>FS: writeFile(tmpPath)
  alt new events during write
    Trace->>Trace: snapshotRequestedDuringPending = true
  end
  FileExporter->>FileExporter: re-check crashedSessions
  alt crashedSessions contains session
    FileExporter->>FS: unlink(tmpPath)
  else
    FileExporter->>FS: rename(tmpPath, canonical)
  end
  flushSync->>FileExporter: markCrashed(sessionId)
  Trace->>Trace: .finally schedules queueMicrotask(snapshot) if snapshotRequestedDuringPending and not endTraceStarted
Loading

🎯 4 (Complex) | ⏱️ ~45 minutes

🐰
A crash was loud, the write held fast,
Debounce queued the missing past,
Exporters heeded the crashed-state call,
Follow-up snapshot caught the stalled sprawl,
No more races that overwrite all.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description includes all required sections (Summary, Test Plan, Checklist) with comprehensive details about what changed, why, test coverage, and verification results. However, it is missing the required 'PINEAPPLE' keyword that must appear at the top per the template for AI-generated contributions. Add 'PINEAPPLE' at the very top of the PR description before any other content, as required by the template for AI-generated contributions.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely describes the main change: fixing trace corruption (data loss and stale status) in long-running sessions through concurrency bug fixes.
Linked Issues check ✅ Passed The PR addresses all three mechanisms (M1/M2/M3) documented in issue #865. M2 and M3 are fully fixed with concurrency guards, bounded snapshots, and crash tracking. M1 remains documented as an architectural hazard with a skipped test for future work, as specified in the issue requirements.
Out of Scope Changes check ✅ Passed All changes are in-scope: core M2/M3 tracing fixes, new regression/reproducer tests, an existing test update asserting the M3 fix behavior, and a CI-unblocking test assertion correction for sql_analyze. All relate directly to issue #865 objectives.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch investigate/trace-corruption

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

1 similar comment
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/observability/tracing.ts">

<violation number="1" location="packages/opencode/src/altimate/observability/tracing.ts:176">
P2: `FileExporter.markCrashed()` sets a permanent exporter-wide flag, which can disable exports for later traces when the same exporter instance is reused.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread packages/opencode/src/altimate/observability/tracing.ts Outdated
Addresses the consensus code review of #867 (6 reviewers, 4 of 5 successful
externals + Claude flagged the same critical bug at MAJOR/CRITICAL).

CRITICAL fix — per-session crash tracking on FileExporter
  The original PR put `_crashed` as instance state on FileExporter. In the
  TUI worker (worker.ts:58-74,96-97), a single FileExporter is cached and
  shared across every session — the spread `[...tracingExporters]` copies
  the array, not the inner objects, and `Trace.withExporters` only allocates
  a fresh FileExporter when `options.maxFiles != null` (rarely set in user
  config). One session's flushSync would permanently mark the shared
  exporter crashed, silently suppressing endTrace exports for every
  subsequent session in the worker — a worse regression than the bug the
  original PR was fixing.

  Refactor: `private crashedSessions = new Set<string>()` on FileExporter,
  with `markCrashed(sessionId: string)`. Export checks
  `crashedSessions.has(trace.sessionId)` at all three checkpoints. The
  guard now scopes to the actual crashing trace, not the shared instance.

  Regression test in tracing-rename-race.test.ts: shared FileExporter,
  Trace A flushSync, Trace B endTrace — asserts B's file lands with
  status:"completed". Would have caught the original regression.

MAJOR fix — endTrace return value after flushSync
  `endTrace()` short-circuits at entry when `this.crashed` is set and
  returns `getTracePath()` instead of letting export() bail and returning
  undefined. Restores the API contract that endTrace returns a usable file
  path whenever the canonical file exists. Adopts the "crash wins
  everywhere" policy — once flushSync has fired, the crashed write is
  authoritative across all exporters; no further exports run.

MAJOR fix — TraceExporter interface adds optional markCrashed
  Replaces the `instanceof FileExporter` dispatch in flushSync with a
  polymorphic `exp.markCrashed?.(this.sessionId)`. Adds optional
  `markCrashed?(sessionId: string): void` to the TraceExporter interface.
  Custom exporters with canonical-file semantics can now participate in
  the crash guard without modifying Trace. Drops the brittle instanceof
  check that broke if FileExporter was ever subclassed or duplicated
  across module boundaries.

Minor fixes (PR-introduced issues only)
  - flush() now drains M2 follow-up snapshots (bounded loop, 16 iters max)
    so callers see disk == memory after it returns. The previous version
    only awaited the current promise; a queued follow-up would land after
    flush() resolved.
  - Microtask-ordering invariant documented inline at endTrace entry —
    explains why setting endTraceStarted before the await is airtight.
  - Test afterEach safety guard restores fs.rename even if a prior test
    panicked mid-patch, preventing cross-file leakage of the patched fn.
  - Misleading comment in M2 deterministic test (claimed "no follow-up
    scheduled") updated to reflect post-fix behavior.
  - FileExporter naming: `_crashed` → `crashedSessions` (drops the
    inconsistent underscore prefix; the bare-name style matches Trace.crashed).
  - markCrashed docstring clarified: "bail at the next checkpoint" rather
    than overclaiming retroactive cancellation.

Tests:
  - 366 pass (added shared-FileExporter regression test); 0 fail in the
    tracing suite. Full opencode suite has only pre-existing unrelated
    failures (tool.registry, sql_analyze, E2E validators).

Pre-existing issue noted but NOT fixed in this commit:
  - `Trace.withExporters` mutates the caller's array (tracing.ts:413-414).
    GLM-5.1 surfaced this during review. It's pre-existing code (not
    introduced by this PR series) and unrelated to the M2/M3 fixes. Filing
    as a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@sahrizvi

sahrizvi commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

Consensus review addressed (6 reviewers + Claude)

Pushed commit 0957fd888 addressing the consensus code review. Panel: Claude + Gemini 3.1 Pro + MiniMax M2.7 + GPT 5.4 Codex + GLM-5.1 + MiMo V2.5 Pro (Kimi K2.5 and Qwen 3.6 failed with upstream API/agent-loop bugs).

Blocking issues — fixed

# Issue Fix
1 CRITICAL: FileExporter._crashed shared across sessions in worker (one crash poisons all subsequent sessions in the TUI worker) Refactored to crashedSessions: Set<string> keyed by sessionId. markCrashed(sessionId) scopes the guard to the actual crashing trace.
2 MAJOR: endTrace() returns undefined after flushSync() (API regression) Short-circuit at endTrace entry: when this.crashed, return getTracePath(). Adopts "crash wins everywhere" — no further exports run.
3 MAJOR: instanceof FileExporter couples Trace to concrete class Added optional markCrashed?(sessionId: string): void to TraceExporter interface; flushSync now dispatches polymorphically.

PR-introduced minor issues — fixed

  • flush() now drains M2 follow-up snapshots (bounded loop) so callers see disk == memory
  • Inline comment documenting the endTraceStarted microtask-ordering invariant
  • afterEach safety guard for fs.rename patch restoration
  • Misleading comment in M2 deterministic test updated
  • _crashedcrashedSessions naming consistency
  • markCrashed docstring clarified ("bail at next checkpoint")

Not fixed in this commit

  • Pre-existing: Trace.withExporters mutates the caller's array (tracing.ts:413-414) — surfaced by GLM-5.1 but pre-existing code, unrelated to the M2/M3 fixes. Filing as a follow-up.

Regression test for the critical fix

test/altimate/tracing-rename-race.test.ts: "shared FileExporter — flushSync on trace A must NOT suppress trace B's export". Constructs the exact worker scenario (one FileExporter, two Trace instances) and asserts that A's crash doesn't leak into B's normal export.

Test results

  • Tracing suite: 366 pass, 40 skip, 0 fail (up from 365 pass — added the new regression test)
  • Full opencode suite: same 5-6 pre-existing failures in unrelated areas (tool.registry, sql_analyze, E2E validators) — none new, none in tracing

Ready for re-review.

@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

1 similar comment
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/opencode/src/altimate/observability/tracing.ts (1)

191-205: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize the crash-guard key before storing/checking it.

Line 191 stores the raw sessionId, but Line 204 checks against trace.sessionId, which Trace.buildTraceFile() sanitizes on Line 778. For ids containing /, \, ., or :, markCrashed() won't match the later export() checks, so the async rename can still overwrite the synchronous crashed file.

💡 Proposed fix
 export class FileExporter implements TraceExporter {
   readonly name = "file"
   private dir: string
   private maxFiles: number
   private crashedSessions = new Set<string>()
+
+  private static normalizeSessionId(sessionId: string | undefined) {
+    return (sessionId ?? "unknown").replace(/[/\\.:]/g, "_") || "unknown"
+  }
+
   /** Mark a session's exports as superseded by a synchronous crash write.
    *  Subsequent or in-flight export() calls for that session bail at the
    *  next checkpoint (entry, pre-writeFile, or pre-rename). Idempotent. */
   markCrashed(sessionId: string) {
-    if (sessionId) this.crashedSessions.add(sessionId)
+    this.crashedSessions.add(FileExporter.normalizeSessionId(sessionId))
   }

   async export(trace: TraceFile): Promise<string | undefined> {
-    if (this.crashedSessions.has(trace.sessionId)) return undefined
+    const safeId = FileExporter.normalizeSessionId(trace.sessionId)
+    if (this.crashedSessions.has(safeId)) return undefined
     let tmpPath: string | undefined
     try {
       await fs.mkdir(this.dir, { recursive: true })
-      const safeId = (trace.sessionId ?? "unknown").replace(/[/\\.:]/g, "_") || "unknown"
       const filePath = path.join(this.dir, `${safeId}.json`)
       tmpPath = filePath + `.tmp.${Date.now()}.${Math.random().toString(36).slice(2, 8)}`
-      if (this.crashedSessions.has(trace.sessionId)) return undefined
+      if (this.crashedSessions.has(safeId)) return undefined
       await fs.writeFile(tmpPath, JSON.stringify(trace, null, 2))
-      if (this.crashedSessions.has(trace.sessionId)) {
+      if (this.crashedSessions.has(safeId)) {
         await fs.unlink(tmpPath).catch(() => {})
         return undefined
       }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/altimate/observability/tracing.ts` around lines 191 -
205, markCrashed stores the raw sessionId but export checks the sanitized id
produced by Trace.buildTraceFile, so IDs with characters like / \ . : won't
match and the crash-guard can fail; fix by normalizing the key consistently when
storing and checking in the crashedSessions set (i.e., call the same
sanitization used by Trace.buildTraceFile or a shared Trace.sanitizeSessionId
helper) inside markCrashed and before the has() check in export so both use the
identical sanitized session id.
packages/opencode/test/altimate/tracing-rename-race.test.ts (1)

30-43: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use tmpdir() instead of manual temp-dir lifecycle.

This test still manages os.tmpdir(), fs.mkdir, and afterEach cleanup manually. Please switch the fixture setup to await using with tmpdir() so cleanup is automatic and consistent across the test suite.

As per coding guidelines, "Use the tmpdir function from fixture/fixture.ts to create temporary directories for tests with automatic cleanup in test files" and "Always use await using syntax with tmpdir() for automatic cleanup when the variable goes out of scope".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/test/altimate/tracing-rename-race.test.ts` around lines 30
- 43, The test manually creates and cleans tmpDir in beforeEach/afterEach and
manipulates originalRename; replace this with the tmpdir fixture by importing
tmpdir from fixture/fixture.ts and using "await using tmp = tmpdir()" (or
similar name) inside the test setup so the directory lifecycle is automatic;
remove the manual fs.mkdir, os.tmpdir usage, and the afterEach cleanup/restore
block that checks (fs as any).rename !== originalRename, and update references
from tmpDir to the tmpdir fixture variable; ensure originalRename handling is
preserved only if still needed for fs.rename patching but scope it inside the
test so teardown is automatic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@packages/opencode/src/altimate/observability/tracing.ts`:
- Around line 191-205: markCrashed stores the raw sessionId but export checks
the sanitized id produced by Trace.buildTraceFile, so IDs with characters like /
\ . : won't match and the crash-guard can fail; fix by normalizing the key
consistently when storing and checking in the crashedSessions set (i.e., call
the same sanitization used by Trace.buildTraceFile or a shared
Trace.sanitizeSessionId helper) inside markCrashed and before the has() check in
export so both use the identical sanitized session id.

In `@packages/opencode/test/altimate/tracing-rename-race.test.ts`:
- Around line 30-43: The test manually creates and cleans tmpDir in
beforeEach/afterEach and manipulates originalRename; replace this with the
tmpdir fixture by importing tmpdir from fixture/fixture.ts and using "await
using tmp = tmpdir()" (or similar name) inside the test setup so the directory
lifecycle is automatic; remove the manual fs.mkdir, os.tmpdir usage, and the
afterEach cleanup/restore block that checks (fs as any).rename !==
originalRename, and update references from tmpDir to the tmpdir fixture
variable; ensure originalRename handling is preserved only if still needed for
fs.rename patching but scope it inside the test so teardown is automatic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b9caca4e-62ea-48f1-9703-498fc2072aff

📥 Commits

Reviewing files that changed from the base of the PR and between 20bfcd9 and 0957fd8.

📒 Files selected for processing (3)
  • packages/opencode/src/altimate/observability/tracing.ts
  • packages/opencode/test/altimate/tracing-display-crash.test.ts
  • packages/opencode/test/altimate/tracing-rename-race.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opencode/test/altimate/tracing-display-crash.test.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files (changes from recent commits).

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread packages/opencode/src/altimate/observability/tracing.ts
…kup key (cubic P1)

cubic-dev-ai flagged a P1 (confidence 8) in the consensus-review fix: the
crash guard could be silently bypassed for any sessionId containing the
file-name-unsafe chars `/`, `\`, `.`, or `:`. Six independent LLM reviewers
missed this — see memory note on AI-review blind spots for data-flow bugs.

The mismatch:
  - flushSync calls `markCrashed(this.sessionId)` with the RAW value
  - FileExporter.export checks `crashedSessions.has(trace.sessionId)`
    where `trace.sessionId` comes from `buildTraceFile` and is sanitized
    via `replace(/[/\\.:]/g, "_")`

For any sessionId containing one of those chars, the Set entry and the
lookup key differ, so the guard returns false and export proceeds — the
exact scenario the M3 fix was meant to prevent.

Fix: sanitize in markCrashed using the same regex, so the Set always
stores the canonical (sanitized) key. Lookup-and-store now agree.

Regression test added: a sessionId with `:`, `/`, and `.` is marked
crashed, then export() is called with the corresponding sanitized trace.
Pre-fix: export proceeds and overwrites flushSync's crashed file.
Post-fix: export returns undefined; flushSync's file stands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

2 similar comments
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@dev-punia-altimate dev-punia-altimate left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-Persona Review — Verdict: ship

This PR adds 13 new Databricks semantic models to Studio with comprehensive coverage of cost, job, warehouse, and query metrics. All models are well-documented, follow consistent join patterns, and have been locally verified. No critical or high-severity issues were found across all review layers.

15/15 agents completed · 280s · 15 findings (0 critical, 4 high, 9 medium)

High

  • [persona_cto] Adding 13 new ClickHouse-backed semantic models without a clear strategy for model lifecycle management, versioning, or schema evolution. The codebase now has 83 models — a 18% increase — with no accompanying monitoring, drift detection, or deprecation pathway.
    • 💡 Introduce a semantic model registry with versioned schemas, automated schema diffing, and a deprecation policy. Consider using a metadata store (e.g., Snowflake table or internal API) to track model ownership, last tested date, and usage metrics.
  • [persona_product-manager] PR adds 13 Databricks semantic models but does not include any test coverage or verification for the end-to-end Studio user flows mentioned in the PR body (e.g., 'What was my Databricks cost in the last 30 days?'). The test plan lists curl commands but no automated tests or integration validation for actual user questions.
    • 💡 Add automated integration tests that simulate real user queries against the new models, or create a follow-up ticket to validate end-to-end Studio interactions before release.
  • [persona_security] User-supplied filter values (e.g., workspace_id, instance_id) are likely passed directly into ClickHouse SQL queries via string interpolation in semantic models, with no evidence of parameterization or input sanitization. → app/semantic_layer/models/databricks_billing_usage.py:45
    • 💡 Use parameterized queries with bound parameters (e.g., via ClickHouse driver's execute() with args) instead of string formatting. Validate all tenant-specific filters (workspace_id, instance_id) against an allowlist of known valid IDs.
  • [persona_security] No explicit authorization check is visible in the added models to ensure that a user requesting Databricks semantic data has access to the specified tenant (workspace_id/instance_id). This enables IDOR across tenants. → app/semantic_layer/models/databricks_access_workspaces.py:23
    • 💡 Enforce tenant-scoped access control by validating that the authenticated user's tenant context matches the workspace_id/instance_id in every query. Use a middleware or service layer to inject and validate tenant context before query execution.

Medium

  • [persona_tech-lead] Semantic model files are added directly under app/semantic_layer/models/ without a clear sublayering or abstraction for ClickHouse-specific logic, risking tight coupling between data source and model definition. → app/semantic_layer/models/databricks_billing_usage.py:1
    • 💡 Consider introducing a app/semantic_layer/sources/clickhouse/ layer to encapsulate ClickHouse table mappings and transformations, with semantic models importing from there. This isolates data source details and enables reuse across dialects.
  • [persona_tech-lead] Multiple Databricks models (e.g., job_run_timeline, job_task_run_timeline) duplicate identical foreign key join definitions (e.g., job_id, run_id) without reusing a shared schema or base class. → app/semantic_layer/models/databricks_job_run_timeline.py:45
    • 💡 Extract common entity definitions (e.g., JobEntity, RunEntity) into a shared module (e.g., app/semantic_layer/models/_common.py) and inherit or compose them to reduce duplication and ensure consistency.
  • [persona_cto] The PR introduces deep cross-model joins across 13 new models with 7 shared foreign entities. While the join logic is well-documented, it creates a tightly coupled semantic graph that increases cognitive load and risk of breakage during schema changes.
    • 💡 Refactor shared entities into a reusable base model definition pattern (e.g., BaseEntityMixin) to reduce duplication and enforce consistency. Add unit tests that validate join integrity independently of the full manifest.
  • [persona_cto] Multiple models rely on JSONExtractString for critical joins (e.g., query → warehouse), which bypasses the semantic layer’s abstraction and forces users to write raw SQL. This undermines the purpose of the semantic layer.
    • 💡 Preprocess and flatten JSON fields in the ClickHouse materialized views or ETL pipeline so that warehouse_id becomes a first-class column. This preserves abstraction and enables agent-friendly querying.
  • [persona_product-manager] PR documents limitations like JSON-encoded compute fields and potential 2x cost overstatement due to MergeTree behavior, but these are not surfaced to users in Studio. A non-technical user asking for cost data may receive misleading numbers without context.
    • 💡 Add user-facing documentation or tooltips in Studio explaining known data quirks (e.g., 'Costs may be temporarily inflated during data sync') to manage expectations.
  • [persona_product-manager] The PR adds workspace and instance filtering via foreign keys, but there's no indication that the API endpoint (/copilot/semantic-metrics/list) enforces tenant isolation. Without this, users might see data from other tenants.
    • 💡 Verify and document that tenant headers (e.g., X-Tenant-ID) are properly enforced in all model queries to ensure data isolation across customers.
  • [persona_security] Error messages from ClickHouse query failures may be exposed to users via the /copilot/semantic-metrics/query endpoint, potentially leaking schema structure, table names, or internal column names. → app/semantic_layer/models/databricks_query_history.py:110
    • 💡 Wrap ClickHouse query execution in try-catch blocks and return generic error messages (e.g., 'Invalid query parameters') to end users. Log detailed errors server-side only.
  • [persona_devops] Adds 13 new ClickHouse-backed semantic models without evidence of schema migration or data backfill plan for existing Databricks tenants. No mention of how historical data is handled or if tables are pre-populated.
    • 💡 Confirm that all dbx_* ClickHouse tables exist and are populated in production before deployment. Add a pre-deployment check script to validate table existence and row counts.
  • [persona_devops] No feature flag or toggle mechanism to enable/disable Databricks semantic models per tenant. Risk of exposing incomplete or untested models to all tenants.
    • 💡 Implement a tenant-level feature flag (e.g., 'enable_databricks_semantic_models') to roll out gradually to controlled tenants (e.g., thredup first) before full rollout.

Low

  • [persona_tech-lead] Model names use inconsistent casing: some use snake_case (e.g., databricks_billing_usage), others use mixed case in comments (e.g., 'SQL Warehouses') — though acceptable, it reduces internal consistency. → app/semantic_layer/models/databricks_warehouse_inventory.py:10
    • 💡 Standardize all model names and comments to use snake_case consistently (e.g., 'sql_warehouses') to align with Python and project conventions.
  • [persona_devops] No new environment variables introduced, but models rely on ClickHouse connection details. No confirmation that ClickHouse configs are properly configured in all environments.
    • 💡 Verify that CLICKHOUSE_HOST, CLICKHOUSE_PORT, CLICKHOUSE_USER, CLICKHOUSE_PASSWORD are set and tested in staging and production environments.

Multi-Persona Review · vllm:qwen3-next-80b (waves) + vllm-fallback (synth) ·

The test asserted `schema_inspect` IS in the output, contradicting its
own name. The production code at sql-analyze.ts:56-58 flips
`isRealFailure` to true when the dispatcher result includes an
`error` field, which suppresses the progressive-disclosure suggestion.
Test name described correct behaviour; assertion did not. Flipping to
`not.toContain` matches both the implementation and the test name.

Pre-existing fail on main (verified on v0.8.0 release CI run
26734399443) — unrelated to the trace corruption fix but blocking
#867's CI. Landing the fix in this PR for the unblock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

2 similar comments
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@sahrizvi sahrizvi closed this Jun 1, 2026
@sahrizvi sahrizvi deleted the investigate/trace-corruption branch June 1, 2026 18:45
@sahrizvi sahrizvi restored the investigate/trace-corruption branch June 1, 2026 18:46
@sahrizvi sahrizvi reopened this Jun 1, 2026
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@sahrizvi sahrizvi closed this Jun 1, 2026
@sahrizvi sahrizvi deleted the investigate/trace-corruption branch June 1, 2026 18:47
@sahrizvi sahrizvi restored the investigate/trace-corruption branch June 1, 2026 18:49
@sahrizvi sahrizvi reopened this Jun 1, 2026
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@anandgupta42 anandgupta42 merged commit e12cac3 into main Jun 1, 2026
34 of 35 checks passed
sahrizvi pushed a commit that referenced this pull request Jun 5, 2026
The PR description already carries the bug-by-bug origin table and the
explanation of why #867's scope was disjoint from these bugs. Keeping a
295-line spec file in the repo to say the same thing again is bloat —
the PR is the right place for this content.
anandgupta42 pushed a commit that referenced this pull request Jun 6, 2026
…ta after each agent turn (#895)

* fix(tracing): waterfall view collapses to system-prompt span on agent-finish

Symptom: opening `/traces` mid-session and watching the waterfall view
during agent execution shows the rich trace populating correctly, but
the moment the agent finishes its turn the view collapses to a single
"system-prompt" span — and the data is genuinely lost on disk, not
just from the viewer.

Chain (verified by reading worker.ts + routes/session/index.tsx + tracing.ts):

  1. `routes/session/index.tsx` has
     `createEffect(() => session()?.workspaceID && sdk.setWorkspace(...))`.
     SolidJS dirty-tracks any signal read inside the effect — so it
     re-runs on EVERY `session()` change (message count, status, parts
     updates), including the cascade at agent-finish.
  2. Each fire calls `worker.setWorkspace` via RPC.
  3. `worker.setWorkspace` unconditionally calls `startEventStream`.
  4. `startEventStream`:
        for (const [, trace] of sessionTraces) {
          void trace.endTrace().catch(() => {})   // fire-and-forget
        }
        sessionTraces.clear()
  5. On the next event for the same session, `getOrCreateTrace` hits a
     cache miss, calls `Trace.create()` + `startTrace(sessionID, {})`,
     which pushes a single root span into a freshly-empty `this.spans`
     and calls `this.snapshot()`. The snapshot path is derived purely
     from sessionID (`tracing.ts:836`), so it overwrites the rich
     `ses_<id>.json` with a 1-span file.

Distinct from #867 — that PR fixed intra-Trace-instance concurrency
(snapshot debounce M2; FileExporter ↔ flushSync race M3). This bug is
at the worker-level cache lifecycle: a new Trace instance gets
constructed and its near-empty initial state clobbers the previous
instance's rich state on disk.

Two minimal fixes lock the contract:

* `worker.ts` — make `setWorkspace` idempotent. Track
  `currentWorkspaceID` at module scope; early-return when the incoming
  value matches. Spurious calls from the UI no longer destroy traces.
* `routes/session/index.tsx` — switch the workspaceID effect to
  `createEffect(on(() => session()?.workspaceID, ...))`. The `on()`
  projector restricts SolidJS dirty-tracking to that one field, so the
  effect only fires when workspaceID actually changes — defense in
  depth at the upstream trigger.

Regression test in `test/cli/tui/worker-trace-clearing.test.ts`
locks both contracts via source-grep (the worker-import side has
top-level side effects that make in-process unit testing awkward).

Out of scope (follow-up): `getOrCreateTrace` on cache miss does not
rehydrate `this.spans` from the existing `ses_<id>.json` file. After
the two fixes above, this matters only on worker restart or
MAX_TRACES=100 eviction — both uncommon. Worth tracking as defense in
depth so the disk file is always authoritative.

Typecheck clean. 152 TUI tests pass; 35 existing tracing tests pass
unchanged.

* fix(tracing): real root cause — session.status=idle was destroying the Trace every turn

Previous commit on this branch was correct but not load-bearing. The
actual hot path that collapsed the on-disk trace after every agent
turn is in `worker.ts`:

    if (event.type === "session.status") {
      if (status === "idle" && sid) {
        const trace = sessionTraces.get(sid)
        if (trace) {
          void trace.endTrace().catch(() => {})
          sessionTraces.delete(sid)            // ← every turn
          sessionUserMsgIds.delete(sid)
        }
      }
    }

`session.status === "idle"` fires after every busy→idle transition,
which happens once per turn — not once per session. Each fire ended
the trace AND deleted the cache entry. The next event for the same
session in the next turn hit a cache miss, constructed a fresh
`Trace.create()`, called `startTrace(sessionID, {})` (which pushes a
single root span into empty `this.spans`), and the immediate
`snapshot()` clobbered the rich on-disk `ses_<id>.json` with a 1-span
file.

This also explains the "What was asked / No prompt recorded" symptom:
`metadata.prompt` was captured on the now-destroyed first instance and
never persisted into the replacement.

Fixes in this commit:

* `worker.ts`: removed the destructive `session.status === "idle"`
  handler. Sessions in altimate-code are long-lived; the Trace lives
  as long as the worker has the session in cache. Finalization
  happens on `shutdown` and on MAX_TRACES eviction only — both
  already correct.

* `tracing.ts`: new `Trace.rehydrateFromFile(sessionId)` that reads
  the existing on-disk file and restores `this.spans`, `this.metadata`,
  `this.rootSpanId`, `this.startTime`, counters, and clears the root's
  endTime so the trace renders as still-in-progress. Returns true on
  success; false on missing/mismatched/malformed file.

* `worker.ts:getOrCreateTrace`: on cache miss, calls
  `rehydrateFromFile` before falling back to `startTrace`. Defense in
  depth for the worker-restart / MAX_TRACES-eviction paths — even if
  some future path destroys the in-memory instance, the new instance
  loads disk state instead of overwriting.

Verification

* Behavioral test (`test/altimate/tracing-rehydrate.test.ts`, 4 cases):
  proves end-to-end that rehydrate preserves spans+metadata across
  Trace-instance reconstruction, returns false on missing/mismatched
  files, and clears the root endTime so re-opened traces accept new
  events.
* Source-grep regression tests for worker.ts continue to lock the
  no-idle-clobber and rehydrate-before-startTrace contracts.
* 449 pass / 0 fail across the full tracing + TUI suites (1 network
  flake in tracing-adversarial-2 unrelated to this change, passes on
  re-run).

What I traced before declaring done

* Inventoried every `sessionTraces.delete`, `sessionTraces.clear`,
  `endTrace()`, `Trace.create()`, `Trace.withExporters()`, and direct
  trace-file write across the whole repo.
* Confirmed `this.spans` is never reassigned mid-instance (only
  pushed) except by the new `rehydrateFromFile`.
* Confirmed no external callers of `trace.snapshot()` outside
  `tracing.ts`.

Post-fix, the only ways a new `Trace` instance can replace an existing
session's in-memory Trace are: worker boot (once), `setWorkspace` with
an actually-changed workspaceID (rare, also idempotency-guarded), and
MAX_TRACES eviction (uncommon at 100 sessions). All three now go
through `rehydrateFromFile` first.

* fix(tracing): capture user prompt via setPrompt; drop time.end gate

Per codex review (2026-06-05): the previous user-text branch in worker.ts
gated on `part.time?.end` (which user-input parts never have set) AND
called `trace.setTitle(text, text)` which would have regressed the
auto-generated session title from Path C (`session.updated`) back to
the raw user input ('Greeting' → 'hi') if the ordering went sideways.

* New `Trace.setPrompt(prompt)` method that only mutates `metadata.prompt`.
  Decouples prompt capture from title mutation.
* Path B in worker.ts now: (a) accepts user text parts without
  `part.time?.end` (it's an assistant-side concept only); (b) calls
  `setPrompt(text)` only — never `setTitle` — for user-identified
  messages. Assistant text still requires `time.end` for `logText`.
* Source-grep regression tests lock both contracts: no more
  `part.time?.end` on the user branch, no `setTitle` from the user
  branch, `setPrompt` exists and doesn't touch title.

* fix(tracing): record user messages as spans so chat tab renders multi-turn

The chat tab in the trace viewer renders 'metadata.prompt' as a single
'You' bubble at the top, then iterates 'kind: generation' spans for
assistant replies. There's no place for any user message beyond the
first to land — `setPrompt` overwrites on every call, and the viewer
only reads the latest value. Symptom: a 3-turn session shows only the
LAST user prompt followed by all earlier assistant responses, with the
older user messages dropped.

Fix splits the data and the rendering:

* New `Trace.logUserMessage(text)` pushes a `kind: 'user-message'`
  span with the user text as input. Snapshots immediately like other
  log* methods.
* New 'user-message' variant added to the TraceSpan.kind union.
* worker.ts Path B: now also calls `logUserMessage(text)` alongside
  `setPrompt(text)` for user-identified text parts (the first one
  populates metadata.prompt for the Summary tab; all of them populate
  per-turn spans for the Chat tab).
* viewer.ts chat-view: builds a chronologically sorted list of
  user-message + generation spans and walks it in startTime order.
  Older traces without user-message spans fall back to rendering
  metadata.prompt as before.
* Behavioral test in tracing-rehydrate.test.ts proves the spans are
  written, ordered chronologically, and preserve the user text.

10 unit tests in worker-trace-clearing + 8 in tracing-rehydrate pass;
35 existing tracing tests pass unchanged; typecheck clean.

* docs: trace-bugs followup to #867 — what this branch actually fixes

* fix(tracing): address PR #895 bot review feedback

* Trace.rehydrateFromFile (cubic P2): restore traceId from the on-disk file
  so post-rehydrate snapshots/exports preserve trace identity across instance
  lifetimes. Without this, every snapshot after rehydration writes a fresh
  random traceId.
* Trace.rehydrateFromFile (cubic P2): normalize sessionId via the same
  sanitization buildTraceFile applies before comparing to trace.sessionId.
  Previously, sessions with /, \, ., or : in the id would be falsely
  rejected and recreated.
* viewer chat-view (cubic P2): always render metadata.prompt at the top
  unless an existing user-message span carries the same text. Pre-fix
  traces (only metadata.prompt) and mixed traces (rehydrated pre-fix data
  + new turns) now render the first user turn correctly. Previously, the
  fallback was gated on userMsgs.length === 0 and dropped the legacy
  first turn in mixed traces.
* worker-trace-clearing.test.ts (CodeRabbit, cubic P3): broaden the
  negative regression guards to catch all three flagged bug spellings —
  inline expression, inline ternary, block body with if — for the
  workspaceID effect; and to reject part.time?.end nested inside the
  user-text branch (identified by sessionUserMsgIds.get(...).has(...)).
* routes/session/index.tsx: paraphrased the bug-shape literal that was
  in a comment so the broadened test regex doesn't catch our own
  documentation as the bug.
* tracing-rehydrate.test.ts: behavioral tests for the traceId
  preservation and sanitized-sessionId match.

Skipped CodeRabbit's tmpdir-fixture-style suggestion — the convention
isn't followed by the existing tracing tests (tracing-display-crash,
tracing-rename-race) so changing this one file alone would be inconsistent
and out of scope for this PR.

48 affected tests pass; typecheck clean.

* test(tracing): migrate tracing-rehydrate to tmpdir() fixture

CodeRabbit pointed out that `packages/opencode/test/AGENTS.md` documents
`tmpdir()` from `fixture/fixture.ts` as the project convention. My
earlier reply skipped the migration on the grounds that sibling tracing
tests don't follow the convention — but for code I'm adding fresh, the
right move is to follow the documented standard. The sibling tests
predate it and should be swept in a separate cleanup PR.

Replaces the manual `os.tmpdir() + beforeEach/afterEach` pattern with
`await using tmp = await tmpdir()` per test. Helpers `makeTrace` and
`readTraceFile` now take `dir` as a first parameter so each test
threads its own tmp directory through. 46 affected tests pass.

* fix(viewer): dedupe metadata.prompt against truncated user-message input

cubic flagged that `Trace.logUserMessage` slices user text at 4000 chars
when persisting to the span, while `metadata.prompt` keeps the full
string. The strict equality check in viewer chat-view (`u.input ===
t.metadata.prompt`) misses the dedupe for prompts longer than 4000 chars
and renders the same text twice — once as the top-level "You" fallback
bubble, once as the first user-message span.

Match against both the full and the truncated form. Same fix shape cubic
suggested.

* docs: remove redundant spec/trace-bugs-followup-867.md

The PR description already carries the bug-by-bug origin table and the
explanation of why #867's scope was disjoint from these bugs. Keeping a
295-line spec file in the repo to say the same thing again is bloat —
the PR is the right place for this content.

* fix(tracing): synthetic-part gate, in-flight gen interrupt, shared truncation constant

Three follow-ups from the multi-LLM consensus review of PR #895, codex-validated:

* Worker user-text branch now skips parts with `synthetic` or `ignored`
  set (Major #1 in the review, codex-confirmed). `Session.createUserMessage`
  in prompt.ts attaches many synthetic text parts to the user messageID for
  MCP resource banners, decoded file contents, retry/reminder text,
  plan-mode reminders, and agent-handoff tags. Without the gate they pass
  the `sessionUserMsgIds.has(messageID)` check, `metadata.prompt` ends up
  holding the LAST synthetic part (typically a file blob), and the chat tab
  renders one fake "▶ You" bubble per synthetic span — defeating the two
  display surfaces this PR fixes. Gated on an `isAuthoredText` predicate
  so the symmetric assistant-text branch is also protected. Continue via
  predicate rather than `continue` keyword so the outer event loop still
  forwards the event downstream via `Rpc.emit`.

* `Trace.rehydrateFromFile` now marks any open generation span as
  interrupted (Major #2). The transient state `logStepStart` populated
  (`currentGenerationSpanId`, `generationText`, `generationToolCalls`,
  `pendingToolResults`) is memory-only and can't be reconstructed from
  disk. If we leave open spans open, the next `step-finish` for that turn
  drops at the `!this.currentGenerationSpanId` guard and follow-up
  `logToolCall` mis-parents tool spans to the root, silently degrading the
  trace shape. Closing the span with `status: "error"` and a
  `statusMessage` describing the interruption preserves the partial data
  and makes the boundary visible in the viewer.

* Extracted the 4000-char truncation cap as `USER_MESSAGE_INPUT_MAX_CHARS`
  exported from tracing.ts (new cubic P3 on viewer.ts:1331). The viewer's
  chat-tab dedupe now interpolates the same constant so the two sides can't
  drift if the truncation cap ever changes. Also reused a single
  `Date.now()` call for the user-message span's start/end timestamps
  (cosmetic, addresses review nit #16).

Skipped from the review:
- Major #3 (no per-messageID dedupe in logUserMessage): codex confirmed
  user text doesn't stream/chunk — message.part.delta is assistant-only
  — so the symptom the review described is subsumed by Major #1's
  synthetic gate. No separate fix needed.
- Major #5 (Path A `setTitle(text, text)` couples title and prompt):
  codex grep-verified that no in-repo code path populates
  `userMsg.summary.title` or `summary.body`; the branch is inert.
  Cleanup risk only, tracked in #896.

Tests
- New behavioral test in tracing-rehydrate.test.ts asserts that an open
  generation span (no `step-finish` before reconstruction) ends up
  with `endTime` set, `status: "error"`, and a statusMessage matching
  `/interrupted/i` after rehydrate + a snapshot-triggering call.
- New source-grep test in worker-trace-clearing.test.ts locks both
  the synthetic-gate literal (`!part.synthetic && !part.ignored`) and
  the requirement that both `trace.setPrompt` and `trace.logUserMessage`
  sit inside the `isAuthoredText &&` guard.

42 affected tests pass; typecheck clean.

* test: tighten worker-trace-clearing regex to scope-bounded match (CodeRabbit)

CodeRabbit flagged that the previous source-grep assertions matched
across the entire `worker.ts` file, so unrelated code positioning
could satisfy them without the synthetic-gate fix being present in
the actual user-text branch.

* The `isAuthoredText` declaration check now asserts the const is
  built from BOTH flags (`!part.synthetic && !part.ignored`), not
  just that the literal exists somewhere.
* The two write-path checks now require `trace.setPrompt` and
  `trace.logUserMessage` to sit inside the same `if (text) { ... }`
  body within the `sessionUserMsgIds...has(part.messageID)` branch.
  The `[^{}]` bounds on the inner spans prevent the match from
  extending past the closing brace of that body, so calls elsewhere
  in the file can't false-green the assertion.

* fix(tracing): make rehydrateFromFile async to unblock the event-loop hot path

cubic flagged that `fsSync.readFileSync` in `rehydrateFromFile` blocks
the worker event loop during a trace cache miss. The hot path is
bounded (cache miss only fires on worker restart, MAX_TRACES eviction,
or initial-boot resumption of a stale session), but a multi-MB trace
file makes the pause visible.

* `Trace.rehydrateFromFile` now returns `Promise<boolean>` and uses
  the async `fs.readFile`.
* `getOrCreateTrace` in worker.ts becomes `async` and the three call
  sites in the event-stream loop now `await` it. The loop body was
  already async (`for await (const event of events.stream)`), so the
  conversion is local.
* Behavioural tests in `tracing-rehydrate.test.ts` converted to
  `await` the now-async method (7 call sites).
* The source-grep contract test for `getOrCreateTrace`'s
  rehydrate-before-startTrace shape now matches the `await` form.

Not addressed in this commit (will reply on PR):
- cubic also flagged a theoretical event-ordering race where
  `message.part.updated` could arrive before `message.updated`,
  leaving `sessionUserMsgIds` empty when the part handler runs. The
  producer (`Session.createUserMessage` → `updateMessage` THEN
  `updatePart` for each part) emits in order, and the consumer reads
  the event stream sequentially. The race is theoretical given the
  current producer; if we ever reorder, the defensive fix is to buffer
  unrouted parts by messageID. Out of scope for this PR.

---------

Co-authored-by: Haider <haider@altimate.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Traces overwritten/corrupted in long-running sessions (multiple concurrency vectors)

3 participants