feat: annotate traces with de.* attributes via metadata channel + procedural classifier #938
sahrizvi wants to merge 1 commit
…rocedural classifier
Wires the existing trace-attribute scaffolding (`de-attributes.ts`, `setSpanAttributes`,
viewer grouped rendering at `viewer.ts:514`) so real values flow onto every tool span
and the session root span. Two channels:
- **Tool-provided metadata**: tools surface structured fields by setting `de.*`
keys on their returned `metadata` object. `Trace.logToolCall` lifts those keys
onto `span.attributes`. Tools never import the observability layer — the seam
is the existing `ToolStateCompleted.metadata` field in `message-v2`. Per-value
10 KB and total 32 KB byte caps prevent runaway payloads.
- **Procedural classifier** (`observability/annotator.ts`): pure function over
`(toolName, input, output)`. Lookup-table tool taxonomy, regex bash-intent and
dbt-layer-from-path classification, deterministic outcome/artifacts/env
rollups. Best-effort — returns undefined rather than emitting wrong attributes.
Called from `logToolCall` (per-span) and `endTrace` + `flushSync` (session-level).
Layer 1 (tool-emitted) wins over Layer 2 (derived) on key conflicts in both
per-tool and session merges.
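The two channels and the conflict rule can be sketched as follows. This is a minimal illustration, not the code in `tracing.ts`: the helper names `serializedBytes` and `mergeSpanAttributes` and the `de.tool.category` key are assumptions made for the example, and the sketch counts UTF-8 bytes with `TextEncoder`, which may differ from how the PR measures size.

```typescript
// Hypothetical caps mirroring the 10 KB per-value and 32 KB total limits described above.
const ATTR_VALUE_MAX_BYTES = 10 * 1024
const ATTR_TOTAL_MAX_BYTES = 32 * 1024

type Attributes = Record<string, unknown>

// UTF-8 size of the JSON form of a value, or undefined if it cannot be serialized.
function serializedBytes(value: unknown): number | undefined {
  try {
    const json = JSON.stringify(value)
    return json === undefined ? undefined : new TextEncoder().encode(json).length
  } catch {
    return undefined // e.g. circular structures
  }
}

// Merge Layer 1 (tool metadata, de.* keys only) with Layer 2 (derived attributes).
// Layer 1 is applied first, so it wins on key conflicts.
function mergeSpanAttributes(metadata: Attributes | undefined, derived: Attributes): Attributes {
  const out: Attributes = {}
  let totalBytes = 0
  const add = (key: string, value: unknown): void => {
    if (value === undefined || key in out) return
    const bytes = serializedBytes(value)
    if (bytes === undefined || bytes > ATTR_VALUE_MAX_BYTES) return
    if (totalBytes + bytes > ATTR_TOTAL_MAX_BYTES) return
    out[key] = value
    totalBytes += bytes
  }
  for (const [k, v] of Object.entries(metadata ?? {})) {
    if (k.startsWith("de.")) add(k, v) // non-de.* metadata is never lifted
  }
  for (const [k, v] of Object.entries(derived)) add(k, v)
  return out
}

const merged = mergeSpanAttributes(
  { "de.sql.dialect": "snowflake", title: "not lifted" },
  { "de.sql.dialect": "generic", "de.tool.category": "sql" },
)
console.log(merged)
```

In this sketch the tool-provided `de.sql.dialect` survives, the derived value for the same key is ignored, and the non-`de.*` key `title` never reaches the span.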
Four tool opt-ins:
- `sql_execute`: `de.warehouse.{system,rows_returned}`, `de.sql.dialect` from
the Registry-resolved warehouse type
- `altimate_core_column_lineage`: structured `de.sql.lineage.{input_tables,
output_table,columns_read,columns_written}` from the altimate-core parser,
preserving case via structured endpoint extraction
- `schema_inspect`: `de.warehouse.rows_total`, `de.sql.lineage.output_table`
- `project_scan`: `de.env.{dbt_present,dbt_manifest_present,warehouse_type,
tools_detected}` from authoritative scan results
Vocabulary extensions:
- `de-attributes.ts`: `DE.WORKFLOW`, `DE.OUTCOME`, `DE.ARTIFACTS`, `DE.ENV`,
`DE.TOOL` sub-namespaces; `de.dbt.layer`; `de.warehouse.rows_total`
- `viewer.ts`: matching prefixes added to the grouped rendering
Reviewed by Codex per chunk; fixes folded in: relative dbt-layer path matching,
SQL-text capping (slice instead of drop), workflow classifier counts dbt-bash
spans by `de.tool.bash_intent` not by all bash spans, scalar `output_table`,
structured endpoint extraction for column lineage, `flushSync` session rollup,
absent-key merge direction so explicit `setSpanAttributes(..., "session")`
callers still win.
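As an illustration of the relative dbt-layer path matching mentioned above, a best-effort classifier might look like the sketch below. The layer names, directory aliases, and the function name `dbtLayerFromPath` are assumptions based on common dbt project layouts, not the table used in `annotator.ts`.

```typescript
// Hypothetical layer table; the PR's actual patterns may differ.
const LAYER_PATTERNS: Array<[RegExp, string]> = [
  [/(^|\/)models\/(staging|stg)\//i, "staging"],
  [/(^|\/)models\/(intermediate|int)\//i, "intermediate"],
  [/(^|\/)models\/marts?\//i, "marts"],
]

// Returns a layer for relative or absolute paths, or undefined when unsure.
function dbtLayerFromPath(filePath: string): string | undefined {
  const normalized = filePath.replace(/\\/g, "/") // tolerate Windows separators
  for (const [pattern, layer] of LAYER_PATTERNS) {
    if (pattern.test(normalized)) return layer
  }
  return undefined // best-effort: emit nothing rather than a wrong attribute
}

console.log(dbtLayerFromPath("models/staging/stg_orders.sql"))
console.log(dbtLayerFromPath("/repo/dbt/models/marts/fct_sales.sql"))
console.log(dbtLayerFromPath("src/index.ts"))
```

The `(^|\/)` prefix is what lets a path relative to the project root match the same way as an absolute one.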
E2E verified locally against the Altimate gateway: tool spans receive their
category, dbt-layer, bash-intent, and dbt-command attributes; session root
receives outcome, artifacts.files_read, outcome.executed.
👋 This PR was automatically closed by our quality checks. Common reasons:
If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.
📝 Walkthrough
This PR adds a comprehensive trace annotation system that classifies tool spans using a deterministic taxonomy and heuristics, rolls up session-level metadata including workflow inference, and enriches individual tool outputs with lineage and environment attributes. The annotation module is integrated into the core trace lifecycle and complemented by trace viewer display updates.
Changes
Trace Annotation and Enrichment
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
The PR introduces a new annotation module (531 lines) with straightforward heuristic logic and deterministic taxonomy lookup, integrated into existing trace infrastructure at well-defined points. Tool enrichment changes are homogeneous patterns across multiple files with consistent …
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts`:
- Around line 77-86: The loop currently assumes data.column_lineage is iterable
and can throw if it's not; update the iteration to first ensure column_lineage
is an array (e.g. replace (data.column_lineage ?? []) with
Array.isArray(data.column_lineage) ? data.column_lineage : []) before the
for...of loop in altimate-core-column-lineage.ts so the code that references
extractTable/extractColumn and the sets (inputTables, outputs, colsRead,
colsWritten) only runs over a real array and won't error on non-array dispatcher
responses.
In `@packages/opencode/src/altimate/tools/project-scan.ts`:
- Around line 943-945: The current assignment for "de.env.dbt_manifest_present"
uses the RPC result (dbtManifest) which can be false on transient RPC failures;
instead check the actual presence of the manifest file on disk: use dbtProject
(e.g., dbtProject.found or dbtProject.path) to construct the manifest path
(manifest.json) and perform a file-existence check (fs.existsSync or
fs.promises.stat) and set "de.env.dbt_manifest_present" based on that result,
keeping the other fields (e.g., "de.env.dbt_present" and toolsFound) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 69d72f33-b54a-405b-a0d9-8e5feee95883
📒 Files selected for processing (8)
- packages/opencode/src/altimate/observability/annotator.ts
- packages/opencode/src/altimate/observability/de-attributes.ts
- packages/opencode/src/altimate/observability/tracing.ts
- packages/opencode/src/altimate/observability/viewer.ts
- packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts
- packages/opencode/src/altimate/tools/project-scan.ts
- packages/opencode/src/altimate/tools/schema-inspect.ts
- packages/opencode/src/altimate/tools/sql-execute.ts
for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {
  const srcTable = extractTable(edge, "source")
  const tgtTable = extractTable(edge, "target")
  const srcCol = extractColumn(edge, "source")
  const tgtCol = extractColumn(edge, "target")
  if (srcTable) inputTables.add(srcTable)
  if (tgtTable) outputs.add(tgtTable)
  if (srcCol) colsRead.add(srcCol)
  if (tgtCol) colsWritten.add(tgtCol)
}
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Guard column_lineage to an array before iterating.
Line 77 assumes data.column_lineage is iterable. If the dispatcher returns a non-array object, this throws and converts the call into an error path unnecessarily.
Suggested fix
- for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {
+ const lineageEdges = Array.isArray(data.column_lineage) ? data.column_lineage : []
+ for (const edge of lineageEdges as Record<string, any>[]) {
const srcTable = extractTable(edge, "source")
const tgtTable = extractTable(edge, "target")
   const srcCol = extractColumn(edge, "source")
📝 Committable suggestion
const lineageEdges = Array.isArray(data.column_lineage) ? data.column_lineage : []
for (const edge of lineageEdges as Record<string, any>[]) {
  const srcTable = extractTable(edge, "source")
  const tgtTable = extractTable(edge, "target")
  const srcCol = extractColumn(edge, "source")
  const tgtCol = extractColumn(edge, "target")
  if (srcTable) inputTables.add(srcTable)
  if (tgtTable) outputs.add(tgtTable)
  if (srcCol) colsRead.add(srcCol)
  if (tgtCol) colsWritten.add(tgtCol)
}
"de.env.dbt_present": dbtProject.found,
"de.env.dbt_manifest_present": dbtManifest !== undefined && dbtManifest !== null,
...(toolsFound.length > 0 && { "de.env.tools_detected": toolsFound }),
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Use file existence for de.env.dbt_manifest_present, not RPC success.
Line 944 currently reflects whether dbt.manifest returned data, not whether manifest.json exists. If the RPC fails transiently, this emits false even when the file is present, which breaks the de.env.dbt_manifest_present semantic contract.
Suggested fix
const deAttrs: Record<string, unknown> = {
"de.env.dbt_present": dbtProject.found,
- "de.env.dbt_manifest_present": dbtManifest !== undefined && dbtManifest !== null,
+ "de.env.dbt_manifest_present": Boolean(dbtProject.manifestPath),
...(toolsFound.length > 0 && { "de.env.tools_detected": toolsFound }),
...(deWhTypes.length === 1 && { "de.env.warehouse_type": deWhTypes[0] }),
  }
📝 Committable suggestion
"de.env.dbt_present": dbtProject.found,
"de.env.dbt_manifest_present": Boolean(dbtProject.manifestPath),
...(toolsFound.length > 0 && { "de.env.tools_detected": toolsFound }),
4 issues found across 8 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/altimate/observability/annotator.ts">
<violation number="1" location="packages/opencode/src/altimate/observability/annotator.ts:180">
P2: `altimate-dbt` commands are misclassified as `dbt` due to regex order, producing incorrect `de.tool.bash_*` metadata.</violation>
</file>
<file name="packages/opencode/src/altimate/observability/tracing.ts">
<violation number="1" location="packages/opencode/src/altimate/observability/tracing.ts:912">
P2: Attribute size limits are computed with string length, not UTF-8 byte size. Non-ASCII metadata can bypass the intended payload caps and oversize trace exports.</violation>
</file>
<file name="packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts">
<violation number="1" location="packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts:77">
P2: Guard `data.column_lineage` with `Array.isArray()` before iterating. The `?? []` fallback only handles `null`/`undefined`; if the dispatcher returns a non-array object, `for...of` will throw a `TypeError`.</violation>
</file>
<file name="packages/opencode/src/altimate/tools/project-scan.ts">
<violation number="1" location="packages/opencode/src/altimate/tools/project-scan.ts:944">
P2: `de.env.dbt_manifest_present` reflects RPC success rather than file existence. If the `dbt.manifest` call fails transiently (e.g., timeout), `dbtManifest` will be `undefined` and this emits `false` even when `manifest.json` exists on disk, violating the attribute's semantic contract. Consider checking file existence directly (e.g., via `dbtProject` context or an `fs.existsSync` on the expected path) instead of relying on the RPC result.</violation>
</file>
// dbt CLI: detect verb after `dbt`. Broad list of subcommands.
const dbtVerbs = "build|run|test|seed|snapshot|compile|deps|run-operation|debug|parse|docs|clean|list|ls|source|init|show|retry|freshness"
const dbtMatch = stripped.match(new RegExp(`\\bdbt\\s+(${dbtVerbs})\\b`, "i"))
P2: altimate-dbt commands are misclassified as dbt due to regex order, producing incorrect de.tool.bash_* metadata.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/observability/annotator.ts, line 180:
<comment>`altimate-dbt` commands are misclassified as `dbt` due to regex order, producing incorrect `de.tool.bash_*` metadata.</comment>
<file context>
@@ -0,0 +1,531 @@
+
+ // dbt CLI: detect verb after `dbt`. Broad list of subcommands.
+ const dbtVerbs = "build|run|test|seed|snapshot|compile|deps|run-operation|debug|parse|docs|clean|list|ls|source|init|show|retry|freshness"
+ const dbtMatch = stripped.match(new RegExp(`\\bdbt\\s+(${dbtVerbs})\\b`, "i"))
+ if (dbtMatch) {
+ return { intent: "dbt", invoked: "dbt", dbtCommand: dbtMatch[1].toLowerCase() }
</file context>
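The misclassification arises because `\b` matches between `-` and `d`, so `altimate-dbt build` satisfies `\bdbt\s+build`. One possible remedy, sketched below under assumptions, is to test for the wrapper first and to reject a `dbt` that is preceded by a word character or hyphen. The names `classifyDbtCommand`, `STRICT_DBT`, and `ALTIMATE_DBT` are hypothetical, and the sketch accepts any word as an `altimate-dbt` subcommand since the real list is not given here. The sketch also places `run-operation` before `run` in the alternation so that the longer verb is not shadowed by the shorter one.

```typescript
const DBT_VERBS =
  "build|run-operation|run|test|seed|snapshot|compile|deps|debug|parse|docs|clean|list|ls|source|init|show|retry|freshness"

// Pattern shape from the review context: \b also matches right after a hyphen.
const NAIVE_DBT = new RegExp(`\\bdbt\\s+(${DBT_VERBS})\\b`, "i")
// Hypothetical stricter forms: `dbt` must start the string or follow a non-word, non-hyphen char.
const STRICT_DBT = new RegExp(`(?:^|[^\\w-])dbt\\s+(${DBT_VERBS})\\b`, "i")
const ALTIMATE_DBT = new RegExp("(?:^|[^\\w-])altimate-dbt\\s+([\\w-]+)", "i")

interface DbtClassification {
  invoked: "dbt" | "altimate-dbt"
  command: string
}

// Compiled once at module scope; the wrapper is checked before the plain CLI.
function classifyDbtCommand(cmd: string): DbtClassification | undefined {
  const wrapper = cmd.match(ALTIMATE_DBT)
  if (wrapper) return { invoked: "altimate-dbt", command: wrapper[1].toLowerCase() }
  const plain = cmd.match(STRICT_DBT)
  if (plain) return { invoked: "dbt", command: plain[1].toLowerCase() }
  return undefined
}

console.log(NAIVE_DBT.test("altimate-dbt build")) // the naive pattern matches the wrapper
console.log(classifyDbtCommand("altimate-dbt build"))
console.log(classifyDbtCommand("cd proj && dbt run --select orders"))
```

Hoisting the patterns to module scope also avoids recompiling them on every call.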
try {
  const serialized = JSON.stringify(v)
  if (serialized === undefined) continue
  if (serialized.length > ATTR_VALUE_MAX_BYTES) continue
P2: Attribute size limits are computed with string length, not UTF-8 byte size. Non-ASCII metadata can bypass the intended payload caps and oversize trace exports.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/observability/tracing.ts, line 912:
<comment>Attribute size limits are computed with string length, not UTF-8 byte size. Non-ASCII metadata can bypass the intended payload caps and oversize trace exports.</comment>
<file context>
@@ -874,6 +879,71 @@ export class Trace {
+ try {
+ const serialized = JSON.stringify(v)
+ if (serialized === undefined) continue
+ if (serialized.length > ATTR_VALUE_MAX_BYTES) continue
+ if (totalBytes + serialized.length > ATTR_TOTAL_MAX_BYTES) continue
+ // Store original value (matches setSpanAttributes() at line ~1135 for
</file context>
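The gap between string length and byte size is easy to demonstrate. The sketch below uses `TextEncoder`, which is available in Node and browsers, as a portable way to count UTF-8 bytes; the helper name `utf8Bytes` is an assumption for the example.

```typescript
// UTF-8 byte count of a string, independent of its UTF-16 length.
const utf8Bytes = (s: string): number => new TextEncoder().encode(s).length

// Ten copies of U+00E9 (e with acute accent): one UTF-16 code unit each, two UTF-8 bytes each.
const value = "\u00e9".repeat(10)
const serialized = JSON.stringify(value)

console.log(serialized.length)     // UTF-16 code units, including the two quotes
console.log(utf8Bytes(serialized)) // UTF-8 bytes actually exported
```

A cap compared against `serialized.length` would undercount this payload by almost half, which is the bypass the finding describes.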
    return undefined
  }

  for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {
P2: Guard data.column_lineage with Array.isArray() before iterating. The ?? [] fallback only handles null/undefined; if the dispatcher returns a non-array object, for...of will throw a TypeError.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts, line 77:
<comment>Guard `data.column_lineage` with `Array.isArray()` before iterating. The `?? []` fallback only handles `null`/`undefined`; if the dispatcher returns a non-array object, `for...of` will throw a `TypeError`.</comment>
<file context>
@@ -30,9 +30,78 @@ export const AltimateCoreColumnLineageTool = Tool.define("altimate_core_column_l
+ return undefined
+ }
+
+ for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {
+ const srcTable = extractTable(edge, "source")
+ const tgtTable = extractTable(edge, "target")
</file context>
- for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {
+ for (const edge of (Array.isArray(data.column_lineage) ? data.column_lineage : []) as Record<string, any>[]) {
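To see why `?? []` is not enough, consider a dispatcher response whose `column_lineage` is an error object rather than a list. The sketch below is illustrative only: `toEdges` is a hypothetical helper name and the payload shape is an assumption.

```typescript
type Edge = Record<string, unknown>

// Returns the edges to iterate, or an empty list for any non-array payload.
function toEdges(columnLineage: unknown): Edge[] {
  return Array.isArray(columnLineage) ? (columnLineage as Edge[]) : []
}

// Assumed malformed response: an object where an array was expected.
const malformed: { column_lineage?: unknown } = { column_lineage: { error: "parse failed" } }

try {
  // `??` only replaces null/undefined, so the object reaches for...of unchanged.
  for (const edge of (malformed.column_lineage ?? []) as Edge[]) console.log(edge)
} catch (err) {
  console.log("unguarded loop failed:", (err as Error).name)
}

// The guarded form simply iterates zero times.
for (const edge of toEdges(malformed.column_lineage)) console.log(edge)
console.log(toEdges(malformed.column_lineage).length)
```

The guard also covers a string payload, which `for...of` would otherwise iterate character by character without raising any error.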
]
const deAttrs: Record<string, unknown> = {
  "de.env.dbt_present": dbtProject.found,
  "de.env.dbt_manifest_present": dbtManifest !== undefined && dbtManifest !== null,
P2: de.env.dbt_manifest_present reflects RPC success rather than file existence. If the dbt.manifest call fails transiently (e.g., timeout), dbtManifest will be undefined and this emits false even when manifest.json exists on disk, violating the attribute's semantic contract. Consider checking file existence directly (e.g., via dbtProject context or an fs.existsSync on the expected path) instead of relying on the RPC result.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/tools/project-scan.ts, line 944:
<comment>`de.env.dbt_manifest_present` reflects RPC success rather than file existence. If the `dbt.manifest` call fails transiently (e.g., timeout), `dbtManifest` will be `undefined` and this emits `false` even when `manifest.json` exists on disk, violating the attribute's semantic contract. Consider checking file existence directly (e.g., via `dbtProject` context or an `fs.existsSync` on the expected path) instead of relying on the RPC result.</comment>
<file context>
@@ -929,6 +929,24 @@ export const ProjectScanTool = Tool.define("project_scan", {
+ ]
+ const deAttrs: Record<string, unknown> = {
+ "de.env.dbt_present": dbtProject.found,
+ "de.env.dbt_manifest_present": dbtManifest !== undefined && dbtManifest !== null,
+ ...(toolsFound.length > 0 && { "de.env.tools_detected": toolsFound }),
+ ...(deWhTypes.length === 1 && { "de.env.warehouse_type": deWhTypes[0] }),
</file context>
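A file-existence check along the lines this finding suggests could look like the sketch below. It assumes the project root directory is known and that the manifest lives under dbt's default `target/` directory, which a real project can override. The helper name `dbtManifestPresent` is hypothetical, and whether `dbtProject` exposes a path field is not confirmed by the excerpt.

```typescript
import { existsSync } from "node:fs"
import { join } from "node:path"

// Hypothetical helper: `projectDir` is the dbt project root; "target" is dbt's
// default target-path and may be configured differently in a real project.
function dbtManifestPresent(projectDir: string | undefined, targetPath = "target"): boolean | undefined {
  if (!projectDir) return undefined // unknown root: emit nothing rather than a wrong value
  return existsSync(join(projectDir, targetPath, "manifest.json"))
}

console.log(dbtManifestPresent(undefined))
console.log(dbtManifestPresent("/path/that/does/not/exist-for-this-example"))
```

Because the check reads the filesystem directly, a transient RPC failure no longer changes the attribute's value.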
What's still on the table (and why we stopped at 4 opt-ins)
The 4 tool opt-ins in this PR were the highest-yield first batch, not exhaustive. The classifier gives every other tool a useful baseline (…). But many tools have structured truth that lives inside them and gets thrown away before tracing sees it; those are the candidates for Layer 1 opt-ins in follow-up PRs.
Concrete opt-in candidates (incremental, each ~10–30 lines, no infra changes)
That's roughly 20–30 tools with meaningful structured data the classifier can't reach via regex over output text.
Why we stopped at 4 in this PR
To keep surface area small enough to land cleanly. The point of this PR was to prove the channel works end-to-end, which it does:
Each follow-up tool opt-in is independent and incremental: the architecture supports adding them one at a time without touching the tool framework, the tracer hook, or the classifier. Tools still don't import the observability layer; they just set additional …
Suggested follow-up order (highest value first)
So: deliberate stopping point, not an oversight. The opportunities are real, known, and incrementally addressable.
dev-punia-altimate left a comment
🤖 Code Review — OpenCodeReview (Gemini) — 12 finding(s)
- 11 anchored to a line (posted inline when the comment stream is on)
- 1 without a line anchor
All findings (full text)
1. packages/opencode/src/altimate/observability/viewer.ts (L519)
[🔵 LOW] The use of var for variable declarations is strictly prohibited. Please use const or let instead. In this case, since groups is not reassigned, const is the appropriate choice.
Suggested change:
const groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange'],['de.workflow.','Workflow','accent'],['de.outcome.','Outcome','green'],['de.artifacts.','Artifacts','secondary'],['de.env.','Environment','cyan'],['de.tool.','Tool','accent']];
2. packages/opencode/src/altimate/tools/project-scan.ts (L938)
[🔵 LOW] According to the code quality guidelines, the use of any type should be avoided. If its use is absolutely necessary, please add a comment explaining why. Otherwise, consider using a more specific type or an interface to improve type safety.
Suggested change:
.map((w: { type?: string }) => (typeof w?.type === "string" ? w.type.toLowerCase() : ""))
3. packages/opencode/src/altimate/tools/sql-execute.ts (L95-L98)
[🔵 LOW] The expression args.warehouse ?? registered[0]?.name is evaluated on every iteration of the .find() method. Although the array is small, it's better practice to evaluate this target name once before the loop to make the code cleaner and avoid repeated evaluation.
Suggested change:
try {
const registered = Registry.list().warehouses
const targetName = args.warehouse ?? registered[0]?.name
return registered.find((w) => w.name === targetName)
} catch {
4. packages/opencode/src/altimate/tools/sql-execute.ts (L108-L112)
[🔵 LOW] Since both metadata properties depend on the exact same condition (warehouseEntry?.type), they can be combined into a single object spread. This reduces redundancy and makes the code slightly more concise.
Suggested change:
// altimate_change start — de.* attributes lifted onto span by tracer
"de.warehouse.rows_returned": result.row_count,
...(warehouseEntry?.type && {
"de.warehouse.system": warehouseEntry.type,
"de.sql.dialect": warehouseEntry.type
}),
// altimate_change end
5. packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts (L49)
[🔵 LOW] Avoid using the any type to comply with TypeScript best practices. Since edge properties are accessed dynamically via string index, unknown is safe and appropriate here.
Suggested change:
const extractTable = (edge: Record<string, unknown>, side: "source" | "target"): string | undefined => {
6. packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts (L64-L68)
[🟠 MEDIUM] There are two issues here:
- When falling back to extracting the column from the `endpoint` string, it currently returns the entire string (e.g., `schema.table.column`). To make this consistent with the `direct` extraction (which returns just the column name) and `extractTable` behavior, it should extract only the column part by stripping the table prefix.
- Replace `Record<string, any>` with `Record<string, unknown>` to comply with the TypeScript rule against using `any`.
Suggested change:
const extractColumn = (edge: Record<string, unknown>, side: "source" | "target"): string | undefined => {
const direct = edge[`${side}_column`] ?? edge[`${side}Column`]
if (typeof direct === "string" && direct) return direct
const endpoint = edge[side]
if (typeof endpoint === "string") {
return endpoint.includes(".") ? endpoint.split(".").pop() : endpoint
}
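A complete, standalone version of the helper described in this finding might read as follows. This is a sketch under assumptions: the edge field names come from the suggestion above, and splitting on `.` will mis-handle quoted identifiers that themselves contain dots.

```typescript
type Edge = Record<string, unknown>
type Side = "source" | "target"

// Prefer an explicit column field; otherwise take the last dotted segment of the endpoint.
function extractColumn(edge: Edge, side: Side): string | undefined {
  const direct = edge[`${side}_column`] ?? edge[`${side}Column`]
  if (typeof direct === "string" && direct) return direct
  const endpoint = edge[side]
  if (typeof endpoint === "string" && endpoint) {
    const last = endpoint.split(".").pop()
    return last || undefined // a trailing dot yields no column
  }
  return undefined
}

console.log(extractColumn({ source: "analytics.orders.order_id" }, "source"))
console.log(extractColumn({ target_column: "id" }, "target"))
console.log(extractColumn({}, "source"))
```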
7. packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts (L77)
[🔵 LOW] Replace any with unknown to adhere to the rule against using any types.
Suggested change:
for (const edge of (data.column_lineage ?? []) as Record<string, unknown>[]) {
8. packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts (L87-L93)
[🟠 MEDIUM] Hardcoding telemetry keys violates the 'No Hardcoding' standard and can lead to inconsistencies if the attribute names change in the future. These keys are already defined as constants. Please import DE_SQL from ../observability/de-attributes and use the exported constants instead.
Suggested change:
if (inputTables.size > 0) lineageAttrs[DE_SQL.LINEAGE_INPUT_TABLES] = [...inputTables].slice(0, 50)
// Keep output_table scalar — Codex chunk-3 review #5: don't switch attribute
// type to array when there are multiple outputs. Omit the attribute instead.
if (outputs.size === 1) lineageAttrs[DE_SQL.LINEAGE_OUTPUT_TABLE] = [...outputs][0]
if (colsRead.size > 0) lineageAttrs[DE_SQL.LINEAGE_COLUMNS_READ] = [...colsRead].slice(0, 100)
if (colsWritten.size > 0) lineageAttrs[DE_SQL.LINEAGE_COLUMNS_WRITTEN] = [...colsWritten].slice(0, 100)
if (args.dialect) lineageAttrs[DE_SQL.DIALECT] = args.dialect
9. packages/opencode/src/altimate/observability/tracing.ts
[🟠 MEDIUM] There are two improvements that can be made here:
- Inaccurate byte size computation: `serialized.length` measures string length (UTF-16 code units), not actual UTF-8 bytes. If multi-byte characters are present, the true payload byte size could exceed the intended limits. Using `Buffer.byteLength(serialized)` evaluates the actual byte length accurately.
- Code duplication: The identical serialization and size-limiting logic is repeated for both Layer 1 and Layer 2. Extracting this into a reusable `addAttribute` helper improves maintainability and keeps the code DRY.
Suggested change:
const addAttribute = (k: string, v: unknown) => {
if (v === undefined || k in spanAttributes) return
try {
const serialized = JSON.stringify(v)
if (serialized === undefined) return
// Use Buffer.byteLength to accurately count UTF-8 bytes
const byteLen = Buffer.byteLength(serialized)
if (byteLen > ATTR_VALUE_MAX_BYTES) return
if (totalBytes + byteLen > ATTR_TOTAL_MAX_BYTES) return
// Store original value (matches setSpanAttributes() for
// consistent overwrite semantics if both paths target the same key).
spanAttributes[k] = v
totalBytes += byteLen
} catch {
// Bad metadata value must never break the tracer
}
}
// Layer 1: tool-provided structured metadata (high fidelity — driver
// values, parser output). Filtered to the de.* prefix.
const rawMetadata = state.metadata
if (rawMetadata && typeof rawMetadata === "object") {
for (const [k, v] of Object.entries(rawMetadata)) {
if (typeof k === "string" && k.startsWith("de.")) {
addAttribute(k, v)
}
}
}
// Layer 2: derived classification from (name, input, output). Best-effort
// procedural — taxonomy lookup, bash intent, dbt layer from path, etc.
// Tool-provided metadata (Layer 1) wins on conflicts.
try {
const derived = annotateToolSpan(toolName, safeInput, isError ? errorStr : outputStr)
for (const [k, v] of Object.entries(derived)) {
addAttribute(k, v)
}
} catch {
// Annotator must never break the tracer
}
10. packages/opencode/src/altimate/observability/annotator.ts (L179-L180)
[🟠 MEDIUM] This dynamic regular expression is instantiated on every call to classifyBash, which incurs unnecessary allocation and compilation overhead. Since dbtVerbs is a constant string, consider extracting this RegExp declaration outside the function body so it is only compiled once, improving performance.
11. packages/opencode/src/altimate/observability/annotator.ts (L310-L312)
[🟠 MEDIUM] For bash tools, extractInputTables is called on the sliced 8000-character string sql. This contradicts the strategy used for sql_execute tools (where the full query q is parsed to avoid losing table references at the tail). To maintain consistency and ensure all lineage is captured, you should pass the full extracted string (sqlMatch[1]) to extractInputTables.
Suggested change:
const sql = sqlMatch[1].slice(0, 8000)
out[DE.SQL.QUERY_TEXT] = sql
const tables = extractInputTables(sqlMatch[1])
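The idea of capping what is stored while parsing the full text can be sketched in isolation. `extractInputTablesSketch` below is a naive stand-in written for this example, not the PR's `extractInputTables`, and `annotateSql` is likewise a hypothetical name.

```typescript
const QUERY_TEXT_MAX_CHARS = 8000

// Hypothetical stand-in for extractInputTables: a naive scan for FROM/JOIN targets.
function extractInputTablesSketch(sql: string): string[] {
  const tables = new Set<string>()
  const pattern = /\b(?:from|join)\s+([\w.]+)/gi
  let m: RegExpExecArray | null
  while ((m = pattern.exec(sql)) !== null) tables.add(m[1])
  return Array.from(tables)
}

function annotateSql(fullSql: string): { queryText: string; inputTables: string[] } {
  return {
    queryText: fullSql.slice(0, QUERY_TEXT_MAX_CHARS), // capped for storage
    inputTables: extractInputTablesSketch(fullSql), // parsed from the uncapped text
  }
}

// A query whose last table reference sits beyond the 8000-character cap.
const longSql = "select * from head_table h\n" + "-- pad\n".repeat(2000) + "join tail_table t on 1 = 1"
const annotated = annotateSql(longSql)
console.log(annotated.queryText.length, annotated.inputTables)
```

Parsing the sliced string instead would silently lose `tail_table`, which is the inconsistency the finding points out.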
12. packages/opencode/src/altimate/observability/annotator.ts (L442-L443)
[🔵 LOW] According to the coding standards, the use of != is prohibited. Since dbtPresent and manifestPresent are defined as boolean | undefined, you should use strict inequality !== undefined instead of != null.
Suggested change:
if (env.dbtPresent !== undefined) out[DE.ENV.DBT_PRESENT] = env.dbtPresent
if (env.manifestPresent !== undefined) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent
  // DE attributes grouped
  var a = span.attributes || {};
- var groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange']];
+ var groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange'],['de.workflow.','Workflow','accent'],['de.outcome.','Outcome','green'],['de.artifacts.','Artifacts','secondary'],['de.env.','Environment','cyan'],['de.tool.','Tool','accent']];
[🔵 LOW] The use of var for variable declarations is strictly prohibited. Please use const or let instead. In this case, since groups is not reassigned, const is the appropriate choice.
Suggested change:
- var groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange'],['de.workflow.','Workflow','accent'],['de.outcome.','Outcome','green'],['de.artifacts.','Artifacts','secondary'],['de.env.','Environment','cyan'],['de.tool.','Tool','accent']];
+ const groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange'],['de.workflow.','Workflow','accent'],['de.outcome.','Outcome','green'],['de.artifacts.','Artifacts','secondary'],['de.env.','Environment','cyan'],['de.tool.','Tool','accent']];
const deWhTypes = [
  ...new Set(
    (schemaCache?.warehouses ?? [])
      .map((w: any) => (typeof w?.type === "string" ? w.type.toLowerCase() : ""))
[🔵 LOW] According to the code quality guidelines, the use of any type should be avoided. If its use is absolutely necessary, please add a comment explaining why. Otherwise, consider using a more specific type or an interface to improve type safety.
Suggested change:
- .map((w: any) => (typeof w?.type === "string" ? w.type.toLowerCase() : ""))
+ .map((w: { type?: string }) => (typeof w?.type === "string" ? w.type.toLowerCase() : ""))
try {
  const registered = Registry.list().warehouses
  return registered.find((w) => w.name === (args.warehouse ?? registered[0]?.name))
} catch {
[🔵 LOW] The expression args.warehouse ?? registered[0]?.name is evaluated on every iteration of the .find() method. Although the array is small, it's better practice to evaluate this target name once before the loop to make the code cleaner and avoid repeated evaluation.
Suggested change:
| try { | |
| const registered = Registry.list().warehouses | |
| return registered.find((w) => w.name === (args.warehouse ?? registered[0]?.name)) | |
| } catch { | |
| try { | |
| const registered = Registry.list().warehouses | |
| const targetName = args.warehouse ?? registered[0]?.name | |
| return registered.find((w) => w.name === targetName) | |
| } catch { |
| // altimate_change start — de.* attributes lifted onto span by tracer | ||
| "de.warehouse.rows_returned": result.row_count, | ||
| ...(warehouseEntry?.type && { "de.warehouse.system": warehouseEntry.type }), | ||
| ...(warehouseEntry?.type && { "de.sql.dialect": warehouseEntry.type }), | ||
| // altimate_change end |
[🔵 LOW] Since both metadata properties depend on the exact same condition (warehouseEntry?.type), they can be combined into a single object spread. This reduces redundancy and makes the code slightly more concise.
Suggested change:
| // altimate_change start — de.* attributes lifted onto span by tracer | |
| "de.warehouse.rows_returned": result.row_count, | |
| ...(warehouseEntry?.type && { "de.warehouse.system": warehouseEntry.type }), | |
| ...(warehouseEntry?.type && { "de.sql.dialect": warehouseEntry.type }), | |
| // altimate_change end | |
| // altimate_change start — de.* attributes lifted onto span by tracer | |
| "de.warehouse.rows_returned": result.row_count, | |
| ...(warehouseEntry?.type && { | |
| "de.warehouse.system": warehouseEntry.type, | |
| "de.sql.dialect": warehouseEntry.type | |
| }), | |
| // altimate_change end |
| const colsRead = new Set<string>() | ||
| const colsWritten = new Set<string>() | ||
|
|
||
| const extractTable = (edge: Record<string, any>, side: "source" | "target"): string | undefined => { |
[🔵 LOW] Avoid using the any type to comply with TypeScript best practices. Since edge properties are accessed dynamically via string index, unknown is safe and appropriate here.
Suggested change:
| const extractTable = (edge: Record<string, any>, side: "source" | "target"): string | undefined => { | |
| const extractTable = (edge: Record<string, unknown>, side: "source" | "target"): string | undefined => { |
| return undefined | ||
| } | ||
|
|
||
| for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) { |
[🔵 LOW] Replace any with unknown to adhere to the rule against using any types.
Suggested change:
| for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) { | |
| for (const edge of (data.column_lineage ?? []) as Record<string, unknown>[]) { |
| if (inputTables.size > 0) lineageAttrs["de.sql.lineage.input_tables"] = [...inputTables].slice(0, 50) | ||
| // Keep output_table scalar — Codex chunk-3 review #5: don't switch attribute | ||
| // type to array when there are multiple outputs. Omit the attribute instead. | ||
| if (outputs.size === 1) lineageAttrs["de.sql.lineage.output_table"] = [...outputs][0] | ||
| if (colsRead.size > 0) lineageAttrs["de.sql.lineage.columns_read"] = [...colsRead].slice(0, 100) | ||
| if (colsWritten.size > 0) lineageAttrs["de.sql.lineage.columns_written"] = [...colsWritten].slice(0, 100) | ||
| if (args.dialect) lineageAttrs["de.sql.dialect"] = args.dialect |
[🟠 MEDIUM] Hardcoding telemetry keys violates the 'No Hardcoding' standard and can lead to inconsistencies if the attribute names change in the future. These keys are already defined as constants. Please import DE_SQL from ../observability/de-attributes and use the exported constants instead.
Suggested change:
| if (inputTables.size > 0) lineageAttrs["de.sql.lineage.input_tables"] = [...inputTables].slice(0, 50) | |
| // Keep output_table scalar — Codex chunk-3 review #5: don't switch attribute | |
| // type to array when there are multiple outputs. Omit the attribute instead. | |
| if (outputs.size === 1) lineageAttrs["de.sql.lineage.output_table"] = [...outputs][0] | |
| if (colsRead.size > 0) lineageAttrs["de.sql.lineage.columns_read"] = [...colsRead].slice(0, 100) | |
| if (colsWritten.size > 0) lineageAttrs["de.sql.lineage.columns_written"] = [...colsWritten].slice(0, 100) | |
| if (args.dialect) lineageAttrs["de.sql.dialect"] = args.dialect | |
| if (inputTables.size > 0) lineageAttrs[DE_SQL.LINEAGE_INPUT_TABLES] = [...inputTables].slice(0, 50) | |
| // Keep output_table scalar — Codex chunk-3 review #5: don't switch attribute | |
| // type to array when there are multiple outputs. Omit the attribute instead. | |
| if (outputs.size === 1) lineageAttrs[DE_SQL.LINEAGE_OUTPUT_TABLE] = [...outputs][0] | |
| if (colsRead.size > 0) lineageAttrs[DE_SQL.LINEAGE_COLUMNS_READ] = [...colsRead].slice(0, 100) | |
| if (colsWritten.size > 0) lineageAttrs[DE_SQL.LINEAGE_COLUMNS_WRITTEN] = [...colsWritten].slice(0, 100) | |
| if (args.dialect) lineageAttrs[DE_SQL.DIALECT] = args.dialect |
| const dbtVerbs = "build|run|test|seed|snapshot|compile|deps|run-operation|debug|parse|docs|clean|list|ls|source|init|show|retry|freshness" | ||
| const dbtMatch = stripped.match(new RegExp(`\\bdbt\\s+(${dbtVerbs})\\b`, "i")) |
[🟠 MEDIUM] This dynamic regular expression is instantiated on every call to classifyBash, which incurs unnecessary allocation and compilation overhead. Since dbtVerbs is a constant string, consider extracting this RegExp declaration outside the function body so it is only compiled once, improving performance.
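No suggested change accompanies this comment, so here is a minimal sketch of the hoisting it describes. The verb list is copied from the diff; the constant names and the `matchDbtVerb` helper are hypothetical stand-ins for the relevant part of `classifyBash`, not the PR's actual code.

```typescript
// Hypothetical module-level constants; the verb list is copied from the diff above.
const DBT_VERBS =
  "build|run|test|seed|snapshot|compile|deps|run-operation|debug|parse|docs|clean|list|ls|source|init|show|retry|freshness";

// Compiled once at module load. There is no "g" flag, so the regex keeps no
// lastIndex state between calls and is safe to share across invocations.
const DBT_COMMAND_RE = new RegExp(`\\bdbt\\s+(${DBT_VERBS})\\b`, "i");

// Hypothetical helper: returns the lower-cased dbt verb, or undefined when
// the command line does not invoke dbt with a known verb.
function matchDbtVerb(stripped: string): string | undefined {
  const m = stripped.match(DBT_COMMAND_RE);
  return m ? m[1].toLowerCase() : undefined;
}
```

Because the pattern has no global or sticky flag, sharing one instance does not change matching behavior; only the per-call compilation goes away.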
| const sql = sqlMatch[1].slice(0, 8000) | ||
| out[DE.SQL.QUERY_TEXT] = sql | ||
| const tables = extractInputTables(sql) |
[🟠 MEDIUM] For bash tools, extractInputTables is called on the sliced 8000-character string sql. This contradicts the strategy used for sql_execute tools (where the full query q is parsed to avoid losing table references at the tail). To maintain consistency and ensure all lineage is captured, you should pass the full extracted string (sqlMatch[1]) to extractInputTables.
Suggested change:
| const sql = sqlMatch[1].slice(0, 8000) | |
| out[DE.SQL.QUERY_TEXT] = sql | |
| const tables = extractInputTables(sql) | |
| const sql = sqlMatch[1].slice(0, 8000) | |
| out[DE.SQL.QUERY_TEXT] = sql | |
| const tables = extractInputTables(sqlMatch[1]) |
| if (env.dbtPresent != null) out[DE.ENV.DBT_PRESENT] = env.dbtPresent | ||
| if (env.manifestPresent != null) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent |
[🔵 LOW] According to the coding standards, the use of != is prohibited. Since dbtPresent and manifestPresent are defined as boolean | undefined, you should use strict inequality !== undefined instead of != null.
Suggested change:
| if (env.dbtPresent != null) out[DE.ENV.DBT_PRESENT] = env.dbtPresent | |
| if (env.manifestPresent != null) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent | |
| if (env.dbtPresent !== undefined) out[DE.ENV.DBT_PRESENT] = env.dbtPresent | |
| if (env.manifestPresent !== undefined) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent |
🤖 Code Review, OpenCodeReview (Gemini): No Issues Found
No supported files changed.
Summary
Wire the existing trace-attribute scaffolding (`de-attributes.ts`, `setSpanAttributes`, viewer grouped rendering at `viewer.ts:514`) so real values flow onto every tool span and the session root.
Before this change, 0/101 traces on disk had any populated `attributes` field: the schema slot, vocabulary, setter, and viewer rendering all existed but had zero callers. This PR connects the seam.
Two layers, no LLM:
- Layer 1: tool-provided metadata (high fidelity, from real driver/parser values). Tools surface structured fields by setting `de.*` keys on their returned `metadata` object. `Trace.logToolCall` lifts those keys onto `span.attributes`. Tools never import the observability layer; the seam is the existing `ToolStateCompleted.metadata` field already plumbed through `message-v2`. Per-value 10 KB + total 32 KB byte caps prevent runaway payloads.
- Layer 2: procedural classifier (`observability/annotator.ts`). Pure function over `(toolName, input, output)`. Lookup-table tool taxonomy, regex bash-intent + dbt-layer-from-path classification, deterministic outcome/artifacts/env rollups. Best-effort: returns `undefined` rather than emitting wrong attributes.
Layer 1 wins over Layer 2 on key conflicts in both per-tool and session merges.
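To make the two-layer flow concrete, here is a rough sketch of lifting `de.*` keys under the stated byte caps and then merging them over derived values. `liftDeAttributes` and `mergeLayers` are hypothetical names, and skipping oversized values is a simplification for illustration; how the real `logToolCall` handles an oversized value is not shown on this page.

```typescript
type Attrs = Record<string, unknown>;

const MAX_VALUE_BYTES = 10 * 1024; // per-value cap stated in the PR description
const MAX_TOTAL_BYTES = 32 * 1024; // total cap stated in the PR description

// Hypothetical helper: keep only de.* keys whose values serialize to JSON
// and fit within both caps. Oversized values are skipped in this sketch.
function liftDeAttributes(metadata: Attrs | undefined): Attrs {
  const out: Attrs = {};
  const encoder = new TextEncoder();
  let total = 0;
  for (const [key, value] of Object.entries(metadata ?? {})) {
    if (!key.startsWith("de.")) continue;
    let json: string | undefined;
    try {
      json = JSON.stringify(value);
    } catch {
      continue; // e.g. circular structures or BigInt values
    }
    if (json === undefined) continue; // undefined, functions, symbols
    const bytes = encoder.encode(json).length;
    if (bytes > MAX_VALUE_BYTES) continue;
    if (total + bytes > MAX_TOTAL_BYTES) continue;
    total += bytes;
    out[key] = value;
  }
  return out;
}

// Layer 1 (tool-emitted) wins over Layer 2 (derived) on key conflicts,
// so the tool-emitted object is spread last.
function mergeLayers(derived: Attrs, toolEmitted: Attrs): Attrs {
  return { ...derived, ...toolEmitted };
}
```

Spreading the tool-emitted attributes last is the simplest way to express the precedence rule: any key present in both layers takes the tool's value, and derived-only keys pass through unchanged.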
What got added (didn't exist before)
1. Tracer reads tool metadata (`tracing.ts:logToolCall`)
`state.metadata` is filtered to keys starting with `de.`, JSON-validated per value, and merged into `span.attributes`. Bounded by per-value (10 KB) and total (32 KB) byte caps so a giant `de.sql.query_text` or lineage array can't balloon snapshots.
2. Procedural classifier (`observability/annotator.ts`, new file)
Two pure functions:
- `annotateToolSpan(name, input, output)`: derives `de.tool.*`, `de.dbt.layer`, `de.dbt.command`, `de.sql.query_text`, `de.sql.lineage.input_tables`.
- `annotateSession(trace)`: derives `de.workflow.type` (with confidence), `de.outcome.*`, `de.artifacts.*`, `de.env.*`.
Called from `logToolCall` (per-span), `endTrace` (session rollup), and `flushSync` (crash-path session rollup). Best-effort, never throws.
3. Five new sub-namespaces in `de-attributes.ts`
- `DE.WORKFLOW`: `type`, `intent`, `type_confidence`
- `DE.OUTCOME`: `class`, `executed`, `change_applied`
- `DE.ARTIFACTS`: `files_read`, `files_edited`, `models_mentioned`, `tables_referenced`
- `DE.ENV`: `dbt_present`, `dbt_manifest_present`, `warehouse_type`, `tools_detected`
- `DE.TOOL`: `category`, `subcategory`, `vendor`, `bash_intent`, `bash_invoked`
Plus two new keys in existing namespaces:
- `de.dbt.layer` (staging / intermediate / dim / fact / agg / mart / source / seed / macro / test / snapshot)
- `de.warehouse.rows_total` (for `schema_inspect`'s table-level row count, distinct from query-result `rows_returned`)
4. Viewer grouping extension (`viewer.ts:514`)
Added 5 color-coded sections (Workflow, Outcome, Artifacts, Environment, Tool) so the new attributes render with their own labeled groups in the side panel instead of falling through to the generic key-value bucket.
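As an illustration of prefix-based grouping, the sketch below buckets attributes by the first matching prefix and sends everything else to a catch-all. The group entries are a subset copied from the viewer diff earlier on this page; `groupAttributes` and the `"Other"` label are hypothetical and may not match how `viewer.ts` actually names or renders the generic bucket.

```typescript
type Group = readonly [string, string, string]; // [prefix, label, color]

// Subset of the viewer's group table; prefixes, labels, and colors are copied from the diff.
const GROUPS: readonly Group[] = [
  ["de.warehouse.", "Warehouse", "cyan"],
  ["de.sql.", "SQL", "secondary"],
  ["de.workflow.", "Workflow", "accent"],
  ["de.outcome.", "Outcome", "green"],
  ["de.tool.", "Tool", "accent"],
];

// Hypothetical helper: bucket each attribute under the label of the first
// group whose prefix matches its key; unmatched keys land in "Other".
function groupAttributes(attrs: Record<string, unknown>): Map<string, Record<string, unknown>> {
  const buckets = new Map<string, Record<string, unknown>>();
  for (const [key, value] of Object.entries(attrs)) {
    const group = GROUPS.find(([prefix]) => key.startsWith(prefix));
    const label = group ? group[1] : "Other";
    const bucket = buckets.get(label) ?? {};
    bucket[key] = value;
    buckets.set(label, bucket);
  }
  return buckets;
}
```

Without an entry for a given prefix, its keys would land in the catch-all, which is the fall-through behavior the new sections are meant to avoid.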
5. Four tool opt-ins (Layer 1, high-fidelity)
- `sql_execute`: `de.warehouse.{system,rows_returned}`, `de.sql.dialect` (resolved from Registry warehouse)
- `altimate_core_column_lineage`: `de.sql.lineage.{input_tables,output_table,columns_read,columns_written}` from altimate-core's parser; preserves case via structured endpoint extraction (no dot-split that would corrupt quoted identifiers)
- `schema_inspect`: `de.warehouse.rows_total`, `de.sql.lineage.output_table`
- `project_scan`: `de.env.{dbt_present,dbt_manifest_present,warehouse_type,tools_detected}` from authoritative scan results
These four tools never import the observability layer; they set `de.*` keys on their existing returned `metadata` object.
What this PR explicitly does NOT touch
- `altimate-core` / `altimate-core-internal`: unmodified.
- … (`tool/tool.ts`): the metadata channel was already there.
- … (`message-v2.ts`): already carried `state.metadata`.
- `TraceFile` / `TraceSpan` schema: already had the `attributes` slot.
Codex review
Each chunk was reviewed by Codex before moving on. Fixes folded into this PR:
- … (`(?:^|/)models/...`)
- … `slice` instead of dropping the value
- … `de.tool.bash_intent` attribute, not all bash spans
- … `output_table` (omit on multi-output rather than switch type to array)
- … (`source_table`/`source_column` when present)
- `flushSync` runs the session rollup so crash/interrupted traces get root attrs
- `setSpanAttributes(..., "session")` callers still win over derived
- `de.warehouse.rows_total` instead of `de.quality.row_count` for schema-inspect totals (quality keys reserved for profiling/check results)
Test plan
- `bun run typecheck`: clean (pre-existing zod/effect/datamate noise unrelated)
- `bun run build:local` succeeds; binary runs against the Altimate gateway
- … `read` (dbt layer), `bash` (`dbt build` + `wc -c`), and a final assistant turn. Trace shows:
  - `read` span: `de.tool.category=fs`, `de.dbt.layer=staging`
  - `bash` span: `de.tool.bash_intent=dbt`, `de.dbt.command=build`
  - … `de.outcome.class=success`, `de.outcome.executed=true`, `de.artifacts.files_read=[...]`
- … `de.*` attributes
- … (`summary.status` mix)
- … (`sql_execute`, `dbt_profiles`, `warehouse_list`, `warehouse_add`, `schema_inspect`) all received their `de.tool.*` and `de.sql.*` attributes via the procedural classifier
- … `bridge-guard` if any upstream sync is pending
Trace size impact
- Per-tool-span attributes: bounded by the 32 KB total cap implemented in `logToolCall`.
- Session root attributes: small fixed shape (workflow code, outcome enum, file path arrays capped at 100 entries each).
- Existing `MAX_SERIALIZED_SPANS=5000` cap is unchanged.
Summary by cubic
Adds structured `de.*` attributes to every tool span and the session root via a tool metadata channel and a deterministic annotator. Improves trace readability with new attribute namespaces and viewer groups for workflow, outcome, artifacts, environment, and tool taxonomy.
New Features
- … `de.*` keys from tool `metadata` onto span attributes with 10 KB per-value and 32 KB total caps; preserves existing caller-set attributes.
- … `de-attributes.ts` with `WORKFLOW`, `OUTCOME`, `ARTIFACTS`, `ENV`, `TOOL`, plus `de.dbt.layer` and `de.warehouse.rows_total`; update viewer groups to render them.
- … `sql_execute` (`de.warehouse.{system,rows_returned}`, `de.sql.dialect`), `altimate_core_column_lineage` (structured lineage), `schema_inspect` (`de.warehouse.rows_total`, `de.sql.lineage.output_table`), `project_scan` (`de.env.*`).
Bug Fixes
- … `flushSync`; keep `de.sql.lineage.output_table` scalar.
- … `setSpanAttributes(..., "session")` and tool metadata override derived values.
Written for commit 6bac41e. Summary will update on new commits.
Summary by CodeRabbit