feat: annotate traces with `de.*` attributes via metadata channel + procedural classifier by sahrizvi · Pull Request #938 · AltimateAI/altimate-code

sahrizvi · 2026-06-13T00:12:00Z

Summary

Wire the existing trace-attribute scaffolding (de-attributes.ts, setSpanAttributes, viewer grouped rendering at viewer.ts:514) so real values flow onto every tool span and the session root.

Before this change, 0/101 traces on disk had any populated attributes field — the schema slot, vocabulary, setter, and viewer rendering all existed but had zero callers. This PR connects the seam.

Two layers, no LLM:

Layer 1 — tool-provided metadata (high fidelity, from real driver/parser values). Tools surface structured fields by setting de.* keys on their returned metadata object. Trace.logToolCall lifts those keys onto span.attributes. Tools never import the observability layer — the seam is the existing ToolStateCompleted.metadata field already plumbed through message-v2. Per-value 10 KB + total 32 KB byte caps prevent runaway payloads.
Layer 2 — procedural classifier (observability/annotator.ts). Pure function over (toolName, input, output). Lookup-table tool taxonomy, regex bash-intent + dbt-layer-from-path classification, deterministic outcome/artifacts/env rollups. Best-effort — returns undefined rather than emitting wrong attributes.

Layer 1 wins over Layer 2 on key conflicts in both per-tool and session merges.

What got added (didn't exist before)

1. Tracer reads tool metadata (`tracing.ts:logToolCall`)

state.metadata is filtered to keys starting with de., JSON-validated per value, and merged into span.attributes. Bounded by per-value (10 KB) and total (32 KB) byte caps so a giant de.sql.query_text or lineage array can't balloon snapshots.

2. Procedural classifier (`observability/annotator.ts`, new file)

Two pure functions:

annotateToolSpan(name, input, output) — derives de.tool.*, de.dbt.layer, de.dbt.command, de.sql.query_text, de.sql.lineage.input_tables.
annotateSession(trace) — derives de.workflow.type (with confidence), de.outcome.*, de.artifacts.*, de.env.*.

Called from logToolCall (per-span), endTrace (session rollup), and flushSync (crash-path session rollup). Best-effort, never throws.

3. Five new sub-namespaces in `de-attributes.ts`

Namespace	Keys
`DE.WORKFLOW`	`type`, `intent`, `type_confidence`
`DE.OUTCOME`	`class`, `executed`, `change_applied`
`DE.ARTIFACTS`	`files_read`, `files_edited`, `models_mentioned`, `tables_referenced`
`DE.ENV`	`dbt_present`, `dbt_manifest_present`, `warehouse_type`, `tools_detected`
`DE.TOOL`	`category`, `subcategory`, `vendor`, `bash_intent`, `bash_invoked`

Plus two new keys in existing namespaces:

de.dbt.layer (staging / intermediate / dim / fact / agg / mart / source / seed / macro / test / snapshot)
de.warehouse.rows_total (for schema_inspect's table-level row count — distinct from query-result rows_returned)

4. Viewer grouping extension (`viewer.ts:514`)

Added 5 color-coded sections (Workflow, Outcome, Artifacts, Environment, Tool) so the new attributes render with their own labeled groups in the side panel instead of falling through to the generic key-value bucket.

5. Four tool opt-ins (Layer 1, high-fidelity)

Tool	Emits
`sql_execute`	`de.warehouse.{system,rows_returned}`, `de.sql.dialect` (resolved from Registry warehouse)
`altimate_core_column_lineage`	Structured `de.sql.lineage.{input_tables,output_table,columns_read,columns_written}` from altimate-core's parser — preserves case via structured endpoint extraction (no dot-split that would corrupt quoted identifiers)
`schema_inspect`	`de.warehouse.rows_total`, `de.sql.lineage.output_table`
`project_scan`	`de.env.{dbt_present,dbt_manifest_present,warehouse_type,tools_detected}` from authoritative scan results

These four tools never import the observability layer — they set de.* keys on their existing returned metadata object.

What this PR explicitly does NOT touch

altimate-core / altimate-core-internal — unmodified.
Tool framework (tool/tool.ts) — the metadata channel was already there.
Message schema (message-v2.ts) — already carried state.metadata.
TraceFile / TraceSpan schema — already had the attributes slot.
The other ~70 tools — covered by Layer 2 (procedural classifier).

Codex review

Each chunk was reviewed by Codex before moving on. Fixes folded into this PR:

Relative dbt-layer path matching (normalize backslashes, anchor at (?:^|/)models/...)
SQL-text capping uses slice instead of dropping the value
Workflow classifier counts dbt-bash spans by de.tool.bash_intent attribute, not all bash spans
Scalar output_table (omit on multi-output rather than switch type to array)
Structured endpoint extraction for column lineage (uses source_table / source_column when present)
flushSync runs the session rollup so crash/interrupted traces get root attrs
Layered merge direction: explicit setSpanAttributes(..., \"session\") callers still win over derived
de.warehouse.rows_total instead of de.quality.row_count for schema-inspect totals (quality keys reserved for profiling/check results)

Test plan

bun run typecheck — clean (pre-existing zod/effect/datamate noise unrelated)
Local build via bun run build:local succeeds; binary runs against the Altimate gateway
E2E live run: prompt that exercises read (dbt layer), bash (dbt build + wc -c), and a final assistant turn. Trace shows:
- read span: de.tool.category=fs, de.dbt.layer=staging
- bash span: de.tool.bash_intent=dbt, de.dbt.command=build
- Session root: de.outcome.class=success, de.outcome.executed=true, de.artifacts.files_read=[...]
Retroactive validation over the 101 historical traces via a one-off pure-function pass:
- 103/104 files annotated, 0 failed
- 3010/3016 tool spans (99.8 %) got new de.* attributes
- 102 sessions got root-level rollup
- Workflow distribution: 51 dbt_develop, 22 dbt_troubleshoot, 8 warehouse_exploration
- Outcome distribution: 89 success, 11 failure, 2 interrupted (matches the original summary.status mix)
- Real DE tool spans (sql_execute, dbt_profiles, warehouse_list, warehouse_add, schema_inspect) all received their de.tool.* and de.sql.* attributes via the procedural classifier
CI typecheck + bridge-invariant tests
Review the changes against bridge-guard if any upstream sync is pending

Trace size impact

Per-tool-span attributes: bounded by the 32 KB total cap implemented in logToolCall.
Session root attributes: small fixed shape (workflow code, outcome enum, file path arrays capped at 100 entries each).
Existing MAX_SERIALIZED_SPANS=5000 cap is unchanged.

Summary by cubic

Adds structured de.* attributes to every tool span and the session root via a tool metadata channel and a deterministic annotator. Improves trace readability with new attribute namespaces and viewer groups for workflow, outcome, artifacts, environment, and tool taxonomy.

New Features
- Lift de.* keys from tool metadata onto span attributes with 10 KB per-value and 32 KB total caps; preserves existing caller-set attributes.
- Add procedural annotator to infer tool category/bash intent/dbt layer/SQL query+lineage per span and workflow/outcome/artifacts/env per session; runs on normal end and crash; tool-provided values win on conflicts.
- Extend de-attributes.ts with WORKFLOW, OUTCOME, ARTIFACTS, ENV, TOOL, plus de.dbt.layer and de.warehouse.rows_total; update viewer groups to render them.
- Tool opt-ins: sql_execute (de.warehouse.{system,rows_returned}, de.sql.dialect), altimate_core_column_lineage (structured lineage), schema_inspect (de.warehouse.rows_total, de.sql.lineage.output_table), project_scan (de.env.*).
Bug Fixes
- More accurate classification: normalized dbt-layer path matching; count only dbt-intent bash spans; run session rollup on flushSync; keep de.sql.lineage.output_table scalar.
- Safer truncation and precedence: SQL text capped via slice; enforce per-value/total caps; explicit setSpanAttributes(..., "session") and tool metadata override derived values.

^{Written for commit 6bac41e. Summary will update on new commits.}

Summary by CodeRabbit

New Features
- Traces now automatically enrich spans with detailed tool metadata, including SQL lineage and workflow classification
- Enhanced trace viewer displays organized data engineering attributes for improved observability
- Tools provide additional context on warehouse systems, environment capabilities, and execution details

…rocedural classifier Wires the existing trace-attribute scaffolding (`de-attributes.ts`, `setSpanAttributes`, viewer grouped rendering at `viewer.ts:514`) so real values flow onto every tool span and the session root span. Two channels: - **Tool-provided metadata**: tools surface structured fields by setting `de.*` keys on their returned `metadata` object. `Trace.logToolCall` lifts those keys onto `span.attributes`. Tools never import the observability layer — the seam is the existing `ToolStateCompleted.metadata` field in `message-v2`. Per-value 10 KB and total 32 KB byte caps prevent runaway payloads. - **Procedural classifier** (`observability/annotator.ts`): pure function over `(toolName, input, output)`. Lookup-table tool taxonomy, regex bash-intent and dbt-layer-from-path classification, deterministic outcome/artifacts/env rollups. Best-effort — returns undefined rather than emitting wrong attributes. Called from `logToolCall` (per-span) and `endTrace` + `flushSync` (session-level). Layer 1 (tool-emitted) wins over Layer 2 (derived) on key conflicts in both per-tool and session merges. Four tool opt-ins: - `sql_execute`: `de.warehouse.{system,rows_returned}`, `de.sql.dialect` from the Registry-resolved warehouse type - `altimate_core_column_lineage`: structured `de.sql.lineage.{input_tables, output_table,columns_read,columns_written}` from the altimate-core parser, preserving case via structured endpoint extraction - `schema_inspect`: `de.warehouse.rows_total`, `de.sql.lineage.output_table` - `project_scan`: `de.env.{dbt_present,dbt_manifest_present,warehouse_type, tools_detected}` from authoritative scan results Vocabulary extensions: - `de-attributes.ts`: `DE.WORKFLOW`, `DE.OUTCOME`, `DE.ARTIFACTS`, `DE.ENV`, `DE.TOOL` sub-namespaces; `de.dbt.layer`; `de.warehouse.rows_total` - `viewer.ts`: matching prefixes added to the grouped rendering Reviewed by Codex per chunk; fixes folded in: relative dbt-layer path matching, SQL-text capping (slice instead of drop), workflow classifier counts dbt-bash spans by `de.tool.bash_intent` not by all bash spans, scalar `output_table`, structured endpoint extraction for column lineage, `flushSync` session rollup, absent-key merge direction so explicit `setSpanAttributes(..., "session")` callers still win. E2E verified locally against the Altimate gateway: tool spans receive their category, dbt-layer, bash-intent, and dbt-command attributes; session root receives outcome, artifacts.files_read, outcome.executed.

github-actions · 2026-06-13T00:12:12Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

coderabbitai · 2026-06-13T00:12:12Z

📝 Walkthrough

Walkthrough

This PR adds a comprehensive trace annotation system that classifies tool spans using a deterministic taxonomy and heuristics, rolls up session-level metadata including workflow inference, and enriches individual tool outputs with lineage and environment attributes. The annotation module is integrated into the core trace lifecycle and complemented by trace viewer display updates.

Changes

Trace Annotation and Enrichment

Layer / File(s)	Summary
Trace Attribute Schema `packages/opencode/src/altimate/observability/de-attributes.ts`	New DE attribute namespaces introduced: `DE_WORKFLOW`, `DE_OUTCOME`, `DE_ARTIFACTS`, `DE_ENV`, `DE_TOOL` with their respective key constants, plus `ROWS_TOTAL` and `LAYER` added to existing `DE_WAREHOUSE` and `DE_DBT`. All integrated into the `DE` convenience object.
Core Annotation Functions `packages/opencode/src/altimate/observability/annotator.ts`	New module exports `annotateToolSpan` and `annotateSession`. Implements `TOOL_TAXONOMY` for deterministic category/subcategory/vendor mapping, `classifyBash` with regex-based intent recognition (dbt CLI, SQL, Python, git, etc.), `extractInputTables` from SQL text, `classifyWorkflow` for session-type inference, and `detectEnvFromProjectScan` for environment capability detection. All functions are best-effort, guarded by try/catch.
Tracing Infrastructure Integration `packages/opencode/src/altimate/observability/tracing.ts`	`logToolCall` now accepts optional `state.metadata`, extracts `de.*` keys with size caps, and applies `annotateToolSpan` classification. `endTrace` runs `annotateSession` to attach session-level rollup to root span; same applied in crash-handling `flushSync` path.
Tool Output Enrichment `packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts`, `project-scan.ts`, `schema-inspect.ts`, `sql-execute.ts`	Each tool enriches output metadata: column-lineage extracts table/column lineage as `de.sql.lineage.`; project-scan adds `de.env.` flags for dbt/manifest/warehouse detection; schema-inspect adds `de.warehouse.rows_total` and `de.sql.lineage.output_table`; sql-execute adds `de.warehouse.system`, `de.warehouse.rows_returned`, and `de.sql.dialect`.
Trace Viewer Display `packages/opencode/src/altimate/observability/viewer.ts`	Detail Panel attribute grouping expanded to include new DE prefixes (workflow, outcome, artifacts, env, tool) in labeled sections alongside existing groups.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The PR introduces a new annotation module (531 lines) with straightforward heuristic logic and deterministic taxonomy lookup, integrated into existing trace infrastructure at well-defined points. Tool enrichment changes are homogeneous patterns across multiple files with consistent de.* metadata augmentation. Schema changes are additive constants with no breakage. Core integration points in tracing are clear, best-effort guarded, and preserve existing behavior.

Possibly related PRs

AltimateAI/altimate-code#867: Implements markCrashed per-session crash-guarding on TraceExporter/FileExporter to prevent flushSync and async exports from clobbering the canonical crashed trace.

Suggested labels

needs-review:blocked, observability, tracing

🐰 The traces now tell their tale,
With taxonomy, bash, and SQL to prevail,
Sessions roll up with workflow and care,
Annotations enrich the data we share! ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description is comprehensive and well-structured, covering summary, implementation details, and test plan. However, it does not include the required 'PINEAPPLE' keyword mandated by the repository template for AI-generated contributions.	Add the word 'PINEAPPLE' at the very top of the PR description as required by the template for AI-generated contributions before the Summary section.
Docstring Coverage	⚠️ Warning	Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: adding structured `de.*` attributes to traces via tool metadata and procedural classification.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/de-trace-augmentation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-13T00:12:42Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

github-actions · 2026-06-13T00:13:45Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts`:
- Around line 77-86: The loop currently assumes data.column_lineage is iterable
and can throw if it's not; update the iteration to first ensure column_lineage
is an array (e.g. replace (data.column_lineage ?? []) with
Array.isArray(data.column_lineage) ? data.column_lineage : []) before the
for...of loop in altimate-core-column-lineage.ts so the code that references
extractTable/extractColumn and the sets (inputTables, outputs, colsRead,
colsWritten) only runs over a real array and won't error on non-array dispatcher
responses.

In `@packages/opencode/src/altimate/tools/project-scan.ts`:
- Around line 943-945: The current assignment for "de.env.dbt_manifest_present"
uses the RPC result (dbtManifest) which can be false on transient RPC failures;
instead check the actual presence of the manifest file on disk: use dbtProject
(e.g., dbtProject.found or dbtProject.path) to construct the manifest path
(manifest.json) and perform a file-existence check (fs.existsSync or
fs.promises.stat) and set "de.env.dbt_manifest_present" based on that result,
keeping the other fields (e.g., "de.env.dbt_present" and toolsFound) unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 69d72f33-b54a-405b-a0d9-8e5feee95883

📥 Commits

Reviewing files that changed from the base of the PR and between 146acea and 6bac41e.

📒 Files selected for processing (8)

packages/opencode/src/altimate/observability/annotator.ts
packages/opencode/src/altimate/observability/de-attributes.ts
packages/opencode/src/altimate/observability/tracing.ts
packages/opencode/src/altimate/observability/viewer.ts
packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts
packages/opencode/src/altimate/tools/project-scan.ts
packages/opencode/src/altimate/tools/schema-inspect.ts
packages/opencode/src/altimate/tools/sql-execute.ts

coderabbitai · 2026-06-13T00:17:12Z

+        for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {
+          const srcTable = extractTable(edge, "source")
+          const tgtTable = extractTable(edge, "target")
+          const srcCol = extractColumn(edge, "source")
+          const tgtCol = extractColumn(edge, "target")
+          if (srcTable) inputTables.add(srcTable)
+          if (tgtTable) outputs.add(tgtTable)
+          if (srcCol) colsRead.add(srcCol)
+          if (tgtCol) colsWritten.add(tgtCol)
+        }


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Guard column_lineage to an array before iterating.

Line 77 assumes data.column_lineage is iterable. If the dispatcher returns a non-array object, this throws and converts the call into an error path unnecessarily.

Suggested fix

- for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) { + const lineageEdges = Array.isArray(data.column_lineage) ? data.column_lineage : [] + for (const edge of lineageEdges as Record<string, any>[]) { const srcTable = extractTable(edge, "source") const tgtTable = extractTable(edge, "target") const srcCol = extractColumn(edge, "source")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {

const srcTable = extractTable(edge, "source")

const tgtTable = extractTable(edge, "target")

const srcCol = extractColumn(edge, "source")

const tgtCol = extractColumn(edge, "target")

if (srcTable) inputTables.add(srcTable)

if (tgtTable) outputs.add(tgtTable)

if (srcCol) colsRead.add(srcCol)

if (tgtCol) colsWritten.add(tgtCol)

}

const lineageEdges = Array.isArray(data.column_lineage) ? data.column_lineage : []

for (const edge of lineageEdges as Record<string, any>[]) {

const srcTable = extractTable(edge, "source")

const tgtTable = extractTable(edge, "target")

const srcCol = extractColumn(edge, "source")

const tgtCol = extractColumn(edge, "target")

if (srcTable) inputTables.add(srcTable)

if (tgtTable) outputs.add(tgtTable)

if (srcCol) colsRead.add(srcCol)

if (tgtCol) colsWritten.add(tgtCol)

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts` around lines 77 - 86, The loop currently assumes data.column_lineage is iterable and can throw if it's not; update the iteration to first ensure column_lineage is an array (e.g. replace (data.column_lineage ?? []) with Array.isArray(data.column_lineage) ? data.column_lineage : []) before the for...of loop in altimate-core-column-lineage.ts so the code that references extractTable/extractColumn and the sets (inputTables, outputs, colsRead, colsWritten) only runs over a real array and won't error on non-array dispatcher responses.

coderabbitai · 2026-06-13T00:17:13Z

+      "de.env.dbt_present": dbtProject.found,
+      "de.env.dbt_manifest_present": dbtManifest !== undefined && dbtManifest !== null,
+      ...(toolsFound.length > 0 && { "de.env.tools_detected": toolsFound }),


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Use file existence for de.env.dbt_manifest_present, not RPC success.

Line 944 currently reflects whether dbt.manifest returned data, not whether manifest.json exists. If the RPC fails transiently, this emits false even when the file is present, which breaks the de.env.dbt_manifest_present semantic contract.

Suggested fix

const deAttrs: Record<string, unknown> = { "de.env.dbt_present": dbtProject.found, - "de.env.dbt_manifest_present": dbtManifest !== undefined && dbtManifest !== null, + "de.env.dbt_manifest_present": Boolean(dbtProject.manifestPath), ...(toolsFound.length > 0 && { "de.env.tools_detected": toolsFound }), ...(deWhTypes.length === 1 && { "de.env.warehouse_type": deWhTypes[0] }), }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"de.env.dbt_present": dbtProject.found,

"de.env.dbt_manifest_present": dbtManifest !== undefined && dbtManifest !== null,

...(toolsFound.length > 0 && { "de.env.tools_detected": toolsFound }),

"de.env.dbt_present": dbtProject.found,

"de.env.dbt_manifest_present": Boolean(dbtProject.manifestPath),

...(toolsFound.length > 0 && { "de.env.tools_detected": toolsFound }),

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opencode/src/altimate/tools/project-scan.ts` around lines 943 - 945, The current assignment for "de.env.dbt_manifest_present" uses the RPC result (dbtManifest) which can be false on transient RPC failures; instead check the actual presence of the manifest file on disk: use dbtProject (e.g., dbtProject.found or dbtProject.path) to construct the manifest path (manifest.json) and perform a file-existence check (fs.existsSync or fs.promises.stat) and set "de.env.dbt_manifest_present" based on that result, keeping the other fields (e.g., "de.env.dbt_present" and toolsFound) unchanged.

cubic-dev-ai

4 issues found across 8 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/observability/annotator.ts">

<violation number="1" location="packages/opencode/src/altimate/observability/annotator.ts:180">
P2: `altimate-dbt` commands are misclassified as `dbt` due to regex order, producing incorrect `de.tool.bash_*` metadata.</violation>
</file>

<file name="packages/opencode/src/altimate/observability/tracing.ts">

<violation number="1" location="packages/opencode/src/altimate/observability/tracing.ts:912">
P2: Attribute size limits are computed with string length, not UTF-8 byte size. Non-ASCII metadata can bypass the intended payload caps and oversize trace exports.</violation>
</file>

<file name="packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts">

<violation number="1" location="packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts:77">
P2: Guard `data.column_lineage` with `Array.isArray()` before iterating. The `?? []` fallback only handles `null`/`undefined`; if the dispatcher returns a non-array object, `for...of` will throw a `TypeError`.</violation>
</file>

<file name="packages/opencode/src/altimate/tools/project-scan.ts">

<violation number="1" location="packages/opencode/src/altimate/tools/project-scan.ts:944">
P2: `de.env.dbt_manifest_present` reflects RPC success rather than file existence. If the `dbt.manifest` call fails transiently (e.g., timeout), `dbtManifest` will be `undefined` and this emits `false` even when `manifest.json` exists on disk, violating the attribute's semantic contract. Consider checking file existence directly (e.g., via `dbtProject` context or an `fs.existsSync` on the expected path) instead of relying on the RPC result.</violation>
</file>

_{Reply with feedback, questions, or to request a fix.

Re-trigger cubic}

cubic-dev-ai · 2026-06-13T00:19:01Z

+
+  // dbt CLI: detect verb after `dbt`. Broad list of subcommands.
+  const dbtVerbs = "build|run|test|seed|snapshot|compile|deps|run-operation|debug|parse|docs|clean|list|ls|source|init|show|retry|freshness"
+  const dbtMatch = stripped.match(new RegExp(`\\bdbt\\s+(${dbtVerbs})\\b`, "i"))


P2: altimate-dbt commands are misclassified as dbt due to regex order, producing incorrect de.tool.bash_* metadata.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/observability/annotator.ts, line 180: <comment>`altimate-dbt` commands are misclassified as `dbt` due to regex order, producing incorrect `de.tool.bash_*` metadata.</comment> <file context> @@ -0,0 +1,531 @@ + + // dbt CLI: detect verb after `dbt`. Broad list of subcommands. + const dbtVerbs = "build|run|test|seed|snapshot|compile|deps|run-operation|debug|parse|docs|clean|list|ls|source|init|show|retry|freshness" + const dbtMatch = stripped.match(new RegExp(`\\bdbt\\s+(${dbtVerbs})\\b`, "i")) + if (dbtMatch) { + return { intent: "dbt", invoked: "dbt", dbtCommand: dbtMatch[1].toLowerCase() } </file context>

cubic-dev-ai · 2026-06-13T00:19:01Z

+          try {
+            const serialized = JSON.stringify(v)
+            if (serialized === undefined) continue
+            if (serialized.length > ATTR_VALUE_MAX_BYTES) continue


P2: Attribute size limits are computed with string length, not UTF-8 byte size. Non-ASCII metadata can bypass the intended payload caps and oversize trace exports.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/observability/tracing.ts, line 912: <comment>Attribute size limits are computed with string length, not UTF-8 byte size. Non-ASCII metadata can bypass the intended payload caps and oversize trace exports.</comment> <file context> @@ -874,6 +879,71 @@ export class Trace { + try { + const serialized = JSON.stringify(v) + if (serialized === undefined) continue + if (serialized.length > ATTR_VALUE_MAX_BYTES) continue + if (totalBytes + serialized.length > ATTR_TOTAL_MAX_BYTES) continue + // Store original value (matches setSpanAttributes() at line ~1135 for </file context>

cubic-dev-ai · 2026-06-13T00:19:01Z

+          return undefined
+        }
+
+        for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {


P2: Guard data.column_lineage with Array.isArray() before iterating. The ?? [] fallback only handles null/undefined; if the dispatcher returns a non-array object, for...of will throw a TypeError.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts, line 77: <comment>Guard `data.column_lineage` with `Array.isArray()` before iterating. The `?? []` fallback only handles `null`/`undefined`; if the dispatcher returns a non-array object, `for...of` will throw a `TypeError`.</comment> <file context> @@ -30,9 +30,78 @@ export const AltimateCoreColumnLineageTool = Tool.define("altimate_core_column_l + return undefined + } + + for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) { + const srcTable = extractTable(edge, "source") + const tgtTable = extractTable(edge, "target") </file context>

Suggested change

for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {

for (const edge of (Array.isArray(data.column_lineage) ? data.column_lineage : []) as Record<string, any>[]) {

cubic-dev-ai · 2026-06-13T00:19:01Z

+    ]
+    const deAttrs: Record<string, unknown> = {
+      "de.env.dbt_present": dbtProject.found,
+      "de.env.dbt_manifest_present": dbtManifest !== undefined && dbtManifest !== null,


P2: de.env.dbt_manifest_present reflects RPC success rather than file existence. If the dbt.manifest call fails transiently (e.g., timeout), dbtManifest will be undefined and this emits false even when manifest.json exists on disk, violating the attribute's semantic contract. Consider checking file existence directly (e.g., via dbtProject context or an fs.existsSync on the expected path) instead of relying on the RPC result.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/tools/project-scan.ts, line 944: <comment>`de.env.dbt_manifest_present` reflects RPC success rather than file existence. If the `dbt.manifest` call fails transiently (e.g., timeout), `dbtManifest` will be `undefined` and this emits `false` even when `manifest.json` exists on disk, violating the attribute's semantic contract. Consider checking file existence directly (e.g., via `dbtProject` context or an `fs.existsSync` on the expected path) instead of relying on the RPC result.</comment> <file context> @@ -929,6 +929,24 @@ export const ProjectScanTool = Tool.define("project_scan", { + ] + const deAttrs: Record<string, unknown> = { + "de.env.dbt_present": dbtProject.found, + "de.env.dbt_manifest_present": dbtManifest !== undefined && dbtManifest !== null, + ...(toolsFound.length > 0 && { "de.env.tools_detected": toolsFound }), + ...(deWhTypes.length === 1 && { "de.env.warehouse_type": deWhTypes[0] }), </file context>

sahrizvi · 2026-06-13T00:23:48Z

What's still on the table (and why we stopped at 4 opt-ins)

The 4 tool opt-ins in this PR were the highest-yield first batch, not exhaustive. The classifier gives every other tool a useful baseline (de.tool.{category, subcategory, vendor} plus tool-specific extractions like de.dbt.layer, de.dbt.command, de.sql.query_text), which is why 99.8% of historical tool spans got annotations from Layer 2 alone.

But many tools have structured truth that lives inside them and gets thrown away before tracing sees it — those are the candidates for Layer 1 opt-ins in follow-up PRs.

Concrete opt-in candidates (incremental, each ~10–30 lines, no infra changes)

Tool	Could emit	Notes
`sql_analyze`, `altimate_core_validate`, `altimate_core_check`	`de.sql.validation.{valid,error,type_errors}`	Already in `DE_SQL` vocab
`sql_optimize`, `sql_fix`, `sql_rewrite`	`de.sql.schema_changes_*`, optimization metrics	Some already in `DE_SQL`
`schema_diff`	`de.sql.schema_changes_{detected,details}`	Already in vocab
`data_diff`	`de.quality.row_count_delta`	Already in `DE_QUALITY`
`finops_query_history`, `finops_expensive_queries`	`de.warehouse.{bytes_scanned, credits_consumed, query_id}`, `de.cost.*`	Vocab already covers these
`finops_analyze_credits`	`de.cost.warehouse_compute_usd`, `de.warehouse.credits_consumed`	Already in vocab
`altimate_core_grade`	SQL quality grade	Needs a new key (`de.sql.quality_grade`)
`dbt_manifest`, `dbt_lineage`, `altimate_core_parse_dbt`	`de.dbt.dag.nodes_`, per-model `de.dbt.model.`	Already in `DE_DBT`
`altimate_core_classify_pii`, `schema_detect_pii`	New `de.privacy.*` namespace	Needs new sub-namespace
`lineage_check`, `impact_analysis`	`de.sql.lineage.*` from real graph	Already in vocab

That's roughly 20–30 tools with meaningful structured data the classifier can't reach via regex over output text.

Why we stopped at 4 in this PR

To keep surface area small enough to land cleanly. The point of this PR was to prove the channel works end-to-end — which it does:

99.8% of historical tool spans (3010 / 3016 across 104 traces) gained de.* attributes from Layer 2 alone
102 sessions got root rollup
Workflow / outcome distributions match the source summary.status mix

Each follow-up tool opt-in is independent and incremental — the architecture supports adding them one at a time without touching the tool framework, the tracer hook, or the classifier. Tools still don't import the observability layer; they just set additional de.* keys on their existing returned metadata object.

Suggested follow-up order (highest value first)

FinOps tools — finops_query_history, finops_expensive_queries, finops_analyze_credits — unlock warehouse cost attribution per session
dbt tools — dbt_manifest, dbt_lineage, altimate_core_parse_dbt — unlock per-model execution metrics
SQL validation tools — sql_analyze, altimate_core_validate, altimate_core_check — unlock structured validation results
PII tools — altimate_core_classify_pii, schema_detect_pii — needs a new de.privacy.* sub-namespace, then a small PR per tool

So: deliberate stopping point, not an oversight. The opportunities are real, known, and incrementally addressable.

dev-punia-altimate

🤖 Code Review — OpenCodeReview (Gemini) — 12 finding(s)

11 anchored to a line (posted inline when the comment stream is on)
1 without a line anchor

All findings (full text)

1. `packages/opencode/src/altimate/observability/viewer.ts` (L519)

[🔵 LOW] The use of var for variable declarations is strictly prohibited. Please use const or let instead. In this case, since groups is not reassigned, const is the appropriate choice.

Suggested change:

  const groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange'],['de.workflow.','Workflow','accent'],['de.outcome.','Outcome','green'],['de.artifacts.','Artifacts','secondary'],['de.env.','Environment','cyan'],['de.tool.','Tool','accent']];

2. `packages/opencode/src/altimate/tools/project-scan.ts` (L938)

[🔵 LOW] According to the code quality guidelines, the use of any type should be avoided. If its use is absolutely necessary, please add a comment explaining why. Otherwise, consider using a more specific type or an interface to improve type safety.

Suggested change:

          .map((w: { type?: string }) => (typeof w?.type === "string" ? w.type.toLowerCase() : ""))

3. `packages/opencode/src/altimate/tools/sql-execute.ts` (L95-L98)

[🔵 LOW] The expression args.warehouse ?? registered[0]?.name is evaluated on every iteration of the .find() method. Although the array is small, it's better practice to evaluate this target name once before the loop to make the code cleaner and avoid repeated evaluation.

Suggested change:

        try {
          const registered = Registry.list().warehouses
          const targetName = args.warehouse ?? registered[0]?.name
          return registered.find((w) => w.name === targetName)
        } catch {

4. `packages/opencode/src/altimate/tools/sql-execute.ts` (L108-L112)

[🔵 LOW] Since both metadata properties depend on the exact same condition (warehouseEntry?.type), they can be combined into a single object spread. This reduces redundancy and makes the code slightly more concise.

Suggested change:

          // altimate_change start — de.* attributes lifted onto span by tracer
          "de.warehouse.rows_returned": result.row_count,
          ...(warehouseEntry?.type && { 
            "de.warehouse.system": warehouseEntry.type,
            "de.sql.dialect": warehouseEntry.type 
          }),
          // altimate_change end

5. `packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts` (L49)

[🔵 LOW] Avoid using the any type to comply with TypeScript best practices. Since edge properties are accessed dynamically via string index, unknown is safe and appropriate here.

Suggested change:

        const extractTable = (edge: Record<string, unknown>, side: "source" | "target"): string | undefined => {

6. `packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts` (L64-L68)

[🟠 MEDIUM] There are two issues here:

When falling back to extracting the column from the endpoint string, it currently returns the entire string (e.g., schema.table.column). To make this consistent with the direct extraction (which returns just the column name) and extractTable behavior, it should extract only the column part by stripping the table prefix.
Replace Record<string, any> with Record<string, unknown> to comply with the TypeScript rule against using any.

Suggested change:

        const extractColumn = (edge: Record<string, unknown>, side: "source" | "target"): string | undefined => {
          const direct = edge[`${side}_column`] ?? edge[`${side}Column`]
          if (typeof direct === "string" && direct) return direct
          const endpoint = edge[side]
          if (typeof endpoint === "string") {
            return endpoint.includes(".") ? endpoint.split(".").pop() : endpoint
          }

7. `packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts` (L77)

[🔵 LOW] Replace any with unknown to adhere to the rule against using any types.

Suggested change:

        for (const edge of (data.column_lineage ?? []) as Record<string, unknown>[]) {

8. `packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts` (L87-L93)

[🟠 MEDIUM] Hardcoding telemetry keys violates the 'No Hardcoding' standard and can lead to inconsistencies if the attribute names change in the future. These keys are already defined as constants. Please import DE_SQL from ../observability/de-attributes and use the exported constants instead.

Suggested change:

        if (inputTables.size > 0) lineageAttrs[DE_SQL.LINEAGE_INPUT_TABLES] = [...inputTables].slice(0, 50)
        // Keep output_table scalar — Codex chunk-3 review #5: don't switch attribute
        // type to array when there are multiple outputs. Omit the attribute instead.
        if (outputs.size === 1) lineageAttrs[DE_SQL.LINEAGE_OUTPUT_TABLE] = [...outputs][0]
        if (colsRead.size > 0) lineageAttrs[DE_SQL.LINEAGE_COLUMNS_READ] = [...colsRead].slice(0, 100)
        if (colsWritten.size > 0) lineageAttrs[DE_SQL.LINEAGE_COLUMNS_WRITTEN] = [...colsWritten].slice(0, 100)
        if (args.dialect) lineageAttrs[DE_SQL.DIALECT] = args.dialect

9. `packages/opencode/src/altimate/observability/tracing.ts`

[🟠 MEDIUM] There are two improvements that can be made here:

Inaccurate byte size computation: serialized.length measures string length (UTF-16 code units), not actual UTF-8 bytes. If multi-byte characters are present, the true payload byte size could exceed the intended limits. Using Buffer.byteLength(serialized) evaluates the actual byte length accurately.
Code duplication: The identical serialization and size-limiting logic is repeated for both Layer 1 and Layer 2. Extracting this into a reusable addAttribute helper improves maintainability and keeps the code DRY.

Suggested change:

      const addAttribute = (k: string, v: unknown) => {
        if (v === undefined || k in spanAttributes) return
        try {
          const serialized = JSON.stringify(v)
          if (serialized === undefined) return
          // Use Buffer.byteLength to accurately count UTF-8 bytes
          const byteLen = Buffer.byteLength(serialized)
          if (byteLen > ATTR_VALUE_MAX_BYTES) return
          if (totalBytes + byteLen > ATTR_TOTAL_MAX_BYTES) return
          
          // Store original value (matches setSpanAttributes() for
          // consistent overwrite semantics if both paths target the same key).
          spanAttributes[k] = v
          totalBytes += byteLen
        } catch {
          // Bad metadata value must never break the tracer
        }
      }

      // Layer 1: tool-provided structured metadata (high fidelity — driver
      // values, parser output). Filtered to the de.* prefix.
      const rawMetadata = state.metadata
      if (rawMetadata && typeof rawMetadata === "object") {
        for (const [k, v] of Object.entries(rawMetadata)) {
          if (typeof k === "string" && k.startsWith("de.")) {
            addAttribute(k, v)
          }
        }
      }

      // Layer 2: derived classification from (name, input, output). Best-effort
      // procedural — taxonomy lookup, bash intent, dbt layer from path, etc.
      // Tool-provided metadata (Layer 1) wins on conflicts.
      try {
        const derived = annotateToolSpan(toolName, safeInput, isError ? errorStr : outputStr)
        for (const [k, v] of Object.entries(derived)) {
          addAttribute(k, v)
        }
      } catch {
        // Annotator must never break the tracer
      }

10. `packages/opencode/src/altimate/observability/annotator.ts` (L179-L180)

[🟠 MEDIUM] This dynamic regular expression is instantiated on every call to classifyBash, which incurs unnecessary allocation and compilation overhead. Since dbtVerbs is a constant string, consider extracting this RegExp declaration outside the function body so it is only compiled once, improving performance.

11. `packages/opencode/src/altimate/observability/annotator.ts` (L310-L312)

[🟠 MEDIUM] For bash tools, extractInputTables is called on the sliced 8000-character string sql. This contradicts the strategy used for sql_execute tools (where the full query q is parsed to avoid losing table references at the tail). To maintain consistency and ensure all lineage is captured, you should pass the full extracted string (sqlMatch[1]) to extractInputTables.

Suggested change:

        const sql = sqlMatch[1].slice(0, 8000)\n        out[DE.SQL.QUERY_TEXT] = sql\n        const tables = extractInputTables(sqlMatch[1])

12. `packages/opencode/src/altimate/observability/annotator.ts` (L442-L443)

[🔵 LOW] According to the coding standards, the use of != is prohibited. Since dbtPresent and manifestPresent are defined as boolean | undefined, you should use strict inequality !== undefined instead of != null.

Suggested change:

      if (env.dbtPresent !== undefined) out[DE.ENV.DBT_PRESENT] = env.dbtPresent\n      if (env.manifestPresent !== undefined) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent

dev-punia-altimate · 2026-06-13T18:33:34Z

  // DE attributes grouped
  var a = span.attributes || {};
-  var groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange']];
+  var groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange'],['de.workflow.','Workflow','accent'],['de.outcome.','Outcome','green'],['de.artifacts.','Artifacts','secondary'],['de.env.','Environment','cyan'],['de.tool.','Tool','accent']];


[🔵 LOW] The use of var for variable declarations is strictly prohibited. Please use const or let instead. In this case, since groups is not reassigned, const is the appropriate choice.

Suggested change:

Suggested change

var groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange'],['de.workflow.','Workflow','accent'],['de.outcome.','Outcome','green'],['de.artifacts.','Artifacts','secondary'],['de.env.','Environment','cyan'],['de.tool.','Tool','accent']];

const groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange'],['de.workflow.','Workflow','accent'],['de.outcome.','Outcome','green'],['de.artifacts.','Artifacts','secondary'],['de.env.','Environment','cyan'],['de.tool.','Tool','accent']];

dev-punia-altimate · 2026-06-13T18:33:35Z

+    const deWhTypes = [
+      ...new Set(
+        (schemaCache?.warehouses ?? [])
+          .map((w: any) => (typeof w?.type === "string" ? w.type.toLowerCase() : ""))


[🔵 LOW] According to the code quality guidelines, the use of any type should be avoided. If its use is absolutely necessary, please add a comment explaining why. Otherwise, consider using a more specific type or an interface to improve type safety.

Suggested change:

Suggested change

.map((w: any) => (typeof w?.type === "string" ? w.type.toLowerCase() : ""))

.map((w: { type?: string }) => (typeof w?.type === "string" ? w.type.toLowerCase() : ""))

dev-punia-altimate · 2026-06-13T18:33:35Z

+        try {
+          const registered = Registry.list().warehouses
+          return registered.find((w) => w.name === (args.warehouse ?? registered[0]?.name))
+        } catch {


[🔵 LOW] The expression args.warehouse ?? registered[0]?.name is evaluated on every iteration of the .find() method. Although the array is small, it's better practice to evaluate this target name once before the loop to make the code cleaner and avoid repeated evaluation.

Suggested change:

Suggested change

try {

const registered = Registry.list().warehouses

return registered.find((w) => w.name === (args.warehouse ?? registered[0]?.name))

} catch {

try {

const registered = Registry.list().warehouses

const targetName = args.warehouse ?? registered[0]?.name

return registered.find((w) => w.name === targetName)

} catch {

dev-punia-altimate · 2026-06-13T18:33:35Z

+          // altimate_change start — de.* attributes lifted onto span by tracer
+          "de.warehouse.rows_returned": result.row_count,
+          ...(warehouseEntry?.type && { "de.warehouse.system": warehouseEntry.type }),
+          ...(warehouseEntry?.type && { "de.sql.dialect": warehouseEntry.type }),
+          // altimate_change end


[🔵 LOW] Since both metadata properties depend on the exact same condition (warehouseEntry?.type), they can be combined into a single object spread. This reduces redundancy and makes the code slightly more concise.

Suggested change:

Suggested change

// altimate_change start — de.* attributes lifted onto span by tracer

"de.warehouse.rows_returned": result.row_count,

...(warehouseEntry?.type && { "de.warehouse.system": warehouseEntry.type }),

...(warehouseEntry?.type && { "de.sql.dialect": warehouseEntry.type }),

// altimate_change end

// altimate_change start — de.* attributes lifted onto span by tracer

"de.warehouse.rows_returned": result.row_count,

...(warehouseEntry?.type && {

"de.warehouse.system": warehouseEntry.type,

"de.sql.dialect": warehouseEntry.type

}),

// altimate_change end

dev-punia-altimate · 2026-06-13T18:33:35Z

+        const colsRead = new Set<string>()
+        const colsWritten = new Set<string>()
+
+        const extractTable = (edge: Record<string, any>, side: "source" | "target"): string | undefined => {


[🔵 LOW] Avoid using the any type to comply with TypeScript best practices. Since edge properties are accessed dynamically via string index, unknown is safe and appropriate here.

Suggested change:

Suggested change

const extractTable = (edge: Record<string, any>, side: "source" | "target"): string | undefined => {

const extractTable = (edge: Record<string, unknown>, side: "source" | "target"): string | undefined => {

dev-punia-altimate · 2026-06-13T18:33:35Z

+          return undefined
+        }
+
+        for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {


[🔵 LOW] Replace any with unknown to adhere to the rule against using any types.

Suggested change:

Suggested change

for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {

for (const edge of (data.column_lineage ?? []) as Record<string, unknown>[]) {

dev-punia-altimate · 2026-06-13T18:33:35Z

+        if (inputTables.size > 0) lineageAttrs["de.sql.lineage.input_tables"] = [...inputTables].slice(0, 50)
+        // Keep output_table scalar — Codex chunk-3 review #5: don't switch attribute
+        // type to array when there are multiple outputs. Omit the attribute instead.
+        if (outputs.size === 1) lineageAttrs["de.sql.lineage.output_table"] = [...outputs][0]
+        if (colsRead.size > 0) lineageAttrs["de.sql.lineage.columns_read"] = [...colsRead].slice(0, 100)
+        if (colsWritten.size > 0) lineageAttrs["de.sql.lineage.columns_written"] = [...colsWritten].slice(0, 100)
+        if (args.dialect) lineageAttrs["de.sql.dialect"] = args.dialect


[🟠 MEDIUM] Hardcoding telemetry keys violates the 'No Hardcoding' standard and can lead to inconsistencies if the attribute names change in the future. These keys are already defined as constants. Please import DE_SQL from ../observability/de-attributes and use the exported constants instead.

Suggested change:

Suggested change

if (inputTables.size > 0) lineageAttrs["de.sql.lineage.input_tables"] = [...inputTables].slice(0, 50)

// Keep output_table scalar — Codex chunk-3 review #5: don't switch attribute

// type to array when there are multiple outputs. Omit the attribute instead.

if (outputs.size === 1) lineageAttrs["de.sql.lineage.output_table"] = [...outputs][0]

if (colsRead.size > 0) lineageAttrs["de.sql.lineage.columns_read"] = [...colsRead].slice(0, 100)

if (colsWritten.size > 0) lineageAttrs["de.sql.lineage.columns_written"] = [...colsWritten].slice(0, 100)

if (args.dialect) lineageAttrs["de.sql.dialect"] = args.dialect

if (inputTables.size > 0) lineageAttrs[DE_SQL.LINEAGE_INPUT_TABLES] = [...inputTables].slice(0, 50)

// Keep output_table scalar — Codex chunk-3 review #5: don't switch attribute

// type to array when there are multiple outputs. Omit the attribute instead.

if (outputs.size === 1) lineageAttrs[DE_SQL.LINEAGE_OUTPUT_TABLE] = [...outputs][0]

if (colsRead.size > 0) lineageAttrs[DE_SQL.LINEAGE_COLUMNS_READ] = [...colsRead].slice(0, 100)

if (colsWritten.size > 0) lineageAttrs[DE_SQL.LINEAGE_COLUMNS_WRITTEN] = [...colsWritten].slice(0, 100)

if (args.dialect) lineageAttrs[DE_SQL.DIALECT] = args.dialect

dev-punia-altimate · 2026-06-13T18:33:35Z

+  const dbtVerbs = "build|run|test|seed|snapshot|compile|deps|run-operation|debug|parse|docs|clean|list|ls|source|init|show|retry|freshness"
+  const dbtMatch = stripped.match(new RegExp(`\\bdbt\\s+(${dbtVerbs})\\b`, "i"))


[🟠 MEDIUM] This dynamic regular expression is instantiated on every call to classifyBash, which incurs unnecessary allocation and compilation overhead. Since dbtVerbs is a constant string, consider extracting this RegExp declaration outside the function body so it is only compiled once, improving performance.

dev-punia-altimate · 2026-06-13T18:33:35Z

+        const sql = sqlMatch[1].slice(0, 8000)
+        out[DE.SQL.QUERY_TEXT] = sql
+        const tables = extractInputTables(sql)


[🟠 MEDIUM] For bash tools, extractInputTables is called on the sliced 8000-character string sql. This contradicts the strategy used for sql_execute tools (where the full query q is parsed to avoid losing table references at the tail). To maintain consistency and ensure all lineage is captured, you should pass the full extracted string (sqlMatch[1]) to extractInputTables.

Suggested change:

Suggested change

const sql = sqlMatch[1].slice(0, 8000)

out[DE.SQL.QUERY_TEXT] = sql

const tables = extractInputTables(sql)

const sql = sqlMatch[1].slice(0, 8000)\n out[DE.SQL.QUERY_TEXT] = sql\n const tables = extractInputTables(sqlMatch[1])

dev-punia-altimate · 2026-06-13T18:33:35Z

+      if (env.dbtPresent != null) out[DE.ENV.DBT_PRESENT] = env.dbtPresent
+      if (env.manifestPresent != null) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent


[🔵 LOW] According to the coding standards, the use of != is prohibited. Since dbtPresent and manifestPresent are defined as boolean | undefined, you should use strict inequality !== undefined instead of != null.

Suggested change:

Suggested change

if (env.dbtPresent != null) out[DE.ENV.DBT_PRESENT] = env.dbtPresent

if (env.manifestPresent != null) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent

if (env.dbtPresent !== undefined) out[DE.ENV.DBT_PRESENT] = env.dbtPresent\n if (env.manifestPresent !== undefined) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent

dev-punia-altimate · 2026-06-14T07:38:04Z

🤖 Code Review — OpenCodeReview (Gemini) — No Issues Found

No supported files changed.

github-actions Bot added contributor needs-review:blocked labels Jun 13, 2026

coderabbitai Bot reviewed Jun 13, 2026

View reviewed changes

cubic-dev-ai Bot reviewed Jun 13, 2026

View reviewed changes

dev-punia-altimate reviewed Jun 13, 2026

View reviewed changes

	for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {
	for (const edge of (Array.isArray(data.column_lineage) ? data.column_lineage : []) as Record<string, any>[]) {

	var groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange'],['de.workflow.','Workflow','accent'],['de.outcome.','Outcome','green'],['de.artifacts.','Artifacts','secondary'],['de.env.','Environment','cyan'],['de.tool.','Tool','accent']];
	const groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange'],['de.workflow.','Workflow','accent'],['de.outcome.','Outcome','green'],['de.artifacts.','Artifacts','secondary'],['de.env.','Environment','cyan'],['de.tool.','Tool','accent']];

	.map((w: any) => (typeof w?.type === "string" ? w.type.toLowerCase() : ""))
	.map((w: { type?: string }) => (typeof w?.type === "string" ? w.type.toLowerCase() : ""))

	const extractTable = (edge: Record<string, any>, side: "source" \| "target"): string \| undefined => {
	const extractTable = (edge: Record<string, unknown>, side: "source" \| "target"): string \| undefined => {

		const dbtVerbs = "build\|run\|test\|seed\|snapshot\|compile\|deps\|run-operation\|debug\|parse\|docs\|clean\|list\|ls\|source\|init\|show\|retry\|freshness"
		const dbtMatch = stripped.match(new RegExp(`\\bdbt\\s+(${dbtVerbs})\\b`, "i"))

		if (env.dbtPresent != null) out[DE.ENV.DBT_PRESENT] = env.dbtPresent
		if (env.manifestPresent != null) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent

	if (env.dbtPresent != null) out[DE.ENV.DBT_PRESENT] = env.dbtPresent
	if (env.manifestPresent != null) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent
	if (env.dbtPresent !== undefined) out[DE.ENV.DBT_PRESENT] = env.dbtPresent\n if (env.manifestPresent !== undefined) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent

Conversation

sahrizvi commented Jun 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What got added (didn't exist before)

1. Tracer reads tool metadata (tracing.ts:logToolCall)

2. Procedural classifier (observability/annotator.ts, new file)

3. Five new sub-namespaces in de-attributes.ts

4. Viewer grouping extension (viewer.ts:514)

5. Four tool opt-ins (Layer 1, high-fidelity)

What this PR explicitly does NOT touch

Codex review

Test plan

Trace size impact

Summary by cubic

Summary by CodeRabbit

Uh oh!

github-actions Bot commented Jun 13, 2026

Uh oh!

coderabbitai Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

❌ Failed checks (2 warnings)

Uh oh!

github-actions Bot commented Jun 13, 2026

Uh oh!

github-actions Bot commented Jun 13, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

sahrizvi commented Jun 13, 2026

What's still on the table (and why we stopped at 4 opt-ins)

Concrete opt-in candidates (incremental, each ~10–30 lines, no infra changes)

Why we stopped at 4 in this PR

Suggested follow-up order (highest value first)

Uh oh!

dev-punia-altimate left a comment

Choose a reason for hiding this comment

🤖 Code Review — OpenCodeReview (Gemini) — 12 finding(s)

1. packages/opencode/src/altimate/observability/viewer.ts (L519)

2. packages/opencode/src/altimate/tools/project-scan.ts (L938)

3. packages/opencode/src/altimate/tools/sql-execute.ts (L95-L98)

4. packages/opencode/src/altimate/tools/sql-execute.ts (L108-L112)

5. packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts (L49)

6. packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts (L64-L68)

7. packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts (L77)

8. packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts (L87-L93)

9. packages/opencode/src/altimate/observability/tracing.ts

10. packages/opencode/src/altimate/observability/annotator.ts (L179-L180)

11. packages/opencode/src/altimate/observability/annotator.ts (L310-L312)

12. packages/opencode/src/altimate/observability/annotator.ts (L442-L443)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sahrizvi commented Jun 13, 2026 •

edited by coderabbitai Bot

Loading

1. Tracer reads tool metadata (`tracing.ts:logToolCall`)

2. Procedural classifier (`observability/annotator.ts`, new file)

3. Five new sub-namespaces in `de-attributes.ts`

4. Viewer grouping extension (`viewer.ts:514`)

coderabbitai Bot commented Jun 13, 2026 •

edited

Loading

1. `packages/opencode/src/altimate/observability/viewer.ts` (L519)

2. `packages/opencode/src/altimate/tools/project-scan.ts` (L938)

3. `packages/opencode/src/altimate/tools/sql-execute.ts` (L95-L98)

4. `packages/opencode/src/altimate/tools/sql-execute.ts` (L108-L112)

5. `packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts` (L49)

6. `packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts` (L64-L68)

7. `packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts` (L77)

8. `packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts` (L87-L93)

9. `packages/opencode/src/altimate/observability/tracing.ts`

10. `packages/opencode/src/altimate/observability/annotator.ts` (L179-L180)

11. `packages/opencode/src/altimate/observability/annotator.ts` (L310-L312)

12. `packages/opencode/src/altimate/observability/annotator.ts` (L442-L443)