Skip to content

feat: annotate traces with de.* attributes via metadata channel + procedural classifier#938

Open
sahrizvi wants to merge 1 commit into
mainfrom
feat/de-trace-augmentation
Open

feat: annotate traces with de.* attributes via metadata channel + procedural classifier#938
sahrizvi wants to merge 1 commit into
mainfrom
feat/de-trace-augmentation

Conversation

@sahrizvi

@sahrizvi sahrizvi commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Summary

Wire the existing trace-attribute scaffolding (de-attributes.ts, setSpanAttributes, viewer grouped rendering at viewer.ts:514) so real values flow onto every tool span and the session root.

Before this change, 0/101 traces on disk had any populated attributes field — the schema slot, vocabulary, setter, and viewer rendering all existed but had zero callers. This PR connects the seam.

Two layers, no LLM:

  • Layer 1 — tool-provided metadata (high fidelity, from real driver/parser values). Tools surface structured fields by setting de.* keys on their returned metadata object. Trace.logToolCall lifts those keys onto span.attributes. Tools never import the observability layer — the seam is the existing ToolStateCompleted.metadata field already plumbed through message-v2. Per-value 10 KB + total 32 KB byte caps prevent runaway payloads.

  • Layer 2 — procedural classifier (observability/annotator.ts). Pure function over (toolName, input, output). Lookup-table tool taxonomy, regex bash-intent + dbt-layer-from-path classification, deterministic outcome/artifacts/env rollups. Best-effort — returns undefined rather than emitting wrong attributes.

Layer 1 wins over Layer 2 on key conflicts in both per-tool and session merges.

What got added (didn't exist before)

1. Tracer reads tool metadata (tracing.ts:logToolCall)

state.metadata is filtered to keys starting with de., JSON-validated per value, and merged into span.attributes. Bounded by per-value (10 KB) and total (32 KB) byte caps so a giant de.sql.query_text or lineage array can't balloon snapshots.

2. Procedural classifier (observability/annotator.ts, new file)

Two pure functions:

  • annotateToolSpan(name, input, output) — derives de.tool.*, de.dbt.layer, de.dbt.command, de.sql.query_text, de.sql.lineage.input_tables.
  • annotateSession(trace) — derives de.workflow.type (with confidence), de.outcome.*, de.artifacts.*, de.env.*.

Called from logToolCall (per-span), endTrace (session rollup), and flushSync (crash-path session rollup). Best-effort, never throws.

3. Five new sub-namespaces in de-attributes.ts

Namespace Keys
DE.WORKFLOW type, intent, type_confidence
DE.OUTCOME class, executed, change_applied
DE.ARTIFACTS files_read, files_edited, models_mentioned, tables_referenced
DE.ENV dbt_present, dbt_manifest_present, warehouse_type, tools_detected
DE.TOOL category, subcategory, vendor, bash_intent, bash_invoked

Plus two new keys in existing namespaces:

  • de.dbt.layer (staging / intermediate / dim / fact / agg / mart / source / seed / macro / test / snapshot)
  • de.warehouse.rows_total (for schema_inspect's table-level row count — distinct from query-result rows_returned)

4. Viewer grouping extension (viewer.ts:514)

Added 5 color-coded sections (Workflow, Outcome, Artifacts, Environment, Tool) so the new attributes render with their own labeled groups in the side panel instead of falling through to the generic key-value bucket.

5. Four tool opt-ins (Layer 1, high-fidelity)

Tool Emits
sql_execute de.warehouse.{system,rows_returned}, de.sql.dialect (resolved from Registry warehouse)
altimate_core_column_lineage Structured de.sql.lineage.{input_tables,output_table,columns_read,columns_written} from altimate-core's parser — preserves case via structured endpoint extraction (no dot-split that would corrupt quoted identifiers)
schema_inspect de.warehouse.rows_total, de.sql.lineage.output_table
project_scan de.env.{dbt_present,dbt_manifest_present,warehouse_type,tools_detected} from authoritative scan results

These four tools never import the observability layer — they set de.* keys on their existing returned metadata object.

What this PR explicitly does NOT touch

  • altimate-core / altimate-core-internal — unmodified.
  • Tool framework (tool/tool.ts) — the metadata channel was already there.
  • Message schema (message-v2.ts) — already carried state.metadata.
  • TraceFile / TraceSpan schema — already had the attributes slot.
  • The other ~70 tools — covered by Layer 2 (procedural classifier).

Codex review

Each chunk was reviewed by Codex before moving on. Fixes folded into this PR:

  • Relative dbt-layer path matching (normalize backslashes, anchor at (?:^|/)models/...)
  • SQL-text capping uses slice instead of dropping the value
  • Workflow classifier counts dbt-bash spans by de.tool.bash_intent attribute, not all bash spans
  • Scalar output_table (omit on multi-output rather than switch type to array)
  • Structured endpoint extraction for column lineage (uses source_table / source_column when present)
  • flushSync runs the session rollup so crash/interrupted traces get root attrs
  • Layered merge direction: explicit setSpanAttributes(..., \"session\") callers still win over derived
  • de.warehouse.rows_total instead of de.quality.row_count for schema-inspect totals (quality keys reserved for profiling/check results)

Test plan

  • bun run typecheck — clean (pre-existing zod/effect/datamate noise unrelated)
  • Local build via bun run build:local succeeds; binary runs against the Altimate gateway
  • E2E live run: prompt that exercises read (dbt layer), bash (dbt build + wc -c), and a final assistant turn. Trace shows:
    • read span: de.tool.category=fs, de.dbt.layer=staging
    • bash span: de.tool.bash_intent=dbt, de.dbt.command=build
    • Session root: de.outcome.class=success, de.outcome.executed=true, de.artifacts.files_read=[...]
  • Retroactive validation over the 101 historical traces via a one-off pure-function pass:
    • 103/104 files annotated, 0 failed
    • 3010/3016 tool spans (99.8 %) got new de.* attributes
    • 102 sessions got root-level rollup
    • Workflow distribution: 51 dbt_develop, 22 dbt_troubleshoot, 8 warehouse_exploration
    • Outcome distribution: 89 success, 11 failure, 2 interrupted (matches the original summary.status mix)
    • Real DE tool spans (sql_execute, dbt_profiles, warehouse_list, warehouse_add, schema_inspect) all received their de.tool.* and de.sql.* attributes via the procedural classifier
  • CI typecheck + bridge-invariant tests
  • Review the changes against bridge-guard if any upstream sync is pending

Trace size impact

Per-tool-span attributes: bounded by the 32 KB total cap implemented in logToolCall.
Session root attributes: small fixed shape (workflow code, outcome enum, file path arrays capped at 100 entries each).
Existing MAX_SERIALIZED_SPANS=5000 cap is unchanged.


Summary by cubic

Adds structured de.* attributes to every tool span and the session root via a tool metadata channel and a deterministic annotator. Improves trace readability with new attribute namespaces and viewer groups for workflow, outcome, artifacts, environment, and tool taxonomy.

  • New Features

    • Lift de.* keys from tool metadata onto span attributes with 10 KB per-value and 32 KB total caps; preserves existing caller-set attributes.
    • Add procedural annotator to infer tool category/bash intent/dbt layer/SQL query+lineage per span and workflow/outcome/artifacts/env per session; runs on normal end and crash; tool-provided values win on conflicts.
    • Extend de-attributes.ts with WORKFLOW, OUTCOME, ARTIFACTS, ENV, TOOL, plus de.dbt.layer and de.warehouse.rows_total; update viewer groups to render them.
    • Tool opt-ins: sql_execute (de.warehouse.{system,rows_returned}, de.sql.dialect), altimate_core_column_lineage (structured lineage), schema_inspect (de.warehouse.rows_total, de.sql.lineage.output_table), project_scan (de.env.*).
  • Bug Fixes

    • More accurate classification: normalized dbt-layer path matching; count only dbt-intent bash spans; run session rollup on flushSync; keep de.sql.lineage.output_table scalar.
    • Safer truncation and precedence: SQL text capped via slice; enforce per-value/total caps; explicit setSpanAttributes(..., "session") and tool metadata override derived values.

Written for commit 6bac41e. Summary will update on new commits.

Review in cubic

Summary by CodeRabbit

  • New Features
    • Traces now automatically enrich spans with detailed tool metadata, including SQL lineage and workflow classification
    • Enhanced trace viewer displays organized data engineering attributes for improved observability
    • Tools provide additional context on warehouse systems, environment capabilities, and execution details

…rocedural classifier

Wires the existing trace-attribute scaffolding (`de-attributes.ts`, `setSpanAttributes`,
viewer grouped rendering at `viewer.ts:514`) so real values flow onto every tool span
and the session root span. Two channels:

- **Tool-provided metadata**: tools surface structured fields by setting `de.*`
  keys on their returned `metadata` object. `Trace.logToolCall` lifts those keys
  onto `span.attributes`. Tools never import the observability layer — the seam
  is the existing `ToolStateCompleted.metadata` field in `message-v2`. Per-value
  10 KB and total 32 KB byte caps prevent runaway payloads.

- **Procedural classifier** (`observability/annotator.ts`): pure function over
  `(toolName, input, output)`. Lookup-table tool taxonomy, regex bash-intent and
  dbt-layer-from-path classification, deterministic outcome/artifacts/env
  rollups. Best-effort — returns undefined rather than emitting wrong attributes.
  Called from `logToolCall` (per-span) and `endTrace` + `flushSync` (session-level).

Layer 1 (tool-emitted) wins over Layer 2 (derived) on key conflicts in both
per-tool and session merges.

Four tool opt-ins:

- `sql_execute`: `de.warehouse.{system,rows_returned}`, `de.sql.dialect` from
  the Registry-resolved warehouse type
- `altimate_core_column_lineage`: structured `de.sql.lineage.{input_tables,
  output_table,columns_read,columns_written}` from the altimate-core parser,
  preserving case via structured endpoint extraction
- `schema_inspect`: `de.warehouse.rows_total`, `de.sql.lineage.output_table`
- `project_scan`: `de.env.{dbt_present,dbt_manifest_present,warehouse_type,
  tools_detected}` from authoritative scan results

Vocabulary extensions:

- `de-attributes.ts`: `DE.WORKFLOW`, `DE.OUTCOME`, `DE.ARTIFACTS`, `DE.ENV`,
  `DE.TOOL` sub-namespaces; `de.dbt.layer`; `de.warehouse.rows_total`
- `viewer.ts`: matching prefixes added to the grouped rendering

Reviewed by Codex per chunk; fixes folded in: relative dbt-layer path matching,
SQL-text capping (slice instead of drop), workflow classifier counts dbt-bash
spans by `de.tool.bash_intent` not by all bash spans, scalar `output_table`,
structured endpoint extraction for column lineage, `flushSync` session rollup,
absent-key merge direction so explicit `setSpanAttributes(..., "session")`
callers still win.

E2E verified locally against the Altimate gateway: tool spans receive their
category, dbt-layer, bash-intent, and dbt-command attributes; session root
receives outcome, artifacts.files_read, outcome.executed.
@github-actions

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@coderabbitai

coderabbitai Bot commented Jun 13, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds a comprehensive trace annotation system that classifies tool spans using a deterministic taxonomy and heuristics, rolls up session-level metadata including workflow inference, and enriches individual tool outputs with lineage and environment attributes. The annotation module is integrated into the core trace lifecycle and complemented by trace viewer display updates.

Changes

Trace Annotation and Enrichment

Layer / File(s) Summary
Trace Attribute Schema
packages/opencode/src/altimate/observability/de-attributes.ts
New DE attribute namespaces introduced: DE_WORKFLOW, DE_OUTCOME, DE_ARTIFACTS, DE_ENV, DE_TOOL with their respective key constants, plus ROWS_TOTAL and LAYER added to existing DE_WAREHOUSE and DE_DBT. All integrated into the DE convenience object.
Core Annotation Functions
packages/opencode/src/altimate/observability/annotator.ts
New module exports annotateToolSpan and annotateSession. Implements TOOL_TAXONOMY for deterministic category/subcategory/vendor mapping, classifyBash with regex-based intent recognition (dbt CLI, SQL, Python, git, etc.), extractInputTables from SQL text, classifyWorkflow for session-type inference, and detectEnvFromProjectScan for environment capability detection. All functions are best-effort, guarded by try/catch.
Tracing Infrastructure Integration
packages/opencode/src/altimate/observability/tracing.ts
logToolCall now accepts optional state.metadata, extracts de.* keys with size caps, and applies annotateToolSpan classification. endTrace runs annotateSession to attach session-level rollup to root span; same applied in crash-handling flushSync path.
Tool Output Enrichment
packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts, project-scan.ts, schema-inspect.ts, sql-execute.ts
Each tool enriches output metadata: column-lineage extracts table/column lineage as de.sql.lineage.*; project-scan adds de.env.* flags for dbt/manifest/warehouse detection; schema-inspect adds de.warehouse.rows_total and de.sql.lineage.output_table; sql-execute adds de.warehouse.system, de.warehouse.rows_returned, and de.sql.dialect.
Trace Viewer Display
packages/opencode/src/altimate/observability/viewer.ts
Detail Panel attribute grouping expanded to include new DE prefixes (workflow, outcome, artifacts, env, tool) in labeled sections alongside existing groups.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The PR introduces a new annotation module (531 lines) with straightforward heuristic logic and deterministic taxonomy lookup, integrated into existing trace infrastructure at well-defined points. Tool enrichment changes are homogeneous patterns across multiple files with consistent de.* metadata augmentation. Schema changes are additive constants with no breakage. Core integration points in tracing are clear, best-effort guarded, and preserve existing behavior.

Possibly related PRs

  • AltimateAI/altimate-code#867: Implements markCrashed per-session crash-guarding on TraceExporter/FileExporter to prevent flushSync and async exports from clobbering the canonical crashed trace.

Suggested labels

needs-review:blocked, observability, tracing


🐰 The traces now tell their tale,
With taxonomy, bash, and SQL to prevail,
Sessions roll up with workflow and care,
Annotations enrich the data we share!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is comprehensive and well-structured, covering summary, implementation details, and test plan. However, it does not include the required 'PINEAPPLE' keyword mandated by the repository template for AI-generated contributions. Add the word 'PINEAPPLE' at the very top of the PR description as required by the template for AI-generated contributions before the Summary section.
Docstring Coverage ⚠️ Warning Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding structured de.* attributes to traces via tool metadata and procedural classification.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/de-trace-augmentation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

1 similar comment
@github-actions

Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts`:
- Around line 77-86: The loop currently assumes data.column_lineage is iterable
and can throw if it's not; update the iteration to first ensure column_lineage
is an array (e.g. replace (data.column_lineage ?? []) with
Array.isArray(data.column_lineage) ? data.column_lineage : []) before the
for...of loop in altimate-core-column-lineage.ts so the code that references
extractTable/extractColumn and the sets (inputTables, outputs, colsRead,
colsWritten) only runs over a real array and won't error on non-array dispatcher
responses.

In `@packages/opencode/src/altimate/tools/project-scan.ts`:
- Around line 943-945: The current assignment for "de.env.dbt_manifest_present"
uses the RPC result (dbtManifest) which can be false on transient RPC failures;
instead check the actual presence of the manifest file on disk: use dbtProject
(e.g., dbtProject.found or dbtProject.path) to construct the manifest path
(manifest.json) and perform a file-existence check (fs.existsSync or
fs.promises.stat) and set "de.env.dbt_manifest_present" based on that result,
keeping the other fields (e.g., "de.env.dbt_present" and toolsFound) unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 69d72f33-b54a-405b-a0d9-8e5feee95883

📥 Commits

Reviewing files that changed from the base of the PR and between 146acea and 6bac41e.

📒 Files selected for processing (8)
  • packages/opencode/src/altimate/observability/annotator.ts
  • packages/opencode/src/altimate/observability/de-attributes.ts
  • packages/opencode/src/altimate/observability/tracing.ts
  • packages/opencode/src/altimate/observability/viewer.ts
  • packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts
  • packages/opencode/src/altimate/tools/project-scan.ts
  • packages/opencode/src/altimate/tools/schema-inspect.ts
  • packages/opencode/src/altimate/tools/sql-execute.ts

Comment on lines +77 to +86
for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {
const srcTable = extractTable(edge, "source")
const tgtTable = extractTable(edge, "target")
const srcCol = extractColumn(edge, "source")
const tgtCol = extractColumn(edge, "target")
if (srcTable) inputTables.add(srcTable)
if (tgtTable) outputs.add(tgtTable)
if (srcCol) colsRead.add(srcCol)
if (tgtCol) colsWritten.add(tgtCol)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Guard column_lineage to an array before iterating.

Line 77 assumes data.column_lineage is iterable. If the dispatcher returns a non-array object, this throws and converts the call into an error path unnecessarily.

Suggested fix
-        for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {
+        const lineageEdges = Array.isArray(data.column_lineage) ? data.column_lineage : []
+        for (const edge of lineageEdges as Record<string, any>[]) {
           const srcTable = extractTable(edge, "source")
           const tgtTable = extractTable(edge, "target")
           const srcCol = extractColumn(edge, "source")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {
const srcTable = extractTable(edge, "source")
const tgtTable = extractTable(edge, "target")
const srcCol = extractColumn(edge, "source")
const tgtCol = extractColumn(edge, "target")
if (srcTable) inputTables.add(srcTable)
if (tgtTable) outputs.add(tgtTable)
if (srcCol) colsRead.add(srcCol)
if (tgtCol) colsWritten.add(tgtCol)
}
const lineageEdges = Array.isArray(data.column_lineage) ? data.column_lineage : []
for (const edge of lineageEdges as Record<string, any>[]) {
const srcTable = extractTable(edge, "source")
const tgtTable = extractTable(edge, "target")
const srcCol = extractColumn(edge, "source")
const tgtCol = extractColumn(edge, "target")
if (srcTable) inputTables.add(srcTable)
if (tgtTable) outputs.add(tgtTable)
if (srcCol) colsRead.add(srcCol)
if (tgtCol) colsWritten.add(tgtCol)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts` around
lines 77 - 86, The loop currently assumes data.column_lineage is iterable and
can throw if it's not; update the iteration to first ensure column_lineage is an
array (e.g. replace (data.column_lineage ?? []) with
Array.isArray(data.column_lineage) ? data.column_lineage : []) before the
for...of loop in altimate-core-column-lineage.ts so the code that references
extractTable/extractColumn and the sets (inputTables, outputs, colsRead,
colsWritten) only runs over a real array and won't error on non-array dispatcher
responses.

Comment on lines +943 to +945
"de.env.dbt_present": dbtProject.found,
"de.env.dbt_manifest_present": dbtManifest !== undefined && dbtManifest !== null,
...(toolsFound.length > 0 && { "de.env.tools_detected": toolsFound }),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Use file existence for de.env.dbt_manifest_present, not RPC success.

Line 944 currently reflects whether dbt.manifest returned data, not whether manifest.json exists. If the RPC fails transiently, this emits false even when the file is present, which breaks the de.env.dbt_manifest_present semantic contract.

Suggested fix
     const deAttrs: Record<string, unknown> = {
       "de.env.dbt_present": dbtProject.found,
-      "de.env.dbt_manifest_present": dbtManifest !== undefined && dbtManifest !== null,
+      "de.env.dbt_manifest_present": Boolean(dbtProject.manifestPath),
       ...(toolsFound.length > 0 && { "de.env.tools_detected": toolsFound }),
       ...(deWhTypes.length === 1 && { "de.env.warehouse_type": deWhTypes[0] }),
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"de.env.dbt_present": dbtProject.found,
"de.env.dbt_manifest_present": dbtManifest !== undefined && dbtManifest !== null,
...(toolsFound.length > 0 && { "de.env.tools_detected": toolsFound }),
"de.env.dbt_present": dbtProject.found,
"de.env.dbt_manifest_present": Boolean(dbtProject.manifestPath),
...(toolsFound.length > 0 && { "de.env.tools_detected": toolsFound }),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/altimate/tools/project-scan.ts` around lines 943 - 945,
The current assignment for "de.env.dbt_manifest_present" uses the RPC result
(dbtManifest) which can be false on transient RPC failures; instead check the
actual presence of the manifest file on disk: use dbtProject (e.g.,
dbtProject.found or dbtProject.path) to construct the manifest path
(manifest.json) and perform a file-existence check (fs.existsSync or
fs.promises.stat) and set "de.env.dbt_manifest_present" based on that result,
keeping the other fields (e.g., "de.env.dbt_present" and toolsFound) unchanged.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 8 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/observability/annotator.ts">

<violation number="1" location="packages/opencode/src/altimate/observability/annotator.ts:180">
P2: `altimate-dbt` commands are misclassified as `dbt` due to regex order, producing incorrect `de.tool.bash_*` metadata.</violation>
</file>

<file name="packages/opencode/src/altimate/observability/tracing.ts">

<violation number="1" location="packages/opencode/src/altimate/observability/tracing.ts:912">
P2: Attribute size limits are computed with string length, not UTF-8 byte size. Non-ASCII metadata can bypass the intended payload caps and oversize trace exports.</violation>
</file>

<file name="packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts">

<violation number="1" location="packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts:77">
P2: Guard `data.column_lineage` with `Array.isArray()` before iterating. The `?? []` fallback only handles `null`/`undefined`; if the dispatcher returns a non-array object, `for...of` will throw a `TypeError`.</violation>
</file>

<file name="packages/opencode/src/altimate/tools/project-scan.ts">

<violation number="1" location="packages/opencode/src/altimate/tools/project-scan.ts:944">
P2: `de.env.dbt_manifest_present` reflects RPC success rather than file existence. If the `dbt.manifest` call fails transiently (e.g., timeout), `dbtManifest` will be `undefined` and this emits `false` even when `manifest.json` exists on disk, violating the attribute's semantic contract. Consider checking file existence directly (e.g., via `dbtProject` context or an `fs.existsSync` on the expected path) instead of relying on the RPC result.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic


// dbt CLI: detect verb after `dbt`. Broad list of subcommands.
const dbtVerbs = "build|run|test|seed|snapshot|compile|deps|run-operation|debug|parse|docs|clean|list|ls|source|init|show|retry|freshness"
const dbtMatch = stripped.match(new RegExp(`\\bdbt\\s+(${dbtVerbs})\\b`, "i"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: altimate-dbt commands are misclassified as dbt due to regex order, producing incorrect de.tool.bash_* metadata.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/observability/annotator.ts, line 180:

<comment>`altimate-dbt` commands are misclassified as `dbt` due to regex order, producing incorrect `de.tool.bash_*` metadata.</comment>

<file context>
@@ -0,0 +1,531 @@
+
+  // dbt CLI: detect verb after `dbt`. Broad list of subcommands.
+  const dbtVerbs = "build|run|test|seed|snapshot|compile|deps|run-operation|debug|parse|docs|clean|list|ls|source|init|show|retry|freshness"
+  const dbtMatch = stripped.match(new RegExp(`\\bdbt\\s+(${dbtVerbs})\\b`, "i"))
+  if (dbtMatch) {
+    return { intent: "dbt", invoked: "dbt", dbtCommand: dbtMatch[1].toLowerCase() }
</file context>

try {
const serialized = JSON.stringify(v)
if (serialized === undefined) continue
if (serialized.length > ATTR_VALUE_MAX_BYTES) continue

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Attribute size limits are computed with string length, not UTF-8 byte size. Non-ASCII metadata can bypass the intended payload caps and oversize trace exports.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/observability/tracing.ts, line 912:

<comment>Attribute size limits are computed with string length, not UTF-8 byte size. Non-ASCII metadata can bypass the intended payload caps and oversize trace exports.</comment>

<file context>
@@ -874,6 +879,71 @@ export class Trace {
+          try {
+            const serialized = JSON.stringify(v)
+            if (serialized === undefined) continue
+            if (serialized.length > ATTR_VALUE_MAX_BYTES) continue
+            if (totalBytes + serialized.length > ATTR_TOTAL_MAX_BYTES) continue
+            // Store original value (matches setSpanAttributes() at line ~1135 for
</file context>

return undefined
}

for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Guard data.column_lineage with Array.isArray() before iterating. The ?? [] fallback only handles null/undefined; if the dispatcher returns a non-array object, for...of will throw a TypeError.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts, line 77:

<comment>Guard `data.column_lineage` with `Array.isArray()` before iterating. The `?? []` fallback only handles `null`/`undefined`; if the dispatcher returns a non-array object, `for...of` will throw a `TypeError`.</comment>

<file context>
@@ -30,9 +30,78 @@ export const AltimateCoreColumnLineageTool = Tool.define("altimate_core_column_l
+          return undefined
+        }
+
+        for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {
+          const srcTable = extractTable(edge, "source")
+          const tgtTable = extractTable(edge, "target")
</file context>
Suggested change
for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {
for (const edge of (Array.isArray(data.column_lineage) ? data.column_lineage : []) as Record<string, any>[]) {

]
const deAttrs: Record<string, unknown> = {
"de.env.dbt_present": dbtProject.found,
"de.env.dbt_manifest_present": dbtManifest !== undefined && dbtManifest !== null,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: de.env.dbt_manifest_present reflects RPC success rather than file existence. If the dbt.manifest call fails transiently (e.g., timeout), dbtManifest will be undefined and this emits false even when manifest.json exists on disk, violating the attribute's semantic contract. Consider checking file existence directly (e.g., via dbtProject context or an fs.existsSync on the expected path) instead of relying on the RPC result.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/tools/project-scan.ts, line 944:

<comment>`de.env.dbt_manifest_present` reflects RPC success rather than file existence. If the `dbt.manifest` call fails transiently (e.g., timeout), `dbtManifest` will be `undefined` and this emits `false` even when `manifest.json` exists on disk, violating the attribute's semantic contract. Consider checking file existence directly (e.g., via `dbtProject` context or an `fs.existsSync` on the expected path) instead of relying on the RPC result.</comment>

<file context>
@@ -929,6 +929,24 @@ export const ProjectScanTool = Tool.define("project_scan", {
+    ]
+    const deAttrs: Record<string, unknown> = {
+      "de.env.dbt_present": dbtProject.found,
+      "de.env.dbt_manifest_present": dbtManifest !== undefined && dbtManifest !== null,
+      ...(toolsFound.length > 0 && { "de.env.tools_detected": toolsFound }),
+      ...(deWhTypes.length === 1 && { "de.env.warehouse_type": deWhTypes[0] }),
</file context>

@sahrizvi

Copy link
Copy Markdown
Contributor Author

What's still on the table (and why we stopped at 4 opt-ins)

The 4 tool opt-ins in this PR were the highest-yield first batch, not exhaustive. The classifier gives every other tool a useful baseline (de.tool.{category, subcategory, vendor} plus tool-specific extractions like de.dbt.layer, de.dbt.command, de.sql.query_text), which is why 99.8% of historical tool spans got annotations from Layer 2 alone.

But many tools have structured truth that lives inside them and gets thrown away before tracing sees it — those are the candidates for Layer 1 opt-ins in follow-up PRs.

Concrete opt-in candidates (incremental, each ~10–30 lines, no infra changes)

Tool Could emit Notes
sql_analyze, altimate_core_validate, altimate_core_check de.sql.validation.{valid,error,type_errors} Already in DE_SQL vocab
sql_optimize, sql_fix, sql_rewrite de.sql.schema_changes_*, optimization metrics Some already in DE_SQL
schema_diff de.sql.schema_changes_{detected,details} Already in vocab
data_diff de.quality.row_count_delta Already in DE_QUALITY
finops_query_history, finops_expensive_queries de.warehouse.{bytes_scanned, credits_consumed, query_id}, de.cost.* Vocab already covers these
finops_analyze_credits de.cost.warehouse_compute_usd, de.warehouse.credits_consumed Already in vocab
altimate_core_grade SQL quality grade Needs a new key (de.sql.quality_grade)
dbt_manifest, dbt_lineage, altimate_core_parse_dbt de.dbt.dag.nodes_*, per-model de.dbt.model.* Already in DE_DBT
altimate_core_classify_pii, schema_detect_pii New de.privacy.* namespace Needs new sub-namespace
lineage_check, impact_analysis de.sql.lineage.* from real graph Already in vocab

That's roughly 20–30 tools with meaningful structured data the classifier can't reach via regex over output text.

Why we stopped at 4 in this PR

To keep surface area small enough to land cleanly. The point of this PR was to prove the channel works end-to-end — which it does:

  • 99.8% of historical tool spans (3010 / 3016 across 104 traces) gained de.* attributes from Layer 2 alone
  • 102 sessions got root rollup
  • Workflow / outcome distributions match the source summary.status mix

Each follow-up tool opt-in is independent and incremental — the architecture supports adding them one at a time without touching the tool framework, the tracer hook, or the classifier. Tools still don't import the observability layer; they just set additional de.* keys on their existing returned metadata object.

Suggested follow-up order (highest value first)

  1. FinOps toolsfinops_query_history, finops_expensive_queries, finops_analyze_credits — unlock warehouse cost attribution per session
  2. dbt toolsdbt_manifest, dbt_lineage, altimate_core_parse_dbt — unlock per-model execution metrics
  3. SQL validation toolssql_analyze, altimate_core_validate, altimate_core_check — unlock structured validation results
  4. PII toolsaltimate_core_classify_pii, schema_detect_pii — needs a new de.privacy.* sub-namespace, then a small PR per tool

So: deliberate stopping point, not an oversight. The opportunities are real, known, and incrementally addressable.

@dev-punia-altimate dev-punia-altimate left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Code Review — OpenCodeReview (Gemini) — 12 finding(s)

  • 11 anchored to a line (posted inline when the comment stream is on)
  • 1 without a line anchor
All findings (full text)

1. packages/opencode/src/altimate/observability/viewer.ts (L519)

[🔵 LOW] The use of var for variable declarations is strictly prohibited. Please use const or let instead. In this case, since groups is not reassigned, const is the appropriate choice.

Suggested change:

  const groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange'],['de.workflow.','Workflow','accent'],['de.outcome.','Outcome','green'],['de.artifacts.','Artifacts','secondary'],['de.env.','Environment','cyan'],['de.tool.','Tool','accent']];

2. packages/opencode/src/altimate/tools/project-scan.ts (L938)

[🔵 LOW] According to the code quality guidelines, the use of any type should be avoided. If its use is absolutely necessary, please add a comment explaining why. Otherwise, consider using a more specific type or an interface to improve type safety.

Suggested change:

          .map((w: { type?: string }) => (typeof w?.type === "string" ? w.type.toLowerCase() : ""))

3. packages/opencode/src/altimate/tools/sql-execute.ts (L95-L98)

[🔵 LOW] The expression args.warehouse ?? registered[0]?.name is evaluated on every iteration of the .find() method. Although the array is small, it's better practice to evaluate this target name once before the loop to make the code cleaner and avoid repeated evaluation.

Suggested change:

        try {
          const registered = Registry.list().warehouses
          const targetName = args.warehouse ?? registered[0]?.name
          return registered.find((w) => w.name === targetName)
        } catch {

4. packages/opencode/src/altimate/tools/sql-execute.ts (L108-L112)

[🔵 LOW] Since both metadata properties depend on the exact same condition (warehouseEntry?.type), they can be combined into a single object spread. This reduces redundancy and makes the code slightly more concise.

Suggested change:

          // altimate_change start — de.* attributes lifted onto span by tracer
          "de.warehouse.rows_returned": result.row_count,
          ...(warehouseEntry?.type && { 
            "de.warehouse.system": warehouseEntry.type,
            "de.sql.dialect": warehouseEntry.type 
          }),
          // altimate_change end

5. packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts (L49)

[🔵 LOW] Avoid using the any type to comply with TypeScript best practices. Since edge properties are accessed dynamically via string index, unknown is safe and appropriate here.

Suggested change:

        const extractTable = (edge: Record<string, unknown>, side: "source" | "target"): string | undefined => {

6. packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts (L64-L68)

[🟠 MEDIUM] There are two issues here:

  1. When falling back to extracting the column from the endpoint string, it currently returns the entire string (e.g., schema.table.column). To make this consistent with the direct extraction (which returns just the column name) and extractTable behavior, it should extract only the column part by stripping the table prefix.
  2. Replace Record<string, any> with Record<string, unknown> to comply with the TypeScript rule against using any.

Suggested change:

        const extractColumn = (edge: Record<string, unknown>, side: "source" | "target"): string | undefined => {
          const direct = edge[`${side}_column`] ?? edge[`${side}Column`]
          if (typeof direct === "string" && direct) return direct
          const endpoint = edge[side]
          if (typeof endpoint === "string") {
            return endpoint.includes(".") ? endpoint.split(".").pop() : endpoint
          }

7. packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts (L77)

[🔵 LOW] Replace any with unknown to adhere to the rule against using any types.

Suggested change:

        for (const edge of (data.column_lineage ?? []) as Record<string, unknown>[]) {

8. packages/opencode/src/altimate/tools/altimate-core-column-lineage.ts (L87-L93)

[🟠 MEDIUM] Hardcoding telemetry keys violates the 'No Hardcoding' standard and can lead to inconsistencies if the attribute names change in the future. These keys are already defined as constants. Please import DE_SQL from ../observability/de-attributes and use the exported constants instead.

Suggested change:

        if (inputTables.size > 0) lineageAttrs[DE_SQL.LINEAGE_INPUT_TABLES] = [...inputTables].slice(0, 50)
        // Keep output_table scalar — Codex chunk-3 review #5: don't switch attribute
        // type to array when there are multiple outputs. Omit the attribute instead.
        if (outputs.size === 1) lineageAttrs[DE_SQL.LINEAGE_OUTPUT_TABLE] = [...outputs][0]
        if (colsRead.size > 0) lineageAttrs[DE_SQL.LINEAGE_COLUMNS_READ] = [...colsRead].slice(0, 100)
        if (colsWritten.size > 0) lineageAttrs[DE_SQL.LINEAGE_COLUMNS_WRITTEN] = [...colsWritten].slice(0, 100)
        if (args.dialect) lineageAttrs[DE_SQL.DIALECT] = args.dialect

9. packages/opencode/src/altimate/observability/tracing.ts

[🟠 MEDIUM] There are two improvements that can be made here:

  1. Inaccurate byte size computation: serialized.length measures string length (UTF-16 code units), not actual UTF-8 bytes. If multi-byte characters are present, the true payload byte size could exceed the intended limits. Using Buffer.byteLength(serialized) evaluates the actual byte length accurately.
  2. Code duplication: The identical serialization and size-limiting logic is repeated for both Layer 1 and Layer 2. Extracting this into a reusable addAttribute helper improves maintainability and keeps the code DRY.

Suggested change:

      const addAttribute = (k: string, v: unknown) => {
        if (v === undefined || k in spanAttributes) return
        try {
          const serialized = JSON.stringify(v)
          if (serialized === undefined) return
          // Use Buffer.byteLength to accurately count UTF-8 bytes
          const byteLen = Buffer.byteLength(serialized)
          if (byteLen > ATTR_VALUE_MAX_BYTES) return
          if (totalBytes + byteLen > ATTR_TOTAL_MAX_BYTES) return
          
          // Store original value (matches setSpanAttributes() for
          // consistent overwrite semantics if both paths target the same key).
          spanAttributes[k] = v
          totalBytes += byteLen
        } catch {
          // Bad metadata value must never break the tracer
        }
      }

      // Layer 1: tool-provided structured metadata (high fidelity — driver
      // values, parser output). Filtered to the de.* prefix.
      const rawMetadata = state.metadata
      if (rawMetadata && typeof rawMetadata === "object") {
        for (const [k, v] of Object.entries(rawMetadata)) {
          if (typeof k === "string" && k.startsWith("de.")) {
            addAttribute(k, v)
          }
        }
      }

      // Layer 2: derived classification from (name, input, output). Best-effort
      // procedural — taxonomy lookup, bash intent, dbt layer from path, etc.
      // Tool-provided metadata (Layer 1) wins on conflicts.
      try {
        const derived = annotateToolSpan(toolName, safeInput, isError ? errorStr : outputStr)
        for (const [k, v] of Object.entries(derived)) {
          addAttribute(k, v)
        }
      } catch {
        // Annotator must never break the tracer
      }

10. packages/opencode/src/altimate/observability/annotator.ts (L179-L180)

[🟠 MEDIUM] This dynamic regular expression is instantiated on every call to classifyBash, which incurs unnecessary allocation and compilation overhead. Since dbtVerbs is a constant string, consider extracting this RegExp declaration outside the function body so it is only compiled once, improving performance.

11. packages/opencode/src/altimate/observability/annotator.ts (L310-L312)

[🟠 MEDIUM] For bash tools, extractInputTables is called on the sliced 8000-character string sql. This contradicts the strategy used for sql_execute tools (where the full query q is parsed to avoid losing table references at the tail). To maintain consistency and ensure all lineage is captured, you should pass the full extracted string (sqlMatch[1]) to extractInputTables.

Suggested change:

        const sql = sqlMatch[1].slice(0, 8000)\n        out[DE.SQL.QUERY_TEXT] = sql\n        const tables = extractInputTables(sqlMatch[1])

12. packages/opencode/src/altimate/observability/annotator.ts (L442-L443)

[🔵 LOW] According to the coding standards, the use of != is prohibited. Since dbtPresent and manifestPresent are defined as boolean | undefined, you should use strict inequality !== undefined instead of != null.

Suggested change:

      if (env.dbtPresent !== undefined) out[DE.ENV.DBT_PRESENT] = env.dbtPresent\n      if (env.manifestPresent !== undefined) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent

// DE attributes grouped
var a = span.attributes || {};
var groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange']];
var groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange'],['de.workflow.','Workflow','accent'],['de.outcome.','Outcome','green'],['de.artifacts.','Artifacts','secondary'],['de.env.','Environment','cyan'],['de.tool.','Tool','accent']];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[🔵 LOW] The use of var for variable declarations is strictly prohibited. Please use const or let instead. In this case, since groups is not reassigned, const is the appropriate choice.

Suggested change:

Suggested change
var groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange'],['de.workflow.','Workflow','accent'],['de.outcome.','Outcome','green'],['de.artifacts.','Artifacts','secondary'],['de.env.','Environment','cyan'],['de.tool.','Tool','accent']];
const groups = [['de.warehouse.','Warehouse','cyan'],['de.sql.','SQL','secondary'],['de.dbt.','dbt','orange'],['de.quality.','Quality','green'],['de.cost.','Cost','orange'],['de.workflow.','Workflow','accent'],['de.outcome.','Outcome','green'],['de.artifacts.','Artifacts','secondary'],['de.env.','Environment','cyan'],['de.tool.','Tool','accent']];

const deWhTypes = [
...new Set(
(schemaCache?.warehouses ?? [])
.map((w: any) => (typeof w?.type === "string" ? w.type.toLowerCase() : ""))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[🔵 LOW] According to the code quality guidelines, the use of any type should be avoided. If its use is absolutely necessary, please add a comment explaining why. Otherwise, consider using a more specific type or an interface to improve type safety.

Suggested change:

Suggested change
.map((w: any) => (typeof w?.type === "string" ? w.type.toLowerCase() : ""))
.map((w: { type?: string }) => (typeof w?.type === "string" ? w.type.toLowerCase() : ""))

Comment on lines +95 to +98
try {
const registered = Registry.list().warehouses
return registered.find((w) => w.name === (args.warehouse ?? registered[0]?.name))
} catch {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[🔵 LOW] The expression args.warehouse ?? registered[0]?.name is evaluated on every iteration of the .find() method. Although the array is small, it's better practice to evaluate this target name once before the loop to make the code cleaner and avoid repeated evaluation.

Suggested change:

Suggested change
try {
const registered = Registry.list().warehouses
return registered.find((w) => w.name === (args.warehouse ?? registered[0]?.name))
} catch {
try {
const registered = Registry.list().warehouses
const targetName = args.warehouse ?? registered[0]?.name
return registered.find((w) => w.name === targetName)
} catch {

Comment on lines +108 to +112
// altimate_change start — de.* attributes lifted onto span by tracer
"de.warehouse.rows_returned": result.row_count,
...(warehouseEntry?.type && { "de.warehouse.system": warehouseEntry.type }),
...(warehouseEntry?.type && { "de.sql.dialect": warehouseEntry.type }),
// altimate_change end

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[🔵 LOW] Since both metadata properties depend on the exact same condition (warehouseEntry?.type), they can be combined into a single object spread. This reduces redundancy and makes the code slightly more concise.

Suggested change:

Suggested change
// altimate_change start — de.* attributes lifted onto span by tracer
"de.warehouse.rows_returned": result.row_count,
...(warehouseEntry?.type && { "de.warehouse.system": warehouseEntry.type }),
...(warehouseEntry?.type && { "de.sql.dialect": warehouseEntry.type }),
// altimate_change end
// altimate_change start — de.* attributes lifted onto span by tracer
"de.warehouse.rows_returned": result.row_count,
...(warehouseEntry?.type && {
"de.warehouse.system": warehouseEntry.type,
"de.sql.dialect": warehouseEntry.type
}),
// altimate_change end

const colsRead = new Set<string>()
const colsWritten = new Set<string>()

const extractTable = (edge: Record<string, any>, side: "source" | "target"): string | undefined => {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[🔵 LOW] Avoid using the any type to comply with TypeScript best practices. Since edge properties are accessed dynamically via string index, unknown is safe and appropriate here.

Suggested change:

Suggested change
const extractTable = (edge: Record<string, any>, side: "source" | "target"): string | undefined => {
const extractTable = (edge: Record<string, unknown>, side: "source" | "target"): string | undefined => {

return undefined
}

for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[🔵 LOW] Replace any with unknown to adhere to the rule against using any types.

Suggested change:

Suggested change
for (const edge of (data.column_lineage ?? []) as Record<string, any>[]) {
for (const edge of (data.column_lineage ?? []) as Record<string, unknown>[]) {

Comment on lines +87 to +93
if (inputTables.size > 0) lineageAttrs["de.sql.lineage.input_tables"] = [...inputTables].slice(0, 50)
// Keep output_table scalar — Codex chunk-3 review #5: don't switch attribute
// type to array when there are multiple outputs. Omit the attribute instead.
if (outputs.size === 1) lineageAttrs["de.sql.lineage.output_table"] = [...outputs][0]
if (colsRead.size > 0) lineageAttrs["de.sql.lineage.columns_read"] = [...colsRead].slice(0, 100)
if (colsWritten.size > 0) lineageAttrs["de.sql.lineage.columns_written"] = [...colsWritten].slice(0, 100)
if (args.dialect) lineageAttrs["de.sql.dialect"] = args.dialect

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[🟠 MEDIUM] Hardcoding telemetry keys violates the 'No Hardcoding' standard and can lead to inconsistencies if the attribute names change in the future. These keys are already defined as constants. Please import DE_SQL from ../observability/de-attributes and use the exported constants instead.

Suggested change:

Suggested change
if (inputTables.size > 0) lineageAttrs["de.sql.lineage.input_tables"] = [...inputTables].slice(0, 50)
// Keep output_table scalar — Codex chunk-3 review #5: don't switch attribute
// type to array when there are multiple outputs. Omit the attribute instead.
if (outputs.size === 1) lineageAttrs["de.sql.lineage.output_table"] = [...outputs][0]
if (colsRead.size > 0) lineageAttrs["de.sql.lineage.columns_read"] = [...colsRead].slice(0, 100)
if (colsWritten.size > 0) lineageAttrs["de.sql.lineage.columns_written"] = [...colsWritten].slice(0, 100)
if (args.dialect) lineageAttrs["de.sql.dialect"] = args.dialect
if (inputTables.size > 0) lineageAttrs[DE_SQL.LINEAGE_INPUT_TABLES] = [...inputTables].slice(0, 50)
// Keep output_table scalar — Codex chunk-3 review #5: don't switch attribute
// type to array when there are multiple outputs. Omit the attribute instead.
if (outputs.size === 1) lineageAttrs[DE_SQL.LINEAGE_OUTPUT_TABLE] = [...outputs][0]
if (colsRead.size > 0) lineageAttrs[DE_SQL.LINEAGE_COLUMNS_READ] = [...colsRead].slice(0, 100)
if (colsWritten.size > 0) lineageAttrs[DE_SQL.LINEAGE_COLUMNS_WRITTEN] = [...colsWritten].slice(0, 100)
if (args.dialect) lineageAttrs[DE_SQL.DIALECT] = args.dialect

Comment on lines +179 to +180
const dbtVerbs = "build|run|test|seed|snapshot|compile|deps|run-operation|debug|parse|docs|clean|list|ls|source|init|show|retry|freshness"
const dbtMatch = stripped.match(new RegExp(`\\bdbt\\s+(${dbtVerbs})\\b`, "i"))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[🟠 MEDIUM] This dynamic regular expression is instantiated on every call to classifyBash, which incurs unnecessary allocation and compilation overhead. Since dbtVerbs is a constant string, consider extracting this RegExp declaration outside the function body so it is only compiled once, improving performance.

Comment on lines +310 to +312
const sql = sqlMatch[1].slice(0, 8000)
out[DE.SQL.QUERY_TEXT] = sql
const tables = extractInputTables(sql)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[🟠 MEDIUM] For bash tools, extractInputTables is called on the sliced 8000-character string sql. This contradicts the strategy used for sql_execute tools (where the full query q is parsed to avoid losing table references at the tail). To maintain consistency and ensure all lineage is captured, you should pass the full extracted string (sqlMatch[1]) to extractInputTables.

Suggested change:

Suggested change
const sql = sqlMatch[1].slice(0, 8000)
out[DE.SQL.QUERY_TEXT] = sql
const tables = extractInputTables(sql)
const sql = sqlMatch[1].slice(0, 8000)\n out[DE.SQL.QUERY_TEXT] = sql\n const tables = extractInputTables(sqlMatch[1])

Comment on lines +442 to +443
if (env.dbtPresent != null) out[DE.ENV.DBT_PRESENT] = env.dbtPresent
if (env.manifestPresent != null) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[🔵 LOW] According to the coding standards, the use of != is prohibited. Since dbtPresent and manifestPresent are defined as boolean | undefined, you should use strict inequality !== undefined instead of != null.

Suggested change:

Suggested change
if (env.dbtPresent != null) out[DE.ENV.DBT_PRESENT] = env.dbtPresent
if (env.manifestPresent != null) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent
if (env.dbtPresent !== undefined) out[DE.ENV.DBT_PRESENT] = env.dbtPresent\n if (env.manifestPresent !== undefined) out[DE.ENV.DBT_MANIFEST_PRESENT] = env.manifestPresent

@dev-punia-altimate

Copy link
Copy Markdown
Contributor

🤖 Code Review — OpenCodeReview (Gemini) — No Issues Found

No supported files changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants