feat: packages tb enrichment #4243
Signed-off-by: anilb <epipav@gmail.com>
Your PR title doesn't contain a Jira issue key. Please add one for better traceability.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit e1c746d.
    3,
    coalesce(dv.vulnerableDeps, 0) <= 5,
    1,
    0
No deps get dependency credit
Medium Severity
dependencyHealth awards the maximum five points whenever vulnerableDeps coalesces to zero, including when the package has no packageDependencies join row. That conflates “no dependency data” with “zero vulnerable direct deps,” inflating securitySupplyChainScore while signalCoverageHealth marks dependency_health as blocked.
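One way to address this finding is to keep "no dependency data" distinct from "zero vulnerable dependencies" instead of coalescing the missing value to 0. The sketch below is in Python purely for illustration (the real logic lives in Tinybird SQL). It assumes the caller passes None when the package has no packageDependencies row. The 5-point score for zero and the 1-point tier for at most five vulnerable deps follow the comment and diff above; the 3-point tier threshold (at most 2) is a hypothetical value, because the diff does not show it.

```python
from typing import Optional


def dependency_health(vulnerable_deps: Optional[int]) -> Optional[int]:
    """Score direct-dependency vulnerability exposure on a 0-5 scale.

    Returns None when there is no dependency data at all, rather than
    treating the missing value as 0 and awarding full credit.
    """
    if vulnerable_deps is None:
        # No packageDependencies join row: unknown, not "zero vulnerable deps".
        return None
    if vulnerable_deps == 0:
        return 5
    if vulnerable_deps <= 2:  # hypothetical tier; the real threshold is not shown
        return 3
    if vulnerable_deps <= 5:
        return 1
    return 0
```

With a None result, the composite securitySupplyChainScore can leave this sub-score out (or re-weight the remaining ones), which keeps it consistent with signalCoverageHealth reporting dependency_health as blocked.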
Pull request overview
This PR introduces a Tinybird-based enrichment layer for OSS packages, producing a new materialized datasource (ossPackages_enriched_ds) that augments ossPackages with derived lifecycle and health scoring signals sourced from repo metadata, activity snapshots, maintainers, releases, vulnerabilities, and dependencies. It also updates the packages-db schema/replication to support the new snapshot feed and to persist enriched fields back into Postgres.
Changes:
- Add a new Tinybird pipe (`ossPackages_enriched.pipe`) that computes lifecycle + composite health scoring for packages and materializes results via a scheduled COPY.
- Add new Tinybird datasources for `repoActivitySnapshot` and the resulting `ossPackages_enriched_ds`.
- Add a packages-db migration to improve indexing, add sequin publication replication for `repo_activity_snapshot`, and add new enrichment columns to `packages`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| services/libs/tinybird/pipes/ossPackages_enriched.pipe | Builds package lifecycle/health scoring and signal coverage JSON; materializes into Tinybird on a schedule. |
| services/libs/tinybird/datasources/repoActivitySnapshot.datasource | Defines the repo activity snapshot datasource schema and storage engine settings used by the enrichment pipe. |
| services/libs/tinybird/datasources/ossPackages_enriched_ds.datasource | Defines the enriched OSS packages datasource schema that the pipe writes into. |
| backend/src/osspckgs/migrations/V1781539311__packages_tables_sequin_updates.sql | Adds an index, ensures sequin publication includes repo activity snapshots, and adds enriched columns to packages. |
@cursor review
    prMedianTimeToMergeHours,
    prMedianTimeToFirstResponseHours,
    multiIf(
        prMedianTimeToFirstResponseHours IS NOT NULL
        AND issueMedianTimeToFirstResponseHours IS NOT NULL,
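The multiIf excerpt above branches first on both PR and issue first-response medians being present. The diff is cut off before each branch's result, so the sketch below only mirrors that shape. Averaging the two medians when both exist, and falling back to whichever one is present otherwise, is an assumption for illustration and not necessarily what the pipe computes. The function name is hypothetical.

```python
from typing import Optional


def combined_first_response_hours(
    pr_hours: Optional[float], issue_hours: Optional[float]
) -> Optional[float]:
    """Hypothetical combination of PR and issue median first-response times.

    Mirrors the multiIf shape in the diff: the first branch requires both
    medians to be present, and the fallbacks use whichever one exists.
    """
    if pr_hours is not None and issue_hours is not None:
        return (pr_hours + issue_hours) / 2  # assumption: simple average
    if pr_hours is not None:
        return pr_hours
    if issue_hours is not None:
        return issue_hours
    return None  # neither signal available
```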


Note
Medium Risk
New multi-stage analytics and CDC/Kafka write-back pipeline with non-trivial scoring rules. Failures or mis-scoring could affect the package health data shown in the product, but the changes are scoped to the enrichment columns and the critical-package sink.
Overview
Adds a Tinybird-driven package enrichment pipeline that computes lifecycle labels, composite health scores (maintainer, security/supply chain, development activity), and per-signal coverage metadata, then materializes results for analytics and syncs them back into Postgres for critical packages.
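The per-signal coverage metadata can be pictured as a small JSON document that records, for each scoring signal, whether its input was available. The Python sketch below assumes a signal is "blocked" when its input is missing and "available" otherwise. The "blocked" status and the dependency_health signal name appear in the review above; the "available" status, the JSON layout, and the function name are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def signal_coverage(signals: Dict[str, Optional[Any]]) -> str:
    """Build a per-signal coverage JSON document.

    Each signal is marked "blocked" when its input is missing (None) and
    "available" otherwise. Keys are sorted so the output is stable.
    """
    coverage = {
        name: {"status": "blocked" if value is None else "available"}
        for name, value in signals.items()
    }
    return json.dumps(coverage, sort_keys=True)
```

For example, signal_coverage({"dependency_health": None}) marks dependency_health as blocked, which is the state the dependency-credit finding above refers to.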
Postgres (`osspckgs`) gains a `package_dependencies (created_at, id)` index, adds `repo_activity_snapshot` to Sequin replication with `REPLICA IDENTITY FULL`, and extends `packages` with nullable columns for `lifecycle_label`, health scores/labels, and `signal_coverage_health` (jsonb), intended to be filled from Tinybird.

Tinybird introduces a `repoActivitySnapshot` datasource (CDC from packages-db), an `ossPackages_enriched_ds` target, and `ossPackages_enriched.pipe`, a daily replace COPY (02:30 UTC) that joins packages, primary repo, activity snapshots, maintainers, releases, open advisories, and direct-dependency vulnerability signals to score packages. A follow-on `ossPackages_enriched_sink.pipe` exports enrichment fields for `isCritical = 1` packages to Kafka (`ossPackages_enriched_sink`) at 04:00 UTC for JDBC sink updates to `packages`.

Reviewed by Cursor Bugbot for commit 522d18e.
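The sink step can be sketched as "filter to critical packages, then emit one message per row with only the enrichment fields". The Python below is a minimal illustration under stated assumptions. The isCritical = 1 filter comes from the description above. The field list is a hypothetical subset (id plus two of the new Postgres columns), and the JSON-per-row message shape is an assumption, not the actual Kafka or JDBC sink schema.

```python
import json
from typing import Dict, Iterable, List

# Hypothetical subset of exported columns; the real sink also carries health scores/labels.
ENRICHMENT_FIELDS = ("id", "lifecycle_label", "signal_coverage_health")


def critical_sink_messages(rows: Iterable[Dict]) -> List[str]:
    """Keep only isCritical = 1 rows and serialize their enrichment fields.

    Each returned string is one JSON message, roughly what a topic feeding a
    JDBC sink might carry (an assumed shape, not the real schema).
    """
    messages = []
    for row in rows:
        if row.get("isCritical") != 1:
            continue  # non-critical packages are not written back to Postgres
        payload = {field: row.get(field) for field in ENRICHMENT_FIELDS}
        messages.append(json.dumps(payload, sort_keys=True))
    return messages
```

Running the sink at 04:00 UTC, 90 minutes after the 02:30 UTC COPY, presumably gives the replace COPY time to finish before the export reads the enriched datasource.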