feat: packages tb enrichment #4243

Open
epipav wants to merge 4 commits into main from feat/packages-tb-enrichment


epipav commented Jun 19, 2026

Collaborator

Note

Medium Risk
New multi-stage analytics and CDC/Kafka write-back pipeline with non-trivial scoring rules. Failures or mis-scoring could affect the package health data shown in the product, but the changes are scoped to the enrichment columns and the critical-package sink.

Overview
Adds a Tinybird-driven package enrichment pipeline that computes lifecycle labels, composite health scores (maintainer, security/supply chain, development activity), and per-signal coverage metadata, then materializes results for analytics and syncs them back into Postgres for critical packages.

In Postgres (osspckgs), the migration adds an index on package_dependencies (created_at, id), adds repo_activity_snapshot to Sequin replication with REPLICA IDENTITY FULL, and extends packages with nullable columns for lifecycle_label, health scores and labels, and signal_coverage_health (jsonb), all intended to be filled from Tinybird.
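A minimal Python sketch of how a per-signal coverage object for the signal_coverage_health jsonb column might be assembled. The only status value taken from this PR is "blocked" (the Bugbot finding below reports dependency_health as blocked); the "available" label, the function name signal_coverage, and the example signal names are hypothetical, not the pipe's actual output.

```python
import json
from typing import Dict, Optional


def signal_coverage(signals: Dict[str, Optional[float]]) -> str:
    """Build a per-signal coverage object serialized as JSON (for a jsonb column).

    Hypothetical convention: a signal whose input is missing (None) is
    reported as "blocked"; any signal with a value is "available".
    """
    coverage = {
        name: "blocked" if value is None else "available"
        for name, value in signals.items()
    }
    # sort_keys gives a stable string, which keeps diffs between runs readable
    return json.dumps(coverage, sort_keys=True)
```

Keeping coverage separate from the score lets a consumer tell "scored low" apart from "could not be scored".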

On the Tinybird side, the PR adds a repoActivitySnapshot datasource (CDC from packages-db), an ossPackages_enriched_ds target datasource, and ossPackages_enriched.pipe, a daily COPY in replace mode (02:30 UTC) that joins packages, primary repositories, activity snapshots, maintainers, releases, open advisories, and direct-dependency vulnerability signals to score packages. A follow-on ossPackages_enriched_sink.pipe exports the enrichment fields for packages with isCritical = 1 to Kafka (ossPackages_enriched_sink) at 04:00 UTC, where a JDBC sink applies them as updates to packages.
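The sink stage narrows the enriched rows to critical packages before export. The Python sketch below mimics that filter-and-project step under stated assumptions: rows are plain dicts, ENRICHMENT_FIELDS is a hypothetical subset of the exported columns (the actual column list of ossPackages_enriched_sink.pipe is not shown here), and each message is a JSON string rather than whatever serialization the Kafka/JDBC sink actually uses.

```python
import json
from typing import Any, Dict, Iterable, List

# Hypothetical subset of exported columns; the real sink pipe's list is not shown in the PR.
ENRICHMENT_FIELDS = ("id", "lifecycle_label", "signal_coverage_health")


def sink_messages(rows: Iterable[Dict[str, Any]]) -> List[str]:
    """Keep only critical packages and serialize their enrichment fields as JSON."""
    messages = []
    for row in rows:
        if row.get("isCritical") != 1:
            continue  # mirrors the isCritical = 1 filter described for the sink pipe
        payload = {field: row.get(field) for field in ENRICHMENT_FIELDS}
        messages.append(json.dumps(payload, sort_keys=True))
    return messages
```

Filtering before export keeps the write-back limited to critical packages, which matches the risk note's claim that changes are scoped to the critical-package sink.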

Reviewed by Cursor Bugbot for commit 522d18e. Bugbot is set up for automated code reviews on this repo.

epipav added 2 commits June 18, 2026 17:12
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Copilot AI review requested due to automatic review settings June 19, 2026 14:55
CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

github-actions
Contributor

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit e1c746d.

Comment thread services/libs/tinybird/pipes/ossPackages_enriched.pipe
3,
coalesce(dv.vulnerableDeps, 0) <= 5,
1,
0


No deps get dependency credit

Medium Severity

dependencyHealth awards the maximum five points whenever vulnerableDeps coalesces to zero, including when the package has no packageDependencies join row. That conflates “no dependency data” with “zero vulnerable direct deps,” inflating securitySupplyChainScore while signalCoverageHealth marks dependency_health as blocked.
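One way to address the finding, sketched in Python rather than the pipe's SQL: treat a missing join row as unknown and return None instead of coalescing it to zero. The excerpt above shows a 3-point tier, a "<= 5 gives 1" tier, and a 0 fallback, and the finding says zero vulnerable deps earns the maximum 5. The upper bound of the 3-point tier is elided, so MID_TIER_MAX is a hypothetical placeholder, and dependency_health is an illustrative name.

```python
from typing import Optional

# Hypothetical: the real upper bound of the 3-point tier is not visible in the excerpt.
MID_TIER_MAX = 2


def dependency_health(vulnerable_deps: Optional[int]) -> Optional[int]:
    """Score direct-dependency vulnerability exposure on a 0-5 scale.

    None means no packageDependencies row was joined, so the signal is
    unknown and no credit is awarded (instead of the maximum 5 points).
    """
    if vulnerable_deps is None:
        return None  # no dependency data: keep "unknown" distinct from "zero vulnerable"
    if vulnerable_deps == 0:
        return 5
    if vulnerable_deps <= MID_TIER_MAX:
        return 3
    if vulnerable_deps <= 5:
        return 1
    return 0
```

In the pipe itself, the analogous change would be an explicit IS NULL branch ahead of the coalesce. The PR does not say how securitySupplyChainScore should then treat an unknown signal (skip it or reweight the remaining ones).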

Reviewed by Cursor Bugbot for commit e1c746d.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a Tinybird-based enrichment layer for OSS packages, producing a new materialized datasource (ossPackages_enriched_ds) that augments ossPackages with derived lifecycle and health scoring signals sourced from repo metadata, activity snapshots, maintainers, releases, vulnerabilities, and dependencies. It also updates the packages-db schema/replication to support the new snapshot feed and to persist enriched fields back into Postgres.

Changes:

  • Add a new Tinybird pipe (ossPackages_enriched.pipe) that computes lifecycle + composite health scoring for packages and materializes results via a scheduled COPY.
  • Add new Tinybird datasources for repoActivitySnapshot and the resulting ossPackages_enriched_ds.
  • Add a packages-db migration that adds an index, adds repo_activity_snapshot to the Sequin publication for replication, and adds new enrichment columns to packages.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Files reviewed:

  • services/libs/tinybird/pipes/ossPackages_enriched.pipe: Builds package lifecycle/health scoring and signal coverage JSON; materializes into Tinybird on a schedule.
  • services/libs/tinybird/datasources/repoActivitySnapshot.datasource: Defines the repo activity snapshot datasource schema and storage engine settings used by the enrichment pipe.
  • services/libs/tinybird/datasources/ossPackages_enriched_ds.datasource: Defines the enriched OSS packages datasource schema that the pipe writes into.
  • backend/src/osspckgs/migrations/V1781539311__packages_tables_sequin_updates.sql: Adds an index, ensures the Sequin publication includes repo activity snapshots, and adds enriched columns to packages.

Comment thread services/libs/tinybird/datasources/repoActivitySnapshot.datasource
Comment thread services/libs/tinybird/pipes/ossPackages_enriched.pipe Outdated
Comment thread services/libs/tinybird/pipes/ossPackages_enriched.pipe Outdated
Signed-off-by: anilb <epipav@gmail.com>
epipav commented Jun 23, 2026

Collaborator Author

@cursor review

Copilot AI review requested due to automatic review settings June 23, 2026 10:50

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment on lines +41 to +45
prMedianTimeToMergeHours,
prMedianTimeToFirstResponseHours,
multiIf(
prMedianTimeToFirstResponseHours IS NOT NULL
AND issueMedianTimeToFirstResponseHours IS NOT NULL,
4 participants