fix: case insensitive channel matching in activityRelations enrich pipes (IN-1185) #4239

Merged
joanagmaia merged 4 commits into linuxfoundation:main from joanagmaia:fix/repositories-mapping-in-activityrelations-buckets on Jun 23, 2026

Conversation

@joanagmaia joanagmaia commented Jun 19, 2026

Problem

Activities from GitHub have their channel field set directly from the GitHub API response, which preserves the original repository owner casing (e.g. https://github.com/NousResearch/hermes-agent).

The repositories Tinybird datasource, however, stores URLs in lowercase (e.g. https://github.com/nousresearch/hermes-agent). This means that when the activityRelations_bucket_clean_enrich_copy_pipe_* pipes filter activities using:

AND (channel, segmentId) IN (SELECT channel, segmentId FROM repos_to_channels)

the tuple comparison is case-sensitive in ClickHouse, so activities whose channel URL has a different casing than the stored repository URL are silently dropped from activityRelations_deduplicated_cleaned_bucket_*_ds.
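The mismatch can be reproduced in isolation. The query below is a minimal sketch in ClickHouse SQL, not the pipe's actual code: the two CTEs are stand-ins built from the example URLs above, and the segment ID 'seg-1' is a made-up value used only for illustration.

```sql
-- Minimal reproduction sketch. `activities` and `repos_to_channels` here are
-- inline stand-ins, and 'seg-1' is a hypothetical segment ID.
WITH
    activities AS (
        SELECT
            'https://github.com/NousResearch/hermes-agent' AS channel,
            'seg-1' AS segmentId
    ),
    repos_to_channels AS (
        SELECT
            'https://github.com/nousresearch/hermes-agent' AS channel,
            'seg-1' AS segmentId
    )
SELECT
    -- Original filter: strings are compared byte by byte, so casing matters.
    countIf((channel, segmentId) IN (
        SELECT channel, segmentId FROM repos_to_channels
    )) AS matched_case_sensitive,
    -- Normalized filter: both sides are lowercased before the comparison.
    countIf((lower(channel), segmentId) IN (
        SELECT lower(channel), segmentId FROM repos_to_channels
    )) AS matched_case_insensitive
FROM activities
```

Because the only activity row differs from the repository row in casing alone, the first counter should miss it and the second should count it.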

Concretely: querying activityRelations for a given segment and type returns rows, but querying activityRelations_deduplicated_cleaned_bucket_union with the same filters returns 0 rows, because the channel casing mismatch causes the filter to exclude every affected activity.
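A side-by-side count is one way to observe this symptom. The sketch below assumes both datasources expose segmentId and type columns with those exact names (the type column name is an assumption), and the two placeholder values must be replaced with a real segment ID and activity type.

```sql
-- Symptom check sketch. '<segment-id>' and '<activity-type>' are placeholders,
-- and the `type` column name is an assumption.
SELECT
    (
        SELECT count()
        FROM activityRelations
        WHERE segmentId = '<segment-id>' AND type = '<activity-type>'
    ) AS rows_in_source,
    (
        SELECT count()
        FROM activityRelations_deduplicated_cleaned_bucket_union
        WHERE segmentId = '<segment-id>' AND type = '<activity-type>'
    ) AS rows_in_cleaned_union
```

A non-zero rows_in_source next to a zero rows_in_cleaned_union is consistent with the casing problem described here, although other filters in the pipes could produce the same gap.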

Fix

Apply lower() to both sides of the channel comparison in all 10 activityRelations_bucket_clean_enrich_copy_pipe_*.pipe files:

-- before
AND (channel, segmentId) IN (SELECT channel, segmentId FROM repos_to_channels)

-- after
AND (lower(channel), segmentId) IN (SELECT lower(channel), segmentId FROM repos_to_channels)

This is safe for all platforms (GitHub, GitLab, Gerrit) since lower() is a no-op when the URL is already lowercase. After deploying, the next scheduled copy run (every 10 minutes) will backfill the previously dropped activities.
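To estimate how many activities the change brings back, one can list the channels that fail the old filter but pass the new one. This is only a sketch: it assumes repos_to_channels is reachable as a node in the same pipe (or is defined as a CTE), and it omits the platform filter because the platform values used by the pipes are not shown here.

```sql
-- Diagnostic sketch: channels excluded by the case-sensitive filter but
-- accepted by the lower() variant. Assumes `repos_to_channels` is available
-- in scope; it is not necessarily a standalone datasource.
SELECT
    channel,
    segmentId,
    count() AS affected_activities
FROM activityRelations
WHERE (channel, segmentId) NOT IN (
        SELECT channel, segmentId FROM repos_to_channels
    )
  AND (lower(channel), segmentId) IN (
        SELECT lower(channel), segmentId FROM repos_to_channels
    )
GROUP BY channel, segmentId
ORDER BY affected_activities DESC
```

Rows that were already lowercase never appear in this result, since they pass both filters, which is the sense in which the change leaves them untouched.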


Note

Medium Risk
Changes analytics ETL filtering for all git platforms; wrong normalization could include or exclude activities until the next COPY replace run.

Overview
Fixes git-related activities being dropped from activityRelations_deduplicated_cleaned_bucket_* when activity channel URLs keep GitHub API casing but repos_to_channels stores lowercase repository URLs.

In all ten activityRelations_bucket_clean_enrich_copy_pipe_*.pipe files, the git platform filter now compares lower(channel) on both sides of the (channel, segmentId) tuple IN subquery against repos_to_channels, so casing no longer excludes valid rows from the scheduled COPY into the cleaned bucket datasources.

Reviewed by Cursor Bugbot for commit e6b9680. Bugbot is set up for automated code reviews on this repo.

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Copilot AI review requested due to automatic review settings June 19, 2026 10:38

Copilot AI left a comment

Pull request overview

Fixes a data-loss bug in the Tinybird activityRelations bucketing clean/enrich copy pipeline where git-platform activities could be dropped due to case-sensitive repository URL matching (channel) against repos_to_channels.

Changes:

  • Updates all 10 activityRelations_bucket_clean_enrich_copy_pipe_*.pipe copy pipes to compare (lower(channel), segmentId) against (lower(channel), segmentId) from repos_to_channels.
  • Keeps the non-git path unchanged (platform NOT IN (...)), limiting behavior change to git-like platforms.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_0.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_1.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_2.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_3.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_4.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_5.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_6.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_7.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_8.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_9.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |

@joanagmaia joanagmaia changed the title from "fix: case-insensitive channel matching in activityRelations clean/enrich pipes" to "fix: case-insensitive channel matching in activityRelations clean/enrich pipes (IN-1185)" on Jun 19, 2026
@joanagmaia joanagmaia changed the title from "fix: case-insensitive channel matching in activityRelations clean/enrich pipes (IN-1185)" to "fix: case-insensitive channel matching in activityRelations enrich pipes (IN-1185)" on Jun 19, 2026
@joanagmaia joanagmaia requested a review from gaspergrom June 22, 2026 16:51
@joanagmaia joanagmaia changed the title from "fix: case-insensitive channel matching in activityRelations enrich pipes (IN-1185)" to "fix: case insensitive channel matching in activityRelations enrich pipes (IN-1185)" on Jun 22, 2026
Copilot AI review requested due to automatic review settings June 22, 2026 16:51

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

@gaspergrom

Lgtm

@joanagmaia joanagmaia merged commit 1aea913 into linuxfoundation:main Jun 23, 2026
9 of 10 checks passed