✨ catalogd graphql shift to file-based cache#2732
Conversation
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Pull request overview
This PR shifts catalogd’s GraphQL implementation from building/caching schemas off in-memory catalog metas to a file-based approach: schema metadata is discovered during Store() and persisted to disk, while query execution loads requested objects from catalog.jsonl on-demand using byte offsets from index.json. It also introduces a query complexity validation helper and updates handlers/tests to use the new GraphQL service interface.
Changes:
- Persist GraphQL schema metadata to
graphql-schema.jsonduring catalog storage and load it from disk for schema building. - Switch query execution to disk-backed object loading using index-based byte offsets (reducing RSS growth from caching parsed objects).
- Add a GraphQL query complexity validation helper and expand concurrency/singleflight tests around cache misses/builds.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/catalogd/storage/localdir.go | Writes graphql-schema.json on store, adds disk-backed object loader, and adjusts GraphQL pre-warm behavior. |
| internal/catalogd/storage/index.go | Exposes schema-section byte ranges for disk-backed GraphQL pagination. |
| internal/catalogd/service/graphql_service.go | Refactors service to use a storage-provided data provider and adds query validation/timeout. |
| internal/catalogd/service/graphql_service_test.go | Updates tests to use the new provider-based GraphQL service and adds singleflight coverage via provider counting. |
| internal/catalogd/server/handlers.go | Updates GraphQL handler to execute queries without needing a catalog FS. |
| internal/catalogd/server/handlers_test.go | Updates mocks/tests for the new GraphQL service interface and error behavior. |
| internal/catalogd/graphql/validation.go | Adds AST-based query complexity validation (depth/aliases/fields). |
| internal/catalogd/graphql/graphql.go | Adds schema serialization/deserialization and switches to loader-based query-time object retrieval. |
| hack/demo/graphql-demo-server/main.go | Updates demo server to build schema once and serve GraphQL directly from the generated schema. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2732 +/- ##
==========================================
+ Coverage 70.54% 71.39% +0.84%
==========================================
Files 143 144 +1
Lines 10617 10881 +264
==========================================
+ Hits 7490 7768 +278
+ Misses 2568 2534 -34
- Partials 559 579 +20
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
ec01fe5 to
374e69e
Compare
f704ad1 to
1abb2b6
Compare
1abb2b6 to
04a2bc0
Compare
Signed-off-by: grokspawn <jordan@nimblewidget.com>
04a2bc0 to
e048b50
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Signed-off-by: grokspawn <jordan@nimblewidget.com>
| } | ||
|
|
||
| // analyzeFieldValue analyzes a field value and returns type info, sample value, and nested fields | ||
| const maxSampleElements = 10 |
There was a problem hiding this comment.
would it be worth adding a comment to explain these parameters? I wonder if they'll need to be tweaked at some point in the future...?
| "github.com/graphql-go/graphql/language/parser" | ||
| ) | ||
|
|
||
| const ( |
There was a problem hiding this comment.
do we need to export these fellas?
| } | ||
|
|
||
| func (s *LocalDirV1) Store(ctx context.Context, catalog string, fsys fs.FS) error { | ||
| catalogDir, err := s.storeAtomicSwap(ctx, catalog, fsys) |
There was a problem hiding this comment.
could there be a race condition if there are two concurrent calls to Store?
First caller finishes storeAtomicSwap and starts GetSchema
Second caller starts storeAtomicSwap nuking the existing data
First caller GetSchema fails due to rug pull - rolls back destroying 2nd caller's storeAtomicSwap changes
Second caller soldiers on thinking his catalog data is where it should be - but 1st caller just rolled back!
| // Schema build failed — remove the catalog to maintain consistency. | ||
| // Re-acquire the write lock for the rollback since it touches shared filesystem state. | ||
| s.m.Lock() | ||
| removeErr := os.RemoveAll(catalogDir) |
There was a problem hiding this comment.
should we InvalidateCache again here? GetSchema rebuilds the cache right?
| sections = sections[:limit] | ||
| } | ||
|
|
||
| f, err := os.Open(catalogPath) |
There was a problem hiding this comment.
should we hold a lock on reading the catalog? Another thread somewhere could be rolling back?
Description
This PR proposes a shift in the graphql service endpoint from in-memory to on-disk caching which leverages common architectural conventions like fan-out catalog index generation (graphql-schema.json) which is then accessed to fulfill queries to eliminate RSS increases from the feature initial implementation.
Included is a structured graphql validation package as a focus for future limits enforcement and adds several cold-cache/cache-miss concurrency tests.
Referenced against a diverse set of existing catalogs, this change reduces RSS 39X on average:
The new file-system storage is approximately 200KB/catalog for the new file offsets index data.
Query latency shifts from microsecond access to low millisecond for the same access approach, with a transient per-query allocation between ~1-10MB depending on the size of the object(s) returned in the query.
NB: I ran some benchmarks after investigating repeated failures of the st2ex e2e test, and determined that unbounded FBC evaluation during schema detection phase was causing excessive delays in catalogd service readiness. Instituted a bound of ten (10) samples of a given Meta to determine the appropriate GraphQL schema type, which allows the functionality to complete operation on example catalogs above with a worst-case of 250ms on test hardware. These changes are in a separate commit to help them stand out.
Reviewer Checklist