feat: [2.6] struct element hybrid search design and impl #50369

Open
zhengbuqian wants to merge 13 commits into milvus-io:2.6 from zhengbuqian:cherry-pick-50243-2.6

@zhengbuqian

Collaborator

issue: #42148
pr: #50243

design doc: docs/design-docs/design_docs/20260602-struct_hybrid_search.md

issue: milvus-io#42148

design doc: docs/design-docs/design_docs/20260602-struct_hybrid_search.md

(cherry picked from commit 2f3f19e)
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
@sre-ci-robot sre-ci-robot added size/XXL Denotes a PR that changes 1000+ lines. approved labels Jun 8, 2026
@mergify mergify Bot added dco-passed DCO check passed. kind/feature Issues related to feature request from users labels Jun 8, 2026
@sre-ci-robot sre-ci-robot added the do-not-merge/need-milestone generate by v2-label-manager label Jun 8, 2026
@sre-ci-robot

Contributor

[INFO] PR Label Summary by Default
[SUCCESS] PR #50243 merged to master

[WARNING] Milestone not set

You can set milestone by commenting:
/set-milestone
Example:
/set-milestone 2.5.0

Use /refresh-label to update related check and label manually

@sre-ci-robot

Contributor

[ci-v2-notice]
Notice: New ci-v2 system is enabled for this PR.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-code-check-macos // for Code Checker MacOS (GitHub Actions)
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-build-all // for ci-v2/build-all (multi-arch builds)
  • /ci-rerun-buildenv // for ci-v2/build-env (build milvus-env builder images)
  • /ci-rerun-ut-integration // for ci-v2/ut-integration, will rerun ci-v2/build
  • /ci-rerun-ut-go // for ci-v2/ut-go, will rerun ci-v2/build
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp, will rerun ci-v2/build
  • /ci-rerun-e2e-default // for ci-v2/e2e-default
  • /ci-rerun-e2e-amd // for ci-v2/e2e-amd (e2e pool dispatcher)
  • /ci-rerun-build-ut-cov // for ci-v2/build-ut-cov (build + unit tests in one pipeline)
  • /ci-rerun-gosdk // for ci-v2/go-sdk (Go SDK E2E tests, ARM)

If you have any questions or requests, please contact @zhikunyao.

Topks: append([]int64(nil), data.GetTopks()...),
FieldsData: data.GetFieldsData(),
Scores: append([]float32(nil), data.GetScores()...),
Ids: &schemapb.IDs{IdField: &schemapb.IDs_StrId{StrId: &schemapb.StringArray{Data: keys}}},

Member

prepareElementLevelHybridResult always emits IDs_StrId, but the reranker binds its generic T to the collection PK type, and processOneSearchData performs an unchecked col.ids.([]T) (rrf_function.go:77). On an int64-PK collection T is int64, so asserting the synthetic string-ID slice to []int64 fails, and with no PK-type guard the assertion panics on every element-level hybrid search over an int64-PK collection (the common case). Existing tests miss this because they bypass the real reranker. Fix by emitting IDs that match the PK type, or by converting or guarding before the reranker.
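
To make the failure mode concrete, here is a minimal sketch of the guarded alternative to an unchecked assertion. The `idColumn` type, the `idsAs` helper, and the `"1#0"` key format are hypothetical stand-ins, not the actual reranker types; only the comma-ok pattern itself is standard Go.

```go
package main

import "fmt"

// idColumn is a hypothetical stand-in for the reranker's column type.
// ids holds either []int64 or []string, depending on the ID kind.
type idColumn struct {
	ids any
}

// idsAs returns the column's IDs as []T, or an error when the stored
// slice has a different element type. The comma-ok form never panics,
// unlike the unchecked col.ids.([]T).
func idsAs[T int64 | string](col idColumn) ([]T, error) {
	ids, ok := col.ids.([]T)
	if !ok {
		var zero T
		return nil, fmt.Errorf("id type mismatch: column holds %T, reranker expects []%T", col.ids, zero)
	}
	return ids, nil
}

func main() {
	// Synthetic element-level keys (the key format is an assumption).
	synthetic := idColumn{ids: []string{"1#0", "1#1"}}

	// With T bound to int64, the guard reports the mismatch instead of panicking.
	if _, err := idsAs[int64](synthetic); err != nil {
		fmt.Println("guarded:", err)
	}

	// With T bound to string, the assertion succeeds.
	ids, _ := idsAs[string](synthetic)
	fmt.Println("string ids:", len(ids))
}
```

A guard like this turns the panic into an ordinary error, but it does not by itself make element-level hybrid search work on int64-PK collections; that still needs IDs that match T or a conversion step.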

}

annsField := typeutil.GetField(t.schema.CollectionSchema, t.FieldId)
if annsField != nil && annsField.GetDataType() == schemapb.DataType_ArrayOfVector {

Member

parseSearchInfo still contains a stale guard (search_util.go:303-317) that rejects any ArrayOfVector anns-field using radius/group_by/iterator with the old "embedding list" message before the placeholder type is known. Because it runs inside tryGeneratePlan->parseSearchInfo, it preempts the new placeholder-aware validation in both initSearchRequest (single search, line 806) and the per-sub-request loop in initAdvancedSearchRequest (hybrid, line 452), and since GetFieldByName resolves struct sub-fields it also fires for struct element/emb-list searches. The PR's own new tests fail as a result — 'element-level range search should succeed' and 'element-level iterator v2 should succeed' get rejected, and the hybrid range/iterator cases match the old message instead of the new '...in hybrid search' one — consistent with the red ci-v2/ut-go check. master deleted this guard and moved the checks into task_search.go; port that by removing or making the search_util.go guard placeholder-aware.
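
One way to picture a placeholder-aware check is sketched below. Everything in it is hypothetical (`searchKind`, `controls`, `validateControls`), and the embedding-list rule is an assumption about what the ported validation keeps rejecting; only the legacy-iterator message is taken from the diff context in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// searchKind is a hypothetical classification derived from the placeholder type.
type searchKind int

const (
	regularVector searchKind = iota
	embeddingList
	elementLevel
)

// controls lists the advanced options a request asked for (hypothetical type).
type controls struct {
	hasRange   bool
	iterator   bool
	iteratorV2 bool
}

// validateControls runs only once the search kind is known, so an
// element-level search is not rejected by the embedding-list rule.
func validateControls(kind searchKind, c controls) error {
	switch kind {
	case embeddingList:
		// Assumed rule: whole-list search keeps rejecting range and iterator.
		if c.hasRange || c.iterator {
			return errors.New("range search and iterator are not supported for embedding list search")
		}
	case elementLevel:
		// Message taken from the diff context in this PR.
		if c.iterator && !c.iteratorV2 {
			return errors.New("legacy search iterator is not supported for element-level search on embedding list fields; use search iterator v2")
		}
	}
	return nil
}

func main() {
	fmt.Println(validateControls(elementLevel, controls{hasRange: true}))
	fmt.Println(validateControls(elementLevel, controls{iterator: true, iteratorV2: true}))
	fmt.Println(validateControls(elementLevel, controls{iterator: true}))
	fmt.Println(validateControls(embeddingList, controls{hasRange: true}))
}
```

The point of the sketch is ordering: a guard keyed only on the field's data type cannot tell these cases apart, while one keyed on the placeholder type can.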

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
@codecov

codecov Bot commented Jun 8, 2026

Codecov Report

❌ Patch coverage is 63.57616% with 220 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.72%. Comparing base (cb08db0) to head (983e16a).
⚠️ Report is 948 commits behind head on 2.6.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/proxy/search_pipeline.go | 61.71% | 88 Missing and 15 partials ⚠️ |
| internal/proxy/struct_hybrid_search.go | 65.40% | 47 Missing and 17 partials ⚠️ |
| internal/proxy/task_search.go | 48.93% | 36 Missing and 12 partials ⚠️ |
| internal/util/function/rerank/rerank_base.go | 50.00% | 2 Missing and 1 partial ⚠️ |
| internal/util/function/rerank/function_score.go | 86.66% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##              2.6   #50369      +/-   ##
==========================================
+ Coverage   76.99%   77.72%   +0.72%     
==========================================
  Files        1700     2000     +300     
  Lines      262533   329431   +66898     
==========================================
+ Hits       202142   256042   +53900     
- Misses      53550    65509   +11959     
- Partials     6841     7880    +1039     

| Components | Coverage Δ |
|---|---|
| Client | 79.48% <74.73%> (+1.34%) ⬆️ |
| Core | 84.58% <0.00%> (+2.37%) ⬆️ |
| Go | 75.79% <55.09%> (+0.40%) ⬆️ |

| Files with missing lines | Coverage Δ |
|---|---|
| internal/proxy/search_util.go | 78.40% <100.00%> (+0.43%) ⬆️ |
| internal/util/function/rerank/decay_function.go | 93.83% <100.00%> (-0.37%) ⬇️ |
| internal/util/function/rerank/model_function.go | 87.50% <100.00%> (+1.01%) ⬆️ |
| internal/util/function/rerank/rrf_function.go | 87.75% <100.00%> (+0.52%) ⬆️ |
| internal/util/function/rerank/weighted_function.go | 90.62% <100.00%> (-2.56%) ⬇️ |
| internal/util/function/rerank/function_score.go | 82.28% <86.66%> (+0.84%) ⬆️ |
| internal/util/function/rerank/rerank_base.go | 94.00% <50.00%> (-6.00%) ⬇️ |
| internal/proxy/task_search.go | 68.71% <48.93%> (-2.28%) ⬇️ |
| internal/proxy/struct_hybrid_search.go | 65.40% <65.40%> (ø) |
| internal/proxy/search_pipeline.go | 77.91% <61.71%> (-11.04%) ⬇️ |

... and 1406 files with indirect coverage changes


Comment thread internal/proxy/task_search.go Outdated
return merr.WrapErrParameterInvalid("", "",
"range search is not supported for vector array ("+searchKind+") fields in hybrid search, fieldName:"+annsField.GetName())
}
if t.rankParams.GetGroupByFieldId() > 0 || len(t.rankParams.GetGroupByFieldIds()) > 0 {

Member

Line 490 calls t.rankParams.GetGroupByFieldIds(), but the rankParams type only defines the singular groupByFieldId field and GetGroupByFieldId() accessor — no plural accessor exists anywhere in the repo. This is an undefined-method compile error that prevents internal/proxy (and the binary) from building. Drop the || len(t.rankParams.GetGroupByFieldIds()) > 0 clause and rely on the singular GetGroupByFieldId() > 0.

Comment thread internal/proxy/task_search.go Outdated
"legacy search iterator is not supported for element-level search on embedding list fields; use search iterator v2")
}

groupByFieldIDs := queryInfo.GetGroupByFieldIds()

Member

Line 861 calls queryInfo.GetGroupByFieldIds(), but planpb.QueryInfo only has the singular GroupByFieldId field (proto field 6) with accessor GetGroupByFieldId() — there is no plural field or method in this 2.6 proto. This is a second undefined-method compile error in internal/proxy. Use the singular GetGroupByFieldId(); the following two lines already fall back to it.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Comment thread internal/proxy/task_search.go Outdated
if t.rankParams.GetGroupByFieldId() > 0 {
return merr.WrapErrParameterInvalid("", "",
"group by search is not supported for vector array (embedding list) fields in hybrid search, fieldName:"+annsField.GetName())
if err := validateElementCollapseMetricType(collapseConfig, queryInfo.GetMetricType()); err != nil {

Member

In initAdvancedSearchRequest (task_search.go:467), validateElementCollapseMetricType is passed queryInfo.GetMetricType(), which is empty whenever the caller omits metric_type and relies on the index-resolved metric (parseSearchInfo leaves it "" at search_util.go:232). Because PositivelyRelated("") is false, a valid element_scope collapse with sum/topk_sum on an IP/COSINE field is rejected with 'only supported for positively related metrics'. The runtime collapse stage already re-validates against the metric resolved from results (search_pipeline.go: 'the metrictype in the request may be empty, it can only be obtained from the result'), so this proxy-side check is premature — validate against the resolved metric or drop it. Note the defect is in initAdvancedSearchRequest, not initSearchRequest, and metric_type is not schema-defaulted.
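
A minimal sketch of the "validate against the resolved metric" option follows. The `positivelyRelated` and `validateCollapseMetric` functions are simplified local stand-ins rather than the Milvus helpers, and the rule that only `sum` and `topk_sum` need a positively related metric is inferred from this comment; `max` is a hypothetical name for any other collapse mode.

```go
package main

import (
	"fmt"
	"strings"
)

// positivelyRelated is a simplified stand-in: it treats only IP and COSINE
// as metrics where a larger score means a closer match.
func positivelyRelated(metric string) bool {
	switch strings.ToUpper(metric) {
	case "IP", "COSINE":
		return true
	}
	return false
}

// validateCollapseMetric rejects sum/topk_sum only when the metric is known
// and not positively related. An empty metric means "resolve from the index",
// so the check is skipped and left to the stage that sees the resolved metric.
func validateCollapseMetric(collapse, metric string) error {
	if collapse != "sum" && collapse != "topk_sum" {
		return nil
	}
	if metric == "" {
		return nil // metric not known yet, validate later
	}
	if !positivelyRelated(metric) {
		return fmt.Errorf("collapse %q is only supported for positively related metrics, got %q", collapse, metric)
	}
	return nil
}

func main() {
	fmt.Println(validateCollapseMetric("sum", ""))            // skipped
	fmt.Println(validateCollapseMetric("topk_sum", "COSINE")) // accepted
	fmt.Println(validateCollapseMetric("sum", "L2"))          // rejected
}
```

Skipping the check on an empty metric keeps early rejection for requests that state an incompatible metric explicitly, while leaving the index-resolved case to the later stage.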

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
@sre-ci-robot sre-ci-robot added low-code-coverage add test-label from zhikun, diff coverage > 80% and removed low-code-coverage add test-label from zhikun, diff coverage > 80% labels Jun 8, 2026
Comment thread internal/proxy/task_search.go Outdated
}
for _, field := range cloned.GetFields() {
if field.GetIsPrimaryKey() {
field.DataType = schemapb.DataType_VarChar

Member

elementLevelHybridRerankSchema clones the schema and flips the primary key field's DataType to VarChar (task_search.go:651). newRerankBase derives both pkType and the rerank input-field types from that cloned schema, so when a decay reranker's input field is the primary key itself, a numeric PK is read as VarChar and newDecayFunction rejects it via its else branch at decay_function.go:96 ("only support numeric field"). The same decay-over-numeric-PK configuration is accepted on the normal row-level hybrid path, so element-level hybrid behaves inconsistently. This only triggers when the decay input field is the PK; if that config should be supported, classify the decay input against the PK's real type rather than the flipped VarChar.
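
The suggestion to classify the decay input against the PK's real type could look roughly like the sketch below. The `field` and `dataType` types and both helpers are hypothetical simplifications of the schema and rerank code, shown only to illustrate carrying the original PK type alongside the flipped clone.

```go
package main

import "fmt"

// dataType and field are simplified stand-ins for the schema types.
type dataType int

const (
	Int64 dataType = iota
	VarChar
)

type field struct {
	Name string
	IsPK bool
	Type dataType
}

// cloneForElementRerank copies the fields, flips the PK to VarChar so that
// synthetic string IDs fit, and also returns the PK's original type.
func cloneForElementRerank(fields []field) ([]field, dataType) {
	cloned := make([]field, len(fields))
	copy(cloned, fields)
	realPK := VarChar
	for i := range cloned {
		if cloned[i].IsPK {
			realPK = cloned[i].Type
			cloned[i].Type = VarChar
		}
	}
	return cloned, realPK
}

// decayInputType returns the type a decay reranker should classify: the
// real PK type when the input field is the PK, the cloned type otherwise.
func decayInputType(cloned []field, realPK dataType, input string) (dataType, error) {
	for _, f := range cloned {
		if f.Name != input {
			continue
		}
		if f.IsPK {
			return realPK, nil
		}
		return f.Type, nil
	}
	return 0, fmt.Errorf("field %q not found", input)
}

func main() {
	schema := []field{{Name: "id", IsPK: true, Type: Int64}, {Name: "ts", Type: Int64}}
	cloned, realPK := cloneForElementRerank(schema)
	t, _ := decayInputType(cloned, realPK, "id")
	fmt.Println(cloned[0].Type == VarChar, t == Int64)
}
```

With this shape the ID handling still sees VarChar, while the numeric check on the decay input sees the type the user declared.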

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
@sre-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zhengbuqian

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment thread internal/proxy/task_search_test.go Outdated
})

t.Run("regular vector advanced controls should succeed", func(t *testing.T) {
task := makeTask("regular_vec", commonpb.PlaceholderType_FloatVector, rangeParams, true, false, "scalar_field")

Member

This case calls makeTask("regular_vec", …, rangeParams, /*withIterator=*/true, false, /*groupByField=*/"scalar_field") and then asserts that initSearchRequest succeeds, but it enables iterator and group-by at the same time. parseSearchInfo (reached via initSearchRequest->tryGeneratePlan) hits the pre-existing guard if isIterator && groupByFieldId > 0 { return ...WrapErrParameterInvalid(..., "Not allowed to do groupBy when doing iteration") } before any ArrayOfVector validation runs, so the call returns an error and assert.NoError fails deterministically on every CI run where the proxy test environment is available. Split this into separate range / iterator / group-by regular-vector cases, as TestSearchTask_ArrayOfVectorHybridSearch does, or drop one of the conflicting withIterator/groupByField controls.
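
The split suggested above can be sketched as a table of independent cases. The `controls` type and `checkControls` function are hypothetical; `checkControls` mimics only the iterator/group-by guard quoted in this comment, not the real parseSearchInfo.

```go
package main

import (
	"errors"
	"fmt"
)

// controls mirrors the options the test helper toggles (hypothetical type).
type controls struct {
	withRange    bool
	withIterator bool
	groupByField string
}

// checkControls reproduces only the pre-existing guard that forbids
// combining an iterator with group-by.
func checkControls(c controls) error {
	if c.withIterator && c.groupByField != "" {
		return errors.New("Not allowed to do groupBy when doing iteration")
	}
	return nil
}

func main() {
	// One control per case, so no case trips the iterator/group-by guard.
	cases := []struct {
		name string
		c    controls
	}{
		{"regular vector range", controls{withRange: true}},
		{"regular vector iterator", controls{withIterator: true}},
		{"regular vector group-by", controls{groupByField: "scalar_field"}},
	}
	for _, tc := range cases {
		fmt.Printf("%s: err=%v\n", tc.name, checkControls(tc.c))
	}

	// The combined case from the original test is rejected by the guard.
	combined := controls{withRange: true, withIterator: true, groupByField: "scalar_field"}
	fmt.Printf("combined: err=%v\n", checkControls(combined))
}
```

In the real test each row would become its own t.Run subtest, so a failure names the control that broke.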

@zhengbuqian zhengbuqian added this to the 2.6.19 milestone Jun 9, 2026
@zhengbuqian zhengbuqian removed the do-not-merge/need-milestone generate by v2-label-manager label Jun 9, 2026
@zhengbuqian

Collaborator Author

/ci-rerun-ut

@zhengbuqian

Collaborator Author

/ci-rerun-e2e-default

@zhengbuqian

Collaborator Author

/ci-rerun-ut-go

@zhengbuqian

Collaborator Author

/ci-rerun-e2e-default

@zhengbuqian

Collaborator Author

/ci-rerun-gosdk

@sre-ci-robot sre-ci-robot removed the low-code-coverage add test-label from zhikun, diff coverage > 80% label Jun 11, 2026
@zhengbuqian

Collaborator Author

/ci-rerun-ut-go

@zhengbuqian

Collaborator Author

/ci-rerun-e2e-default

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
@sre-ci-robot sre-ci-robot added low-code-coverage add test-label from zhikun, diff coverage > 80% and removed low-code-coverage add test-label from zhikun, diff coverage > 80% labels Jun 12, 2026
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
@sre-ci-robot sre-ci-robot added the low-code-coverage add test-label from zhikun, diff coverage > 80% label Jun 13, 2026
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
@sre-ci-robot sre-ci-robot added low-code-coverage add test-label from zhikun, diff coverage > 80% and removed low-code-coverage add test-label from zhikun, diff coverage > 80% labels Jun 13, 2026
Labels

approved area/test dco-passed DCO check passed. kind/feature Issues related to feature request from users low-code-coverage add test-label from zhikun, diff coverage > 80% sig/testing size/XXL Denotes a PR that changes 1000+ lines.

3 participants