fix: self-heal compaction segment positions and add L0 force-select bypass #48907

Merged
sre-ci-robot merged 5 commits into milvus-io:master from XuanYang-cn:silent-delete-loss on Apr 22, 2026

@XuanYang-cn

Contributor

Compaction inherits StartPosition and DmlPosition from its source segments via getMinPosition and never recalculates them from the actual data. The import position bug (PR #47276) wrote wrong timestamps to imported segments, and those wrong positions persist and propagate through compaction. L0 compaction then misses L1/L2 segments because of the StartPosition mismatch, which leaves zombie L0 segments behind and silently loses deletes.
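
A minimal sketch of how a wrong StartPosition can hide a segment from L0 compaction. This is not the Milvus code: the `Segment` type, the `selectTargets` helper, and the filter rule itself (select a segment only when its start timestamp is earlier than the L0 position) are assumptions made for illustration, as is the case where the recorded position is later than the real data.

```go
package main

import "fmt"

// Segment is a hypothetical, simplified stand-in for L1/L2 segment metadata.
type Segment struct {
	ID      int64
	StartTs uint64 // timestamp of the recorded StartPosition
}

// selectTargets returns the IDs of candidate segments whose recorded start
// timestamp is earlier than the L0 segment's position timestamp.
// The filter rule is an assumption for illustration only.
func selectTargets(l0PosTs uint64, candidates []Segment) []int64 {
	var ids []int64
	for _, s := range candidates {
		if s.StartTs < l0PosTs {
			ids = append(ids, s.ID)
		}
	}
	return ids
}

func main() {
	l0PosTs := uint64(100)

	// Same segment twice: once with a correct position, once with a wrong,
	// too-late position such as a buggy import might have written.
	correct := []Segment{{ID: 1, StartTs: 40}}
	wrong := []Segment{{ID: 1, StartTs: 500}}

	fmt.Println(selectTargets(l0PosTs, correct)) // [1]
	fmt.Println(selectTargets(l0PosTs, wrong))   // []
}
```

With the wrong position the segment is never selected, so the deletes held by the L0 segment are never applied to it, and because compaction copies positions forward the mismatch survives every rewrite of that data.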

There is also a latent bug: mix and clustering compaction compute DmlPosition with getMinPosition, but DmlPosition represents the latest entity timestamp, so it should use the maximum.
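
To illustrate the min versus max distinction, here is a small sketch under stated assumptions. `Position`, `pick`, `minPosition`, and `maxPosition` are hypothetical names written for this example and are not the helpers in the Milvus codebase; only the timestamp is modeled.

```go
package main

import "fmt"

// Position is a hypothetical, simplified stand-in for a segment position;
// only the timestamp matters for this sketch.
type Position struct {
	Timestamp uint64
}

// pick returns the position preferred by better(candidate, current),
// skipping nil entries. It returns nil when no usable position exists.
func pick(positions []*Position, better func(a, b uint64) bool) *Position {
	var out *Position
	for _, p := range positions {
		if p == nil {
			continue
		}
		if out == nil || better(p.Timestamp, out.Timestamp) {
			out = p
		}
	}
	return out
}

// minPosition fits a start position: the earliest point any source covers.
func minPosition(positions []*Position) *Position {
	return pick(positions, func(a, b uint64) bool { return a < b })
}

// maxPosition fits a DML position: the latest entity timestamp across sources.
func maxPosition(positions []*Position) *Position {
	return pick(positions, func(a, b uint64) bool { return a > b })
}

func main() {
	// DML positions of three hypothetical source segments compacted together.
	dml := []*Position{{Timestamp: 120}, {Timestamp: 300}, nil, {Timestamp: 210}}

	fmt.Println("min:", minPosition(dml).Timestamp) // understates the latest write
	fmt.Println("max:", maxPosition(dml).Timestamp) // matches the latest write
}
```

Taking the minimum for the merged segment would claim that its newest entity was written at 120, even though one source holds data up to 300.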

See also: #46435

@sre-ci-robot sre-ci-robot added the size/XL Denotes a PR that changes 500-999 lines. label Apr 10, 2026
@mergify mergify Bot added dco-passed DCO check passed. kind/bug Issues or changes related a bug labels Apr 10, 2026
@sre-ci-robot

Contributor

[ci-v2-notice]
Notice: New ci-v2 system is enabled for this PR.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-build-all // for ci-v2/build-all (multi-arch builds)
  • /ci-rerun-buildenv // for ci-v2/build-env (build milvus-env builder images)
  • /ci-rerun-ut-integration // for ci-v2/ut-integration, will rerun ci-v2/build
  • /ci-rerun-ut-go // for ci-v2/ut-go, will rerun ci-v2/build
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp, will rerun ci-v2/build
  • /ci-rerun-e2e-default // for ci-v2/e2e-default
  • /ci-rerun-e2e-amd // for ci-v2/e2e-amd (e2e pool dispatcher)
  • /ci-rerun-build-ut-cov // for ci-v2/build-ut-cov (build + unit tests in one pipeline)
  • /ci-rerun-gosdk // for ci-v2/go-sdk (Go SDK E2E tests, ARM)

If you have any questions or requests, please contact @zhikunyao.

XuanYang-cn added a commit to XuanYang-cn/milvus that referenced this pull request Apr 10, 2026
…lect bypass

Compaction inherits StartPosition/DmlPosition from source segments via
getMinPosition without recalculating from actual data. The import position
bug (PR milvus-io#47276) wrote wrong timestamps on imported segments, and these
wrong positions persist and propagate through compaction. L0 compaction
then misses L1/L2 segments due to StartPosition mismatches, causing
zombie L0 segments and silent delete loss.

There is also a latent bug: DmlPosition in mix/clustering compaction
uses getMinPosition, but DmlPosition represents the latest entity
timestamp and should use max.

See also: milvus-io#46435
pr: milvus-io#48907

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
@sre-ci-robot

Contributor

✅ CI Loop Results d2dac49

| Stage | Result | Duration | Tests |
|-------|--------|----------|-------|
| ✅ Build | SUCCESS | 9.3min | - |

Total: 14min | Pipeline | Artifacts

@codecov

codecov Bot commented Apr 10, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.18%. Comparing base (cd88c79) to head (cd88c79).
⚠️ Report is 19 commits behind head on master.

⚠️ Current head cd88c79 differs from pull request most recent head ff981c0

Please upload reports for the commit ff981c0 to get more accurate results.

@@           Coverage Diff           @@
##           master   #48907   +/-   ##
=======================================
  Coverage   78.18%   78.18%           
=======================================
  Files        2174     2174           
  Lines      361086   361086           
=======================================
  Hits       282324   282324           
  Misses      70089    70089           
  Partials     8673     8673           

| Components | Coverage Δ |
|------------|------------|
| Client | 78.99% <0.00%> (ø) |
| Core | 84.64% <0.00%> (ø) |
| Go | 76.52% <0.00%> (ø) |

@sre-ci-robot

Contributor

✅ CI Loop Results b1f4f10

| Stage | Result | Duration | Tests |
|-------|--------|----------|-------|
| ✅ Build | SUCCESS | 12.6min | - |

Total: 17min | Pipeline | Artifacts

@sre-ci-robot sre-ci-robot added low-code-coverage add test-label from zhikun, diff coverage > 80% and removed low-code-coverage add test-label from zhikun, diff coverage > 80% labels Apr 10, 2026
@XuanYang-cn

Contributor Author

/ci-rerun-e2e-default

@XuanYang-cn

Contributor Author

/ci-rerun-gosdk

@mergify mergify Bot added the ci-passed label Apr 14, 2026
@sre-ci-robot sre-ci-robot added low-code-coverage add test-label from zhikun, diff coverage > 80% and removed ci-passed low-code-coverage add test-label from zhikun, diff coverage > 80% labels Apr 15, 2026
@XuanYang-cn

Contributor Author

/ci-rerun-ut-go

@sre-ci-robot sre-ci-robot added the low-code-coverage add test-label from zhikun, diff coverage > 80% label Apr 16, 2026
@sre-ci-robot sre-ci-robot removed the low-code-coverage add test-label from zhikun, diff coverage > 80% label Apr 16, 2026
@XuanYang-cn

Contributor Author

/ci-rerun-e2e-defaul

@XuanYang-cn

Contributor Author

/ci-rerun-e2e-default

XuanYang-cn added a commit to XuanYang-cn/milvus that referenced this pull request Apr 21, 2026
@mergify mergify Bot added the ci-passed label Apr 21, 2026
@tedxu

tedxu commented Apr 22, 2026

Contributor

This PR introduces a force mode for selecting L0 in compaction. It fixes the leftover L0 issue, but a performance regression is expected.

Per offline discussion, the leftover L0 segments should serve no purpose, and we should remove them in future PRs. This PR should be considered a temporary workaround.

@tedxu

tedxu commented Apr 22, 2026

Contributor

/lgtm
/approve

@sre-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tedxu, XuanYang-cn


@sre-ci-robot sre-ci-robot merged commit 8fb3fef into milvus-io:master Apr 22, 2026
13 of 15 checks passed
sre-ci-robot pushed a commit that referenced this pull request Apr 23, 2026
…lect bypass (#48910)

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
sre-ci-robot pushed a commit that referenced this pull request Apr 23, 2026
…lect bypass (#48909)

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
@XuanYang-cn XuanYang-cn deleted the silent-delete-loss branch April 27, 2026 03:19
XuanYang-cn added a commit to XuanYang-cn/milvus that referenced this pull request Apr 27, 2026
…lect bypass (milvus-io#48910)

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: yangxuan <xuan.yang@zilliz.com>

fix: Fast finish compaction when L0Comp hit zero L1/L2 (milvus-io#47154) (milvus-io#47187)

See also: milvus-io#46435, milvus-io#48907
pr: milvus-io#47154

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
XuanYang-cn added a commit to XuanYang-cn/milvus that referenced this pull request Apr 28, 2026
…ect bypass (milvus-io#48910)

fix: Fast finish compaction when L0Comp hit zero L1/L2 (milvus-io#47154) (milvus-io#47187)

See also: milvus-io#46435, milvus-io#48907
pr: milvus-io#47154
pr: milvus-io#48907

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
sre-ci-robot pushed a commit that referenced this pull request Apr 28, 2026
See also: #46435, #48907, #48909

pr: #47154

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
sre-ci-robot pushed a commit that referenced this pull request Apr 28, 2026
…) (#49376)

See also: #46435, #48907
pr: #47154

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
XuanYang-cn added a commit to XuanYang-cn/milvus that referenced this pull request Apr 29, 2026
…ect bypass (milvus-io#48910)

fix: Fast finish compaction when L0Comp hit zero L1/L2 (milvus-io#47154) (milvus-io#47187)

See also: milvus-io#46435, milvus-io#48907
pr: milvus-io#47154
pr: milvus-io#48907

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
sre-ci-robot pushed a commit that referenced this pull request Apr 29, 2026
See also: #46435, #48907
pr: #48910
pr: #47154

Backports two L0 compaction fixes:
- self-heal compaction segment positions and add L0 force-select bypass
- fast finish L0 compaction when there are zero L1/L2 target segments

Signed-off-by: yangxuan <xuan.yang@zilliz.com>

Labels

  • approved
  • ci-passed
  • dco-passed: DCO check passed.
  • kind/bug: Issues or changes related to a bug
  • lgtm
  • size/XL: Denotes a PR that changes 500-999 lines.
