
fix: honor kbagent action retry policy#10361

Draft
weicao wants to merge 1 commit into main from bugfix/kbagent-reconfigure-timeout-retry

Conversation


@weicao weicao commented Jun 11, 2026

Contributor

Problem

Redis reconfigure validation found that lifecycle actions that need a bounded wait can be killed before they converge, even when the addon declares a larger timeout:

  • timeoutSeconds: 120 is capped by kbagent at 60 seconds.
  • retryPolicy declared on the action is registered into the kbagent action table, but the action service only used req.RetryPolicy when executing the action.
  • The reconfigure path currently passes runtime ReconfigureArgs, but not a request-level retry policy, so an action-level retryPolicy is ineffective on this path.

As a result, the addon-side pattern of exiting non-zero and relying on retryPolicy is unreliable for reconfigure actions.

Solution

  • Raise kbagent maxActionCallTimeout from 60s to 120s and update the public Action timeout docs.
  • Resolve retry policy the same way timeout is resolved: request-level retry policy overrides action-level retry policy; otherwise the action-level policy is used.
  • Apply retry handling to non-blocking calls without runtime arguments as well.
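
The resolution rule in the second bullet can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `RetryPolicy` here is a simplified stand-in type, and `resolveRetryPolicy` is a hypothetical helper name.

```go
package main

import "fmt"

// RetryPolicy is a simplified, hypothetical stand-in for the retry policy
// carried by a request or declared on an action.
type RetryPolicy struct {
	MaxRetries int
}

// resolveRetryPolicy is a hypothetical helper: a request-level policy wins
// when present; otherwise the action-level policy is used. The result may be
// nil if neither level declares a policy.
func resolveRetryPolicy(requestLevel, actionLevel *RetryPolicy) *RetryPolicy {
	if requestLevel != nil {
		return requestLevel
	}
	return actionLevel
}

func main() {
	actionPolicy := &RetryPolicy{MaxRetries: 3}

	// Reconfigure-like path: no request-level policy is passed, so the
	// action-level policy takes effect.
	fmt.Println(resolveRetryPolicy(nil, actionPolicy).MaxRetries)

	// A caller that does pass a request-level policy overrides the action.
	fmt.Println(resolveRetryPolicy(&RetryPolicy{MaxRetries: 1}, actionPolicy).MaxRetries)
}
```

Falling back to the action-level value mirrors how the description says timeout is already resolved, so a path that passes no request-level policy still honors what the addon declared.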

Validation

Local validation passed:

go test ./pkg/kbagent/service -count=1
go test ./pkg/kbagent/... -count=1
make test-go-generate
go test ./pkg/kbagent/service ./pkg/controller/instanceset ./pkg/controller/instance -count=1
git diff --check

Additional test note:

  • go test ./pkg/controller/component -count=1 could not run on this machine because envtest assets are missing: /usr/local/kubebuilder/bin/etcd not found. The package did not reach product assertions.

Boundary

This PR is a controller/kbagent candidate for the Redis task #46 timeout/retry gap. It still needs exact-image runtime validation on the affected reconfigure scenario before being treated as test-accepted or ready for non-draft review.

@apecloud-bot

Collaborator

Auto Cherry-pick Instructions

Usage:
  - /nopick: Not auto cherry-pick when PR merged.
  - /pick: release-x.x [release-x.x]: Auto cherry-pick to the specified branch when PR merged.

Example:
  - /nopick
  - /pick release-1.1

CLA Recheck Instructions

Usage:
  - /recheck-cla: Trigger a re-check of CLA status for this pull request.
Example:
  - /recheck-cla

@github-actions github-actions Bot added the size/L Denotes a PR that changes 100-499 lines. label Jun 11, 2026
2 participants