Fixed failing KleidiAI NHWC unit tests by martin-klacer-arm · Pull Request #29010 · microsoft/onnxruntime

martin-klacer-arm · 2026-06-11T18:32:22Z

Description

The current tests in the NhwcTransformerTests suite ConvFloat_UsesNhwcOnlyWithKleidi and FusedConvFloat_UsesNhwcOnlyWithKleidi assumed that whenever KleidiAI NHWC float-conv support is available, the test graph must be rewritten to com.microsoft.NhwcFusedConv.

That assumption is not valid when ONNX Runtime is built with --enable_arm_neon_nchwc. In that cofiguration, the Level 3 NCHWc transformer is registered before the NHWC transformer, so the NCHWc rewrite can be selected instead. The optimiser priority is intentional, so these tests shouldn't require NHWC to be chosen over NCHWc.

This change keeps the existing optimiser ordering and instead updates the assertions in the 2 tests. If the NHWC path is selected, the tests still validate the expected NhwcFusedConv graph shape and verify that the path is only used when KleidiAI NHWC support is available. If another valid layout optimisation is selected first, the tests no longer fail just because the NhwcFusedConv op isn't generated.

Motivation and Context

This change fixes the 2 mentioned unit tests which fail when ONNX Runtime is built and tested with --enable_arm_neon_nchwc.

* Fixed tests ConvFloat_UsesNhwcOnlyWithKleidi, FusedConvFloat_UsesNhwcOnlyWithKleidi * Added check to only test NHWC shape parameters if NHWC path was taken in graph optimisation Signed-off-by: Martin Klacer <martin.klacer@arm.com>

hariharans29 · 2026-06-11T21:24:04Z

Verdict: Request changes — the fix is too permissive, and there is a one-line tighter alternative readily available

The diagnosis is correct: when ORT is built with --enable_arm_neon_nchwc, NchwcTransformer is registered at Level 3 and can win the layout-conversion race against the NHWC transformer, so the original EXPECT_EQ(NhwcFusedConv, 1) is genuinely wrong for that build config. But the proposed fix throws away too much coverage. The condition that drives the conflict is statically detectable in the test — let's gate on that rather than silently passing whenever no NHWC fused conv appears.

The coverage gap introduced by this patch

The original test made two assertions, depending on HasFloatNhwcNoTransposeSupport():

	KleidiAI available	KleidiAI not available
Original	`NhwcFusedConv == 1` (must use NHWC)	`NhwcFusedConv == 0` (must not use NHWC)
This PR	`NhwcFusedConv == 1` (if any was emitted)	early return

In the no-NCHWc + KleidiAI-available build (the common ARM64 CI configuration), the original test catches a regression where the NHWC transformer fails to fire. The new test passes silently in that case — op_to_count["com.microsoft.NhwcFusedConv"] == 0 triggers an early return, and the bug goes undetected.

The test name (..._UsesNhwcOnlyWithKleidi) implies it verifies the NHWC path is actually selected when KleidiAI is available. After this patch, it verifies a strictly weaker contract: "if NHWC was selected, KleidiAI must have been available." That is the contrapositive of half of what the test used to check.

A better fix exists, one line away

The same condition that registers NchwcTransformer is checked at runtime in graph_transformer_utils.cc:466:

// Register the NCHWc layout transformer if supported by the platform.
if (MlasNchwcGetBlockSize() > 1) {
  transformers.emplace_back(std::make_unique<NchwcTransformer>());
}

mlas.h is already included in nhwc_transformer_test.cc:11, so MlasNchwcGetBlockSize() is directly callable from the test. Use it to branch the assertion on the same flag the production code uses:

auto check_nhwc_graph = [&](InferenceSessionWrapper& session) {
  auto op_to_count = CountOpsInGraph(session.GetGraph());
  const bool nchwc_can_intercept = MlasNchwcGetBlockSize() > 1;
  const bool expect_nhwc = HasFloatNhwcNoTransposeSupport({1, 8, 7, 7}, {16, 8, 3, 3}, {1, 1, 1, 1});

  if (nchwc_can_intercept) {
    // NCHWc transformer runs first and may rewrite the graph before the NHWC transformer.
    // Either NCHWc or NHWC is acceptable; we only verify "no NHWC without Kleidi support".
    if (op_to_count["com.microsoft.NhwcFusedConv"] > 0) {
      EXPECT_TRUE(expect_nhwc);
      EXPECT_EQ(op_to_count["com.microsoft.NhwcFusedConv"], 1);
      EXPECT_EQ(op_to_count["Transpose"], 2);
    }
    return;
  }

  // Original strict contract: NHWC is selected iff KleidiAI provides float NHWC support.
  EXPECT_EQ(op_to_count["com.microsoft.NhwcFusedConv"], expect_nhwc ? 1 : 0);
  EXPECT_EQ(op_to_count["Transpose"], expect_nhwc ? 2 : 0);
};

This preserves the full original contract on every build that doesn't have NCHWc enabled (the vast majority), and only relaxes it on the build that genuinely cannot guarantee the layout choice. Same one-file scope as the current patch.

Even better — if NchwcConv/NchwcMaxPool/etc. are the expected op types the NCHWc transformer emits, you can tighten the NCHWc branch further by asserting one of those appears, so the test still validates some layout transformation actually ran. (Worth checking what op names nchwc_transformer.cc produces and adding EXPECT_GT(nchwc_op_count, 0).)

Unrelated and unexplained: the tolerance bump

nhwc_transformer_test.cc:534 (the second test only):

-                    /*per_sample_tolerance*/ 1e-6,
+                    /*per_sample_tolerance*/ 2e-6,
                     /*relative_per_sample_tolerance*/ 1e-6);

The first test (ConvFloat_UsesNhwcOnlyWithKleidi) keeps 1e-6 — so this isn't a uniform Kleidi-related precision relaxation, it's specific to the FusedConv+Relu case. The PR description doesn't mention numerical tolerance at all.

Either:

The original 1e-6 was flaky for this particular case and the bump is fine (please say so in the description and ideally describe which path drifts), or
This was an unrelated drive-by relax that should be in a separate PR.

Without a justification it's a small but real loss of precision coverage that a reviewer can't tell whether to accept.

Smaller things

The early-return comment is good but worth naming the trigger: // NCHWc transformer (registered under --enable_arm_neon_nchwc) may rewrite this graph before NHWC runs, so an absent NhwcFusedConv is acceptable. Future readers shouldn't have to dig through git blame to learn why.
Both tests have identical check_nhwc_graph skeletons that differ only in the HasFloatNhwcNoTransposeSupport argument list. A small helper could share the logic, but that's a cleanup, not a request.
If you opt for the gated-strict-assertion approach above, the test name UsesNhwcOnlyWithKleidi still describes what it does in the dominant build. With the current patch, the name is misleading in the no-NCHWc + KleidiAI case (the test no longer ensures NHWC is actually used).

Bottom line

Right diagnosis, fix too broad. Please gate the existing strict assertion on MlasNchwcGetBlockSize() > 1 so the strict check survives in builds that don't have the NCHWc transformer registered, and either justify or drop the 1e-6 → 2e-6 tolerance change. Happy to re-approve once those two are addressed.

Copilot

Pull request overview

This PR updates NhwcTransformerTests to stop assuming that KleidiAI NHWC float-conv availability always implies the graph will be rewritten to com.microsoft.NhwcFusedConv, which can be false when NCHWc layout optimizations run earlier (e.g., with --enable_arm_neon_nchwc).

Changes:

Relaxed assertions in two NHWC transformer tests to only validate NhwcFusedConv-specific graph shape when that path is actually selected.
Increased the per-sample tolerance for the fused-conv test.

martin-klacer-arm · 2026-06-12T15:22:22Z

+    // Only validate NhwcFusedConv graph shape if that path was selected
+    if (op_to_count["com.microsoft.NhwcFusedConv"] == 0) {
+      return;
+    }
+    EXPECT_TRUE(HasFloatNhwcNoTransposeSupport({1, 8, 7, 7}, {16, 8, 3, 3}, {1, 1, 1, 1}));
+    EXPECT_EQ(op_to_count["com.microsoft.NhwcFusedConv"], 1);
+    EXPECT_EQ(op_to_count["Transpose"], 2);


Addressed this in the latest commit, added an assert to the early return to ensure that if KleidiAI is enabled and an NhwcFusedConv isn't generated, a different optimiser consumed the Conv node rather than this being caused by an NhwcTransformer failure.

Tried to make this not directly tied to NCHWc as that would mean whenever a new Transformer is added that would take precedence over NHWC here in some cases, we'd need to update the test. My commit attempts to generalise this to any Transformer without needing to know the specifics, but I'm open to discussion or any suggestions about this point.

martin-klacer-arm · 2026-06-12T15:22:43Z

+    // Only validate NhwcFusedConv graph shape if that path was selected
+    if (op_to_count["com.microsoft.NhwcFusedConv"] == 0) {
+      return;
+    }
+    EXPECT_TRUE(HasFloatNhwcNoTransposeSupport({1, 8, 7, 7}, {16, 8, 3, 3}, {1, 1, 1, 1}, {1, 1}));
+    EXPECT_EQ(op_to_count["com.microsoft.NhwcFusedConv"], 1);
+    EXPECT_EQ(op_to_count["Transpose"], 2);


Same reply as the previous Copilot comment

…merTests * Affected tests: ConvFloat_UsesNhwcOnlyWithKleidi, FusedConvFloat_UsesNhwcOnlyWithKleidi Signed-off-by: Martin Klacer <martin.klacer@arm.com>

martin-klacer-arm · 2026-06-12T15:19:43Z

Thanks for the review @hariharans29, that makes sense. I agreed the previous early return was too permissive, so I updated the two affected NhwcTransformerTests to tighten the checks slightly as well as to make the intention behind the tests clearer in code.

The tests now separate the two cases explicitly: if NhwcFusedConv is produced, they still validate that KleidiAI NHWC support is available and that the expected transpose-boundary graph shape is present. If NhwcFusedConv is not produced but KleidiAI NHWC support is available, they now assert that the original Conv / com.microsoft.FusedConv node was consumed by another layout optimizer. That should preserve coverage for the regression you called out, while still allowing NCHWc or another earlier Level 3 layout optimizer to take precedence without making the tests depend specifically on MlasNchwcGetBlockSize() or the NCHWc case.

Since the test isn't validating NCHWc functionality, I'm attempting to find a solution that doesn't tie it into the test directly, but I'm open to discussion on this.

I also added a short comment for the FusedConvFloat_UsesNhwcOnlyWithKleidi tolerance bump. The slightly larger absolute tolerance is only to account for different optimizer paths changing accumulation order; the relative tolerance is unchanged.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

+      // When the Arm® KleidiAI™ NHWC Conv path is available but no NHWC ops were produced,
+      // ensure that some other optimiser run and consumed the Conv node instead
+      EXPECT_TRUE(!kleidi_supported || op_to_count["Conv"] == 0);


+      // When the Arm® KleidiAI™ NHWC Conv path is available but no NHWC ops were produced,
+      // ensure that some other optimiser run and consumed the FusedConv node instead
+      EXPECT_TRUE(!kleidi_supported || op_to_count["com.microsoft.FusedConv"] == 0);


Fixed failing KleidiAI NHWC failing unit tests

07a9073

* Fixed tests ConvFloat_UsesNhwcOnlyWithKleidi, FusedConvFloat_UsesNhwcOnlyWithKleidi * Added check to only test NHWC shape parameters if NHWC path was taken in graph optimisation Signed-off-by: Martin Klacer <martin.klacer@arm.com>

hariharans29 requested a review from Copilot June 11, 2026 21:21

Copilot started reviewing on behalf of hariharans29 June 11, 2026 21:22 View session

Copilot AI reviewed Jun 11, 2026

View reviewed changes

Cleared up expected values and added fallback check to 2 NhwcTransfor…

795be01

…merTests * Affected tests: ConvFloat_UsesNhwcOnlyWithKleidi, FusedConvFloat_UsesNhwcOnlyWithKleidi Signed-off-by: Martin Klacer <martin.klacer@arm.com>

hariharans29 requested a review from Copilot June 12, 2026 17:09

Copilot started reviewing on behalf of hariharans29 June 12, 2026 17:09 View session

Copilot AI reviewed Jun 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed failing KleidiAI NHWC unit tests#29010

Fixed failing KleidiAI NHWC unit tests#29010
martin-klacer-arm wants to merge 2 commits into
microsoft:mainfrom
martin-klacer-arm:markla01_fix_nhwc_failure

martin-klacer-arm commented Jun 11, 2026

Uh oh!

hariharans29 commented Jun 11, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

martin-klacer-arm Jun 12, 2026

Uh oh!

martin-klacer-arm Jun 12, 2026

Uh oh!

Uh oh!

martin-klacer-arm commented Jun 12, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

martin-klacer-arm commented Jun 11, 2026

Description

Motivation and Context

Uh oh!

hariharans29 commented Jun 11, 2026

Verdict: Request changes — the fix is too permissive, and there is a one-line tighter alternative readily available

The coverage gap introduced by this patch

A better fix exists, one line away

Unrelated and unexplained: the tolerance bump

Smaller things

Bottom line

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

martin-klacer-arm Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

martin-klacer-arm Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

martin-klacer-arm commented Jun 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants