Fixed failing KleidiAI NHWC unit tests #29010
* Fixed tests ConvFloat_UsesNhwcOnlyWithKleidi, FusedConvFloat_UsesNhwcOnlyWithKleidi
* Added check to only test NHWC shape parameters if NHWC path was taken in graph optimisation

Signed-off-by: Martin Klacer <martin.klacer@arm.com>
Verdict: Request changes. The fix is too permissive, and there is a one-line tighter alternative readily available.

The diagnosis is correct: when ORT is built with

The coverage gap introduced by this patch

The original test made two assertions, depending on

In the no-NCHWc + KleidiAI-available build (the common ARM64 CI configuration), the original test catches a regression where the NHWC transformer fails to fire. The new test passes silently in that case. The test name (

A better fix exists, one line away

The same condition that registers

// Register the NCHWc layout transformer if supported by the platform.
if (MlasNchwcGetBlockSize() > 1) {
  transformers.emplace_back(std::make_unique<NchwcTransformer>());
}

auto check_nhwc_graph = [&](InferenceSessionWrapper& session) {
  auto op_to_count = CountOpsInGraph(session.GetGraph());
  const bool nchwc_can_intercept = MlasNchwcGetBlockSize() > 1;
  const bool expect_nhwc = HasFloatNhwcNoTransposeSupport({1, 8, 7, 7}, {16, 8, 3, 3}, {1, 1, 1, 1});
  if (nchwc_can_intercept) {
    // NCHWc transformer runs first and may rewrite the graph before the NHWC transformer.
    // Either NCHWc or NHWC is acceptable; we only verify "no NHWC without Kleidi support".
    if (op_to_count["com.microsoft.NhwcFusedConv"] > 0) {
      EXPECT_TRUE(expect_nhwc);
      EXPECT_EQ(op_to_count["com.microsoft.NhwcFusedConv"], 1);
      EXPECT_EQ(op_to_count["Transpose"], 2);
    }
    return;
  }
  // Original strict contract: NHWC is selected iff KleidiAI provides float NHWC support.
  EXPECT_EQ(op_to_count["com.microsoft.NhwcFusedConv"], expect_nhwc ? 1 : 0);
  EXPECT_EQ(op_to_count["Transpose"], expect_nhwc ? 2 : 0);
};

This preserves the full original contract on every build that doesn't have NCHWc enabled (the vast majority), and only relaxes it on the build that genuinely cannot guarantee the layout choice. Same one-file scope as the current patch. Even better, if

Unrelated and unexplained: the tolerance bump

nhwc_transformer_test.cc:534 (the second test only):

- /*per_sample_tolerance*/ 1e-6,
+ /*per_sample_tolerance*/ 2e-6,
  /*relative_per_sample_tolerance*/ 1e-6);

The first test (

Either:

Without a justification it's a small but real loss of precision coverage that a reviewer can't tell whether to accept.

Smaller things

Bottom line

Right diagnosis, fix too broad. Please gate the existing strict assertion on
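To make the tolerance concern in the review concrete, here is a minimal sketch of how a combined absolute and relative per-sample check typically behaves. The helper name WithinTolerance and the formula |actual - expected| <= abs_tol + rel_tol * |expected| are assumptions for illustration only; they are not taken from the ONNX Runtime test harness, which may combine the two tolerances differently.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not an ONNX Runtime API): returns true when `actual`
// lies within a combined absolute + relative tolerance of `expected`.
// The formula below is an assumed, commonly used form.
bool WithinTolerance(double expected, double actual,
                     double per_sample_tolerance,
                     double relative_per_sample_tolerance) {
  const double allowed =
      per_sample_tolerance + relative_per_sample_tolerance * std::fabs(expected);
  return std::fabs(actual - expected) <= allowed;
}
```

Under this assumed formula, an output that deviates by 2.5e-6 from an expected value of 1.0 is rejected with a per-sample tolerance of 1e-6 and accepted with 2e-6, which is the kind of drift an unexplained bump could mask.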
Pull request overview
This PR updates NhwcTransformerTests to stop assuming that KleidiAI NHWC float-conv availability always implies the graph will be rewritten to com.microsoft.NhwcFusedConv, which can be false when NCHWc layout optimizations run earlier (e.g., with --enable_arm_neon_nchwc).
Changes:
- Relaxed assertions in two NHWC transformer tests to only validate NhwcFusedConv-specific graph shape when that path is actually selected.
- Increased the per-sample tolerance for the fused-conv test.
// Only validate NhwcFusedConv graph shape if that path was selected
if (op_to_count["com.microsoft.NhwcFusedConv"] == 0) {
  return;
}
EXPECT_TRUE(HasFloatNhwcNoTransposeSupport({1, 8, 7, 7}, {16, 8, 3, 3}, {1, 1, 1, 1}));
EXPECT_EQ(op_to_count["com.microsoft.NhwcFusedConv"], 1);
EXPECT_EQ(op_to_count["Transpose"], 2);
Addressed this in the latest commit, added an assert to the early return to ensure that if KleidiAI is enabled and an NhwcFusedConv isn't generated, a different optimiser consumed the Conv node rather than this being caused by an NhwcTransformer failure.
I tried to avoid tying this directly to NCHWc, since that would mean updating the test whenever a new transformer is added that can take precedence over NHWC in some cases. My commit attempts to generalise the check to any transformer without needing to know the specifics, but I'm open to discussion or any suggestions about this point.
// Only validate NhwcFusedConv graph shape if that path was selected
if (op_to_count["com.microsoft.NhwcFusedConv"] == 0) {
  return;
}
EXPECT_TRUE(HasFloatNhwcNoTransposeSupport({1, 8, 7, 7}, {16, 8, 3, 3}, {1, 1, 1, 1}, {1, 1}));
EXPECT_EQ(op_to_count["com.microsoft.NhwcFusedConv"], 1);
EXPECT_EQ(op_to_count["Transpose"], 2);
Same reply as the previous Copilot comment
…merTests
* Affected tests: ConvFloat_UsesNhwcOnlyWithKleidi, FusedConvFloat_UsesNhwcOnlyWithKleidi

Signed-off-by: Martin Klacer <martin.klacer@arm.com>
Thanks for the review @hariharans29, that makes sense. I agree the previous early return was too permissive, so I updated the two affected

The tests now separate the two cases explicitly: if

Since the test isn't validating NCHWc functionality, I'm attempting to find a solution that doesn't tie it into the test directly, but I'm open to discussion on this. I also added a short comment for the
// When the Arm® KleidiAI™ NHWC Conv path is available but no NHWC ops were produced,
// ensure that some other optimiser ran and consumed the Conv node instead
EXPECT_TRUE(!kleidi_supported || op_to_count["Conv"] == 0);

// When the Arm® KleidiAI™ NHWC Conv path is available but no NHWC ops were produced,
// ensure that some other optimiser ran and consumed the FusedConv node instead
EXPECT_TRUE(!kleidi_supported || op_to_count["com.microsoft.FusedConv"] == 0);
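The two cases discussed in this thread (NHWC path taken, or NHWC path skipped because another optimiser consumed the node) can be sketched as a standalone predicate over op counts. This is an illustrative sketch only: OpCounts, CountOf and NhwcContractHolds are hypothetical names, and a plain std::map stands in for whatever CountOpsInGraph returns in the real test.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the op-count map used in the tests.
using OpCounts = std::map<std::string, int>;

// Looks up a count without inserting missing keys (unlike operator[]).
int CountOf(const OpCounts& counts, const std::string& op) {
  const auto it = counts.find(op);
  return it == counts.end() ? 0 : it->second;
}

// Hypothetical predicate mirroring the relaxed contract:
//  - NHWC path taken: KleidiAI support is required, with exactly one
//    NhwcFusedConv and two Transpose nodes.
//  - NHWC path not taken: acceptable if KleidiAI support is absent, or if the
//    original op (e.g. "Conv") is gone, meaning another optimiser consumed it.
bool NhwcContractHolds(const OpCounts& counts, bool kleidi_supported,
                       const std::string& original_op) {
  const int nhwc = CountOf(counts, "com.microsoft.NhwcFusedConv");
  if (nhwc == 0) {
    return !kleidi_supported || CountOf(counts, original_op) == 0;
  }
  return kleidi_supported && nhwc == 1 && CountOf(counts, "Transpose") == 2;
}
```

With this shape, a graph that still contains the original Conv while KleidiAI support is reported fails the check, so a silent failure of the NHWC transformer is still caught without naming NCHWc in the test.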
Description
The current tests in the NhwcTransformerTests suite, ConvFloat_UsesNhwcOnlyWithKleidi and FusedConvFloat_UsesNhwcOnlyWithKleidi, assumed that whenever KleidiAI NHWC float-conv support is available, the test graph must be rewritten to com.microsoft.NhwcFusedConv.

That assumption is not valid when ONNX Runtime is built with --enable_arm_neon_nchwc. In that configuration, the Level 3 NCHWc transformer is registered before the NHWC transformer, so the NCHWc rewrite can be selected instead. The optimiser priority is intentional, so these tests shouldn't require NHWC to be chosen over NCHWc.

This change keeps the existing optimiser ordering and instead updates the assertions in the 2 tests. If the NHWC path is selected, the tests still validate the expected NhwcFusedConv graph shape and verify that the path is only used when KleidiAI NHWC support is available. If another valid layout optimisation is selected first, the tests no longer fail just because the NhwcFusedConv op isn't generated.

Motivation and Context

This change fixes the 2 mentioned unit tests which fail when ONNX Runtime is built and tested with --enable_arm_neon_nchwc.
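The ordering effect described in the Description can be illustrated with a toy model in which transformers run in registration order and the first one to rewrite a Conv leaves nothing for later ones. Everything here is hypothetical: Graph, Transformer, MakeConvRewriter, RunInOrder and the op name strings are illustrative stand-ins, not ONNX Runtime types or real op names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy model: a "graph" is just a list of op names.
using Graph = std::vector<std::string>;
using Transformer = std::function<void(Graph&)>;

// Builds a toy transformer that rewrites every remaining "Conv" into
// `replacement`. Ops that were already rewritten are left untouched.
Transformer MakeConvRewriter(const std::string& replacement) {
  return [replacement](Graph& graph) {
    for (auto& op : graph) {
      if (op == "Conv") {
        op = replacement;
      }
    }
  };
}

// Applies the transformers in registration order and returns the result.
Graph RunInOrder(Graph graph, const std::vector<Transformer>& transformers) {
  for (const auto& transformer : transformers) {
    transformer(graph);
  }
  return graph;
}
```

In this model, registering the NCHWc-style rewriter first means the NHWC-style rewriter never sees a Conv, which is why a test that insists on the NHWC result cannot hold on such a build.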