[None] [feat] Add densegemm backend for MoE #10479
298c140 to edc2b8f
/bot run
PR_Github #37813 [ run ] triggered by Bot. Commit:
PR_Github #37813 [ run ] completed with state
517f8fc to afed3ed
/bot run --disable-fail-fast
PR_Github #38695 [ run ] triggered by Bot. Commit:
📝 Walkthrough
This PR introduces Dense GEMM-based MoE support for NVFP4 quantization, adding Blackwell SM100 kernels with SwiGLU fusion (FC1) and FC2 dense projection paths, custom PyTorch operations, a new …
Sequence Diagram
sequenceDiagram
participant Input as Input Tensor
participant Router as Router
participant Quant as NVFP4 Quantizer
participant FC1 as Dense GEMM FC1<br/>(SwiGLU Fusion)
participant FC2Alpha as FC2 Alpha Gen<br/>(Optional Fused)
participant FC2 as Dense GEMM FC2
participant Output as Output Tensor
Input->>Router: token logits
Router->>Quant: routing decisions<br/>expert assignments
Quant->>Quant: quantize input to NVFP4<br/>compute scales
alt Fused FC2 Alpha Path
Quant->>FC1: x, x_sf, weights<br/>alpha, weight_scales
FC1->>FC1: SwiGLU fusion<br/>per-expert scaling
FC1->>FC2Alpha: FC1 output<br/>expert scales
FC2Alpha->>FC2Alpha: gen_fc2_alpha_fused<br/>normalized alpha
FC2Alpha->>FC2: alpha_max (scalar)
else Standard Path
Quant->>FC1: x, x_sf, weights<br/>alpha, weight_scales
FC1->>FC1: SwiGLU fusion<br/>per-expert scaling
FC1->>FC2: FC1 output
FC2->>FC2: per-token-per-expert<br/>alpha computation
end
FC2->>FC2: dense GEMM projection<br/>per-expert scaling
FC2->>Output: final MoE output<br/>output_scale (FP4)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/modules/fused_moe/create_moe.py (1)
148-153: ⚠️ Potential issue | 🟠 Major
Add `DenseGEMMFusedMoE` to the load-balancer allowlist.
The new DenseGEMM branch forwards `init_load_balancer`, but the upfront assertion still excludes `DenseGEMMFusedMoE`. Any DenseGEMM call that reaches `create_moe_backend()` with MoE load balancing enabled will raise before construction, especially when `ENABLE_CONFIGURABLE_MOE=0` or callers use `create_moe_backend()` directly.
🛠️ Suggested fix
```diff
 moe_load_balancer = get_moe_load_balancer()
 if moe_load_balancer is not None:
     assert moe_cls in [
         WideEPMoE, CutlassFusedMoE, TRTLLMGenFusedMoE, CuteDslFusedMoE,
-        DeepGemmFusedMoE
-    ], "MoE Load Balance is only supported in WideEPMoE, CutlassFusedMoE, TRTLLMGenFusedMoE, CuteDslFusedMoE, and DeepGemmFusedMoE."
+        DeepGemmFusedMoE, DenseGEMMFusedMoE
+    ], "MoE Load Balance is only supported in WideEPMoE, CutlassFusedMoE, TRTLLMGenFusedMoE, CuteDslFusedMoE, DeepGemmFusedMoE, and DenseGEMMFusedMoE."
```
Also applies to: 290-305
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/modules/fused_moe/create_moe.py` around lines 148 - 153, The assertion that guards MoE load-balancer support excludes the new DenseGEMMFusedMoE class, causing calls that forward init_load_balancer (e.g., via create_moe_backend when moe_load_balancer is set) to raise; update the allowlist in the assertion that references WideEPMoE, CutlassFusedMoE, TRTLLMGenFusedMoE, CuteDslFusedMoE, DeepGemmFusedMoE to also include DenseGEMMFusedMoE, and make the same change for the second identical assertion later in the file so DenseGEMMFusedMoE is permitted wherever load balancing is checked.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/custom_ops/cute_dsl_custom_ops.py`:
- Around line 3035-3042: The cache key tuple used when caching DenseGEMM kernels
(the variable named cache_key built from self.weight_per_expert, mma_tiler_mn,
cluster_shape_mn, self.scaling_vector_size, self.expert_count, and the
alpha_post flag) must include the output element type; add output_dtype (or the
name used in the method that represents the requested output element type) into
that tuple so BF16-compiled kernels cannot be reused for FP16/FP32/FP4 calls.
Apply the same change to the other occurrence that builds a DenseGEMM cache_key
(the second cache_key construction in the class) so both cache keys include
output_dtype.
- Around line 2895-2900: The mapping that sets c_cutlass_dtype currently uses
.get(self.output_dtype, cutlass.BFloat16) and silently falls back to BF16;
change this to explicitly validate self.output_dtype and raise a clear exception
(e.g., ValueError) when the dtype is unsupported instead of defaulting to
cutlass.BFloat16. Locate the assignments to c_cutlass_dtype in the DenseGEMM
runners (the tactic-probing and forward paths for the FC1 and FC2 runners) where
the mapping dict is used and replace the .get default with an explicit
lookup+error path that names self.output_dtype and the allowed dtypes. Apply the
same fix to the other occurrences referenced in the review (the FC1/FC2
tactic-probing and forward blocks) so both probing and runtime use fail-fast
behavior.
In `@tensorrt_llm/_torch/cute_dsl_kernels/blackwell/moe_as_dense_gemm/fc2.py`:
- Around line 2288-2302: The swizzled SF layouts use floor division (m // 128, n
// 128) which drops partial 128-wide tiles and can produce zero-sized tensors;
replace those with ceil-div tile counts (e.g. compute m_tiles = (m + 127)//128
and n_tiles = (n + 127)//128) and use m_tiles/n_tiles in the calls to
cute.make_ordered_layout for a_sf and b_sf (or add explicit 128-tile guards in
the validation path), so the layouts always account for partial tiles and never
produce zero-length dimensions when m<128 or n not divisible by 128.
- Around line 1-27: The file
tensorrt_llm/_torch/cute_dsl_kernels/blackwell/moe_as_dense_gemm/fc2.py
currently has a BSD-3-Clause header; replace that entire BSD header block with
the repository-standard NVIDIA Apache-2.0 license header using the latest
modification year (2026) and include any necessary upstream attribution
separately (do not mix BSD text into the file header). Ensure the new header
matches the project template used across *.py sources (Apache-2.0 text and
NVIDIA copyright line), and leave the rest of fc2.py unchanged.
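The scale-factor layout finding is plain tile arithmetic: floor division discards a trailing partial tile, while ceiling division counts it. This minimal sketch covers only the tile counts, not the `cute.make_ordered_layout` call itself, and `sf_tile_counts` is a hypothetical helper.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return (a + b - 1) // b


def sf_tile_counts(m: int, n: int, tile: int = 128) -> tuple:
    """Count 128-wide tiles along M and N, including a trailing partial tile."""
    return ceil_div(m, tile), ceil_div(n, tile)


m, n = 96, 200
print((m // 128, n // 128))  # (0, 1): floor drops the partial tile, M becomes 0
print(sf_tile_counts(m, n))  # (1, 2)
```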
In `@tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py`:
- Around line 1095-1100: The dispatch logic in configurable_moe.py fails to pass
backend-specific kwargs to DenseGEMMFusedMoE because the earlier tuple check
(CutlassFusedMoE, DeepGemmFusedMoE, CuteDslFusedMoE, DenseGEMMFusedMoE) is a
no-op and later branches use exact class equality (self.backend.__class__ ==) so
DenseGEMMFusedMoE is skipped; fix by either adding an explicit branch for
DenseGEMMFusedMoE that passes enable_alltoall and moe_output (e.g., elif
self.backend.__class__ == DenseGEMMFusedMoE: ...) or, preferably, change the
equality checks to isinstance(self.backend, CutlassFusedMoE) /
isinstance(self.backend, DenseGEMMFusedMoE) (or a common base) so inherited
classes receive the correct kwargs in the dispatch that sets enable_alltoall and
moe_output for the backend.
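The dispatch bug is a general Python pitfall: `obj.__class__ == Base` is false for a subclass, while `isinstance(obj, Base)` is true. The sketch uses empty stand-in classes (at review time `DenseGEMMFusedMoE` still derived from `CutlassFusedMoE`) and placeholder kwarg values; it is not the real `ConfigurableMoE` code.

```python
class CutlassFusedMoE:  # stand-in, not the real class
    pass


class DenseGEMMFusedMoE(CutlassFusedMoE):  # stand-in; subclass as at review time
    pass


EXTRA_KWARGS = {"enable_alltoall": False, "moe_output": None}  # placeholder values


def kwargs_by_exact_class(backend) -> dict:
    # Exact-class test: subclasses fall through and receive nothing.
    if backend.__class__ == CutlassFusedMoE:
        return dict(EXTRA_KWARGS)
    return {}


def kwargs_by_isinstance(backend) -> dict:
    # isinstance() also matches subclasses of CutlassFusedMoE.
    if isinstance(backend, CutlassFusedMoE):
        return dict(EXTRA_KWARGS)
    return {}


backend = DenseGEMMFusedMoE()
print(kwargs_by_exact_class(backend))  # {}
print(kwargs_by_isinstance(backend))   # {'enable_alltoall': False, 'moe_output': None}
```

A later commit makes `DenseGEMMFusedMoE` inherit from the `MoE` base class directly, so a subclass check against `CutlassFusedMoE` would then no longer match it; an explicit `DenseGEMMFusedMoE` check is needed at that point.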
In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_densegemm.py`:
- Around line 67-75: The fallback to compute expert_size as
token_final_scales.shape[1] when alpha is None is wrong because that is top_k
not total experts and causes scatter_ indexing OOB; change the logic in the
block that builds fc2_alpha (around token_selected_experts, alpha,
token_final_scales, fc2_alpha, scatter_) so expert_size is the actual expert
count (either use an explicit expert_count argument or compute expert_count =
int(token_selected_experts.max().item()) + 1) when alpha is None, allocate
fc2_alpha with that expert_size, and then perform the scatter_. Ensure
token_selected_experts.long() still indexes into the resized fc2_alpha.
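To see why sizing the table by `token_final_scales.shape[1]` (that is, `top_k`) fails, here is a pure-Python analogue of a `scatter_` along dim 1, where `table[t][idx[t][j]] = src[t][j]`. `build_fc2_alpha` and the toy routing data are hypothetical; in PyTorch the out-of-range index raises a runtime error (a device-side assert on CUDA) rather than an `IndexError`.

```python
def build_fc2_alpha(token_selected_experts, token_final_scales, expert_count):
    """Dense [num_tokens][expert_count] routing-scale table; 0.0 = expert not selected.

    Pure-Python analogue of fc2_alpha.scatter_(1, token_selected_experts, token_final_scales).
    """
    table = [[0.0] * expert_count for _ in token_selected_experts]
    for t, (experts, scales) in enumerate(zip(token_selected_experts, token_final_scales)):
        for e, s in zip(experts, scales):
            table[t][e] = s  # IndexError when e >= expert_count
    return table


selected = [[0, 5], [3, 7]]  # top_k = 2, experts drawn from 8
scales = [[0.75, 0.25], [0.5, 0.5]]

ok = build_fc2_alpha(selected, scales, expert_count=8)
print(ok[0])  # [0.75, 0.0, 0.0, 0.0, 0.0, 0.25, 0.0, 0.0]

try:
    build_fc2_alpha(selected, scales, expert_count=len(scales[0]))  # top_k = 2: too small
except IndexError as err:
    print("sized by top_k:", err)
```

Of the two suggested fixes, an explicit expert count is the more robust: `max() + 1` only reflects the experts that happen to be selected in the current batch.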
In `@tests/scripts/cute_dsl_kernels/moe_as_dense_gemm/run_moe_as_dense_gemm_fc1.py`:
- Around line 180-182: The code computes weight_per_expert = n // expert_count
but does not validate divisibility, causing silent truncation (e.g.,
mnkl=(512,256,256,1) with expert_count=257 yields 0); update the logic in this
module to first assert or raise a clear error if n % expert_count != 0 and
choose consistent default values so defaults are divisible by expert_count
(adjust mnkl and/or expert_count), and apply the same validation to the other
occurrence of the calculation (the block around the second occurrence computing
weight_per_expert at lines referenced in the review); reference the variables
weight_per_expert, n, expert_count and the default mnkl tuple when making these
changes.
- Around line 1-27: The file run_moe_as_dense_gemm_fc1.py contains an incorrect
BSD-3-Clause header and a 2025 copyright year; replace the existing top-of-file
license block with the repository's standard NVIDIA Apache-2.0 Python header and
update the copyright year to 2026 (the latest meaningful modification), ensuring
the new header sits at the very top of run_moe_as_dense_gemm_fc1.py before any
imports or code so it matches the repo policy for .py sources.
- Around line 59-65: The fallback import in the try/except block around
importing tensorrt_llm._torch.cute_dsl_kernels.blackwell.moe_as_dense_gemm (the
code that inserts into sys.path before importing blackwell.moe_as_dense_gemm as
kernel_module) uses Path(__file__).parents[3], which points to tests/, so change
the inserted path computation to use Path(__file__).parents[4] so the
sys.path.insert(0, str(...)) points to the repository
root/tensorrt_llm/_torch/cute_dsl_kernels and the fallback import succeeds.
- Around line 947-952: The parser arguments for the flags "--vectorized_f32" and
"--use_cupti" use action="store_true" together with default=True so they always
parse to True; update their parser.add_argument calls so the flags can be
disabled at the CLI: either set default=False when using action="store_true" (so
passing the flag enables the feature) or add complementary flags using
action="store_false" (e.g., "--no-vectorized_f32" / "--no-use_cupti") with
default=True to allow turning them off; modify the parser.add_argument calls for
the entries that reference "--vectorized_f32" and "--use_cupti" accordingly.
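The divisibility, import-path, and CLI-flag findings for `run_moe_as_dense_gemm_fc1.py` can be sketched with the standard library alone. `weight_per_expert` is a hypothetical helper, `repo/` stands for wherever the repository is checked out, and `argparse.BooleanOptionalAction` (Python 3.9+) is one way to get the `--no-...` flags the review proposes.

```python
import argparse
from pathlib import PurePosixPath


def weight_per_expert(n: int, expert_count: int) -> int:
    """Split N evenly across experts, refusing shapes that would truncate."""
    if expert_count <= 0 or n % expert_count != 0:
        raise ValueError(f"n={n} must be divisible by expert_count={expert_count}")
    return n // expert_count


# parents[0]=moe_as_dense_gemm, [1]=cute_dsl_kernels, [2]=scripts, [3]=tests, [4]=repo root
script = PurePosixPath(
    "repo/tests/scripts/cute_dsl_kernels/moe_as_dense_gemm/run_moe_as_dense_gemm_fc1.py")
kernel_dir = script.parents[4] / "tensorrt_llm" / "_torch" / "cute_dsl_kernels"

parser = argparse.ArgumentParser()
# BooleanOptionalAction registers both --flag and --no-flag, so default=True
# can now be switched off from the command line.
parser.add_argument("--vectorized_f32", action=argparse.BooleanOptionalAction, default=True)
parser.add_argument("--use_cupti", action=argparse.BooleanOptionalAction, default=True)

print(weight_per_expert(512, 8))  # 64
print(kernel_dir)                 # repo/tensorrt_llm/_torch/cute_dsl_kernels
print(parser.parse_args(["--no-use_cupti"]))
```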
In `@tests/scripts/cute_dsl_kernels/moe_as_dense_gemm/run_moe_as_dense_gemm_fc2.py`:
- Around line 1-27: Replace the current BSD-3-Clause header in
run_moe_as_dense_gemm_fc2.py with the repository-standard NVIDIA Apache-2.0
header (using the latest modification year), or if the BSD notice must be
preserved, prepend the official Apache-2.0 header above the existing BSD block;
ensure the file begins with the full Apache-2.0 license text and the NVIDIA
copyright line per project guidelines.
- Around line 62-68: The fallback sys.path insertion uses
Path(__file__).parents[3] which points to tests/ (one directory too shallow)
causing the fallback import of blackwell.moe_as_dense_gemm.fc2 to fail; update
the fallback to insert the correct parent directory by changing
Path(__file__).parents[3] to Path(__file__).parents[4] (so the inserted path is
.../tensorrt_llm/_torch/cute_dsl_kernels) before importing kernel_module,
ensuring the direct python run_moe_as_dense_gemm_fc2.py flow can locate the
module.
In `@tests/unittest/_torch/modules/moe/test_moe_backend.py`:
- Around line 471-473: The test mutates process-wide env var
TRTLLM_MOE_FUSED_FC2_ALPHA when backend_type == MoeBackendType.DENSEGEMM and
never restores it; change the code to isolate this override by using pytest's
monkeypatch (e.g., monkeypatch.setenv("TRTLLM_MOE_FUSED_FC2_ALPHA","0") within
the test or fixture that sets backend_type) or save the original os.environ
value and restore it in a finally/teardown block so the env is returned to its
prior state after the parametrized case; update the code around the backend_type
check (the branch referencing MoeBackendType.DENSEGEMM and
TRTLLM_MOE_FUSED_FC2_ALPHA) to use monkeypatch or explicit restore.
In `@tests/unittest/_torch/modules/moe/test_moe_module.py`:
- Around line 1059-1061: The test currently sets
os.environ["TRTLLM_MOE_FUSED_FC2_ALPHA"]="0" unconditionally which leaks into
the pytest process; replace that with the pytest monkeypatch fixture to localize
the change: when checking moe_backend == MoeBackendType.DENSEGEMM.value, call
monkeypatch.setenv("TRTLLM_MOE_FUSED_FC2_ALPHA", "0") instead of writing to
os.environ so the environment is restored after the test (reference
TRTLLM_MOE_FUSED_FC2_ALPHA, moe_backend, and MoeBackendType.DENSEGEMM.value).
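For the two environment-variable leaks, pytest's `monkeypatch.setenv` undoes the change at teardown automatically. Outside pytest, the "save and restore" option can be sketched as a small context manager; `temp_env` is a hypothetical name, and the standard library's `unittest.mock.patch.dict(os.environ, {...})` achieves the same effect.

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_env(name: str, value: str):
    """Set an environment variable inside the block, then restore the prior state."""
    old = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if old is None:
            os.environ.pop(name, None)  # it did not exist before: remove it
        else:
            os.environ[name] = old      # it existed: put the old value back


with temp_env("TRTLLM_MOE_FUSED_FC2_ALPHA", "0"):
    print(os.environ["TRTLLM_MOE_FUSED_FC2_ALPHA"])  # 0
print("TRTLLM_MOE_FUSED_FC2_ALPHA" in os.environ)
```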
In `@tests/unittest/_torch/thop/parallel/test_moe_densegemm.py`:
- Around line 1-2: The file test_moe_densegemm.py currently contains only an
SPDX short-form header; replace it with the repository’s full NVIDIA Apache-2.0
header block (include the full multi-line copyright/Apache 2.0 license text and
the latest modification year, e.g., 2026) so the top of test_moe_densegemm.py
matches the standard header used across Python sources in the repo.
---
Outside diff comments:
In `@tensorrt_llm/_torch/modules/fused_moe/create_moe.py`:
- Around line 148-153: The assertion that guards MoE load-balancer support
excludes the new DenseGEMMFusedMoE class, causing calls that forward
init_load_balancer (e.g., via create_moe_backend when moe_load_balancer is set)
to raise; update the allowlist in the assertion that references WideEPMoE,
CutlassFusedMoE, TRTLLMGenFusedMoE, CuteDslFusedMoE, DeepGemmFusedMoE to also
include DenseGEMMFusedMoE, and make the same change for the second identical
assertion later in the file so DenseGEMMFusedMoE is permitted wherever load
balancing is checked.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0fcf4776-2a01-4cd1-a2ac-f6f955ae18ad
📒 Files selected for processing (15)
tensorrt_llm/_torch/custom_ops/cute_dsl_custom_ops.py
tensorrt_llm/_torch/cute_dsl_kernels/blackwell/moe_as_dense_gemm/__init__.py
tensorrt_llm/_torch/cute_dsl_kernels/blackwell/moe_as_dense_gemm/fc1.py
tensorrt_llm/_torch/cute_dsl_kernels/blackwell/moe_as_dense_gemm/fc2.py
tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py
tensorrt_llm/_torch/modules/fused_moe/create_moe.py
tensorrt_llm/_torch/modules/fused_moe/fused_moe_densegemm.py
tensorrt_llm/_torch/utils.py
tensorrt_llm/llmapi/llm_args.py
tests/scripts/cute_dsl_kernels/moe_as_dense_gemm/run_moe_as_dense_gemm_fc1.py
tests/scripts/cute_dsl_kernels/moe_as_dense_gemm/run_moe_as_dense_gemm_fc2.py
tests/unittest/_torch/modules/moe/moe_test_utils.py
tests/unittest/_torch/modules/moe/test_moe_backend.py
tests/unittest/_torch/modules/moe/test_moe_module.py
tests/unittest/_torch/thop/parallel/test_moe_densegemm.py
PR_Github #38695 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #39108 [ run ] triggered by Bot. Commit:
/bot run --disable-fail-fast
Add DenseGEMMFusedMoE to the ConfigurableMoE supported backend list so it follows the same composition-based execution flow as other backends (routing, quantization, communication, computation). Signed-off-by: Zongfei Jing <[email protected]>
Signed-off-by: Zongfei Jing <[email protected]>
Add the 2CTA MMA tile (256,256) with cluster shape (2,1) to the FC1 SwiGLU kernel autotuner candidates. Benchmarks show this config is optimal for M=144~256 and M=400~512 on B200. Signed-off-by: Zongfei Jing <[email protected]>
- Add assert for alpha!=None in gen_fc2_alpha_fused fallback path
- Add DenseGEMMFusedMoE to MoE Load Balancer supported list
- Reduce load_weights peak memory by replacing clone() with transpose().contiguous()
Signed-off-by: Zongfei Jing <[email protected]>
- Add SM100/103 capability check in get_moe_cls() with graceful fallback
to CutlassFusedMoE, and assert in DenseGEMMFusedMoE.__init__()
- Accept and validate activation_type parameter to reject non-SwiGLU
activations at construction time instead of silently using wrong semantics
- Add output_dtype to FC1/FC2 kernel cache keys to prevent cache collisions
when called with different output types
- Strip {$nv-internal-release} markers from fc1.py, fc2.py, and test scripts
- Fix misleading load_weights comment about peak memory behavior
- Remove stray print() in test_moe_densegemm.py
Signed-off-by: Zongfei Jing <[email protected]>
…type fix
- Replace silent .get(dtype, BFloat16) fallback with explicit validation and raise ValueError for unsupported output_dtype in FC1/FC2 runners
- Extract dtype mapping as class-level _CUTLASS_DTYPE_MAP constant to eliminate 4x duplication across get_valid_tactics()/forward() methods
- Fix activation_type assertion: accept parameter in DenseGEMMFusedMoE __init__ and pass through create_moe_backend() so non-SwiGLU requests are properly rejected instead of silently defaulting to SwiGLU
Signed-off-by: Zongfei Jing <[email protected]>
Replace BSD-3-Clause headers with Apache-2.0 to match the rest of the cute_dsl_kernels/blackwell/ directory convention. Signed-off-by: Zongfei Jing <[email protected]>
Add DENSEGEMM entries to l0_b200.yml for test_moe_backend and test_configurable_moe_single_gpu alongside existing CUTLASS/TRTLLM/CUTEDSL/DEEPGEMM entries.
Signed-off-by: Zongfei Jing <[email protected]>
The FC1 kernel autotuner was using get_last_power_of_2_num_tokens_buckets which only generated power-of-2 M values (1,2,4,...,256), missing optimal configs for non-power-of-2 token counts. Switch to deep_gemm_gen_tuning_buckets (step-8 for M<128, step-128 for M>=128) and increase tune_max_num_tokens from 256 to 512 to cover the full operating range. Also consolidate deep_gemm_gen_tuning_buckets into utils.py as a shared utility. Signed-off-by: Zongfei Jiang <[email protected]> Signed-off-by: Zongfei Jing <[email protected]>
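A hypothetical sketch of the two bucket schemes this commit compares, assuming the step-based generator starts at 8 and includes `tune_max_num_tokens` itself; the real `deep_gemm_gen_tuning_buckets` in `utils.py` may differ at its endpoints.

```python
def power_of_2_buckets(max_tokens: int) -> list:
    """1, 2, 4, ... up to max_tokens (the scheme the commit moves away from)."""
    buckets, m = [], 1
    while m <= max_tokens:
        buckets.append(m)
        m *= 2
    return buckets


def step_buckets(max_tokens: int) -> list:
    """Hypothetical: step 8 for M < 128, step 128 for M >= 128."""
    small = list(range(8, min(128, max_tokens + 1), 8))
    large = list(range(128, max_tokens + 1, 128))
    return small + large


print(power_of_2_buckets(256))  # [1, 2, 4, 8, 16, 32, 64, 128, 256]
print(step_buckets(512))
```

Above M=128 with a 512-token limit, the power-of-2 list tunes only 256 and 512, while the step-based list also tunes 384 and every multiple of 8 below 128.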
- Add can_implement() to DenseGEMMFusedMoE to accurately report backend capabilities (SM100/103, NVFP4-only, no swiglu_gptoss_style), instead of inheriting the overly permissive CutlassFusedMoE implementation.
- Replace magic seed number 1111 with named DEFAULT_RANDOM_SEED constant in both FC1 and FC2 test scripts.
- Move nested helper functions out of run() to module level in FC1 test script (simulate_f8_quantization, simulate_nvfp4_quantization, compute_scale_factor, apply_quantization_scale, unswizzle_kernel_sfc, ceil_div). Remove redundant local ceil_div definitions in both scripts.
- Fix fallback import path in both FC1 and FC2 test scripts: parents[3] pointed to tests/ instead of repo root. Changed to parents[4].
Signed-off-by: Zongfei Jiang <[email protected]>
Signed-off-by: Zongfei Jing <[email protected]>
The intermediate_size >= 14336 skip was conservatively copied from the CuteDSL/TRTLLMGen backends, but DenseGEMM does not have the same FP4 error accumulation issue. Verified Mixtral config (e=8, k=2, h=4096, i=14336) passes at both seq_len=1 and seq_len=8. Signed-off-by: Zongfei Jiang <[email protected]> Signed-off-by: Zongfei Jing <[email protected]>
The fused fc2_alpha path (TRTLLM_MOE_FUSED_FC2_ALPHA) has a known accuracy issue under TP where the scalar fc2_alpha_max gets summed tp_size times during ReduceScatter. Change the default from enabled to disabled so the non-fused per-expert path is used, which correctly factors out of the TP reduction. Also add should_skip_densegemm to the multi-GPU test parameter generation so DenseGEMM correctly skips EP modes (DEP/TEP) and validates TP alignment constraints. Signed-off-by: Zongfei Jing <[email protected]>
…tlassFusedMoE
DenseGEMMFusedMoE was inheriting from CutlassFusedMoE but overriding most of its core methods while only reusing a few (create_weights, forward_impl, load_weights). This tight coupling was misleading since the two backends have fundamentally different architectures: CutlassFusedMoE uses per-expert scattered GEMM with alltoall support, while DenseGEMM packs all experts into a single dense matrix for min-latency scenarios (NVFP4, SM100/103 only).
Changes:
- DenseGEMMFusedMoE now inherits from MoE base class directly
- Implements its own create_weights, load_weights, forward_impl, and _get_quant_method independently
- Simplified forward_impl without chunking/alltoall logic
- Added isinstance check for DenseGEMMFusedMoE in ConfigurableMoE's float32 assertion for token_final_scales
- Fixed env var leak in test_moe_backend.py by using monkeypatch
Signed-off-by: Zongfei Jing <[email protected]>
/bot run --disable-fail-fast
PR_Github #40899 [ run ] triggered by Bot. Commit:
PR_Github #40899 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #41037 [ run ] triggered by Bot. Commit:
PR_Github #41037 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #41062 [ run ] triggered by Bot. Commit:
PR_Github #41062 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #41128 [ run ] triggered by Bot. Commit:
PR_Github #41128 [ run ] completed with state
Signed-off-by: Zongfei Jing <[email protected]>
@coderabbitai summary
Description
Add a new DenseGEMM backend for MoE that reshapes all expert weights into a single dense matrix and performs one large GEMM call per FC layer, targeting minimum latency on Blackwell (SM100/SM103) with NVFP4 quantization.
Key design
Instead of the traditional per-expert grouped GEMM approach, DenseGEMM concatenates all expert weights along the N (FC1) or K (FC2) dimension and executes a single dense GEMM. This trades flexibility for maximum GPU utilization at small batch sizes where per-expert GEMMs underutilize SMs.
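The packing idea can be checked on toy integers: concatenating the expert FC1 weights along N and the FC2 weights along K, then scaling each token's per-expert column block by its routing weight, reproduces the per-expert MoE sum. This is a sketch under simplifying assumptions, not the kernel: ReLU stands in for SwiGLU, there is no NVFP4 quantization or scale-factor handling, and all helper names are hypothetical.

```python
import random


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m):
    return [[max(v, 0) for v in row] for row in m]


def hcat(mats):
    """Concatenate along N (columns): [W_0 | W_1 | ...]."""
    return [sum((m[i] for m in mats), []) for i in range(len(mats[0]))]


def vcat(mats):
    """Concatenate along K (rows): [W_0; W_1; ...]."""
    return [row for m in mats for row in m]


def moe_reference(x, w1, w2, routing):
    """Per-expert loop: out[t] = sum_e routing[t][e] * act(x[t] @ W1_e) @ W2_e."""
    out = [[0] * len(w2[0][0]) for _ in x]
    for e in range(len(w1)):
        y = matmul(relu(matmul(x, w1[e])), w2[e])
        for t in range(len(x)):
            for j in range(len(y[0])):
                out[t][j] += routing[t][e] * y[t][j]
    return out


def moe_dense(x, w1, w2, routing):
    """Dense form: one FC1 GEMM with N = E*I, one FC2 GEMM with K = E*I."""
    inter = len(w1[0][0])
    h = relu(matmul(x, hcat(w1)))
    for t, row in enumerate(h):
        for c in range(len(row)):
            row[c] *= routing[t][c // inter]  # column c belongs to expert c // I
    return matmul(h, vcat(w2))


random.seed(0)
T, H, I, E = 3, 4, 2, 3  # tokens, hidden size, intermediate size, experts


def rand(rows, cols):
    return [[random.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]


x = rand(T, H)
w1 = [rand(H, I) for _ in range(E)]
w2 = [rand(I, H) for _ in range(E)]
routing = [[1, 0, 2], [0, 3, 0], [2, 1, 0]]  # dense [T][E]; 0 = not selected; ints keep it exact
assert moe_dense(x, w1, w2, routing) == moe_reference(x, w1, w2, routing)
```

The dense form computes every expert for every token (unselected experts simply get a zero weight), which is the flexibility-for-utilization trade described above. It also shows why expert boundaries matter for tiling: expert `e` occupies K columns starting at `e * I`, so a K tile of 256 holds a single expert's slice only when `I` is a multiple of 256.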
`nvfp4_gemm` call (controlled by `TRTLLM_MOE_FUSED_FC2_ALPHA` env var, default: disabled)
Files added/modified
- cute_dsl_kernels/blackwell/moe_as_dense_gemm/fc1.py, fc2.py
- custom_ops/cute_dsl_custom_ops.py (+738 lines): `TunableRunner` wrappers with autotuner integration for FC1/FC2 kernels
- fused_moe/fused_moe_densegemm.py (new): `DenseGEMMFusedMoE` class (weight transpose, NVFP4 quantization, fc2_alpha fusion, CUDA stream overlap)
- create_moe.py, configurable_moe.py, llm_args.py, utils.py: `DENSEGEMM` option in `MoeConfig`, `MoeFc2Alpha` aux stream
- test_moe_densegemm.py (564 cases), test_moe_backend.py, test_moe_module.py
- run_moe_as_dense_gemm_fc1.py, run_moe_as_dense_gemm_fc2.py
Constraints
- SM100/SM103 only: `get_moe_cls()` gracefully falls back to CutlassFusedMoE on other architectures.
- NVFP4 only: requires `quant_mode.has_nvfp4()`.
- SwiGLU only: any other `activation_type` is rejected at construction time.
- `intermediate_size` must be 256-aligned: the FC2 kernel tiles K with an MMA tile size of 256, so expert boundaries must align with tile boundaries.
Test Coverage
- test_moe_densegemm.py
- test_moe_backend.py (DenseGEMM added to parametrized matrix with skip guards)
- test_moe_module.py (ConfigurableMoE + DenseGEMM integration)
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.