Redis Enterprise Maintenance Events: Comprehensive Functional Testing#3461
Conversation
…rise maintenance events - Add ConnectionTesting class with 9 test scenarios for maintenance handoff behavior - Test old connection graceful shutdown during MOVING operations - Validate traffic resumption with autoconnect after handoff - Verify maintenance notifications only work with RESP3 protocol - Test new connection establishment during migration and bind phases - Add memory leak validation for multiple concurrent connections - Include TLS support testing for maintenance events - Replace .supportMaintenanceEvents(true) with MaintenanceEventsOptions.enabled() - Add comprehensive monitoring and validation of connection lifecycle Tests cover CAE-1130 requirements for Redis Enterprise maintenance event handling including connection draining, autoconnect behavior, and notification delivery.
…IONS - connectionHandshakeIncludesEnablingNotificationsTest: Verifies all 5 notification types (MOVING, MIGRATING, MIGRATED, FAILING_OVER, FAILED_OVER) are received when maintenance events are enabled - disabledDontReceiveNotificationsTest: Verifies no notifications received when maintenance events are disabled - clientHandshakeWithEndpointTypeTest: Tests CLIENT MAINT_NOTIFICATIONS with 'none' endpoint type (nil IP scenario) - clientMaintenanceNotificationInfoTest: Verifies CLIENT MAINT_NOTIFICATIONS configuration with moving-endpoint-type Based on CLIENT MAINT_NOTIFICATIONS implementation from commit bd408cf
- Update push notification patterns to include sequence numbers (4-element format) - Fix MOVING notification parsing to handle new address format with sequence and time - Update MIGRATING, MIGRATED, FAILING_OVER, and FAILED_OVER patterns with sequence numbers - Improve FaultInjectionClient status handling: change from 'pending' to 'running' checks - Enhance JSON response parsing with better output field handling and debugging - Remove deprecated maintenance sequence functionality and associated unit test - Add test phase isolation to prevent cleanup notification interference - Extend monitoring timeout from 2 to 5 minutes for longer maintenance operations - Add @AfterEach cleanup to restore cluster state between tests - Remove hardcoded optimal node selection logic in RedisEnterpriseConfig This aligns with the updated Redis Enterprise maintenance events specification and improves test reliability by handling the new notification protocol format.
…raised during review
…and-connection-testing-of-maint-events-2 Ci fix functional handoff and connection testing of maint events 2
…ow, also implement more fixes and improvements
…fixes from review, started removin comments
…nctionally correct, enable logging for testing
|
Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset. In case there are security findings, they will be communicated to you as a comment inside the PR. Hope you’ll enjoy using Jit. Questions? Comments? Want to learn more? Get in touch with us. |
There was a problem hiding this comment.
Pull Request Overview
This PR adds comprehensive functional testing for Redis Enterprise maintenance event handling in Lettuce, with a focus on connection lifecycle management, notification validation, and timeout relaxation. The test suite validates proper client behavior during database migrations and failover scenarios.
Key changes:
- New comprehensive test class
MaintenanceNotificationConnectionTestfor testing all 5 maintenance notification types with connection management - Enhanced timeout configuration tests with 4-phase validation and optimized execution
- Complete removal of legacy test classes and their consolidation into improved integration tests
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/test/resources/log4j2-test.xml | Enhanced logging configuration for maintenance-aware components |
| src/test/java/io/lettuce/test/ConnectionTestUtil.java | Added MaintenanceAwareExpiryWriter support and connection verification utilities |
| src/test/java/io/lettuce/scenario/RelaxedTimeoutConfigurationTest.java | Comprehensive 4-phase timeout testing with optimized execution and connection handoff |
| src/test/java/io/lettuce/scenario/RedisEnterpriseConfig.java | Dynamic cluster configuration with 6-node support and endpoint-aware operations |
| src/test/java/io/lettuce/scenario/MaintenancePushNotificationMonitor.java | Simplified monitoring without periodic pings and correct notification format parsing |
| src/test/java/io/lettuce/scenario/MaintenanceNotificationTest.java | Removed - functionality migrated to MaintenanceNotificationConnectionTest |
| src/test/java/io/lettuce/scenario/MaintenanceNotificationConnectionTest.java | New comprehensive test suite for all maintenance scenarios with connection lifecycle validation |
| src/test/java/io/lettuce/scenario/FaultInjectionClientUnitTest.java | Removed - consolidated into integration tests |
| src/test/java/io/lettuce/scenario/FaultInjectionClient.java | Enhanced with endpoint-aware operations and improved status checking |
| src/test/java/io/lettuce/scenario/ConnectionEventBusMonitoringUtil.java | New utility for monitoring connection state transitions via EventBus |
Comments suppressed due to low confidence (1)
src/test/java/io/lettuce/scenario/MaintenanceNotificationConnectionTest.java:1
- Using fixed
Thread.sleep()delays in tests can lead to flaky behavior and unnecessarily slow test execution. Consider using condition-based waiting or shorter, more appropriate delays for these specific scenarios.
package io.lettuce.scenario;
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| log.info("Testing unrelaxed timeouts after MIGRATED notification..."); | ||
| log.info("Waiting for grace period ({}ms) to allow timeout un-relaxation to take effect...", | ||
| 2 * RELAXED_TIMEOUT_ADDITION.toMillis()); | ||
| Thread.sleep(2 * RELAXED_TIMEOUT_ADDITION.toMillis()); // Wait for grace period + 100ms buffer |
There was a problem hiding this comment.
Using Thread.sleep() for timing in tests can lead to flaky behavior. Consider using Awaitility or similar polling mechanisms to wait for the actual condition (timeout un-relaxation) rather than relying on fixed sleep durations.
| log.info("Waiting for grace period ({}ms) to allow timeout un-relaxation to take effect...", | ||
| 2 * RELAXED_TIMEOUT_ADDITION.toMillis()); | ||
| Thread.sleep(2 * RELAXED_TIMEOUT_ADDITION.toMillis()); // Wait for grace period + 100ms buffer | ||
| log.info("Grace period expired, starting traffic to verify unrelaxed timeouts..."); |
There was a problem hiding this comment.
Another instance of Thread.sleep() that could make tests flaky. This should be replaced with a condition-based wait mechanism to ensure the timeout un-relaxation has actually taken effect before proceeding with verification.
| log.info("Waiting for grace period ({}ms) to allow timeout un-relaxation to take effect...", | |
| 2 * RELAXED_TIMEOUT_ADDITION.toMillis()); | |
| Thread.sleep(2 * RELAXED_TIMEOUT_ADDITION.toMillis()); // Wait for grace period + 100ms buffer | |
| log.info("Grace period expired, starting traffic to verify unrelaxed timeouts..."); | |
| log.info("Waiting for timeout un-relaxation to take effect (up to {}ms)...", | |
| 2 * RELAXED_TIMEOUT_ADDITION.toMillis()); | |
| waitForTimeoutUnrelaxed(context, 2 * RELAXED_TIMEOUT_ADDITION.toMillis()); | |
| log.info("Timeout un-relaxation detected, starting traffic to verify unrelaxed timeouts..."); |
| } | ||
|
|
||
| // Throttle traffic to ~1000 ops/sec to avoid memory pressure | ||
| Thread.sleep(1); |
There was a problem hiding this comment.
Using Thread.sleep(1) in a tight loop for traffic throttling is inefficient and imprecise. Consider using a proper rate limiter or a more accurate timing mechanism for controlling traffic generation at ~1000 ops/sec.
| import io.netty.util.internal.logging.InternalLogger; | ||
| import io.netty.util.internal.logging.InternalLoggerFactory; | ||
|
|
||
| public class ConnectionEventBusMonitoringUtil { |
There was a problem hiding this comment.
Let's avoid the "Util" pattern. Utility classes quickly become swiss-army-knives and violate the single responsibility principle.
Could we just name it ConnectionEventBusMonitor or something similar?
| } catch (Exception e) { | ||
| if (event instanceof ConnectedEvent) { | ||
| return "connected-" + ((ConnectedEvent) event).remoteAddress().toString(); | ||
| } else if (event instanceof DisconnectedEvent) { | ||
| return "disconnected-" + ((DisconnectedEvent) event).remoteAddress().toString(); | ||
| } else { | ||
| return event.getClass().getSimpleName() + "-" + System.currentTimeMillis(); | ||
| } | ||
| } |
There was a problem hiding this comment.
Instead of if-else and instanceof can we use multiple catch statements?
| expectedEndpoint, cleanCurrentEndpoint).isTrue(); | ||
| } | ||
| } else { | ||
| log.warn("⚠ Could not verify endpoint - currentRemoteAddress: {}, expectedEndpoint: {}", currentRemoteAddress, |
| log.info("Second connection created and monitoring setup completed"); | ||
|
|
||
| } catch (Exception e) { | ||
| log.error("Failed to create second connection: {}", e.getMessage(), e); |
There was a problem hiding this comment.
Same question here - should the test fail?
| /** | ||
| * Configuration holder for dynamically discovered Redis Enterprise cluster information. | ||
| */ | ||
| public class RedisEnterpriseConfig { |
There was a problem hiding this comment.
RedisEnterpriseConfig - a "Config" object is typically a POJO that has some configuration details in it. In contrast this class seems to be configuring the RS
So I'd put some other end to the name like "Configurer" or idk
|
I will take care of Tisho's comments on the topic in next PR. |
…redis#3461) * feat(CAE-1130): Add comprehensive connection testing for Redis Enterprise maintenance events - Add ConnectionTesting class with 9 test scenarios for maintenance handoff behavior - Test old connection graceful shutdown during MOVING operations - Validate traffic resumption with autoconnect after handoff - Verify maintenance notifications only work with RESP3 protocol - Test new connection establishment during migration and bind phases - Add memory leak validation for multiple concurrent connections - Include TLS support testing for maintenance events - Replace .supportMaintenanceEvents(true) with MaintenanceEventsOptions.enabled() - Add comprehensive monitoring and validation of connection lifecycle Tests cover CAE-1130 requirements for Redis Enterprise maintenance event handling including connection draining, autoconnect behavior, and notification delivery. * Add comprehensive maintenance events tests for CLIENT MAINT_NOTIFICATIONS - connectionHandshakeIncludesEnablingNotificationsTest: Verifies all 5 notification types (MOVING, MIGRATING, MIGRATED, FAILING_OVER, FAILED_OVER) are received when maintenance events are enabled - disabledDontReceiveNotificationsTest: Verifies no notifications received when maintenance events are disabled - clientHandshakeWithEndpointTypeTest: Tests CLIENT MAINT_NOTIFICATIONS with 'none' endpoint type (nil IP scenario) - clientMaintenanceNotificationInfoTest: Verifies CLIENT MAINT_NOTIFICATIONS configuration with moving-endpoint-type Based on CLIENT MAINT_NOTIFICATIONS implementation from commit bd408cf * Update Redis Enterprise maintenance event notification protocol - Update push notification patterns to include sequence numbers (4-element format) - Fix MOVING notification parsing to handle new address format with sequence and time - Update MIGRATING, MIGRATED, FAILING_OVER, and FAILED_OVER patterns with sequence numbers - Improve FaultInjectionClient status handling: change from 'pending' to 'running' checks - Enhance JSON response parsing with better output field handling and debugging - Remove deprecated maintenance sequence functionality and associated unit test - Add test phase isolation to prevent cleanup notification interference - Extend monitoring timeout from 2 to 5 minutes for longer maintenance operations - Add @AfterEach cleanup to restore cluster state between tests - Remove hardcoded optimal node selection logic in RedisEnterpriseConfig This aligns with the updated Redis Enterprise maintenance events specification and improves test reliability by handling the new notification protocol format. * Fix moving tests for timeout de-relaxation after moving * fix notification capture logic and several tests. * fix up resp2 test, and add proper test for None, will rebase to master * Fix None test * Fix several tests related to handling. 5 tests left to fix up. * fix up new connection test and connection leak tests * fix up traffic test and remove un-needed code. * fix more tests, remove more un-needed code * revert log changes * revert the re-throw change, to be discussed * remove resp3 test after offline discussion * change endpoint name * temporarely reduce number of tests * add more tests * reduce test execution time by 50% * remove hardcoded target config and enable working with 6 nodes and multiple dbs * fix up relaxedtimeoutconfig to use newest functions and add connection handoff test * add 1 more handoff test, add more logging, fix some issues that were raised during review * fix some bugs and remove the un-needed clean-up of testing, to speed up tests by 50% * Merge pull request redis#1 from kiryazovi-redis/CI-fix-functional-handoff-and-connection-testing-of-maint-events-2 Ci fix functional handoff and connection testing of maint events 2 * optimise relaxed timeoutest, fix compilation issues after renaming done by Ivo * remove MaintenanceNotificationTest, as all functionality is covered now, also implement more fixes and improvements * renamed memoryleak infra, refactored several tests, implemented more fixes from review, started removin comments * remove any useless sleeps, refactor several tests again to be more functionally correct, enable logging for testing
Summary
This PR adds comprehensive functional testing for Redis Enterprise maintenance event handling in Lettuce, including connection handoff, timeout relaxation, and notification protocol validation. The test suite validates proper client behavior during database migrations and failover scenarios.
Key Changes
New Test Infrastructure
MaintenanceNotificationConnectionTestConnectionEventBusMonitoringUtilEnhanced Tests
RelaxedTimeoutConfigurationTestConnectionTestUtilRedisEnterpriseConfigRemoved
MaintenanceNotificationTest- All functionality migrated toMaintenanceNotificationConnectionTestwith improved test structureFaultInjectionClientUnitTest- Consolidated into integration testsTest Coverage
The test suite validates:
Performance Improvements
Testing
All tests pass successfully with proper logging enabled for debugging purposes.