Flaky test TestRemoteWrite_ReshardingWithoutDeadlock #17489

@juliusv

My Go CI tests for an unrelated UI PR just failed multiple times, and I saw this "Condition never satisfied" error for the TestRemoteWrite_ReshardingWithoutDeadlock test:

--- FAIL: TestRemoteWrite_ReshardingWithoutDeadlock (30.66s)
    reload_test.go:194: time=2025-11-06T15:44:01.608Z level=INFO source=main.go:1557 msg="updated GOGC" old=100 new=75
    reload_test.go:194: time=2025-11-06T15:44:01.617Z level=INFO source=main.go:688 msg="Leaving GOMAXPROCS=4: CPU quota undefined" component=automaxprocs
    reload_test.go:194: time=2025-11-06T15:44:01.617Z level=INFO source=memlimit.go:150 msg="GOMEMLIMIT is already set, skipping" component=automemlimit package=github.com/KimMachineGun/automemlimit/memlimit GOMEMLIMIT=13103578KiB
    reload_test.go:194: time=2025-11-06T15:44:01.617Z level=INFO source=main.go:730 msg="No time or size retention was set so using the default time retention" duration=15d
    reload_test.go:194: time=2025-11-06T15:44:01.617Z level=INFO source=main.go:781 msg="Starting Prometheus Server" mode=server version="(version=, branch=, revision=unknown)"
    reload_test.go:194: time=2025-11-06T15:44:01.617Z level=INFO source=main.go:786 msg="operational information" build_context="(go=go1.25.4, platform=linux/amd64, user=, date=, tags=unknown)" host_details="(Linux 6.11.0-1018-azure #18~24.04.1-Ubuntu SMP Sat Jun 28 04:46:03 UTC 2025 x86_64 104e817a2bf1 (none))" fd_limits="(soft=1048576, hard=1048576)" vm_limits="(soft=unlimited, hard=unlimited)"
    reload_test.go:194: time=2025-11-06T15:44:01.641Z level=INFO source=web.go:663 msg="Start listening for connections" component=web address=0.0.0.0:39399
    reload_test.go:194: time=2025-11-06T15:44:01.647Z level=INFO source=main.go:1301 msg="Starting TSDB ..."
    reload_test.go:194: time=2025-11-06T15:44:01.680Z level=INFO source=head.go:666 msg="Replaying on-disk memory mappable chunks if any" component=tsdb
    reload_test.go:194: time=2025-11-06T15:44:01.680Z level=INFO source=head.go:752 msg="On-disk memory mappable chunks replay completed" component=tsdb duration=12.664µs
    reload_test.go:194: time=2025-11-06T15:44:01.680Z level=INFO source=head.go:760 msg="Replaying WAL, this may take a while" component=tsdb
    reload_test.go:194: time=2025-11-06T15:44:01.681Z level=INFO source=head.go:833 msg="WAL segment loaded" component=tsdb segment=0 maxSegment=0 duration=491.766µs
    reload_test.go:194: time=2025-11-06T15:44:01.681Z level=INFO source=head.go:870 msg="WAL replay completed" component=tsdb checkpoint_replay_duration=117.449µs wal_replay_duration=586.171µs wbl_replay_duration=410ns chunk_snapshot_load_duration=0s mmap_chunk_replay_duration=12.664µs total_replay_duration=822.042µs
    reload_test.go:194: time=2025-11-06T15:44:01.686Z level=INFO source=tls_config.go:346 msg="Listening on" component=web address=[::]:39399
    reload_test.go:194: time=2025-11-06T15:44:01.686Z level=INFO source=tls_config.go:349 msg="TLS is disabled." component=web http2=false address=[::]:39399
    reload_test.go:194: time=2025-11-06T15:44:01.782Z level=INFO source=main.go:1322 msg="filesystem information" fs_type=794c7630
    reload_test.go:194: time=2025-11-06T15:44:01.782Z level=INFO source=main.go:1325 msg="TSDB started"
    reload_test.go:194: time=2025-11-06T15:44:01.782Z level=INFO source=main.go:1510 msg="Loading configuration file" filename=/tmp/TestRemoteWrite_ReshardingWithoutDeadlock4001292031/001/prometheus.yml
    reload_test.go:194: time=2025-11-06T15:44:01.793Z level=INFO source=watcher.go:240 msg="Starting WAL watcher" component=remote remote_name=f9b5fa url=http://127.0.0.1:37131/ queue=f9b5fa
    reload_test.go:194: time=2025-11-06T15:44:01.793Z level=INFO source=metadata_watcher.go:90 msg="Starting scraped metadata watcher" component=remote remote_name=f9b5fa url=http://127.0.0.1:37131/
    reload_test.go:194: time=2025-11-06T15:44:01.794Z level=INFO source=watcher.go:292 msg="Replaying WAL" component=remote remote_name=f9b5fa url=http://127.0.0.1:37131/ queue=f9b5fa
    reload_test.go:194: time=2025-11-06T15:44:01.796Z level=INFO source=main.go:1550 msg="Completed loading of configuration file" db_storage=6.092µs remote_storage=9.948397ms web_handler=2.675µs query_engine=3.917µs scrape=2.495177ms scrape_sd=92.762µs notify=9.268µs notify_sd=6.813µs rules=5.43µs tracing=23.975µs filename=/tmp/TestRemoteWrite_ReshardingWithoutDeadlock4001292031/001/prometheus.yml totalDuration=13.603155ms
    reload_test.go:194: time=2025-11-06T15:44:01.796Z level=INFO source=main.go:1286 msg="Server is ready to receive web requests."
    reload_test.go:194: time=2025-11-06T15:44:01.796Z level=INFO source=manager.go:190 msg="Starting rule manager..." component="rule manager"
    reload_test.go:194: time=2025-11-06T15:44:06.957Z level=INFO source=watcher.go:538 msg="Done replaying WAL" component=remote remote_name=f9b5fa url=http://127.0.0.1:37131/ duration=5.16309351s
    main_test.go:1009: 
        	Error Trace:	/__w/prometheus/prometheus/cmd/prometheus/main_test.go:1009
        	Error:      	Condition never satisfied
        	Test:       	TestRemoteWrite_ReshardingWithoutDeadlock
FAIL
FAIL	github.com/prometheus/prometheus/cmd/prometheus	63.980s

@machine424 Seems like this is a new test added in #17412?
