Skip to content

Module initialization order seems to be wrong for querier #3550

@alvinlin123

Description

@alvinlin123

Describe the bug
We say queries failed with query-frontend returning message {"status":"error","errorType":"internal","error":"BlocksStoreQueryable is not running: Starting"}

This happens around the time we deploy querier. We are using Cortex at this commit: beddbff

Investigation

Looking at the code, it seems like querier is taking work from query-frontend before its self is fully initialized.

In querier's log, we see the following initialization log entry:

level=debug ts=2020-12-01T07:47:53.895516151Z caller=module_service.go:48 msg="module waiting for initialization" module=distributor-service waiting_for=ring
level=info ts=2020-12-01T07:47:53.895542059Z caller=module_service.go:58 msg=initialising module=runtime-config
level=info ts=2020-12-01T07:47:53.895605222Z caller=module_service.go:58 msg=initialising module=memberlist-kv
level=info ts=2020-12-01T07:47:53.89569197Z caller=module_service.go:58 msg=initialising module=server
level=debug ts=2020-12-01T07:47:53.895713041Z caller=module_service.go:48 msg="module waiting for initialization" module=store-queryable waiting_for=memberlist-kv
level=info ts=2020-12-01T07:47:53.895744095Z caller=module_service.go:58 msg=initialising module=store-queryable
level=debug ts=2020-12-01T07:47:53.895760711Z caller=module_service.go:48 msg="module waiting for initialization" module=ring waiting_for=runtime-config
level=debug ts=2020-12-01T07:47:53.895773926Z caller=module_service.go:48 msg="module waiting for initialization" module=ring waiting_for=memberlist-kv
level=info ts=2020-12-01T07:47:53.895784002Z caller=module_service.go:58 msg=initialising module=ring
level=info ts=2020-12-01T07:47:53.895778955Z caller=module_service.go:58 msg=initialising module=querier
level=info ts=2020-12-01T07:47:53.895947084Z caller=module_service.go:58 msg=initialising module=distributor-service
ts=2020-12-01T07:47:53.902402602Z caller=memberlist_logger.go:74 level=debug msg="Initiating push/pull sync with:  10.0.53.204:7946"

Compare to earlier version of Cortex (revision bbf01e830) the initialization log looks like

level=debug ts=2020-10-06T19:36:09.645276963Z caller=module_service.go:48 msg=module waiting for initialization module=querier waiting_for=memberlist-kv
level=info ts=2020-10-06T19:36:09.645303929Z caller=module_service.go:58 msg=initialising module=querier
level=info ts=2020-10-06T19:36:09.64380678Z caller=module_service.go:58 msg=initialising module=distributor-service
level=debug ts=2020-10-06T19:36:09.643885063Z caller=module_service.go:48 msg=module waiting for initialization module=querier waiting_for=ring
level=debug ts=2020-10-06T19:36:09.643910866Z caller=module_service.go:48 msg=module waiting for initialization module=querier waiting_for=store-queryable
level=info ts=2020-10-06T19:36:09.643412491Z caller=module_service.go:58 msg=initialising module=server
level=info ts=2020-10-06T19:36:09.643473123Z caller=module_service.go:58 msg=initialising module=runtime-config
level=debug ts=2020-10-06T19:36:09.643520867Z caller=module_service.go:48 msg=module waiting for initialization module=distributor-service waiting_for=ring
level=debug ts=2020-10-06T19:36:09.643542333Z caller=module_service.go:48 msg=module waiting for initialization module=store-queryable waiting_for=memberlist-kv
level=debug ts=2020-10-06T19:36:09.643560481Z caller=module_service.go:48 msg=module waiting for initialization module=querier waiting_for=distributor-service
level=info ts=2020-10-06T19:36:09.643576371Z caller=module_service.go:58 msg=initialising module=memberlist-kv
level=info ts=2020-10-06T19:36:09.643638605Z caller=module_service.go:58 msg=initialising module=store-queryable
level=debug ts=2020-10-06T19:36:09.643625475Z caller=module_service.go:48 msg=module waiting for initialization module=ring waiting_for=runtime-config
level=debug ts=2020-10-06T19:36:09.643676698Z caller=module_service.go:48 msg=module waiting for initialization module=ring waiting_for=memberlist-kv
level=info ts=2020-10-06T19:36:09.64368781Z caller=module_service.go:58 msg=initialising module=ring

Notice the log of earlier version of Cortex contains msg=module waiting for initialization module=querier waiting_for=store-queryable, but newer version of cortex does not. It looks like querier started to perform work without waiting for store-queryable to finish initialization?

Any idea what is going on?

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions