Skip to content

Conversation

@avamingli
Copy link
Contributor

@avamingli avamingli commented Oct 18, 2024

We already have the ability to track the data status for some materialized views, aware whether its data is up to date or not.
And we could avoid doing the real REFRESH if the data of view is up to date.

The no-refreshed data should be the logically same as after a real REFRESH when there is no data changed since latest REFRESH command.
In that case we may save a lot (read data from view query, compute and write into view table), ex: a cron task REFRESH view takes a long time and much resource periodically or executed manually by users each time.

New GUC: gp_enable_refresh_fast_path

Set this feature default to true, but let users decide if they intend to do a real REFRESH.

Performance

If the fast path is chosen, we always return immediately and almost do nothing.
And the cost we save depends on the amount of data, the resource we use to compute and the time we read and write back to view table.

insert into t1 select i from generate_series(1, 100000000) i;
create materialized view mv2 as select * from t1 where a > 1 with no data;
refresh materialized view mv2;
REFRESH MATERIALIZED VIEW
Time: 194061.961 ms (03:14.062)

set gp_enable_refresh_fast_path = on;

refresh materialized view mv2;
REFRESH MATERIALIZED VIEW
Time: 4.617 ms

Authored-by: Zhang Mingli [email protected]

fix #ISSUE_Number


Change logs

Describe your change clearly, including what problem is being solved or what feature is being added.

If it has some breaking backward or forward compatibility, please clary.

Why are the changes needed?

Describe why the changes are necessary.

Does this PR introduce any user-facing change?

If yes, please clarify the previous behavior and the change this PR proposes.

How was this patch tested?

Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.

Contributor's Checklist

Here are some reminders and checklists before/when submitting your pull request, please check them:

  • Make sure your Pull Request has a clear title and commit message. You can take git-commit template as a reference.
  • Sign the Contributor License Agreement as prompted for your first-time contribution(One-time setup).
  • Learn the coding contribution guide, including our code conventions, workflow and more.
  • List your communication in the GitHub Issues or Discussions (if has or needed).
  • Document changes.
  • Add tests for the change
  • Pass make installcheck
  • Pass make -C src/test installcheck-cbdb-parallel
  • Feel free to request cloudberrydb/dev team for review and approval when your PR is ready🥳

We already have the ability to track the data status for
some materialized views, aware whether its data is
up to date or not.
And we could avoid doing the real REFRESH if the data of
view is up to date.

The no-refreshed data should be the logically same as
after a real REFRESH when there is no data changed since
latest REFRESH command.
In that case we may save a lot (read data from view query,
compute and write into view table), ex: a cron task REFRESH
view takes a long time and much resource periodically or
executed manually by users each time.

New GUC: gp_enable_refresh_fast_path

Set this feature default to true, but let users decide if
they intend to do a real REFRESH.

Authored-by: Zhang Mingli [email protected]
@yjhjstz
Copy link
Member

yjhjstz commented Oct 23, 2024

LGTM

@my-ship-it my-ship-it merged commit 5633fbb into apache:main Oct 23, 2024
@avamingli avamingli deleted the refresh_mv_fastpath branch October 23, 2024 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants