Skip to content

Use _counts to speed up counts #215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
simonw opened this issue Jan 2, 2021 · 9 comments
Closed

Use _counts to speed up counts #215

simonw opened this issue Jan 2, 2021 · 9 comments
Labels
enhancement New feature or request

Comments

@simonw
Copy link
Owner

simonw commented Jan 2, 2021

Utility mechanism for taking advantage of the new _counts table from #212 would be nice.

These can trigger automatically if the _counts table exists, but since sqlite-utils needs to work against any existing database there should be a way of opting out of this optimization.

@simonw simonw added the enhancement New feature or request label Jan 2, 2021
@simonw
Copy link
Owner Author

simonw commented Jan 2, 2021

Idea: a db.cached_counts() method that returns a dictionary of data from the _counts table. Call it with a list of tables to get back the counts for just those tables.

@simonw
Copy link
Owner Author

simonw commented Jan 2, 2021

Thought: maybe there should be a .reset_counts() method too, for if the table gets out of date with the triggers.

One way that could happen is if a table is dropped and recreated - the counts in the _counts table would likely no longer match the number of rows in that table.

@simonw
Copy link
Owner Author

simonw commented Jan 3, 2021

    def cached_counts(self, tables=None):
        sql = "select [table], count from {}".format(self._counts_table_name)
        if tables:
            sql += " where [table] in ({})".format(", ".join("?" for table in tables))
        return {r[0]: r[1] for r in self.execute(sql, tables).fetchall()}

@simonw
Copy link
Owner Author

simonw commented Jan 3, 2021

Here's the current .count property:

class Queryable:
def exists(self):
return False
def __init__(self, db, name):
self.db = db
self.name = name
@property
def count(self):
return self.db.execute(
"select count(*) from [{}]".format(self.name)
).fetchone()[0]

It's implemented on Queryable which means it's available on both Table and View - the optimization doesn't make sense for views.

I'm a bit cautious about making that property so much more complex. In order to decide if it should try the _counts table first it needs to know:

  • Should it be trusting the counts? I'm thinking a .should_trust_counts property on Database which defaults to True would be good - then advanced users can turn that off if they know the counts should not be trusted.
  • Does the _counts table exist?
  • Are the triggers defined?

Then it can do the query, and if the query fails it can fall back on the count(*). That's quite a lot of extra activity though.

@simonw
Copy link
Owner Author

simonw commented Jan 3, 2021

Alternative implementation: provided db.should_trust_counts is True, try running the query:

select count from _counts where [table] = ?

If the query fails to return a result OR throws an error because the table doesn't exist, run the count(*) query.

@simonw
Copy link
Owner Author

simonw commented Jan 3, 2021

Another option: on creation of the Database() object, check to see if the _counts table exists and use that as the default for a use_counts_table property. Also flip that property to True if the user calls .enable_counts() at any time.

@simonw
Copy link
Owner Author

simonw commented Jan 3, 2021

Idea: a .execute_count() method that never uses the cache.

@simonw
Copy link
Owner Author

simonw commented Jan 3, 2021

I'm having second thoughts about this being the default behaviour. It's pretty weird. I feel like HUGE databases that need this are rare, so having it on by default doesn't make sense.

@simonw
Copy link
Owner Author

simonw commented Jan 3, 2021

So if you instantiate the Database() constructor with use_counts_table=True any access to the .count properties will go through this table - otherwise regular count(*) queries will be executed.

@simonw simonw closed this as completed in 94b5023 Jan 3, 2021
simonw added a commit that referenced this issue Jan 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant