Use _counts to speed up counts #215

simonw · 2021-01-02T22:30:17Z

Utility mechanism for taking advantage of the new _counts table from #212 would be nice.

This could kick in automatically if the _counts table exists, but since sqlite-utils needs to work against any existing database there should be a way to opt out of the optimization.

simonw · 2021-01-02T23:52:52Z

Idea: a db.cached_counts() method that returns a dictionary of data from the _counts table. Call it with a list of tables to get back the counts for just those tables.

simonw · 2021-01-02T23:58:07Z

Thought: maybe there should be a .reset_counts() method too, for when the _counts table gets out of sync with the tables it tracks.

One way that could happen is if a table is dropped and recreated - the counts in the _counts table would likely no longer match the number of rows in that table.
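A rough sketch of what such a reset could look like, using only the standard library sqlite3 module rather than sqlite-utils itself. The function name mirrors the proposed method, but the body, the _counts schema and the decision to forget tables that no longer exist are all assumptions made for illustration. The stale count is simulated by hand here, with no triggers involved.

```python
import sqlite3


def reset_counts(conn, counts_table="_counts"):
    """Recalculate every cached count from the real tables (illustrative only)."""
    tracked = [
        row[0]
        for row in conn.execute("select [table] from [{}]".format(counts_table))
    ]
    with conn:  # commit all corrections together
        for name in tracked:
            exists = conn.execute(
                "select 1 from sqlite_master where type = 'table' and name = ?",
                [name],
            ).fetchone()
            if exists is None:
                # Assumption: drop cached rows for tables that are gone.
                conn.execute(
                    "delete from [{}] where [table] = ?".format(counts_table),
                    [name],
                )
                continue
            actual = conn.execute(
                "select count(*) from [{}]".format(name)
            ).fetchone()[0]
            conn.execute(
                "update [{}] set [count] = ? where [table] = ?".format(counts_table),
                [actual, name],
            )


conn = sqlite3.connect(":memory:")
# Assumed schema for the counts table.
conn.execute("create table [_counts] ([table] text primary key, [count] integer)")
conn.execute("create table dogs (name text)")
conn.executemany("insert into dogs values (?)", [("Cleo",), ("Pancakes",)])
conn.execute("insert into [_counts] values ('dogs', 2)")

# Drop and recreate the table: the cached count of 2 is now stale.
conn.execute("drop table dogs")
conn.execute("create table dogs (name text)")
conn.execute("insert into dogs values ('Cleo')")
stale = conn.execute("select [count] from [_counts]").fetchone()[0]

reset_counts(conn)
fresh = conn.execute("select [count] from [_counts]").fetchone()[0]
```

After the reset, the cached value matches the single row in the recreated table.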

simonw · 2021-01-03T18:50:15Z

    def cached_counts(self, tables=None):
        # Return {table_name: cached_count} read from the _counts table
        sql = "select [table], count from {}".format(self._counts_table_name)
        if tables:
            # One "?" placeholder per requested table name
            sql += " where [table] in ({})".format(", ".join("?" for _ in tables))
        return {r[0]: r[1] for r in self.execute(sql, tables).fetchall()}
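To try the same query outside sqlite-utils, here is a standalone variant that takes a plain sqlite3 connection. It is only a sketch: the conn and counts_table parameters, and the schema used to create _counts, are assumptions added for the demo and are not part of the method above.

```python
import sqlite3


def cached_counts(conn, tables=None, counts_table="_counts"):
    """Standalone version of the method above (illustrative only)."""
    sql = "select [table], count from [{}]".format(counts_table)
    params = []
    if tables:
        params = list(tables)
        # One "?" placeholder per requested table name.
        sql += " where [table] in ({})".format(", ".join("?" for _ in params))
    return {row[0]: row[1] for row in conn.execute(sql, params).fetchall()}


conn = sqlite3.connect(":memory:")
# Assumed schema for the counts table.
conn.execute(
    "create table [_counts] ([table] text primary key, [count] integer default 0)"
)
conn.executemany("insert into [_counts] values (?, ?)", [("dogs", 3), ("cats", 5)])

all_counts = cached_counts(conn)
dogs_only = cached_counts(conn, ["dogs"])
```

Calling it with no arguments returns every cached count, while passing a list restricts the result to those tables.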

simonw · 2021-01-03T18:53:05Z

Here's the current .count property:

sqlite-utils/sqlite_utils/db.py

Lines 597 to 609 in 036ec6d

    class Queryable:
        def exists(self):
            return False

        def __init__(self, db, name):
            self.db = db
            self.name = name

        @property
        def count(self):
            return self.db.execute(
                "select count(*) from [{}]".format(self.name)
            ).fetchone()[0]

It's implemented on Queryable which means it's available on both Table and View - the optimization doesn't make sense for views.

I'm a bit cautious about making that property so much more complex. In order to decide if it should try the _counts table first it needs to know:

Should it be trusting the counts? I'm thinking a .should_trust_counts property on Database which defaults to True would be good - then advanced users can turn that off if they know the counts should not be trusted.
Does the _counts table exist?
Are the triggers defined?

Then it can do the query, and if the query fails it can fall back on the count(*). That's quite a lot of extra activity though.
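The second and third questions could probably be answered from sqlite_master. The sketch below uses plain sqlite3; both helper names, the _counts schema and the demo trigger are hypothetical, and the trigger check only looks for any trigger on the table whose SQL mentions the counts table, which is a loose heuristic rather than a real validation.

```python
import sqlite3


def table_exists(conn, name):
    """True if a table with this name is present in the database."""
    row = conn.execute(
        "select 1 from sqlite_master where type = 'table' and name = ?", [name]
    ).fetchone()
    return row is not None


def has_count_triggers(conn, table, counts_table="_counts"):
    """True if at least one trigger on `table` mentions the counts table."""
    row = conn.execute(
        "select count(*) from sqlite_master "
        "where type = 'trigger' and tbl_name = ? and instr(sql, ?) > 0",
        [table, counts_table],
    ).fetchone()
    return row[0] > 0


conn = sqlite3.connect(":memory:")
# Assumed schema and a hand-written insert trigger, for demonstration only.
conn.executescript(
    """
    create table [_counts] ([table] text primary key, [count] integer default 0);
    create table dogs (name text);
    create table cats (name text);
    create trigger dogs_counts_insert after insert on dogs
    begin
        insert or replace into [_counts] values (
            'dogs',
            coalesce((select [count] from [_counts] where [table] = 'dogs'), 0) + 1
        );
    end;
    """
)
conn.execute("insert into dogs values ('Cleo')")
dogs_cached = conn.execute(
    "select [count] from [_counts] where [table] = 'dogs'"
).fetchone()[0]

counts_present = table_exists(conn, "_counts")
dogs_tracked = has_count_triggers(conn, "dogs")
cats_tracked = has_count_triggers(conn, "cats")
```

Each check is an extra query against sqlite_master, which is the "extra activity" concern in concrete form.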

simonw · 2021-01-03T18:55:16Z

Alternative implementation: provided db.should_trust_counts is True, try running the query:

    select count from _counts where [table] = ?

If the query fails to return a result OR throws an error because the table doesn't exist, run the count(*) query.
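A minimal sketch of that try-then-fall-back logic, written against plain sqlite3. The count_rows function and its parameters are hypothetical names for illustration; should_trust_counts is passed as an argument here rather than read from a Database object.

```python
import sqlite3


def count_rows(conn, table, should_trust_counts=True, counts_table="_counts"):
    """Use the cached count when available, otherwise run count(*)."""
    if should_trust_counts:
        try:
            row = conn.execute(
                "select [count] from [{}] where [table] = ?".format(counts_table),
                [table],
            ).fetchone()
        except sqlite3.OperationalError:
            row = None  # e.g. "no such table: _counts"
        if row is not None:
            return row[0]
    return conn.execute("select count(*) from [{}]".format(table)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("create table dogs (name text)")
conn.executemany("insert into dogs values (?)", [("Cleo",), ("Pancakes",)])

# No _counts table yet: the error is caught and count(*) runs.
no_counts_table = count_rows(conn, "dogs")

# _counts exists but has no row for dogs: count(*) runs again.
conn.execute("create table [_counts] ([table] text primary key, [count] integer)")
no_cached_row = count_rows(conn, "dogs")

# A deliberately wrong cached value makes it visible which path was taken.
conn.execute("insert into [_counts] values ('dogs', 99)")
from_cache = count_rows(conn, "dogs")
bypassed = count_rows(conn, "dogs", should_trust_counts=False)
```

The happy path costs one indexed lookup, and the fallback costs that lookup plus the full count(*).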

simonw · 2021-01-03T18:56:06Z

Another option: on creation of the Database() object, check to see if the _counts table exists and use that as the default for a use_counts_table property. Also flip that property to True if the user calls .enable_counts() at any time.
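Roughly what that could look like, as a toy class over a plain sqlite3 connection. This Database is a hypothetical stand-in and not the sqlite-utils class: its enable_counts() only creates the table under an assumed schema and flips the flag, whereas the real method would presumably also set up the triggers.

```python
import sqlite3


class Database:
    """Hypothetical stand-in used to illustrate the default, not sqlite-utils."""

    def __init__(self, conn, counts_table="_counts"):
        self.conn = conn
        self.counts_table = counts_table
        # Default the flag to whether the counts table already exists.
        self.use_counts_table = self._counts_table_exists()

    def _counts_table_exists(self):
        row = self.conn.execute(
            "select 1 from sqlite_master where type = 'table' and name = ?",
            [self.counts_table],
        ).fetchone()
        return row is not None

    def enable_counts(self):
        # Assumption: simplified to table creation only, no triggers.
        self.conn.execute(
            "create table if not exists [{}] "
            "([table] text primary key, [count] integer default 0)".format(
                self.counts_table
            )
        )
        self.use_counts_table = True


db = Database(sqlite3.connect(":memory:"))
before = db.use_counts_table

db.enable_counts()
after = db.use_counts_table

# A second wrapper around the same connection now detects the table.
reopened = Database(db.conn).use_counts_table
```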

simonw · 2021-01-03T19:05:53Z

Idea: a .execute_count() method that never uses the cache.

simonw · 2021-01-03T19:31:33Z

I'm having second thoughts about this being the default behaviour. It's pretty weird. I feel like HUGE databases that need this are rare, so having it on by default doesn't make sense.

simonw · 2021-01-03T19:55:53Z

So if you construct Database() with use_counts_table=True, any access to the .count properties will go through this table; otherwise regular count(*) queries will be executed.
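The opt-in behaviour can be pictured with the toy classes below, built on plain sqlite3. They borrow the names use_counts_table, .count and .execute_count() from this thread, but the bodies are assumptions for illustration and not the code that shipped; the cached value is set to a deliberately wrong number so the two paths are distinguishable.

```python
import sqlite3


class Table:
    """Hypothetical stand-in for a table wrapper."""

    def __init__(self, db, name):
        self.db = db
        self.name = name

    def execute_count(self):
        # Always runs count(*), never consulting the cache.
        return self.db.conn.execute(
            "select count(*) from [{}]".format(self.name)
        ).fetchone()[0]

    @property
    def count(self):
        if self.db.use_counts_table:
            row = self.db.conn.execute(
                "select [count] from [_counts] where [table] = ?", [self.name]
            ).fetchone()
            if row is not None:
                return row[0]
        return self.execute_count()


class Database:
    """Hypothetical stand-in for the database wrapper."""

    def __init__(self, conn, use_counts_table=False):
        self.conn = conn
        self.use_counts_table = use_counts_table

    def __getitem__(self, name):
        return Table(self, name)


conn = sqlite3.connect(":memory:")
conn.execute("create table dogs (name text)")
conn.executemany("insert into dogs values (?)", [("Cleo",), ("Pancakes",)])
# Assumed schema, with a deliberately wrong cached value of 99.
conn.execute("create table [_counts] ([table] text primary key, [count] integer)")
conn.execute("insert into [_counts] values ('dogs', 99)")

default_count = Database(conn)["dogs"].count
cached_count = Database(conn, use_counts_table=True)["dogs"].count
forced_count = Database(conn, use_counts_table=True)["dogs"].execute_count()
```

With the flag left at its default the cache is ignored entirely, which matches the decision not to make this the default behaviour.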

Refs #206, #211, #212, #213, #214, #215, #216, #217, #218, #219

simonw added the enhancement label Jan 2, 2021

simonw mentioned this issue Jan 3, 2021

reset_counts() method and command #219

Closed

simonw closed this as completed in 94b5023 Jan 3, 2021

simonw added a commit that referenced this issue Jan 3, 2021

Release 3.2

4cc82fd

Refs #206, #211, #212, #213, #214, #215, #216, #217, #218, #219
