sqlite-utils analyze-tables command and table.analyze_column() method #208
…st_common as null
* Record total_rows for each column
* Record (value, count) if there is just a single distinct value
* Do not calculate most/least common if all values are distinct
* Calculate table count once per table, not once per column
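A rough sketch of how the rules in that commit message could fit together, using only the standard-library `sqlite3` module. The function names `analyze_column_sketch` and `analyze_table_sketch`, the `common_limit` parameter and the returned dictionary keys are all hypothetical, chosen for illustration; this is not the code in this PR.

```python
import sqlite3


def analyze_column_sketch(conn, table, column, total_rows, common_limit=10):
    """Hypothetical helper: summarize one column using the rules listed above."""

    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    num_null = scalar(f'select count(*) from "{table}" where "{column}" is null')
    # count(distinct ...) ignores NULL values in SQLite
    num_distinct = scalar(f'select count(distinct "{column}") from "{table}"')

    most_common = least_common = None
    if num_distinct == 1:
        # Just a single distinct value: record one (value, count) pair
        value = scalar(
            f'select "{column}" from "{table}" where "{column}" is not null limit 1'
        )
        most_common = [(value, total_rows - num_null)]
    elif num_distinct != total_rows:
        # Only worth computing when the values are not all distinct
        base = f'select "{column}", count(*) from "{table}" group by "{column}"'
        limit = int(common_limit)
        most_common = conn.execute(
            f'{base} order by count(*) desc, "{column}" limit {limit}'
        ).fetchall()
        least_common = conn.execute(
            f'{base} order by count(*), "{column}" limit {limit}'
        ).fetchall()

    return {
        "table": table,
        "column": column,
        "total_rows": total_rows,
        "num_null": num_null,
        "num_distinct": num_distinct,
        "most_common": most_common,
        "least_common": least_common,
    }


def analyze_table_sketch(conn, table):
    """Hypothetical helper: count the table once, then reuse it for every column."""
    total_rows = conn.execute(f'select count(*) from "{table}"').fetchone()[0]
    columns = [row[1] for row in conn.execute(f'pragma table_info("{table}")')]
    return {
        column: analyze_column_sketch(conn, table, column, total_rows)
        for column in columns
    }
```

Passing `total_rows` in as an argument is what keeps the table count to one query per table rather than one per column.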
Should truncate values in the least/most common JSON array to a sensible length, otherwise you end up with stuff like this:
[
[
"b'\\x00\\x05barry\\x03\\x01\\x02\\x00\\x00\\x03cat\\x03\\x01\\x03\\x00\\x00\\x03dog\\x08\\x01\\x01\\x01\\x03\\x00\\x01\\x03\\x00\\x00\\x07panther\\x05\\x01\\x01\\x02\\x02\\x00\\x01\\x03uma\\x05\\x02\\x01\\x02\\x02\\x00\\x00\\x04sara\\x05\\x02\\x01\\x01\\x02\\x00\\x00\\x05terry\\x08\\x01\\x01\\x01\\x02\\x00\\x01\\x02\\x00\\x00\\x06weasel\\x05\\x02\\x01\\x01\\x03\\x00'",
1
]
]
This example also shows that binary values (like those in |
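One possible shape for the truncation suggested above, as a minimal sketch. The helper names `truncate_value` and `truncate_common` and the `max_length` default of 80 are assumptions made for illustration, not anything defined in this PR.

```python
def truncate_value(value, max_length=80):
    """Hypothetical helper: shorten long text or binary values for display."""
    if isinstance(value, str) and len(value) > max_length:
        return value[:max_length] + "..."
    if isinstance(value, bytes) and len(value) > max_length:
        return value[:max_length] + b"..."
    # Numbers, None and short values pass through unchanged
    return value


def truncate_common(pairs, max_length=80):
    """Apply truncate_value to each (value, count) pair in a most/least common list."""
    return [(truncate_value(value, max_length), count) for value, count in pairs]
```

Only the value is shortened; the counts are left alone, so the statistics themselves are unaffected.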
CLI output looks like this at the moment, which is bad:
If there are fewer than ten values, is it worth outputting them twice, once in |
It would be neat if you could optionally specify a subset of columns to analyze, using |
Example output:
Refs #207
-c column selection option
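To illustrate how a repeatable `-c` option might narrow the set of columns to analyze, here is a small sketch. It uses `argparse` from the standard library purely to stay self-contained; the program name `analyze-sketch`, the long form `--column`, and the helpers `build_parser` and `columns_to_analyze` are hypothetical and are not the code in this PR.

```python
import argparse


def build_parser():
    """Hypothetical parser: a database path, optional tables, repeatable -c."""
    parser = argparse.ArgumentParser(prog="analyze-sketch")
    parser.add_argument("path", help="Path to the SQLite database file")
    parser.add_argument("tables", nargs="*", help="Tables to analyze (default: all)")
    parser.add_argument(
        "-c",
        "--column",
        dest="columns",
        action="append",
        default=None,
        help="Specific column to analyze; may be given more than once",
    )
    return parser


def columns_to_analyze(all_columns, selected):
    """Return the selected columns in table order, or every column if none were chosen."""
    if not selected:
        return list(all_columns)
    missing = [column for column in selected if column not in all_columns]
    if missing:
        raise ValueError("Unknown columns: " + ", ".join(missing))
    return [column for column in all_columns if column in selected]
```

For example, `build_parser().parse_args(["data.db", "dogs", "-c", "name", "-c", "age"])` would yield `columns == ["name", "age"]`, which `columns_to_analyze` then checks against the table's actual columns.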