Closed
Description
The aggregations in the TUI are super helpful when creating initial deletion batches (for example, all messages from one prolific sender or email list).
However, there's no way to see what has already been deleted. Sortable columns in the TUI with "size on server" or "count on server" would be very helpful, so I could see what else should or could be deleted (a global switch, "only show what's on server", would work too).
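To make the request concrete, here is a rough sketch in plain Python of the aggregation I have in mind. The `Message` fields and the `aggregate_by_sender` function are hypothetical names for illustration, not the tool's actual schema or code; the only assumption carried over from the analytics files is that a missing deletion timestamp means the message is still on the server.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Message:
    # Hypothetical row shape, loosely mirroring the analytics columns.
    sender: str
    size_estimate: int
    deleted_from_source_at: Optional[datetime] = None


def aggregate_by_sender(messages, only_on_server=False):
    """Return (sender, totals) pairs sorted by size still on the server, largest first."""
    totals = defaultdict(
        lambda: {"count": 0, "size": 0, "count_on_server": 0, "size_on_server": 0}
    )
    for msg in messages:
        on_server = msg.deleted_from_source_at is None
        if only_on_server and not on_server:
            continue  # the "global switch": skip anything already deleted
        row = totals[msg.sender]
        row["count"] += 1
        row["size"] += msg.size_estimate
        if on_server:
            row["count_on_server"] += 1
            row["size_on_server"] += msg.size_estimate
    return sorted(
        totals.items(), key=lambda kv: kv[1]["size_on_server"], reverse=True
    )


# Made-up sample data: one message from the list has already been deleted.
messages = [
    Message("list@example.com", 5_000, datetime(2024, 1, 1)),
    Message("list@example.com", 3_000),
    Message("friend@example.com", 4_000),
]
for sender, row in aggregate_by_sender(messages):
    print(sender, row)
```

Either the extra columns or the `only_on_server` switch would cover my use case: the columns keep the full history visible, while the switch simply hides what is already gone.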
An aside: since the message parquet analytics include the timestamp deleted_from_source_at, I was able to scan the parquet files with polars and see these numbers on my own, which is pretty cool! (The parquet files are only ~22MB, so doing this with LazyFrames wasn't really needed.)
import glob

import polars as pl

ANALYTICS_PATH = "/path/to/analytics"  # placeholder: point this at your analytics directory

participants = pl.scan_parquet(f'{ANALYTICS_PATH}/participants/participants.parquet')
recipients = pl.scan_parquet(f'{ANALYTICS_PATH}/message_recipients/data.parquet')
# message files sit one directory level below messages/
messages = glob.glob(f'{ANALYTICS_PATH}/messages/*/*.parquet')
messages_df = pl.scan_parquet(messages)

out = (
    recipients
    # keep only the sender row of each message
    .filter(pl.col('recipient_type').eq('from'))
    .join(participants, left_on='participant_id', right_on='id')
    .join(messages_df, left_on='message_id', right_on='id', how='left')
    # a null deletion timestamp means the message is still on the server
    .filter(pl.col('deleted_from_source_at').is_null())
    .group_by('email_address')
    .agg(pl.col('size_estimate').sum())
    .sort('size_estimate', descending=True)
    .collect()
)