Feature request: TUI show sizes with/without what was deleted. #99

@astrowonk

The aggregations in the TUI are super helpful when creating initial deletion batches (for example, all messages from one prolific sender or email list).

However, there's no way to see what has already been deleted. Sortable "size on server" and "count on server" columns in the TUI would be very helpful, so I could see what else could or should be deleted. A global "only show what's on the server" switch would work too.
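To make the request concrete, here is a minimal sketch (not the project's code) of the per-sender numbers such columns could show. `Message`, `SenderRow`, `aggregate_by_sender`, and the `only_on_server` flag are hypothetical names; the only thing borrowed from the analytics data is the assumption that a null `deleted_from_source_at` means the message is still on the server.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Message:
    # Hypothetical record mirroring the analytics columns used in the query below.
    sender: str
    size_estimate: int
    deleted_from_source_at: Optional[datetime] = None


@dataclass
class SenderRow:
    sender: str
    total_count: int = 0
    total_size: int = 0
    server_count: int = 0
    server_size: int = 0


def aggregate_by_sender(
    messages: list[Message],
    sort_by: str = "server_size",
    only_on_server: bool = False,
) -> list[SenderRow]:
    """Per-sender count and size, both overall and for what is still on the server.

    only_on_server mimics a global "only show what's on the server" switch by
    dropping senders that have nothing left on the server.
    """
    rows: dict[str, SenderRow] = {}
    for msg in messages:
        row = rows.setdefault(msg.sender, SenderRow(msg.sender))
        row.total_count += 1
        row.total_size += msg.size_estimate
        # A null deletion timestamp means the message has not been deleted yet.
        if msg.deleted_from_source_at is None:
            row.server_count += 1
            row.server_size += msg.size_estimate
    result = list(rows.values())
    if only_on_server:
        result = [r for r in result if r.server_count > 0]
    # "Sortable columns": sort descending by whichever field was selected.
    result.sort(key=lambda r: getattr(r, sort_by), reverse=True)
    return result
```

Keeping the total and on-server numbers side by side (rather than only filtering) is what would let a sender whose mail is mostly deleted sink down the list while still showing how much was there originally.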

An aside: since the messages analytics Parquet files include a deleted_from_source_at timestamp, I was able to scan them with Polars and compute these numbers myself, which is pretty cool! (The Parquet files are only ~22 MB, so using lazy frames wasn't really necessary.)

import glob

import polars as pl

# Placeholder: set this to the directory that holds the analytics Parquet files.
ANALYTICS_PATH = "path/to/analytics"

participants = pl.scan_parquet(f'{ANALYTICS_PATH}/participants/participants.parquet')
recipients = pl.scan_parquet(f'{ANALYTICS_PATH}/message_recipients/data.parquet')
messages = glob.glob(f'{ANALYTICS_PATH}/messages/*/*.parquet')
messages_df = pl.scan_parquet(messages)

# Per-sender total size of messages that have not been deleted from the server yet.
out = (
    recipients
    .filter(pl.col('recipient_type').eq('from'))  # keep only each message's sender row
    .join(participants, left_on='participant_id', right_on='id')
    .join(messages_df, left_on='message_id', right_on='id', how='left')
    .filter(pl.col('deleted_from_source_at').is_null())  # null timestamp = still on server
    .group_by('email_address')
    .agg(pl.col('size_estimate').sum())
    .sort('size_estimate', descending=True)
    .collect()
)
