Skip to content

Rspamd: more features#3159

Merged
georglauterbach merged 9 commits intomasterfrom
rspamd/more-features
Mar 18, 2023
Merged

Rspamd: more features#3159
georglauterbach merged 9 commits intomasterfrom
rspamd/more-features

Conversation

@georglauterbach
Copy link
Copy Markdown
Member

@georglauterbach georglauterbach commented Mar 5, 2023

Description

Add new features to our Rspamd integration:

  1. With MOVE_SPAM_TO_JUNK=1, we will now move messages marked with the X-Spam: Yes header to the "Junk" folder
  2. With RSPAMD_LEARN=1, we enable
    1. auto-learning for the Bayes filter
    2. when a user moves messages from or to the "Junk" folder, Rspamd will learn them as ham or spam (for the Bayes filter as well)

I tried to separate commits as cleanly as possible to make reviews easier.

Closes #3145
Closes #3156

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (README.md or the documentation under docs/)
  • If necessary I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@georglauterbach georglauterbach added kind/new feature A new feature is requested in this issue or implemeted with this PR service/security/rspamd labels Mar 5, 2023
@georglauterbach georglauterbach added this to the v12.1.0 milestone Mar 5, 2023
@georglauterbach georglauterbach self-assigned this Mar 5, 2023
@georglauterbach georglauterbach added the meta/feature freeze On hold due to upcoming release process label Mar 5, 2023
@georglauterbach georglauterbach marked this pull request as draft March 5, 2023 21:25
@georglauterbach
Copy link
Copy Markdown
Member Author

@hanscees @alexanderadam you could really help me here: do you know how to trigger Sieve locally by moving a mail into the "Junk" folder with a command executed inside the container? I want to complete the last test for RSPAMD_LEARN.

@alexanderadam
Copy link
Copy Markdown

do you know how to trigger Sieve locally by moving a mail into the "Junk" folder with a command executed inside the container?

Sadly I don't 😬
But I would guess copying a SPAM mail into the new folder maybe?
Sorry, but I'm not very good with this email infrastructure thing.

@hanscees
Copy link
Copy Markdown
Contributor

hanscees commented Mar 6, 2023

I have to think about that. Will come back to it this evening.
You basically have to log in as imap user. Then do a move from folder a to folder b.

@georglauterbach
Copy link
Copy Markdown
Member Author

I have to think about that. Will come back to it this evening.

Thanks!

You basically have to log in as imap user. Then do a move from folder a to folder b.

Indeed. I don't know how to do that actually, I have never done that before.

@georglauterbach
Copy link
Copy Markdown
Member Author

@polarathene @casperklein would you be fine if this PR got included into v12.0.0? I marked it for v12.1.0, but if you're okay with it, I'd take care of this ASAP so we can merge this for v12.0.0.

@casperklein
Copy link
Copy Markdown
Member

@polarathene @casperklein would you be fine if this PR got included into v12.0.0? I marked it for v12.1.0, but if you're okay with it, I'd take care of this ASAP so we can merge this for v12.0.0.

Why shouldn't it be fine to include it in v12? I see no reason against it.

Comment thread target/scripts/startup/setup.d/security/rspamd.sh Outdated
Comment thread target/scripts/startup/setup.d/security/rspamd.sh Outdated
@hanscees
Copy link
Copy Markdown
Contributor

hanscees commented Mar 6, 2023

I have to think about that. Will come back to it this evening.

Thanks!

You basically have to log in as imap user. Then do a move from folder a to folder b.

Indeed. I don't know how to do that actually, I have never done that before.

Hi,
I am not familiar with dockers testing, but I assume you can spin up an image and run commands in it.

A test that simulates an email copy from commandline to dovecot is possible, but wouldnt recommend it since its very complex. However, the notes to do copy's with imap are here: see https://www.atmail.com/blog/advanced-imap/
You would have to make a user. Send email in a folder and then move it. Then look in the logging of dovecot.
And you would need some sort of rather large expect script. Perhaps these do exists, don't know.

Perhaps more manageable is to run and parse the command doveconf and look for the plugin section and check a couple of things :

doveconf
usable things 
plugin {
  imapsieve_mailbox1_before = file:/etc/dovecot/sieve/learn-spam.sieve
  imapsieve_mailbox1_causes = COPY
  imapsieve_mailbox1_name = INBOX.Junk
  imapsieve_mailbox2_before = file:/etc/dovecot/sieve/learn-ham.sieve
  imapsieve_mailbox2_causes = COPY
  imapsieve_mailbox2_from = INBOX.Junk
  imapsieve_mailbox2_name = *
  sieve = ~/.dovecot.sieve
  sieve_after = /usr/lib/dovecot/sieve-global/after/
  sieve_before = /usr/lib/dovecot/sieve-global/before/
  sieve_dir = ~/sieve
  sieve_extensions = +notify +imapflags +vnd.dovecot.pipe +vnd.dovecot.filter
  sieve_filter_bin_dir = /usr/lib/dovecot/sieve-filter
  sieve_global_extensions = +vnd.dovecot.pipe
  sieve_pipe_bin_dir = /etc/dovecot/sieve
  sieve_plugins = sieve_imapsieve sieve_extprograms
  }

By the way a similar command for rspamd is rspamd configdump

Does this help a bit?

@georglauterbach
Copy link
Copy Markdown
Member Author

I am not familiar with dockers testing, but I assume you can spin up an image and run commands in it.

That's what we do :)

the notes to do copy's with imap are here: see https://www.atmail.com/blog/advanced-imap/

Thanks for the link, will have a look.

You would have to make a user. Send email in a folder and then move it. Then look in the logging of dovecot.

The Rspamd test sends 4 emails actually: 1 normal, 1 Gtube rejection, one virus rejection, and one that Rspamds adds a header to.

The part that moves the mail does not trigger Sieve for me though.

And you would need some sort of rather large expect script. Perhaps these do exists, don't know.

Don't worry, our test suite (BATS) plus our helpers are rather comprehensive.

Perhaps more manageable is to run and parse the command doveconf and look for the plugin section and check a couple of things :

Test is already doing that.

Does this help a bit?

To will have a look at the link you provided. The rest is already all implemented :)

@georglauterbach
Copy link
Copy Markdown
Member Author

Why shouldn't it be fine to include it in v12? I see no reason against it.

👍🏼

@georglauterbach georglauterbach modified the milestones: v12.1.0, v12.0.0 Mar 6, 2023
@georglauterbach georglauterbach removed the meta/feature freeze On hold due to upcoming release process label Mar 6, 2023
@georglauterbach
Copy link
Copy Markdown
Member Author

@casperklein @polarathene you can already go ahead and review changes. The only additions I will try to implement is for the last test to be more comprehensive.

@georglauterbach georglauterbach marked this pull request as ready for review March 6, 2023 22:33
@polarathene
Copy link
Copy Markdown
Member

I marked it for v12.1.0, but if you're okay with it, I'd take care of this ASAP so we can merge this for v12.0.0.

This is probably going to be low on my priority to get to. Or if @casperklein reviews before me and approves, that's fine to merge without waiting on my review input 👍

I am probably going to need a day or so before I could make time for it so by all means if that suits you go for it.

@casperklein
Copy link
Copy Markdown
Member

casperklein commented Mar 6, 2023

Generally, "move_spam_to_junk" isn't a spamassassin/rspamd specific thing.

I think there is no need for a separate RSPAMD_MOVE_SPAM_TO_JUNK ENV / function.

# sieve will move spams to .Junk folder when SPAMASSASSIN_SPAM_TO_INBOX=1 and MOVE_SPAM_TO_JUNK=1
if [[ ${SPAMASSASSIN_SPAM_TO_INBOX} -eq 1 ]] && [[ ${MOVE_SPAM_TO_JUNK} -eq 1 ]]
then
_log 'debug' 'Spam messages will be moved to the Junk folder'
cp /etc/dovecot/sieve/before/60-spam.sieve /usr/lib/dovecot/sieve-global/before/
sievec /usr/lib/dovecot/sieve-global/before/60-spam.sieve
else
rm -f /usr/lib/dovecot/sieve-global/before/60-spam.sieve /usr/lib/dovecot/sieve-global/before/60-spam.svbin
fi

Just remove the ${SPAMASSASSIN_SPAM_TO_INBOX} -eq 1 condition and MOVE_SPAM_TO_JUNK can be used universally.

PS: Your (rspamd) sieve solution

require ["fileinto","mailbox"];
if header :contains "X-Spam" "Yes" {
    fileinto :create "Junk";
}

looks a bit more advanced, than what currently is used:

#target/dovecot/sieve/before/60-spam.sieve:

require "fileinto";
if header :contains "X-Spam-Flag" "YES" {
    fileinto "Junk";
}

I think this can be updated safely 👍

@georglauterbach
Copy link
Copy Markdown
Member Author

@casperklein you are right! I went ahead and implemented your approach, then rewrote history to have cleanly separated commits again; suggestions are incorporated :)

I will force-push my changes now.

@georglauterbach
Copy link
Copy Markdown
Member Author

FYI: I'd like to merge this at the very latest on Friday, 12 o'clock UTC+1 because thereafter I will be unavailable to merge/adjust for 2 weeks straight.

@polarathene
Copy link
Copy Markdown
Member

Friday, 12 o'clock UTC+1

12 hours to go 😅

I'll try to review before then, but if another maintainer approves that's fine by me too for merging.

polarathene
polarathene previously approved these changes Mar 10, 2023
Copy link
Copy Markdown
Member

@polarathene polarathene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Light review as short on time and Dovecot sieve + rspamd aren't anything I am too knowledgeable about.

I've left some notes, but if they're not a concern in your opinion that's fine, go ahead and merge 👍 Happy vacation! 😀

Comment thread target/scripts/startup/setup.d/dovecot.sh
Comment thread target/scripts/startup/setup.d/dovecot.sh Outdated
Comment thread target/scripts/startup/setup.d/security/misc.sh Outdated
Comment thread target/scripts/startup/setup.d/security/misc.sh Outdated
Comment thread target/scripts/startup/setup.d/dovecot.sh
@casperklein
Copy link
Copy Markdown
Member

FYI: I'd like to merge this at the very latest on Friday, 12 o'clock UTC+1 because thereafter I will be unavailable to merge/adjust for 2 weeks straight.

I wasn't able to review yesterday :-\ Tonight/tomorrow I can review.

@georglauterbach
Copy link
Copy Markdown
Member Author

georglauterbach commented Mar 10, 2023

I wasn't able to review yesterday :-\ Tonight/tomorrow I can review.

I won't be able to properly respond to feedback then; I got about 3 hours left for that, as I said.

@georglauterbach
Copy link
Copy Markdown
Member Author

georglauterbach commented Mar 10, 2023

I was able to feedback from @polarathene now; I hope you like the PR @casperklein :)

Happy vacation! 😀

Thank's a lot! :D

Copy link
Copy Markdown
Member

@polarathene polarathene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we merge this after your feedback @casperklein as @georglauterbach is no longer able to respond for 2 weeks.

Feel free to merge after your review or make any changes 👍

Comment on lines +273 to +280
cat >/usr/lib/dovecot/sieve-global/after/spam_to_junk.sieve << EOF
require ["fileinto","mailbox"];

if anyof (header :contains "X-Spam-Flag" "YES",
header :contains "X-Spam" "Yes") {
fileinto "Junk";
}
EOF
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly not a fan of EOF like this vs separate files, but I've said this before with cron jobs and looked the other way in the PR where this was also done 😅

Separate config files have better observability in my opinion.

  • Far more visible in the file tree as distinct content from scripts. Also beneficial for tasks like grep on content that isn't .sh when searching / filtering the entire project for something, less noise to wade through.
  • git blame far more useful as they don't tend to go through as much churn as scripts have been (each year it seems they get refactored heavily).

I don't think there's much gain from inlining content like this, but I know it's preferred by you and casper 😅

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you're not a fan, but from my perspective, the gains (reduced complexity in the file tree, Dockerfile and removal of the uncanny rm in the else branch) outweigh the disadvantages in this case for such a small file :)

With larger files, I agree with you!

Copy link
Copy Markdown
Member

@casperklein casperklein Mar 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's much gain from inlining content like this, but I know it's preferred by you and casper 😅

It depends on 😆 For static content, I generally prefer plain files. For small snippets (with variables), a heredoc may better suited.
If there is consent, we can in a separate PR move some (not only rspam related) heredocs to files.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is consent, we can in a separate PR move some (not only rspam related) heredocs to files.

I can live with it 😅

There's some merit for the co-location to that extent perhaps, but those less familiar where everything is and wanting to contribute or troubleshoot something may first expect files in the container that we add to exist as files in our project, rather than embedded into scripts 🤷‍♂️ (I know this was a bit of a pain-point for me with the cron jobs, which I think existed in several locations in the container, some from packages beyond what DMS added)

It probably won't make much of a difference to getting new maintainers onboard though, so if it's easier for those actively maintaining the project to leverage heredocs, go for it 😎

@georglauterbach
Copy link
Copy Markdown
Member Author

I suppose we merge this after your feedback @casperklein as @georglauterbach is no longer able to respond for 2 weeks.

Very quick feedback I can give, will be on GH for 1 minute a day :) Flight hasn't yet started :D I just can't edit or adjust anything

Feel free to merge after your review or make any changes 👍

Yes - go ahead! :)

@georglauterbach
Copy link
Copy Markdown
Member Author

FYI: I'd like to merge this at the very latest on Friday, 12 o'clock UTC+1 because thereafter I will be unavailable to merge/adjust for 2 weeks straight.

I wasn't able to review yesterday :-\ Tonight/tomorrow I can review.

ping :)

@github-actions
Copy link
Copy Markdown
Contributor

Documentation preview for this PR is ready! 🎉

Built with commit: 3ab664b

@casperklein
Copy link
Copy Markdown
Member

I wasn't able to review yesterday :-\ Tonight/tomorrow I can review.

ping :)

  1. Enjoy your vacation and don't hang around at github 😝
  2. Sorry for the delay again, I am currently out of order.. 🤒

Copy link
Copy Markdown
Member

@casperklein casperklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally 🎉

Comment on lines +273 to +280
cat >/usr/lib/dovecot/sieve-global/after/spam_to_junk.sieve << EOF
require ["fileinto","mailbox"];

if anyof (header :contains "X-Spam-Flag" "YES",
header :contains "X-Spam" "Yes") {
fileinto "Junk";
}
EOF
Copy link
Copy Markdown
Member

@casperklein casperklein Mar 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's much gain from inlining content like this, but I know it's preferred by you and casper 😅

It depends on 😆 For static content, I generally prefer plain files. For small snippets (with variables), a heredoc may better suited.
If there is consent, we can in a separate PR move some (not only rspam related) heredocs to files.

@georglauterbach georglauterbach merged commit e58dd1b into master Mar 18, 2023
@georglauterbach georglauterbach deleted the rspamd/more-features branch March 18, 2023 15:32
@HeySora
Copy link
Copy Markdown
Contributor

HeySora commented May 1, 2023

@georglauterbach Commit b918de1 introduced a breaking change:

The sieve rules are now global "after" rules; this makes more sense than "before" rules..

Having the rules as "before" rules allowed users to manually "whitelist" a sender whose e-mails could be wrongly flagged as spam (with a fileinto to the inbox), since the user rules were parsed after.
With this change, this is not possible anymore, as this sieve rule will now override any user filter, thus making user whitelists impossible.

May I ask what was the reasoning behind this change? I'm asking because I might be missing the point. Thanks in advance!

@georglauterbach
Copy link
Copy Markdown
Member Author

May I ask what was the reasoning behind this change? I'm asking because I might be missing the point. Thanks in advance!

Truth be told, I'm just not a Sieve expert 😅 I didn't know that this could be breaking.

Having the rules as "before" rules allowed users to manually "whitelist" a sender whose e-mails could be wrongly flagged as spam (with a fileinto to the inbox), since the user rules were parsed after.

With this change, this is not possible anymore, as this sieve rule will now override any user filter, thus making user whitelists impossible.

So if I understand you correctly, it's not really about the before/after, but about the global/user order? I will look into this again; if you could help me a bit and provide information about your setup so I can see where your configuration files are, that'd help a lot! Thanks :)

@HeySora
Copy link
Copy Markdown
Contributor

HeySora commented May 2, 2023

Truth be told, I'm just not a Sieve expert 😅 I didn't know that this could be breaking.

It's alright 😄 There are so many things that can go wrong, and collaborating helps figuring all of this out!

So if I understand you correctly, it's not really about the before/after, but about the global/user order?

This docker image exposes ManageSieve on the port 4190, which is used to let each user set up their own filters.
The order of Sieve filters are: global "before", then user, then global "after".

I manage a mailserver where a few users have genuine e-mails getting wrongly caught by SpamAssassin. As a result, some users have custom rules where if an e-mail comes from a trusted e-mail address, to fileinto "INBOX"; so it doesn't get to spam.
As far as I'm aware, this is the only way users can "whitelist" specific e-mail senders without the help of the administrator.

I'm not a Sieve expert either, but from my understanding, the last fileinto directive is the one that takes predecence over all the others. Your commit moved the fileinto "Junk"; directive into sieve-global/after/, which will override any user filter, thus preventing people from doing that "whitelisting" again.

User filters are located in docker-data/dms/mail-data/DOMAIN/USERNAME/sieve/*.sieve, and are documented here, on the official documentation of this project.

Hopefully that makes sense!


In regards to my specific setup, I'm providing SnappyMail (active fork of Rainloop) as our webmail, which lets users interface with ManageSieve and make their own filters.

One user in particular has filters like this in place:

if header :contains ["From"] "twitch"
{
    fileinto "INBOX.Twitch";
}

if header :contains ["From"] "bandcamp"
{
    fileinto "INBOX.Bandcamp";
}

However, with your global "after" filter, spam mails would still be filed into the "Junk" folder, with no way of overriding it.

@georglauterbach
Copy link
Copy Markdown
Member Author

I genuinly didn't think the last fileinto directive is the one that is used... now all makes sense. I can try to fix this up and provide a PR.

Thanks a lot for the explanation!

@HeySora
Copy link
Copy Markdown
Contributor

HeySora commented May 2, 2023

.....Actually I think I just got hit with the syndrome of "asking for help / telling someone something, to then find an answer by yourself" 😅 Apologies!

It might be worth trying to keep the Junk filter as an "after" filter just like you did, and attempt using stop; from user filters to see if that would work? That might actually be the right call here, instead of what the particular user I mentioned was doing.

I'm just not sure if a stop; only halts the execution of the current (here, user) script, or if it halts everything, including "after" filters.

I'll go test this ASAP

@georglauterbach
Copy link
Copy Markdown
Member Author

georglauterbach commented May 2, 2023

Haha, I know this "syndrom" very well :D

What I found about your question: https://dovecot.org/list/dovecot/2018-March/111384.html so it seems to be possible.

I am uncertain which route to go down. Your idea using stop; sounds reasonable to me though.

I will adjust the release notes though to make users aware of this unintended change. It is, after all, somehow breaking...

@georglauterbach
Copy link
Copy Markdown
Member Author

Please reach back to me whether you think the stop; is the right approach or whether we should return to the old way. I do prefer the stop; (disabling implicit keep) though, as it seems to be the cleaner solution to me.

@HeySora
Copy link
Copy Markdown
Contributor

HeySora commented May 11, 2023

Please reach back to me whether you think the stop; is the right approach or whether we should return to the old way. I do prefer the stop; (disabling implicit keep) though, as it seems to be the cleaner solution to me.

Hi @georglauterbach, I can now confirm after a while that this has been working flawlessly! 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/features kind/new feature A new feature is requested in this issue or implemeted with this PR service/security/rspamd

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[FR] Rspamd: make it possible to move e-mails to Junk automatically [FR] Rspamd: learn from moving from/to junk folder

6 participants