Version and revision: V1.0
For: NethServer Administrators
Skill: Beginner

Published: 2021-02-21
Review: tbd

Contact: @capote

Manual training of the bayes filter (rspamd)

For various reasons it may be necessary to manually train the bayes filter of rspamd:

  • the database is corrupt
  • a freshly installed system should learn quickly, to shorten the learning phase
  • the database has to be reinitialized because users have made too many classification mistakes

This user guide describes how to manually train the bayes filter using collected sample spam mails.

System preparation

  • You need SSH access to your system, or you can use the “Terminal” web app in Cockpit.
  • You need a program that can unpack 7z files.
  • Install the unpack program with:
     ~# yum install p7zip

This installs the 7zip program. Note that the command to call the utility is not 7zip or p7zip, but 7za. See the following articles to get started with 7zip:

Extract .7z File in Linux
Create .7z File in Linux
Create .7z File From Folder Recursively in Linux
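
Because the package name (p7zip) and the command name (7za) differ, it can help to confirm that the command is really on the PATH before continuing. The following is only a minimal sketch; the function name require_cmd is made up for illustration and is not part of p7zip or NethServer.

```shell
# require_cmd is a hypothetical helper (assumption, not an existing tool).
# It prints the full path of a command, or a hint on stderr if it is missing.
require_cmd() {
    cmd=$1
    hint=$2
    if command -v "$cmd" >/dev/null 2>&1; then
        command -v "$cmd"
    else
        echo "$cmd not found; try: $hint" >&2
        return 1
    fi
}

# Usage on the server (kept as a comment so nothing runs on paste):
#   require_cmd 7za "yum install p7zip"
```

If the function prints a path, 7za is installed; otherwise it prints the install hint and returns a non-zero status.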

Use Case 1: manual training for a freshly installed system

  • log in to your system
  • download spam samples from http://untroubled.org/spam/
  • Example:
    ~# wget http://untroubled.org/spam/2021-01.7z
  • unpack samples:
    ~# 7za x 2021-01.7z
  • check the current number of learned samples:
    ~# rspamc stat
  • note the value shown in the line total learns (0 on a fresh system)
  • train the filter with the downloaded samples:
    ~# rspamc learn_spam 2021/*
  • check the number of learned samples again and compare it with the value you noted: total learns should have increased
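
The before-and-after comparison from the steps above can also be scripted. The sketch below assumes that the output of rspamc stat contains a line with the words "total learns" followed by a number, as described above; the match is case-insensitive because the exact capitalisation is not confirmed here. The helper names total_learns and train_spam are hypothetical.

```shell
# total_learns (hypothetical helper): read `rspamc stat` output on stdin and
# print the number from the first line that contains "total learns".
total_learns() {
    grep -i 'total learns' | head -n 1 | sed 's/[^0-9]*\([0-9][0-9]*\).*/\1/'
}

# train_spam (hypothetical helper): learn spam from the given files or
# directories and print the learn counter before and after.
train_spam() {
    before=$(rspamc stat | total_learns)
    rspamc learn_spam "$@"
    after=$(rspamc stat | total_learns)
    echo "total learns: $before -> $after"
}

# Usage on the server (kept as a comment so nothing runs on paste):
#   train_spam 2021/*
```

If both numbers are identical after the run, the samples were probably not learned and the output of rspamc learn_spam should be checked for errors.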

Use Case 2: manual training after resetting the database

  • back up the rspamd database (it is better to stop Redis before you copy the file; replace yymmdd with the current date):
    ~# cp /var/lib/redis/rspamd/dump.rdb /var/lib/redis/rspamd/dump.rdb_bak_yymmdd
  • reset the bayes data (Source: Wiki); the patterns are quoted so that the shell does not expand them:
    ~# redis-cli -s /var/run/redis-rspamd/rspamd --scan --pattern 'BAYES_*' | xargs redis-cli -s /var/run/redis-rspamd/rspamd del
    ~# redis-cli -s /var/run/redis-rspamd/rspamd --scan --pattern 'RS*' | xargs redis-cli -s /var/run/redis-rspamd/rspamd del
  • train the bayes filter as in Use Case 1
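
The backup step above can be wrapped so that the date suffix is filled in automatically and an existing backup is never overwritten. This is a minimal sketch only; backup_name and backup_dump are hypothetical helper names, and the sketch does not stop Redis for you.

```shell
# backup_name (hypothetical helper): print the backup file name for a given
# file, using today's date in yymmdd form as the suffix.
backup_name() {
    printf '%s_bak_%s\n' "$1" "$(date +%y%m%d)"
}

# backup_dump (hypothetical helper): copy the file to its backup name and
# print the new path; refuse to overwrite a backup that already exists.
backup_dump() {
    src=$1
    dst=$(backup_name "$src")
    if [ -e "$dst" ]; then
        echo "backup already exists: $dst" >&2
        return 1
    fi
    cp -p "$src" "$dst" && echo "$dst"
}

# Usage on the server (kept as a comment so nothing runs on paste):
#   backup_dump /var/lib/redis/rspamd/dump.rdb
```

Refusing to overwrite is a deliberate choice: if the reset is repeated on the same day, the first backup, which holds the original data, stays untouched.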

Useful commands for deeper investigation

Sometimes more in-depth information is helpful, especially when support is needed.

  • current state of the services
~# systemctl status redis redis-rspamd
● redis.service - Redis persistent key-value database
   Loaded: loaded (/usr/lib/systemd/system/redis.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/redis.service.d
           └─limit.conf
   Active: active (running) since Mon 2021-02-22 18:20:20 CET; 9s ago
 Main PID: 31539 (redis-server)
   CGroup: /system.slice/redis.service
           └─31539 /usr/bin/redis-server 127.0.0.1:6379

Feb 22 18:20:20 ns-srv01.dargels.de systemd[1]: Starting Redis persistent key-value database...
Feb 22 18:20:20 ns-srv01.dargels.de systemd[1]: Started Redis persistent key-value database.

● redis-rspamd.service - Redis persistent key-value database Rspamd
   Loaded: loaded (/usr/lib/systemd/system/redis-rspamd.service; static; vendor preset: disabled)
   Active: active (running) since Mon 2021-02-22 12:30:09 CET; 5h 50min ago
 Main PID: 737 (redis-server)
   CGroup: /system.slice/redis-rspamd.service
           └─737 /usr/bin/redis-server 127.0.0.1:0

Feb 22 12:30:09 ns-srv01.dargels.de systemd[1]: Started Redis persistent key-value database Rspamd.
  • example of an inactive service
~# systemctl status redis
● redis.service - Redis persistent key-value database
   Loaded: loaded (/usr/lib/systemd/system/redis.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/redis.service.d
           └─limit.conf
   Active: inactive (dead)
  • check the functionality of the rspamd Redis database
~# redis-cli -s /var/run/redis-rspamd/rspamd --scan --pattern 'BAYES_*'
BAYES_SPAM_keys
BAYES_HAM_keys
  • monitor the rspamd Redis database (stop with Ctrl+C)
~# redis-cli -s /var/run/redis-rspamd/rspamd monitor
OK
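
For a support request, the Active: line is usually the most relevant part of the status output shown above. The following sketch extracts it; active_state is a hypothetical helper name, and the parsing assumes the output layout shown in the examples above.

```shell
# active_state (hypothetical helper): read `systemctl status` output on stdin
# and print the state from the first "Active:" line, e.g. "active (running)".
active_state() {
    sed -n 's/^ *Active: *\([a-z-]*\) *(\([a-z-]*\)).*/\1 (\2)/p' | head -n 1
}

# Usage on the server (kept as a comment so nothing runs on paste):
#   for svc in redis redis-rspamd; do
#       printf '%s: %s\n' "$svc" "$(systemctl status "$svc" | active_state)"
#   done
```

The result is one short line per service, which is easier to paste into a forum post than the full status output.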

bibliography

  • userguide/manual_training_of_the_bayes_filter_rspamd.txt
  • Last modified: 2021/02/22 17:48
  • by Marko Dargel