iis-MarkKuang/reddit_mod_tool

Raid Shield

A Devvit app that detects coordinated raids and brigades on a subreddit using only local behavioral signals, escalates through a graduated 4-stage response ladder, and surfaces everything through a mod dashboard. Dry-run by default. Mods opt into enforcement one stage at a time.

Built for the Reddit Mod Tools and Migrated Apps Hackathon (Best New Mod Tool track).

Why now: In March 2026 Reddit disabled the auto-ban / "guilt-by-association" features in SaferBot and Hive-Protect, removing a key brigading defense for many large subs. Raid Shield rebuilds that protection using purely local signals on your own subreddit, so it sits on the right side of the new policy line.


What it does

Raid Shield runs four detectors over a rolling 5-minute window of activity on your subreddit:

  1. Velocity spike — posts/min and comments/min vs your sub's rolling 7-day baseline (z-score over median + MAD).
  2. Young-account ratio — share of active items from accounts under 30 days old, vs your baseline ratio.
  3. Text similarity cluster — 64-bit simhash over title + body tokens; items within Hamming distance ≤ 6 form clusters.
  4. Link / domain repetition — same normalized domain appearing across 3+ distinct authors.
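
As an illustration of the first detector, here is a minimal sketch of a z-score computed over median and MAD and squashed into a 0..1 signal. The function names, the 1.4826 normal-consistency constant, the `minScale` floor, and the `zMax` saturation point are assumptions made for this example, not values taken from the repo.

```typescript
// Median of a list; returns 0 for an empty list.
function medianOf(values: readonly number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

// Robust z-score: distance from the baseline median, in units of scaled MAD.
// minScale (hypothetical) keeps a perfectly flat baseline from dividing by zero.
function robustZScore(
  current: number,
  baseline: readonly number[],
  minScale = 0.5,
): number {
  const med = medianOf(baseline);
  const mad = medianOf(baseline.map((x) => Math.abs(x - med)));
  const scale = Math.max(1.4826 * mad, minScale);
  return (current - med) / scale;
}

// Map the z-score to 0..1; zMax (hypothetical) is where the signal saturates.
function velocitySignal(
  current: number,
  baseline: readonly number[],
  zMax = 6,
): number {
  const z = robustZScore(current, baseline);
  return Math.min(1, Math.max(0, z / zMax));
}

// Example: per-minute post counts sampled from a quiet baseline.
const baselinePostsPerMin = [2, 3, 3, 4, 3, 2, 4];
console.log(velocitySignal(3, baselinePostsPerMin));  // typical rate
console.log(velocitySignal(12, baselinePostsPerMin)); // sharp spike
```

Median and MAD are less sensitive to a few past spikes than mean and standard deviation, which is the usual reason to prefer them for a baseline built from bursty traffic.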

Each signal produces a 0..1 score; a weighted combiner with a coincidence bonus turns them into a 0..100 threat score. The state machine maps score to stage with hysteresis so the system doesn't thrash on noisy bursts.
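
The combiner described above can be sketched roughly as follows. The weights, the 0.5 "active signal" cutoff, and the per-signal bonus are placeholders chosen for illustration; the repo's actual values and bonus rule may differ.

```typescript
interface SignalScores {
  velocity: number;
  youngAccounts: number;
  similarity: number;
  links: number;
}

// Hypothetical weights; they sum to 1 so a lone saturated signal stays below 100.
const SIGNAL_WEIGHTS: SignalScores = {
  velocity: 0.3,
  youngAccounts: 0.2,
  similarity: 0.3,
  links: 0.2,
};
const ACTIVE_CUTOFF = 0.5;          // a signal counts as "firing" at or above this
const BONUS_PER_EXTRA_SIGNAL = 0.1; // added for each firing signal beyond the first

// Weighted sum plus a coincidence bonus, scaled to an integer 0..100.
function combineThreatScore(scores: SignalScores): number {
  const keys = ["velocity", "youngAccounts", "similarity", "links"] as const;
  let weighted = 0;
  let firing = 0;
  for (const key of keys) {
    const value = Math.min(1, Math.max(0, scores[key]));
    weighted += SIGNAL_WEIGHTS[key] * value;
    if (value >= ACTIVE_CUTOFF) firing++;
  }
  const bonus = firing >= 2 ? BONUS_PER_EXTRA_SIGNAL * (firing - 1) : 0;
  return Math.round(100 * Math.min(1, weighted + bonus));
}

// Two moderate signals firing together outrank the plain weighted sum (36).
console.log(
  combineThreatScore({ velocity: 0.8, youngAccounts: 0.6, similarity: 0, links: 0 }),
);
```

The bonus rewards agreement between independent signals, so a velocity spike alone (for example, an organic news event) scores lower than the same spike accompanied by young accounts or duplicate text.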

The response ladder

| Stage | Threshold | Effect (when enforced) |
|---|---|---|
| 0: Normal | < 35 | No action. |
| 1: Alert | ≥ 35 | Modmail notification + dashboard incident card. Always on; never enforces by itself. |
| 2: Heightened monitoring | ≥ 55 | Sub-wide flag set; the realtime matcher widens (lower match threshold + looser Hamming distance) so borderline items get caught. When enforce.stage2 is on, those matched items are held for review the same way Stage 3 would; otherwise they're audit-logged in dry-run. Mods are also asked to consider manual slow-mode. |
| 3: Hold matching items | ≥ 70 | New items matching the cluster signature are removed-and-held in modqueue with a mod note. |
| 4: Auto-remove | ≥ 85 | Same as Stage 3 plus a templated removal comment to the poster. |
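
Ignoring hysteresis for a moment, the threshold column reduces to a simple lookup. The sketch below uses the documented thresholds; the constant and function names are made up for this example.

```typescript
// Lower bounds for stages 1..4, as listed in the response ladder.
const STAGE_THRESHOLDS = [35, 55, 70, 85] as const;

// Returns the highest stage (0..4) whose threshold the score meets.
function stageForScore(score: number): number {
  let stage = 0;
  for (const threshold of STAGE_THRESHOLDS) {
    if (score >= threshold) stage++;
  }
  return stage;
}

console.log([20, 35, 60, 70, 90].map(stageForScore));
```

This gives only the target stage for a single tick; the state machine decides whether the incident actually moves there.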

Dry-run by default. On install, all enforce flags are off. Stages 1+ still detect, alert, and log "would-have-done" entries to each incident's audit trail. You enable enforcement per stage from the dashboard.
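
One way to picture how the kill-switch and per-stage flags interact is a small pure function. This is a sketch under assumptions: only enforce.stage2 and killSwitch are named in this README, so the stage3/stage4 flag names, the `ResponseMode` labels, and the choice to fall back to dry-run logging while the kill-switch is on are all illustrative.

```typescript
type ResponseMode = "none" | "alert" | "dry-run" | "enforce";

interface EnforcementConfig {
  killSwitch: boolean;
  // stage3 and stage4 are assumed by analogy with enforce.stage2.
  enforce: { stage2: boolean; stage3: boolean; stage4: boolean };
}

// Install-time defaults: kill-switch off, every enforce flag off.
const DEFAULT_ENFORCEMENT: EnforcementConfig = {
  killSwitch: false,
  enforce: { stage2: false, stage3: false, stage4: false },
};

// Decide what the responder may do at a given stage.
function responseModeFor(
  stage: 0 | 1 | 2 | 3 | 4,
  config: EnforcementConfig,
): ResponseMode {
  if (stage === 0) return "none";
  if (stage === 1) return "alert"; // Stage 1 never enforces by itself
  if (config.killSwitch) return "dry-run"; // assumed: still logs "would-have-done"
  const enabled =
    stage === 2
      ? config.enforce.stage2
      : stage === 3
        ? config.enforce.stage3
        : config.enforce.stage4;
  return enabled ? "enforce" : "dry-run";
}

console.log(responseModeFor(3, DEFAULT_ENFORCEMENT));
```

Keeping this decision in one pure function makes it straightforward to test every combination of kill-switch and per-stage flag.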

Safety guardrails

  • Global kill-switch in the dashboard header pauses all enforcement instantly.
  • Hysteresis — promote needs N sustained ticks above threshold, demote needs M sustained ticks below.
  • Bounded actions per incident — hard cap (default 50) on auto-removals before forcing back to Stage 3.
  • Self-action exclusion — events authored by the app itself are ignored.
  • Mod-approval allowlist — when a moderator manually approves an item Raid Shield held, both the item ID and its near-duplicate signature are added to a 7-day allowlist so the responder won't re-act on the same item or its close variants. Manual mod removals are also logged to the active incident's audit notes.
  • Audit log — every action, dry-run or real, is appended to the incident's notes with timestamp + reasoning.
  • No cross-subreddit signals. Raid Shield never bans or filters based on another community's data. This is deliberate; it keeps the app inside the post-March-2026 policy.
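
The hysteresis guardrail can be sketched as a pure step function. The names, the one-stage-per-transition rule, and the example tick counts (N = 2, M = 3) are assumptions for illustration; the repo's decideTransition() takes more inputs than this.

```typescript
interface HysteresisConfig {
  thresholds: readonly number[]; // ascending lower bounds for stages 1..4
  promoteTicks: number;          // N: consecutive ticks above before promoting
  demoteTicks: number;           // M: consecutive ticks below before demoting
}

interface LadderState {
  stage: number;
  ticksAbove: number;
  ticksBelow: number;
}

// Stage the raw score would map to with no hysteresis applied.
function targetStageFor(score: number, thresholds: readonly number[]): number {
  return thresholds.filter((t) => score >= t).length;
}

// Advance one tick. Moves at most one stage per transition (an assumption).
function stepHysteresis(
  state: LadderState,
  score: number,
  config: HysteresisConfig,
): LadderState {
  const target = targetStageFor(score, config.thresholds);
  if (target > state.stage) {
    const ticksAbove = state.ticksAbove + 1;
    return ticksAbove >= config.promoteTicks
      ? { stage: state.stage + 1, ticksAbove: 0, ticksBelow: 0 }
      : { stage: state.stage, ticksAbove, ticksBelow: 0 };
  }
  if (target < state.stage) {
    const ticksBelow = state.ticksBelow + 1;
    return ticksBelow >= config.demoteTicks
      ? { stage: state.stage - 1, ticksAbove: 0, ticksBelow: 0 }
      : { stage: state.stage, ticksAbove: 0, ticksBelow };
  }
  return { stage: state.stage, ticksAbove: 0, ticksBelow: 0 };
}

// Replay a series of per-tick scores from Stage 0 and return the final stage.
function simulateLadder(scores: readonly number[], config: HysteresisConfig): number {
  let state: LadderState = { stage: 0, ticksAbove: 0, ticksBelow: 0 };
  for (const score of scores) state = stepHysteresis(state, score, config);
  return state.stage;
}

const exampleConfig: HysteresisConfig = {
  thresholds: [35, 55, 70, 85],
  promoteTicks: 2,
  demoteTicks: 3,
};
console.log(simulateLadder([40, 20, 40], exampleConfig)); // interrupted burst
console.log(simulateLadder([40, 40], exampleConfig));     // sustained burst
```

Because the counters reset whenever the score crosses back, a single noisy tick neither opens nor closes an incident.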

Setup

Prerequisites

  • Node.js ≥ 22.2.0
  • A Reddit account with Devvit access enabled
  • A test subreddit you moderate (this repo uses the auto-created Devvit playtest sub r/raid_shield_dev)

Install and run

npm install
npm run login            # devvit login (one-time)
npm run dev              # devvit playtest, hot-reloads on file changes

To upload + publish:

npm run deploy           # type-check + lint + devvit upload
npm run launch           # deploy + devvit publish (after upload approval)

Set the DEVVIT_SUBREDDIT environment variable, or add dev.subreddit to devvit.json, to target a default test sub.

What the install does

On the onAppInstall trigger, the app:

  1. Writes the default config (killSwitch: false, all enforce stages off, conservative thresholds).
  2. Sends a modmail welcome explaining dry-run and how to enable enforcement.

The dashboard is created lazily the first time a mod clicks Open Raid Shield dashboard from the subreddit's mod menu. It's a custom post; pin it to the sub if you want it permanently visible.


Architecture

src/
  shared/                 # types + API shapes shared between client and server
  server/
    index.ts              # Hono entry: routes /api, /internal/menu, /internal/form, /internal/triggers, /internal/cron
    routes/
      api.ts              # GET /api/dashboard, POST /api/config, POST /api/kill-switch, POST /api/stage-action
      menu.ts             # mod-only menu items (open dashboard, open settings)
      triggers.ts         # onPostSubmit, onCommentSubmit, onModAction, onAppInstall/Upgrade
      forms.ts            # settings form submit
      cron.ts             # 1-min rescore, hourly baseline refresh, nightly cleanup
    core/
      detectors/          # velocity, youngAccounts, simhash, links, score
      incidents/          # state machine, transitions (pure), signature derive + match
      actions/            # responder, templates
      storage/            # keys, sliding-window helpers, item index, cleanup
      config/             # defaults + load/save
      util/               # tokens, fnv1a64, simhash hamming, domain extraction, time
      ingest.ts           # write-time pipeline shared by both trigger handlers
      post.ts             # createOrFindDashboardPost
  client/                 # React webview
    App.tsx               # dashboard composition + polling loop
    components/
      ThreatHeader.tsx
      EnforcementPanel.tsx
      ActiveIncidentCard.tsx
      RecentIncidents.tsx
      Spinner.tsx
    api.ts                # fetch wrappers
tests/
  detectors/              # unit tests per detector
  incidents/              # pure transition logic
  integration/            # end-to-end raid scenario in pure logic

Key invariants

  • All Redis keys are prefixed rs: and namespaced by subredditId so the app is safe to install on many subs.
  • The detector evaluator is the only writer to the active-incident key; the dashboard only reads + posts mod overrides.
  • decideTransition() in src/server/core/incidents/transitions.ts is pure — it takes (config, active, score, counters) and returns the next transition. Easy to unit-test, easy to reason about. The state machine wrapper is the only place that does Redis I/O.
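
To make the first invariant concrete, a key builder along these lines would keep every key under the rs: prefix and scoped to one subreddit. The exact key layout and the `redisKey` helper are hypothetical; the repo's real scheme lives in src/server/core/storage/.

```typescript
const KEY_PREFIX = "rs";

// Build a namespaced key: rs:<subredditId>:<part>:<part>...
// Rejects empty segments and segments containing ":" so keys stay unambiguous.
function redisKey(subredditId: string, ...parts: string[]): string {
  const segments = [subredditId, ...parts];
  for (const segment of segments) {
    if (segment.length === 0 || segment.includes(":")) {
      throw new Error(`invalid key segment: "${segment}"`);
    }
  }
  return [KEY_PREFIX, ...segments].join(":");
}

// The same logical key on two subreddits never collides.
console.log(redisKey("t5_aaaaa", "incident", "active"));
console.log(redisKey("t5_bbbbb", "incident", "active"));
```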

Testing

npm test                  # vitest run (174 tests)
npm run type-check        # tsc --build (server + client + shared + tests)
npm run smoke             # type-check + tests + vite build (run this before deploy)

The unit tests cover each detector in isolation, every util (tokens, hash, domain, time, identity), the responder safety matrix (tests/actions/policy.test.ts walks every kill-switch × per-stage-enforce cell), modmail templates, and config merging. Four integration tests in tests/integration/ construct synthetic raids plus organic items and walk the full pipeline in pure logic: the score → state machine → close path, the bound-cap clamp, demote-then-reopen scenarios, and the locked demo-recording payload.

npm run smoke is the pre-deploy gate: it runs the type-checker, the full test suite, and vite build so the same artifacts devvit.json expects (dist/client/, dist/server/index.cjs) are produced before upload. npm run deploy runs smoke before invoking devvit upload.

To exercise the real Reddit + Redis surfaces end-to-end, run npm run dev against your test sub and:

  1. Post several items from new alt accounts within a minute (use any external link to trigger the link signal). The repo ships npm run replay-raid (see docs/RECORDING.md §1.7) which posts the canonical 4-item raid payload from 3–4 throwaway accounts via Reddit's OAuth password grant — useful if you'll be doing this more than once.
  2. Watch the dashboard threat score climb.
  3. Wait for the 1-minute cron tick to evaluate and open an incident.
  4. Toggle Stage 3 enforce to ON and verify subsequent matching items go to modqueue.

For a fully scripted 60–90s demo recording (Devpost-quality video), follow docs/RECORDING.md. It includes a pre-flight checklist, a second-by-second beat sheet, the locked raid payload, post-production notes, and a troubleshooting table for the replay-raid helper.


Devpost submission copy

Reuse this content when filling out the Devpost form for the Best New Mod Tool track.

App listing: link to https://developers.reddit.com/apps/raid-shield once published

Reddit usernames: your reddit handle here

Tool Overview

Raid Shield is a moderation app for Reddit communities that detects coordinated raids and brigades in real time using only local behavioral signals on the host subreddit. Mods install it once; thereafter four detectors continuously evaluate a 5-minute rolling window: posting velocity vs a 7-day baseline (z-score over median + MAD), young-account ratio, near-duplicate text clusters (64-bit simhash + Hamming clustering), and repeated link domains across distinct authors. A weighted combiner produces a 0..100 threat score that maps onto a 4-stage response ladder — alert, heightened monitoring, hold-for-review, auto-remove — each independently toggleable between dry-run and enforcement. A custom-post mod dashboard shows live threat score, per-signal bars, the active incident's cluster signature, one-click stage controls, and a global kill-switch.

The system is dry-run on install. Every stage transition writes an audit-grade note to the incident log, including dry-run "would have done X" entries, so mods can see exactly what the app would do before flipping enforcement on. Stage advancement uses hysteresis (sustained ticks above threshold) to avoid thrashing; demotion uses the same logic in reverse. Auto-removals are hard-capped per incident (default 50) before the responder forces the stage back to hold-for-review.

Project Impact

I built Raid Shield specifically to fill the gap left by Reddit's March 2026 policy change that disabled auto-ban features in SaferBot and Hive-Protect. Three categories of community would benefit immediately:

  1. Large news / political subs that get brigade waves around major events (e.g. r/news, r/worldnews, r/politics). Time spent in modqueue during a raid drops from "drop everything for an hour" to "review the dashboard incident, approve enforcement, walk away."
  2. Niche subs targeted by ban-evading individuals operating multiple alts (e.g. r/AskHistorians, r/femaledatingstrategy, fan subs after creator drama). The young-account + text-cluster combination catches sock-puppet waves that AutoModerator's static rules miss.
  3. High-growth gaming / live-event subs that see organic spikes mixed with promo spam (e.g. r/livestreamfail, r/wallstreetbets during volatility). The MAD-based velocity baseline distinguishes organic spikes from coordinated ones, so mods get fewer false alarms than with simple rate-limit rules.

The dry-run default and one-click kill-switch mean a moderator team can install Raid Shield in observe-only mode on Day 1, watch it for a week to verify the alerts match their intuitions, and then enable enforcement stage-by-stage with confidence.

Devvit Helper Award nomination (optional)

If a community member helped you during the hackathon, put their u/name and reason here.

Developer Platform feedback (optional but eligible for Feedback prize)

Complete the developer satisfaction survey from the Devpost page.


Known limitations and roadmap

  • Devvit doesn't currently expose subreddit slow-mode programmatically. Stage 2 raises a heightened-monitoring flag and asks mods to enable slow-mode manually; if/when the API ships, swap the implementation in src/server/core/actions/responder.ts.
  • Author createdAt comes from reddit.getUserByUsername() on first sighting and is cached in Redis for 7 days, so the first item from any author not yet in the cache incurs one API round-trip.
  • Text similarity is deterministic (simhash) rather than embedding-based. A future v2 could route hard-to-classify clusters to an external LLM via Devvit's HTTP plugin for "is this hate speech / harassment?" classification, but that's deliberately out of v1 to keep the surface explainable and cheap.
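
For readers unfamiliar with the deterministic approach in the last item, here is a minimal simhash sketch with the Hamming distance ≤ 6 check. It is not the repo's implementation: to avoid BigInt, the 64-bit fingerprint is stored as two 32-bit halves, and the token hash is a seeded 32-bit FNV-1a run twice, standing in for the repo's fnv1a64. The tokenizer and all function names are also assumptions.

```typescript
// 64-bit fingerprint as two unsigned 32-bit halves: [high, low].
type Fingerprint64 = readonly [number, number];

// Seeded 32-bit FNV-1a over UTF-16 code units (stand-in for fnv1a64).
function fnv1a32(token: string, seed: number): number {
  let hash = (2166136261 ^ seed) >>> 0;
  for (let i = 0; i < token.length; i++) {
    hash ^= token.charCodeAt(i);
    hash = Math.imul(hash, 16777619) >>> 0;
  }
  return hash;
}

// Lowercase and split on anything that is not a letter or digit.
function tokenizeText(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Each token votes +1 or -1 on every bit; the sign of the tally sets the bit.
function simhash64(text: string): Fingerprint64 {
  const votes = new Array<number>(64).fill(0);
  for (const token of tokenizeText(text)) {
    const high = fnv1a32(token, 0x9e3779b9);
    const low = fnv1a32(token, 0);
    for (let bit = 0; bit < 32; bit++) {
      votes[bit] = (votes[bit] ?? 0) + (((high >>> bit) & 1) === 1 ? 1 : -1);
      votes[32 + bit] = (votes[32 + bit] ?? 0) + (((low >>> bit) & 1) === 1 ? 1 : -1);
    }
  }
  let high = 0;
  let low = 0;
  for (let bit = 0; bit < 32; bit++) {
    if ((votes[bit] ?? 0) > 0) high = (high | (1 << bit)) >>> 0;
    if ((votes[32 + bit] ?? 0) > 0) low = (low | (1 << bit)) >>> 0;
  }
  return [high, low];
}

// Count set bits in a 32-bit value.
function popcount32(x: number): number {
  let v = x >>> 0;
  let count = 0;
  while (v !== 0) {
    v = (v & (v - 1)) >>> 0; // clear the lowest set bit
    count++;
  }
  return count;
}

// Number of differing bits between two fingerprints (0..64).
function hammingDistance64(a: Fingerprint64, b: Fingerprint64): number {
  return popcount32(a[0] ^ b[0]) + popcount32(a[1] ^ b[1]);
}

// Items within the documented distance of 6 are treated as one cluster.
function isNearDuplicate(a: string, b: string, maxDistance = 6): boolean {
  return hammingDistance64(simhash64(a), simhash64(b)) <= maxDistance;
}

console.log(isNearDuplicate("Check out this AMAZING deal", "check out this amazing deal!!"));
```

Because every step is a plain hash and a bit count, a mod can always be shown exactly why two items were grouped, which is the explainability trade-off the item describes.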

Files of interest

License

BSD-3-Clause.
