Well, this is a solution for you!
TopicRadar helps moderators manage duplicate-topic posts on their subreddit. It automatically detects when a new post covers a topic that has already been posted recently and, optionally, reports, filters, or removes the duplicate based on moderator-configured rules. It also comes with a web view post that shows users which topics are active and lets them search to see whether what they are about to post would violate the rules, which leaves far less room for dispute about whether a post is a duplicate.
Duplicate-topic spam is one of the most common moderation pains for active subreddits. Even with automod rules and post flair, recurring topics (outages, drama, the same question over and over) sneak through because they're worded differently each time. TopicRadar groups posts by meaning, not by exact wording or matching keywords, so it can catch duplicates that traditional tooling misses.
TopicRadar uses OpenAI embeddings (text-embedding-3-small) to group posts that are about the same thing even when they are worded differently. This isn't generic AI matching with a standard LLM: it uses a dedicated clustering algorithm to match duplicates efficiently. Moderators don't need to supply their own OpenAI keys, so the app is completely free to use. The full technical spec further down explains how the mathematics behind the algorithm work.
Moderators choose what happens to a detected duplicate: None (tracking only), Report (send to mod queue), Filter (temp-remove for review), or Remove (full removal with a sticky removal comment + post lock).
Once a cluster reaches 3, 6, or 12 posts, TopicRadar sends up to 12 of its post titles to GPT-4o-mini and asks for a short noun-phrase title (2–6 words) and a one-sentence description. These are stored on the cluster and surfaced in the UI. This is purely cosmetic: it's what users see in the "frequently posted" tab instead of just the first poster's wording.
+ Add to this community from the App page (if not already here).
When TopicRadar is first installed, it can't action duplicates against an empty cluster store, because every post would look new. To avoid this cold-start gap, the install trigger schedules a backfill job that pulls the most recent 600 text posts in the subreddit and runs them through the same detection pipeline (with no actions taken). Each backfilled post increments a backfillSeedCount on its cluster.
When the live detection then decides whether to action a new duplicate, it uses effective post count:
effectivePostCount = topicPostCount - backfillSeedCount
So a cluster seeded from 4 backfilled posts will only start actioning duplicates once 4 live posts have joined it (with the default min-duplicates-before-action of 3, it is the 4th live post that gets actioned). This prevents TopicRadar from immediately removing the first post about a topic that just happens to have been popular in the past month.
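The backfill arithmetic can be sketched in a few lines. This is a minimal illustration, not TopicRadar's actual code: the ClusterCounts shape and the shouldAction helper are hypothetical, and the comparison assumes the counts already include the incoming post, which is one reading that reproduces the "4th live post" example.

```typescript
// Hypothetical shape: only the two counters named in the text are modelled.
interface ClusterCounts {
  topicPostCount: number;    // every post in the cluster, backfilled and live
  backfillSeedCount: number; // posts that joined during the install backfill
}

function effectivePostCount(c: ClusterCounts): number {
  // Clamping at zero is a defensive assumption; the text only gives the subtraction.
  return Math.max(0, c.topicPostCount - c.backfillSeedCount);
}

// Assumption: counts include the incoming post, and it is actioned only when
// the effective count exceeds the configured minimum (default 3).
function shouldAction(c: ClusterCounts, minDuplicatesBeforeAction = 3): boolean {
  return effectivePostCount(c) > minDuplicatesBeforeAction;
}

// A cluster seeded with 4 backfilled posts:
const thirdLivePost = shouldAction({ topicPostCount: 7, backfillSeedCount: 4 });  // false
const fourthLivePost = shouldAction({ topicPostCount: 8, backfillSeedCount: 4 }); // true
```

With these assumptions the first three live posts only build up the cluster, and the fourth is the first one eligible for an action.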
This section is for anyone curious about the algorithm. Skip it if you just want to use the app.
Duplicate detection by exact-string match is useless on Reddit because every poster phrases the same topic differently. "Servers down again?", "Anyone else getting connection errors?", and "Why can't I log in?" are all the same topic but share almost no tokens. Keyword/regex rules in AutoModerator can't handle this without massive false-positive rates.
The trick is to compare posts in semantic space, where two phrasings of the same idea sit close together regardless of wording.
When a post is created, TopicRadar generates a vector embedding of title + body excerpt using OpenAI's text-embedding-3-small model at 256 dimensions. The model maps the text to a point in a 256-dimensional space such that semantically similar texts produce nearby points.
To decide whether the new post belongs to an existing topic cluster, we compute the cosine similarity between the new post's embedding and each existing cluster's centroid:
cosine(A, B) = (A · B) / (||A|| × ||B||)
Cosine similarity ranges from -1 to 1. For text embeddings it tends to live in the 0.3–0.9 range. Two posts about "servers down" might score around 0.75. Two unrelated posts might score around 0.4.
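For readers who prefer code to formulas, here is a small sketch of the cosine formula above. The function name is illustrative, and returning 0 for a zero-length vector is a convention chosen for this example rather than something the app is known to do.

```typescript
// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumed convention: a zero vector is treated as similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing the same way score 1, orthogonal vectors score 0.
const sameDirection = cosineSimilarity([3, 4], [6, 8]);
const orthogonal = cosineSimilarity([1, 0], [0, 1]);
```

Because the result depends only on direction, not length, a long post and a short post about the same topic can still score highly.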
A cluster centroid is the running mean of every embedding in that cluster. Concretely we store:

- centroidSum: the element-wise sum of all member embeddings.
- centroidCount: how many embeddings have been added.

When matching, we compute centroidMean = centroidSum / centroidCount and compare the incoming embedding against that mean. Using the mean (rather than the original seed embedding) means the cluster's "centre of gravity" drifts as more posts join, which makes the cluster more robust to outlier phrasings.
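The running-mean bookkeeping might look like the sketch below. The field names centroidSum and centroidCount come from the text; the Centroid interface and the three helper functions are illustrative names, not the app's real API.

```typescript
interface Centroid {
  centroidSum: number[];  // element-wise sum of all member embeddings
  centroidCount: number;  // how many embeddings have been added
}

// Start a new cluster from its first (seed) embedding.
function seedCentroid(embedding: number[]): Centroid {
  return { centroidSum: embedding.slice(), centroidCount: 1 };
}

// Fold one more embedding into the running sum without mutating the input.
function addToCentroid(c: Centroid, embedding: number[]): Centroid {
  if (embedding.length !== c.centroidSum.length) {
    throw new Error("embedding has the wrong number of dimensions");
  }
  return {
    centroidSum: c.centroidSum.map((sum, i) => sum + embedding[i]),
    centroidCount: c.centroidCount + 1,
  };
}

// centroidMean = centroidSum / centroidCount, computed only when matching.
function centroidMean(c: Centroid): number[] {
  return c.centroidSum.map((sum) => sum / c.centroidCount);
}

const grown = addToCentroid(seedCentroid([1, 0]), [0, 1]);
const mean = centroidMean(grown); // [0.5, 0.5]
```

Storing the sum and count instead of the mean keeps each update to a single pass over the vector and avoids accumulating rounding error from repeated re-averaging.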
If the new post's cosine similarity to a cluster's centroid is at least the threshold for the current strength setting, it joins that cluster:
| Strength | Embedding threshold | Image hamming threshold |
|---|---|---|
| Weak | 0.66 | 6 |
| Moderate | 0.62 | 10 |
| Strong | 0.58 | 16 |
Lower thresholds mean more posts qualify as duplicates (more aggressive). If no existing cluster crosses the threshold, the post becomes the seed of a new cluster.
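The join-or-seed decision can be sketched as follows, using the embedding thresholds from the table. Picking the highest-scoring cluster when several cross the threshold is an assumption of this example, and ClusterMean, cosine, and findCluster are hypothetical names.

```typescript
type Strength = "weak" | "moderate" | "strong";

// Embedding thresholds copied from the table above.
const EMBEDDING_THRESHOLD: Record<Strength, number> = {
  weak: 0.66,
  moderate: 0.62,
  strong: 0.58,
};

interface ClusterMean {
  topicId: string;
  mean: number[]; // centroidSum / centroidCount
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Returns the matching topic ID, or null when the post should seed a new cluster.
// Assumption: when several clusters qualify, the best-scoring one wins.
function findCluster(
  embedding: number[],
  clusters: ClusterMean[],
  strength: Strength
): string | null {
  const threshold = EMBEDDING_THRESHOLD[strength];
  let best: { topicId: string; score: number } | null = null;
  for (const cluster of clusters) {
    const score = cosine(embedding, cluster.mean);
    if (score >= threshold && (best === null || score > best.score)) {
      best = { topicId: cluster.topicId, score };
    }
  }
  return best === null ? null : best.topicId;
}
```

A null result corresponds to the "seed of a new cluster" case described above.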
Before doing any embedding work, TopicRadar checks whether the post links to a URL already associated with a known cluster. URLs are normalised by:

- Stripping www.
- Removing tracking parameters (utm_*, fbclid, gclid, ref, etc.)

If the canonical URL is already mapped to a topic, and the new post's embedding is within a looser threshold (0.5) of that cluster's centroid, the URL match wins. This catches re-shares of the same article even when titles differ.
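A URL canonicaliser along these lines could look like the sketch below. It covers only what is recoverable from the text, namely a www. prefix and the listed tracking parameters; the function name and the exact parameter list are assumptions, and the real app may normalise more than this.

```typescript
// Tracking parameters named in the text; the real list is likely longer ("etc.").
const TRACKING_PARAMS = new Set(["fbclid", "gclid", "ref"]);

function canonicaliseUrl(raw: string): string {
  const url = new URL(raw);

  // Treat www.example.com and example.com as the same host.
  url.hostname = url.hostname.replace(/^www\./, "");

  // Collect keys first, then delete, so we never mutate while iterating.
  const toDelete: string[] = [];
  url.searchParams.forEach((_value, key) => {
    if (key.startsWith("utm_") || TRACKING_PARAMS.has(key)) {
      toDelete.push(key);
    }
  });
  for (const key of toDelete) {
    url.searchParams.delete(key);
  }

  return url.toString();
}

// The looser similarity floor the text gives for URL matches.
const URL_MATCH_THRESHOLD = 0.5;

const shared = canonicaliseUrl(
  "https://www.example.com/article?utm_source=reddit&id=7&fbclid=abc"
);
```

Two links that canonicalise to the same string can then be looked up in the URL index, with the 0.5 embedding check acting as a guard against unrelated posts that happen to share a link.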
For image posts (single image, gallery, or image-link), each image is downloaded and run through a block-hash perceptual hash (image-hash library, 16-bit, method 2). pHashes are designed so visually similar images produce nearby hashes even after cropping, recompression, or minor edits.
We compare two hashes using Hamming distance, the number of bit positions where they differ. A hamming distance of 0 means identical hashes. The thresholds above are bit-counts; at Moderate strength, two images within 10 bits are considered a match.
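Hamming distance is simple enough to show directly. The sketch below assumes the perceptual hashes are available as equal-length hexadecimal strings, which is an assumption about the storage format; the function names are illustrative, and the thresholds are the image column of the table above.

```typescript
type Strength = "weak" | "moderate" | "strong";

// Image hamming thresholds copied from the strength table.
const IMAGE_HAMMING_THRESHOLD: Record<Strength, number> = {
  weak: 6,
  moderate: 10,
  strong: 16,
};

// Counts differing bits between two equal-length hex strings.
function hammingDistance(hexA: string, hexB: string): number {
  if (hexA.length !== hexB.length) {
    throw new Error("hashes must have the same length");
  }
  if (!/^[0-9a-f]*$/i.test(hexA) || !/^[0-9a-f]*$/i.test(hexB)) {
    throw new Error("hashes must be hexadecimal strings");
  }
  let distance = 0;
  for (let i = 0; i < hexA.length; i++) {
    // XOR leaves a 1 in every bit position where the two nibbles differ.
    let diff = parseInt(hexA[i], 16) ^ parseInt(hexB[i], 16);
    while (diff !== 0) {
      distance += diff & 1;
      diff >>= 1;
    }
  }
  return distance;
}

// "Within N bits" is read here as distance <= N.
function imagesMatch(hexA: string, hexB: string, strength: Strength): boolean {
  return hammingDistance(hexA, hexB) <= IMAGE_HAMMING_THRESHOLD[strength];
}
```

For example, hashes that differ in a single bit match at every strength, while hashes 16 bits apart match only at Strong.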
For image posts, matching runs in this order:
image).
centroid).
This means image posts can join text clusters (e.g. an image post titled "Servers are down" joins the existing text cluster about server outages), and any future visually-similar image will also land in that cluster via the pHash path. Text posts, conversely, cannot join an image-only cluster purely on visual grounds; they can join only via the text-embedding bridge, which requires the image cluster to have at least one post with a substantive title.
When a moderator approves a post that TopicRadar removed, the onModAction trigger fires with action=approvelink. We look up the original removal record, find the cluster it was matched against, and increment falsePositiveCount on that cluster.
Once a cluster has accumulated ≥ 2 false positives, we raise its similarity threshold with a per-cluster boost: the second false positive and each one after it adds +0.04, capped at +0.20 total. So a cluster at Moderate strength (base 0.62) that has been corrected twice will only match posts at cosine ≥ 0.66. Three corrections raise that to 0.70, and so on.
The boost is per-cluster, not global, so a single noisy cluster getting boosted doesn't reduce sensitivity across the rest of the sub. The boost survives until the cluster expires from the retention window.
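The boost schedule can be written out as a short sketch. It reads the two worked examples (two corrections give 0.66, three give 0.70) as one +0.04 step per false positive from the second onward; that reading, and the function names, are assumptions of this example.

```typescript
const BOOST_STEP = 0.04;         // added per qualifying false positive
const BOOST_CAP = 0.2;           // maximum total boost
const MIN_FALSE_POSITIVES = 2;   // no boost below this count

function thresholdBoost(falsePositiveCount: number): number {
  if (falsePositiveCount < MIN_FALSE_POSITIVES) return 0;
  // The second false positive earns the first step, the third earns the second, ...
  const steps = falsePositiveCount - MIN_FALSE_POSITIVES + 1;
  return Math.min(BOOST_CAP, BOOST_STEP * steps);
}

function effectiveThreshold(baseThreshold: number, falsePositiveCount: number): number {
  return baseThreshold + thresholdBoost(falsePositiveCount);
}

// Moderate strength (base 0.62):
const afterTwo = effectiveThreshold(0.62, 2);   // about 0.66
const afterThree = effectiveThreshold(0.62, 3); // about 0.70
const atCap = effectiveThreshold(0.62, 20);     // about 0.82, the cap
```

Under this reading the cap is reached at the sixth false positive, after which further approvals no longer tighten the cluster.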
Everything lives in Devvit Redis, scoped per-subreddit, with per-record TTLs equal to the configured retention window. All keys are prefixed topicradar:v1:{subredditName}::

- post:{postId} / image-post:{postId}, full post records (hashes). Image post records also carry the perceptual hashes for that post.
- recent-posts / recent-image-posts, zsets scored by createdAt, used to enumerate posts in the retention window for centroid lookup and timestamp aggregation.
- topic:{topicId}, cluster records including centroid sum, centroid count, post count, removed count, AI title/description, threshold boost, false-positive count, backfill seed count.
- hot-topics, zset scored by postCount, used to rank topics for the public "frequently posted" tab and the mod view.
- url-index, hash mapping canonical URL to topic ID.
- duplicate-topic-action:{postId}, the action record for each post (action type, status, scores, detection path).
- duplicate-topic-actions, zset of action records scored by createdAt, used for the mod-view recent-actions feed and time series.
- topic-votes:{topicId}, hash of up/down counters for the user-voting on each topic.
- topic-user-vote:{topicId}, hash mapping each voter's username to their current vote, so the UI can show the viewer's own up/down state.

All keys have a TTL matching the retention window, so old data evicts automatically without needing a background cleanup job.
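As an illustration of the key layout, a small set of key builders might look like this. It assumes the prefix ends in a single colon before the record name; the keyPrefix and redisKeys helpers are hypothetical and cover only some of the keys listed above.

```typescript
// Assumed prefix shape: topicradar:v1:{subredditName}: followed by the record name.
function keyPrefix(subredditName: string): string {
  return `topicradar:v1:${subredditName}:`;
}

// Hypothetical helpers for a few of the keys named in the list.
const redisKeys = {
  post: (sub: string, postId: string) => `${keyPrefix(sub)}post:${postId}`,
  imagePost: (sub: string, postId: string) => `${keyPrefix(sub)}image-post:${postId}`,
  topic: (sub: string, topicId: string) => `${keyPrefix(sub)}topic:${topicId}`,
  recentPosts: (sub: string) => `${keyPrefix(sub)}recent-posts`,
  hotTopics: (sub: string) => `${keyPrefix(sub)}hot-topics`,
  urlIndex: (sub: string) => `${keyPrefix(sub)}url-index`,
};

const exampleKey = redisKeys.topic("examplesub", "abc123");
// "topicradar:v1:examplesub:topic:abc123"
```

Centralising key construction like this keeps the version segment (v1) in one place, which makes a future schema change a one-line edit.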
To keep the public "frequently posted" tab snappy and reduce Redis traffic, the heavy computed responses are wrapped in Devvit's built-in cache() helper (Redis-backed shared cache plus a per-pod in-memory layer):

- topicradar:frequent-summary:{subredditName}:{limit}, the full categorised-topic summary served to the /api/topics/frequent endpoint and the mod-view stats handler. TTL: 60 seconds. This bounds how stale the "frequently posted" list and the mod view's headline counters can be, so users see new topics within roughly a minute of them being clustered.
- topicradar:topic-search:{subredditName}:{query}:{limit}, search results for the search box in the public tab. TTL: 60 seconds. Repeating the same query within a minute returns the cached match list instead of rescanning recent clusters.

The footer reflects the 60-second freshness with "Data refreshes every minute". Lowering the TTL further would cut down Redis savings without meaningfully changing user experience; raising it past a minute would make the frontend feel stale when topics rapidly evolve. The Devvit cache enforces a 5-second minimum.
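To make the 60-second behaviour concrete, here is a tiny in-memory TTL cache. It is a stand-in written for this explanation, not Devvit's cache() helper, whose actual signature is not reproduced here; TtlCache, getOrCompute, and frequentSummaryKey are hypothetical names, and the clock is injectable so the expiry logic is easy to follow.

```typescript
// Minimal get-or-compute cache with per-entry expiry (illustrative stand-in only).
class TtlCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  getOrCompute(key: string, ttlSeconds: number, compute: () => T): T {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value; // still fresh: skip the expensive work
    }
    const value = compute();
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
    return value;
  }
}

// Key shape taken from the list above.
function frequentSummaryKey(subredditName: string, limit: number): string {
  return `topicradar:frequent-summary:${subredditName}:${limit}`;
}

// Usage: the summary is rebuilt at most once per 60 seconds per key.
const summaryCache = new TtlCache<string[]>();
const topics = summaryCache.getOrCompute(frequentSummaryKey("examplesub", 10), 60, () => [
  "Server outage",
]);
```

The trade-off is the one described above: a longer TTL saves more recomputation but widens the window in which users see an out-of-date list.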
On the client, only useTopicRecentPosts keeps a small in-memory Map to avoid re-fetching the same topic's recent posts when a user expands/collapses a card repeatedly within a session. Everything else fetches fresh per page load.
The text-fallback (the markdown summary attached to the pinned TopicRadar post for old-Reddit users) is rate-limited to one regeneration per minute per subreddit, so the same handler doesn't rewrite that long markdown payload on every page view.
Creator: PlexversalHD
App identifier: topicradar
Version: 0.0.12
Last updated: Jun 17, 2026