Why keyword bots miss 60% of toxic messages

The dominant Discord moderation tools — MEE6, Carl-bot, AutoMod — all share an architecture decision from a decade ago: match strings against a banned-word list. The word "sick" gets banned, every message containing "sick" gets removed, done. It's fast, it's predictable, and on the kind of toxicity it was designed for (slurs, racial epithets), it works.

But that's not where most modern Discord toxicity lives. The actual hard cases are:

Targeted harassment with no banned words. "You should just give up. Nobody here wants you." Zero keywords match, but the intent is unmistakable.
Slang that flips banned words positive. "That new skin is sick" gets removed alongside "you make me sick." Same word, opposite meaning.
Coordinated raids using unusual phrasings. A brigade arrives with a memorized rotation of "edgy but not banned" phrases — the keyword list lags behind every time.
Multi-language servers. Hungarian toxicity, Polish toxicity, French toxicity — each requires its own banned-word list, and most servers never get around to maintaining them.
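
The first two cases are easy to reproduce with a minimal sketch of a keyword filter. The banned list and the `keyword_flags` helper below are hypothetical, written only for illustration; they are not the actual implementation of MEE6, Carl-bot, or AutoMod.

```python
import re

# Hypothetical banned-word list, for illustration only.
BANNED_WORDS = {"sick"}

def keyword_flags(message: str) -> bool:
    """Return True if any whole word in the message is on the banned list."""
    words = re.findall(r"[a-z']+", message.lower())
    return any(word in BANNED_WORDS for word in words)

examples = [
    "You should just give up. Nobody here wants you.",  # harassment, no banned word
    "That new skin is sick",                            # positive slang
    "you make me sick",                                 # actual insult
]

for text in examples:
    print(f"{keyword_flags(text)!s:<5}  {text}")
```

The harassment message passes untouched, while both "sick" messages are flagged, because a word list has no way to represent intent or context.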

The math

Studies on automated content moderation consistently put keyword-only systems at 30-40% recall against modern internet toxicity. That means in a representative sample of 1,000 actually-toxic messages, a keyword bot catches 300-400. The remaining 600-700 slip through, and either they reach the targeted user or your moderators get paged at 2am to clean up manually.
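
The arithmetic behind those figures can be checked in a few lines. This is only a sketch: `recall_split` is a hypothetical helper, and the 30% and 40% inputs are the recall range quoted above, not new measurements.

```python
def recall_split(total_toxic: int, recall: float) -> tuple[int, int]:
    """Split truly toxic messages into (caught, missed) for a given recall."""
    caught = round(total_toxic * recall)
    return caught, total_toxic - caught

# Recall values taken from the range quoted in the text.
for recall in (0.30, 0.40):
    caught, missed = recall_split(1000, recall)
    print(f"recall {recall:.0%}: caught {caught}, missed {missed}")
```

Recall here means the share of truly toxic messages the bot catches, so everything outside that share is what reaches users or the mod team.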

What changes when you switch

The mods we talk to consistently report three changes after moving from keyword-based moderation to context-aware:

False positives drop dramatically. The "this is being deleted for no reason" complaints in the support DMs go away because the bot can tell "that skin is sick" from "you make me sick."
Mod queue gets quieter. Stuff that would have hit the mod team manually is now resolved by the bot — the team works on edge cases instead of routine cleanup.
Multi-language servers stop needing per-language config. Whatever language the toxicity is in, the model understands it.

The trade-off

Context-aware moderation costs money to run (every message goes through an inference call) and isn't as instant as a string match, though the added latency is noise once you're talking about hundredths of a second. For a small server, the math doesn't always favor it. For a server where toxicity is actively ruining the experience, it's the difference between a community that's worth being in and one that isn't.
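
A back-of-the-envelope estimate is usually enough to run that math for your own server. The sketch below assumes a flat per-call price. The function name and every number in it are placeholders, not Civora pricing or measured figures.

```python
def monthly_inference_cost(messages_per_day: int, cost_per_call: float, days: int = 30) -> float:
    """Estimated monthly spend if every message triggers one inference call."""
    return messages_per_day * cost_per_call * days

# Placeholder inputs: substitute your server's volume and your provider's price.
small_server = monthly_inference_cost(messages_per_day=500, cost_per_call=0.0002)
large_server = monthly_inference_cost(messages_per_day=200_000, cost_per_call=0.0002)
print(f"small: ${small_server:.2f}/month, large: ${large_server:.2f}/month")
```

Weigh the result against how much moderator time the server currently loses to manual cleanup.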

If you want to see the difference live, the Civora sandbox lets you paste a message and compare AI moderation vs the same keyword approach the older bots use. No signup needed — just type a phrase and see what each side flags.
