Our sub gets a lot of people trying to evade filters by using non-English characters as letters like 'kⅰll myself' or 'bʟåㄷk people' (trust me, I could have used much worse examples). We've been doing a good job of adapting to these by making word-specific filters using common characters. But one day I thought 'we could be more proactive with less work if we just filtered titles that used any non-English letter-like character!'
...And thus began a two-week saga of hitting my head against the keyboard. I've now made five different versions, but each time the code has failed.
Version 1:
type: submission
title (includes, regex): ["[ăäãáàảẩẩẫắằåạậặɑæɚÄÀÁÂÃÅÆᎪßÇçㄷĆ¢ĐđêëẻẽếềểễÉÈÊËᎬĞğíìïîỉĩịÍÎÏñöôõóòỏốồổỗơớờởỏọộợÓÔÕÖÔŒœÖoㅇø•ŞşSûüúùủưứừửữụựÜÚÙÛỹýỳỷÿŸ]"]
~title (includes, regex): ['“’']
action: filter
priority: 1
action_reason: 'FILTER EVASION 2: Suspicious use of non-English characters in title (experimental, version 1). Investigate ({{match}}).'
---
This set of code hit false positives like these:
I’m almost certain I’ve come down with something and I’m thinking of taking the day off. Should I mention that I’ll still try to make it to work?
Can most people identify common languages, even if they don’t know any of the language?
Do kids still put their finger and their thumb in the shape of an L on their forehead as a way to say “Loser!” ?
Weirdly, the action_reason for these came back as: Suspicious use of non-English characters in title. Investigate (â).
...You'll notice that there is no â in these titles.
...You'll notice, in fact, that I specifically removed â, ấ and ầ from the code. It matched â anyway.
...You'll notice that I started excluding quotation marks and other characters I thought might be giving a false positive. Still matching â.
It didn't do this for every post (thank goodness!), but it produced more false positives than true positives, so it's worthless.
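One way to look for where a stray â could come from is to reproduce the match offline. The snippet below is only a sketch in plain Python, not AutoModerator itself, and it rests on two assumptions that the post does not confirm: that matching is case-insensitive (so the uppercase Â still in the version 1 class also matches â), and that somewhere the title's UTF-8 bytes get read as Latin-1 (the curly apostrophe ’ is the bytes E2 80 99, and E2 on its own is â). `reduced_class` is a hypothetical, shortened stand-in for the real character class.

```python
import re

# Hypothetical, reduced stand-in for the version 1 class: it keeps the
# uppercase "Â" that is still in the rule but contains no lowercase "â".
reduced_class = re.compile("[ÄÀÁÂÃÅÆ]", re.IGNORECASE)

# (a) Case-insensitive matching: "Â" in the class also matches "â".
case_fold_hit = reduced_class.search("â")

# (b) Assumed mis-decoding: the UTF-8 bytes of the curly apostrophe are
# E2 80 99, and byte E2 read as Latin-1 is "â".
title = "Can most people identify common languages, even if they don’t know any of the language?"
misdecoded = title.encode("utf-8").decode("latin-1")
mojibake_hit = reduced_class.search(misdecoded)

# Correctly decoded, the same title gives the class nothing to match.
clean_hit = reduced_class.search(title)

print(case_fold_hit.group(), mojibake_hit.group(), clean_hit)
```

If both assumptions hold, every title containing ’ or “ would report â as the match, which fits the three false positives listed. It does not prove that this is what AutoModerator does internally.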
Version 2:
type: submission
title (includes, regex): ["(?#Latin Extended-A)(?-i:[\u0100-\u017f] )", "(?#Latin Extended-B)[\u0180-\u024f] ", "(?#Combining Diacritical Marks)[\u0300-\u0335\u0337-\u0360\u0362-\u036f] ", "(?#Cyrillic)[\u0400-\u052f] ", "(?#Hebrew)[\u0590-\u05ff] ", "(?#Arabic)[\u0600-\u0669\u066b-\u06ff] ", "(?#Devanagari)[\u0900-\u097f] ", "(?#Bengali)[\u0980-\u09ff] ", "(?#Gurmukhi)[\u0a00-\u0a7f] ", "(?#Tamil)[\u0b80-\u0bff] ", "(?#Kannada)[\u0c80-\u0cff] ", "(?#Thai)[\u0e00-\u0e7f] ", "(?#Latin Extended Additional)[\u1e00-\u1eff] ", "(?#Hiragana)[\u3041-\u3096] ", "(?#Katakana)[\u30a1-\u30c3\u30c5-\u30fa] ", "(?#CJK Unified Ideographs)[\u4e00-\u9fff] ", "(?#Hangul)[\uac00-\ud7af] ", '(?-i:[ÀÂÆÇÈÉÊËÎÏÔÙÛÜàâæçèêëîïôùûüÿŒœŸ])', '[ÄÖÜßäöü]', '(?-i:[¡ªº¿ÀÁÂÃÇÈÉÊÌÍÑÒÓÔÕÙÚÜàáâãçèêìíñòóôõùúü])']
action: filter
priority: 1
action_reason: 'FILTER EVASION 2: Suspicious use of non-English characters in title (experimental - version 2). Investigate ({{match}}).'
---
Similar problems.
Version 3:
type: submission
~title (regex): ^[\p{L}\p{M}\p{N}\p{P}\p{Sm}\p{Sc}\p{Sk}\p{Z}] $'
action: filter
priority: 1
action_reason: 'FILTER EVASION 2: Suspicious use of non-English characters in title (experimental, version 3). Investigate.'
---
This one was even worse.
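A possible reason, offered as a guess: AutoModerator's documentation describes its regex syntax as Python's, and Python's built-in `re` module has no `\p{...}` property classes. Assuming `re` is the engine behind the rule, the check below shows how a recent Python 3 treats a simplified form of the version 3 pattern compared with an explicit code-point range. The helper name `compiles` is made up for this illustration.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's built-in `re` module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Property classes such as \p{L} are not part of the `re` syntax;
# recent Python 3 versions reject the escape outright.
property_class_ok = compiles(r"^[\p{L}\p{N}\p{P}\p{Z}]+$")

# An explicit code-point range (here printable ASCII) is accepted.
range_class_ok = compiles(r"^[\u0020-\u007e]+$")

print(property_class_ok, range_class_ok)
```

If the engine cannot interpret `\p{L}` as "any letter", the negated `~title` check cannot mean what it was written to mean, so explicit ranges like the ones in version 2 are the safer building block.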
Version 4:
type: submission
title (includes, regex): '[^A-Z0-9\?\.,\/\$\\ \(\)''’"“”:;<>= …\-_\*%#@\!&]'
action: filter
priority: 1
action_reason: 'FILTER EVASION 2: Suspicious use of non-English characters in title (experimental, version 4). Investigate.'
---
Still not working. This one caught questions like:
Does heat make those bottles Starbucks iced coffees go bad if they’re unopened?
What does it mean when someone says "as one computer said, if you’re on the train and they say portal bridge you know you better make other plans,"?
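To separate "the pattern is wrong" from "something else is going on", the version 4 character class can be run over the reported titles outside AutoModerator. This is a rough offline harness in plain Python, assuming case-insensitive matching as in the rule; `first_offender` is a hypothetical helper, and the class is copied from the rule with the YAML-doubled `''` collapsed to a single `'`.

```python
import re

# The version 4 character class as a Python raw string. The doubled ''
# from the YAML single-quoted form becomes a single ' here.
ALLOWED = r"""[^A-Z0-9\?\.,\/\$\\ \(\)'’"“”:;<>= …\-_\*%#@\!&]"""
pattern = re.compile(ALLOWED, re.IGNORECASE)

def first_offender(title: str):
    """Return the first character outside the allow-list, or None."""
    hit = pattern.search(title)
    return hit.group() if hit else None

# The two titles reported as false positives for version 4.
reported = [
    "Does heat make those bottles Starbucks iced coffees go bad if they’re unopened?",
    'What does it mean when someone says "as one computer said, if you’re on the train and they say portal bridge you know you better make other plans,"?',
]
# One of the evasion examples from the top of the post.
evasion = "kⅰll myself"

for title in reported + [evasion]:
    print(repr(first_offender(title)), "<-", title[:40])
```

Run this way, neither reported title contains a character outside the allow-list, while the evasion example is flagged on ⅰ. That suggests the class itself is reasonable and that the live false positives come from how the title text reaches the matcher, not from the list of allowed characters.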
So what am I doing wrong? Is there something obvious I've missed? Is there a way to do this that I haven't tried? Has anyone got working code for this? Am I driving myself into an early grave by trying to do something too difficult?
Help me, AutoModerator community. You're my only hope.