Posted by Reed A. Cartwright on February 7, 2005 08:55 PM
Salvador is making some noise on ARN about a comment of his being rejected by our spam filter. This post is to clarify things.
Spammers target blogs with comments. These attacks can be harsh. At times spammers will go through every single post in the blog and post three comments containing scores of links advertising everything from child rape to internet bingo.
To counter such horrid spam, we employ a blacklist plugin that searches every comment for certain patterns and rejects any that fit. Unfortunately sometimes non-spam also gets blocked. Users are sent a message informing them of the bad content so they can change the post. (Robotic spammers ignore such messages.)
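As a rough illustration of that kind of pattern check, here is a minimal Python sketch. It is not the plugin's actual code: the pattern list, the function names, and the rejection message are all hypothetical, and the real blacklist.txt is not reproduced here.

```python
import re

# Hypothetical blacklist entries for illustration only.
BLACKLIST_PATTERNS = [
    r"ruky\.net",
    r"internet-bingo",
]

def check_comment(text, patterns=BLACKLIST_PATTERNS):
    """Return the first blacklisted fragment found in text, or None if clean."""
    for pattern in patterns:
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            return match.group(0)
    return None

def submit_comment(text):
    """Accept or reject a comment, reporting the text that triggered a rejection."""
    hit = check_comment(text)
    if hit is not None:
        return "Rejected: your comment contains blocked content: " + hit
    return "Accepted"
```

Reporting the offending fragment is what lets a human commenter edit the post and try again, while a robotic spammer simply ignores the message.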
Our typical cycle of spam control went like this:
1. Spam gets through the filter.
2. We recognize the spam.
3. Add the urls from the spam to the blacklist.
4. Delete the messages that got through.
Of course, the time between 1 and 4 can be hours or days, which can lead to a lot of naughty messages sitting around the blog for a while.
I finally got fed up with this reactive technique a few months ago, and decided to see if there was a better option. I tried the explanatory filter but it was unable to detect links designed by spammers. So I had to fall back on the old methodology and took links from our blacklist, which I already knew had been designed by spammers, and tried to deduce some megarules from them. I ended up deciding to block urls that contained multiple hyphens, since about 75% of the spammers’ urls went something like “hot-chicks-want-to-hottub-with-you.ruky.net.” (I also blocked all .info addresses since we were only getting spam from them.)
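The two rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "multiple" is taken to mean two or more hyphens anywhere in the url, and the function names and the trailing-punctuation cleanup are made up for the example. The real rules live in the blacklist plugin, not in Python.

```python
import re
from urllib.parse import urlparse

# Finds http/https urls in free text (assumption: links carry a scheme).
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)

# Punctuation that often trails a url in running text.
TRAILING = ".,;:!?)\"'”"

def is_suspicious_url(url, max_hyphens=1):
    """Apply the two megarules: block .info hosts and urls with multiple hyphens."""
    host = (urlparse(url).hostname or "").lower()
    if host.endswith(".info"):
        return True
    # Assumption: more than one hyphen anywhere in the url counts as "multiple".
    return url.count("-") > max_hyphens

def blocked_urls(comment):
    """Return every url in the comment that trips either rule."""
    urls = [u.rstrip(TRAILING) for u in URL_RE.findall(comment)]
    return [u for u in urls if is_suspicious_url(u)]
```

Because the hyphen count covers the whole url, a legitimate link with a hyphen-heavy path is rejected just like a spammer's hyphen-heavy domain name.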
The multi-hyphen megarule has worked very well. However, it is still experimental and has been modified more than once. If you have a problem with getting a url past the spam blocker, you can simply use tinyurl.com to create a replacement url. That is what Wesley did in this comment to link to ISCID. (Contrary to some claims we did not change the blacklist for Wesley.)
We do make our blacklist publicly available, blacklist.txt, so anyone can check if we are banning sites critical towards us. Sorry, would-be martyrs, we do not censor your favorite sites from comments, unless you’re into mature mamas or something. Besides, if we wanted to censor you, we’d ban your IP, not add you to our spam blocker.
Comment #15294
Posted by Ginger Yellow on February 7, 2005 09:24 PM (e) (s)
“I tried the explanatory filter but it was unable to detect links designed by spammers.”
That’s because you should be looking for irreducible complexity! Haven’t the IDers taught you anything?
Comment #15320
Posted by Gary Hurd on February 8, 2005 02:28 AM (e) (s)
I am sure I lost many neurons looking in on ARN. I have heard that ARN is in the midst of a large scale purge of scientists, and the IDiots’ “blog” won’t allow any comments at all, nor does it even link to sites that oppose creationist stupidity, and Sal is moaning that he is oppressed.
Good grief!
Comment #15339
Posted by Bayesian Bouffant on February 8, 2005 09:29 AM (e) (s)
So you have a blacklist. Have you considered adding a whitelist? For example, here’s a site that I might legitimately want to link, but that contains multiple hyphens:
http://
www.
don-lindsay-archive
.org/
creation/god_of_gaps.html
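A whitelist of this kind could be checked before the hyphen rule runs. The Python sketch below is hypothetical throughout: the whitelist contents, the function names, and the two-hyphen threshold are assumptions for illustration, not a description of how the site's filter works.

```python
from urllib.parse import urlparse

# Hypothetical whitelist; which hosts to trust is an editorial decision.
WHITELIST_HOSTS = {"www.don-lindsay-archive.org"}

def has_multiple_hyphens(url):
    """Assumed form of the multi-hyphen rule: two or more hyphens in the url."""
    return url.count("-") >= 2

def is_blocked(url, whitelist=WHITELIST_HOSTS):
    """Whitelisted hosts always pass; everything else faces the hyphen rule."""
    host = (urlparse(url).hostname or "").lower()
    if host in whitelist:
        return False
    return has_multiple_hyphens(url)
```

Checking the whitelist first means a trusted site is never rejected, at the cost of someone having to maintain the list by hand.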
Comment #15351
Posted by Colin on February 8, 2005 10:01 AM (e) (s)
I think that since creationists can’t build external credibility with their methods or their results, they do it by climbing on a cross and martyring themselves. When no one is willing to put the nails in, though, they just wind up looking silly. Not to imply, of course, that the talented scientific minds at ARN are somehow not the focus of every posting, comment, and defarious plot at PT. Because we all know that they are.
Meanwhile, the ARN brain trust is busy discussing “The Integrity Difference Between God and Allah.” Can a Nobel Prize in physics be far behind?
Comment #15355
Posted by Colin on February 8, 2005 10:09 AM (e) (s)
Pardon me. I meant to refer to the proprietors’ *nefarious* plots. I have no idea what their defarious plots are, but I’m sure they’re related to their datheistic dagenda.
Comment #15356
Posted by Salvador T. Cordova on February 8, 2005 10:13 AM (e) (s)
Thank you Reed for clarifying. I withdraw my complaint regarding that threads at ISCID were being singled out by the auto-blocking features.
I accept your explanation that URLs to ISCID thread were by coincidence sharing characteristics with URLs like the one you mentioned, such as
“hot chicks want to hottub with you ruky net ” (hyphens omitted)
and that URLs to ISCID threads were only inadvertently censored because they shared characteristics with URLs from porn sites.
I extend my thanks for your hospitality here at PandasThumb. You need not worry for any spam threats from me here at PandasThumb…..
Further, though I have vigorously assailed some of the writings of Wesley Elsberry, I salute him as a gentleman. He’s far more statesmanlike than I ever will be. Same can be said of many of the contributors at PandasThumb including yourself, Steve Reuland, Jason Rosenhouse, Richard Hoppe, Jack Krebs, Matt Young, Mark Perakh, etc…
regards,
Salvador
Comment #15366
Posted by Great White Wonder on February 8, 2005 11:54 AM (e) (s)
Salvador, what about me? That really hurts, man.
Colin wrote
Meanwhile, the ARN brain trust is busy discussing “The Integrity Difference Between God and Allah.”
Colin, is that a joke? It must be a joke. It’s a joke, right?
Comment #15389
Posted by Colin on February 8, 2005 02:02 PM (e) (s)
No, it is not a joke. At least, not the humorous kind.
“The Integrity Difference Between God and Allah.”
To be fair, it is in the “Off Topic” forum, so it may be exempt from ARN’s normally rigorous scientific methodology.
(That one was a joke.)
Comment #15440
Posted by Mike Hopkins on February 8, 2005 05:32 PM (e) (s)
The white list is not a bad idea. Put the anti-evolutionist sites on it just to make the point.
Note that in this post ‘*’ means hyphen.
I think I might have a possible modification to the no-multiple-hyphens rule. As everyone here probably realizes a URL has three parts:
method: usually http://
domain name: whatever.org
path: /documents/speech1.html
Trash/spam URLs like the hypothetical “hot*chicks*want*to*hot*tub*with*you.ruky.net” have their hyphens in the domain name. The “ubb-get_topic*f*6*t*000532.html” was the path for the ISCID link Salvador tried to link to.
A regular expression to do this should be fairly easy to construct.
—
Anti-spam: replace “user” with “harlequin2”
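The domain-only variant of the hyphen rule suggested in the comment above might look like the following in plain Python. This is a sketch, not MT-Blacklist syntax, so it says nothing about whether the plugin can express the same rule. The pattern name and function name are made up, and example.org stands in for the real host of the blocked link.

```python
import re

# Matches a url whose host part (everything between "://" and the first "/")
# contains at least two hyphens. Hyphens in the path are never examined.
DOMAIN_MULTI_HYPHEN = re.compile(r"https?://[^/\s-]*-[^/\s-]*-", re.IGNORECASE)

def domain_has_multiple_hyphens(url):
    """True when the domain name itself contains two or more hyphens."""
    return DOMAIN_MULTI_HYPHEN.search(url) is not None

# Hyphens in the domain name: flagged.
print(domain_has_multiple_hyphens("http://hot-chicks-want-to-hot-tub-with-you.ruky.net"))
# Hyphens only in the path (example.org is a placeholder host): not flagged.
print(domain_has_multiple_hyphens("http://www.example.org/ubb-get_topic-f-6-t-000532.html"))
```

A legitimate domain with two hyphens of its own would still be flagged under this rule, so it narrows the false positives rather than removing them.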
Comment #15446
Posted by Engineer-Poet on February 8, 2005 05:54 PM (e) (s)
I notice that “yahoo dot com” does not appear in the blacklist as such, but it is still prohibited by the spam filter. In e-mail addresses (not content), no less!
Is there a second filter for that?
Can you fix it and allow me the luxury of not munging my address? Thanks.
Comment #15447
Posted by Reed A. Cartwright on February 8, 2005 06:02 PM (e) (s)
I actually deleted yahoo.com from the blacklist yesterday. Some spammers use yahoo.com in their email, which can cause it to get added to the blacklist.
Comment #15456
Posted by Reed A. Cartwright on February 8, 2005 06:39 PM (e) (s)
A regular expression to do this should be fairly easy to construct.
It’s actually harder than you might think, given the constraints of MT Blacklist.
Comment #15477
Posted by Fed Up With Salvador's Thoughtless Claims on February 8, 2005 09:32 PM (e) (s)
Salvador: “I withdraw my complaint regarding that threads at ISCID were being singled out by the auto-blocking features.”
Help, help! I’m being repressed! Come see the violence inherent in the system!
Yap, yap, yap…
Comment #15506
Posted by chris on February 9, 2005 12:35 AM (e) (s)
An alternative to tweaking the hell out of your blacklist is to switch to MT 3.x, which supports (and actually comes with) MT-Blacklist 2. I had to switch my personal site to MT 3 for exactly this reason, and haven’t experienced a type 1 or type 2 error since.
Depending on how heavily you’ve modified your installation of MT, making the switch might be easy or might be a total pain in the ass. Of course, there’s always drupal, which might prove to be far superior to MT for a site like this. Check it out: http://www.drupal.org/…
Comment #15509
Posted by Reed A. Cartwright on February 9, 2005 12:46 AM (e) (s)
Switching to MT 3 and Blacklist 2 is in the pipeline after our planned server OS upgrade.

Comment #15293
Posted by Tim Tesar on February 7, 2005 09:20 PM (e) (s)
Thanks very much for the explanation. I have had problems a couple of times with posts being rejected, but had no idea why. Since the error message indicates the text that caused the problem, I was able to make changes and post successfully. As you indicate, perfectly innocuous text may get pegged as an error. But it was easy to fix, so I’m not complaining.