Software and People

Like many other Wiki operators, I'm having a problem with "wiki spam" - wildly off-topic entries posted solely in the hope of distorting search engine link counts or attracting gullible customers. As the author of a Wiki implementation, I'm also concerned with how we can change the Wiki software to reduce or eliminate this problem.

Three types of solution spring immediately to mind. I'd be interested to hear any other suggestions, and also which (if any) of these options people prefer. The next release of Friki will definitely have some form of anti-spam measures, but I would really like to make sure I'm taking the right approach.

My possible solution types so far are:

  • Authentication: things such as requiring a login, or a confirmation email before accepting an edit.
  • Filtering by originator: things such as a blacklist or whitelist of IP addresses.
  • Filtering by content: things such as banning posts containing certain URL patterns or phrases.
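To make the two filtering options concrete, here is a minimal sketch in Python. It is not how Friki does (or will do) this; the address, the URL patterns, the link limit, and the function name `reject_reason` are all made up for illustration.

```python
import re

# Hypothetical example data; a real wiki would load these from configuration.
BLOCKED_ADDRESSES = {"203.0.113.7"}
BLOCKED_URL_PATTERNS = [
    re.compile(r"https?://[^\s]*cheap-pills\.example", re.IGNORECASE),
    re.compile(r"https?://[^\s]*\.casino\.example", re.IGNORECASE),
]
MAX_LINKS_PER_EDIT = 10  # arbitrary threshold, chosen only for this sketch

ANY_URL = re.compile(r"https?://\S+", re.IGNORECASE)


def reject_reason(remote_addr, text):
    """Return a reason string if the edit should be rejected, else None."""
    # Filtering by originator: refuse known-bad addresses outright.
    if remote_addr in BLOCKED_ADDRESSES:
        return "address is blacklisted"
    # Filtering by content: refuse text containing a banned URL pattern.
    for pattern in BLOCKED_URL_PATTERNS:
        if pattern.search(text):
            return "text contains a banned URL pattern"
    # A crude extra content check: spam edits tend to be mostly links.
    if len(ANY_URL.findall(text)) > MAX_LINKS_PER_EDIT:
        return "too many links in one edit"
    return None
```

A save handler could call `reject_reason(address, new_text)` before storing the page and show the reason to the user when it is not `None`. Both lists need ongoing maintenance, which is the main cost of this approach.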

Suggestions? Opinions?


My two cents' worth: Of the three suggestions, I believe authentication to be the best alternative. That technology is already available. It may annoy new users who do not understand the rationale; however, I believe it to be the wave of the future for preventing automated bots and rabid cut-and-pasters from destroying wikis. Blacklists and content filtering are too much like 'cold war' solutions: you create a system to block content, spammers respond with more sophisticated programming, and the cycle repeats ad nauseam. It is a sad day when we're even having this discussion. IMHO

I think blacklists are the way to go. I'd like to see blacklists shared among wikis and blogs to get to the bottom of which machines are generating all of the programmatic spam.

You can significantly reduce weblog and wiki spam by simply using a non-standard URL for the posting operation. The spammers' scripts aren't very bright; they usually just post to the "installation standard" URL.

Something else that seems fairly effective is to require that all updates be previewed before posting.

Another approach is throttling: don't allow updates from any one IP address to come in too rapidly.
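A rough sketch of what per-address throttling could look like, in Python. The class name `EditThrottle` and the default limits (3 edits per 60 seconds) are invented for this example, and the state lives in memory only, so it would be lost on restart.

```python
import time
from collections import defaultdict, deque


class EditThrottle:
    """Allow at most max_edits updates per address within a sliding window."""

    def __init__(self, max_edits=3, window_seconds=60.0, clock=time.monotonic):
        self.max_edits = max_edits
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.recent = defaultdict(deque)  # address -> timestamps of accepted edits

    def allow(self, remote_addr):
        now = self.clock()
        stamps = self.recent[remote_addr]
        # Forget edits that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_edits:
            return False
        stamps.append(now)
        return True
```

Only accepted edits are counted here, so a blocked script that keeps retrying does not extend its own lockout. Whether that is the right choice is debatable, and users behind a shared proxy will share one allowance.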

The traditional idea behind Wikis was that everyone could post and all changes would become a seamless whole, with no login or similar required. If you require logins, you're going against that ideal. Whether that ideal is realistic is a discussion I won't go into :) Blacklisting is a good idea, as are throttling and requiring a preview (coupled with a confirmation URL that is different for each submission request, so it can't be hardcoded into a script). These work for other media as well (to a degree, I admit). Maybe a Bayesian analysis of new entries to determine whether they're genuine could also work (similar to Bayesian filtering for email spam), with suspicious entries being deferred to an administrator or moderator for approval before being added.
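One way the "preview with a per-submission confirmation URL" idea might be done is to hand out a signed token with each preview and accept a save only when it carries a valid token for exactly that text. The Python sketch below assumes this design; the names `issue_token` and `token_is_valid`, the token layout, and the one-hour lifetime are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets
import time

SECRET_KEY = secrets.token_bytes(32)  # assumed: generated once per server start
TOKEN_LIFETIME = 3600  # seconds; arbitrary choice for this sketch


def _sign(page, text, issued_at, nonce):
    message = f"{issued_at}\n{nonce}\n{page}\n{text}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_token(page, text, now=None):
    """Called when the preview is shown; the token goes into the confirm URL."""
    issued_at = int(time.time() if now is None else now)
    nonce = secrets.token_hex(8)  # makes every issued token different
    return f"{issued_at}-{nonce}-{_sign(page, text, issued_at, nonce)}"


def token_is_valid(token, page, text, now=None):
    """Called on save; accepts only a fresh token issued for this exact text."""
    now = time.time() if now is None else now
    parts = token.split("-")
    if len(parts) != 3:
        return False
    stamp, nonce, signature = parts
    try:
        issued_at = int(stamp)
    except ValueError:
        return False
    if issued_at > now or now - issued_at > TOKEN_LIFETIME:
        return False
    expected = _sign(page, text, issued_at, nonce)
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
```

Because nothing is stored on the server, a token can be replayed for the same page and text until it expires. A script that fetches the preview page first can still get through, so this only raises the bar for the simpler bots.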