Monday, May 23, 2005

Nofollow revisited

It’s been about four months now since Google introduced nofollow to the web. Since then it’s been adopted by every major blogging platform, as well as many other wiki, forum and CMS platforms. Now that it is everywhere, it’s time to take a good hard look at whether it’s actually been effective at its stated purpose of stopping comment spam.

I will show today that Google’s nofollow initiative has not resulted in a reduction of link spam, but instead has had much more insidious effects on the Internet.


What is nofollow?

Before I can show you anything, we must be clear on what nofollow is, and what it is not. In short, nofollow is a link relationship which one can add to any hyperlink on a Web site. When added, it looks much like this:

<a rel="nofollow" href="http://www.google.com/">Google</a>

The rel="nofollow" tells search engines not to count this link as an inbound link to the target web site. Despite what its name implies, it does not prevent the search engine from following the link. The search engine is still free to follow the link and index the content on the target web site, but for the purposes of counting inbound links (and PageRank) the link should not be factored into that calculation.
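Here is a minimal Python sketch of that distinction. It sorts the links on a page into those a search engine would count as inbound links and those it would merely follow. It assumes only the semantics described above. `LinkCollector` and `classify_links` are hypothetical names, and no real search engine's code is implied. Note that `rel` holds a space-separated list of values, so `rel="external nofollow"` is also a nofollow link.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, counts_for_ranking) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel is a space-separated token list; compare tokens case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        # The link can still be followed; it just does not count as a "vote".
        self.links.append((href, "nofollow" not in rel_tokens))


def classify_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


page = (
    '<a rel="nofollow" href="http://www.google.com/">Google</a> '
    '<a href="http://example.com/">Example</a> '
    '<a rel="external nofollow" href="http://example.org/">Other</a>'
)
for href, counts in classify_links(page):
    print(href, "counts" if counts else "followed, but not counted")
```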

The intended effect of this is that any link containing rel="nofollow" will allow both users and search engines to reach the site, but the existence of the link will not increase the ranking of the site in participating search engines. Aside from Google, so far MSN and Yahoo! are participating. Others may be as well.

Comment spam

Comment spam is the problem which led to the introduction of nofollow. Many types of software, including blogs, wikis, forums, and some CMS software, allow the general public to post comments, and in some cases, to post top-level articles (what these are and how they appear depends on the software). In the beginning, this was good; it led to a lot of interactivity between a site and its readers. However, soon spammers discovered that most of these systems allowed them to post links to their own sites, and thus began posting spam.

In case you’ve never seen one, here is an example of a comment spam:

In the ordinary, everyday understandings of the words involved, to say that someone survived death is to contradict yourself; while to assert that all of us live forever is to assert a manifest falsehood, the flat contrary of a universally known truth: namely, the truth that all human beings are mortal. For when, after some disaster, the ‘dead’ and the ‘survivors’ have both been listed, what logical space remains for a third category? by buy viagra
Comment by phentermine — 3/24/2005 @ 6:03 pm

The two links go to various spamvertised sites, and I’ve omitted them here. Sorry if you were actually looking for Viagra or phentermine.

There are two main reasons spammers target blogs, wikis, forums and CMS. In no particular order, the first is to create additional inbound links to their sites, in order to raise the sites’ rankings in search engines. The second is to create additional inbound links to their sites, in order to entice users to purchase their products.

Nofollow: the final solution to comment spam

If you’re a blogger (or a blog reader), you’re painfully familiar with people who try to raise their own websites’ search engine rankings by submitting linked blog comments like “Visit my discount pharmaceuticals site.” This is called comment spam, we don’t like it either, and we’ve been testing a new tag that blocks it. From now on, when Google sees the attribute (rel="nofollow") on hyperlinks, those links won’t get any credit when we rank websites in our search results. This isn’t a negative vote for the site where the comment was posted; it’s just a way to make sure that spammers get no benefit from abusing public areas like blog comments, trackbacks, and referrer lists. — Google
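To make "won't get any credit" concrete, here is a toy PageRank iteration in Python. It is the textbook simplification, not Google's production algorithm. It assumes that nofollow edges are simply dropped from the link graph before ranking, and the page names are made up.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Toy PageRank over (source, target, is_nofollow) edges.

    Assumption: nofollow edges are dropped before ranking, so they pass
    no score to their targets.
    """
    nodes = sorted({page for s, t, _ in edges for page in (s, t)})
    out_links = {page: [] for page in nodes}
    for source, target, is_nofollow in edges:
        if not is_nofollow:
            out_links[source].append(target)

    n = len(nodes)
    rank = {page: 1.0 / n for page in nodes}
    for _ in range(iterations):
        # Every page gets the same baseline from the damping term.
        new_rank = {page: (1.0 - damping) / n for page in nodes}
        for page in nodes:
            targets = out_links[page] or nodes  # dangling page: spread evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


edges = [
    ("blog-a", "blog-b", False),    # link in the body of a post
    ("blog-b", "blog-a", False),
    ("blog-a", "spam-site", True),  # comment link marked nofollow
    ("blog-b", "spam-site", True),
]
with_nofollow = pagerank(edges)
without_nofollow = pagerank([(s, t, False) for s, t, _ in edges])
for page in sorted(with_nofollow):
    print(f"{page:10} {without_nofollow[page]:.3f} -> {with_nofollow[page]:.3f}")
```

Compare the two runs. Once its inbound links carry nofollow, the spam site's score drops sharply. The score it would have received stays with the two blogs that link to each other from their posts.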

Google’s promise was: by tagging spammers’ links with nofollow, their sites would decrease in rank in their search engine. It was quite surprising how quickly virtually everyone in the blogging community signed on. I recall a few people raised concerns as to whether it would actually cause spammers to stop, and I was one of them, but that didn’t seem to stop anyone. MovableType, , Blogger, Flickr, you name it, everybody was adding nofollow. Even Slashdot. Who are they trying to stop, the Gay Nigger Association of America? (Yes, that last link has a rel="nofollow" on it.)

MSN quickly signed on to the nofollow initiative, and Yahoo joined as well. People all over the Internet started rejoicing: the comment spam problem had been solved! Or had it?

Why nofollow doesn’t stop spam

If you’ve been running a blog, you are quite well aware that nofollow has done little or nothing to stop comment spam. In some cases spammers are hitting blogs so hard that they cause denial-of-service conditions, even on blogs that use rel="nofollow"!

I am so sick of the damn spammers. Spammers are teh sux0r. Spammers are a festering boil on the ass of the Internets. I wouldn’t let a spammer kiss my butt with a pair of wax lips from ten feet away. If I ever see a spammer bleeding in a ditch, I will not be a Good Samaritan, I will kick him in the head, cover him up with dirt, and leave him there to rot. — Dougal Campbell

Why is this? Where did nofollow fail? It prevents spammers from getting PageRank, doesn’t it?

Yes, nofollow prevents spammers from getting PageRank. But they want traffic on their web sites. How they get it is irrelevant, except as a means to an end: bringing in users and taking their money. Indeed, shortly after Google launched nofollow, The Register published an interview with a link spammer. It goes into great detail as to how link spammers operate, and it is required reading if you want to understand why nofollow has failed, how to actually stop link spam, or both. As is my usual style, I’ll post a few choice cuts. Quotes are from the link spammer, “Sam,” interviewed in the article.

“You could be aiming at 20,000 or 100,000 blogs. Any sensible spammer will be looking to spam not for quality [of site] but quantity of links.”

This is Rule #1 for a link spammer. Post the link in as many places as you can, to bring in as many people as possible.

When a new blog format appears, it can take less than ten minutes to work out how to comment spam it. Write a couple of hundred lines of terminal script, and the spam can begin. But you can’t just set your PC to start doing that. It’ll get spotted by your ISP, and shut down; or the IP address of your machine will be blocked forever by the targeted blogs. So Sam, like other link spammers, uses the thousands of ‘open proxies’ on the net. These are machines which, by accident (read: clueless sysadmins) or design (read: clueless managers) are set up so that anyone, anywhere, can access another website through them. . . . Sam’s code gets hundreds of open proxies to obediently spam blogs and other sites with the messages he wants posted.

Most link spammers, manual or automated, use open proxies to disguise the source of their spam. It’s rare to receive link spam that did not originate from an open proxy. Yes, I’ve been tracking this.

When Sam spams tons of blogs and sites with links to his sites - which are affiliates of bigger PPC sites - people see the links and, seeking some porn, pills or casino action, click through to his site, and from there to the parent site, which pays Sam for each person landing there. The PPC sites can see revenues of £100,000 to £200,000 per month, says Sam. He gets a slice of that - and he wants it to stay that way.

Aha! We get to the heart of the matter. Our link spammer cares about click-throughs. Nofollow is completely irrelevant to click-throughs. Remember, it affects search engines, not users. Now, you may never follow a spammed link and buy something from one of these sites, but there are many users reading your blog (and others’ blogs) who will indeed patronize these sites if they happen to run across one from comment spam.

Will the initiative by Google, Yahoo and MSN, to honour “don’t follow” links defeat Sam and his ilk? “I don’t think it’ll have much effect in the short, medium or long term. The search engines caused the problem . . . and they’re doing this to placate the community. It won’t work because most blogs and [forums] are set up with the best intentions, but when people find hard graft has to go into it they’re left to rot. To use this, they’ll all have to be updated. The majority won’t be. And there’ll just be trackback spamming.”

Straight from the spammer’s mouth. He doesn’t care about nofollow. It isn’t stopping him. Not only do people click through anyway, many sites he spams don’t have nofollow implemented, so it doesn’t affect his search engine rankings too much.

What nofollow really stops

From the beginning, people questioned the use of rel="nofollow": whether it would be effective at stopping comment spam, and what other effects it might have.

I’m deeply mystified by the hallelujahs bursting forth about Google’s rel="nofollow" method of preventing comment spam. . . .

If rel="nofollow" works, if it’s applied universally, it will actually have the reverse effect. It actually gets less effective the more it is implemented. Why? Because the comment spamming sites are in competition with each other, and not with any legitimate businesses. They’re not so much trying to get the best pagerank for their term, as trying to get a better one than their rivals. That’s a key distinction. If the playing field is levelled by rel="nofollow", then everyone involved will be forced to try all the harder to get their links out there. The blogosphere will be hit all the harder because of the need to maximise the gains. As there’s no more effort in hitting 6 million blogs as there is in hitting 1 million, this really won’t bother the spammers one bit. All it does is shift the problem from the high pagerank blogs we here might have, with rel="nofollow", custom sanitize settings, and mt-blacklist in full effect, all the way over to the less technically adept. And that is one enormous customer service problem heading towards Blogger, 6A and the rest.

. . . forcing comment spammers to cast a wider net will cause them to target the long tail of people who have no idea what to do but come screaming at tech support, or slagging blogs off to their friends.

That would be a disaster. — Ben Hammersley

Hammersley didn’t even mention the effect nofollow would have on legitimate bloggers who use comments and trackbacks to interlink their blogs. However, The Register did:

It’s effectively declaring PageRank™ dead for weblogs, in an attempt to stem the problem [of comment spam].

“If such a tag were used widespread against comments and trackbacks, then wouldn’t this end up kneecapping blogs, by killing their intricate networks of interlinks?”

Indeed, this has already begun to happen. If your blog software inserts nofollow, then in order for you to give another blog Google juice, you have to go out of your way to link to them without nofollow, such as in your blogroll. It is no longer enough that your reader left an insightful comment or a trackback to his blog with more information. Now, as far as Google juice is concerned, it is as if all of your readers were never there and you had received no comments or trackbacks at all.

That’s what Google wanted all along.

For years Google has been plagued by blog noise, the phenomenon where blogs’ articles, comments and trackbacks will show as highly ranked for search results. While they have tried many approaches to dealing with this problem, none have worked out very well — or at all.

But is blog noise a problem at all? Sven-S. Porst doesn’t think so, and neither do others. The counter-argument is that blogs often are the search results people need.

One of the keys to being found on Google is that the webmaster has to want the page to be found. And most of what one would normally consider “primary source material” doesn’t want to be found. . . .

With most of the good reading material unavailable for free, what’s left? — Calico Cat

What’s left is the poor vilified blog and the thousands upon thousands of bloggers who work hard every day to bring useful content to the Web that wouldn’t otherwise be there.

Stopping nofollow

As we’ve seen, rel="nofollow" is Google’s way of having bloggers effectively delist themselves from search engines under the guise of protecting them from comment spam. If you want your site to have more Google juice, and who doesn’t, people have to link to you without rel="nofollow". It’s that simple. Nofollow hurts the entire blogosphere, and if carried to its extreme, will result in most blogs being relegated to obscurity as they drop out of the top 100 search engine results.

When you’re ready to get rid of rel="nofollow", first urge your blogging software developers to drop rel="nofollow" from their software. Then (if applicable) install a plugin or extension which removes rel="nofollow" from your blog, or remove the plugin or extension which added it. If you’re on a hosted blogging site such as Blogger, LiveJournal or MSN Spaces, your only immediate recourse will likely be to switch to another platform.

For Movable Type, simply disable and remove the nofollow plugin. For WordPress, install the NoNofollow plugin and set the number of days to 0 (zero). Update: Mark Jaquith has posted his Screw Nofollow plugin for WordPress which is smaller and less complicated.
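The plugins above are written in PHP, and their source is not reproduced here. As a rough illustration of the same idea in Python, the function below removes the nofollow value from `rel` attributes in already-sanitized comment HTML and keeps any other values. This is a sketch, not how those plugins actually work. It assumes double-quoted attribute values, and `strip_nofollow` is a hypothetical name. A production filter should use a real HTML parser rather than a regular expression.

```python
import re

# Matches a rel="..." attribute inside an <a ...> tag (not <abbr>, not data-rel).
# Assumption: attribute values are double-quoted, as comment sanitizers
# typically emit them.
REL_ATTR = re.compile(r'(<a\b[^>]*?)\s+rel="([^"]*)"', re.IGNORECASE)


def strip_nofollow(html):
    """Remove the nofollow token from rel attributes on <a> tags."""

    def fix(match):
        prefix, rel_value = match.group(1), match.group(2)
        kept = [tok for tok in rel_value.split() if tok.lower() != "nofollow"]
        if kept:
            # Keep other rel values such as "external".
            return f'{prefix} rel="{" ".join(kept)}"'
        # rel held only nofollow: drop the attribute entirely.
        return prefix

    return REL_ATTR.sub(fix, html)


print(strip_nofollow('<a href="http://example.com/" rel="external nofollow">Bob</a>'))
```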

And if you want to see how prevalent rel="nofollow" actually is, you can use Firefox and add this bit of CSS to your chrome/userContent.css file:

a[rel~=nofollow] { color: red !important; background: black !important; text-decoration: blink !important; }

Unfortunately, I don’t know of any way to cause Internet Explorer to show rel="nofollow" links. If you do, please leave a comment below.

And finally, stopping comment spam

No article on rel="nofollow" would be complete without mention of how to stop comment spam. As I’m a WordPress user, much of my research in this area has been focused on stopping WordPress comment and trackback spam. However, I do have one thing of interest to Movable Type users.

Bad Behavior Blackhole is a DNS-based blackhole list which lists sources of comment spam and open proxy servers. Bad Behavior Blackhole intends to have the most complete list of open proxies available anywhere as well as automated removal for any legitimate user who happens to get stuck with a dynamic IP address a spammer once used. A WordPress plugin is available which looks up addresses in Bad Behavior Blackhole, and it is trivially easy to convert MT-DSBL to use Bad Behavior Blackhole; just replace “list.dsbl.org” with “dnsbl.ioerror.us”. I recommend for best coverage that you use both, though, and that’s probably not a trivial hack, unfortunately.
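DNS blackhole lists like these are generally queried the same way. You reverse the octets of the client's IPv4 address, append the list's zone, and look the name up. An answer (conventionally an address in 127.0.0.0/8) means the address is listed; a "no such name" reply means it is not. That is why switching MT-DSBL from one list to another only means changing the zone string. The Python sketch below assumes that convention. `dnsbl_query_name` and `is_listed` are hypothetical helpers, and the sketch makes no claim that the zones named above still answer queries.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Reverse the IPv4 octets and append the blacklist zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the zone answers for this address, i.e. it is listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # NXDOMAIN or any other lookup failure: treat the address as not listed.
        return False


# 192.0.2.15 is a documentation address; this line does no network lookup.
print(dnsbl_query_name("192.0.2.15", "dnsbl.ioerror.us"))
```

Treating lookup failures as "not listed" is a deliberate fail-open choice. It means a DNS outage does not lock out every commenter.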

For WordPress, Bad Behavior (which is different from Bad Behavior Blackhole) analyzes incoming requests and determines if they are spambots; if they are found to be automated spambots, they are blocked before they can ever read your site to find the comment form! I’m not aware of any similar solution for Movable Type, unfortunately. In addition, WP-SpamAssassin uses SpamAssassin to analyze comments; this has been ported to Movable Type.

See also Movable Type anti-spam plugins and WordPress anti-spam plugins.

Comments and trackbacks are welcome and will be posted without rel="nofollow".


Author: IO ERROR.

Comments:

Tom
2005/05/23 at 08:19:48
Join Top Blogs!
The problem too, is that it doesn’t really prevent someone from posting, just from Google indexing the page.

Andy Skelton
2005/05/23 at 09:29:33
I am astonished at how much money the PPC sites can make from their enterprise. To them, it’s a sweet stream of income. To you and me, it feels like spammers are spray-painting POKER and VIAGRA all over our homes.
And I see nofollow is off here. Does this mean that you don’t mind if I put a link to Blogs Of The Day here? I mean, it’s a relevant question, so it’s not really spam, right?

IO ERROR
2005/05/23 at 11:19:17
Tom, be nice. And don’t we have enough of those “top sites” listings already?
Andy, there already was a link to Blogs of the Day here; you just can’t see it. I can’t fit 80×15 buttons into my layout very well anywhere. Give me an 88×31 (or so) sized button and I’ll make it visible.

Denis de Bernardy
2005/05/23 at 18:51:44
I had similar thoughts upon reading Sam, and eventually wrote my DoFollow plugin. Then again, at times I wonder if I should put the nofollow back for users who do not have at least two comments, to counter individuals who use comment-and-run spam techniques.
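The compromise described in the comment above could look something like the following Python sketch. It adds rel="nofollow" to a commenter's link only until that commenter has a minimum number of approved comments. The function name, the `approved_comments` count, and the default threshold of two are assumptions drawn from that comment, not from any existing plugin.

```python
import html


def comment_author_link(url, name, approved_comments, threshold=2):
    """Render a commenter's name as a link (hypothetical policy):
    nofollow until the commenter has `threshold` approved comments."""
    rel = "" if approved_comments >= threshold else ' rel="nofollow"'
    # Escape user-supplied values before putting them into HTML.
    safe_url = html.escape(url, quote=True)
    return f'<a href="{safe_url}"{rel}>{html.escape(name)}</a>'


print(comment_author_link("http://example.com/", "Newcomer", 0))
print(comment_author_link("http://example.com/", "Regular", 5))
```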

Seun Osewa
2005/05/27 at 16:57:21
I wonder if we will notice a massive loss of pagerank by blogs as a result of the wide adoption of rel=nofollow. I wonder if this has been measured.

positive
2005/05/31 at 03:36:44
I think nofollow is a bad idea for yet another reason. It gives people (read “webmasters”) the power to cheat other webmasters when doing “link exchanges” between their websites. Some webmasters will add a nofollow tag to your link a few days after the link exchange has gone live. Of course you can go back to their websites and check your link, or you may have a script do that automatically. But what about people who don’t even know about nofollow, or that they are being cheated by others?
Did you know that spreadfirefox.com uses nofollow tags in their Top Affiliates sections to link back to the websites promoting Firefox through their affiliate links? Although the whole point of having a “Top Affiliates” section is to give people linking to you a little credit for doing so (unless you’re one of those affiliate programs that pays money for linking to your site - which is not SFX’s case).

IO ERROR
2005/06/01 at 00:23:45
Selectively applying nofollow is a whole other issue. Wikipedia is infamous for it. If I’d included this issue, the article would have been twice as long. :) I’m holding it to see what the general response is from the Internet to this article. It would help if the whole Internet read it…somebody submit me to /.

Charles Miller
2005/06/06 at 21:11:59
Implemented as designed, nofollow should only be applied to untrusted links. Links that form part of the main body of the blog post are trusted, and thus should continue to pass along PageRank. Thus the traditional way for bloggers to share the link-love — by making blog posts that link to other pages — remains entirely unaffected. What gets swept aside is 90% incidental links that go from each comment back to the commenter’s homepage.
Which, even discounting comment spam, makes a lot of sense from a search engine’s point of view. The value of PageRank, if any, comes from the way it attempts to measure how many people other than the page author believe a page is worth reading. If Alice links to Bob in a blog post, that’s a substantial link - Alice has found something interesting she wants to point the rest of the web to. If Bob comments in Alice’s blog, and that inserts a link back to Bob’s homepage, that’s an incidental link. Bob shouldn’t be considered for higher ranking in a search engine just because he leaves comments all over the place.

agenzie investigative
2005/06/08 at 09:41:22
I’m in general against nofollow but I think it will also help bloggers who don’t know how to fight comment spam any other way. Another side benefit to bloggers is that they can use the tag selectively to keep certain links they use from getting pagerank.

Zbigniew Lukasiak
2005/06/15 at 09:01:47
There might be uses for ‘nofollow’ different from fighting spam - for example it could be used in wikis to stop bots from indexing old revisions, diffs and other automatically generated pages.

IO ERROR
2005/06/15 at 09:11:45
Charles, you have illustrated very well Google’s problem with “blog noise.” I would almost think you were working for them. :)
Zbigniew, the problem with that is that nofollow does not stop search engines from following the link, so they still get indexed. The name “nofollow” is a misnomer. If you want to stop the pages from being indexed, you still have to use robots.txt.

Zbigniew Lukasiak
2005/06/16 at 09:59:23
That’s a pity. robots.txt is not flexible enough for me: most wiki engines put automatically generated content under addresses that you cannot separate with robots.txt. Tying the instructions for bots tightly to the link would have another benefit: the bots would always use up-to-date information. I now get visits from msnbot to sections that I put into robots.txt months ago; apparently the bots have not updated their copy of that file yet.

IO ERROR
2005/06/19 at 16:14:56
Robots are supposed to re-read robots.txt no less often than once every seven days. You might want to report the trouble to Microsoft. Then again, MSNBot is Microsoft software, and I don’t expect them to actually fix it. :)
Of course, how the wiki presents its permanent links makes a big difference, too. If it uses a virtual directory structure, such as http://wiki/Special/whatever then it’s much easier to put such things in robots.txt than if the structure is http://wiki/Special:whatever (which MediaWiki uses) or http://wiki/index.php?Special=whatever (which is damn near impossible to work with).
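Python's standard `urllib.robotparser` applies robots.txt rules as plain path prefixes, which makes the difference easy to demonstrate. In the sketch below, the wiki URLs and rules are hypothetical. A virtual-directory layout needs one rule. In a query-string layout, the only usable prefix for a page's diffs also blocks the page itself.

```python
from urllib.robotparser import RobotFileParser


def blocked(robots_txt, url, agent="*"):
    """True if robots_txt disallows url for agent (plain prefix matching)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # record a fetch time, as read() would
    return not parser.can_fetch(agent, url)


# Virtual-directory layout: one prefix rule covers every generated page.
DIRS = "User-agent: *\nDisallow: /Special/\n"
print(blocked(DIRS, "http://wiki/Special/RecentChanges"))  # True
print(blocked(DIRS, "http://wiki/FrontPage"))              # False

# Query-string layout: diffs live at index.php?title=Page&diff=N, so the
# only prefix that catches them also matches the article itself.
QUERY = "User-agent: *\nDisallow: /index.php?title=FrontPage\n"
print(blocked(QUERY, "http://wiki/index.php?title=FrontPage&diff=42"))  # True
print(blocked(QUERY, "http://wiki/index.php?title=FrontPage"))          # True
```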

Bennett
2005/06/19 at 22:35:33
Blogs’ PageRank Reduced: Civilization Collapses
Did anybody really think nofollow would reduce comment spam? As you point out, it removes the PageRank incentive for comment spammers, but of course it doesn’t remove the link incentive. Obviously, spammers will still do it.
It is no longer enough that your reader left an insightful comment or a trackback to his blog with more information.
So if your reader wants Google to rank her site highly, she’ll just have to post insightful content on her own site. If it’s really good, other bloggers will link to it (in posts, without nofollow). Is this so terrible?
As Charles said above, it’s hard to see why adding content (comments) to somebody else’s site should affect your Google ranking. I think his comment deserves a more reasoned response than it got.
You say that “blogs often are the search results people need”. I tend to agree. So why would Google shoot themselves in the foot by not providing the search results people need? They may or may not be evil, but they’re certainly not stupid.
I have posted a more complete response, Nofollow considered harmless, on my website.

Logan
2005/07/04 at 00:18:04
Hey, just wanted to say thanks for the enlightening article. I downloaded Dofollow for WP and installed it on my blog, and I hope others do the same. Thanks again.

Wohnung Mieten Wohung
2005/07/04 at 04:34:45
Question:
Why shouldn’t blogs pass their PR to other sites?
Answer:
Because of heavy cross-linking in the blog world, some blogs achieve PR8.
Amazon, eBay, etc. pay millions to generate WWW spam sites.
Now along comes this poor guy who owns a PR8 blog website, for nothing!
Now you know!
I think blogs must pass PR for the whole blogosphere to survive.
I don’t want to land on a silly affiliate site every time I search for something.
Next story:
I think including a signature link to your website is not spam.
But adult sites, Viagra and co. aren’t cool.
I would prefer a concept of trusted links, as Charles Miller mentioned above.
Add to it a limit of, say, 5 URLs per post. Allow max 5 body URLs and max 2 signature links.
That’s it.
I’ve visited a lot of SEO sites where this is common.
Please at least let the experts post their sites. Otherwise, they will go to the sites without nofollow.
And the dumb spammer will continue spamming anyway.
Wohnung Mieten Wohung

chas
2005/07/07 at 01:20:23
The problem is that blog webmasters allow Tom, Dick and Harry to post comments on their sites…
I do want people to come to my site, so I did include my page with this comment. But I do have an interest, and my site has an interest, in the topic at hand…
If the topic interests me enough, I will sign up to post a reply.
That’s where the solution is: manual registration.

DianeV
2005/07/13 at 04:44:39
Interesting post, IO Error. I said as much in the WP forums just after the WP developers had agreed to implement nofollow … and was vilified for saying it. But now we see the truth, which has nothing to do with not being “in agreement with the group” or anything else. It’s just … the truth.

11Ell
2005/08/04 at 16:33:38
Here is a way to see “nofollow” links in IE. Just put this code in place of the URL in one of your bookmarks:
javascript:(function(){var i,x;for(i=0;x=document.links[i];++i){if(x.rel&&x.rel.indexOf('nofollow')>-1){x.style.color='black';x.style.background='red';}}})();



