July 05, 2007

Avoiding Duplicate Content: Exactly How “Unique” Do Your Spun (Rewritten) Articles Need to Be?

Posted in: article marketing, article rewriting, article spinning, duplicate content, SEO

A couple of common questions among those who use article spinning to boost their article marketing efforts are: “How ‘unique’ does my content need to be? Is there some ‘magic formula’ that I can use to figure out the percentage of similarity that will safely escape the search engines’ duplicate content filter?” Well, as it turns out, there just may be.

The real question central to solving this problem is not “How dissimilar do my articles need to be?”, but rather, “How dissimilar from one another, on average, are so-called ‘unique’ articles that have been indexed by the search engines?” Many people fail to consider this subtlety, erroneously assuming that unrelated, non-spun articles hover around 0% similarity. Once we know what percentage of shared content is typically “allowed” by the search engines, we have a goal to shoot for when authoring spun or rewritten content.

To come up with some useful percentages as guidelines, I ran some duplicate content tests using DupeCop’s wonderful and free online duplicate content checker, available here.

The content for these tests was retrieved from Wikipedia’s Featured Articles section. To begin, I selected excerpts of relatively equal length from two seemingly totally unrelated articles: the first titled “An Experiment on a Bird in an Air Pump,” and the second about “Action Potential” (no, this isn’t about your plans for Saturday night 🙂 ). Surely these two totally distinct and dissimilar articles would share almost no “duplicate content,” right?

WRONG. Surprisingly, DupeCop showed that these two articles were only 80.2% unique, meaning that they “shared” approximately 19.8% of their content. To make sure this wasn’t some sort of fluke, I ran another test, this time comparing an article on the New Orleans Mint with one about Windows NT. The results were similar: the articles were only 83.3% unique, sharing almost 17% of their content.

OK, that’s a good start, but what about articles on the same topic? To test similarity between topically related articles, I compared excerpts from a write-up on Influenza with another on Pneumonia. Not surprisingly, these articles came in slightly lower at 74.9% uniqueness, sharing around 25% of their content.
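The post does not say how DupeCop arrives at its percentages, so the following is only a rough sketch of one common way to score overlap: break each text into word sequences ("shingles") and count how many of one text's shingles also appear in the other. The function names, the shingle size, and the two sample sentences are all made up for illustration and are not DupeCop's actual method. It does hint at why unrelated articles can still "share" content: everyday words such as "the," "is," and "in" turn up everywhere.

```python
import re


def shingles(text, size=1):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def percent_unique(a, b, size=1):
    """Percentage of the shingles in `a` that do not appear in `b`.

    Illustrative assumption only; a real checker may weight or
    normalize the overlap differently.
    """
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa:
        return 100.0
    shared = len(sa & sb)
    return 100.0 * (len(sa) - shared) / len(sa)


# Two made-up sentences on unrelated topics (hypothetical sample text).
text_a = "The bird is in the air pump and the painter watched it."
text_b = "The action potential is a change in the voltage of the cell."

print(percent_unique(text_a, text_b, size=1))  # single words
print(percent_unique(text_a, text_b, size=2))  # word pairs
```

Comparing single words makes even these unrelated sentences look noticeably similar, while comparing word pairs makes them look much more unique. Whatever a given checker reports depends heavily on choices like this, so the percentages from different tools are not directly comparable.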

So what does this all mean? Well, it appears that for us article spinners out there, any two pieces of content will likely avoid the duplicate content filter (and subsequently be indexed) if they are at least 75% unique, that is, if they share no more than about 25% of their content. However, 25% is most likely a low estimate of how much sharing is tolerated, given the social nature of the web, and its encouragement (especially with the emergence of powerful social networks) of sharing “snippets” of interesting content with one another. It’s highly doubtful that one would be “penalized” for sharing a snippet of content from another site on one’s own site, as long as it is surrounded by sufficiently unique content.

In my experience, anything under 30% similarity (in other words, at least 70% unique) is highly unlikely to be sucked into oblivion by the duplicate content filter. So, now all you duplicate content fanatics have a concrete number to shoot for, based on an (albeit tiny) real-world sampling.
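For anyone who wants to check the arithmetic, here is a small sketch that converts the three uniqueness figures from the tests above into shared-content percentages and compares them with the 30% rule of thumb. The variable and function names are invented for this example, and the 30% limit is just the number suggested in this post, not a value published by any search engine.

```python
# Uniqueness figures (percent) reported in the three tests above.
results = {
    "Air Pump vs. Action Potential": 80.2,
    "New Orleans Mint vs. Windows NT": 83.3,
    "Influenza vs. Pneumonia": 74.9,
}

# Rule of thumb from this post; an assumption, not a search engine constant.
MAX_SIMILARITY = 30.0


def shared_percent(unique_percent):
    """Shared content is whatever is left over after the unique part."""
    return round(100.0 - unique_percent, 1)


def under_limit(unique_percent, limit=MAX_SIMILARITY):
    """True if the shared percentage is below the chosen limit."""
    return shared_percent(unique_percent) < limit


for pair, unique in results.items():
    print(pair, shared_percent(unique), under_limit(unique))
```

All three pairs come in under the limit, which is what you would expect given that they are separate, independently written articles.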

Happy Spinning,


