A couple of common questions among those who use article spinning to boost their article marketing efforts are: “How ‘unique’ does my content need to be? Is there some ‘magic formula’ that I can use to figure out the percentage of similarity that will safely escape the search engines’ duplicate content filter?” Well, as it turns out, there just may be.
The real question central to solving this problem is not “How dissimilar do my articles need to be?”, but rather, “How dissimilar from one another, on average, are so-called ‘unique’ articles that have been indexed by the search engines?” Most people may fail to consider this subtlety, erroneously assuming that most unrelated, non-spun content hovers around 0% similarity. However, once we know what percentage of content-sharing is typically “allowed” by the search engines, we then have a goal to shoot for when authoring spun or rewritten content.
To come up with some useful percentages as guidelines, I ran some duplicate content tests using DupeCop’s wonderful and free online duplicate content checker, available here.
The content for these tests was retrieved from Wikipedia’s Featured Articles section. To begin, I selected excerpts of relatively equal length from two seemingly totally unrelated articles: the first titled “An Experiment on a Bird in an Air Pump,” and the second about “Action Potential” (no, this isn’t about your plans for Saturday night). Surely these two totally distinct and dissimilar articles would share almost no “duplicate content,” right?
WRONG. Surprisingly, DupeCop showed that these two articles were only 80.2% unique, meaning that they “shared” approximately 19.8% of their content. To make sure this wasn’t some sort of fluke, I ran another test, this time comparing an article on the New Orleans Mint with one about Windows NT. The results were similar: The articles were only 83.3% unique, sharing almost 17% content.
OK, that’s a good start, but what about articles on the same topic? To test similarity between topically-related articles, I compared excerpts from a write-up on Influenza to another on Pneumonia. Not surprisingly, these articles came in slightly lower at 74.9% uniqueness, sharing around 25% content.
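For readers who want to run this kind of comparison on their own drafts, here is a minimal sketch of one common way to score overlap: break each text into overlapping runs of words ("shingles") and measure how many of those runs the two texts have in common. This is an assumption for illustration only. I don't know how DupeCop computes its figures, so the function names (`shingles`, `percent_shared`, `percent_unique`) and the shingle size are hypothetical, and the numbers will not match DupeCop's.

```python
import re


def shingles(text, n=3):
    """Return the set of n-word shingles (overlapping word runs) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # A text shorter than n words yields an empty set.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def percent_shared(text_a, text_b, n=3):
    """Jaccard overlap of the two shingle sets, as a percentage.

    Hypothetical scoring method; not DupeCop's actual algorithm.
    """
    set_a, set_b = shingles(text_a, n), shingles(text_b, n)
    if not set_a or not set_b:
        return 0.0
    return 100.0 * len(set_a & set_b) / len(set_a | set_b)


def percent_unique(text_a, text_b, n=3):
    """Uniqueness is simply whatever is left after the shared portion."""
    return 100.0 - percent_shared(text_a, text_b, n)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    rewrite = "the quick brown fox sleeps under the lazy dog"
    print(f"shared: {percent_shared(original, rewrite):.1f}%")
    print(f"unique: {percent_unique(original, rewrite):.1f}%")
```

A smaller shingle size (say `n=1`, single words) makes unrelated texts look more alike, because common words such as "the" and "of" appear everywhere; a larger size only counts longer matching phrases. That is one plausible reason two unrelated articles can still register some shared content.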
So what does this all mean? Well, it appears that for us article spinners out there, any two pieces of content will likely avoid the duplicate content filter (and subsequently be indexed) if they are at least 75% unique. However, this is most likely a low estimate, given the social nature of the web, and its encouragement (especially with the emergence of powerful social networks) of sharing “snippets” of interesting content with one another. It’s highly doubtful that one would be “penalized” for sharing a snippet of content from another site on one’s own site, as long as it is surrounded by sufficiently unique content.
In my experience, anything under 30% similarity is highly unlikely to be sucked into oblivion by the duplicate content filter. So, now all you duplicate content fanatics have a concrete number to shoot for, based on an (albeit tiny) real-world sampling.
Happy Spinning,
Paul


Paul, does this mean your team of spinners, 99.9% of the time, spins each article that is sent in to AT LEAST 70% unique, so that you just won’t get sucked into the Google abyss?
- Thanks
Comment by Ben — August 30, 2007 @ 6:11 am
Thanks for your question, Ben.
Unfortunately, we cannot guarantee a specific percentage of uniqueness for our rewrites. This is because successful rewriting is a function of a number of factors: total article length; average sentence length; number of keywords included; topic of the article (articles that cite a lot of statistics or fact-based information are more difficult to spin); and overall quality of the original article.
And although 70% uniqueness is an “ideal” goal to shoot for, achieving this in reality is quite difficult on a consistent basis. However, even 50% unique is likely more than enough (at least in my experience) to escape any duplicate content penalty. Remember, it’s also unlikely that the rewritten text is the only text that will be on the page, so this further decreases the level of uniqueness you need to achieve.
Comment by Paul — August 30, 2007 @ 7:56 pm
Interesting information. I was just wondering if anyone knew anything about the impact of using the blockquote and cite tags in order to reduce the duplication signature. A customer of mine has lots of sites with substantive quotation repeated across them.
One way to get them out is just to use images, but it seems logical that blockquote and cite should work too.
Comment by SEO Monkey — November 6, 2007 @ 11:51 am
SEO Monkey,
I believe this post from the “horse’s mouth” (read: Google) can offer you some guidance on what is and isn’t considered duplicate content:
http://googlewebmastercentral.blogspot.com/2006/12/deftly-dealing-with-duplicate-content.html
Thanks, and hope this helps,
Paul
Comment by Paul — November 8, 2007 @ 10:20 pm
Again, a good post; I have been wondering about this myself. I used to visit Copyscape and check some of the pages I had written about SEO, and too many times to count, the whole article, if not large portions of it, had been used on other sites advertising SEO services.
I used to write to the offending websites, and would always get nasty replies from them, so I gave up.
It actually occurred to me while reading about your service on your website that the service you run could be well used by people who submit articles. Instead of submitting the same article 200 times to various article sites, wouldn’t it make more sense to get you to spin, say, 20 copies, and submit each one only 10 times? Wouldn’t the results in search engines be better?
I’ve been doing a lot of research on this, so could waffle on for ages… but again, very good post.
I’ve written to you privately before finding your blog, I’m really looking forward to working with you, I can certainly see a lot of uses for what you are doing…
Regards Lynny
Comment by SEO Guru (Self proclaimed) — June 26, 2008 @ 2:50 am
Hi thanks for the nice articles..
I totally agree with you that the duplicate content filter depends upon a lot of factors. It doesn’t work simply like DupeCop, where all articles below a certain level of uniqueness are filtered out. It depends upon the quality of surrounding articles and whether the suspicious articles are actually from one single site. A lot of papers have been published online on this.
Comment by Papers — July 9, 2008 @ 2:11 pm