A couple of common questions among those who use article spinning to boost their article marketing efforts are: “How ‘unique’ does my content need to be? Is there some ‘magic formula’ that I can use to figure out the percentage of similarity that will safely escape the search engines’ duplicate content filter?” Well, as it turns out, there just may be.
The real question central to solving this problem is not “How dissimilar do my articles need to be?”, but rather, “How dissimilar from one another, on average, are so-called ‘unique’ articles that have been indexed by the search engines?” Most people may fail to consider this subtlety, erroneously assuming that most unrelated, non-spun content hovers around 0% similarity. However, once we know what percentage of content-sharing is typically “allowed” by the search engines, we then have a goal to shoot for when authoring spun or rewritten content.
To come up with some useful percentages as guidelines, I ran some duplicate content tests using DupeCop’s wonderful and free online duplicate content checker, available here.
The content for these tests was retrieved from Wikipedia’s Featured Articles section. To begin, I selected excerpts of relatively equal length from two seemingly totally unrelated articles: the first titled “An Experiment on a Bird in an Air Pump,” and the second about “Action Potential” (no, this isn’t about your plans for Saturday night). Surely these two totally distinct and dissimilar articles would share almost no “duplicate content,” right?
WRONG. Surprisingly, DupeCop showed that these two articles were only 80.2% unique, meaning that they “shared” approximately 19.8% of their content. To make sure this wasn’t some sort of fluke, I ran another test, this time comparing an article on the New Orleans Mint with one about Windows NT. The results were similar: The articles were only 83.3% unique, sharing almost 17% content.
OK, that’s a good start, but what about articles on the same topic? To test similarity between topically-related articles, I compared excerpts from a write-up on Influenza to another on Pneumonia. Not surprisingly, these articles came in slightly lower at 74.9% uniqueness, sharing around 25% content.
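To make the idea of a "uniqueness percentage" concrete, here is a minimal sketch of one common way to score overlap between two texts: split each into overlapping word sequences ("shingles") and compare the two sets. This is not DupeCop's actual algorithm, which isn't documented here; the function names and the shingle size are assumptions chosen for illustration.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(text_a, text_b, size=3):
    """Jaccard overlap of the two shingle sets, as a percentage."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)

def uniqueness(text_a, text_b, size=3):
    """Complement of similarity: 100% means no shared shingles."""
    return 100.0 - similarity(text_a, text_b, size)

# Hypothetical sample texts, not taken from the tests above.
first = "the quick brown fox jumps"
second = "the quick brown fox sleeps"
print(similarity(first, second, size=2))
```

With a small shingle size, even unrelated texts share common words and short phrases, which is one plausible reason two unrelated articles can score well below 100% unique.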
So what does this all mean? Well, it appears that for us article spinners out there, any two pieces of content will likely avoid the duplicate content filter (and subsequently be indexed) if they are at least 75% unique. However, this is most likely a low estimate, given the social nature of the web, and its encouragement (especially with the emergence of powerful social networks) of sharing “snippets” of interesting content with one another. It’s highly doubtful that one would be “penalized” for sharing a snippet of content from another site on one’s own site, as long as it is surrounded by sufficiently unique content.
In my experience, anything under 30% similarity is highly unlikely to be sucked into oblivion by the duplicate content filter. So, now all you duplicate content fanatics have a concrete number to shoot for, based on (albeit, a tiny) real-world sampling.
Happy Spinning,
Paul


Paul, does this mean your team of spinners, 99.9% of the time, spins each article that is sent in to AT LEAST 70% unique, so that you just won’t get sucked into the Google abyss?
- Thanks
Comment by Ben — August 30, 2007 @ 6:11 am
Thanks for your question, Ben.
Unfortunately, we cannot guarantee a specific percentage of uniqueness for our rewrites. This is because successful rewriting is a function of a number of factors: total article length; average sentence length; number of keywords included; topic of the article (articles that cite a lot of statistics or fact-based information are more difficult to spin); and overall quality of the original article.
And although 70% uniqueness is an “ideal” goal to shoot for, achieving this in reality is quite difficult on a consistent basis. However, even 50% unique is likely more than enough (at least in my experience) to escape any duplicate content penalty. Remember, it’s also unlikely that the rewritten text is the only text that will be on the page, so this further decreases the level of uniqueness you need to achieve.
Comment by Paul — August 30, 2007 @ 7:56 pm
Interesting information. I was just wondering if anyone knew anything about the impact of using the blockquote and cite tags in order to reduce the duplication signature. A customer of mine has lots of sites with substantive quotation repeated across them.
One way to get them out is just to use images, but it seems logical that blockquote and cite should work too.
Comment by SEO Monkey — November 6, 2007 @ 11:51 am
SEO Monkey,
I believe this post from the “horse’s mouth” (read: Google) can offer you some guidance on what is and isn’t considered duplicate content:
http://googlewebmastercentral.blogspot.com/2006/12/deftly-dealing-with-duplicate-content.html
Thanks, and hope this helps,
Paul
Comment by Paul — November 8, 2007 @ 10:20 pm
Again, a good post. I have been wondering about this myself. I used to visit Copyscape and check some of the pages I had written about SEO, and too many times to count, the whole article, if not large portions of it, had been used on other sites advertising SEO services.
I used to write to the offending websites, and would always get nasty replies from them, so I gave up.
It actually occurred to me, while reading about your service on your website, that the service you run could be put to good use by people who submit articles. Instead of submitting the same article 200 times to various article sites, wouldn’t it make more sense to get you to spin, say, 20 copies and submit each one only 10 times, so that the results in search engines are better?
I’ve been doing a lot of research on this, so could waffle on for ages… but again, very good post.
I’ve written to you privately before finding your blog, I’m really looking forward to working with you, I can certainly see a lot of uses for what you are doing…
Regards Lynny
Comment by SEO Guru (Self proclaimed) — June 26, 2008 @ 2:50 am
Hi thanks for the nice articles..
I totally agree with you that the duplicate content filter depends on a lot of factors. It doesn’t work simply like DupeCop, where all articles below a certain level of uniqueness are filtered out. It depends on the quality of the surrounding articles and on whether the suspicious articles are actually from one single site. A lot of papers have been published online on this.
Comment by Papers — July 9, 2008 @ 2:11 pm
Hello!
Very Interesting post! Thank you for such interesting resource!
PS: Sorry for my bad english, I’v just started to learn this language
See you!
Your, Raiul Baztepo
Comment by RaiulBaztepo — March 28, 2009 @ 7:23 pm
TESTING DUPLICATE CONTENT - I did these tests using Spin Baby Spin, DupeCop, and the webconfs.com similar-page-checker.
QUALITY OF SPINNING
I had one part of an article that used both sentence spinning ({sentence1v1|sentence1v2|sentence1v3}) and word spinning (2 words in each sentence, each with 2 spin options), and another part that used only sentence spinning ({sentence1v1|sentence1v2|sentence1v3}).
My threshold was 30% or more unique content on DupeCop.
If I only spun 100 articles, the extra word spinning was not needed.
If I spun 500, then 37% of the articles with no extra word spinning fell below the threshold, compared with 0.5% for the text with the extra word spinning.
If I spun 3000 articles, the same thing happened but worse: the versions with no extra word spinning had so much duplicate content that I could not count it all, while only 14 of the 450 combinations I tested fell below the threshold when I used the extra word spinning.
TESTING RE-SPINS
I wanted to know whether, if I used Spin Baby Spin to spin the same article into 25 versions and then spun it again into another folder 25 more times, version 1 of the first spin would be the same as version 1 of the second spin. The answer is yes. All 25 versions were identical each time.
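The {option1|option2|option3} spin syntax and the re-spin result can be illustrated with a small sketch. It assumes a spinner that picks options with a seeded random number generator, which would explain why two runs produce identical versions. How Spin Baby Spin actually works internally is not known here, and the function names and sample template are hypothetical.

```python
import random
import re

# Matches an innermost {option1|option2|...} group (no nested braces inside).
INNERMOST = re.compile(r"\{([^{}]*)\}")

def spin(template, rng):
    """Replace every {a|b|c} group with one option chosen by rng."""
    while True:
        spun = INNERMOST.sub(lambda m: rng.choice(m.group(1).split("|")), template)
        if spun == template:  # no groups left to expand
            return spun
        template = spun

def spin_versions(template, count, seed):
    """Generate `count` versions from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [spin(template, rng) for _ in range(count)]

# Sentence-level spinning only versus sentence plus word spinning.
template = "{Spinning helps|Spinning can help} with {unique|original} articles."
first_run = spin_versions(template, 25, seed=1)
second_run = spin_versions(template, 25, seed=1)
print(first_run == second_run)  # same seed, so both runs give the same 25 versions
```

Under this assumption, getting a different set of versions on a second run would require a different seed.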
TESTING DUPECOP SPUN
I took 10 articles that were never spun and were completely different, and tried to see what DupeCop would say their uniqueness was. The articles ranged from 65% to 81% unique, and the average was 71% unique. They should have come up as 100% unique, so I think DupeCop’s calculation comes out too low.
TESTING ON SAME AND DIFFERENT DOMAIN
What I did was make 1000 spins using sentence spinning plus word spinning (about 2 words per sentence), which I will call extra, and 1000 spins using sentence spinning only (no extra word spinning), which I will call noxtra.
I took 10 extra articles and placed them on Site A with the normal Site A template, and then placed 10 noxtra articles on the same domain (Site A). I also placed the original article (no spinning) on the site. I then used this tool to check for duplicate content: http://webconfs.com/similar-page-checker.php
What I found is that
1) The extra articles vs the original article were on average 63% similar; the noxtra articles vs the original article were on average 75% similar.
2) An extra article vs the other extra articles was on average 69% similar; an extra article vs the noxtra articles was 74% similar.
3) A noxtra article vs the other noxtra articles was 75% similar.
Conclusion: if you are going to place articles that you have spun on the same site with the same template, you had better do the extra word spinning.
I then placed 3 extra articles, 3 noxtra articles, and the original article on another domain (Site B) with that site’s template. I then used the webconfs tool to check the similarity of the articles on Site A vs Site B.
This is what I found:
• SiteA-extra14 vs SiteB-extra14 = 55% similar (same article on both sites, no spinning)
• SiteA-extra14 vs SiteB-extra214 = 36% similar
• SiteA-extra14 vs SiteB-extra533 = 33% similar
• SiteA-noxtra14 vs SiteB-extra14 = 28% similar
• SiteA-noxtra14 vs SiteB-extra214 = 22% similar
• SiteA-noxtra14 vs SiteB-extra533 = 31% similar
• SiteA-noxtra14 vs SiteB-noxtra14 = 63% similar
• SiteA-noxtra14 vs SiteB-noxtra214 = 44% similar
• SiteA-noxtra14 vs SiteB-noxtra533 = 45% similar
Conclusion: when posting articles on different sites, such as article directories, it seems that either the extra or the noxtra method of spinning would work.
Conclusion: it is worth the extra money to do the extra word spinning, and you are better off spinning, let’s say, 1500 articles and then using the first 500 for project A (article submission to directories), the second 500 for project B (blog solutions RSS), and the last 500 for project C (creating a PR pumper site). If you only do article marketing and use Article Post Robot to post a different spun article to each directory, it does not seem necessary to pay the extra money for the extra word spinning. If you want to create a PR pumper site and place more than one copy of a spun article on the same domain, you should do the extra spinning.
Comment by Gregory — June 11, 2009 @ 1:03 am
If anyone else has tested dup content please contact me at gregory (@) vanduyse.org - I am interested in this.
Comment by Gregory — June 11, 2009 @ 1:10 am
Quick question: I have several websites that have duplicate content on them. Some sites have as many as 50 posts, while others may only have approximately 20 or so.
(I signed up to receive free content as a way of building up my sites.)
I have been searching for a solution to fix the problem apart from having to rewrite all of the content. All in all, we are talking about 200 or more pages.
What do you recommend as a way to fix the duplicate content? I don’t want to delete the pages because they are indexed and that may affect my rankings. On the other hand, if I leave the duplicate content on my site, it may eventually affect my rankings too.
I know I have to fix all of that duplicate content before it affects my site.
Comment by Darren — August 8, 2009 @ 10:00 am
Darren,
Two quick fixes: either “dilute” the duplicate content by adding more unique content to one of the duplicated pages, or, if possible, remove parts of the duplicated content while leaving enough that readers can still grasp the intended message.
Another suggestion is to try to incorporate a way for your readers to add new content to the page (e.g., comments) to help dilute the content.
However, if your pages are already indexed, they may be fine as is. But periodically updating your pages is a good idea anyway, as it encourages Google to crawl your site more frequently.
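As a rough illustration of the "dilution" idea, the arithmetic can be sketched as below. It assumes duplication is measured simply as the share of a page's words that are duplicated, and it borrows the 30% similarity figure from the post as the target. Both are simplifications, the function names are hypothetical, and this is not how any search engine is known to measure it.

```python
import math

def duplicate_share(duplicate_words, unique_words):
    """Percentage of the page's words that are duplicated."""
    total = duplicate_words + unique_words
    return 100.0 * duplicate_words / total if total else 0.0

def unique_words_needed(duplicate_words, target_share):
    """Unique words required so duplicates make up at most target_share percent."""
    return math.ceil(duplicate_words * (100 - target_share) / target_share)

# Hypothetical page: 600 duplicated words, aiming for at most 30% duplicated.
needed = unique_words_needed(600, 30)
print(needed, duplicate_share(600, needed))
```

The same function also shows the effect of trimming: removing duplicated words lowers the share without adding any new text.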
Comment by Paul — August 8, 2009 @ 11:06 am
Hi! I was surfing and found your blog post… nice! I love your blog.
Cheers! Sandra. R.
Comment by sandrar — September 10, 2009 @ 8:57 am