The ‘duplicate content’ myth

No, Google will not ‘penalize’ your photos if your descriptions of them contain quotes from Wikipedia.

Those of us who sell photos online, as stock or wall art, know the importance of keywords and ‘descriptions’. Keyword search, on a POD (print-on-demand) site or via Google, is for many of us the main way we hope to be found. So we want Google to like our descriptive text, and use it to index and rank our photo pages.

For subjects like historical landmarks, insect species, or sea shells, descriptions need to include important nomenclature and details that people might be using as search terms. The logical source for these is Wikipedia.

For years, the claim has circulated online that Google detects outright copies of Wikipedia text, and penalizes the search rank of pages or images that contain them. That’s just not true. It’s not even possible.

You’ll see people claiming that Google “knows what it’s seen before” and can consequently “recognize” quotes from well-known sources like Wikipedia. But Google isn’t some super-intelligent AI with a vast memory that it can instantly access; it’s just code, and a database. The only way its crawler could “recognize” plagiarized text would be by literally comparing it on the spot against every other page on the web. So that’s not happening.

Even assuming that Google, for some reason, wanted to specifically protect Wikipedia’s copyrights, it’s still an overwhelming task, as Wikipedia currently contains about 6 million articles and 3.67 billion words. The crawler code could try to cut this down by guessing the “subject” but that would misfire more often than not. And it would have to do this every time it crawls a page.
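To put rough numbers on that, here is a minimal Python sketch of the kind of brute-force check described above. It is purely an illustration of the argument, not a description of anything Google actually runs: the function names, the five-word run length, and the tiny sample "corpus" are all made up for the example, and the two constants are just the approximate Wikipedia figures quoted above.

```python
# Approximate figures quoted in the post.
WIKIPEDIA_ARTICLES = 6_000_000
WIKIPEDIA_WORDS = 3_670_000_000


def average_words_per_article(words=WIKIPEDIA_WORDS, articles=WIKIPEDIA_ARTICLES):
    """Back-of-the-envelope average article length, in words."""
    return words / articles


def shares_word_run(description, article, run_length=5):
    """Return True if any run of `run_length` consecutive words from
    `description` also appears, word for word, in `article`.

    Hypothetical helper: splits on whitespace and ignores case only.
    """
    desc_words = description.lower().split()
    # Pad with spaces so a run only matches on whole-word boundaries.
    article_text = " " + " ".join(article.lower().split()) + " "
    for start in range(len(desc_words) - run_length + 1):
        run = " ".join(desc_words[start:start + run_length])
        if f" {run} " in article_text:
            return True
    return False


def naive_scan(description, articles, run_length=5):
    """Compare one description against every article, one at a time.

    `articles` maps a title to its text. The cost grows with the size of
    the whole corpus, which is the point of the argument above.
    """
    return [
        title
        for title, text in articles.items()
        if shares_word_run(description, text, run_length)
    ]


if __name__ == "__main__":
    sample_articles = {
        "Nautilus": "The chambered nautilus is a cephalopod found in the Pacific Ocean",
        "Castle": "A castle is a fortified structure built during the Middle Ages",
    }
    description = "Spiral shell on sand. The chambered nautilus is a cephalopod of the deep"
    print(round(average_words_per_article()), "words per article, on average")
    print(naive_scan(description, sample_articles))
```

Run against two sample articles this is instant. Run against 6 million articles averaging roughly 612 words each, for every page crawled, it is a very different proposition.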

Google isn’t interested in spending its CPU time (i.e. money) trying to nail copycats. It has no interest in protecting someone’s copyright unless forced to by legal action. What it does have is an interest in crawling and parsing zillions of web pages as quickly as possible, and using that information to sell ads. It does want its search results to have quality, with the best results at the top; to that end there is apparently some checking for duplication in the search results, with the goal of putting the original content at the top and pushing down any copies. But Google doesn’t care if you crib from another site – although maybe the owner of that other site does…

Anyway, it’s perfectly legal to use text from Wikipedia – it’s an encyclopedia, it exists to dispense knowledge. You just have to follow some rules regarding licensing and attribution. Unfortunately, though, these rules are far from simple – as Wikipedia makes clear in an article about its copyrights – and revolve around that “Creative Commons” stuff that nobody understands because it makes your eyes glaze over in 10 seconds. So in reality, Wikipedia’s material ends up all over the place, often with no attribution at all. And Wikipedia probably doesn’t really care, unless there’s “malicious intent”.

The belief that Google punishes copying from Wikipedia is part of a larger mythology about “duplicate content”. Again, this isn’t something the crawler is going to try to detect; it only matters at the time search results are displayed. And the problematic duplication is within one domain, not across domains. A good article debunking this myth is found here. Neil Patel, the SEO guru, also dismisses it. And if that isn’t enough, here is a statement from Matt Cutts back when he was head of “search quality” at Google; he basically says that quoting Wikipedia won’t help your rank, but says nothing about a penalty.
