Google’s John Mueller recently said that pages with similar URL structures may be treated as duplicates. The topic came up in the Google Search Central hangout on March 5th, 2021, when participant Ruchit Patel told Mueller that thousands of URLs on the event website he manages are not being indexed correctly.
While crawling websites, Google uses a predictive method to identify duplicate content based on URL patterns. These predictions can lead to pages being wrongly tagged as duplicates.
As we already know, Google needs to crawl the web to index and serve content to users. In this process, it uses different methods to make its crawling more efficient. One of them is predicting whether pages contain duplicate content based on their URL structure.
While crawling, if Google encounters a URL pattern whose pages carry similar content, it assumes that all other URLs following the same pattern might contain similar content as well. This method is efficient for Google; for site owners, however, it can mean that unique content gets classified as duplicate simply because it shares a URL pattern. Those pages will be left out of Google’s index.
Mueller on Predicting Duplicate Content
Google has different ways of determining when the pages have duplicate content. One is by analyzing content on the website, and the other is by predicting duplicate pages based on URL patterns.
“What tends to happen on our side is we have multiple levels of trying to understand when there is duplicate content on a site. And one is when we look at the page’s content directly and we kind of see, well, this page has this content, this page has different content, we should treat them as separate pages.
The other thing is kind of a broader predictive approach that we have where we look at the URL structure of a website where we see, well, in the past, when we’ve looked at URLs that look like this, we’ve seen they have the same content as URLs like this. And then we’ll essentially learn that pattern and say, URLs that look like this are the same as URLs that look like this.”
Mueller further adds that Google does this to save resources while crawling and indexing. If Google assumes a page is a duplicate of another page with a similar URL structure, it may not crawl that page at all to check its content.
“Even without looking at the individual URLs, we can sometimes say, well, we’ll save ourselves some crawling and indexing and just focus on these assumed or very likely duplication cases. And I have seen that happen with things like cities.
I have seen that happen with things like, I don’t know, automobiles is another one where we saw that happen, where essentially our systems recognize that what you specify as a city name is something that is not so relevant for the actual URLs. And usually we learn that kind of pattern when a site provides a lot of the same content with alternate names.”
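The pattern-learning Mueller describes can be sketched in miniature: sample URLs that differ in only one path segment (such as a city name), and if their content is identical, treat that segment as likely irrelevant. This is a hedged illustration only; Google's actual systems are far more sophisticated, and the function and URL paths below are hypothetical, not from the hangout.

```python
import hashlib
from collections import defaultdict

def content_hash(html: str) -> str:
    """Fingerprint page content; real systems use fuzzier signatures."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def likely_irrelevant_segments(pages: dict[str, str]) -> set[int]:
    """Guess which path-segment positions do not change page content.

    `pages` maps a URL path (e.g. "/events/london") to its HTML.
    A position is flagged when several URLs that differ ONLY in that
    segment all serve identical content -- the "city name is ignorable"
    situation Mueller describes.
    """
    # Group URLs by a template with one segment wildcarded out.
    groups = defaultdict(list)  # (position, template) -> content hashes
    for path, html in pages.items():
        segments = path.strip("/").split("/")
        for i in range(len(segments)):
            template = tuple("*" if j == i else s
                             for j, s in enumerate(segments))
            groups[(i, template)].append(content_hash(html))

    irrelevant = set()
    for (i, _template), hashes in groups.items():
        # More than one URL matched the template, and all content was equal.
        if len(hashes) > 1 and len(set(hashes)) == 1:
            irrelevant.add(i)
    return irrelevant
```

Run against three hypothetical city pages serving the same event list, the sketch flags segment position 1 (the city name) as irrelevant; when each city shows different events, nothing is flagged.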
Mueller’s Answer to the Participant on Event Websites
Mueller explained how Google’s predictive method might have affected the participant’s event website.
“So with an event site, I don’t know if this is the case for your website, with an event site it could happen that you take one city, and you take a city that is maybe one kilometer away, and the events pages that you show there are exactly the same because the same events are relevant for both of those places.
And you take a city maybe five kilometers away and you show exactly the same events again. And from our side, that could easily end up in a situation where we say, well, we checked 10 event URLs, and this parameter that looks like a city name is actually irrelevant because we checked 10 of them and it showed the same content.
And that’s something where our systems can then say, well, maybe the city name overall is irrelevant and we can just ignore it.”
How to Fix the Problem?
Mueller suggested that webmasters limit duplicate content on their websites and fix the cases where the duplication is genuine.
“So what I would try to do in a case like this is to see if you have this kind of situations where you have strong overlaps of content and to try to find ways to limit that as much as possible.
And that could be by using something like a rel canonical on the page and saying, well, this small city that is right outside the big city, I’ll set the canonical to the big city because it shows exactly the same content.
So that really every URL that we crawl on your website and index, we can see, well, this URL and its content are unique and it’s important for us to keep all of these URLs indexed.
Or we see clear information that this URL you know is supposed to be the same as this other one, you have maybe set up a redirect or you have a rel canonical set up there, and we can just focus on those main URLs and still understand that the city aspect there is critical for your individual pages.”
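The rel canonical Mueller mentions is declared in the page’s `<head>`. As a hedged illustration of his small-city example, the snippet below uses hypothetical URLs (not from the original discussion): the smaller town’s event page points its canonical at the larger city’s page because both show exactly the same events.

```html
<!-- Hypothetical page: https://example.com/events/smalltown -->
<!-- Its event list duplicates https://example.com/events/bigcity,
     so the big-city URL is declared as the canonical version. -->
<head>
  <link rel="canonical" href="https://example.com/events/bigcity" />
</head>
```

Alternatively, a server-side 301 redirect consolidates the URLs entirely; the canonical link keeps both pages reachable while signalling to Google which one should be indexed.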
It is always good to reduce duplicate content on a website. However, it is worth noting that Google does not penalize or apply any negative ranking to websites with duplicate content; the risk is that such pages may simply be left out of the index.