Google Explains Discovery & Refresh Data in Crawl Stats Report

Google's John Mueller has shared his view on the “Discovery” and “Refresh” metrics in the newly updated Crawl Stats report in Search Console.

The Crawl Stats report was updated several weeks ago to offer data that was previously not available to webmasters.

During the Google Search Central live stream on 27 November 2020, a participant asked about one specific section of the data, Crawl Purpose, and its metrics.

John Mueller was asked to shed light on the two Crawl Purpose metrics, i.e. the percentage of refreshed URLs and the percentage of discovered URLs.

The question put to Mueller

“What’s the difference between discovery and refresh? In our case, it’s showing 84% refresh.

Does that mean 84% of the time Google is crawling known URLs from their database, and only 16% of the time they crawl our site, sitemaps, and links from other URLs from the known URL database?”

The Google Search Console help document has been updated on this point and gives a clear definition of Discovery and Refresh.

Discovery: The URL has never been crawled by Google before.

Refresh: The URL has been crawled by Google before and is being crawled again.
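The two definitions above can be illustrated with a small sketch. It is only an approximation under stated assumptions: the function name `crawl_purpose_breakdown` and the sample URL list are hypothetical, and it labels a request as discovery simply when the URL has not appeared earlier in the list. Google's actual bucketing is not public, and Mueller himself says he is not fully sure what goes into each bucket.

```python
from collections import Counter


def crawl_purpose_breakdown(crawled_urls):
    """Return the percentage share of 'discovery' and 'refresh' requests.

    Assumption: a request counts as discovery the first time a URL appears
    in the list and as refresh on every later appearance.
    """
    seen = set()
    counts = Counter()
    for url in crawled_urls:
        if url in seen:
            counts["refresh"] += 1
        else:
            counts["discovery"] += 1
            seen.add(url)

    total = sum(counts.values())
    if total == 0:
        return {"discovery": 0.0, "refresh": 0.0}
    return {
        purpose: round(100 * counts[purpose] / total, 1)
        for purpose in ("discovery", "refresh")
    }


# Hypothetical crawl requests in chronological order.
requests = ["/", "/blog", "/", "/blog", "/", "/new-post", "/", "/blog"]
shares = crawl_purpose_breakdown(requests)
print(shares)  # {'discovery': 37.5, 'refresh': 62.5}
```

In this toy example three of the eight requests hit a URL for the first time, so the refresh share dominates, which mirrors the pattern most established sites see in the report.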

Mueller then shared his own understanding of the topic in response to the question above.

Mueller’s view on Crawl Purpose Data

“I’m not 100% sure what exactly we would put into each of those buckets, but generally we do split things up into refresh crawling where we try to update the information that we have on a site, and discovery crawling where we try to find new URLs that we’ve heard about from the website. Which could be things like from new internal links or from external links pointing to your website.”

Mueller said he is not 100% sure which kinds of URLs are grouped under the Refresh and Discovery metrics, but his explanation was broadly in line with the one given in the Google help document.

Discovered URLs are URLs that Google crawled for the first time. Refreshed URLs are URLs that Google crawls again, both to update its index if the content has changed and to find newly added links on the page.

Furthermore, Mueller says,

“Refresh crawl doesn’t mean that we’re just updating the page’s content, we’re also looking for new links which we can then use for discovering new content.”

Webmasters will usually find that the percentage of refreshed pages in the Crawl Purpose data is higher than the percentage of discovered pages, since most of the URLs Google requests on an established site are ones it already knows.

There are scenarios where the percentage of discovered pages could be higher, such as during a site migration, after the launch of a new site, or after a new sitemap has been submitted.

If the site has rapidly changing pages, it might be a good idea to make sure they are included in the sitemap.
