Google releases a new podcast episode discussing the past, present, and future of robots.txt

Kaushal Thakkar is the Founder and MD of Infidigit. He has developed award-winning search strategies for various organizations, ranging from large enterprise and e-commerce websites to small and medium-sized businesses. Before Infidigit, he led digital marketing, product, and e-commerce initiatives at Myntra (a Walmart Company), Times Group, ICICI Group, and Tata Group. Having been an engineer and product manager in his earlier days, he loves to hack growth for websites via technical SEO strategies. He is a speaker at various forums and a pro bono guest lecturer on Organic Search, Digital Marketing, Analytics, and e-commerce.


Kaushal Thakkar


Google has released a new episode of the “Search Off the Record” podcast discussing robots.txt in Google Search

This is the 28th episode of the “Search Off the Record” podcast series. Gary Illyes, Lizzi Harvey, and Martin Splitt from the Google Search team are joined by special guest David Price, a Legal Director at Google.

Who is robots.txt meant for?

Gary Illyes noted that Google’s documentation describes robots.txt as being meant for crawlers. However, he also asked what actually constitutes a crawler.

David Price answered by framing it as a question of manual versus automated access.

“And so, often when I get this question, I kind of try and think about that manual versus automated question. Is this something that is sort of aligned with something that a user is doing where you have a person in the loop somewhere saying, “Yes, I want to initiate this thing.” And then you want to make sure that you’re not saying it’s a person who’s clicking start to the crawling program, but rather something where you expect to have people in the loop constantly using something and constantly saying, “Well, now I’m interested in this page or that page.” Very much akin to a browser.

And something that’s not that, something that sort of runs with full automation, that is going to be a robot, and if it operates at Google, then it definitely needs to obey robots.txt,” said David Price.
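For a fully automated fetcher of the kind described above, obeying robots.txt comes down to checking the site's rules before requesting a URL. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `ExampleCrawler` user agent, and the `is_allowed` helper are assumptions made for illustration and are not taken from the podcast or from Google's crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleCrawler" is a made-up user agent name.
blocked = is_allowed(ROBOTS_TXT, "ExampleCrawler", "https://example.com/private/report.html")
permitted = is_allowed(ROBOTS_TXT, "ExampleCrawler", "https://example.com/blog/post.html")
```

A well-behaved crawler would run a check like this before every automated request and skip any URL for which it returns `False`.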

Do social media “bots” follow robots.txt?

Martin Splitt raised a very interesting point about social media bots. He gave an example of a user posting something on social media with a link to an article. He mentioned that once the post is submitted, the social media platform sends an automated request to the URL. The social media bots use this to fetch the Open Graph (OG) tags to display details on the feed. Martin raised the question of whether the social media “robots” follow the robots.txt directives or not.

David Price responded to this question:

“It’s a good question. … The user said, “Hey, I want to sort of, in some way, engage with this piece of content on the web.” And it’s kind of natural that you would want to look at that content on the web and understand what it is.

And so the social media case, I think, is a good example where you might say that that is not a robot in the classical sense. Now, maybe that then means that there should be some other level of control or some other set of questions. If you are a webmaster, you say, “Hey, I don’t want this fetch to happen.” But the question of “Is it a robot?” I think the answer is probably not.”
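The link-preview fetch Martin describes can be pictured with a short sketch: after a user shares a URL, the platform downloads the page and reads its Open Graph `<meta>` tags. The example below uses only Python's standard `html.parser` and works on an inline HTML string; the sample page, the `OpenGraphParser` class, and the `extract_open_graph` helper are hypothetical and do not represent how any particular social media platform implements this.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        # Keep only Open Graph properties that carry a content value.
        if prop.startswith("og:") and content is not None:
            self.tags[prop] = content


def extract_open_graph(html: str) -> dict:
    """Return a dict of Open Graph properties found in the HTML."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags


# Hypothetical page that a shared link might point to.
SAMPLE_HTML = """
<html><head>
  <title>Example article</title>
  <meta property="og:title" content="Example article" />
  <meta property="og:image" content="https://example.com/cover.png" />
  <meta name="description" content="Not an Open Graph tag" />
</head><body></body></html>
"""

preview = extract_open_graph(SAMPLE_HTML)
```

Here the fetch happens once, in direct response to a user action, which is the distinction drawn in the discussion between this kind of request and a crawler that runs unattended.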

Do requests from Google Merchant Center obey the robots.txt file?

Another interesting question asked by Martin Splitt was whether the requests from Google Merchant Center to fetch information from the pages follow robots.txt.

“What if I, as someone who has an e-commerce shop, am opting in to, say, Google Shopping so that Google Merchant Center can make requests. But then these requests happen somewhat outside of my control, because I say, “Yeah, sure you can do that.” But I don’t really request Google to do specific requests to my website.

And then the robot, so to speak, so Google’s product side has to figure that one out. And I think they’ve… Do they follow robots.txt? I guess they do. I don’t know. Maybe not because I, as a website owner, opted in already. Is that a robot?” asked Martin Splitt.

David Price replied to this:

“I don’t think anything would stop a webmaster from going out and saying, “Hey, for this particular specific integration that you and I are doing together, I am cool with you going and doing an automated access to my site, regardless of whatever my robots.txt says.”

It might be something where a best practice is to adapt your robots file in that situation to account for whatever you’ve agreed to do, so that you have less sort of out-of-band permissions and denials that are not captured in the file. But that is maybe more of an engineering detail than anything else.”
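One way to read the suggestion of adapting the robots file is to add a dedicated group for the integration's user agent, so the permission is recorded in the file rather than agreed out of band. The sketch below checks such a file with Python's standard `urllib.robotparser`. The user agent `ExamplePartnerBot`, the paths, and the shop URL are invented for illustration; this is not the actual user agent or behavior of Google Merchant Center.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the agreed integration gets its own group,
# while all other automated clients stay blocked from /checkout/.
ROBOTS_TXT = """\
User-agent: ExamplePartnerBot
Allow: /

User-agent: *
Disallow: /checkout/
"""


def can_fetch(user_agent: str, url: str) -> bool:
    """Check a URL against the robots.txt rules defined above."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# The partner agent matches its own group; everyone else falls back to "*".
partner_ok = can_fetch("ExamplePartnerBot", "https://shop.example/checkout/cart")
other_ok = can_fetch("SomeOtherBot", "https://shop.example/checkout/cart")
```

With a file like this, anyone reading robots.txt can see which automated access has been permitted, which is the benefit described in the quote above.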

How was robots.txt born?

In the early days of the Internet, some robots behaved badly and wreaked havoc on websites by bombarding them with requests. The robots.txt file was introduced to give webmasters control over these robots and crawlers.

Here are David Price’s thoughts on this:

“Because there were robots, especially in the beginning of the Internet, which didn’t necessarily behave well on the Internet, and they were going rampant and ravaging sites. Perhaps that’s also why robots.txt was born, basically giving people some sort of control over these rampant crawlers.

… I think if you kind of think back to the mid-’90s … the state of the technology then was of reasonably limited bandwidth, reasonably limited server capacity, and so you could imagine that somebody with a computer in their dorm room might be able to impose a pretty significant load on a fairly well-known and prominent set of web services at the time. You didn’t have the kind of galactic scale web service systems that you have today.

And so, I think the concern that one sort of ill-behaved script running on somebody’s box under their desk could cause some serious damage to your ability to serve was, I think, a real concern at the time. And if that happened multiple times, and it was scaled across multiple different robots running all over the web, then I think you have, especially in that technological environment, a pretty significant problem.”

The episode covers several other points about the robots.txt file. The entire discussion is available on the “Search Off the Record” podcast.
