Robots.txt is an important part of technical SEO.
Did you know that a single wrong line of syntax in your robots.txt file could remove all of your pages from the search engine index?
Most search engines follow the instructions given in the robots.txt file.
So you must use robots.txt correctly, so that all of your website's important pages can be crawled by search bots easily.
That makes optimizing the robots.txt file one of the crucial parts of on-page SEO.
So if you've found robots.txt optimization challenging, you can make it work, and I'll show you how.
What is a robots.txt file?
Robots.txt is a text file placed in the root directory of the domain. It tells search engine bots which pages to crawl and which not to.
Robots.txt is the first file a crawler looks at when it visits your site.
The crawler first inspects all the instructions given in the robots.txt file.
Then it starts crawling the website by following those instructions.
You can also reference your sitemap URL in the robots.txt file, which makes it easy for bots to find all the pages on your site.
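For example, a sitemap reference is just one extra line in the file (example.com here is a placeholder; use your own domain and sitemap URL):
Sitemap: https://example.com/sitemap.xml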
Do you need robots.txt file?
A robots.txt file is not mandatory, but you should keep one on your website.
Why?
To keep unimportant pages away from search engine crawlers and out of the search index.
By default, search engines try to crawl and index all of your website's pages.
But as a result, they may also crawl pages that are not important or that you don't want crawled.
Reasons you should have a robots.txt file on your site
#1. Block Private Pages
Block pages such as your login page and dynamically generated content; you can even block content by folder (see the example after this list).
#2. Improve Crawl Budget
Block useless pages through the robots.txt file so that search engine bots spend more of their time crawling useful and valuable pages; as a result, your website's crawl budget is used more efficiently.
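As a rough sketch of #1, the rules below block a login page and an entire folder (the paths /wp-login.php and /private/ are only placeholders; replace them with the URLs you actually want to block):
User-agent: *
Disallow: /wp-login.php
Disallow: /private/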
Why the robots.txt file is important
#1. The robots.txt file prevents unimportant pages from appearing in search engine results.
#2. The robots.txt file helps your crawl budget by blocking unimportant pages, so that crawlers can spend most of their time crawling important pages only.
#3. The robots.txt file also helps block spam bots that follow its rules from crawling your website.
How to create a robots.txt file manually?
#1. Open Notepad, enter the syntax below, and save the file as robots.txt.
robots.txt file syntax –
User-agent: *
Disallow:
#2. Make sure the file name is exactly robots.txt, in lowercase.
Now your robots.txt file is ready.
Where to upload the robots.txt file?
#1. Log in to your cPanel and click on File Manager.
#2. Now click on Upload and upload your robots.txt file.
#3. Once uploaded, head over to the root directory of your domain in cPanel.
And you will see the robots.txt file, as in the example given below.
#4. Test your robots.txt file on the live domain as well.
In the browser, enter your domain and add /robots.txt after it.
Example – https://digitalpankaj.me/robots.txt
And you will see your robots.txt file, like this.
How to Create a Robots.txt File in WordPress?
Log in to WordPress; this method assumes you have the Yoast SEO plugin installed.
Now, in the left sidebar, click on SEO >> Tools.
On the next page, click on File Editor.
And click on the option to create a robots.txt file.
Enter the code below and click on Save.
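As a starting point, a commonly used minimal WordPress robots.txt looks something like this (the sitemap URL is a placeholder; point it at your own sitemap):
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap_index.xml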
How to Test your Robots.txt file?
Log in to Google Search Console and it will ask you to select a property.
Select the property and you will be redirected to the robots.txt Tester section in Search Console.
It will automatically fetch your robots.txt file. You can also test your robots.txt file here for errors: below the domain, enter the URL path you want to check and click on Test.
Robots.txt Syntax –
Robots.txt must be a UTF-8 encoded text file (which includes ASCII). Using other character sets is not possible. Here is the list of all the user-agent robots.
Ready?
I’ll walk you through the whole process.
So here we go.
Robots.txt file Basic format:
User-agent: *
Disallow:
User-agent:
This is the first line of a rule in robots.txt. It tells which crawlers the permissions that follow apply to.
Star (*)
The star means the rule applies to all search engine bots.
If you want to set permissions for a particular bot, just add that bot's name after User-agent, as in the example below.
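For instance, a rule aimed only at Google's main crawler starts like this (Googlebot is the name of Google's crawler, listed later in this post):
User-agent: Googlebot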
Disallow: /
If you add / after Disallow, it tells the bots that they are not allowed to visit any page.
User-agent: *
Disallow: /
And if you remove the / after Disallow, then all bots have permission to visit all pages of the website.
User-agent: *
Disallow:
Allow:
It tells search engine bots that they can access a specific file inside a folder that has otherwise been disallowed.
User-agent: *
Disallow: /photos
Allow: /photos/mycar.jpg
Sitemap:
It helps Google discover the URLs that you want crawled and indexed in search engines.
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
You know, every search engine has its own crawlers and bots.
For example –
Google User-Agent – Googlebot
Bing User-Agent – Bingbot
Yahoo User-Agent – Slurp
Here you can find a list of all the search engine bots.
Here are some common useful robots.txt rules:
Blocking all web crawlers from all content
User-agent: *
Disallow: /
Allowing all web crawlers access to all content
User-agent: *
Disallow:
Blocking a specific web crawler from accessing all content
User-agent: Googlebot
Disallow: /
Blocking a specific web crawler from a specific folder
User-agent: *
Disallow: /example-subfolder/
To allow a single robot
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
Block all images on your site from Google Images
User-agent: Googlebot-Image
Disallow: /
Note – For a subdomain, make sure to create a separate robots.txt file.
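For example, rules in https://example.com/robots.txt do not apply to a subdomain such as https://blog.example.com/, which needs its own file at https://blog.example.com/robots.txt (example.com is a placeholder domain).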
How to Use Wildcards in Robots.txt
Use robots.txt to allow or exclude specific URLs from search engines.
For this, robots.txt supports pattern matching on URLs.
And with the help of wildcards in robots.txt, you can do it easily.
There are two wildcard characters –
* Wildcard:
The * wildcard character is used to match any sequence of characters.
Block search engines from accessing any URL that has a ? in it:
User-agent: *
Disallow: /*?
$ wildcard:
The $ wildcard character is used to denote the end of a URL.
Block search engines from crawling any URL that ends with “.pdf”
User-agent: *
Disallow: /*.pdf$
Always validate your robots.txt changes before making them live.
Robots.txt Testing Tool
Use the tools below to double-check your work.
Robots.txt file Generator
If you don't want to take on the work of generating the robots.txt file yourself, no worries: you can easily create a robots.txt file with the help of online robots.txt generator tools.
Here is the list of tools that you can use to generate your robots.txt file.
Meta Robots vs Robots.txt
You can use robots.txt to block PDFs, multimedia resources, images, and videos, whereas meta robots tags are complicated to implement for these kinds of files.
Robots.txt is also the better option for excluding useless pages to improve the crawl budget.
Meta directives, on the other hand, are easier to implement, and if implemented wrongly they only affect that particular page, whereas one wrong line of syntax in robots.txt can block the entire website from crawling.
So, for page-level control, I would recommend using meta directives instead of robots.txt.
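For comparison, here is a minimal sketch of the two approaches (the page path /example-page/ is a placeholder): a meta robots directive goes in the HTML head of the individual page, for example
<meta name="robots" content="noindex, follow">
while keeping crawlers away from that same page with robots.txt would look like this:
User-agent: *
Disallow: /example-page/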
Let me know in the comments how helpful this post was for you, or whether you learned something new from it.