Google kicked off 2022 by announcing a new robots tag, indexifembedded, on January 21, 2022. The new tag gives bloggers and publishers more control: it lets them decide whether content embedded in other pages (for example, through an iframe) can be indexed, even when the content's own URL is set to noindex.
If you are hearing about the meta robots tag for the first time, or often get confused between robots.txt, the meta robots tag and the X-Robots-Tag, then this blog is for you!
In this article, we explain what robots.txt, the meta robots tag and the X-Robots-Tag are, and when you should use each of them. We will also dive deep into Google's new robots tag update.
What is robots.txt?
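Robots.txt is a simple text file, placed in the root directory of your web server, that implements the Robots Exclusion Protocol (also known as the Robots Exclusion Standard). It tells search engine crawlers which pages of your website they may crawl and which they may not. A robots.txt file is built from the following directives: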
User-agent: [The search engine crawler (robot) the rules apply to; * means all crawlers]
Allow: [A URL path that the crawler may crawl]
Disallow: [A URL path that the crawler must not crawl]
Crawl-delay: [How many seconds a crawler should wait between requests; Google ignores this directive]
Sitemap: [The absolute URL of the website's XML sitemap]
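To see how a crawler reads these directives, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the paths and the sitemap URL are hypothetical examples, and the file is parsed from a string, so nothing is downloaded.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that uses each directive described above.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/press-kit.html
Disallow: /private/
Crawl-delay: 5
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("*", "https://www.example.com/blog/"))                   # True: no rule matches
print(parser.can_fetch("*", "https://www.example.com/private/report.html"))     # False: Disallow /private/
print(parser.can_fetch("*", "https://www.example.com/private/press-kit.html"))  # True: Allow rule
print(parser.crawl_delay("*"))  # 5
print(parser.site_maps())       # ['https://www.example.com/sitemap.xml']
```

Python's parser applies the first rule that matches a URL, whereas Google applies the most specific (longest) matching rule. The Allow line is placed before the broader Disallow line here so that both interpretations give the same answer.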
Note: A website's robots.txt file is public, so anyone on the Internet can read it.
To view a website's robots.txt file, type the site's address followed by /robots.txt, for example:
https://www.example.com/robots.txt
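The same rule can be written as a small helper, for example in a script that checks the robots.txt files of several sites. This is a minimal sketch; robots_txt_url is a hypothetical name, and it assumes the input is an absolute URL that includes a scheme such as https://.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    # robots.txt always sits at the root of the host, so the page's path,
    # query string and fragment are dropped; scheme, host and port are kept.
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_txt_url("https://www.example.com/blog/new-robots-tag/?ref=home"))
# https://www.example.com/robots.txt
```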
How does the robots.txt file work?
When a search engine crawler (user-agent) visits a website, it first looks for the robots.txt file at the root of the site. If the file exists, the crawler follows its rules and crawls only the pages it is allowed to. If there is no robots.txt file, the crawler assumes it may crawl the whole site.
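The check-robots.txt-first behavior can be sketched in Python as follows. plan_crawl is a hypothetical helper, and the sample file and URLs are made up. The example also shows that a crawler uses the group whose User-agent line matches its own name and falls back to the * group otherwise.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one group for Googlebot and one for everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /
"""

DISCOVERED_URLS = [
    "https://www.example.com/",
    "https://www.example.com/drafts/post-1",
    "https://www.example.com/about",
]


def plan_crawl(robots_txt: str, user_agent: str, urls: List[str]) -> List[str]:
    """Return the URLs a well-behaved crawler may fetch, in their original order."""
    # Step 1: read robots.txt before requesting any other page on the site.
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    # Step 2: keep only the URLs the rules allow for this user-agent.
    return [url for url in urls if rules.can_fetch(user_agent, url)]


print(plan_crawl(SAMPLE_ROBOTS_TXT, "Googlebot", DISCOVERED_URLS))
# ['https://www.example.com/', 'https://www.example.com/about']
print(plan_crawl(SAMPLE_ROBOTS_TXT, "ExampleBot", DISCOVERED_URLS))
# []
```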
When to use robots.txt?
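Use robots.txt when you want to stop search engine crawlers from crawling particular pages of your website. For example, an eCommerce website usually blocks pages such as "Your Cart" or "Make the Payment", which is one reason you will not find them in search results. Keep in mind that robots.txt controls crawling, not indexing: a disallowed URL can still appear in search results if other pages link to it. To reliably keep a page out of search results, use a noindex directive instead (see the meta robots tag below).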
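As an illustration, the sketch below generates the kind of robots.txt an online store might publish. build_robots_txt is a hypothetical helper, and /cart/ and /payment/ are assumed paths; a real store should use its own URL structure. The generated file only blocks crawling, so pages that must stay out of search results still need noindex.

```python
from typing import List, Optional


def build_robots_txt(disallowed_paths: List[str], sitemap_url: Optional[str] = None) -> str:
    """Build robots.txt text that blocks the given paths for every crawler."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    if sitemap_url:
        # The Sitemap directive takes an absolute URL.
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


print(build_robots_txt(["/cart/", "/payment/"], "https://www.example.com/sitemap.xml"), end="")
# User-agent: *
# Disallow: /cart/
# Disallow: /payment/
# Sitemap: https://www.example.com/sitemap.xml
```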
Points to Remember –
- The robots.txt file name is case sensitive. The file must be named robots.txt, not ROBOTS.TXT, ROBOTS.txt, Robots.TXT, etc.
- Some user-agents ignore your robots.txt file. These are usually malicious robots, such as email address scrapers.
- It is considered good practice to add the absolute URL of your sitemap at the bottom of the robots.txt file, for example Sitemap: https://www.example.com/sitemap.xml.
What is the Meta Robots Tag?
A meta robots tag is an HTML tag placed in the <head> section of a web page. It tells search engine crawlers whether to index the page and whether to follow the links on it. Its syntax is:
<meta name="robots" content="[PARAMETER]" />
Noindex: [Tells search engines not to index the page]
Index: [Tells search engines to index the page; this is the default]
Follow: [Even if the page isn't indexed, the crawler should follow all the links on the page and pass equity to the linked pages]
Nofollow: [Tells crawlers not to follow any links on the page or pass along any link equity]
Noimageindex: [Tells search engines not to index any images on the page]
None: [Equivalent to using both noindex and nofollow]
Noarchive: [Tells search engines not to show a cached link to this page on a SERP]
Nocache: [Same as noarchive, but only used by Internet Explorer and Firefox]
Nosnippet: [Tells search engines not to show a snippet (e.g. text from the meta description) of this page on a SERP]
Unavailable_after: [Tells search engines to stop showing this page in search results after a particular date]
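Because the tag lives in the page's HTML, a crawler has to download and parse the page to read it. Below is a minimal sketch using Python's built-in html.parser module; MetaRobotsParser is a hypothetical name, the sample page is made up, and the sketch only looks at meta tags named robots or googlebot.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect (name, content) pairs from <meta name="robots"> and <meta name="googlebot">."""

    WATCHED_NAMES = ("robots", "googlebot")

    def __init__(self):
        super().__init__()
        self.tags = []  # (name, content) pairs in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case;
        # self-closing tags like <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        if name in self.WATCHED_NAMES:
            self.tags.append((name, attributes.get("content") or ""))


PAGE = """<html><head>
<title>Thank you for your order</title>
<meta name="robots" content="noindex,follow" />
<meta name="Googlebot" content="nosnippet">
</head><body><p>...</p></body></html>"""

parser = MetaRobotsParser()
parser.feed(PAGE)
print(parser.tags)  # [('robots', 'noindex,follow'), ('googlebot', 'nosnippet')]
```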
How does the Meta Robots Tag work?
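When a crawler fetches a page, it reads the meta robots tag in the page's <head> and follows its directives; for example, it leaves the page out of its index when it finds noindex. While robots.txt controls crawling, the meta robots tag controls indexing. The two do not stack as an extra protective layer: if a URL is disallowed in robots.txt, the crawler never fetches the page, so it never sees the noindex tag, and the URL can still be indexed if other pages link to it. To keep such a page out of search results, allow it to be crawled and rely on noindex. Also, just like robots.txt, meta robots tags are only honored by well-behaved crawlers; malicious user-agents can ignore them.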
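The interaction between robots.txt and the meta robots tag can be summarized as a small decision function. This is a simplified, hypothetical model for illustration (crawler_outcome is not a real API), not the actual logic of any search engine.

```python
from typing import Set


def crawler_outcome(allowed_by_robots_txt: bool, meta_directives: Set[str]) -> str:
    """Simplified model of how a well-behaved crawler combines robots.txt and meta robots."""
    if not allowed_by_robots_txt:
        # The page is never fetched, so its meta robots tag is never read.
        # The bare URL may still be indexed if other pages link to it.
        return "not crawled: meta robots tag never read"
    if "noindex" in meta_directives or "none" in meta_directives:
        return "crawled: kept out of the index"
    return "crawled: eligible for indexing"


print(crawler_outcome(False, {"noindex"}))  # not crawled: meta robots tag never read
print(crawler_outcome(True, {"noindex"}))   # crawled: kept out of the index
```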
When to use Meta Robots Tags?
Use meta robots tags when you want to control, page by page, whether an HTML page is indexed and how search engines treat its links, images and snippets.
Points to Remember –
- Meta robots tag directives are not case sensitive. This means <meta name="robots" content="noindex,follow" /> is the same as <meta name="robots" content="NOINDEX,FOLLOW" />.
- Multiple meta robots tag directives are separated by commas. For example, <meta name="robots" content="noindex,follow" /> combines noindex and follow. Google also accepts a space after the comma, as in "noindex, follow".
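A crawler can normalize the content value along these lines. parse_robots_content is a hypothetical helper; because it lowercases and trims each directive, it does not preserve the exact text of values such as an unavailable_after date, and it would split a date that itself contains a comma.

```python
from typing import List


def parse_robots_content(content: str) -> List[str]:
    """Split a meta robots content value into normalized directives."""
    # Directives are case-insensitive and comma-separated, so each part is
    # trimmed and lower-cased; empty parts (e.g. from a trailing comma) are dropped.
    return [part.strip().lower() for part in content.split(",") if part.strip()]


print(parse_robots_content("noindex,follow"))   # ['noindex', 'follow']
print(parse_robots_content("NOINDEX, FOLLOW"))  # ['noindex', 'follow']
```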
What is the X-Robots-Tag?
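The X-Robots-Tag is an HTTP response header that tells search engine crawlers how to index a URL. Because it is sent with the HTTP response rather than written into the HTML, it can control indexing for non-HTML files such as PDFs, images and videos, and it can be applied to many URLs at once, up to a whole website.
Any directive that can be used in the meta robots tag can also be used in the X-Robots-Tag. The header is added on the server side, for example in the web server configuration (such as an Apache .htaccess file or an nginx configuration) or by the site's own code, such as a PHP header() call made before any page output.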
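The X-Robots-Tag is usually configured in the web server, but the idea is easy to show in application code. The sketch below is a Python WSGI middleware that adds the header to PDF and MP4 responses. The extension rules, the function names and the placeholder site are all assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical rules: file extension -> X-Robots-Tag value to send.
X_ROBOTS_RULES: Dict[str, str] = {
    ".pdf": "noindex",
    ".mp4": "noindex, nosnippet",
}


def with_x_robots_tag(app: Callable) -> Callable:
    """WSGI middleware that adds an X-Robots-Tag header to matching file types."""

    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "/").lower()
        value: Optional[str] = next(
            (tag for ext, tag in X_ROBOTS_RULES.items() if path.endswith(ext)), None
        )

        def start_response_with_tag(status, headers, exc_info=None):
            # Non-HTML files cannot carry a <meta> tag, so the directive goes in the header.
            if value is not None:
                headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)

        return app(environ, start_response_with_tag)

    return middleware


def placeholder_site(environ, start_response):
    """Stand-in for the real site: every path returns a small text body."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"placeholder\n"]


site = with_x_robots_tag(placeholder_site)


def headers_for(path: str) -> List[Tuple[str, str]]:
    """Call the wrapped site with a minimal fake request and return its headers."""
    captured: List[Tuple[str, str]] = []
    site({"PATH_INFO": path}, lambda status, headers, exc_info=None: captured.extend(headers))
    return captured


print(headers_for("/files/brochure.pdf"))
# [('Content-Type', 'text/plain'), ('X-Robots-Tag', 'noindex')]
```

To try it locally, the site object could be passed to wsgiref.simple_server.make_server("127.0.0.1", 8000, site) and served from there.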
When to use X Robots Tags?
Use the X-Robots-Tag when you want to control indexing at the website level (for example, for a whole directory or file type at once) or for non-HTML files like PDFs, videos or images, which cannot contain a meta robots tag.
We have now explained what robots.txt, the meta robots tag and the X-Robots-Tag are and when to use each of them. Next, let's look at Google's new robots tag update.
New Meta Robots Tag by Google
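Google has released a new robots tag, indexifembedded. It gives media publishers more control over content that is embedded in other pages: when a page marked noindex is embedded in another page through an iframe or a similar HTML tag, indexifembedded lets Google index that content as part of the page that embeds it, while the standalone URL stays out of search results. The tag only has an effect when it is combined with noindex.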
To use it, add indexifembedded together with noindex:
<meta name="googlebot" content="noindex" />
<meta name="googlebot" content="indexifembedded" />
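How Google combines the two directives can be sketched as a simple rule. google_can_index is a hypothetical function, and the model is deliberately simplified: it only covers noindex and indexifembedded for Googlebot.

```python
from typing import Set


def google_can_index(directives: Set[str], embedded_in_another_page: bool) -> bool:
    """Simplified model: may Googlebot index this content in the given context?

    directives: lower-case googlebot/robots directives found for the embedded URL.
    embedded_in_another_page: True when the content is loaded into another page,
    for example through an iframe.
    """
    if "noindex" not in directives:
        # No noindex: this model treats the content as indexable in both contexts.
        return True
    # indexifembedded only has an effect together with noindex, and only when
    # the content is embedded in another page.
    return embedded_in_another_page and "indexifembedded" in directives


tags = {"noindex", "indexifembedded"}
print(google_can_index(tags, embedded_in_another_page=False))  # False: standalone URL stays out
print(google_can_index(tags, embedded_in_another_page=True))   # True: indexed as part of the host page
```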
You can also set these directives in the X-Robots-Tag HTTP header:
X-Robots-Tag: googlebot:noindex, indexifembedded
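When the directives arrive in an HTTP header, a crawler first has to separate the optional user-agent prefix (googlebot: above) from the directive list. Below is a minimal sketch in Python; parse_x_robots_tag is a hypothetical helper, and apart from unavailable_after it treats any text before the first colon as a user-agent name.

```python
from typing import Dict, List, Optional

# Directives that carry a value after a colon, so "name:" is not a user-agent prefix.
# Only unavailable_after is handled in this sketch.
DIRECTIVES_WITH_VALUES = {"unavailable_after"}


def parse_x_robots_tag(header_values: List[str]) -> Dict[Optional[str], List[str]]:
    """Group X-Robots-Tag directives by user agent (None means all crawlers)."""
    rules: Dict[Optional[str], List[str]] = {}
    for value in header_values:  # a response may contain several X-Robots-Tag headers
        agent: Optional[str] = None
        head, sep, rest = value.partition(":")
        if sep and head.strip().lower() not in DIRECTIVES_WITH_VALUES:
            # "googlebot:noindex" -> the directives apply only to googlebot
            agent, value = head.strip().lower(), rest
        directives = [d.strip().lower() for d in value.split(",") if d.strip()]
        rules.setdefault(agent, []).extend(directives)
    return rules


print(parse_x_robots_tag(["googlebot:noindex, indexifembedded"]))
# {'googlebot': ['noindex', 'indexifembedded']}
```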
Currently, Google is the only search engine that supports this tag. This means indexifembedded only affects how your embedded content is indexed in Google Search; other search engines ignore it.
We hope this post about Google's new robots tag was useful to you.
For more informational content, please stay connected with us.