About WordPress’ Robots.txt File



Setting up a robots.txt file for your website.

You may have heard of robots.txt, but what is it exactly and what does it do?

This file is something that web programs, called crawlers or bots, look at when they first visit your site. Google's crawler is probably the most significant example: Google uses bots to explore and understand websites around the world and to decide how to index them. When a bot reads a site's robots.txt, it finds directives about which parts of the site it may and may not crawl. Reputable crawlers like Google's follow these directives, but they are requests rather than enforced rules.
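Crawlers look for this file in one fixed place: the root of the host, as in your-domain.com/robots.txt. A given robots.txt applies only to that exact scheme, host, and port, so a subdomain such as blog.your-domain.com needs its own. The hypothetical robots_url() helper below is a minimal Python sketch of how a crawler derives that location from any page URL; the domains are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that covers page_url (same scheme, host, port)."""
    parts = urlsplit(page_url)
    # Keep the scheme and netloc (host plus optional port), replace the path
    # with /robots.txt, and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://your-domain.com/blog/hello-world/?p=1"))
# → https://your-domain.com/robots.txt
print(robots_url("http://blog.your-domain.com:8080/about/"))
# → http://blog.your-domain.com:8080/robots.txt
```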

One notable example of where this is used is the "Discourage search engines from indexing this site" checkbox under WordPress' Settings->Reading options. When it is checked, WordPress tells crawlers not to index the site. Older versions did this by adding a Disallow: / rule to the generated robots.txt; since WordPress 5.3, it adds a noindex robots meta tag to each page instead. The option is often used while a site is still in development and shouldn't be public yet. After the site launches, uncheck the box so search engines can index it.
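The classic way to ask every crawler to stay out of an entire site is a two-line robots.txt: User-agent: * followed by Disallow: /, which is what older WordPress versions generated for this checkbox. As a quick offline sketch, Python's standard urllib.robotparser module can confirm that such a file blocks every path for every user agent (example.com is a placeholder domain):

```python
from urllib.robotparser import RobotFileParser

# A blanket rule: every crawler ("*") is asked to skip every path ("/").
parser = RobotFileParser()
# parse() takes the file's lines directly, so nothing is downloaded.
parser.parse(["User-agent: *", "Disallow: /"])

print(parser.can_fetch("Googlebot", "https://example.com/"))        # → False
print(parser.can_fetch("Bingbot", "https://example.com/about/"))    # → False
```

Blocking crawling this way doesn't guarantee a page stays out of search results if other sites link to it, which is one reason a noindex tag is the more reliable signal.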

What else can it do when my site is live and indexed?

You can still designate specific directories or URLs that you don't want crawled. One setup that many webmasters have recommended is the following:

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-admin/customize.php
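To see how a crawler would read these rules, Python's standard urllib.robotparser module can parse them offline. This is a sketch for experimenting, not a model of Google's own parser: robotparser uses plain prefix matching without wildcard support, the paths below are made-up examples, and example.com is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# The rule set listed above, exactly as it would appear in robots.txt.
RECOMMENDED_RULES = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-admin/customize.php
"""

parser = RobotFileParser()
# parse() accepts a list of lines, so nothing is downloaded.
parser.parse(RECOMMENDED_RULES.splitlines())

# Only the path part of each URL matters for matching.
for path in [
    "/wp-admin/options.php",
    "/wp-content/themes/twentytwentyfour/style.css",
    "/wp-content/uploads/photo.jpg",
    "/2024/01/hello-world/",
    "/wp-admin-tips/",
]:
    allowed = parser.can_fetch("*", "https://example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

Because Disallow values are prefixes, Disallow: /wp-admin (no trailing slash) also blocks an unrelated URL such as the hypothetical /wp-admin-tips/. Writing Disallow: /wp-admin/ limits the rule to the directory itself.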

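This list covers several WordPress directories your site needs in order to work, but that outsiders and search engines have no reason to visit directly. (The last line is redundant, since Disallow: /wp-admin already covers everything under that path.) Keep in mind that robots.txt is not a security measure: the file is publicly readable and compliance is voluntary, so it won't stop anyone from requesting those URLs. Also note that Google advises against blocking the CSS and JavaScript files it needs to render your pages, and theme and plugin directories often contain such files.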

You can also point crawlers to your sitemap with a line like this (the Sitemap directive expects a full URL, including https://):

Sitemap: https://your-domain.com/sitemap.xml

Note that the exact URL of the sitemap depends on the plugins you're using. The sitemap the Yoast SEO plugin generates, for example, ends in "/sitemap_index.xml", and WordPress 5.5 and later generates its own at "/wp-sitemap.xml".
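Because the Sitemap value should be a fully qualified URL, a bare entry like your-domain.com/sitemap.xml is an easy mistake to miss. As a sketch, Python's urllib.robotparser (3.8 or later) collects Sitemap lines through its site_maps() method, and the hypothetical relative_sitemaps() helper below flags entries that lack a scheme or host. your-domain.com is a placeholder.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://your-domain.com/sitemap_index.xml
Sitemap: your-domain.com/sitemap.xml
"""


def relative_sitemaps(parser):
    """Return Sitemap entries that are not fully qualified (no scheme or host)."""
    problems = []
    # site_maps() returns None when the file has no Sitemap lines.
    for url in parser.site_maps() or []:
        parts = urlparse(url)
        if not (parts.scheme and parts.netloc):
            problems.append(url)
    return problems


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
print(parser.site_maps())
# → ['https://your-domain.com/sitemap_index.xml', 'your-domain.com/sitemap.xml']
print(relative_sitemaps(parser))
# → ['your-domain.com/sitemap.xml']
```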

How do you use robots.txt and where do you put it?

By default, WordPress generates robots.txt dynamically whenever a person or bot requests the URL (your-domain.com/robots.txt). It uses this generated version only as long as no physical robots.txt file is in place. In recent versions the generated file is short: it disallows /wp-admin/ while explicitly allowing /wp-admin/admin-ajax.php, and WordPress 5.5 and later also adds a Sitemap line pointing to its built-in sitemap.
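When rules overlap, as in WordPress' default file (which, in recent versions, disallows /wp-admin/ but allows /wp-admin/admin-ajax.php), the order of evaluation matters. Google and RFC 9309 apply the most specific rule, meaning the longest matching path, with Allow winning a tie. The hypothetical is_allowed() helper below is a minimal sketch of that precedence for plain path prefixes; it ignores the * and $ wildcards, and the rule list is an assumption about the default output rather than a copy of it. For contrast, it also shows that Python's urllib.robotparser simply applies the first matching rule in file order.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules, path):
    """Longest-match check in the spirit of RFC 9309.

    rules: list of (directive, prefix) pairs, directive "allow" or "disallow".
    The rule with the longest matching prefix wins; on a tie, Allow wins.
    Simplification: the * and $ wildcards are not supported.
    """
    best_len, best_allow = -1, True  # no matching rule means "allowed"
    for directive, prefix in rules:
        # Skip empty values and rules whose prefix doesn't match the path.
        if not prefix or not path.startswith(prefix):
            continue
        allow = directive == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_len, best_allow = len(prefix), allow
    return best_allow


# Assumed default rules, modeled on recent WordPress versions.
WP_DEFAULT = [("disallow", "/wp-admin/"), ("allow", "/wp-admin/admin-ajax.php")]

for path in ["/wp-admin/", "/wp-admin/admin-ajax.php", "/sample-page/"]:
    print(path, "allowed" if is_allowed(WP_DEFAULT, path) else "blocked")
# → /wp-admin/ blocked
# → /wp-admin/admin-ajax.php allowed
# → /sample-page/ allowed

# urllib.robotparser stops at the first matching rule, so the earlier
# Disallow line blocks admin-ajax.php despite the more specific Allow.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /wp-admin/", "Allow: /wp-admin/admin-ajax.php"])
print(rp.can_fetch("*", "https://example.com/wp-admin/admin-ajax.php"))  # → False
```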

If you want to customize your site's robots.txt, you'll have to create the text file yourself and upload it to your site's root directory via FTP (FileZilla is a good free FTP client). Once a physical robots.txt exists there, the web server serves it directly and WordPress' dynamically generated version is no longer used. You can create or edit the file with any plain text editor, such as Notepad on Windows.
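If you'd rather generate the file than type it by hand, a short Python script can write it for you. This is only a sketch: the rules in ROBOTS_LINES and the write_robots_txt() helper are hypothetical, and your-domain.com is a placeholder, so replace them with your own before uploading.

```python
from pathlib import Path

# Hypothetical rules; edit them to suit your site.
# "your-domain.com" is a placeholder for your real domain.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /wp-admin/",
    "Allow: /wp-admin/admin-ajax.php",
    "",
    "Sitemap: https://your-domain.com/sitemap_index.xml",
]


def write_robots_txt(directory, lines):
    """Write lines to <directory>/robots.txt as UTF-8 plain text; return the path."""
    path = Path(directory) / "robots.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Calling write_robots_txt(".", ROBOTS_LINES) creates robots.txt in the current folder; upload it to the directory that serves your site so it is reachable at your-domain.com/robots.txt.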

If this was helpful to you, leave us a comment and share how you've set up robots.txt on your website!

