
Tech Tip Tuesday: What is a Robots.txt and do I need to implement it for my site?

by JonJon Yeung | 14.05.2013
All Aboard! Tech Tip Tuesday, Captain JonJon highlighting once again about technical SEO issues & FAQ’s we come across with you…

“What is a Robots.txt and do I need to implement it for my site?”

Last week we talked about redirects; this week we will be talking about utilising a Robots.txt file.

Benefits of a Robots.txt file and what it is used for…

Ideally, in the SEO world we are great cooks and most welcoming hosts, and we certainly love inviting search engine spiders to visit our site and feast on our pages. We would love them to crawl all over our pages and be impressed with what we have cooked up, so that they eventually give us as many thumbs up as possible and list us on their billboard, the search engine index, for our keywords.

Having a site crawled is a great opportunity to show off our site and impress these judges, but then again there might be a few places we don't want them visiting, and this is where a Robots.txt comes into play.

A Robots.txt file is like a list of rooms we would rather our guests did not enter: it houses the URLs that search engine spiders are told not to crawl. No link juice or equity will be passed on by a URL that is blocked using a Robots.txt.
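
To give a flavour of the file itself, a minimal Robots.txt sits at the root of the site (e.g. www.example.com/robots.txt) and might read something like this, where the blocked paths are purely placeholder examples:

    # A minimal, hypothetical robots.txt
    User-agent: *        # applies to all search engine spiders
    Disallow: /admin/    # example: keep spiders out of an admin area
    Disallow: /tmp/      # example: keep spiders out of temporary pages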

It is not advised to use a Robots.txt for dealing with duplicate content, as you can always use a "rel canonical" tag instead. It is also worth mentioning that disallowing a URL will not prevent it from showing up in search results, which may lead to a "suppressed listing". Google's search engine spiders have no access to content blocked with "disallow", so they hold no information or snippet for that URL; if the URL is linked to from elsewhere and ends up being displayed on Google's SERPs (Search Engine Results Pages), that bare listing makes for a bad user experience.

Note: disallowing a URL will not prevent it from gaining link juice and equity, but blocking it will suppress it and prevent it from passing that valuable equity on. As mentioned above, it is recommended to use alternative methods for duplicate content issues.
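
By way of a quick sketch, the "rel canonical" alternative mentioned above is simply a tag placed in the <head> of the duplicate page, pointing search engines at the preferred URL (the address below is a placeholder):

    <link rel="canonical" href="https://www.example.com/preferred-page/" />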

A Robots.txt is a good way of letting search engine spiders know which parts of the site you want to exclude from the crawl, or, alternatively, of highlighting to them specifically where things such as your sitemap are located. Blocking URLs with a Robots.txt does not prevent those URLs from displaying on SERPs; it only prevents them from being crawled. It is also advisable to have a Robots.txt file in place rather than none at all.
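
To illustrate, a short Robots.txt can do both of those jobs at once, excluding a section from the crawl and pointing spiders at the sitemap (the domain and path below are placeholders):

    # Hypothetical robots.txt combining a sitemap pointer with an exclusion
    Sitemap: https://www.example.com/sitemap.xml
    User-agent: *
    Disallow: /internal-search/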

 

Ahoy, fellow passengers! Follow us and stay tuned next week for more information on alternative methods… and every Tuesday for more technical issues reviewed by Tug.

Why not like us on Facebook too?