December 28th, 2016
Search Engine Optimization
There’s a reason companies seek out experts in search engine optimization (SEO) to help market their websites — the field is extremely technical, and requires a lot of knowledge in both the backend of a website, and the functionality of a search engine. As users, we often take for granted the ability to search a phrase and come up with myriad relevant results; it’s easy to forget there are a lot of pieces working together behind the scenes to enable us to successfully search the web.
When you type a phrase into a search engine like Google, do you ever wonder how the search engine actually finds the websites you see in the results pages? While keywords are one of the most important factors, the setup of each website is also vital to ensuring the search engine can find them. This is one thing SEO experts specialize in — optimizing your company website so that search engines will see it, and add it to the results page when a user searches for a related topic.
One of the ways SEO specialists communicate with search engines is through the Robots Exclusion Protocol, which is implemented as a plain text file named robots.txt. This file tells search engine crawlers which parts of a site they may and may not crawl. Today, we’re going to walk through the basics of the robots.txt file, and provide steps on how to add one to your own site.
Search engines like Google rely on programs called web robots (a.k.a. spiders, crawlers, or wanderers) to explore and scan page after page of every website. These spiders start on one page of your website, and follow its links until they’ve covered as much of the site as they can. Before spiders start this process, they go to your site’s robots.txt file, which lists the URL paths each crawler is allowed (or not allowed) to visit, and therefore which pages the search engine can scan for inclusion in its results. In other words, including a robots.txt file in your website is like giving instructions that tell the search engine where it can and cannot look.
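To make the check concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python’s standard urllib.robotparser module. The file contents, the crawler name “ExampleBot,” and the URLs are assumptions made up for illustration; real search engine crawlers use their own implementations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site (an assumption,
# not taken from any real website).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a made-up crawler name used only for this sketch.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://www.mywebsite.com/products/"))    # True
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://www.mywebsite.com/admin/login"))  # False
```

The “User-agent: *” line applies the rules to every crawler, and each “Disallow” line names a path prefix that crawlers are asked to skip.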
In today’s competitive environment for search engine rankings, a robots.txt file helps your website in several ways. If you have multiple repeats of the same page within your website — whether on purpose, or unintentionally — these duplicate pages can hurt SEO rankings. In your robots.txt file, you can include an instruction to exclude one of the two duplicates, and voila; the search engine will only scan and “see” one.
Keep in mind that search engines like Google have what’s called a “crawl budget,” meaning spiders will only crawl so many pages of your site. Fortunately, a robots.txt file lets you exclude pages so that this budget is spent where it matters. This is great if your domain is massive, and your website has tons of subpages, as it allows you to control which pages crawlers spend their time on. For instance, if you’re working under a limited crawl budget, you may choose to allow your five main product pages to be crawled by search engines, but leave out your admin backend page, which will likely be less relevant to consumers searching.
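As a rough illustration of the two cases above (a duplicate page and an admin backend), the sketch below builds the text of a small robots.txt file from a list of paths to exclude. The helper name build_robots_txt and both paths are hypothetical; substitute the paths that apply to your own site.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Build robots.txt text that asks user_agent to skip the given paths."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


# Hypothetical paths: an admin backend and a duplicate copy of a product page.
robots_txt = build_robots_txt(["/admin/", "/products/widget-copy.html"])
print(robots_txt)
# User-agent: *
# Disallow: /admin/
# Disallow: /products/widget-copy.html
```

Anything not listed under “Disallow” stays open to crawlers by default, so the main product pages need no entry of their own.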
Spiders look for the robots.txt file by simply adding “/robots.txt” to your site’s root directory, to see if you have one. For example, if your site’s root directory is “www.mywebsite.com,” the spiders will look at “www.mywebsite.com/robots.txt,” to see if you have a robots.txt file.
Therefore, you should place your site’s robots.txt file in the root directory of your site, which for most sites is the same directory that holds the main welcome page (index.html). If the file is not in this root directory, spiders will not see it, and they’ll go ahead crawling everything on your site without your specific instructions.
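The lookup is simple enough to sketch in a few lines. Assuming a crawler starts from any page URL on a site, it can work out where robots.txt should live by keeping the scheme and host and replacing the rest with “/robots.txt”. The function name robots_url and the example URL are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; drop the page path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.mywebsite.com/products/widget.html?ref=home"))
# https://www.mywebsite.com/robots.txt
```

Because the path is always the same, a file saved anywhere else (for example, in a subfolder) will never be found.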
To start writing your robots.txt file, you can create a plain text file manually with programs like Notepad or WordPad, or you can use one of many online generators available on sites like SEOBook or RobotsGenerator. If you use a word processor such as WordPad, be sure to save the file as plain text. The process for uploading the file to your website’s root directory will vary by web service software (though popular platforms such as WordPress have easily accessible instructions online). If you manually created or adjusted your robots.txt file, remember to validate the file before including it on your site. Free tools to help you validate your file include Robots.txt checker and the free robots.txt testing tool Google provides in Search Console.
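If you would like a quick sanity check before reaching for one of those tools, a very small linter along these lines can catch common typos. This is only a sketch under stated assumptions: the list of known field names is a commonly used subset, the function name lint_robots_txt is made up, and it is not a substitute for a full validator.

```python
# Field names commonly seen in robots.txt files (an assumed, non-exhaustive list).
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots_txt(text: str) -> list[str]:
    """Return a list of human-readable problems found in robots.txt text."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank lines and comment-only lines are fine
        field, sep, _value = line.partition(":")
        field = field.strip().lower()
        if not sep:
            problems.append(f"line {number}: missing ':' separator")
        elif field not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown field '{field}'")
        elif field == "user-agent":
            seen_user_agent = True
        elif field in {"allow", "disallow"} and not seen_user_agent:
            problems.append(f"line {number}: rule appears before any User-agent line")
    return problems


print(lint_robots_txt("User-agent: *\nDisallow: /admin/\n"))  # []
print(lint_robots_txt("Disalow: /admin/\n"))  # ["line 1: unknown field 'disalow'"]
```

An empty list means the sketch found nothing wrong; it does not guarantee that search engines will interpret the file the way you intend.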
Adding a robots.txt file to your site is one thing, but optimizing it to ensure search engines will get the most out of it can take a bit more effort. In order to maximize the benefits of your robots.txt file, keep in mind these tips and best practices:
Remember, there are many factors that contribute to high search engine rankings. While including a robots.txt file definitely helps, it’s only one piece of the puzzle. High-ranking websites need to combine their technical, backend know-how with effective, relevant, and authoritative content.
Google is well known for frequently adjusting its search engine algorithm, and for good reason. In an ideal world, people would be able to find the exact website they’re looking for in seconds. Someday, perhaps, that will be a reality. Until then, search engines rely on spiders and other programs to help ensure users find what they want. Robots.txt files are just one way you can make the process a little bit easier, by helping search engines quickly identify the important and unimportant pages of your site.
Author Bio: Seth Patel is a Marketing Executive at Main Path with more than 10 years of digital marketing experience. He’s worked with everything from SMBs to Fortune 500 companies in industries ranging from retail to hospitality to e-commerce and even a Presidential campaign. When he’s not breaking down data sets, you can find him at one of San Diego’s famous local craft breweries or hiking one of the many scenic trails of Southern California.