Despite all of their efforts to build a great site with great content, many site owners don’t realize that they may have problems that are keeping their site from getting crawled. Google can overlook many things, but not these BIG THREE:
1. Robots.txt issues
Every site should have a robots.txt file. It’s a very powerful file. Check to see if you have one. If you do, it will be located at the root of your site, at [insertyoursite.com/robots.txt]. For small sites, you may not have to worry about it that much. You can basically set it and forget it. But for larger sites, it pays to watch this file continually to ensure that it stays in line with your site’s goals.
What’s the purpose of the robots.txt file? Ultimately, the goal of this file is to keep various robots away from certain folders and parts of the site.
Why might this be a problem for you? Let’s say your website developer specified in the robots.txt file that Google was not allowed to crawl the site. This is logical while the site is on a dev server. But if the site then gets moved to your live server with that same robots.txt file still telling Google not to crawl it, Google will stay away from the live site too.
You can easily monitor this by checking your robots.txt file. Or, you can log in to Google Webmaster Tools and open the Blocked URLs feature in the Crawl section. From there, you can see whether you have a problem or not.
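If you would rather check this with a script, here is a minimal sketch using Python’s standard urllib.robotparser module. The two robots.txt samples, the example.com URLs, and the function name can_googlebot_crawl are assumptions made up for illustration; paste in the contents of your own file instead.

```python
from urllib.robotparser import RobotFileParser


def can_googlebot_crawl(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Hypothetical robots.txt left over from a dev server: blocks every robot.
DEV_ROBOTS = """User-agent: *
Disallow: /
"""

# Hypothetical robots.txt for a live site: only one folder is off limits.
LIVE_ROBOTS = """User-agent: *
Disallow: /admin/
"""

if __name__ == "__main__":
    for label, text in (("dev", DEV_ROBOTS), ("live", LIVE_ROBOTS)):
        allowed = can_googlebot_crawl(text, "https://example.com/")
        print(label, "robots.txt lets Googlebot crawl the home page:", allowed)
```

Python’s parser applies its own matching rules, which may differ from Googlebot’s in edge cases, so treat the result as a quick sanity check rather than the final word.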
2. Noindex tags
Noindex tags can be very helpful if there are some parts of your site that you do not want to show up in Google’s search results. This tag looks like this:
<meta name="robots" content="noindex,follow" />
If you see this in the source code of a page you want people to find, you have a problem that is likely keeping that page out of Google. The “noindex” portion of this tag is telling Google not to index this page. Thus, even if Google crawls the page, it will not appear in the search results.
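Rather than reading the source by eye, you could scan a page for this tag. Below is a rough sketch that uses Python’s built-in html.parser; the class name RobotsMetaFinder and the function has_noindex are made up for this example. It reads the directives from the tag’s content attribute, which is where the robots meta tag carries them.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # "noindex,follow" becomes ["noindex", "follow"]
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.append(directive)


def has_noindex(html: str) -> bool:
    """Return True if the page source carries a robots noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,follow" /></head></html>'
    print("Page is marked noindex:", has_noindex(page))
```

This only looks at the HTML itself. A noindex can also be sent in an HTTP response header, which this sketch does not check.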
3. Wrong canonical
Duplicate content is a no-no in Google. Duplicate content exists when different pages (different URLs) have the same content. This might happen for any number of logical reasons. Because of this, Google supports the canonical tag.
A canonical tag is used to show Google which page you would like to display in the SERPs.
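But beware: using the wrong canonical could hurt your rankings. If the tag points to the wrong page, you are asking Google to show that page instead of the one you actually want in the results.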
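One way to catch a wrong canonical is to pull the tag out of the page and compare it with the URL you expect. The sketch below assumes the canonical is declared as a link tag with rel="canonical" in the page source; the names CanonicalFinder, find_canonical, and canonical_matches, and the example.com URLs, are invented for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the page, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def canonical_matches(html: str, expected_url: str) -> bool:
    """Return True if the page declares expected_url as its canonical."""
    return find_canonical(html) == expected_url


if __name__ == "__main__":
    page = '<head><link rel="canonical" href="https://example.com/shoes" /></head>'
    print("Declared canonical:", find_canonical(page))
    print("Matches expected URL:", canonical_matches(page, "https://example.com/shoes"))
```

The comparison here is an exact string match, so a trailing slash or an http versus https difference counts as a mismatch. That is deliberate for a first pass, since those small differences are often how a wrong canonical slips in.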