Crawl errors occur when a search engine fails to reach a page on your website. Crawling is the process by which a search engine bot attempts to open every page of a website, following links to discover all of its public pages so they can be indexed. If the bot can't reach every page, a crawl error has occurred.
Even where your goal is a direct 301 redirect, the final destination of every URL should always return a 200 OK server response.
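To illustrate what these status codes mean to a crawler, here is a minimal sketch of a classifier; this is a hypothetical helper for the codes discussed in this article, not any search engine's actual logic:

```python
def classify_status(code: int) -> str:
    """Map an HTTP status code to the crawl outcome discussed in this article."""
    if code == 200:
        return "ok"            # page reachable and indexable
    if code in (301, 302):
        return "redirect"      # crawler follows on to the destination URL
    if code == 404:
        return "not found"     # URL error: page missing
    if code == 410:
        return "gone"          # page removed permanently
    if 500 <= code < 600:
        return "server error"  # site error: server failed to respond
    return "other"

print(classify_status(200))  # -> ok
print(classify_status(301))  # -> redirect
print(classify_status(503))  # -> server error
```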
- Site Errors: These affect your whole website, so the search engine can't crawl it at all.
- URL Errors: These affect specific pages only, and are therefore easier to isolate and fix.
Site Errors and Their Fixes:
Site errors are a group of errors that prevent the search engine bot from reaching your website. The common causes are:
DNS Errors:
A DNS error means the search engine bot was denied a connection with your server, which usually indicates that the server is temporarily down and the website can't be accessed.
This is often a temporary issue, and the search engine will attempt to crawl your website again later. If you do see this error reported, it generally means the search engine has retried more than a couple of times without getting through.
DNS Errors Fix:
Firstly, use the Fetch as Google tool to understand how Google crawls your website.
Start by fetching without rendering; the slower Fetch and Render option is useful for comparing how Google views your website with what a user sees.
Next, check your DNS provider. If the first step doesn't resolve the problem, contact your DNS provider to locate the issue.
Also make sure your server returns a proper 404 or 500 error code for failing pages, rather than a DNS failure.
Robots Failure:
Before crawling, a search engine bot first fetches the robots.txt file to check whether there are pages on your website that you don't want crawled. If the bot can't access the robots.txt file, the search engine will postpone the crawl.
Hence, ensure that the file is always available.
Robot Failure Fix:
Make sure the robots.txt file is configured properly, and double-check the pages you don't want to be crawled.
Use a server header check tool to confirm that the file returns either a 200 or a 404 status code.
If you don't need to block anything, the best option is to have no robots.txt file at all; Google will then crawl the whole website automatically.
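To see how a crawler interprets a robots.txt file, here is a short sketch using Python's standard `urllib.robotparser`; the rules and URLs below are made-up examples:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice a bot fetches this
# from https://example.com/robots.txt before crawling.
robots_txt = """\
User-agent: *
Disallow: /drafts/
Allow: /
"""

parser = RobotFileParser()
parser.modified()  # mark the rules as loaded so can_fetch() works
parser.parse(robots_txt.splitlines())

# A well-behaved bot checks every URL against the rules before crawling it.
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))   # True
print(parser.can_fetch("Googlebot", "https://example.com/drafts/wip"))  # False
```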
Server Errors:
A server error means that when the bot tried to reach your website, the connection request timed out: the server took too long to respond and so returned an error message.
Server errors can also occur when flaws in your code stop a page from loading, or when traffic is so high that the server can't handle the request.
Server Errors Fix:
Use the same Fetch as Google tool to check the status. If the tool can fetch the homepage without errors, the search engine is generally able to access your site properly.
Before you attempt a fix, diagnose exactly which type of server error is being reported, such as a timeout, connection reset, connect failed, or truncated headers or response, and address that specific cause.
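Transient server errors such as timeouts and connection resets are commonly handled with retries and an increasing backoff delay. Below is a rough sketch of that pattern; `fetch` here is a stand-in callable for a real HTTP request, not an actual crawler API:

```python
import time

def fetch_with_retries(fetch, url, max_attempts=3, base_delay=1.0):
    """Retry a fetch on 5xx responses, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        status = fetch(url)
        if status < 500:                   # success or client error: stop retrying
            return status
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return status                          # still failing after all attempts

# Simulated server that fails twice with 503, then recovers.
responses = iter([503, 503, 200])
print(fetch_with_retries(lambda url: next(responses),
                         "https://example.com/", base_delay=0))  # -> 200
```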
URL Errors and Their Types:
URL errors are more specific: the crawler failed to reach a particular page on your website. Common examples are a 404 Not Found error, or a 410 Gone error for a page that has been removed permanently.
It's worth understanding these common errors so you can either fix the page or perform a 301 redirect. Many URL errors are caused by broken internal links, and so are mostly the site owner's fault.
- Mobile-specific URL errors: These are URL errors encountered when crawling for modern smartphones. They are unlikely to occur if you have a responsive website, but if you maintain a separate mobile subdomain there is a real possibility of errors, such as desktop-to-mobile redirects being blocked by your robots.txt file.
- Malware errors: If the search engine displays this error, it has found traces of malicious software on the URL in question. Investigate the page thoroughly and remove the malware from the URL as soon as possible.
- Google News errors: These occur when your website is also surfaced on Google News and your pages don't match its documentation, for example a missing title or a page classified as containing no news article. Check the Google News guidelines and adjust your pages accordingly.
Fix for URL errors:
Soft 404 and 410 errors are common. To fix them, ensure that a genuinely missing page returns a 404 or 410 status code rather than 200, and perform a 301 redirect to the most relevant page on your website where one exists.
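A "soft 404" is a missing page that wrongly returns 200 OK. The sketch below shows one crude heuristic for spotting them; the word-count threshold and phrases are invented for illustration, not Google's actual detection rules:

```python
def looks_like_soft_404(status: int, body_text: str) -> bool:
    """Flag pages that return 200 OK but look like an error page."""
    if status != 200:
        return False  # a real 404/410 status is the correct behaviour
    text = body_text.lower()
    error_phrases = ("page not found", "404", "no longer available")
    too_thin = len(text.split()) < 20  # arbitrary thin-content threshold
    return too_thin or any(p in text for p in error_phrases)

print(looks_like_soft_404(200, "Sorry, page not found."))  # True (soft 404)
print(looks_like_soft_404(404, "Sorry, page not found."))  # False (correct 404)
```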
For live pages, ensure the content is substantial rather than thin. If you see a 404 error for a page that should exist, confirm that it was actually published through your content management system and is not sitting in drafts or the deleted bin.
Check for page variations across domains, such as the www vs non-www and http vs https versions of the site. If you no longer want a page but want its URL to keep working, perform a 301 redirect.
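On an Apache server, for example, a 301 redirect and a www/https canonicalisation are often configured in .htaccess along these lines; the paths and domain are placeholders, and the syntax differs on nginx and other servers:

```apache
# Permanently redirect a removed page to its replacement
Redirect 301 /old-page/ https://www.example.com/new-page/

# Force https and the www variant (requires mod_rewrite)
RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```

The `R=301` flag makes the redirect permanent, so search engines transfer the old URL's ranking signals to the new one.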
The concluding point is that if you find any crawl errors, fix them as soon as possible.
Keep to a weekly maintenance schedule, and check for any new additions or deletions of pages and features.