Web indexing

Search engines crawl websites to discover new and updated content on the web, such as new sites or pages, changes to existing sites, and dead links. After a search engine processes the pages it crawls, it compiles a massive index of all the words it has seen and their location on each page.
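The idea of an index of words and their locations can be illustrated with a minimal sketch. This is a toy inverted index, not how any particular search engine is implemented; the function name `build_index`, the sample URLs, and the page texts are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to a list of (url, position) pairs.

    `pages` is a dict of {url: page_text}. Position is the word's
    zero-based offset within that page's text.
    """
    index = defaultdict(list)
    for url, text in pages.items():
        # Lowercase the text and split it into word tokens.
        words = re.findall(r"\w+", text.lower())
        for position, word in enumerate(words):
            index[word].append((url, position))
    return dict(index)


# Hypothetical crawled pages (invented sample data).
pages = {
    "https://example.com/a": "Fresh coffee beans",
    "https://example.com/b": "Coffee brewing guide",
}

index = build_index(pages)
print(index["coffee"])
```

Looking up a word returns every page and position where it occurs, which is the basic lookup a search query relies on.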

Website owners use the robots.txt file with Disallow, Allow, User-agent, Crawl-delay and other rules to give instructions about their site to web robots (for example, to keep crawlers from visiting certain pages). Furthermore, it is possible to keep a page out of search results by adding a robots meta tag with the noindex directive; the nofollow directive tells crawlers not to follow the links on that page.
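As a rough illustration of how these rules are read, the sketch below feeds a small robots.txt to Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` user agent, and the example.com URLs are hypothetical. Note that this parser applies the first matching rule, while some search engines use the most specific match, so results for overlapping Allow and Disallow rules may differ between crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/public-page.html
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up crawler name; it falls under "User-agent: *".
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/report.html")
allowed = parser.can_fetch("ExampleBot", "https://example.com/private/public-page.html")
open_page = parser.can_fetch("ExampleBot", "https://example.com/blog/post.html")
delay = parser.crawl_delay("ExampleBot")

print(blocked, allowed, open_page, delay)
```

In a real check you would point `RobotFileParser.set_url()` at the site's robots.txt and call `read()` instead of parsing a string.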

The Indexing report checks the parameters that can affect web indexing: server response codes, restrictive directives in robots.txt, the nofollow and noindex meta tags, the link rel canonical and alternate tags that point to the preferred and alternate versions of a page, and the meta refresh redirect tag.
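The on-page part of such a check can be sketched with Python's standard `html.parser` module. This is only an assumption about how the tags might be collected, not the tool's actual implementation; the class name `IndexingSignals`, the helper `extract_signals`, and the sample HTML are hypothetical. Server response codes and robots.txt are left out because they require a network request.

```python
from html.parser import HTMLParser


class IndexingSignals(HTMLParser):
    """Collect the tags in a page that can affect indexing."""

    def __init__(self):
        super().__init__()
        self.robots = set()      # directives such as "noindex", "nofollow"
        self.canonical = None    # href of <link rel="canonical">
        self.alternates = []     # hrefs of <link rel="alternate">
        self.refresh = None      # content of <meta http-equiv="refresh">

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; values may be None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta":
            content = a.get("content", "")
            if a.get("name", "").lower() == "robots":
                for directive in content.split(","):
                    if directive.strip():
                        self.robots.add(directive.strip().lower())
            elif a.get("http-equiv", "").lower() == "refresh":
                self.refresh = content
        elif tag == "link":
            rels = a.get("rel", "").lower().split()
            if "canonical" in rels:
                self.canonical = a.get("href", "")
            if "alternate" in rels:
                self.alternates.append(a.get("href", ""))


def extract_signals(html):
    parser = IndexingSignals()
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical page markup, invented for illustration.
SAMPLE_HTML = """
<html><head>
<meta name="robots" content="noindex, nofollow">
<link rel="canonical" href="https://example.com/page">
<link rel="alternate" hreflang="de" href="https://example.com/de/page">
<meta http-equiv="refresh" content="5; url=https://example.com/new">
</head><body></body></html>
"""

signals = extract_signals(SAMPLE_HTML)
print(sorted(signals.robots), signals.canonical, signals.alternates, signals.refresh)
```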

Example of an Indexing report generated by Website Audit Tool