Abstract

Robots.txt and sitemap files are the main methods by which a web site regulates search engine crawler access to its content. This article explains the importance of these files and analyzes the robots.txt and sitemap files of more than 4,000 web sites belonging to the Spanish public administration to determine [...]