I employ a bad-robots trap on my site. If they disregard the contents
of robots.txt, then they are likely to fall into that trap. All robots
are disallowed from the folder where the robot trap is. Today, another
GoogleBot hit the trap!
Quite apart from the fact that the robot is then redirected to the
trap page - the same page for every file requested - and all the SEO
penalties that potentially brings, there is a fundamental question
here. It's one of trust.
If there is one robot that most webmasters trust, it's GoogleBot. Now
that trust is being seriously called into question. The fact that any
of its bots disregards the directives in robots.txt undermines the
whole point of having that file there in the first place.
All webmasters face the constant challenge of damage done by spammers,
data harvesters, hackers and security flaw checkers. Attempting to
protect ourselves from these outlaws takes up an inordinate amount of
development time as it is. The very last thing that we need is Google
joining that band! Robots.txt is hugely important and its scope needs
to be strengthened, not undermined by your rogue robots!
Get your act together, Google - you aren't too big to face a class
action from people aggrieved by your determination to mine
everything on every server out there, regardless of whether you are
permitted to or not.
Googlebot rarely disobeys a well-formed robots.txt prohibition. You
may be seeing a hacker script that has tried to disguise itself as
Googlebot. Did you check the IP address of the "bad" Googlebot
against any other records of its visit?
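
For what it's worth, the IP check can be scripted. Below is a rough sketch
in Python of the usual reverse-then-forward DNS test: look up the host name
for the IP from your log, see whether it sits under a Google domain, then
resolve that name again and confirm it maps back to the same IP. The
function name and the injectable lookup parameters are my own invention
for illustration, and the domain suffixes are an assumption based on
Google's published guidance, so check them against the current docs.

```python
import socket

# Assumed suffixes for genuine Googlebot host names; verify against
# Google's current documentation before relying on them.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google host that resolves back to ip.

    reverse_lookup and forward_lookup are hypothetical hooks so the check
    can be exercised without touching the network.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex only returns IPv4 addresses.
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # Forward-confirm: the claimed host must map back to the same IP.
        return ip in forward_lookup(host)
    except OSError:
        # No PTR record or failed resolution: treat as unverified.
        return False
```

A visitor whose user-agent string says Googlebot but which fails this test
is most likely an impostor, whatever it claims to be.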
That's quite a rant and it's all pretty unfounded.
The robots.txt file is NOT to be used to protect areas of a website
that you don't want visited.
It's a recommendation for robots of what you don't want INDEXED, which
is quite a different thing from VISITED.
Any URLs that have otherwise unobstructed links pointing to them, on
your site or off-site, may be visited by any robot. Googlebot will pay
attention to a well-formed robots.txt file that the server serves
properly, and will not index the contents of a URL if it can see
clearly that it is disallowed. We are talking about indexing, which
refers to content, not about the URL appearing in a site: query.
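
To see how a well-behaved robot would read a rule like the one described,
here is a small sketch using Python's standard urllib.robotparser. The
/trap/ folder name and the allowed() helper are made up for illustration;
the original post does not say what the trap folder is called. Note that
this only shows what a compliant crawler is expected to decide. It does
nothing to stop a robot that ignores the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot is asked to stay out of /trap/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(allowed("Googlebot", "/trap/page.html"))  # False
    print(allowed("Googlebot", "/index.html"))      # True
```

If the file is malformed, or the server does not serve it properly, a
robot has nothing reliable to go on, which is worth ruling out before
blaming the robot.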