
Speaking as an upstart search engine guy (blekko) who also has a bunch of webpages and a huge robots.txt, that's a bad idea. Such a crawler would be knocking down webservers by running expensive scripts and clicking links that do bad things like deleting records from databases or reverting edits in wikis. You don't want to go there.


Really? I was always taught that search engines only issue "GET" requests, and that anything that modifies data goes in a "POST" request. Are there really that many broken websites out there that haven't already fallen victim to crawlers that ignore robots.txt?


Yes, there are a lot of broken websites out there.
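To make the failure mode concrete: the convention is that GET is "safe" (no side effects) and mutations go through POST, but a broken site wires a destructive action to a plain GET link, so even a well-behaved GET-only crawler destroys data just by following links. Below is a minimal sketch, using a hypothetical in-memory site rather than a real server; the paths, handler, and record store are all invented for illustration.

```python
import re

# Hypothetical record store behind a hypothetical site.
records = {1: "page one", 2: "page two"}

def handle_get(path):
    """Serve a page for a GET request.

    The bug this sketch illustrates: /delete?id=... mutates state on a
    plain GET, violating the convention that GET is side-effect free.
    """
    if path == "/":
        return '<a href="/delete?id=2">delete</a>'
    if path.startswith("/delete?id="):
        rid = int(path.split("=", 1)[1])
        records.pop(rid, None)  # destructive side effect on a GET
        return "deleted"
    return "404"

def crawl(start="/"):
    """A crawler that only issues GETs and follows every link it sees."""
    seen, frontier = set(), [start]
    while frontier:
        path = frontier.pop()
        if path in seen:
            continue
        seen.add(path)
        body = handle_get(path)
        frontier.extend(re.findall(r'href="([^"]+)"', body))
    return seen

crawl()
print(records)  # record 2 is gone, even though the crawler "only did GETs"
```

The crawler does nothing wrong by the GET/POST convention; the site does. This is why a crawler that ignores robots.txt can knock records out of databases or revert wiki edits without ever sending a POST.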


I noticed this today. Googling "united check in" and clicking the "check" link gave me a page saying the confirmation number I had entered was invalid, though I never entered one.




