Warning: Some posts on this platform may contain adult material intended for mature audiences only. Viewer discretion is advised. By clicking ‘Continue’, you confirm that you are 18 years or older and consent to viewing explicit content.
from my limited experience, about half? i had to finally set up a robots.txt last month after Anthropic decided it would be OK to crawl my Wikipedia mirror from about a dozen different IP addresses simultaneously, non-stop, without any rate limiting, and bring it to its knees. fuck them for it, but at least it stopped once i added robots.txt.
Facebook, Amazon, and a few others are ignoring that robots.txt, on the other hand. they have the decency to do it slowly enough that i’d never notice unless i checked the logs, at least.
from my limited experience, about half? i had to finally set up a robots.txt last month after Anthropic decided it would be OK to crawl my Wikipedia mirror from about a dozen different IP addresses simultaneously, non-stop, without any rate limiting, and bring it to its knees. fuck them for it, but at least it stopped once i added robots.txt.
Facebook, Amazon, and a few others are ignoring that robots.txt, on the other hand. they have the decency to do it slowly enough that i’d never notice unless i checked the logs, at least.