Onion Information

http://wgq3bd2kqoybhstp77i3wrzbfnsyd27wt34psaja4grqiezqircorkyd.onion/notes/2023/04/21/opting-out-of-llm-indexing/

Opting out of LLM indexing - Seirdy

I added an entry to my robots.txt to block ChatGPT’s crawler, but blocking crawling isn’t the same as blocking indexing; it looks like Google chose to use the

Onion Details

Page Clicks: 0

First Seen: 03/11/2024

Last Indexed: 10/21/2024

Domain Index Total: 190

Onion Content

I added an entry to my robots.txt to block ChatGPT’s crawler, but blocking crawling isn’t the same as blocking indexing; it looks like Google chose to use the Common Crawl for this and sidestep the need to do crawling of its own. That’s a strange decision; after all, Google has a much larger proprietary index at its disposal. A “secret list of websites” was an ironic choice of words, given that this originates from the Common Crawl. It’s sad to see Common Crawl (ab)used for this, but I suppose we should have seen it coming. I know Google tells authors how to qualify/disqualify from rich results, but I don’t see any docs for opting a site out of LLM/Bard training.

Blog

Advertise