The LG Hamburg (Hamburg Regional Court) yet again... Ignore a directive in robots.txt? | Intern.de http://www.intern.de/internet-news/9269-lg-hamburg-interpretiert-anweisung-in-robotstxt-auf-eigenwillige-art-und-weise.html
404 Not Found
www.intern.de
January 4, 2025 at 1:27 AM
What fresh #AI hell is that?
grep claudebot *.log | grep -c robots.txt
3792
Why the heck is that crap requesting the robots.txt over and over again?
#Claude #ClaudeBot #ClaudeAI #robotstxt
February 15, 2024 at 9:48 PM
2 likes
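The grep one-liner above gives a single total. A minimal Python sketch, assuming the logs are in Combined Log Format, breaks the same count down per day, which makes it easier to see how often the file is being re-fetched. The function name `robots_fetches_per_day` and the sample log lines (IP addresses, user-agent strings) are made up for illustration and are not taken from the original post.

```python
import re
from collections import Counter

# Assumed Combined Log Format, e.g.:
# 203.0.113.5 - - [15/Feb/2024:21:40:01 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "SomeAgent"
LOG_LINE = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] '
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def robots_fetches_per_day(lines, agent_substring):
    """Count /robots.txt requests per day for agents containing agent_substring."""
    needle = agent_substring.lower()
    per_day = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        path = match.group("path").split("?", 1)[0]  # ignore any query string
        if path == "/robots.txt" and needle in match.group("agent").lower():
            per_day[match.group("day")] += 1
    return per_day


# Hypothetical sample lines, only here to make the sketch runnable.
SAMPLE = [
    '203.0.113.5 - - [15/Feb/2024:21:40:01 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ClaudeBot/1.0"',
    '203.0.113.5 - - [15/Feb/2024:21:40:09 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ClaudeBot/1.0"',
    '203.0.113.5 - - [15/Feb/2024:21:41:00 +0000] "GET /index.html HTTP/1.1" 200 900 "-" "ClaudeBot/1.0"',
    '203.0.113.5 - - [16/Feb/2024:08:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ClaudeBot/1.0"',
    '198.51.100.7 - - [16/Feb/2024:08:05:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "OtherBot/2.0"',
]

counts = robots_fetches_per_day(SAMPLE, "claudebot")
for day, hits in counts.items():
    print(day, hits)
```

For real logs, pass an open file object instead of `SAMPLE`; the match on the agent is case-insensitive, unlike a plain `grep claudebot`.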
Blocking robots.txt is not very cyberpunk
#ai #llm #cyberpunk #openai #perplexity #anthropic #gemini #claude #stablediffusion #aiart #llms #chatgpt #copilot #robots #robotstxt
July 7, 2024 at 4:29 PM
November 27, 2024 at 6:47 PM
2 quotes
7 likes
CRAN updates: chromote distrSim grates MCPModGeneral moc.gapbk robotstxt #rstats
August 29, 2024 at 5:02 PM
A GitHub-hosted project offers a curated robots.txt file designed to block known AI crawlers from accessing website content #AI #WebCrawlers #GitHub #airobots #DataPrivacy #LLMs #Devs #AITraining #RobotsTxt #DigitalRights #Copyright
GitHub-Project Offers to Block All Known AI Web Crawlers Via ROBOTS.TXT - WinBuzzer
ai.robots.txt on GitHub aims to empower developers to restrict known AI bots from scraping online data without permission.
buff.ly
January 14, 2025 at 3:30 PM
1 likes
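To illustrate the idea behind a block list like the one described above, here is a minimal sketch using Python's standard `urllib.robotparser`. The three user-agent tokens and the `example.com` URL are assumptions chosen for illustration; this is not the contents of the ai.robots.txt project's file, which is much longer.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: one group naming several AI crawlers, all disallowed.
# The tokens below are assumptions for this sketch, not the project's actual list.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for agent in ("GPTBot", "ClaudeBot", "CCBot", "Mozilla/5.0"):
    allowed = parser.can_fetch(agent, "https://example.com/post/1")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

This only shows what a compliant crawler would conclude from the file. A crawler that does not read or honor robots.txt is unaffected by it.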
Google may index pages blocked by Robots.txt: John Mueller of Google clarifies why pages blocked by robots.txt can still appear in search results, offering key insights for webmasters. #Google #SEO #RobotsTxt #Webmasters #SearchResults
Google may index pages blocked by Robots.txt
John Mueller of Google clarifies why pages blocked by robots.txt can still appear in search results, offering key insights for webmasters.
ppc.land
December 25, 2024 at 5:34 PM
1 likes
#Development #Analyses
Farewell to robots.txt (1994-2025) · “You were too good for this world.” ilo.im/167q2b by Henning Fries
_____
#SearchEngine #InternetArchive #Crawlers #AI #Content #Website #RobotsTxt #RFC9309 #WebDev #Backend
Obituary: Farewell to robots.txt (1994-2025)
The voluntary compliance protocol that civilized the internet has departed, bids Henning Fries farewell.
ilo.im
October 21, 2025 at 8:20 AM
3 likes
Google revamps documentation for crawlers and user-triggered fetchers: Google revamps crawler documentation, adding product impact info and robots.txt snippets for each crawler user agent. #Google #SEO #WebCrawlers #Documentation #RobotsTxt
Google revamps documentation for crawlers and user-triggered fetchers
Google revamps crawler documentation, adding product impact info and robots.txt snippets for each crawler user agent.
ppc.land
December 25, 2024 at 5:08 PM
1 likes
Cloudflare Overhauls Web’s AI Rulebook with New Robots.txt ‘Content Signals’
#AI #Cloudflare #RobotsTxt #DataScraping #Publishing #GenerativeAI
winbuzzer.com/2025/10/06/c...
Cloudflare Overhauls Web’s AI Rulebook with New Robots.txt ‘Content Signals’ - WinBuzzer
Cloudflare has launched its Content Signals Policy, a major update to robots.txt giving publishers new controls over how their content is used for AI training.
winbuzzer.com
October 6, 2025 at 1:08 PM
Hmm I probably have the most ridiculous #robotstxt for a #Misskey instance right now lol. I just want to let #Mojeek and #Marginalia crawl #Makai and make sure to keep out #Google and the AI scrapers... :satrithink:
If there are other user-agents of independent #searchengines I should allow in…
May 23, 2024 at 10:14 AM
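An allow-list setup like the one described in this post could look roughly like the sketch below: named crawlers get `Allow: /`, and a catch-all group denies everyone else. The tokens `MojeekBot` and `search.marginalia.nu` are assumptions about how those crawlers identify themselves, and the instance URL is a placeholder; check each search engine's own documentation before relying on them.

```python
from urllib.robotparser import RobotFileParser

# Default-deny robots.txt with an explicit allow-list.
# User-agent tokens are assumptions for this sketch.
ALLOWLIST_ROBOTS_TXT = """\
User-agent: MojeekBot
User-agent: search.marginalia.nu
Allow: /

User-agent: *
Disallow: /
"""

allowlist = RobotFileParser()
allowlist.parse(ALLOWLIST_ROBOTS_TXT.splitlines())

URL = "https://instance.example/notes/abc"  # placeholder URL
for agent in ("MojeekBot", "search.marginalia.nu", "Googlebot", "GPTBot"):
    allowed = allowlist.can_fetch(agent, URL)
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

The design choice here is default-deny: new or unknown crawlers fall into the `*` group, so the file does not need updating every time another scraper appears.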
New release of nginx_robot_access:
https://github.com/glyn/nginx_robot_access/releases/tag/v0.1.1
#nginx #robotstxt
Release v0.1.1 · glyn/nginx_robot_access
What's Changed Test case insensitive matching of user agent by @glyn in #8 Bump ngx-rust Full Changelog: v0.1.0...v0.1.1
github.com
March 19, 2025 at 10:50 AM
1 reposts
The robots.txt standard turned 30 last year. But is it still relevant in a world filled with AI bots, site scrapers, and other dubious bots?
www.plagiarismtoday.com/2025/10/21/d...
#AI #RobotsTxt #Scraping
Does Robots.txt Matter Anymore?
The robots.txt standard turned 30 last year. But is it still relevant in a world filled with AI bots, site scrapers, and other dubious bots?
www.plagiarismtoday.com
October 21, 2025 at 6:52 PM
How to avoid getting crawled by Apple Bot mspoweruser.com/how-to-opt-o... #applebot #robotstxt #seo
How to Opt Out of AppleBot So Apple Won’t Train AI on Your Websites
Here's how to opt out of AppleBot, Apple's web crawler used to scrape public data and train its AI models.
mspoweruser.com
July 12, 2024 at 7:53 AM
Meet LLMs.txt, a Proposed Standard for AI Website Content Crawling, by @searchengineland.bsky.social:
https://searchengineland.com/llms-txt-proposed-standard-453676
#ai #crawling #scraping #robotstxt
Meet LLMs.txt, a Proposed Standard for AI Website Content Crawling
searchengineland.com
April 20, 2025 at 7:30 AM
1 reposts
1 likes
Google's @methode.bsky.social released an update to the opensource version of Google's robots.txt parser on GitHub www.seroundtable.com/google-updat...
#google #robotstxt #crawler #parser
May 23, 2024 at 11:51 AM
One typo in robots.txt can block your site from Google! 🚨
Test it with Google’s free tool and avoid mistakes.
Follow for more SEO insights.
#SEO #RobotsTxt #WebCrawling #DigitalMarketing #SEOTips
February 22, 2025 at 2:25 PM
useful article from @mallory.techpolicy.social.ap.brid.gy and @awdsome.bsky.social on the state of play of robots.txt, AI preference signals, and more: www.techpolicy.press/robotstxt-is... - also highlights that whatever happens with (c), there will be a tussle of technical measures & counters
Robots.txt Is Having a Moment: Here's Why We Should Care | TechPolicy.Press
Once a quiet piece of internet plumbing, robots.txt is now in the spotlight, write Audrey Hingle and Mallory Knodel.
www.techpolicy.press
April 4, 2025 at 4:55 PM
1 reposts
3 likes
OpenAI's crawlers took down e-commerce site Triplegangers by relentlessly scraping its entire content, as the site's robots.txt file was misconfigured. A reminder of the importance of proper site configuration for web scraping. #OpenAI #WebScraping #Ecommerce #RobotsTxt #TechEthics #SiteManagement
January 11, 2025 at 7:22 AM
2 likes
Cloudflare launches Robotcop to enforce robots.txt policies against AI crawlers: New tool helps website owners monitor and block unauthorized AI bot access by enforcing robots.txt directives at the network level. #Cloudflare #AI #RobotsTxt #WebSecurity #DataPrivacy
Cloudflare launches Robotcop to enforce robots.txt policies against AI crawlers
New tool helps website owners monitor and block unauthorized AI bot access by enforcing robots.txt directives at the network level.
ppc.land
December 20, 2024 at 5:22 PM
Modules are the backbone of Drupal’s flexibility. We’ve listed 57 must-have Drupal CMS modules to help turn your ideas into powerful websites—RobotsTxt is one of them!
Check out the full list: https://bit.ly/4kaEEg6
June 13, 2025 at 10:00 AM
1 likes
Think robots.txt protects private data? It doesn’t. It’s public and not a lock. In Ep.10 of Ecomm Insights, we break down what it actually does and how to use it for SEO (not security).
Listen: open.spotify.com/episode/47cc...
#ShopifySEO #RobotsTxt #EcommInsights #noryX #SEOtips
July 24, 2025 at 2:57 PM
1 likes
#Business #Reports
The web has a new AI payment system · The RSL Standard sets rules for AI scraping fees ilo.im/166ryy by Emma Roth
_____
#Web #Publishing #Website #Blog #Content #AI #Crawlers #Payments #RSL #RobotsTxt
The web has a new system for making AI companies pay up
The mission is to keep the web sustainable.
ilo.im
September 10, 2025 at 4:57 PM