Request that AI models exclude your site

Last updated January 08, 2026 22:43

Request how data from your website is used, including in certain AI models

All public pages on the internet are accessible to both humans and machines (web crawlers). These crawlers may index your site for various reasons, depending on the company doing the crawling (for example, Google including your site in its search results). Squarespace gives you two options for requesting that crawlers treat your site's data differently. Both work by adding the requested exclusions to your website’s robots.txt file. This guide explains how to add the request relating to AI crawlers.

Keep in mind:

Requesting that known AI crawlers exclude your site doesn't guarantee they will, but it's the best option currently available
If AI crawlers exclude your site, it might negatively impact your site traffic
Squarespace earns no revenue from website crawling by any third-party search or AI company

To exclude your site from search engine results, visit Hiding your site from search results. To review all options for hiding content on your site, visit Controlling who can access your site's pages.

Exclude your site from known AI crawler scans

To request that AI crawlers not scan your site:

Open the Settings panel.
Click Crawlers.
Check the box next to “Block known artificial intelligence crawlers.”

Checking the box updates your robots.txt file to tell the following bots not to crawl your site:

AI2Bot
Ai2Bot-Dolma
aiHitBot
Amazonbot
anthropic-ai
Applebot-Extended
Bytespider
CCBot
ClaudeBot
cohere-ai
cohere-training-data-crawler
DuckAssistBot
FacebookBot
Google-Extended
GoogleOther
GoogleOther-Image
GoogleOther-Video
GPTBot
img2dataset
Meta-ExternalAgent
MyCentralAIScraperBot
omgili
omgilibot
Quora-Bot
TikTokSpider
YouBot
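As a rough illustration of what such a request can look like, the sketch below builds robots.txt rule groups for a few of the bots listed above. It assumes the common "User-agent" plus "Disallow: /" pattern; the exact text Squarespace writes to your robots.txt file isn't shown in this guide and may differ. The function name build_ai_block_rules and the shortened bot list are hypothetical.

```python
# A shortened, illustrative subset of the bot names listed above.
AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]


def build_ai_block_rules(bots):
    """Return robots.txt text asking each named bot not to crawl any path.

    Assumption: one "User-agent" group per bot, each with "Disallow: /".
    """
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    # Groups are separated by a blank line, as is conventional in robots.txt.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_ai_block_rules(AI_BOTS))
```

Each group names one crawler and asks it to stay away from every path on the site. Crawlers that aren't named are unaffected by these groups.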

As AI technology continues to develop, we may add more bots to this list. If you'd like to suggest we block a specific bot that's not listed here, you can contact us to put in a feature request.

Note: Checking the box to block known artificial intelligence crawlers doesn't retroactively remove previously scraped content from AI model training data.

How does this work? What is robots.txt?

Robots.txt is a file published on your website that well-behaved crawlers voluntarily read and follow. In the past, the most common reason for modifying this file was to control whether a site appeared in search results. However, an entry in robots.txt is only a request, and malicious crawlers may ignore it and still misuse your content.

The only way to ensure your content on the public internet is never viewed by a crawler is to make it private.
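To make the "request" nature of robots.txt concrete, here is a minimal sketch of how a cooperative crawler might check the file before fetching a page, using Python's standard urllib.robotparser module. The robots.txt text, the example.com URL, and the helper name may_crawl are assumptions for illustration, not what Squarespace generates.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content (an assumption, not Squarespace's exact output).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/menu"
    print("GPTBot allowed:", may_crawl("GPTBot", page))
    print("Other agent allowed:", may_crawl("Mozilla/5.0", page))
```

Nothing in this check is enforced by your site. A crawler that never runs it, or ignores the result, can still request the page, which is why making content private is the only firm guarantee.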

Why isn't the box to block known artificial intelligence crawlers checked by default?

By default, any crawler may visit any public website unless the site requests a specific exclusion. This has been the case for decades, with crawlers run by companies like Google and Yahoo, SEO companies, AI companies, and more. There are tens of thousands of crawlers in the world doing various things, many of them positive and useful, and many (AI included) will provide traffic and visibility for your site.

We leave the box unchecked by default (which means we haven’t added any “AI do not crawl” requests to your robots.txt file) because we don't want to reduce your site's traffic by excluding it from chat answers and sources. The checkbox is there so you can choose to request that AI crawlers exclude your site; it's a decision each site owner must make for themselves. There currently isn’t a universal way to ask to be excluded only from image or text model training while still appearing in the same AI company’s chatbot answers, which can send potential customers to your site.

Why would I leave this box unchecked?

The benefit of leaving your site as open as possible is that it will likely bring you more traffic. Most sites want traffic from Google and optimize for it. Similarly, newer AI companies commonly link back to source sites and include information from your site in answers, increasing visibility for your content.

Being present in chatbot answers is an additional source of traffic. For instance, if you run a restaurant and a potential customer types “What is the best restaurant in New York?” into a chatbot, you would likely want to be mentioned in the answer so that more people learn about your site and restaurant.

Learn more about improving your content’s reach and visibility in AI-powered search engines and chatbots.

Note: It's currently not possible to request that AI crawlers only scan specific pages.

Why is the setting called “known artificial intelligence crawlers”?

There isn't a universal way to request that AI companies not crawl a site. Instead, we have to create a specific request for each AI crawler. The list above shows the crawlers we currently know about and include, whose operators have offered to obey a specific robots.txt instruction.
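Because each request names one crawler, a crawler that isn't named is not covered. The sketch below, again using Python's standard urllib.robotparser, reports which bot names a given robots.txt text does not block. The rules shown, the helper name unblocked_bots, and the bot name ExampleNewBot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules naming two crawlers (an assumption for this example).
EXAMPLE_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""


def unblocked_bots(robots_txt, bot_names):
    """Return the bot names that the given robots.txt text still permits at "/"."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [name for name in bot_names if parser.can_fetch(name, "/")]


if __name__ == "__main__":
    # ExampleNewBot is a made-up name standing in for a crawler not yet on the list.
    print(unblocked_bots(EXAMPLE_RULES, ["GPTBot", "ClaudeBot", "ExampleNewBot"]))
```

A crawler that appears after the list was written stays permitted until a rule naming it is added, which is why the setting refers to "known" crawlers.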
