Request that AI models exclude your site

Request limits on how data from your website is used, including by certain AI models

Last updated July 25, 2024

All public pages on the internet are accessible to both humans and machines (web crawlers). Crawlers may index your site for various reasons, depending on the company doing the crawling. For example, Google crawls sites to include them in its search results. Squarespace provides two options for requesting that crawlers accessing your site use your data differently. Both work by adding the requested exclusions to your website’s robots.txt file. This guide explains how to add the request for AI crawlers.

Keep in mind:

  • Requesting that known AI crawlers exclude your site doesn't guarantee they will, but it's the best option currently available
  • If AI crawlers exclude your site, it might reduce your site traffic
  • Squarespace doesn't earn revenue from third-party search or AI companies crawling websites

To exclude your site from search engine results, visit Hiding your site from search results. To review all options for hiding content on your site, visit Controlling who can access your site's pages.

Exclude your site from known AI crawler scans

To request that AI crawlers not scan your site:

  1. Open the Settings panel.
  2. Click Crawlers.
  3. Check the box next to “Block known artificial intelligence crawlers.”

Checking the box to block known artificial intelligence crawlers updates your robots.txt file to tell the following bots not to crawl your site:

  • Anthropic AI
  • Applebot-Extended
  • CCBot
  • Claude-Web
  • cohere-ai
  • FacebookBot
  • Google-Extended
  • GPTBot and ChatGPT-User
  • PerplexityBot

As AI technology continues to develop, we may add more bots to this list. To suggest blocking a bot that isn't listed here, contact us to submit a feature request.
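For illustration, requests like these are typically written as robots.txt groups: a User-agent line naming a crawler, followed by "Disallow: /" to ask it to skip every page. The Python sketch below generates such groups. The user-agent tokens (for example, anthropic-ai and Google-Extended) are assumptions based on the names these companies commonly publish, and build_ai_block_rules is a hypothetical helper; the exact lines Squarespace writes to your file may differ.

```python
# Hypothetical sketch: build robots.txt groups that ask specific AI crawlers
# not to crawl any page. The user-agent tokens are assumptions based on the
# bot names listed above; Squarespace's actual robots.txt may differ.
AI_USER_AGENTS = [
    "anthropic-ai",
    "Applebot-Extended",
    "CCBot",
    "Claude-Web",
    "cohere-ai",
    "FacebookBot",
    "Google-Extended",
    "GPTBot",
    "ChatGPT-User",
    "PerplexityBot",
]


def build_ai_block_rules(user_agents: list[str]) -> str:
    """Return robots.txt text with one 'Disallow: /' group per user agent."""
    # "Disallow: /" matches every path, so it asks the crawler to skip the whole site.
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in user_agents]
    # Separate groups with a blank line, as is conventional in robots.txt files.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_ai_block_rules(AI_USER_AGENTS))
```

Each crawler gets its own group because, as described below, there is no single universal "AI" user agent to target.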

Note

Checking the box to block known artificial intelligence crawlers doesn't retroactively remove your site's previously scraped content from AI model training data.

How does this work? What is robots.txt?

Robots.txt is a file published on your website that well-behaved crawlers have agreed to read and follow. Historically, the most common reason to modify this file was to control whether a site appeared in search results. However, updating robots.txt is only a request, and malicious crawlers may still ignore it and misuse your content.

The only way to ensure your content on the public internet is never viewed by a crawler is to make it private.
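To show why robots.txt is only a request, here is a minimal sketch using Python's standard urllib.robotparser module, which a well-behaved crawler could use to check the file before fetching a page. The robots.txt content, the is_allowed helper, and the example.com URL are assumptions for illustration. Nothing in the file itself stops a crawler that skips this check.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (assumed for illustration): GPTBot is asked to
# stay away from the whole site; every other crawler may fetch any page.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/menu"
    # A compliant crawler runs this check voluntarily; the server doesn't enforce it.
    print(is_allowed("GPTBot", page))     # False
    print(is_allowed("Googlebot", page))  # True
```

The decision happens entirely on the crawler's side, which is why making content private is the only way to guarantee it isn't collected.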

Why isn't the box to block known artificial intelligence crawlers checked by default?

By default, any website on the internet can be visited by any crawler unless it requests a specific exclusion, and this has been the case for decades (crawlers are run by companies like Google, Yahoo, SEO companies, AI companies, and more). There are tens of thousands of crawlers in the world doing various things, many of them positive and useful, and many (AI crawlers included) provide traffic and visibility for your site.

The box is unchecked by default, which means we haven’t added any “AI do not crawl” requests to your robots.txt file. We don't want to reduce your site's traffic by excluding it from chatbot answers and cited sources. However, the checkbox is there so you can choose to request that AI crawlers exclude your site. It's a decision each site owner must make for themselves. There isn’t currently a universal way to request exclusion only from image or text model training while still appearing in the same AI company’s chatbot answers, which can send potential customers to your site.

Why would I leave this box unchecked?

The benefit of leaving your site as open as possible is that it will likely bring you more traffic. Most sites want traffic from Google and optimize for it. Similarly, newer AI companies commonly include information from your site in their answers and link back to source sites, increasing your content's visibility.

Being present in chatbot answers is an additional source of traffic. For instance, if you run a restaurant and a potential customer types “What is the best restaurant in New York?” into a chatbot, you'd likely want your restaurant mentioned in the answer so more people learn about your site and business.

Note: It's currently not possible to request that AI crawlers only scan specific pages.

Why is the setting called “known artificial intelligence crawlers”?

There isn't a universal way to request that AI companies not crawl a site. Instead, we have to add a specific request for each AI company's crawler. The list above shows the crawlers we currently know about and include, from companies that have agreed to obey a specific robots.txt instruction.
