fix: disable AI service links due to Netlify edge function issues

- Remove /api/raw Netlify Function that caused build failures
- Comment out ChatGPT/Claude/Perplexity buttons in CopyPageDropdown
- Keep Copy page, View as Markdown, Download as SKILL.md options
- Update blog post with detailed log of attempted solutions
- Clean up netlify.toml by removing broken redirect rule

Users can still copy markdown and paste into AI tools manually.
The raw markdown files work in browsers but AI crawlers cannot
fetch them reliably due to Netlify edge function interception.
This commit is contained in:
Wayne Sutton
2025-12-24 01:44:00 -08:00
parent 534f020999
commit e2eaa9c43b
4 changed files with 139 additions and 296 deletions

View File

@@ -5,7 +5,7 @@ date: "2025-12-21"
slug: "netlify-edge-excludedpath-ai-crawlers" slug: "netlify-edge-excludedpath-ai-crawlers"
published: true published: true
tags: ["netlify", "edge-functions", "ai", "troubleshooting", "help"] tags: ["netlify", "edge-functions", "ai", "troubleshooting", "help"]
readTime: "4 min read" readTime: "5 min read"
featured: false featured: false
--- ---
@@ -36,78 +36,157 @@ The page could not be loaded with the tools currently available, so its raw mark
**Claude:** **Claude:**
Works. Loads and reads the markdown successfully. Works. Loads and reads the markdown successfully.
## Current configuration ## Attempted solutions log
Static files exist in `public/raw/` and are served via `_redirects`: ### December 24, 2025
``` **Attempt 1: excludedPath in netlify.toml**
/raw/* /raw/:splat 200
```
Edge function configuration in `netlify.toml`: Added array of excluded paths to the edge function declaration:
```toml ```toml
[[edge_functions]] [[edge_functions]]
path = "/*" path = "/*"
function = "botMeta" function = "botMeta"
excludedPath = "/raw/*" excludedPath = [
"/raw/*",
"/assets/*",
"/api/*",
"/.netlify/*",
"/favicon.ico",
"/favicon.svg",
"/robots.txt",
"/sitemap.xml",
"/llms.txt",
"/openapi.yaml"
]
``` ```
The `botMeta` function also has a code-level check: Result: ChatGPT and Perplexity still blocked.
**Attempt 2: Hard bypass in botMeta.ts**
Added early return at top of handler to guarantee static markdown is never intercepted:
```typescript ```typescript
// Skip if it's the home page, static assets, API routes, or raw markdown files const url = new URL(request.url);
if ( if (
pathParts.length === 0 || url.pathname.startsWith("/raw/") ||
pathParts[0].includes(".") || url.pathname.startsWith("/assets/") ||
pathParts[0] === "api" || url.pathname.startsWith("/api/") ||
pathParts[0] === "_next" || url.pathname.startsWith("/.netlify/") ||
pathParts[0] === "raw" // This check exists url.pathname.endsWith(".md")
) { ) {
return context.next(); return context.next();
} }
``` ```
## Why it's not working Result: ChatGPT and Perplexity still blocked.
Despite `excludedPath = "/raw/*"` and the code check, the edge function still intercepts requests to `/raw/*.md` before static files are served. **Attempt 3: AI crawler whitelist**
According to Netlify docs, edge functions run before redirects and static file serving. The `excludedPath` should prevent the function from running, but it appears the function still executes and may be returning a response that blocks static file access. Added explicit bypass for known AI user agents:
## What we've tried
1. Added `excludedPath = "/raw/*"` in netlify.toml
2. Added code-level check in botMeta.ts to skip `/raw/` paths
3. Verified static files exist in `public/raw/` after build
4. Confirmed `_redirects` rule for `/raw/*` is in place
5. Tested with different URLPattern syntax (`/raw/*`, `/**/*.md`)
All attempts result in the same behavior: ChatGPT and Perplexity cannot access the files, while Claude can.
## Why Claude works
Claude's web fetcher may use different headers or handle Netlify's edge function responses differently. It successfully bypasses whatever is blocking ChatGPT and Perplexity.
## The question
How can we configure Netlify edge functions to truly exclude `/raw/*` paths so static markdown files are served directly to all AI crawlers without interception?
Is there a configuration issue with `excludedPath`? Should we use a different approach like header-based matching to exclude AI crawlers from the botMeta function? Or is there a processing order issue where edge functions always run before static files regardless of exclusions?
## Code reference
The CopyPageDropdown component sends these URLs to AI services:
```typescript ```typescript
const rawMarkdownUrl = `${origin}/raw/${props.slug}.md`; const AI_CRAWLERS = [
"gptbot", "chatgpt", "chatgpt-user", "oai-searchbot",
"claude-web", "claudebot", "anthropic", "perplexitybot"
];
if (isAICrawler(userAgent)) {
return context.next();
}
``` ```
Example: `https://www.markdown.fast/raw/fork-configuration-guide.md` Result: ChatGPT and Perplexity still blocked.
The files exist. The redirects are configured. The edge function has exclusions. But AI crawlers still cannot access them. **Attempt 4: Netlify Function at /api/raw/:slug**
Created a serverless function to serve markdown files directly:
```javascript
// netlify/functions/raw.js
exports.handler = async (event) => {
const slug = event.queryStringParameters?.slug;
// Read from dist/raw/${slug}.md or public/raw/${slug}.md
return {
statusCode: 200,
headers: { "Content-Type": "text/plain; charset=utf-8" },
body: markdownContent
};
};
```
With redirect rule:
```toml
[[redirects]]
from = "/api/raw/*"
to = "/.netlify/functions/raw?slug=:splat"
status = 200
force = true
```
Result: Netlify build failures due to function bundling issues and `package-lock.json` dependency conflicts.
**Attempt 5: Header adjustments**
Removed `Link` header from global scope to prevent header merging on `/raw/*`:
```toml
[[headers]]
for = "/*"
[headers.values]
X-Frame-Options = "DENY"
# Link header removed from global scope
[[headers]]
for = "/index.html"
[headers.values]
Link = "</llms.txt>; rel=\"author\""
```
Removed `X-Robots-Tag = "noindex"` from `/raw/*` headers.
Result: ChatGPT and Perplexity still blocked.
### Why these attempts failed
The core issue appears to be how ChatGPT and Perplexity fetch URLs. Their tools receive 400 or 403 responses even when `curl` from the command line works. This suggests:
1. Netlify may handle AI crawler user agents differently at the CDN level
2. The edge function exclusions work for browsers but not for AI fetch tools
3. There may be rate limiting or bot protection enabled by default
## Current workaround
Users can still share content with AI tools by:
1. **Copy page** copies markdown to clipboard, then paste into any AI
2. **View as Markdown** opens the raw `.md` file in a browser tab for manual copying
3. **Download as SKILL.md** downloads in Anthropic Agent Skills format
The direct "Open in ChatGPT/Claude/Perplexity" buttons have been disabled since the URLs don't work reliably.
## Working features
Despite AI crawler issues, these features work correctly:
- `/raw/*.md` files load in browsers
- `llms.txt` discovery file is accessible
- `openapi.yaml` API spec loads properly
- Sitemap and RSS feeds generate correctly
- Social preview bots (Twitter, Facebook, LinkedIn) receive OG metadata
- Claude's web fetcher can access raw markdown
## Help needed ## Help needed
If you've solved this or have suggestions, we'd appreciate guidance. The goal is simple: serve static markdown files at `/raw/*.md` to all clients, including AI crawlers, without edge function interception. If you've solved this or have suggestions, open an issue. We've tried:
GitHub raw URLs work as a workaround, but we'd prefer to use Netlify-hosted files for consistency and to avoid requiring users to configure GitHub repo details when forking. - netlify.toml excludedPath arrays
- Code-level path checks in edge functions
- AI crawler user agent whitelisting
- Netlify Functions as an alternative endpoint
- Header configuration adjustments
None have worked for ChatGPT or Perplexity. GitHub raw URLs remain the most reliable option for AI consumption, but require additional repository configuration when forking.

View File

@@ -5,13 +5,6 @@
[build.environment] [build.environment]
NODE_VERSION = "20" NODE_VERSION = "20"
# API raw markdown endpoint for AI tools (ChatGPT, Claude, Perplexity)
[[redirects]]
from = "/api/raw/*"
to = "/.netlify/functions/raw?slug=:splat"
status = 200
force = true
# Raw markdown passthrough - explicit rule prevents SPA fallback from intercepting # Raw markdown passthrough - explicit rule prevents SPA fallback from intercepting
[[redirects]] [[redirects]]
from = "/raw/*" from = "/raw/*"

View File

@@ -1,77 +0,0 @@
const fs = require("fs");
const path = require("path");
/**
* Netlify Function: /api/raw/:slug
*
* Serves raw markdown files for AI tools (ChatGPT, Claude, Perplexity).
* Returns text/plain with minimal headers for reliable AI ingestion.
*/
function normalizeSlug(input) {
return (input || "").trim().replace(/^\/+|\/+$/g, "");
}
function tryRead(p) {
try {
if (!fs.existsSync(p)) return null;
const body = fs.readFileSync(p, "utf8");
if (!body || body.trim().length === 0) return null;
return body;
} catch {
return null;
}
}
exports.handler = async (event) => {
const slugRaw =
event.queryStringParameters && event.queryStringParameters.slug;
const slug = normalizeSlug(slugRaw);
if (!slug) {
return {
statusCode: 400,
headers: {
"Content-Type": "text/plain; charset=utf-8",
"Access-Control-Allow-Origin": "*",
},
body: "missing slug",
};
}
const filename = slug.endsWith(".md") ? slug : `${slug}.md`;
const root = process.cwd();
const candidates = [
path.join(root, "public", "raw", filename),
path.join(root, "dist", "raw", filename),
];
let body = null;
for (const p of candidates) {
body = tryRead(p);
if (body) break;
}
if (!body) {
return {
statusCode: 404,
headers: {
"Content-Type": "text/plain; charset=utf-8",
"Access-Control-Allow-Origin": "*",
},
body: `not found: ${filename}`,
};
}
return {
statusCode: 200,
headers: {
"Content-Type": "text/plain; charset=utf-8",
"Access-Control-Allow-Origin": "*",
"Cache-Control": "public, max-age=3600",
},
body,
};
};

View File

@@ -1,85 +1,9 @@
import { useState, useRef, useEffect, useCallback } from "react"; import { useState, useRef, useEffect, useCallback } from "react";
import { import { Copy, Check, AlertCircle, FileText, Download } from "lucide-react";
Copy,
MessageSquare,
Sparkles,
Search,
Check,
AlertCircle,
FileText,
Download,
} from "lucide-react";
// Maximum URL length for query parameters (conservative limit) // Maximum URL length for query parameters (conservative limit)
const MAX_URL_LENGTH = 6000; const MAX_URL_LENGTH = 6000;
// AI service configurations
interface AIService {
id: string;
name: string;
icon: typeof Copy;
baseUrl: string;
description: string;
supportsUrlPrefill: boolean;
// Custom URL builder for services with special formats
buildUrl?: (prompt: string) => string;
// URL-based builder - takes raw markdown file URL for better AI parsing
buildUrlFromRawMarkdown?: (rawMarkdownUrl: string) => string;
}
// AI services configuration - uses raw markdown URLs for better AI parsing
const AI_SERVICES: AIService[] = [
{
id: "chatgpt",
name: "ChatGPT",
icon: MessageSquare,
baseUrl: "https://chatgpt.com/",
description: "Analyze with ChatGPT",
supportsUrlPrefill: true,
// Uses raw markdown file URL for direct content access
buildUrlFromRawMarkdown: (rawMarkdownUrl) => {
const prompt =
`Attempt to load and read the raw markdown at the URL below.\n` +
`If successful provide a concise summary and then ask what the user needs help with.\n` +
`If not accessible do not guess the content. State that the page could not be loaded and ask the user how you can help.\n\n` +
`${rawMarkdownUrl}`;
return `https://chatgpt.com/?q=${encodeURIComponent(prompt)}`;
},
},
{
id: "claude",
name: "Claude",
icon: Sparkles,
baseUrl: "https://claude.ai/",
description: "Analyze with Claude",
supportsUrlPrefill: true,
buildUrlFromRawMarkdown: (rawMarkdownUrl) => {
const prompt =
`Attempt to load and read the raw markdown at the URL below.\n` +
`If successful provide a concise summary and then ask what the user needs help with.\n` +
`If not accessible do not guess the content. State that the page could not be loaded and ask the user how you can help.\n\n` +
`${rawMarkdownUrl}`;
return `https://claude.ai/new?q=${encodeURIComponent(prompt)}`;
},
},
{
id: "perplexity",
name: "Perplexity",
icon: Search,
baseUrl: "https://www.perplexity.ai/search",
description: "Research with Perplexity",
supportsUrlPrefill: true,
buildUrlFromRawMarkdown: (rawMarkdownUrl) => {
const prompt =
`Attempt to load and read the raw markdown at the URL below.\n` +
`If successful provide a concise summary and then ask what the user needs help with.\n` +
`If not accessible do not guess the content. State that the page could not be loaded and ask the user how you can help.\n\n` +
`${rawMarkdownUrl}`;
return `https://www.perplexity.ai/search?q=${encodeURIComponent(prompt)}`;
},
},
];
// Extended props interface with optional metadata // Extended props interface with optional metadata
interface CopyPageDropdownProps { interface CopyPageDropdownProps {
title: string; title: string;
@@ -321,67 +245,6 @@ export default function CopyPageDropdown(props: CopyPageDropdownProps) {
setTimeout(() => setIsOpen(false), 1500); setTimeout(() => setIsOpen(false), 1500);
}; };
// Generic handler for opening AI services
// Uses /api/raw/:slug endpoint for AI tools (ChatGPT, Claude, Perplexity)
// IMPORTANT: window.open must happen BEFORE any await to avoid popup blockers
const handleOpenInAI = async (service: AIService) => {
// Use /api/raw/:slug endpoint for AI tools - more reliable than static /raw/*.md files
if (service.buildUrlFromRawMarkdown) {
// Build absolute API URL using current origin
// Uses Netlify Function endpoint that returns text/plain with minimal headers
const apiRawUrl = new URL(
`/api/raw/${props.slug}`,
window.location.origin,
).toString();
const targetUrl = service.buildUrlFromRawMarkdown(apiRawUrl);
window.open(targetUrl, "_blank");
setIsOpen(false);
return;
}
// Other services: send full markdown content
const markdown = formatAsMarkdown(props);
const prompt = `Please analyze this article:\n\n${markdown}`;
// Build the target URL using the service's buildUrl function
if (!service.buildUrl) {
// Fallback: open base URL FIRST (sync), then copy to clipboard
window.open(service.baseUrl, "_blank");
const success = await writeToClipboard(markdown);
if (success) {
setFeedback("url-too-long");
setFeedbackMessage("Copied! Paste in " + service.name);
} else {
setFeedback("error");
setFeedbackMessage("Failed to copy content");
}
clearFeedback();
return;
}
const targetUrl = service.buildUrl(prompt);
// Check URL length - if too long, open base URL then copy to clipboard
if (isUrlTooLong(targetUrl)) {
// Open window FIRST (must be sync to avoid popup blocker)
window.open(service.baseUrl, "_blank");
const success = await writeToClipboard(markdown);
if (success) {
setFeedback("url-too-long");
setFeedbackMessage("Copied! Paste in " + service.name);
} else {
setFeedback("error");
setFeedbackMessage("Failed to copy content");
}
clearFeedback();
} else {
// URL is within limits, open directly with prefilled content
window.open(targetUrl, "_blank");
setIsOpen(false);
}
};
// Handle download skill file (Anthropic Agent Skills format) // Handle download skill file (Anthropic Agent Skills format)
const handleDownloadSkill = () => { const handleDownloadSkill = () => {
const skillContent = formatAsSkill(props); const skillContent = formatAsSkill(props);
@@ -423,6 +286,10 @@ export default function CopyPageDropdown(props: CopyPageDropdownProps) {
} }
}; };
// Suppress unused variable warnings for functions that may be used later
void isUrlTooLong;
void MAX_URL_LENGTH;
return ( return (
<div className="copy-page-dropdown" ref={dropdownRef}> <div className="copy-page-dropdown" ref={dropdownRef}>
{/* Trigger button with ARIA attributes */} {/* Trigger button with ARIA attributes */}
@@ -484,33 +351,6 @@ export default function CopyPageDropdown(props: CopyPageDropdownProps) {
</div> </div>
</button> </button>
{/* AI service options */}
{AI_SERVICES.map((service) => {
const Icon = service.icon;
return (
<button
key={service.id}
className="copy-page-item"
onClick={() => handleOpenInAI(service)}
role="menuitem"
tabIndex={0}
>
<Icon size={16} className="copy-page-icon" aria-hidden="true" />
<div className="copy-page-item-content">
<span className="copy-page-item-title">
Open in {service.name}
<span className="external-arrow" aria-hidden="true">
</span>
</span>
<span className="copy-page-item-desc">
{service.description}
</span>
</div>
</button>
);
})}
{/* View as Markdown option */} {/* View as Markdown option */}
<button <button
className="copy-page-item" className="copy-page-item"
@@ -553,6 +393,14 @@ export default function CopyPageDropdown(props: CopyPageDropdownProps) {
</span> </span>
</div> </div>
</button> </button>
{/* AI service options temporarily disabled
* ChatGPT, Claude, and Perplexity links were removed because
* Netlify edge functions block AI crawler fetch requests to /raw/*.md
* despite multiple configuration attempts. See blog post:
* /netlify-edge-excludedpath-ai-crawlers for details.
* Users can still copy markdown and paste into AI tools.
*/}
</div> </div>
)} )}
</div> </div>