mirror of
https://github.com/waynesutton/markdown-site.git
synced 2026-01-11 20:08:57 +00:00
fix: disable AI service links due to Netlify edge function issues
- Remove /api/raw Netlify Function that caused build failures
- Comment out ChatGPT/Claude/Perplexity buttons in CopyPageDropdown
- Keep Copy page, View as Markdown, Download as SKILL.md options
- Update blog post with detailed log of attempted solutions
- Clean up netlify.toml by removing broken redirect rule

Users can still copy markdown and paste into AI tools manually. The raw markdown files work in browsers but AI crawlers cannot fetch them reliably due to Netlify edge function interception.
@@ -5,7 +5,7 @@ date: "2025-12-21"
|
||||
slug: "netlify-edge-excludedpath-ai-crawlers"
|
||||
published: true
|
||||
tags: ["netlify", "edge-functions", "ai", "troubleshooting", "help"]
|
||||
readTime: "4 min read"
|
||||
readTime: "5 min read"
|
||||
featured: false
|
||||
---
|
||||
|
||||
@@ -36,78 +36,157 @@ The page could not be loaded with the tools currently available, so its raw mark
|
||||
**Claude:**
|
||||
Works. Loads and reads the markdown successfully.
|
||||
|
||||
## Current configuration
|
||||
## Attempted solutions log
|
||||
|
||||
Static files exist in `public/raw/` and are served via `_redirects`:
|
||||
### December 24, 2025
|
||||
|
||||
```
|
||||
/raw/* /raw/:splat 200
|
||||
```
|
||||
**Attempt 1: excludedPath in netlify.toml**
|
||||
|
||||
Edge function configuration in `netlify.toml`:
|
||||
Added array of excluded paths to the edge function declaration:
|
||||
|
||||
```toml
|
||||
[[edge_functions]]
|
||||
path = "/*"
|
||||
function = "botMeta"
|
||||
excludedPath = "/raw/*"
|
||||
excludedPath = [
|
||||
"/raw/*",
|
||||
"/assets/*",
|
||||
"/api/*",
|
||||
"/.netlify/*",
|
||||
"/favicon.ico",
|
||||
"/favicon.svg",
|
||||
"/robots.txt",
|
||||
"/sitemap.xml",
|
||||
"/llms.txt",
|
||||
"/openapi.yaml"
|
||||
]
|
||||
```
|
||||
|
||||
The `botMeta` function also has a code-level check:
|
||||
Result: ChatGPT and Perplexity still blocked.
|
||||
|
||||
**Attempt 2: Hard bypass in botMeta.ts**
|
||||
|
||||
Added early return at top of handler to guarantee static markdown is never intercepted:
|
||||
|
||||
```typescript
|
||||
// Skip if it's the home page, static assets, API routes, or raw markdown files
|
||||
const url = new URL(request.url);
|
||||
if (
|
||||
pathParts.length === 0 ||
|
||||
pathParts[0].includes(".") ||
|
||||
pathParts[0] === "api" ||
|
||||
pathParts[0] === "_next" ||
|
||||
pathParts[0] === "raw" // This check exists
|
||||
url.pathname.startsWith("/raw/") ||
|
||||
url.pathname.startsWith("/assets/") ||
|
||||
url.pathname.startsWith("/api/") ||
|
||||
url.pathname.startsWith("/.netlify/") ||
|
||||
url.pathname.endsWith(".md")
|
||||
) {
|
||||
return context.next();
|
||||
}
|
||||
```
|
||||
|
||||
## Why it's not working
|
||||
Result: ChatGPT and Perplexity still blocked.
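
The early-return condition from Attempt 2 can be pulled out into a pure function so the path logic is testable outside the Netlify runtime. The following is a minimal sketch, not the code in `botMeta.ts`: the helper name `shouldBypass` and the `BYPASS_PREFIXES` constant are hypothetical, and only the prefixes and the `.md` suffix come from the snippet above.

```typescript
// Hypothetical helper: name and structure are illustrative, not taken from botMeta.ts.
// The prefixes mirror the startsWith checks listed in Attempt 2.
const BYPASS_PREFIXES: string[] = ["/raw/", "/assets/", "/api/", "/.netlify/"];

// Returns true when the request path should skip the bot-meta logic entirely.
function shouldBypass(pathname: string): boolean {
  return (
    BYPASS_PREFIXES.some((prefix) => pathname.startsWith(prefix)) ||
    pathname.endsWith(".md")
  );
}
```

Inside the handler this would be called as `if (shouldBypass(new URL(request.url).pathname)) return context.next();`. Note that a check like this only matters if the edge function is actually invoked for the request; it cannot help if the block happens before the function runs.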
|
||||
|
||||
Despite `excludedPath = "/raw/*"` and the code check, the edge function still intercepts requests to `/raw/*.md` before static files are served.
|
||||
**Attempt 3: AI crawler whitelist**
|
||||
|
||||
According to Netlify docs, edge functions run before redirects and static file serving. The `excludedPath` should prevent the function from running, but it appears the function still executes and may be returning a response that blocks static file access.
|
||||
|
||||
## What we've tried
|
||||
|
||||
1. Added `excludedPath = "/raw/*"` in netlify.toml
|
||||
2. Added code-level check in botMeta.ts to skip `/raw/` paths
|
||||
3. Verified static files exist in `public/raw/` after build
|
||||
4. Confirmed `_redirects` rule for `/raw/*` is in place
|
||||
5. Tested with different URLPattern syntax (`/raw/*`, `/**/*.md`)
|
||||
|
||||
All attempts result in the same behavior: ChatGPT and Perplexity cannot access the files, while Claude can.
|
||||
|
||||
## Why Claude works
|
||||
|
||||
Claude's web fetcher may use different headers or handle Netlify's edge function responses differently. It successfully bypasses whatever is blocking ChatGPT and Perplexity.
|
||||
|
||||
## The question
|
||||
|
||||
How can we configure Netlify edge functions to truly exclude `/raw/*` paths so static markdown files are served directly to all AI crawlers without interception?
|
||||
|
||||
Is there a configuration issue with `excludedPath`? Should we use a different approach like header-based matching to exclude AI crawlers from the botMeta function? Or is there a processing order issue where edge functions always run before static files regardless of exclusions?
|
||||
|
||||
## Code reference
|
||||
|
||||
The CopyPageDropdown component sends these URLs to AI services:
|
||||
Added explicit bypass for known AI user agents:
|
||||
|
||||
```typescript
|
||||
const rawMarkdownUrl = `${origin}/raw/${props.slug}.md`;
|
||||
const AI_CRAWLERS = [
|
||||
"gptbot", "chatgpt", "chatgpt-user", "oai-searchbot",
|
||||
"claude-web", "claudebot", "anthropic", "perplexitybot"
|
||||
];
|
||||
|
||||
if (isAICrawler(userAgent)) {
|
||||
return context.next();
|
||||
}
|
||||
```
|
||||
|
||||
Example: `https://www.markdown.fast/raw/fork-configuration-guide.md`
|
||||
Result: ChatGPT and Perplexity still blocked.
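
The snippet for Attempt 3 calls `isAICrawler` without showing its body. One plausible shape, assuming a case-insensitive substring match against the `User-Agent` header (this matching strategy is an assumption, not confirmed by the source), might look like this:

```typescript
// Lowercase substrings copied from the AI_CRAWLERS list in Attempt 3.
const AI_CRAWLERS: string[] = [
  "gptbot", "chatgpt", "chatgpt-user", "oai-searchbot",
  "claude-web", "claudebot", "anthropic", "perplexitybot",
];

// Assumed implementation: case-insensitive substring match on the User-Agent value.
// Returns false when the header is missing or empty.
function isAICrawler(userAgent: string | null): boolean {
  if (!userAgent) return false;
  const ua = userAgent.toLowerCase();
  return AI_CRAWLERS.some((bot) => ua.includes(bot));
}
```

In an edge function the value would typically come from `request.headers.get("user-agent")`. A whitelist like this only helps when the fetcher identifies itself with one of these tokens, which may not hold for every AI fetch tool.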
|
||||
|
||||
The files exist. The redirects are configured. The edge function has exclusions. But AI crawlers still cannot access them.
|
||||
**Attempt 4: Netlify Function at /api/raw/:slug**

Created a serverless function to serve markdown files directly:

```javascript
// netlify/functions/raw.js
exports.handler = async (event) => {
  const slug = event.queryStringParameters?.slug;
  // Read from dist/raw/${slug}.md or public/raw/${slug}.md
  return {
    statusCode: 200,
    headers: { "Content-Type": "text/plain; charset=utf-8" },
    body: markdownContent
  };
};
```

With redirect rule:

```toml
[[redirects]]
  from = "/api/raw/*"
  to = "/.netlify/functions/raw?slug=:splat"
  status = 200
  force = true
```

Result: Netlify build failures due to function bundling issues and `package-lock.json` dependency conflicts.

**Attempt 5: Header adjustments**

Removed `Link` header from global scope to prevent header merging on `/raw/*`:

```toml
[[headers]]
  for = "/*"
  [headers.values]
    X-Frame-Options = "DENY"
    # Link header removed from global scope

[[headers]]
  for = "/index.html"
  [headers.values]
    Link = "</llms.txt>; rel=\"author\""
```

Removed `X-Robots-Tag = "noindex"` from `/raw/*` headers.

Result: ChatGPT and Perplexity still blocked.

### Why these attempts failed

The core issue appears to be how ChatGPT and Perplexity fetch URLs. Their tools receive 400 or 403 responses even when `curl` from the command line works. This suggests:

1. Netlify may handle AI crawler user agents differently at the CDN level
2. The edge function exclusions work for browsers but not for AI fetch tools
3. There may be rate limiting or bot protection enabled by default

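One way to narrow down these hypotheses is to request the same raw URL with several `User-Agent` values and compare the status codes. The sketch below is illustrative only: `probeStatuses` and the `Fetcher` type are hypothetical names, and the fetch function is injected so the logic can be exercised without network access.

```typescript
// Minimal shape of the fetch function this sketch depends on,
// so a fake can be injected in tests (hypothetical type, not from the repo).
type Fetcher = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number }>;

// Requests the same URL once per User-Agent and records the HTTP status for each.
async function probeStatuses(
  url: string,
  userAgents: string[],
  fetcher: Fetcher,
): Promise<Record<string, number>> {
  const results: Record<string, number> = {};
  for (const ua of userAgents) {
    const res = await fetcher(url, { headers: { "User-Agent": ua } });
    results[ua] = res.status;
  }
  return results;
}
```

On Node 18 or later, the global `fetch` can be passed as `(u, init) => fetch(u, init)`. Identical status codes across user agents would point away from user-agent filtering, but would not rule out differences in IP ranges or other headers used by the real AI fetch tools.
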
## Current workaround

Users can still share content with AI tools by:

1. **Copy page** copies markdown to clipboard, then paste into any AI
2. **View as Markdown** opens the raw `.md` file in a browser tab for manual copying
3. **Download as SKILL.md** downloads in Anthropic Agent Skills format

The direct "Open in ChatGPT/Claude/Perplexity" buttons have been disabled since the URLs don't work reliably.

## Working features

Despite AI crawler issues, these features work correctly:

- `/raw/*.md` files load in browsers
- `llms.txt` discovery file is accessible
- `openapi.yaml` API spec loads properly
- Sitemap and RSS feeds generate correctly
- Social preview bots (Twitter, Facebook, LinkedIn) receive OG metadata
- Claude's web fetcher can access raw markdown

## Help needed
|
||||
|
||||
If you've solved this or have suggestions, we'd appreciate guidance. The goal is simple: serve static markdown files at `/raw/*.md` to all clients, including AI crawlers, without edge function interception.
|
||||
If you've solved this or have suggestions, open an issue. We've tried:
|
||||
|
||||
GitHub raw URLs work as a workaround, but we'd prefer to use Netlify-hosted files for consistency and to avoid requiring users to configure GitHub repo details when forking.
|
||||
- netlify.toml excludedPath arrays
|
||||
- Code-level path checks in edge functions
|
||||
- AI crawler user agent whitelisting
|
||||
- Netlify Functions as an alternative endpoint
|
||||
- Header configuration adjustments
|
||||
|
||||
None have worked for ChatGPT or Perplexity. GitHub raw URLs remain the most reliable option for AI consumption, but require additional repository configuration when forking.
|
||||
|
||||
@@ -5,13 +5,6 @@
|
||||
[build.environment]
|
||||
NODE_VERSION = "20"
|
||||
|
||||
# API raw markdown endpoint for AI tools (ChatGPT, Claude, Perplexity)
|
||||
[[redirects]]
|
||||
from = "/api/raw/*"
|
||||
to = "/.netlify/functions/raw?slug=:splat"
|
||||
status = 200
|
||||
force = true
|
||||
|
||||
# Raw markdown passthrough - explicit rule prevents SPA fallback from intercepting
|
||||
[[redirects]]
|
||||
from = "/raw/*"
|
||||
|
||||
@@ -1,77 +0,0 @@
|
||||
const fs = require("fs");
|
||||
const path = require("path");
|
||||
|
||||
/**
|
||||
* Netlify Function: /api/raw/:slug
|
||||
*
|
||||
* Serves raw markdown files for AI tools (ChatGPT, Claude, Perplexity).
|
||||
* Returns text/plain with minimal headers for reliable AI ingestion.
|
||||
*/
|
||||
|
||||
function normalizeSlug(input) {
|
||||
return (input || "").trim().replace(/^\/+|\/+$/g, "");
|
||||
}
|
||||
|
||||
function tryRead(p) {
|
||||
try {
|
||||
if (!fs.existsSync(p)) return null;
|
||||
const body = fs.readFileSync(p, "utf8");
|
||||
if (!body || body.trim().length === 0) return null;
|
||||
return body;
|
||||
} catch {
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
exports.handler = async (event) => {
|
||||
const slugRaw =
|
||||
event.queryStringParameters && event.queryStringParameters.slug;
|
||||
const slug = normalizeSlug(slugRaw);
|
||||
|
||||
if (!slug) {
|
||||
return {
|
||||
statusCode: 400,
|
||||
headers: {
|
||||
"Content-Type": "text/plain; charset=utf-8",
|
||||
"Access-Control-Allow-Origin": "*",
|
||||
},
|
||||
body: "missing slug",
|
||||
};
|
||||
}
|
||||
|
||||
const filename = slug.endsWith(".md") ? slug : `${slug}.md`;
|
||||
const root = process.cwd();
|
||||
|
||||
const candidates = [
|
||||
path.join(root, "public", "raw", filename),
|
||||
path.join(root, "dist", "raw", filename),
|
||||
];
|
||||
|
||||
let body = null;
|
||||
for (const p of candidates) {
|
||||
body = tryRead(p);
|
||||
if (body) break;
|
||||
}
|
||||
|
||||
if (!body) {
|
||||
return {
|
||||
statusCode: 404,
|
||||
headers: {
|
||||
"Content-Type": "text/plain; charset=utf-8",
|
||||
"Access-Control-Allow-Origin": "*",
|
||||
},
|
||||
body: `not found: ${filename}`,
|
||||
};
|
||||
}
|
||||
|
||||
return {
|
||||
statusCode: 200,
|
||||
headers: {
|
||||
"Content-Type": "text/plain; charset=utf-8",
|
||||
"Access-Control-Allow-Origin": "*",
|
||||
"Cache-Control": "public, max-age=3600",
|
||||
},
|
||||
body,
|
||||
};
|
||||
};
|
||||
|
||||
@@ -1,85 +1,9 @@
|
||||
import { useState, useRef, useEffect, useCallback } from "react";
|
||||
import {
|
||||
Copy,
|
||||
MessageSquare,
|
||||
Sparkles,
|
||||
Search,
|
||||
Check,
|
||||
AlertCircle,
|
||||
FileText,
|
||||
Download,
|
||||
} from "lucide-react";
|
||||
import { Copy, Check, AlertCircle, FileText, Download } from "lucide-react";
|
||||
|
||||
// Maximum URL length for query parameters (conservative limit)
|
||||
const MAX_URL_LENGTH = 6000;
|
||||
|
||||
// AI service configurations
|
||||
interface AIService {
|
||||
id: string;
|
||||
name: string;
|
||||
icon: typeof Copy;
|
||||
baseUrl: string;
|
||||
description: string;
|
||||
supportsUrlPrefill: boolean;
|
||||
// Custom URL builder for services with special formats
|
||||
buildUrl?: (prompt: string) => string;
|
||||
// URL-based builder - takes raw markdown file URL for better AI parsing
|
||||
buildUrlFromRawMarkdown?: (rawMarkdownUrl: string) => string;
|
||||
}
|
||||
|
||||
// AI services configuration - uses raw markdown URLs for better AI parsing
|
||||
const AI_SERVICES: AIService[] = [
|
||||
{
|
||||
id: "chatgpt",
|
||||
name: "ChatGPT",
|
||||
icon: MessageSquare,
|
||||
baseUrl: "https://chatgpt.com/",
|
||||
description: "Analyze with ChatGPT",
|
||||
supportsUrlPrefill: true,
|
||||
// Uses raw markdown file URL for direct content access
|
||||
buildUrlFromRawMarkdown: (rawMarkdownUrl) => {
|
||||
const prompt =
|
||||
`Attempt to load and read the raw markdown at the URL below.\n` +
|
||||
`If successful provide a concise summary and then ask what the user needs help with.\n` +
|
||||
`If not accessible do not guess the content. State that the page could not be loaded and ask the user how you can help.\n\n` +
|
||||
`${rawMarkdownUrl}`;
|
||||
return `https://chatgpt.com/?q=${encodeURIComponent(prompt)}`;
|
||||
},
|
||||
},
|
||||
{
|
||||
id: "claude",
|
||||
name: "Claude",
|
||||
icon: Sparkles,
|
||||
baseUrl: "https://claude.ai/",
|
||||
description: "Analyze with Claude",
|
||||
supportsUrlPrefill: true,
|
||||
buildUrlFromRawMarkdown: (rawMarkdownUrl) => {
|
||||
const prompt =
|
||||
`Attempt to load and read the raw markdown at the URL below.\n` +
|
||||
`If successful provide a concise summary and then ask what the user needs help with.\n` +
|
||||
`If not accessible do not guess the content. State that the page could not be loaded and ask the user how you can help.\n\n` +
|
||||
`${rawMarkdownUrl}`;
|
||||
return `https://claude.ai/new?q=${encodeURIComponent(prompt)}`;
|
||||
},
|
||||
},
|
||||
{
|
||||
id: "perplexity",
|
||||
name: "Perplexity",
|
||||
icon: Search,
|
||||
baseUrl: "https://www.perplexity.ai/search",
|
||||
description: "Research with Perplexity",
|
||||
supportsUrlPrefill: true,
|
||||
buildUrlFromRawMarkdown: (rawMarkdownUrl) => {
|
||||
const prompt =
|
||||
`Attempt to load and read the raw markdown at the URL below.\n` +
|
||||
`If successful provide a concise summary and then ask what the user needs help with.\n` +
|
||||
`If not accessible do not guess the content. State that the page could not be loaded and ask the user how you can help.\n\n` +
|
||||
`${rawMarkdownUrl}`;
|
||||
return `https://www.perplexity.ai/search?q=${encodeURIComponent(prompt)}`;
|
||||
},
|
||||
},
|
||||
];
|
||||
|
||||
// Extended props interface with optional metadata
|
||||
interface CopyPageDropdownProps {
|
||||
title: string;
|
||||
@@ -321,67 +245,6 @@ export default function CopyPageDropdown(props: CopyPageDropdownProps) {
|
||||
setTimeout(() => setIsOpen(false), 1500);
|
||||
};
|
||||
|
||||
// Generic handler for opening AI services
|
||||
// Uses /api/raw/:slug endpoint for AI tools (ChatGPT, Claude, Perplexity)
|
||||
// IMPORTANT: window.open must happen BEFORE any await to avoid popup blockers
|
||||
const handleOpenInAI = async (service: AIService) => {
|
||||
// Use /api/raw/:slug endpoint for AI tools - more reliable than static /raw/*.md files
|
||||
if (service.buildUrlFromRawMarkdown) {
|
||||
// Build absolute API URL using current origin
|
||||
// Uses Netlify Function endpoint that returns text/plain with minimal headers
|
||||
const apiRawUrl = new URL(
|
||||
`/api/raw/${props.slug}`,
|
||||
window.location.origin,
|
||||
).toString();
|
||||
const targetUrl = service.buildUrlFromRawMarkdown(apiRawUrl);
|
||||
|
||||
window.open(targetUrl, "_blank");
|
||||
setIsOpen(false);
|
||||
return;
|
||||
}
|
||||
|
||||
// Other services: send full markdown content
|
||||
const markdown = formatAsMarkdown(props);
|
||||
const prompt = `Please analyze this article:\n\n${markdown}`;
|
||||
|
||||
// Build the target URL using the service's buildUrl function
|
||||
if (!service.buildUrl) {
|
||||
// Fallback: open base URL FIRST (sync), then copy to clipboard
|
||||
window.open(service.baseUrl, "_blank");
|
||||
const success = await writeToClipboard(markdown);
|
||||
if (success) {
|
||||
setFeedback("url-too-long");
|
||||
setFeedbackMessage("Copied! Paste in " + service.name);
|
||||
} else {
|
||||
setFeedback("error");
|
||||
setFeedbackMessage("Failed to copy content");
|
||||
}
|
||||
clearFeedback();
|
||||
return;
|
||||
}
|
||||
|
||||
const targetUrl = service.buildUrl(prompt);
|
||||
|
||||
// Check URL length - if too long, open base URL then copy to clipboard
|
||||
if (isUrlTooLong(targetUrl)) {
|
||||
// Open window FIRST (must be sync to avoid popup blocker)
|
||||
window.open(service.baseUrl, "_blank");
|
||||
const success = await writeToClipboard(markdown);
|
||||
if (success) {
|
||||
setFeedback("url-too-long");
|
||||
setFeedbackMessage("Copied! Paste in " + service.name);
|
||||
} else {
|
||||
setFeedback("error");
|
||||
setFeedbackMessage("Failed to copy content");
|
||||
}
|
||||
clearFeedback();
|
||||
} else {
|
||||
// URL is within limits, open directly with prefilled content
|
||||
window.open(targetUrl, "_blank");
|
||||
setIsOpen(false);
|
||||
}
|
||||
};
|
||||
|
||||
// Handle download skill file (Anthropic Agent Skills format)
|
||||
const handleDownloadSkill = () => {
|
||||
const skillContent = formatAsSkill(props);
|
||||
@@ -423,6 +286,10 @@ export default function CopyPageDropdown(props: CopyPageDropdownProps) {
|
||||
}
|
||||
};
|
||||
|
||||
// Suppress unused variable warnings for functions that may be used later
|
||||
void isUrlTooLong;
|
||||
void MAX_URL_LENGTH;
|
||||
|
||||
return (
|
||||
<div className="copy-page-dropdown" ref={dropdownRef}>
|
||||
{/* Trigger button with ARIA attributes */}
|
||||
@@ -484,33 +351,6 @@ export default function CopyPageDropdown(props: CopyPageDropdownProps) {
|
||||
</div>
|
||||
</button>
|
||||
|
||||
{/* AI service options */}
|
||||
{AI_SERVICES.map((service) => {
|
||||
const Icon = service.icon;
|
||||
return (
|
||||
<button
|
||||
key={service.id}
|
||||
className="copy-page-item"
|
||||
onClick={() => handleOpenInAI(service)}
|
||||
role="menuitem"
|
||||
tabIndex={0}
|
||||
>
|
||||
<Icon size={16} className="copy-page-icon" aria-hidden="true" />
|
||||
<div className="copy-page-item-content">
|
||||
<span className="copy-page-item-title">
|
||||
Open in {service.name}
|
||||
<span className="external-arrow" aria-hidden="true">
|
||||
↗
|
||||
</span>
|
||||
</span>
|
||||
<span className="copy-page-item-desc">
|
||||
{service.description}
|
||||
</span>
|
||||
</div>
|
||||
</button>
|
||||
);
|
||||
})}
|
||||
|
||||
{/* View as Markdown option */}
|
||||
<button
|
||||
className="copy-page-item"
|
||||
@@ -553,6 +393,14 @@ export default function CopyPageDropdown(props: CopyPageDropdownProps) {
|
||||
</span>
|
||||
</div>
|
||||
</button>
|
||||
|
||||
{/* AI service options temporarily disabled
|
||||
* ChatGPT, Claude, and Perplexity links were removed because
|
||||
* Netlify edge functions block AI crawler fetch requests to /raw/*.md
|
||||
* despite multiple configuration attempts. See blog post:
|
||||
* /netlify-edge-excludedpath-ai-crawlers for details.
|
||||
* Users can still copy markdown and paste into AI tools.
|
||||
*/}
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
|
||||