Canonical URL fix for search engines (GitHub Issue #6) and other seo fixes"

This commit is contained in:
Wayne Sutton
2026-01-07 21:48:41 -08:00
parent b274ddf3c9
commit 1257fa220f
41 changed files with 537 additions and 55 deletions

View File

@@ -1408,6 +1408,50 @@ Your blog includes these API endpoints for search engines and AI:
| `/openapi.yaml` | OpenAPI 3.0 specification |
| `/llms.txt` | AI agent discovery |
## SEO and Bot Detection
Your site includes intelligent bot detection that serves different responses to different visitors.
### How It Works
The `netlify/edge-functions/botMeta.ts` edge function intercepts requests and serves pre-rendered HTML with correct meta tags to:
- **Social preview bots** (Twitter, Facebook, LinkedIn, Discord): Get Open Graph tags for link previews
- **Search engine bots** (Google, Bing, DuckDuckGo): Get correct canonical URLs
Regular browsers and AI crawlers receive the normal SPA and let JavaScript update the meta tags.
### Configuration
Edit the bot arrays at the top of `netlify/edge-functions/botMeta.ts` to customize which bots receive pre-rendered HTML:
```typescript
// Social preview bots - for link previews
const SOCIAL_PREVIEW_BOTS = ["twitterbot", "facebookexternalhit", ...];
// Search engine bots - for correct canonical URLs
const SEARCH_ENGINE_BOTS = ["googlebot", "bingbot", ...];
// AI crawlers - get normal SPA (can render JavaScript)
const AI_CRAWLERS = ["gptbot", "claudebot", ...];
```
### Testing
Verify bot detection with curl:
```bash
# Simulate Googlebot
curl -H "User-Agent: Googlebot" https://yoursite.com/post-slug | grep canonical
# Expected: correct page canonical
# Normal request
curl https://yoursite.com/post-slug | grep canonical
# Expected: homepage canonical (JavaScript will update it)
```
See `FORK_CONFIG.md` for detailed configuration options.
## Import External Content
Use Firecrawl to import articles from external URLs as markdown posts:

View File

@@ -11,6 +11,27 @@ docsSectionOrder: 4
All notable changes to this project.
## v2.12.0
Released January 7, 2026
**Canonical URL fix for search engines (GitHub Issue #6)**
Fixed a mismatch where raw HTML was showing the homepage canonical URL instead of the page-specific canonical URL. Search engines that check raw HTML before rendering JavaScript now receive the correct canonical tags.
**Changes:**
- Added search engine bot detection (Google, Bing, DuckDuckGo, etc.) to serve pre-rendered HTML
- Search engines now receive correct canonical URLs in the initial HTML response
- Added SEO Bot Configuration documentation in FORK_CONFIG.md and setup-guide.md
- Bot detection arrays are easily customizable in `netlify/edge-functions/botMeta.ts`
**For forkers:**
The bot detection configuration is documented with clear comments at the top of `botMeta.ts`. You can customize which bots receive pre-rendered HTML by editing the `SOCIAL_PREVIEW_BOTS`, `SEARCH_ENGINE_BOTS`, and `AI_CRAWLERS` arrays.
---
## v2.11.0
Released January 6, 2026

View File

@@ -31,4 +31,6 @@ agents. -->
**Sync Commands** - Sync discovery commands to update AGENTS.md, CLAUDE.md, and llms.txt
**Semantic search** - Find content by meaning, not just keywords, using vector embeddings.
**Semantic search** - Find content by meaning, not just keywords.
**Ask AI** - Chat with your site content. Get answers with sources.