mirror of
https://github.com/waynesutton/markdown-site.git
synced 2026-01-12 04:09:14 +00:00
Canonical URL fix for search engines (GitHub Issue #6) and other seo fixes"
This commit is contained in:
@@ -1408,6 +1408,50 @@ Your blog includes these API endpoints for search engines and AI:
|
||||
| `/openapi.yaml` | OpenAPI 3.0 specification |
|
||||
| `/llms.txt` | AI agent discovery |
|
||||
|
||||
## SEO and Bot Detection
|
||||
|
||||
Your site includes intelligent bot detection that serves different responses to different visitors.
|
||||
|
||||
### How It Works
|
||||
|
||||
The `netlify/edge-functions/botMeta.ts` edge function intercepts requests and serves pre-rendered HTML with correct meta tags to:
|
||||
|
||||
- **Social preview bots** (Twitter, Facebook, LinkedIn, Discord): Get Open Graph tags for link previews
|
||||
- **Search engine bots** (Google, Bing, DuckDuckGo): Get correct canonical URLs
|
||||
|
||||
Regular browsers and AI crawlers receive the normal SPA and let JavaScript update the meta tags.
|
||||
|
||||
### Configuration
|
||||
|
||||
Edit the bot arrays at the top of `netlify/edge-functions/botMeta.ts` to customize which bots receive pre-rendered HTML:
|
||||
|
||||
```typescript
|
||||
// Social preview bots - for link previews
|
||||
const SOCIAL_PREVIEW_BOTS = ["twitterbot", "facebookexternalhit", ...];
|
||||
|
||||
// Search engine bots - for correct canonical URLs
|
||||
const SEARCH_ENGINE_BOTS = ["googlebot", "bingbot", ...];
|
||||
|
||||
// AI crawlers - get normal SPA (can render JavaScript)
|
||||
const AI_CRAWLERS = ["gptbot", "claudebot", ...];
|
||||
```
|
||||
|
||||
### Testing
|
||||
|
||||
Verify bot detection with curl:
|
||||
|
||||
```bash
|
||||
# Simulate Googlebot
|
||||
curl -H "User-Agent: Googlebot" https://yoursite.com/post-slug | grep canonical
|
||||
# Expected: correct page canonical
|
||||
|
||||
# Normal request
|
||||
curl https://yoursite.com/post-slug | grep canonical
|
||||
# Expected: homepage canonical (JavaScript will update it)
|
||||
```
|
||||
|
||||
See `FORK_CONFIG.md` for detailed configuration options.
|
||||
|
||||
## Import External Content
|
||||
|
||||
Use Firecrawl to import articles from external URLs as markdown posts:
|
||||
|
||||
@@ -11,6 +11,27 @@ docsSectionOrder: 4
|
||||
|
||||
All notable changes to this project.
|
||||
|
||||
## v2.12.0
|
||||
|
||||
Released January 7, 2026
|
||||
|
||||
**Canonical URL fix for search engines (GitHub Issue #6)**
|
||||
|
||||
Fixed a mismatch where raw HTML was showing the homepage canonical URL instead of the page-specific canonical URL. Search engines that check raw HTML before rendering JavaScript now receive the correct canonical tags.
|
||||
|
||||
**Changes:**
|
||||
|
||||
- Added search engine bot detection (Google, Bing, DuckDuckGo, etc.) to serve pre-rendered HTML
|
||||
- Search engines now receive correct canonical URLs in the initial HTML response
|
||||
- Added SEO Bot Configuration documentation in FORK_CONFIG.md and setup-guide.md
|
||||
- Bot detection arrays are easily customizable in `netlify/edge-functions/botMeta.ts`
|
||||
|
||||
**For forkers:**
|
||||
|
||||
The bot detection configuration is documented with clear comments at the top of `botMeta.ts`. You can customize which bots receive pre-rendered HTML by editing the `SOCIAL_PREVIEW_BOTS`, `SEARCH_ENGINE_BOTS`, and `AI_CRAWLERS` arrays.
|
||||
|
||||
---
|
||||
|
||||
## v2.11.0
|
||||
|
||||
Released January 6, 2026
|
||||
|
||||
@@ -31,4 +31,6 @@ agents. -->
|
||||
|
||||
**Sync Commands** - Sync discovery commands to update AGENTS.md, CLAUDE.md, and llms.txt
|
||||
|
||||
**Semantic search** - Find content by meaning, not just keywords, using vector embeddings.
|
||||
**Semantic search** - Find content by meaning, not just keywords.
|
||||
|
||||
**Ask AI** - Chat with your site content. Get answers with sources.
|
||||
|
||||
Reference in New Issue
Block a user