Canonical URL fix for search engines (GitHub Issue #6) and other SEO fixes

Wayne Sutton
2026-01-07 21:48:41 -08:00
parent b274ddf3c9
commit 1257fa220f
41 changed files with 537 additions and 55 deletions

View File

@@ -1,6 +1,6 @@
# llms.txt - Information for AI assistants and LLMs
# Learn more: https://llmstxt.org/
# Last updated: 2026-01-06T21:21:00.309Z
# Last updated: 2026-01-07T06:23:37.522Z
> Your content is instantly available to browsers, LLMs, and AI agents.

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
An open-source publishing framework built for AI agents and developers to ship websites, docs, or blogs. Write markdown, sync from the terminal. Your content is instantly available to browsers, LLMs, and AI agents. Built on Convex and Netlify.

View File

@@ -2,11 +2,32 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
All notable changes to this project.
## v2.12.0
Released January 7, 2026
**Canonical URL fix for search engines (GitHub Issue #6)**
Fixed a mismatch where raw HTML was showing the homepage canonical URL instead of the page-specific canonical URL. Search engines that check raw HTML before rendering JavaScript now receive the correct canonical tags.
**Changes:**
- Added search engine bot detection (Google, Bing, DuckDuckGo, etc.) to serve pre-rendered HTML
- Search engines now receive correct canonical URLs in the initial HTML response
- Added SEO Bot Configuration documentation in FORK_CONFIG.md and setup-guide.md
- Bot detection arrays are easily customizable in `netlify/edge-functions/botMeta.ts`
**For forkers:**
The bot detection configuration is documented with clear comments at the top of `botMeta.ts`. You can customize which bots receive pre-rendered HTML by editing the `SOCIAL_PREVIEW_BOTS`, `SEARCH_ENGINE_BOTS`, and `AI_CRAWLERS` arrays.
---
## v2.11.0
Released January 6, 2026

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
You found the contact page. Nice

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
## Ask AI

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
## Configuration

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
## Content

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
## Dashboard

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
## Deployment

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
## Frontmatter

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
## Keyword Search

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
## Semantic Search

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
## Getting started

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
Built with [Convex](https://convex.dev) for real-time sync and deployed on [Netlify](https://netlify.com). Read the [project on GitHub](https://github.com/waynesutton/markdown-site) to fork and deploy your own. View [real-time site stats](/stats).

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
An open-source publishing framework built for AI agents and developers to ship **[docs](/docs)**, **[blogs](/blog)**, or **[websites](/)**.
@@ -29,4 +29,6 @@ agents. -->
**Sync Commands** - Sync discovery commands to update AGENTS.md, CLAUDE.md, and llms.txt
**Semantic search** - Find content by meaning, not just keywords, using vector embeddings.
**Semantic search** - Find content by meaning, not just keywords.
**Ask AI** - Chat with your site content. Get answers with sources.

View File

@@ -24,7 +24,9 @@ agents. -->
**Sync Commands** - Sync discovery commands to update AGENTS.md, CLAUDE.md, and llms.txt
**Semantic search** - Find content by meaning, not just keywords, using vector embeddings.
**Semantic search** - Find content by meaning, not just keywords.
**Ask AI** - Chat with your site content. Get answers with sources.
---

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
# Newsletter Demo Page

View File

@@ -2,7 +2,7 @@
---
Type: page
Date: 2026-01-07
Date: 2026-01-08
---
This markdown framework is open source and built to be extended. Here is what ships out of the box.

View File

@@ -1397,6 +1397,50 @@ Your blog includes these API endpoints for search engines and AI:
| `/openapi.yaml` | OpenAPI 3.0 specification |
| `/llms.txt` | AI agent discovery |
## SEO and Bot Detection
Your site detects bots by their `User-Agent` header and serves pre-rendered HTML to the ones that do not run JavaScript before reading meta tags.
### How It Works
The `netlify/edge-functions/botMeta.ts` edge function intercepts requests and serves pre-rendered HTML with correct meta tags to:
- **Social preview bots** (Twitter, Facebook, LinkedIn, Discord): Get Open Graph tags for link previews
- **Search engine bots** (Google, Bing, DuckDuckGo): Get correct canonical URLs
Regular browsers and AI crawlers receive the normal SPA and let JavaScript update the meta tags.
### Configuration
Edit the bot arrays at the top of `netlify/edge-functions/botMeta.ts` to customize which bots receive pre-rendered HTML:
```typescript
// Social preview bots - for link previews
const SOCIAL_PREVIEW_BOTS = ["twitterbot", "facebookexternalhit" /* ... */];
// Search engine bots - for correct canonical URLs
const SEARCH_ENGINE_BOTS = ["googlebot", "bingbot" /* ... */];
// AI crawlers - get normal SPA (can render JavaScript)
const AI_CRAWLERS = ["gptbot", "claudebot" /* ... */];
```
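As a rough illustration of how arrays like these could drive detection, the sketch below classifies a request by case-insensitive substring match on its user agent. It is not the actual `botMeta.ts` logic: `classifyVisitor`, `shouldPrerender`, `matchesAny`, and the `VisitorKind` type are hypothetical names, the arrays hold only the entries shown above, and checking AI crawlers first is an assumption.

```typescript
// Hypothetical sketch; the real botMeta.ts may be structured differently.
type VisitorKind = "social" | "search" | "ai" | "browser";

// Only the entries shown in the docs; the real arrays contain more.
const SOCIAL_PREVIEW_BOTS: string[] = ["twitterbot", "facebookexternalhit"];
const SEARCH_ENGINE_BOTS: string[] = ["googlebot", "bingbot"];
const AI_CRAWLERS: string[] = ["gptbot", "claudebot"];

// True if the lowercased user agent contains any of the patterns.
function matchesAny(userAgent: string, patterns: string[]): boolean {
  const ua = userAgent.toLowerCase();
  return patterns.some((pattern) => ua.includes(pattern));
}

// Map a User-Agent header value to a visitor kind.
// Assumption: AI crawlers are checked first so they always get the SPA.
function classifyVisitor(userAgent: string | null): VisitorKind {
  if (!userAgent) return "browser";
  if (matchesAny(userAgent, AI_CRAWLERS)) return "ai";
  if (matchesAny(userAgent, SOCIAL_PREVIEW_BOTS)) return "social";
  if (matchesAny(userAgent, SEARCH_ENGINE_BOTS)) return "search";
  return "browser";
}

// Only social preview bots and search engines get pre-rendered HTML.
function shouldPrerender(userAgent: string | null): boolean {
  const kind = classifyVisitor(userAgent);
  return kind === "social" || kind === "search";
}
```

Under this scheme, adding a bot is a one-line change to the relevant array, which matches the customization workflow described above.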
### Testing
Verify bot detection with curl:
```bash
# Simulate Googlebot (-s hides curl's progress meter)
curl -s -H "User-Agent: Googlebot" https://yoursite.com/post-slug | grep canonical
# Expected: correct page canonical
# Normal request
curl -s https://yoursite.com/post-slug | grep canonical
# Expected: homepage canonical (JavaScript will update it)
```
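To script the same check instead of reading `grep` output by eye, a small helper can pull the canonical URL out of an HTML string. This is a minimal sketch: `extractCanonical` is a hypothetical helper that is not part of the project, and it relies on a simple regex scan of `<link>` tags rather than a full HTML parser.

```typescript
// Hypothetical helper: returns the href of the first <link rel="canonical">
// tag in the given HTML, or null if none is found.
function extractCanonical(html: string): string | null {
  // Collect every <link ...> tag, regardless of attribute order.
  const linkTags = html.match(/<link\b[^>]*>/gi) || [];
  for (const tag of linkTags) {
    // Skip link tags that are not canonical (stylesheets, icons, etc.).
    if (!/\brel\s*=\s*["']?canonical["']?/i.test(tag)) continue;
    const href = tag.match(/\bhref\s*=\s*["']([^"']+)["']/i);
    if (href) return href[1];
  }
  return null;
}
```

If bot detection behaves as described, the Googlebot response should yield the page URL and the normal response should yield the homepage URL.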
See `FORK_CONFIG.md` for detailed configuration options.
## Import External Content
Use Firecrawl to import articles from external URLs as markdown posts: