feat: Add semantic search with vector embeddings

Add vector-based semantic search to complement keyword search.
  Users can toggle between "Keyword" and "Semantic" modes in the
  search modal (Cmd+K, then Tab to switch).

  Semantic search:
  - Uses OpenAI text-embedding-ada-002 (1536 dimensions)
  - Finds content by meaning, not exact words
  - Shows similarity scores as percentages
  - ~300ms latency, ~$0.0001/query
  - Graceful fallback if OPENAI_API_KEY not set

  New files:
  - convex/embeddings.ts - Embedding generation actions
  - convex/embeddingsQueries.ts - Queries/mutations for embeddings
  - convex/semanticSearch.ts - Vector search action
  - convex/semanticSearchQueries.ts - Result hydration queries
  - content/pages/docs-search.md - Keyword search docs
  - content/pages/docs-semantic-search.md - Semantic search docs

  Changes:
  - convex/schema.ts: Add embedding field and by_embedding vectorIndex
  - SearchModal.tsx: Add mode toggle (TextAa/Brain icons)
  - sync-posts.ts: Generate embeddings after content sync
  - global.css: Search mode toggle styles

  Documentation updated:
  - changelog.md, TASK.md, files.md, about.md, home.md

  Configuration:
  npx convex env set OPENAI_API_KEY sk-your-key

  Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

  Status: Ready to commit. All semantic search files are staged. The TypeScript warnings are pre-existing (unused variables) and don't affect the build.
This commit is contained in:
Wayne Sutton
2026-01-05 18:30:48 -08:00
parent 83411ec1b2
commit 5a8df46681
58 changed files with 7024 additions and 2527 deletions

View File

@@ -48,7 +48,7 @@ A brief description of each file in the codebase.
| `TagPage.tsx` | Tag archive page displaying posts filtered by a specific tag. Includes view mode toggle (list/cards) with localStorage persistence |
| `AuthorPage.tsx` | Author archive page displaying posts by a specific author. Includes view mode toggle (list/cards) with localStorage persistence. Author name clickable in posts links to this page. |
| `Write.tsx` | Three-column markdown writing page with Cursor docs-style UI, frontmatter reference with copy buttons, theme toggle, font switcher (serif/sans/monospace), localStorage persistence, and optional AI Agent mode (toggleable via siteConfig.aiChat.enabledOnWritePage). When enabled, Agent replaces the textarea with AIChatView component. Includes scroll prevention when switching to Agent mode to prevent page jump. Title changes to "Agent" when in AI chat mode. |
| `Dashboard.tsx` | Centralized dashboard at `/dashboard` for content management and site configuration. Features include: Posts and Pages list views with filtering, search, pagination, items per page selector; Post/Page editor with markdown editor, live preview, draggable/resizable frontmatter sidebar (200px-600px), independent scrolling, download markdown; Write Post/Page sections with full-screen writing interface; AI Agent section with tab-based UI for Chat and Image Generation, multi-model selector (Claude Sonnet 4, GPT-4o, Gemini 2.0 Flash), image generation with Nano Banana models and aspect ratio selection; Newsletter management (all Newsletter Admin features integrated); Content import (Firecrawl UI); Site configuration (Config Generator UI for all siteConfig.ts settings); Index HTML editor; Analytics (real-time stats dashboard); Sync commands UI with buttons for all sync operations; Header sync buttons for quick sync; Dashboard search; Toast notifications; Command modal; Mobile responsive design. Uses Convex queries for real-time data, localStorage for preferences, ReactMarkdown for preview. Optional WorkOS authentication via siteConfig.dashboard.requireAuth. When requireAuth is false, dashboard is open access. When requireAuth is true and WorkOS is configured, dashboard requires login. Shows setup instructions if requireAuth is true but WorkOS is not configured. |
| `Dashboard.tsx` | Centralized dashboard at `/dashboard` for content management and site configuration. **Cloud CMS Features:** Direct database save ("Save to DB" button), source tracking (Dashboard vs Synced badges), delete confirmation modal with warning, CRUD operations for dashboard-created content. **Content Management:** Posts and Pages list views with filtering, search, pagination, items per page selector, source badges, delete buttons (dashboard content only); Post/Page editor with markdown editor, live preview, "Save Changes" button, draggable/resizable frontmatter sidebar (200px-600px), independent scrolling, download markdown, export to markdown; Write Post/Page sections with three editor modes (Markdown, Rich Text via Quill, Preview), full-screen writing interface. **Rich Text Editor:** Quill-based WYSIWYG editor with toolbar (headers, bold, italic, lists, links, code, blockquote), automatic HTML-to-Markdown conversion on mode switch, theme-aware styling. **AI Agent:** Tab-based UI for Chat and Image Generation, multi-model selector (Claude Sonnet 4, GPT-4o, Gemini 2.0 Flash), image generation with Nano Banana models and aspect ratio selection. **Other Features:** Newsletter management (all Newsletter Admin features integrated); Content import (direct database import via Firecrawl, no file sync needed); Site configuration (Config Generator UI); Index HTML editor; Analytics (real-time stats dashboard); Sync commands UI with sync server integration; Header sync buttons; Dashboard search; Toast notifications; Command modal; Mobile responsive design. Uses Convex queries for real-time data, localStorage for preferences, ReactMarkdown for preview. Optional WorkOS authentication via siteConfig.dashboard.requireAuth. |
| `Callback.tsx` | OAuth callback handler for WorkOS authentication. Handles redirect from WorkOS after user login, exchanges authorization code for user information, then redirects to dashboard. Only used when WorkOS is configured. |
| `NewsletterAdmin.tsx` | Three-column newsletter admin page for managing subscribers and sending newsletters. Left sidebar with navigation and stats, main area with searchable subscriber list, right sidebar with send newsletter panel and recent sends. Access at /newsletter-admin, configurable via siteConfig.newsletterAdmin. |
@@ -109,10 +109,16 @@ A brief description of each file in the codebase.
| File | Description |
| ------------------ | ------------------------------------------------------------------------------------------------------------------ |
| `schema.ts` | Database schema (posts, pages, viewCounts, pageViews, activeSessions, aiChats, newsletterSubscribers, newsletterSentPosts, contactMessages) with indexes for tag queries (by_tags), AI queries, and blog featured posts (by_blogFeatured). Posts and pages include showSocialFooter, showImageAtTop, blogFeatured, and contactForm fields for frontmatter control. |
| `schema.ts` | Database schema (posts, pages, viewCounts, pageViews, activeSessions, aiChats, newsletterSubscribers, newsletterSentPosts, contactMessages) with indexes for tag queries (by_tags), AI queries, blog featured posts (by_blogFeatured), source tracking (by_source), and vector search (by_embedding). Posts and pages include showSocialFooter, showImageAtTop, blogFeatured, contactForm, source, and embedding fields for frontmatter control, cloud CMS tracking, and semantic search. |
| `cms.ts` | CRUD mutations for dashboard cloud CMS: createPost, updatePost, deletePost, createPage, updatePage, deletePage, exportPostAsMarkdown, exportPageAsMarkdown. Posts/pages created via dashboard have `source: "dashboard"` (protected from sync overwrites). |
| `importAction.ts` | Server-side Convex action for direct URL import via Firecrawl API. Scrapes URL, converts to markdown, saves directly to database with `source: "dashboard"`. Requires FIRECRAWL_API_KEY environment variable. |
| `posts.ts` | Queries and mutations for blog posts, view counts, getAllTags, getPostsByTag, getRelatedPosts, and getBlogFeaturedPosts. Includes tag-based queries for tag pages and related posts functionality. |
| `pages.ts` | Queries and mutations for static pages |
| `search.ts` | Full text search queries across posts and pages |
| `semanticSearch.ts` | Vector-based semantic search action using OpenAI embeddings |
| `semanticSearchQueries.ts` | Internal queries for fetching post/page details by IDs for semantic search |
| `embeddings.ts` | Embedding generation actions using OpenAI text-embedding-ada-002 |
| `embeddingsQueries.ts` | Internal queries and mutations for embedding storage and retrieval |
| `stats.ts` | Real-time stats with aggregate component for O(log n) counts, page view recording, session heartbeat |
| `crons.ts` | Cron jobs for stale session cleanup (every 5 minutes), weekly newsletter digest (Sundays 9am UTC), and weekly stats summary (Mondays 9am UTC). Uses environment variables SITE_URL and SITE_NAME for email content. |
| `http.ts` | HTTP endpoints: sitemap (includes tag pages), API (update SITE_URL/SITE_NAME when forking, uses www.markdown.fast), Open Graph HTML generation for social crawlers |
@@ -230,6 +236,7 @@ Markdown files for static pages like About, Projects, Contact, Changelog.
| `send-newsletter.ts` | CLI tool for sending newsletter posts (npm run newsletter:send <slug>). Calls scheduleSendPostNewsletter mutation directly. |
| `send-newsletter-stats.ts` | CLI tool for sending weekly stats summary (npm run newsletter:send:stats). Calls scheduleSendStatsSummary mutation directly. |
| `sync-server.ts` | Local HTTP server for executing sync commands from Dashboard UI. Runs on localhost:3001 with optional token authentication. Whitelisted commands only. Part of markdown sync v2. |
| `export-db-posts.ts` | Exports dashboard-created posts and pages to markdown files in `content/blog/` and `content/pages/`. Only exports content with `source: "dashboard"`. Supports development and production environments via `npm run export:db` and `npm run export:db:prod`. |
### Sync Commands
@@ -248,6 +255,11 @@ Markdown files for static pages like About, Projects, Contact, Changelog.
- `npm run sync:all` - Run both content sync and discovery sync (development)
- `npm run sync:all:prod` - Run both content sync and discovery sync (production)
**Export dashboard content:**
- `npm run export:db` - Export dashboard posts/pages to content folders (development)
- `npm run export:db:prod` - Export dashboard posts/pages (production)
### Frontmatter Flow
Frontmatter is the YAML metadata at the top of each markdown file. Here is how it flows through the system: