mirror of
https://github.com/waynesutton/markdown-site.git
synced 2026-01-12 04:09:14 +00:00
| title | slug | published | order | showInNav | layout | rightSidebar | showFooter | docsSection | docsSectionOrder | docsSectionGroup | docsSectionGroupIcon |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Semantic Search | docs-semantic-search | true | 2 | false | sidebar | true | true | true | 4 | Setup | Rocket |
Semantic Search
Semantic search finds content by meaning, not exact words. Ask questions naturally and find conceptually related content.
Press Cmd+K to open search, then press Tab to switch to Semantic mode. For exact word matching, see Keyword Search.
When to use each mode
| Use case | Mode |
|---|---|
| "authentication error" (exact term) | Keyword |
| "login problems" (conceptual) | Semantic |
| Find specific code or commands | Keyword |
| "how do I deploy?" (question) | Semantic |
| Need matches highlighted on page | Keyword |
| Not sure of exact terminology | Semantic |
How semantic search works
┌─────────────────────────────────────────────────────────────────────────┐
│                          SEMANTIC SEARCH FLOW                           │
└─────────────────────────────────────────────────────────────────────────┘

┌──────────────┐    ┌─────────────────┐    ┌──────────────────┐
│ User query:  │───▶│ OpenAI API      │───▶│ Query embedding  │
│ "how to      │    │ text-embedding- │    │ [0.12, -0.45,    │
│  deploy"     │    │ ada-002         │    │  0.78, ...]      │
└──────────────┘    └─────────────────┘    └────────┬─────────┘
                                                    │
                                                    ▼
                                         ┌─────────────────────┐
                                         │ Convex vectorSearch │
                                         │ Compare to stored   │
                                         │ post/page embeddings│
                                         └──────────┬──────────┘
                                                    │
                                                    ▼
                                         ┌─────────────────────┐
                                         │ Results sorted by   │
                                         │ similarity score    │
                                         │ (0-100%)            │
                                         └─────────────────────┘
- Your query is converted to a vector (1536 numbers) using OpenAI's embedding model
- Convex compares this vector to stored embeddings for all posts and pages
- Results are ranked by similarity score (higher = more similar meaning)
- The top 15 results are returned
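The ranking step above can be sketched in plain TypeScript. This is only an illustration of the idea, assuming cosine similarity as the comparison metric; in the real system Convex's vectorSearch does this work server-side, and the names `StoredDoc`, `cosineSimilarity`, and `rankBySimilarity` are hypothetical.

```typescript
// Illustrative sketch only: Convex performs the real comparison server-side.
// StoredDoc, cosineSimilarity, and rankBySimilarity are hypothetical names.
type StoredDoc = { slug: string; embedding: number[] };
type RankedDoc = { slug: string; score: number };

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero-length vector has no direction, so treat it as "not similar".
  return denom === 0 ? 0 : dot / denom;
}

// Score every stored document against the query, sort descending,
// and keep the top `limit` results (15 by default, as described above).
function rankBySimilarity(
  query: number[],
  docs: StoredDoc[],
  limit: number = 15,
): RankedDoc[] {
  return docs
    .map((doc) => ({
      slug: doc.slug,
      score: cosineSimilarity(query, doc.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

Real embeddings have 1536 dimensions, but the same function works on any pair of equal-length vectors, which makes it easy to try with two or three numbers.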
Technical comparison
| Aspect | Keyword | Semantic |
|---|---|---|
| Speed | Instant | ~300ms |
| Cost | Free | ~$0.0001/query |
| Highlighting | Yes | No |
| API required | No | OpenAI |
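To put the per-query cost in perspective, here is a rough monthly estimate. It assumes the approximate ~$0.0001/query figure from the table and ignores the one-time cost of generating embeddings during sync; `estimateMonthlyCostUsd` is a hypothetical helper, not part of the codebase.

```typescript
// Approximate per-query cost taken from the comparison table above.
const COST_PER_QUERY_USD = 0.0001;

// Hypothetical helper: rough monthly spend on semantic search queries.
// Does not include embedding generation during content sync.
function estimateMonthlyCostUsd(
  queriesPerDay: number,
  daysPerMonth: number = 30,
): number {
  if (queriesPerDay < 0 || daysPerMonth < 0) {
    throw new RangeError("Inputs must be non-negative");
  }
  return queriesPerDay * daysPerMonth * COST_PER_QUERY_USD;
}
```

At 1,000 semantic searches per day, this works out to roughly $3 per month.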
Configuration
Semantic search requires an OpenAI API key:
npx convex env set OPENAI_API_KEY sk-your-key-here
If the key is not configured:
- Semantic search returns empty results
- Keyword search continues to work normally
- Sync script skips embedding generation
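The fallback behavior can be sketched as a guard in front of the embedding call. This is a minimal sketch, not the actual code in convex/semanticSearch.ts: the function name and the injected `embed` and `vectorSearch` callbacks are hypothetical stand-ins for the OpenAI call and the Convex vector search.

```typescript
// Hypothetical types and names, for illustration only.
type SearchResult = { slug: string; title: string; score: number };
type EmbedFn = (text: string, apiKey: string) => Promise<number[]>;
type VectorSearchFn = (embedding: number[]) => Promise<SearchResult[]>;

// Returns an empty list when no API key is configured, so the UI can
// keep working and users can fall back to keyword search.
async function semanticSearchOrEmpty(
  query: string,
  apiKey: string | undefined,
  embed: EmbedFn,
  vectorSearch: VectorSearchFn,
): Promise<SearchResult[]> {
  if (!apiKey) {
    // No key: skip the API call entirely instead of throwing.
    return [];
  }
  const embedding = await embed(query.trim(), apiKey);
  return vectorSearch(embedding);
}
```

Passing the two callbacks in as parameters keeps the guard testable without network access; in a Convex action the key would typically be read from `process.env.OPENAI_API_KEY`.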
How embeddings are generated
When you run npm run sync:
- Content syncs to Convex (posts and pages)
- Script checks for posts/pages without embeddings
- For each one, the script combines the title and content into a single text
- It calls OpenAI to generate a 1536-dimension embedding
- It stores the embedding in the Convex database
Embeddings are generated once per post/page. If content changes, a new embedding is generated on the next sync.
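The text preparation step might look like the sketch below. It assumes the title and content are joined with a blank line and cut at 8000 characters (the truncation limit noted under Limitations); the `ContentDoc` shape and both function names are hypothetical. How changed content is detected is not described here, so the sketch only selects items that have no embedding yet.

```typescript
// Hypothetical document shape, for illustration only.
type ContentDoc = {
  slug: string;
  title: string;
  content: string;
  embedding?: number[];
};

// Approximate truncation limit noted in the Limitations section.
const MAX_EMBEDDING_CHARS = 8000;

// Combine title and content into the text sent to the embedding model.
// The blank-line separator is an assumption.
function buildEmbeddingText(doc: ContentDoc): string {
  const text = `${doc.title}\n\n${doc.content}`;
  return text.slice(0, MAX_EMBEDDING_CHARS);
}

// Select posts/pages that do not have an embedding yet.
function docsNeedingEmbeddings(docs: ContentDoc[]): ContentDoc[] {
  return docs.filter((doc) => doc.embedding === undefined);
}
```

Because truncation happens after the title is prepended, the title always survives even for very long pages.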
Files involved
| File | Purpose |
|---|---|
| convex/schema.ts | embedding field and vectorIndex on posts/pages |
| convex/embeddings.ts | Embedding generation actions |
| convex/embeddingsQueries.ts | Queries for posts/pages without embeddings |
| convex/semanticSearch.ts | Vector search action |
| convex/semanticSearchQueries.ts | Queries for hydrating search results |
| src/components/SearchModal.tsx | Mode toggle (Tab to switch) |
| scripts/sync-posts.ts | Triggers embedding generation after sync |
Limitations
- No highlighting: Semantic search finds meaning, not exact words, so matches can't be highlighted
- API cost: Each search query costs ~$0.0001 (embedding generation)
- Latency: ~300ms vs instant for keyword search (API round-trip)
- Requires OpenAI key: Won't work without OPENAI_API_KEY configured
- Token limit: Content is truncated to ~8000 characters for embedding
Similarity scores
Results show a percentage score (0-100%):
- 90%+: Very similar meaning
- 70-90%: Related content
- 50-70%: Loosely related
- <50%: Weak match (may not be relevant)
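The score bands above can be expressed as a small lookup. This sketch assumes the raw similarity score is a number where 1 means identical meaning, and that negative values are clamped to 0% for display; the actual conversion in SearchModal.tsx may differ, and `toPercent` and `describeScore` are hypothetical names.

```typescript
type MatchStrength =
  | "Very similar meaning"
  | "Related content"
  | "Loosely related"
  | "Weak match";

// Convert a raw similarity score to a whole-number percentage.
// Clamping to [0, 1] is an assumption made for display purposes.
function toPercent(score: number): number {
  const clamped = Math.min(1, Math.max(0, score));
  return Math.round(clamped * 100);
}

// Map a percentage to the bands listed above. Boundary values
// (90, 70, 50) fall into the higher band.
function describeScore(percent: number): MatchStrength {
  if (percent >= 90) return "Very similar meaning";
  if (percent >= 70) return "Related content";
  if (percent >= 50) return "Loosely related";
  return "Weak match";
}
```

For example, a raw score of 0.876 displays as 88% and falls in the "Related content" band.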
Resources
- Convex Vector Search
- OpenAI Embeddings
- Keyword Search - Full-text search documentation