wiki/content/pages/docs-semantic-search.md
Wayne Sutton 3c9feb071b feat: Make semantic search optional and disabled by default
- Add SemanticSearchConfig interface with enabled toggle to siteConfig.ts
- Default semantic search to disabled (enabled: false) to avoid blocking forks without OPENAI_API_KEY
- Update SearchModal.tsx to conditionally show mode toggle based on config
- Update sync-posts.ts to skip embedding generation when disabled
- Add semantic search toggle to Dashboard config generator
- Update FORK_CONFIG.md with Semantic Search Configuration section
- Update fork-config.json.example with semanticSearch option
- Update docs-semantic-search.md with enable/disable instructions
- Update changelog and documentation

When disabled (default):
- Search modal shows only keyword search (no mode toggle)
- Embedding generation skipped during sync
- No OpenAI API key required

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 22:22:50 -08:00


title: Semantic Search
slug: docs-semantic-search
published: true
order: 2
showInNav: false
layout: sidebar
rightSidebar: true
showFooter: true
docsSection: true
docsSectionOrder: 4
docsSectionGroup: Setup
docsSectionGroupIcon: Rocket

Semantic search finds content by meaning, not exact words. Ask questions naturally and find conceptually related content.

Press Cmd+K to open search, then Tab to switch to Semantic mode. The mode toggle appears only when semantic search is enabled (see Configuration). For exact word matching, see Keyword Search.


When to use each mode

| Use case                              | Mode     |
|---------------------------------------|----------|
| "authentication error" (exact term)   | Keyword  |
| "login problems" (conceptual)         | Semantic |
| Find specific code or commands        | Keyword  |
| "how do I deploy?" (question)         | Semantic |
| Need matches highlighted on page      | Keyword  |
| Not sure of exact terminology         | Semantic |

How semantic search works

┌─────────────────────────────────────────────────────────────────────────┐
│                     SEMANTIC SEARCH FLOW                                │
└─────────────────────────────────────────────────────────────────────────┘

  ┌──────────────┐    ┌─────────────────┐    ┌──────────────────┐
  │ User query:  │───▶│ OpenAI API      │───▶│ Query embedding  │
  │ "how to      │    │ text-embedding- │    │ [0.12, -0.45,    │
  │  deploy"     │    │ ada-002         │    │  0.78, ...]      │
  └──────────────┘    └─────────────────┘    └────────┬─────────┘
                                                      │
                                                      ▼
                                           ┌─────────────────────┐
                                           │ Convex vectorSearch │
                                           │ Compare to stored   │
                                           │ post/page embeddings│
                                           └──────────┬──────────┘
                                                      │
                                                      ▼
                                           ┌─────────────────────┐
                                           │ Results sorted by   │
                                           │ similarity score    │
                                           │ (0-100%)            │
                                           └─────────────────────┘
  1. Your query is converted to a vector (1536 numbers) using OpenAI's text-embedding-ada-002 embedding model
  2. Convex compares this vector to the stored embeddings for all posts and pages
  3. Results are ranked by similarity score (higher = more similar meaning)
  4. The top 15 results are returned
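
The ranking in steps 2 to 4 can be illustrated with a small in-memory sketch. This is not the project's code: in the real flow Convex vectorSearch does the comparison on the server. The sketch assumes cosine similarity as the metric, and the EmbeddedDoc and ScoredDoc shapes and both function names are hypothetical, chosen only for illustration.

```typescript
// Hypothetical shape for a stored post/page; field names are assumptions.
interface EmbeddedDoc {
  slug: string;
  embedding: number[];
}

interface ScoredDoc {
  slug: string;
  score: number; // cosine similarity, in the range [-1, 1]
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as "not similar".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query, sort descending, keep the top `limit`.
function rankBySimilarity(
  query: number[],
  docs: EmbeddedDoc[],
  limit: number = 15,
): ScoredDoc[] {
  return docs
    .map((doc) => ({ slug: doc.slug, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

A brute-force scan like this is only practical for small collections; a vector index avoids comparing the query against every stored embedding.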

Technical comparison

| Aspect       | Keyword | Semantic       |
|--------------|---------|----------------|
| Speed        | Instant | ~300ms         |
| Cost         | Free    | ~$0.0001/query |
| Highlighting | Yes     | No             |
| API required | No      | OpenAI         |

Configuration

Semantic search requires an OpenAI API key:

npx convex env set OPENAI_API_KEY sk-your-key-here

If the key is not configured:

  • Semantic search returns empty results
  • Keyword search continues to work normally
  • Sync script skips embedding generation

Semantic search is disabled by default to avoid requiring API keys for forks. Enable it via src/config/siteConfig.ts:

semanticSearch: {
  enabled: true, // Enable semantic search (requires OPENAI_API_KEY)
},

When disabled (default):

  • Search modal shows only keyword search (no mode toggle)
  • Embedding generation skipped during sync (saves API costs)
  • No OpenAI API key required

When enabled:

  • Search modal shows both Keyword and Semantic modes
  • Embeddings generated during npm run sync
  • Requires OPENAI_API_KEY in Convex

To enable semantic search:

  1. Set semanticSearch.enabled: true in siteConfig.ts
  2. Set OPENAI_API_KEY in Convex: npx convex env set OPENAI_API_KEY sk-xxx
  3. Run npm run sync to generate embeddings
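
The effect of the toggle can be summarized in a short sketch. Only the semanticSearch.enabled option comes from this page; the exact interface shape and the two helper functions below are hypothetical and are not the actual SearchModal.tsx or sync-posts.ts logic.

```typescript
// Assumed shape: this page only confirms an `enabled` toggle.
interface SemanticSearchConfig {
  enabled: boolean;
}

type SearchMode = "keyword" | "semantic";

// Hypothetical helper: which modes the search modal should offer.
// Keyword search is always available; semantic only when enabled.
function availableSearchModes(config: SemanticSearchConfig): SearchMode[] {
  return config.enabled ? ["keyword", "semantic"] : ["keyword"];
}

// Hypothetical helper: whether the sync step should generate embeddings.
// Generation is skipped when the feature is off or no API key is set.
function shouldGenerateEmbeddings(
  config: SemanticSearchConfig,
  apiKey: string | undefined,
): boolean {
  return config.enabled && apiKey !== undefined && apiKey.trim() !== "";
}
```

With the default { enabled: false }, the first helper returns only the keyword mode and the second returns false whether or not a key is present.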

How embeddings are generated

When you run npm run sync:

  1. Content syncs to Convex (posts and pages)
  2. Script checks for posts/pages without embeddings
  3. For each, combines title + content into text
  4. Calls OpenAI to generate 1536-dimension embedding
  5. Stores embedding in Convex database

Embeddings are generated once per post/page. If content changes, a new embedding is generated on the next sync.
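
Steps 2 and 3 can be sketched as follows, leaving out the OpenAI call itself. The SyncedDoc shape, the function names, and the blank-line separator between title and content are assumptions for illustration; the 8000-character cap is the approximate figure given under Limitations.

```typescript
// Hypothetical shape for a synced post or page.
interface SyncedDoc {
  title: string;
  content: string;
  embedding?: number[];
}

// Approximate cap taken from the Limitations section of this page.
const MAX_EMBEDDING_CHARS = 8000;

// Step 2: keep only the documents that do not have an embedding yet.
function docsNeedingEmbeddings(docs: SyncedDoc[]): SyncedDoc[] {
  return docs.filter((doc) => doc.embedding === undefined);
}

// Step 3: combine title and content into one string, then truncate it.
// The "\n\n" separator is an assumption, not confirmed by the source.
function buildEmbeddingText(
  doc: SyncedDoc,
  maxChars: number = MAX_EMBEDDING_CHARS,
): string {
  return `${doc.title}\n\n${doc.content}`.slice(0, maxChars);
}
```

Each string returned by buildEmbeddingText would then be sent to the embedding model, and the resulting vector stored alongside the document.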

Files involved

| File                            | Purpose                                        |
|---------------------------------|------------------------------------------------|
| convex/schema.ts                | embedding field and vectorIndex on posts/pages |
| convex/embeddings.ts            | Embedding generation actions                   |
| convex/embeddingsQueries.ts     | Queries for posts/pages without embeddings     |
| convex/semanticSearch.ts        | Vector search action                           |
| convex/semanticSearchQueries.ts | Queries for hydrating search results           |
| src/components/SearchModal.tsx  | Mode toggle (Tab to switch)                    |
| scripts/sync-posts.ts           | Triggers embedding generation after sync       |

Limitations

  • No highlighting: Semantic search finds meaning, not exact words, so matches can't be highlighted
  • API cost: Each search query costs ~$0.0001 (embedding generation)
  • Latency: ~300ms vs instant for keyword search (API round-trip)
  • Requires OpenAI key: Won't work without OPENAI_API_KEY configured
  • Input length: Content is truncated to ~8000 characters before embedding, so text beyond that point does not influence search results
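
For a rough sense of the per-query cost, the arithmetic can be written out as below. The ~$0.0001 figure is the approximate value quoted above; the function name and the 30-day default are assumptions, and the estimate ignores the cost of generating embeddings during sync.

```typescript
// Approximate per-query cost quoted in this page (USD).
const COST_PER_QUERY_USD = 0.0001;

// Hypothetical helper: estimated search cost over a period.
function estimateSearchCostUsd(queriesPerDay: number, days: number = 30): number {
  return queriesPerDay * days * COST_PER_QUERY_USD;
}
```

At 1,000 semantic searches a day, a 30-day period comes to roughly $3.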

Similarity scores

Results show a percentage score (0-100%):

  • 90%+: Very similar meaning
  • 70-90%: Related content
  • 50-70%: Loosely related
  • <50%: Weak match (may not be relevant)
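
The bands above can be expressed as a small lookup. This is a sketch, not the project's display code: both function names are hypothetical, and the conversion assumes the raw score is a similarity value where 1.0 means identical, with negative values clamped to 0%. The listed bands overlap at their edges, so the sketch assigns a boundary value such as 90% to the higher band.

```typescript
// Assumption: the raw score is a similarity where 1.0 means identical.
// Values outside [0, 1] are clamped before converting to a percentage.
function toPercent(score: number): number {
  const clamped = Math.min(1, Math.max(0, score));
  return Math.round(clamped * 100);
}

// Map a percentage to the descriptive bands listed above.
function describeScore(percent: number): string {
  if (percent >= 90) return "Very similar meaning";
  if (percent >= 70) return "Related content";
  if (percent >= 50) return "Loosely related";
  return "Weak match";
}
```

For example, a raw score of 0.5 becomes 50% and falls into the "Loosely related" band.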

Resources