wiki/content/pages/docs-semantic-search.md at main


mirror of https://github.com/waynesutton/markdown-site.git synced 2026-01-12 04:09:14 +00:00


Wayne Sutton 3c9feb071b feat: Make semantic search optional and disabled by default

- Add SemanticSearchConfig interface with enabled toggle to siteConfig.ts
- Default semantic search to disabled (enabled: false) to avoid blocking forks without OPENAI_API_KEY
- Update SearchModal.tsx to conditionally show mode toggle based on config
- Update sync-posts.ts to skip embedding generation when disabled
- Add semantic search toggle to Dashboard config generator
- Update FORK_CONFIG.md with Semantic Search Configuration section
- Update fork-config.json.example with semanticSearch option
- Update docs-semantic-search.md with enable/disable instructions
- Update changelog and documentation

When disabled (default):
- Search modal shows only keyword search (no mode toggle)
- Embedding generation skipped during sync
- No OpenAI API key required

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-05 22:22:50 -08:00

6.3 KiB


| title | slug | published | order | showInNav | layout | rightSidebar | showFooter | docsSection | docsSectionOrder | docsSectionGroup | docsSectionGroupIcon |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Semantic Search | docs-semantic-search | true | 2 | false | sidebar | true | true | true | 4 | Setup | Rocket |

Semantic Search

Semantic search finds content by meaning rather than by exact wording. Ask a question in plain language and it returns conceptually related posts and pages, even when they don't contain your exact words.

When semantic search is enabled, press Cmd+K to open search, then Tab to switch to Semantic mode. For exact word matching, see Keyword Search.

When to use each mode

| Use case | Mode |
|---|---|
| "authentication error" (exact term) | Keyword |
| "login problems" (conceptual) | Semantic |
| Find specific code or commands | Keyword |
| "how do I deploy?" (question) | Semantic |
| Need matches highlighted on page | Keyword |
| Not sure of exact terminology | Semantic |

How semantic search works

┌─────────────────────────────────────────────────────────────────────────┐
│                     SEMANTIC SEARCH FLOW                                │
└─────────────────────────────────────────────────────────────────────────┘

  ┌──────────────┐    ┌─────────────────┐    ┌──────────────────┐
  │ User query:  │───▶│ OpenAI API      │───▶│ Query embedding  │
  │ "how to      │    │ text-embedding- │    │ [0.12, -0.45,    │
  │  deploy"     │    │ ada-002         │    │  0.78, ...]      │
  └──────────────┘    └─────────────────┘    └────────┬─────────┘
                                                      │
                                                      ▼
                                           ┌─────────────────────┐
                                           │ Convex vectorSearch │
                                           │ Compare to stored   │
                                           │ post/page embeddings│
                                           └──────────┬──────────┘
                                                      │
                                                      ▼
                                           ┌─────────────────────┐
                                           │ Results sorted by   │
                                           │ similarity score    │
                                           │ (0-100%)            │
                                           └─────────────────────┘

1. Your query is converted to a vector of 1536 numbers (an embedding) by OpenAI's `text-embedding-ada-002` model.
2. Convex's `vectorSearch` compares this vector with the stored embeddings of all posts and pages.
3. Results are ranked by similarity score (a higher score means a closer meaning).
4. The top 15 results are returned.
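Convex does this comparison for you. The TypeScript sketch below only illustrates the idea with a brute-force scan over stored vectors. It assumes cosine similarity as the metric, and the `StoredEmbedding` shape and `rankBySimilarity` helper are hypothetical names rather than code from this project.

```typescript
// Hypothetical shape of a stored post/page embedding (illustrative only).
interface StoredEmbedding {
  slug: string;
  embedding: number[];
}

interface ScoredResult {
  slug: string;
  score: number; // cosine similarity in [-1, 1]
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero for all-zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force ranking: score every stored embedding, sort high to low, keep the top `limit`.
function rankBySimilarity(
  query: number[],
  stored: StoredEmbedding[],
  limit = 15, // matches the "top 15 results" described above
): ScoredResult[] {
  return stored
    .map((doc) => ({ slug: doc.slug, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

In the real app, Convex's `ctx.vectorSearch` does this ranking against the `vectorIndex` declared in `convex/schema.ts`, so application code never scans every embedding itself.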

Technical comparison

| Aspect | Keyword | Semantic |
|---|---|---|
| Speed | Instant | ~300 ms |
| Cost | Free | ~$0.0001/query |
| Highlighting | Yes | No |
| API required | No | OpenAI |
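Each semantic query needs one embedding call, so query cost scales linearly with volume. Here is a rough worked example that uses only the approximate ~$0.0001 figure above (current OpenAI pricing may differ). It covers query embeddings only, because post and page embeddings are generated once at sync time.

```latex
\text{cost} \approx N_{\text{queries}} \times \$0.0001,
\qquad \text{e.g. } 10{,}000 \times \$0.0001 = \$1.00
```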

Configuration

Semantic search requires an OpenAI API key:

```bash
npx convex env set OPENAI_API_KEY sk-your-key-here
```

If the key is not configured:

- Semantic search returns empty results.
- Keyword search continues to work normally.
- The sync script skips embedding generation.
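Here is a minimal sketch of the empty-results fallback. It assumes the search action checks for the key before calling OpenAI. `semanticSearchOrEmpty`, `SemanticDeps`, and the injected `embed` and `search` callbacks are hypothetical stand-ins for the OpenAI call and the Convex vector search, so the example runs without either service. The actual `convex/semanticSearch.ts` may be structured differently.

```typescript
interface SearchHit {
  slug: string;
  score: number;
}

// Hypothetical dependencies, injected so the sketch runs without OpenAI or Convex.
interface SemanticDeps {
  apiKey: string | undefined; // e.g. process.env.OPENAI_API_KEY inside a Convex action
  embed: (text: string, apiKey: string) => Promise<number[]>;
  search: (vector: number[], limit: number) => Promise<SearchHit[]>;
}

// Returns [] when no API key is configured instead of throwing,
// so the UI simply shows no semantic results.
async function semanticSearchOrEmpty(
  query: string,
  deps: SemanticDeps,
  limit = 15,
): Promise<SearchHit[]> {
  if (!deps.apiKey) {
    return []; // no key: skip the OpenAI call entirely
  }
  const vector = await deps.embed(query, deps.apiKey);
  return deps.search(vector, limit);
}
```

Returning an empty list rather than an error keeps the search modal usable, and keyword search is unaffected because it never touches this path.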

Enable/Disable Semantic Search

Semantic search is disabled by default so that forks work without an OpenAI API key. To enable it, edit `src/config/siteConfig.ts`:

```typescript
semanticSearch: {
  enabled: true, // Enable semantic search (requires OPENAI_API_KEY)
},
```

When disabled (default):

- The search modal shows only keyword search (no mode toggle).
- Embedding generation is skipped during sync, which saves API costs.
- No OpenAI API key is required.

When enabled:

- The search modal offers both Keyword and Semantic modes.
- Embeddings are generated during `npm run sync`.
- `OPENAI_API_KEY` must be set in your Convex environment.
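The sketch below shows how one `enabled` flag could drive each of the behaviors listed above. It assumes a minimal `SemanticSearchConfig` shape containing only that flag. `deriveSearchBehavior`, `nextMode`, and the `SearchBehavior` fields are hypothetical names for illustration, not functions from the codebase.

```typescript
// Assumed minimal shape; the real interface in src/config/siteConfig.ts may have more fields.
interface SemanticSearchConfig {
  enabled: boolean;
}

type SearchMode = "keyword" | "semantic";

interface SearchBehavior {
  modes: SearchMode[];         // modes offered in the search modal
  showModeToggle: boolean;     // whether Tab switches modes
  generateEmbeddings: boolean; // whether `npm run sync` creates embeddings
  requiresOpenAIKey: boolean;  // whether OPENAI_API_KEY must be set in Convex
}

function deriveSearchBehavior(config: SemanticSearchConfig): SearchBehavior {
  const enabled = config.enabled;
  return {
    modes: enabled ? ["keyword", "semantic"] : ["keyword"],
    showModeToggle: enabled,
    generateEmbeddings: enabled,
    requiresOpenAIKey: enabled,
  };
}

// Tab cycles through the available modes; with only keyword search it stays on keyword.
function nextMode(current: SearchMode, behavior: SearchBehavior): SearchMode {
  const i = behavior.modes.indexOf(current);
  return behavior.modes[(i + 1) % behavior.modes.length];
}
```

Deriving every behavior from a single flag means the modal, the sync script, and the key requirement cannot drift out of step with each other.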

To enable semantic search:

1. Set `semanticSearch.enabled: true` in `src/config/siteConfig.ts`.
2. Set `OPENAI_API_KEY` in Convex: `npx convex env set OPENAI_API_KEY sk-xxx`
3. Run `npm run sync` to generate embeddings.

How embeddings are generated

When you run `npm run sync`:

1. Content (posts and pages) syncs to Convex.
2. The script finds posts and pages that don't have an embedding yet.
3. For each one, it combines the title and content into a single text.
4. It calls OpenAI to generate a 1536-dimension embedding.
5. It stores the embedding in the Convex database.

Embeddings are generated once per post/page. If content changes, a new embedding is generated on the next sync.
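The generation steps can be sketched as a loop over documents that lack an embedding. Everything here is an assumption for illustration: the `SyncDoc` shape, the injected `embed` and `saveEmbedding` callbacks (stand-ins for the OpenAI call and the Convex write), the blank-line separator between title and content, and the 8,000-character cutoff, which follows the approximate limit listed under Limitations.

```typescript
interface SyncDoc {
  id: string;
  title: string;
  content: string;
  embedding?: number[]; // absent until an embedding has been generated
}

const MAX_EMBEDDING_CHARS = 8000; // approximate limit from the Limitations list

// Combine title and content, then truncate so the input stays within the model's limit.
function buildEmbeddingText(doc: SyncDoc): string {
  return `${doc.title}\n\n${doc.content}`.slice(0, MAX_EMBEDDING_CHARS);
}

// Generate and store embeddings only for documents that don't have one yet.
// Returns how many embeddings were generated.
async function generateMissingEmbeddings(
  docs: SyncDoc[],
  embed: (text: string) => Promise<number[]>,
  saveEmbedding: (id: string, embedding: number[]) => Promise<void>,
): Promise<number> {
  let generated = 0;
  for (const doc of docs) {
    if (doc.embedding && doc.embedding.length > 0) continue; // already embedded
    const embedding = await embed(buildEmbeddingText(doc));
    await saveEmbedding(doc.id, embedding);
    generated++;
  }
  return generated;
}
```

Processing one document at a time keeps the sketch simple and avoids bursts of API calls. A real script could instead batch several texts per request, because OpenAI's embeddings endpoint accepts an array of inputs. The sketch does not cover re-embedding changed content, because this page doesn't describe how the sync script detects changes.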

Files involved

| File | Purpose |
|---|---|
| `convex/schema.ts` | `embedding` field and `vectorIndex` on posts/pages |
| `convex/embeddings.ts` | Embedding generation actions |
| `convex/embeddingsQueries.ts` | Queries for posts/pages without embeddings |
| `convex/semanticSearch.ts` | Vector search action |
| `convex/semanticSearchQueries.ts` | Queries for hydrating search results |
| `src/components/SearchModal.tsx` | Mode toggle (Tab to switch) |
| `scripts/sync-posts.ts` | Triggers embedding generation after sync |

Limitations

- No highlighting: semantic search matches meaning rather than exact words, so there are no matched terms to highlight.
- API cost: each search query costs about $0.0001 to embed.
- Latency: about 300 ms per query because of the OpenAI API round-trip, compared with near-instant keyword search.
- Requires an OpenAI key: semantic search doesn't work unless `OPENAI_API_KEY` is configured.
- Token limit: content is truncated to about 8,000 characters before it is embedded.

Similarity scores

Results show a percentage score (0-100%):

- 90%+: very similar meaning
- 70-90%: related content
- 50-70%: loosely related
- <50%: weak match (may not be relevant)
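Here is a minimal sketch of turning a raw score into the percentage and bands above. It assumes the raw score lies between 0 and 1 and clamps anything outside that range. It also applies the bands to the rounded percentage the user sees, with a score on a boundary going to the higher band. `toPercent` and `describeScore` are hypothetical names, and the band labels come from the list above.

```typescript
type MatchStrength =
  | "Very similar meaning"
  | "Related content"
  | "Loosely related"
  | "Weak match";

// Convert a raw score (assumed to be in [0, 1]) to a whole-number percentage.
function toPercent(score: number): number {
  const clamped = Math.min(1, Math.max(0, score));
  return Math.round(clamped * 100);
}

// Bands use the rounded percentage; a boundary value (e.g. exactly 70%) goes to the higher band.
function describeScore(score: number): MatchStrength {
  const pct = toPercent(score);
  if (pct >= 90) return "Very similar meaning";
  if (pct >= 70) return "Related content";
  if (pct >= 50) return "Loosely related";
  return "Weak match";
}
```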

Resources

- Convex Vector Search
- OpenAI Embeddings
- Keyword Search: full-text search documentation
