wiki/content/pages/docs-semantic-search.md at 5a8df466810ed627b248bd0a66dfd76bc082f6ab

mirror of https://github.com/waynesutton/markdown-site.git synced 2026-01-12 04:09:14 +00:00

Wayne Sutton 5a8df46681 feat: Add semantic search with vector embeddings

Add vector-based semantic search to complement keyword search.
  Users can toggle between "Keyword" and "Semantic" modes in the
  search modal (Cmd+K, then Tab to switch).

  Semantic search:
  - Uses OpenAI text-embedding-ada-002 (1536 dimensions)
  - Finds content by meaning, not exact words
  - Shows similarity scores as percentages
  - ~300ms latency, ~$0.0001/query
  - Graceful fallback if OPENAI_API_KEY not set

  New files:
  - convex/embeddings.ts - Embedding generation actions
  - convex/embeddingsQueries.ts - Queries/mutations for embeddings
  - convex/semanticSearch.ts - Vector search action
  - convex/semanticSearchQueries.ts - Result hydration queries
  - content/pages/docs-search.md - Keyword search docs
  - content/pages/docs-semantic-search.md - Semantic search docs

  Changes:
  - convex/schema.ts: Add embedding field and by_embedding vectorIndex
  - SearchModal.tsx: Add mode toggle (TextAa/Brain icons)
  - sync-posts.ts: Generate embeddings after content sync
  - global.css: Search mode toggle styles

  Documentation updated:
  - changelog.md, TASK.md, files.md, about.md, home.md

  Configuration:
  npx convex env set OPENAI_API_KEY sk-your-key

  Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

  Status: Ready to commit. All semantic search files are staged. The TypeScript warnings are pre-existing (unused variables) and don't affect the build.

2026-01-05 18:30:48 -08:00

5.6 KiB

title	slug	published	order	showInNav	layout	rightSidebar	showFooter	docsSection	docsSectionOrder	docsSectionGroup	docsSectionGroupIcon
Semantic Search	docs-semantic-search	true	2	false	sidebar	true	true	true	4	Setup	Rocket

Semantic Search

Semantic search finds content by meaning, not exact words. Ask questions naturally and find conceptually related content.

Press Cmd+K then Tab to switch to Semantic mode. For exact word matching, see Keyword Search.

When to use each mode

Use case	Mode
"authentication error" (exact term)	Keyword
"login problems" (conceptual)	Semantic
Find specific code or commands	Keyword
"how do I deploy?" (question)	Semantic
Need matches highlighted on page	Keyword
Not sure of exact terminology	Semantic

How semantic search works

┌─────────────────────────────────────────────────────────────────────────┐
│                     SEMANTIC SEARCH FLOW                                │
└─────────────────────────────────────────────────────────────────────────┘

  ┌──────────────┐    ┌─────────────────┐    ┌──────────────────┐
  │ User query:  │───▶│ OpenAI API      │───▶│ Query embedding  │
  │ "how to      │    │ text-embedding- │    │ [0.12, -0.45,    │
  │  deploy"     │    │ ada-002         │    │  0.78, ...]      │
  └──────────────┘    └─────────────────┘    └────────┬─────────┘
                                                      │
                                                      ▼
                                           ┌─────────────────────┐
                                           │ Convex vectorSearch │
                                           │ Compare to stored   │
                                           │ post/page embeddings│
                                           └──────────┬──────────┘
                                                      │
                                                      ▼
                                           ┌─────────────────────┐
                                           │ Results sorted by   │
                                           │ similarity score    │
                                           │ (0-100%)            │
                                           └─────────────────────┘

1. Your query is converted to a vector (1536 numbers) using OpenAI's embedding model.
2. Convex compares this vector to the stored embeddings for all posts and pages.
3. Results are ranked by similarity score (higher = more similar meaning).
4. The top 15 results are returned.
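
The ranking step can be illustrated in plain TypeScript. This is a minimal sketch, not the code in `convex/semanticSearch.ts`: it assumes cosine similarity as the comparison metric, and the names `StoredDoc`, `cosineSimilarity`, and `rankBySimilarity` are hypothetical. In the real flow the comparison runs inside Convex's vector index rather than in application code.

```typescript
// Illustrative only: the vector index performs this comparison internally.
type StoredDoc = { slug: string; embedding: number[] };
type Scored = { slug: string; score: number };

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored document against the query, sort descending, keep the top `limit`.
function rankBySimilarity(query: number[], docs: StoredDoc[], limit = 15): Scored[] {
  return docs
    .map((doc) => ({ slug: doc.slug, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Toy 3-dimensional vectors stand in for real 1536-dimension embeddings.
const ranked = rankBySimilarity(
  [1, 0, 0],
  [
    { slug: "theme-colors", embedding: [0, 1, 0] },
    { slug: "deploy-guide", embedding: [0.9, 0.1, 0] },
  ],
);
console.log(ranked.map((r) => r.slug));
```

The default `limit` of 15 mirrors the result count described above; the slugs and vectors are made up for the example.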

Technical comparison

Aspect	Keyword	Semantic
Speed	Instant	~300ms
Cost	Free	~$0.0001/query
Highlighting	Yes	No
API required	No	OpenAI

Configuration

Semantic search requires an OpenAI API key:

npx convex env set OPENAI_API_KEY sk-your-key-here

If the key is not configured:

- Semantic search returns empty results
- Keyword search continues to work normally
- Sync script skips embedding generation
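
The fallback behavior could look roughly like the sketch below. It is an assumption about the shape of the logic, not the actual action in `convex/semanticSearch.ts`; `semanticSearchWithFallback`, `SearchResult`, and the `VectorSearch` callback type are hypothetical names introduced for illustration.

```typescript
type SearchResult = { slug: string; score: number };

// Hypothetical signature for whatever performs the embedding call and vector lookup.
type VectorSearch = (query: string, apiKey: string) => Promise<SearchResult[]>;

async function semanticSearchWithFallback(
  query: string,
  apiKey: string | undefined,
  runSearch: VectorSearch,
): Promise<SearchResult[]> {
  // No key configured: return an empty list instead of throwing,
  // so the search modal stays usable and keyword mode is unaffected.
  if (!apiKey) return [];
  return runSearch(query, apiKey);
}

// Usage with a stub in place of the real search.
const stubSearch: VectorSearch = async () => [{ slug: "deploy-guide", score: 0.91 }];

semanticSearchWithFallback("how do I deploy?", undefined, stubSearch).then((results) => {
  console.log(results.length); // no key, so the stub is never called
});
```

Returning an empty list keeps the caller's code path identical whether or not a key is set, which is one simple way to get the graceful degradation described above.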

How embeddings are generated

When you run npm run sync:

1. Content (posts and pages) syncs to Convex.
2. The script checks for posts and pages without embeddings.
3. For each one, it combines the title and content into a single text string.
4. It calls OpenAI to generate a 1536-dimension embedding.
5. It stores the embedding in the Convex database.

Each post or page is embedded once. If its content changes, a new embedding is generated on the next sync.
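
The selection and text-preparation steps above can be sketched as pure functions. This is illustrative only: `Doc`, `needsEmbedding`, and `buildEmbeddingInput` are hypothetical names, the blank-line separator between title and content is an assumption, and the 8000-character cap comes from the truncation limit listed under Limitations. The OpenAI call and change detection are not shown.

```typescript
// Character budget taken from the "~8000 characters" limit in this doc.
const MAX_EMBEDDING_CHARS = 8000;

type Doc = { title: string; content: string; embedding?: number[] };

// Keep only the documents that do not have an embedding yet.
function needsEmbedding(docs: Doc[]): Doc[] {
  return docs.filter((doc) => doc.embedding === undefined);
}

// Combine title and content, then truncate to the character budget.
// The "\n\n" separator is an assumption made for this sketch.
function buildEmbeddingInput(doc: Doc, maxChars = MAX_EMBEDDING_CHARS): string {
  return `${doc.title}\n\n${doc.content}`.slice(0, maxChars);
}

const docs: Doc[] = [
  { title: "Deploying", content: "Run the build, then deploy.", embedding: [0.1, 0.2] },
  { title: "Semantic Search", content: "Finds content by meaning." },
];

for (const doc of needsEmbedding(docs)) {
  console.log(buildEmbeddingInput(doc));
}
```

Because truncation happens on the combined string, the title always survives and only the tail of long content is dropped.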

Files involved

File	Purpose
`convex/schema.ts`	`embedding` field and `vectorIndex` on posts/pages
`convex/embeddings.ts`	Embedding generation actions
`convex/embeddingsQueries.ts`	Queries for posts/pages without embeddings
`convex/semanticSearch.ts`	Vector search action
`convex/semanticSearchQueries.ts`	Queries for hydrating search results
`src/components/SearchModal.tsx`	Mode toggle (Tab to switch)
`scripts/sync-posts.ts`	Triggers embedding generation after sync

Limitations

- No highlighting: Semantic search matches on meaning, not exact words, so there are no exact matches to highlight
- API cost: Each search query costs ~$0.0001 (for generating the query embedding)
- Latency: ~300ms, versus instant for keyword search (API round-trip)
- Requires OpenAI key: Won't work without OPENAI_API_KEY configured
- Length limit: Content is truncated to ~8000 characters before embedding, so text beyond that point does not influence results

Similarity scores

Results show a percentage score (0-100%):

- 90%+: Very similar meaning
- 70-90%: Related content
- 50-70%: Loosely related
- <50%: Weak match (may not be relevant)
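
As a rough illustration of how a raw score might map onto these bands, the sketch below assumes the raw similarity is a number between 0 and 1 and that boundary values (exactly 90, 70, 50) fall into the higher band. `toPercent` and `describeMatch` are hypothetical helpers, not functions from `SearchModal.tsx`.

```typescript
type MatchLabel =
  | "Very similar meaning"
  | "Related content"
  | "Loosely related"
  | "Weak match";

// Convert a raw similarity score (assumed 0..1) into a whole-number percentage.
// Out-of-range values are clamped so the display always stays within 0-100%.
function toPercent(score: number): number {
  const clamped = Math.min(1, Math.max(0, score));
  return Math.round(clamped * 100);
}

// Map a percentage onto the bands listed above.
function describeMatch(percent: number): MatchLabel {
  if (percent >= 90) return "Very similar meaning";
  if (percent >= 70) return "Related content";
  if (percent >= 50) return "Loosely related";
  return "Weak match";
}

const percent = toPercent(0.873);
console.log(`${percent}%: ${describeMatch(percent)}`);
```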

Resources

- Convex Vector Search
- OpenAI Embeddings
- Keyword Search - Full-text search documentation
