# Semantic Search

---
Type: page
Date: 2026-01-11
---

## Semantic Search

Semantic search finds content by meaning, not exact words. Ask questions naturally and find conceptually related content.

Press `Cmd+K` then `Tab` to switch to Semantic mode. For exact word matching, see [Keyword Search](/docs-search).

---

### When to use each mode

| Use case | Mode |
|----------|------|
| "authentication error" (exact term) | Keyword |
| "login problems" (conceptual) | Semantic |
| Find specific code or commands | Keyword |
| "how do I deploy?" (question) | Semantic |
| Need matches highlighted on page | Keyword |
| Not sure of exact terminology | Semantic |

### How semantic search works

```
┌─────────────────────────────────────────────────────────────────────────┐
│                          SEMANTIC SEARCH FLOW                           │
└─────────────────────────────────────────────────────────────────────────┘

┌──────────────┐    ┌─────────────────┐    ┌──────────────────┐
│  User query: │───▶│ OpenAI API      │───▶│ Query embedding  │
│  "how to     │    │ text-embedding- │    │ [0.12, -0.45,    │
│   deploy"    │    │ ada-002         │    │  0.78, ...]      │
└──────────────┘    └─────────────────┘    └────────┬─────────┘
                                                    │
                                                    ▼
                                         ┌─────────────────────┐
                                         │ Convex vectorSearch │
                                         │ Compare to stored   │
                                         │ post/page embeddings│
                                         └──────────┬──────────┘
                                                    │
                                                    ▼
                                         ┌─────────────────────┐
                                         │ Results sorted by   │
                                         │ similarity score    │
                                         │ (0-100%)            │
                                         └─────────────────────┘
```

1. Your query is converted to a vector (1536 numbers) using OpenAI's embedding model
2. Convex compares this vector to stored embeddings for all posts and pages
3. Results are ranked by similarity score (higher = more similar meaning)
4. The top 15 results are returned
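
The ranking in steps 2 to 4 can be sketched in a few lines. This is an illustration only: Convex's `vectorSearch` does the comparison internally, and the sketch assumes cosine similarity as the measure. The names `StoredDoc`, `cosineSimilarity`, and `rankBySimilarity` are hypothetical and do not come from the project's code.

```typescript
// Illustrative sketch: ranks stored embeddings against a query embedding.
// Assumption: similarity is cosine similarity; all names here are hypothetical.
type StoredDoc = { id: string; embedding: number[] };
type Scored = { id: string; score: number };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rankBySimilarity(
  query: number[],
  docs: StoredDoc[],
  limit = 15, // the page states that the top 15 results are returned
): Scored[] {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, limit);
}
```

Real embeddings have 1536 dimensions, but the same function works on short vectors, which makes it easy to try with toy data.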

### Technical comparison

| Aspect | Keyword | Semantic |
|--------|---------|----------|
| Speed | Instant | ~300ms |
| Cost | Free | ~$0.0001/query |
| Highlighting | Yes | No |
| API required | No | OpenAI |

### Configuration

Semantic search requires an OpenAI API key, set as a Convex environment variable:

```bash
npx convex env set OPENAI_API_KEY sk-your-key-here
```

If the key is not configured:
- Semantic search returns empty results
- Keyword search continues to work normally
- The sync script skips embedding generation
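
The "empty results when the key is missing" behavior can be sketched as a guard at the top of the search function. This is a minimal sketch under stated assumptions, not the code in `convex/semanticSearch.ts`: `semanticSearchOrEmpty`, `EmbedQuery`, and `VectorSearch` are hypothetical names, and the embedding and search steps are passed in as functions so the example runs without network access.

```typescript
// Hypothetical sketch of the missing-key fallback described above.
type SearchResult = { id: string; title: string; score: number };

// Assumed dependencies, injected so the sketch needs no real API calls.
type EmbedQuery = (query: string, apiKey: string) => Promise<number[]>;
type VectorSearch = (vector: number[]) => Promise<SearchResult[]>;

async function semanticSearchOrEmpty(
  query: string,
  apiKey: string | undefined,
  embedQuery: EmbedQuery,
  vectorSearch: VectorSearch,
): Promise<SearchResult[]> {
  // No key configured: return an empty list instead of throwing,
  // so the search UI can keep working in keyword mode.
  if (!apiKey) return [];
  const vector = await embedQuery(query, apiKey);
  return vectorSearch(vector);
}
```

Returning an empty list rather than an error keeps the failure quiet for forks that never set a key, which matches the behavior listed above.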

### Enable/Disable Semantic Search

Semantic search is **disabled by default** to avoid requiring API keys for forks. Enable it via `src/config/siteConfig.ts`:

```typescript
semanticSearch: {
  enabled: true, // Enable semantic search (requires OPENAI_API_KEY)
},
```

When disabled (default):
- Search modal shows only keyword search (no mode toggle)
- Embedding generation is skipped during sync (saves API costs)
- No OpenAI API key required

When enabled:
- Search modal shows both Keyword and Semantic modes
- Embeddings are generated during `npm run sync`
- Requires `OPENAI_API_KEY` in Convex

To enable semantic search:
1. Set `semanticSearch.enabled: true` in `src/config/siteConfig.ts`
2. Set `OPENAI_API_KEY` in Convex: `npx convex env set OPENAI_API_KEY sk-xxx`
3. Run `npm run sync` to generate embeddings
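
How the flag might drive the search modal can be sketched as follows. Only the `semanticSearch.enabled` field comes from this page; the `SearchMode` type and the `availableModes` and `nextMode` helpers are hypothetical and are not taken from `SearchModal.tsx`.

```typescript
// Hypothetical sketch: derive the selectable search modes from the config flag.
type SearchMode = "keyword" | "semantic";

interface SiteConfigSlice {
  semanticSearch?: { enabled: boolean };
}

function availableModes(config: SiteConfigSlice): SearchMode[] {
  // Disabled (or missing) config means keyword search only, so no toggle is shown.
  return config.semanticSearch?.enabled ? ["keyword", "semantic"] : ["keyword"];
}

// Assumed Tab behavior: cycle to the next available mode.
function nextMode(current: SearchMode, modes: SearchMode[]): SearchMode {
  const index = modes.indexOf(current);
  return modes[(index + 1) % modes.length];
}
```

With semantic search disabled, `nextMode` always returns `"keyword"`, so pressing `Tab` has no visible effect.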

### How embeddings are generated

When you run `npm run sync`:

1. Content syncs to Convex (posts and pages)
2. The script checks for posts/pages without embeddings
3. For each one, it combines the title and content into a single text
4. It calls OpenAI to generate a 1536-dimension embedding
5. It stores the embedding in the Convex database

Embeddings are generated once per post/page. If content changes, a new embedding is generated on the next sync.
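
Steps 2 and 3 can be sketched without any API calls. This is a sketch under assumptions, not the project's sync code: `SyncedDoc`, `needsEmbedding`, and `buildEmbeddingInput` are hypothetical names, the blank-line separator between title and content is a guess, and the 8000-character cap is the approximate figure given under Limitations.

```typescript
// Hypothetical sketch of selecting documents and preparing embedding input.
type SyncedDoc = {
  id: string;
  title: string;
  content: string;
  embedding?: number[]; // absent until an embedding has been stored
};

// Approximate cap taken from the Limitations section of this page.
const MAX_EMBEDDING_CHARS = 8000;

// Step 2: keep only posts/pages that have no embedding yet.
function needsEmbedding(docs: SyncedDoc[]): SyncedDoc[] {
  return docs.filter((doc) => doc.embedding === undefined);
}

// Step 3: combine title and content, then truncate to the character cap.
// The "\n\n" separator is an assumption made for this sketch.
function buildEmbeddingInput(
  title: string,
  content: string,
  maxChars = MAX_EMBEDDING_CHARS,
): string {
  const combined = `${title}\n\n${content}`;
  return combined.length > maxChars ? combined.slice(0, maxChars) : combined;
}
```

Because truncation keeps the start of the text, material near the end of a long post may not influence its embedding.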

### Files involved

| File | Purpose |
| ---- | ------- |
| `convex/schema.ts` | `embedding` field and `vectorIndex` on posts/pages |
| `convex/embeddings.ts` | Embedding generation actions |
| `convex/embeddingsQueries.ts` | Queries for posts/pages without embeddings |
| `convex/semanticSearch.ts` | Vector search action |
| `convex/semanticSearchQueries.ts` | Queries for hydrating search results |
| `src/components/SearchModal.tsx` | Mode toggle (Tab to switch) |
| `scripts/sync-posts.ts` | Triggers embedding generation after sync |

### Limitations

- **No highlighting**: Semantic search finds meaning, not exact words, so matches can't be highlighted
- **API cost**: Each search query costs ~$0.0001 (embedding generation)
- **Latency**: ~300ms vs instant for keyword search (API round-trip)
- **Requires OpenAI key**: Won't work without `OPENAI_API_KEY` configured
- **Token limit**: Content is truncated to ~8000 characters before embedding
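
To put the per-query cost in perspective, the arithmetic can be written out. This is a rough estimate only, built on the approximate ~$0.0001 figure quoted above. It covers query embeddings, not the cost of embedding content during sync. `estimateMonthlyQueryCostUsd` is a hypothetical helper.

```typescript
// Rough estimate based on the approximate per-query cost quoted on this page.
const COST_PER_QUERY_USD = 0.0001;

function estimateMonthlyQueryCostUsd(queriesPerDay: number, daysPerMonth = 30): number {
  return queriesPerDay * daysPerMonth * COST_PER_QUERY_USD;
}

// Example: 1,000 semantic searches per day comes to roughly $3 per month.
const exampleCost = estimateMonthlyQueryCostUsd(1000);
```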

### Similarity scores

Results show a percentage score (0-100%):

- **90%+**: Very similar meaning
- **70-90%**: Related content
- **50-70%**: Loosely related
- **<50%**: Weak match (may not be relevant)
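
The bands above can be expressed as a small lookup. This sketch makes two assumptions that the page does not confirm: the raw similarity is a number between 0 and 1 that is scaled to a percentage, and a score that falls exactly on a boundary (70 or 90) belongs to the higher band. `toPercent` and `describeScore` are hypothetical names.

```typescript
// Hypothetical sketch: map a similarity score to the bands listed above.
type MatchLabel =
  | "Very similar meaning"
  | "Related content"
  | "Loosely related"
  | "Weak match";

// Assumption: raw similarity lies in 0..1; values outside are clamped.
function toPercent(score: number): number {
  const clamped = Math.min(1, Math.max(0, score));
  return Math.round(clamped * 100);
}

// Assumption: boundary values (70, 90) fall into the higher band.
function describeScore(percent: number): MatchLabel {
  if (percent >= 90) return "Very similar meaning";
  if (percent >= 70) return "Related content";
  if (percent >= 50) return "Loosely related";
  return "Weak match";
}
```

For example, a raw similarity of 0.876 rounds to 88% and lands in the "Related content" band.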

### Resources

- [Convex Vector Search](https://docs.convex.dev/search/vector-search)
- [OpenAI Embeddings](https://platform.openai.com/docs/guides/embeddings)
- [Keyword Search](/docs-search) - Full-text search documentation