
# Semantic Search

---
Type: page
Date: 2026-01-11
---

## Semantic Search

Semantic search finds content by meaning, not exact words. Ask questions naturally and find conceptually related content.

Press `Cmd+K` then `Tab` to switch to Semantic mode. For exact word matching, see [Keyword Search](/docs-search).

---

### When to use each mode

| Use case | Mode |
|----------|------|
| "authentication error" (exact term) | Keyword |
| "login problems" (conceptual) | Semantic |
| Find specific code or commands | Keyword |
| "how do I deploy?" (question) | Semantic |
| Need matches highlighted on page | Keyword |
| Not sure of exact terminology | Semantic |

### How semantic search works

```
┌─────────────────────────────────────────────────────────────────────────┐
│                     SEMANTIC SEARCH FLOW                                │
└─────────────────────────────────────────────────────────────────────────┘

  ┌──────────────┐    ┌─────────────────┐    ┌──────────────────┐
  │ User query:  │───▶│ OpenAI API      │───▶│ Query embedding  │
  │ "how to      │    │ text-embedding- │    │ [0.12, -0.45,    │
  │  deploy"     │    │ ada-002         │    │  0.78, ...]      │
  └──────────────┘    └─────────────────┘    └────────┬─────────┘
                                                      │
                                                      ▼
                                           ┌─────────────────────┐
                                           │ Convex vectorSearch │
                                           │ Compare to stored   │
                                           │ post/page embeddings│
                                           └──────────┬──────────┘
                                                      │
                                                      ▼
                                           ┌─────────────────────┐
                                           │ Results sorted by   │
                                           │ similarity score    │
                                           │ (0-100%)            │
                                           └─────────────────────┘
```

1. Your query is converted to a vector (1536 numbers) using OpenAI's embedding model
2. Convex compares this vector to stored embeddings for all posts and pages
3. Results are ranked by similarity score (higher = more similar meaning)
4. The top 15 results are returned
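
The ranking in steps 2 to 4 can be illustrated with a small in-memory sketch. This is not the project's implementation: in the real flow Convex's vector index performs the comparison. The sketch assumes cosine similarity as the metric, and the names `StoredDoc`, `cosineSimilarity`, and `rankBySimilarity` are hypothetical.

```typescript
// Illustrative only: Convex's vector index does this work in the real flow.
// StoredDoc, cosineSimilarity and rankBySimilarity are hypothetical names.
type StoredDoc = { id: string; title: string; embedding: number[] };
type SearchHit = { id: string; title: string; score: number };

// Cosine similarity: dot(a, b) / (|a| * |b|). 1 means the vectors point the same way.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored document against the query, sort descending, keep the top `limit`.
function rankBySimilarity(query: number[], docs: StoredDoc[], limit = 15): SearchHit[] {
  return docs
    .map((d) => ({ id: d.id, title: d.title, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

A real query vector has 1536 dimensions; the function works the same way for any length as long as the query and stored vectors match.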

### Technical comparison

| Aspect | Keyword | Semantic |
|--------|---------|----------|
| Speed | Instant | ~300ms |
| Cost | Free | ~$0.0001/query |
| Highlighting | Yes | No |
| API required | No | OpenAI |

### Configuration

Semantic search requires an OpenAI API key:

```bash
npx convex env set OPENAI_API_KEY sk-your-key-here
```

If the key is not configured:
- Semantic search returns empty results
- Keyword search continues to work normally
- Sync script skips embedding generation
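
The empty-result fallback described above could look roughly like the following. This is a sketch under assumptions, not the code in `convex/semanticSearch.ts`: the function name `semanticSearchOrEmpty` and the injected `embed` and `vectorSearch` callbacks are hypothetical stand-ins for the OpenAI call and the Convex vector search.

```typescript
// Hypothetical sketch of the "no key, no results" fallback.
type SemanticHit = { id: string; score: number };
type EmbedFn = (text: string, apiKey: string) => Promise<number[]>;
type VectorSearchFn = (vector: number[], limit: number) => Promise<SemanticHit[]>;

async function semanticSearchOrEmpty(
  query: string,
  apiKey: string | undefined,
  embed: EmbedFn,
  vectorSearch: VectorSearchFn,
): Promise<SemanticHit[]> {
  // Without a key the query cannot be embedded, so return no results
  // instead of throwing. Keyword search is unaffected by this path.
  if (!apiKey) return [];
  const vector = await embed(query, apiKey);
  return vectorSearch(vector, 15);
}
```

Returning an empty list rather than raising an error keeps the search modal usable when the key is missing.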

### Enable/Disable Semantic Search

Semantic search is **disabled by default** to avoid requiring API keys for forks. Enable it via `src/config/siteConfig.ts`:

```typescript
semanticSearch: {
  enabled: true, // Enable semantic search (requires OPENAI_API_KEY)
},
```

When disabled (default):
- Search modal shows only keyword search (no mode toggle)
- Embedding generation skipped during sync (saves API costs)
- No OpenAI API key required

When enabled:
- Search modal shows both Keyword and Semantic modes
- Embeddings generated during `npm run sync`
- Requires OPENAI_API_KEY in Convex

To enable semantic search:
1. Set `semanticSearch.enabled: true` in `src/config/siteConfig.ts`
2. Set `OPENAI_API_KEY` in Convex: `npx convex env set OPENAI_API_KEY sk-xxx`
3. Run `npm run sync` to generate embeddings
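
To make the effect of the flag concrete, here is a minimal sketch of how the enabled/disabled behavior above might be derived from the config. Only the `semanticSearch.enabled` field comes from this page; the `SiteConfigSketch` type and the helpers `availableSearchModes` and `shouldGenerateEmbeddings` are hypothetical.

```typescript
// Hypothetical helpers; only `semanticSearch.enabled` is taken from siteConfig.ts.
interface SiteConfigSketch {
  semanticSearch: { enabled: boolean };
}
type SearchMode = "keyword" | "semantic";

// Which modes the search modal offers. With a single mode there is nothing to toggle.
function availableSearchModes(config: SiteConfigSketch): SearchMode[] {
  return config.semanticSearch.enabled ? ["keyword", "semantic"] : ["keyword"];
}

// Whether the sync step should call the embedding API:
// the feature must be enabled and a key must be present.
function shouldGenerateEmbeddings(config: SiteConfigSketch, apiKey: string | undefined): boolean {
  return config.semanticSearch.enabled && Boolean(apiKey);
}
```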

### How embeddings are generated

When you run `npm run sync`:

1. Content syncs to Convex (posts and pages)
2. Script checks for posts/pages without embeddings
3. For each, combines title + content into text
4. Calls OpenAI to generate 1536-dimension embedding
5. Stores embedding in Convex database

Embeddings are generated once per post/page. If content changes, a new embedding is generated on the next sync.
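
The sync steps above can be sketched as follows. This is an illustration under assumptions, not the contents of `scripts/sync-posts.ts` or `convex/embeddings.ts`: the `SyncedDoc` shape, the helper names, the blank-line separator between title and content, and the injected `embed` callback are all hypothetical. The 8000-character cap mirrors the truncation noted under Limitations.

```typescript
// Hypothetical sketch of steps 2 to 5; names and the "\n\n" separator are assumptions.
type SyncedDoc = { id: string; title: string; content: string; embedding?: number[] };

const MAX_EMBEDDING_CHARS = 8000;

// Step 3: combine title and content, truncated to the character cap.
function buildEmbeddingText(doc: SyncedDoc): string {
  return `${doc.title}\n\n${doc.content}`.slice(0, MAX_EMBEDDING_CHARS);
}

// Step 2: only documents that have no embedding yet need work.
function docsNeedingEmbeddings(docs: SyncedDoc[]): SyncedDoc[] {
  return docs.filter((d) => d.embedding === undefined);
}

// Steps 4 and 5: embed each missing document and return the updated list.
// `embed` stands in for the OpenAI call; storage is modeled as the returned array.
async function generateMissingEmbeddings(
  docs: SyncedDoc[],
  embed: (text: string) => Promise<number[]>,
): Promise<SyncedDoc[]> {
  const updated: SyncedDoc[] = [];
  for (const doc of docs) {
    if (doc.embedding !== undefined) {
      updated.push(doc); // already embedded, no API call
      continue;
    }
    const embedding = await embed(buildEmbeddingText(doc));
    updated.push({ ...doc, embedding });
  }
  return updated;
}
```

Skipping documents that already have an embedding is what keeps repeated syncs from incurring repeated API costs.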

### Files involved

| File | Purpose |
| ---- | ------- |
| `convex/schema.ts` | `embedding` field and `vectorIndex` on posts/pages |
| `convex/embeddings.ts` | Embedding generation actions |
| `convex/embeddingsQueries.ts` | Queries for posts/pages without embeddings |
| `convex/semanticSearch.ts` | Vector search action |
| `convex/semanticSearchQueries.ts` | Queries for hydrating search results |
| `src/components/SearchModal.tsx` | Mode toggle (Tab to switch) |
| `scripts/sync-posts.ts` | Triggers embedding generation after sync |

### Limitations

- **No highlighting**: Semantic search finds meaning, not exact words, so matches can't be highlighted
- **API cost**: Each search query costs ~$0.0001 (embedding generation)
- **Latency**: ~300ms vs instant for keyword search (API round-trip)
- **Requires OpenAI key**: Won't work without `OPENAI_API_KEY` configured
- **Input limit**: Content is truncated to ~8000 characters before embedding, so text past that point does not affect matching

### Similarity scores

Results show a percentage score (0-100%):
- **90%+**: Very similar meaning
- **70-90%**: Related content
- **50-70%**: Loosely related
- **<50%**: Weak match (may not be relevant)
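
The bands above can be expressed as a small helper. This is a sketch, not the code in `SearchModal.tsx`: it assumes the raw similarity score is on a 0 to 1 scale after clamping, and it assigns boundary values (exactly 90%, 70%, 50%) to the higher band. The names `toPercent` and `describeMatch` are hypothetical.

```typescript
// Hypothetical helpers for displaying a similarity score.
// Assumption: raw scores outside 0..1 are clamped before conversion.
function toPercent(score: number): number {
  const clamped = Math.min(1, Math.max(0, score));
  return Math.round(clamped * 100);
}

// Map a percentage to the bands listed above.
// Boundary values (90, 70, 50) fall into the higher band.
function describeMatch(percent: number): string {
  if (percent >= 90) return "Very similar meaning";
  if (percent >= 70) return "Related content";
  if (percent >= 50) return "Loosely related";
  return "Weak match";
}
```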

### Resources

- [Convex Vector Search](https://docs.convex.dev/search/vector-search)
- [OpenAI Embeddings](https://platform.openai.com/docs/guides/embeddings)
- [Keyword Search](/docs-search) - Full-text search documentation