Unified Knowledge Library System - Proposal

Problem: Knowledge fragmented across brain/, docs/, architecture/, with inconsistent filing and risk of loss
Goal: Single source of truth with flat hierarchy, meaningful topics, cross-referencing, and vector search

Current State Analysis

Scattered Knowledge Locations

Within repository (singular-dream/):
- brain/ - Ephemeral session artifacts (risk of loss)
- docs/ - Mixed purpose (user docs, research, guides) - 111 files
- architecture/ - Architecture decisions, thinking, standards - 203 files
  - architecture/THINKING/ - Research, proposals
  - architecture/GOLDEN/ - Canonical architecture
  - architecture/STANDARDS/ - Standards documents
  - architecture/TRASH/ - Archived/deprecated
- Various README.md files throughout codebase

Beyond repository boundary (SD-HOA/):
- SD-HOA/README.md - Marketing website documentation (9,957 bytes)
- SD-HOA/*.html - Marketing website files (index, visualizations)
- SD-HOA/*.css - Styling documentation
- SD-HOA/*.js - Interactive features documentation
- Other project-level documentation

Total scope:
- Repository: 118+ markdown files found at depth 2
- Parent directory: Additional documentation and web assets
- Multi-repository knowledge spanning different contexts

Problems Identified

Misplaced artifacts - Brain directory used for permanent research
Inconsistent filing - Same type of content in different places
Hard to discover - No central index or search
Risk of loss - Ephemeral directories for permanent knowledge
Cognitive overload - Too many places to look
No cross-referencing - Documents exist in isolation
Multi-repository fragmentation - Knowledge split between parent and repo
Boundary confusion - Unclear what belongs where

Proposed Solution: The Knowledge Library

Core Principle

"Library Floor 1, Shelf 16" Model:
- One central location for all knowledge
- Flat but meaningful topic organization
- Consistent cross-referencing
- README navigation at every level
- Vector search for discovery

Directory Structure

knowledge/
├── README.md                          # Master index & search guide
├── architecture/                      # System architecture
│   ├── README.md
│   ├── golden/                        # Canonical architecture (current state)
│   ├── proposed/                      # Future architecture (planning)
│   ├── retired/                       # Deprecated architecture (history)
│   └── decisions/                     # ADRs (Architecture Decision Records)
├── standards/                         # Standards & practices
│   ├── README.md
│   ├── coding/                        # Code standards
│   ├── processes/                     # Development processes
│   ├── procedures/                    # Step-by-step procedures
│   └── conventions/                   # Naming, formatting, etc.
├── research/                          # Research & thinking
│   ├── README.md
│   ├── ai-development/                # AI & development research
│   ├── technical/                     # Technical research
│   └── business/                      # Business research
├── guides/                            # How-to guides
│   ├── README.md
│   ├── development/                   # Developer guides
│   ├── deployment/                    # Deployment guides
│   ├── operations/                    # Operations guides
│   └── user/                          # User documentation
├── processes/                         # Business processes
│   ├── README.md
│   ├── governance/                    # Governance processes
│   ├── finance/                       # Financial processes
│   ├── operations/                    # Operational processes
│   └── property/                      # Property management
├── reference/                         # Reference materials
│   ├── README.md
│   ├── apis/                          # API documentation
│   ├── tools/                         # Tool documentation
│   ├── integrations/                  # Integration docs
│   └── glossary.md                    # Terms & definitions
└── sessions/                          # Session artifacts
    ├── README.md
    ├── 2026-01/                       # Organized by month
    │   ├── m2-ai-implementation/      # Session topic
    │   └── devops-architecture/       # Session topic
    └── index.md                       # Searchable session index

Multi-Repository Strategy

The Boundary Question

Current structure:

SD-HOA/                          # Parent directory (Antigravity boundary)
├── README.md                    # Marketing website docs
├── index.html, *.css, *.js     # Marketing website
└── singular-dream/              # Monorepo
    ├── knowledge/               # NEW: Knowledge library
    ├── docs/                    # Current docs
    └── architecture/            # Current architecture

Key decision: Where should the knowledge library live?

Option 1: Repository-Scoped (Recommended)

Location: singular-dream/knowledge/

Scope:
- All monorepo-related knowledge
- Architecture, standards, guides
- Development documentation
- Session artifacts

Benefits:
- ✅ Version controlled with code
- ✅ Part of monorepo structure
- ✅ Clear ownership
- ✅ Portable with repository

Limitations:
- ❌ Doesn't include parent-level docs
- ❌ Marketing website docs separate

Option 2: Parent-Scoped

Location: SD-HOA/knowledge/

Scope:
- All project knowledge (monorepo + marketing)
- Cross-repository documentation
- Project-level decisions

Benefits:
- ✅ Single source of truth for all knowledge
- ✅ Includes marketing website docs
- ✅ Project-level view

Limitations:
- ❌ Not version controlled with monorepo
- ❌ Harder to port repository
- ❌ Unclear ownership

Recommendation: Hybrid Approach

Primary library: singular-dream/knowledge/ (repository-scoped)

Cross-reference: Link to parent-level docs when needed

Structure:

SD-HOA/
├── README.md                    # Marketing website (stays here)
├── marketing/                   # NEW: Organize marketing assets
│   ├── index.html
│   ├── styles.css
│   └── script.js
└── singular-dream/
    └── knowledge/
        ├── README.md            # Master index
        ├── architecture/
        ├── standards/
        ├── research/
        ├── guides/
        ├── processes/
        ├── reference/
        │   └── marketing-website.md  # Links to ../../../marketing/
        └── sessions/

Benefits:
- ✅ Repository knowledge version-controlled
- ✅ Marketing website organized but separate
- ✅ Cross-references maintain connections
- ✅ Clear boundaries and ownership

Design Principles

1. Flat But Meaningful

Avoid deep nesting:
- Maximum 3 levels deep
- Clear topic separation at top level
- Subtopics only when necessary

Example:

✅ Good: knowledge/architecture/golden/monorepo-structure.md
❌ Bad:  knowledge/architecture/systems/backend/structure/monorepo/design.md
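
The nesting limit can be checked mechanically. The sketch below is one possible check, not part of the proposal: it assumes depth is counted as the number of directory levels below the knowledge/ root, so that the session paths shown elsewhere in this document still pass. The names `directoryDepth` and `isWithinDepthLimit` are hypothetical.

```typescript
// Assumption: depth counts directory levels below the knowledge/ root.
const MAX_DEPTH = 3;

function directoryDepth(filePath: string, root: string = 'knowledge'): number {
  const segments = filePath.split('/').filter(segment => segment.length > 0);
  // Skip the root directory itself, and the final segment (the file name).
  const start = segments[0] === root ? 1 : 0;
  return Math.max(segments.length - 1 - start, 0);
}

function isWithinDepthLimit(filePath: string, maxDepth: number = MAX_DEPTH): boolean {
  return directoryDepth(filePath) <= maxDepth;
}

console.log(isWithinDepthLimit('knowledge/architecture/golden/monorepo-structure.md'));
console.log(isWithinDepthLimit('knowledge/architecture/systems/backend/structure/monorepo/design.md'));
```

Under this counting rule the "good" path has depth 2 and the "bad" path has depth 5. A different counting rule (for example, including knowledge/ itself) would need a different limit.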

2. State-Based Organization

Architecture example:
- golden/ - Current canonical state
- proposed/ - Future planned state
- retired/ - Historical deprecated state

Benefits:
- Clear lifecycle management
- Easy to find current vs. future
- Historical context preserved

3. Topic-Based, Not Type-Based

Organize by topic, not document type:

✅ Good: knowledge/architecture/golden/
         knowledge/architecture/proposed/
         knowledge/architecture/retired/

❌ Bad:  knowledge/markdown-files/
         knowledge/diagrams/
         knowledge/spreadsheets/

4. README Navigation at Every Level

Every directory has a README.md:

# Architecture

**Purpose**: System architecture documentation

**Contents**:
- `golden/` - Current canonical architecture
- `proposed/` - Planned future architecture
- `retired/` - Deprecated architecture
- `decisions/` - Architecture Decision Records (ADRs)

**Related**:
- [Standards](../standards/) - Coding standards
- [Guides](../guides/development/) - Developer guides
- [Master Index](../README.md) - Full library index

**Search**: Use vector search for "architecture" topics

Cross-Referencing System

Document Frontmatter

Every document includes:

---
title: Monorepo Structure
topic: architecture
state: golden
tags: [monorepo, turborepo, structure]
related:
  - knowledge/standards/coding/monorepo-conventions.md
  - knowledge/guides/development/monorepo-setup.md
created: 2026-01-15
updated: 2026-01-18
---
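
A validation step could enforce this frontmatter before a document is accepted. The sketch below is a minimal example under stated assumptions: the field set mirrors the sample above, `state` is treated as optional and limited to golden/proposed/retired, and dates arrive as YYYY-MM-DD strings (some YAML parsers return `Date` objects instead, which would need converting first). The names `Frontmatter`, `RawFrontmatter`, and `validateFrontmatter` are illustrative.

```typescript
interface Frontmatter {
  title: string;
  topic: string;
  state?: 'golden' | 'proposed' | 'retired'; // assumption: optional outside architecture/
  tags: string[];
  related: string[];
  created: string; // YYYY-MM-DD
  updated: string; // YYYY-MM-DD
}

// Parsed but unchecked input: same keys, unknown values.
type RawFrontmatter = Partial<Record<keyof Frontmatter, unknown>>;

const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;
const STATES: string[] = ['golden', 'proposed', 'retired'];

// Returns a list of problems; an empty list means the frontmatter is valid.
function validateFrontmatter(data: RawFrontmatter): string[] {
  const errors: string[] = [];
  for (const key of ['title', 'topic'] as const) {
    const value = data[key];
    if (typeof value !== 'string' || value.trim() === '') errors.push(`${key}: required non-empty string`);
  }
  for (const key of ['tags', 'related'] as const) {
    const value = data[key];
    if (!Array.isArray(value) || !value.every(item => typeof item === 'string')) {
      errors.push(`${key}: must be a list of strings`);
    }
  }
  for (const key of ['created', 'updated'] as const) {
    const value = data[key];
    if (typeof value !== 'string' || !ISO_DATE.test(value)) errors.push(`${key}: must be YYYY-MM-DD`);
  }
  const state = data.state;
  if (state !== undefined && !(typeof state === 'string' && STATES.includes(state))) {
    errors.push(`state: must be one of ${STATES.join(', ')}`);
  }
  const { created, updated } = data;
  // ISO dates compare correctly as plain strings.
  if (errors.length === 0 && typeof created === 'string' && typeof updated === 'string' && updated < created) {
    errors.push('updated: must not be earlier than created');
  }
  return errors;
}
```

Returning a list of messages rather than throwing on the first problem lets a validation command report every issue in a file at once.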

Automatic Cross-Reference Generation

Script: scripts/generate-knowledge-index.ts

// Scans all knowledge/ documents
// Extracts frontmatter
// Generates:
// - knowledge/INDEX.md (master index)
// - knowledge/TAGS.md (tag index)
// - knowledge/TOPICS.md (topic index)
// - Per-directory README.md updates
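
As an illustration of one of these outputs, the sketch below renders a tag index from already-extracted frontmatter. It is a hypothetical example, not the contents of the proposed script: the `IndexedDoc` shape, the function name, and the heading layout are all assumptions.

```typescript
// Minimal document shape for this sketch: only the fields a tag index needs.
interface IndexedDoc {
  path: string; // relative to knowledge/, e.g. "architecture/golden/monorepo-structure.md"
  title: string;
  tags: string[];
}

// Groups documents by tag and renders Markdown, with tags and titles sorted alphabetically.
function renderTagIndex(docs: IndexedDoc[]): string {
  const byTag = new Map<string, IndexedDoc[]>();
  for (const doc of docs) {
    for (const tag of doc.tags) {
      const bucket = byTag.get(tag) ?? [];
      bucket.push(doc);
      byTag.set(tag, bucket);
    }
  }

  const lines: string[] = ['# Tag Index', ''];
  for (const tag of Array.from(byTag.keys()).sort()) {
    lines.push(`## ${tag}`, '');
    const sorted = (byTag.get(tag) ?? []).slice().sort((a, b) => a.title.localeCompare(b.title));
    for (const doc of sorted) lines.push(`- [${doc.title}](./${doc.path})`);
    lines.push('');
  }
  return lines.join('\n');
}
```

Sorting both tags and titles keeps the generated file stable between runs, so regenerating the index only produces a diff when the underlying documents change.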

Bidirectional Links

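Documents reference each other. For example, knowledge/architecture/golden/monorepo-structure.md links out with paths relative to its own directory, and each target links back:

## Related Documents

- [Monorepo Conventions](../../standards/coding/monorepo-conventions.md)
- [Monorepo Setup Guide](../../guides/development/monorepo-setup.md)
- [ADR: Turborepo Selection](../decisions/adr-001-turborepo.md)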
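
Keeping links bidirectional by hand is error-prone, so a check could report one-way links. The sketch below assumes the `related` lists from each document's frontmatter have already been collected into a map keyed by document path. The type and function names are hypothetical.

```typescript
// Maps each document path to the paths listed under `related` in its frontmatter.
type RelatedMap = Record<string, string[]>;

interface MissingBacklink {
  from: string; // document that should add a link
  to: string;   // document that already links to `from`
}

// Reports every link A -> B for which B does not list A in return.
function findMissingBacklinks(related: RelatedMap): MissingBacklink[] {
  const missing: MissingBacklink[] = [];
  for (const [source, targets] of Object.entries(related)) {
    for (const target of targets) {
      const back = related[target] ?? [];
      if (!back.includes(source)) missing.push({ from: target, to: source });
    }
  }
  return missing;
}

const report = findMissingBacklinks({
  'a.md': ['b.md'],
  'b.md': ['a.md', 'c.md'],
  'c.md': [],
});
console.log(report);
```

In this example only c.md is reported, because b.md links to it and it does not link back.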

Vector Search Integration

Why Vector Search?

Problems with grep/find:
- Requires exact keywords
- Misses semantic matches
- No relevance ranking
- Hard to discover related content

Vector search benefits:
- Semantic understanding
- "Find documents about batch processing" (not just keyword "batch")
- Relevance ranking
- Discover related topics

Implementation Options

Option 1: Local Embeddings (Recommended)

Tool: @xenova/transformers (runs locally)

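// scripts/lib/knowledge-search.ts
import { promises as fs } from 'node:fs';
import { pipeline } from '@xenova/transformers';

const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Mean-pool and normalize so each text becomes one fixed-length unit vector
async function embed(text: string): Promise<number[]> {
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data as Float32Array);
}

// Index all documents (scanKnowledgeDirectory is a project helper defined elsewhere)
const documents = await scanKnowledgeDirectory();
const embeddings = await Promise.all(documents.map(doc => embed(doc.content)));

// Store in simple JSON
await fs.mkdir('knowledge/.index', { recursive: true });
await fs.writeFile('knowledge/.index/embeddings.json', JSON.stringify({
  documents,
  embeddings,
}));

// Search: for unit vectors, cosine similarity reduces to the dot product
async function search(query: string, limit = 10) {
  const queryEmbedding = await embed(query);
  return embeddings
    .map((vector, i) => ({
      document: documents[i],
      score: vector.reduce((sum, value, j) => sum + value * queryEmbedding[j], 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}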
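
The ranking step in `search` depends on cosine similarity between the query vector and each stored vector. A minimal, dependency-free sketch of that calculation follows. The helper name `rankBySimilarity` and its return shape are assumptions made for illustration.

```typescript
// Cosine similarity between two equal-length vectors; returns 0 if either has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Scores every stored vector against the query and returns indexes, best match first.
function rankBySimilarity(query: number[], vectors: number[][], limit: number = 10) {
  return vectors
    .map((vector, index) => ({ index, score: cosineSimilarity(query, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

A linear scan like this is usually adequate for a library of a few hundred documents; a dedicated vector index only becomes worth considering at much larger scale.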

CLI:

pnpm devops knowledge search "batch processing architecture"
pnpm devops knowledge search "AI development best practices"

Option 2: Gemini Embeddings API

Use Google's embedding API:

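import { GoogleGenerativeAI } from '@google/generative-ai';

// Fail fast if the key is missing; the constructor expects a string
const apiKey = process.env.GEMINI_API_KEY;
if (!apiKey) throw new Error('GEMINI_API_KEY is not set');

const genAI = new GoogleGenerativeAI(apiKey);
const model = genAI.getGenerativeModel({ model: 'embedding-001' });

async function embed(text: string): Promise<number[]> {
  const result = await model.embedContent(text);
  return result.embedding.values; // the vector itself is under `values`
}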

Pros: Better quality, no local model
Cons: API calls, requires internet

Option 3: Hybrid Approach

Local for fast search, API for quality:

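// localSearch and apiSearch wrap Option 1 and Option 2 respectively
async function hybridSearch(query: string, detailed = false) {
  // Quick local search by default
  if (!detailed) return localSearch(query);

  // If user wants more, use API
  return apiSearch(query);
}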

Search Interface

CLI:

# Quick search
pnpm devops knowledge search "monorepo"

# Detailed search
pnpm devops knowledge search "monorepo" --detailed

# Search specific topic
pnpm devops knowledge search "batch" --topic architecture

# Search by tag
pnpm devops knowledge search --tag ai-development
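
The flags above imply a small argument parser behind the `search` subcommand. The sketch below shows one way it might work, assuming the parser receives only the arguments that follow `knowledge search`. The `SearchOptions` shape and the error messages are hypothetical, and a real CLI would more likely use an argument-parsing library.

```typescript
interface SearchOptions {
  query?: string;
  topic?: string;
  tag?: string;
  detailed: boolean;
}

// Parses arguments such as ["batch", "--topic", "architecture"].
function parseSearchArgs(args: string[]): SearchOptions {
  const options: SearchOptions = { detailed: false };
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === '--detailed') {
      options.detailed = true;
    } else if (arg === '--topic' || arg === '--tag') {
      const value = args[i + 1];
      if (value === undefined || value.startsWith('--')) throw new Error(`${arg} requires a value`);
      if (arg === '--topic') options.topic = value;
      else options.tag = value;
      i++; // skip the value that was just consumed
    } else if (arg.startsWith('--')) {
      throw new Error(`Unknown flag: ${arg}`);
    } else if (options.query === undefined) {
      options.query = arg;
    } else {
      throw new Error(`Unexpected extra argument: ${arg}`);
    }
  }
  // A tag-only search is allowed; otherwise a query is required.
  if (options.query === undefined && options.tag === undefined) {
    throw new Error('Provide a query or --tag');
  }
  return options;
}

console.log(parseSearchArgs(['batch', '--topic', 'architecture']));
```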

MCP Tool:

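// Antigravity can search via MCP
// inputSchema is a JSON Schema object, not a TypeScript type
{
  name: 'knowledge_search',
  description: 'Search knowledge library',
  inputSchema: {
    type: 'object',
    properties: {
      query: { type: 'string' },
      topic: { type: 'string' },
      limit: { type: 'number' },
    },
    required: ['query'],
  },
}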

Migration Strategy

Phase 1: Create Structure (Week 1)

Tasks:
1. Create knowledge/ directory structure
2. Create all README.md files
3. Set up frontmatter template
4. Create migration script
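
The first two tasks can be driven from a single layout definition. The sketch below only computes the paths to create. It copies the topic and subdirectory names from the proposed directory structure and, following that tree, seeds README.md files at the root and topic level only. The names `LAYOUT` and `planScaffold` are hypothetical.

```typescript
// Top-level topics and their subdirectories, copied from the proposed structure.
const LAYOUT: Record<string, string[]> = {
  architecture: ['golden', 'proposed', 'retired', 'decisions'],
  standards: ['coding', 'processes', 'procedures', 'conventions'],
  research: ['ai-development', 'technical', 'business'],
  guides: ['development', 'deployment', 'operations', 'user'],
  processes: ['governance', 'finance', 'operations', 'property'],
  reference: ['apis', 'tools', 'integrations'],
  sessions: [], // month directories are created as sessions happen
};

// Returns the directories to create and the README.md files to seed.
function planScaffold(root: string = 'knowledge'): { directories: string[]; readmes: string[] } {
  const directories: string[] = [root];
  const readmes: string[] = [`${root}/README.md`];
  for (const [topic, subdirectories] of Object.entries(LAYOUT)) {
    directories.push(`${root}/${topic}`);
    readmes.push(`${root}/${topic}/README.md`);
    for (const sub of subdirectories) directories.push(`${root}/${topic}/${sub}`);
  }
  return { directories, readmes };
}

const scaffold = planScaffold();
console.log(scaffold.directories.length, scaffold.readmes.length);
```

Separating the plan from the file-system calls makes a preview mode trivial: print the plan, or pass each directory to Node's `fs.mkdir` with `{ recursive: true }` to apply it.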

Phase 2: Migrate Content (Week 2-3)

Priority order:
1. High value, high risk - Research notes in brain/
2. Canonical architecture - architecture/GOLDEN/ → knowledge/architecture/golden/
3. Active thinking - architecture/THINKING/ → knowledge/architecture/proposed/ or knowledge/research/
4. Standards - architecture/STANDARDS/ → knowledge/standards/
5. Documentation - docs/ → knowledge/guides/ or knowledge/reference/

Migration script:

pnpm devops knowledge migrate --dry-run  # Preview
pnpm devops knowledge migrate --execute  # Execute
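
A dry run amounts to computing the planned moves without touching the file system. The sketch below shows one way to do that, assuming simple prefix rules taken from the priority order. Sources with more than one possible destination (architecture/THINKING/, docs/) or no stated destination (brain/) are flagged for manual review rather than guessed. `RULES` and `planMigration` are hypothetical names.

```typescript
// Prefix rules from the priority order; `to: null` marks an ambiguous destination.
const RULES: Array<{ from: string; to: string | null }> = [
  { from: 'architecture/GOLDEN/', to: 'knowledge/architecture/golden/' },
  { from: 'architecture/STANDARDS/', to: 'knowledge/standards/' },
  { from: 'architecture/THINKING/', to: null }, // proposed/ or research/
  { from: 'docs/', to: null },                  // guides/ or reference/
];

interface PlannedMove {
  source: string;
  target: string | null; // null means "needs manual review"
}

function planMigration(paths: string[]): PlannedMove[] {
  return paths.map(source => {
    const rule = RULES.find(candidate => source.startsWith(candidate.from));
    if (!rule || rule.to === null) return { source, target: null };
    return { source, target: rule.to + source.slice(rule.from.length) };
  });
}

// Dry-run style output: print the plan only.
for (const move of planMigration(['architecture/GOLDEN/monorepo.md', 'docs/setup.md'])) {
  console.log(`${move.source} -> ${move.target ?? 'NEEDS REVIEW'}`);
}
```

This sketch keeps the original file name; renaming to match the naming conventions would be a separate step.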

Phase 3: Set Up Search (Week 3)

Tasks:
1. Implement local embeddings
2. Index all documents
3. Create CLI search command
4. Add MCP search tool
5. Test and refine

Phase 4: Establish Practices (Week 4)

Tasks:
1. Update workflows to use knowledge/
2. Add pre-commit hook to validate frontmatter
3. Auto-generate cross-reference indexes
4. Train team on new structure

File Naming Conventions

Consistent Naming

Format: {topic}-{description}.md

Examples:

monorepo-structure.md
batch-processing-architecture.md
ai-development-best-practices.md
deployment-procedures-marketing.md

Avoid

❌ doc1.md
❌ notes.md
❌ temp-2026-01-18.md
❌ FINAL_FINAL_v3.md
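
These conventions can be approximated with two regular expressions. The sketch below is one possible check under stated assumptions: names are lowercase kebab-case with at least two parts, dates are rejected outside session files, and a short allowlist covers single-word files that appear elsewhere in this proposal. The identifiers are hypothetical.

```typescript
// Lowercase kebab-case with at least two parts ({topic}-{description}) and a .md extension.
const KEBAB_MD = /^[a-z0-9]+(-[a-z0-9]+)+\.md$/;
// Dates belong only in session files, so a date in a regular file name is rejected.
const CONTAINS_DATE = /\d{4}-\d{2}-\d{2}/;
// Assumption: structural files are exempt from the {topic}-{description} pattern.
const EXEMPT = new Set(['README.md', 'index.md', 'glossary.md']);

function isValidDocumentName(fileName: string): boolean {
  if (EXEMPT.has(fileName)) return true;
  return KEBAB_MD.test(fileName) && !CONTAINS_DATE.test(fileName);
}

for (const name of ['monorepo-structure.md', 'doc1.md', 'temp-2026-01-18.md', 'FINAL_FINAL_v3.md']) {
  console.log(name, isValidDocumentName(name));
}
```

A pattern check cannot tell whether a name is meaningful, so it catches the examples above but still depends on review for names that are well-formed yet uninformative.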

Date-Based (Only for Sessions)

knowledge/sessions/2026-01/m2-ai-implementation/
  ├── 2026-01-15-planning.md
  ├── 2026-01-16-implementation.md
  └── 2026-01-18-completion.md
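
Session paths follow a regular pattern, so they can be generated instead of typed. The sketch below assumes the date is supplied as a YYYY-MM-DD string and that topic and stage labels are converted to kebab-case. The function name and its parameters are hypothetical.

```typescript
// Builds a session note path from an ISO date, a session topic, and a stage label.
function sessionNotePath(date: string, topic: string, stage: string): string {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) throw new Error(`Expected YYYY-MM-DD, got "${date}"`);
  // Lowercase, replace runs of other characters with "-", trim stray dashes.
  const slug = (text: string) =>
    text.trim().toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
  const month = date.slice(0, 7); // e.g. "2026-01"
  return `knowledge/sessions/${month}/${slug(topic)}/${date}-${slug(stage)}.md`;
}

console.log(sessionNotePath('2026-01-15', 'M2 AI Implementation', 'Planning'));
// prints "knowledge/sessions/2026-01/m2-ai-implementation/2026-01-15-planning.md"
```

Taking the date as a string avoids time-zone surprises that can occur when formatting a `Date` object near midnight.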

README Template

Master README (knowledge/README.md)

# Knowledge Library

**Purpose**: Central repository for all project knowledge

**Quick Start**:
- Browse by [Topic](#topics)
- Search by [Tag](#tags)
- [Vector Search](#search) for semantic discovery

## Topics

- [Architecture](./architecture/) - System architecture
- [Standards](./standards/) - Standards & practices
- [Research](./research/) - Research & thinking
- [Guides](./guides/) - How-to guides
- [Processes](./processes/) - Business processes
- [Reference](./reference/) - Reference materials
- [Sessions](./sessions/) - Session artifacts

## Search

**CLI:**
```bash
pnpm devops knowledge search "your query"
```

**MCP (Antigravity):**

Use the `knowledge_search` tool

## Contributing

- Use frontmatter template
- Follow naming conventions
- Cross-reference related docs
- Update indexes: `pnpm devops knowledge index`

## Indexes

- [Master Index](./INDEX.md) - All documents
- [Tag Index](./TAGS.md) - By tag
- [Topic Index](./TOPICS.md) - By topic
Topic README Template

```markdown
# {Topic Name}

**Purpose**: {One-line description}

**Contents**:
- `{subdirectory}/` - {Description}
- ...

**Related**:
- [{Related Topic}](../path/to/topic/)
- ...

**Search**: Use vector search for "{topic}" topics

## Documents

{Auto-generated list of documents in this directory}

## Quick Links

- Most recent: [{Document}](./path/to/doc.md)
- Most referenced: [{Document}](./path/to/doc.md)
```

Automation & Tooling

Knowledge CLI Commands

# Search
pnpm devops knowledge search "query"

# Index
pnpm devops knowledge index              # Regenerate all indexes
pnpm devops knowledge index --validate   # Validate frontmatter

# Migrate
pnpm devops knowledge migrate --from docs/ --to knowledge/guides/

# Stats
pnpm devops knowledge stats              # Show statistics
pnpm devops knowledge stats --topic architecture

# Validate
pnpm devops knowledge validate           # Check all documents
pnpm devops knowledge validate --fix     # Auto-fix issues

Pre-commit Hook

# .husky/pre-commit
pnpm devops knowledge validate --staged

Auto-Index Generation

// scripts/lib/knowledge-indexer.ts

interface Document {
  path: string;
  frontmatter: Frontmatter;
  content: string;
}

async function generateIndexes() {
  const docs = await scanKnowledge();

  // Generate master index
  await generateMasterIndex(docs);

  // Generate tag index
  await generateTagIndex(docs);

  // Generate topic index
  await generateTopicIndex(docs);

  // Update README files
  await updateReadmes(docs);
}

Benefits

For Developers

Single source of truth - Always know where to look
Fast discovery - Vector search finds relevant docs
Clear organization - Flat hierarchy, meaningful topics
Cross-referencing - Related docs linked
No cognitive overload - Simple navigation

For AI (Antigravity)

Persistent memory - Knowledge survives sessions
Searchable context - Vector search for relevant docs
Clear structure - Predictable file locations
Cross-references - Understand relationships
MCP integration - Direct search access

For Organization

Institutional memory - Knowledge preserved
Onboarding - New team members find docs easily
Consistency - Standard practices documented
Discoverability - Vector search reveals hidden knowledge
Maintainability - Clear lifecycle (golden/proposed/retired)

Success Metrics

Quantitative

Search success rate: >90% of searches find relevant docs
Time to find: <30 seconds average
Document coverage: >95% of knowledge in library
Cross-reference density: >3 links per document average
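
Cross-reference density is straightforward to measure once documents are loaded. The sketch below counts inline Markdown links per document and averages them. It assumes links use the `[text](target)` form and ignores frontmatter `related` entries. Both function names are hypothetical.

```typescript
// Counts inline Markdown links of the form [text](target).
function countLinks(markdown: string): number {
  const matches = markdown.match(/\[[^\]]+\]\([^)]+\)/g);
  return matches ? matches.length : 0;
}

// Average number of links per document; 0 for an empty library.
function crossReferenceDensity(documents: string[]): number {
  if (documents.length === 0) return 0;
  const total = documents.reduce((sum, content) => sum + countLinks(content), 0);
  return total / documents.length;
}

const density = crossReferenceDensity([
  '- [Conventions](../standards/coding/monorepo-conventions.md)\n- [Setup](../guides/development/monorepo-setup.md)',
  'See [Monorepo Structure](./monorepo-structure.md).',
  'No links here.',
]);
console.log(density); // 3 links across 3 documents
```

In this example the density is 1, which would fall short of the target of more than 3 links per document.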

Qualitative

Developers can find docs without asking
AI can discover relevant context
New team members onboard faster
Knowledge doesn't get lost

Risks & Mitigation

Risk 1: Migration Disruption

Mitigation:
- Gradual migration (high-value first)
- Keep old structure during transition
- Clear migration guide
- Automated migration scripts

Risk 2: Maintenance Burden

Mitigation:
- Automated index generation
- Pre-commit validation
- Simple flat structure
- Clear ownership

Risk 3: Over-Organization

Mitigation:
- Maximum 3 levels deep
- Flat topic hierarchy
- Avoid premature categorization
- Regular pruning

Risk 4: Search Quality

Mitigation:
- Start with local embeddings (fast, free)
- Upgrade to API if needed
- Hybrid approach (local + API)
- Regular search quality reviews

Recommendation

✅ Proceed with Knowledge Library

Why:
1. Solves real problem (fragmented knowledge)
2. Balances organization with simplicity
3. Enables vector search (strategic advantage)
4. Supports AI persistent memory
5. Scalable and maintainable

Next Steps:
1. Review and approve this proposal
2. Create knowledge/ structure
3. Migrate high-value content
4. Implement vector search
5. Establish practices

Questions for Review

Structure: Is the flat hierarchy appropriate?
Topics: Are the top-level topics comprehensive?
Search: Local embeddings vs. API vs. hybrid?
Migration: Phased approach acceptable?
Naming: File naming conventions clear?