DevOps Tooling Architecture - Recommendation
Status: ✅ IN PROGRESS - Batch subsystem implemented, M2 migration pending
Priority: #2 (Proceeding independently - M2 doesn't block Property refactor)
Date Created: 2026-01-18
Last Updated: 2026-01-18
Decision: Create apps/devops as third application
IMPLEMENTATION UPDATE: Initial implementation complete. Created apps/devops/ with the full batch processing subsystem. M2 migration and remaining script consolidation are still pending.
Implementation Progress
Completed (2026-01-18):
- ✅ Created apps/devops/ directory structure
- ✅ Batch subsystem fully implemented:
  - Types & configuration (job classes A/B/C/D)
  - SDK client (BatchClient)
  - Queue manager (Redis)
  - Job executor
  - Batch worker (polling & execution)
  - CLI commands (submit, status, list, cancel, queue-stats)
  - MCP server (5 tools for Antigravity)
- ✅ Unified SDK entry point
- ✅ CLI with Commander.js
- ✅ package.json and TypeScript config
Pending:
- ⏳ M2 AI module migration
- ⏳ Migration of 180+ scripts from scripts/ directory
- ⏳ Audit, seed, verify, deploy, monitor modules
- ⏳ Full deployment to Elastic Muscle
See: apps/devops/ARCHITECTURE.md for implementation details
⚠️ ORIGINAL PRIORITY NOTE: This refactoring was originally blocked by the Sovereign Property/Ownership module consolidation. However, the M2 and batch systems don't depend on that refactor, so we proceeded independently.
Project Priorities
| Priority | Project | Status | Blocking |
|---|---|---|---|
| #1 | Sovereign Property/Ownership Module | 🔴 In Progress | Platform development |
| #2 | DevOps Tooling Application | 🟡 Approved (Concept) | Priority #1 completion |
| #3 | Platform Feature Development | ⏸️ Paused | Priority #1 completion |
Execution Order:
1. Complete Sovereign Property/Ownership module consolidation (Directory + Property → Unified)
2. Resume platform feature development
3. Plan detailed design for DevOps tooling application
4. Execute DevOps tooling refactor (when bandwidth allows)
Current State Analysis
Scripts Directory Inventory
- Total Scripts: 180+ files
- TypeScript Scripts: 145+ .ts files
- Shell Scripts: 30+ .sh files
- Subdirectories: 6 (checks, git-hooks, jobs, legacy, lib, utils)
Script Categories Identified
| Category | Count | Examples |
|---|---|---|
| Audit Tools | 12 | audit-capabilities.ts, audit-rbac.ts, audit-schema.ts |
| Data Seeders | 8 | dataseeder-gold.seed.ts, dataseeder-chaos.seed.ts |
| AutoTesters | 4 | autotester-overnight.ts, autotester-sprint.ts |
| Verification | 30+ | verify-*.ts (finance, inventory, permissions, etc.) |
| Infrastructure | 15+ | check-elastic-muscle.ts, monitor-*.sh, setup-*.sh |
| Deployment | 8 | deploy-marketing.ts, sync-doppler-to-vercel.sh |
| Development | 10+ | ghostbuster.ts, dev-mode.ts, sunrise.ts, goodnight.ts |
| Testing | 15+ | test-*.ts (governance, finance, operations) |
| Debug Tools | 10+ | debug-*.ts, probe-*.ts |
| Batch Processing | 5 | process-batch-queue.ts, submit-batch-job.ts |
Problems with Current Structure
1. Lack of Organization
- 180+ files in a single flat directory
- No clear separation of concerns
- Difficult to discover and maintain tools
2. No Dependency Management
- Scripts import from apps/platform directly
- Circular dependency risks
- No shared utilities between scripts
3. No Type Safety for Shared Code
- lib/ and utils/ subdirectories exist but are ad hoc
- No consistent patterns for reusable code
- Duplication across scripts
4. Deployment Challenges
- Scripts run locally or on Elastic Muscle
- No bundling or optimization
- Manual dependency tracking
5. Testing Gaps
- Scripts are not unit tested
- No integration tests for tooling
- Hard to verify script behavior
6. Documentation Scattered
- Some scripts have inline docs
- No central registry of tools
- Hard to know what tools exist
Recommendation: Create apps/devops
Architecture
apps/
├── platform/                      # Main application (existing)
├── marketing/                     # Marketing site (existing)
└── devops/                        # NEW: DevOps tooling application
    ├── src/
    │   ├── commands/              # CLI commands (organized by domain)
    │   │   ├── audit/
    │   │   │   ├── capabilities.ts
    │   │   │   ├── rbac.ts
    │   │   │   └── schema.ts
    │   │   ├── seed/
    │   │   │   ├── gold.ts
    │   │   │   ├── chaos.ts
    │   │   │   └── finance.ts
    │   │   ├── verify/
    │   │   │   ├── finance.ts
    │   │   │   ├── inventory.ts
    │   │   │   └── permissions.ts
    │   │   ├── deploy/
    │   │   │   ├── marketing.ts
    │   │   │   └── sync-env.ts
    │   │   ├── monitor/
    │   │   │   ├── elastic-muscle.ts
    │   │   │   └── infrastructure.ts
    │   │   ├── dev/
    │   │   │   ├── ghostbuster.ts
    │   │   │   ├── sunrise.ts
    │   │   │   └── goodnight.ts
    │   │   └── batch/
    │   │       ├── submit.ts      # Submit jobs to queue
    │   │       ├── status.ts      # Check job status
    │   │       └── cancel.ts      # Cancel pending jobs
    │   ├── batch/                 # Batch processing subsystem
    │   │   ├── core/
    │   │   │   ├── job-classes.ts         # A/B/C/D priority classes
    │   │   │   ├── job-scheduler.ts       # Scheduling logic
    │   │   │   ├── queue-manager.ts       # Redis queue operations
    │   │   │   └── executor.ts            # Job execution engine
    │   │   ├── jobs/
    │   │   │   ├── build.ts               # Build job
    │   │   │   ├── deploy-marketing.ts    # Marketing deploy job
    │   │   │   ├── test.ts                # Test job
    │   │   │   ├── refactor.ts            # Refactor job
    │   │   │   └── index.ts               # Job registry
    │   │   ├── dispatcher/
    │   │   │   ├── worker.ts              # Job worker process
    │   │   │   ├── scheduler.ts           # Scheduled job processor
    │   │   │   └── monitor.ts             # Health monitoring
    │   │   └── types/
    │   │       ├── job-request.ts
    │   │       ├── job-result.ts
    │   │       └── queue-config.ts
    │   ├── lib/                   # Shared utilities
    │   │   ├── firebase/
    │   │   ├── doppler/
    │   │   ├── vercel/
    │   │   ├── redis/
    │   │   ├── logger/
    │   │   └── cli/
    │   ├── types/                 # Shared types
    │   └── index.ts               # Main CLI entry point
    ├── package.json
    ├── tsconfig.json
    └── README.md
Benefits
1. Clear Separation of Concerns
- DevOps tools separate from application code
- Platform code doesn't mix with tooling
- Marketing site remains independent
2. Proper Dependency Management
- DevOps app can depend on @sd/modules/* and @sd/foundation/*
- No reverse dependencies (platform doesn't depend on devops)
- Shared utilities properly packaged
3. Better Developer Experience
- Single CLI entry point: pnpm devops <command>
- Auto-completion and help text
- Organized command structure
4. Type Safety
- Full TypeScript support
- Shared types between commands
- Compile-time validation
5. Testing
- Unit tests for each command
- Integration tests for workflows
- CI/CD validation
6. Deployment Options
- Bundle as standalone CLI
- Deploy to Elastic Muscle as service
- Package for distribution
7. Documentation
- Auto-generated command docs
- Centralized tool registry
- Examples and usage guides
Batch Processing Subsystem
The batch processing system is a critical subsystem that deserves dedicated organization.
Current Batch Components
| Component | Purpose | Current Location |
|---|---|---|
| Job Submitter | Submit jobs to queue | scripts/submit-batch-job.ts |
| Job Processor | Execute queued jobs | scripts/process-batch-queue.ts |
| Status Checker | Check job status | scripts/check-batch-status.ts |
| Job Definitions | Actual job logic | scripts/jobs/*.ts |
| Queue Manager | Redis queue operations | scripts/lib/redis-init.ts |
| Batch Client | Client library | scripts/lib/batch-client.ts |
Batch Architecture in DevOps App
apps/devops/src/batch/
├── core/                          # Core batch logic
│   ├── job-classes.ts             # A/B/C/D priority system
│   │   - CRITICAL (A): Production deploys, emergency fixes
│   │   - HIGH (B): Time-sensitive, high-priority features
│   │   - NORMAL (C): Regular builds, tests, refactors
│   │   - LOW (D): Cleanup, maintenance tasks
│   ├── job-scheduler.ts           # Scheduling engine
│   │   - Immediate execution
│   │   - Scheduled execution (cron-like)
│   │   - Recurring jobs
│   ├── queue-manager.ts           # Redis queue operations
│   │   - Priority queues (queue:batch:class-A/B/C/D)
│   │   - Scheduled queue (queue:batch:scheduled)
│   │   - Job lifecycle management
│   └── executor.ts                # Job execution engine
│       - Resource allocation
│       - CPU priority (nice values)
│       - Instance type selection (on-demand vs spot)
│       - Tier selection (micro/mid/extreme)
│
├── jobs/                          # Job implementations
│   ├── build.ts                   # Build jobs
│   ├── deploy-marketing.ts        # Marketing deployment
│   ├── test.ts                    # Test execution
│   ├── refactor.ts                # Code refactoring
│   ├── lighthouse.ts              # Performance audits
│   └── index.ts                   # Job registry & factory
│
├── dispatcher/                    # Job dispatching
│   ├── worker.ts                  # Worker process
│   │   - Poll queues by priority
│   │   - Execute jobs
│   │   - Update status in Firestore
│   ├── scheduler.ts               # Scheduled job processor
│   │   - Check scheduled queue
│   │   - Move jobs to execution queue
│   ├── monitor.ts                 # Health monitoring
│   │   - Worker health checks
│   │   - Queue depth monitoring
│   │   - Cost tracking
│   └── server.ts                  # Batch server (runs on Elastic Muscle)
│
└── types/                         # Type definitions
    ├── job-request.ts             # BatchJobRequest interface
    ├── job-result.ts              # BatchJobResult interface
    ├── queue-config.ts            # Queue configuration
    └── job-class-config.ts        # Priority class configuration
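The "poll queues by priority" behaviour described for worker.ts and queue-manager.ts can be illustrated with a minimal sketch. It assumes an in-memory stand-in for the Redis queues; the class name InMemoryQueueManager and the fields on BatchJobRequest are hypothetical and not taken from the actual implementation. Only the queue key format (queue:batch:class-A/B/C/D) comes from the layout above.

```typescript
// Job classes in priority order (A is most urgent, D least).
type JobClass = "A" | "B" | "C" | "D";
const CLASS_ORDER: readonly JobClass[] = ["A", "B", "C", "D"];

// Hypothetical request shape; the real BatchJobRequest may differ.
interface BatchJobRequest {
  id: string;
  type: string; // e.g. "build", "deploy-marketing"
  jobClass: JobClass;
}

// Queue key format taken from the layout above.
const queueKey = (jobClass: JobClass): string => `queue:batch:class-${jobClass}`;

// In-memory stand-in for the Redis lists (assumption: one FIFO list per class).
class InMemoryQueueManager {
  private queues = new Map<string, BatchJobRequest[]>();

  enqueue(job: BatchJobRequest): void {
    const key = queueKey(job.jobClass);
    const queue = this.queues.get(key) ?? [];
    queue.push(job);
    this.queues.set(key, queue);
  }

  // Returns the oldest job from the highest-priority non-empty queue.
  dequeueNext(): BatchJobRequest | undefined {
    for (const jobClass of CLASS_ORDER) {
      const queue = this.queues.get(queueKey(jobClass));
      if (queue !== undefined && queue.length > 0) return queue.shift();
    }
    return undefined;
  }
}

const manager = new InMemoryQueueManager();
manager.enqueue({ id: "j1", type: "build", jobClass: "C" });
manager.enqueue({ id: "j2", type: "deploy-marketing", jobClass: "A" });
manager.enqueue({ id: "j3", type: "test", jobClass: "C" });

// Drain the queues the way a worker poll loop would.
const drained: string[] = [];
for (let job = manager.dequeueNext(); job !== undefined; job = manager.dequeueNext()) {
  drained.push(job.id);
}
console.log(drained.join(",")); // j2,j1,j3
```

The class A job is picked up first even though it was submitted second, and jobs within the same class keep their submission order.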
Batch CLI Commands
# Submit jobs
pnpm devops batch submit build --class C --now
pnpm devops batch submit deploy-marketing --class B --now
pnpm devops batch submit refactor --class C --at "2026-01-19T02:00:00Z"
# Check status
pnpm devops batch status <job-id>
pnpm devops batch list --class A
pnpm devops batch list --status running
# Cancel jobs
pnpm devops batch cancel <job-id>
pnpm devops batch cancel --class D --status pending
# Monitor
pnpm devops batch monitor queues
pnpm devops batch monitor workers
pnpm devops batch monitor costs
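As a rough illustration of how the --now and --at flags could map onto the scheduled queue, the sketch below resolves a run time and then splits scheduled jobs into those that are due and those still waiting. The function names resolveRunAt and promoteDue, and the ScheduledJob shape, are assumptions made for this example; the real scheduler may work differently.

```typescript
// Hypothetical shape of an entry in the scheduled queue.
interface ScheduledJob {
  id: string;
  runAt: number; // epoch milliseconds
}

// Resolve the CLI scheduling flags to a run time.
// Assumption: --at wins if both flags are given; no flag behaves like --now.
function resolveRunAt(opts: { now?: boolean; at?: string }, currentTime: number): number {
  if (opts.at !== undefined) {
    const parsed = Date.parse(opts.at);
    if (Number.isNaN(parsed)) throw new Error(`Invalid --at timestamp: ${opts.at}`);
    return parsed;
  }
  return currentTime;
}

// Split the scheduled queue into jobs that are due (earliest first) and jobs that keep waiting.
function promoteDue(
  scheduled: ScheduledJob[],
  currentTime: number,
): { due: ScheduledJob[]; waiting: ScheduledJob[] } {
  const due = scheduled
    .filter((job) => job.runAt <= currentTime)
    .sort((a, b) => a.runAt - b.runAt);
  const waiting = scheduled.filter((job) => job.runAt > currentTime);
  return { due, waiting };
}

const midnight = Date.parse("2026-01-19T00:00:00Z");
const scheduledJobs: ScheduledJob[] = [
  { id: "refactor-1", runAt: resolveRunAt({ at: "2026-01-19T02:00:00Z" }, midnight) },
  { id: "build-1", runAt: resolveRunAt({ now: true }, midnight) },
];

const atMidnight = promoteDue(scheduledJobs, midnight);
const atThree = promoteDue(atMidnight.waiting, Date.parse("2026-01-19T03:00:00Z"));
console.log(atMidnight.due.map((j) => j.id), atThree.due.map((j) => j.id));
```

At midnight only build-1 is due; refactor-1 stays in the waiting set until a later poll passes 02:00 UTC.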
Batch Server Deployment
The batch server runs on Elastic Muscle:
# Deploy batch server to Elastic Muscle
pnpm devops deploy batch-server
# Start batch worker
ssh elastic-muscle "cd /opt/singular-dream && pnpm devops batch worker start"
# Monitor batch server
pnpm devops monitor batch-server
Implementation Plan
Phase 1: Setup (Week 1)
- [ ] Create apps/devops directory structure
- [ ] Set up package.json with CLI framework (Commander.js or Yargs)
- [ ] Create base CLI entry point
- [ ] Set up TypeScript configuration
- [ ] Add to TurboRepo pipeline
Phase 2: Migrate Core Tools (Week 2)
- [ ] Migrate dev-mode.ts, ghostbuster.ts, sunrise.ts, goodnight.ts
- [ ] Create commands/dev/ structure
- [ ] Extract shared utilities to lib/
- [ ] Update root package.json scripts to use new CLI
Phase 3: Migrate Audit Tools (Week 3)
- [ ] Migrate all audit-*.ts scripts
- [ ] Create commands/audit/ structure
- [ ] Standardize audit output format
- [ ] Add tests
Phase 4: Migrate Data Seeders (Week 4)
- [ ] Migrate all dataseeder-*.seed.ts scripts
- [ ] Create commands/seed/ structure
- [ ] Standardize seeding patterns
- [ ] Add idempotency checks
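One possible reading of "idempotency checks" is that re-running a seeder must not duplicate records. The sketch below shows that idea with deterministic record IDs and an in-memory store; InMemoryStore, SeedRecord, and seedOnce are hypothetical names for illustration, and a real seeder would check against Firestore or whichever store the seed targets.

```typescript
// Hypothetical seed record with a deterministic ID.
interface SeedRecord {
  id: string;
  data: Record<string, unknown>;
}

// In-memory stand-in for the target data store.
class InMemoryStore {
  private docs = new Map<string, Record<string, unknown>>();
  has(id: string): boolean {
    return this.docs.has(id);
  }
  set(id: string, data: Record<string, unknown>): void {
    this.docs.set(id, data);
  }
  get size(): number {
    return this.docs.size;
  }
}

// Writes only the records that are not present yet and reports what happened.
function seedOnce(store: InMemoryStore, records: SeedRecord[]): { created: number; skipped: number } {
  let created = 0;
  let skipped = 0;
  for (const record of records) {
    if (store.has(record.id)) {
      skipped += 1;
      continue;
    }
    store.set(record.id, record.data);
    created += 1;
  }
  return { created, skipped };
}

const store = new InMemoryStore();
const goldRecords: SeedRecord[] = [
  { id: "org-gold-1", data: { name: "Gold Org" } },
  { id: "user-gold-1", data: { name: "Gold User" } },
];

const firstRun = seedOnce(store, goldRecords);
const secondRun = seedOnce(store, goldRecords);
console.log(firstRun, secondRun, store.size);
```

The second run creates nothing and skips both records, so the store size stays at two.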
Phase 5: Migrate Verification Tools (Week 5-6)
- [ ] Migrate all verify-*.ts scripts
- [ ] Create commands/verify/ structure
- [ ] Standardize verification patterns
- [ ] Add comprehensive tests
Phase 6: Migrate Infrastructure Tools (Week 7)
- [ ] Migrate monitoring, deployment, and setup scripts
- [ ] Create commands/monitor/, commands/deploy/, commands/setup/
- [ ] Add health check dashboards
Phase 7: Documentation & Polish (Week 8)
- [ ] Generate command documentation
- [ ] Create usage guides
- [ ] Add examples
- [ ] Update Engineering Handbook
CLI Interface Design
Proposed Command Structure
# Development commands
pnpm devops dev start # Start dev environment
pnpm devops dev ghostbuster # Kill zombie processes
pnpm devops dev sunrise # Morning protocol
pnpm devops dev goodnight # Evening protocol
# Audit commands
pnpm devops audit capabilities # Audit capability registry
pnpm devops audit rbac # Audit RBAC implementation
pnpm devops audit schema # Audit Zod schemas
pnpm devops audit all # Run all audits
# Seeding commands
pnpm devops seed gold # Seed golden data
pnpm devops seed chaos # Seed chaos data
pnpm devops seed finance # Seed finance data
pnpm devops seed all # Seed all data
# Verification commands
pnpm devops verify finance # Verify finance module
pnpm devops verify inventory # Verify inventory module
pnpm devops verify all # Run all verifications
# Deployment commands
pnpm devops deploy marketing # Deploy marketing site
pnpm devops deploy platform # Deploy platform
pnpm devops sync env # Sync environment variables
# Monitoring commands
pnpm devops monitor elastic # Check Elastic Muscle health
pnpm devops monitor infra # Check infrastructure
pnpm devops monitor costs # Check Vercel costs
# Testing commands
pnpm devops test overnight # Run overnight tests
pnpm devops test sprint # Run sprint certification
pnpm devops test sanctuary # Run production health checks
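The domain/command layout above, including the "all" shortcut, can be sketched as a small registry lookup. This is a dependency-free illustration rather than the Commander.js wiring the real CLI uses; the registry contents, the Handler type, and the runDevops function are hypothetical.

```typescript
// A handler receives the remaining arguments and returns a status line.
type Handler = (args: string[]) => string;

// Hypothetical registry: domain -> command -> handler.
const registry = new Map<string, Map<string, Handler>>([
  ["audit", new Map<string, Handler>([
    ["capabilities", () => "audit capabilities: ok"],
    ["rbac", () => "audit rbac: ok"],
    ["schema", () => "audit schema: ok"],
  ])],
  ["seed", new Map<string, Handler>([
    ["gold", () => "seed gold: ok"],
    ["chaos", () => "seed chaos: ok"],
  ])],
]);

// Routes e.g. ["audit", "rbac"] to one handler, or ["audit", "all"] to every handler in the domain.
function runDevops(argv: string[]): string[] {
  const [domain = "", command = "", ...rest] = argv;
  const commands = registry.get(domain);
  if (commands === undefined) throw new Error(`Unknown domain: "${domain}"`);
  if (command === "all") {
    return Array.from(commands.values()).map((handler) => handler(rest));
  }
  const handler = commands.get(command);
  if (handler === undefined) throw new Error(`Unknown command: "${domain} ${command}"`);
  return [handler(rest)];
}

console.log(runDevops(["audit", "all"]));
console.log(runDevops(["seed", "gold"]));
```

Keeping the registry as data makes it straightforward to generate help text or a central tool list from the same source.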
Migration Strategy
Backward Compatibility
Keep existing scripts/ directory during migration:
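Root package.json before migration:
{
  "scripts": {
    "sunrise": "tsx scripts/sunrise.ts"
  }
}
Root package.json during the transition (sunrise now points at the new CLI, dev:sunrise is an alias, and the old script stays runnable directly with tsx scripts/sunrise.ts):
{
  "scripts": {
    "sunrise": "pnpm devops dev sunrise",
    "dev:sunrise": "pnpm devops dev sunrise"
  }
}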
Gradual Migration
- Week 1-2: Core dev tools (high usage)
- Week 3-4: Audit and seed tools (medium usage)
- Week 5-6: Verification tools (low usage)
- Week 7-8: Infrastructure and monitoring (specialized)
Deprecation Timeline
- Month 1: Both old and new work
- Month 2: Warnings added to old scripts
- Month 3: Old scripts removed, scripts/ becomes scripts/legacy/
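The three-month timeline could be backed by a small helper that legacy scripts call on startup. The sketch below is one way to do it; the Phase type, the replacements map, and deprecationNotice are hypothetical names, and the script-to-command mapping only covers the dev tools named in this document.

```typescript
// Legacy script path -> replacement CLI command (dev tools named in this document).
const replacements = new Map<string, string>([
  ["scripts/sunrise.ts", "pnpm devops dev sunrise"],
  ["scripts/ghostbuster.ts", "pnpm devops dev ghostbuster"],
  ["scripts/goodnight.ts", "pnpm devops dev goodnight"],
]);

// Phases follow the deprecation timeline above.
type Phase = "month1" | "month2" | "month3";

// Returns the message a legacy script should print, or undefined if it should stay silent.
function deprecationNotice(script: string, phase: Phase): string | undefined {
  const replacement = replacements.get(script);
  if (replacement === undefined || phase === "month1") return undefined;
  if (phase === "month2") {
    return `[deprecated] ${script} will be removed; use "${replacement}" instead.`;
  }
  return `[removed] ${script} has moved to scripts/legacy/; use "${replacement}".`;
}

console.log(deprecationNotice("scripts/sunrise.ts", "month1"));
console.log(deprecationNotice("scripts/sunrise.ts", "month2"));
console.log(deprecationNotice("scripts/sunrise.ts", "month3"));
```

Scripts that have no replacement yet stay silent in every phase, so the warning only appears once a migrated command actually exists.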
Alternative: Package-Based Approach
Instead of apps/devops, we could create packages/devops-cli:
Pros
- Lighter weight
- Can be imported by other packages
- Easier to version independently
Cons
- Not a deployable application
- Harder to bundle for distribution
- Less clear separation from library code
Recommendation: Stick with apps/devops for clearer separation and deployment options.
Technical Stack
CLI Framework
Recommendation: Commander.js
- Ships its own TypeScript type definitions
- Auto-completion support (via add-on packages, not built in)
- Subcommand organization
- Help text generation
Alternatives Considered
- Yargs: More complex API
- Oclif: Overkill for internal tools
- Custom: Reinventing the wheel
Shared Libraries
- @sd/foundation-firebase - Firebase utilities
- @sd/foundation-doppler - Doppler integration (new)
- @sd/foundation-logger - Structured logging (new)
- @sd/foundation-cli - CLI helpers (new)
Success Metrics
Developer Experience
- [ ] Time to discover a tool: < 30 seconds
- [ ] Time to run a command: < 5 seconds
- [ ] Command success rate: > 95%
Code Quality
- [ ] Test coverage: > 80%
- [ ] Type safety: 100% (no any)
- [ ] Documentation: Every command documented
Maintenance
- [ ] Reduce script duplication by 50%
- [ ] Centralize shared utilities
- [ ] Standardize error handling
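One way to standardize error handling is to run every command through a single wrapper that turns thrown errors into a uniform result with an exit code. The sketch below assumes synchronous commands for brevity; CommandResult and runCommand are hypothetical names, and the exit code of 1 is an arbitrary choice.

```typescript
// Uniform result shape for every command (hypothetical).
type CommandResult =
  | { ok: true; output: string }
  | { ok: false; error: string; exitCode: number };

// Runs a command body and converts any thrown value into a failed result.
function runCommand(name: string, fn: () => string): CommandResult {
  try {
    return { ok: true, output: fn() };
  } catch (err: unknown) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `[${name}] ${message}`, exitCode: 1 };
  }
}

const good = runCommand("verify finance", () => "all checks passed");
const bad = runCommand("verify inventory", () => {
  throw new Error("missing collection");
});
console.log(good, bad);
```

The CLI entry point can then print the error field and exit with exitCode in one place instead of each script handling failures its own way.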
Risks & Mitigation
Risk 1: Migration Disruption
Mitigation: Gradual migration with backward compatibility
Risk 2: Learning Curve
Mitigation: Clear documentation, examples, and training
Risk 3: Over-Engineering
Mitigation: Start simple, add complexity only when needed
Risk 4: Dependency Bloat
Mitigation: Careful dependency management, tree-shaking
Recommendation
✅ YES - Create apps/devops
Why?
- 180+ scripts is too many for a flat directory
- Clear separation between application and tooling
- Better DX with organized CLI
- Proper testing and type safety
- Future-proof for deployment and distribution
Next Steps
- Review and approve this plan
- Create apps/devops skeleton
- Migrate core dev tools (sunrise, ghostbuster, goodnight)
- Update root scripts to use new CLI
- Continue phased migration
Questions for Review
- Naming: Is apps/devops the right name? Alternatives: apps/cli, apps/tools, apps/ops
- Scope: Should this include ALL scripts or just TypeScript ones?
- Timeline: Is 8 weeks reasonable or should we accelerate/slow down?
- CLI Framework: Commander.js vs. alternatives?
- Backward Compatibility: How long should we maintain old scripts?