Batch System Integration & Environment Strategy

Date: 2026-01-18
Status: Architecture Decision Required
Context: Checkpoint 1 is complete; the integration and environment strategy still needs to be defined

Critical Question

Should we have:
1. One unified batch system spanning all environments (dev/stg/prod)?
2. Separate batch systems per environment?

Current Infrastructure Context

Existing Environments

Firebase Projects:

{
  "projects": {
    "default": "singular-dream",      // Production
    "dev": "singular-dream-dev",      // Development
    "stg": "singular-dream-stg"       // Staging (separate project)
  }
}

Environment Characteristics:
- Production (singular-dream): Live data, real users
- Staging (singular-dream-stg): Pre-production testing
- Development (singular-dream-dev): Active development

Existing Systems to Integrate

1. Firebase (Multi-environment)
- Firestore databases (separate per environment)
- Authentication (separate per environment)
- Storage (separate per environment)

2. Doppler (Environment-aware)
- Secrets management
- Environment-specific configurations
- Project: singular-dream with configs: dev, stg, prod

3. Elastic Muscle (Shared compute)
- GCP VM for heavy workloads
- Currently shared across environments
- Could run multiple workers

4. Redis (To be determined)
- Queue storage
- Environment separation TBD

Integration Analysis

What Batch Jobs Will Do

Development/Testing Jobs:
- Run test suites
- Build verification
- Schema migrations (test)
- Data seeding (test data)
- Lighthouse audits (dev sites)

Staging Jobs:
- Pre-production builds
- Integration testing
- Performance testing
- Schema migrations (staging)
- Data validation

Production Jobs:
- Scheduled reports
- Data backups
- Maintenance tasks
- Schema migrations (production)
- Analytics processing

Option 1: Unified Batch System

Architecture

Single batch infrastructure:

Elastic Muscle (Shared)
├── Redis (Single instance)
│   ├── queue:batch:dev:class-A
│   ├── queue:batch:dev:class-B
│   ├── queue:batch:stg:class-A
│   ├── queue:batch:prod:class-A
│   └── ...
├── Batch Worker (Environment-aware)
│   └── Executes jobs with environment context
└── API Server (Single)
    └── Routes jobs to environment-specific queues

Job submission:

await batch.submit({
  script: 'refactor-overnight',
  class: 'C',
  environment: 'dev',  // NEW: Environment parameter
});
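
A minimal sketch of how a submission like the one above could be mapped to an environment-specific queue key. The `Environment`, `JobClass`, and `SubmitRequest` types and the `resolveQueueKey` helper are hypothetical names introduced for illustration; the key layout follows the queue names in the architecture diagram, and the class list A to D is taken from the polling loop later in this document.

```typescript
// Hypothetical types; only the field names mirror the submit() call above.
type Environment = 'dev' | 'stg' | 'prod';
type JobClass = 'A' | 'B' | 'C' | 'D';

interface SubmitRequest {
  script: string;
  class: JobClass;
  environment: Environment;
}

const ENVIRONMENTS: readonly string[] = ['dev', 'stg', 'prod'];
const JOB_CLASSES: readonly string[] = ['A', 'B', 'C', 'D'];

// Validate at runtime as well, since requests may arrive as untyped JSON.
function resolveQueueKey(request: SubmitRequest): string {
  if (!ENVIRONMENTS.includes(request.environment)) {
    throw new Error(`Unknown environment: ${request.environment}`);
  }
  if (!JOB_CLASSES.includes(request.class)) {
    throw new Error(`Unknown job class: ${request.class}`);
  }
  return `queue:batch:${request.environment}:class-${request.class}`;
}
```

Rejecting unknown environments before a key is built means a typo such as `'production'` fails loudly instead of silently creating a new, unpolled queue.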

Pros ✅

1. Single source of truth
- One batch system to maintain
- Unified monitoring dashboard
- Consistent job definitions

2. Resource efficiency
- Shared Redis instance
- Shared worker pool
- Better resource utilization

3. Cross-environment workflows
- Deploy dev → stg → prod pipeline
- Promote jobs across environments
- Unified scheduling

4. Simpler operations
- One system to monitor
- One system to deploy
- One set of credentials

5. Cost efficiency
- Single Redis instance
- Shared compute resources
- No duplication

Cons ❌

1. Blast radius
- Bug in batch system affects all environments
- Production at risk from dev/stg issues

2. Security concerns
- Dev jobs could access prod queues (if misconfigured)
- Shared credentials risk
- Requires strict access control

3. Resource contention
- Dev jobs could starve prod jobs
- Need priority management across environments

4. Complexity
- Environment-aware job execution
- Environment-specific Doppler configs
- More complex routing logic

Option 2: Separated Batch Systems

Architecture

Per-environment infrastructure:

Development:
├── Redis (dev)
├── Batch Worker (dev)
└── API Server (dev)

Staging:
├── Redis (stg)
├── Batch Worker (stg)
└── API Server (stg)

Production:
├── Redis (prod)
├── Batch Worker (prod)
└── API Server (prod)

Job submission:

// Dev environment
await devBatch.submit({
  script: 'refactor-overnight',
  class: 'C',
});

// Prod environment
await prodBatch.submit({
  script: 'backup-database',
  class: 'A',
});

Pros ✅

1. Complete isolation
- Dev issues don't affect prod
- Separate failure domains
- Independent scaling

2. Security by separation
- No cross-environment access
- Separate credentials
- Clear boundaries

3. Environment-specific tuning
- Dev: Fast, loose
- Prod: Slow, careful
- Different worker configs

4. Simpler per-environment logic
- No environment parameter needed
- Clearer job execution context
- Less conditional logic

Cons ❌

1. Operational overhead
- 3x systems to maintain
- 3x monitoring dashboards
- 3x deployments

2. Resource waste
- 3x Redis instances
- Underutilized resources
- Higher costs

3. No cross-environment workflows
- Can't orchestrate dev → stg → prod
- Manual promotion required
- Harder to coordinate

4. Code duplication
- Same job definitions in 3 places
- Harder to keep in sync
- More maintenance burden

Recommendation: Hybrid Approach

Best of Both Worlds

Shared infrastructure, environment-aware execution:

Elastic Muscle
├── Redis (Single, namespace-separated)
│   ├── Namespace: dev
│   ├── Namespace: stg
│   └── Namespace: prod
├── Batch Worker Pool
│   ├── Workers tagged by environment
│   ├── Dev workers: 4 concurrent
│   ├── Stg workers: 2 concurrent
│   └── Prod workers: 2 concurrent (priority)
└── API Server (Environment-aware routing)
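
The per-environment concurrency limits in the diagram above could be enforced with a small slot tracker. This is a sketch under assumptions: `WORKER_CONCURRENCY` and `ConcurrencyTracker` are hypothetical names, and the numbers are simply the ones shown in the diagram, not tuned values.

```typescript
type Environment = 'dev' | 'stg' | 'prod';

// Limits copied from the diagram above; treat them as starting values.
const WORKER_CONCURRENCY: Record<Environment, number> = {
  dev: 4,
  stg: 2,
  prod: 2,
};

// Tracks running jobs per environment so no environment exceeds its slots.
class ConcurrencyTracker {
  private running: Record<Environment, number> = { dev: 0, stg: 0, prod: 0 };

  // Reserves a slot and returns true if the environment has capacity.
  tryAcquire(env: Environment): boolean {
    if (this.running[env] >= WORKER_CONCURRENCY[env]) return false;
    this.running[env] += 1;
    return true;
  }

  // Frees a slot when a job finishes, whether it succeeded or failed.
  release(env: Environment): void {
    if (this.running[env] > 0) this.running[env] -= 1;
  }
}
```

Because the slots are counted per environment, a burst of dev jobs can fill at most the four dev slots and never consumes the two reserved for prod.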

Implementation

1. Environment-Namespaced Queues

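// QueueConfig with environment
type Environment = 'dev' | 'stg' | 'prod';

interface QueueConfig {
  redis: RedisConfig;
  environment: Environment;
  queues: Record<'A' | 'B' | 'C' | 'D', string>;
}

// Queue names are values, not types, so they are derived when the config is built
function buildQueueConfig(redis: RedisConfig, environment: Environment): QueueConfig {
  const queue = (jobClass: string) => `queue:batch:${environment}:class-${jobClass}`;
  return {
    redis,
    environment,
    queues: { A: queue('A'), B: queue('B'), C: queue('C'), D: queue('D') },
  };
}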

2. Environment-Tagged Workers

// Worker with environment filter
class BatchWorker {
  constructor(
    // Parameter properties declare and assign the fields in one step
    private readonly queueConfig: QueueConfig,
    private readonly workerConfig: WorkerConfig,
    private readonly environment: 'dev' | 'stg' | 'prod'  // NEW
  ) {}

  // Only poll queues for this environment
  private async pollQueues() {
    const envQueues = this.getEnvironmentQueues();
    // ...
  }
}
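
The worker above calls `getEnvironmentQueues()` without showing it. One possible shape, written as a standalone function so it can be run on its own, is sketched below. The `isOwnQueue` guard is a hypothetical addition, and the class list A to D is assumed from the polling loop in the priority section.

```typescript
type Environment = 'dev' | 'stg' | 'prod';
type JobClass = 'A' | 'B' | 'C' | 'D';

const JOB_CLASSES: readonly JobClass[] = ['A', 'B', 'C', 'D'];

// Returns the queue keys a worker tagged with `environment` may poll.
function getEnvironmentQueues(environment: Environment): string[] {
  return JOB_CLASSES.map(
    (jobClass) => `queue:batch:${environment}:class-${jobClass}`,
  );
}

// Guard for use before processing: refuse keys outside the worker's environment.
function isOwnQueue(environment: Environment, queueKey: string): boolean {
  return getEnvironmentQueues(environment).includes(queueKey);
}
```

Checking `isOwnQueue` at dequeue time gives a second line of defense if a routing bug ever hands a worker a key from another environment.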

3. Environment-Specific Doppler Configs

// Executor with environment-aware Doppler
class JobExecutor {
  async execute(job: BatchJobRequest) {
    const dopplerConfig = this.getDopplerConfig(job.environment);
    const command = `doppler run --project singular-dream --config ${dopplerConfig} -- pnpm tsx scripts/${job.script}.ts`;
    // ...
  }

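  private getDopplerConfig(env: string): string {
    // Doppler config names match the environment names: 'dev', 'stg', or 'prod'
    if (!['dev', 'stg', 'prod'].includes(env)) {
      throw new Error(`Unknown environment: ${env}`);
    }
    return env;
  }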
}
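
The executor above interpolates `job.script` into a shell string, so a malformed script name could inject extra shell syntax. A hedged alternative is to build an argument array and validate the name first. `buildDopplerArgs` and `SAFE_SCRIPT_NAME` are hypothetical names, and the allowed character set is an assumption about how scripts are named.

```typescript
type Environment = 'dev' | 'stg' | 'prod';

interface BatchJobRequest {
  script: string;
  environment: Environment;
}

// Assumed naming rule: letters, digits, and hyphens only (no paths, no shell syntax).
const SAFE_SCRIPT_NAME = /^[a-z0-9][a-z0-9-]*$/i;

// Builds an argv array for the doppler CLI instead of a single shell string.
function buildDopplerArgs(job: BatchJobRequest): string[] {
  if (!SAFE_SCRIPT_NAME.test(job.script)) {
    throw new Error(`Invalid script name: ${job.script}`);
  }
  return [
    'run',
    '--project', 'singular-dream',
    '--config', job.environment,
    '--',
    'pnpm', 'tsx', `scripts/${job.script}.ts`,
  ];
}
```

The resulting array can be passed to `execFile('doppler', args)` from `node:child_process`, which runs the command without going through a shell.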

4. Priority Management

// Production jobs always have higher priority
const ENVIRONMENT_PRIORITY = {
  prod: 1,  // Highest
  stg: 2,
  dev: 3,   // Lowest
};

// Worker processes prod jobs first
async pollQueues() {
  for (const env of ['prod', 'stg', 'dev']) {
    for (const jobClass of ['A', 'B', 'C', 'D']) {
      const job = await this.dequeue(env, jobClass);
      if (job) {
        await this.processJob(job);
        return; // Restart the scan from prod so newly queued prod jobs go first
      }
    }
  }
}
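
The priority order can be checked without Redis by using an in-memory stand-in for the queues. This is only a sketch: `InMemoryQueues` and `QueuedJob` are hypothetical names, and a real implementation would pop from the Redis lists instead of arrays.

```typescript
type Environment = 'dev' | 'stg' | 'prod';
type JobClass = 'A' | 'B' | 'C' | 'D';

interface QueuedJob {
  id: string;
  environment: Environment;
  jobClass: JobClass;
}

const ENV_ORDER: readonly Environment[] = ['prod', 'stg', 'dev'];
const CLASS_ORDER: readonly JobClass[] = ['A', 'B', 'C', 'D'];

// In-memory stand-in for the Redis queues, keyed by "<env>:<class>".
class InMemoryQueues {
  private queues = new Map<string, QueuedJob[]>();

  enqueue(job: QueuedJob): void {
    const key = `${job.environment}:${job.jobClass}`;
    const queue = this.queues.get(key) ?? [];
    queue.push(job);
    this.queues.set(key, queue);
  }

  // Scans from the highest-priority queue on every call, so a waiting prod
  // job is always returned before any stg or dev job.
  dequeueNext(): QueuedJob | undefined {
    for (const env of ENV_ORDER) {
      for (const jobClass of CLASS_ORDER) {
        const job = this.queues.get(`${env}:${jobClass}`)?.shift();
        if (job) return job;
      }
    }
    return undefined;
  }
}
```

This ordering is non-preemptive: a dev job that is already running is not interrupted when a prod job arrives. The prod job is simply picked next.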

Hybrid Pros ✅

1. Isolation with efficiency
- Namespace separation in Redis
- Shared infrastructure
- Cost-effective

2. Security through tagging
- Workers only process their environment
- Doppler configs environment-specific
- Clear access boundaries

3. Priority management
- Prod jobs prioritized
- Dev jobs can't starve prod
- Fair resource allocation

4. Operational simplicity
- Single system to monitor
- Single deployment
- Unified dashboard (filtered by env)

5. Cross-environment workflows
- Can orchestrate across environments
- Promote jobs dev → stg → prod
- Unified scheduling

Hybrid Cons ❌

1. Moderate complexity
- Environment parameter required
- Worker tagging logic
- Priority management

2. Shared failure domain
- Redis failure affects all environments
- Worker crash affects all
- Requires robust error handling

Integration Points

1. Firebase Integration

Environment-aware:

// Job execution with correct Firebase project
const firebaseProject = {
  dev: 'singular-dream-dev',
  stg: 'singular-dream-stg',
  prod: 'singular-dream',
}[job.environment];

process.env.FIREBASE_PROJECT = firebaseProject;
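
Writing to `process.env` changes the value for every job running in the same worker process. If workers ever run jobs concurrently, a per-job environment object may be safer. The sketch below assumes a hypothetical `buildJobEnv` helper whose result would be passed as the `env` option when the job's child process is spawned.

```typescript
type Environment = 'dev' | 'stg' | 'prod';

// Project IDs copied from the Firebase project list earlier in this document.
const FIREBASE_PROJECTS: Record<Environment, string> = {
  dev: 'singular-dream-dev',
  stg: 'singular-dream-stg',
  prod: 'singular-dream',
};

// Returns a copy of the base environment with FIREBASE_PROJECT set for one job,
// leaving the object that was passed in untouched.
function buildJobEnv(
  environment: Environment,
  baseEnv: Record<string, string | undefined>,
): Record<string, string | undefined> {
  const project = FIREBASE_PROJECTS[environment];
  if (!project) {
    throw new Error(`No Firebase project configured for: ${environment}`);
  }
  return { ...baseEnv, FIREBASE_PROJECT: project };
}
```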

2. Doppler Integration

Already environment-aware:

doppler run --project singular-dream --config dev -- pnpm tsx script.ts
doppler run --project singular-dream --config stg -- pnpm tsx script.ts
doppler run --project singular-dream --config prod -- pnpm tsx script.ts

3. Elastic Muscle

Shared compute, environment-tagged workers:

# Start dev worker
BATCH_ENVIRONMENT=dev pnpm devops batch:worker

# Start stg worker
BATCH_ENVIRONMENT=stg pnpm devops batch:worker

# Start prod worker
BATCH_ENVIRONMENT=prod pnpm devops batch:worker

4. Redis

Namespace separation:

queue:batch:dev:class-A
queue:batch:dev:class-B
queue:batch:stg:class-A
queue:batch:stg:class-B
queue:batch:prod:class-A
queue:batch:prod:class-B
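
For monitoring or auditing it can help to recover the environment from a key. The sketch below assumes the `queue:batch:<env>:class-<X>` layout used in the QueueConfig section and classes A to D; `parseQueueKey` is a hypothetical helper, not part of the existing codebase.

```typescript
type Environment = 'dev' | 'stg' | 'prod';
type JobClass = 'A' | 'B' | 'C' | 'D';

interface ParsedQueueKey {
  environment: Environment;
  jobClass: JobClass;
}

// Assumed key layout: queue:batch:<env>:class-<X>
const QUEUE_KEY_PATTERN = /^queue:batch:(dev|stg|prod):class-([A-D])$/;

// Returns the environment and class encoded in a key, or null if it does not match.
function parseQueueKey(key: string): ParsedQueueKey | null {
  const match = QUEUE_KEY_PATTERN.exec(key);
  if (!match) return null;
  return {
    environment: match[1] as Environment,
    jobClass: match[2] as JobClass,
  };
}
```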

Security Considerations

Access Control

Environment-based permissions:

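// API server validates environment access
async function submitJob(req, res) {
  // `class` is a reserved word, so rename it while destructuring
  const { environment, script, class: jobClass } = req.body;

  // Check user has permission for this environment
  if (!await hasEnvironmentAccess(req.user, environment)) {
    return res.status(403).json({ error: 'Forbidden' });
  }

  // Submit to environment-specific queue
  const job = { environment, script, class: jobClass };
  await queueManager.enqueue(environment, job);
  return res.status(202).json({ status: 'queued' });
}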
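
The handler above relies on `hasEnvironmentAccess`, which is not defined in this document. One possible shape is a role-based lookup, sketched below. The `Role` names, the `User` type, and the role-to-environment grants are all hypothetical; the real rules would come from the project's auth system.

```typescript
type Environment = 'dev' | 'stg' | 'prod';
type Role = 'developer' | 'release-manager' | 'admin';

interface User {
  id: string;
  roles: Role[];
}

// Hypothetical grants: which environments each role may submit jobs to.
const ROLE_ENVIRONMENTS: Record<Role, readonly Environment[]> = {
  developer: ['dev'],
  'release-manager': ['dev', 'stg'],
  admin: ['dev', 'stg', 'prod'],
};

// Synchronous core check; unauthenticated requests get no access.
function canAccess(user: User | undefined, environment: Environment): boolean {
  if (!user) return false;
  return user.roles.some((role) => ROLE_ENVIRONMENTS[role].includes(environment));
}

// Async wrapper matching the `await hasEnvironmentAccess(...)` call in the handler.
async function hasEnvironmentAccess(
  user: User | undefined,
  environment: Environment,
): Promise<boolean> {
  return canAccess(user, environment);
}
```

Keeping the wrapper async leaves room to replace the static table with a lookup against an external permission store without changing the handler.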

Credential Isolation

Doppler provides environment isolation:
- Dev Doppler config → Dev Firebase, Dev APIs
- Prod Doppler config → Prod Firebase, Prod APIs
- No cross-environment credential leakage

Recommendation Summary

✅ Hybrid Approach (Recommended)

Rationale:
1. Cost-effective: Single infrastructure
2. Secure: Namespace + environment tagging
3. Flexible: Cross-environment workflows
4. Scalable: Priority management
5. Maintainable: Single codebase

Implementation:
- Single Redis with namespaces
- Environment-tagged workers
- Environment-aware job execution
- Doppler config per environment
- Priority: prod > stg > dev

Trade-offs:
- Moderate complexity (acceptable)
- Shared failure domain (mitigated by robust error handling)

Next Steps

1. Update batch types to include environment field
2. Modify QueueManager for namespace support
3. Update BatchWorker with environment filtering
4. Add environment validation to API server
5. Document environment strategy in ARCHITECTURE.md
6. Create deployment guide for multi-environment workers

Questions for User

1. Confirm hybrid approach? Or prefer full separation?
2. Redis hosting? Cloud Redis (managed) or self-hosted on Elastic Muscle?
3. Worker allocation? How many workers per environment?
4. Priority rules? Should prod always preempt dev/stg?
5. Cross-environment jobs? Allow jobs that span environments (e.g., promote dev → stg)?