Batch System Integration & Environment Strategy

Date: 2026-01-18
Status: Architecture Decision Required
Context: Checkpoint 1 complete, need to define integration and environment strategy


Critical Question

Should we have:

  1. One unified batch system spanning all environments (dev/stg/prod)?
  2. Separate batch systems per environment?


Current Infrastructure Context

Existing Environments

Firebase Projects:

{
  "projects": {
    "default": "singular-dream",      // Production
    "dev": "singular-dream-dev",      // Development
    "stg": "singular-dream-stg"       // Staging (separate)
  }
}

Environment Characteristics:

- Production (singular-dream): Live data, real users
- Staging (singular-dream-stg): Pre-production testing
- Development (singular-dream-dev): Active development

Existing Systems to Integrate

1. Firebase (Multi-environment)

- Firestore databases (separate per environment)
- Authentication (separate per environment)
- Storage (separate per environment)

2. Doppler (Environment-aware)

- Secrets management
- Environment-specific configurations
- Project: singular-dream with configs: dev, stg, prod

3. Elastic Muscle (Shared compute)

- GCP VM for heavy workloads
- Currently shared across environments
- Could run multiple workers

4. Redis (To be determined)

- Queue storage
- Environment separation TBD


Integration Analysis

What Batch Jobs Will Do

Development/Testing Jobs:

- Run test suites
- Build verification
- Schema migrations (test)
- Data seeding (test data)
- Lighthouse audits (dev sites)

Staging Jobs:

- Pre-production builds
- Integration testing
- Performance testing
- Schema migrations (staging)
- Data validation

Production Jobs:

- Scheduled reports
- Data backups
- Maintenance tasks
- Schema migrations (production)
- Analytics processing


Option 1: Unified Batch System

Architecture

Single batch infrastructure:

Elastic Muscle (Shared)
├── Redis (Single instance)
│   ├── queue:batch:dev:class-A
│   ├── queue:batch:dev:class-B
│   ├── queue:batch:stg:class-A
│   ├── queue:batch:prod:class-A
│   └── ...
├── Batch Worker (Environment-aware)
│   └── Executes jobs with environment context
└── API Server (Single)
    └── Routes jobs to environment-specific queues

Job submission:

await batch.submit({
  script: 'refactor-overnight',
  class: 'C',
  environment: 'dev',  // NEW: Environment parameter
});
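A minimal sketch of how the new `environment` parameter might be typed and validated at submission time. `BatchJobRequest` is named later in this document, but the exact field set shown here and the `parseJobRequest` helper are assumptions for illustration, not the existing implementation.

```typescript
// Hypothetical request shape: only `script`, `class`, and `environment`
// are taken from the submission example above.
type Environment = 'dev' | 'stg' | 'prod';
type JobClass = 'A' | 'B' | 'C' | 'D';

interface BatchJobRequest {
  script: string;
  class: JobClass;
  environment: Environment;
}

const ENVIRONMENTS: readonly string[] = ['dev', 'stg', 'prod'];
const JOB_CLASSES: readonly string[] = ['A', 'B', 'C', 'D'];

// Rejects a missing or unknown environment instead of defaulting,
// so a typo cannot silently route a job to another environment.
function parseJobRequest(input: unknown): BatchJobRequest {
  if (typeof input !== 'object' || input === null) {
    throw new Error('Job request must be an object');
  }
  const raw = input as Record<string, unknown>;
  if (typeof raw.script !== 'string' || raw.script.length === 0) {
    throw new Error('Job request needs a non-empty script name');
  }
  if (typeof raw.class !== 'string' || !JOB_CLASSES.includes(raw.class)) {
    throw new Error(`Unknown job class: ${String(raw.class)}`);
  }
  if (typeof raw.environment !== 'string' || !ENVIRONMENTS.includes(raw.environment)) {
    throw new Error(`Unknown environment: ${String(raw.environment)}`);
  }
  return {
    script: raw.script,
    class: raw.class as JobClass,
    environment: raw.environment as Environment,
  };
}
```

Failing closed on an unknown environment is a deliberate choice in this sketch: in a unified system, a silent default is the easiest way for a dev job to land in a prod queue.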

Pros ✅

  1. Single source of truth
     - One batch system to maintain
     - Unified monitoring dashboard
     - Consistent job definitions
  2. Resource efficiency
     - Shared Redis instance
     - Shared worker pool
     - Better resource utilization
  3. Cross-environment workflows
     - Deploy dev → stg → prod pipeline
     - Promote jobs across environments
     - Unified scheduling
  4. Simpler operations
     - One system to monitor
     - One system to deploy
     - One set of credentials
  5. Cost efficiency
     - Single Redis instance
     - Shared compute resources
     - No duplication

Cons ❌

  1. Blast radius
     - Bug in batch system affects all environments
     - Production at risk from dev/stg issues
  2. Security concerns
     - Dev jobs could access prod queues (if misconfigured)
     - Shared credentials risk
     - Requires strict access control
  3. Resource contention
     - Dev jobs could starve prod jobs
     - Need priority management across environments
  4. Complexity
     - Environment-aware job execution
     - Environment-specific Doppler configs
     - More complex routing logic

Option 2: Separated Batch Systems

Architecture

Per-environment infrastructure:

Development:
├── Redis (dev)
├── Batch Worker (dev)
└── API Server (dev)

Staging:
├── Redis (stg)
├── Batch Worker (stg)
└── API Server (stg)

Production:
├── Redis (prod)
├── Batch Worker (prod)
└── API Server (prod)

Job submission:

// Dev environment
await devBatch.submit({
  script: 'refactor-overnight',
  class: 'C',
});

// Prod environment
await prodBatch.submit({
  script: 'backup-database',
  class: 'A',
});

Pros ✅

  1. Complete isolation
     - Dev issues don't affect prod
     - Separate failure domains
     - Independent scaling
  2. Security by separation
     - No cross-environment access
     - Separate credentials
     - Clear boundaries
  3. Environment-specific tuning
     - Dev: Fast, loose
     - Prod: Slow, careful
     - Different worker configs
  4. Simpler per-environment logic
     - No environment parameter needed
     - Clearer job execution context
     - Less conditional logic

Cons ❌

  1. Operational overhead
     - 3x systems to maintain
     - 3x monitoring dashboards
     - 3x deployments
  2. Resource waste
     - 3x Redis instances
     - Underutilized resources
     - Higher costs
  3. No cross-environment workflows
     - Can't orchestrate dev → stg → prod
     - Manual promotion required
     - Harder to coordinate
  4. Code duplication
     - Same job definitions in 3 places
     - Harder to keep in sync
     - More maintenance burden

Recommendation: Hybrid Approach

Best of Both Worlds

Shared infrastructure, environment-aware execution:

Elastic Muscle
├── Redis (Single, namespace-separated)
│   ├── Namespace: dev
│   ├── Namespace: stg
│   └── Namespace: prod
├── Batch Worker Pool
│   ├── Workers tagged by environment
│   ├── Dev workers: 4 concurrent
│   ├── Stg workers: 2 concurrent
│   └── Prod workers: 2 concurrent (priority)
└── API Server (Environment-aware routing)
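The per-environment limits in the diagram above (dev 4, stg 2, prod 2) could be enforced with a small slot counter. This is a sketch only: `EnvironmentSlots` and `WORKER_CONCURRENCY` are hypothetical names, and the real worker pool may track concurrency differently.

```typescript
type Environment = 'dev' | 'stg' | 'prod';

// Concurrency limits taken from the worker pool diagram above.
const WORKER_CONCURRENCY: Record<Environment, number> = {
  dev: 4,
  stg: 2,
  prod: 2,
};

// Tracks running jobs per environment. Slots are never shared, so a
// burst of dev jobs cannot consume capacity reserved for prod.
class EnvironmentSlots {
  private readonly running: Record<Environment, number> = { dev: 0, stg: 0, prod: 0 };

  // Claims a slot and returns true, or returns false if the environment is full.
  tryAcquire(environment: Environment): boolean {
    if (this.running[environment] >= WORKER_CONCURRENCY[environment]) {
      return false;
    }
    this.running[environment] += 1;
    return true;
  }

  // Frees a slot when a job finishes; never drops below zero.
  release(environment: Environment): void {
    if (this.running[environment] > 0) {
      this.running[environment] -= 1;
    }
  }

  available(environment: Environment): number {
    return WORKER_CONCURRENCY[environment] - this.running[environment];
  }
}
```

Reserving slots per environment trades some utilization (idle prod slots are not lent to dev) for a guarantee that prod capacity is always available.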

Implementation

1. Environment-Namespaced Queues

// QueueConfig with environment
type Environment = 'dev' | 'stg' | 'prod';

interface QueueConfig {
  redis: RedisConfig;
  environment: Environment;
  // Queue names follow the pattern queue:batch:<environment>:class-<X>
  queues: {
    A: string;  // e.g. queue:batch:dev:class-A
    B: string;  // e.g. queue:batch:dev:class-B
    // ...
  };
}
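Queue names under this scheme can be derived from the environment rather than written out by hand. A minimal sketch, assuming job classes A through D as used in the priority loop later in this document; `queueName` and `environmentQueues` are hypothetical helper names.

```typescript
type Environment = 'dev' | 'stg' | 'prod';
type JobClass = 'A' | 'B' | 'C' | 'D';

const JOB_CLASSES: readonly JobClass[] = ['A', 'B', 'C', 'D'];

// Builds a key in the queue:batch:<environment>:class-<X> scheme.
function queueName(environment: Environment, jobClass: JobClass): string {
  return `queue:batch:${environment}:class-${jobClass}`;
}

// All queue names that a worker tagged with `environment` would poll.
function environmentQueues(environment: Environment): Record<JobClass, string> {
  const queues = {} as Record<JobClass, string>;
  for (const jobClass of JOB_CLASSES) {
    queues[jobClass] = queueName(environment, jobClass);
  }
  return queues;
}
```

Keeping the key format in one function means the API server and the workers cannot drift apart on naming.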

2. Environment-Tagged Workers

// Worker with environment filter
class BatchWorker {
  private readonly environment: 'dev' | 'stg' | 'prod';

  constructor(
    queueConfig: QueueConfig,
    workerConfig: WorkerConfig,
    environment: 'dev' | 'stg' | 'prod'  // NEW
  ) {
    this.environment = environment;
  }

  // Only poll queues for this environment
  private async pollQueues() {
    const envQueues = this.getEnvironmentQueues();
    // ...
  }
}

3. Environment-Specific Doppler Configs

// Executor with environment-aware Doppler
class JobExecutor {
  async execute(job: BatchJobRequest) {
    const dopplerConfig = this.getDopplerConfig(job.environment);
    const command = `doppler run --project singular-dream --config ${dopplerConfig} -- pnpm tsx scripts/${job.script}.ts`;
    // ...
  }

  private getDopplerConfig(env: string): string {
    return env; // 'dev', 'stg', or 'prod'
  }
}
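Because the command above is assembled by string interpolation, a malformed script name could inject shell syntax. One possible hardening, sketched here under the assumption that script names are simple kebab-case identifiers, is to validate the name and build an argument array that can be passed to a process spawner without a shell. `buildDopplerInvocation` and `SAFE_SCRIPT_NAME` are hypothetical.

```typescript
type Environment = 'dev' | 'stg' | 'prod';

// Assumption: scripts live under scripts/ and use lowercase kebab-case names.
const SAFE_SCRIPT_NAME = /^[a-z0-9][a-z0-9-]*$/;

interface Invocation {
  command: string;
  args: string[];
}

// Builds the same `doppler run` call as the executor above, but as an
// argument array so no part of the job request is parsed by a shell.
function buildDopplerInvocation(environment: Environment, script: string): Invocation {
  if (!SAFE_SCRIPT_NAME.test(script)) {
    throw new Error(`Refusing to run script with unsafe name: ${script}`);
  }
  return {
    command: 'doppler',
    args: [
      'run',
      '--project', 'singular-dream',
      '--config', environment,
      '--',
      'pnpm', 'tsx', `scripts/${script}.ts`,
    ],
  };
}
```

The result can be handed to `spawn(command, args)` from `node:child_process`, which does not invoke a shell unless asked to.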

4. Priority Management

// Production jobs always have higher priority
const ENVIRONMENT_PRIORITY = {
  prod: 1,  // Highest
  stg: 2,
  dev: 3,   // Lowest
};

// Worker processes prod jobs first
async pollQueues() {
  for (const env of ['prod', 'stg', 'dev']) {
    for (const jobClass of ['A', 'B', 'C', 'D']) {
      const job = await this.dequeue(env, jobClass);
      if (job) {
        await this.processJob(job);
        return;  // Restart from prod on the next poll so prod is always checked first
      }
    }
  }
}
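The ordering can be checked in isolation with an in-memory stand-in for Redis. This is a sketch under stated assumptions: `InMemoryQueues` and the `Job` shape are hypothetical, and class A is treated as more urgent than class D only because that is the loop order used above.

```typescript
type Environment = 'dev' | 'stg' | 'prod';
type JobClass = 'A' | 'B' | 'C' | 'D';

interface Job {
  id: string;
  environment: Environment;
  class: JobClass;
}

const ENV_ORDER: readonly Environment[] = ['prod', 'stg', 'dev'];
const CLASS_ORDER: readonly JobClass[] = ['A', 'B', 'C', 'D'];

// In-memory stand-in for the Redis lists, keyed by queue name.
class InMemoryQueues {
  private readonly lists = new Map<string, Job[]>();

  private key(environment: Environment, jobClass: JobClass): string {
    return `queue:batch:${environment}:class-${jobClass}`;
  }

  enqueue(job: Job): void {
    const key = this.key(job.environment, job.class);
    const list = this.lists.get(key) ?? [];
    list.push(job);
    this.lists.set(key, list);
  }

  // Scans prod, then stg, then dev (class A to D within each) and returns
  // the first job found, or undefined when every queue is empty.
  dequeueNext(): Job | undefined {
    for (const environment of ENV_ORDER) {
      for (const jobClass of CLASS_ORDER) {
        const job = this.lists.get(this.key(environment, jobClass))?.shift();
        if (job) return job;
      }
    }
    return undefined;
  }
}
```

Strict priority of this kind means dev jobs wait for as long as prod queues are non-empty, which is worth keeping in mind when answering the preemption question at the end of this document.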

Hybrid Pros ✅

  1. Isolation with efficiency
     - Namespace separation in Redis
     - Shared infrastructure
     - Cost-effective
  2. Security through tagging
     - Workers only process their environment
     - Doppler configs environment-specific
     - Clear access boundaries
  3. Priority management
     - Prod jobs prioritized
     - Dev jobs can't starve prod
     - Fair resource allocation
  4. Operational simplicity
     - Single system to monitor
     - Single deployment
     - Unified dashboard (filtered by env)
  5. Cross-environment workflows
     - Can orchestrate across environments
     - Promote jobs dev → stg → prod
     - Unified scheduling

Hybrid Cons ❌

  1. Moderate complexity
     - Environment parameter required
     - Worker tagging logic
     - Priority management
  2. Shared failure domain
     - Redis failure affects all environments
     - Worker crash affects all
     - Requires robust error handling

Integration Points

1. Firebase Integration

Environment-aware:

// Job execution with correct Firebase project
const firebaseProject = {
  dev: 'singular-dream-dev',
  stg: 'singular-dream-stg',
  prod: 'singular-dream',
}[job.environment];

process.env.FIREBASE_PROJECT = firebaseProject;
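Assigning to `process.env` changes the variable for the whole worker process, and the object lookup yields `undefined` for an unrecognized environment. A possible alternative, sketched here with hypothetical helper names, is to validate the environment and build a per-job copy of the environment variables to pass to the child process.

```typescript
type Environment = 'dev' | 'stg' | 'prod';

// Project IDs as listed in the Firebase configuration in this document.
const FIREBASE_PROJECTS: Record<Environment, string> = {
  dev: 'singular-dream-dev',
  stg: 'singular-dream-stg',
  prod: 'singular-dream',
};

// Own-property check so inherited keys such as "toString" are rejected.
function isEnvironment(value: string): value is Environment {
  return Object.prototype.hasOwnProperty.call(FIREBASE_PROJECTS, value);
}

// Returns a copy of `baseEnv` with FIREBASE_PROJECT set for one job.
// The input object is left unchanged.
function jobProcessEnv(
  environment: string,
  baseEnv: Record<string, string | undefined>,
): Record<string, string | undefined> {
  if (!isEnvironment(environment)) {
    throw new Error(`Unknown environment: ${environment}`);
  }
  return { ...baseEnv, FIREBASE_PROJECT: FIREBASE_PROJECTS[environment] };
}
```

A worker would call something like `jobProcessEnv(job.environment, process.env)` and pass the result as the `env` option when spawning the job.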

2. Doppler Integration

Already environment-aware:

doppler run --project singular-dream --config dev -- pnpm tsx script.ts
doppler run --project singular-dream --config stg -- pnpm tsx script.ts
doppler run --project singular-dream --config prod -- pnpm tsx script.ts

3. Elastic Muscle

Shared compute, environment-tagged workers:

# Start dev worker
BATCH_ENVIRONMENT=dev pnpm devops batch:worker

# Start stg worker
BATCH_ENVIRONMENT=stg pnpm devops batch:worker

# Start prod worker
BATCH_ENVIRONMENT=prod pnpm devops batch:worker

4. Redis

Namespace separation:

queue:batch:dev:class-A
queue:batch:dev:class-B
queue:batch:stg:class-A
queue:batch:stg:class-B
queue:batch:prod:class-A
queue:batch:prod:class-B


Security Considerations

Access Control

Environment-based permissions:

// API server validates environment access
async submitJob(req, res) {
  // `class` is a reserved word, so it is renamed while destructuring
  const { environment, script, class: jobClass } = req.body;

  // Check user has permission for this environment
  if (!await hasEnvironmentAccess(req.user, environment)) {
    return res.status(403).json({ error: 'Forbidden' });
  }

  // Submit to environment-specific queue
  const job = { script, class: jobClass, environment };
  await queueManager.enqueue(environment, job);
}
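The handler above relies on `hasEnvironmentAccess`, which this document does not define. One possible shape is a deny-by-default check against a per-user allow list. The `User` type and its `allowedEnvironments` field are assumptions made for this sketch, not an existing schema.

```typescript
type Environment = 'dev' | 'stg' | 'prod';

// Hypothetical user shape: the environments this user may submit jobs to.
interface User {
  id: string;
  allowedEnvironments: readonly Environment[];
}

// Deny by default: unauthenticated requests and environments that are
// not explicitly granted are both refused.
function hasEnvironmentAccess(user: User | undefined, environment: Environment): boolean {
  if (!user) {
    return false;
  }
  return user.allowedEnvironments.includes(environment);
}
```

The function is synchronous here, which still works with the `await` in the handler. A real implementation would more likely look permissions up asynchronously.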

Credential Isolation

Doppler provides environment isolation:

- Dev Doppler config → Dev Firebase, Dev APIs
- Prod Doppler config → Prod Firebase, Prod APIs
- No cross-environment credential leakage


Recommendation Summary

Recommended: Hybrid approach (shared infrastructure, environment-aware execution).

Rationale:

  1. Cost-effective: Single infrastructure
  2. Secure: Namespace + environment tagging
  3. Flexible: Cross-environment workflows
  4. Scalable: Priority management
  5. Maintainable: Single codebase

Implementation:

- Single Redis with namespaces
- Environment-tagged workers
- Environment-aware job execution
- Doppler config per environment
- Priority: prod > stg > dev

Trade-offs:

- Moderate complexity (acceptable)
- Shared failure domain (mitigated by robust error handling)


Next Steps

  1. Update batch types to include environment field
  2. Modify QueueManager for namespace support
  3. Update BatchWorker with environment filtering
  4. Add environment validation to API server
  5. Document environment strategy in ARCHITECTURE.md
  6. Create deployment guide for multi-environment workers

Questions for User

  1. Confirm hybrid approach? Or prefer full separation?
  2. Redis hosting? Cloud Redis (managed) or self-hosted on Elastic Muscle?
  3. Worker allocation? How many workers per environment?
  4. Priority rules? Should prod always preempt dev/stg?
  5. Cross-environment jobs? Allow jobs that span environments (e.g., promote dev → stg)?