Batch System Integration & Environment Strategy
Date: 2026-01-18
Status: Architecture Decision Required
Context: Checkpoint 1 is complete; the integration and environment strategy now needs to be defined
Critical Question
Should we have:
1. One unified batch system spanning all environments (dev/stg/prod)?
2. Separate batch systems per environment?
Current Infrastructure Context
Existing Environments
Firebase Projects:
{
"projects": {
"default": "singular-dream", // Production
"dev": "singular-dream-dev", // Development/Staging
"stg": "singular-dream-stg" // Staging (separate)
}
}
Environment Characteristics:
- Production (singular-dream): Live data, real users
- Staging (singular-dream-stg): Pre-production testing
- Development (singular-dream-dev): Active development
Existing Systems to Integrate
1. Firebase (Multi-environment)
- Firestore databases (separate per environment)
- Authentication (separate per environment)
- Storage (separate per environment)
2. Doppler (Environment-aware)
- Secrets management
- Environment-specific configurations
- Project: singular-dream with configs: dev, stg, prod
3. Elastic Muscle (Shared compute)
- GCP VM for heavy workloads
- Currently shared across environments
- Could run multiple workers
4. Redis (To be determined)
- Queue storage
- Environment separation TBD
Integration Analysis
What Batch Jobs Will Do
Development/Testing Jobs:
- Run test suites
- Build verification
- Schema migrations (test)
- Data seeding (test data)
- Lighthouse audits (dev sites)
Staging Jobs:
- Pre-production builds
- Integration testing
- Performance testing
- Schema migrations (staging)
- Data validation
Production Jobs:
- Scheduled reports
- Data backups
- Maintenance tasks
- Schema migrations (production)
- Analytics processing
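To keep these catalogs from drifting across environments, the submission path could validate scripts against a per-environment allow-list. The sketch below is illustrative only; the script names are placeholders, not the final job inventory:
// Hypothetical per-environment allow-list; script names are placeholders.
type Environment = 'dev' | 'stg' | 'prod';
const ALLOWED_SCRIPTS: Record<Environment, string[]> = {
  dev: ['run-tests', 'verify-build', 'seed-test-data', 'lighthouse-audit'],
  stg: ['build-preprod', 'run-integration-tests', 'validate-data'],
  prod: ['scheduled-reports', 'backup-database', 'process-analytics'],
};
// Reject submissions whose script is not registered for the target environment.
function isScriptAllowed(env: Environment, script: string): boolean {
  return ALLOWED_SCRIPTS[env].includes(script);
}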
Option 1: Unified Batch System
Architecture
Single batch infrastructure:
Elastic Muscle (Shared)
├── Redis (Single instance)
│ ├── queue:batch:dev:class-A
│ ├── queue:batch:dev:class-B
│ ├── queue:batch:stg:class-A
│ ├── queue:batch:prod:class-A
│ └── ...
├── Batch Worker (Environment-aware)
│ └── Executes jobs with environment context
└── API Server (Single)
└── Routes jobs to environment-specific queues
Job submission:
await batch.submit({
script: 'refactor-overnight',
class: 'C',
environment: 'dev', // NEW: Environment parameter
});
Pros ✅
- Single source of truth
  - One batch system to maintain
  - Unified monitoring dashboard
  - Consistent job definitions
- Resource efficiency
  - Shared Redis instance
  - Shared worker pool
  - Better resource utilization
- Cross-environment workflows
  - Deploy dev → stg → prod pipeline
  - Promote jobs across environments
  - Unified scheduling
- Simpler operations
  - One system to monitor
  - One system to deploy
  - One set of credentials
- Cost efficiency
  - Single Redis instance
  - Shared compute resources
  - No duplication
Cons ❌
- Blast radius
  - Bug in batch system affects all environments
  - Production at risk from dev/stg issues
- Security concerns
  - Dev jobs could access prod queues (if misconfigured)
  - Shared credentials risk
  - Requires strict access control
- Resource contention
  - Dev jobs could starve prod jobs
  - Need priority management across environments
- Complexity
  - Environment-aware job execution
  - Environment-specific Doppler configs
  - More complex routing logic
Option 2: Separated Batch Systems
Architecture
Per-environment infrastructure:
Development:
├── Redis (dev)
├── Batch Worker (dev)
└── API Server (dev)
Staging:
├── Redis (stg)
├── Batch Worker (stg)
└── API Server (stg)
Production:
├── Redis (prod)
├── Batch Worker (prod)
└── API Server (prod)
Job submission:
// Dev environment
await devBatch.submit({
script: 'refactor-overnight',
class: 'C',
});
// Prod environment
await prodBatch.submit({
script: 'backup-database',
class: 'A',
});
Pros ✅
- Complete isolation
  - Dev issues don't affect prod
  - Separate failure domains
  - Independent scaling
- Security by separation
  - No cross-environment access
  - Separate credentials
  - Clear boundaries
- Environment-specific tuning
  - Dev: fast, loose
  - Prod: slow, careful
  - Different worker configs
- Simpler per-environment logic
  - No environment parameter needed
  - Clearer job execution context
  - Less conditional logic
Cons ❌
- Operational overhead
  - 3x systems to maintain
  - 3x monitoring dashboards
  - 3x deployments
- Resource waste
  - 3x Redis instances
  - Underutilized resources
  - Higher costs
- No cross-environment workflows
  - Can't orchestrate dev → stg → prod
  - Manual promotion required
  - Harder to coordinate
- Code duplication
  - Same job definitions in 3 places
  - Harder to keep in sync
  - More maintenance burden
Recommendation: Hybrid Approach
Best of Both Worlds
Shared infrastructure, environment-aware execution:
Elastic Muscle
├── Redis (Single, namespace-separated)
│ ├── Namespace: dev
│ ├── Namespace: stg
│ └── Namespace: prod
├── Batch Worker Pool
│ ├── Workers tagged by environment
│ ├── Dev workers: 4 concurrent
│ ├── Stg workers: 2 concurrent
│ └── Prod workers: 2 concurrent (priority)
└── API Server (Environment-aware routing)
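One way to encode this allocation is a small per-environment pool config consumed at worker startup; the shape below is a sketch, with concurrency numbers simply mirroring the diagram above:
// Sketch: per-environment worker pool sizing (numbers mirror the diagram).
type Environment = 'dev' | 'stg' | 'prod';
interface WorkerPoolConfig {
  environment: Environment;
  concurrency: number; // max jobs processed in parallel for this environment
  priority: number;    // lower value = dequeued first
}
const WORKER_POOLS: WorkerPoolConfig[] = [
  { environment: 'prod', concurrency: 2, priority: 1 },
  { environment: 'stg', concurrency: 2, priority: 2 },
  { environment: 'dev', concurrency: 4, priority: 3 },
];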
Implementation
1. Environment-Namespaced Queues
// QueueConfig with environment
type Environment = 'dev' | 'stg' | 'prod';

interface QueueConfig {
  redis: RedisConfig;
  environment: Environment;
  queues: {
    A: string; // e.g. `queue:batch:${environment}:class-A`
    B: string; // e.g. `queue:batch:${environment}:class-B`
    // ...
  };
}
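At runtime, a small helper can derive the namespaced keys from the environment; this sketch assumes the queue:batch:<env>:class-<X> naming shown above:
// Sketch: build environment-namespaced queue keys at runtime.
type Env = 'dev' | 'stg' | 'prod';
type JobClass = 'A' | 'B' | 'C' | 'D';
function buildQueueKeys(environment: Env): Record<JobClass, string> {
  return {
    A: `queue:batch:${environment}:class-A`,
    B: `queue:batch:${environment}:class-B`,
    C: `queue:batch:${environment}:class-C`,
    D: `queue:batch:${environment}:class-D`,
  };
}
// buildQueueKeys('dev').A === 'queue:batch:dev:class-A'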
2. Environment-Tagged Workers
// Worker with environment filter
class BatchWorker {
  private readonly environment: 'dev' | 'stg' | 'prod';

  constructor(
    private readonly queueConfig: QueueConfig,
    private readonly workerConfig: WorkerConfig,
    environment: 'dev' | 'stg' | 'prod' // NEW
  ) {
    this.environment = environment;
  }

  // Only poll queues for this environment
  private async pollQueues() {
    const envQueues = this.getEnvironmentQueues();
    // ...
  }
}
3. Environment-Specific Doppler Configs
// Executor with environment-aware Doppler
class JobExecutor {
async execute(job: BatchJobRequest) {
const dopplerConfig = this.getDopplerConfig(job.environment);
const command = `doppler run --project singular-dream --config ${dopplerConfig} -- pnpm tsx scripts/${job.script}.ts`;
// ...
}
private getDopplerConfig(env: string): string {
return env; // 'dev', 'stg', or 'prod'
}
}
4. Priority Management
// Production jobs always have higher priority
const ENVIRONMENT_PRIORITY = {
prod: 1, // Highest
stg: 2,
dev: 3, // Lowest
};
// Worker scans queues in ENVIRONMENT_PRIORITY order and stops at the first
// job found, so dev/stg jobs only run when higher-priority queues are empty
async pollQueues() {
  for (const env of ['prod', 'stg', 'dev']) {
    for (const jobClass of ['A', 'B', 'C', 'D']) {
      const job = await this.dequeue(env, jobClass);
      if (job) {
        await this.processJob(job);
        return;
      }
    }
  }
}
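If plain Redis lists back the queues, the same ordering can be pushed down to Redis itself: BRPOP with multiple keys pops from the first non-empty list in the order the keys are given. A sketch assuming ioredis (the client library is not yet decided), shown here for the class-A queues:
import Redis from 'ioredis';
// Sketch: BRPOP services keys in the order listed, so putting prod first
// gives production priority without application-side sorting.
const redis = new Redis();
async function dequeueClassA(): Promise<{ queue: string; payload: string } | null> {
  // Block for up to 5 seconds waiting for a class-A job in any environment.
  const result = await redis.brpop(
    'queue:batch:prod:class-A',
    'queue:batch:stg:class-A',
    'queue:batch:dev:class-A',
    5
  );
  return result ? { queue: result[0], payload: result[1] } : null;
}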
Hybrid Pros ✅
- Isolation with efficiency
  - Namespace separation in Redis
  - Shared infrastructure
  - Cost-effective
- Security through tagging
  - Workers only process their environment
  - Doppler configs environment-specific
  - Clear access boundaries
- Priority management
  - Prod jobs prioritized
  - Dev jobs can't starve prod
  - Fair resource allocation
- Operational simplicity
  - Single system to monitor
  - Single deployment
  - Unified dashboard (filtered by env)
- Cross-environment workflows
  - Can orchestrate across environments
  - Promote jobs dev → stg → prod
  - Unified scheduling
Hybrid Cons ❌
- Moderate complexity
  - Environment parameter required
  - Worker tagging logic
  - Priority management
- Shared failure domain
  - Redis failure affects all environments
  - Worker crash affects all
  - Requires robust error handling
Integration Points
1. Firebase Integration
Environment-aware:
// Job execution with correct Firebase project
const firebaseProject = {
dev: 'singular-dream-dev',
stg: 'singular-dream-stg',
prod: 'singular-dream',
}[job.environment];
process.env.FIREBASE_PROJECT = firebaseProject;
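If jobs use the Admin SDK rather than only the environment variable, the same mapping can feed client initialization. A minimal sketch, assuming firebase-admin with application-default credentials supplied via Doppler or the VM's service account:
import * as admin from 'firebase-admin';
// Sketch: point the Admin SDK at the environment's Firebase project.
function initFirebaseForEnvironment(environment: 'dev' | 'stg' | 'prod') {
  const projectId = {
    dev: 'singular-dream-dev',
    stg: 'singular-dream-stg',
    prod: 'singular-dream',
  }[environment];
  return admin.initializeApp({
    credential: admin.credential.applicationDefault(),
    projectId,
  });
}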
2. Doppler Integration
Already environment-aware:
doppler run --project singular-dream --config dev -- script.ts
doppler run --project singular-dream --config stg -- script.ts
doppler run --project singular-dream --config prod -- script.ts
3. Elastic Muscle
Shared compute, environment-tagged workers:
# Start dev worker
BATCH_ENVIRONMENT=dev pnpm devops batch:worker
# Start stg worker
BATCH_ENVIRONMENT=stg pnpm devops batch:worker
# Start prod worker
BATCH_ENVIRONMENT=prod pnpm devops batch:worker
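On startup, the worker would resolve BATCH_ENVIRONMENT and refuse to run with a missing or invalid value; a sketch (only the variable name comes from the commands above, the rest is illustrative):
// Sketch: resolve the worker's environment tag from BATCH_ENVIRONMENT.
type Environment = 'dev' | 'stg' | 'prod';
function resolveEnvironment(): Environment {
  const env = process.env.BATCH_ENVIRONMENT;
  if (env === 'dev' || env === 'stg' || env === 'prod') {
    return env;
  }
  // Fail fast rather than defaulting: a mis-tagged worker could drain
  // another environment's queues.
  throw new Error(`Invalid or missing BATCH_ENVIRONMENT: ${env ?? '(unset)'}`);
}
// const worker = new BatchWorker(queueConfig, workerConfig, resolveEnvironment());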
4. Redis
Namespace separation:
queue:batch:dev:class-A
queue:batch:dev:class-B
queue:batch:stg:class-A
queue:batch:stg:class-B
queue:batch:prod:class-A
queue:batch:prod:class-B
Security Considerations
Access Control
Environment-based permissions:
// API server validates environment access
async submitJob(req, res) {
  // `class` is a reserved word, so rename it during destructuring
  const { environment, script, class: jobClass } = req.body;

  // Check user has permission for this environment
  if (!(await hasEnvironmentAccess(req.user, environment))) {
    return res.status(403).json({ error: 'Forbidden' });
  }

  // Submit to environment-specific queue
  await queueManager.enqueue(environment, { script, class: jobClass });
}
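hasEnvironmentAccess above is not yet defined; one hypothetical shape is a role-to-environment map checked against the authenticated user (role names and the user shape are placeholders):
// Hypothetical: which environments each role may submit jobs to.
type Environment = 'dev' | 'stg' | 'prod';
const ROLE_ENVIRONMENTS: Record<string, Environment[]> = {
  developer: ['dev'],
  releaseManager: ['dev', 'stg'],
  operator: ['dev', 'stg', 'prod'],
};
async function hasEnvironmentAccess(
  user: { role?: string },
  environment: Environment
): Promise<boolean> {
  return (ROLE_ENVIRONMENTS[user.role ?? ''] ?? []).includes(environment);
}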
Credential Isolation
Doppler provides environment isolation:
- Dev Doppler config → Dev Firebase, Dev APIs
- Prod Doppler config → Prod Firebase, Prod APIs
- No cross-environment credential leakage
Recommendation Summary
✅ Hybrid Approach (Recommended)
Rationale:
1. Cost-effective: Single infrastructure
2. Secure: Namespace + environment tagging
3. Flexible: Cross-environment workflows
4. Scalable: Priority management
5. Maintainable: Single codebase
Implementation:
- Single Redis with namespaces
- Environment-tagged workers
- Environment-aware job execution
- Doppler config per environment
- Priority: prod > stg > dev
Trade-offs:
- Moderate complexity (acceptable)
- Shared failure domain (mitigated by robust error handling)
Next Steps
- Update batch types to include environment field
- Modify QueueManager for namespace support
- Update BatchWorker with environment filtering
- Add environment validation to API server
- Document environment strategy in ARCHITECTURE.md
- Create deployment guide for multi-environment workers
Questions for User
- Confirm hybrid approach? Or prefer full separation?
- Redis hosting? Cloud Redis (managed) or self-hosted on Elastic Muscle?
- Worker allocation? How many workers per environment?
- Priority rules? Should prod always preempt dev/stg?
- Cross-environment jobs? Allow jobs that span environments (e.g., promote dev → stg)?