Sentry Centralized Logging for Batch System
Date: 2026-01-18
Status: Architecture Design
Purpose: Define Sentry's role in centralized logging for batch jobs
Sentry Logging Capabilities
Current Features (Activated)
From SENTRY_ACTIVATION_COMPLETE.md:
- ✅ Error tracking - Exceptions and errors
- ✅ Performance monitoring - Transaction traces
- ✅ Session Replay - User session recordings
- ✅ Logs - Centralized log aggregation ← THIS IS KEY
- ✅ MCP server monitoring - Cursor integration
Centralized Logging Architecture
What Sentry Logs Provide
1. Unified Log Stream
- All application logs in one place
- Searchable and filterable
- Correlated with errors and transactions
- Retention and archival
2. Structured Logging
- JSON-formatted logs
- Metadata and context
- Tags and custom fields
- Trace correlation
3. Log Levels
- debug - Detailed debugging info
- info - General information
- warning - Warning messages
- error - Error messages
- fatal - Critical failures
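For reference, these levels map directly onto the Node SDK; a minimal sketch (the messages are illustrative), passing the severity string as the second argument to captureMessage:
import * as Sentry from '@sentry/node';

// The severity level can be passed directly as the second argument
Sentry.captureMessage('Connection pool stats dumped', 'debug');
Sentry.captureMessage('Job picked up by worker', 'info');
Sentry.captureMessage('Retrying transient failure', 'warning');
Sentry.captureMessage('Job failed after retries', 'error');
Sentry.captureMessage('Worker lost its Redis connection', 'fatal');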
Batch System Logging Integration
Current Problem
Without centralized logging:
Batch Job Execution:
├── stdout → Lost after job completes
├── stderr → Lost after job completes
└── Exit code → Stored in Redis, but no context
Result: Hard to debug failed jobs, no audit trail
Solution: Sentry Centralized Logging
With Sentry logging:
Batch Job Execution:
├── stdout → Sent to Sentry Logs
├── stderr → Sent to Sentry Logs
├── Exit code → Stored in Redis + Sentry
└── Context → Job metadata, environment, user
Result: Full audit trail, searchable logs, correlated with errors
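As a rough sketch of the "Stored in Redis + Sentry" line above (the ioredis client, key layout, and helper name are assumptions; BatchJobRequest/BatchJobResult are the types used later in this document):
import * as Sentry from '@sentry/node';
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

async function recordJobResult(job: BatchJobRequest, result: BatchJobResult): Promise<void> {
  // Redis remains the authoritative job-state store, as today
  await redis.hset(`batch:job:${job.id}`, 'status', result.status, 'exitCode', String(result.exitCode));

  // Sentry keeps the searchable, contextual copy
  Sentry.captureMessage(`Job finished: ${job.script}`, {
    level: result.exitCode === 0 ? 'info' : 'error',
    tags: { jobId: job.id, tier: job.tier, exitCode: String(result.exitCode) },
  });
}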
Implementation
Level 1: Capture Job Output
Modify JobExecutor to send logs to Sentry:
import * as Sentry from '@sentry/node';
import { spawn } from 'child_process';
class JobExecutor {
async execute(job: BatchJobRequest): Promise<BatchJobResult> {
const transaction = Sentry.startTransaction({
op: 'batch.job',
name: `Batch Job: ${job.script}`,
});
try {
const command = this.buildCommand(job);
// Log job start
Sentry.addBreadcrumb({
category: 'batch.job',
message: `Starting job: ${job.script}`,
level: 'info',
data: {
jobId: job.id,
tier: job.tier,
namespace: job.namespace,
class: job.class,
},
});
// Execute with streaming output
const result = await this.executeWithLogging(command, job);
// Log job completion
Sentry.addBreadcrumb({
category: 'batch.job',
message: `Job completed: ${job.script}`,
level: 'info',
data: {
jobId: job.id,
exitCode: result.exitCode,
duration: result.duration,
},
});
return result;
} catch (error) {
Sentry.captureException(error);
throw error;
} finally {
transaction.finish();
}
}
private async executeWithLogging(
command: string,
job: BatchJobRequest
): Promise<BatchJobResult> {
const child = spawn('sh', ['-c', command]);
const startTime = Date.now();
let stdout = '';
let stderr = '';
// Stream stdout to Sentry
child.stdout.on('data', (data: Buffer) => {
const chunk = data.toString();
stdout += chunk;
// Forward each stdout chunk to Sentry as an info-level log
Sentry.captureMessage(chunk, {
level: 'info',
tags: {
jobId: job.id,
stream: 'stdout',
},
});
});
// Stream stderr to Sentry
child.stderr.on('data', (data: Buffer) => {
const chunk = data.toString();
stderr += chunk;
// Forward each stderr chunk to Sentry as a warning-level log
Sentry.captureMessage(chunk, {
level: 'warning',
tags: {
jobId: job.id,
stream: 'stderr',
},
});
});
return new Promise((resolve, reject) => {
// Surface spawn failures (e.g. missing shell) as rejections
child.on('error', reject);
child.on('close', (exitCode: number) => {
if (exitCode === 0) {
resolve({
id: job.id,
status: 'completed',
exitCode,
duration: Date.now() - startTime,
output: stdout,
error: stderr,
});
} else {
reject(new Error(`Job failed with exit code ${exitCode}`));
}
});
});
}
}
Level 2: Structured Logging
Use Sentry's structured logging:
// Instead of console.log
console.log('Processing job', jobId);
// Use Sentry structured logs
Sentry.captureMessage('Processing job', {
level: 'info',
tags: {
jobId: job.id,
tier: job.tier,
namespace: job.namespace,
},
contexts: {
job: {
script: job.script,
class: job.class,
submittedBy: job.metadata?.user,
submittedAt: job.metadata?.timestamp,
},
},
});
Benefits:
- Searchable by job ID
- Filterable by tier/namespace
- Correlated with errors
- Rich context
Level 3: Log Aggregation Dashboard
Sentry provides:
Search queries:
# All logs for a specific job
jobId:abc-123
# All production batch jobs
tier:prod AND category:batch.job
# All failed jobs
level:error AND category:batch.job
# Jobs in last hour
timestamp:>now-1h AND category:batch.job
Dashboard widgets:
- Job execution timeline
- Error rate by tier
- Most common failures
- Average job duration
Batch Job Audit Trail
Complete Lifecycle Logging
Job submission:
Sentry.captureMessage('Batch job submitted', {
level: 'info',
tags: {
jobId: job.id,
tier: job.tier,
submittedBy: user.id,
},
contexts: {
job: { /* full job details */ },
},
});
Job queued:
Sentry.captureMessage('Job queued', {
level: 'info',
tags: {
jobId: job.id,
queueDepth: queueStats.pending,
},
});
Job started:
Sentry.captureMessage('Job execution started', {
level: 'info',
tags: {
jobId: job.id,
workerId: worker.id,
},
});
Job progress:
// From within the job script
Sentry.captureMessage('Processing 1000 records', {
level: 'info',
tags: {
jobId: process.env.BATCH_JOB_ID,
progress: '50%',
},
});
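For the jobId tag above to resolve inside the job script, the worker needs to pass the job ID through the child environment; a small addition to the Level 1 spawn call (sketch):
// In JobExecutor.executeWithLogging: expose the job ID to the spawned script
const child = spawn('sh', ['-c', command], {
  env: { ...process.env, BATCH_JOB_ID: job.id },
});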
Job completed:
Sentry.captureMessage('Job completed successfully', {
level: 'info',
tags: {
jobId: job.id,
duration: result.duration,
exitCode: 0,
},
});
Result: Complete audit trail from submission to completion
Production Use Cases
1. Debugging Failed Jobs
Scenario: Production job fails at 3 AM
Without Sentry logs:
- Check Redis for job status: "failed"
- No stdout/stderr saved
- No context about what went wrong
- Have to re-run the job to debug
With Sentry logs:
1. Search: jobId:failed-job-123
2. View complete log stream
3. See exact error message
4. View context (environment, inputs)
5. Correlate with other errors
6. Fix and re-run
2. Performance Analysis
Query Sentry:
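One way to express this, assuming the Level 1 transactions are in place (illustrative Discover-style query; exact syntax depends on your Sentry plan and setup):
# Average duration per batch job, grouped by transaction name
event.type:transaction transaction:"Batch Job:*"
# columns: transaction, avg(transaction.duration), count()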
Result:
- backup-database: 45 minutes (normal)
- generate-reports: 2 hours (slow!)
- cleanup-logs: 5 minutes (fast)
Action: Optimize generate-reports job
3. Compliance Audit
Question: "Who ran what jobs in production last month?"
Sentry query:
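An illustrative query, assuming jobs are tagged as in the submission snippet above (set the date range to last month in the Sentry UI, and facet by the submittedBy tag for user attribution):
# Production job submissions over the audit period
tier:prod message:"Batch job submitted"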
Result: Complete audit trail with user attribution
4. Error Pattern Detection
Sentry automatically detects:
- Same error across multiple jobs
- Increasing error rate
- New error types
- Correlated failures
Example:
🚨 Alert: Database connection errors
- Affecting 15 batch jobs
- Started 30 minutes ago
- All in production tier
- Suggested action: Check database pool
Integration with Existing Sentry Setup
Already Configured
From SENTRY_ACTIVATION_COMPLETE.md:
- ✅ Sentry SDK installed
- ✅ DSN configured
- ✅ Auth token secured
- ✅ Infrastructure monitoring active
- ✅ Cron monitors running
What to Add
For the batch system:
1. Initialize Sentry in the batch worker (see the sketch below)
2. Add logging to JobExecutor
3. Configure log retention
4. Set up log-based alerts
5. Create a log dashboard
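A minimal sketch for step 1, using standard @sentry/node options (BATCH_TIER is an assumed environment variable; reuse the DSN already configured per SENTRY_ACTIVATION_COMPLETE.md):
import * as Sentry from '@sentry/node';

// Run once at batch worker startup, before any jobs are accepted
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.BATCH_TIER ?? 'nonprod', // assumed env var for prod/non-prod tier
  tracesSampleRate: 1.0, // capture every batch.job transaction
});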
Log Retention Strategy
Sentry Log Retention
Free tier:
- 7 days retention
- 1GB/month
Paid tier:
- 30-90 days retention
- Unlimited volume
Batch System Strategy
Production tier:
- Keep all logs in Sentry (30 days)
- Archive critical job logs to GCS for long-term retention (see the sketch after this list)
- Alert on log volume spikes
Non-production tier:
- Keep logs in Sentry (7 days)
- No archival needed
- Informational only
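A sketch of the GCS archival step for critical production jobs, using @google-cloud/storage (the bucket name and object layout are assumptions):
import { Storage } from '@google-cloud/storage';

const storage = new Storage();
const bucket = storage.bucket('batch-job-logs-archive'); // assumed bucket name

// Called after a production job finishes, alongside the Sentry logging
async function archiveJobLog(job: BatchJobRequest, result: BatchJobResult): Promise<void> {
  const objectPath = `prod/${new Date().toISOString().slice(0, 10)}/${job.id}.log`;
  const contents = [
    `# ${job.script} (exit ${result.exitCode})`,
    result.output,
    result.error,
  ].join('\n');

  await bucket.file(objectPath).save(contents);
}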
Cost Considerations
Sentry Pricing
Logs:
- Free: 1GB/month
- Paid: $0.20/GB
Estimated batch log volume:
- 100 jobs/day × 1MB/job = 100MB/day = 3GB/month
- Cost: ~$0.60/month
Very affordable for the value provided
Implementation Checklist
Phase 1: Basic Logging (Checkpoint 2)
- [ ] Initialize Sentry in batch worker
- [ ] Add breadcrumbs for job lifecycle
- [ ] Capture stdout/stderr
- [ ] Test log viewing in Sentry
Phase 2: Structured Logging (Week 1)
- [ ] Convert to structured logs
- [ ] Add rich context
- [ ] Set up log-based alerts
- [ ] Create log dashboard
Phase 3: Advanced Features (Week 2)
- [ ] Log archival to GCS
- [ ] Performance analysis queries
- [ ] Compliance reporting
- [ ] Cost optimization
Comparison: Sentry vs. Other Solutions
Sentry Logs
Pros:
- ✅ Already integrated
- ✅ Correlated with errors
- ✅ Searchable and filterable
- ✅ No additional setup
- ✅ Affordable
Cons:
- ❌ Limited retention (free tier)
- ❌ Not specialized for logs
Google Cloud Logging
Pros:
- ✅ Unlimited retention
- ✅ Deep GCP integration
- ✅ Advanced querying
Cons:
- ❌ Separate system
- ❌ Not correlated with Sentry errors
- ❌ More expensive
Recommendation
Use both:
- Sentry for operational logs (debugging, monitoring)
- GCS for long-term archival (compliance, analysis)
Conclusion
Yes, Sentry has a critical role in centralized logging for the batch system.
Benefits:
1. Unified logging - All batch job output in one place
2. Searchable - Find logs by job ID, tier, namespace
3. Correlated - Logs linked to errors and performance
4. Audit trail - Complete job lifecycle tracking
5. Affordable - ~$0.60/month for batch logs
Next Step: Implement basic logging in Checkpoint 2 alongside Sentry-Batch integration.