Sentry Centralized Logging for Batch System
Date: 2026-01-18
Status: Architecture Design
Purpose: Define Sentry's role in centralized logging for batch jobs
Sentry Logging Capabilities
Current Features (Activated)
From SENTRY_ACTIVATION_COMPLETE.md:
- ✅ Error tracking - Exceptions and errors
- ✅ Performance monitoring - Transaction traces
- ✅ Session Replay - User session recordings
- ✅ Logs - Centralized log aggregation ← THIS IS KEY
- ✅ MCP server monitoring - Cursor integration
Centralized Logging Architecture
What Sentry Logs Provide
1. Unified Log Stream
- All application logs in one place
- Searchable and filterable
- Correlated with errors and transactions
- Retention and archival
2. Structured Logging
- JSON-formatted logs
- Metadata and context
- Tags and custom fields
- Trace correlation
3. Log Levels
- debug - Detailed debugging info
- info - General information
- warning - Warning messages
- error - Error messages
- fatal - Critical failures
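For reference, these levels map directly onto the Node SDK; a minimal sketch (the messages are illustrative), passing the severity string as the second argument to captureMessage:
import * as Sentry from '@sentry/node';

// The severity level can be passed directly as the second argument
Sentry.captureMessage('Connection pool stats dumped', 'debug');
Sentry.captureMessage('Job picked up by worker', 'info');
Sentry.captureMessage('Retrying transient failure', 'warning');
Sentry.captureMessage('Job failed after retries', 'error');
Sentry.captureMessage('Worker lost its Redis connection', 'fatal');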
Batch System Logging Integration
Current Problem
Without centralized logging:
Batch Job Execution:
├── stdout → Lost after job completes
├── stderr → Lost after job completes
└── Exit code → Stored in Redis, but no context
Result: Hard to debug failed jobs, no audit trail
Solution: Sentry Centralized Logging
With Sentry logging:
Batch Job Execution:
├── stdout → Sent to Sentry Logs
├── stderr → Sent to Sentry Logs
├── Exit code → Stored in Redis + Sentry
└── Context → Job metadata, environment, user
Result: Full audit trail, searchable logs, correlated with errors
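As a rough sketch of the "Stored in Redis + Sentry" line above (the ioredis client, key layout, and helper name are assumptions; BatchJobRequest/BatchJobResult are the types used later in this document):
import * as Sentry from '@sentry/node';
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

async function recordJobResult(job: BatchJobRequest, result: BatchJobResult): Promise<void> {
  // Redis remains the authoritative job-state store, as today
  await redis.hset(`batch:job:${job.id}`, 'status', result.status, 'exitCode', String(result.exitCode));

  // Sentry keeps the searchable, contextual copy
  Sentry.captureMessage(`Job finished: ${job.script}`, {
    level: result.exitCode === 0 ? 'info' : 'error',
    tags: { jobId: job.id, tier: job.tier, exitCode: String(result.exitCode) },
  });
}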
Implementation
Level 1: Capture Job Output
Modify JobExecutor to send logs to Sentry:
import * as Sentry from '@sentry/node';
import { spawn } from 'child_process';
class JobExecutor {
async execute(job: BatchJobRequest): Promise<BatchJobResult> {
const transaction = Sentry.startTransaction({
op: 'batch.job',
name: `Batch Job: ${job.script}`,
});
try {
const command = this.buildCommand(job);
// Log job start
Sentry.addBreadcrumb({
category: 'batch.job',
message: `Starting job: ${job.script}`,
level: 'info',
data: {
jobId: job.id,
tier: job.tier,
namespace: job.namespace,
class: job.class,
},
});
// Execute with streaming output
const result = await this.executeWithLogging(command, job);
// Log job completion
Sentry.addBreadcrumb({
category: 'batch.job',
message: `Job completed: ${job.script}`,
level: 'info',
data: {
jobId: job.id,
exitCode: result.exitCode,
duration: result.duration,
},
});
return result;
} catch (error) {
Sentry.captureException(error);
throw error;
} finally {
transaction.finish();
}
}
private async executeWithLogging(
command: string,
job: BatchJobRequest
): Promise<BatchJobResult> {
const child = spawn('sh', ['-c', command]);
const startTime = Date.now();
let stdout = '';
let stderr = '';
// Stream stdout to Sentry
child.stdout.on('data', (data: Buffer) => {
const chunk = data.toString();
stdout += chunk;
// Forward each stdout chunk to Sentry as an info-level log
Sentry.captureMessage(chunk, {
level: 'info',
tags: {
jobId: job.id,
stream: 'stdout',
},
});
});
// Stream stderr to Sentry
child.stderr.on('data', (data: Buffer) => {
const chunk = data.toString();
stderr += chunk;
// Forward each stderr chunk to Sentry as a warning-level log
Sentry.captureMessage(chunk, {
level: 'warning',
tags: {
jobId: job.id,
stream: 'stderr',
},
});
});
return new Promise((resolve, reject) => {
// Surface spawn failures (e.g. missing shell) as rejections
child.on('error', reject);
child.on('close', (exitCode: number) => {
if (exitCode === 0) {
resolve({
id: job.id,
status: 'completed',
exitCode,
duration: Date.now() - startTime,
output: stdout,
error: stderr,
});
} else {
reject(new Error(`Job failed with exit code ${exitCode}`));
}
});
});
}
}
Level 2: Structured Logging
Use Sentry's structured logging:
// Instead of console.log
console.log('Processing job', jobId);
// Use Sentry structured logs
Sentry.captureMessage('Processing job', {
level: 'info',
tags: {
jobId: job.id,
tier: job.tier,
namespace: job.namespace,
},
contexts: {
job: {
script: job.script,
class: job.class,
submittedBy: job.metadata?.user,
submittedAt: job.metadata?.timestamp,
},
},
});
Benefits:
- Searchable by job ID
- Filterable by tier/namespace
- Correlated with errors
- Rich context
Level 3: Log Aggregation Dashboard
Sentry provides:
Search queries:
# All logs for a specific job
jobId:abc-123
# All production batch jobs
tier:prod AND category:batch.job
# All failed jobs
level:error AND category:batch.job
# Jobs in last hour
timestamp:>now-1h AND category:batch.job
Dashboard widgets:
- Job execution timeline
- Error rate by tier
- Most common failures
- Average job duration
Batch Job Audit Trail
Complete Lifecycle Logging
Job submission:
Sentry.captureMessage('Batch job submitted', {
level: 'info',
tags: {
jobId: job.id,
tier: job.tier,
submittedBy: user.id,
},
contexts: {
job: { /* full job details */ },
},
});
Job queued:
Sentry.captureMessage('Job queued', {
level: 'info',
tags: {
jobId: job.id,
queueDepth: queueStats.pending,
},
});
Job started:
Sentry.captureMessage('Job execution started', {
level: 'info',
tags: {
jobId: job.id,
workerId: worker.id,
},
});
Job progress:
// From within the job script
Sentry.captureMessage('Processing 1000 records', {
level: 'info',
tags: {
jobId: process.env.BATCH_JOB_ID,
progress: '50%',
},
});
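For the jobId tag above to resolve inside the job script, the worker needs to pass the job ID through the child environment; a small addition to the Level 1 spawn call (sketch):
// In JobExecutor.executeWithLogging: expose the job ID to the spawned script
const child = spawn('sh', ['-c', command], {
  env: { ...process.env, BATCH_JOB_ID: job.id },
});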
Job completed:
Sentry.captureMessage('Job completed successfully', {
level: 'info',
tags: {
jobId: job.id,
duration: result.duration,
exitCode: 0,
},
});
Result: Complete audit trail from submission to completion
Production Use Cases
1. Debugging Failed Jobs
Scenario: Production job fails at 3 AM
Without Sentry logs:
- Check Redis for job status: "failed"
- No stdout/stderr saved
- No context about what went wrong
- Have to re-run the job to debug
With Sentry logs:
1. Search: jobId:failed-job-123
2. View complete log stream
3. See exact error message
4. View context (environment, inputs)
5. Correlate with other errors
6. Fix and re-run
2. Performance Analysis
Query Sentry:
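One way to express this, assuming the Level 1 transactions are in place (illustrative Discover-style query; exact syntax depends on your Sentry plan and setup):
# Average duration per batch job, grouped by transaction name
event.type:transaction transaction:"Batch Job:*"
# columns: transaction, avg(transaction.duration), count()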
Result:
- backup-database: 45 minutes (normal)
- generate-reports: 2 hours (slow!)
- cleanup-logs: 5 minutes (fast)
Action: Optimize generate-reports job
3. Compliance Audit
Question: "Who ran what jobs in production last month?"
Sentry query:
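An illustrative query, assuming jobs are tagged as in the submission snippet above (set the date range to last month in the Sentry UI, and facet by the submittedBy tag for user attribution):
# Production job submissions over the audit period
tier:prod message:"Batch job submitted"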
Result: Complete audit trail with user attribution
4. Error Pattern Detection
Sentry automatically detects:
- Same error across multiple jobs
- Increasing error rate
- New error types
- Correlated failures
Example:
🚨 Alert: Database connection errors
- Affecting 15 batch jobs
- Started 30 minutes ago
- All in production tier
- Suggested action: Check database pool
Integration with Existing Sentry Setup
Already Configured
From SENTRY_ACTIVATION_COMPLETE.md:
- ✅ Sentry SDK installed
- ✅ DSN configured
- ✅ Auth token secured
- ✅ Infrastructure monitoring active
- ✅ Cron monitors running
What to Add
For the batch system:
1. Initialize Sentry in the batch worker (see the sketch below)
2. Add logging to JobExecutor
3. Configure log retention
4. Set up log-based alerts
5. Create a log dashboard
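A minimal sketch for step 1, using standard @sentry/node options (BATCH_TIER is an assumed environment variable; reuse the DSN already configured per SENTRY_ACTIVATION_COMPLETE.md):
import * as Sentry from '@sentry/node';

// Run once at batch worker startup, before any jobs are accepted
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.BATCH_TIER ?? 'nonprod', // assumed env var for prod/non-prod tier
  tracesSampleRate: 1.0, // capture every batch.job transaction
});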
Log Retention Strategy
Sentry Log Retention
Free tier:
- 7 days retention
- 1GB/month
Paid tier:
- 30-90 days retention
- Unlimited volume
Batch System Strategy
Production tier:
- Keep all logs in Sentry (30 days)
- Archive critical job logs to GCS for long-term retention (see the sketch after this list)
- Alert on log volume spikes
Non-production tier:
- Keep logs in Sentry (7 days)
- No archival needed
- Informational only
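A sketch of the GCS archival step for critical production jobs, using @google-cloud/storage (the bucket name and object layout are assumptions):
import { Storage } from '@google-cloud/storage';

const storage = new Storage();
const bucket = storage.bucket('batch-job-logs-archive'); // assumed bucket name

// Called after a production job finishes, alongside the Sentry logging
async function archiveJobLog(job: BatchJobRequest, result: BatchJobResult): Promise<void> {
  const objectPath = `prod/${new Date().toISOString().slice(0, 10)}/${job.id}.log`;
  const contents = [
    `# ${job.script} (exit ${result.exitCode})`,
    result.output,
    result.error,
  ].join('\n');

  await bucket.file(objectPath).save(contents);
}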
Cost Considerations
Sentry Pricing
Logs:
- Free: 1GB/month
- Paid: $0.20/GB
Estimated batch log volume:
- 100 jobs/day × 1MB/job = 100MB/day = 3GB/month
- Cost: ~$0.60/month
Very affordable for the value provided
Implementation Checklist
Phase 1: Basic Logging (Checkpoint 2)
- [ ] Initialize Sentry in batch worker
- [ ] Add breadcrumbs for job lifecycle
- [ ] Capture stdout/stderr
- [ ] Test log viewing in Sentry
Phase 2: Structured Logging (Week 1)
- [ ] Convert to structured logs
- [ ] Add rich context
- [ ] Set up log-based alerts
- [ ] Create log dashboard
Phase 3: Advanced Features (Week 2)
- [ ] Log archival to GCS
- [ ] Performance analysis queries
- [ ] Compliance reporting
- [ ] Cost optimization
Comparison: Sentry vs. Other Solutions
Sentry Logs
Pros:
- ✅ Already integrated
- ✅ Correlated with errors
- ✅ Searchable and filterable
- ✅ No additional setup
- ✅ Affordable
Cons:
- ❌ Limited retention (free tier)
- ❌ Not specialized for logs
Google Cloud Logging
Pros:
- ✅ Unlimited retention
- ✅ Deep GCP integration
- ✅ Advanced querying
Cons:
- ❌ Separate system
- ❌ Not correlated with Sentry errors
- ❌ More expensive
Recommendation
Use both:
- Sentry for operational logs (debugging, monitoring)
- GCS for long-term archival (compliance, analysis)
Conclusion
Yes, Sentry has a critical role in centralized logging for the batch system.
Benefits:
1. Unified logging - All batch job output in one place
2. Searchable - Find logs by job ID, tier, namespace
3. Correlated - Logs linked to errors and performance
4. Audit trail - Complete job lifecycle tracking
5. Affordable - ~$0.60/month for batch logs
Next Step: Implement basic logging in Checkpoint 2 alongside Sentry-Batch integration.