
Production Database Connection Monitoring Guide

TL;DR: Are we at risk?

No. The test database error ("database is being accessed by other users") will NOT occur in production because:

  1. The production database is never dropped (the error only occurs on DROP DATABASE)
  2. Production uses connection pooling with health checks
  3. Production code has none of the heavy ThreadPoolExecutor-style threading the tests used (only 2 workers × 2 threads)
  4. Current usage is 3/100 connections (~3% utilization)

Current Production Health: ✅ EXCELLENT

Database Connections: 3/100 (3% utilization)
├── Active: 1
└── Idle: 2

Gunicorn Workers: 2 workers × 2 threads = 4 max connections
Celery Services: ~3 connections (worker + beat + flower)
Connection Management: Fully configured with safeguards

What Makes Production Different from Tests

Factor                 Tests (Had Issues)                   Production (Safe)
──────────────────────────────────────────────────────────────────────────────
Threading              ThreadPoolExecutor, 10-20 threads    2 workers × 2 threads only
Connection Pattern     Rapid create/destroy                 Pooled, reused
Database Operations    DROP DATABASE on teardown            Normal CRUD only
Connection Lifetime    CONN_MAX_AGE = 0                     CONN_MAX_AGE = 60s
Health Checks          Disabled                             Enabled ✅
Worker Restarts        N/A                                  Every 500 requests ✅
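
For context, the test-side behaviour in the table usually comes from an override along these lines; this is an illustrative sketch, not the project's actual test settings:

# test_settings.py (illustrative sketch only)
from .settings import *  # noqa: F401,F403  -- assumes settings live in the same package

DATABASES["default"]["CONN_MAX_AGE"] = 0            # every connection is opened and closed per use
DATABASES["default"]["CONN_HEALTH_CHECKS"] = False  # no liveness check before reuse
# Threaded test helpers (ThreadPoolExecutor with 10-20 workers) then multiply these
# short-lived connections, which is what left stragglers behind at DROP DATABASE time.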

Built-in Safeguards Already Configured

1. Connection Pooling (settings.py:338-339)

"CONN_MAX_AGE": 60,              # Connections recycled every 60s
"CONN_HEALTH_CHECKS": True,      # Auto-cleanup broken connections

2. Database Timeouts (settings.py:334-335)

"connect_timeout": 10,           # Can't hang forever on connect
"statement_timeout": 30000,      # Queries timeout after 30s

3. Worker Recycling (gunicorn_config.py:22-24)

max_requests = 500               # Workers restart after 500 requests
max_requests_jitter = 50         # Prevents all workers restarting at once
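
These two lines sit alongside the worker and thread counts in gunicorn_config.py. A minimal sketch of that file, where everything except the four values quoted in this guide is an assumption:

# gunicorn_config.py (sketch; bind address and timeout are assumptions)
bind = "0.0.0.0:8000"     # assumed bind address
workers = 2               # 2 worker processes
threads = 2               # 2 threads per worker -> at most 4 concurrent DB connections
max_requests = 500        # recycle a worker after it has served 500 requests
max_requests_jitter = 50  # up to 50 requests of jitter so workers don't all restart together
timeout = 30              # assumed request timeout (seconds)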

4. Resource Limits

PostgreSQL max_connections: 100
Gunicorn max concurrent: 4 (2 workers × 2 threads)
Current utilization: 3% ✅
Headroom: 97 connections available

Monitoring Commands

Check Current Connection Status

# Quick overview
docker compose exec db psql -U user -d piquetickets -c "
  SELECT count(*) as total, state
  FROM pg_stat_activity
  WHERE datname = 'piquetickets'
  GROUP BY state;"

# Detailed view
docker compose exec db psql -U user -d piquetickets -c "
  SELECT
    pid,
    usename,
    application_name,
    client_addr,
    state,
    state_change,
    query_start,
    NOW() - query_start AS duration,
    LEFT(query, 50) as query_preview
  FROM pg_stat_activity
  WHERE datname = 'piquetickets'
  ORDER BY query_start DESC;"

Check for Connection Leaks

# Find idle connections older than 5 minutes
docker compose exec db psql -U user -d piquetickets -c "
  SELECT
    pid,
    usename,
    state,
    NOW() - state_change AS idle_duration,
    application_name
  FROM pg_stat_activity
  WHERE datname = 'piquetickets'
    AND state = 'idle'
    AND NOW() - state_change > interval '5 minutes';"

Check Connection Pool Utilization

# Calculate utilization percentage
docker compose exec db psql -U user -d postgres -c "
  SELECT
    (SELECT count(*) FROM pg_stat_activity WHERE datname = 'piquetickets') as current,
    (SELECT setting::int FROM pg_settings WHERE name = 'max_connections') as max_connections,
    ROUND(
      100.0 * (SELECT count(*) FROM pg_stat_activity WHERE datname = 'piquetickets')::numeric /
      (SELECT setting::int FROM pg_settings WHERE name = 'max_connections')::numeric,
      2
    ) as utilization_percent;"

When to Be Concerned

🟢 Normal (Current State)

  • ≤ 20 connections (< 20% utilization)
  • All workers healthy
  • No idle connections > 10 minutes
  • Response times normal

🟡 Monitor Closely

  • 20-50 connections (20-50% utilization)
  • Some idle connections > 10 minutes
  • Response times slightly elevated
  • Action: Check for connection leaks, review slow queries

🔴 Alert / Investigate

  • > 50 connections (> 50% utilization)
  • Many idle connections > 10 minutes
  • Connection pool near exhaustion
  • Response times degraded
  • Action:
    • Identify connection source: SELECT application_name, count(*) FROM pg_stat_activity GROUP BY application_name;
    • Check for runaway Celery tasks
    • Consider increasing max_connections temporarily
    • Review recent deployments

🚨 Critical

  • > 90 connections (> 90% utilization)
  • New connections failing
  • Service degradation
  • Action:
    • Terminate idle connections: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'piquetickets' AND state = 'idle' AND state_change < NOW() - interval '10 minutes';
    • Restart Celery workers: docker compose restart celery_worker
    • Restart API: docker compose restart api
    • Scale up if needed
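
To map current usage onto the tiers above in one step, here is a small sketch that can be pasted into python manage.py shell; the thresholds mirror this guide, and the script itself is illustrative rather than part of the project:

# Paste into `python manage.py shell` (illustrative sketch)
from django.db import connection

with connection.cursor() as cur:
    cur.execute("""
        SELECT
          (SELECT count(*) FROM pg_stat_activity WHERE datname = current_database()),
          (SELECT setting::int FROM pg_settings WHERE name = 'max_connections')
    """)
    current, max_conn = cur.fetchone()

pct = 100.0 * current / max_conn
if pct < 20:
    tier = "🟢 Normal"
elif pct < 50:
    tier = "🟡 Monitor Closely"
elif pct < 90:
    tier = "🔴 Alert / Investigate"
else:
    tier = "🚨 Critical"
print(f"{current}/{max_conn} connections ({pct:.1f}%) -> {tier}")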

Connection Math (Expected Usage)

Normal Load

Gunicorn (API):        2-4 connections  (workers × threads)
Celery Worker:         1-2 connections  (per worker)
Celery Beat:           1 connection
Celery Flower:         1 connection
Django Admin:          0-2 connections  (when in use)
Automated Tasks:       1-3 connections  (periodic)
                      ─────────────────
Total Expected:        6-13 connections ✅
Current Actual:        3 connections ✅
Safety Margin:         87 connections available

High Load (e.g., ticket sale rush)

Gunicorn (API):        4 connections    (all threads active)
Celery Worker:         2-3 connections  (processing tasks)
Celery Beat:           1 connection
Celery Flower:         1 connection
Concurrent Checkouts:  5-10 connections (temporary spikes)
                      ─────────────────
Peak Expected:         13-19 connections ✅
Still well under limit of 100

Celery-Specific Considerations

Your Celery is configured with:

worker_prefetch_multiplier = 4  # Pre-fetch 4 tasks per worker

Celery Connection Best Practices

  1. Task timeouts configured: Prevents hung tasks from holding connections ✅
  2. Result backend = Redis: Celery results don't use DB connections ✅
  3. Beat scheduler = Database: Uses 1 persistent connection ✅
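
Those three points typically correspond to Celery settings along the following lines. The option names are standard Celery / django-celery-beat settings; the concrete values shown here are assumptions, not the project's actual configuration:

# Celery configuration sketch (standard option names; the values are assumptions)
# e.g. set via app.conf.update(...) in celery.py or CELERY_-prefixed Django settings
task_time_limit = 300        # hard-kill a task (and free its DB connection) after 5 minutes
task_soft_time_limit = 240   # raise SoftTimeLimitExceeded a bit earlier so the task can clean up
result_backend = "redis://redis:6379/0"  # results stored in Redis, so no DB connections for results
beat_scheduler = "django_celery_beat.schedulers:DatabaseScheduler"  # one persistent DB connection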

Potential Celery Issues (To Watch)

  • Long-running tasks: can hold DB connections longer than expected
    • Monitor: celery -A brktickets inspect active
  • Task failures: failed tasks should release connections, but verify
    • Monitor: celery -A brktickets inspect stats

Logging Connection Issues

Add this logger to surface database-layer warnings (and, at DEBUG level, every query) in production:

# In settings.py, add this to LOGGING configuration
'loggers': {
    'django.db.backends': {
        'level': 'WARNING',  # Set to 'DEBUG' temporarily to see all queries
        'handlers': ['console'],
    },
}

If You Ever Need to Increase max_connections

# Connect to PostgreSQL
docker compose exec db psql -U user -d postgres

-- Inside psql: check the current setting
SHOW max_connections;

-- Increase it (takes effect only after a PostgreSQL restart)
ALTER SYSTEM SET max_connections = 200;

# Back in the shell: restart PostgreSQL
docker compose restart db

Note: Each connection uses ~10MB RAM. Current 100 limit = ~1GB RAM for connections.

Summary

✅ Your production setup is well-configured
  • Connection pooling with health checks enabled
  • Timeouts prevent runaway connections
  • Workers restart regularly to prevent leaks
  • Current utilization: 3% (extremely healthy)

⚠️ The test error will NOT happen in production
  • The test error was specific to the DROP DATABASE operation
  • Production never drops the database
  • Connection patterns differ (pooled vs. rapid create/destroy)

📊 Recommended monitoring frequency
  • Normal operations: check weekly
  • After deployments: check daily for 3 days
  • High-traffic events: check hourly during the event
  • Performance issues: check immediately

🔧 No immediate action required: your configuration is production-ready!