
Production Database Connection Monitoring Guide

TL;DR: Are we at risk?

No. The test database error ("database is being accessed by other users") will NOT occur in production because:

  1. The production database is never dropped (the error only occurs on DROP DATABASE)
  2. Production uses connection pooling with health checks
  3. Production code has none of the heavy ThreadPoolExecutor-style threading the tests used (only 2 workers × 2 threads)
  4. Current usage is 3/100 connections (~3% utilization)

Current Production Health: ✅ EXCELLENT

Database Connections: 3/100 (3% utilization)
├── Active: 1
└── Idle: 2

Gunicorn Workers: 2 workers × 2 threads = 4 max connections
Celery Services: ~3 connections (worker + beat + flower)
Connection Management: Fully configured with safeguards

What Makes Production Different from Tests

Factor                 Tests (Had Issues)                   Production (Safe)
──────────────────────────────────────────────────────────────────────────────
Threading              ThreadPoolExecutor, 10-20 threads    2 workers × 2 threads only
Connection Pattern     Rapid create/destroy                 Pooled, reused
Database Operations    DROP DATABASE on teardown            Normal CRUD only
Connection Lifetime    CONN_MAX_AGE = 0                     CONN_MAX_AGE = 60s
Health Checks          Disabled                             Enabled ✅
Worker Restarts        N/A                                  Every 500 requests ✅
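
For context, the test-side behaviour in the table usually comes from an override along these lines; this is an illustrative sketch, not the project's actual test settings:

# test_settings.py (illustrative sketch only)
from .settings import *  # noqa: F401,F403  -- assumes settings live in the same package

DATABASES["default"]["CONN_MAX_AGE"] = 0            # every connection is opened and closed per use
DATABASES["default"]["CONN_HEALTH_CHECKS"] = False  # no liveness check before reuse
# Threaded test helpers (ThreadPoolExecutor with 10-20 workers) then multiply these
# short-lived connections, which is what left stragglers behind at DROP DATABASE time.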

Built-in Safeguards Already Configured

1. Connection Pooling (settings.py:338-339)

"CONN_MAX_AGE": 60,              # Connections recycled every 60s
"CONN_HEALTH_CHECKS": True,      # Auto-cleanup broken connections

2. Database Timeouts (settings.py:334-335)

"connect_timeout": 10,           # Can't hang forever on connect
"statement_timeout": 30000,      # Queries timeout after 30s

3. Worker Recycling (gunicorn_config.py:22-24)

max_requests = 500               # Workers restart after 500 requests
max_requests_jitter = 50         # Prevents all workers restarting at once
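
These two lines sit alongside the worker and thread counts in gunicorn_config.py. A minimal sketch of that file, where everything except the four values quoted in this guide is an assumption:

# gunicorn_config.py (sketch; bind address and timeout are assumptions)
bind = "0.0.0.0:8000"     # assumed bind address
workers = 2               # 2 worker processes
threads = 2               # 2 threads per worker -> at most 4 concurrent DB connections
max_requests = 500        # recycle a worker after it has served 500 requests
max_requests_jitter = 50  # up to 50 requests of jitter so workers don't all restart together
timeout = 30              # assumed request timeout (seconds)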

4. Resource Limits

PostgreSQL max_connections: 100
Gunicorn max concurrent: 4 (2 workers × 2 threads)
Current utilization: 3% ✅
Headroom: 97 connections available

Monitoring Commands

Check Current Connection Status

# Quick overview
docker compose exec db psql -U user -d piquetickets -c "
  SELECT count(*) as total, state
  FROM pg_stat_activity
  WHERE datname = 'piquetickets'
  GROUP BY state;"

# Detailed view
docker compose exec db psql -U user -d piquetickets -c "
  SELECT
    pid,
    usename,
    application_name,
    client_addr,
    state,
    state_change,
    query_start,
    NOW() - query_start AS duration,
    LEFT(query, 50) as query_preview
  FROM pg_stat_activity
  WHERE datname = 'piquetickets'
  ORDER BY query_start DESC;"

Check for Connection Leaks

# Find idle connections older than 5 minutes
docker compose exec db psql -U user -d piquetickets -c "
  SELECT
    pid,
    usename,
    state,
    NOW() - state_change AS idle_duration,
    application_name
  FROM pg_stat_activity
  WHERE datname = 'piquetickets'
    AND state = 'idle'
    AND NOW() - state_change > interval '5 minutes';"

Check Connection Pool Utilization

# Calculate utilization percentage
docker compose exec db psql -U user -d postgres -c "
  SELECT
    (SELECT count(*) FROM pg_stat_activity WHERE datname = 'piquetickets') as current,
    (SELECT setting::int FROM pg_settings WHERE name = 'max_connections') as max_connections,
    ROUND(
      100.0 * (SELECT count(*) FROM pg_stat_activity WHERE datname = 'piquetickets')::numeric /
      (SELECT setting::int FROM pg_settings WHERE name = 'max_connections')::numeric,
      2
    ) as utilization_percent;"

When to Be Concerned

🟢 Normal (Current State)

  • ≤ 20 connections (< 20% utilization)
  • All workers healthy
  • No idle connections > 10 minutes
  • Response times normal

🟡 Monitor Closely

  • 20-50 connections (20-50% utilization)
  • Some idle connections > 10 minutes
  • Response times slightly elevated
  • Action: Check for connection leaks, review slow queries

🔴 Alert / Investigate

  • > 50 connections (> 50% utilization)
  • Many idle connections > 10 minutes
  • Connection pool near exhaustion
  • Response times degraded
  • Action:
    • Identify connection source: SELECT application_name, count(*) FROM pg_stat_activity GROUP BY application_name;
    • Check for runaway Celery tasks
    • Consider increasing max_connections temporarily
    • Review recent deployments

🚨 Critical

  • > 90 connections (> 90% utilization)
  • New connections failing
  • Service degradation
  • Action:
    • Terminate idle connections: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'piquetickets' AND state = 'idle' AND state_change < NOW() - interval '10 minutes';
    • Restart Celery workers: docker compose restart celery_worker
    • Restart API: docker compose restart api
    • Scale up if needed
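
To map current usage onto the tiers above in one step, here is a small sketch that can be pasted into python manage.py shell; the thresholds mirror this guide, and the script itself is illustrative rather than part of the project:

# Paste into `python manage.py shell` (illustrative sketch)
from django.db import connection

with connection.cursor() as cur:
    cur.execute("""
        SELECT
          (SELECT count(*) FROM pg_stat_activity WHERE datname = current_database()),
          (SELECT setting::int FROM pg_settings WHERE name = 'max_connections')
    """)
    current, max_conn = cur.fetchone()

pct = 100.0 * current / max_conn
if pct < 20:
    tier = "🟢 Normal"
elif pct < 50:
    tier = "🟡 Monitor Closely"
elif pct < 90:
    tier = "🔴 Alert / Investigate"
else:
    tier = "🚨 Critical"
print(f"{current}/{max_conn} connections ({pct:.1f}%) -> {tier}")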

Connection Math (Expected Usage)

Normal Load

Gunicorn (API):        2-4 connections  (workers × threads)
Celery Worker:         1-2 connections  (per worker)
Celery Beat:           1 connection
Celery Flower:         1 connection
Django Admin:          0-2 connections  (when in use)
Automated Tasks:       1-3 connections  (periodic)
                      ─────────────────
Total Expected:        6-13 connections ✅
Current Actual:        3 connections ✅
Safety Margin:         87 connections available

High Load (e.g., ticket sale rush)

Gunicorn (API):        4 connections    (all threads active)
Celery Worker:         2-3 connections  (processing tasks)
Celery Beat:           1 connection
Celery Flower:         1 connection
Concurrent Checkouts:  5-10 connections (temporary spikes)
                      ─────────────────
Peak Expected:         13-19 connections ✅
Still well under limit of 100

Celery-Specific Considerations

Your Celery is configured with:

worker_prefetch_multiplier = 4  # Pre-fetch 4 tasks per worker

Celery Connection Best Practices

  1. Task timeouts configured: Prevents hung tasks from holding connections ✅
  2. Result backend = Redis: Celery results don't use DB connections ✅
  3. Beat scheduler = Database: Uses 1 persistent connection ✅
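
Those three points typically correspond to Celery settings along the following lines. The option names are standard Celery / django-celery-beat settings; the concrete values shown here are assumptions, not the project's actual configuration:

# Celery configuration sketch (standard option names; the values are assumptions)
# e.g. set via app.conf.update(...) in celery.py or CELERY_-prefixed Django settings
task_time_limit = 300        # hard-kill a task (and free its DB connection) after 5 minutes
task_soft_time_limit = 240   # raise SoftTimeLimitExceeded a bit earlier so the task can clean up
result_backend = "redis://redis:6379/0"  # results stored in Redis, so no DB connections for results
beat_scheduler = "django_celery_beat.schedulers:DatabaseScheduler"  # one persistent DB connection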

Potential Celery Issues (To Watch)

  • Long-running tasks: can hold DB connections longer than expected
    • Monitor: celery -A brktickets inspect active
  • Task failures: failed tasks should release connections, but verify
    • Monitor: celery -A brktickets inspect stats

Logging Connection Issues

Add this logger to surface database-layer warnings (and, at DEBUG level, every query) in production:

# In settings.py, add this to LOGGING configuration
'loggers': {
    'django.db.backends': {
        'level': 'WARNING',  # Set to 'DEBUG' temporarily to see all queries
        'handlers': ['console'],
    },
}

If You Ever Need to Increase max_connections

# Connect to PostgreSQL
docker compose exec db psql -U user -d postgres

-- Inside psql: check the current setting
SHOW max_connections;

-- Increase it (takes effect only after a PostgreSQL restart)
ALTER SYSTEM SET max_connections = 200;

# Back in the shell: restart PostgreSQL
docker compose restart db

Note: Each connection uses ~10MB RAM. Current 100 limit = ~1GB RAM for connections.

Summary

✅ Your production setup is well-configured
  • Connection pooling with health checks enabled
  • Timeouts prevent runaway connections
  • Workers restart regularly to prevent leaks
  • Current utilization: 3% (extremely healthy)

⚠️ The test error will NOT happen in production
  • The test error was specific to the DROP DATABASE operation
  • Production never drops the database
  • Connection patterns differ (pooled vs. rapid create/destroy)

📊 Recommended monitoring frequency
  • Normal operations: check weekly
  • After deployments: check daily for 3 days
  • High-traffic events: check hourly during the event
  • Performance issues: check immediately

🔧 No immediate action required: your configuration is production-ready!