Load Testing Plan for PiqueTickets Django API

Document Version: 1.0
Created: 2025-10-18
Status: Draft


Table of Contents

  1. Executive Summary
  2. Current Architecture Analysis
  3. Critical API Endpoints
  4. Load Testing Strategy
  5. Implementation Plan
  6. Infrastructure Considerations
  7. Testing Tools & Commands
  8. Monitoring & Metrics
  9. Success Criteria
  10. Risk Assessment
  11. Deliverables
  12. Next Steps

Executive Summary

Current Architecture

Tech Stack:

  • Django 5.1.13
  • Django REST Framework 3.15.2
  • PostgreSQL (with connection pooling)
  • Redis (caching and Celery broker)
  • Celery 5.4.0 (async task processing)
  • Gunicorn 23.0.0 (WSGI server)
  • Docker Compose deployment

Server Configuration:

  • Gunicorn: 1 worker, 2 threads (60s timeout)
  • Celery: 4 workers, 100 max tasks per child
  • Deployment: Railway (Docker containers)

Key Features:

  • Event ticketing system
  • Stripe payment integration
  • PDF ticket generation
  • Email tracking
  • Producer portal
  • Check-in system

Primary Choice: Locust

Why Locust?

  • Python-based scripting (a natural fit for Django developers)
  • Distributed load testing across multiple worker machines
  • Real-time web UI for monitoring
  • Easy integration with existing test patterns
  • Scales to very high user counts when run distributed

Alternative Tools:

  • k6: lightweight, CI/CD friendly, JavaScript-based
  • Artillery: modern, supports WebSockets/GraphQL, YAML configuration


Current Architecture Analysis

Django Configuration

File: apps/api/brktickets/settings.py

Key Settings:

# Database
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "OPTIONS": {
            "connect_timeout": 10,
            "options": "-c statement_timeout=30000",  # 30s query timeout
        },
        "CONN_MAX_AGE": 60,  # 60s connection pooling
        "CONN_HEALTH_CHECKS": True,
    }
}

# Cache (Redis)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "OPTIONS": {
            "socket_timeout": 5,
            "retry_on_timeout": True,
        },
        "TIMEOUT": 300,
    }
}

# Celery
CELERY_BROKER_URL = "redis://localhost:6379/0"
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"

Gunicorn Configuration

File: apps/api/gunicorn_config.py

Current Settings:

workers = 1                    # ⚠️ CRITICAL BOTTLENECK
worker_class = 'gthread'
threads = 2                    # Max 2 concurrent requests
worker_connections = 100
timeout = 60
max_requests = 500
max_requests_jitter = 50

⚠️ Critical Concern:

  • Only 1 worker with 2 threads = a maximum of 2 concurrent requests
  • This configuration is optimized for memory on Railway
  • It will be a significant bottleneck under load
  • Recommended: at least 2-4 workers with 4 threads each (8-16 concurrent requests)

API Architecture

Main URL Patterns:

/                        → Tickets app
/api/v1/                 → REST API endpoints
/health/                 → Health check
/auth/                   → Authentication
/email-tracking/         → Email tracking
/admin/                  → Django admin

REST API Endpoints (/api/v1/):

/tickets/                → TicketViewSet
/shows/                  → ShowViewSet
/venues/                 → VenueViewSet
/performers/             → PerformerViewSet
/featured-shows/         → FeaturedShowViewSet
/producers/              → ProducerViewSet
/search/                 → SearchView
/active-cities/          → ActiveCitiesView
/subscriber/create/      → Subscriber creation
/checkin/                → Check-in endpoints
/portal/                 → Producer portal

Tickets App Endpoints:

/checkout/               → CheckoutSessionView
/success/                → SuccessSessionView
/cancel/                 → CancelSessionView
/validate-promo/         → ValidatePromoCodeView


Critical API Endpoints

High Priority (User-Facing, High Traffic)

| Endpoint                | Method | Description             | Expected Load |
| ----------------------- | ------ | ----------------------- | ------------- |
| /api/v1/shows/          | GET    | Show listings           | Very High     |
| /api/v1/shows/{id}/     | GET    | Show details            | High          |
| /api/v1/featured-shows/ | GET    | Featured shows          | High          |
| /checkout/              | POST   | Create checkout session | Medium        |
| /api/v1/active-cities/  | GET    | Active cities lookup    | Medium        |
| /health/                | GET    | Health check            | Continuous    |

Medium Priority (Producer Portal)

| Endpoint                                   | Method | Description       | Load   |
| ------------------------------------------ | ------ | ----------------- | ------ |
| /api/v1/producers/                         | GET    | Producer listings | Medium |
| /api/v1/producers/stripe-connect-link/     | POST   | Stripe Connect    | Low    |
| /api/v1/producers/stripe-balance/          | GET    | Financial data    | Low    |
| /api/v1/producers/shows/{id}/checkin-link/ | POST   | Check-in links    | Low    |

High Impact (Payment Processing)

| Endpoint         | Method | Description             | Critical? |
| ---------------- | ------ | ----------------------- | --------- |
| /success/        | POST   | Payment success handler | Yes       |
| /validate-promo/ | POST   | Promo code validation   | Medium    |

Backend Services

| Service      | Description                   | Load Impact |
| ------------ | ----------------------------- | ----------- |
| Celery Tasks | PDF generation, email sending | High        |
| Stripe API   | Payment processing            | Medium      |
| PostgreSQL   | Data storage                  | High        |
| Redis        | Caching, task queue           | High        |

Load Testing Strategy

Phase 1: Baseline Performance (Week 1)

Goal: Establish performance baselines for read-heavy endpoints

Test Scenarios:

1.1 Show Listing Load Test

Users: 10 → 100 (ramp up over 5 minutes)
Duration: 30 minutes
Pattern: Continuous browsing of shows

User Actions:

  • GET /api/v1/shows/
  • GET /api/v1/shows/?start_time__gt={date}
  • GET /api/v1/shows/?slug={slug}
  • GET /api/v1/featured-shows/

1.2 Read-Heavy Traffic Simulation

Duration: 1 hour
Concurrent Users: 50-100

Traffic Distribution:

  • 70% show browsing (/api/v1/shows/)
  • 20% show details (/api/v1/shows/{id}/)
  • 10% featured shows (/api/v1/featured-shows/)

Target Metrics:

  • Throughput: 100-500 RPS
  • Average response time: < 200ms
  • 95th percentile: < 500ms
  • 99th percentile: < 1000ms
  • Error rate: < 0.1%
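For reference, the percentile targets above can be computed from raw response-time samples with a simple nearest-rank calculation (a sketch; Locust reports its own percentile estimates, which may differ slightly on small samples):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile, e.g. percentile(times_ms, 95) for the p95."""
    ordered = sorted(samples)
    k = math.ceil(p / 100 * len(ordered)) - 1  # 1-based rank -> 0-based index
    return ordered[max(k, 0)]

times_ms = [120, 150, 180, 210, 950]
print(percentile(times_ms, 95))  # → 950
print(percentile(times_ms, 50))  # → 180
```

Note how a single slow outlier dominates the p95 on small samples — which is why the plan tracks p95 and p99 separately from the average.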

Phase 2: Checkout Flow Testing (Week 2)

Goal: Test payment processing and write operations under load

Test Scenarios:

2.1 Checkout Session Creation

Users: 10 → 50 concurrent
Duration: 30 minutes
Think time: 5-15 seconds between actions

User Flow:

  1. Browse shows → GET /api/v1/shows/
  2. View show details → GET /api/v1/shows/{id}/
  3. Validate promo code (optional) → POST /validate-promo/
  4. Create checkout session → POST /checkout/
  5. Complete payment → POST /success/

2.2 Complete Purchase Flow

Users: 20-30 concurrent
Duration: 1 hour
Realistic think time: 5-30 seconds

Target Metrics:

  • Checkout completion rate: > 95%
  • Checkout session creation: < 3s
  • Payment success processing: < 2s
  • No timeouts during Stripe API calls
  • Database connection pool: < 80% utilization
  • Redis cache hit rate: > 80%

Phase 3: Stress Testing (Week 3)

Goal: Find breaking points and system limits

Test Scenarios:

3.1 Spike Test

Pattern: 0 → 500 users in 1 minute
Duration: 10 minutes
Scenario: Viral event announcement

Objectives:

  • Test auto-scaling behavior
  • Identify failure modes
  • Monitor error recovery

3.2 Soak Test

Users: 100 concurrent
Duration: 2-4 hours
Pattern: Sustained load

Monitor For:

  • Memory leaks
  • Connection pool exhaustion
  • Gradual performance degradation
  • Celery queue buildup

3.3 Capacity Test

Pattern: Gradual increase until failure
Start: 10 users
Increment: +10 users every 2 minutes
Stop: When error rate > 5%

Key Metrics:

  • Maximum concurrent users before degradation
  • Memory usage over time
  • Database connection exhaustion point
  • Redis connection limits
  • Breaking point identification
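The stepped ramp above (start at 10 users, +10 every 2 minutes, stop past 5% errors) maps naturally onto a custom Locust LoadTestShape. The step arithmetic can be kept as a pure function that the shape's tick() would call (a sketch; the wiring into LoadTestShape is left as a comment):

```python
def step_user_count(elapsed_s, start_users=10, step_users=10,
                    step_seconds=120, max_users=500):
    """Capacity-test user count: start at 10, add 10 every 2 minutes."""
    steps_completed = int(elapsed_s // step_seconds)
    return min(start_users + steps_completed * step_users, max_users)

# In a locust.LoadTestShape subclass, tick() would return
# (step_user_count(self.get_run_time()), spawn_rate), and None once
# the observed error rate exceeds the 5% stop condition.

print(step_user_count(0), step_user_count(120), step_user_count(600))  # → 10 20 60
```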

Phase 4: Production-Like Testing (Week 4)

Goal: Simulate realistic traffic patterns

Test Scenarios:

4.1 Mixed Workload

Duration: 2 hours
Total Users: 100-200 concurrent

Traffic Distribution:

  • 60% read operations (shows, venues, performers)
  • 25% authenticated operations (producer portal)
  • 10% write operations (checkout, orders)
  • 5% heavy operations (PDF generation, email sending)

4.2 Geographic Distribution

Simulate users from multiple regions
Test CDN and static file delivery
Include realistic network latency

Success Metrics:

  • Maintain < 200ms response time under realistic load
  • Celery queue processing time < 1s
  • Email delivery success rate > 99%
  • PDF generation time < 2s
  • No database deadlocks
  • Redis eviction rate < 1%


Implementation Plan

Setup Phase (Days 1-3)

Install Locust

cd apps/api
pip install locust==2.32.3
# Add to requirements.txt
echo "locust==2.32.3" >> requirements.txt

Create Load Test Directory Structure

apps/api/loadtests/
├── __init__.py
├── locustfile.py          # Main entry point
├── README.md              # Load test documentation
├── tasks/
│   ├── __init__.py
│   ├── browsing.py        # Show browsing tasks
│   ├── checkout.py        # Checkout flow tasks
│   ├── producer.py        # Producer portal tasks
│   └── search.py          # Search and filter tasks
├── config/
│   ├── __init__.py
│   ├── baseline.py        # Baseline test configuration
│   ├── stress.py          # Stress test configuration
│   └── production.py      # Production simulation config
├── utils/
│   ├── __init__.py
│   ├── helpers.py         # Test data generators
│   └── fixtures.py        # Test fixtures
└── reports/               # Test reports directory
    └── .gitkeep

Sample Locustfile

File: apps/api/loadtests/locustfile.py

"""
Main Locust load testing file for PiqueTickets API
"""
import random
from locust import HttpUser, task, between, constant_pacing
from locust import events


class ShowBrowsingUser(HttpUser):
    """Simulates users browsing shows"""
    wait_time = between(1, 5)  # Think time between requests
    weight = 7  # 70% of traffic

    @task(3)  # 3x more likely than other tasks
    def browse_shows(self):
        """Browse all shows"""
        with self.client.get(
            "/api/v1/shows/",
            name="/api/v1/shows/ [list]",
            catch_response=True,
        ) as response:
            if response.status_code == 200:
                # Assumes an unpaginated list; with DRF pagination,
                # use response.json()["results"] instead.
                self.shows = response.json()
                response.success()
            else:
                response.failure(f"Got status {response.status_code}")

    @task(2)
    def view_show_details(self):
        """View specific show details"""
        if hasattr(self, 'shows') and self.shows:
            show = random.choice(self.shows)
            self.client.get(
                f"/api/v1/shows/{show['id']}/",
                name="/api/v1/shows/[id]/"
            )

    @task(1)
    def view_featured_shows(self):
        """View featured shows"""
        self.client.get("/api/v1/featured-shows/")

    @task(1)
    def filter_shows_by_date(self):
        """Filter shows by date"""
        self.client.get(
            "/api/v1/shows/?start_time__gt=2025-01-01T00:00:00Z",
            name="/api/v1/shows/ [filtered]"
        )

    @task(1)
    def view_active_cities(self):
        """View active cities"""
        self.client.get("/api/v1/active-cities/")


class CheckoutUser(HttpUser):
    """Simulates users completing checkout"""
    wait_time = between(5, 15)
    weight = 2  # 20% of traffic

    def on_start(self):
        """Setup: Get list of shows"""
        response = self.client.get("/api/v1/shows/")
        if response.status_code == 200:
            self.shows = response.json()

    @task
    def complete_checkout_flow(self):
        """Complete full checkout flow"""
        if not hasattr(self, 'shows') or not self.shows:
            return

        # 1. Browse shows
        self.client.get("/api/v1/shows/")

        # 2. View show details
        show = random.choice(self.shows)
        self.client.get(f"/api/v1/shows/{show['id']}/")

        # 3. Validate promo code (30% of users)
        if random.random() < 0.3:
            self.client.post(
                "/validate-promo/",
                json={"code": "TESTPROMO"}
            )

        # 4. Create checkout session
        # Note: This will fail without valid data, adjust as needed
        self.client.post(
            "/checkout/",
            json={
                "show_id": show['id'],
                "tickets": [{"ticket_id": 1, "quantity": 2}],
                "order_email": "test@example.com",
                "order_name": "Test User"
            },
            name="/checkout/ [session]"
        )


class ProducerUser(HttpUser):
    """Simulates producer portal users"""
    wait_time = between(3, 10)
    weight = 1  # 10% of traffic

    def on_start(self):
        """Login as producer"""
        # Implement authentication here
        pass

    @task(3)
    def view_producers(self):
        """View producer list"""
        self.client.get("/api/v1/producers/")

    @task(1)
    def check_stripe_status(self):
        """Check Stripe onboarding status"""
        # Requires authentication
        pass


@events.test_start.add_listener
def on_test_start(environment, **kwargs):
    """Called when test starts"""
    print("Load test starting...")


@events.test_stop.add_listener
def on_test_stop(environment, **kwargs):
    """Called when test stops"""
    print("Load test completed!")

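The on_start stub in ProducerUser above still needs real credentials. Assuming the API uses DRF TokenAuthentication behind the /auth/ routes (an assumption — confirm the actual scheme and endpoint paths), a minimal sketch:

```python
def token_auth_headers(token):
    """Header shape DRF's TokenAuthentication expects (assumed scheme)."""
    return {"Authorization": f"Token {token}"}

# Hypothetical on_start for ProducerUser; the /auth/login/ path and
# payload fields are placeholders, not confirmed endpoints:
#
#     def on_start(self):
#         resp = self.client.post("/auth/login/", json={
#             "username": "loadtest-producer",
#             "password": "<from env>",
#         })
#         if resp.status_code == 200:
#             self.client.headers.update(
#                 token_auth_headers(resp.json()["token"]))
```

Keeping the header builder separate makes it reusable across user classes once the real login flow is known.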
Week 1: Baseline Tests

Tasks:

  - [ ] Set up Locust environment
  - [ ] Create baseline test scenarios
  - [ ] Configure staging environment
  - [ ] Run initial baseline test (10-100 users, 30 min)
  - [ ] Document baseline metrics
  - [ ] Create performance dashboard

Commands:

# Web UI mode (recommended for Week 1)
locust -f apps/api/loadtests/locustfile.py \
  --host=http://localhost:8080

# Access at: http://localhost:8089

# Headless mode with report
locust -f apps/api/loadtests/locustfile.py \
  --host=http://localhost:8080 \
  --users=100 \
  --spawn-rate=10 \
  --run-time=30m \
  --headless \
  --html=apps/api/loadtests/reports/baseline_report.html \
  --csv=apps/api/loadtests/reports/baseline

Expected Output:

Baseline Performance Report
---------------------------
Total Requests: ~180,000
Average RPS: ~100
Median Response Time: <200ms
95th Percentile: <500ms
Error Rate: <0.1%
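To turn those numbers into an automated pass/fail gate, the --csv output can be checked with a short script. A sketch, assuming the "Request Count", "Failure Count", and "95%" column names used by recent Locust releases (verify against your own CSV, since these have changed between versions):

```python
import csv
import io

def check_thresholds(stats_csv, max_p95_ms=500, max_error_rate=0.001):
    """Return threshold violations found in a Locust *_stats.csv dump."""
    failures = []
    for row in csv.DictReader(io.StringIO(stats_csv)):
        if row["Name"] != "Aggregated":
            continue  # only gate on the aggregate row
        requests = int(row["Request Count"])
        errors = int(row["Failure Count"])
        p95 = float(row["95%"])
        if requests and errors / requests > max_error_rate:
            failures.append(f"error rate {errors / requests:.2%}")
        if p95 > max_p95_ms:
            failures.append(f"p95 {p95:.0f}ms")
    return failures

# Synthetic example within the baseline targets:
sample = ("Type,Name,Request Count,Failure Count,95%\n"
          ",Aggregated,180000,90,450\n")
print(check_thresholds(sample))  # → []
```

A nonempty return value can fail the CI job in the Week 1 headless runs.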

Week 2: Checkout Flow

Tasks:

  - [ ] Implement checkout user simulation
  - [ ] Configure Stripe test mode
  - [ ] Test promo code validation
  - [ ] Monitor database query performance
  - [ ] Analyze Celery task queue
  - [ ] Document checkout performance

Key Monitoring:

# Monitor PostgreSQL connections
docker compose exec db psql -U user -d piquetickets -c \
  "SELECT count(*) FROM pg_stat_activity;"

# Monitor Redis memory
docker compose exec redis redis-cli INFO memory

# Monitor Celery via Flower
# Open: http://localhost:5555

# Django query debugging
# In Django shell during test:
from django.db import connection
print(len(connection.queries))

Week 3: Stress Testing

Tasks:

  - [ ] Run spike tests (0→500 users in 1 min)
  - [ ] Execute soak tests (100 users for 2-4 hours)
  - [ ] Identify bottlenecks
  - [ ] Document breaking points
  - [ ] Create optimization recommendations

Spike Test Command:

locust -f apps/api/loadtests/locustfile.py \
  --host=http://localhost:8080 \
  --users=500 \
  --spawn-rate=500 \
  --run-time=10m \
  --headless \
  --html=apps/api/loadtests/reports/spike_test_report.html

Soak Test Command:

locust -f apps/api/loadtests/locustfile.py \
  --host=http://localhost:8080 \
  --users=100 \
  --spawn-rate=10 \
  --run-time=4h \
  --headless \
  --html=apps/api/loadtests/reports/soak_test_report.html

Week 4: Production Simulation

Tasks:

  - [ ] Run mixed workload tests
  - [ ] Test with production-like data volumes
  - [ ] Validate auto-scaling (if applicable)
  - [ ] Generate final performance report
  - [ ] Create optimization implementation plan

Mixed Workload Command:

# Use all user classes with proper weights
locust -f apps/api/loadtests/locustfile.py \
  --host=http://localhost:8080 \
  --users=200 \
  --spawn-rate=20 \
  --run-time=2h \
  --headless \
  --html=apps/api/loadtests/reports/production_simulation.html

Infrastructure Considerations

Gunicorn Configuration Issues

Current Configuration (apps/api/gunicorn_config.py):

workers = 1
worker_class = 'gthread'
threads = 2

Problem:

  • Maximum 2 concurrent requests
  • Will saturate at ~20-30 RPS
  • Optimized for Railway memory constraints

Recommendations:

Option 1: Increase Workers and Threads

workers = 2                    # Increase to 2
worker_class = 'gthread'
threads = 4                    # Increase to 4
# Total concurrent: 2 * 4 = 8 requests

Option 2: Use Formula-Based Workers

import multiprocessing

workers = min(multiprocessing.cpu_count() * 2 + 1, 4)  # Max 4 workers
worker_class = 'gthread'
threads = 4
# Total concurrent: 4 * 4 = 16 requests

Option 3: Use Gevent (Async)

workers = 2
worker_class = 'gevent'
worker_connections = 1000      # Many concurrent connections
# Install: pip install gevent
# Note: blocking drivers such as psycopg2 also need green-thread
# patching (e.g. psycogreen) to benefit from gevent workers.

Memory Impact:

  • Current: ~200-250MB total
  • With 2 workers + 4 threads: ~300-400MB total
  • Monitor Railway memory limits

Database Connection Pooling

Current Configuration:

DATABASES = {
    "default": {
        "CONN_MAX_AGE": 60,
        "CONN_HEALTH_CHECKS": True,
    }
}

Recommendations:

  1. Monitor Connection Pool Usage:

    SELECT count(*) FROM pg_stat_activity
    WHERE datname = 'piquetickets';
    

  2. Consider PgBouncer for connection pooling:

    # docker-compose.yml
    pgbouncer:
      image: pgbouncer/pgbouncer
      environment:
        - DATABASES=piquetickets=postgres://user:password@db:5432/piquetickets
        - POOL_MODE=transaction
        - MAX_CLIENT_CONN=100
        - DEFAULT_POOL_SIZE=20
    

  3. Adjust Django Pool Size:

    # For 4 gunicorn workers, keep pooled connections alive longer
    DATABASES["default"]["CONN_MAX_AGE"] = 300  # 5 minutes
    

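Before raising CONN_MAX_AGE, it is worth sanity-checking the worst case: with persistent connections, each Gunicorn thread and each Celery worker process can hold one connection, and the total must stay under Postgres's max_connections (100 by default). A back-of-envelope helper (the headroom figure is an assumption covering shells, cron jobs, and monitoring queries):

```python
def peak_db_connections(web_instances=1, gunicorn_workers=2,
                        gunicorn_threads=4, celery_concurrency=4,
                        headroom=5):
    """Worst-case persistent Postgres connections when CONN_MAX_AGE > 0."""
    web = web_instances * gunicorn_workers * gunicorn_threads
    return web + celery_concurrency + headroom

# Recommended config (2 workers x 4 threads) plus 4 Celery workers:
print(peak_db_connections())  # → 17, comfortably under the default 100
```

If the number approaches max_connections (for example after horizontal scaling), that is the point where PgBouncer earns its keep.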
Redis Optimization

Current Configuration:

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "TIMEOUT": 300,
    }
}

Recommendations:

  1. Monitor Memory Usage:

    redis-cli INFO memory
    redis-cli INFO stats
    

  2. Configure Eviction Policy:

    # In redis.conf or via command
    redis-cli CONFIG SET maxmemory 512mb
    redis-cli CONFIG SET maxmemory-policy allkeys-lru
    

  3. Monitor Cache Hit Rate:

    redis-cli INFO stats | grep -E 'keyspace_hits|keyspace_misses'
    

Celery Worker Optimization

Current Configuration:

celery -A brktickets worker \
  -l INFO \
  -E \
  --max-tasks-per-child=100 \
  --concurrency=4 \
  --prefetch-multiplier=4

Recommendations:

  1. Monitor Queue Lengths:
     • Use Flower: http://localhost:5555
     • Check task success/failure rates

  2. Adjust Concurrency for Load:

    --concurrency=8  # Increase during load tests

  3. Separate Queues:

    # settings.py
    CELERY_TASK_ROUTES = {
        'tickets.tasks.generate_pdf': {'queue': 'pdf'},
        'tickets.tasks.send_email': {'queue': 'email'},
    }
    


Testing Tools & Commands

Locust Usage

# Start Locust web UI
locust -f apps/api/loadtests/locustfile.py \
  --host=http://localhost:8080

# Access at: http://localhost:8089
# Configure users and spawn rate in UI

Headless Mode (CI/CD)

# Basic headless test
locust -f apps/api/loadtests/locustfile.py \
  --host=http://localhost:8080 \
  --users=100 \
  --spawn-rate=10 \
  --run-time=10m \
  --headless

# With HTML report
locust -f apps/api/loadtests/locustfile.py \
  --host=http://localhost:8080 \
  --users=100 \
  --spawn-rate=10 \
  --run-time=10m \
  --headless \
  --html=report.html

# With CSV output
locust -f apps/api/loadtests/locustfile.py \
  --host=http://localhost:8080 \
  --users=100 \
  --spawn-rate=10 \
  --run-time=10m \
  --headless \
  --csv=results/test

Distributed Mode

# Master node
locust -f apps/api/loadtests/locustfile.py \
  --master \
  --expect-workers=4

# Worker nodes (run on different machines)
locust -f apps/api/loadtests/locustfile.py \
  --worker \
  --master-host=<master-ip>

Alternative Tool: k6

If you prefer k6 (JavaScript-based):

# Install k6
brew install k6  # macOS
# or
sudo apt-get install k6  # Ubuntu

Sample k6 Test:

// k6-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
  stages: [
    { duration: '2m', target: 100 },   // Ramp up
    { duration: '5m', target: 100 },   // Stay at 100 users
    { duration: '2m', target: 200 },   // Spike
    { duration: '3m', target: 200 },   // Stay at 200
    { duration: '2m', target: 0 },     // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95% of requests < 500ms
    http_req_failed: ['rate<0.01'],    // Error rate < 1%
  },
};

export default function () {
  // Show listing
  let res = http.get('http://localhost:8080/api/v1/shows/');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });

  sleep(1);

  // Featured shows
  res = http.get('http://localhost:8080/api/v1/featured-shows/');
  check(res, { 'status is 200': (r) => r.status === 200 });

  sleep(2);
}

Run k6 Test:

k6 run k6-test.js

# With output
k6 run --out json=results.json k6-test.js

# With InfluxDB (for Grafana visualization)
k6 run --out influxdb=http://localhost:8086/k6 k6-test.js

Alternative Tool: Artillery

Installation:

npm install -g artillery

Sample Artillery Config:

# artillery-config.yml
config:
  target: "http://localhost:8080"
  phases:
    - duration: 60
      arrivalRate: 10
      name: "Warm up"
    - duration: 120
      arrivalRate: 50
      name: "Sustained load"
  processor: "./processor.js"

scenarios:
  - name: "Show browsing"
    weight: 70
    flow:
      - get:
          url: "/api/v1/shows/"
      - think: 3
      - get:
          url: "/api/v1/featured-shows/"

  - name: "Checkout flow"
    weight: 30
    flow:
      - get:
          url: "/api/v1/shows/"
      - think: 5
      - post:
          url: "/checkout/"
          json:
            show_id: 1
            tickets: [{ ticket_id: 1, quantity: 2 }]

Run Artillery:

artillery run artillery-config.yml

# With report
artillery run --output report.json artillery-config.yml
artillery report report.json

Monitoring & Metrics

Application-Level Monitoring

Django Debug Toolbar (Development Only)

# settings.py
if DEBUG:
    INSTALLED_APPS += ['debug_toolbar']
    MIDDLEWARE += ['debug_toolbar.middleware.DebugToolbarMiddleware']
    INTERNAL_IPS = ['127.0.0.1']

Django Silk (Performance Profiling)

pip install django-silk
# settings.py
INSTALLED_APPS += ['silk']
MIDDLEWARE += ['silk.middleware.SilkyMiddleware']

# urls.py
urlpatterns += [path('silk/', include('silk.urls', namespace='silk'))]

Access at: http://localhost:8080/silk/

Database Monitoring

PostgreSQL Query Performance

-- Slow queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;

-- Connection count
SELECT count(*) as connections FROM pg_stat_activity;

-- Active connections by state
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;

-- Database size
SELECT pg_size_pretty(pg_database_size('piquetickets'));

-- Table sizes
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;

-- Index usage
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

Redis Monitoring

# Memory usage
redis-cli INFO memory

# Stats
redis-cli INFO stats

# Connected clients
redis-cli CLIENT LIST

# Cache hit/miss rate
redis-cli INFO stats | grep -E 'keyspace_hits|keyspace_misses'

# Calculate hit rate (INFO fields are colon-separated, hence -F':')
redis-cli INFO stats | awk -F':' '
  /^keyspace_hits/ { hits = $2 }
  /^keyspace_misses/ { misses = $2 }
  END {
    total = hits + misses
    if (total > 0) {
      hit_rate = (hits / total) * 100
      printf "Hit Rate: %.2f%%\n", hit_rate
    }
  }'

# Monitor in real-time
redis-cli --stat

# Monitor commands
redis-cli MONITOR

Celery Monitoring

Using Flower (Already Configured)

# Access Flower at: http://localhost:5555

# View tasks
curl http://localhost:5555/api/tasks

# View workers
curl http://localhost:5555/api/workers

Command Line Monitoring

# Worker status
celery -A brktickets inspect active

# Scheduled tasks
celery -A brktickets inspect scheduled

# Queue length
celery -A brktickets inspect stats

# Registered tasks
celery -A brktickets inspect registered

System Resource Monitoring

Docker Stats

# Real-time stats
docker stats

# Specific container
docker stats piquetickets-api-1

# All containers with formatting
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"

System Metrics

# CPU usage
top -b -n 1 | head -20

# Memory usage
free -h

# Disk usage
df -h

# Network connections
netstat -an | grep :8080 | wc -l

Monitoring During Load Tests

Create Monitoring Script:

#!/bin/bash
# monitor.sh

echo "=== Load Test Monitoring ==="
echo "Timestamp: $(date)"
echo ""

echo "=== Docker Stats ==="
docker stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"
echo ""

echo "=== PostgreSQL Connections ==="
docker compose exec -T db psql -U user -d piquetickets -c \
  "SELECT count(*) as connections, state FROM pg_stat_activity GROUP BY state;"
echo ""

echo "=== Redis Memory ==="
docker compose exec -T redis redis-cli INFO memory | grep used_memory_human
echo ""

echo "=== Celery Queue ==="
docker compose exec -T api celery -A brktickets inspect stats | grep -A 5 "total"
echo ""

Run During Tests:

chmod +x monitor.sh
watch -n 5 ./monitor.sh

Success Criteria

Performance Targets

| Metric                        | Target  | Acceptable | Threshold |
| ----------------------------- | ------- | ---------- | --------- |
| Average Response Time (GET)   | < 200ms | < 300ms    | < 500ms   |
| Average Response Time (POST)  | < 500ms | < 750ms    | < 1s      |
| 95th Percentile Response Time | < 500ms | < 750ms    | < 1s      |
| 99th Percentile Response Time | < 1s    | < 1.5s     | < 2s      |
| Error Rate                    | < 0.1%  | < 0.5%     | < 1%      |
| Concurrent Users              | 100+    | 75+        | 50+       |
| Throughput (RPS)              | 500+    | 300+       | 200+      |
| Database Query Time           | < 50ms  | < 100ms    | < 200ms   |
| Cache Hit Rate                | > 80%   | > 70%      | > 60%     |

Scalability Targets

| Scenario       | Target               | Acceptable      | Notes               |
| -------------- | -------------------- | --------------- | ------------------- |
| Normal Load    | 100 concurrent users | 75 users        | Daily operations    |
| Peak Load      | 500 concurrent users | 300 users       | Event announcements |
| Spike Handling | 0→500 in 1 min       | 0→300 in 1 min  | Viral scenarios     |
| Sustained Load | 100 users for 2h     | 75 users for 2h | No degradation      |
| Checkout Flow  | 20 checkouts/min     | 15 checkouts/min| Payment processing  |

Infrastructure Targets

| Component        | Metric           | Target     | Threshold  |
| ---------------- | ---------------- | ---------- | ---------- |
| Gunicorn Workers | CPU Usage        | < 70%      | < 90%      |
| PostgreSQL       | Connection Usage | < 60%      | < 80%      |
| Redis            | Memory Usage     | < 70%      | < 90%      |
| Celery Workers   | Queue Length     | < 10 tasks | < 50 tasks |
| Docker Container | Memory Usage     | < 70%      | < 90%      |

Risk Assessment

High Risks

| Risk                                           | Probability | Impact   | Mitigation Strategy                          |
| ---------------------------------------------- | ----------- | -------- | -------------------------------------------- |
| Limited Gunicorn workers (1 worker, 2 threads) | High        | Critical | Increase to 2-4 workers with 4 threads each  |
| Database connection exhaustion                 | Medium      | High     | Implement PgBouncer, increase CONN_MAX_AGE   |
| Stripe API rate limits                         | Medium      | High     | Implement request queuing, use webhooks      |
| PDF generation blocking requests               | Medium      | Medium   | Ensure all PDF generation via Celery tasks   |
| Railway memory limits                          | Medium      | Medium   | Monitor usage, upgrade plan if needed        |
| Redis memory exhaustion                        | Low         | Medium   | Configure maxmemory and eviction policy      |

Infrastructure Bottlenecks

1. Gunicorn Configuration

Issue: Single worker configuration limits concurrent requests to 2

Impact:

  • Saturates at ~20-30 RPS
  • High response times under load
  • Request queuing and timeouts

Solution:

# Recommended configuration
workers = 2
worker_class = 'gthread'
threads = 4
# Total concurrent: 8 requests

Memory Impact: +100-150MB

2. PostgreSQL Connection Pool

Issue: Limited connections with multiple workers

Impact:

  • Connection exhaustion errors
  • Slow query performance
  • Database bottleneck

Solution:

  • Implement PgBouncer
  • Increase CONN_MAX_AGE
  • Monitor connection usage

3. Railway Memory Constraints

Issue: Current configuration optimized for memory

Impact:

  • Limited worker processes
  • Reduced throughput capacity

Solution:

  • Monitor memory usage during tests
  • Consider upgrading the Railway plan
  • Optimize application memory usage

4. Celery Worker Capacity

Issue: 4 workers may bottleneck with heavy PDF generation

Impact:

  • Queue buildup during peaks
  • Delayed email/PDF delivery

Solution:

  • Increase workers to 8 during peaks
  • Separate PDF and email queues
  • Monitor queue lengths

Technical Debt Risks

| Risk                                  | Impact | Mitigation                        |
| ------------------------------------- | ------ | --------------------------------- |
| No application performance monitoring | High   | Implement Django Silk or similar  |
| No query optimization                 | Medium | Analyze slow queries, add indexes |
| No cache warming strategy             | Medium | Implement cache preloading        |
| No rate limiting on API               | High   | Implement django-ratelimit        |
| No CDN for static files               | Medium | Configure CloudFlare or similar   |
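On the rate-limiting row: django-ratelimit is the obvious candidate, but the underlying idea is small enough to sketch as a fixed-window counter (an illustration of the concept only, not how the library is implemented — a production limiter would keep its counters in Redis so all Gunicorn workers share state):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds per key (e.g. client IP)."""

    def __init__(self, limit=10, window_s=60):
        self.limit = limit
        self.window_s = window_s
        self.counts = defaultdict(int)  # (key, window index) -> count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window_s))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True

limiter = FixedWindowLimiter(limit=3, window_s=60)
print([limiter.allow("1.2.3.4", now=0) for _ in range(4)])  # → [True, True, True, False]
```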

Deliverables

Week 1: Baseline Performance

  • ✅ Locust installation and setup
  • ✅ Basic load test scenarios
  • ✅ Baseline performance report including:
      • Average response times by endpoint
      • Throughput metrics (RPS)
      • Error rates
      • Resource utilization (CPU, memory, connections)
  • ✅ Initial bottleneck identification
  • ✅ Recommendations for Week 2

Week 2: Checkout Flow

  • ✅ Checkout flow test implementation
  • ✅ Stripe integration performance data
  • ✅ Database query performance analysis
  • ✅ Celery task processing metrics
  • ✅ Database optimization recommendations
  • ✅ Cache effectiveness analysis

Week 3: Stress Testing

  • ✅ Spike test results and analysis
  • ✅ Soak test results (2-4 hours)
  • ✅ Capacity test findings
  • ✅ Breaking point documentation
  • ✅ Infrastructure scaling recommendations
  • ✅ Performance degradation analysis
  • ✅ Recovery behavior documentation

Week 4: Production Simulation

  • ✅ Mixed workload test results
  • ✅ Production traffic pattern analysis
  • ✅ Final comprehensive performance report
  • ✅ Infrastructure upgrade recommendations
  • ✅ Cost-benefit analysis for scaling
  • ✅ Load testing playbook for future use
  • ✅ Monitoring dashboard setup
  • ✅ Runbook for production deployment

Final Deliverables

  1. Load Testing Playbook - Complete guide for running future tests
  2. Performance Baseline Document - Reference metrics for comparison
  3. Optimization Recommendations - Prioritized list of improvements
  4. Monitoring Setup - Dashboards and alerts configuration
  5. Capacity Planning Report - Growth projections and scaling plan

Next Steps

Immediate Actions (This Week)

  1. Install Locust

    cd apps/api
    pip install locust==2.32.3
    echo "locust==2.32.3" >> requirements.txt

  2. Create Load Test Directory

    mkdir -p apps/api/loadtests/{tasks,config,utils,reports}
    touch apps/api/loadtests/{__init__.py,locustfile.py,README.md}

  3. Set Up Staging Environment
     • Create staging branch
     • Deploy to Railway staging environment
     • Configure test database with production-like data

  4. Implement Basic Load Test
     • Create show browsing scenario
     • Test against local environment
     • Verify metrics collection

Short Term (This Month)

  1. Week 1: Baseline Testing
     • Run baseline performance tests
     • Document current performance
     • Identify immediate bottlenecks

  2. Week 2: Checkout Flow
     • Implement checkout flow tests
     • Test Stripe integration
     • Analyze database queries

  3. Week 3: Stress Testing
     • Run spike, soak, and capacity tests
     • Find breaking points
     • Create optimization plan

  4. Week 4: Production Simulation
     • Run realistic traffic patterns
     • Generate final report
     • Implement critical fixes

Long Term (Ongoing)

  1. Integrate with CI/CD
     • Run load tests on every deployment
     • Set performance budgets
     • Automate regression testing

  2. Quarterly Capacity Tests
     • Reassess capacity every 3 months
     • Update baseline metrics
     • Plan for growth

  3. Production Monitoring
     • Set up APM (Application Performance Monitoring)
     • Configure alerts for performance degradation
     • Track key metrics over time

  4. Continuous Optimization
     • Review slow queries monthly
     • Optimize caching strategy
     • Optimize database indexes

Questions for Stakeholders

Before beginning load testing, please provide answers to the following:

Traffic Expectations

  1. Current Traffic
     • How many daily active users?
     • Peak concurrent users currently?
     • Average requests per second?

  2. Expected Growth
     • Projected user growth over 6 months?
     • Expected peak traffic (events, announcements)?
     • Largest expected event size?

  3. Business Goals
     • Target event capacity?
     • Expected checkout volume?
     • Geographic distribution of users?

Infrastructure Budget

  1. Current Costs
     • Railway plan and limits?
     • Current monthly infrastructure cost?

  2. Budget for Scaling
     • Budget for plan upgrades?
     • Acceptable cost per user?
     • Cost constraints for scaling?

  3. Performance vs. Cost
     • Willing to pay more for better performance?
     • Priority: cost optimization or performance?

Testing Environment

  1. Staging Environment
     • Staging environment available?
     • Production-like data in staging?
     • Can we test against production (carefully)?

  2. Test Timing
     • Best time for load testing?
     • Maintenance windows available?
     • Is user impact acceptable during tests?

Success Definition

  1. Performance Requirements
     • Required response time targets?
     • Acceptable error rates?
     • Minimum throughput needed?

  2. Business Requirements
     • Must handle X events simultaneously?
     • Must support Y concurrent checkouts?
     • Specific SLA requirements?

  3. User Experience
     • Acceptable page load time?
     • Payment processing time limits?
     • Email delivery time expectations?

Appendix

A. Useful Resources

Load Testing Tools:

  • Locust Documentation
  • k6 Documentation
  • Artillery Documentation

Django Performance:

  • Django Performance Optimization
  • DRF Performance Tips
  • PostgreSQL Query Optimization

Monitoring:

  • Django Debug Toolbar
  • Django Silk
  • Flower - Celery Monitoring

B. Sample Test Data Generator

# apps/api/loadtests/utils/fixtures.py

from faker import Faker
import random
from datetime import datetime, timedelta

fake = Faker()

def generate_show_data():
    """Generate realistic show data for testing"""
    return {
        "title": fake.catch_phrase(),
        "description": fake.text(max_nb_chars=500),
        "start_time": (datetime.now() + timedelta(days=random.randint(1, 90))).isoformat(),
        "published": True,
    }

def generate_order_data(show_id):
    """Generate realistic order data"""
    return {
        "show_id": show_id,
        "order_email": fake.email(),
        "order_name": fake.name(),
        "tickets": [
            {
                "ticket_id": random.randint(1, 5),
                "quantity": random.randint(1, 4)
            }
        ]
    }

C. Performance Testing Checklist

Pre-Test:

  - [ ] Backup production database
  - [ ] Configure staging environment
  - [ ] Install monitoring tools
  - [ ] Set up alerting
  - [ ] Document current performance
  - [ ] Notify team of testing schedule

During Test:

  - [ ] Monitor resource usage
  - [ ] Watch error rates
  - [ ] Track response times
  - [ ] Check database connections
  - [ ] Monitor Redis memory
  - [ ] Observe Celery queues

Post-Test:

  - [ ] Analyze results
  - [ ] Generate reports
  - [ ] Document findings
  - [ ] Create optimization tickets
  - [ ] Update capacity plan
  - [ ] Share results with team


Document Metadata:

  • Version: 1.0
  • Last Updated: 2025-10-18
  • Author: Claude Code
  • Reviewers: TBD
  • Next Review: After Week 1 Testing


End of Document