Load Testing Plan for PiqueTickets Django API¶
Document Version: 1.0
Created: 2025-10-18
Status: Draft
Table of Contents¶
- Executive Summary
- Current Architecture Analysis
- Critical API Endpoints
- Load Testing Strategy
- Implementation Plan
- Infrastructure Considerations
- Testing Tools & Commands
- Monitoring & Metrics
- Success Criteria
- Risk Assessment
- Deliverables
- Next Steps
Executive Summary¶
Current Architecture¶
Tech Stack:
- Django 5.1.13
- Django REST Framework 3.15.2
- PostgreSQL (persistent connections via CONN_MAX_AGE)
- Redis (caching and Celery broker)
- Celery 5.4.0 (async task processing)
- Gunicorn 23.0.0 (WSGI server)
- Docker Compose deployment

Server Configuration:
- Gunicorn: 1 worker, 2 threads (60s timeout)
- Celery: 4 workers, 100 max tasks per child
- Deployment: Railway (Docker containers)

Key Features:
- Event ticketing system
- Stripe payment integration
- PDF ticket generation
- Email tracking
- Producer portal
- Check-in system
Recommended Load Testing Tool¶
Primary Choice: Locust
Why Locust?
- Python-based scripting (a natural fit for Django developers)
- Distributed load testing capabilities
- Real-time web UI for monitoring
- Easy integration with existing test patterns
- Scales to millions of users

Alternative Tools:
- k6: lightweight, CI/CD friendly, JavaScript-based
- Artillery: modern, supports WebSockets/GraphQL, YAML configuration
Current Architecture Analysis¶
Django Configuration¶
File: apps/api/brktickets/settings.py
Key Settings:
# Database
DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql_psycopg2",
"OPTIONS": {
"connect_timeout": 10,
"options": "-c statement_timeout=30000", # 30s query timeout
},
"CONN_MAX_AGE": 60, # 60s connection pooling
"CONN_HEALTH_CHECKS": True,
}
}
# Cache (Redis)
CACHES = {
"default": {
"BACKEND": "django.core.cache.backends.redis.RedisCache",
"OPTIONS": {
"socket_timeout": 5,
"retry_on_timeout": True,
},
"TIMEOUT": 300,
}
}
# Celery
CELERY_BROKER_URL = "redis://localhost:6379/0"
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"
Gunicorn Configuration¶
File: apps/api/gunicorn_config.py
Current Settings:
workers = 1 # ⚠️ CRITICAL BOTTLENECK
worker_class = 'gthread'
threads = 2 # Max 2 concurrent requests
worker_connections = 100
timeout = 60
max_requests = 500
max_requests_jitter = 50
⚠️ Critical Concern:
- Only 1 worker with 2 threads = a maximum of 2 concurrent requests
- This configuration is optimized for memory on Railway
- It will be a significant bottleneck under load
- Recommended: at least 2-4 workers with 4 threads each (8-16 concurrent requests)
API Architecture¶
Main URL Patterns:
/ → Tickets app
/api/v1/ → REST API endpoints
/health/ → Health check
/auth/ → Authentication
/email-tracking/ → Email tracking
/admin/ → Django admin
REST API Endpoints (/api/v1/):
/tickets/ → TicketViewSet
/shows/ → ShowViewSet
/venues/ → VenueViewSet
/performers/ → PerformerViewSet
/featured-shows/ → FeaturedShowViewSet
/producers/ → ProducerViewSet
/search/ → SearchView
/active-cities/ → ActiveCitiesView
/subscriber/create/ → Subscriber creation
/checkin/ → Check-in endpoints
/portal/ → Producer portal
Tickets App Endpoints:
/checkout/ → CheckoutSessionView
/success/ → SuccessSessionView
/cancel/ → CancelSessionView
/validate-promo/ → ValidatePromoCodeView
Critical API Endpoints¶
High Priority (User-Facing, High Traffic)¶
| Endpoint | Method | Description | Expected Load |
|---|---|---|---|
| /api/v1/shows/ | GET | Show listings | Very High |
| /api/v1/shows/{id}/ | GET | Show details | High |
| /api/v1/featured-shows/ | GET | Featured shows | High |
| /checkout/ | POST | Create checkout session | Medium |
| /api/v1/active-cities/ | GET | Active cities lookup | Medium |
| /health/ | GET | Health check | Continuous |
Medium Priority (Producer Portal)¶
| Endpoint | Method | Description | Load |
|---|---|---|---|
| /api/v1/producers/ | GET | Producer listings | Medium |
| /api/v1/producers/stripe-connect-link/ | POST | Stripe Connect | Low |
| /api/v1/producers/stripe-balance/ | GET | Financial data | Low |
| /api/v1/producers/shows/{id}/checkin-link/ | POST | Check-in links | Low |
High Impact (Payment Processing)¶
| Endpoint | Method | Description | Critical |
|---|---|---|---|
| /success/ | POST | Payment success handler | Yes |
| /validate-promo/ | POST | Promo code validation | Medium |
Backend Services¶
| Service | Description | Load Impact |
|---|---|---|
| Celery Tasks | PDF generation, email sending | High |
| Stripe API | Payment processing | Medium |
| PostgreSQL | Data storage | High |
| Redis | Caching, task queue | High |
Load Testing Strategy¶
Phase 1: Baseline Performance (Week 1)¶
Goal: Establish performance baselines for read-heavy endpoints
Test Scenarios:
1.1 Show Listing Load Test¶
User Actions:
- GET /api/v1/shows/
- GET /api/v1/shows/?start_time__gt={date}
- GET /api/v1/shows/?slug={slug}
- GET /api/v1/featured-shows/
1.2 Read-Heavy Traffic Simulation¶
Traffic Distribution:
- 70% show browsing (/api/v1/shows/)
- 20% show details (/api/v1/shows/{id}/)
- 10% featured shows (/api/v1/featured-shows/)
Target Metrics:
- Throughput: 100-500 RPS
- Average response time: < 200ms
- 95th percentile: < 500ms
- 99th percentile: < 1000ms
- Error rate: < 0.1%
Phase 2: Checkout Flow Testing (Week 2)¶
Goal: Test payment processing and write operations under load
Test Scenarios:
2.1 Checkout Session Creation¶
User Flow:
1. Browse shows → GET /api/v1/shows/
2. View show details → GET /api/v1/shows/{id}/
3. Validate promo code (optional) → POST /validate-promo/
4. Create checkout session → POST /checkout/
5. Complete payment → POST /success/
2.2 Complete Purchase Flow¶
Target Metrics:
- Checkout completion rate: > 95%
- Checkout session creation: < 3s
- Payment success processing: < 2s
- No timeouts during Stripe API calls
- Database connection pool: < 80% utilization
- Redis cache hit rate: > 80%
Phase 3: Stress Testing (Week 3)¶
Goal: Find breaking points and system limits
Test Scenarios:
3.1 Spike Test¶
Objectives:
- Test auto-scaling behavior
- Identify failure modes
- Monitor error recovery
3.2 Soak Test¶
Monitor For:
- Memory leaks
- Connection pool exhaustion
- Gradual performance degradation
- Celery queue buildup
3.3 Capacity Test¶
Pattern: Gradual increase until failure
Start: 10 users
Increment: +10 users every 2 minutes
Stop: When error rate > 5%
Key Metrics:
- Maximum concurrent users before degradation
- Memory usage over time
- Database connection exhaustion point
- Redis connection limits
- Breaking point identification
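The ramp pattern above can be expressed as a Locust custom load shape. A minimal sketch (the 5% error-rate stop condition is not handled by the shape itself and would need a separate check or event listener):

from locust import LoadTestShape

class StepRampShape(LoadTestShape):
    """Start at 10 users and add 10 more every 2 minutes."""
    step_users = 10
    step_interval = 120  # seconds

    def tick(self):
        run_time = self.get_run_time()
        users = self.step_users * (int(run_time // self.step_interval) + 1)
        return (users, self.step_users)  # (target user count, spawn rate)

Placing this class in the locustfile alongside the user classes is enough; Locust picks up a LoadTestShape subclass automatically.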
Phase 4: Production-Like Testing (Week 4)¶
Goal: Simulate realistic traffic patterns
Test Scenarios:
4.1 Mixed Workload¶
Traffic Distribution:
- 60% read operations (shows, venues, performers)
- 25% authenticated operations (producer portal)
- 10% write operations (checkout, orders)
- 5% heavy operations (PDF generation, email sending)
4.2 Geographic Distribution¶
- Simulate users from multiple regions
- Test CDN and static file delivery
- Include realistic network latency

Success Metrics:
- Maintain < 200ms response time under realistic load
- Celery queue processing time < 1s
- Email delivery success rate > 99%
- PDF generation time < 2s
- No database deadlocks
- Redis eviction rate < 1%
Implementation Plan¶
Setup Phase (Days 1-3)¶
Install Locust¶
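Locust installs from PyPI (add it to the dev requirements as appropriate):

pip install locust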
Create Load Test Directory Structure¶
apps/api/loadtests/
├── __init__.py
├── locustfile.py # Main entry point
├── README.md # Load test documentation
├── tasks/
│ ├── __init__.py
│ ├── browsing.py # Show browsing tasks
│ ├── checkout.py # Checkout flow tasks
│ ├── producer.py # Producer portal tasks
│ └── search.py # Search and filter tasks
├── config/
│ ├── __init__.py
│ ├── baseline.py # Baseline test configuration
│ ├── stress.py # Stress test configuration
│ └── production.py # Production simulation config
├── utils/
│ ├── __init__.py
│ ├── helpers.py # Test data generators
│ └── fixtures.py # Test fixtures
└── reports/ # Test reports directory
└── .gitkeep
Sample Locustfile¶
File: apps/api/loadtests/locustfile.py
"""
Main Locust load testing file for PiqueTickets API
"""
import random

from locust import HttpUser, task, between, events
class ShowBrowsingUser(HttpUser):
"""Simulates users browsing shows"""
wait_time = between(1, 5) # Think time between requests
weight = 7 # 70% of traffic
@task(3)  # 3x more likely than other tasks
def browse_shows(self):
    """Browse all shows"""
    response = self.client.get(
        "/api/v1/shows/",
        name="/api/v1/shows/ [list]"
    )
    if response.status_code == 200:
        data = response.json()
        # Handle both plain lists and DRF-paginated {"results": [...]} payloads
        self.shows = data.get("results", data) if isinstance(data, dict) else data
@task(2)
def view_show_details(self):
"""View specific show details"""
if hasattr(self, 'shows') and self.shows:
show = random.choice(self.shows)
self.client.get(
f"/api/v1/shows/{show['id']}/",
name="/api/v1/shows/[id]/"
)
@task(1)
def view_featured_shows(self):
"""View featured shows"""
self.client.get("/api/v1/featured-shows/")
@task(1)
def filter_shows_by_date(self):
"""Filter shows by date"""
self.client.get(
"/api/v1/shows/?start_time__gt=2025-01-01T00:00:00Z",
name="/api/v1/shows/ [filtered]"
)
@task(1)
def view_active_cities(self):
"""View active cities"""
self.client.get("/api/v1/active-cities/")
class CheckoutUser(HttpUser):
"""Simulates users completing checkout"""
wait_time = between(5, 15)
weight = 2 # 20% of traffic
def on_start(self):
"""Setup: Get list of shows"""
response = self.client.get("/api/v1/shows/")
if response.status_code == 200:
    data = response.json()
    # Handle both plain lists and DRF-paginated {"results": [...]} payloads
    self.shows = data.get("results", data) if isinstance(data, dict) else data
@task
def complete_checkout_flow(self):
"""Complete full checkout flow"""
if not hasattr(self, 'shows') or not self.shows:
return
# 1. Browse shows
self.client.get("/api/v1/shows/")
# 2. View show details
show = random.choice(self.shows)
self.client.get(f"/api/v1/shows/{show['id']}/")
# 3. Validate promo code (30% of users)
if random.random() < 0.3:
self.client.post(
"/validate-promo/",
json={"code": "TESTPROMO"}
)
# 4. Create checkout session
# Note: This will fail without valid data, adjust as needed
self.client.post(
"/checkout/",
json={
"show_id": show['id'],
"tickets": [{"ticket_id": 1, "quantity": 2}],
"order_email": "test@example.com",
"order_name": "Test User"
},
name="/checkout/ [session]"
)
class ProducerUser(HttpUser):
"""Simulates producer portal users"""
wait_time = between(3, 10)
weight = 1 # 10% of traffic
def on_start(self):
"""Login as producer"""
# Implement authentication here
pass
@task(3)
def view_producers(self):
"""View producer list"""
self.client.get("/api/v1/producers/")
@task(1)
def check_stripe_status(self):
"""Check Stripe onboarding status"""
# Requires authentication
pass
@events.test_start.add_listener
def on_test_start(environment, **kwargs):
"""Called when test starts"""
print("Load test starting...")
@events.test_stop.add_listener
def on_test_stop(environment, **kwargs):
"""Called when test stops"""
print("Load test completed!")
Week 1: Baseline Tests¶
Tasks:
- [ ] Set up Locust environment
- [ ] Create baseline test scenarios
- [ ] Configure staging environment
- [ ] Run initial baseline test (10-100 users, 30 min)
- [ ] Document baseline metrics
- [ ] Create performance dashboard
Commands:
# Web UI mode (recommended for Week 1)
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080
# Access at: http://localhost:8089
# Headless mode with report
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080 \
--users=100 \
--spawn-rate=10 \
--run-time=30m \
--headless \
--html=apps/api/loadtests/reports/baseline_report.html \
--csv=apps/api/loadtests/reports/baseline
Expected Output:
Baseline Performance Report
---------------------------
Total Requests: ~180,000
Average RPS: ~100
Median Response Time: <200ms
95th Percentile: <500ms
Error Rate: <0.1%
Week 2: Checkout Flow¶
Tasks:
- [ ] Implement checkout user simulation
- [ ] Configure Stripe test mode
- [ ] Test promo code validation
- [ ] Monitor database query performance
- [ ] Analyze Celery task queue
- [ ] Document checkout performance
Key Monitoring:
# Monitor PostgreSQL connections
docker compose exec db psql -U user -d piquetickets -c \
"SELECT count(*) FROM pg_stat_activity;"
# Monitor Redis memory
docker compose exec redis redis-cli INFO memory
# Monitor Celery via Flower
# Open: http://localhost:5555
# Django query debugging
# In Django shell during test:
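# (note: connection.queries is only populated when DEBUG=True)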
from django.db import connection
print(len(connection.queries))
Week 3: Stress Testing¶
Tasks:
- [ ] Run spike tests (0→500 users in 1 min)
- [ ] Execute soak tests (100 users for 2-4 hours)
- [ ] Identify bottlenecks
- [ ] Document breaking points
- [ ] Create optimization recommendations
Spike Test Command:
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080 \
--users=500 \
--spawn-rate=500 \
--run-time=10m \
--headless \
--html=apps/api/loadtests/reports/spike_test_report.html
Soak Test Command:
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080 \
--users=100 \
--spawn-rate=10 \
--run-time=4h \
--headless \
--html=apps/api/loadtests/reports/soak_test_report.html
Week 4: Production Simulation¶
Tasks:
- [ ] Run mixed workload tests
- [ ] Test with production-like data volumes
- [ ] Validate auto-scaling (if applicable)
- [ ] Generate final performance report
- [ ] Create optimization implementation plan
Mixed Workload Command:
# Use all user classes with proper weights
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080 \
--users=200 \
--spawn-rate=20 \
--run-time=2h \
--headless \
--html=apps/api/loadtests/reports/production_simulation.html
Infrastructure Considerations¶
Gunicorn Configuration Issues¶
Current Configuration (apps/api/gunicorn_config.py):
Problem:
- Maximum of 2 concurrent requests
- Saturates at ~20-30 RPS
- Optimized for Railway memory constraints
Recommendations:
Option 1: Increase Workers (Recommended)¶
workers = 2 # Increase to 2
worker_class = 'gthread'
threads = 4 # Increase to 4
# Total concurrent: 2 * 4 = 8 requests
Option 2: Use Formula-Based Workers¶
import multiprocessing
workers = min(multiprocessing.cpu_count() * 2 + 1, 4) # Max 4 workers
worker_class = 'gthread'
threads = 4
# Total concurrent: up to 4 * 4 = 16 requests
Option 3: Use Gevent (Async)¶
workers = 2
worker_class = 'gevent'
worker_connections = 1000 # Many concurrent connections
# Install: pip install gevent
Memory Impact:
- Current: ~200-250MB total
- With 2 workers + 4 threads: ~300-400MB total
- Monitor Railway memory limits
Database Connection Pooling¶
Current Configuration: see the DATABASES settings above (CONN_MAX_AGE = 60, connection health checks enabled).

Recommendations:

1. Monitor connection pool usage during load tests (see the sketch below).
2. Consider PgBouncer for true connection pooling in front of PostgreSQL.
3. Adjust CONN_MAX_AGE based on observed connection counts.
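A minimal sketch for spot-checking connection usage from a Django shell during a test; it uses only the app's own database connection, no extra dependencies:

from django.db import connection

def connection_usage():
    """Return (open connections, server max, percent used)."""
    with connection.cursor() as cur:
        cur.execute("SELECT count(*) FROM pg_stat_activity;")
        active = cur.fetchone()[0]
        cur.execute("SHOW max_connections;")
        max_conns = int(cur.fetchone()[0])
    return active, max_conns, 100 * active / max_conns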
Redis Optimization¶
Current Configuration:
CACHES = {
"default": {
"BACKEND": "django.core.cache.backends.redis.RedisCache",
"TIMEOUT": 300,
}
}
Recommendations:

1. Monitor memory usage (redis-cli INFO memory).
2. Configure a maxmemory limit and eviction policy.
3. Monitor the cache hit rate (see the sketch below).
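A minimal sketch using redis-py (the client library Django's built-in Redis cache backend requires); the host, port, and 256mb cap are illustrative:

import redis

r = redis.Redis(host="localhost", port=6379)

# Cap memory and evict least-recently-used keys once the cap is hit
r.config_set("maxmemory", "256mb")
r.config_set("maxmemory-policy", "allkeys-lru")

# Compute the cache hit rate from server stats
stats = r.info("stats")
hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
total = hits + misses
print(f"Hit rate: {hits / total:.1%}" if total else "No cache traffic yet")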
Celery Worker Optimization¶
Current Configuration:
celery -A brktickets worker \
-l INFO \
-E \
--max-tasks-per-child=100 \
--concurrency=4 \
--prefetch-multiplier=4
Recommendations:

1. Monitor queue lengths:
   - Use Flower: http://localhost:5555
   - Check task success/failure rates
2. Adjust --concurrency for load (e.g., raise it during peak periods).
3. Separate queues for heavy and light tasks (see the sketch below).
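A hedged sketch of queue separation; the task module paths under tickets.tasks are illustrative, not confirmed names from the codebase:

# settings.py
CELERY_TASK_ROUTES = {
    "tickets.tasks.generate_ticket_pdf": {"queue": "pdf"},
    "tickets.tasks.send_order_email": {"queue": "email"},
}

# Then run a dedicated worker per queue:
#   celery -A brktickets worker -Q pdf -l INFO --concurrency=2
#   celery -A brktickets worker -Q email -l INFO --concurrency=4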
Testing Tools & Commands¶
Locust Usage¶
Web UI Mode (Recommended for Development)¶
# Start Locust web UI
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080
# Access at: http://localhost:8089
# Configure users and spawn rate in UI
Headless Mode (CI/CD)¶
# Basic headless test
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080 \
--users=100 \
--spawn-rate=10 \
--run-time=10m \
--headless
# With HTML report
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080 \
--users=100 \
--spawn-rate=10 \
--run-time=10m \
--headless \
--html=report.html
# With CSV output
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080 \
--users=100 \
--spawn-rate=10 \
--run-time=10m \
--headless \
--csv=results/test
Distributed Mode¶
# Master node
locust -f apps/api/loadtests/locustfile.py \
--master \
--expect-workers=4
# Worker nodes (run on different machines)
locust -f apps/api/loadtests/locustfile.py \
--worker \
--master-host=<master-ip>
Alternative Tool: k6¶
If you prefer k6 (JavaScript-based):
Sample k6 Test:
// k6-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
export let options = {
stages: [
{ duration: '2m', target: 100 }, // Ramp up
{ duration: '5m', target: 100 }, // Stay at 100 users
{ duration: '2m', target: 200 }, // Spike
{ duration: '3m', target: 200 }, // Stay at 200
{ duration: '2m', target: 0 }, // Ramp down
],
thresholds: {
http_req_duration: ['p(95)<500'], // 95% of requests < 500ms
http_req_failed: ['rate<0.01'], // Error rate < 1%
},
};
export default function () {
// Show listing
let res = http.get('http://localhost:8080/api/v1/shows/');
check(res, {
'status is 200': (r) => r.status === 200,
'response time < 500ms': (r) => r.timings.duration < 500,
});
sleep(1);
// Featured shows
res = http.get('http://localhost:8080/api/v1/featured-shows/');
check(res, { 'status is 200': (r) => r.status === 200 });
sleep(2);
}
Run k6 Test:
k6 run k6-test.js
# With output
k6 run --out json=results.json k6-test.js
# With InfluxDB (for Grafana visualization)
k6 run --out influxdb=http://localhost:8086/k6 k6-test.js
Alternative Tool: Artillery¶
Installation:
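Artillery is distributed via npm (assumes Node.js is installed):

npm install -g artillery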
Sample Artillery Config:
# artillery-config.yml
config:
target: "http://localhost:8080"
phases:
- duration: 60
arrivalRate: 10
name: "Warm up"
- duration: 120
arrivalRate: 50
name: "Sustained load"
processor: "./processor.js"
scenarios:
- name: "Show browsing"
weight: 70
flow:
- get:
url: "/api/v1/shows/"
- think: 3
- get:
url: "/api/v1/featured-shows/"
- name: "Checkout flow"
weight: 30
flow:
- get:
url: "/api/v1/shows/"
- think: 5
- post:
url: "/checkout/"
json:
show_id: 1
tickets: [{ ticket_id: 1, quantity: 2 }]
Run Artillery:
artillery run artillery-config.yml
# With report
artillery run --output report.json artillery-config.yml
artillery report report.json
Monitoring & Metrics¶
Application-Level Monitoring¶
Django Debug Toolbar (Development Only)¶
# settings.py
if DEBUG:
    INSTALLED_APPS += ['debug_toolbar']
    MIDDLEWARE += ['debug_toolbar.middleware.DebugToolbarMiddleware']
    INTERNAL_IPS = ['127.0.0.1']

# urls.py (the toolbar also needs its URLs mounted)
if settings.DEBUG:
    urlpatterns += [path('__debug__/', include('debug_toolbar.urls'))]
Django Silk (Performance Profiling)¶
# settings.py
INSTALLED_APPS += ['silk']
MIDDLEWARE += ['silk.middleware.SilkyMiddleware']
# urls.py
urlpatterns += [path('silk/', include('silk.urls', namespace='silk'))]
Access at: http://localhost:8080/silk/
Database Monitoring¶
PostgreSQL Query Performance¶
-- Slow queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;
-- Connection count
SELECT count(*) as connections FROM pg_stat_activity;
-- Active connections by state
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;
-- Database size
SELECT pg_size_pretty(pg_database_size('piquetickets'));
-- Table sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
-- Index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
Redis Monitoring¶
# Memory usage
redis-cli INFO memory
# Stats
redis-cli INFO stats
# Connected clients
redis-cli CLIENT LIST
# Cache hit/miss rate
redis-cli INFO stats | grep -E 'keyspace_hits|keyspace_misses'
# Calculate hit rate (INFO output is colon-delimited, hence -F:)
redis-cli INFO stats | awk -F: '
/keyspace_hits/ { hits = $2 }
/keyspace_misses/ { misses = $2 }
END {
total = hits + misses
if (total > 0) {
hit_rate = (hits / total) * 100
printf "Hit Rate: %.2f%%\n", hit_rate
}
}'
# Monitor in real-time
redis-cli --stat
# Monitor commands
redis-cli MONITOR
Celery Monitoring¶
Using Flower (Already Configured)¶
# Access Flower at: http://localhost:5555
# View tasks
curl http://localhost:5555/api/tasks
# View workers
curl http://localhost:5555/api/workers
Command Line Monitoring¶
# Worker status
celery -A brktickets inspect active
# Scheduled tasks
celery -A brktickets inspect scheduled
# Worker stats (broker queue length isn't exposed here; check Redis with: redis-cli LLEN celery)
celery -A brktickets inspect stats
# Registered tasks
celery -A brktickets inspect registered
System Resource Monitoring¶
Docker Stats¶
# Real-time stats
docker stats
# Specific container
docker stats piquetickets-api-1
# All containers with formatting
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"
System Metrics¶
# CPU usage
top -b -n 1 | head -20
# Memory usage
free -h
# Disk usage
df -h
# Network connections
netstat -an | grep :8080 | wc -l
Monitoring During Load Tests¶
Create Monitoring Script:
#!/bin/bash
# monitor.sh
echo "=== Load Test Monitoring ==="
echo "Timestamp: $(date)"
echo ""
echo "=== Docker Stats ==="
docker stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"
echo ""
echo "=== PostgreSQL Connections ==="
docker compose exec -T db psql -U user -d piquetickets -c \
"SELECT count(*) as connections, state FROM pg_stat_activity GROUP BY state;"
echo ""
echo "=== Redis Memory ==="
docker compose exec -T redis redis-cli INFO memory | grep used_memory_human
echo ""
echo "=== Celery Queue ==="
docker compose exec -T api celery -A brktickets inspect stats | grep -A 5 "total"
echo ""
Run During Tests:
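One way to run it during a test window, assuming the standard watch utility:

chmod +x monitor.sh
watch -n 10 ./monitor.sh   # refresh the snapshot every 10 seconds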
Success Criteria¶
Performance Targets¶
| Metric | Target | Acceptable | Threshold |
|---|---|---|---|
| Average Response Time (GET) | < 200ms | < 300ms | < 500ms |
| Average Response Time (POST) | < 500ms | < 750ms | < 1s |
| 95th Percentile Response Time | < 500ms | < 750ms | < 1s |
| 99th Percentile Response Time | < 1s | < 1.5s | < 2s |
| Error Rate | < 0.1% | < 0.5% | < 1% |
| Concurrent Users | 100+ | 75+ | 50+ |
| Throughput (RPS) | 500+ | 300+ | 200+ |
| Database Query Time | < 50ms | < 100ms | < 200ms |
| Cache Hit Rate | > 80% | > 70% | > 60% |
Scalability Targets¶
| Scenario | Target | Acceptable | Notes |
|---|---|---|---|
| Normal Load | 100 concurrent users | 75 users | Daily operations |
| Peak Load | 500 concurrent users | 300 users | Event announcements |
| Spike Handling | 0→500 in 1 min | 0→300 in 1 min | Viral scenarios |
| Sustained Load | 100 users for 2h | 75 users for 2h | No degradation |
| Checkout Flow | 20 checkouts/min | 15 checkouts/min | Payment processing |
Infrastructure Targets¶
| Component | Metric | Target | Threshold |
|---|---|---|---|
| Gunicorn Workers | CPU Usage | < 70% | < 90% |
| PostgreSQL | Connection Usage | < 60% | < 80% |
| Redis | Memory Usage | < 70% | < 90% |
| Celery Workers | Queue Length | < 10 tasks | < 50 tasks |
| Docker Container | Memory Usage | < 70% | < 90% |
Risk Assessment¶
High Risks¶
| Risk | Probability | Impact | Mitigation Strategy |
|---|---|---|---|
| Limited Gunicorn workers (1 worker, 2 threads) | High | Critical | Increase to 2-4 workers with 4 threads each |
| Database connection exhaustion | Medium | High | Implement PgBouncer, increase CONN_MAX_AGE |
| Stripe API rate limits | Medium | High | Implement request queuing, use webhooks |
| PDF generation blocking requests | Medium | Medium | Ensure all PDF generation via Celery tasks |
| Railway memory limits | Medium | Medium | Monitor usage, upgrade plan if needed |
| Redis memory exhaustion | Low | Medium | Configure maxmemory and eviction policy |
Infrastructure Bottlenecks¶
1. Gunicorn Configuration¶
Issue: Single worker configuration limits concurrent requests to 2
Impact:
- Saturates at ~20-30 RPS
- High response times under load
- Request queuing and timeouts
Solution:
# Recommended configuration
workers = 2
worker_class = 'gthread'
threads = 4
# Total concurrent: 8 requests
Memory Impact: +100-150MB
2. PostgreSQL Connection Pool¶
Issue: Limited connections with multiple workers
Impact:
- Connection exhaustion errors
- Slow query performance
- Database bottleneck
Solution:
- Implement PgBouncer (see the sketch below)
- Increase CONN_MAX_AGE
- Monitor connection usage
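A hedged sketch of pointing Django at PgBouncer; the pgbouncer host name and port 6432 are conventional assumptions, not values from the repo:

# settings.py
DATABASES["default"].update({
    "HOST": "pgbouncer",                   # assumed docker-compose service name
    "PORT": "6432",                        # PgBouncer's conventional port
    "CONN_MAX_AGE": 0,                     # let PgBouncer own the pooling
    "DISABLE_SERVER_SIDE_CURSORS": True,   # required with transaction pooling
})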
3. Railway Memory Constraints¶
Issue: Current configuration optimized for memory
Impact:
- Limited worker processes
- Reduced throughput capacity

Solution:
- Monitor memory usage during tests
- Consider upgrading the Railway plan
- Optimize application memory usage
4. Celery Worker Capacity¶
Issue: 4 workers may bottleneck with heavy PDF generation
Impact:
- Queue buildup during peaks
- Delayed email/PDF delivery

Solution:
- Increase workers to 8 during peak periods
- Separate PDF and email queues
- Monitor queue lengths
Technical Debt Risks¶
| Risk | Impact | Mitigation |
|---|---|---|
| No application performance monitoring | High | Implement Django Silk or similar |
| No query optimization | Medium | Analyze slow queries, add indexes |
| No cache warming strategy | Medium | Implement cache preloading |
| No rate limiting on API | High | Implement django-ratelimit |
| No CDN for static files | Medium | Configure CloudFlare or similar |
Deliverables¶
Week 1: Baseline Performance¶
- ✅ Locust installation and setup
- ✅ Basic load test scenarios
- ✅ Baseline performance report including:
- Average response times by endpoint
- Throughput metrics (RPS)
- Error rates
- Resource utilization (CPU, memory, connections)
- ✅ Initial bottleneck identification
- ✅ Recommendations for Week 2
Week 2: Checkout Flow¶
- ✅ Checkout flow test implementation
- ✅ Stripe integration performance data
- ✅ Database query performance analysis
- ✅ Celery task processing metrics
- ✅ Database optimization recommendations
- ✅ Cache effectiveness analysis
Week 3: Stress Testing¶
- ✅ Spike test results and analysis
- ✅ Soak test results (2-4 hours)
- ✅ Capacity test findings
- ✅ Breaking point documentation
- ✅ Infrastructure scaling recommendations
- ✅ Performance degradation analysis
- ✅ Recovery behavior documentation
Week 4: Production Simulation¶
- ✅ Mixed workload test results
- ✅ Production traffic pattern analysis
- ✅ Final comprehensive performance report
- ✅ Infrastructure upgrade recommendations
- ✅ Cost-benefit analysis for scaling
- ✅ Load testing playbook for future use
- ✅ Monitoring dashboard setup
- ✅ Runbook for production deployment
Final Deliverables¶
- Load Testing Playbook - Complete guide for running future tests
- Performance Baseline Document - Reference metrics for comparison
- Optimization Recommendations - Prioritized list of improvements
- Monitoring Setup - Dashboards and alerts configuration
- Capacity Planning Report - Growth projections and scaling plan
Next Steps¶
Immediate Actions (This Week)¶
1. Install Locust
2. Create load test directory
3. Set up staging environment
   - Create staging branch
   - Deploy to Railway staging environment
   - Configure test database with production-like data
4. Implement basic load test
   - Create show browsing scenario
   - Test against local environment
   - Verify metrics collection
Short Term (This Month)¶
- Week 1: Baseline Testing
  - Run baseline performance tests
  - Document current performance
  - Identify immediate bottlenecks
- Week 2: Checkout Flow
  - Implement checkout flow tests
  - Test Stripe integration
  - Analyze database queries
- Week 3: Stress Testing
  - Run spike, soak, and capacity tests
  - Find breaking points
  - Create optimization plan
- Week 4: Production Simulation
  - Run realistic traffic patterns
  - Generate final report
  - Implement critical fixes
Long Term (Ongoing)¶
- Integrate with CI/CD
  - Run load tests on every deployment
  - Set performance budgets
  - Automated regression testing
- Quarterly Capacity Tests
  - Reassess capacity every 3 months
  - Update baseline metrics
  - Plan for growth
- Production Monitoring
  - Set up APM (Application Performance Monitoring)
  - Configure alerts for performance degradation
  - Track key metrics over time
- Continuous Optimization
  - Review slow queries monthly
  - Optimize caching strategy
  - Database index optimization
Questions for Stakeholders¶
Before beginning load testing, please provide answers to the following:
Traffic Expectations¶
- Current Traffic
  - How many daily active users?
  - Peak concurrent users currently?
  - Average requests per second?
- Expected Growth
  - Projected user growth over 6 months?
  - Expected peak traffic (events, announcements)?
  - Largest expected event size?
- Business Goals
  - Target event capacity?
  - Expected checkout volume?
  - Geographic distribution of users?
Infrastructure Budget¶
- Current Costs
  - Railway plan and limits?
  - Current monthly infrastructure cost?
- Budget for Scaling
  - Budget for plan upgrades?
  - Acceptable cost per user?
  - Cost constraints for scaling?
- Performance vs. Cost
  - Willing to pay more for better performance?
  - Priority: cost optimization or performance?
Testing Environment¶
- Staging Environment
  - Staging environment available?
  - Production-like data in staging?
  - Can we test against production (carefully)?
- Test Timing
  - Best time for load testing?
  - Maintenance windows available?
  - User impact acceptable during tests?
Success Definition¶
- Performance Requirements
  - Required response time targets?
  - Acceptable error rates?
  - Minimum throughput needed?
- Business Requirements
  - Must handle X events simultaneously?
  - Must support Y concurrent checkouts?
  - Specific SLA requirements?
- User Experience
  - Acceptable page load time?
  - Payment processing time limits?
  - Email delivery time expectations?
Appendix¶
A. Useful Resources¶
Load Testing Tools:
- Locust Documentation
- k6 Documentation
- Artillery Documentation

Django Performance:
- Django Performance Optimization
- DRF Performance Tips
- PostgreSQL Query Optimization

Monitoring:
- Django Debug Toolbar
- Django Silk
- Flower - Celery Monitoring
B. Sample Test Data Generator¶
# apps/api/loadtests/utils/fixtures.py
from faker import Faker
import random
from datetime import datetime, timedelta
fake = Faker()
def generate_show_data():
"""Generate realistic show data for testing"""
return {
"title": fake.catch_phrase(),
"description": fake.text(max_nb_chars=500),
"start_time": (datetime.now() + timedelta(days=random.randint(1, 90))).isoformat(),
"published": True,
}
def generate_order_data(show_id):
"""Generate realistic order data"""
return {
"show_id": show_id,
"order_email": fake.email(),
"order_name": fake.name(),
"tickets": [
{
"ticket_id": random.randint(1, 5),
"quantity": random.randint(1, 4)
}
]
}
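For illustration, the checkout task in the locustfile could draw its payload from this generator instead of a hard-coded body; the import path assumes tests run from apps/api/:

from loadtests.utils.fixtures import generate_order_data

payload = generate_order_data(show_id=1)  # in practice, use an id from a browsed show
# self.client.post("/checkout/", json=payload, name="/checkout/ [session]")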
C. Performance Testing Checklist¶
Pre-Test:
- [ ] Backup production database
- [ ] Configure staging environment
- [ ] Install monitoring tools
- [ ] Set up alerting
- [ ] Document current performance
- [ ] Notify team of testing schedule

During Test:
- [ ] Monitor resource usage
- [ ] Watch error rates
- [ ] Track response times
- [ ] Check database connections
- [ ] Monitor Redis memory
- [ ] Observe Celery queues

Post-Test:
- [ ] Analyze results
- [ ] Generate reports
- [ ] Document findings
- [ ] Create optimization tickets
- [ ] Update capacity plan
- [ ] Share results with team
Document Metadata:
- Version: 1.0
- Last Updated: 2025-10-18
- Author: Claude Code
- Reviewers: TBD
- Next Review: After Week 1 Testing
End of Document