Load Testing Plan for PiqueTickets Django API¶
Document Version: 1.0
Created: 2025-10-18
Status: Draft
Table of Contents¶
- Executive Summary
- Current Architecture Analysis
- Critical API Endpoints
- Load Testing Strategy
- Implementation Plan
- Infrastructure Considerations
- Testing Tools & Commands
- Monitoring & Metrics
- Success Criteria
- Risk Assessment
- Deliverables
- Next Steps
Executive Summary¶
Current Architecture¶
Tech Stack:
- Django 5.1.13
- Django REST Framework 3.15.2
- PostgreSQL (persistent connections via CONN_MAX_AGE)
- Redis (caching and Celery broker)
- Celery 5.4.0 (async task processing)
- Gunicorn 23.0.0 (WSGI server)
- Docker Compose deployment

Server Configuration:
- Gunicorn: 1 worker, 2 threads (60s timeout)
- Celery: 4 workers, 100 max tasks per child
- Deployment: Railway (Docker containers)

Key Features:
- Event ticketing system
- Stripe payment integration
- PDF ticket generation
- Email tracking
- Producer portal
- Check-in system
Recommended Load Testing Tool¶
Primary Choice: Locust
Why Locust?
- Python-based scripting (a natural fit for Django developers)
- Distributed load testing capabilities
- Real-time web UI for monitoring
- Easy integration with existing test patterns
- Scales to millions of users

Alternative Tools:
- k6: lightweight, CI/CD friendly, JavaScript-based
- Artillery: modern, supports WebSockets/GraphQL, YAML configuration
Current Architecture Analysis¶
Django Configuration¶
File: apps/api/brktickets/settings.py
Key Settings:
# Database
DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql_psycopg2",
"OPTIONS": {
"connect_timeout": 10,
"options": "-c statement_timeout=30000", # 30s query timeout
},
"CONN_MAX_AGE": 60, # 60s connection pooling
"CONN_HEALTH_CHECKS": True,
}
}
# Cache (Redis)
CACHES = {
"default": {
"BACKEND": "django.core.cache.backends.redis.RedisCache",
"OPTIONS": {
"socket_timeout": 5,
"retry_on_timeout": True,
},
"TIMEOUT": 300,
}
}
# Celery
CELERY_BROKER_URL = "redis://localhost:6379/0"
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"
Gunicorn Configuration¶
File: apps/api/gunicorn_config.py
Current Settings:
workers = 1 # ⚠️ CRITICAL BOTTLENECK
worker_class = 'gthread'
threads = 2 # Max 2 concurrent requests
worker_connections = 100
timeout = 60
max_requests = 500
max_requests_jitter = 50
⚠️ Critical Concern:
- Only 1 worker with 2 threads = a maximum of 2 concurrent requests
- This configuration is optimized for memory on Railway
- It will be a significant bottleneck under load
- Recommended: at least 2-4 workers with 4 threads each (8-16 concurrent requests)
API Architecture¶
Main URL Patterns:
/ → Tickets app
/api/v1/ → REST API endpoints
/health/ → Health check
/auth/ → Authentication
/email-tracking/ → Email tracking
/admin/ → Django admin
REST API Endpoints (/api/v1/):
/tickets/ → TicketViewSet
/shows/ → ShowViewSet
/venues/ → VenueViewSet
/performers/ → PerformerViewSet
/featured-shows/ → FeaturedShowViewSet
/producers/ → ProducerViewSet
/search/ → SearchView
/active-cities/ → ActiveCitiesView
/subscriber/create/ → Subscriber creation
/checkin/ → Check-in endpoints
/portal/ → Producer portal
Tickets App Endpoints:
/checkout/ → CheckoutSessionView
/success/ → SuccessSessionView
/cancel/ → CancelSessionView
/validate-promo/ → ValidatePromoCodeView
Critical API Endpoints¶
High Priority (User-Facing, High Traffic)¶
| Endpoint | Method | Description | Expected Load |
|---|---|---|---|
| /api/v1/shows/ | GET | Show listings | Very High |
| /api/v1/shows/{id}/ | GET | Show details | High |
| /api/v1/featured-shows/ | GET | Featured shows | High |
| /checkout/ | POST | Create checkout session | Medium |
| /api/v1/active-cities/ | GET | Active cities lookup | Medium |
| /health/ | GET | Health check | Continuous |
Medium Priority (Producer Portal)¶
| Endpoint | Method | Description | Load |
|---|---|---|---|
| /api/v1/producers/ | GET | Producer listings | Medium |
| /api/v1/producers/stripe-connect-link/ | POST | Stripe Connect | Low |
| /api/v1/producers/stripe-balance/ | GET | Financial data | Low |
| /api/v1/producers/shows/{id}/checkin-link/ | POST | Check-in links | Low |
High Impact (Payment Processing)¶
| Endpoint | Method | Description | Critical |
|---|---|---|---|
| /success/ | POST | Payment success handler | Yes |
| /validate-promo/ | POST | Promo code validation | Medium |
Backend Services¶
| Service | Description | Load Impact |
|---|---|---|
| Celery Tasks | PDF generation, email sending | High |
| Stripe API | Payment processing | Medium |
| PostgreSQL | Data storage | High |
| Redis | Caching, task queue | High |
Load Testing Strategy¶
Phase 1: Baseline Performance (Week 1)¶
Goal: Establish performance baselines for read-heavy endpoints
Test Scenarios:
1.1 Show Listing Load Test¶
User Actions:
- GET /api/v1/shows/
- GET /api/v1/shows/?start_time__gt={date}
- GET /api/v1/shows/?slug={slug}
- GET /api/v1/featured-shows/
1.2 Read-Heavy Traffic Simulation¶
Traffic Distribution:
- 70% show browsing (/api/v1/shows/)
- 20% show details (/api/v1/shows/{id}/)
- 10% featured shows (/api/v1/featured-shows/)
Target Metrics:
- Throughput: 100-500 RPS
- Average response time: < 200ms
- 95th percentile: < 500ms
- 99th percentile: < 1000ms
- Error rate: < 0.1%
Phase 2: Checkout Flow Testing (Week 2)¶
Goal: Test payment processing and write operations under load
Test Scenarios:
2.1 Checkout Session Creation¶
User Flow:
1. Browse shows → GET /api/v1/shows/
2. View show details → GET /api/v1/shows/{id}/
3. Validate promo code (optional) → POST /validate-promo/
4. Create checkout session → POST /checkout/
5. Complete payment → POST /success/
2.2 Complete Purchase Flow¶
Target Metrics:
- Checkout completion rate: > 95%
- Checkout session creation: < 3s
- Payment success processing: < 2s
- No timeouts during Stripe API calls
- Database connection pool: < 80% utilization
- Redis cache hit rate: > 80%
Phase 3: Stress Testing (Week 3)¶
Goal: Find breaking points and system limits
Test Scenarios:
3.1 Spike Test¶
Objectives:
- Test auto-scaling behavior
- Identify failure modes
- Monitor error recovery
3.2 Soak Test¶
Monitor For:
- Memory leaks
- Connection pool exhaustion
- Gradual performance degradation
- Celery queue buildup
3.3 Capacity Test¶
Pattern: Gradual increase until failure
Start: 10 users
Increment: +10 users every 2 minutes
Stop: When error rate > 5%
Key Metrics:
- Maximum concurrent users before degradation
- Memory usage over time
- Database connection exhaustion point
- Redis connection limits
- Breaking point identification
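The ramp pattern above can be expressed as a Locust custom load shape. A minimal sketch (the 5% error-rate stop condition is not handled by the shape itself and would need a separate check or event listener):

from locust import LoadTestShape

class StepRampShape(LoadTestShape):
    """Start at 10 users and add 10 more every 2 minutes."""
    step_users = 10
    step_interval = 120  # seconds

    def tick(self):
        run_time = self.get_run_time()
        users = self.step_users * (int(run_time // self.step_interval) + 1)
        return (users, self.step_users)  # (target user count, spawn rate)

Placing this class in the locustfile alongside the user classes is enough; Locust picks up a LoadTestShape subclass automatically.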
Phase 4: Production-Like Testing (Week 4)¶
Goal: Simulate realistic traffic patterns
Test Scenarios:
4.1 Mixed Workload¶
Traffic Distribution:
- 60% read operations (shows, venues, performers)
- 25% authenticated operations (producer portal)
- 10% write operations (checkout, orders)
- 5% heavy operations (PDF generation, email sending)
4.2 Geographic Distribution¶
- Simulate users from multiple regions
- Test CDN and static file delivery
- Include realistic network latency

Success Metrics:
- Maintain < 200ms response time under realistic load
- Celery queue processing time < 1s
- Email delivery success rate > 99%
- PDF generation time < 2s
- No database deadlocks
- Redis eviction rate < 1%
Implementation Plan¶
Setup Phase (Days 1-3)¶
Install Locust¶
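Locust installs from PyPI (add it to the dev requirements as appropriate):

pip install locust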
Create Load Test Directory Structure¶
apps/api/loadtests/
├── __init__.py
├── locustfile.py # Main entry point
├── README.md # Load test documentation
├── tasks/
│ ├── __init__.py
│ ├── browsing.py # Show browsing tasks
│ ├── checkout.py # Checkout flow tasks
│ ├── producer.py # Producer portal tasks
│ └── search.py # Search and filter tasks
├── config/
│ ├── __init__.py
│ ├── baseline.py # Baseline test configuration
│ ├── stress.py # Stress test configuration
│ └── production.py # Production simulation config
├── utils/
│ ├── __init__.py
│ ├── helpers.py # Test data generators
│ └── fixtures.py # Test fixtures
└── reports/ # Test reports directory
└── .gitkeep
Sample Locustfile¶
File: apps/api/loadtests/locustfile.py
"""
Main Locust load testing file for PiqueTickets API
"""
import random

from locust import HttpUser, task, between, events
class ShowBrowsingUser(HttpUser):
"""Simulates users browsing shows"""
wait_time = between(1, 5) # Think time between requests
weight = 7 # 70% of traffic
@task(3)  # 3x more likely than other tasks
def browse_shows(self):
    """Browse all shows"""
    response = self.client.get(
        "/api/v1/shows/",
        name="/api/v1/shows/ [list]"
    )
    if response.status_code == 200:
        data = response.json()
        # Handle both plain lists and DRF-paginated {"results": [...]} payloads
        self.shows = data.get("results", data) if isinstance(data, dict) else data
@task(2)
def view_show_details(self):
"""View specific show details"""
if hasattr(self, 'shows') and self.shows:
show = random.choice(self.shows)
self.client.get(
f"/api/v1/shows/{show['id']}/",
name="/api/v1/shows/[id]/"
)
@task(1)
def view_featured_shows(self):
"""View featured shows"""
self.client.get("/api/v1/featured-shows/")
@task(1)
def filter_shows_by_date(self):
"""Filter shows by date"""
self.client.get(
"/api/v1/shows/?start_time__gt=2025-01-01T00:00:00Z",
name="/api/v1/shows/ [filtered]"
)
@task(1)
def view_active_cities(self):
"""View active cities"""
self.client.get("/api/v1/active-cities/")
class CheckoutUser(HttpUser):
"""Simulates users completing checkout"""
wait_time = between(5, 15)
weight = 2 # 20% of traffic
def on_start(self):
"""Setup: Get list of shows"""
response = self.client.get("/api/v1/shows/")
if response.status_code == 200:
    data = response.json()
    # Handle both plain lists and DRF-paginated {"results": [...]} payloads
    self.shows = data.get("results", data) if isinstance(data, dict) else data
@task
def complete_checkout_flow(self):
"""Complete full checkout flow"""
if not hasattr(self, 'shows') or not self.shows:
return
# 1. Browse shows
self.client.get("/api/v1/shows/")
# 2. View show details
show = random.choice(self.shows)
self.client.get(f"/api/v1/shows/{show['id']}/")
# 3. Validate promo code (30% of users)
if random.random() < 0.3:
self.client.post(
"/validate-promo/",
json={"code": "TESTPROMO"}
)
# 4. Create checkout session
# Note: This will fail without valid data, adjust as needed
self.client.post(
"/checkout/",
json={
"show_id": show['id'],
"tickets": [{"ticket_id": 1, "quantity": 2}],
"order_email": "test@example.com",
"order_name": "Test User"
},
name="/checkout/ [session]"
)
class ProducerUser(HttpUser):
"""Simulates producer portal users"""
wait_time = between(3, 10)
weight = 1 # 10% of traffic
def on_start(self):
"""Login as producer"""
# Implement authentication here
pass
@task(3)
def view_producers(self):
"""View producer list"""
self.client.get("/api/v1/producers/")
@task(1)
def check_stripe_status(self):
"""Check Stripe onboarding status"""
# Requires authentication
pass
@events.test_start.add_listener
def on_test_start(environment, **kwargs):
"""Called when test starts"""
print("Load test starting...")
@events.test_stop.add_listener
def on_test_stop(environment, **kwargs):
"""Called when test stops"""
print("Load test completed!")
Week 1: Baseline Tests¶
Tasks:
- [ ] Set up Locust environment
- [ ] Create baseline test scenarios
- [ ] Configure staging environment
- [ ] Run initial baseline test (10-100 users, 30 min)
- [ ] Document baseline metrics
- [ ] Create performance dashboard
Commands:
# Web UI mode (recommended for Week 1)
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080
# Access at: http://localhost:8089
# Headless mode with report
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080 \
--users=100 \
--spawn-rate=10 \
--run-time=30m \
--headless \
--html=apps/api/loadtests/reports/baseline_report.html \
--csv=apps/api/loadtests/reports/baseline
Expected Output:
Baseline Performance Report
---------------------------
Total Requests: ~180,000
Average RPS: ~100
Median Response Time: <200ms
95th Percentile: <500ms
Error Rate: <0.1%
Week 2: Checkout Flow¶
Tasks:
- [ ] Implement checkout user simulation
- [ ] Configure Stripe test mode
- [ ] Test promo code validation
- [ ] Monitor database query performance
- [ ] Analyze Celery task queue
- [ ] Document checkout performance
Key Monitoring:
# Monitor PostgreSQL connections
docker compose exec db psql -U user -d piquetickets -c \
"SELECT count(*) FROM pg_stat_activity;"
# Monitor Redis memory
docker compose exec redis redis-cli INFO memory
# Monitor Celery via Flower
# Open: http://localhost:5555
# Django query debugging
# In Django shell during test:
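# (note: connection.queries is only populated when DEBUG=True)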
from django.db import connection
print(len(connection.queries))
Week 3: Stress Testing¶
Tasks:
- [ ] Run spike tests (0→500 users in 1 min)
- [ ] Execute soak tests (100 users for 2-4 hours)
- [ ] Identify bottlenecks
- [ ] Document breaking points
- [ ] Create optimization recommendations
Spike Test Command:
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080 \
--users=500 \
--spawn-rate=500 \
--run-time=10m \
--headless \
--html=apps/api/loadtests/reports/spike_test_report.html
Soak Test Command:
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080 \
--users=100 \
--spawn-rate=10 \
--run-time=4h \
--headless \
--html=apps/api/loadtests/reports/soak_test_report.html
Week 4: Production Simulation¶
Tasks:
- [ ] Run mixed workload tests
- [ ] Test with production-like data volumes
- [ ] Validate auto-scaling (if applicable)
- [ ] Generate final performance report
- [ ] Create optimization implementation plan
Mixed Workload Command:
# Use all user classes with proper weights
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080 \
--users=200 \
--spawn-rate=20 \
--run-time=2h \
--headless \
--html=apps/api/loadtests/reports/production_simulation.html
Infrastructure Considerations¶
Gunicorn Configuration Issues¶
Current Configuration (apps/api/gunicorn_config.py):
Problem:
- Maximum of 2 concurrent requests
- Saturates at ~20-30 RPS
- Optimized for Railway memory constraints
Recommendations:
Option 1: Increase Workers (Recommended)¶
workers = 2 # Increase to 2
worker_class = 'gthread'
threads = 4 # Increase to 4
# Total concurrent: 2 * 4 = 8 requests
Option 2: Use Formula-Based Workers¶
import multiprocessing
workers = min(multiprocessing.cpu_count() * 2 + 1, 4) # Max 4 workers
worker_class = 'gthread'
threads = 4
# Total concurrent: up to 4 * 4 = 16 requests
Option 3: Use Gevent (Async)¶
workers = 2
worker_class = 'gevent'
worker_connections = 1000 # Many concurrent connections
# Install: pip install gevent
Memory Impact:
- Current: ~200-250MB total
- With 2 workers + 4 threads: ~300-400MB total
- Monitor Railway memory limits
Database Connection Pooling¶
Current Configuration: see the DATABASES settings above (CONN_MAX_AGE = 60, connection health checks enabled).

Recommendations:

1. Monitor connection pool usage during load tests (see the sketch below).
2. Consider PgBouncer for true connection pooling in front of PostgreSQL.
3. Adjust CONN_MAX_AGE based on observed connection counts.
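A minimal sketch for spot-checking connection usage from a Django shell during a test; it uses only the app's own database connection, no extra dependencies:

from django.db import connection

def connection_usage():
    """Return (open connections, server max, percent used)."""
    with connection.cursor() as cur:
        cur.execute("SELECT count(*) FROM pg_stat_activity;")
        active = cur.fetchone()[0]
        cur.execute("SHOW max_connections;")
        max_conns = int(cur.fetchone()[0])
    return active, max_conns, 100 * active / max_conns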
Redis Optimization¶
Current Configuration:
CACHES = {
"default": {
"BACKEND": "django.core.cache.backends.redis.RedisCache",
"TIMEOUT": 300,
}
}
Recommendations:

1. Monitor memory usage (redis-cli INFO memory).
2. Configure a maxmemory limit and eviction policy.
3. Monitor the cache hit rate (see the sketch below).
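A minimal sketch using redis-py (the client library Django's built-in Redis cache backend requires); the host, port, and 256mb cap are illustrative:

import redis

r = redis.Redis(host="localhost", port=6379)

# Cap memory and evict least-recently-used keys once the cap is hit
r.config_set("maxmemory", "256mb")
r.config_set("maxmemory-policy", "allkeys-lru")

# Compute the cache hit rate from server stats
stats = r.info("stats")
hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
total = hits + misses
print(f"Hit rate: {hits / total:.1%}" if total else "No cache traffic yet")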
Celery Worker Optimization¶
Current Configuration:
celery -A brktickets worker \
-l INFO \
-E \
--max-tasks-per-child=100 \
--concurrency=4 \
--prefetch-multiplier=4
Recommendations:

1. Monitor queue lengths:
   - Use Flower: http://localhost:5555
   - Check task success/failure rates
2. Adjust --concurrency for load (e.g., raise it during peak periods).
3. Separate queues for heavy and light tasks (see the sketch below).
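A hedged sketch of queue separation; the task module paths under tickets.tasks are illustrative, not confirmed names from the codebase:

# settings.py
CELERY_TASK_ROUTES = {
    "tickets.tasks.generate_ticket_pdf": {"queue": "pdf"},
    "tickets.tasks.send_order_email": {"queue": "email"},
}

# Then run a dedicated worker per queue:
#   celery -A brktickets worker -Q pdf -l INFO --concurrency=2
#   celery -A brktickets worker -Q email -l INFO --concurrency=4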
Testing Tools & Commands¶
Locust Usage¶
Web UI Mode (Recommended for Development)¶
# Start Locust web UI
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080
# Access at: http://localhost:8089
# Configure users and spawn rate in UI
Headless Mode (CI/CD)¶
# Basic headless test
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080 \
--users=100 \
--spawn-rate=10 \
--run-time=10m \
--headless
# With HTML report
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080 \
--users=100 \
--spawn-rate=10 \
--run-time=10m \
--headless \
--html=report.html
# With CSV output
locust -f apps/api/loadtests/locustfile.py \
--host=http://localhost:8080 \
--users=100 \
--spawn-rate=10 \
--run-time=10m \
--headless \
--csv=results/test
Distributed Mode¶
# Master node
locust -f apps/api/loadtests/locustfile.py \
--master \
--expect-workers=4
# Worker nodes (run on different machines)
locust -f apps/api/loadtests/locustfile.py \
--worker \
--master-host=<master-ip>
Alternative Tool: k6¶
If you prefer k6 (JavaScript-based):
Sample k6 Test:
// k6-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
export let options = {
stages: [
{ duration: '2m', target: 100 }, // Ramp up
{ duration: '5m', target: 100 }, // Stay at 100 users
{ duration: '2m', target: 200 }, // Spike
{ duration: '3m', target: 200 }, // Stay at 200
{ duration: '2m', target: 0 }, // Ramp down
],
thresholds: {
http_req_duration: ['p(95)<500'], // 95% of requests < 500ms
http_req_failed: ['rate<0.01'], // Error rate < 1%
},
};
export default function () {
// Show listing
let res = http.get('http://localhost:8080/api/v1/shows/');
check(res, {
'status is 200': (r) => r.status === 200,
'response time < 500ms': (r) => r.timings.duration < 500,
});
sleep(1);
// Featured shows
res = http.get('http://localhost:8080/api/v1/featured-shows/');
check(res, { 'status is 200': (r) => r.status === 200 });
sleep(2);
}
Run k6 Test:
k6 run k6-test.js
# With output
k6 run --out json=results.json k6-test.js
# With InfluxDB (for Grafana visualization)
k6 run --out influxdb=http://localhost:8086/k6 k6-test.js
Alternative Tool: Artillery¶
Installation:
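Artillery is distributed via npm (assumes Node.js is installed):

npm install -g artillery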
Sample Artillery Config:
# artillery-config.yml
config:
target: "http://localhost:8080"
phases:
- duration: 60
arrivalRate: 10
name: "Warm up"
- duration: 120
arrivalRate: 50
name: "Sustained load"
processor: "./processor.js"
scenarios:
- name: "Show browsing"
weight: 70
flow:
- get:
url: "/api/v1/shows/"
- think: 3
- get:
url: "/api/v1/featured-shows/"
- name: "Checkout flow"
weight: 30
flow:
- get:
url: "/api/v1/shows/"
- think: 5
- post:
url: "/checkout/"
json:
show_id: 1
tickets: [{ ticket_id: 1, quantity: 2 }]
Run Artillery:
artillery run artillery-config.yml
# With report
artillery run --output report.json artillery-config.yml
artillery report report.json
Monitoring & Metrics¶
Application-Level Monitoring¶
Django Debug Toolbar (Development Only)¶
# settings.py
if DEBUG:
    INSTALLED_APPS += ['debug_toolbar']
    MIDDLEWARE += ['debug_toolbar.middleware.DebugToolbarMiddleware']
    INTERNAL_IPS = ['127.0.0.1']

# urls.py (the toolbar also needs its URLs mounted)
if settings.DEBUG:
    urlpatterns += [path('__debug__/', include('debug_toolbar.urls'))]
Django Silk (Performance Profiling)¶
# settings.py
INSTALLED_APPS += ['silk']
MIDDLEWARE += ['silk.middleware.SilkyMiddleware']
# urls.py
urlpatterns += [path('silk/', include('silk.urls', namespace='silk'))]
Access at: http://localhost:8080/silk/
Database Monitoring¶
PostgreSQL Query Performance¶
-- Slow queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;
-- Connection count
SELECT count(*) as connections FROM pg_stat_activity;
-- Active connections by state
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;
-- Database size
SELECT pg_size_pretty(pg_database_size('piquetickets'));
-- Table sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
-- Index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
Redis Monitoring¶
# Memory usage
redis-cli INFO memory
# Stats
redis-cli INFO stats
# Connected clients
redis-cli CLIENT LIST
# Cache hit/miss rate
redis-cli INFO stats | grep -E 'keyspace_hits|keyspace_misses'
# Calculate hit rate (INFO output is colon-delimited, hence -F:)
redis-cli INFO stats | awk -F: '
/keyspace_hits/ { hits = $2 }
/keyspace_misses/ { misses = $2 }
END {
total = hits + misses
if (total > 0) {
hit_rate = (hits / total) * 100
printf "Hit Rate: %.2f%%\n", hit_rate
}
}'
# Monitor in real-time
redis-cli --stat
# Monitor commands
redis-cli MONITOR
Celery Monitoring¶
Using Flower (Already Configured)¶
# Access Flower at: http://localhost:5555
# View tasks
curl http://localhost:5555/api/tasks
# View workers
curl http://localhost:5555/api/workers
Command Line Monitoring¶
# Worker status
celery -A brktickets inspect active
# Scheduled tasks
celery -A brktickets inspect scheduled
# Worker stats (broker queue length isn't exposed here; check Redis with: redis-cli LLEN celery)
celery -A brktickets inspect stats
# Registered tasks
celery -A brktickets inspect registered
System Resource Monitoring¶
Docker Stats¶
# Real-time stats
docker stats
# Specific container
docker stats piquetickets-api-1
# All containers with formatting
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"
System Metrics¶
# CPU usage
top -b -n 1 | head -20
# Memory usage
free -h
# Disk usage
df -h
# Network connections
netstat -an | grep :8080 | wc -l
Monitoring During Load Tests¶
Create Monitoring Script:
#!/bin/bash
# monitor.sh
echo "=== Load Test Monitoring ==="
echo "Timestamp: $(date)"
echo ""
echo "=== Docker Stats ==="
docker stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"
echo ""
echo "=== PostgreSQL Connections ==="
docker compose exec -T db psql -U user -d piquetickets -c \
"SELECT count(*) as connections, state FROM pg_stat_activity GROUP BY state;"
echo ""
echo "=== Redis Memory ==="
docker compose exec -T redis redis-cli INFO memory | grep used_memory_human
echo ""
echo "=== Celery Queue ==="
docker compose exec -T api celery -A brktickets inspect stats | grep -A 5 "total"
echo ""
Run During Tests:
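One way to run it during a test window, assuming the standard watch utility:

chmod +x monitor.sh
watch -n 10 ./monitor.sh   # refresh the snapshot every 10 seconds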
Success Criteria¶
Performance Targets¶
| Metric | Target | Acceptable | Threshold |
|---|---|---|---|
| Average Response Time (GET) | < 200ms | < 300ms | < 500ms |
| Average Response Time (POST) | < 500ms | < 750ms | < 1s |
| 95th Percentile Response Time | < 500ms | < 750ms | < 1s |
| 99th Percentile Response Time | < 1s | < 1.5s | < 2s |
| Error Rate | < 0.1% | < 0.5% | < 1% |
| Concurrent Users | 100+ | 75+ | 50+ |
| Throughput (RPS) | 500+ | 300+ | 200+ |
| Database Query Time | < 50ms | < 100ms | < 200ms |
| Cache Hit Rate | > 80% | > 70% | > 60% |
Scalability Targets¶
| Scenario | Target | Acceptable | Notes |
|---|---|---|---|
| Normal Load | 100 concurrent users | 75 users | Daily operations |
| Peak Load | 500 concurrent users | 300 users | Event announcements |
| Spike Handling | 0→500 in 1 min | 0→300 in 1 min | Viral scenarios |
| Sustained Load | 100 users for 2h | 75 users for 2h | No degradation |
| Checkout Flow | 20 checkouts/min | 15 checkouts/min | Payment processing |
Infrastructure Targets¶
| Component | Metric | Target | Threshold |
|---|---|---|---|
| Gunicorn Workers | CPU Usage | < 70% | < 90% |
| PostgreSQL | Connection Usage | < 60% | < 80% |
| Redis | Memory Usage | < 70% | < 90% |
| Celery Workers | Queue Length | < 10 tasks | < 50 tasks |
| Docker Container | Memory Usage | < 70% | < 90% |
Risk Assessment¶
High Risks¶
| Risk | Probability | Impact | Mitigation Strategy |
|---|---|---|---|
| Limited Gunicorn workers (1 worker, 2 threads) | High | Critical | Increase to 2-4 workers with 4 threads each |
| Database connection exhaustion | Medium | High | Implement PgBouncer, increase CONN_MAX_AGE |
| Stripe API rate limits | Medium | High | Implement request queuing, use webhooks |
| PDF generation blocking requests | Medium | Medium | Ensure all PDF generation via Celery tasks |
| Railway memory limits | Medium | Medium | Monitor usage, upgrade plan if needed |
| Redis memory exhaustion | Low | Medium | Configure maxmemory and eviction policy |
Infrastructure Bottlenecks¶
1. Gunicorn Configuration¶
Issue: Single worker configuration limits concurrent requests to 2
Impact:
- Saturates at ~20-30 RPS
- High response times under load
- Request queuing and timeouts
Solution:
# Recommended configuration
workers = 2
worker_class = 'gthread'
threads = 4
# Total concurrent: 8 requests
Memory Impact: +100-150MB
2. PostgreSQL Connection Pool¶
Issue: Limited connections with multiple workers
Impact:
- Connection exhaustion errors
- Slow query performance
- Database bottleneck
Solution:
- Implement PgBouncer (see the sketch below)
- Increase CONN_MAX_AGE
- Monitor connection usage
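A hedged sketch of pointing Django at PgBouncer; the pgbouncer host name and port 6432 are conventional assumptions, not values from the repo:

# settings.py
DATABASES["default"].update({
    "HOST": "pgbouncer",                   # assumed docker-compose service name
    "PORT": "6432",                        # PgBouncer's conventional port
    "CONN_MAX_AGE": 0,                     # let PgBouncer own the pooling
    "DISABLE_SERVER_SIDE_CURSORS": True,   # required with transaction pooling
})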
3. Railway Memory Constraints¶
Issue: Current configuration optimized for memory
Impact:
- Limited worker processes
- Reduced throughput capacity

Solution:
- Monitor memory usage during tests
- Consider upgrading the Railway plan
- Optimize application memory usage
4. Celery Worker Capacity¶
Issue: 4 workers may bottleneck with heavy PDF generation
Impact:
- Queue buildup during peaks
- Delayed email/PDF delivery

Solution:
- Increase workers to 8 during peak periods
- Separate PDF and email queues
- Monitor queue lengths
Technical Debt Risks¶
| Risk | Impact | Mitigation |
|---|---|---|
| No application performance monitoring | High | Implement Django Silk or similar |
| No query optimization | Medium | Analyze slow queries, add indexes |
| No cache warming strategy | Medium | Implement cache preloading |
| No rate limiting on API | High | Implement django-ratelimit |
| No CDN for static files | Medium | Configure CloudFlare or similar |
Deliverables¶
Week 1: Baseline Performance¶
- ✅ Locust installation and setup
- ✅ Basic load test scenarios
- ✅ Baseline performance report including:
- Average response times by endpoint
- Throughput metrics (RPS)
- Error rates
- Resource utilization (CPU, memory, connections)
- ✅ Initial bottleneck identification
- ✅ Recommendations for Week 2
Week 2: Checkout Flow¶
- ✅ Checkout flow test implementation
- ✅ Stripe integration performance data
- ✅ Database query performance analysis
- ✅ Celery task processing metrics
- ✅ Database optimization recommendations
- ✅ Cache effectiveness analysis
Week 3: Stress Testing¶
- ✅ Spike test results and analysis
- ✅ Soak test results (2-4 hours)
- ✅ Capacity test findings
- ✅ Breaking point documentation
- ✅ Infrastructure scaling recommendations
- ✅ Performance degradation analysis
- ✅ Recovery behavior documentation
Week 4: Production Simulation¶
- ✅ Mixed workload test results
- ✅ Production traffic pattern analysis
- ✅ Final comprehensive performance report
- ✅ Infrastructure upgrade recommendations
- ✅ Cost-benefit analysis for scaling
- ✅ Load testing playbook for future use
- ✅ Monitoring dashboard setup
- ✅ Runbook for production deployment
Final Deliverables¶
- Load Testing Playbook - Complete guide for running future tests
- Performance Baseline Document - Reference metrics for comparison
- Optimization Recommendations - Prioritized list of improvements
- Monitoring Setup - Dashboards and alerts configuration
- Capacity Planning Report - Growth projections and scaling plan
Next Steps¶
Immediate Actions (This Week)¶
1. Install Locust
2. Create load test directory
3. Set up staging environment
   - Create staging branch
   - Deploy to Railway staging environment
   - Configure test database with production-like data
4. Implement basic load test
   - Create show browsing scenario
   - Test against local environment
   - Verify metrics collection
Short Term (This Month)¶
- Week 1: Baseline Testing
  - Run baseline performance tests
  - Document current performance
  - Identify immediate bottlenecks
- Week 2: Checkout Flow
  - Implement checkout flow tests
  - Test Stripe integration
  - Analyze database queries
- Week 3: Stress Testing
  - Run spike, soak, and capacity tests
  - Find breaking points
  - Create optimization plan
- Week 4: Production Simulation
  - Run realistic traffic patterns
  - Generate final report
  - Implement critical fixes
Long Term (Ongoing)¶
- Integrate with CI/CD
  - Run load tests on every deployment
  - Set performance budgets
  - Automated regression testing
- Quarterly Capacity Tests
  - Reassess capacity every 3 months
  - Update baseline metrics
  - Plan for growth
- Production Monitoring
  - Set up APM (Application Performance Monitoring)
  - Configure alerts for performance degradation
  - Track key metrics over time
- Continuous Optimization
  - Review slow queries monthly
  - Optimize caching strategy
  - Database index optimization
Questions for Stakeholders¶
Before beginning load testing, please provide answers to the following:
Traffic Expectations¶
- Current Traffic
  - How many daily active users?
  - Peak concurrent users currently?
  - Average requests per second?
- Expected Growth
  - Projected user growth over 6 months?
  - Expected peak traffic (events, announcements)?
  - Largest expected event size?
- Business Goals
  - Target event capacity?
  - Expected checkout volume?
  - Geographic distribution of users?
Infrastructure Budget¶
- Current Costs
  - Railway plan and limits?
  - Current monthly infrastructure cost?
- Budget for Scaling
  - Budget for plan upgrades?
  - Acceptable cost per user?
  - Cost constraints for scaling?
- Performance vs. Cost
  - Willing to pay more for better performance?
  - Priority: cost optimization or performance?
Testing Environment¶
- Staging Environment
  - Staging environment available?
  - Production-like data in staging?
  - Can we test against production (carefully)?
- Test Timing
  - Best time for load testing?
  - Maintenance windows available?
  - User impact acceptable during tests?
Success Definition¶
- Performance Requirements
  - Required response time targets?
  - Acceptable error rates?
  - Minimum throughput needed?
- Business Requirements
  - Must handle X events simultaneously?
  - Must support Y concurrent checkouts?
  - Specific SLA requirements?
- User Experience
  - Acceptable page load time?
  - Payment processing time limits?
  - Email delivery time expectations?
Appendix¶
A. Useful Resources¶
Load Testing Tools:
- Locust Documentation
- k6 Documentation
- Artillery Documentation

Django Performance:
- Django Performance Optimization
- DRF Performance Tips
- PostgreSQL Query Optimization

Monitoring:
- Django Debug Toolbar
- Django Silk
- Flower - Celery Monitoring
B. Sample Test Data Generator¶
# apps/api/loadtests/utils/fixtures.py
from faker import Faker
import random
from datetime import datetime, timedelta
fake = Faker()
def generate_show_data():
"""Generate realistic show data for testing"""
return {
"title": fake.catch_phrase(),
"description": fake.text(max_nb_chars=500),
"start_time": (datetime.now() + timedelta(days=random.randint(1, 90))).isoformat(),
"published": True,
}
def generate_order_data(show_id):
"""Generate realistic order data"""
return {
"show_id": show_id,
"order_email": fake.email(),
"order_name": fake.name(),
"tickets": [
{
"ticket_id": random.randint(1, 5),
"quantity": random.randint(1, 4)
}
]
}
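For illustration, the checkout task in the locustfile could draw its payload from this generator instead of a hard-coded body; the import path assumes tests run from apps/api/:

from loadtests.utils.fixtures import generate_order_data

payload = generate_order_data(show_id=1)  # in practice, use an id from a browsed show
# self.client.post("/checkout/", json=payload, name="/checkout/ [session]")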
C. Performance Testing Checklist¶
Pre-Test:
- [ ] Backup production database
- [ ] Configure staging environment
- [ ] Install monitoring tools
- [ ] Set up alerting
- [ ] Document current performance
- [ ] Notify team of testing schedule

During Test:
- [ ] Monitor resource usage
- [ ] Watch error rates
- [ ] Track response times
- [ ] Check database connections
- [ ] Monitor Redis memory
- [ ] Observe Celery queues

Post-Test:
- [ ] Analyze results
- [ ] Generate reports
- [ ] Document findings
- [ ] Create optimization tickets
- [ ] Update capacity plan
- [ ] Share results with team
Document Metadata:
- Version: 1.0
- Last Updated: 2025-10-18
- Author: Claude Code
- Reviewers: TBD
- Next Review: After Week 1 Testing
End of Document