API Test Pipeline Performance Optimization
Status: Draft - Ready for Review
Created: 2025-11-06
Author: Claude Code (AI Assistant)
Version: 1.0
Executive Summary
This document outlines a plan to cut API test pipeline run time by 50-70% through parallel test execution, database optimization, and configuration improvements. The solution uses Django's built-in capabilities, requires minimal code changes, and can be implemented in phases with clear rollback options.
Quick Wins (Phase 1 - 2-4 hours):
- Enable parallel test execution with --parallel=auto
- Preserve the test database with --keepdb
- Optimize password hashing for tests
- Configure coverage for parallel mode
Expected Impact: 50-70% shorter test runs (roughly a 2-3x speedup) with minimal effort
1. Problem Statement
Current State
The API test pipeline currently runs very slowly. Contributing factors:
- 54 test files containing comprehensive test coverage
- Sequential execution that does not use the available CPU cores
- Database recreation on each test run (30-60s overhead)
- Heavy use of TransactionTestCase for concurrency testing
- No parallelization in the CI/CD pipeline
Pain Points
- Long CI/CD pipeline execution times (15-30+ minutes) blocking merges
- Developer productivity impact due to slow feedback loops
- Wasted computational resources with sequential execution
- Database setup overhead repeated unnecessarily
Impact
- Developers: Extended waiting for test feedback
- Team Velocity: Slowed deployment cadence
- CI/CD Resources: Inefficient runner utilization
- Development Experience: Frustration with testing workflow
2. Solution Overview
Architecture
┌─────────────────────────────────────────────────────────────┐
│ GitHub Actions Workflow │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ 1. Setup: PostgreSQL + Redis Services │ │
│ │ 2. Install dependencies │ │
│ │ 3. Run: manage.py test --parallel=auto --keepdb │ │
│ │ 4. Coverage: coverage run --parallel-mode │ │
│ │ 5. Coverage: coverage combine │ │
│ │ 6. Coverage: coverage report/html/xml │ │
│ └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
↓ ↓ ↓ ↓
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│Worker 1│ │Worker 2│ │Worker 3│ │Worker 4│
│test_db1│ │test_db2│ │test_db3│ │test_db4│
└────────┘ └────────┘ └────────┘ └────────┘
↓ ↓ ↓ ↓
┌──────────────────────────────────────────┐
│ PostgreSQL Server (Parallel DBs) │
└──────────────────────────────────────────┘
┌──────────────────────────────────────────┐
│ Redis Server (Shared Cache/Locks) │
└──────────────────────────────────────────┘
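The per-worker databases in the diagram are schematic. As a rough sketch of how the names typically come out: Django's parallel runner clones the test database once per worker and appends the worker number to the test database name, which by default is the configured name with a test_ prefix. The helper below is hypothetical (the function name and the example database name app_db are assumptions, not taken from this project), and a custom TEST NAME in DATABASES would change the result.

```python
from typing import List, Optional


def clone_database_names(
    db_name: str, workers: int, test_name: Optional[str] = None
) -> List[str]:
    """Return the per-worker test database names for a given worker count.

    db_name   -- the NAME configured in DATABASES (hypothetical example value)
    test_name -- an explicit TEST NAME override, if the project sets one
    """
    # Default test database name: "test_" + configured name.
    base = test_name or f"test_{db_name}"
    # Each worker gets a clone named "<test database name>_<worker number>".
    return [f"{base}_{n}" for n in range(1, workers + 1)]


names = clone_database_names("app_db", workers=4)
print(names)
```

Knowing the real names matters mostly for cleanup: they are the databases to drop if a clone is left behind after an interrupted run.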
Key Optimizations
- Parallel Test Execution
  - Use Django's --parallel flag to run tests concurrently
  - Automatic worker count detection with --parallel=auto
  - Expected 2-4x speedup on GitHub Actions runners
- Database Preservation
  - Use the --keepdb flag to reuse the test database
  - Reduces setup time from 30-60s to <5s
  - Safe with proper migration handling
  - Benefit applies where the database persists between runs (e.g. local development); a fresh CI service container has no database to reuse
- Password Hashing Optimization
  - Switch to MD5 hasher for test user creation
  - 10-100x faster authentication in tests
  - Test-only configuration
- Coverage Parallel Mode
  - Enable parallel coverage tracking
  - Combine results after test execution
  - Maintain accuracy with multiple workers
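3. Implementation Plan
Phase 1: Foundation (Quick Wins - 2-4 hours)
Deliverable: 50-70% reduction in test time with minimal code changes
Task 1: Update Coverage Configuration
File: apps/api/.coveragerc
Change: In the [run] section, add parallel = True and concurrency = multiprocessing (complete file in Appendix B)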
Time: 15 minutes
Task 2: Update GitHub Actions Workflow
File: .github/workflows/django-test.yml (lines 64-109)
- name: Run Django tests with coverage
  env:
    PGDATABASE: test_db
    PGUSER: user
    PGPASSWORD: password
    PGHOST: localhost
    PGPORT: 5432
    REDIS_URL: redis://localhost:6379
    DEBUG: 'True'
    DJ_SECRET_KEY: ${{ secrets.DJ_SECRET_KEY }}
    S3_ENABLED: 'False'
    STRIPE_PUBLISHABLE_KEY: ${{ secrets.STRIPE_PUBLISHABLE_KEY }}
    STRIPE_SECRET_KEY: ${{ secrets.STRIPE_SECRET_KEY }}
    FRONTEND_URL: 'http://localhost:3000'
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
  run: |
    cd apps/api
    python manage.py migrate
    coverage run --parallel-mode manage.py test --parallel=auto --keepdb
    coverage combine
    coverage report
    coverage xml
    coverage html
Changes:
- Line 82: Change from coverage run manage.py test to coverage run --parallel-mode manage.py test --parallel=auto --keepdb
- After line 82: Add coverage combine step
Time: 30 minutes
Task 3: Add Test Password Hasher
File: apps/api/brktickets/settings.py (around line 350)
# Test configuration optimizations
import sys  # skip if settings.py already imports sys

if 'test' in sys.argv or 'pytest' in sys.modules:
    # Use faster password hasher for tests (10-100x speedup)
    PASSWORD_HASHERS = [
        'django.contrib.auth.hashers.MD5PasswordHasher',
    ]
    # Database settings already optimized at line 351-352
Time: 15 minutes
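To illustrate why swapping the hasher helps, the sketch below times a deliberately slow key-derivation hash against a single MD5 digest using only the standard library. It does not call Django's hashers; the iteration count is an illustrative assumption (Django's default PBKDF2 work factor varies by release), so the measured ratio will differ from what the test suite actually sees.

```python
import hashlib
import os
import time

PASSWORD = b"correct horse battery staple"
SALT = os.urandom(16)
ITERATIONS = 100_000  # illustrative work factor, not Django's actual default


def pbkdf2_hash() -> bytes:
    # Slow by design: many rounds of HMAC-SHA256.
    return hashlib.pbkdf2_hmac("sha256", PASSWORD, SALT, ITERATIONS)


def md5_hash() -> bytes:
    # One cheap digest; acceptable only for throwaway test users.
    return hashlib.md5(SALT + PASSWORD).digest()


def average_seconds(fn, repeats: int = 5) -> float:
    """Average wall-clock time of fn over a few calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


slow_seconds = average_seconds(pbkdf2_hash)
fast_seconds = average_seconds(md5_hash)
print(f"PBKDF2: {slow_seconds * 1000:.2f} ms per hash")
print(f"MD5:    {fast_seconds * 1000:.4f} ms per hash")
```

The cost is paid every time a test creates a user or logs one in, which is why the saving grows with the number of fixtures that build users.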
Task 4: Test Locally
cd apps/api
# Set environment variables
export PGDATABASE=test_db
export PGUSER=user
export PGPASSWORD=password
export PGHOST=localhost
export PGPORT=5432
export DEBUG=True
export DJ_SECRET_KEY=test-secret-key
export S3_ENABLED=False
export FRONTEND_URL=http://localhost:3000
# Run with parallel execution
coverage run --parallel-mode manage.py test --parallel=4 --keepdb
# Combine coverage
coverage combine
# View results
coverage report
Time: 1 hour (including fixing any issues)
Phase 2: Validation & Tuning (4-6 hours)
Deliverable: Stable parallel execution with documented performance gains
Tasks:
- Run full test suite in CI multiple times (3+ runs)
- Identify and fix any flaky tests from parallelization
- Benchmark execution time before/after
- Document slowest tests for future optimization
- Create troubleshooting guide
Phase 3: Advanced Optimization (Optional - 1-2 weeks)
Deliverable: Further optimized fixtures and test organization
Potential Improvements:
- Migrate conftest.py fixtures to factory_boy pattern
- Categorize tests by speed (fast/medium/slow)
- Implement selective test execution
- Consider pytest migration for better tooling
4. Files to Modify
Modified Files
| File | Changes | Impact |
|---|---|---|
| apps/api/.coveragerc | Add parallel = True and concurrency = multiprocessing | Coverage tracking for parallel tests |
| .github/workflows/django-test.yml | Update test command with flags, add combine step | Enable parallel CI execution |
| apps/api/brktickets/settings.py | Add test-specific PASSWORD_HASHERS | 10-100x faster user creation |
No New Files Required
All changes are configuration updates to existing files.
5. Testing Strategy
Validation Tests
Parallel Execution Test
# Run subset of tests in parallel
python manage.py test tickets.tests.test_checkout_core --parallel=4
# Verify no failures
# Check for proper isolation
Coverage Accuracy Test
# Run with parallel coverage
coverage run --parallel-mode manage.py test --parallel=4
# Combine results
coverage combine
# Compare to sequential coverage (should be same ±1%)
coverage report
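The ±1% comparison can be automated. The sketch below assumes each run's totals were exported with coverage json (which writes a totals.percent_covered field) into two files; the file names in the usage comment and the function names are hypothetical.

```python
import json
from pathlib import Path


def load_report(path: str) -> dict:
    """Read a report produced by `coverage json -o <path>`."""
    return json.loads(Path(path).read_text())


def total_percent(report: dict) -> float:
    # coverage.py's JSON report stores the overall figure under "totals".
    return float(report["totals"]["percent_covered"])


def coverage_matches(sequential: dict, parallel: dict, tolerance: float = 1.0) -> bool:
    """True when the two totals differ by at most `tolerance` percentage points."""
    return abs(total_percent(sequential) - total_percent(parallel)) <= tolerance


# Usage (hypothetical file names):
#   ok = coverage_matches(load_report("sequential.json"), load_report("parallel.json"))
```

A larger gap usually means some worker data files were not combined, so checking for leftover .coverage.* files is a reasonable first step.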
Flaky Test Detection
# Run full suite 3 times
for i in {1..3}; do
  echo "Run $i"
  python manage.py test --parallel=auto --keepdb
done
# Monitor for inconsistent failures
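Watching the loop output by hand gets tedious, so the repeated runs can be classified from their exit codes instead. This is a minimal sketch with hypothetical helper names; it treats any mix of passing and failing runs as flaky and says nothing about which test is responsible.

```python
import subprocess
import sys
from typing import List


def run_repeatedly(command: List[str], runs: int = 3) -> List[int]:
    """Run the same command several times and collect the exit codes."""
    return [subprocess.run(command).returncode for _ in range(runs)]


def classify(exit_codes: List[int]) -> str:
    """Summarise a series of runs as stable-pass, stable-fail, or flaky."""
    if all(code == 0 for code in exit_codes):
        return "stable-pass"
    if all(code != 0 for code in exit_codes):
        return "stable-fail"
    return "flaky"


# Usage, from apps/api:
#   codes = run_repeatedly(
#       [sys.executable, "manage.py", "test", "--parallel=auto", "--keepdb"], runs=3
#   )
#   print(classify(codes))
```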
Acceptance Criteria
- Test execution time reduced by 50-70%
- All 54 test files pass without failures
- Coverage percentage maintained or improved
- No flaky tests (3+ consecutive successful runs)
- CI/CD pipeline completes successfully
- Performance benchmarks documented
6. Risk Assessment
Technical Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Flaky tests from parallelization | High | Medium | Identify with multiple runs, fix shared state |
| Database connection exhaustion | Medium | High | Configure pooling, limit to 4-8 workers |
| Coverage data corruption | Low | Medium | Test locally first, keep sequential as fallback |
| Test data leakage between workers | Medium | High | Audit for shared state, ensure proper isolation |
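One way to apply the worker limit from the table is to compute an explicit count and pass it as --parallel=N instead of auto. The sketch below is an assumption about how that cap might be chosen; the function name and the default ceiling of 8 are illustrative.

```python
import os
from typing import Optional


def capped_worker_count(max_workers: int = 8, cpu_count: Optional[int] = None) -> int:
    """Pick a worker count: one per CPU, but never more than max_workers.

    cpu_count can be passed explicitly for testing; otherwise the host is queried.
    """
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Always run at least one worker, even if CPU detection fails.
    return max(1, min(cpus, max_workers))


print(f"--parallel={capped_worker_count()}")
```

Each worker opens its own database connections, so the ceiling should stay comfortably below the PostgreSQL connection limit.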
Rollback Strategy
Easy Rollback:
# Remove flags from GitHub Actions
coverage run manage.py test # Back to sequential
# Revert .coveragerc changes
git checkout apps/api/.coveragerc
Gradual Rollout:
1. Test in a feature branch first
2. Monitor 3+ CI runs before merging
3. Keep sequential execution as an option for debugging
7. Performance Benchmarks
Expected Improvements
Baseline (Current - To be measured):
- Total test time: ~30-45 minutes (estimated)
- Database setup: ~30-60 seconds
- Coverage overhead: ~20% slower

Target (After Phase 1):
- Total test time: 10-15 minutes (50-70% reduction)
- Database setup: <5 seconds (with --keepdb)
- Parallel workers: 4-8 (auto-detected)
- Coverage overhead: per-test tracing cost unchanged, but spread across workers (parallel mode)
Measurement Method
# Before optimization
time python manage.py test
# After optimization
time python manage.py test --parallel=auto --keepdb
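Because this plan quotes both percentage reductions and "Nx" speedups, it helps to convert the two timings into both forms. The sketch below shows the arithmetic; the function names are illustrative.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a run-time reduction (in percent) into a speedup factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1 / (1 - reduction_pct / 100)


def reduction_from_speedup(speedup: float) -> float:
    """Convert a speedup factor into a run-time reduction in percent."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1 - 1 / speedup) * 100


def reduction_from_timings(before_seconds: float, after_seconds: float) -> float:
    """Percent reduction between two measured wall-clock times."""
    return (before_seconds - after_seconds) / before_seconds * 100


for pct in (50, 70):
    print(f"{pct}% reduction = {speedup_from_reduction(pct):.2f}x speedup")
```

A 50-70% reduction therefore corresponds to roughly a 2x to 3.3x speedup, and a 4x speedup to a 75% reduction.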
8. Research & Best Practices
This plan is based on industry-standard Django optimization practices:
Key References
- Django Official Documentation
  - Testing Tools
  - Parallel testing support since Django 1.9
  - Recommended for projects with proper test isolation
- Industry Case Studies
  - ORFIUM Engineering: Achieved 200% speedup with parallel execution
  - Various Django shops: 50-70% improvement is standard
  - Coverage.py: Parallel mode designed for this use case
- Best Practices (2025)
  - Always start with --parallel and --keepdb (quick wins)
  - Use MD5 password hasher for tests only
  - Enable coverage parallel mode for accurate tracking
  - Monitor for flaky tests after enabling parallelization
9. Developer Guide
Local Testing Commands
# Parallel execution (recommended for full suite)
python manage.py test --parallel=auto --keepdb
# Sequential execution (for debugging specific tests)
python manage.py test tickets.tests.test_checkout_core
# Run specific app with parallelization
python manage.py test tickets.tests --parallel=4 --keepdb
# First run (create database)
python manage.py test --parallel=auto
# Subsequent runs (reuse database)
python manage.py test --parallel=auto --keepdb
Troubleshooting
Flaky Test:
# Run the test sequentially (repeat a few times) to confirm
python manage.py test tickets.tests.test_specific --parallel=1
# If fails only in parallel, check for:
# - Shared state (class variables, cache)
# - Database fixtures not properly isolated
# - Time-dependent logic
Database Issues:
# Reset test database if corrupted
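# Run without --keepdb so Django drops and recreates the test database
# (--noinput skips the confirmation prompt)
python manage.py test --noinput
# Or manually drop the per-worker databases (dropdb takes one name per call)
for i in 1 2 3 4; do dropdb --if-exists "test_db_$i"; done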
10. Success Metrics
Key Performance Indicators
Primary Metrics:
- Total CI test time: Target <10 minutes (current: ~30-45 min)
- Time savings per developer: ~30-60 min/day
- Deployment frequency: Potential increase with faster feedback

Quality Metrics:
- Test pass rate: Maintain 100%
- Coverage percentage: Maintain or improve current level
- Flaky test count: Target 0 after stabilization
Review Schedule
- Week 1: Initial deployment and monitoring
- Week 2: Performance validation and optimization
- Month 1: Success evaluation and team feedback
11. Conclusion
This optimization plan provides a clear, phased approach to dramatically improve test performance:
Benefits
- 50-70% faster test execution with minimal effort
- Better resource utilization with parallel execution
- Faster feedback loops for developers
- Easy rollback if issues arise
Recommended Approach
- Start with Phase 1 (Quick wins - 2-4 hours)
- Validate stability (1-2 weeks of monitoring)
- Consider Phase 3 based on further needs
Next Steps
- Review and approve this plan
- Create feature branch for implementation
- Execute Phase 1 changes
- Monitor CI pipeline for 3+ runs
- Merge after successful validation
Appendices
Appendix A: Current Test Infrastructure
Test Runner: ForceCleanupTestRunner (custom)
- Location: apps/api/brktickets/test_runner.py
- Purpose: Handles PostgreSQL connection cleanup
- Compatibility: Works with parallel execution
Test Files: 54 files across multiple apps
- apps/api/tickets/tests/ - 22 files
- apps/api/portal/tests/ - 7 files
- apps/api/producers/tests/ - 4 files
- Other apps - 21 files
Test Types:
- TransactionTestCase - Used for concurrency testing
- APITestCase - REST framework tests
- Mix of unit, integration, and performance tests
Appendix B: Example Configuration Changes
Complete .coveragerc:
[run]
# Enable parallel coverage tracking
parallel = True
concurrency = multiprocessing
source = .
# Omit specific files and patterns from coverage measurement
omit =
    */migrations/*
    */management/commands/*
    manage.py
    */settings.py
    */settings/*.py
    */tests.py
    */tests/*
    */test_*.py
    test_*.py
    */conftest.py
    */venv/*
    */env/*
    */.venv/*
    */__pycache__/*
    */__init__.py
    */staticfiles/*
    */static/*
    */templates/*
[report]
show_missing = True
ignore_errors = True
skip_empty = True
precision = 2
sort = Cover
[html]
directory = htmlcov
[xml]
output = coverage.xml
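Before relying on combined results, it can be worth confirming that the configuration really enables parallel tracking. The sketch below parses a .coveragerc with the standard-library configparser and checks the two [run] settings this plan adds; the function name is hypothetical, and coverage.py's own parser may accept forms this simple check does not.

```python
import configparser


def parallel_coverage_enabled(config_text: str) -> bool:
    """Check that [run] sets parallel = True and lists multiprocessing concurrency."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section("run"):
        return False
    run = parser["run"]
    # concurrency may list several libraries, separated by commas or whitespace.
    concurrency = run.get("concurrency", "").replace(",", " ").split()
    return run.getboolean("parallel", fallback=False) and "multiprocessing" in concurrency


SAMPLE = """
[run]
parallel = True
concurrency = multiprocessing
source = .
omit =
    */migrations/*
    */tests/*
"""

print(parallel_coverage_enabled(SAMPLE))

# Usage against the real file (path taken from this plan):
#   from pathlib import Path
#   parallel_coverage_enabled(Path("apps/api/.coveragerc").read_text())
```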
Appendix C: Related Documentation
Document Status: Draft - Ready for Technical Review
Estimated Implementation Time: Phase 1: 2-4 hours | Full: 1-2 weeks
Expected ROI: High (50-70% time savings with minimal effort)