API Test Pipeline Performance Optimization

Status: Draft - Ready for Review
Created: 2025-11-06
Author: Claude Code (AI Assistant)
Version: 1.0

Executive Summary

This document outlines a plan to cut API test pipeline run time by 50-70% through parallel test execution, test database reuse, and configuration changes. The approach relies on Django's built-in test runner options, requires minimal code changes, and can be rolled out in phases, each with a clear rollback path.

Quick Wins (Phase 1 - 2-4 hours):

Enable parallel test execution with --parallel=auto
Preserve test database with --keepdb
Optimize password hashing for tests
Configure coverage for parallel mode

Expected Impact: 50-70% shorter test runs (roughly a 2-3x speedup) with minimal effort

1. Problem Statement

Current State

The API test pipeline runs extremely slowly. Contributing factors:

- 54 test files providing comprehensive test coverage
- Sequential execution that leaves available CPU cores idle
- Database recreation on each test run (30-60s overhead)
- Heavy use of TransactionTestCase for concurrency testing
- No parallelization in the CI/CD pipeline

Pain Points

Long CI/CD pipeline execution times (15-30+ minutes) blocking merges
Developer productivity impact due to slow feedback loops
Wasted computational resources with sequential execution
Database setup overhead repeated unnecessarily

Impact

Developers: Extended waiting for test feedback
Team Velocity: Slowed deployment cadence
CI/CD Resources: Inefficient runner utilization
Development Experience: Frustration with testing workflow

2. Solution Overview

Architecture

┌─────────────────────────────────────────────────────────────┐
│                   GitHub Actions Workflow                    │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐ │
│  │ 1. Setup: PostgreSQL + Redis Services                 │ │
│  │ 2. Install dependencies                                │ │
│  │ 3. Run: manage.py test --parallel=auto --keepdb       │ │
│  │ 4. Coverage: coverage run --parallel-mode              │ │
│  │ 5. Coverage: coverage combine                          │ │
│  │ 6. Coverage: coverage report/html/xml                  │ │
│  └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
        ↓           ↓           ↓           ↓
   ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐
   │Worker 1│  │Worker 2│  │Worker 3│  │Worker 4│
   │test_db1│  │test_db2│  │test_db3│  │test_db4│
   └────────┘  └────────┘  └────────┘  └────────┘
        ↓           ↓           ↓           ↓
   ┌──────────────────────────────────────────┐
   │   PostgreSQL Server (Parallel DBs)       │
   └──────────────────────────────────────────┘
   ┌──────────────────────────────────────────┐
   │   Redis Server (Shared Cache/Locks)      │
   └──────────────────────────────────────────┘

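Key Optimizations

- Parallel Test Execution
  - Use Django's --parallel flag to run tests concurrently
  - Let Django pick the worker count with --parallel=auto (requires Django 4.0 or later; on older versions a bare --parallel has the same effect)
  - Expected 2-4x speedup on GitHub Actions runners
- Database Preservation
  - Use the --keepdb flag to reuse the test database between runs
  - Reduces setup time from 30-60s to <5s when a test database from a previous run still exists; a CI service container that starts empty on every job will not see this saving
  - Safe with migrations: Django applies any unapplied migrations to the preserved database before running tests
- Password Hashing Optimization
  - Switch to the MD5 hasher for test user creation
  - 10-100x faster password hashing in tests (estimate)
  - Test-only configuration; never enable it in production settings
- Coverage Parallel Mode
  - Enable coverage.py parallel mode so each worker process writes its own data file
  - Combine the per-process data files with coverage combine after the run
  - Keeps coverage accurate across multiple workers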
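
As a rough illustration of what --parallel=auto resolves to, the sketch below approximates how a worker count could be chosen: an explicit DJANGO_TEST_PROCESSES environment variable wins, otherwise one worker per CPU core. The function name max_test_processes is hypothetical, and Django's own runner applies further checks (for example on the multiprocessing start method), so treat this as an approximation rather than Django's implementation.

```python
import multiprocessing
import os


def max_test_processes(environ=None):
    """Approximate the worker count that --parallel=auto would use."""
    environ = os.environ if environ is None else environ
    override = environ.get("DJANGO_TEST_PROCESSES")
    if override is not None:
        # An explicit override wins over core detection.
        return int(override)
    # Otherwise use one worker per CPU core reported by the OS.
    return multiprocessing.cpu_count()


if __name__ == "__main__":
    print(f"Workers on this machine: {max_test_processes()}")
```

Running this on a CI runner and on a developer laptop is a quick way to see why the same flag can yield different speedups in the two environments.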

3. Implementation Plan

Phase 1: Foundation (Quick Wins - 2-4 hours)

Deliverable: 50-70% reduction in test time with minimal code changes

Task 1: Update Coverage Configuration

File: apps/api/.coveragerc

[run]
parallel = True
concurrency = multiprocessing
source = .

# ... rest of existing config ...

Time: 15 minutes
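
Before pushing the change, it may help to confirm that the [run] section really contains both settings. The following is a minimal sketch using only the standard library; check_coverage_config and SAMPLE are hypothetical names, and the check assumes the file parses as ordinary INI, which holds for the configuration shown in this document.

```python
import configparser


def check_coverage_config(text):
    """Return a list of problems found in the [run] section of a .coveragerc."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("run"):
        return ["missing [run] section"]
    problems = []
    if not parser.getboolean("run", "parallel", fallback=False):
        problems.append("parallel is not enabled")
    # concurrency may list several values separated by commas or newlines.
    concurrency = parser.get("run", "concurrency", fallback="")
    if "multiprocessing" not in concurrency.replace(",", " ").split():
        problems.append("concurrency does not include multiprocessing")
    return problems


SAMPLE = """
[run]
parallel = True
concurrency = multiprocessing
source = .
"""

if __name__ == "__main__":
    print(check_coverage_config(SAMPLE) or "coverage config looks ready for parallel runs")
```

To check the real file, the same function could be called with the contents of apps/api/.coveragerc read from disk.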

Task 2: Update GitHub Actions Workflow

File: .github/workflows/django-test.yml (lines 64-109)

- name: Run Django tests with coverage
  env:
    PGDATABASE: test_db
    PGUSER: user
    PGPASSWORD: password
    PGHOST: localhost
    PGPORT: 5432
    REDIS_URL: redis://localhost:6379
    DEBUG: 'True'
    DJ_SECRET_KEY: ${{ secrets.DJ_SECRET_KEY }}
    S3_ENABLED: 'False'
    STRIPE_PUBLISHABLE_KEY: ${{ secrets.STRIPE_PUBLISHABLE_KEY }}
    STRIPE_SECRET_KEY: ${{ secrets.STRIPE_SECRET_KEY }}
    FRONTEND_URL: 'http://localhost:3000'
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
  run: |
    cd apps/api
    python manage.py migrate
    coverage run --parallel-mode manage.py test --parallel=auto --keepdb
    coverage combine
    coverage report
    coverage xml
    coverage html

Changes:

- Line 82: Change from coverage run manage.py test to coverage run --parallel-mode manage.py test --parallel=auto --keepdb
- After line 82: Add a coverage combine step

Time: 30 minutes

Task 3: Add Test Password Hasher

File: apps/api/brktickets/settings.py (around line 350)

# Test configuration optimizations
# Note: requires `import sys` at the top of settings.py.
if 'test' in sys.argv or 'pytest' in sys.modules:
    # Use a faster password hasher for tests (10-100x speedup).
    # MD5 is insecure; this override must never apply outside of test runs.
    PASSWORD_HASHERS = [
        'django.contrib.auth.hashers.MD5PasswordHasher',
    ]

    # Database settings already optimized at line 351-352

Time: 15 minutes
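
To make the motivation for the hasher swap concrete, the sketch below times a key-stretching hash against a single MD5 round using only hashlib. It does not call Django's hashers; the iteration count, the time_hash helper, and the sample password are illustrative assumptions, and the measured ratio will vary by machine.

```python
import hashlib
import os
import time


def time_hash(fn, repeats=5):
    """Return the total seconds spent calling fn() `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return time.perf_counter() - start


password = b"correct horse battery staple"
salt = os.urandom(16)

# Key-stretching hash: deliberately slow. The iteration count here is an
# illustrative assumption, not the value Django uses.
slow = time_hash(lambda: hashlib.pbkdf2_hmac("sha256", password, salt, 100_000))

# Single-round salted MD5: fast, and unsuitable outside of tests.
fast = time_hash(lambda: hashlib.md5(salt + password).hexdigest())

print(f"PBKDF2: {slow:.4f}s, MD5: {fast:.6f}s")
```

Because every test that creates a user or logs in pays this cost, the difference accumulates across a suite with many authenticated API tests.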

Task 4: Test Locally

cd apps/api

# Set environment variables
export PGDATABASE=test_db
export PGUSER=user
export PGPASSWORD=password
export PGHOST=localhost
export PGPORT=5432
export DEBUG=True
export DJ_SECRET_KEY=test-secret-key
export S3_ENABLED=False
export FRONTEND_URL=http://localhost:3000

# Run with parallel execution
coverage run --parallel-mode manage.py test --parallel=4 --keepdb

# Combine coverage
coverage combine

# View results
coverage report

Time: 1 hour (including fixing any issues)

Phase 2: Validation & Tuning (4-6 hours)

Deliverable: Stable parallel execution with documented performance gains

Tasks:

Run full test suite in CI multiple times (3+ runs)
Identify and fix any flaky tests from parallelization
Benchmark execution time before/after
Document slowest tests for future optimization
Create troubleshooting guide

Phase 3: Advanced Optimization (Optional - 1-2 weeks)

Deliverable: Further optimized fixtures and test organization

Potential Improvements:

Migrate conftest.py fixtures to factory_boy pattern
Categorize tests by speed (fast/medium/slow)
Implement selective test execution
Consider pytest migration for better tooling

4. Files to Modify

Modified Files

| File | Changes | Impact |
|------|---------|--------|
| `apps/api/.coveragerc` | Add `parallel = True` and `concurrency = multiprocessing` | Coverage tracking for parallel tests |
| `.github/workflows/django-test.yml` | Update test command with flags, add combine step | Enable parallel CI execution |
| `apps/api/brktickets/settings.py` | Add test-specific PASSWORD_HASHERS | 10-100x faster user creation |

No New Files Required

All changes are configuration updates to existing files.

5. Testing Strategy

Validation Tests

Parallel Execution Test

# Run subset of tests in parallel
python manage.py test tickets.tests.test_checkout_core --parallel=4

# Verify no failures
# Check for proper isolation

Coverage Accuracy Test

# Run with parallel coverage
coverage run --parallel-mode manage.py test --parallel=4

# Combine results
coverage combine

# Compare to sequential coverage (should be same ±1%)
coverage report
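
The ±1% comparison can be automated by reading the overall line-rate attribute from the two coverage.xml reports. This is a sketch under assumptions: the function names are hypothetical, the two XML strings are trimmed-down stand-ins for real reports, and the tolerance is interpreted as one percentage point.

```python
import xml.etree.ElementTree as ET


def line_rate_percent(xml_text):
    """Read the overall line coverage (in percent) from a Cobertura-style report."""
    root = ET.fromstring(xml_text)
    return float(root.attrib["line-rate"]) * 100


def coverage_matches(sequential_xml, parallel_xml, tolerance=1.0):
    """True when the two reports agree to within `tolerance` percentage points."""
    diff = abs(line_rate_percent(sequential_xml) - line_rate_percent(parallel_xml))
    return diff <= tolerance


# Hypothetical, trimmed-down reports used only to demonstrate the check.
SEQUENTIAL = '<coverage line-rate="0.873" branch-rate="0"></coverage>'
PARALLEL = '<coverage line-rate="0.869" branch-rate="0"></coverage>'

if __name__ == "__main__":
    print(coverage_matches(SEQUENTIAL, PARALLEL))
```

A larger gap than the tolerance usually points to worker data files that were not combined, rather than to a real change in coverage.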

Flaky Test Detection

# Run full suite 3 times
for i in {1..3}; do
    echo "Run $i"
    python manage.py test --parallel=auto --keepdb
done

# Monitor for inconsistent failures
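
The repeated-run check above can also be scripted so that disagreement between runs is reported automatically. The sketch below is one possible shape; run_repeatedly and is_flaky are hypothetical helpers, and the command assumes the script is started from the directory containing manage.py.

```python
import subprocess
import sys


def run_repeatedly(command, runs=3):
    """Run `command` several times and return the exit code of each run."""
    return [subprocess.run(command).returncode for _ in range(runs)]


def is_flaky(exit_codes):
    """A suite is suspect when its runs disagree with each other."""
    return len(set(exit_codes)) > 1


if __name__ == "__main__":
    # Assumes this is started from the directory containing manage.py.
    command = [sys.executable, "manage.py", "test", "--parallel=auto", "--keepdb"]
    codes = run_repeatedly(command)
    print("exit codes:", codes)
    print("inconsistent results" if is_flaky(codes) else "results were consistent")
```

Exit codes only show that runs disagree; identifying which test is flaky still requires comparing the failure output of the individual runs.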

Acceptance Criteria

Test execution time reduced by 50-70%
All 54 test files pass without failures
Coverage percentage maintained or improved
No flaky tests (3+ consecutive successful runs)
CI/CD pipeline completes successfully
Performance benchmarks documented

6. Risk Assessment

Technical Risks

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Flaky tests from parallelization | High | Medium | Identify with multiple runs, fix shared state |
| Database connection exhaustion | Medium | High | Configure pooling, limit to 4-8 workers |
| Coverage data corruption | Low | Medium | Test locally first, keep sequential as fallback |
| Test data leakage between workers | Medium | High | Audit for shared state, ensure proper isolation |
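
For the connection-exhaustion risk, a back-of-the-envelope budget can suggest a worker cap. The sketch below is illustrative only: the function names, the number of connections each worker holds, and the reserved headroom are all assumptions to be replaced with measured values for this project.

```python
def max_safe_workers(max_connections, connections_per_worker=1, reserved=10):
    """Largest worker count that stays within the server's connection limit.

    `reserved` leaves headroom for migrations, admin sessions, and other clients.
    """
    available = max_connections - reserved
    if available < connections_per_worker:
        # Not enough room for even one worker's share; fall back to one worker.
        return 1
    return available // connections_per_worker


def pick_workers(cpu_count, max_connections, connections_per_worker=1, reserved=10):
    """Cap the CPU-based worker count by the connection budget."""
    budget = max_safe_workers(max_connections, connections_per_worker, reserved)
    return max(1, min(cpu_count, budget))


if __name__ == "__main__":
    # Hypothetical figures: 8 cores, 100 allowed connections, 2 per worker.
    print(pick_workers(cpu_count=8, max_connections=100, connections_per_worker=2))
```

When the budget, not the core count, is the limiting factor, passing an explicit number to --parallel is the simplest way to apply the cap.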

Rollback Strategy

Easy Rollback:

# Remove flags from GitHub Actions
coverage run manage.py test  # Back to sequential

# Revert .coveragerc changes
git checkout apps/api/.coveragerc

Gradual Rollout:

1. Test in a feature branch first
2. Monitor 3+ CI runs before merging
3. Keep sequential execution as an option for debugging

7. Performance Benchmarks

Expected Improvements

Baseline (current, to be measured):

- Total test time: ~30-45 minutes (estimated)
- Database setup: ~30-60 seconds
- Coverage overhead: ~20% slower

Target (after Phase 1):

- Total test time: 10-15 minutes (50-70% reduction)
- Database setup: <5 seconds (with --keepdb)
- Parallel workers: 4-8 (auto-detected)
- Coverage overhead: Expected to be less visible once spread across workers (parallel mode keeps per-process data separate but does not itself remove tracing cost)
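
Because this document quotes both percentage reductions and "x" speedups, it helps to convert between them: a reduction r in run time corresponds to a speedup of 1 / (1 - r). The helper names below are hypothetical; the arithmetic is standard.

```python
def speedup_from_reduction(reduction):
    """Convert a fractional time reduction (0.5 = 50%) to a speedup factor."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction)


def reduction_from_speedup(speedup):
    """Convert a speedup factor (2.0 = twice as fast) to a fractional reduction."""
    if speedup < 1:
        raise ValueError("speedup must be at least 1")
    return 1 - 1 / speedup


for reduction in (0.5, 0.7):
    print(f"{reduction:.0%} less time = {speedup_from_reduction(reduction):.2f}x speedup")
```

A 50-70% reduction therefore corresponds to roughly a 2x to 3.3x speedup, which is the range to compare measured results against.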

Measurement Method

# Before optimization
time python manage.py test

# After optimization
time python manage.py test --parallel=auto --keepdb
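
For a recorded benchmark rather than a one-off reading, the two timings can be captured and compared in a small script. This is a sketch with hypothetical helper names; it assumes it is run from the directory containing manage.py and measures wall-clock time only.

```python
import subprocess
import sys
import time


def time_command(command):
    """Run `command` and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    result = subprocess.run(command)
    return time.perf_counter() - start, result.returncode


def percent_reduction(before_seconds, after_seconds):
    """Percentage of wall-clock time saved relative to the baseline."""
    return (before_seconds - after_seconds) / before_seconds * 100


if __name__ == "__main__":
    base = [sys.executable, "manage.py", "test"]
    before, _ = time_command(base)
    after, _ = time_command(base + ["--parallel=auto", "--keepdb"])
    print(f"before: {before:.1f}s, after: {after:.1f}s, "
          f"saved: {percent_reduction(before, after):.0f}%")
```

Single runs are noisy, so averaging several measurements per configuration gives a fairer comparison against the 50-70% target.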

8. Research & Best Practices

This plan is based on industry-standard Django optimization practices:

Key References

- Django Official Documentation
  - Testing Tools
  - Parallel testing support since Django 1.9
  - Recommended for projects with proper test isolation
- Industry Case Studies
  - ORFIUM Engineering: Achieved 200% speedup with parallel execution
  - Various Django shops: 50-70% improvement is standard
  - Coverage.py: Parallel mode designed for this use case
- Best Practices (2025)
  - Always start with --parallel and --keepdb (quick wins)
  - Use MD5 password hasher for tests only
  - Enable coverage parallel mode for accurate tracking
  - Monitor for flaky tests after enabling parallelization

9. Developer Guide

Local Testing Commands

# Parallel execution (recommended for full suite)
python manage.py test --parallel=auto --keepdb

# Sequential execution (for debugging specific tests)
python manage.py test tickets.tests.test_checkout_core

# Run specific app with parallelization
python manage.py test tickets.tests --parallel=4 --keepdb

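# First run with --keepdb creates the test database and preserves it;
# without --keepdb the database is destroyed when the run finishes
python manage.py test --parallel=auto --keepdb

# Subsequent runs reuse the preserved database
python manage.py test --parallel=auto --keepdb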

Troubleshooting

Flaky Test:

# Re-run the test sequentially to see whether it fails only in parallel
python manage.py test tickets.tests.test_specific --parallel=1

# If fails only in parallel, check for:
# - Shared state (class variables, cache)
# - Database fixtures not properly isolated
# - Time-dependent logic
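
The shared-state cause is the most common one, because state left behind by one test class is not visible to a class that runs in a different worker process. A minimal sketch of the fix, using plain unittest and hypothetical names (REGISTRY, IsolatedRegistryTests), is to build the required state in setUp instead of relying on an earlier test:

```python
import unittest

# Module-level state: shared by every test that runs in the same process.
REGISTRY = {}


class IsolatedRegistryTests(unittest.TestCase):
    def setUp(self):
        # Build the state each test needs instead of relying on an earlier
        # test (possibly running in another worker) to have created it.
        REGISTRY.clear()
        REGISTRY["currency"] = "USD"

    def test_reads_default(self):
        self.assertEqual(REGISTRY["currency"], "USD")

    def test_can_override(self):
        REGISTRY["currency"] = "EUR"
        self.assertEqual(REGISTRY["currency"], "EUR")


def run_isolated_suite():
    """Run the example suite and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(IsolatedRegistryTests)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()


if __name__ == "__main__":
    print(run_isolated_suite())
```

Both tests pass in either order and in any worker, which is the property to look for when auditing existing tests.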

Database Issues:

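# Reset the test database if corrupted: run without --keepdb;
# --noinput destroys any existing test database without prompting
python manage.py test --noinput

# Or drop the worker databases manually (dropdb accepts one name per call;
# actual names depend on the project's database settings)
for i in 1 2 3 4; do dropdb --if-exists "test_db_$i"; done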
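
The database names to drop depend on how the project configures its test database. The sketch below lists candidate names under Django's default naming, where the test database is "test_" plus NAME unless TEST["NAME"] is set, and each parallel worker clone appends an underscore and its index. The function name is hypothetical, and the output should be checked against the server before dropping anything.

```python
def parallel_test_db_names(db_name, workers, test_name=None):
    """List the databases a parallel run may create for one configured database.

    Assumes Django's default naming: the test database is "test_" + NAME
    unless TEST["NAME"] is set, and each worker clone appends "_<index>".
    """
    base = test_name or f"test_{db_name}"
    return [base] + [f"{base}_{index}" for index in range(1, workers + 1)]


# "test_db" mirrors the PGDATABASE value used in this document; whether it is
# the database NAME or the TEST["NAME"] depends on settings not shown here.
for name in parallel_test_db_names("test_db", workers=4):
    print(f'DROP DATABASE IF EXISTS "{name}";')
```

If the project sets TEST["NAME"] explicitly, pass it as test_name so the generated names match what the runner actually created.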

10. Success Metrics

Key Performance Indicators

Primary Metrics:

- Total CI test time: Target <10 minutes (current: ~30-45 min)
- Time savings per developer: ~30-60 min/day
- Deployment frequency: Potential increase with faster feedback

Quality Metrics:

- Test pass rate: Maintain 100%
- Coverage percentage: Maintain or improve current level
- Flaky test count: Target 0 after stabilization

Review Schedule

Week 1: Initial deployment and monitoring
Week 2: Performance validation and optimization
Month 1: Success evaluation and team feedback

11. Conclusion

This plan provides a phased approach to substantially reducing test run time:

Benefits

50-70% faster test execution with minimal effort
Better resource utilization with parallel execution
Faster feedback loops for developers
Easy rollback if issues arise

Recommended Approach

Start with Phase 1 (Quick wins - 2-4 hours)
Validate stability (1-2 weeks of monitoring)
Consider Phase 3 based on further needs

Next Steps

Review and approve this plan
Create feature branch for implementation
Execute Phase 1 changes
Monitor CI pipeline for 3+ runs
Merge after successful validation

Appendices

Appendix A: Current Test Infrastructure

Test Runner: ForceCleanupTestRunner (custom)

- Location: apps/api/brktickets/test_runner.py
- Purpose: Handles PostgreSQL connection cleanup
- Compatibility: Works with parallel execution

Test Files: 54 files across multiple apps

- apps/api/tickets/tests/ - 22 files
- apps/api/portal/tests/ - 7 files
- apps/api/producers/tests/ - 4 files
- Other apps - 21 files

Test Types:

- TransactionTestCase - Used for concurrency testing
- APITestCase - REST framework tests
- Mix of unit, integration, and performance tests

Appendix B: Example Configuration Changes

Complete .coveragerc:

[run]
# Enable parallel coverage tracking
parallel = True
concurrency = multiprocessing
source = .

# Omit specific files and patterns from coverage measurement
omit =
    */migrations/*
    */management/commands/*
    manage.py
    */settings.py
    */settings/*.py
    */tests.py
    */tests/*
    */test_*.py
    test_*.py
    */conftest.py
    */venv/*
    */env/*
    */.venv/*
    */__pycache__/*
    */__init__.py
    */staticfiles/*
    */static/*
    */templates/*

[report]
show_missing = True
ignore_errors = True
skip_empty = True
precision = 2
sort = Cover

[html]
directory = htmlcov

[xml]
output = coverage.xml

Document Status: Draft - Ready for Technical Review
Estimated Implementation Time: Phase 1: 2-4 hours | Full: 1-2 weeks
Expected ROI: High (50-70% time savings with minimal effort)