API Test Pipeline Performance Optimization

Status: Draft - Ready for Review
Created: 2025-11-06
Author: Claude Code (AI Assistant)
Version: 1.0


Executive Summary

This document outlines a comprehensive plan to improve the API test pipeline performance by 50-70% through parallel test execution, database optimization, and configuration improvements. The solution leverages Django's built-in capabilities, requires minimal code changes, and can be implemented in phases with clear rollback options.

Quick Wins (Phase 1 - 2-4 hours):

  • Enable parallel test execution with --parallel=auto
  • Preserve test database with --keepdb
  • Optimize password hashing for tests
  • Configure coverage for parallel mode

Expected Impact: 3-8x speedup with minimal effort


1. Problem Statement

Current State

The API test pipeline runs extremely slowly:

  • 54 test files containing comprehensive test coverage
  • Sequential execution that does not utilize available CPU cores
  • Database recreation on each test run (30-60s of overhead)
  • Heavy use of TransactionTestCase for concurrency testing
  • No parallelization in the CI/CD pipeline

Pain Points

  • Long CI/CD pipeline execution times (15-30+ minutes) blocking merges
  • Developer productivity impact due to slow feedback loops
  • Wasted computational resources with sequential execution
  • Database setup overhead repeated unnecessarily

Impact

  • Developers: Extended waiting for test feedback
  • Team Velocity: Slowed deployment cadence
  • CI/CD Resources: Inefficient runner utilization
  • Development Experience: Frustration with testing workflow

2. Solution Overview

Architecture

┌─────────────────────────────────────────────────────────────┐
│                   GitHub Actions Workflow                    │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐ │
│  │ 1. Setup: PostgreSQL + Redis Services                 │ │
│  │ 2. Install dependencies                                │ │
│  │ 3. Run: manage.py test --parallel=auto --keepdb       │ │
│  │ 4. Coverage: coverage run --parallel-mode              │ │
│  │ 5. Coverage: coverage combine                          │ │
│  │ 6. Coverage: coverage report/html/xml                  │ │
│  └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
        ↓           ↓           ↓           ↓
   ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐
   │Worker 1│  │Worker 2│  │Worker 3│  │Worker 4│
   │test_db1│  │test_db2│  │test_db3│  │test_db4│
   └────────┘  └────────┘  └────────┘  └────────┘
        ↓           ↓           ↓           ↓
   ┌──────────────────────────────────────────┐
   │   PostgreSQL Server (Parallel DBs)       │
   └──────────────────────────────────────────┘
   ┌──────────────────────────────────────────┐
   │   Redis Server (Shared Cache/Locks)      │
   └──────────────────────────────────────────┘

Key Optimizations

  1. Parallel Test Execution
     • Use Django's --parallel flag to run tests concurrently
     • Automatic worker count detection with --parallel=auto
     • Expected 2-4x speedup on GitHub Actions runners

  2. Database Preservation
     • Use the --keepdb flag to reuse the test database
     • Reduces setup time from 30-60s to <5s
     • Safe with proper migration handling

  3. Password Hashing Optimization
     • Switch to the MD5 hasher for test user creation
     • 10-100x faster authentication in tests
     • Test-only configuration

  4. Coverage Parallel Mode
     • Enable parallel coverage tracking
     • Combine results after test execution
     • Maintain accuracy with multiple workers

3. Implementation Plan

Phase 1: Foundation (Quick Wins - 2-4 hours)

Deliverable: 50-70% reduction in test time with minimal code changes

Task 1: Update Coverage Configuration

File: apps/api/.coveragerc

[run]
parallel = True
concurrency = multiprocessing
source = .

# ... rest of existing config ...

Time: 15 minutes

Task 2: Update GitHub Actions Workflow

File: .github/workflows/django-test.yml (lines 64-109)

- name: Run Django tests with coverage
  env:
    PGDATABASE: test_db
    PGUSER: user
    PGPASSWORD: password
    PGHOST: localhost
    PGPORT: 5432
    REDIS_URL: redis://localhost:6379
    DEBUG: 'True'
    DJ_SECRET_KEY: ${{ secrets.DJ_SECRET_KEY }}
    S3_ENABLED: 'False'
    STRIPE_PUBLISHABLE_KEY: ${{ secrets.STRIPE_PUBLISHABLE_KEY }}
    STRIPE_SECRET_KEY: ${{ secrets.STRIPE_SECRET_KEY }}
    FRONTEND_URL: 'http://localhost:3000'
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
  run: |
    cd apps/api
    python manage.py migrate
    coverage run --parallel-mode manage.py test --parallel=auto --keepdb
    coverage combine
    coverage report
    coverage xml
    coverage html

Changes:

  • Line 82: change coverage run manage.py test to coverage run --parallel-mode manage.py test --parallel=auto --keepdb
  • After line 82: add a coverage combine step

Time: 30 minutes

Task 3: Add Test Password Hasher

File: apps/api/brktickets/settings.py (around line 350)

import sys  # ensure sys is imported at the top of settings.py

# Test configuration optimizations
if 'test' in sys.argv or 'pytest' in sys.modules:
    # Use a faster password hasher for tests (10-100x speedup)
    PASSWORD_HASHERS = [
        'django.contrib.auth.hashers.MD5PasswordHasher',
    ]

    # Database settings already optimized at line 351-352
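
To sanity-check the claimed speedup, the two hashers can be timed head to head. A minimal standalone sketch, assuming Django is installed; the exact ratio depends on the machine and the PBKDF2 iteration count:

# benchmark_hashers.py - rough comparison of production vs. test hashers
import time

from django.conf import settings

settings.configure()  # minimal settings so the hashers run standalone

from django.contrib.auth.hashers import MD5PasswordHasher, PBKDF2PasswordHasher

def time_hasher(hasher, password="s3cret", rounds=5):
    salt = hasher.salt()
    start = time.perf_counter()
    for _ in range(rounds):
        hasher.encode(password, salt)
    return (time.perf_counter() - start) / rounds

pbkdf2 = time_hasher(PBKDF2PasswordHasher())
md5 = time_hasher(MD5PasswordHasher())
print(f"PBKDF2: {pbkdf2 * 1000:.1f} ms/hash, MD5: {md5 * 1000:.1f} ms/hash, "
      f"speedup: {pbkdf2 / md5:.0f}x")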

Time: 15 minutes

Task 4: Test Locally

cd apps/api

# Set environment variables
export PGDATABASE=test_db
export PGUSER=user
export PGPASSWORD=password
export PGHOST=localhost
export PGPORT=5432
export DEBUG=True
export DJ_SECRET_KEY=test-secret-key
export S3_ENABLED=False
export FRONTEND_URL=http://localhost:3000

# Run with parallel execution
coverage run --parallel-mode manage.py test --parallel=4 --keepdb

# Combine coverage
coverage combine

# View results
coverage report

Time: 1 hour (including fixing any issues)


Phase 2: Validation & Tuning (4-6 hours)

Deliverable: Stable parallel execution with documented performance gains

Tasks:

  1. Run full test suite in CI multiple times (3+ runs)
  2. Identify and fix any flaky tests from parallelization
  3. Benchmark execution time before/after (see the sketch after this list)
  4. Document slowest tests for future optimization
  5. Create troubleshooting guide
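
For the benchmarking task, wall-clock comparisons are easy to script. A minimal sketch (file name and run count are illustrative), assuming it runs from apps/api with the test environment variables already exported:

# benchmark_suite.py - time repeated full-suite runs and summarize
import statistics
import subprocess
import time

def timed_run(args):
    start = time.perf_counter()
    subprocess.run(args, check=True)  # raises if any test fails
    return time.perf_counter() - start

cmd = ["python", "manage.py", "test", "--parallel=auto", "--keepdb"]
durations = [timed_run(cmd) for _ in range(3)]
print(f"mean: {statistics.mean(durations):.1f}s, "
      f"stdev: {statistics.stdev(durations):.1f}s")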

Phase 3: Advanced Optimization (Optional - 1-2 weeks)

Deliverable: Further optimized fixtures and test organization

Potential Improvements:

  1. Migrate conftest.py fixtures to factory_boy pattern
  2. Categorize tests by speed (fast/medium/slow)
  3. Implement selective test execution
  4. Consider pytest migration for better tooling

4. Files to Modify

Modified Files

  • apps/api/.coveragerc: add parallel = True and concurrency = multiprocessing (coverage tracking for parallel tests)
  • .github/workflows/django-test.yml: update the test command with the new flags and add a combine step (enables parallel CI execution)
  • apps/api/brktickets/settings.py: add test-specific PASSWORD_HASHERS (10-100x faster user creation)

No New Files Required

All changes are configuration updates to existing files.


5. Testing Strategy

Validation Tests

Parallel Execution Test

# Run subset of tests in parallel
python manage.py test tickets.tests.test_checkout_core --parallel=4

# Verify no failures
# Check for proper isolation
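
Because each worker runs against its own clone database (test_db_1 ... test_db_N in the diagram), ordinary ORM-based tests remain isolated without extra work. A hypothetical isolation check; the Ticket model is illustrative, not a confirmed name in this codebase:

from django.test import TestCase

from tickets.models import Ticket  # hypothetical model used for illustration

class IsolationCheck(TestCase):
    def test_table_is_clean(self):
        # Runs inside a per-worker database and a per-test transaction,
        # so rows created by other workers or tests are not visible here.
        self.assertEqual(Ticket.objects.count(), 0)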

Coverage Accuracy Test

# Run with parallel coverage
coverage run --parallel-mode manage.py test --parallel=4

# Combine results
coverage combine

# Compare to sequential coverage (should be same ±1%)
coverage report
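
The combine step can also be driven from Python, which is convenient when comparing parallel and sequential totals programmatically. A minimal sketch using the coverage.py API:

# check_coverage.py - combine parallel data files and print the total
import coverage

cov = coverage.Coverage(config_file=".coveragerc")
cov.combine()         # merge the .coverage.* data files from parallel workers
cov.save()
total = cov.report()  # returns the total coverage percentage as a float
print(f"combined total: {total:.2f}%")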

Flaky Test Detection

# Run full suite 3 times
for i in {1..3}; do
    echo "Run $i"
    python manage.py test --parallel=auto --keepdb
done

# Monitor for inconsistent failures

Acceptance Criteria

  • Test execution time reduced by 50-70%
  • All 54 test files pass without failures
  • Coverage percentage maintained or improved
  • No flaky tests (3+ consecutive successful runs)
  • CI/CD pipeline completes successfully
  • Performance benchmarks documented

6. Risk Assessment

Technical Risks

  • Flaky tests from parallelization (probability: high; impact: medium). Mitigation: identify with multiple runs, fix shared state.
  • Database connection exhaustion (probability: medium; impact: high). Mitigation: configure pooling, limit to 4-8 workers.
  • Coverage data corruption (probability: low; impact: medium). Mitigation: test locally first, keep sequential execution as a fallback.
  • Test data leakage between workers (probability: medium; impact: high). Mitigation: audit for shared state, ensure proper isolation.
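
The shared-state risks are easiest to see with the shared Redis cache in the architecture above: every worker talks to the same server, so tests that read and write fixed cache keys can interfere across workers. A hypothetical example of the pattern to audit for (class and key names are illustrative):

# Flaky under --parallel: all workers share one Redis server, so a
# fixed cache key can be read and written by several processes at once.
from django.core.cache import cache
from django.test import TestCase

class PromoCounterTests(TestCase):
    def test_counter_fixed_key(self):
        cache.set("promo_count", 0)  # same key in every worker: unsafe
        self.assertEqual(cache.get("promo_count"), 0)

    def test_counter_unique_key(self):
        key = f"promo_count:{self.id()}"  # unique per test: parallel-safe
        cache.set(key, 0)
        self.assertEqual(cache.get(key), 0)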

Rollback Strategy

Easy Rollback:

# Remove flags from GitHub Actions
coverage run manage.py test  # Back to sequential

# Revert .coveragerc changes
git checkout apps/api/.coveragerc

Gradual Rollout:

  1. Test in a feature branch first
  2. Monitor 3+ CI runs before merging
  3. Keep sequential execution as an option for debugging


7. Performance Benchmarks

Expected Improvements

Baseline (Current - To be measured):

  • Total test time: ~30-45 minutes (estimated)
  • Database setup: ~30-60 seconds
  • Coverage overhead: ~20% slower

Target (After Phase 1):

  • Total test time: 10-15 minutes (a 50-70% reduction)
  • Database setup: <5 seconds (with --keepdb)
  • Parallel workers: 4-8 (auto-detected)
  • Coverage overhead: negligible (parallel mode)

Measurement Method

# Before optimization
time python manage.py test

# After optimization
time python manage.py test --parallel=auto --keepdb

8. Research & Best Practices

This plan is based on industry-standard Django optimization practices:

Key References

  1. Django Official Documentation
     • Testing Tools
     • Parallel testing supported since Django 1.9
     • Recommended for projects with proper test isolation

  2. Industry Case Studies
     • ORFIUM Engineering: achieved a 200% speedup with parallel execution
     • Various Django shops: 50-70% improvement is standard
     • Coverage.py: parallel mode is designed for this use case

  3. Best Practices (2025)
     • Always start with --parallel and --keepdb (quick wins)
     • Use the MD5 password hasher for tests only
     • Enable coverage parallel mode for accurate tracking
     • Monitor for flaky tests after enabling parallelization

9. Developer Guide

Local Testing Commands

# Parallel execution (recommended for full suite)
python manage.py test --parallel=auto --keepdb

# Sequential execution (for debugging specific tests)
python manage.py test tickets.tests.test_checkout_core

# Run specific app with parallelization
python manage.py test tickets.tests --parallel=4 --keepdb

# First run (create database)
python manage.py test --parallel=auto

# Subsequent runs (reuse database)
python manage.py test --parallel=auto --keepdb

Troubleshooting

Flaky Test:

# Re-run the suspect test serially to confirm it only fails in parallel
python manage.py test tickets.tests.test_specific --parallel=1

# If fails only in parallel, check for:
# - Shared state (class variables, cache)
# - Database fixtures not properly isolated
# - Time-dependent logic

Database Issues:

# Reset the test database if corrupted: omit --keepdb so Django recreates it
python manage.py test

# Or manually drop the parallel clone databases (one dropdb call per database)
for db in test_db_1 test_db_2 test_db_3 test_db_4; do dropdb "$db"; done
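
If scripting the cleanup is preferable, the same thing can be done from Python. A minimal sketch, assuming psycopg2 is installed and the credentials from the local-testing setup above (the database names follow the four-worker example):

# drop_test_dbs.py - drop leftover parallel test databases
import psycopg2

conn = psycopg2.connect(
    dbname="postgres", user="user", password="password",
    host="localhost", port=5432,
)
conn.autocommit = True  # DROP DATABASE cannot run inside a transaction
with conn.cursor() as cur:
    for i in range(1, 5):
        cur.execute(f'DROP DATABASE IF EXISTS "test_db_{i}"')
conn.close()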


10. Success Metrics

Key Performance Indicators

Primary Metrics:

  • Total CI test time: target <10 minutes (current: ~30-45 min)
  • Time savings per developer: ~30-60 min/day
  • Deployment frequency: potential increase with faster feedback

Quality Metrics:

  • Test pass rate: maintain 100%
  • Coverage percentage: maintain or improve the current level
  • Flaky test count: target 0 after stabilization

Review Schedule

  • Week 1: Initial deployment and monitoring
  • Week 2: Performance validation and optimization
  • Month 1: Success evaluation and team feedback

11. Conclusion

This optimization plan provides a clear, phased approach to dramatically improve test performance:

Benefits

  • 50-70% faster test execution with minimal effort
  • Better resource utilization with parallel execution
  • Faster feedback loops for developers
  • Easy rollback if issues arise

Recommended Approach:

  1. Start with Phase 1 (quick wins, 2-4 hours)
  2. Validate stability (1-2 weeks of monitoring)
  3. Consider Phase 3 based on further needs

Next Steps

  1. Review and approve this plan
  2. Create feature branch for implementation
  3. Execute Phase 1 changes
  4. Monitor CI pipeline for 3+ runs
  5. Merge after successful validation

Appendices

Appendix A: Current Test Infrastructure

Test Runner: ForceCleanupTestRunner (custom)

  • Location: apps/api/brktickets/test_runner.py
  • Purpose: handles PostgreSQL connection cleanup
  • Compatibility: works with parallel execution
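
For reviewers unfamiliar with custom runners, the general shape of a connection-cleanup runner is sketched below. This is illustrative only, not the project's actual implementation (see apps/api/brktickets/test_runner.py for that); it subclasses Django's DiscoverRunner, the same class that provides --parallel, which is why the pattern is parallel-compatible:

# Illustrative sketch only; the real runner lives in
# apps/api/brktickets/test_runner.py.
from django.db import connections
from django.test.runner import DiscoverRunner

class ForceCleanupTestRunner(DiscoverRunner):
    def teardown_databases(self, old_config, **kwargs):
        # Close lingering PostgreSQL connections before the test databases
        # are dropped, so DROP DATABASE does not block on open sessions.
        for conn in connections.all():
            conn.close()
        super().teardown_databases(old_config, **kwargs)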

Test Files: 54 files across multiple apps

  • apps/api/tickets/tests/: 22 files
  • apps/api/portal/tests/: 7 files
  • apps/api/producers/tests/: 4 files
  • Other apps: 21 files

Test Types:

  • TransactionTestCase: used for concurrency testing
  • APITestCase: REST framework tests
  • A mix of unit, integration, and performance tests

Appendix B: Example Configuration Changes

Complete .coveragerc:

[run]
# Enable parallel coverage tracking
parallel = True
concurrency = multiprocessing
source = .

# Omit specific files and patterns from coverage measurement
omit =
    */migrations/*
    */management/commands/*
    manage.py
    */settings.py
    */settings/*.py
    */tests.py
    */tests/*
    */test_*.py
    test_*.py
    */conftest.py
    */venv/*
    */env/*
    */.venv/*
    */__pycache__/*
    */__init__.py
    */staticfiles/*
    */static/*
    */templates/*

[report]
show_missing = True
ignore_errors = True
skip_empty = True
precision = 2
sort = Cover

[html]
directory = htmlcov

[xml]
output = coverage.xml


Document Status: Draft - Ready for Technical Review
Estimated Implementation Time: Phase 1: 2-4 hours | Full: 1-2 weeks
Expected ROI: High (50-70% time savings with minimal effort)