
Archived: 2025-11-01
Reason: Completed - Memory optimizations implemented
Status: Consolidated from 3 separate memory optimization documents
Note: This document combines MEMORY_OPTIMIZATION_PLAN.md, MEMORY_CODE_OPTIMIZATIONS.md, and MEMORY_REDUCTION_STRATEGIES.md

Memory Footprint Optimization Plan - PiqueTickets API Deployment

Created: 2025-10-17
Status: Planning Phase
Priority: HIGH


1. Executive Summary & Feature Overview

1.1 Feature Description

  • Feature Name: API Deployment Memory Footprint Reduction
  • Feature Type: Performance Optimization / Infrastructure Enhancement
  • Priority Level: High

1.2 Problem Statement

Current State:

  • Django API deployed on Railway with 4 separate services (API, Celery Worker, Celery Beat, Flower)
  • Gunicorn configured with 4 workers using the sync worker class
  • Celery worker with concurrency=4, prefetch-multiplier=4
  • WeasyPrint PDF generation loads entire documents into memory
  • Pillow image processing creates multiple copies during transformations
  • No memory limits defined in deployment configuration
  • Database queries loading full datasets without pagination

Pain Points:

  • High memory consumption per container/service
  • Potential OOM (Out of Memory) kills during peak load
  • Inefficient resource utilization on the Railway platform
  • Memory leaks from long-running processes
  • Duplicate image objects during format conversions
  • N+1 query patterns in the admin interface
  • Full dataset loads for reports and CSV exports

User Impact:

  • Increased hosting costs due to higher memory allocation
  • Potential service disruptions during memory spikes
  • Slower response times when memory pressure is high
  • Failed PDF/report generation for large events

Business Value:

  • Reduced infrastructure costs
  • Improved service reliability and uptime
  • Better scalability for growing event sizes
  • Faster response times across all services

1.3 Expected Outcomes

Success Metrics:

  • Reduce API service memory usage from ~500MB to ~300MB baseline
  • Reduce Celery worker memory from ~400MB to ~200MB per worker
  • Eliminate memory-related crashes (target: 0 OOM events per month)
  • Reduce average response time for PDF generation by 20-30%
  • Reduce database query memory consumption by 40-50%

Technical Goals:

  • Implement streaming responses for large data exports
  • Add memory limits and resource constraints to all services
  • Optimize Gunicorn and Celery configurations
  • Implement lazy loading and pagination for database queries
  • Add garbage collection after memory-intensive operations
  • Convert synchronous PDF generation to async background tasks


2. Current State Analysis

2.1 Deployment Architecture

Services Running:

  1. API Service (apps/api/Dockerfile)
     • Gunicorn with 4 sync workers
     • No memory limits defined
     • Current config: apps/api/railway.json:10
  2. Celery Worker (apps/api/Dockerfile)
     • Concurrency: 4 workers
     • Prefetch multiplier: 4
     • Max tasks per child: 100
     • Current config: apps/api/railway.worker.json:10
  3. Celery Beat (apps/api/Dockerfile)
     • Single scheduler process
     • Database-backed scheduler
     • Current config: apps/api/railway.beats.json:10
  4. Flower (apps/api/Dockerfile)
     • Monitoring dashboard
     • Current config: apps/api/railway.flower.json:10

Docker Configuration:

  • Base image: python:3.12-slim (apps/api/Dockerfile:2)
  • System dependencies include: cairo, pango, gdk-pixbuf (for WeasyPrint)
  • Installs 40 Python packages (apps/api/requirements.txt)

2.2 Memory-Intensive Operations (Identified)

CRITICAL Issues:

  1. PDF Generation - Highest Impact
     • Files: tickets/utils/pdf.py, tickets/utils/reports.py
     • WeasyPrint loads entire HTML + renders to memory
     • No streaming support
     • Generates ticket PDFs synchronously in Celery tasks
  2. Image Processing - High Impact
     • File: tickets/image_optimization.py
     • Creates 4 responsive sizes per image (400, 800, 1200, 1600px)
     • Multiple PIL Image objects in memory simultaneously
     • RGBA→RGB conversion creates duplicates
     • Triggered on every Show save (models.py:403)
  3. Report Generation - High Impact
     • File: tickets/utils/reports.py:19-121
     • Loads all orders for a show into memory
     • N+1 query pattern for tickets and ticket orders
     • Builds large dictionaries before serialization
  4. CSV Exports - Medium Impact
     • Files: tickets/utils/csv_export.py, tickets/utils/preshow.py
     • Prefetch loads all relations before streaming
     • Uses StringIO for the entire CSV in memory

MEDIUM Issues:

  1. Admin List Views - Medium Impact
     • File: tickets/admin.py:422-456
     • N queries for revenue calculations (one per row)
     • No select_related/prefetch_related optimization
  2. Database Query Patterns - Medium Impact
     • Check-in view loads all attendees (admin.py:816)
     • Order queries without pagination
     • Missing database indexes on frequently queried fields

2.3 Current Dependencies

Core Memory-Intensive Packages:

weasyprint==63.1           # PDF generation (HIGH memory)
pillow==11.0.0             # Image processing (MEDIUM memory)
gunicorn==23.0.0           # Application server
celery==5.4.0              # Task queue
redis==5.2.1               # Cache/broker
boto3==1.36.20             # AWS S3 (if enabled)
djangorestframework==3.15.2

System Dependencies (from Dockerfile):

  • libcairo2, libpango-1.0-0 (WeasyPrint dependencies)
  • libjpeg62-turbo, libwebp-dev (image processing)


3. Solution Design

3.1 Optimization Strategy (Phased Approach)

Phase 1: Quick Wins - Configuration Optimization

Low-risk, immediate-impact changes to deployment configuration.

Phase 2: Code Optimization

Medium-risk, targeted code changes to reduce memory consumption.

📄 Detailed Code Changes: See MEMORY_CODE_OPTIMIZATIONS.md

Phase 3: Architectural Improvements

Higher-complexity changes requiring more extensive testing.

3.2 Detailed Recommendations


PHASE 1: Configuration Optimization

1.1 Gunicorn Worker Configuration

Current Configuration (apps/api/Dockerfile:78):

gunicorn brktickets.wsgi:application \
  --bind 0.0.0.0:8000 \
  --workers 4 \
  --worker-class sync \
  --worker-connections 1000 \
  --max-requests 1000 \
  --max-requests-jitter 100 \
  --timeout 30 \
  --keep-alive 5 \
  --preload

Optimized Configuration:

gunicorn brktickets.wsgi:application \
  --bind 0.0.0.0:8000 \
  --workers 2 \
  --threads 4 \
  --worker-class gthread \
  --worker-connections 1000 \
  --max-requests 500 \
  --max-requests-jitter 50 \
  --timeout 60 \
  --keep-alive 5 \
  --worker-tmp-dir /dev/shm \
  --preload

Changes & Rationale:

  • ✅ Reduce workers from 4→2 and add 4 threads each = 8 concurrent requests (same throughput)
  • ✅ Switch to the gthread worker class (threads share memory, workers don't)
  • ✅ Reduce max-requests to 500 (faster memory recycling)
  • ✅ Add --worker-tmp-dir /dev/shm (use RAM for tmp files, faster)
  • ✅ Increase timeout to 60s (accommodate PDF generation)

Expected Impact: ~40% memory reduction (2 workers vs 4)


1.2 Celery Worker Configuration

Current Configuration (apps/api/railway.worker.json:10):

celery -A brktickets worker -l INFO -E \
  --max-tasks-per-child=100 \
  --concurrency=4 \
  --prefetch-multiplier=4

Optimized Configuration:

celery -A brktickets worker -l INFO -E \
  --max-tasks-per-child=50 \
  --concurrency=2 \
  --prefetch-multiplier=1 \
  --max-memory-per-child=300000 \
  --time-limit=300 \
  --soft-time-limit=270

Changes & Rationale:

  • ✅ Reduce concurrency from 4→2 (fewer processes in memory)
  • ✅ Set prefetch-multiplier to 1 (prevents queue hoarding, better for long tasks)
  • ✅ Reduce max-tasks-per-child to 50 (more frequent process recycling)
  • ✅ Add --max-memory-per-child=300000 (300MB limit, auto-restart on breach)
  • ✅ Add time limits (prevent runaway tasks)

Expected Impact: ~50% memory reduction per worker


1.3 Database Connection Pooling

Current Configuration (apps/api/brktickets/settings.py:315-329):

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        ...
        "CONN_MAX_AGE": 600,
        "CONN_HEALTH_CHECKS": True,
    }
}

Optimized Configuration:

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        ...
        "CONN_MAX_AGE": 60,  # Reduced from 600
        "CONN_HEALTH_CHECKS": True,
        "OPTIONS": {
            "connect_timeout": 10,
            "options": "-c statement_timeout=30000",  # 30s query timeout
        },
    }
}

Changes & Rationale:

  • ✅ Reduce CONN_MAX_AGE to 60s (close idle connections faster)
  • ✅ Add statement_timeout (prevent long-running queries)

Expected Impact: Reduced memory from idle connections


1.4 Cache Configuration

Current Configuration (apps/api/brktickets/settings.py:783-794):

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": os.getenv("REDIS_URL", "redis://localhost:6379/0"),
        "KEY_PREFIX": "piquetickets",
        "TIMEOUT": 300,
    }
}

Optimized Configuration:

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": os.getenv("REDIS_URL", "redis://localhost:6379/0"),
        "OPTIONS": {
            "socket_timeout": 5,
            "retry_on_timeout": True,
            "socket_connect_timeout": 5,
            "max_connections": 50,  # Limit connection pool
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
        "KEY_PREFIX": "piquetickets",
        "TIMEOUT": 300,
    }
}

Changes & Rationale:

  • ✅ Limit the Redis connection pool to 50
  • ✅ Add connection timeouts

Expected Impact: Reduced memory from Redis connection pool


PHASE 2: Infrastructure & Architecture

2.1 Add Memory Limits to Railway Deployments

Update Railway Configuration Files:

File: apps/api/railway.json

{
  "$schema": "https://railway.com/railway.schema.json",
  "build": {
    "builder": "DOCKERFILE",
    "dockerfilePath": "apps/api/Dockerfile"
  },
  "deploy": {
    "runtime": "V2",
    "numReplicas": 1,
    "startCommand": "sh -c 'python manage.py migrate && python manage.py collectstatic --noinput && gunicorn brktickets.wsgi --config gunicorn_config.py'",
    "sleepApplication": false,
    "healthcheckPath": "/health",
    "healthcheckTimeout": 180,
    "restartPolicyType": "ON_FAILURE",
    "restartPolicyMaxRetries": 10,
    "memoryLimit": 512,
    "cpuLimit": 1.0
  }
}

File: apps/api/railway.worker.json

{
  "deploy": {
    "memoryLimit": 384,
    "cpuLimit": 0.5
  }
}

2.2 Create Gunicorn Configuration File

Create: apps/api/gunicorn_config.py

import multiprocessing
import os

# Server socket
bind = "0.0.0.0:8000"
backlog = 2048

# Worker processes
workers = 2
worker_class = 'gthread'
threads = 4
worker_connections = 1000
max_requests = 500
max_requests_jitter = 50
timeout = 60
keepalive = 5
graceful_timeout = 30

# Memory management
worker_tmp_dir = '/dev/shm'
preload_app = True

# Logging
accesslog = '-'
errorlog = '-'
loglevel = 'info'
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'

# Process naming
proc_name = 'piquetickets-api'

# Server hooks
def on_starting(server):
    """Called just before the master process is initialized."""
    pass

def when_ready(server):
    """Called just after the server is started."""
    server.log.info("Server is ready. Spawning workers")

def post_worker_init(worker):
    """Called just after a worker has been initialized."""
    worker.log.info(f"Worker spawned (pid: {worker.pid})")

def pre_fork(server, worker):
    """Called just prior to forking the worker subprocess."""
    pass

def post_fork(server, worker):
    """Called just after a worker has been forked."""
    pass

def worker_exit(server, worker):
    """Called just after a worker has been exited."""
    worker.log.info(f"Worker exited (pid: {worker.pid})")

2.3 Add Monitoring & Alerts

Create: apps/api/tickets/management/commands/check_memory.py

import psutil
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = 'Check memory usage and log warnings'

    def handle(self, *args, **options):
        process = psutil.Process()
        memory_info = process.memory_info()
        memory_mb = memory_info.rss / 1024 / 1024

        self.stdout.write(f"Memory usage: {memory_mb:.2f} MB")

        if memory_mb > 400:
            self.stdout.write(self.style.WARNING(
                f"High memory usage detected: {memory_mb:.2f} MB"
            ))
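
The command can be run ad hoc with python manage.py check_memory. To make it a recurring check, one option is to invoke it from a periodic Celery task registered with the existing database-backed beat scheduler. The sketch below is illustrative only - the task name and schedule are not part of the current codebase:

from celery import shared_task
from django.core.management import call_command

@shared_task
def check_memory_periodic():
    """Hypothetical periodic task: run the check_memory command on a schedule."""
    call_command("check_memory")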


4. Implementation Plan

4.1 Phase 1: Configuration

Task | Files | Priority
Create gunicorn config file | apps/api/gunicorn_config.py | HIGH
Update Dockerfile CMD | apps/api/Dockerfile:78 | HIGH
Update Railway configs | railway.json, railway.worker.json | HIGH
Update database settings | brktickets/settings.py:315-329 | MEDIUM
Update cache settings | brktickets/settings.py:783-794 | MEDIUM
Test in staging | - | HIGH

4.2 Phase 2: Code Optimization

📄 See MEMORY_CODE_OPTIMIZATIONS.md for detailed implementation guide

Task | Files | Priority
Add GC to PDF generation | tickets/utils/pdf.py | HIGH
Optimize image processing | tickets/image_optimization.py | HIGH
Add Pillow memory limits | brktickets/settings.py | HIGH
Optimize admin queries | tickets/admin.py | HIGH
Optimize report generation | tickets/utils/reports.py | HIGH
Implement CSV streaming | tickets/utils/preshow.py | MEDIUM
Update CSV export view | tickets/views.py | MEDIUM
Add database indexes | tickets/models.py | MEDIUM
Testing & validation | - | HIGH

4.3 Phase 3: Monitoring & Validation

Task | Files | Priority
Add memory monitoring command | management/commands/check_memory.py | MEDIUM
Set up Railway metrics | - | HIGH
Load testing | - | HIGH
Documentation | README updates | MEDIUM

5. Risk Assessment & Mitigation

Risk | Probability | Impact | Mitigation
Reduced workers cause performance degradation | Medium | High | Monitor response times closely; threads compensate for worker reduction
Celery tasks time out under new limits | Low | Medium | Increase time-limit if needed; split large tasks
PDF generation fails with GC | Low | Low | GC is called after PDF creation, not during
Database query changes break functionality | Low | High | Comprehensive testing; verify prefetch results match old queries
Memory limits cause OOM crashes | Medium | High | Start with conservative limits; gradually reduce based on monitoring
Streaming CSV breaks existing integrations | Low | Medium | Keep the old endpoint; add a new streaming endpoint

6. Success Measurement

6.1 Key Metrics

Before Optimization (Baseline):

  • API service: ~500MB average, ~800MB peak
  • Celery worker: ~400MB average, ~600MB peak
  • OOM events: 2-3 per week

After Optimization (Targets):

  • API service: ~300MB average (~40% reduction)
  • Celery worker: ~200MB average (~50% reduction)
  • OOM events: 0 per month

6.2 Monitoring Plan

Week 1 Post-Deployment:

  • Daily memory usage checks
  • Response time monitoring
  • Error rate tracking

Week 2-4:

  • Weekly reviews
  • Gradual limit reduction if stable

Month 2-3:

  • Monthly performance reviews
  • Cost analysis (Railway billing)


7. Rollback Strategy

7.1 Configuration Rollback

  • Keep old configuration files as .backup
  • Railway allows instant rollback to previous deployment

7.2 Code Rollback

  • Create feature branch pique-530-memory-deployment
  • Deploy incrementally, can revert specific commits
  • Use feature flags for code changes if needed

8. File Change Summary

Documentation Files:

  • MEMORY_OPTIMIZATION_PLAN.md - This file (overall strategy)
  • MEMORY_CODE_OPTIMIZATIONS.md - Detailed code changes (Phase 2)

New Files (Phase 1 & 3):

  • apps/api/gunicorn_config.py - Gunicorn configuration
  • apps/api/tickets/management/commands/check_memory.py - Memory monitoring

Modified Files (Phase 1 - Configuration):

  • apps/api/Dockerfile - Update CMD to use gunicorn config
  • apps/api/railway.json - Add memory limits, update start command
  • apps/api/railway.worker.json - Add memory limits, update Celery command
  • apps/api/brktickets/settings.py - Database, cache, Pillow settings

9. Testing Strategy

9.1 Load Testing

  • Simulate 100 concurrent users (a load-test sketch follows this list)
  • Test PDF generation under load
  • Test CSV export with 5000+ attendees
  • Monitor memory usage during tests
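
For the load tests above, a minimal sketch using Locust (a Python load-testing tool; Artillery, used in the frontend plan, works just as well) is shown below. The endpoint paths are illustrative assumptions and need to be mapped onto the real API routes:

from locust import HttpUser, task, between

class ApiUser(HttpUser):
    wait_time = between(1, 3)

    @task(3)
    def list_shows(self):
        self.client.get("/api/shows/")  # hypothetical list endpoint

    @task(1)
    def export_csv(self):
        self.client.get("/api/shows/1/export/")  # hypothetical CSV export for a large show

    @task(1)
    def generate_pdf(self):
        self.client.get("/api/orders/1/pdf/")  # hypothetical PDF endpoint

Run with locust -f locustfile.py --headless --users 100 --spawn-rate 10 --host <staging URL> while watching memory in the Railway dashboard.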

9.2 Functional Testing

  • Verify all PDF generation endpoints work
  • Verify CSV exports produce correct data
  • Verify admin pages load correctly
  • Verify image uploads process correctly

9.3 Performance Testing

  • Measure baseline vs optimized response times
  • Measure memory usage over 24 hours
  • Test memory limits (intentionally breach to verify restart)

10. Quick Implementation Checklist

Phase 1 - Configuration:

  • Create gunicorn_config.py with optimized settings
  • Update Dockerfile CMD to use config file
  • Update railway.json with memory limits
  • Update railway.worker.json with optimized Celery settings
  • Update database CONN_MAX_AGE in settings
  • Update cache configuration
  • Deploy to staging environment
  • Monitor for 24 hours, verify stability

Phase 2 - Infrastructure:

  • Add memory limits to Railway deployments
  • Create Gunicorn configuration file
  • Add monitoring & alerts

Phase 3 - Monitoring:

  • Create memory monitoring management command
  • Set up Railway metrics dashboard


Document Status: Ready for Review and Approval
Next Steps: Present to team, get approval, begin Phase 1 implementation

Memory Optimization - Code Changes

Created: 2025-10-18
Status: Ready for Implementation
Priority: HIGH
Related Plan: MEMORY_OPTIMIZATION_PLAN.md


Overview

This document contains all code-level memory optimizations for the PiqueTickets API. These are Phase 2 optimizations that require code changes (as opposed to configuration changes).

Expected Overall Impact:

  • 30-50% reduction in memory consumption for key operations
  • 20-35% faster processing for PDFs and reports
  • Elimination of N+1 query patterns


1. PDF Generation Memory Optimization

Priority: HIGH
File: apps/api/tickets/utils/pdf.py
Impact: 20-30% reduction in PDF generation memory spikes

Problem

WeasyPrint loads entire HTML + renders PDF into memory simultaneously. No cleanup occurs after generation.

Current Implementation

def generate_ticket_pdf(order):
    html_string = render_to_string('tickets/ticket_pdf.html', context)
    pdf = HTML(string=html_string).write_pdf()  # ALL IN MEMORY
    return pdf

Optimized Implementation

import gc

def generate_ticket_pdf(order):
    html_string = render_to_string('tickets/ticket_pdf.html', context)
    pdf = HTML(string=html_string).write_pdf()

    # Force garbage collection after PDF generation
    del html_string
    gc.collect()

    return pdf

Changes Required

  1. Add import gc at top of file
  2. Add explicit cleanup (del html_string) after PDF generation
  3. Call gc.collect() to force garbage collection
  4. Verify all PDF generation goes through Celery async tasks (already implemented in tickets/task.py:181)

Testing

  • Generate PDF for small order (1-5 tickets)
  • Generate PDF for large order (50+ tickets)
  • Monitor memory usage during generation (a profiling sketch follows this list)
  • Verify PDF content is identical to pre-optimization
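
One way to put numbers on the memory-monitoring step is memory_profiler (listed under the optional testing dependencies in section 9). A sketch, assuming generate_ticket_pdf is importable from tickets/utils/pdf.py as shown above:

from memory_profiler import memory_usage

from tickets.utils.pdf import generate_ticket_pdf

def measure_pdf_memory(order):
    """Report the peak RSS increase while rendering one order's PDF."""
    baseline = memory_usage(-1, interval=0.1, timeout=1)[0]
    # memory_usage can execute a callable and sample RSS while it runs
    samples = memory_usage((generate_ticket_pdf, (order,), {}), interval=0.1)
    print(f"Peak spike: {max(samples) - baseline:.1f} MiB above baseline")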

2. Image Processing Optimization

Priority: HIGH
File: apps/api/tickets/image_optimization.py
Impact: 30-40% reduction in image processing memory

Problem

Creates multiple PIL Image objects during resize/conversion. RGBA→RGB conversion creates duplicates. All 4 responsive sizes (400, 800, 1200, 1600px) exist in memory simultaneously.

Current Implementation

def generate_responsive_images(image_path):
    img = Image.open(image_path)
    if img.mode == 'RGBA':
        img = img.convert('RGB')  # DUPLICATE CREATED

    for width in [400, 800, 1200, 1600]:
        resized = img.resize((width, height))  # ANOTHER COPY
        # Process...

Optimized Implementation

import gc
from PIL import Image

def generate_responsive_images(image_path):
    img = Image.open(image_path)

    # Convert once at the beginning, avoiding duplicate memory
    if img.mode == 'RGBA':
        background = Image.new('RGB', img.size, (255, 255, 255))
        background.paste(img, mask=img.split()[3])
        img.close()  # Close original
        img = background

    for width in [400, 800, 1200, 1600]:
        # Derive a proportional height so thumbnail() only constrains the width
        height = round(img.height * width / img.width)

        # Use copy() + thumbnail() for memory efficiency (thumbnail resizes in place)
        img_copy = img.copy()
        img_copy.thumbnail((width, height), Image.LANCZOS)

        # Save operations...
        # ... existing save code ...

        # Immediately close and delete
        img_copy.close()
        del img_copy

    img.close()
    gc.collect()

Additional Change: Set Pillow Memory Limits

File: apps/api/brktickets/settings.py

Add at the bottom of the file:

# Pillow Memory Limits
from PIL import Image
Image.MAX_IMAGE_PIXELS = 178956970  # cap at ~179M pixels to guard against decompression bombs

Changes Required

  1. Replace img.convert('RGB') with proper RGBA handling using background paste
  2. Add explicit img.close() calls after each image operation
  3. Use thumbnail() instead of resize() where possible (modifies in-place)
  4. Add gc.collect() after processing all sizes
  5. Add Pillow memory limit to settings.py

Testing

  • Upload RGBA PNG image
  • Upload RGB JPEG image
  • Upload very large image (4000x3000px)
  • Verify all 4 responsive sizes are generated correctly
  • Monitor memory usage during processing
  • Verify image quality is maintained

3. Database Query Optimization

Priority: HIGH
Impact: 40-50% reduction in query memory consumption

3.1 Admin List View Queries

File: apps/api/tickets/admin.py:575-582

Problem

No query optimization in admin list views, causing N+1 queries for related fields.

Current Implementation

def get_queryset(self, request):
    return super().get_queryset(request)

Optimized Implementation

def get_queryset(self, request):
    qs = super().get_queryset(request)
    return qs.select_related(
        'producer',
        'venue'
    ).prefetch_related(
        'tickets',
        'ticket_page_design'
    )

3.2 Admin Revenue Calculations

File: apps/api/tickets/admin.py:422-456

Problem

N queries for revenue calculations (one per row in admin list view).

Current Implementation

def show_gross_revenue(self, obj):
    attendees = TicketAttendee.objects.filter(show=obj)  # N queries
    return sum([attendee.total_paid for attendee in attendees])

Optimized Implementation

from django.db.models import Sum

def show_gross_revenue(self, obj):
    result = TicketAttendee.objects.filter(
        show=obj
    ).aggregate(total=Sum('total_paid'))
    return result['total'] or 0
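
If the goal is to also remove the per-row query (not just its memory), a further option is to annotate the revenue once in get_queryset and read the annotation in the display method. This is a sketch: the reverse relation name ticketattendee is an assumption and must match the actual model, and totals should be verified against the aggregate() version, since Sum annotations can double-count when other multi-valued joins are added to the same queryset.

from django.db.models import Sum

def get_queryset(self, request):
    qs = super().get_queryset(request)
    # One annotated column per row instead of one extra query per row
    return qs.annotate(gross_revenue=Sum("ticketattendee__total_paid"))

def show_gross_revenue(self, obj):
    return obj.gross_revenue or 0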

3.3 Report Generation

File: apps/api/tickets/utils/reports.py:19-121

Problem

Loads all orders into memory and follows an N+1 query pattern for tickets and ticket orders.

Current Implementation

def generate_show_report_data(show):
    orders = Order.objects.filter(show=show, success=True)  # ALL ORDERS
    tickets = show.tickets.all()  # ALL TICKETS

    for ticket in tickets:
        ticket_orders = TicketOrder.objects.filter(ticket=ticket)  # N+1

Optimized Implementation

def generate_show_report_data(show):
    # Optimize order query with prefetch
    orders = Order.objects.filter(
        show=show,
        success=True
    ).select_related('created_by').prefetch_related(
        'ticketorder_set__ticket'
    )

    # Optimize ticket query with prefetch
    tickets = show.tickets.prefetch_related(
        'ticketorder_set__order'
    )

    # Use prefetched data instead of additional queries
    # Modify existing loops to use .all() instead of .filter()
    for ticket in tickets:
        # Access via prefetched relationship
        ticket_orders = ticket.ticketorder_set.all()  # No additional query

Changes Required

  1. Add select_related() and prefetch_related() to all admin querysets
  2. Replace aggregation loops with .aggregate() database operations
  3. Replace N+1 query patterns with prefetch relationships
  4. Update report generation to use prefetched data

Testing

  • Load admin Show list view with 50+ shows
  • Check Django Debug Toolbar for query count (should reduce from 100+ to ~5)
  • Generate report for show with 1000+ orders
  • Verify revenue calculations are accurate
  • Verify report data matches pre-optimization output

4. CSV Export Streaming

Priority: MEDIUM
File: apps/api/tickets/utils/preshow.py:10-57
Impact: 60-70% reduction for large exports (1000+ attendees)

Problem

Loads all data into a StringIO buffer before any streaming begins. For large events, this can consume 50-100MB of memory.

Current Implementation

def generate_preshow_csv(show):
    output = io.StringIO()  # ALL IN MEMORY
    writer = csv.writer(output)

    attendees = TicketAttendee.objects.filter(
        show=show
    ).prefetch_related('orders__ticket')  # LOADS ALL

    for attendee in attendees:
        writer.writerow([...])

    return output.getvalue()  # RETURNS FULL STRING

Optimized Implementation

Step 1: Create Streaming Generator

def generate_preshow_csv_streaming(show):
    """Generator function for streaming CSV"""

    def esc(value):
        # Escape embedded double quotes so quoted CSV fields stay valid
        return str(value).replace('"', '""')

    # Yield header row
    yield 'Name,Email,Ticket Type,Check In Code,Purchased By\n'

    # Use iterator() to prevent loading all rows at once
    # (prefetch_related, as in the original function, since orders is a multi-valued relation)
    attendees = TicketAttendee.objects.filter(
        show=show
    ).prefetch_related(
        'orders__ticket'
    ).iterator(chunk_size=100)

    for attendee in attendees:
        # Build each CSV row manually so nothing is buffered
        yield (
            f'"{esc(attendee.name)}","{esc(attendee.email)}","{esc(attendee.ticket_type)}",'
            f'"{esc(attendee.check_in_code)}","{esc(attendee.purchased_by)}"\n'
        )

Step 2: Update View to Use StreamingHttpResponse

File: apps/api/tickets/views.py (or wherever the export view is)

from django.http import StreamingHttpResponse
from django.shortcuts import get_object_or_404

def export_preshow_csv(request, show_id):
    show = get_object_or_404(Show, id=show_id)

    response = StreamingHttpResponse(
        generate_preshow_csv_streaming(show),
        content_type='text/csv'
    )
    response['Content-Disposition'] = f'attachment; filename="preshow_{show_id}.csv"'
    return response

Changes Required

  1. Create new generator function generate_preshow_csv_streaming()
  2. Use .iterator(chunk_size=100) instead of loading all querysets
  3. Build CSV rows manually instead of using csv.writer (for true streaming)
  4. Update view to use StreamingHttpResponse
  5. Keep old function as fallback if needed

Testing

  • Export CSV for small event (10 attendees)
  • Export CSV for large event (1000+ attendees)
  • Verify CSV format is identical to pre-optimization
  • Monitor memory usage during export (should stay flat)
  • Test with different browsers (Chrome, Safari, Firefox)
  • Verify special characters are properly escaped

5. Implementation Checklist

Phase 2A - High Priority Code Changes:

  • Add gc.collect() to PDF generation functions (tickets/utils/pdf.py)
  • Optimize image processing with .close() and .thumbnail() (tickets/image_optimization.py)
  • Add Pillow MAX_IMAGE_PIXELS limit (brktickets/settings.py)
  • Add select_related/prefetch_related to admin queryset (tickets/admin.py:575-582)
  • Convert admin aggregations to use .aggregate() (tickets/admin.py:422-456)
  • Optimize report generation queries (tickets/utils/reports.py:19-121)

Phase 2B - Medium Priority Code Changes:

  • Implement streaming CSV generator (tickets/utils/preshow.py)
  • Update export view to use StreamingHttpResponse
  • Add database indexes if needed (check query performance; a sketch follows this list)
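
A sketch of what the index change could look like; the field choices below are illustrative and must be matched against real query patterns, and the change needs a makemigrations/migrate afterwards:

from django.db import models

class TicketAttendee(models.Model):
    # ... existing fields ...

    class Meta:
        indexes = [
            models.Index(fields=["check_in_code"]),  # check-in lookups
            models.Index(fields=["show", "email"]),  # per-show attendee filtering/search
        ]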

Testing & Validation:

  • Run comprehensive test suite
  • Run Django Debug Toolbar to verify query counts
  • Load test with production-like data
  • Monitor memory usage for 24 hours in staging
  • Verify all outputs match pre-optimization

6. File Change Summary

Files to Modify:

  1. apps/api/tickets/utils/pdf.py - Add garbage collection
  2. apps/api/tickets/image_optimization.py - Optimize image processing
  3. apps/api/brktickets/settings.py - Add Pillow memory limits
  4. apps/api/tickets/admin.py - Query optimization (multiple methods)
  5. apps/api/tickets/utils/reports.py - Query optimization
  6. apps/api/tickets/utils/preshow.py - Streaming CSV
  7. apps/api/tickets/views.py - Update CSV export view

New Imports Required:

import gc  # For explicit garbage collection
from django.db.models import Sum  # For aggregations
from django.http import StreamingHttpResponse  # For CSV streaming
from PIL import Image  # For memory limits (in settings.py)

7. Rollback Strategy

If Issues Occur:

  1. Each optimization is isolated to specific functions
  2. Can revert individual changes without affecting others
  3. Keep old implementations commented out during initial deployment
  4. Feature branch: pique-530-memory-deployment

Quick Rollback Commands:

# Revert specific file
git checkout HEAD -- apps/api/tickets/utils/pdf.py

# Revert entire Phase 2
git revert <commit-hash>

8. Performance Targets

Before Optimization (Current):

  • PDF generation: 3-5 seconds, 150-200MB memory spike
  • Image processing: 2-4 seconds, 100-150MB memory spike
  • Admin list view: 50-100 queries for 20 shows
  • Report generation: 8-12 seconds, 200-300MB memory
  • CSV export (1000 attendees): 5-8 seconds, 80-120MB memory

After Optimization (Expected):

  • PDF generation: 2-4 seconds, 100-140MB memory spike (30% reduction)
  • Image processing: 1.5-3 seconds, 60-90MB memory spike (40% reduction)
  • Admin list view: 5-10 queries for 20 shows (90% reduction)
  • Report generation: 5-8 seconds, 120-180MB memory (40% reduction)
  • CSV export (1000 attendees): 5-8 seconds, 20-30MB memory (75% reduction)

9. Dependencies

Required Packages (Already Installed):

  • pillow==11.0.0
  • weasyprint==63.1
  • django==5.1.4
  • djangorestframework==3.15.2

Optional for Testing:

  • django-debug-toolbar (already in dev requirements)
  • memory_profiler (for detailed profiling)

  • Main Plan: MEMORY_OPTIMIZATION_PLAN.md
  • Configuration Changes: See Phase 1 in main plan
  • Infrastructure Changes: See Phase 3 in main plan

Document Status: Ready for Implementation
Next Steps: Begin Phase 2A implementation, test each change individually

Memory Reduction Strategies for Next.js Frontend

Date: 2025-10-25
Context: Addressing actual memory footprint reduction vs. cache management
Related: NEXTJS_CACHE_MANAGEMENT_PLAN.md


Understanding the Problem

The cache management plan prevents unbounded growth but may not reduce existing memory footprint. This document outlines strategies that will actually free memory.


Strategy Comparison

Strategy | Memory Reduction | Complexity | Impact on Performance
Cache invalidation (revalidateTag) | ⚠️ Minimal | Low | Minimal
Hard cache limits (isrMemoryCacheSize) | ✅ Moderate | Low | Low (may increase cache misses)
Reduce cached routes | ✅ High | Medium | Medium (more server-side rendering)
Container restarts | ✅ Complete | Low | High (temporary downtime)
Disable caching | ✅ Complete | Low | Very High (all SSR)
Edge caching (CDN) | ✅ High | High | Low (may improve performance)
Optimize bundle size | ✅ Moderate | Medium | Positive (faster loads)
Use streaming SSR | ✅ Low | Medium | Positive (better UX)

1. Set Hard ISR Cache Limits

Impact: High reduction, low complexity

Update next.config.ts:

const nextConfig: NextConfig = {
  experimental: {
    // Existing optimizations
    optimizeCss: true,
    optimizePackageImports: ['react-icons'],
    webpackMemoryOptimizations: true,
    preloadEntriesOnStart: false,

    // Set strict memory limit for ISR cache
    isrMemoryCacheSize: 25 * 1024 * 1024, // 25MB (reduced from 50MB)

    // Limit the number of pages kept in memory
    isrFlushToDisk: true, // Flush to disk instead of keeping in memory
  },

  // Reduce memory for static generation
  generateBuildId: async () => {
    return 'piquetickets-' + Date.now()
  },
}

Expected Memory Reduction: 30-50MB


2. Reduce Cached Routes (Selective Static Generation)

Impact: Moderate reduction, moderate complexity

Current State: You're caching all pages with 30-second revalidation

Optimization: Only cache high-traffic routes, make others fully dynamic

// apps/frontend/src/app/shows/[slug]/page.tsx

// BEFORE: Caches every show page
export const revalidate = 30

// AFTER: Only cache popular shows, others are dynamic
export const revalidate = false // Make fully dynamic by default
export const dynamicParams = true

// Then selectively cache via fetch instead
export async function generateStaticParams() {
  // Only pre-generate top 50 shows
  const shows = await fetchShows()
  return shows.slice(0, 50).map(show => ({
    slug: show.slug
  }))
}

For Homepage:

// apps/frontend/src/app/page.tsx

// Consider removing revalidate entirely - make it SSR
// Only use cache at the fetch level, not page level
// export const revalidate = 30 // REMOVE THIS

export default async function Home() {
  // Fetch with shorter cache
  const [shows, producers, featuredShows, activeCities] = await Promise.all([
    fetchShows(), // This still has 30s cache at fetch level
    fetchProducers(),
    fetchFeaturedShows(),
    fetchActiveCities()
  ]);
  // ...
}

Expected Memory Reduction: 50-100MB (depends on number of cached routes)


3. Move to Fetch-Level Caching Only

Impact: High reduction, moderate complexity

Current: page-level ISR + fetch-level caching = double caching
Better: fetch-level caching only, with a shorter TTL

// Remove page-level revalidate from ALL pages
// Keep caching only in fetch requests

// apps/frontend/src/lib/api.ts
export async function fetchShow(slug: string) {
  const response = await fetch(
    `${process.env.NEXT_PUBLIC_API_URL}/shows/${slug}`,
    {
      next: {
        revalidate: 60, // 1 minute cache
        tags: ['shows', `show-${slug}`]
      }
    }
  )
  return response.json()
}

// For frequently changing data
export async function fetchShows() {
  const response = await fetch(
    `${process.env.NEXT_PUBLIC_API_URL}/shows`,
    {
      next: {
        revalidate: 30, // 30 second cache
        tags: ['shows', 'shows-list']
      },
      // Or use no cache for lists
      // cache: 'no-store'
    }
  )
  return response.json()
}

Expected Memory Reduction: 100-200MB


4. Implement CDN/Edge Caching ⭐⭐ (Best Long-term)

Impact: Very high reduction, high complexity

Strategy: Move caching from origin to edge (Cloudflare, Vercel Edge, etc.)

Option A: Railway + Cloudflare CDN

// next.config.ts
const nextConfig: NextConfig = {
  // ... existing config

  async headers() {
    return [
      {
        source: '/shows/:path*',
        headers: [
          {
            key: 'Cache-Control',
            // Cache at CDN for 5 minutes, stale for 1 hour
            value: 'public, s-maxage=300, stale-while-revalidate=3600',
          },
          {
            key: 'CDN-Cache-Control',
            // Cloudflare-specific: cache for 1 hour
            value: 'public, max-age=3600',
          },
        ],
      },
      {
        source: '/api/:path*',
        headers: [
          {
            key: 'Cache-Control',
            value: 'public, s-maxage=60, stale-while-revalidate=300',
          },
        ],
      },
    ]
  },
}

Then configure Cloudflare to cache HTML:

Cache Rule (Cloudflare Dashboard):
- Match: piquetickets.com/shows/*
- Cache Level: Standard
- Edge TTL: 1 hour
- Browser TTL: 5 minutes

Expected Memory Reduction: 200-400MB (most caching moves to edge)


Option B: Deploy to Vercel Edge Runtime

Move to Vercel for built-in edge caching:

// apps/frontend/src/app/shows/[slug]/page.tsx
export const runtime = 'edge' // Run on Vercel Edge

export const revalidate = 60 // Edge cache for 1 minute

export default async function ShowPage({ params }: Params) {
  // Runs on edge, minimal origin memory usage
  const show = await fetchShow((await params).slug)
  // ...
}

Expected Memory Reduction: 300-500MB (minimal origin caching)


5. Scheduled Container Restarts ⚠️ (Short-term Fix)

Impact: Complete memory reset, low complexity, high disruption

Configure Railway to restart containers daily:

# railway.json (JSON cannot contain comments; the scheduled restart itself is
# configured outside this file - see the dashboard option below)
{
  "deploy": {
    "restartPolicyType": "ON_FAILURE",
    "healthcheckPath": "/api/health",
    "healthcheckTimeout": 30
  }
}

Or use Railway's built-in scheduling:

  • Dashboard > Service > Settings > "Restart Service"
  • Schedule a daily restart at 4 AM (low traffic)

Expected Memory Reduction: 100% (full reset)
Downside: 10-30 seconds of downtime during each restart


6. Optimize Component Memory Usage

Impact: Moderate reduction, medium complexity

Current Issue: Dynamic imports are already in use, but they can be optimized further

// apps/frontend/src/app/shows/[slug]/page.tsx

// CURRENT: Good use of dynamic imports
const TicketCheckoutForm = dynamicImport(
  () => import('@/components/Forms/tickets'),
  { loading: () => <TicketFormSkeleton /> }
);

// OPTIMIZATION: Don't load until needed
const TicketCheckoutForm = dynamicImport(
  () => import('@/components/Forms/tickets'),
  {
    loading: () => <TicketFormSkeleton />,
    ssr: false, // Don't render server-side (saves server memory)
  }
);

// BETTER: Load only when user scrolls to tickets section
'use client'
import { lazy, Suspense } from 'react'
import { useInView } from 'react-intersection-observer'

const TicketCheckoutForm = lazy(() => import('@/components/Forms/tickets'))

function TicketSection({ show }) {
  const { ref, inView } = useInView({ triggerOnce: true })

  return (
    <div ref={ref}>
      {inView && (
        <Suspense fallback={<TicketFormSkeleton />}>
          <TicketCheckoutForm show_id={show.id} tickets={show.tickets} />
        </Suspense>
      )}
    </div>
  )
}

Expected Memory Reduction: 20-50MB


7. Reduce API Response Sizes

Impact: Low-moderate reduction, easy wins

Optimization: Fetch only needed fields from API

// apps/frontend/src/lib/api.ts

// BEFORE: Fetching full show objects
export async function fetchShows() {
  const response = await fetch(`${API_URL}/shows`, {
    next: { revalidate: 30, tags: ['shows'] }
  })
  return response.json()
}

// AFTER: Fetch minimal fields for list views
export async function fetchShows(fields?: string[]) {
  const params = new URLSearchParams()
  if (fields) {
    params.set('fields', fields.join(','))
  }

  const response = await fetch(`${API_URL}/shows?${params}`, {
    next: { revalidate: 30, tags: ['shows'] }
  })
  return response.json()
}

// Usage in homepage
const shows = await fetchShows([
  'id', 'slug', 'title', 'banner_image',
  'start_time', 'venue_name'
]) // Only fetch fields needed for cards

Expected Memory Reduction: 10-30MB (smaller cached responses)


8. Disable ISR Entirely (Nuclear Option)

Impact: Maximum reduction, all SSR, slower pages

// Remove all revalidate exports from pages
// Remove cache from fetch requests

// apps/frontend/src/lib/api.ts
export async function fetchShow(slug: string) {
  const response = await fetch(
    `${process.env.NEXT_PUBLIC_API_URL}/shows/${slug}`,
    {
      cache: 'no-store', // No caching
      // Remove: next: { revalidate: 30, tags: ['shows'] }
    }
  )
  return response.json()
}

Expected Memory Reduction: 400-600MB
Expected Performance Impact: pages 2-5x slower


Recommended Implementation Plan

Phase 1: Quick Wins (Week 1)

  1. ✅ Set isrMemoryCacheSize: 25MB
  2. ✅ Remove page-level revalidate from low-traffic pages
  3. ✅ Optimize API response sizes
  4. ✅ Add lazy loading for heavy components

Expected Reduction: 100-150MB

Phase 2: Caching Strategy (Week 2-3)

  1. ✅ Implement fetch-level only caching
  2. ✅ Selective static generation (top 50 shows only)
  3. ✅ Implement cache purge API (from previous plan)

Expected Reduction: 200-300MB total

Phase 3: Edge/CDN (Week 4-6)

  1. ✅ Set up Cloudflare CDN caching
  2. ✅ Configure cache rules for static assets
  3. ✅ Move HTML caching to edge
  4. ✅ Reduce origin cache TTL

Expected Reduction: 400-500MB total

Phase 4: Monitor & Optimize (Ongoing)

  1. ✅ Track memory usage in Railway
  2. ✅ A/B test cache strategies
  3. ✅ Consider Vercel Edge if needed

Testing Memory Reduction

Before Implementation

# SSH into Railway container
railway run bash

# Check current memory
node -e "console.log(process.memoryUsage())"

# Monitor over time
watch -n 60 "node -e 'console.log(process.memoryUsage())'"

After Each Phase

# Compare memory before/after
# Document in spreadsheet:
# - Phase | Baseline | After | Reduction | P95 Response Time

Load Testing

# Use Artillery or k6
npm install -g artillery

# Create test script
cat > load-test.yml <<EOF
config:
  target: "https://piquetickets.railway.app"
  phases:
    - duration: 300
      arrivalRate: 50
scenarios:
  - flow:
      - get:
          url: "/"
      - get:
          url: "/shows"
      - get:
          url: "/shows/{{ \$randomString() }}"
EOF

artillery run load-test.yml

Memory Monitoring Dashboard

Set up Railway metrics tracking:

// apps/frontend/src/app/api/metrics/route.ts
export async function GET() {
  const usage = process.memoryUsage()

  return Response.json({
    timestamp: new Date().toISOString(),
    memory: {
      rss: `${Math.round(usage.rss / 1024 / 1024)}MB`,
      heapTotal: `${Math.round(usage.heapTotal / 1024 / 1024)}MB`,
      heapUsed: `${Math.round(usage.heapUsed / 1024 / 1024)}MB`,
      external: `${Math.round(usage.external / 1024 / 1024)}MB`,
    },
    process: {
      uptime: `${Math.round(process.uptime() / 60)}min`,
      cpuUsage: process.cpuUsage(),
    }
  })
}

Then poll this endpoint and graph in Datadog/Grafana/Railway metrics.


Success Criteria

  • Memory usage < 200MB baseline (currently ~400-600MB?)
  • Memory growth < 10MB/day
  • P95 response time < 1 second
  • Cache hit rate > 70%
  • No OOM crashes for 30 days

Last Updated: 2025-10-25
Next Review: After Phase 1 implementation