
Archived: 2025-11-01
Reason: Completed - Memory optimizations implemented
Status: Consolidated from 3 separate memory optimization documents
Note: This document combines MEMORY_OPTIMIZATION_PLAN.md, MEMORY_CODE_OPTIMIZATIONS.md, and MEMORY_REDUCTION_STRATEGIES.md

Memory Footprint Optimization Plan - PiqueTickets API Deployment

Created: 2025-10-17
Status: Planning Phase
Priority: HIGH


1. Executive Summary & Feature Overview

1.1 Feature Description

  • Feature Name: API Deployment Memory Footprint Reduction
  • Feature Type: Performance Optimization / Infrastructure Enhancement
  • Priority Level: High

1.2 Problem Statement

Current State:

  • Django API deployed on Railway with 4 separate services (API, Celery Worker, Celery Beat, Flower)
  • Gunicorn configured with 4 workers using the sync worker class
  • Celery worker with concurrency=4, prefetch-multiplier=4
  • WeasyPrint PDF generation loads entire documents into memory
  • Pillow image processing creates multiple copies during transformations
  • No memory limits defined in deployment configuration
  • Database queries loading full datasets without pagination

Pain Points:

  • High memory consumption per container/service
  • Potential OOM (Out of Memory) kills during peak load
  • Inefficient resource utilization on the Railway platform
  • Memory leaks from long-running processes
  • Duplicate image objects during format conversions
  • N+1 query patterns in the admin interface
  • Full dataset loads for reports and CSV exports

User Impact:

  • Increased hosting costs due to higher memory allocation
  • Potential service disruptions during memory spikes
  • Slower response times when memory pressure is high
  • Failed PDF/report generation for large events

Business Value:

  • Reduced infrastructure costs
  • Improved service reliability and uptime
  • Better scalability for growing event sizes
  • Faster response times across all services

1.3 Expected Outcomes

Success Metrics:

  • Reduce API service memory usage from ~500MB to ~300MB baseline
  • Reduce Celery worker memory from ~400MB to ~200MB per worker
  • Eliminate memory-related crashes (target: 0 OOM events per month)
  • Reduce average response time for PDF generation by 20-30%
  • Reduce database query memory consumption by 40-50%

Technical Goals:

  • Implement streaming responses for large data exports
  • Add memory limits and resource constraints to all services
  • Optimize Gunicorn and Celery configurations
  • Implement lazy loading and pagination for database queries
  • Add garbage collection after memory-intensive operations
  • Convert synchronous PDF generation to async background tasks


2. Current State Analysis

2.1 Deployment Architecture

Services Running:

  1. API Service (apps/api/Dockerfile)
     • Gunicorn with 4 sync workers
     • No memory limits defined
     • Current config: apps/api/railway.json:10
  2. Celery Worker (apps/api/Dockerfile)
     • Concurrency: 4 workers
     • Prefetch multiplier: 4
     • Max tasks per child: 100
     • Current config: apps/api/railway.worker.json:10
  3. Celery Beat (apps/api/Dockerfile)
     • Single scheduler process
     • Database-backed scheduler
     • Current config: apps/api/railway.beats.json:10
  4. Flower (apps/api/Dockerfile)
     • Monitoring dashboard
     • Current config: apps/api/railway.flower.json:10

Docker Configuration:

  • Base image: python:3.12-slim (apps/api/Dockerfile:2)
  • System dependencies include: cairo, pango, gdk-pixbuf (for WeasyPrint)
  • Installs 40 Python packages (apps/api/requirements.txt)

2.2 Memory-Intensive Operations (Identified)

CRITICAL Issues:

  1. PDF Generation - Highest Impact
     • Files: tickets/utils/pdf.py, tickets/utils/reports.py
     • WeasyPrint loads entire HTML + renders to memory
     • No streaming support
     • Generates ticket PDFs synchronously in Celery tasks
  2. Image Processing - High Impact
     • File: tickets/image_optimization.py
     • Creates 4 responsive sizes per image (400, 800, 1200, 1600px)
     • Multiple PIL Image objects in memory simultaneously
     • RGBA→RGB conversion creates duplicates
     • Triggered on every Show save (models.py:403)
  3. Report Generation - High Impact
     • File: tickets/utils/reports.py:19-121
     • Loads all orders for a show into memory
     • N+1 query pattern for tickets and ticket orders
     • Builds large dictionaries before serialization
  4. CSV Exports - Medium Impact
     • Files: tickets/utils/csv_export.py, tickets/utils/preshow.py
     • Prefetch loads all relations before streaming
     • Uses StringIO for the entire CSV in memory

MEDIUM Issues:

  1. Admin List Views - Medium Impact
     • File: tickets/admin.py:422-456
     • N queries for revenue calculations (one per row)
     • No select_related/prefetch_related optimization
  2. Database Query Patterns - Medium Impact
     • Check-in view loads all attendees (admin.py:816)
     • Order queries without pagination
     • Missing database indexes on frequently queried fields

2.3 Current Dependencies

Core Memory-Intensive Packages:

weasyprint==63.1           # PDF generation (HIGH memory)
pillow==11.0.0             # Image processing (MEDIUM memory)
gunicorn==23.0.0           # Application server
celery==5.4.0              # Task queue
redis==5.2.1               # Cache/broker
boto3==1.36.20             # AWS S3 (if enabled)
djangorestframework==3.15.2

System Dependencies (from Dockerfile):

  • libcairo2, libpango-1.0-0 (WeasyPrint dependencies)
  • libjpeg62-turbo, libwebp-dev (image processing)


3. Solution Design

3.1 Optimization Strategy (Phased Approach)

Phase 1: Quick Wins - Configuration Optimization

Low-risk, immediate-impact changes to deployment configuration.

Phase 2: Code Optimization

Medium-risk, targeted code changes to reduce memory consumption.

📄 Detailed Code Changes: See MEMORY_CODE_OPTIMIZATIONS.md

Phase 3: Architectural Improvements

Higher-complexity changes requiring more extensive testing.

3.2 Detailed Recommendations


PHASE 1: Configuration Optimization

1.1 Gunicorn Worker Configuration

Current Configuration (apps/api/Dockerfile:78):

gunicorn brktickets.wsgi:application \
  --bind 0.0.0.0:8000 \
  --workers 4 \
  --worker-class sync \
  --worker-connections 1000 \
  --max-requests 1000 \
  --max-requests-jitter 100 \
  --timeout 30 \
  --keep-alive 5 \
  --preload

Optimized Configuration:

gunicorn brktickets.wsgi:application \
  --bind 0.0.0.0:8000 \
  --workers 2 \
  --threads 4 \
  --worker-class gthread \
  --worker-connections 1000 \
  --max-requests 500 \
  --max-requests-jitter 50 \
  --timeout 60 \
  --keep-alive 5 \
  --worker-tmp-dir /dev/shm \
  --preload

Changes & Rationale:

  • ✅ Reduce workers from 4→2 and add 4 threads each = 8 concurrent requests (same throughput)
  • ✅ Switch to the gthread worker class (threads share memory, workers don't)
  • ✅ Reduce max-requests to 500 (faster memory recycling)
  • ✅ Add --worker-tmp-dir /dev/shm (use RAM for tmp files, faster)
  • ✅ Increase timeout to 60s (accommodate PDF generation)

Expected Impact: ~40% memory reduction (2 workers vs 4)


1.2 Celery Worker Configuration

Current Configuration (apps/api/railway.worker.json:10):

celery -A brktickets worker -l INFO -E \
  --max-tasks-per-child=100 \
  --concurrency=4 \
  --prefetch-multiplier=4

Optimized Configuration:

celery -A brktickets worker -l INFO -E \
  --max-tasks-per-child=50 \
  --concurrency=2 \
  --prefetch-multiplier=1 \
  --max-memory-per-child=300000 \
  --time-limit=300 \
  --soft-time-limit=270

Changes & Rationale:

  • ✅ Reduce concurrency from 4→2 (fewer processes in memory)
  • ✅ Set prefetch-multiplier to 1 (prevents queue hoarding, better for long tasks)
  • ✅ Reduce max-tasks-per-child to 50 (more frequent process recycling)
  • ✅ Add --max-memory-per-child=300000 (300MB limit, auto-restart on breach)
  • ✅ Add time limits (prevent runaway tasks)

Expected Impact: ~50% memory reduction per worker


1.3 Database Connection Pooling

Current Configuration (apps/api/brktickets/settings.py:315-329):

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        ...
        "CONN_MAX_AGE": 600,
        "CONN_HEALTH_CHECKS": True,
    }
}

Optimized Configuration:

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        ...
        "CONN_MAX_AGE": 60,  # Reduced from 600
        "CONN_HEALTH_CHECKS": True,
        "OPTIONS": {
            "connect_timeout": 10,
            "options": "-c statement_timeout=30000",  # 30s query timeout
        },
    }
}

Changes & Rationale:

  • ✅ Reduce CONN_MAX_AGE to 60s (close idle connections faster)
  • ✅ Add statement_timeout (prevent long-running queries)

Expected Impact: Reduced memory from idle connections


1.4 Cache Configuration

Current Configuration (apps/api/brktickets/settings.py:783-794):

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": os.getenv("REDIS_URL", "redis://localhost:6379/0"),
        "KEY_PREFIX": "piquetickets",
        "TIMEOUT": 300,
    }
}

Optimized Configuration:

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": os.getenv("REDIS_URL", "redis://localhost:6379/0"),
        "OPTIONS": {
            "socket_timeout": 5,
            "retry_on_timeout": True,
            "socket_connect_timeout": 5,
            "max_connections": 50,  # Limit connection pool
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
        "KEY_PREFIX": "piquetickets",
        "TIMEOUT": 300,
    }
}

Changes & Rationale:

  • ✅ Limit the Redis connection pool to 50
  • ✅ Add connection timeouts

Expected Impact: Reduced memory from Redis connection pool


PHASE 2: Infrastructure & Architecture

2.1 Add Memory Limits to Railway Deployments

Update Railway Configuration Files:

File: apps/api/railway.json

{
  "$schema": "https://railway.com/railway.schema.json",
  "build": {
    "builder": "DOCKERFILE",
    "dockerfilePath": "apps/api/Dockerfile"
  },
  "deploy": {
    "runtime": "V2",
    "numReplicas": 1,
    "startCommand": "sh -c 'python manage.py migrate && python manage.py collectstatic --noinput && gunicorn brktickets.wsgi --config gunicorn_config.py'",
    "sleepApplication": false,
    "healthcheckPath": "/health",
    "healthcheckTimeout": 180,
    "restartPolicyType": "ON_FAILURE",
    "restartPolicyMaxRetries": 10,
    "memoryLimit": 512,
    "cpuLimit": 1.0
  }
}

File: apps/api/railway.worker.json

{
  "deploy": {
    "memoryLimit": 384,
    "cpuLimit": 0.5
  }
}

2.2 Create Gunicorn Configuration File

Create: apps/api/gunicorn_config.py

import multiprocessing
import os

# Server socket
bind = "0.0.0.0:8000"
backlog = 2048

# Worker processes
workers = 2
worker_class = 'gthread'
threads = 4
worker_connections = 1000
max_requests = 500
max_requests_jitter = 50
timeout = 60
keepalive = 5
graceful_timeout = 30

# Memory management
worker_tmp_dir = '/dev/shm'
preload_app = True

# Logging
accesslog = '-'
errorlog = '-'
loglevel = 'info'
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'

# Process naming
proc_name = 'piquetickets-api'

# Server hooks
def on_starting(server):
    """Called just before the master process is initialized."""
    pass

def when_ready(server):
    """Called just after the server is started."""
    server.log.info("Server is ready. Spawning workers")

def post_worker_init(worker):
    """Called just after a worker has been initialized."""
    worker.log.info(f"Worker spawned (pid: {worker.pid})")

def pre_fork(server, worker):
    """Called just prior to forking the worker subprocess."""
    pass

def post_fork(server, worker):
    """Called just after a worker has been forked."""
    pass

def worker_exit(server, worker):
    """Called just after a worker has been exited."""
    worker.log.info(f"Worker exited (pid: {worker.pid})")

2.3 Add Monitoring & Alerts

Create: apps/api/tickets/management/commands/check_memory.py

import psutil
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = 'Check memory usage and log warnings'

    def handle(self, *args, **options):
        process = psutil.Process()
        memory_info = process.memory_info()
        memory_mb = memory_info.rss / 1024 / 1024

        self.stdout.write(f"Memory usage: {memory_mb:.2f} MB")

        if memory_mb > 400:
            self.stdout.write(self.style.WARNING(
                f"High memory usage detected: {memory_mb:.2f} MB"
            ))
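
The command can be run ad hoc with python manage.py check_memory. To make it a recurring check, one option is to invoke it from a periodic Celery task registered with the existing database-backed beat scheduler. The sketch below is illustrative only - the task name and schedule are not part of the current codebase:

from celery import shared_task
from django.core.management import call_command

@shared_task
def check_memory_periodic():
    """Hypothetical periodic task: run the check_memory command on a schedule."""
    call_command("check_memory")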


4. Implementation Plan

4.1 Phase 1: Configuration

Task | Files | Priority
Create gunicorn config file | apps/api/gunicorn_config.py | HIGH
Update Dockerfile CMD | apps/api/Dockerfile:78 | HIGH
Update Railway configs | railway.json, railway.worker.json | HIGH
Update database settings | brktickets/settings.py:315-329 | MEDIUM
Update cache settings | brktickets/settings.py:783-794 | MEDIUM
Test in staging | - | HIGH

4.2 Phase 2: Code Optimization

📄 See MEMORY_CODE_OPTIMIZATIONS.md for detailed implementation guide

Task | Files | Priority
Add GC to PDF generation | tickets/utils/pdf.py | HIGH
Optimize image processing | tickets/image_optimization.py | HIGH
Add Pillow memory limits | brktickets/settings.py | HIGH
Optimize admin queries | tickets/admin.py | HIGH
Optimize report generation | tickets/utils/reports.py | HIGH
Implement CSV streaming | tickets/utils/preshow.py | MEDIUM
Update CSV export view | tickets/views.py | MEDIUM
Add database indexes | tickets/models.py | MEDIUM
Testing & validation | - | HIGH

4.3 Phase 3: Monitoring & Validation

Task | Files | Priority
Add memory monitoring command | management/commands/check_memory.py | MEDIUM
Set up Railway metrics | - | HIGH
Load testing | - | HIGH
Documentation | README updates | MEDIUM

5. Risk Assessment & Mitigation

Risk | Probability | Impact | Mitigation
Reduced workers cause performance degradation | Medium | High | Monitor response times closely; threads compensate for worker reduction
Celery tasks time out under new limits | Low | Medium | Increase time-limit if needed; split large tasks
PDF generation fails with GC | Low | Low | GC is called after PDF creation, not during
Database query changes break functionality | Low | High | Comprehensive testing; verify prefetch results match old queries
Memory limits cause OOM crashes | Medium | High | Start with conservative limits; gradually reduce based on monitoring
Streaming CSV breaks existing integrations | Low | Medium | Keep the old endpoint; add a new streaming endpoint

6. Success Measurement

6.1 Key Metrics

Before Optimization (Baseline):

  • API service: ~500MB average, ~800MB peak
  • Celery worker: ~400MB average, ~600MB peak
  • OOM events: 2-3 per week

After Optimization (Targets):

  • API service: ~300MB average (~40% reduction)
  • Celery worker: ~200MB average (~50% reduction)
  • OOM events: 0 per month

6.2 Monitoring Plan

Week 1 Post-Deployment:

  • Daily memory usage checks
  • Response time monitoring
  • Error rate tracking

Week 2-4:

  • Weekly reviews
  • Gradual limit reduction if stable

Month 2-3:

  • Monthly performance reviews
  • Cost analysis (Railway billing)


7. Rollback Strategy

7.1 Configuration Rollback

  • Keep old configuration files as .backup
  • Railway allows instant rollback to previous deployment

7.2 Code Rollback

  • Create feature branch pique-530-memory-deployment
  • Deploy incrementally, can revert specific commits
  • Use feature flags for code changes if needed

8. File Change Summary

Documentation Files:

  • MEMORY_OPTIMIZATION_PLAN.md - This file (overall strategy)
  • MEMORY_CODE_OPTIMIZATIONS.md - Detailed code changes (Phase 2)

New Files (Phase 1 & 3):

  • apps/api/gunicorn_config.py - Gunicorn configuration
  • apps/api/tickets/management/commands/check_memory.py - Memory monitoring

Modified Files (Phase 1 - Configuration):

  • apps/api/Dockerfile - Update CMD to use gunicorn config
  • apps/api/railway.json - Add memory limits, update start command
  • apps/api/railway.worker.json - Add memory limits, update Celery command
  • apps/api/brktickets/settings.py - Database, cache, Pillow settings

9. Testing Strategy

9.1 Load Testing

  • Simulate 100 concurrent users (a load-test sketch follows this list)
  • Test PDF generation under load
  • Test CSV export with 5000+ attendees
  • Monitor memory usage during tests
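
For the load tests above, a minimal sketch using Locust (a Python load-testing tool; Artillery, used in the frontend plan, works just as well) is shown below. The endpoint paths are illustrative assumptions and need to be mapped onto the real API routes:

from locust import HttpUser, task, between

class ApiUser(HttpUser):
    wait_time = between(1, 3)

    @task(3)
    def list_shows(self):
        self.client.get("/api/shows/")  # hypothetical list endpoint

    @task(1)
    def export_csv(self):
        self.client.get("/api/shows/1/export/")  # hypothetical CSV export for a large show

    @task(1)
    def generate_pdf(self):
        self.client.get("/api/orders/1/pdf/")  # hypothetical PDF endpoint

Run with locust -f locustfile.py --headless --users 100 --spawn-rate 10 --host <staging URL> while watching memory in the Railway dashboard.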

9.2 Functional Testing

  • Verify all PDF generation endpoints work
  • Verify CSV exports produce correct data
  • Verify admin pages load correctly
  • Verify image uploads process correctly

9.3 Performance Testing

  • Measure baseline vs optimized response times
  • Measure memory usage over 24 hours
  • Test memory limits (intentionally breach to verify restart)

10. Quick Implementation Checklist

Phase 1 - Configuration:

  • Create gunicorn_config.py with optimized settings
  • Update Dockerfile CMD to use config file
  • Update railway.json with memory limits
  • Update railway.worker.json with optimized Celery settings
  • Update database CONN_MAX_AGE in settings
  • Update cache configuration
  • Deploy to staging environment
  • Monitor for 24 hours, verify stability

Phase 2 - Infrastructure:

  • Add memory limits to Railway deployments
  • Create Gunicorn configuration file
  • Add monitoring & alerts

Phase 3 - Monitoring:

  • Create memory monitoring management command
  • Set up Railway metrics dashboard


Document Status: Ready for Review and Approval
Next Steps: Present to team, get approval, begin Phase 1 implementation

Memory Optimization - Code Changes

Created: 2025-10-18
Status: Ready for Implementation
Priority: HIGH
Related Plan: MEMORY_OPTIMIZATION_PLAN.md


Overview

This document contains all code-level memory optimizations for the PiqueTickets API. These are Phase 2 optimizations that require code changes (as opposed to configuration changes).

Expected Overall Impact:

  • 30-50% reduction in memory consumption for key operations
  • 20-35% faster processing for PDFs and reports
  • Elimination of N+1 query patterns


1. PDF Generation Memory Optimization

Priority: HIGH
File: apps/api/tickets/utils/pdf.py
Impact: 20-30% reduction in PDF generation memory spikes

Problem

WeasyPrint loads entire HTML + renders PDF into memory simultaneously. No cleanup occurs after generation.

Current Implementation

def generate_ticket_pdf(order):
    html_string = render_to_string('tickets/ticket_pdf.html', context)
    pdf = HTML(string=html_string).write_pdf()  # ALL IN MEMORY
    return pdf

Optimized Implementation

import gc

def generate_ticket_pdf(order):
    html_string = render_to_string('tickets/ticket_pdf.html', context)
    pdf = HTML(string=html_string).write_pdf()

    # Force garbage collection after PDF generation
    del html_string
    gc.collect()

    return pdf

Changes Required

  1. Add import gc at top of file
  2. Add explicit cleanup (del html_string) after PDF generation
  3. Call gc.collect() to force garbage collection
  4. Verify all PDF generation goes through Celery async tasks (already implemented in tickets/task.py:181)

Testing

  • Generate PDF for small order (1-5 tickets)
  • Generate PDF for large order (50+ tickets)
  • Monitor memory usage during generation (a profiling sketch follows this list)
  • Verify PDF content is identical to pre-optimization
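
One way to put numbers on the memory-monitoring step is memory_profiler (listed under the optional testing dependencies in section 9). A sketch, assuming generate_ticket_pdf is importable from tickets/utils/pdf.py as shown above:

from memory_profiler import memory_usage

from tickets.utils.pdf import generate_ticket_pdf

def measure_pdf_memory(order):
    """Report the peak RSS increase while rendering one order's PDF."""
    baseline = memory_usage(-1, interval=0.1, timeout=1)[0]
    # memory_usage can execute a callable and sample RSS while it runs
    samples = memory_usage((generate_ticket_pdf, (order,), {}), interval=0.1)
    print(f"Peak spike: {max(samples) - baseline:.1f} MiB above baseline")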

2. Image Processing Optimization

Priority: HIGH
File: apps/api/tickets/image_optimization.py
Impact: 30-40% reduction in image processing memory

Problem

Creates multiple PIL Image objects during resize/conversion. RGBA→RGB conversion creates duplicates. All 4 responsive sizes (400, 800, 1200, 1600px) exist in memory simultaneously.

Current Implementation

def generate_responsive_images(image_path):
    img = Image.open(image_path)
    if img.mode == 'RGBA':
        img = img.convert('RGB')  # DUPLICATE CREATED

    for width in [400, 800, 1200, 1600]:
        resized = img.resize((width, height))  # ANOTHER COPY
        # Process...

Optimized Implementation

import gc
from PIL import Image

def generate_responsive_images(image_path):
    img = Image.open(image_path)

    # Convert once at the beginning, avoiding duplicate memory
    if img.mode == 'RGBA':
        background = Image.new('RGB', img.size, (255, 255, 255))
        background.paste(img, mask=img.split()[3])
        img.close()  # Close original
        img = background

    for width in [400, 800, 1200, 1600]:
        # Derive a proportional height so thumbnail() only constrains the width
        height = round(img.height * width / img.width)

        # Use copy() + thumbnail() for memory efficiency (thumbnail resizes in place)
        img_copy = img.copy()
        img_copy.thumbnail((width, height), Image.LANCZOS)

        # Save operations...
        # ... existing save code ...

        # Immediately close and delete
        img_copy.close()
        del img_copy

    img.close()
    gc.collect()

Additional Change: Set Pillow Memory Limits

File: apps/api/brktickets/settings.py

Add at the bottom of the file:

# Pillow Memory Limits
from PIL import Image
Image.MAX_IMAGE_PIXELS = 178956970  # cap at ~179M pixels to guard against decompression bombs

Changes Required

  1. Replace img.convert('RGB') with proper RGBA handling using background paste
  2. Add explicit img.close() calls after each image operation
  3. Use thumbnail() instead of resize() where possible (modifies in-place)
  4. Add gc.collect() after processing all sizes
  5. Add Pillow memory limit to settings.py

Testing

  • Upload RGBA PNG image
  • Upload RGB JPEG image
  • Upload very large image (4000x3000px)
  • Verify all 4 responsive sizes are generated correctly
  • Monitor memory usage during processing
  • Verify image quality is maintained

3. Database Query Optimization

Priority: HIGH
Impact: 40-50% reduction in query memory consumption

3.1 Admin List View Queries

File: apps/api/tickets/admin.py:575-582

Problem

No query optimization in admin list views, causing N+1 queries for related fields.

Current Implementation

def get_queryset(self, request):
    return super().get_queryset(request)

Optimized Implementation

def get_queryset(self, request):
    qs = super().get_queryset(request)
    return qs.select_related(
        'producer',
        'venue'
    ).prefetch_related(
        'tickets',
        'ticket_page_design'
    )

3.2 Admin Revenue Calculations

File: apps/api/tickets/admin.py:422-456

Problem

N queries for revenue calculations (one per row in admin list view).

Current Implementation

def show_gross_revenue(self, obj):
    attendees = TicketAttendee.objects.filter(show=obj)  # N queries
    return sum([attendee.total_paid for attendee in attendees])

Optimized Implementation

from django.db.models import Sum

def show_gross_revenue(self, obj):
    result = TicketAttendee.objects.filter(
        show=obj
    ).aggregate(total=Sum('total_paid'))
    return result['total'] or 0
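
If the goal is to also remove the per-row query (not just its memory), a further option is to annotate the revenue once in get_queryset and read the annotation in the display method. This is a sketch: the reverse relation name ticketattendee is an assumption and must match the actual model, and totals should be verified against the aggregate() version, since Sum annotations can double-count when other multi-valued joins are added to the same queryset.

from django.db.models import Sum

def get_queryset(self, request):
    qs = super().get_queryset(request)
    # One annotated column per row instead of one extra query per row
    return qs.annotate(gross_revenue=Sum("ticketattendee__total_paid"))

def show_gross_revenue(self, obj):
    return obj.gross_revenue or 0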

3.3 Report Generation

File: apps/api/tickets/utils/reports.py:19-121

Problem

Loads all orders into memory and follows an N+1 query pattern for tickets and ticket orders.

Current Implementation

def generate_show_report_data(show):
    orders = Order.objects.filter(show=show, success=True)  # ALL ORDERS
    tickets = show.tickets.all()  # ALL TICKETS

    for ticket in tickets:
        ticket_orders = TicketOrder.objects.filter(ticket=ticket)  # N+1

Optimized Implementation

def generate_show_report_data(show):
    # Optimize order query with prefetch
    orders = Order.objects.filter(
        show=show,
        success=True
    ).select_related('created_by').prefetch_related(
        'ticketorder_set__ticket'
    )

    # Optimize ticket query with prefetch
    tickets = show.tickets.prefetch_related(
        'ticketorder_set__order'
    )

    # Use prefetched data instead of additional queries
    # Modify existing loops to use .all() instead of .filter()
    for ticket in tickets:
        # Access via prefetched relationship
        ticket_orders = ticket.ticketorder_set.all()  # No additional query

Changes Required

  1. Add select_related() and prefetch_related() to all admin querysets
  2. Replace aggregation loops with .aggregate() database operations
  3. Replace N+1 query patterns with prefetch relationships
  4. Update report generation to use prefetched data

Testing

  • Load admin Show list view with 50+ shows
  • Check Django Debug Toolbar for query count (should reduce from 100+ to ~5)
  • Generate report for show with 1000+ orders
  • Verify revenue calculations are accurate
  • Verify report data matches pre-optimization output

4. CSV Export Streaming

Priority: MEDIUM
File: apps/api/tickets/utils/preshow.py:10-57
Impact: 60-70% reduction for large exports (1000+ attendees)

Problem

Loads all data into a StringIO buffer before any streaming begins. For large events, this can consume 50-100MB of memory.

Current Implementation

def generate_preshow_csv(show):
    output = io.StringIO()  # ALL IN MEMORY
    writer = csv.writer(output)

    attendees = TicketAttendee.objects.filter(
        show=show
    ).prefetch_related('orders__ticket')  # LOADS ALL

    for attendee in attendees:
        writer.writerow([...])

    return output.getvalue()  # RETURNS FULL STRING

Optimized Implementation

Step 1: Create Streaming Generator

def generate_preshow_csv_streaming(show):
    """Generator function for streaming CSV"""

    def esc(value):
        # Escape embedded double quotes so quoted CSV fields stay valid
        return str(value).replace('"', '""')

    # Yield header row
    yield 'Name,Email,Ticket Type,Check In Code,Purchased By\n'

    # Use iterator() to prevent loading all rows at once
    # (prefetch_related, as in the original function, since orders is a multi-valued relation)
    attendees = TicketAttendee.objects.filter(
        show=show
    ).prefetch_related(
        'orders__ticket'
    ).iterator(chunk_size=100)

    for attendee in attendees:
        # Build each CSV row manually so nothing is buffered
        yield (
            f'"{esc(attendee.name)}","{esc(attendee.email)}","{esc(attendee.ticket_type)}",'
            f'"{esc(attendee.check_in_code)}","{esc(attendee.purchased_by)}"\n'
        )

Step 2: Update View to Use StreamingHttpResponse

File: apps/api/tickets/views.py (or wherever the export view is)

from django.http import StreamingHttpResponse
from django.shortcuts import get_object_or_404

def export_preshow_csv(request, show_id):
    show = get_object_or_404(Show, id=show_id)

    response = StreamingHttpResponse(
        generate_preshow_csv_streaming(show),
        content_type='text/csv'
    )
    response['Content-Disposition'] = f'attachment; filename="preshow_{show_id}.csv"'
    return response

Changes Required

  1. Create new generator function generate_preshow_csv_streaming()
  2. Use .iterator(chunk_size=100) instead of loading all querysets
  3. Build CSV rows manually instead of using csv.writer (for true streaming)
  4. Update view to use StreamingHttpResponse
  5. Keep old function as fallback if needed

Testing

  • Export CSV for small event (10 attendees)
  • Export CSV for large event (1000+ attendees)
  • Verify CSV format is identical to pre-optimization
  • Monitor memory usage during export (should stay flat)
  • Test with different browsers (Chrome, Safari, Firefox)
  • Verify special characters are properly escaped

5. Implementation Checklist

Phase 2A - High Priority Code Changes:

  • Add gc.collect() to PDF generation functions (tickets/utils/pdf.py)
  • Optimize image processing with .close() and .thumbnail() (tickets/image_optimization.py)
  • Add Pillow MAX_IMAGE_PIXELS limit (brktickets/settings.py)
  • Add select_related/prefetch_related to admin queryset (tickets/admin.py:575-582)
  • Convert admin aggregations to use .aggregate() (tickets/admin.py:422-456)
  • Optimize report generation queries (tickets/utils/reports.py:19-121)

Phase 2B - Medium Priority Code Changes:

  • Implement streaming CSV generator (tickets/utils/preshow.py)
  • Update export view to use StreamingHttpResponse
  • Add database indexes if needed (check query performance; a sketch follows this list)
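
A sketch of what the index change could look like; the field choices below are illustrative and must be matched against real query patterns, and the change needs a makemigrations/migrate afterwards:

from django.db import models

class TicketAttendee(models.Model):
    # ... existing fields ...

    class Meta:
        indexes = [
            models.Index(fields=["check_in_code"]),  # check-in lookups
            models.Index(fields=["show", "email"]),  # per-show attendee filtering/search
        ]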

Testing & Validation:

  • Run comprehensive test suite
  • Run Django Debug Toolbar to verify query counts
  • Load test with production-like data
  • Monitor memory usage for 24 hours in staging
  • Verify all outputs match pre-optimization

6. File Change Summary

Files to Modify:

  1. apps/api/tickets/utils/pdf.py - Add garbage collection
  2. apps/api/tickets/image_optimization.py - Optimize image processing
  3. apps/api/brktickets/settings.py - Add Pillow memory limits
  4. apps/api/tickets/admin.py - Query optimization (multiple methods)
  5. apps/api/tickets/utils/reports.py - Query optimization
  6. apps/api/tickets/utils/preshow.py - Streaming CSV
  7. apps/api/tickets/views.py - Update CSV export view

New Imports Required:

import gc  # For explicit garbage collection
from django.db.models import Sum  # For aggregations
from django.http import StreamingHttpResponse  # For CSV streaming
from PIL import Image  # For memory limits (in settings.py)

7. Rollback Strategy

If Issues Occur:

  1. Each optimization is isolated to specific functions
  2. Can revert individual changes without affecting others
  3. Keep old implementations commented out during initial deployment
  4. Feature branch: pique-530-memory-deployment

Quick Rollback Commands:

# Revert specific file
git checkout HEAD -- apps/api/tickets/utils/pdf.py

# Revert entire Phase 2
git revert <commit-hash>

8. Performance Targets

Before Optimization (Current):

  • PDF generation: 3-5 seconds, 150-200MB memory spike
  • Image processing: 2-4 seconds, 100-150MB memory spike
  • Admin list view: 50-100 queries for 20 shows
  • Report generation: 8-12 seconds, 200-300MB memory
  • CSV export (1000 attendees): 5-8 seconds, 80-120MB memory

After Optimization (Expected):

  • PDF generation: 2-4 seconds, 100-140MB memory spike (30% reduction)
  • Image processing: 1.5-3 seconds, 60-90MB memory spike (40% reduction)
  • Admin list view: 5-10 queries for 20 shows (90% reduction)
  • Report generation: 5-8 seconds, 120-180MB memory (40% reduction)
  • CSV export (1000 attendees): 5-8 seconds, 20-30MB memory (75% reduction)

9. Dependencies

Required Packages (Already Installed):

  • pillow==11.0.0
  • weasyprint==63.1
  • django==5.1.4
  • djangorestframework==3.15.2

Optional for Testing:

  • django-debug-toolbar (already in dev requirements)
  • memory_profiler (for detailed profiling)

  • Main Plan: MEMORY_OPTIMIZATION_PLAN.md
  • Configuration Changes: See Phase 1 in main plan
  • Infrastructure Changes: See Phase 3 in main plan

Document Status: Ready for Implementation
Next Steps: Begin Phase 2A implementation, test each change individually

Memory Reduction Strategies for Next.js Frontend

Date: 2025-10-25
Context: Addressing actual memory footprint reduction vs. cache management
Related: NEXTJS_CACHE_MANAGEMENT_PLAN.md


Understanding the Problem

The cache management plan prevents unbounded growth but may not reduce existing memory footprint. This document outlines strategies that will actually free memory.


Strategy Comparison

Strategy | Memory Reduction | Complexity | Impact on Performance
Cache invalidation (revalidateTag) | ⚠️ Minimal | Low | Minimal
Hard cache limits (isrMemoryCacheSize) | ✅ Moderate | Low | Low (may increase cache misses)
Reduce cached routes | ✅ High | Medium | Medium (more server-side rendering)
Container restarts | ✅ Complete | Low | High (temporary downtime)
Disable caching | ✅ Complete | Low | Very High (all SSR)
Edge caching (CDN) | ✅ High | High | Low (may improve performance)
Optimize bundle size | ✅ Moderate | Medium | Positive (faster loads)
Use streaming SSR | ✅ Low | Medium | Positive (better UX)

1. Set Hard ISR Cache Limits

Impact: High reduction, low complexity

Update next.config.ts:

const nextConfig: NextConfig = {
  experimental: {
    // Existing optimizations
    optimizeCss: true,
    optimizePackageImports: ['react-icons'],
    webpackMemoryOptimizations: true,
    preloadEntriesOnStart: false,

    // Set strict memory limit for ISR cache
    isrMemoryCacheSize: 25 * 1024 * 1024, // 25MB (reduced from 50MB)

    // Limit the number of pages kept in memory
    isrFlushToDisk: true, // Flush to disk instead of keeping in memory
  },

  // Reduce memory for static generation
  generateBuildId: async () => {
    return 'piquetickets-' + Date.now()
  },
}

Expected Memory Reduction: 30-50MB


2. Reduce Cached Routes (Selective Static Generation)

Impact: Moderate reduction, moderate complexity

Current State: You're caching all pages with 30-second revalidation

Optimization: Only cache high-traffic routes, make others fully dynamic

// apps/frontend/src/app/shows/[slug]/page.tsx

// BEFORE: Caches every show page
export const revalidate = 30

// AFTER: Only cache popular shows, others are dynamic
export const revalidate = false // Make fully dynamic by default
export const dynamicParams = true

// Then selectively cache via fetch instead
export async function generateStaticParams() {
  // Only pre-generate top 50 shows
  const shows = await fetchShows()
  return shows.slice(0, 50).map(show => ({
    slug: show.slug
  }))
}

For Homepage:

// apps/frontend/src/app/page.tsx

// Consider removing revalidate entirely - make it SSR
// Only use cache at the fetch level, not page level
// export const revalidate = 30 // REMOVE THIS

export default async function Home() {
  // Fetch with shorter cache
  const [shows, producers, featuredShows, activeCities] = await Promise.all([
    fetchShows(), // This still has 30s cache at fetch level
    fetchProducers(),
    fetchFeaturedShows(),
    fetchActiveCities()
  ]);
  // ...
}

Expected Memory Reduction: 50-100MB (depends on number of cached routes)


3. Move to Fetch-Level Caching Only

Impact: High reduction, moderate complexity

Current: page-level ISR + fetch-level caching = double caching
Better: fetch-level caching only, with a shorter TTL

// Remove page-level revalidate from ALL pages
// Keep caching only in fetch requests

// apps/frontend/src/lib/api.ts
export async function fetchShow(slug: string) {
  const response = await fetch(
    `${process.env.NEXT_PUBLIC_API_URL}/shows/${slug}`,
    {
      next: {
        revalidate: 60, // 1 minute cache
        tags: ['shows', `show-${slug}`]
      }
    }
  )
  return response.json()
}

// For frequently changing data
export async function fetchShows() {
  const response = await fetch(
    `${process.env.NEXT_PUBLIC_API_URL}/shows`,
    {
      next: {
        revalidate: 30, // 30 second cache
        tags: ['shows', 'shows-list']
      },
      // Or use no cache for lists
      // cache: 'no-store'
    }
  )
  return response.json()
}

Expected Memory Reduction: 100-200MB


4. Implement CDN/Edge Caching ⭐⭐ (Best Long-term)

Impact: Very high reduction, high complexity

Strategy: Move caching from origin to edge (Cloudflare, Vercel Edge, etc.)

Option A: Railway + Cloudflare CDN

// next.config.ts
const nextConfig: NextConfig = {
  // ... existing config

  async headers() {
    return [
      {
        source: '/shows/:path*',
        headers: [
          {
            key: 'Cache-Control',
            // Cache at CDN for 5 minutes, stale for 1 hour
            value: 'public, s-maxage=300, stale-while-revalidate=3600',
          },
          {
            key: 'CDN-Cache-Control',
            // Cloudflare-specific: cache for 1 hour
            value: 'public, max-age=3600',
          },
        ],
      },
      {
        source: '/api/:path*',
        headers: [
          {
            key: 'Cache-Control',
            value: 'public, s-maxage=60, stale-while-revalidate=300',
          },
        ],
      },
    ]
  },
}

Then configure Cloudflare to cache HTML:

Cache Rule (Cloudflare Dashboard):
- Match: piquetickets.com/shows/*
- Cache Level: Standard
- Edge TTL: 1 hour
- Browser TTL: 5 minutes

Expected Memory Reduction: 200-400MB (most caching moves to edge)


Option B: Deploy to Vercel Edge Runtime

Move to Vercel for built-in edge caching:

// apps/frontend/src/app/shows/[slug]/page.tsx
export const runtime = 'edge' // Run on Vercel Edge

export const revalidate = 60 // Edge cache for 1 minute

export default async function ShowPage({ params }: Params) {
  // Runs on edge, minimal origin memory usage
  const show = await fetchShow((await params).slug)
  // ...
}

Expected Memory Reduction: 300-500MB (minimal origin caching)


5. Scheduled Container Restarts ⚠️ (Short-term Fix)

Impact: Complete memory reset, low complexity, high disruption

Configure Railway to restart containers daily:

# railway.json (JSON cannot contain comments; the scheduled restart itself is
# configured outside this file - see the dashboard option below)
{
  "deploy": {
    "restartPolicyType": "ON_FAILURE",
    "healthcheckPath": "/api/health",
    "healthcheckTimeout": 30
  }
}

Or use Railway's built-in scheduling:

  • Dashboard > Service > Settings > "Restart Service"
  • Schedule a daily restart at 4 AM (low traffic)

Expected Memory Reduction: 100% (full reset)
Downside: 10-30 seconds of downtime during each restart


6. Optimize Component Memory Usage

Impact: Moderate reduction, medium complexity

Current Issue: Dynamic imports are already in use, but they can be optimized further

// apps/frontend/src/app/shows/[slug]/page.tsx

// CURRENT: Good use of dynamic imports
const TicketCheckoutForm = dynamicImport(
  () => import('@/components/Forms/tickets'),
  { loading: () => <TicketFormSkeleton /> }
);

// OPTIMIZATION: Don't load until needed
const TicketCheckoutForm = dynamicImport(
  () => import('@/components/Forms/tickets'),
  {
    loading: () => <TicketFormSkeleton />,
    ssr: false, // Don't render server-side (saves server memory)
  }
);

// BETTER: Load only when user scrolls to tickets section
'use client'
import { lazy, Suspense } from 'react'
import { useInView } from 'react-intersection-observer'

const TicketCheckoutForm = lazy(() => import('@/components/Forms/tickets'))

function TicketSection({ show }) {
  const { ref, inView } = useInView({ triggerOnce: true })

  return (
    <div ref={ref}>
      {inView && (
        <Suspense fallback={<TicketFormSkeleton />}>
          <TicketCheckoutForm show_id={show.id} tickets={show.tickets} />
        </Suspense>
      )}
    </div>
  )
}

Expected Memory Reduction: 20-50MB


7. Reduce API Response Sizes

Impact: Low-moderate reduction, easy wins

Optimization: Fetch only needed fields from API

// apps/frontend/src/lib/api.ts

// BEFORE: Fetching full show objects
export async function fetchShows() {
  const response = await fetch(`${API_URL}/shows`, {
    next: { revalidate: 30, tags: ['shows'] }
  })
  return response.json()
}

// AFTER: Fetch minimal fields for list views
export async function fetchShows(fields?: string[]) {
  const params = new URLSearchParams()
  if (fields) {
    params.set('fields', fields.join(','))
  }

  const response = await fetch(`${API_URL}/shows?${params}`, {
    next: { revalidate: 30, tags: ['shows'] }
  })
  return response.json()
}

// Usage in homepage
const shows = await fetchShows([
  'id', 'slug', 'title', 'banner_image',
  'start_time', 'venue_name'
]) // Only fetch fields needed for cards

Expected Memory Reduction: 10-30MB (smaller cached responses)


8. Disable ISR Entirely (Nuclear Option)

Impact: Maximum reduction, all SSR, slower pages

// Remove all revalidate exports from pages
// Remove cache from fetch requests

// apps/frontend/src/lib/api.ts
export async function fetchShow(slug: string) {
  const response = await fetch(
    `${process.env.NEXT_PUBLIC_API_URL}/shows/${slug}`,
    {
      cache: 'no-store', // No caching
      // Remove: next: { revalidate: 30, tags: ['shows'] }
    }
  )
  return response.json()
}

Expected Memory Reduction: 400-600MB
Expected Performance Impact: pages 2-5x slower


Recommended Implementation Plan

Phase 1: Quick Wins (Week 1)

  1. ✅ Set isrMemoryCacheSize: 25MB
  2. ✅ Remove page-level revalidate from low-traffic pages
  3. ✅ Optimize API response sizes
  4. ✅ Add lazy loading for heavy components

Expected Reduction: 100-150MB

Phase 2: Caching Strategy (Week 2-3)

  1. ✅ Implement fetch-level only caching
  2. ✅ Selective static generation (top 50 shows only)
  3. ✅ Implement cache purge API (from previous plan)

Expected Reduction: 200-300MB total

Phase 3: Edge/CDN (Week 4-6)

  1. ✅ Set up Cloudflare CDN caching
  2. ✅ Configure cache rules for static assets
  3. ✅ Move HTML caching to edge
  4. ✅ Reduce origin cache TTL

Expected Reduction: 400-500MB total

Phase 4: Monitor & Optimize (Ongoing)

  1. ✅ Track memory usage in Railway
  2. ✅ A/B test cache strategies
  3. ✅ Consider Vercel Edge if needed

Testing Memory Reduction

Before Implementation

# SSH into Railway container
railway run bash

# Check current memory
node -e "console.log(process.memoryUsage())"

# Monitor over time
watch -n 60 "node -e 'console.log(process.memoryUsage())'"

After Each Phase

# Compare memory before/after
# Document in spreadsheet:
# - Phase | Baseline | After | Reduction | P95 Response Time

Load Testing

# Use Artillery or k6
npm install -g artillery

# Create test script
cat > load-test.yml <<EOF
config:
  target: "https://piquetickets.railway.app"
  phases:
    - duration: 300
      arrivalRate: 50
scenarios:
  - flow:
      - get:
          url: "/"
      - get:
          url: "/shows"
      - get:
          url: "/shows/{{ \$randomString() }}"
EOF

artillery run load-test.yml

Memory Monitoring Dashboard

Set up Railway metrics tracking:

// apps/frontend/src/app/api/metrics/route.ts
export async function GET() {
  const usage = process.memoryUsage()

  return Response.json({
    timestamp: new Date().toISOString(),
    memory: {
      rss: `${Math.round(usage.rss / 1024 / 1024)}MB`,
      heapTotal: `${Math.round(usage.heapTotal / 1024 / 1024)}MB`,
      heapUsed: `${Math.round(usage.heapUsed / 1024 / 1024)}MB`,
      external: `${Math.round(usage.external / 1024 / 1024)}MB`,
    },
    process: {
      uptime: `${Math.round(process.uptime() / 60)}min`,
      cpuUsage: process.cpuUsage(),
    }
  })
}

Then poll this endpoint and graph in Datadog/Grafana/Railway metrics.


Success Criteria

  • Memory usage < 200MB baseline (currently ~400-600MB?)
  • Memory growth < 10MB/day
  • P95 response time < 1 second
  • Cache hit rate > 70%
  • No OOM crashes for 30 days

Last Updated: 2025-10-25
Next Review: After Phase 1 implementation