Archived: 2025-11-01
Reason: Completed - memory optimizations implemented
Status: Consolidated from 3 separate memory optimization documents
Note: This document combines MEMORY_OPTIMIZATION_PLAN.md, MEMORY_CODE_OPTIMIZATIONS.md, and MEMORY_REDUCTION_STRATEGIES.md
Memory Footprint Optimization Plan - PiqueTickets API Deployment¶
Created: 2025-10-17 Status: Planning Phase Priority: HIGH
1. Executive Summary & Feature Overview¶
1.1 Feature Description¶
- Feature Name: API Deployment Memory Footprint Reduction
- Feature Type: Performance Optimization / Infrastructure Enhancement
- Priority Level: High
1.2 Problem Statement¶
Current State:
- Django API deployed on Railway with 4 separate services (API, Celery Worker, Celery Beat, Flower)
- Gunicorn configured with 4 workers using the sync worker class
- Celery worker with concurrency=4, prefetch-multiplier=4
- WeasyPrint PDF generation loads entire documents into memory
- Pillow image processing creates multiple copies during transformations
- No memory limits defined in the deployment configuration
- Database queries load full datasets without pagination

Pain Points:
- High memory consumption per container/service
- Potential OOM (Out of Memory) kills during peak load
- Inefficient resource utilization on the Railway platform
- Memory leaks from long-running processes
- Duplicate image objects during format conversions
- N+1 query patterns in the admin interface
- Full dataset loads for reports and CSV exports

User Impact:
- Increased hosting costs due to higher memory allocation
- Potential service disruptions during memory spikes
- Slower response times when memory pressure is high
- Failed PDF/report generation for large events

Business Value:
- Reduced infrastructure costs
- Improved service reliability and uptime
- Better scalability for growing event sizes
- Faster response times across all services
1.3 Expected Outcomes¶
Success Metrics:
- Reduce API service memory usage from ~500MB to ~300MB baseline
- Reduce Celery worker memory from ~400MB to ~200MB per worker
- Eliminate memory-related crashes (target: 0 OOM events per month)
- Reduce average response time for PDF generation by 20-30%
- Reduce database query memory consumption by 40-50%

Technical Goals:
- Implement streaming responses for large data exports
- Add memory limits and resource constraints to all services
- Optimize Gunicorn and Celery configurations
- Implement lazy loading and pagination for database queries
- Add garbage collection after memory-intensive operations
- Convert synchronous PDF generation to async background tasks
2. Current State Analysis¶
2.1 Deployment Architecture¶
Services Running:
1. API Service (apps/api/Dockerfile)
   - Gunicorn with 4 sync workers
   - No memory limits defined
   - Current config: apps/api/railway.json:10
2. Celery Worker (apps/api/Dockerfile)
   - Concurrency: 4 workers
   - Prefetch multiplier: 4
   - Max tasks per child: 100
   - Current config: apps/api/railway.worker.json:10
3. Celery Beat (apps/api/Dockerfile)
   - Single scheduler process
   - Database-backed scheduler
   - Current config: apps/api/railway.beats.json:10
4. Flower (apps/api/Dockerfile)
   - Monitoring dashboard
   - Current config: apps/api/railway.flower.json:10

Docker Configuration:
- Base image: python:3.12-slim (apps/api/Dockerfile:2)
- System dependencies include cairo, pango, and gdk-pixbuf for WeasyPrint (a sketch of this layer follows below)
- Installs 40 Python packages (apps/api/requirements.txt)
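Those system libraries are installed at the image level. A minimal sketch of what that layer looks like, assuming a standard Debian slim base; the exact package list lives in apps/api/Dockerfile and may differ:

```dockerfile
# Illustrative excerpt only - see apps/api/Dockerfile for the real layer
FROM python:3.12-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
        libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf-2.0-0 \
        libjpeg62-turbo libwebp-dev \
    && rm -rf /var/lib/apt/lists/*
```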
2.2 Memory-Intensive Operations (Identified)¶
CRITICAL Issues:
- PDF Generation - Highest Impact
  - Files: tickets/utils/pdf.py, tickets/utils/reports.py
  - WeasyPrint loads the entire HTML and renders the PDF in memory
  - No streaming support
  - Generates ticket PDFs synchronously in Celery tasks
- Image Processing - High Impact
  - File: tickets/image_optimization.py
  - Creates 4 responsive sizes per image (400, 800, 1200, 1600px)
  - Multiple PIL Image objects in memory simultaneously
  - RGBA→RGB conversion creates duplicates
  - Triggered on every Show save (models.py:403)
- Report Generation - High Impact
  - File: tickets/utils/reports.py:19-121
  - Loads all orders for a show into memory
  - N+1 query pattern for tickets and ticket orders
  - Builds large dictionaries before serialization
- CSV Exports - Medium Impact
  - Files: tickets/utils/csv_export.py, tickets/utils/preshow.py
  - Prefetch loads all relations before streaming
  - Uses StringIO to hold the entire CSV in memory

MEDIUM Issues:
- Admin List Views - Medium Impact
  - File: tickets/admin.py:422-456
  - N queries for revenue calculations (one per row)
  - No select_related/prefetch_related optimization
- Database Query Patterns - Medium Impact
  - Check-in view loads all attendees (admin.py:816)
  - Order queries without pagination
  - Missing database indexes on frequently queried fields
2.3 Current Dependencies¶
Core Memory-Intensive Packages:
weasyprint==63.1 # PDF generation (HIGH memory)
pillow==11.0.0 # Image processing (MEDIUM memory)
gunicorn==23.0.0 # Application server
celery==5.4.0 # Task queue
redis==5.2.1 # Cache/broker
boto3==1.36.20 # AWS S3 (if enabled)
djangorestframework==3.15.2
System Dependencies (from Dockerfile):
- libcairo2, libpango-1.0-0 (WeasyPrint dependencies)
- libjpeg62-turbo, libwebp-dev (image processing)
3. Solution Design¶
3.1 Optimization Strategy (Phased Approach)¶
Phase 1: Quick Wins - Configuration Optimization¶
Low risk, immediate impact changes to deployment configuration.
Phase 2: Code Optimization¶
Medium risk, targeted code changes to reduce memory consumption.
📄 Detailed Code Changes: See MEMORY_CODE_OPTIMIZATIONS.md (consolidated later in this document)
Phase 3: Architectural Improvements¶
Higher complexity changes requiring more extensive testing.
3.2 Detailed Recommendations¶
PHASE 1: Configuration Optimization¶
1.1 Gunicorn Worker Configuration¶
Current Configuration (apps/api/Dockerfile:78):
gunicorn brktickets.wsgi:application \
--bind 0.0.0.0:8000 \
--workers 4 \
--worker-class sync \
--worker-connections 1000 \
--max-requests 1000 \
--max-requests-jitter 100 \
--timeout 30 \
--keep-alive 5 \
--preload
Optimized Configuration:
gunicorn brktickets.wsgi:application \
--bind 0.0.0.0:8000 \
--workers 2 \
--threads 4 \
--worker-class gthread \
--worker-connections 1000 \
--max-requests 500 \
--max-requests-jitter 50 \
--timeout 60 \
--keep-alive 5 \
--worker-tmp-dir /dev/shm \
--preload
Changes & Rationale:
- ✅ Reduce workers from 4→2, add 4 threads = 8 concurrent requests (same throughput)
- ✅ Change to gthread worker class (threads share memory, workers don't)
- ✅ Reduce max-requests to 500 (faster memory recycling)
- ✅ Add --worker-tmp-dir /dev/shm (use RAM for tmp files, faster)
- ✅ Increase timeout to 60s (accommodate PDF generation)
Expected Impact: ~40% memory reduction (2 workers vs 4)
1.2 Celery Worker Configuration¶
Current Configuration (apps/api/railway.worker.json:10):
celery -A brktickets worker -l INFO -E \
--max-tasks-per-child=100 \
--concurrency=4 \
--prefetch-multiplier=4
Optimized Configuration:
celery -A brktickets worker -l INFO -E \
--max-tasks-per-child=50 \
--concurrency=2 \
--prefetch-multiplier=1 \
--max-memory-per-child=300000 \
--time-limit=300 \
--soft-time-limit=270
Changes & Rationale:
- ✅ Reduce concurrency from 4→2 (fewer processes in memory)
- ✅ Set prefetch-multiplier to 1 (prevent queue hoarding, better for long tasks)
- ✅ Reduce max-tasks-per-child to 50 (more frequent process recycling)
- ✅ Add --max-memory-per-child=300000 (300MB limit, auto-restart on breach)
- ✅ Add time limits (prevent runaway tasks)
Expected Impact: ~50% memory reduction per worker
1.3 Database Connection Pooling¶
Current Configuration (apps/api/brktickets/settings.py:315-329):
DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql_psycopg2",
...
"CONN_MAX_AGE": 600,
"CONN_HEALTH_CHECKS": True,
}
}
Optimized Configuration:
DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql_psycopg2",
...
"CONN_MAX_AGE": 60, # Reduced from 600
"CONN_HEALTH_CHECKS": True,
"OPTIONS": {
"connect_timeout": 10,
"options": "-c statement_timeout=30000", # 30s query timeout
},
}
}
Changes & Rationale:
- ✅ Reduce CONN_MAX_AGE to 60s (close idle connections faster)
- ✅ Add statement_timeout (prevent long-running queries)
Expected Impact: Reduced memory from idle connections
1.4 Cache Configuration¶
Current Configuration (apps/api/brktickets/settings.py:783-794):
CACHES = {
"default": {
"BACKEND": "django.core.cache.backends.redis.RedisCache",
"LOCATION": os.getenv("REDIS_URL", "redis://localhost:6379/0"),
"KEY_PREFIX": "piquetickets",
"TIMEOUT": 300,
}
}
Optimized Configuration:
CACHES = {
"default": {
"BACKEND": "django.core.cache.backends.redis.RedisCache",
"LOCATION": os.getenv("REDIS_URL", "redis://localhost:6379/0"),
"OPTIONS": {
"socket_timeout": 5,
"retry_on_timeout": True,
"socket_connect_timeout": 5,
"max_connections": 50, # Limit connection pool
"CLIENT_CLASS": "django_redis.client.DefaultClient",
},
"KEY_PREFIX": "piquetickets",
"TIMEOUT": 300,
}
}
Changes & Rationale:
- ✅ Limit the Redis connection pool to 50
- ✅ Add connection timeouts
Expected Impact: Reduced memory from Redis connection pool
PHASE 2: Infrastructure & Architecture¶
2.1 Add Memory Limits to Railway Deployments¶
Update Railway Configuration Files:
File: apps/api/railway.json
{
"$schema": "https://railway.com/railway.schema.json",
"build": {
"builder": "DOCKERFILE",
"dockerfilePath": "apps/api/Dockerfile"
},
"deploy": {
"runtime": "V2",
"numReplicas": 1,
"startCommand": "sh -c 'python manage.py migrate && python manage.py collectstatic --noinput && gunicorn brktickets.wsgi --config gunicorn_config.py'",
"sleepApplication": false,
"healthcheckPath": "/health",
"healthcheckTimeout": 180,
"restartPolicyType": "ON_FAILURE",
"restartPolicyMaxRetries": 10,
"memoryLimit": 512,
"cpuLimit": 1.0
}
}
File: apps/api/railway.worker.json
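The worker file is not reproduced here; below is a sketch following the same pattern as the railway.json above, using the optimized Celery command from section 1.2. The memoryLimit and cpuLimit values are illustrative assumptions, not confirmed values from the existing file:

```json
{
  "$schema": "https://railway.com/railway.schema.json",
  "build": {
    "builder": "DOCKERFILE",
    "dockerfilePath": "apps/api/Dockerfile"
  },
  "deploy": {
    "runtime": "V2",
    "numReplicas": 1,
    "startCommand": "celery -A brktickets worker -l INFO -E --max-tasks-per-child=50 --concurrency=2 --prefetch-multiplier=1 --max-memory-per-child=300000 --time-limit=300 --soft-time-limit=270",
    "restartPolicyType": "ON_FAILURE",
    "restartPolicyMaxRetries": 10,
    "memoryLimit": 512,
    "cpuLimit": 1.0
  }
}
```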
2.2 Create Gunicorn Configuration File¶
Create: apps/api/gunicorn_config.py
import multiprocessing
import os
# Server socket
bind = "0.0.0.0:8000"
backlog = 2048
# Worker processes
workers = 2
worker_class = 'gthread'
threads = 4
worker_connections = 1000
max_requests = 500
max_requests_jitter = 50
timeout = 60
keepalive = 5
graceful_timeout = 30
# Memory management
worker_tmp_dir = '/dev/shm'
preload_app = True
# Logging
accesslog = '-'
errorlog = '-'
loglevel = 'info'
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'
# Process naming
proc_name = 'piquetickets-api'
# Server hooks
def on_starting(server):
"""Called just before the master process is initialized."""
pass
def when_ready(server):
"""Called just after the server is started."""
server.log.info("Server is ready. Spawning workers")
def post_worker_init(worker):
"""Called just after a worker has been initialized."""
worker.log.info(f"Worker spawned (pid: {worker.pid})")
def pre_fork(server, worker):
"""Called just prior to forking the worker subprocess."""
pass
def post_fork(server, worker):
"""Called just after a worker has been forked."""
pass
def worker_exit(server, worker):
"""Called just after a worker has been exited."""
worker.log.info(f"Worker exited (pid: {worker.pid})")
2.3 Add Monitoring & Alerts¶
Create: apps/api/tickets/management/commands/check_memory.py
import psutil
from django.core.management.base import BaseCommand
class Command(BaseCommand):
help = 'Check memory usage and log warnings'
def handle(self, *args, **options):
process = psutil.Process()
memory_info = process.memory_info()
memory_mb = memory_info.rss / 1024 / 1024
self.stdout.write(f"Memory usage: {memory_mb:.2f} MB")
if memory_mb > 400:
self.stdout.write(self.style.WARNING(
f"High memory usage detected: {memory_mb:.2f} MB"
))
3. Implementation Plan¶
3.1 Phase 1: Configuration¶
| Task | Files | Priority |
|---|---|---|
| Create gunicorn config file | apps/api/gunicorn_config.py | HIGH |
| Update Dockerfile CMD | apps/api/Dockerfile:78 | HIGH |
| Update Railway configs | railway.json, railway.worker.json | HIGH |
| Update database settings | brktickets/settings.py:315-329 | MEDIUM |
| Update cache settings | brktickets/settings.py:783-794 | MEDIUM |
| Test in staging | - | HIGH |
3.2 Phase 2: Code Optimization¶
📄 See MEMORY_CODE_OPTIMIZATIONS.md for the detailed implementation guide (consolidated later in this document)
| Task | Files | Priority |
|---|---|---|
| Add GC to PDF generation | tickets/utils/pdf.py | HIGH |
| Optimize image processing | tickets/image_optimization.py | HIGH |
| Add Pillow memory limits | brktickets/settings.py | HIGH |
| Optimize admin queries | tickets/admin.py | HIGH |
| Optimize report generation | tickets/utils/reports.py | HIGH |
| Implement CSV streaming | tickets/utils/preshow.py | MEDIUM |
| Update CSV export view | tickets/views.py | MEDIUM |
| Add database indexes | tickets/models.py | MEDIUM |
| Testing & validation | - | HIGH |
3.3 Phase 3: Monitoring & Validation¶
| Task | Files | Priority |
|---|---|---|
| Add memory monitoring command | management/commands/check_memory.py | MEDIUM |
| Set up Railway metrics | - | HIGH |
| Load testing | - | HIGH |
| Documentation | README updates | MEDIUM |
4. Risk Assessment & Mitigation¶
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Reduced workers cause performance degradation | Medium | High | Monitor response times closely; threads compensate for worker reduction |
| Celery tasks timeout with new limits | Low | Medium | Increase time-limit if needed; split large tasks |
| PDF generation fails with GC | Low | Low | GC is called after PDF creation, not during |
| Database query changes break functionality | Low | High | Comprehensive testing; verify prefetch results match old queries |
| Memory limits cause OOM crashes | Medium | High | Start with conservative limits; gradually reduce based on monitoring |
| Streaming CSV breaks existing integrations | Low | Medium | Keep old endpoint; add new streaming endpoint |
5. Success Measurement¶
5.1 Key Metrics¶
Before Optimization (Baseline):
- API service: ~500MB average, ~800MB peak
- Celery worker: ~400MB average, ~600MB peak
- OOM events: 2-3 per week

After Optimization (Targets):
- API service: ~300MB average (~40% reduction)
- Celery worker: ~200MB average (~50% reduction)
- OOM events: 0 per month
5.2 Monitoring Plan¶
Week 1 Post-Deployment:
- Daily memory usage checks
- Response time monitoring
- Error rate tracking

Weeks 2-4:
- Weekly reviews
- Gradual limit reduction if stable

Months 2-3:
- Monthly performance reviews
- Cost analysis (Railway billing)
6. Rollback Strategy¶
6.1 Configuration Rollback¶
- Keep old configuration files as .backup files
- Railway allows instant rollback to the previous deployment
6.2 Code Rollback¶
- Create feature branch pique-530-memory-deployment
- Deploy incrementally; specific commits can be reverted individually
- Use feature flags for code changes if needed
7. File Change Summary¶
Documentation Files:¶
- ✅ MEMORY_OPTIMIZATION_PLAN.md - This file (overall strategy)
- ✅ MEMORY_CODE_OPTIMIZATIONS.md - Detailed code changes (Phase 2)
New Files (Phase 1 & 3):¶
- apps/api/gunicorn_config.py - Gunicorn configuration
- apps/api/tickets/management/commands/check_memory.py - Memory monitoring
Modified Files (Phase 1 - Configuration):¶
- apps/api/Dockerfile - Update CMD to use the gunicorn config
- apps/api/railway.json - Add memory limits, update start command
- apps/api/railway.worker.json - Add memory limits, update Celery command
- apps/api/brktickets/settings.py - Database, cache, Pillow settings
8. Testing Strategy¶
8.1 Load Testing¶
- Simulate 100 concurrent users
- Test PDF generation under load
- Test CSV export with 5000+ attendees
- Monitor memory usage during tests
8.2 Functional Testing¶
- Verify all PDF generation endpoints work
- Verify CSV exports produce correct data
- Verify admin pages load correctly
- Verify image uploads process correctly
8.3 Performance Testing¶
- Measure baseline vs optimized response times
- Measure memory usage over 24 hours
- Test memory limits (intentionally breach to verify restart)
9. Quick Implementation Checklist¶
Phase 1 - Configuration:¶
- Create gunicorn_config.py with optimized settings
- Update the Dockerfile CMD to use the config file
- Update railway.json with memory limits
- Update railway.worker.json with optimized Celery settings
- Update database CONN_MAX_AGE in settings
- Update cache configuration
- Deploy to the staging environment
- Monitor for 24 hours, verify stability
Phase 2 - Infrastructure:¶
- Add memory limits to Railway deployments
- Create Gunicorn configuration file
- Add monitoring & alerts
Phase 3 - Monitoring:¶
- Create memory monitoring management command
- Set up Railway metrics dashboard
Appendix A: Reference Links¶
- Gunicorn Performance Tuning
- Celery Optimization Guide
- Django Database Optimization
- Pillow Performance
- WeasyPrint Documentation
Document Status: Ready for Review and Approval
Next Steps: Present to team, get approval, begin Phase 1 implementation
Memory Optimization - Code Changes¶
Created: 2025-10-18 Status: Ready for Implementation Priority: HIGH Related Plan: MEMORY_OPTIMIZATION_PLAN.md
Overview¶
This document contains all code-level memory optimizations for the PiqueTickets API. These are Phase 2 optimizations that require code changes (as opposed to configuration changes).
Expected Overall Impact:
- 30-50% reduction in memory consumption for key operations
- 20-35% faster processing for PDFs and reports
- Elimination of N+1 query patterns
1. PDF Generation Memory Optimization¶
Priority: HIGH
File: apps/api/tickets/utils/pdf.py
Impact: 20-30% reduction in PDF generation memory spikes
Problem¶
WeasyPrint loads entire HTML + renders PDF into memory simultaneously. No cleanup occurs after generation.
Current Implementation¶
def generate_ticket_pdf(order):
html_string = render_to_string('tickets/ticket_pdf.html', context)
pdf = HTML(string=html_string).write_pdf() # ALL IN MEMORY
return pdf
Optimized Implementation¶
import gc
def generate_ticket_pdf(order):
html_string = render_to_string('tickets/ticket_pdf.html', context)
pdf = HTML(string=html_string).write_pdf()
# Force garbage collection after PDF generation
del html_string
gc.collect()
return pdf
Changes Required¶
- Add import gc at the top of the file
- Add explicit cleanup (del html_string) after PDF generation
- Call gc.collect() to force garbage collection
- Verify all PDF generation goes through Celery async tasks (already implemented in tickets/task.py:181); a sketch of such a task follows below
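The existing task is not quoted in this document; the sketch below shows how an async wrapper around the optimized helper can look. The task name, retry count, and time limits here are assumptions for illustration, not the code in tickets/task.py:

```python
# Hypothetical async wrapper; the real task lives in tickets/task.py (around line 181).
import gc

from celery import shared_task


@shared_task(bind=True, max_retries=3, soft_time_limit=270, time_limit=300)
def generate_ticket_pdf_task(self, order_id):
    """Render a ticket PDF off the request path, then release memory."""
    from tickets.models import Order                    # assumed model location
    from tickets.utils.pdf import generate_ticket_pdf

    order = Order.objects.get(id=order_id)
    try:
        pdf_bytes = generate_ticket_pdf(order)
        # ... hand the bytes to the existing storage/email delivery logic ...
        return len(pdf_bytes)                            # keep the task result small and serialisable
    finally:
        gc.collect()                                     # mirror the in-function cleanup
```

Keeping the task's time limits aligned with the Celery worker flags from Phase 1 avoids tasks being killed mid-render.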
Testing¶
- Generate PDF for small order (1-5 tickets)
- Generate PDF for large order (50+ tickets)
- Monitor memory usage during generation
- Verify PDF content is identical to pre-optimization
2. Image Processing Optimization¶
Priority: HIGH
File: apps/api/tickets/image_optimization.py
Impact: 30-40% reduction in image processing memory
Problem¶
Creates multiple PIL Image objects during resize/conversion. RGBA→RGB conversion creates duplicates. All 4 responsive sizes (400, 800, 1200, 1600px) exist in memory simultaneously.
Current Implementation¶
def generate_responsive_images(image_path):
img = Image.open(image_path)
if img.mode == 'RGBA':
img = img.convert('RGB') # DUPLICATE CREATED
for width in [400, 800, 1200, 1600]:
resized = img.resize((width, height)) # ANOTHER COPY
# Process...
Optimized Implementation¶
import gc
from PIL import Image
def generate_responsive_images(image_path):
img = Image.open(image_path)
# Convert once at the beginning, avoiding duplicate memory
if img.mode == 'RGBA':
background = Image.new('RGB', img.size, (255, 255, 255))
background.paste(img, mask=img.split()[3])
img.close() # Close original
img = background
for width in [400, 800, 1200, 1600]:
# Use copy() + thumbnail() for memory efficiency
img_copy = img.copy()
img_copy.thumbnail((width, img.height), Image.LANCZOS)  # bound by target width; the original-height cap preserves aspect ratio
# Save operations...
# ... existing save code ...
# Immediately close and delete
img_copy.close()
del img_copy
img.close()
gc.collect()
Additional Change: Set Pillow Memory Limits¶
File: apps/api/brktickets/settings.py
Add at the bottom of the file:
# Pillow Memory Limits
from PIL import Image
Image.MAX_IMAGE_PIXELS = 178956970  # Cap decoded image size (~179M pixels) to guard against decompression bombs
Changes Required¶
- Replace img.convert('RGB') with proper RGBA handling using a background paste
- Add explicit img.close() calls after each image operation
- Use thumbnail() instead of resize() where possible (modifies in place)
- Add gc.collect() after processing all sizes
- Add the Pillow memory limit to settings.py
Testing¶
- Upload RGBA PNG image
- Upload RGB JPEG image
- Upload very large image (4000x3000px)
- Verify all 4 responsive sizes are generated correctly
- Monitor memory usage during processing
- Verify image quality is maintained
3. Database Query Optimization¶
Priority: HIGH
Impact: 40-50% reduction in query memory consumption
3.1 Admin List View Queries¶
File: apps/api/tickets/admin.py:575-582
Problem¶
No query optimization in admin list views, causing N+1 queries for related fields.
Current Implementation¶
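The current method is not reproduced in this document; in effect the admin relies on the default queryset with no relation preloading, roughly equivalent to the sketch below (an approximation, not the literal code at tickets/admin.py:575-582):

```python
# Approximation of current behaviour: no select_related/prefetch_related,
# so related rows are fetched lazily per object (N+1 queries in list views).
def get_queryset(self, request):
    return super().get_queryset(request)
```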
Optimized Implementation¶
def get_queryset(self, request):
qs = super().get_queryset(request)
return qs.select_related(
'producer',
'venue'
).prefetch_related(
'tickets',
'ticket_page_design'
)
3.2 Admin Revenue Calculations¶
File: apps/api/tickets/admin.py:422-456
Problem¶
N queries for revenue calculations (one per row in admin list view).
Current Implementation¶
def show_gross_revenue(self, obj):
attendees = TicketAttendee.objects.filter(show=obj) # N queries
return sum([attendee.total_paid for attendee in attendees])
Optimized Implementation¶
from django.db.models import Sum
def show_gross_revenue(self, obj):
result = TicketAttendee.objects.filter(
show=obj
).aggregate(total=Sum('total_paid'))
return result['total'] or 0
3.3 Report Generation¶
File: apps/api/tickets/utils/reports.py:19-121
Problem¶
Loads all orders into memory, N+1 query pattern for tickets and ticket orders.
Current Implementation¶
def generate_show_report_data(show):
orders = Order.objects.filter(show=show, success=True) # ALL ORDERS
tickets = show.tickets.all() # ALL TICKETS
for ticket in tickets:
ticket_orders = TicketOrder.objects.filter(ticket=ticket) # N+1
Optimized Implementation¶
def generate_show_report_data(show):
# Optimize order query with prefetch
orders = Order.objects.filter(
show=show,
success=True
).select_related('created_by').prefetch_related(
'ticketorder_set__ticket'
)
# Optimize ticket query with prefetch
tickets = show.tickets.prefetch_related(
'ticketorder_set__order'
)
# Use prefetched data instead of additional queries
# Modify existing loops to use .all() instead of .filter()
for ticket in tickets:
# Access via prefetched relationship
ticket_orders = ticket.ticketorder_set.all() # No additional query
Changes Required¶
- Add select_related() and prefetch_related() to all admin querysets
- Replace aggregation loops with .aggregate() database operations
- Replace N+1 query patterns with prefetch relationships
- Update report generation to use prefetched data
Testing¶
- Load admin Show list view with 50+ shows
- Check Django Debug Toolbar for query count (should reduce from 100+ to ~5)
- Generate report for show with 1000+ orders
- Verify revenue calculations are accurate
- Verify report data matches pre-optimization output
4. CSV Export Streaming¶
Priority: MEDIUM
File: apps/api/tickets/utils/preshow.py:10-57
Impact: 60-70% reduction for large exports (1000+ attendees)
Problem¶
Loads all data into StringIO buffer before streaming begins. For large events, this can consume 50-100MB of memory.
Current Implementation¶
def generate_preshow_csv(show):
output = io.StringIO() # ALL IN MEMORY
writer = csv.writer(output)
attendees = TicketAttendee.objects.filter(
show=show
).prefetch_related('orders__ticket') # LOADS ALL
for attendee in attendees:
writer.writerow([...])
return output.getvalue() # RETURNS FULL STRING
Optimized Implementation¶
Step 1: Create Streaming Generator¶
def generate_preshow_csv_streaming(show):
    """Generator that yields CSV rows one at a time."""
    def esc(value):
        # Double embedded quotes so each quoted field stays valid CSV
        return str(value).replace('"', '""')

    # Yield header row
    yield 'Name,Email,Ticket Type,Check In Code,Purchased By\n'
    # Use iterator() to avoid materialising the whole queryset at once
    attendees = TicketAttendee.objects.filter(
        show=show
    ).prefetch_related(
        'orders__ticket'
    ).iterator(chunk_size=100)
    for attendee in attendees:
        # Build each row manually so it can be streamed as soon as it is produced
        yield (
            f'"{esc(attendee.name)}","{esc(attendee.email)}","{esc(attendee.ticket_type)}",'
            f'"{esc(attendee.check_in_code)}","{esc(attendee.purchased_by)}"\n'
        )
Step 2: Update View to Use StreamingHttpResponse¶
File: apps/api/tickets/views.py (or wherever the export view is)
from django.http import StreamingHttpResponse
from django.shortcuts import get_object_or_404
def export_preshow_csv(request, show_id):
show = get_object_or_404(Show, id=show_id)
response = StreamingHttpResponse(
generate_preshow_csv_streaming(show),
content_type='text/csv'
)
response['Content-Disposition'] = f'attachment; filename="preshow_{show_id}.csv"'
return response
Changes Required¶
- Create a new generator function generate_preshow_csv_streaming()
- Use .iterator(chunk_size=100) instead of loading whole querysets
- Build CSV rows manually instead of using csv.writer (for true streaming)
- Update the view to use StreamingHttpResponse
- Keep the old function as a fallback if needed
Testing¶
- Export CSV for small event (10 attendees)
- Export CSV for large event (1000+ attendees)
- Verify CSV format is identical to pre-optimization
- Monitor memory usage during export (should stay flat)
- Test with different browsers (Chrome, Safari, Firefox)
- Verify special characters are properly escaped
5. Implementation Checklist¶
Phase 2A - High Priority Code Changes:¶
- Add gc.collect() to PDF generation functions (tickets/utils/pdf.py)
- Optimize image processing with .close() and .thumbnail() (tickets/image_optimization.py)
- Add Pillow MAX_IMAGE_PIXELS limit (brktickets/settings.py)
- Add select_related/prefetch_related to admin querysets (tickets/admin.py:575-582)
- Convert admin aggregations to use .aggregate() (tickets/admin.py:422-456)
- Optimize report generation queries (tickets/utils/reports.py:19-121)
Phase 2B - Medium Priority Code Changes:¶
- Implement streaming CSV generator (tickets/utils/preshow.py)
- Update export view to use StreamingHttpResponse
- Add database indexes if needed (check query performance; see the sketch below)
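Where profiling shows repeated filtering on the same columns, indexes can be declared directly on the models. A minimal sketch; the model and field names below are illustrative assumptions, not a confirmed list of missing indexes:

```python
# Illustrative only - add indexes in tickets/models.py where query plans show repeated filters.
from django.db import models


class TicketAttendee(models.Model):
    # ... existing fields ...

    class Meta:
        indexes = [
            models.Index(fields=["show", "checked_in"]),  # hypothetical check-in lookups
            models.Index(fields=["email"]),               # hypothetical attendee search
        ]
```

Each added index costs write time and storage, so confirm the need with Django Debug Toolbar or EXPLAIN before adding it.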
Testing & Validation:¶
- Run comprehensive test suite
- Run Django Debug Toolbar to verify query counts
- Load test with production-like data
- Monitor memory usage for 24 hours in staging
- Verify all outputs match pre-optimization
6. File Change Summary¶
Files to Modify:¶
- apps/api/tickets/utils/pdf.py - Add garbage collection
- apps/api/tickets/image_optimization.py - Optimize image processing
- apps/api/brktickets/settings.py - Add Pillow memory limits
- apps/api/tickets/admin.py - Query optimization (multiple methods)
- apps/api/tickets/utils/reports.py - Query optimization
- apps/api/tickets/utils/preshow.py - Streaming CSV
- apps/api/tickets/views.py - Update CSV export view
New Imports Required:¶
import gc # For explicit garbage collection
from django.db.models import Sum # For aggregations
from django.http import StreamingHttpResponse # For CSV streaming
from PIL import Image # For memory limits (in settings.py)
7. Rollback Strategy¶
If Issues Occur:¶
- Each optimization is isolated to specific functions
- Can revert individual changes without affecting others
- Keep old implementations commented out during initial deployment
- Feature branch: pique-530-memory-deployment
Quick Rollback Commands:¶
# Revert specific file
git checkout HEAD -- apps/api/tickets/utils/pdf.py
# Revert entire Phase 2
git revert <commit-hash>
8. Performance Targets¶
Before Optimization (Current):¶
- PDF generation: 3-5 seconds, 150-200MB memory spike
- Image processing: 2-4 seconds, 100-150MB memory spike
- Admin list view: 50-100 queries for 20 shows
- Report generation: 8-12 seconds, 200-300MB memory
- CSV export (1000 attendees): 5-8 seconds, 80-120MB memory
After Optimization (Expected):¶
- PDF generation: 2-4 seconds, 100-140MB memory spike (30% reduction)
- Image processing: 1.5-3 seconds, 60-90MB memory spike (40% reduction)
- Admin list view: 5-10 queries for 20 shows (90% reduction)
- Report generation: 5-8 seconds, 120-180MB memory (40% reduction)
- CSV export (1000 attendees): 5-8 seconds, 20-30MB memory (75% reduction)
9. Dependencies¶
Required Packages (Already Installed):¶
- pillow==11.0.0
- weasyprint==63.1
- django==5.1.4
- djangorestframework==3.15.2
Optional for Testing:¶
- django-debug-toolbar (already in dev requirements)
- memory_profiler (for detailed profiling)
10. Related Documentation¶
- Main Plan: MEMORY_OPTIMIZATION_PLAN.md
- Configuration Changes: See Phase 1 in the main plan
- Infrastructure Changes: See Phase 3 in the main plan
Document Status: Ready for Implementation
Next Steps: Begin Phase 2A implementation, test each change individually
Memory Reduction Strategies for Next.js Frontend¶
Date: 2025-10-25 Context: Addressing actual memory footprint reduction vs cache management Related: NEXTJS_CACHE_MANAGEMENT_PLAN.md
Understanding the Problem¶
The cache management plan prevents unbounded growth but may not reduce existing memory footprint. This document outlines strategies that will actually free memory.
Strategy Comparison¶
| Strategy | Memory Reduction | Complexity | Impact on Performance |
|---|---|---|---|
| Cache invalidation (revalidateTag) | ⚠️ Minimal | Low | Minimal |
| Hard cache limits (isrMemoryCacheSize) | ✅ Moderate | Low | Low (may increase cache misses) |
| Reduce cached routes | ✅ High | Medium | Medium (more server-side rendering) |
| Container restarts | ✅ Complete | Low | High (temporary downtime) |
| Disable caching | ✅ Complete | Low | Very High (all SSR) |
| Edge caching (CDN) | ✅ High | High | Low (may improve performance) |
| Optimize bundle size | ✅ Moderate | Medium | Positive (faster loads) |
| Use streaming SSR | ✅ Low | Medium | Positive (better UX) |
Recommended Approaches for PiqueTickets¶
1. Set Aggressive Cache Size Limits ⭐ (Recommended)¶
Impact: High reduction, low complexity
Update next.config.ts:
const nextConfig: NextConfig = {
experimental: {
// Existing optimizations
optimizeCss: true,
optimizePackageImports: ['react-icons'],
webpackMemoryOptimizations: true,
preloadEntriesOnStart: false,
// Set strict memory limit for ISR cache
isrMemoryCacheSize: 25 * 1024 * 1024, // 25MB (reduced from 50MB)
// Limit the number of pages kept in memory
isrFlushToDisk: true, // Flush to disk instead of keeping in memory
},
// Reduce memory for static generation
generateBuildId: async () => {
return 'piquetickets-' + Date.now()
},
}
Expected Memory Reduction: 30-50MB
2. Selective Route Caching ⭐ (Recommended)¶
Impact: Moderate reduction, moderate complexity
Current State: You're caching all pages with 30-second revalidation
Optimization: Only cache high-traffic routes, make others fully dynamic
// apps/frontend/src/app/shows/[slug]/page.tsx
// BEFORE: Caches every show page
export const revalidate = 30
// AFTER: Only cache popular shows, others are dynamic
export const revalidate = false // Make fully dynamic by default
export const dynamicParams = true
// Then selectively cache via fetch instead
export async function generateStaticParams() {
// Only pre-generate top 50 shows
const shows = await fetchShows()
return shows.slice(0, 50).map(show => ({
slug: show.slug
}))
}
For Homepage:
// apps/frontend/src/app/page.tsx
// Consider removing revalidate entirely - make it SSR
// Only use cache at the fetch level, not page level
// export const revalidate = 30 // REMOVE THIS
export default async function Home() {
// Fetch with shorter cache
const [shows, producers, featuredShows, activeCities] = await Promise.all([
fetchShows(), // This still has 30s cache at fetch level
fetchProducers(),
fetchFeaturedShows(),
fetchActiveCities()
]);
// ...
}
Expected Memory Reduction: 50-100MB (depends on number of cached routes)
3. Move to Fetch-Level Caching Only¶
Impact: High reduction, moderate complexity
Current: Page-level ISR + fetch-level caching = double caching
Better: Fetch-level caching only, with a shorter TTL
// Remove page-level revalidate from ALL pages
// Keep caching only in fetch requests
// apps/frontend/src/lib/api.ts
export async function fetchShow(slug: string) {
const response = await fetch(
`${process.env.NEXT_PUBLIC_API_URL}/shows/${slug}`,
{
next: {
revalidate: 60, // 1 minute cache
tags: ['shows', `show-${slug}`]
}
}
)
return response.json()
}
// For frequently changing data
export async function fetchShows() {
const response = await fetch(
`${process.env.NEXT_PUBLIC_API_URL}/shows`,
{
next: {
revalidate: 30, // 30 second cache
tags: ['shows', 'shows-list']
},
// Or use no cache for lists
// cache: 'no-store'
}
)
return response.json()
}
Expected Memory Reduction: 100-200MB
4. Implement CDN/Edge Caching ⭐⭐ (Best Long-term)¶
Impact: Very high reduction, high complexity
Strategy: Move caching from origin to edge (Cloudflare, Vercel Edge, etc.)
Option A: Railway + Cloudflare CDN
// next.config.ts
const nextConfig: NextConfig = {
// ... existing config
async headers() {
return [
{
source: '/shows/:path*',
headers: [
{
key: 'Cache-Control',
// Cache at CDN for 5 minutes, stale for 1 hour
value: 'public, s-maxage=300, stale-while-revalidate=3600',
},
{
key: 'CDN-Cache-Control',
// Cloudflare-specific: cache for 1 hour
value: 'public, max-age=3600',
},
],
},
{
source: '/api/:path*',
headers: [
{
key: 'Cache-Control',
value: 'public, s-maxage=60, stale-while-revalidate=300',
},
],
},
]
},
}
Then configure Cloudflare to cache HTML:
Cache Rule (Cloudflare Dashboard):
- Match: piquetickets.com/shows/*
- Cache Level: Standard
- Edge TTL: 1 hour
- Browser TTL: 5 minutes
Expected Memory Reduction: 200-400MB (most caching moves to edge)
Option B: Deploy to Vercel Edge Runtime
Move to Vercel for built-in edge caching:
// apps/frontend/src/app/shows/[slug]/page.tsx
export const runtime = 'edge' // Run on Vercel Edge
export const revalidate = 60 // Edge cache for 1 minute
export default async function ShowPage({ params }: Params) {
// Runs on edge, minimal origin memory usage
const show = await fetchShow((await params).slug)
// ...
}
Expected Memory Reduction: 300-500MB (minimal origin caching)
5. Scheduled Container Restarts ⚠️ (Short-term Fix)¶
Impact: Complete memory reset, low complexity, high disruption
Configure Railway to restart containers daily:
# railway.json
{
"deploy": {
"restartPolicyType": "ON_FAILURE",
"healthcheckPath": "/api/health",
"healthcheckTimeout": 30,
# Add cron to trigger health check failure
# (forces container restart)
}
}
Or use Railway's built-in scheduling:
- Dashboard > Service > Settings > "Restart Service"
- Schedule a daily restart at 4 AM (low traffic)
Expected Memory Reduction: 100% (full reset)
Downside: 10-30 seconds of downtime during the restart
6. Optimize Component Memory Usage¶
Impact: Moderate reduction, medium complexity
Current Issue: Dynamic imports are good, but can optimize further
// apps/frontend/src/app/shows/[slug]/page.tsx
// CURRENT: Good use of dynamic imports
const TicketCheckoutForm = dynamicImport(
() => import('@/components/Forms/tickets'),
{ loading: () => <TicketFormSkeleton /> }
);
// OPTIMIZATION: Don't load until needed
const TicketCheckoutForm = dynamicImport(
() => import('@/components/Forms/tickets'),
{
loading: () => <TicketFormSkeleton />,
ssr: false, // Don't render server-side (saves server memory)
}
);
// BETTER: Load only when user scrolls to tickets section
'use client'
import { lazy, Suspense } from 'react'
import { useInView } from 'react-intersection-observer'
const TicketCheckoutForm = lazy(() => import('@/components/Forms/tickets'))
function TicketSection({ show }) {
const { ref, inView } = useInView({ triggerOnce: true })
return (
<div ref={ref}>
{inView && (
<Suspense fallback={<TicketFormSkeleton />}>
<TicketCheckoutForm show_id={show.id} tickets={show.tickets} />
</Suspense>
)}
</div>
)
}
Expected Memory Reduction: 20-50MB
7. Reduce API Response Sizes¶
Impact: Low-moderate reduction, easy wins
Optimization: Fetch only needed fields from API
// apps/frontend/src/lib/api.ts
// BEFORE: Fetching full show objects
export async function fetchShows() {
const response = await fetch(`${API_URL}/shows`, {
next: { revalidate: 30, tags: ['shows'] }
})
return response.json()
}
// AFTER: Fetch minimal fields for list views
export async function fetchShows(fields?: string[]) {
const params = new URLSearchParams()
if (fields) {
params.set('fields', fields.join(','))
}
const response = await fetch(`${API_URL}/shows?${params}`, {
next: { revalidate: 30, tags: ['shows'] }
})
return response.json()
}
// Usage in homepage
const shows = await fetchShows([
'id', 'slug', 'title', 'banner_image',
'start_time', 'venue_name'
]) // Only fetch fields needed for cards
Expected Memory Reduction: 10-30MB (smaller cached responses)
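The fields query parameter above assumes the Django API supports sparse field selection; if it does not yet, a serializer mixin along these lines is one way to add it. This is a hypothetical sketch, not existing PiqueTickets code:

```python
# Hypothetical DRF mixin: honours a ?fields=id,slug,title query parameter.
class DynamicFieldsSerializerMixin:
    """Drop serializer fields not listed in the request's ?fields= parameter."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        request = self.context.get("request")  # DRF's generic views put the request in context
        if request is None:
            return
        fields_param = request.query_params.get("fields")
        if not fields_param:
            return
        allowed = {name.strip() for name in fields_param.split(",") if name.strip()}
        for field_name in set(self.fields) - allowed:
            self.fields.pop(field_name)
```

A show list serializer could mix this in ahead of ModelSerializer without other changes, since DRF already provides the request in the serializer context for generic views.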
8. Disable ISR Entirely (Nuclear Option)¶
Impact: Maximum reduction, all SSR, slower pages
// Remove all revalidate exports from pages
// Remove cache from fetch requests
// apps/frontend/src/lib/api.ts
export async function fetchShow(slug: string) {
const response = await fetch(
`${process.env.NEXT_PUBLIC_API_URL}/shows/${slug}`,
{
cache: 'no-store', // No caching
// Remove: next: { revalidate: 30, tags: ['shows'] }
}
)
return response.json()
}
Expected Memory Reduction: 400-600MB Expected Performance Impact: Pages 2-5x slower
Recommended Implementation Plan¶
Phase 1: Quick Wins (Week 1)¶
- ✅ Set isrMemoryCacheSize to 25MB
- ✅ Remove page-level revalidate from low-traffic pages
- ✅ Optimize API response sizes
- ✅ Add lazy loading for heavy components
Expected Reduction: 100-150MB
Phase 2: Caching Strategy (Week 2-3)¶
- ✅ Implement fetch-level only caching
- ✅ Selective static generation (top 50 shows only)
- ✅ Implement cache purge API (from previous plan)
Expected Reduction: 200-300MB total
Phase 3: Edge/CDN (Week 4-6)¶
- ✅ Set up Cloudflare CDN caching
- ✅ Configure cache rules for static assets
- ✅ Move HTML caching to edge
- ✅ Reduce origin cache TTL
Expected Reduction: 400-500MB total
Phase 4: Monitor & Optimize (Ongoing)¶
- ✅ Track memory usage in Railway
- ✅ A/B test cache strategies
- ✅ Consider Vercel Edge if needed
Testing Memory Reduction¶
Before Implementation¶
# SSH into Railway container
railway run bash
# Check current memory
node -e "console.log(process.memoryUsage())"
# Monitor over time
watch -n 60 "node -e 'console.log(process.memoryUsage())'"
After Each Phase¶
# Compare memory before/after
# Document in spreadsheet:
# - Phase | Baseline | After | Reduction | P95 Response Time
Load Testing¶
# Use Artillery or k6
npm install -g artillery
# Create test script
cat > load-test.yml <<EOF
config:
target: "https://piquetickets.railway.app"
phases:
- duration: 300
arrivalRate: 50
scenarios:
- flow:
- get:
url: "/"
- get:
url: "/shows"
- get:
url: "/shows/{{ \$randomString() }}"
EOF
artillery run load-test.yml
Memory Monitoring Dashboard¶
Set up Railway metrics tracking:
// apps/frontend/src/app/api/metrics/route.ts
export async function GET() {
const usage = process.memoryUsage()
return Response.json({
timestamp: new Date().toISOString(),
memory: {
rss: `${Math.round(usage.rss / 1024 / 1024)}MB`,
heapTotal: `${Math.round(usage.heapTotal / 1024 / 1024)}MB`,
heapUsed: `${Math.round(usage.heapUsed / 1024 / 1024)}MB`,
external: `${Math.round(usage.external / 1024 / 1024)}MB`,
},
process: {
uptime: `${Math.round(process.uptime() / 60)}min`,
cpuUsage: process.cpuUsage(),
}
})
}
Then poll this endpoint and graph in Datadog/Grafana/Railway metrics.
Success Criteria¶
- Memory usage < 200MB baseline (currently ~400-600MB?)
- Memory growth < 10MB/day
- P95 response time < 1 second
- Cache hit rate > 70%
- No OOM crashes for 30 days
Last Updated: 2025-10-25 Next Review: After Phase 1 implementation