10 Ways to Make FastAPI Blazing Fast: from Code to Production
James Reed
Infrastructure Engineer · Leapcell

10 FastAPI Performance Optimization Tips: End-to-End Speedup from Code to Deployment
FastAPI has become one of the preferred frameworks for Python API development, thanks to its support for asynchronous operations, automatic documentation, and strong type validation. However, in high-concurrency scenarios, unoptimized services may suffer from increased latency and decreased throughput. This article compiles 10 practical optimization solutions, each including implementation steps and design principles, to help you maximize FastAPI's performance potential.
1. Prioritize async/await to Avoid Wasting Asynchronous Advantages
How to implement: Use asynchronous syntax for view functions, dependencies, and database operations, and pair them with asynchronous libraries such as aiohttp (for HTTP requests) and sqlalchemy.ext.asyncio (for databases):
```python
from fastapi import FastAPI
import aiohttp

app = FastAPI()

@app.get("/async-data")
async def get_async_data():
    async with aiohttp.ClientSession() as session:
        async with session.get("https://api.example.com/data") as resp:
            # Asynchronous suspension: awaiting the response does not block the event loop
            return await resp.json()
```
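Design principle: FastAPI is based on the ASGI protocol, with an event loop at its core. A blocking call made inside an async def function (a synchronous database driver, requests, time.sleep) monopolizes the event loop thread: while it waits for a database response, the CPU sits idle but cannot process other requests. (Endpoints declared with plain def are run by FastAPI in a thread pool, so they do not block the loop, but their concurrency is capped by the size of that pool.) With async/await, when an I/O operation is suspended, the event loop can schedule other tasks, which can increase CPU utilization by 3 to 5 times for I/O-bound workloads.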
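The effect of awaiting I/O can be seen without a web server. In this sketch, asyncio.sleep stands in for a database or HTTP round trip, and fake_io_call, sequential and concurrent are hypothetical helpers. Awaiting ten calls one after another takes about ten times the delay; starting them together with asyncio.gather lets them overlap on a single event loop thread, just as concurrent requests overlap in an async endpoint.

```python
import asyncio
import time

async def fake_io_call(delay: float) -> float:
    # asyncio.sleep stands in for a network or database round trip:
    # while it is pending, the event loop is free to run other tasks
    await asyncio.sleep(delay)
    return delay

async def sequential(n: int, delay: float) -> float:
    start = time.perf_counter()
    for _ in range(n):
        await fake_io_call(delay)  # each call waits for the previous one to finish
    return time.perf_counter() - start

async def concurrent(n: int, delay: float) -> float:
    start = time.perf_counter()
    # All n calls are suspended at the same time, so their waits overlap
    await asyncio.gather(*(fake_io_call(delay) for _ in range(n)))
    return time.perf_counter() - start

if __name__ == "__main__":
    seq = asyncio.run(sequential(10, 0.1))
    conc = asyncio.run(concurrent(10, 0.1))
    print(f"sequential: {seq:.2f}s, concurrent: {conc:.2f}s")
```

On a typical machine the concurrent version finishes in roughly one delay rather than ten.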
2. Reuse Dependency Instances to Reduce Reinitialization Overhead
How to implement: For shared, long-lived dependencies such as database engines and configuration objects, cache a single instance using lru_cache or the singleton pattern:
```python
from functools import lru_cache

from fastapi import Depends
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine

@lru_cache(maxsize=1)  # Create only 1 engine instance for global reuse
def get_engine():
    return create_async_engine("postgresql+asyncpg://user:pass@db:5432/db")

async def get_db(engine=Depends(get_engine)):
    # A lightweight session per request, backed by the shared engine's connection pool
    async with AsyncSession(engine) as session:
        yield session
```
Design principle: By default, FastAPI calls every dependency again for each request (results are cached only within a single request). However, initializing components like database engines and HTTP clients (e.g., establishing connection pools) consumes time and resources. Caching the instance means this cost is paid once per worker process instead of once per request, and it prevents the excessive database pressure caused by creating a new connection pool for every request.
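Why lru_cache(maxsize=1) behaves like a singleton can be checked in plain Python. In this sketch, ExpensiveClient is a hypothetical stand-in for an engine or HTTP client, and handle_request plays the role of an endpoint that would declare Depends(get_client); FastAPI itself is not involved.

```python
from functools import lru_cache

class ExpensiveClient:
    """Hypothetical stand-in for an engine or HTTP client that owns a connection pool."""

    instances = 0

    def __init__(self) -> None:
        ExpensiveClient.instances += 1  # counts how often the costly setup runs

@lru_cache(maxsize=1)
def get_client() -> ExpensiveClient:
    # The first call builds the client; later calls return the cached object
    return ExpensiveClient()

def handle_request() -> ExpensiveClient:
    # In FastAPI this would be a parameter like client=Depends(get_client)
    return get_client()

if __name__ == "__main__":
    clients = [handle_request() for _ in range(1000)]
    print(ExpensiveClient.instances)                # 1
    print(all(c is clients[0] for c in clients))   # True
```

The cache lives in process memory, so each Gunicorn worker still builds its own instance (one pool per worker). In tests, get_client.cache_clear() resets it.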
3. Simplify Pydantic Models to Reduce Validation Costs
How to implement:
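- Retain only the fields the API actually needs;
- Use exclude_unset (in FastAPI, response_model_exclude_unset) to leave unset fields out of the serialized output;
- Use lightweight typing constructs (such as TypedDict) instead of full Pydantic models for simple scenarios: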
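```python
from fastapi import Depends
from pydantic import BaseModel, ConfigDict

class UserResponse(BaseModel):
    # Pydantic v2: allow building the model from a SQLAlchemy ORM object
    model_config = ConfigDict(from_attributes=True)

    id: int
    name: str  # Remove unused fields like "created_at_timestamp" for the frontend

# app, get_db and the User ORM model come from the earlier snippets
@app.get(
    "/users/{user_id}",
    response_model=UserResponse,
    response_model_exclude_unset=True,  # Serialize only fields that were explicitly set
)
async def get_user(user_id: int, db=Depends(get_db)):
    # An ORM object has no .dict(); return it and let FastAPI serialize it via UserResponse
    return await db.get(User, user_id)
```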
Design principle: Pydantic validates, and on output serializes, every field of a model on every request. The more fields a model has and the deeper the nesting, the more work this takes. In high-concurrency scenarios, validation and serialization of complex models can account for as much as 40% of request latency, so simplifying models directly cuts this per-request work and can improve response speed by 20% to 30%.
4. Use Uvicorn + Gunicorn to Maximize Multi-Core CPU Utilization
How to implement: In production environments, use Gunicorn for process management and start Uvicorn worker processes equal to the number of CPU cores:
```bash
# Example for a 4-core CPU: 4 Uvicorn worker processes bound to port 8000
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000
```
Design principle: Python's Global Interpreter Lock (GIL) lets only one thread per process execute Python bytecode at a time, so a single process cannot use multiple cores for Python code. Uvicorn is a pure asynchronous ASGI server, but each Uvicorn process still runs on one core. Gunicorn manages multiple processes, letting each Uvicorn worker occupy its own core, so throughput can scale roughly linearly with the number of cores until another resource (such as the database) becomes the bottleneck.
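Instead of hard-coding -w 4, the worker count can be derived from the machine. Below is a sketch of a gunicorn.conf.py, assuming Gunicorn's Python config-file format (started with gunicorn -c gunicorn.conf.py main:app). The WEB_CONCURRENCY override is a convention this sketch assumes; Gunicorn also reads that variable for its own default worker count.

```python
# gunicorn.conf.py: loaded with `gunicorn -c gunicorn.conf.py main:app`
import multiprocessing
import os

# One Uvicorn worker per CPU core unless WEB_CONCURRENCY overrides it
workers = int(os.environ.get("WEB_CONCURRENCY", multiprocessing.cpu_count()))
worker_class = "uvicorn.workers.UvicornWorker"  # same as -k on the command line
bind = "0.0.0.0:8000"                           # same as -b on the command line
```

Inside a container with a CPU limit, multiprocessing.cpu_count() may report the host's cores, so setting WEB_CONCURRENCY explicitly is safer there. Recent Uvicorn releases also deprecate the uvicorn.workers module in favor of the separate uvicorn-worker package, so check the version you deploy.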
5. Cache High-Frequency Data to Reduce Repeated Queries/Calculations
How to implement: Use fastapi-cache2 + Redis to cache popular data (e.g., configurations, leaderboards) and set a reasonable expiration time:
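```python
from contextlib import asynccontextmanager
from fastapi import FastAPI
from fastapi_cache import FastAPICache  # module installed by the fastapi-cache2 package
from fastapi_cache.backends.redis import RedisBackend
from fastapi_cache.decorator import cache
from redis import asyncio as aioredis
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

@asynccontextmanager
async def lifespan(app: FastAPI):
    FastAPICache.init(RedisBackend(aioredis.from_url("redis://redis:6379/0")))
    yield

app = FastAPI(lifespan=lifespan)

@app.get("/popular-products")
@cache(expire=300)  # Cache for 5 minutes to avoid repeated execution of complex SQL
async def get_popular_products():
    # Session opened here, not via Depends(): injected args become part of the cache key
    async with AsyncSession(get_engine()) as db:  # get_engine from tip 2
        rows = await db.execute(text("SELECT * FROM products ORDER BY sales DESC LIMIT 10"))
        return [dict(row) for row in rows.mappings()]
```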
Design principle: API performance bottlenecks often arise from "repeated time-consuming operations" (e.g., scanning large tables, complex algorithms). Caching temporarily stores results, allowing subsequent requests to read data directly, reducing latency from hundreds of milliseconds to milliseconds. Distributed caching also supports sharing across multiple instances, making it suitable for cluster deployments.
6. Database Optimization: Connection Pools + Indexes + N+1 Prevention
How to implement:
- Use asynchronous connection pools to control the number of connections;
- Create indexes on the columns used in queries;
- Use eager loading (SQLAlchemy's selectinload or joinedload) to avoid N+1 queries:
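```python
from fastapi import Depends
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload

# Load a user and their orders up front. Over a list of users this avoids the N+1 problem
# ("query 10 users + 10 separate order queries"): selectinload fetches all orders in one extra query
async def get_user_with_orders(user_id: int, db: AsyncSession = Depends(get_db)):
    result = await db.execute(
        select(User).options(selectinload(User.orders)).where(User.id == user_id)
    )
    return result.scalar_one_or_none()
```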
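To see why the query count matters, this stdlib sqlite3 sketch (tables, data and helper names are all hypothetical) loads the orders of 10 users two ways: one query per user, and a single IN query over all user IDs, which is the strategy SQLAlchemy's selectinload follows. A small wrapper counts every query sent to the database.

```python
import sqlite3

class CountingConnection:
    """Hypothetical wrapper that counts queries sent through a sqlite3 connection."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.queries = 0

    def execute(self, sql: str, params=()) -> sqlite3.Cursor:
        self.queries += 1
        return self.conn.execute(sql, params)

def make_db(n_users: int = 10) -> CountingConnection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")
    for uid in range(1, n_users + 1):
        conn.execute("INSERT INTO users VALUES (?, ?)", (uid, f"user{uid}"))
        conn.executemany("INSERT INTO orders (user_id) VALUES (?)", [(uid,), (uid,)])
    return CountingConnection(conn)  # setup queries above are not counted

def orders_n_plus_one(db: CountingConnection) -> dict[int, list[int]]:
    users = db.execute("SELECT id FROM users ORDER BY id").fetchall()  # 1 query
    return {
        uid: [oid for (oid,) in db.execute(
            "SELECT id FROM orders WHERE user_id = ? ORDER BY id", (uid,))]  # +1 per user
        for (uid,) in users
    }

def orders_batched(db: CountingConnection) -> dict[int, list[int]]:
    users = [uid for (uid,) in db.execute("SELECT id FROM users ORDER BY id")]
    placeholders = ",".join("?" * len(users))  # "?,?,...,?" for the IN list
    rows = db.execute(
        f"SELECT user_id, id FROM orders WHERE user_id IN ({placeholders}) ORDER BY id", users)
    result: dict[int, list[int]] = {uid: [] for uid in users}
    for uid, oid in rows:
        result[uid].append(oid)  # group child rows under their parent in Python
    return result
```

Both functions return the same mapping, but with 10 users the first sends 11 queries and the second sends 2.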
Design principle: Databases are the performance bottleneck for most APIs:
- Connection establishment is time-consuming (connection pools reuse connections);
- Full-table scans are slow (indexes reduce lookup complexity from O(n) to O(log n));
- N+1 queries cause many database round trips (eager loading replaces them with one or two queries).
Together, these three optimizations can reduce database latency by over 60%.
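The O(n) versus O(log n) difference shows up directly in a query planner. This stdlib sqlite3 sketch (the orders table and index name are hypothetical; PostgreSQL's EXPLAIN plays the same role there) compares the plan for the same lookup before and after indexing user_id.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    # EXPLAIN QUERY PLAN rows have the detail text in their fourth column
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
query = "SELECT * FROM orders WHERE user_id = 42"

plan_before = query_plan(conn, query)  # no index on user_id: full table scan
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan_after = query_plan(conn, query)   # B-tree search through the new index

print(plan_before)
print(plan_after)
```

The first plan reports a SCAN of orders and the second a SEARCH using idx_orders_user_id; the exact wording varies between SQLite versions.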
7. Delegate Static Files to Nginx/CDN—Don’t Overburden FastAPI
How to implement: Use Nginx as a reverse proxy for static resources, and pair it with a CDN for large-scale projects:
```nginx
server {
    listen 80;
    server_name api.example.com;

    # Nginx handles static files with a 1-day cache
    location /static/ {
        root /path/to/app;
        expires 1d;
    }

    # Forward API requests to FastAPI
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
    }
}
```
Design principle: FastAPI is an application server, and serving static files through Python is far less efficient than through Nginx, which can be over 10 times faster at this job. Nginx uses an event-driven, non-blocking model specifically optimized for static file transmission, and CDNs distribute content through edge nodes to further reduce user latency.
8. Streamline Middleware to Reduce Request Interception Overhead
How to implement: Retain only core middleware (e.g., CORS, authentication) and remove debugging middleware:
```python
from fastapi.middleware.cors import CORSMiddleware

# Retain only CORS middleware and specify allowed origins and methods
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://example.com"],  # Avoid the wildcard * to reduce security risks
    allow_credentials=True,
    allow_methods=["GET", "POST"],  # Open only necessary methods
)
```
Design principle: Middleware intercepts every request and response, so each additional middleware adds another layer of processing to every request. If a middleware performs blocking I/O (e.g., synchronous logging to a file), it can also block the event loop. Streamlining middleware can reduce request chain latency by 15% to 20%.
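What "another layer per middleware" means can be made concrete with a pure-ASGI sketch that needs no framework. TraceMiddleware, build and simulate_request are hypothetical names; each middleware wraps the next ASGI app, so every layer's code runs twice per request, once on the way in and once on the way out.

```python
import asyncio
from typing import Any, Awaitable, Callable

Message = dict[str, Any]
ASGIApp = Callable[..., Awaitable[None]]

async def endpoint(scope: dict, receive, send) -> None:
    # Minimal ASGI app that always answers 200 "ok"
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"ok"})

class TraceMiddleware:
    """Hypothetical pure-ASGI middleware that logs when it is entered and exited."""

    def __init__(self, app: ASGIApp, name: str, log: list[str]) -> None:
        self.app, self.name, self.log = app, name, log

    async def __call__(self, scope: dict, receive, send) -> None:
        self.log.append(f"enter {self.name}")  # runs on every request
        await self.app(scope, receive, send)    # hand off to the next layer
        self.log.append(f"exit {self.name}")    # runs again on the way out

def build(n_layers: int, log: list[str]) -> ASGIApp:
    app: ASGIApp = endpoint
    for i in range(n_layers):
        app = TraceMiddleware(app, f"mw{i}", log)  # each wrap adds one more layer
    return app

async def simulate_request(app: ASGIApp) -> list[Message]:
    sent: list[Message] = []

    async def receive() -> Message:
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message: Message) -> None:
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": "/"}, receive, send)
    return sent

if __name__ == "__main__":
    log: list[str] = []
    asyncio.run(simulate_request(build(3, log)))
    print(log)
```

With three layers a single request produces six log entries; the last layer wrapped is the outermost and runs first.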
9. Avoid Calling Synchronous Functions in Asynchronous Views to Prevent Blocking
How to implement:
- Prioritize asynchronous libraries (use aiohttp instead of requests);
- If synchronous functions are unavoidable, wrap them with asyncio.to_thread:
```python
import asyncio

import requests  # Synchronous library: cannot be called directly in async views

@app.get("/sync-data")
async def get_sync_data():
    # Execute the synchronous call in a thread pool without blocking the event loop
    resp = await asyncio.to_thread(requests.get, "https://api.example.com/sync-data", timeout=10)
    return resp.json()
```
Design principle: A blocking synchronous call inside an async view occupies the event loop thread, so all other asynchronous tasks queue behind it. asyncio.to_thread (Python 3.9+) runs the function in a separate worker thread and awaits its result, so the event loop keeps processing other requests; this lets you keep using synchronous libraries without sacrificing concurrency.
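A small experiment makes the difference visible without any HTTP library. In this sketch, heartbeat, blocking_call and ticks_during_call are hypothetical names, and time.sleep stands in for requests.get. A heartbeat task ticks every 50 ms while a 0.3 s synchronous call runs either directly in the coroutine or through asyncio.to_thread.

```python
import asyncio
import time

async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    # Stands in for "other requests": records a tick every 50 ms while the loop is free
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)

def blocking_call() -> str:
    time.sleep(0.3)  # stands in for requests.get or another synchronous library call
    return "done"

async def ticks_during_call(offload: bool) -> int:
    ticks: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    before = len(ticks)
    if offload:
        await asyncio.to_thread(blocking_call)  # loop keeps running other tasks meanwhile
    else:
        blocking_call()  # loop is frozen for the full 0.3 s
    during = len(ticks) - before
    stop.set()
    await hb
    return during

if __name__ == "__main__":
    print("direct call:", asyncio.run(ticks_during_call(False)))
    print("to_thread:  ", asyncio.run(ticks_during_call(True)))
```

The direct call lets no heartbeat ticks through, because the event loop never regains control, while the to_thread version records several.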
10. Use Profiling Tools to Identify Bottlenecks—Avoid Blind Optimization
How to implement:
- Use cProfile to analyze slow requests;
- Use Prometheus + Grafana for metric monitoring:
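```python
import cProfile

@app.get("/profile-me")
async def profile_me():
    # Development only: while enabled, the profiler also records any other coroutines
    # the event loop runs in this thread, so profile one request at a time
    pr = cProfile.Profile()
    pr.enable()
    result = await some_expensive_operation()  # Business logic to be analyzed (placeholder)
    pr.disable()
    pr.print_stats(sort="cumulative")  # Sort by cumulative time to identify bottlenecks
    return result
```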
Design principle: The premise of optimization is identifying bottlenecks—adding caching to non-time-consuming functions is meaningless. Profiling tools accurately locate time-consuming points (e.g., a SQL query accounting for 80% of latency), while monitoring tools detect online issues (e.g., sudden latency spikes during peak hours), ensuring targeted optimization.
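Outside a running server, the same cProfile calls can be pointed at a single function, and pstats can write the report to a string (for example, to send it to a log) instead of stdout. In this sketch, handler, slow_sum and fast_part are hypothetical stand-ins for request logic.

```python
import cProfile
import io
import pstats

def slow_sum(n: int) -> int:
    # Deliberately slow pure-Python loop: the hotspot the profiler should surface
    total = 0
    for i in range(n):
        total += i * i
    return total

def fast_part() -> int:
    return sum(range(1_000))

def handler() -> int:
    # Hypothetical request handler mixing a slow and a fast step
    return slow_sum(300_000) + fast_part()

def profile_report(func, limit: int = 5) -> str:
    """Run func under cProfile and return the top `limit` rows by cumulative time."""
    with cProfile.Profile() as pr:  # context-manager form needs Python 3.8+
        func()
    buf = io.StringIO()
    pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()

if __name__ == "__main__":
    print(profile_report(handler))
```

In the report, slow_sum should sit directly under handler with nearly all of its cumulative time, which points at the loop to optimize.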
Summary
The core logic of FastAPI performance optimization is to "reduce blocking, reuse resources, and avoid redundant work". From code-level optimizations like async/await and simplified models, to deployment-level improvements such as server combinations and CDNs, and data-level enhancements like caching and database optimization, implementing these tips end-to-end will enable your FastAPI service to maintain low latency and high throughput even under high concurrency.
Leapcell: The Best of Serverless Web Hosting
Finally, I recommend Leapcell —the ideal platform for deploying Python services:
🚀 Build with Your Favorite Language
Develop effortlessly in JavaScript, Python, Go, or Rust.
🌍 Deploy Unlimited Projects for Free
Only pay for what you use—no requests, no charges.
⚡ Pay-as-You-Go, No Hidden Costs
No idle fees, just seamless scalability.
🔹 Follow us on Twitter: @LeapcellHQ