Machine Learning Model Deployment using FastAPI on Kubernetes

Charu Makhijani
9 min read · Apr 3, 2023


ML model as a microservice

Photo by charlesdeluvio on Unsplash

What is FastAPI?

FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.7+ based on standard Python type hints. It was created by Sebastián Ramírez and first released in December 2018.

FastAPI is designed to be easy to use, fast, and scalable. It is built on top of Starlette for the web parts and Pydantic for the data parts. It leverages the power of Python’s type annotations and async/await syntax to provide automatic data validation, serialization, and documentation. FastAPI also includes the automatic generation of OpenAPI and JSON Schema documentation, and it supports GraphQL through third-party integrations such as Strawberry.

One of the key benefits of FastAPI is its performance. It uses asynchronous programming and a modern web server, such as uvicorn or Hypercorn, to handle high concurrency with ease.

FastAPI is also very easy to learn and use, and its developer-friendly approach can lead to faster and more productive development cycles.

Advantages

  1. Fast Performance: FastAPI is designed to be fast and has been benchmarked as one of the fastest Python web frameworks available. It leverages the power of asynchronous programming and modern web servers like Uvicorn to handle high concurrency with ease.
  2. Easy to Use: FastAPI is designed to be easy to use and learn. Its developer-friendly approach, based on Python’s type annotations, enables developers to build APIs quickly and with fewer errors.
  3. Automatic Documentation: FastAPI includes the automatic generation of OpenAPI and JSON Schema documentation, which can save developers a lot of time and effort in documenting their APIs.
  4. Data Validation: FastAPI uses Pydantic for data validation and serialization, which can help reduce errors in data input and output.
  5. Asynchronous Requests: FastAPI can handle many requests simultaneously using asynchronous programming, which can greatly increase the number of requests that can be processed in parallel leading to faster response times and improved user experience.
  6. Better error handling and custom messages: FastAPI automatically generates detailed error messages when an exception is raised, which can help developers quickly identify and fix errors in their code. FastAPI also allows customized error messages using Python’s standard exception-handling mechanisms, or by defining custom exception handlers within the application.
  7. GraphQL Support: FastAPI can also expose GraphQL, a query language for APIs that can help developers build more efficient and flexible APIs, through third-party libraries such as Strawberry.
  8. Standards-based: FastAPI is built on open standards such as OpenAPI, JSON Schema, and HTTP, and on standard Python features like type hints and async/await syntax, making it easy to integrate with other technologies.
  9. Production Ready: FastAPI is production-ready and has been used successfully in many high-traffic, high-demand applications.

Steps to Deploy using FastAPI

FastAPI Setup

The setup for FastAPI is the same as for any other Python module: install the fastapi package, and also the uvicorn server needed to run it.

Creating Basic API

To create a basic FastAPI app, we’ll follow these steps:

  1. Import uvicorn and FastAPI
  2. Declare FastAPI instance
  3. Create a method with a root path and return a text message
  4. Create the main method to run the uvicorn server on host 127.0.0.1 at port 8000.

Test API

To test the FastAPI app, run the command below in the terminal:

uvicorn app:app --reload

The command above has the following parts:

  • uvicorn refers to an ASGI web server implementation for Python.
  • The first app refers to the name of the file (app.py).
  • The second app refers to the declared FastAPI instance.
  • --reload restarts the server automatically whenever the code changes.

The next step is to go to http://127.0.0.1:8000/ address in your browser.

Image from Author

Here you will see the message “First Route”, because the root path is called.

Deploy ML Model

Let’s take an example to understand the ML model deployment with FastAPI. Here we’ll create a machine learning model for Income Prediction, then build an API & deploy it using FastAPI with uvicorn.

Create ML Model

As this post is about deployment only, I won’t cover the model creation part here, but you can get the complete code and dataset for creating the Income Prediction ML model from this git repo. Once you create the ML model, save it. Also save any encoders you fit during model creation, as the same encodings must be applied to the API request.
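As a hedged sketch of that last point, the snippet below trains a toy model and persists it together with a fitted encoder using joblib. The data, feature names, and file names are illustrative stand-ins, not the actual Income Prediction pipeline from the repo:

```python
import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Toy stand-in data; the real features come from the Income Prediction dataset
workclass = ["Private", "Self-emp", "Private", "Gov"]
hours_per_week = [[40], [50], [35], [45]]
income = [0, 1, 0, 1]  # 0 = <=50K, 1 = >50K

# Fit the encoder once during training...
workclass_encoder = LabelEncoder().fit(workclass)

model = LogisticRegression().fit(hours_per_week, income)

# ...and persist both, so the API can apply the exact same
# encodings to incoming requests at prediction time
joblib.dump(model, "income_model.joblib")
joblib.dump(workclass_encoder, "workclass_encoder.joblib")
```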

Request Body Structure

The input data is passed from the client to the API in the request body. In FastAPI, Pydantic models are used to define the structure of the input data. Pydantic performs all the type checking on the input parameters and returns an error for invalid input. Let’s add a data class to our existing code and create a route for the request body:

The route function declares a parameter “data” of the type “Details” defined above. This “Details” model inherits from Pydantic’s BaseModel and provides data validation. To test this route, I am using the Thunder Client VS Code extension to make a POST request to our API’s “/apiv3/” route:

Test API

To test the API we’ll use Swagger UI. Go to http://127.0.0.1:8000/docs address and you’ll land on this page with the default root method and predict method.

Image by Author

Click on the Predict option and it’ll open the request body.

Image by Author

Here you’ll see the request schema with all the input parameters for model prediction. Now click on the “Try it out” option in the top right corner and it’ll take you to the request input page.

Image by Author

On this page fill in all the input fields for the Income Prediction model and click on Execute. It’ll take you to the response page.

Image by Author

The response above shows the prediction from the Income Prediction ML model with the above input parameters. It’ll also show the response code from the execution of the ML model.

If you are getting a response similar to the above with response code 200, then you have successfully deployed your ML model using FastAPI. Congrats!

That’s it for API creation using FastAPI. You can also check the automatic documentation created by FastAPI in the browser at http://127.0.0.1:8000/redoc

Image by Author

FastAPI vs Flask

FastAPI and Flask are excellent web frameworks for building web applications in Python.

FastAPI is a modern web framework that is designed for high performance, scalability, and built-in support for asynchronous programming. It also has automatic data validation and built-in API documentation, making it well-suited for building API-driven applications that require high concurrency and performance.

Flask, on the other hand, is a more traditional web framework that is relatively easy to learn and use. It has a larger ecosystem of plugins and libraries and is well-suited for small to medium-sized applications that do not require high levels of concurrency or performance. Flask also has better support for template rendering and serving web pages.

Key Differences

  1. Performance: FastAPI is generally considered to be faster than Flask due to its use of asynchronous programming and modern web servers, which can handle high concurrency with ease. Flask, on the other hand, uses a more traditional synchronous approach, which can be slower in high-traffic situations.
  2. Ease of Use: FastAPI is designed to be easy to use, particularly for developers who are familiar with Python’s type annotations and async/await syntax. Flask, while also relatively easy to use, may require more boilerplate code and configuration for some tasks.
  3. Documentation: FastAPI includes the automatic generation of OpenAPI and JSON Schema documentation, which can save developers a lot of time and effort in documenting their APIs. Flask does not include this functionality by default, although there are third-party libraries that can be used to generate documentation.
  4. Scalability: FastAPI’s asynchronous approach and support for modern web servers make it highly scalable, particularly for handling large numbers of concurrent requests. Flask can also scale to some extent but may require additional configuration and optimization to handle high traffic.

Ultimately, the choice between FastAPI and Flask depends on the specific requirements and goals of your project, your familiarity with Python and web development, and your preference for synchronous or asynchronous programming. Therefore, it is not accurate to say that one framework is inherently better than the other.

Limitations

While FastAPI has many advantages, it also has some limitations that developers should be aware of. Here are a few:

  1. Learning Curve: While FastAPI is designed to be easy to use, it does require some familiarity with Python’s type annotations and async/await syntax. Developers who are new to these concepts may require some additional learning time to become proficient with the framework.
  2. Limited Ecosystem: FastAPI is a relatively new framework. As such, it has a smaller ecosystem of third-party plugins and libraries compared to more established frameworks like Flask or Django.
  3. Limited ORM Support: While FastAPI works well with ORMs like SQLAlchemy and Tortoise ORM, it does not have the same level of support for database management as some other frameworks.
  4. Asynchronous Programming Can be Complex: While FastAPI’s support for asynchronous programming is one of its key advantages, it can also be more complex than traditional synchronous programming, particularly for developers who are not familiar with this approach.
  5. Limited Support for Template Rendering: FastAPI is primarily designed for building APIs, and as such, it does not have built-in support for rendering HTML templates. While it is possible to use external libraries for this purpose, it may be less convenient than other frameworks that include built-in support for template rendering.

Overall, while FastAPI has some limitations, it is a powerful and flexible framework that can be an excellent choice for building high-performance APIs. Developers should carefully consider their project requirements and the capabilities of the framework before deciding whether to use FastAPI or another Python web framework.

Conclusion

Machine Learning model deployment on Kubernetes using FastAPI provides a powerful combination of high-performance web development and scalable containerization technology. With FastAPI’s built-in support for asynchronous programming, automatic data validation, and API documentation generation, developers can quickly build and deploy machine learning models as scalable web services.

By leveraging Kubernetes as the container orchestration platform, machine learning models deployed using FastAPI can scale horizontally to handle large numbers of requests and provide high availability. Additionally, Kubernetes provides a robust set of features for managing containerized applications, including automated deployment, rolling updates, and self-healing.
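As a minimal sketch, a containerized FastAPI service could be deployed with a Deployment and Service like the following. All names, the image tag, the replica count, and the port numbers are assumptions, not values from this article:

```yaml
# deployment.yaml - a minimal sketch, applied with: kubectl apply -f deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: income-prediction-api
spec:
  replicas: 3                    # horizontal scaling: 3 identical pods
  selector:
    matchLabels:
      app: income-prediction-api
  template:
    metadata:
      labels:
        app: income-prediction-api
    spec:
      containers:
        - name: api
          image: income-prediction-api:latest   # assumed image built from the FastAPI app
          ports:
            - containerPort: 8000               # the port uvicorn listens on
---
apiVersion: v1
kind: Service
metadata:
  name: income-prediction-api
spec:
  selector:
    app: income-prediction-api
  ports:
    - port: 80
      targetPort: 8000
```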

Overall, deploying machine learning models using FastAPI and Kubernetes is a powerful way to bring scalable, high-performance web services to your machine learning applications.

To access the complete source code to deploy the ML model as a microservice using FastAPI, please refer to the GitHub link.


Thanks for the read. If you like the story please like, share, and follow for more such content. As always, please reach out for any questions/comments/feedback.

Github: https://github.com/charumakhijani
LinkedIn:
https://www.linkedin.com/in/charumakhijani/

