Saturday, May 28, 2022

A First Course on Deploying Python Projects



After all the hard work developing a project in Python, we want to share it with other people, be they friends or colleagues. They may not be interested in the code, but they want to run it and make some real use of it. For example, you created a regression model that can predict a value from input features, and your friend wants to provide their own features and see what value the model predicts. But as your Python project gets larger, it is no longer as simple as sending over a small script: there can be many supporting files, multiple scripts, and dependencies on a list of libraries. Getting all of these right on the recipient's machine can be a challenge.

After finishing this tutorial, you will learn:

How to harden your code for easier deployment by making it a module
How to create a package for your module so we can rely on pip to manage the dependencies
How to use the venv module to create reproducible running environments

Let’s get started!

Photo by Kelly L. Some rights reserved.

Overview

This tutorial is divided into four parts:

From development to deployment
Creating modules
From module to package
Using venv for your project

From development to deployment

When we finish a project in Python, sometimes we do not want to shelve it but rather make it a routine job. We may have finished training a machine learning model and now actively use it for prediction. We may have built a time series model for next-step prediction, but new data comes in every day, so we need to re-train it regularly to keep future predictions accurate.

Whatever the reason, we need to make sure the program runs as expected. However, this can be harder than it sounds. A simple Python script may not be an issue, but as our program grows with more dependencies, a lot of things can go wrong. For example, a newer version of a library that we use can break the workflow. Or our Python script may run an external program that ceases to work after an OS upgrade. Another case is when the program depends on files located at a specific path, and we accidentally delete or rename one of them.

There are always ways our program can fail to execute, but there are techniques to make it more robust and reliable.
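For instance, if the program depends on data or model files, one cheap safeguard is to check for them up front and report clearly what is missing, rather than failing with a cryptic traceback later. A minimal sketch (the file names are illustrative):

```python
import os

def missing_files(paths):
    """Return the subset of required files that do not exist."""
    return [path for path in paths if not os.path.exists(path)]

# Illustrative file names; a real project would list its own assets
missing = missing_files(["data.json", "model.pickle"])
if missing:
    print("Cannot run: missing required files:", missing)
```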

Creating modules

In a previous post, we demonstrated that we can check a code snippet’s time to finish with the following command:

python -m timeit -s 'import numpy as np' 'np.random.random()'

and at the same time, we can also use it as part of a script and do the following:

import timeit
import numpy as np

time = timeit.timeit("np.random.random()", globals=globals())
print(time)

The import statement in Python allows you to reuse functions defined in another file by treating it as a module. You may wonder how we can make a module not only provide functions but also act as an executable program. This is the first step toward deploying our code: if we can make our module executable, its users will not need to understand how our code is structured in order to use it.

If our program is large enough to span multiple files, it is better to package it as a module. A module in Python is usually a folder of Python scripts, often with a clear entry point. This makes it more convenient to send to other people and easier to understand the flow. Moreover, we can add a version to the module and let pip keep track of the version installed.

A simple, single-file program can be written as follows:

import random

def main():
    n = random.random()
    print(n)

if __name__ == "__main__":
    main()

If we save this as randomsample.py in the local directory, we can either run it with

python randomsample.py

or

python -m randomsample

And we can reuse the functions in another script with

import randomsample

randomsample.main()

This works because the magic variable __name__ is "__main__" only when the script is run as the main program, not when it is imported from another script. With this, your machine learning project could be packaged as follows:

regressor/
    __init__.py
    data.json
    model.pickle
    predict.py
    train.py

where regressor is a directory with those five files in it, and __init__.py is an empty file that signals this directory is a Python module you can import. The script train.py is as follows:

import os
import json
from sklearn.linear_model import LinearRegression

def load_data():
    current_dir = os.path.dirname(os.path.realpath(__file__))
    filepath = os.path.join(current_dir, "data.json")
    with open(filepath) as fp:
        data = json.load(fp)
    return data

def train():
    reg = LinearRegression()
    data = load_data()
    reg.fit(data["data"], data["target"])
    return reg

and that of predict.py is:

import os
import pickle
import sys
import numpy as np

def predict(features):
    current_dir = os.path.dirname(os.path.realpath(__file__))
    filepath = os.path.join(current_dir, "model.pickle")
    with open(filepath, "rb") as fp:
        reg = pickle.load(fp)
    return reg.predict(features)

if __name__ == "__main__":
    arr = np.asarray(sys.argv[1:]).astype(float).reshape(1, -1)
    y = predict(arr)
    print(y[0])

Then, we can run the following under the parent directory of regressor/ to load the data and train a linear regression model, then save the model with pickle:

import pickle
from regressor.train import train

model = train()
with open("model.pickle", "wb") as fp:
    pickle.dump(model, fp)

and if we move this pickle file into the regressor/ directory, we can also run the model on the command line:

python -m regressor.predict 0.186 0 8.3 0 0.62 6.2 58 1.96 6 400 18.1 410 11.5

where the numerical arguments are a vector of input features to the model. If we further move the if block out, namely, into a new file regressor/__main__.py with the following code:

import sys
import numpy as np
from .predict import predict

if __name__ == "__main__":
    arr = np.asarray(sys.argv[1:]).astype(float).reshape(1, -1)
    y = predict(arr)
    print(y[0])

then we can run the model directly from the module:

python -m regressor 0.186 0 8.3 0 0.62 6.2 58 1.96 6 400 18.1 410 11.5

Note that the line from .predict import predict in the example above uses Python's relative import syntax. This should be used inside a module to import components from other scripts of the same module.

From module to package

If you want to distribute your Python project as a final product, it is convenient to let people install it with the pip install command. This can be done easily. As you have already created a module from your project, all you need to supplement is some simple setup instructions. Create a project directory and put your module in it, together with a pyproject.toml file, a setup.cfg file, and a MANIFEST.in file. The file structure would be like this:

project/
    pyproject.toml
    setup.cfg
    MANIFEST.in
    regressor/
        __init__.py
        data.json
        model.pickle
        predict.py
        train.py

We will use setuptools, as it has become the standard for this task. The file pyproject.toml specifies setuptools as the build backend:

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

The key information is provided in setup.cfg. We need to specify the name of the module, the version, an optional description, what to include, and what to depend on, such as the following:

[metadata]
name = mlm_demo
version = 0.0.1
description = a simple linear regression model

[options]
packages = regressor
include_package_data = True
python_requires = >=3.6
install_requires =
    scikit-learn==1.0.2
    numpy>=1.22, <1.23
    h5py

and MANIFEST.in specifies what extra files to include. In projects that include no non-Python files, this file can be omitted, but in our case we need to include the trained model and the data file:

include regressor/data.json
include regressor/model.pickle

Then, at the project directory, we can install it as a module into our Python system with the following command:

pip install .

and afterwards the following code works anywhere, as regressor is now a module accessible in our Python installation:

import numpy as np
from regressor.predict import predict

X = np.asarray([[0.186,0,8.3,0,0.62,6.2,58,1.96,6,400,18.1,410,11.5]])
y = predict(X)
print(y[0])

There are a few details worth explaining in setup.cfg. The metadata section is for the pip system; hence we named our package mlm_demo, and that is the name you will see in the output of the pip list command. However, Python's module system recognizes the module name as regressor, as specified in the options section, so this is the name you should use in the import statement. Often these two names are the same, for the convenience of the users, which is why people use the terms "package" and "module" interchangeably. Similarly, the version 0.0.1 appears in pip but is not known from the code. It is a convention to put it in __init__.py in the module directory so you can check the version from another script that uses it:

__version__ = '0.0.1'
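Assuming that line lives in regressor/__init__.py, any script that imports the module can read it back. The sketch below builds a throwaway stand-in package in a temporary directory so it is self-contained; in a real deployment you would simply import your installed module:

```python
import os
import sys
import tempfile

# Build a stand-in "regressor" package whose __init__.py carries the version,
# so this sketch runs without the real package installed
pkg_root = tempfile.mkdtemp()
os.mkdir(os.path.join(pkg_root, "regressor"))
with open(os.path.join(pkg_root, "regressor", "__init__.py"), "w") as fp:
    fp.write("__version__ = '0.0.1'\n")
sys.path.insert(0, pkg_root)

import regressor
print(regressor.__version__)  # 0.0.1
```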

The install_requires part of the options section is the key to making our project run. It means that when we install this module, those other modules (at those versions, if specified) must be installed too. This may create a tree of dependencies, but pip takes care of it when you run the pip install command. As you might expect, we use the comparison operator == for a specific version. If we can accept multiple versions, we separate the conditions with a comma (,), as in the case of numpy above.
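You can check how pip interprets such a specifier with the packaging library (which ships alongside pip and setuptools in most installations); a minimal sketch:

```python
from packaging.specifiers import SpecifierSet

spec = SpecifierSet(">=1.22, <1.23")  # same constraint as numpy above
print("1.22.4" in spec)  # True: satisfies both conditions
print("1.23.0" in spec)  # False: excluded by the upper bound
```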

Now you can ship the entire project directory to other people (e.g., in a ZIP file); they can install it with pip install . in the project directory and then run your code with python -m regressor, given appropriate command line arguments.

A final note: perhaps you have heard of the requirements.txt file in a Python project. It is just a text file, usually placed in a directory with a Python module or some Python scripts. It has a format similar to the dependency specification mentioned above. For example, it may look like:

scikit-learn==1.0.2
numpy>=1.22, <1.23
h5py

It is aimed at the case where you do not want to make your project into a package but still want to give hints about the libraries and versions your project expects. This file is understood by pip, and we can use it to set up our system in preparation for the project:

pip install -r requirements.txt

but this is intended for a project in development, and that is all the convenience requirements.txt can provide.
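As an aside, if you want requirements.txt to pin the exact versions currently installed in your environment, rather than ranges written by hand, pip can generate the file for you:

```shell
pip freeze > requirements.txt
```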

Using venv for your project

The above is probably the most efficient way to ship and deploy a project, since you include only the most essential files. It is also the recommended way because it is platform-agnostic: should we change our Python version or move to a different OS, it still works (unless some specific dependency forbids it).

But there are cases where we want to reproduce an exact environment for our project to run in. For example, instead of requiring some packages to be installed, we may require that some packages not be installed. There are also cases where, after we install a package with pip, a version dependency breaks when another package is installed later. We can solve these problems with the venv module in Python.

The venv module is part of the Python standard library and allows us to create virtual environments. It is not a virtual machine or the kind of virtualization Docker provides; instead, it heavily modifies the paths in which Python operates. For example, we can install multiple versions of Python in our OS, but within a virtual environment the python command always means one particular version. As another example, within a virtual environment we can run pip install to set up packages in the virtual environment's directory, without interfering with the system outside.

To start with venv, we can simply find a good location and run the command

$ python -m venv myproject

Then a directory named myproject will be created. A virtual environment is supposed to operate in a shell (so that environment variables can be manipulated). To activate a virtual environment, we execute the activation shell script with the following command (e.g., under bash or zsh on Linux and macOS):

$ source myproject/bin/activate

and afterwards, you are in the Python virtual environment. The python command will be the one you used to create the virtual environment (useful in case you have multiple Python versions installed in your OS). The installed packages will be located under myproject/lib/python3.9/site-packages (assuming Python 3.9). When you run pip install or pip list, you see only the packages inside the virtual environment.
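A script can also tell whether it is running inside a virtual environment: when a venv is active, sys.prefix points into the environment directory, while sys.base_prefix still points at the original installation. A small sketch:

```python
import sys

# True inside an activated virtual environment, False otherwise
in_venv = sys.prefix != sys.base_prefix
print("Running inside a virtual environment:", in_venv)
```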

To leave the virtual environment, we run deactivate in the shell command line:

$ deactivate

and this is defined as a shell function.

Using virtual environments is particularly useful if you have multiple projects in development that require different versions of packages (such as different versions of TensorFlow). You can simply create a virtual environment, activate it, install the correct versions of all the libraries you need with the pip install command, and then put your project code inside the virtual environment. The virtual environment directory can be huge (e.g., just installing TensorFlow with its dependencies consumes almost 1GB of disk space), but shipping the entire virtual environment directory to others can guarantee the exact environment to execute your code. This can be an alternative to a Docker container if you prefer not to run the Docker server.
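Putting the steps above together, a typical session looks like this (directory and package names are illustrative; the project-specific commands are shown as comments):

```shell
python3 -m venv myproject          # create the environment
source myproject/bin/activate      # activate it (bash/zsh)
# pip install .                    # then install the project and its dependencies
# python -m regressor 0.186 0 8.3 0 0.62 6.2 58 1.96 6 400 18.1 410 11.5
deactivate                         # leave the environment
```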

Further Reading

Indeed, some other tools exist to help us deploy projects neatly. Docker, mentioned above, is one. The zipapp module from the Python standard library is another interesting tool. Below are resources on the topic if you are looking to go deeper.

Articles

Python tutorial, Chapter 6, modules
Distributing Python Modules
How to package your Python code
Question about various venv-related packages on StackOverflow

APIs and software

Setuptools
venv from Python standard library

Summary

In this tutorial, you have seen how we can confidently wrap up a project and deliver it to other users to run. Specifically, you learned:

The minimal change to a folder of Python scripts to make them a module
How to convert a module into a package for pip
What is a virtual environment in Python and how to use it



The post A First Course on Deploying Python Projects appeared first on Machine Learning Mastery.
