Thursday, April 25, 2024

Exploring the Python Ecosystem



Python is a neat programming language because its syntax is simple, clear, and concise. But Python would not be so successful without its rich collection of third-party libraries. Python is so popular for data science and machine learning that it has become a de facto lingua franca in those fields, largely because so many libraries exist for those tasks. Without those libraries, Python would be far less powerful.

After finishing this tutorial, you will know:

Where Python libraries are installed on your system
What PyPI is and how a library repository can help your project
How to use the pip command to install a library from the repository

Let’s get started.

Exploring the Python Ecosystem
Photo by Vinit Srivastava. Some rights reserved.

Overview

This tutorial is in five parts; they are:

The Python ecosystem
Python libraries location
The pip command
Search for a package
Host your own repository

The Python ecosystem

In the old days before the Internet, a language and its libraries were separate. When you learned C from a textbook, you would not see anything to help you read a CSV file or open a PNG image. The same was true in the early days of Java. If you needed anything not included in the official libraries, you had to search for it in various places, and how to download or install each library was specific to its vendor.

It would be far more convenient to have a central repository that hosts many libraries, lets us install any of them with a unified interface, and allows us to check for new versions from time to time. Even better, we could search the repository by keyword to discover libraries that can help our project. CPAN is an example of such a library repository for Perl. Similarly, we have CRAN for R, RubyGems for Ruby, npm for Node.js, and Maven for Java. For Python, we have PyPI (the Python Package Index), https://pypi.org/.

PyPI is platform-agnostic. If you installed Python on Windows using the installer from python.org, you have the pip command to access PyPI. If you used Homebrew on a Mac to install Python, you have the same pip command. The same holds for the built-in Python on Ubuntu Linux, although there pip may need to be installed first (e.g., via the python3-pip package).

As a repository, PyPI has almost anything you can think of, from large libraries like TensorFlow and PyTorch to small ones like minimal. Because of the vast number of libraries available on PyPI, you can easily find tools that implement important components of your projects. As a result, Python has a strong and growing ecosystem of libraries that makes it more powerful every day.

Python libraries location

When we need a library in our Python scripts, we use

import module_name

But how does Python know where to find the module and load it into our script? Similar to how the bash shell in Linux or the command prompt in Windows looks for a command to execute, Python searches a list of paths to locate the module to load. At any time, we can check this list by printing sys.path (after importing the sys module). For example, in a Mac installation of Python via Homebrew,

import sys
print(sys.path)

prints the following:

['',
'/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python39.zip',
'/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9',
'/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/lib-dynload',
'/usr/local/lib/python3.9/site-packages']

This means that if you run import my_module, Python first looks for my_module in the current directory (the first element, the empty string; when you run a script file, this first entry is the script's directory instead). If it is not found there, Python checks inside the zip file given as the second element, then the directory given as the third element, and so on. The final path, /usr/local/lib/python3.9/site-packages, is usually where third-party libraries are installed. The second, third, and fourth elements are where the standard library is located.
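To see which file an import would actually pick up, you can ask the import system directly. The sketch below uses importlib.util.find_spec(), which runs the same search as the import statement but does not execute the module. The helper name module_location and the module name no_such_module_xyz are made up for illustration.

```python
import importlib.util
import sys


def module_location(name):
    """Return where a module would be loaded from, or None if it is not found.

    importlib.util.find_spec() performs the same search that `import` uses
    (including the sys.path entries) without executing the module.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None        # not found anywhere on the search path
    return spec.origin     # a file path, or "built-in" for modules compiled into Python


if __name__ == "__main__":
    for entry in sys.path:
        print(repr(entry))
    print(module_location("json"))                # a standard library package
    print(module_location("sys"))                 # compiled into the interpreter
    print(module_location("no_such_module_xyz"))  # hypothetical, not installed
```

For a package such as json, the origin is the package's __init__.py inside one of the standard library directories listed in sys.path.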

If you have extra libraries installed elsewhere, you can set the PYTHONPATH environment variable to point to them. On Linux and Mac, for example, we can run the following command in the shell:

$ PYTHONPATH="/tmp:/var/tmp" python print_path.py

where print_path.py contains the two-line code above. Running this command prints the following:

['', '/tmp', '/var/tmp',
'/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python39.zip',
'/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9',
'/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/lib-dynload',
'/usr/local/lib/python3.9/site-packages']

We can see that Python searches /tmp, then /var/tmp, before checking the standard library and the installed third-party libraries. When we set the PYTHONPATH environment variable, we use a colon “:” to separate multiple paths. In case you are not familiar with the shell syntax, the command line above, which defines the environment variable and runs the Python script, can be broken into two commands (the difference being that export keeps the variable set for the rest of the shell session):

$ export PYTHONPATH="/tmp:/var/tmp"
$ python print_path.py

If you’re using Windows, you need to do this instead:

C:\> set PYTHONPATH=C:\temp;D:\temp

C:\> python print_path.py

That is, on Windows we use a semicolon “;” to separate the paths.
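The separator difference between platforms is exposed in Python as os.pathsep (":" on Linux and Mac, ";" on Windows). The sketch below, assuming you want to check the effect of PYTHONPATH from Python itself, starts a fresh interpreter with the variable set and returns that interpreter's sys.path. The function name sys_path_with_pythonpath is hypothetical, and two temporary directories stand in for /tmp and /var/tmp.

```python
import json
import os
import subprocess
import sys
import tempfile


def sys_path_with_pythonpath(extra_dirs):
    """Start a new interpreter with PYTHONPATH set and return its sys.path."""
    env = dict(os.environ)
    # os.pathsep is ":" on Linux/macOS and ";" on Windows
    env["PYTHONPATH"] = os.pathsep.join(extra_dirs)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Two throwaway directories play the role of /tmp and /var/tmp
    first, second = tempfile.mkdtemp(), tempfile.mkdtemp()
    for entry in sys_path_with_pythonpath([first, second]):
        print(repr(entry))
```

The PYTHONPATH entries show up in the child's sys.path in the order given, ahead of the standard library directories.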

Note: It is not recommended, but you can modify sys.path in your script before the import statement. Python will then search the new locations for the import, but doing so ties your script to particular paths. In other words, your script may not run on another computer.
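A minimal sketch of that technique, shown only to make the mechanism concrete: it writes a hypothetical module named my_helper into a temporary directory, prepends that directory to sys.path, and then imports the module. In real code the hard-coded directory is exactly what makes a script non-portable.

```python
import importlib
import pathlib
import sys
import tempfile

# A throwaway directory holding a hypothetical module named my_helper
extra_dir = tempfile.mkdtemp()
pathlib.Path(extra_dir, "my_helper.py").write_text("GREETING = 'hello from my_helper'\n")

# Not recommended for real projects: prepend the directory at run time
sys.path.insert(0, extra_dir)
importlib.invalidate_caches()  # let the import system notice the newly created file

my_helper = importlib.import_module("my_helper")
print(my_helper.GREETING)   # hello from my_helper
print(my_helper.__file__)   # a path inside extra_dir
```

Because the directory is inserted at position 0, it is searched before everything else, including the standard library.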

The pip command

The last path in the sys.path printed above is where your third-party libraries are normally installed. The pip command is how you download a library from the Internet and install it to that location. The simplest syntax is:

pip install scikit-learn pandas

This installs two packages, scikit-learn and pandas. Later, you may want to upgrade a package when a new version is released. The syntax is:

pip install -U scikit-learn

where -U (short for --upgrade) means to upgrade. To find out which packages are outdated, use the command:

pip list --outdated

It prints a list of all installed packages that have a newer version on PyPI, such as the following:

Package                      Version    Latest   Type
---------------------------- ---------- -------- -----
absl-py                      0.14.0     1.0.0    wheel
anyio                        3.4.0      3.5.0    wheel
...
xgboost                      1.5.1      1.5.2    wheel
yfinance                     0.1.69     0.1.70   wheel

Without --outdated, the pip list command shows all installed packages and their versions. You can optionally show where each package is installed with the -v option, such as the following:

$ pip list -v
Package                      Version    Location                               Installer
---------------------------- ---------- -------------------------------------- ---------
absl-py                      0.14.0     /usr/local/lib/python3.9/site-packages pip
aiohttp                      3.8.1      /usr/local/lib/python3.9/site-packages pip
aiosignal                    1.2.0      /usr/local/lib/python3.9/site-packages pip
anyio                        3.4.0      /usr/local/lib/python3.9/site-packages pip
...
word2number                  1.1        /usr/local/lib/python3.9/site-packages pip
wrapt                        1.12.1     /usr/local/lib/python3.9/site-packages pip
xgboost                      1.5.1      /usr/local/lib/python3.9/site-packages pip
yfinance                     0.1.69     /usr/local/lib/python3.9/site-packages pip
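If you need similar information from inside a Python program, the standard library's importlib.metadata module can enumerate installed distributions. The sketch below is not how pip itself works; it is an approximation of the Package, Version, and Location columns, and the function name installed_packages is hypothetical.

```python
from importlib import metadata


def installed_packages():
    """Map each installed distribution to (version, location), like `pip list -v`."""
    packages = {}
    for dist in metadata.distributions():  # walks the sys.path entries in order
        name = dist.metadata.get("Name")
        if not name or name in packages:   # keep the first copy found, as import does
            continue
        # locate_file("") resolves to the directory that holds the package's files
        packages[name] = (dist.version, str(dist.locate_file("")))
    return dict(sorted(packages.items(), key=lambda item: item[0].lower()))


if __name__ == "__main__":
    for name, (version, location) in installed_packages().items():
        print(f"{name:28} {version:10} {location}")
```

This only reads local metadata; checking for newer versions, as pip list --outdated does, requires contacting PyPI.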

In case you need to check the summary of a package, you can use the pip show command, e.g.,

$ pip show pandas
Name: pandas
Version: 1.3.4
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: The Pandas Development Team
Author-email: pandas-dev@python.org
License: BSD-3-Clause
Location: /usr/local/lib/python3.9/site-packages
Requires: numpy, python-dateutil, pytz
Required-by: bert-score, copulae, datasets, pandas-datareader, seaborn, statsmodels, ta, textattack, yfinance

This gives you some information such as the home page, where you installed it, as well as what other packages it depends on and the packages depending on it.
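Much of what pip show prints comes from the package's installed metadata, which importlib.metadata can also read. The sketch below collects a few of those fields; the function name show is hypothetical, and it does not compute the Required-by list, which would require scanning and parsing the requirements of every installed package.

```python
from importlib import metadata


def show(name):
    """Collect some of the fields `pip show` prints, from installed metadata."""
    meta = metadata.metadata(name)  # raises PackageNotFoundError if not installed
    return {
        "Name": meta.get("Name"),
        "Version": meta.get("Version"),
        "Summary": meta.get("Summary"),
        "Home-page": meta.get("Home-page"),
        "License": meta.get("License"),
        # Requirement strings as written in the metadata, or None if there are none
        "Requires": metadata.requires(name),
    }


if __name__ == "__main__":
    try:
        for field, value in show("pandas").items():
            print(f"{field}: {value}")
    except metadata.PackageNotFoundError:
        print("pandas is not installed in this environment")
```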

When you need to remove a package (e.g., to free up disk space), you can simply run

pip uninstall tensorflow
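If you want to install, upgrade, or remove packages from a Python program, pip's documentation advises against importing pip's internals and suggests running it as a subprocess instead. The sketch below builds such a command with python -m pip under the current interpreter, so the packages land in that interpreter's site-packages directory even if several Pythons are installed. The helper names pip_command and run_pip are hypothetical.

```python
import subprocess
import sys


def pip_command(action, packages, upgrade=False):
    """Build a pip command line that runs pip under the current interpreter."""
    cmd = [sys.executable, "-m", "pip", action]
    if action == "install" and upgrade:
        cmd.append("-U")   # same as --upgrade
    if action == "uninstall":
        cmd.append("-y")   # do not ask for confirmation
    cmd.extend(packages)
    return cmd


def run_pip(action, packages, upgrade=False):
    """Run pip; raises subprocess.CalledProcessError if pip fails."""
    subprocess.run(pip_command(action, packages, upgrade), check=True)


if __name__ == "__main__":
    print(pip_command("install", ["scikit-learn", "pandas"]))
    print(pip_command("install", ["scikit-learn"], upgrade=True))
    print(pip_command("uninstall", ["tensorflow"]))
    # run_pip("install", ["scikit-learn"], upgrade=True)  # needs network access
```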

One final note on the pip command: packages come in two types, those distributed as source code and those distributed as binaries. The difference matters only when part of a module is written not in Python but in another language (e.g., C or Cython) and needs to be compiled before use. A source package is compiled on your machine, while a binary distribution is already compiled but specific to a platform (e.g., 64-bit Windows). Binary packages are usually distributed as “wheel” files. pip can install wheel files directly, and having the wheel package installed as well lets older setups build and cache wheels for packages that ship only as source:

pip install wheel

A large package such as TensorFlow can take many hours to compile from scratch. Therefore, it is advisable to have wheel installed and to use wheel packages whenever they are available.
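The platform a wheel was built for is encoded in its file name, following the convention {distribution}-{version}(-{build tag})-{python tag}-{abi tag}-{platform tag}.whl from the wheel specification. The sketch below splits a name into those parts; the parser and the example file names are hypothetical, and real tools (e.g., the packaging library) validate the tags more strictly.

```python
import sysconfig
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel file name into its components (no validation of the tags)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:      # no optional build tag
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:    # with a build tag
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return WheelName(name, version, build, py, abi, plat)


if __name__ == "__main__":
    # Hypothetical file names: a pure-Python wheel and a compiled one
    print(parse_wheel_filename("example_pkg-1.0.0-py3-none-any.whl"))
    print(parse_wheel_filename("example_ext-2.1.0-cp39-cp39-win_amd64.whl"))
    print("this interpreter's platform:", sysconfig.get_platform())
```

A tag set like py3-none-any marks a pure-Python wheel usable anywhere, while cp39-cp39-win_amd64 marks one compiled for CPython 3.9 on 64-bit Windows.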

Search for a package

The pip search command no longer works in newer versions of pip, because the search service it relied on imposed too much load on the PyPI system and was disabled.

Instead, you can look for a package using the search box on the PyPI website. When you type in a keyword, such as “gradient boosting”, it shows you many packages that contain the keyword somewhere, and you can click on each one for more details (usually including code examples) to determine which one fits your need.
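The website search can also be reached by URL, and once you know a project's name, PyPI's JSON API returns its metadata. The sketch below assumes the search page lives at https://pypi.org/search/ with a q parameter and uses the https://pypi.org/pypi/<name>/json endpoint; the helper names are hypothetical, and the JSON API looks up one known project rather than searching.

```python
import json
import urllib.parse
import urllib.request


def pypi_search_url(keywords):
    """URL of the PyPI website's search results for the given keywords."""
    return "https://pypi.org/search/?" + urllib.parse.urlencode({"q": keywords})


def pypi_project_info(name):
    """Fetch a project's latest metadata from PyPI's JSON API (needs network access)."""
    url = f"https://pypi.org/pypi/{urllib.parse.quote(name)}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        info = json.load(response)["info"]
    return {"name": info["name"], "version": info["version"], "summary": info["summary"]}


if __name__ == "__main__":
    print(pypi_search_url("gradient boosting"))
    # print(pypi_project_info("xgboost"))  # requires Internet access
```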

If you prefer the command line, you can install the pip-search package:

pip install pip-search

and then you can run the pip_search command to search with a keyword:

pip_search gradient boosting

It will not list everything on PyPI, since there could be thousands of matches, but it gives you the most relevant results. Below is the result from a Mac terminal:

Host your own repository

PyPI is a repository on the Internet, but the pip command does not have to use it exclusively. If for some reason you want your own PyPI server (for example, hosted inside your corporate network so that pip does not need to go beyond your firewall), you can try the pypiserver package:

pip install pypiserver

Following the package’s documentation, you can set up your server using the pypi-server command, then upload your packages and start serving. The details of configuring and setting up your own server are too long to cover here, but in essence the server provides an index of available packages in a format the pip command understands, and serves a package for download when pip requests it.

If you have your own server, you can install a package from it with

pip install pandas --index-url https://192.168.0.234:8080

where the address after --index-url is the URL (host and port number) of your own server.
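Instead of typing --index-url on every command, pip can read the setting from an INI-style configuration file with a [global] section; the pip config set global.index-url command writes the same setting for you. The sketch below only renders such a file with configparser, assuming the server address from the example above; where pip looks for the file (for instance a per-user pip.conf or pip.ini) depends on the platform, so check the pip documentation before saving it.

```python
import configparser
import io


def pip_config_text(index_url, trusted_host=None):
    """Render a pip configuration file that points pip at a private index."""
    config = configparser.ConfigParser()
    config["global"] = {"index-url": index_url}
    if trusted_host:
        # Typically only needed for plain HTTP or a self-signed certificate
        config["global"]["trusted-host"] = trusted_host
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(pip_config_text("https://192.168.0.234:8080"))
```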

PyPI is not the only repository. If you installed Python with Anaconda, you have an alternative package manager, conda. Its syntax is similar (in many cases, replacing pip with conda works as expected). Keep in mind, however, that they are two different systems that work independently.

Further reading

This section provides more resources on the topic if you are looking to go deeper.

pip documentation, https://pip.pypa.io/en/stable/
Python package index, https://pypi.org/
pypiserver package, https://pypi.org/project/pypiserver/

Summary

In this tutorial, you discovered the pip command and how it brings the abundant packages of the Python ecosystem to your project. Specifically, you learned:

How to look for a package on PyPI
How Python manages its libraries on your system
How to install, upgrade, and remove a package on your system
How to host your own version of PyPI in your network



The post Exploring the Python Ecosystem appeared first on Machine Learning Mastery.
