Many customers operate in restricted environments with limited egress connectivity to the internet. As a result, they often invest in third-party tools such as JFrog Artifactory or Nexus to store operating system packages and libraries. There is a pressing need to download these dependencies without reaching the internet, and to avoid investing in a third-party tool when budget or time is constrained.
In this blog, we describe how the packages.cloud.google.com subdomain works and how it starts to address these challenges. The solution focuses on downloading Debian and Ubuntu packages from the Google-managed repositories; note that the repository does not contain packages for popular programming languages such as Python or JavaScript.
So let’s get started.
Apt package manager
If you create a Linux VM on Google Cloud with a Debian or Ubuntu operating system, one of the first commands you have to run before installing a package is apt-get update, which downloads package information from all configured sources.
Apt is a package management tool that downloads packages from one or more software repositories (sources) and installs them onto your computer. A repository is generally a network server, such as the official Debian stable repository.
The main Apt sources configuration file is at /etc/apt/sources.list. To add custom sources, creating separate files under /etc/apt/sources.list.d/ is preferred.
Understanding the configuration file
Let’s start with the files in the /etc/apt directory and the /etc/apt/sources.list file.
The /etc/apt/sources.list file contains multiple entries, each of which shows the archive type, repository URL, distribution, and component. For more details on each attribute for the Debian distribution, refer to the Debian documentation on the sources.list format.
Archive type: The first word on each line, deb or deb-src, indicates the type of archive. deb indicates that the archive contains binary packages (.deb), the pre-compiled packages that we normally use. deb-src indicates source packages, which are the original program sources plus the Debian control file (.dsc) and the diff.gz containing the changes needed for packaging the program. Source packages provide all of the files necessary to compile or otherwise build the desired piece of software.
Repository URL: The next entry on the line is the URL of the repository that you want to download packages from. The Debian project publishes the main list of repository mirrors.
Distribution: The distribution can be either the release code name or alias (stretch, buster, bullseye, bookworm, sid) or the release class (oldoldstable, oldstable, stable, testing, unstable).
Component: main consists of DFSG-compliant packages (packages that meet the Debian Free Software Guidelines), which do not rely on software outside this area to operate. These are the only packages considered part of the Debian distribution.
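To make the four fields concrete, here is a minimal shell sketch that splits one sources.list entry into its parts. The function name describe_source_entry and the sample line are illustrative only, not copied from a real VM, and the sketch assumes the simple one-line form without bracketed options.

```shell
# Split a one-line sources.list entry into its four fields.
# Assumes the simple form "type url distribution component..." with no
# bracketed options in front of the URL.
describe_source_entry() {
  printf '%s\n' "$1" | awk '{
    printf "archive type: %s\n", $1
    printf "repository URL: %s\n", $2
    printf "distribution: %s\n", $3
    printf "components:"
    for (i = 4; i <= NF; i++) printf " %s", $i
    printf "\n"
  }'
}

# Illustrative entry; real files usually contain several such lines.
describe_source_entry "deb http://deb.debian.org/debian bullseye main"
```

An entry can list more than one component after the distribution, which is why the sketch prints every remaining field on the last line.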
Google startup process
Let’s see what Google adds under the sources.list.d directory as part of the startup process. There are a couple of files, and both contain links to Google-managed repositories (packages.cloud.google.com).
However, the repositories that are added by default only help us download gcloud CLI components such as google-cloud-sdk-datalab, google-cloud-sdk-spanner-emulator, and kubectl.
For example, to learn which repository a package would be downloaded from, run apt-cache policy with the package name; it shows the candidate version and the repository that apt would use.
For these packages, apt looks in the repositories that are configured by default in gce_sdk.list and google-cloud.list.
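As a rough sketch of that lookup, the snippet below runs apt-cache policy for kubectl (one of the packages named above) and trims the output to the candidate version and the repository lines. The helper name summarize_policy is hypothetical, and the filter assumes the usual layout of apt-cache policy output, so treat it as a convenience rather than a stable interface.

```shell
# Reduce `apt-cache policy` output (read on stdin) to the candidate version
# and the repository lines. Assumes the usual output layout, in which a
# repository line looks like: "<priority> <url> <distribution/component> ...".
summarize_policy() {
  awk '
    $1 == "Candidate:" { print "candidate: " $2 }
    $2 ~ /^https?:\/\// { print "priority " $1 " from " $2 " " $3 }
  '
}

# Only runs where apt is available, for example on a Debian or Ubuntu VM.
if command -v apt-cache >/dev/null 2>&1; then
  apt-cache policy kubectl | summarize_policy
fi
```

Running apt-cache policy on its own, without the filter, shows the full version table, including any locally installed version.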
However, if we run sudo apt-get update, it fails when the VM has no egress connectivity to the internet: apt tries to connect to the external Debian repository that is configured by default in /etc/apt/sources.list, and the connection times out.
Packages.cloud.google.com – Apt mirror repo
Packages.cloud.google.com is a Google-maintained domain that hosts mirror repositories for popular Debian and Ubuntu releases. See the table below for the mapping between the OS release codenames and the OS versions.
Please note that Ubuntu repositories are subdivided into base (no suffix), updates (-updates), security (-security), backports (-backports), universe (-universe), security universe (-security-universe), and updates universe (-updates-universe). This subdivision has to be followed when configuring repositories on Ubuntu instances.
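The naming scheme can be sketched as a small loop that expands one Ubuntu codename into the seven repository names listed above. The function name ubuntu_suites is hypothetical and jammy is used only as an example codename; the sketch produces names, not full repository URLs.

```shell
# Expand an Ubuntu codename into the seven repository names described above:
# base (no suffix) plus the six suffixed subdivisions.
ubuntu_suites() {
  codename=$1
  for suffix in "" -updates -security -backports -universe \
                -security-universe -updates-universe; do
    printf '%s%s\n' "$codename" "$suffix"
  done
}

# Example codename; substitute the release your instance runs.
ubuntu_suites jammy
```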
Packages.cloud.google.com – demo
For the rest of this demo, I will work on a Debian VM. I will verify which release I am running and modify the apt sources accordingly to point to the right URLs. The approach shown in the subsequent steps can be extended to Ubuntu as long as you point to the appropriate repository URLs, following the Ubuntu-specific repository structure described earlier.
I will create a new file named google-packages.list under the /etc/apt/sources.list.d directory that points to the appropriate repository URLs, following the semantics of the sources.list format explained earlier.
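A minimal sketch of that step follows. The mirror path in MIRROR_BASE and the choice of three suites (base, -updates, -security) are assumptions made for illustration and are not confirmed by this post; check them against the repository listing for your release before relying on them. The function name write_google_sources is also hypothetical.

```shell
# Write a sources file that points apt at the Google-hosted mirror.
# ASSUMPTION: the default MIRROR_BASE path and the suite layout are
# illustrative; confirm both against the repository listing.
write_google_sources() {
  codename=$1   # for example, the VERSION_CODENAME value in /etc/os-release
  target=$2     # for example, /etc/apt/sources.list.d/google-packages.list
  mirror_base=${MIRROR_BASE:-http://packages.cloud.google.com/mirror/cloud-apt}
  for suite in "$codename" "$codename-updates" "$codename-security"; do
    printf 'deb %s/%s %s main\n' "$mirror_base" "$suite" "$suite"
  done > "$target"
}

# Dry run into a scratch file so nothing under /etc/apt is touched.
scratch=$(mktemp)
write_google_sources bullseye "$scratch"
cat "$scratch"
rm -f "$scratch"
```

On the VM itself, the same function would be run as root with the real target path, followed by sudo apt-get update.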
Now that I have configured the alternate repository, let’s test it by installing a Debian package, htop.
Since I still have a file at /etc/apt/sources.list that refers to the Debian mirrors, apt-get update will still try that location first before moving on to the new packages.cloud.google.com mirror repositories.
When multiple Apt repositories are enabled, a package can exist in several of them. To know which one should be installed, Apt assigns priorities to packages. The default is 500. If the packages have the same priority, the package with a higher version number (most recent) wins. If packages have different priorities, the one with the higher priority wins.
Since our package is now installed into the local OS, we see a priority (100) for locally installed packages.
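The selection rule described above (higher priority wins, and the higher version wins on a tie) can be modelled with a short sketch. The input lines, with their versions and origin labels, are made up for illustration, and GNU sort -V only approximates Debian version ordering; apt itself uses dpkg's comparison rules.

```shell
# Model of the rule stated above: read "priority version origin" lines on
# stdin and print the line that wins. Requires GNU sort for the -V key.
pick_candidate() {
  sort -k1,1nr -k2,2Vr | head -n 1
}

# Made-up candidates: two repositories at the default priority 500 and the
# locally installed copy at priority 100.
printf '%s\n' \
  "500 3.0.5-7 mirror-a" \
  "500 3.2.2-2 mirror-b" \
  "100 3.0.5-7 installed" | pick_candidate
```

With these sample lines, the two priority-500 entries tie, so the newer version from mirror-b is selected.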
Prerequisites
To use packages.cloud.google.com, you also need to set up the following networking configuration in Google Cloud.
1. Ensure that the subnet where the VM is created has Private Google Access enabled.
2. Create a firewall rule that allows egress to the private VIP. Packages.cloud.google.com is only supported by the Private Google API endpoint.
3. Create DNS records that resolve the packages.cloud.google.com domain to the private VIP.
4. Create a route to the Private Google API endpoints. This is necessary if you do not have a default route to the internet (0.0.0.0/0).
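The four steps above could be scripted with the gcloud CLI roughly as follows. This is a sketch under stated assumptions: the network, subnet, region, and resource names are placeholders, the address range is the one documented for private.googleapis.com, and the DNS zone layout is one possible arrangement rather than the only valid one. The function is defined but not called, because calling it creates real resources.

```shell
# ASSUMPTION: every name below is a placeholder; replace it with your own.
NETWORK=${NETWORK:-my-vpc}
SUBNET=${SUBNET:-my-subnet}
REGION=${REGION:-us-central1}
PRIVATE_VIP_RANGE=199.36.153.8/30   # private.googleapis.com

configure_private_access() {
  # 1. Enable Private Google Access on the subnet.
  gcloud compute networks subnets update "$SUBNET" \
    --region="$REGION" --enable-private-ip-google-access

  # 2. Allow egress to the private VIP (port 80 only matters for http:// sources).
  gcloud compute firewall-rules create allow-private-google-apis \
    --network="$NETWORK" --direction=EGRESS --action=ALLOW \
    --rules=tcp:80,tcp:443 --destination-ranges="$PRIVATE_VIP_RANGE"

  # 3. Private DNS zone and A record for packages.cloud.google.com.
  gcloud dns managed-zones create packages-cloud-google-com \
    --dns-name=packages.cloud.google.com. --visibility=private \
    --networks="$NETWORK" --description="Private zone for the apt mirror"
  gcloud dns record-sets create packages.cloud.google.com. \
    --zone=packages-cloud-google-com --type=A --ttl=300 \
    --rrdatas=199.36.153.8,199.36.153.9,199.36.153.10,199.36.153.11

  # 4. Route the VIP range through the default internet gateway.
  gcloud compute routes create private-google-apis \
    --network="$NETWORK" --destination-range="$PRIVATE_VIP_RANGE" \
    --next-hop-gateway=default-internet-gateway
}
```

After setting NETWORK, SUBNET, and REGION for your project, running configure_private_access applies all four steps in order.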
Summary
In this blog post, we provided an overview of the packages.cloud.google.com subdomain and how it can be used to download software packages for Debian and Ubuntu distributions. We also covered the networking requirements needed to make it work in a tightly controlled environment. To view the contents of the repository, please refer to this link.