Monday, February 26, 2024

DotSlash: Simplified executable deployment

We’ve open sourced DotSlash, a tool that makes large executables available in source control with a negligible impact on repository size, thus avoiding I/O-heavy clone operations.
With DotSlash, a set of platform-specific executables is replaced with a single script containing descriptors for the supported platforms. DotSlash handles transparently fetching, decompressing, and verifying the appropriate remote artifact for the current operating system and CPU.
At Meta, the overwhelming majority of DotSlash files are generated and committed to source control via automation, so we are also releasing a complementary GitHub Action to assemble a comparable setup outside of Meta.
DotSlash is written in Rust for performance and is cross-platform.

At Meta, we have a vast array of first-party and third-party command line tools that need to be available across a diverse range of developer environments. Reliably getting the appropriate version of each tool to the right place can be a challenging task.

For example, the source code for many of our first-party tools lives alongside the projects that leverage them inside our massive monorepo. For such tools, the standard practice is to use buck2 run to build and run executables from source, as necessary. This has the advantage that tools and the projects that use them can be updated atomically in a single commit.

While we use extensive caching and remote execution to provide our developers with fast builds, there will always be cases where buck2 run is going to be considerably slower than running the prebuilt binary directly. While we leverage a virtual filesystem that reduces the drawbacks of checking large binaries into source control compared to a traditional physical filesystem, there are still pathological cases that are best avoided by keeping such files out of the repository in the first place. (This practice also eliminates a large class of code provenance issues.)

Further, not everything we use is built from source, nor do all of our tools live in source control. For example, there is the case of buck2 itself, which needs to be pre-built for developers and readily available on the $PATH for convenience. For core developer tools like Buck2 and Sapling, we use a Chef recipe to deploy new versions, installing them in /usr/local/bin (or somewhere within the appropriate %PATH% on Windows) across a variety of developer environments.

While this approach is reasonable for commonly-used executables, it is not a great fit for the long tail of tools. That is, while it might be convenient to install everything a developer might need in /usr/local/bin by default, this could easily add up to tens or hundreds of gigabytes of disk, very little of which will end up being executed, in practice. In turn, this makes Chef runs more expensive and prone to failure.

Introducing DotSlash

DotSlash attempts to solve many of the problems described in the previous section. While we do not claim it is a silver bullet, we have found it to be the right solution for many of our internal use cases. At Meta, DotSlash is executed hundreds of millions of times per day to deliver a mix of first-party and third-party tools to end-user developers as well as hermetic build environments.

The idea is fairly simple: we replace the contents of a set of platform-specific, heavyweight executables with a single lightweight text file that can be read by the dotslash command line tool (which must be installed on the user’s $PATH). We call such a file a DotSlash file. It contains the information DotSlash needs to fetch and run the executable it replaces for the host platform. By convention, a DotSlash file maintains the name of the original file rather than calling attention to itself via a custom file extension. Instead, it aspires to be a transparent wrapper for the original executable. To that end, a DotSlash file is required to start with #!/usr/bin/env dotslash (even on Windows) to help maintain this illusion.

The following is a hypothetical DotSlash file named node that is designed to run v18.19.0 of Node.js. Note that users across x86 Linux, x86 macOS, and ARM macOS can all run the same DotSlash file, as DotSlash will take care of doing the work to select the appropriate executable for the host on which it is being run. In this way, DotSlash simplifies the work of cross-platform releases: 

The workflow DotSlash runs through when executing node in this example is summarized below; see the How DotSlash Works documentation for details.

Because of how #! works on macOS and Linux, when a user runs ./node --version, the invocation effectively becomes dotslash ./node --version. DotSlash requires that its first argument be a file that starts with #!/usr/bin/env dotslash, as mentioned above. Once it verifies the header, it uses a lenient JSON parser to read the rest of the file. DotSlash then finds the entry in the “platforms” section that corresponds to the host it is running on.
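The header check, lenient parse, and platform lookup described above can be sketched in a few lines of Python. This is an illustrative approximation, not DotSlash's actual Rust implementation: the function names are hypothetical, "lenient" is assumed to mean tolerating comments and trailing commas, and the mapping from host OS and CPU to platform keys is inferred from the keys that appear in the examples in this post.

```python
import json
import platform

HEADER = "#!/usr/bin/env dotslash"


def to_strict_json(text: str) -> str:
    """Drop // and /* */ comments and trailing commas outside string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped character verbatim
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            while i < n and text[i] != "\n":
                i += 1
            continue  # the newline itself is handled on the next iteration
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        elif ch in "}]":
            # Remove a trailing comma (skipping whitespace) before the closer.
            j = len(out) - 1
            while j >= 0 and out[j].isspace():
                j -= 1
            if j >= 0 and out[j] == ",":
                del out[j]
            out.append(ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def parse_dotslash_file(text: str) -> dict:
    """Verify the required first line, then parse the remainder leniently."""
    first_line, _, rest = text.partition("\n")
    if first_line.rstrip() != HEADER:
        raise ValueError("not a DotSlash file: missing header")
    return json.loads(to_strict_json(rest))


def host_platform_key() -> str:
    """Assumed mapping from the host OS/CPU to keys such as 'linux-x86_64'."""
    os_name = {"Linux": "linux", "Darwin": "macos", "Windows": "windows"}
    arch = {"x86_64": "x86_64", "AMD64": "x86_64",
            "arm64": "aarch64", "aarch64": "aarch64"}
    return f"{os_name[platform.system()]}-{arch[platform.machine()]}"


def select_entry(doc: dict, key: str = "") -> dict:
    """Return the platform entry for the given key (default: the host)."""
    key = key or host_platform_key()
    if key not in doc.get("platforms", {}):
        raise LookupError(f"no entry for platform {key!r}")
    return doc["platforms"][key]


EXAMPLE = '''#!/usr/bin/env dotslash

{
  "name": "node-v18.19.0",
  "platforms": {
    "macos-aarch64": {
      "size": 40660307,
      // comments and trailing commas are tolerated
      "providers": [
        {"url": "https://nodejs.org/dist/v18.19.0/node-v18.19.0-darwin-arm64.tar.gz"},
      ],
    },
    /* other platforms omitted for brevity */
  }
}
'''
```

Calling select_entry(parse_dotslash_file(EXAMPLE), "macos-aarch64") returns the entry for ARM macOS. Note that the comment stripper has to track string literals, because URLs inside strings contain "//".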

DotSlash uses the information in this entry and hashes it to compute a corresponding file path (that doubles as a key) in the user’s local DotSlash cache. DotSlash attempts to exec the corresponding file, replacing argv0 with the path to the DotSlash file and forwarding the remaining command line arguments (--version, in this example) to the exec invocation.

If the target executable is in the cache, the user immediately runs Node.js as originally intended. In the event of a cache miss (indicated by exec failing with ENOENT), DotSlash uses the information from the DotSlash file to determine the URL it should use to fetch the artifact containing the executable as well as the size and digest information it should use to verify the contents. If this succeeds, the verified artifact is atomically mv'd into the appropriate location in the DotSlash cache and the exec invocation is performed again. Note that DotSlash uses advisory file locking to avoid making duplicate requests even if DotSlash files requiring the same artifact are run concurrently.
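The exec-first, fetch-on-ENOENT flow can be approximated as follows. This is a simplified sketch under several stated assumptions rather than DotSlash's real logic: the cache layout and key derivation are invented for illustration, SHA-256 from the Python standard library stands in for BLAKE3, the artifact is treated as the executable itself (no decompression step), and the advisory file locking is omitted. The fetch argument is a hypothetical callable that returns the artifact bytes.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, List


def cache_path(cache_dir: Path, entry: dict) -> Path:
    """Derive a stable cache location by hashing the platform entry itself."""
    canonical = json.dumps(entry, sort_keys=True).encode()
    key = hashlib.sha256(canonical).hexdigest()
    return cache_dir / key[:2] / key[2:]


def verify(data: bytes, entry: dict) -> None:
    """Reject an artifact whose size or digest does not match the entry."""
    if len(data) != entry["size"]:
        raise ValueError("size mismatch")
    # Assumption: entry["hash"] names an algorithm hashlib knows (e.g. sha256).
    if hashlib.new(entry["hash"], data).hexdigest() != entry["digest"]:
        raise ValueError("digest mismatch")


def install(data: bytes, entry: dict, target: Path) -> None:
    """Verify first, then move the artifact into the cache atomically."""
    verify(data, entry)
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=target.parent)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.chmod(tmp, 0o755)
    os.replace(tmp, target)  # atomic rename within a single filesystem


def run(dotslash_file: str, args: List[str], entry: dict,
        cache_dir: Path, fetch: Callable[[dict], bytes]) -> None:
    """Try to exec from the cache; on a miss, fetch, install, and retry."""
    target = cache_path(cache_dir, entry)
    argv = [dotslash_file, *args]  # argv0 is the DotSlash file, not the cache path
    try:
        os.execv(target, argv)  # does not return on success
    except FileNotFoundError:  # ENOENT signals a cache miss
        install(fetch(entry), entry, target)
        os.execv(target, argv)
```

Attempting the exec before checking for the file keeps the common cache-hit path down to a single system call, and writing to a temporary file in the destination directory before renaming means a concurrent reader never observes a partially written artifact.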

Note that it is common to have multiple DotSlash files refer to the same artifact, such as a .tar.zst file, while each DotSlash file maps to a distinct entry within the archive. For example, suppose node-v18.19.0-darwin-arm64.tar.gz is a compressed tar file that contains many entries, including node, npm, and npx. The DotSlash file for node would be as follows:

#!/usr/bin/env dotslash

{
  "name": "node-v18.19.0",
  "platforms": {
    "macos-aarch64": {
      "size": 40660307,
      "hash": "blake3",
      "digest": "6e2ca33951e586e7670016dd9e503d028454bf9249d5ff556347c3d98c347c34",
      // Note the difference from the previous example where "format": "zst" has been
      // replaced with "format": "tar.gz", which specifies what type of decompression
      // logic to use as well as the path within the decompressed archive to run when
      // this DotSlash file is executed.
      "format": "tar.gz",
      // Assuming node-v18.19.0-darwin-arm64.tar.gz contains node, npm, and npx in the
      // node-v18.19.0-darwin-arm64/bin/ folder within the archive, the following
      // is the only line that has to change in the DotSlash file that represents
      // those other executables.
      "path": "node-v18.19.0-darwin-arm64/bin/node",
      "providers": [
        {
          "url": "https://nodejs.org/dist/v18.19.0/node-v18.19.0-darwin-arm64.tar.gz"
        }
      ]
    },
    /* other platforms omitted for brevity */
  }
}

As noted in the comments, the only change in the DotSlash files for npm and npx would be the “path” entry. Because the artifact for all three DotSlash files would be the same, whichever DotSlash file was run first would fetch the artifact and put it in the cache whereas all subsequent runs of any of the three DotSlash files would leverage the cached entry.
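To make the "only the path changes" point concrete, here is a small sketch that emits the three sibling DotSlash files from one shared entry. The helper name dotslash_file_for is hypothetical and the output is strict JSON without comments; the size, digest, and URL are copied from the example above.

```python
import copy
import json

# Everything the three files share: same artifact, same digest, same provider.
BASE_ENTRY = {
    "size": 40660307,
    "hash": "blake3",
    "digest": "6e2ca33951e586e7670016dd9e503d028454bf9249d5ff556347c3d98c347c34",
    "format": "tar.gz",
    "providers": [
        {"url": "https://nodejs.org/dist/v18.19.0/node-v18.19.0-darwin-arm64.tar.gz"}
    ],
}


def dotslash_file_for(tool: str) -> str:
    """Render a DotSlash file whose only tool-specific field is "path"."""
    entry = copy.deepcopy(BASE_ENTRY)
    entry["path"] = f"node-v18.19.0-darwin-arm64/bin/{tool}"
    doc = {"name": "node-v18.19.0", "platforms": {"macos-aarch64": entry}}
    return "#!/usr/bin/env dotslash\n\n" + json.dumps(doc, indent=2) + "\n"


FILES = {tool: dotslash_file_for(tool) for tool in ("node", "npm", "npx")}
```

Diffing FILES["node"] against FILES["npm"] shows a single changed line, the "path" entry, which is why all three files resolve to the same cached artifact.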

This technique is often used to ensure that a set of complementary executables is released together. Further, because the archive will be decompressed in its own directory, it may also contain resource files (or library files, such as .dll files that need to live alongside .exe files on Windows) that will be unpacked using the directory structure specified by the archive. This also makes DotSlash a good fit for distributing executables that are not binaries, but trees of script files, which is common for Node.js or Python.

Generating DotSlash files

At Meta, most DotSlash files are produced as part of an automated build pipeline. Our continuous integration (CI) system supports special configuration for DotSlash jobs where a user must specify:

A set of builds to run (these can span multiple platforms).
The resulting generated artifacts to publish to an internal blobstore.
The DotSlash files in source control to update with entries for the new artifacts.
The conditions under which the job should be triggered (this is analogous to workflow triggers on GitHub).

The result of such a job is a proposed change to the codebase containing the updated DotSlash files. At Meta, we call such a change a “diff,” though on GitHub, this is known as a pull request. Just like an ordinary human-authored diff at Meta, putting it up for review triggers a number of jobs that include linters, automated tests, and other tools that provide signal on the proposed change. For a DotSlash diff, if all of the signals come back clean, the diff is automatically committed to the codebase without further human intervention.

See the Generating DotSlash Files at Meta documentation for details.

The script we use to generate DotSlash files injects metadata about the build job that makes it straightforward to trace the provenance of the underlying artifacts. The following is a hypothetical example of a generated DotSlash file for the CodeCompose LSP built from source at a specific commit in clang-opt mode. Note the “metadata” entries in the DotSlash file will be ignored by the dotslash CLI, but we include them as structured data so they can be parsed by other tools to facilitate programmatic audits:

#!/usr/bin/env dotslash

// @generated SignedSource<<d8621e8ccbd7a595a3018e6a070be9c0>>
// https://yarnpkg.com/package?name=signedsource can be used to
// generate and verify the above signature to flag tampering
// in generated code.

{
  "name": "code-compose-lsp",
  // Added by automation.
  "metadata": {
    "build-info": {
      "job-repo": "fbsource",
      "job-src": "dotslash/code-compose-lsp.star",
      // It is considered best practice to build the artifacts for
      // all platforms from the same commit within a DotSlash file.
      "commit": {
        "repo": "fbsource",
        "scm": "sapling",
        "hash": "0f9e3d9e189bf393f7f9d0b6879361cd76fcdcd0",
        "date": "2024-01-03 20:07:54 PST",
        "timestamp": 1704341274
      }
    }
  },
  "platforms": {
    "linux-x86_64": {
      "size": 2740736,
      "hash": "blake3",
      "digest": "fc8a3ade56a97a6e73469ade1575e8f8e33fda99fbf6df429d555e480d6453d0",
      "format": "zst",
      "providers": [
        {
          "type": "meta-cas",
          "key": "fc8a3ade56a97a6e73469ade1575e8f8e33fda99fbf6df429d555e480d6453d0:2740736"
        }
      ],
      // Added by automation.
      "metadata": {
        "build-command": [
          "buck2",
          "build",
          "--config-file",
          "//buildconfig/clang-opt",
          "//codecompose/lsp/cli:code-compose-lsp"
        ]
      }
    },
    // additional platforms…
  }
}
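As an illustration of the kind of programmatic audit this structured metadata enables, the sketch below pulls the source commit and per-platform build command out of an already-parsed DotSlash file. The provenance function and the shape of its return value are hypothetical; only the field names come from the example above.

```python
import shlex


def provenance(doc: dict) -> dict:
    """Summarize where a generated DotSlash file's artifacts came from."""
    commit = doc.get("metadata", {}).get("build-info", {}).get("commit", {})
    commands = {
        name: shlex.join(entry["metadata"]["build-command"])
        for name, entry in doc.get("platforms", {}).items()
        if "build-command" in entry.get("metadata", {})
    }
    return {
        "repo": commit.get("repo"),
        "commit": commit.get("hash"),
        "build_commands": commands,
    }


# The relevant fields from the example above, as they would look once parsed.
DOC = {
    "name": "code-compose-lsp",
    "metadata": {
        "build-info": {
            "commit": {
                "repo": "fbsource",
                "hash": "0f9e3d9e189bf393f7f9d0b6879361cd76fcdcd0",
            }
        }
    },
    "platforms": {
        "linux-x86_64": {
            "metadata": {
                "build-command": [
                    "buck2",
                    "build",
                    "--config-file",
                    "//buildconfig/clang-opt",
                    "//codecompose/lsp/cli:code-compose-lsp",
                ]
            }
        }
    },
}

REPORT = provenance(DOC)
```

Because the dotslash CLI ignores "metadata", a tool like this can evolve independently of DotSlash itself.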

Without DotSlash, a developer would have to run buck2 build --config-file //buildconfig/clang-opt //codecompose/lsp/cli:code-compose-lsp to build and run the LSP from source, which could be a slow operation depending on the size of the build, the state of the build cache, etc. With DotSlash, the developer can run the optimized LSP as quickly as they can fetch and decompress it from the specified URL, which is likely much faster than doing a build.

Another thing you may have noticed about this example is that the “key” is not an ordinary URL, but an identifier that happens to be the concatenation of the BLAKE3 hash and the size of the specified artifact. This is because “type”: “meta-cas” indicates that this artifact must be fetched via a custom provider in DotSlash, which is specialized fetching logic built into DotSlash that has its own identifier scheme. In this case, the artifact would be fetched from Meta’s in-house content-addressable storage (CAS) system, which uses the artifact hash+size as a key.
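The key scheme visible in the example, digest and size joined by a colon, is simple enough to check mechanically. The sketch below is based only on what the example shows; the function names are hypothetical, and nothing here reflects how Meta's CAS system works internally.

```python
def cas_key(entry: dict) -> str:
    """Build the 'digest:size' identifier seen in the meta-cas provider."""
    return f'{entry["digest"]}:{entry["size"]}'


def cas_keys_consistent(entry: dict) -> bool:
    """Check that every meta-cas provider key agrees with the entry's digest and size."""
    return all(
        provider.get("key") == cas_key(entry)
        for provider in entry.get("providers", [])
        if provider.get("type") == "meta-cas"
    )


ENTRY = {
    "size": 2740736,
    "hash": "blake3",
    "digest": "fc8a3ade56a97a6e73469ade1575e8f8e33fda99fbf6df429d555e480d6453d0",
    "providers": [
        {
            "type": "meta-cas",
            "key": "fc8a3ade56a97a6e73469ade1575e8f8e33fda99fbf6df429d555e480d6453d0:2740736",
        }
    ],
}
```

A check like cas_keys_consistent(ENTRY) is the sort of lint that could run on generated files before they are committed.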

While we do not provide the code for the meta-cas provider in the open source version of DotSlash, we do include one custom provider out-of-the-box beyond the default http provider.

Using DotSlash with GitHub releases

While DotSlash is generally useful for fetching an executable from an arbitrary URL and running it, we have found the combination of DotSlash and CI to be particularly powerful. To that end, we include custom tooling to facilitate generating DotSlash files for GitHub releases. To ensure DotSlash can fetch artifacts from private GitHub repositories as well as GitHub Enterprise instances, DotSlash includes a custom provider for GitHub releases that includes an appropriate authentication token when fetching artifacts.

For example, suppose you have existing workflows that build your release artifacts and publish them via gh release upload. For simplicity, let’s assume these are named linux-release, macos-release, and windows-release. To create a single DotSlash file that includes the artifacts from all three platforms, you would introduce a new GitHub Action that leverages the workflow_run trigger so it fires whenever one of these release workflows succeeds. (Note that GitHub’s documentation states: “You can’t use workflow_run to chain together more than three levels of workflows,” so check the depth of your workflow graph if your workflow is not firing.)

The .yml file to define the new workflow would look like this:

name: Generate DotSlash File

on:
  workflow_run:
    # These must match the names of the workflows that publish
    # artifacts to your GitHub Release.
    workflows: [linux-release, macos-release, windows-release]
    types:
      - completed

jobs:
  create-dotslash-file:
    name: Generating DotSlash File
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - uses: facebook/dotslash-publish-release@v1
        env:
          # This is necessary because the action uses
          # `gh release upload` to publish the generated DotSlash file(s)
          # as part of the release.
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          # Additional file that lives in your repo that defines
          # how your DotSlash file(s) should be generated.
          config: .github/workflows/dotslash-config.json
          # Tag for the release to target.
          tag: ${{ github.event.workflow_run.head_branch }}

Because inputs to GitHub Actions are limited to string values, facebook/dotslash-publish-release takes config, which is a path to a JSON file in the repo that supports a rich set of configuration options for generating the DotSlash files. The other required input is the ID of the release, which in GitHub, is defined by a Git tag. When the action is run, it will check to see whether all of the artifacts specified in the config are present in the release, and if so, will generate the appropriate DotSlash files and add them to the release.

For example, consider an open source project like Hermes where a release includes a number of platform-specific .tar.gz files, each containing a handful of executables (hermes, hdb, etc.). To create a separate DotSlash file for each executable, the JSON configuration for the action would be:

{
  "outputs": {

    "hermes": {
      "platforms": {
        "macos-x86_64": {
          "regex": "^hermes-cli-darwin-",
          "path": "hermes"
        },
        "macos-aarch64": {
          "regex": "^hermes-cli-darwin-",
          "path": "hermes"
        },
        "linux-x86_64": {
          "regex": "^hermes-cli-linux-",
          "path": "hermes"
        },
        "windows-x86_64": {
          "regex": "^hermes-cli-windows-",
          "path": "hermes.exe"
        }
      }
    },

    "hdb": {
      "platforms": {
        "macos-x86_64": {
          "regex": "^hermes-cli-darwin-",
          "path": "hdb"
        },
        "macos-aarch64": {
          "regex": "^hermes-cli-darwin-",
          "path": "hdb"
        },
        "linux-x86_64": {
          "regex": "^hermes-cli-linux-",
          "path": "hdb"
        },
        "windows-x86_64": {
          "regex": "^hermes-cli-windows-",
          "path": "hdb.exe"
        }
      }
    },

    // Additional entries for hvm, hbcdump, and hermesc…

  }
}

Each entry in “outputs” corresponds to the name of a DotSlash file that will be added to the release. The “platforms” for each entry defines the “platforms” that should be present in the generated DotSlash file. The action uses the “regex” to identify the file in the GitHub release that should be used as the backing artifact for the entry. Assuming the artifact is an “archive” of some sort (.tar.gz, .tar.zst, etc.), the “path” indicates the path within the archive that the DotSlash file should run.
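A rough sketch of how such a config could be resolved against the files in a release is shown below. It is not the action's actual code: the resolve_assets helper is hypothetical, the rule that each regex must match exactly one asset is an assumption, and the darwin and windows asset names are made up by analogy with the Linux one.

```python
import re
from typing import Dict, List, Optional

CONFIG = {
    "outputs": {
        "hermes": {
            "platforms": {
                "macos-x86_64": {"regex": "^hermes-cli-darwin-", "path": "hermes"},
                "macos-aarch64": {"regex": "^hermes-cli-darwin-", "path": "hermes"},
                "linux-x86_64": {"regex": "^hermes-cli-linux-", "path": "hermes"},
                "windows-x86_64": {"regex": "^hermes-cli-windows-", "path": "hermes.exe"},
            }
        }
    }
}


def resolve_assets(config: dict, asset_names: List[str]) -> Optional[Dict[str, dict]]:
    """Map each output/platform to its backing asset, or None if any is missing."""
    resolved: Dict[str, dict] = {}
    for output, spec in config["outputs"].items():
        for platform_key, rule in spec["platforms"].items():
            pattern = re.compile(rule["regex"])
            matches = [name for name in asset_names if pattern.search(name)]
            if len(matches) != 1:  # assumption: exactly one asset must match
                return None
            resolved.setdefault(output, {})[platform_key] = {
                "asset": matches[0],
                "path": rule["path"],
            }
    return resolved


# Hypothetical release contents; only the Linux name appears in this post.
ASSETS = [
    "hermes-cli-darwin-v0.12.0.tar.gz",
    "hermes-cli-linux-v0.12.0.tar.gz",
    "hermes-cli-windows-v0.12.0.tar.gz",
]

RESOLVED = resolve_assets(CONFIG, ASSETS)
```

Returning None while any artifact is still missing mirrors the behavior described earlier: the workflow fires after each platform's release job, and DotSlash files are only generated once every expected artifact is present.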

In this particular case, Hermes does not provide an ARM-specific binary for macOS, so the “macos-aarch64” entry is the same as the “macos-x86_64” one. If that changes in the future, a simple update to “regex” to distinguish the two binaries is all that is needed.

Note that the action will take responsibility for computing the digest for each binary. In this example, the resulting DotSlash file for hermes would be:

#!/usr/bin/env dotslash

{
  "name": "hermes",
  "platforms": {
    "linux-x86_64": {
      "size": 47099598,
      "hash": "blake3",
      "digest": "8d2c1bcefc2ce6e278167495810c2437e8050780ebb4da567811f1d754ad198c",
      "format": "tar.gz",
      "path": "hermes",
      "providers": [
        {
          "url": "https://github.com/facebook/hermes/releases/download/v0.12.0/hermes-cli-linux-v0.12.0.tar.gz"
        },
        {
          "type": "github-release",
          "repo": "facebook/hermes",
          "tag": "v0.12.0",
          "name": "hermes-cli-linux-v0.12.0.tar.gz"
        }
      ],
    },
    // additional platforms…
  }
}

Note that there are two entries in the “providers” section for the Linux artifact. When DotSlash fetches an artifact, it will try the providers in order until one succeeds. Regardless of which provider is used, the downloaded binary will be verified against the specified “hash”, “digest”, and “size” values.

In this case, the first provider is an ordinary, public URL that can be fetched using curl --location, but the second is an example of a custom provider discussed earlier. The “type”: “github-release” line indicates that the GitHub provider for DotSlash should be used, which shells out to the GitHub CLI (gh, which must be installed separately from DotSlash) to fetch the artifact instead of curl. Because facebook/hermes is a public GitHub repository, the first provider should be sufficient here. However, if the repository were private and the fetch required authentication, we would expect the first provider to fail and DotSlash would fall back to the GitHub provider. Assuming the user had run gh auth login in advance to configure credentials for the specified repo, DotSlash would be able to fetch the artifact using gh release download.
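The ordered fallback between providers might look roughly like the following. This is a sketch, not DotSlash's implementation: the function names are hypothetical, and the exact curl and gh flags beyond curl --location and gh release download are assumptions about how such commands could be assembled. The run parameter is injected so the logic can be exercised without touching the network.

```python
from typing import Callable, List, Optional


def provider_command(provider: dict, dest: str) -> List[str]:
    """Build the command line that would fetch the artifact for one provider."""
    kind = provider.get("type", "http")  # a provider with only "url" is plain HTTP
    if kind == "http":
        return ["curl", "--location", "--fail", "--output", dest, provider["url"]]
    if kind == "github-release":
        return [
            "gh", "release", "download", provider["tag"],
            "--repo", provider["repo"],
            "--pattern", provider["name"],
            "--output", dest,
        ]
    raise ValueError(f"unsupported provider type: {kind!r}")


def fetch_with_fallback(providers: List[dict], dest: str,
                        run: Callable[[List[str]], int]) -> Optional[dict]:
    """Try each provider in order; return the first that exits with status 0."""
    for provider in providers:
        if run(provider_command(provider, dest)) == 0:
            return provider
    return None


PROVIDERS = [
    {"url": "https://github.com/facebook/hermes/releases/download/v0.12.0/hermes-cli-linux-v0.12.0.tar.gz"},
    {
        "type": "github-release",
        "repo": "facebook/hermes",
        "tag": "v0.12.0",
        "name": "hermes-cli-linux-v0.12.0.tar.gz",
    },
]
```

In real use, run could be something like lambda cmd: subprocess.run(cmd).returncode, and whichever provider succeeds, the downloaded file would still need to pass the size and digest checks before being placed in the cache.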

By publishing DotSlash files as part of GitHub releases, users can copy them to their own repositories to “vendor in” a specific version of your tool with minimal effect on their repository size, regardless of how large your releases might be.

Try DotSlash Today 

Visit the DotSlash site for more in-depth documentation and technical details. The site includes instructions on Installing DotSlash so you can start playing with it firsthand. 

We also encourage you to check out the DotSlash source code and provide feedback via GitHub issues. We look forward to hearing from you!

The post DotSlash: Simplified executable deployment appeared first on Engineering at Meta.
