With the AWS Command Line Interface (AWS CLI) for Amazon Relational Database Service (Amazon RDS) and Amazon Aurora, you can use shell commands to create, modify, delete, and generate reports for many kinds of objects, such as database instances, Aurora clusters, parameter groups, and so on. In this post, we show how to use the AWS CLI to produce automation scripts for Aurora and Amazon RDS. This post is intended for you if you’re hesitant about using the AWS CLI commands because you aren’t an expert with JavaScript Object Notation (JSON), if you’re in a DevOps role and you aren’t familiar with every attribute of each RDS resource, or if you’d like to produce more automated and customized reports than you get in the dashboard or details pages on the Amazon RDS console. To make things more digestible for database programmers and shell scripters, we show how to simplify the output from AWS CLI commands as if you were doing a SQL query.
This post is part 1 of a series. In subsequent posts, you’ll learn how to create your own administration and reporting utilities to cover scenarios outside the basic create-modify-delete cycle. You can build on these techniques to quickly write an AWS CLI command to deal with management issues for your Aurora clusters, RDS instances, and related resources such as parameter groups and proxies. For example, you can dynamically construct database connection strings, or loop through all the DB instances in an Aurora cluster and apply the same operation to each one.
Why learn the AWS CLI?
The AWS CLI is an effective learning environment for managing AWS resources:
It’s easy to experiment in an interactive and iterative way.
Knowing the AWS CLI commands and their options helps demystify the choices on the AWS console and accelerate your console learning.
Learning the AWS CLI also gives you a head start in writing management utilities using programming languages such as Python and Java. The API, parameter, and data structure names that you see in each of the language SDKs are similar to command and parameter names in the AWS CLI.
You’ll understand what’s happening behind the scenes in tools that create, modify, and delete AWS resources. That helps you debug problems and take advantage of the advanced capabilities in such tools.
This post focuses mostly on one aspect that will help you learn the AWS CLI faster and be more productive with it: the query language bundled into the –query option. Knowing the query notation lets you explore AWS resources such as DB clusters, DB instances, versions, and instance classes all from your operating system command line, the same way you would explore a database schema through a SQL client tool.
Several Amazon RDS CLI commands have a describe- prefix in their names. These commands are purely read-only, the equivalent of database SELECT statements. We use these describe statements in most of the examples in this post.
Prerequisites
To follow along with the examples, you need to have the AWS CLI installed and configured with your AWS credentials and a default Region. Some of the examples use syntax that requires the AWS CLI v2. For instructions on installing, upgrading, or configuring the AWS CLI, refer to Install or update the latest version of the AWS CLI.
It’s helpful to already have in your AWS account some AWS managed databases such as Aurora DB clusters or RDS DB instances.
JSON output from the AWS CLI commands
When an AWS CLI command produces output, by default it’s in the form of a document in JSON format. The JSON document represents a nested data structure. The notation with delimiters like [ ] and { } looks familiar to JavaScript, Python, and Java programmers. All those languages have the notions of arrays (known as lists in Python) and hashes (also known as associative arrays, dictionaries, hash tables, or key-value pairs).
If the JSON format seems complicated, just picture it as the representation of a SQL result set. The JSON document contains an array that’s like the rows of the result set. Each array element contains a set of key-value pairs that are like the column names and values.
First example of JSON output
Let’s take a closer look at the way the AWS CLI output is structured in JSON format. This context helps you understand what’s happening in the later examples that simplify the structure and extract the most crucial pieces of information.
For example, let’s break down this output from a describe-db-clusters command. Although the full output might be thousands of lines long, the overall structure is apparent from the first brief section. The following example shows an excerpt with just the first few lines:
The whole document is delimited by { }, meaning the outermost level is a set of key-value pairs. There’s only one key-value pair, with a key named DBClusters. The value of the DBClusters key is an array, delimited by [ ]. Each array element represents one item being described: in this case, an Aurora cluster owned by this account. Each array element is another set of key-value pairs, again delimited by { }. Those key-value pairs represent all the attributes of that particular Aurora DB cluster.
You see that pattern over and over again with AWS CLI commands. There’s a single key, whose name depends on the type of resource that the command acts upon. The critical information in the output is the list of attributes for each item. Getting to the individual attributes requires unwrapping the outermost couple of levels from the JSON document.
The –query option for AWS CLI commands
By default, the JSON output represents every single attribute of the objects that you’re operating on. That can lead to information overload if you just need one or two items of information from several screenfuls of output.
To make the output easier to work with, the AWS CLI has a built-in query language. You supply the query as a parameter string to the –query option of almost any AWS CLI command. These query parameters simplify and reformat the JSON output, making it easy to understand or use as input to other scripts. As a database professional, you probably perform such operations in SELECT statements against the system tables in your database. You can imagine that the CLI commands are querying system tables for the high-level cloud resources managed by AWS.
Using the –query option for the first time
Let’s take the output from describe-db-clusters in the previous example and simplify it. Instead of a list of every cluster showing every attribute, we find the answer to a single specific question: what is the database engine and engine version for my-db-cluster? Here’s where the –query option comes in. It uses a JSON-aware notation known as JMESPath to run the AWS CLI output through a sequence of transformation steps. To get this answer, we turn the list of clusters into a single item, the one cluster that we’re interested in. Then we turn the list of attributes into the specific attributes that hold the answer to our question. You can use the following command, substituting the name of your own cluster instead of my-db-cluster:
The output should be a single tab-separated line of text containing a cluster name, an engine name, and a version. For example:
This example introduces the idea of a pipeline in the –query parameter string. Each segment of the query string separated by a pipe symbol takes the output of the previous stage and changes it into a different layout of an array or a hash, or applies some operation such as filtering or sorting. We’ll see advanced examples like that a subsequent post. Because the whole query string is inside a single-quoted string literal, we can spread out the pipeline stages across multiple lines without using any line continuation characters.
Extracting the rows of the result set
Let’s break down the –query string from the preceding example so that we can understand its meaning.
The first step is to get rid of the outermost key-value pair. In the Amazon RDS CLI commands, the output always only contains a single element. That’s what the initial *[] operation does. It takes the array from the value part of the first key-value pair, regardless of what the key name is. The resulting output is in an array, with [] delimiters; the outermost hash with {} delimiters is discarded.
Now we’ve gotten rid of the outermost { “DBClusters”: wrapper. The output JSON document is an array. Each array element represents the attributes of one DB cluster. It’s basically a SQL result set formatted in JSON notation.
Constructing the column list
To trim down the output to something readable, let’s pull out just a couple of columns from the result set in the previous example. You can use the following code to query only the DBClusterIdentifier and Engine attributes. The notation ‘*[].{name1:attr1,name2:attr2}’ means to flatten the JSON document into an array as in the previous example, then make each array element contain just the attributes whose names are specified in the {} block. Each key-value pair contains a name that you choose, and the value of an attribute produced by the previous stage in the query.
To make the –query string and the output self-descriptive, you can repeat the attribute name for both parts of each key-value pair. That way, the attribute name is displayed in the JSON output alongside the value of that attribute. You can’t include any spaces or line breaks inside the {} block.
Filtering the items in the initial array
The notation [?thing1 comparison_operator thing2] applies a filter to the output of the previous state. It passes through only those items that match the test. There are a number of JMESPath filter expressions that you can use. For now, let’s stick to an equality test that checks an attribute against a literal value. The literal value is quoted with backticks. For example, you might run a describe command that only returns items with a certain value for an attribute such as Engine, EngineVersion, or Status:
Another way to look up a single item
Each describe command has an option to specify the identifier of a single item to describe. That’s the equivalent to doing a database lookup using a primary key. For describe-db-clusters, that option is –db-cluster-identifier. The following example retrieves the same information as in the previous examples, but for one specific Aurora cluster:
The final unwrapping
In the previous example, the final stage of [0] takes element number zero from the resulting array. Without that stage, the output would be wrapped inside an extra [] array block. When the output is guaranteed to be a single item, the enclosing array is redundant.
Choosing an output format with the –output option
Using the techniques in the preceding examples, you can produce AWS CLI output with whatever format of JSON you want. You can choose what sort of nesting of hashes and arrays to use. You can choose the names of the keys for each key-value pair. You can take that output and manipulate it with command line tools such as jq or programming languages such as Python, or store it in a database and use the built-in JSON functions in PostgreSQL or MySQL.
You don’t have to use JSON for your entire end-to-end procedure. When you have a –query string that you’ve verified produces the right output, you can add the –output option to the AWS CLI command to hide the JSON aspect completely.
Producing formatted reports with –output table
You can use the option –output table to get output nicely formatted with ASCII lines and boxes, the same as you see with text-based SQL command-line clients. The attribute names become the column headings in the table. See the following code:
Producing plain-text output with –output text
With those same –query strings, you can use the option –output text to get plain-text output in tab-separated columns. You can feed that plain text output to traditional line-based Unix utilities such as grep, cut, and sort, or any other scripts that aren’t set up to process JSON documents. See the following code:
Because text output doesn’t include any attribute or column names, it’s often best to use text output with a small number of output columns where the field values are easily recognizable. For example, the text-based output might be a simple list of identifiers, or identifiers plus one or two other attributes.
If you don’t need the labels for formatting, such as when producing plain text output for use as input to shell scripts, you can omit the labels by using array notation such as [attr1,attr2,attr3] for the final stage. This shorter version of the command produces identical text output.
Using array notation in the final stage also ensures that the columns in –output table and –output text are printed in exactly the order you specify. When the output includes labels, the columns are ordered alphabetically based on the text of the labels. The trade-off with using array notation to enforce column order is that with –output table, the table doesn’t include column headers.
When to use each AWS CLI output format
The –output json option is the default. That’s what you would use if storing the output for use later, feeding it as input to a program written in a language with JSON-handling capabilities, or storing it in a database that has functions for parsing JSON documents.
If the output is purely for visual inspection, the –output table option is convenient. The lines and borders around each value make the formatting consistent and easy to skim. You wouldn’t use that format as input to any other program that needed to parse the output, such as grep or awk. You might copy and paste it directly into a report using a monospace font.
The –output text option is for when you need only a small number of output columns whose meaning is well-understood. For example, each line of output might be a database or cluster identifier plus one or two other attributes, such as the endpoint, engine version, status, or Amazon Resource Name (ARN). This line-oriented output with tab separators is well-suited to pipe to other line-oriented Unix tools or scripts that you’ve written.
You can even use the –output text format to pull out a single attribute from a single object. That way, you can substitute the output from an AWS CLI command directly into some other command. For example, you can use the Unix shell notation $(aws rds <command with query string and –output text>) to return a single identifier, endpoint, port, and so on.
Conclusion
In this post, you learned how to use the AWS CLI to accomplish the following:
Prepare customized reports about RDS and Aurora resources using the describe commands. You now know how to fill in template parameters such as –query ‘*[].{}’ and –query ‘*[].[]’ to produce output consisting of certain specified attributes.
Condense the output of create, modify, and delete commands for RDS and Aurora resources to show only the attributes that are most important to you.
Compose complex –query parameters using pipelines of multiple filters and transformation stages.
Choose the appropriate json, text, and table options of the –output parameter, based on your usage scenario.
Control the labeling of output produced by –output table.
Control the structure and nesting of output produced by –output json.
Control the column order of output produced by –output text.
Extract a single item or a subset of items from reports involving multiple RDS and Aurora resources.
Learn more about the JMESPath comparison operators, functions, and so on that you can use inside the –query parameter string.
In the next post, we’ll see more examples of these techniques to help automate use cases that you might encounter when managing RDS and Aurora resources.
About the Author
John Russell is a Senior Developer Advocate for Amazon’s flagship relational database, Amazon Aurora. He has over 20 years of experience documenting, testing, and building tools on top of relational databases. He has contributed to the SQL References and developer documentation of four well-known database engines. John helps educate customers who are new to AWS, managed databases, or the open-source PostgreSQL and MySQL engines. John also helps interpret and amplify developer requirements during the development of new Aurora features.
Read MoreAWS Database Blog