Amazon DocumentDB (with MongoDB compatibility) supports geospatial functionality, including 2dsphere indexes and operators for determining proximity: the $nearSphere query operator and the $geoNear aggregation pipeline stage.
After working backward from our customers’ goals for the features they want most, Amazon DocumentDB now also supports two additional operators that are useful for geospatial inclusion and intersection:
The $geoWithin operator supports finding documents with geospatial data that exists entirely within a specified shape, such as a polygon or multipolygon.
The $geoIntersects operator supports finding documents whose geospatial data intersects with a specified GeoJSON object.
These operators have many practical applications. For example, you can use $geoWithin to find delivery drivers located in a certain city or state, and $geoIntersects to determine the city or state that contains a driver's current geospatial position as reported by a GPS tracker.
In this post, I examine how this new functionality can help you create a location-based application using the geospatial capabilities of Amazon DocumentDB. I walk you through a possible real-life use case: finding the state that contains the geolocation reported by the user’s mobile phone, then finding all airports in that state and listing them, sorted by distance from the user’s position.
GeoJSON objects
Before I begin, it’s important to understand the different types of GeoJSON objects. In the case of a point geometry, the coordinates object is composed of one position:
An array of positions can represent a LineString or MultiPoint, presented as follows:
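As an illustration, these geometries can be written as Python dictionaries, which is also the form you pass to PyMongo in a query. This is a minimal sketch: the coordinate values are placeholders chosen for illustration, and position_count is a hypothetical helper that only shows how the coordinates member differs between types. Each position lists longitude first, then latitude.

```python
# GeoJSON positions are [longitude, latitude]; the values are illustrative only.
point = {"type": "Point", "coordinates": [-73.895006, 40.639955]}

line_string = {
    "type": "LineString",
    "coordinates": [[-73.90, 40.63], [-73.89, 40.64], [-73.88, 40.65]],
}

multi_point = {
    "type": "MultiPoint",
    "coordinates": [[-73.90, 40.63], [-73.88, 40.65]],
}


def position_count(geometry):
    """Return how many positions a Point, LineString, or MultiPoint holds."""
    # A Point stores a single position; the other two store a list of positions.
    if geometry["type"] == "Point":
        return 1
    return len(geometry["coordinates"])
```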
A LinearRing is a closed LineString with four or more positions. Although not explicitly represented as a GeoJSON geometry type, it’s referred to in the Polygon definition.
For type Polygon, the coordinates member must be an array of LinearRing coordinate arrays. The MultiPolygon type is represented by an array of polygon coordinates.
For example, the following coordinates define a shape in the form of a square over the state of Wyoming, US:
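A sketch of such a shape in Python follows. The corner values are rounded approximations of Wyoming's borders that I am assuming for illustration, not necessarily the coordinates used in the original post. The hypothetical is_linear_ring helper checks the two LinearRing conditions described earlier: at least four positions, with the first and last being identical.

```python
# Approximate bounding rectangle for Wyoming (assumed, rounded values).
# The ring starts at the south-west corner and runs counterclockwise.
wyoming = {
    "type": "Polygon",
    "coordinates": [
        [
            [-111.05, 41.0],
            [-104.05, 41.0],
            [-104.05, 45.0],
            [-111.05, 45.0],
            [-111.05, 41.0],  # same as the first position, closing the ring
        ]
    ],
}


def is_linear_ring(ring):
    """Return True if the ring has four or more positions and is closed."""
    return len(ring) >= 4 and ring[0] == ring[-1]
```

A MultiPolygon would wrap one more array around several such polygon coordinate arrays.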
Solution overview
To demonstrate geospatial operators, you create an application that helps users find airports in the United States. The application takes the current location of the user and performs the following actions:
Determine the polygon shape where the user’s coordinates are. In our case, the shape is the state.
List the number of airports that are found in that shape.
List the closest airports, or those within a specified distance of the user’s location.
The application is developed in Python 3 and runs on an Amazon Elastic Compute Cloud (Amazon EC2) instance. We also use AWS Secrets Manager to store and retrieve the database credentials, which is a recommended security best practice.
The solution presented in this post uses a test/learning configuration. Before any production release, we recommend conducting your own risk assessment and following security best practices to harden each resource that is part of the solution. Cloud security is the highest priority at AWS; visit the AWS Security Documentation page for more information.
The following diagram shows the architecture for the solution in this post.
Prerequisites
To deploy the application, you must complete the following prerequisite steps to set up your resources:
Create an Amazon DocumentDB cluster or use an existing cluster. This post assumes the default values for port (27017) and TLS (enabled) settings.
Launch an Amazon EC2 Linux instance in the same VPC as the Amazon DocumentDB cluster. When the instance is online, connect to it and perform these additional steps:
Add the MongoDB repository and install MongoDB tools and shell:
Install the Python3 modules needed for the application. PyMongo is the Python distribution containing the tools for working with MongoDB and Amazon DocumentDB. Boto3 is the AWS SDK for Python and allows the interaction between AWS services and the Python application. Lastly, the requests module is a simple HTTP library, which we use to fetch the SSL certificate needed for connecting to Amazon DocumentDB.
Store your credentials in Secrets Manager by completing the following steps:
On the Secrets Manager console, choose Store a new secret to create the credential.
Choose the secret type and select Credentials for Amazon DocumentDB database.
Enter your credentials, leave the default encryption key, and select the database cluster.
Choose Next.
Enter a name for the secret and add an optional description or tags.
Choose Next.
Review the configuration and choose Store to save the secret.
Security
Now that the resources are deployed, you need to make sure you can access them securely. The security group for the Amazon DocumentDB cluster has to allow connections on port 27017 from the EC2 instance. You also have to allow access to Secrets Manager; to do this, you use an AWS Identity and Access Management (IAM) role with the necessary permissions attached to the EC2 instance.
Update the security group for the Amazon DocumentDB cluster
To update the security group for your cluster, complete the following steps:
On the Amazon EC2 console, choose Resources in the navigation pane.
Choose Security groups.
From the list of security groups, choose the security group you used when creating your cluster.
On the Inbound rules tab, choose Edit inbound rules.
Add an inbound rule of type Custom TCP, port 27017, with the security group attached to the EC2 instance as the source.
Choose Save rules.
Attach an IAM role to the EC2 instance
To attach a role to your EC2 instance, complete the following steps:
On the IAM console, under Access management in the navigation pane, choose Policies.
Choose Create Policy.
On the JSON tab, enter the following rules to allow read access to Secrets Manager:
Note that, for simplicity, the policy allows read access to all secrets. In a production environment, we strongly recommend following the principle of least privilege and limiting the permissions to the specific Amazon DocumentDB secret. To do so, specify the Amazon Resource Name (ARN) of the secret in the Resource section instead of “*”.
Give the policy a name and choose Create policy.
On the IAM console, under Access management in the navigation pane, choose Roles.
Choose Create role.
For Trusted entity type, select AWS service.
For Use case, select EC2.
Choose Next.
Select the policy you created earlier.
Choose Next.
Give the role a name and choose Create role.
On the Amazon EC2 console, choose Instances in the navigation pane.
Select the instance you created and on the Actions menu, choose Security and then Modify IAM role.
Choose your role and then choose Update IAM role.
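For reference, the read-only policy described in the steps above could look like the document produced by the following sketch. The action list is my assumption of a minimal set for reading a secret, not necessarily the exact policy from the original post, and build_secret_read_policy is a hypothetical helper. Passing a specific secret ARN instead of the default "*" gives the least-privilege variant.

```python
import json


def build_secret_read_policy(secret_arn="*"):
    """Build an IAM policy document that allows reading secrets.

    secret_arn defaults to "*" (all secrets); pass the ARN of one secret
    to restrict the permissions to it.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed minimal set of read actions.
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                ],
                "Resource": secret_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the JSON tab of the policy editor.
    print(json.dumps(build_secret_read_policy(), indent=2))
```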
Prepare the dataset
First, you need a sample dataset. I’ve created one from various OpenData sources and converted it to the proper format using toGeoJSON. Connect to the EC2 instance and complete the following steps:
Download the dataset:
Unzip the airports_geodata.zip file:
Get the SSL certificate and the cluster endpoint from the Amazon DocumentDB console.
Import the two collections into your Amazon DocumentDB cluster using mongoimport:
Connect with mongo shell to Amazon DocumentDB and create the 2dsphere index:
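If you prefer to create the index from Python rather than the mongo shell, a minimal sketch could look like the following. It assumes that the GeoJSON object is stored in a field named geometry, which is my assumption about the dataset, and that collection is a PyMongo collection object. create_geo_index is a hypothetical helper name.

```python
def create_geo_index(collection, field="geometry"):
    """Create a 2dsphere index on the GeoJSON field of a PyMongo collection."""
    # "2dsphere" is the index type string; pymongo.GEOSPHERE holds the same value.
    return collection.create_index([(field, "2dsphere")])
```

For example, create_geo_index(db["airports"]) would index the airports collection, where db is a database handle obtained from your client.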
You may also explore the two collections to examine the documents and better understand the code that follows.
The airports collection contains the names and geo locations of the airports. The GeoJSON object is of type Point, representing a single position:
The states collection contains the states’ shapes geometry, which are represented as type Polygon or MultiPolygon:
Create functions
You start by creating a function to retrieve the credentials from Secrets Manager. Because Secrets Manager is Region specific, the function takes two arguments: the Region and the secret name, which are passed as script arguments using the argparse module.
Using your preferred text editor, create the file geoapp.py and add the code below:
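A minimal sketch of such a function follows; it is not necessarily the code from the original post. It assumes the secret was stored as a JSON string, which is how Secrets Manager stores database credentials. The optional client parameter is my addition so that the function can be exercised without AWS access.

```python
import json


def get_credentials(region_name, secret_name, client=None):
    """Return the secret's JSON payload as a dict."""
    if client is None:
        # Imported lazily so the module loads even where boto3 is not installed.
        import boto3

        client = boto3.client("secretsmanager", region_name=region_name)
    response = client.get_secret_value(SecretId=secret_name)
    # Database secrets are stored as a JSON document in SecretString.
    return json.loads(response["SecretString"])
```

The returned dictionary is expected to hold keys such as username and password, which the connection function can use.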
Next, you want to connect to Amazon DocumentDB, so let’s create a function for this purpose and append it to the same geoapp.py file:
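One way to sketch the connection function is shown below, under a few assumptions of mine: the secret contains host, port, username, and password keys, and the certificate bundle was saved locally as global-bundle.pem. Adjust both to match your setup. retryWrites is turned off because Amazon DocumentDB does not support retryable writes. Both function names are hypothetical.

```python
def build_client_options(credentials, ca_file="global-bundle.pem"):
    """Translate the secret's fields into MongoClient keyword arguments."""
    return {
        "host": credentials["host"],
        "port": int(credentials.get("port", 27017)),  # default port from this post
        "username": credentials["username"],
        "password": credentials["password"],
        "tls": True,  # TLS is enabled on the cluster by default
        "tlsCAFile": ca_file,  # assumed local path of the CA bundle
        "retryWrites": False,
    }


def get_db_client(credentials, ca_file="global-bundle.pem"):
    """Open a connection to the Amazon DocumentDB cluster."""
    # Imported lazily so the options helper can be used without PyMongo.
    import pymongo

    return pymongo.MongoClient(**build_client_options(credentials, ca_file))
```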
Find the intersection of GeoJSON objects
Now you can start with the first requirement of our application: given the latitude and longitude coordinates for a certain location, find the polygon shape that it’s part of. For our case, the shape is the state, and you use the new operator $geoIntersects to accomplish this.
You need two parameters for this function: the longitude and latitude. I’m using a projection in the $geoIntersects query to return the state name value. If no value is returned, it means the coordinates weren’t found in any of the polygons (states); an if statement takes care of this check with an appropriate message.
The following is an example mongo shell query:
The Python function to be added next is:
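A hedged sketch of what this function might look like follows. The field names geometry and properties.NAME are my assumptions about the structure of the states documents, so check them against your collection. Here, states is a PyMongo collection and find_state is a hypothetical name.

```python
def find_state(states, longitude, latitude, name_field="properties.NAME"):
    """Return the state document whose shape contains the given point."""
    point = {"type": "Point", "coordinates": [longitude, latitude]}
    query = {"geometry": {"$geoIntersects": {"$geometry": point}}}
    # The projection keeps only the state name in the result.
    doc = states.find_one(query, {"_id": 0, name_field: 1})
    if doc is None:
        print("The coordinates are not inside any state in the collection.")
    return doc
```

To reuse the result in a later $geoWithin query, you could add the geometry field to the projection.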
Find GeoJSON objects within a specified shape
For the second requirement, you need to find the locations that are contained within a given shape (polygon or multipolygon). For that, you use the newly supported operator $geoWithin.
The mongo shell query for this is as follows:
Because the requirement is to show just the number of airports, our Python function counts the documents found using the pymongo function count_documents(). Add the function to the geoapp.py file:
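As a minimal sketch, the counting function could be written as follows, assuming that the airport position is stored in a field named geometry (my assumption). Here, shape_geometry is the Polygon or MultiPolygon object of the state and airports is a PyMongo collection. count_airports_in_shape is a hypothetical name.

```python
def count_airports_in_shape(airports, shape_geometry):
    """Count the airports located entirely within the given GeoJSON shape."""
    query = {"geometry": {"$geoWithin": {"$geometry": shape_geometry}}}
    return airports.count_documents(query)
```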
Find GeoJSON objects in a certain radius and calculate distance
For the last requirement of our application, you use the $geoNear aggregation stage and let the aggregation framework in Amazon DocumentDB do all the work of finding the locations within a certain proximity and calculating their distance from the reference point. I’ve also added a query filter to show only international airports.
The Python function takes three arguments: the proximity radius (in our case in kilometers) and the geo location coordinates (longitude and latitude). The found documents are added to a list, which you can easily update later if needed. See the following code that you add to the geoapp.py file:
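The following sketch shows one possible shape for this function. The field names properties.name and properties.type, and the filter value International, are my assumptions about the dataset. With a GeoJSON point as near, $geoNear measures distances in meters, so this version converts the radius to meters for maxDistance and converts the results back to kilometers in Python.

```python
def find_nearby_airports(airports, radius_km, longitude, latitude):
    """List international airports within radius_km, closest first."""
    pipeline = [
        {
            "$geoNear": {
                "near": {"type": "Point", "coordinates": [longitude, latitude]},
                "spherical": True,
                "distanceField": "distance_m",  # hypothetical output field name
                "maxDistance": radius_km * 1000,  # kilometers to meters
                "query": {"properties.type": "International"},  # assumed field
            }
        }
    ]
    results = []
    # $geoNear returns documents ordered from nearest to farthest.
    for doc in airports.aggregate(pipeline):
        results.append(
            {
                "name": doc["properties"]["name"],
                "distance_km": round(doc["distance_m"] / 1000, 1),
            }
        )
    return results
```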
Run the script
You now have almost everything you need; you just need a main function. For your convenience, I’ve uploaded the script to GitHub. Let’s run it to examine how it works.
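The argument handling in a main function could be sketched as follows. The flag names and the 40 km default are hypothetical choices of mine for illustration; the script on GitHub may use different ones.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line arguments for the airport finder."""
    parser = argparse.ArgumentParser(description="Find airports near a location")
    parser.add_argument("--region", required=True, help="AWS Region of the secret")
    parser.add_argument("--secret", required=True, help="Secrets Manager secret name")
    parser.add_argument("--longitude", type=float, required=True)
    parser.add_argument("--latitude", type=float, required=True)
    parser.add_argument("--radius", type=float, default=40.0, help="radius in km")
    # argv=None makes argparse read sys.argv; a list can be passed for testing.
    return parser.parse_args(argv)
```

The main function would then pass the parsed Region and secret name to the credentials function, and the coordinates and radius to the query functions.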
You need to supply geolocation coordinates; you can get them from Google Maps. If you drop a pin on the map, you’ll see the coordinates. Keep in mind that Google Maps shows latitude first and longitude second. For example, in the following screenshot, you have the values -73.895006 longitude and 40.639955 latitude.
Let’s run the script with these input values and a 40 km radius for the airport search. Don’t forget to pass the secret that contains the credentials for the Amazon DocumentDB cluster. In this example, the secret is named geodata-demo-cluster and is located in the us-east-1 Region.
After you enter the geo location coordinates and the distance radius, the script identifies the US state the coordinate is part of, finds the number of airports located in the state, and lists the ones found in the specified radius, sorted by closest.
You can use the script to test other locations or modify the radius. Here are some other coordinates (longitude, latitude) that you can use for this purpose:
-122.095023, 37.290036
-94.297089, 38.789243
-121.956352, 47.601100
-87.649233, 41.260084
Clean up
To clean up resources created for the application in this post, remove the following:
The Amazon DocumentDB cluster
The EC2 instance used for hosting the code
The secret stored in Secrets Manager
Summary
In this post, I introduced the new geospatial operators available in Amazon DocumentDB and demonstrated how you can use them to build or extend the capabilities of an application to include geolocation lookups.
For more information about geospatial operators, refer to Querying Geospatial data with Amazon DocumentDB and Introducing Geospatial query capabilities for Amazon DocumentDB (with MongoDB compatibility).
If you have any questions or comments about this post, use the comments section. If you have any feature requests for Amazon DocumentDB, email us at [email protected].
About the author
Mihai Aldoiu is a Senior DocumentDB Specialist Solutions Architect at AWS, based out of London. He enjoys helping customers build their solutions using NoSQL databases. Mihai has over 20 years of experience in different roles, including Unix/Linux Systems Administrator, SRE/DevOps, and Database Engineer, which gives him a unique perspective on customers’ challenges related to performance, reliability, and security.