Thursday, October 3, 2024

Build a geo-location application using Amazon DocumentDB (with MongoDB compatibility)

Amazon DocumentDB (with MongoDB compatibility) supports geospatial functionality, including 2dsphere indexes and the operators for determining proximity: the $nearSphere query operator and the $geoNear aggregation pipeline stage.

Working backward from the features our customers want most, we added support in Amazon DocumentDB for two additional operators that are useful for geospatial inclusion and intersection:

The $geoWithin operator supports finding documents with geospatial data that exists entirely within a specified shape, such as a polygon or multipolygon
The $geoIntersects operator supports finding documents whose geospatial data intersects with a specified GeoJSON object

These operators have multiple practical applications. For example, you can use $geoWithin to find delivery drivers located in a certain city or state, and $geoIntersects to determine which city or state contains a driver's current geospatial position, as reported by a GPS tracker.

In this post, I examine how this new functionality can help you create a location-based application using the geospatial capabilities of Amazon DocumentDB. I walk you through a real-life use case: finding the state that contains the geolocation reported by a user's mobile phone, counting the airports in that state, and listing the nearby airports sorted by distance from the user's position.

GeoJSON objects

Before I begin, it's important to understand the different types of GeoJSON objects. Positions are written in [longitude, latitude] order. In the case of a point geometry, the coordinates member is a single position:

{ type: "Point", coordinates: [ -88.342, 34.545 ] }

An array of positions can represent a LineString or MultiPoint, presented as follows:

{ type: "LineString", coordinates: [ [ -88.342, 34.545 ], [ -87.121, 35.742 ] ] }

A LinearRing is a closed LineString with four or more positions, in which the first and last positions are identical. Although it isn't a GeoJSON geometry type in its own right, it's referred to in the Polygon definition.

For type Polygon, the coordinates member must be an array of LinearRing coordinate arrays. The MultiPolygon type is represented by an array of polygon coordinates.

For example, the following coordinates define a roughly rectangular shape outlining the state of Wyoming, US:

{
  "type": "Polygon",
  "coordinates": [
    [ [-108.62, 45.00], [-104.05, 44.99], [-104.05, 41.00], [-111.04, 40.99], [-111.05, 45.00], [-108.62, 45.00] ]
  ]
}
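The closing rule is easy to get wrong when you write polygons by hand. The following minimal sketch checks a Polygon against the LinearRing rules described above: at least four positions, first and last positions identical, and each position in [longitude, latitude] order within valid ranges. The helper name validate_polygon is hypothetical and is not part of the sample application. It uses only the Python standard library.

```python
# Hypothetical helper (not part of the sample application): checks a GeoJSON
# Polygon against the LinearRing rules described above.

def validate_polygon(geometry):
    """Return a list of problems; an empty list means no problems were found."""
    if geometry.get("type") != "Polygon":
        return ["type must be 'Polygon'"]
    rings = geometry.get("coordinates") or []
    if not rings:
        return ["coordinates must contain at least one LinearRing"]
    problems = []
    for i, ring in enumerate(rings):
        if len(ring) < 4:
            problems.append(f"ring {i}: needs at least 4 positions, found {len(ring)}")
        elif ring[0] != ring[-1]:
            problems.append(f"ring {i}: not closed (first and last positions differ)")
        for position in ring:
            lon, lat = position[0], position[1]  # GeoJSON order: [longitude, latitude]
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                problems.append(f"ring {i}: position {position} is out of range")
    return problems


wyoming = {
    "type": "Polygon",
    "coordinates": [[[-108.62, 45.00], [-104.05, 44.99], [-104.05, 41.00],
                     [-111.04, 40.99], [-111.05, 45.00], [-108.62, 45.00]]],
}
print(validate_polygon(wyoming))  # []
```

If you drop the last position, the function reports that ring 0 is not closed.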

Solution overview

To demonstrate the geospatial operators, you create an application that helps users find airports in the United States. The application takes the user's current location and performs the following actions:

Determine the polygon shape that contains the user's coordinates. In our case, the shape is a US state.
Count the airports located in that shape.
List the airports within a specified distance of the user's location, sorted from closest to farthest.

The application is written in Python 3 and runs on an Amazon Elastic Compute Cloud (Amazon EC2) instance. We also use AWS Secrets Manager to store and retrieve the database credentials, which is a recommended security best practice.

The solution presented in this post is intended for testing and learning. Before any production release, we recommend that you conduct your own risk assessment and follow security best practices to harden each resource in the solution. Cloud security at AWS is the highest priority; for more information, visit the AWS Security Documentation page.

The following diagram shows the architecture for the solution in this post.

Prerequisites

To deploy the application, you must complete the following prerequisite steps to set up your resources:

Create an Amazon DocumentDB cluster or use an existing cluster. This post assumes the default values for port (27017) and TLS (enabled) settings.
Launch an Amazon EC2 Linux instance in the same VPC as the Amazon DocumentDB cluster. When the instance is online, connect to it and perform these additional steps:
Add the MongoDB repository and install MongoDB tools and shell:

echo -e "[mongodb-org-4.0]\nname=MongoDB Repository\nbaseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/4.0/x86_64/\ngpgcheck=1\nenabled=1\ngpgkey=https://www.mongodb.org/static/pgp/server-4.0.asc" | sudo tee /etc/yum.repos.d/mongodb-org-4.0.repo

sudo yum install -y mongodb-org-tools mongodb-org-shell

Install the Python 3 modules needed for the application. PyMongo is the Python distribution containing tools for working with MongoDB and Amazon DocumentDB. Boto3 is the AWS SDK for Python and lets the Python application interact with AWS services. Lastly, the requests module is a simple HTTP library, which we use to fetch the SSL certificate needed for connecting to Amazon DocumentDB.

pip3 install pymongo boto3 requests

Store your credentials in Secrets Manager by completing the following steps:
On the Secrets Manager console, choose Store a new secret to create the credential.

For the secret type, select Credentials for Amazon DocumentDB database.
Enter your credentials, leave the default encryption key, and select the database cluster.
Choose Next.

Enter a name for the secret and add an optional description or tags.
Choose Next.

Review the configuration and choose Store to save the secret.
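If you prefer to script this step, the following sketch stores a comparable secret with the boto3 Secrets Manager client's create_secret call. It assumes that a plain JSON secret with username, password, host, and port keys is enough for this application. get_credentials in geoapp.py reads only username, password, and host. The console's "Credentials for Amazon DocumentDB database" type may store additional keys. The function names here are hypothetical.

```python
import json


def build_secret_string(username, password, host, port=27017):
    # Assumption: get_credentials only reads username, password, and host;
    # port is included for completeness and is not used by the script.
    return json.dumps({
        "username": username,
        "password": password,
        "host": host,
        "port": port,
    })


def store_docdb_secret(secrets_client, name, username, password, host, port=27017):
    # secrets_client is expected to be a boto3 Secrets Manager client,
    # for example boto3.client("secretsmanager", region_name="us-east-1")
    return secrets_client.create_secret(
        Name=name,
        SecretString=build_secret_string(username, password, host, port),
    )
```

Avoid hard-coding real passwords in scripts. For example, read them from an environment variable or a prompt before calling store_docdb_secret.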

Security

Now that the resources are deployed, you need to make sure you can access them securely. The security group for the Amazon DocumentDB cluster has to allow connections on port 27017 from the EC2 instance. You also have to allow access to Secrets Manager; to do this, you use an AWS Identity and Access Management (IAM) role with the necessary permissions attached to the EC2 instance.

Update the security group for the Amazon DocumentDB cluster

To update the security group for your cluster, complete the following steps:

On the Amazon EC2 console, choose Resources in the navigation pane.
Choose Security groups.

From the list of security groups, choose the security group you used when creating your cluster.
On the Inbound rules tab, choose Edit inbound rules.

Add an inbound rule of type Custom TCP, port 27017, with the security group attached to the EC2 instance as the source.
Choose Save rules.
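The same inbound rule can be added programmatically with the boto3 EC2 client's authorize_security_group_ingress call. The following is a minimal sketch. The security group IDs you pass in are your own, and the helper names are hypothetical. The rule references the EC2 instance's security group as the source, as in the console steps above.

```python
def build_docdb_ingress(source_sg_id, port=27017):
    # One TCP rule allowing traffic on the DocumentDB port from members
    # of the source (EC2 instance) security group.
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{
            "GroupId": source_sg_id,
            "Description": "EC2 application to Amazon DocumentDB",
        }],
    }]


def allow_ec2_to_docdb(ec2_client, docdb_sg_id, ec2_sg_id, port=27017):
    # ec2_client is expected to be a boto3 EC2 client, e.g. boto3.client("ec2")
    return ec2_client.authorize_security_group_ingress(
        GroupId=docdb_sg_id,
        IpPermissions=build_docdb_ingress(ec2_sg_id, port),
    )
```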

Attach an IAM role to the EC2 instance

To attach a role to your EC2 instance, complete the following steps:

On the IAM console, under Access management in the navigation pane, choose Policies.
Choose Create Policy.

On the JSON tab, enter the following rules to allow read access to Secrets Manager:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:ListSecrets",
        "secretsmanager:GetSecretValue"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}


Note that, for simplicity, the policy allows read access to all secrets. In a production environment, we strongly recommend following the principle of least privilege and limiting the permissions to the specific Amazon DocumentDB secret: in the Resource section, specify the Amazon Resource Name (ARN) of the secret instead of "*".

Give the policy a name and choose Create policy.

On the IAM console, under Access management in the navigation pane, choose Roles.
Choose Create role.

For Trusted entity type, select AWS service.
For Use case, select EC2.
Choose Next.

Select the policy you created earlier.
Choose Next.

Give the role a name and choose Create role.

On the Amazon EC2 console, choose Instances in the navigation pane.
Select the instance you created and on the Actions menu, choose Security and then Modify IAM role.

Choose your role and then choose Update IAM role.
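For the least-privilege variant recommended in the policy note, the following sketch generates a policy scoped to a single secret ARN. The script in this post only calls get_secret_value, so this sketch grants only secretsmanager:GetSecretValue. The ARN below is a made-up placeholder; replace it with your secret's ARN from the Secrets Manager console.

```python
import json


def least_privilege_policy(secret_arn):
    # Grant read access to one secret only; the application never lists secrets.
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["secretsmanager:GetSecretValue"],
            "Resource": [secret_arn],
        }],
    }


# Hypothetical ARN used for illustration only
example_arn = "arn:aws:secretsmanager:us-east-1:111122223333:secret:geodata-demo-cluster-AbCdEf"
print(json.dumps(least_privilege_policy(example_arn), indent=2))
```

You can paste the printed JSON into the policy editor's JSON tab in place of the broader policy.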

Prepare the dataset

First, you need a sample dataset. I’ve created one from various OpenData sources and converted it to the proper format using toGeoJSON. Connect to the EC2 instance and complete the following steps:

Download the dataset:

wget https://github.com/aws-samples/amazon-documentdb-samples/raw/master/blogs/geoapp-docdb/dataset/airports_geodata.zip

Unzip the airports_geodata.zip file:

unzip airports_geodata.zip

Get the SSL certificate and the cluster endpoint from the Amazon DocumentDB console.

Import the two collections into your Amazon DocumentDB cluster using mongoimport:

mongoimport --ssl --host <Your-DocumentDB-cluster-endpoint> \
  --sslCAFile rds-combined-ca-bundle.pem \
  -u <username> -p <password> \
  -d geodata -c airports airports-us.json

mongoimport --ssl --host <Your-DocumentDB-cluster-endpoint> \
  --sslCAFile rds-combined-ca-bundle.pem \
  -u <username> -p <password> \
  -d geodata -c states states-us.json

Connect with mongo shell to Amazon DocumentDB and create the 2dsphere index:

mongo --ssl --host <Your-DocumentDB-cluster-endpoint> --sslCAFile rds-combined-ca-bundle.pem --username <insertYourUsername> --password <insertYourPassword>
rs0:PRIMARY> use geodata
switched to db geodata
rs0:PRIMARY> db.airports.createIndex({"loc": "2dsphere"})
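You can also create the index from Python with PyMongo's Collection.create_index. The following is a minimal sketch. The helper name ensure_geo_index is hypothetical. The string "2dsphere" is the value of the pymongo.GEOSPHERE constant, so the helper does not need to import pymongo itself.

```python
def ensure_geo_index(collection, field="loc"):
    """Create a 2dsphere index on `field` and return the index name PyMongo reports."""
    # "2dsphere" is the same value as pymongo.GEOSPHERE
    return collection.create_index([(field, "2dsphere")])
```

For example, once get_db_client() from later in this post is defined, ensure_geo_index(get_db_client()["geodata"]["airports"]) creates the same index as the mongo shell command above.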

You may also explore the two collections to examine the documents and better understand the code that follows.

The airports collection contains the names and geo locations of the airports. The GeoJSON object is of type Point, representing a single position:

rs0:PRIMARY> db.airports.findOne()
{
"_id" : ObjectId("536b150c30048054e348076c"),
"loc" : {
"type" : "Point",
"coordinates" : [
-88.769896,
34.268103
]
},
"name" : "Tupelo Municipal",
"type" : "Municipal",
"code" : "TUP"
}

The states collection contains the geometry of each state's shape, represented as type Polygon or MultiPolygon:

rs0:PRIMARY> db.states.findOne()
{
"_id" : ObjectId("536b0a143004b15885c91a20"),
"name" : "Wyoming",
"code" : "WY",
"loc" : {
"type" : "Polygon",
"coordinates" : [
[
[
-108.62131299999987,
45.00027699999998
],
[
-104.05769699999995,
44.997380000000135
],
[
-104.053249,
41.00140600000009
],
[
-111.04672299999999,
40.99795899999998
],
[
-111.05519899999989,
45.001321000000075
],
[
-108.62131299999987,
45.00027699999998
]
]
]
}
}
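Because a state's loc field can be either a Polygon or a MultiPolygon, code that inspects these documents usually normalizes both types to a list of polygons first. The following sketch does that to compute a longitude/latitude bounding box. bounding_box is a hypothetical exploration helper, not part of the application. It does not handle shapes that cross the antimeridian.

```python
def bounding_box(geometry):
    """Return (min_lon, min_lat, max_lon, max_lat) for a Polygon or MultiPolygon."""
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]  # a Polygon is one list of rings
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]    # a MultiPolygon is a list of Polygons
    else:
        raise ValueError("expected Polygon or MultiPolygon, got " + geometry["type"])
    positions = [pos for polygon in polygons for ring in polygon for pos in ring]
    lons = [pos[0] for pos in positions]
    lats = [pos[1] for pos in positions]
    return (min(lons), min(lats), max(lons), max(lats))
```

For example, passing the loc field of a document returned by db.states.findOne() (retrieved with PyMongo) gives a quick sanity check of where the state lies.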

Create functions

You start by creating a function to retrieve the credentials from Secrets Manager. Because Secrets Manager is Region specific, the function takes two arguments: the Region and the secret name, which are passed to the script as command-line arguments using the argparse module.

Using your preferred text editor, create the file geoapp.py and add the following code:

import os.path
import json
import pymongo
import boto3
import requests
import argparse

# Pass secret name and secret region as arguments
parser = argparse.ArgumentParser()
parser.add_argument('-r', '--region',
                    required=True,
                    type=str,
                    help='Specify the region of the secret (e.g. us-east-1)')
parser.add_argument('-s', '--secret',
                    required=True,
                    type=str,
                    help='Specify secret name')
args = parser.parse_args()

# Function for credentials retrieval from AWS Secrets Manager
def get_credentials(region_name, secret_name):
    session = boto3.session.Session()
    client = session.client(service_name='secretsmanager',
                            region_name=region_name)

    try:
        secret_value = client.get_secret_value(SecretId=secret_name)
        secret_json = json.loads(secret_value['SecretString'])
        username = secret_json['username']
        password = secret_json['password']
        cluster_uri = secret_json['host']
        return (username, password, cluster_uri)
    except Exception as e:
        print('Failed to retrieve secret {} because: {}'.format(secret_name, e))
        # Re-raise so callers don't try to unpack a None return value
        raise

Next, you want to connect to Amazon DocumentDB, so let’s create a function for this purpose and append it to the same geoapp.py file:

# Function for connecting to Amazon DocumentDB
def get_db_client():
    try:
        # Get the Amazon DocumentDB ssl certificate
        if not os.path.exists('rds-combined-ca-bundle.pem'):
            url = 'https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem'
            data = requests.get(url, allow_redirects=True)
            with open('rds-combined-ca-bundle.pem', 'wb') as cert:
                cert.write(data.content)

        # Get Amazon DocumentDB credentials (specify the region and secret name according to your case)
        (secret_username, secret_password,
         cluster_uri) = get_credentials(args.region, args.secret)
        db_client = pymongo.MongoClient(
            cluster_uri,
            tls=True,
            retryWrites=False,
            tlsCAFile='rds-combined-ca-bundle.pem',
            username=secret_username,
            password=secret_password,
            authSource='admin')
    except Exception as e:
        print('Failed to create new DocumentDB client: {}'.format(e))
        raise
    return db_client
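Each query function in this post calls get_db_client(), so a single run of the script creates a new MongoClient, with its own connection pool, several times. PyMongo's MongoClient is designed to be created once and reused. The following sketch is one way to do that with functools.lru_cache. cached_factory is a hypothetical helper, not part of the original script.

```python
from functools import lru_cache


def cached_factory(factory):
    """Wrap a zero-argument factory so it runs at most once per process."""
    @lru_cache(maxsize=None)
    def get():
        # The first call runs factory(); later calls return the cached result
        return factory()
    return get
```

In geoapp.py you could define get_shared_client = cached_factory(get_db_client) and call get_shared_client() in place of get_db_client() inside the query functions.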

Find the intersection of GeoJSON objects

Now you can start with the first requirement of our application: given the latitude and longitude coordinates for a certain location, find the polygon shape that it’s part of. For our case, the shape is the state, and you use the new operator $geoIntersects to accomplish this.

You need two parameters for this function: the longitude and latitude. The $geoIntersects query uses a projection to return only the state name. If no document is returned, the coordinates weren't found in any of the polygons (states); an if statement checks for this case and exits with an appropriate message.

The following is an example mongo shell query:

rs0:PRIMARY> var location = [ -73.965355, 40.782865 ]

rs0:PRIMARY> db.states.find({
    "loc": { "$geoIntersects":
        { "$geometry":
            { type: "Point", "coordinates": location }
        }
    }
},
{name: 1})
{ "_id" : ObjectId("536b0a143004b15885c91a2c"), "name" : "New York" }

The Python function to be added next is:

# Find the state the coordinate is part of
def geointersects(lon, lat):
    try:
        check_float_lon = isinstance(lon, float)
        check_float_lat = isinstance(lat, float)
        if check_float_lon is False or check_float_lat is False:
            # SystemExit is not caught by "except Exception" below
            raise SystemExit("The longitude or latitude coordinates are not correct, float type required!")
        db_client = get_db_client()
        collection_states = db_client['geodata']['states']

        query_geointersects = {
            "loc": {
                "$geoIntersects": {
                    "$geometry": {
                        "type": "Point",
                        "coordinates": [lon, lat]
                    }
                }
            }
        }

        document = collection_states.find_one(query_geointersects,
                                              projection={
                                                  "_id": 0,
                                                  "name": 1
                                              })
        if document is not None:
            state_name = document['name']
            return state_name
        else:
            raise SystemExit("The geo location you entered was not found in the United States!")
    except Exception as e:
        print('Exception in geoIntersects: {}'.format(e))
        raise
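geointersects checks only that the coordinates are floats. A slightly stricter input step can also reject values outside the GeoJSON ranges, longitude in [-180, 180] and latitude in [-90, 90]. The following is a minimal sketch. The helper name parse_lon_lat is hypothetical and is not part of the original script.

```python
def parse_lon_lat(lon_text, lat_text):
    """Convert user input to floats and check GeoJSON coordinate ranges."""
    lon, lat = float(lon_text), float(lat_text)  # raises ValueError on non-numeric input
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} is outside the range [-180, 180]")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} is outside the range [-90, 90]; "
                         "were longitude and latitude swapped?")
    return lon, lat
```

Range checks catch only some swapped inputs. For example, 40.639955, -73.895006 (latitude entered first) still passes both checks, so the prompt wording remains the main safeguard.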

Find GeoJSON objects within a specified shape

For the second requirement, you need to find the locations that are contained within a given shape (polygon or multipolygon). For that, you use the newly supported operator $geoWithin.

The mongo shell query for this is as follows:

var state = db.states.findOne( {"name" : "New York"} );

db.airports.find(
    {
        loc : { $geoWithin : { $geometry : state.loc } }
    },
    { name : 1 , type : 1, code : 1, _id: 0 }
);

Because the requirement is to show just the number of airports, the Python function counts the matching documents using the PyMongo method count_documents(). Add the function to the geoapp.py file:

def geowithin_count(state):
    try:
        db_client = get_db_client()
        collection_states = db_client['geodata']['states']
        collection_airports = db_client['geodata']['airports']
        state_loc = collection_states.find_one({"name": state},
                                               projection={
                                                   "_id": 0,
                                                   "loc": 1
                                               })
        if state_loc is None:
            raise ValueError('State {} was not found in the states collection'.format(state))
        state_loc_polygon = state_loc['loc']

        query_geowithin = {
            "loc": {
                "$geoWithin": {
                    "$geometry": state_loc_polygon
                }
            }
        }
        documents_within_count = collection_airports.count_documents(
            query_geowithin)
        return documents_within_count
    except Exception as e:
        print('Exception in geoWithin_count: {}'.format(e))
        raise
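To spot-check a few $geoWithin results offline, you can run a simple point-in-polygon test on the state geometry. The following sketch uses planar ray casting on raw longitude/latitude values. This is a simplification: Amazon DocumentDB evaluates 2dsphere queries on a sphere, so results for points very close to a boundary can differ. The helper names are hypothetical.

```python
def point_in_ring(lon, lat, ring):
    """Planar ray casting: count edge crossings of a ray cast toward increasing longitude."""
    inside = False
    for p, q in zip(ring, ring[1:]):  # rings are closed, so consecutive pairs cover every edge
        x1, y1, x2, y2 = p[0], p[1], q[0], q[1]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def point_in_polygon(lon, lat, geometry):
    """True if the point is inside an outer ring and not inside any of that polygon's holes."""
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError("expected Polygon or MultiPolygon")
    for rings in polygons:
        outer, holes = rings[0], rings[1:]
        if point_in_ring(lon, lat, outer) and not any(point_in_ring(lon, lat, h) for h in holes):
            return True
    return False
```

For example, point_in_polygon(lon, lat, state_loc['loc']) applied to an airport's coordinates should normally agree with whether that airport was counted by geowithin_count.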

Find GeoJSON objects in a certain radius and calculate distance

For the last requirement of our application, you use the $geoNear aggregation stage, which lets the Amazon DocumentDB aggregation framework do all the work of finding the locations within a certain proximity and calculating their distance from the reference point. I've also added a query filter to show only international airports.

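The Python function takes three arguments: the proximity radius (in kilometers in our case) and the longitude and latitude of the geolocation. Because the reference point is a GeoJSON point, maxDistance is expressed in meters, so the function multiplies the radius by 1000; a distanceMultiplier of 0.001 converts the returned distances back to kilometers. The found documents are collected in a list, which you can easily process further if needed. Add the following code to the geoapp.py file: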

def geonear(proximity, lon, lat):
    try:
        db_client = get_db_client()
        collection_airports = db_client['geodata']['airports']
        query_geonear = [{
            "$geoNear": {
                "near": {
                    "type": "Point",
                    "coordinates": [lon, lat]
                },
                "spherical": True,
                "query": {"type": "International"},
                "distanceField": "DistanceKilometers",
                "maxDistance": (proximity * 1000),
                "distanceMultiplier": 0.001
            }
        }, {
            "$project": {
                "name": 1,
                "code": 1,
                "DistanceKilometers": 1,
                "_id": 0
            }
        }, {
            "$sort": {
                "DistanceKilometers": 1
            }
        }]
        documents_near = collection_airports.aggregate(query_geonear)
        location_list = []
        for doc in documents_near:
            location_list.append(doc)
        return location_list
    except Exception as e:
        print('Exception in geoNear: {}'.format(e))
        raise
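To sanity-check the DistanceKilometers values, you can compute great-circle distances yourself. The following sketch uses the haversine formula with a mean Earth radius of 6,371 km. That radius is an assumption; the value Amazon DocumentDB uses internally may differ, so expect small discrepancies rather than exact matches. The function name haversine_km is hypothetical.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance between two [longitude, latitude] points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))
```

One degree of longitude along the equator, haversine_km(0, 0, 1, 0), comes out at about 111.19 km with this radius.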

Run the script

You now have almost everything you need; you just need a main function. For your convenience, I’ve uploaded the script to GitHub. Let’s run it to examine how it works.
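The post doesn't reproduce the main function, which is part of the script on GitHub. The following is a minimal sketch of one way to write it, assuming the geointersects, geowithin_count, and geonear functions defined above. It is not the author's script, and its output wording differs slightly from the sample run. The function name run and its read and write parameters are hypothetical; they let you exercise the driver without a database.

```python
def run(find_state, count_airports, find_near, read=input, write=print):
    """Prompt for a location and radius, then report the state, airport count, and nearby airports.

    find_state, count_airports, and find_near stand in for geointersects,
    geowithin_count, and geonear; read and write default to input and print.
    """
    # float() makes the values pass the isinstance(..., float) check in geointersects
    lon = float(read("Enter your longitude coordinate: "))
    lat = float(read("Enter your latitude coordinate: "))
    radius_km = float(read("Enter distance radius (in km): "))

    state = find_state(lon, lat)
    write(f"The geolocation coordinate entered is in the state of: {state}")
    write("-" * 29)
    write(f"Airports found in {state}: {count_airports(state)}")
    write("-" * 29)
    write(f"The following airports were found in a {radius_km:g} km radius:")
    for airport in find_near(radius_km, lon, lat):
        write(airport)
```

To use it in geoapp.py, call run(geointersects, geowithin_count, geonear) under an if __name__ == "__main__": guard.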

You need to supply a geolocation coordinate, which you can get from Google Maps: if you drop a pin on the map, you'll see the coordinate of that location. Keep in mind that Google Maps shows latitude first and longitude second. For example, in the following screenshot, the values are -73.895006 longitude and 40.639955 latitude.

Let's run the script with these input values and a 40 km search radius. Don't forget to pass the name and Region of the secret that contains the credentials for the Amazon DocumentDB cluster; in this example, the secret is named geodata-demo-cluster and is located in the us-east-1 Region.

# python3 geoapp.py --region us-east-1 --secret geodata-demo-cluster
Enter your longitude coordinate: -73.895006
Enter your latitude coordinate: 40.639955
Enter distance radius (in km): 40
The geolocation coordinate entered is in the state of: New York
-----------------------------
I have found a number of 29 airports in New York.
-----------------------------
The following airports were found in a 40 km radius:
{'name': 'John F Kennedy Intl', 'code': 'JFK', 'DistanceKilometers': 9.805144788970297}
{'name': 'La Guardia', 'code': 'LGA', 'DistanceKilometers': 15.39837436138942}
{'name': 'Newark Intl', 'code': 'EWR', 'DistanceKilometers': 23.83370014027987}

After you enter the geolocation coordinates and the distance radius, the script identifies the US state that contains the coordinates, counts the airports located in that state, and lists the international airports found within the specified radius, sorted from closest to farthest.

You can use the script to test other locations or a different radius. Here are some other coordinates (longitude, latitude) that you can use for this purpose:

-122.095023, 37.290036

-94.297089, 38.789243

-121.956352, 47.601100

-87.649233, 41.260084

Clean up

To clean up resources created for the application in this post, remove the following:

The Amazon DocumentDB cluster
The EC2 instance used for hosting the code
The secret stored in Secrets Manager
The IAM role and policy created for the EC2 instance

Summary

In this post, I introduced the new geospatial operators available in Amazon DocumentDB and demonstrated how you can use them to build or extend the capabilities of an application to include geolocation lookups.

For more information about geospatial operators, refer to Querying Geospatial data with Amazon DocumentDB and Introducing Geospatial query capabilities for Amazon DocumentDB (with MongoDB compatibility).

If you have any questions or comments about this post, use the comments section. If you have any feature requests for Amazon DocumentDB, email us at [email protected].

About the author

Mihai Aldoiu is a Senior DocumentDB Specialist Solutions Architect at AWS, based in London. He enjoys helping customers build their solutions using NoSQL databases. Mihai has over 20 years of experience in different roles, including Unix/Linux Systems Administrator, SRE/DevOps, and Database Engineer, which gives him a unique perspective on customers' challenges related to performance, reliability, and security.

AWS Database Blog