Run automated MongoDB backups to S3 using AWS Lambda and Zappa
MongoDB ships with a backup tool called `mongodump`, which is what you should use when you have the resources to set up a dedicated backup job that can install and run the required CLI tools.
If you’re running an application that is limited in scope, short on resources and money, or you just want to set up backups without too much hassle, you can use a Python script running on AWS Lambda, deployed with Zappa, to automate backups to AWS S3 using the setup described in this article.
We will use the `zappa` command for all interaction with AWS, so the first thing to do is install it using `pip`:

```shell
pip install zappa
```
Next, you can run `zappa init` to create a `zappa_settings.json` file. After it has been created, you will need to make the following adjustments:
- Since the automated backup function does not need to expose an HTTP interface, `apigateway_enabled` can be set to `false`.
- As the backup is not time critical, it does not need a “keep warm” callback; this can be disabled by setting `keep_warm` to `false`.
- Zappa can automatically configure the CloudWatch triggers needed to periodically run the backup function. These are specified in the `events` key of your environment’s configuration. The simplest way of configuring an event is passing an object containing a `function` key that points to the handler you want to run and an `expression` key that determines when to run it. Most of the time you should be able to schedule what you want using Cron and Rate Expressions.
In the end, the settings file should look something like this:
```json
{
    "backup": {
        "aws_region": "eu-central-1",
        "project_name": "mongo-backup",
        "runtime": "python3.6",
        "s3_bucket": "zappa-xxxxxxxxx",
        "events": [{
            "function": "main.handler",
            "expression": "cron(0 18 * * ? *)"
        }],
        "apigateway_enabled": false,
        "keep_warm": false
    }
}
```
Next, the handler running the backup needs to be defined. We’ll use `pymongo` for interacting with MongoDB and `boto3` for uploading the files to S3, so you will need to install both:

```shell
pip install pymongo boto3
```
The backup script
I created a GitHub repository containing the code I am using for running the backup (it contains a few more options), but this is roughly what `main.py` should look like:
```python
from os import environ

from pymongo import MongoClient
import boto3 as boto
from bson.json_util import dumps, JSONOptions, DatetimeRepresentation

s3 = boto.resource("s3")


def handler(event, context):
    # the dumps will be stored in a temp file, which also limits
    # the size of a backup to slightly below 512MB
    # (Lambda's limit on file sizes) in this setup
    temp_filepath = "/tmp/mongodump.json"
    bucket_folder = environ.get("BUCKET_FOLDER", "backups")
    bucket_name = environ["BUCKET_NAME"]
    db_uri = environ["MONGO_URI"]
    db_name = environ["MONGO_DATABASE"]

    client = MongoClient(db_uri)
    database = client.get_database(db_name)
    json_options = JSONOptions(datetime_representation=DatetimeRepresentation.ISO8601)

    # dump each collection as newline-delimited JSON, then upload it
    for collection_name in database.collection_names():
        with open(temp_filepath, "w") as f:
            for doc in database.get_collection(collection_name).find():
                f.write(dumps(doc, json_options=json_options) + "\n")
        s3.Bucket(bucket_name).upload_file(
            temp_filepath, "{}/{}.json".format(bucket_folder, collection_name)
        )
```
For the function to run properly, the following environment variables need to be set:

- `MONGO_URI` points to your MongoDB host and contains credentials if needed.
- `MONGO_DATABASE` is the name of the database you want to back up.
- `BUCKET_NAME` is the identifier of the bucket you want to store the backup files in. The script assumes this bucket already exists.
- By default, the files will be stored in a folder called `backups`, but this can be overridden by setting `BUCKET_FOLDER`.
You can set these values either in your `zappa_settings.json` using the `aws_environment_variables` key, or set them in the AWS UI in case you don’t want them in your code.
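If you go the `zappa_settings.json` route, the relevant section might look like this (the connection string, database, and bucket names are placeholders, not values from this article):

```json
{
    "backup": {
        "aws_environment_variables": {
            "MONGO_URI": "mongodb://user:password@example-host:27017",
            "MONGO_DATABASE": "my-database",
            "BUCKET_NAME": "my-backup-bucket",
            "BUCKET_FOLDER": "backups"
        }
    }
}
```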
Install from pip
If you want to, you can also install the backup handler above using pip:

```shell
pip install mongo_lambda_backup
```
and have a single line in your `main.py` that passes the handler through to Lambda:

```python
from mongo_lambda_backup.handler import handler
```
Create the Lambda function
Next, you should be able to create your Lambda function. Before doing so, you might want to check that your local AWS credentials are set correctly and that you have the permissions required for creating Lambda functions, CloudWatch Events, and S3 buckets.

```shell
zappa deploy backup
```
Zappa creates everything needed to execute your backup function, packages the code, and uploads it to Lambda. After making changes to your code that you want to deploy, run:

```shell
zappa update backup
```
In case you don’t need your backup anymore, run `undeploy`:

```shell
zappa undeploy backup
```
As your backup is probably not running too frequently and Lambda’s free tier is very generous, this gives us a free automated backup of MongoDB resources. Nice.
Keeping multiple backup versions
You may have noticed that the backup script overwrites previously existing files on each run. If you want to keep multiple versions of your backups, I’d advise using S3 Versioning and Lifecycle Rules to manage how many versions of your backups you want to keep.
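Versioning and lifecycle rules can be configured in the S3 console, but they can also be set up with `boto3`. Here is a rough sketch, assuming a 30-day retention window for old versions and the default `backups/` prefix; the bucket name and retention period are placeholders, not values from this article:

```python
def lifecycle_config(retention_days=30):
    # expire noncurrent (overwritten) backup versions after the given
    # number of days -- the 30-day window is an assumed default
    return {
        "Rules": [{
            "ID": "expire-old-backup-versions",
            "Filter": {"Prefix": "backups/"},
            "Status": "Enabled",
            "NoncurrentVersionExpiration": {"NoncurrentDays": retention_days},
        }]
    }


def enable_backup_versioning(bucket_name, retention_days=30):
    # imported lazily so the config helper above works without boto3
    import boto3

    s3 = boto3.client("s3")
    # turn on versioning so each backup run creates a new object version
    # instead of silently replacing the previous one
    s3.put_bucket_versioning(
        Bucket=bucket_name, VersioningConfiguration={"Status": "Enabled"}
    )
    # attach the lifecycle rule that prunes old versions
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name, LifecycleConfiguration=lifecycle_config(retention_days)
    )
```

This only needs to run once per bucket, so it makes sense to keep it out of the Lambda handler and run it from your machine when setting things up.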