AWS OpsWorks - set up MongoDB EBS volume backups

I previously described how to set up MongoDB on EC2 using OpsWorks; here is how to set up backups of the mongo data.

In my case all mongo data is stored on a single EBS volume, so I just need to take a snapshot of that volume.

The relevant part from the mongodb docs:

    Backup with --journal

    The journal file allows for roll forward recovery.
    The journal files are located in the dbpath
    directory so will be snapshotted at the same time as the database files.
    If the dbpath is mapped to a single EBS volume then proceed to Backup
    the Database Files.
    If the dbpath is mapped to multiple EBS volumes, then in order to guarantee
    the stability of the file-system you will need to Flush and Lock
    the Database.
    NOTE
    Snapshotting with the journal is only possible if the journal resides
    on the same volume as the data files, so that one snapshot operation
    captures the journal state and data file state atomically.

This is my case: all data is on the same EBS volume, and journaling is enabled by default for MongoDB 2.0 and higher on 64-bit systems. In other cases it is necessary to flush and lock the database before taking the snapshot. That method is supported by the ec2-consistent-snapshot tool.
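
For the multi-volume case, the flush-and-lock step could look roughly like the Python sketch below. It is not what ec2-consistent-snapshot does internally. It assumes a pymongo-style client whose `admin.command()` accepts the `fsync` and `fsyncUnlock` commands (the latter exists as a server command in MongoDB 3.2 and later), and `take_snapshots` is a hypothetical callback that creates the EBS snapshots.

```python
from contextlib import contextmanager


@contextmanager
def fsync_locked(client):
    """Flush pending writes and block new ones while the block runs.

    `client` is assumed to behave like pymongo.MongoClient.
    """
    # {fsync: 1, lock: true} flushes data files to disk and locks writes.
    client.admin.command("fsync", lock=True)
    try:
        yield
    finally:
        # Always release the lock, even if the snapshot step fails.
        client.admin.command("fsyncUnlock")


def consistent_backup(client, take_snapshots):
    """Run the hypothetical `take_snapshots` callback while writes are locked."""
    with fsync_locked(client):
        return take_snapshots()
```

The `finally` clause matters here: a database left locked after a failed snapshot call would block all writes until someone unlocks it by hand.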

My solution is based on a modified version of aws-snapshot-tool. I wanted a completely automated setup with no manual steps such as assigning tags to the volumes I need to back up.

The process I have now does this:

  • There is a special ‘ec2-backup’ chef recipe which I assign to the instance (or instances) whose volumes I need to back up
  • This recipe is added to the mongodb layer in OpsWorks, so every mongo instance gets it
  • The recipe assigns the ‘MakeSnapshot’=True tag to the instance
  • The recipe also sets up cron jobs to perform daily, weekly and monthly backups
  • Snapshots are created by the aws snapshot tool, which is launched by cron
  • The aws snapshot tool also sends the results via SNS, so I get backup notifications by email
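
The last two steps (create snapshots, then report via SNS) can be sketched in Python as below. This is an illustration, not the code of aws-snapshot-tool, which is built on boto. The sketch assumes boto3-style clients passed in as `ec2` and `sns`, and the function name, the description format and the message layout are all made up for the example.

```python
def snapshot_and_notify(ec2, sns, volume_ids, period, topic_arn):
    """Snapshot each volume, then publish one summary message to SNS.

    `ec2` and `sns` are assumed to be boto3 clients, for example
    boto3.client("ec2") and boto3.client("sns").
    """
    created, failed = [], []
    for volume_id in volume_ids:
        description = "%s_snapshot %s" % (period, volume_id)
        try:
            response = ec2.create_snapshot(VolumeId=volume_id,
                                           Description=description)
            created.append(response["SnapshotId"])
        except Exception as error:
            # Keep going so that one bad volume does not stop the whole run.
            failed.append("%s: %s" % (volume_id, error))

    message = "\n".join([
        "Period: %s" % period,
        "Created: %s" % (", ".join(created) or "none"),
        "Failed: %s" % (", ".join(failed) or "none"),
    ])
    subject = "EBS backup %s" % ("FAILED" if failed else "OK")
    # An email subscription on the topic turns this into a notification email.
    sns.publish(TopicArn=topic_arn, Subject=subject, Message=message)
    return created, failed
```

Publishing a single summary per run keeps the mailbox quiet: one email per cron job, with failures visible in the subject line.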

The recipe (ec2-backup/recipes/default.rb) looks like this:


# see https://github.com/stuart-warren/chef-aws-tag

# Assign tag to the instance
include_recipe "aws"
tags = {
  "MakeSnapshot" => "True"
}
aws_resource_tag node['ec2']['instance_id'] do
  tags(tags)
  action :update
end

# Create directory where snapshot tool will be stored
directory "/srv/backup" do
  owner 'root'
  group 'root'
  mode '0755'
  action :create
end

# Copy the snapshot tool to /srv/backup
cookbook_file "makesnapshots.py" do
  path "/srv/backup/makesnapshots.py"
  action :create
end

# Copy the snapshot tool config to /srv/backup
cookbook_file "config.py" do
  path "/srv/backup/config.py"
  action :create
end

# Setup a cron job for daily backups
cron "backup-daily" do
  path "/usr/local/bin:$PATH"
  hour "1"
  minute "30"
  weekday '1-6'
  command "cd /srv/backup && /usr/bin/python makesnapshots.py day 2>&1 | /usr/bin/logger -t \"CRON: makesnapshot\""
end

# Setup a cron job for weekly backups
cron "backup-weekly" do
  path "/usr/local/bin:$PATH"
  hour "2"
  minute "30"
  weekday '7'
  command "cd /srv/backup && /usr/bin/python makesnapshots.py week 2>&1 | /usr/bin/logger -t \"CRON: makesnapshot\""
end

# Setup a cron job for monthly backups
cron "backup-monthly" do
  path "/usr/local/bin:$PATH"
  hour "3"
  minute "30"
  day '1'
  command "cd /srv/backup && /usr/bin/python makesnapshots.py month 2>&1 | /usr/bin/logger -t \"CRON: makesnapshot\""
end
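
To make the resulting schedule easier to check, here is a small Python sketch that mirrors the three cron entries above and reports which periods are started on a given date. The function name `periods_for` is made up for the example; the weekday and day-of-month values are taken from the recipe.

```python
from datetime import date


def periods_for(day):
    """Return the makesnapshots.py periods that the cron jobs start on `day`."""
    periods = []
    weekday = day.isoweekday()  # Monday=1 ... Sunday=7, as in the recipe
    if 1 <= weekday <= 6:
        periods.append("day")    # backup-daily: weekday '1-6', 01:30
    if weekday == 7:
        periods.append("week")   # backup-weekly: weekday '7', 02:30
    if day.day == 1:
        periods.append("month")  # backup-monthly: day '1', 03:30
    return periods


print(periods_for(date(2024, 9, 2)))  # a Monday: ['day']
print(periods_for(date(2024, 9, 1)))  # a Sunday and the 1st: ['week', 'month']
```

Note that the monthly job runs in addition to the daily or weekly one on the first day of the month, one hour or two later.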

Under the cookbook's files folder (ec2-backup/files/default/) I have a modified copy of the aws-snapshot-tool. The cookbook folder layout is:

├── Berksfile
├── ec2-backup
│   ├── files
│   │   └── default
│   │       ├── config.py
│   │       └── makesnapshots.py
│   ├── metadata.rb
│   └── recipes
│       └── default.rb

The Berksfile contains the dependency info (only aws is relevant to the ec2-backup recipe):

source 'https://supermarket.getchef.com'

cookbook 'mongodb'
cookbook 'aws', '>= 0.2.4'

My version of makesnapshots.py is here. My modification allows tagging instances instead of volumes. There is already a similar pull request in the original repository, so I didn’t submit my change back. There is also one new option in config.py (see the full config example here):

    ...
    # Set to True to use instance tags, False to use volume tags
    'use_instance_tag': True,
    ...
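
How such a switch can select volumes is sketched below in Python. This is not the actual code of my makesnapshots.py (which uses boto). It assumes a boto3-style EC2 client passed in as `ec2`, and `tag_name` and `tag_value` are hypothetical config keys used only for this example; only `use_instance_tag` comes from the config above. Pagination of the API responses is ignored for brevity.

```python
def find_volume_ids(ec2, config):
    """Return the IDs of the EBS volumes to snapshot.

    `ec2` is assumed to be a boto3 EC2 client. `tag_name` and `tag_value`
    are hypothetical config keys; only `use_instance_tag` is from the post.
    """
    tag_filter = [{"Name": "tag:" + config["tag_name"],
                   "Values": [config["tag_value"]]}]

    if not config["use_instance_tag"]:
        # Volume mode: the tag sits directly on the volumes.
        response = ec2.describe_volumes(Filters=tag_filter)
        return [volume["VolumeId"] for volume in response["Volumes"]]

    # Instance mode: take every EBS volume attached to a tagged instance.
    volume_ids = []
    response = ec2.describe_instances(Filters=tag_filter)
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            for mapping in instance.get("BlockDeviceMappings", []):
                if "Ebs" in mapping:
                    volume_ids.append(mapping["Ebs"]["VolumeId"])
    return volume_ids
```

With instance tags, a volume attached later to a tagged instance is picked up automatically, which is what removes the manual tagging step.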

That’s all. Now I have mongo data backups with email notifications.

MongoDB documentation - backup and restore on Amazon EC2

ec2-consistent-snapshot tool

aws-snapshot-tool and related post Automated Amazon EBS volume snapshots with boto

Bash script for Automatic EBS Snapshots and Cleanup on Amazon Web Services and related post

ec2-automate-backup tool and related post and another one

automated-ebs-snapshots tool

DevOps Backup in Amazon EC2 article

Stackoverflow: MongoDb EC2 EBS backups

Automated amazon ebs snapshot backup script with 7 day retention

Mongodb to Amazon s3 Backup Script

OpsWorks docs - Running Cron Jobs

Chef - cron resource

Chef - file resource and cookbook_file

Chef - directory resource
