In Part 1 of this series we saw how to create and run a simple MongoDB instance based on CentOS. This is good for basic dev and test use, but not much beyond that as it does not address a number of performance and fault tolerance challenges. In this post, we take a closer look at Docker’s disk storage options and the associated considerations for running a database (like MongoDB) on it.
File system layering

Docker’s root file system layering.
One of Docker’s key features (and my personal favourite) is the layering of the root file system. Each of the underlying layers is read-only; they stack up to form the actual file system, with only the top layer writable. Layers can then be easily versioned, compared to see exactly what changed, and cached so that we don’t need to rebuild everything from scratch each time.
This is a huge improvement over the traditional golden image approach, whereby entire file system images or Virtual Machine (VM) templates are built manually – it’s often unclear what exactly is in them and why. More recent approaches use Configuration Management (CM) tools such as Puppet, Chef, and Ansible, but building a complex image on demand from scratch takes a long time. Docker’s layering approach makes this blazingly fast by rebuilding only the layers that have changed.
It is, however, not without downsides: the run-time performance of such layered file systems is woefully slow. The exact penalty depends on the storage backend used, with the original AUFS being deprecated in favour of alternatives like OverlayFS, Btrfs, and device mapper. Regardless, I/O-heavy workloads should be moved to Docker data volumes for optimal performance. Data volumes live outside the container’s layered file system and thus bypass it entirely. There are two main data volume types: host directories and data-only containers.
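As an aside, an image can also declare I/O-heavy paths as data volumes up front with the Dockerfile VOLUME instruction. Here is a minimal sketch (not the image we build later in this post), using MongoDB’s default dbpath:

```dockerfile
# Sketch: declaring /data/db as a data volume means run-time writes to it
# bypass the layered root file system and go to a volume instead of the
# container's top writable layer.
FROM centos
VOLUME /data/db
```

Containers started from such an image get an automatically managed volume for that path even if the user passes no -v flag.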
Data Volumes: Host directory

Using a Docker host directory data volume (image source).
A host directory data volume is simply a directory on the Docker host that is mounted into the container. Building upon our previous example in Part 1, create a directory on the Docker host and use it for MongoDB’s dbpath (which contains the data and journal files). For example:
$ docker run -d -P -v ~/db:/data/db mongod --smallfiles
Check that the MongoDB container has started successfully by inspecting the log files:
$ docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS                      NAMES
efca3b637a75        mongod:latest       "mongod --smallfiles   9 minutes ago       Up 9 minutes        0.0.0.0:49160->27017/tcp   prickly_sammet

$ docker logs efca3b637a75
2015-02-01T18:35:02.279+0000 [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=efca3b637a75
2015-02-01T18:35:02.279+0000 [initandlisten] db version v2.6.7
2015-02-01T18:35:02.279+0000 [initandlisten] git version: a7d57ad27c382de82e9cb93bf983a80fd9ac9899
2015-02-01T18:35:02.279+0000 [initandlisten] build info: Linux build7.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2015-02-01T18:35:02.279+0000 [initandlisten] allocator: tcmalloc
2015-02-01T18:35:02.279+0000 [initandlisten] options: { storage: { smallFiles: true } }
2015-02-01T18:35:02.282+0000 [initandlisten] journal dir=/data/db/journal
2015-02-01T18:35:02.283+0000 [initandlisten] recover : no journal files present, no recovery needed
2015-02-01T18:35:02.454+0000 [initandlisten] allocating new ns file /data/db/local.ns, filling with zeroes...
2015-02-01T18:35:02.510+0000 [FileAllocator] allocating new datafile /data/db/local.0, filling with zeroes...
2015-02-01T18:35:02.510+0000 [FileAllocator] creating directory /data/db/_tmp
2015-02-01T18:35:02.513+0000 [FileAllocator] done allocating datafile /data/db/local.0, size: 16MB, took 0.001 secs
2015-02-01T18:35:02.514+0000 [initandlisten] build index on: local.startup_log properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.startup_log" }
2015-02-01T18:35:02.514+0000 [initandlisten] added index to empty collection
2015-02-01T18:35:02.514+0000 [initandlisten] waiting for connections on port 27017
2015-02-01T18:36:02.481+0000 [clientcursormon] mem (MB) res:36 virt:246
2015-02-01T18:36:02.481+0000 [clientcursormon]  mapped (incl journal view):64
2015-02-01T18:36:02.481+0000 [clientcursormon]  connections:0
2015-02-01T18:41:02.571+0000 [clientcursormon] mem (MB) res:36 virt:246
2015-02-01T18:41:02.571+0000 [clientcursormon]  mapped (incl journal view):64
2015-02-01T18:41:02.571+0000 [clientcursormon]  connections:0
Ensure that the data files have been created in the specified host directory ~/db:
$ ls -l ~/db
total 32776
drwxr-xr-x. 2 root root       17 Feb  1 18:35 journal
-rw-------. 1 root root 16777216 Feb  1 18:35 local.0
-rw-------. 1 root root 16777216 Feb  1 18:35 local.ns
-rwxr-xr-x. 1 root root        2 Feb  1 18:35 mongod.lock
drwxr-xr-x. 2 root root        6 Feb  1 18:35 _tmp
Quick benchmarking
How much faster are host directory data volumes than the default layered root file system? This of course depends on your environment, and proper performance testing is beyond the scope of this blog post, but here’s a simple way to do some quick benchmarking with mongoperf.
First, let’s create a mongoperf Docker image with the following Dockerfile:
# mongoperf process on latest CentOS
# See https://docs.docker.com/articles/dockerfile_best-practices/
FROM centos
MAINTAINER James Tan <james.tan@mongodb.com>
COPY mongodb.repo /etc/yum.repos.d/
RUN yum install -y mongodb-org-tools
WORKDIR /tmp
ENTRYPOINT [ "mongoperf" ]
Use the same mongodb.repo as the previous example in Part 1, reproduced here for your convenience:
[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1
With the above two files in your current directory, build the image by running:
$ docker build -t mongoperf .
Now benchmark the layered root file system by running:
$ echo "{nThreads:32,fileSizeMB:1000,r:true,w:true}" | docker run -i --sig-proxy=false mongoperf
You should see output similar to the following:
mongoperf
use -h for help
parsed options:
{ nThreads: 32, fileSizeMB: 1000, r: true, w: true }
creating test file size:1000MB ...
testing...
optoins:{ nThreads: 32, fileSizeMB: 1000, r: true, w: true }
wthr 32
new thread, total running : 1
read:1 write:1
 877 ops/sec 3 MB/sec
 928 ops/sec 3 MB/sec
 920 ops/sec 3 MB/sec
...
new thread, total running : 2
read:1 write:1
1211 ops/sec 4 MB/sec
1158 ops/sec 4 MB/sec
1172 ops/sec 4 MB/sec
...
new thread, total running : 4
read:1 write:1
read:1 write:1
1194 ops/sec 4 MB/sec
1163 ops/sec 4 MB/sec
1162 ops/sec 4 MB/sec
...
new thread, total running : 8
read:1 write:1
...
1112 ops/sec 4 MB/sec
1161 ops/sec 4 MB/sec
1174 ops/sec 4 MB/sec
...
new thread, total running : 16
read:1 write:1
...
1156 ops/sec 4 MB/sec
1178 ops/sec 4 MB/sec
1160 ops/sec 4 MB/sec
...
new thread, total running : 32
read:1 write:1
...
1244 ops/sec 4 MB/sec
1205 ops/sec 4 MB/sec
1211 ops/sec 4 MB/sec
...
mongoperf will keep running, so press CTRL-C to get back to the terminal. The container is still running in the background, so let’s terminate and remove it:
$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
c1366d08b543        mongoperf:latest    "mongoperf"         4 minutes ago       Up 3 minutes                            boring_kirch

$ docker rm -f c1366d08b543
c1366d08b543
Now re-run the benchmark with a host directory data volume instead:
$ mkdir ~/tmp
$ echo "{nThreads:32,fileSizeMB:1000,r:true,w:true}" | docker run -i --sig-proxy=false -v ~/tmp:/tmp mongoperf
Here’s the corresponding output from my setup:
mongoperf
use -h for help
parsed options:
{ nThreads: 32, fileSizeMB: 1000, r: true, w: true }
creating test file size:1000MB ...
testing...
optoins:{ nThreads: 32, fileSizeMB: 1000, r: true, w: true }
wthr 32
new thread, total running : 1
read:1 write:1
1273 ops/sec 4 MB/sec
1242 ops/sec 4 MB/sec
1178 ops/sec 4 MB/sec
...
new thread, total running : 2
read:1 write:1
2437 ops/sec 9 MB/sec
2702 ops/sec 10 MB/sec
2546 ops/sec 9 MB/sec
...
new thread, total running : 4
read:1 write:1
read:1 write:1
2575 ops/sec 10 MB/sec
2465 ops/sec 9 MB/sec
2558 ops/sec 9 MB/sec
...
new thread, total running : 8
read:1 write:1
...
2471 ops/sec 9 MB/sec
3081 ops/sec 12 MB/sec
3027 ops/sec 11 MB/sec
...
new thread, total running : 16
read:1 write:1
...
3031 ops/sec 11 MB/sec
3376 ops/sec 13 MB/sec
3384 ops/sec 13 MB/sec
...
new thread, total running : 32
read:1 write:1
...
3272 ops/sec 12 MB/sec
3196 ops/sec 12 MB/sec
3385 ops/sec 13 MB/sec
...
Terminate and remove the container as before.
Comparing the last set of results with 32 concurrent read-write threads, we see a 180% improvement in the number of operations per second, from 1211 to 3385 ops/sec. There’s also a 225% increase in throughput from 4 to 13 MB/sec.
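For reference, those improvement figures can be reproduced with a quick calculation from the 32-thread numbers quoted above:

```shell
# Percent improvement = (new - old) / old * 100.
# 32-thread figures from the two runs above: 1211 -> 3385 ops/sec, 4 -> 13 MB/sec.
ops_gain=$(awk 'BEGIN { printf "%.0f", (3385 - 1211) / 1211 * 100 }')
mb_gain=$(awk 'BEGIN { printf "%.0f", (13 - 4) / 4 * 100 }')
echo "ops/sec improvement: ${ops_gain}%"   # ~180%
echo "MB/sec improvement: ${mb_gain}%"     # 225%
```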
Container portability
These performance gains come at the cost of container portability: our mongod container now requires a directory on the Docker host that is not managed by Docker, so we can’t easily run or move the container to another Docker host. The solution is to use data-only containers, as described in the next section.
Data Volumes: Data-only containers

Using a Docker data volume container (image source).
Data-only containers are the recommended pattern for storing data in Docker, as they avoid the tight coupling to host directories.
To create the data-only container for our benchmark, we re-use the existing mongoperf image:
$ docker create -v /tmp --name mongoperf-data mongoperf
7d476bb9d3ca0cf282e2d3b9cf54e18d7bbe9b561be5d34646947032b64b4b9c
Now re-run the benchmark with the --volumes-from mongoperf-data parameter to use our data-only container:
$ echo "{nThreads:32,fileSizeMB:1000,r:true,w:true}" | docker run -i --sig-proxy=false --volumes-from mongoperf-data mongoperf
This produces the following output in my setup:
mongoperf
use -h for help
parsed options:
{ nThreads: 32, fileSizeMB: 1000, r: true, w: true }
creating test file size:1000MB ...
testing...
optoins:{ nThreads: 32, fileSizeMB: 1000, r: true, w: true }
wthr 32
new thread, total running : 1
read:1 write:1
1153 ops/sec 4 MB/sec
1146 ops/sec 4 MB/sec
1151 ops/sec 4 MB/sec
...
new thread, total running : 2
read:1 write:1
1857 ops/sec 7 MB/sec
2489 ops/sec 9 MB/sec
2459 ops/sec 9 MB/sec
...
new thread, total running : 4
read:1 write:1
read:1 write:1
2518 ops/sec 9 MB/sec
2477 ops/sec 9 MB/sec
2451 ops/sec 9 MB/sec
...
new thread, total running : 8
read:1 write:1
...
2812 ops/sec 10 MB/sec
2837 ops/sec 11 MB/sec
2793 ops/sec 10 MB/sec
...
new thread, total running : 16
read:1 write:1
...
3111 ops/sec 12 MB/sec
3319 ops/sec 12 MB/sec
3263 ops/sec 12 MB/sec
...
new thread, total running : 32
read:1 write:1
...
2919 ops/sec 11 MB/sec
3274 ops/sec 12 MB/sec
3306 ops/sec 12 MB/sec
...
Performance-wise, it is similar to host directory data volumes. The data-only container persists even if the referencing container is removed (unless the -v option is used when running docker rm). We can see this by running:
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
7d476bb9d3ca        mongoperf:latest    "mongoperf"         9 minutes ago                                               mongoperf-data
Wrapping up
Coming back to our mongod container, we can now run it with a data-only container for better performance:
$ docker create -v /data/db --name mongod-data mongod
$ docker run -d -P --volumes-from mongod-data mongod --smallfiles
Remember, you can see the mapped local port number by running docker ps. For example:
$ docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS                      NAMES
08245e631171        mongod:latest       "mongod --smallfiles   40 seconds ago      Up 39 seconds       0.0.0.0:49165->27017/tcp   gloomy_meitner

$ mongo --port 49165
MongoDB shell version: 2.6.7
connecting to: 127.0.0.1:49165/test
>
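If you want the mapped port in a script rather than by eyeballing docker ps, one option is to parse the PORTS column. Below is a hypothetical helper that extracts the host port, assuming the 0.0.0.0:HOST->CONTAINER/tcp format shown above:

```shell
# Hypothetical helper: pull the host port out of a Docker port mapping
# string such as "0.0.0.0:49165->27017/tcp" (format as shown by docker ps).
mapping="0.0.0.0:49165->27017/tcp"
host_port=$(echo "$mapping" | sed 's/.*:\([0-9]*\)->.*/\1/')
echo "$host_port"   # 49165
```

With the port in a variable, mongo --port "$host_port" connects without copying the number by hand. Docker’s built-in docker port command reports the same mapping directly.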
Volumes will eventually become first-class citizens in Docker. In the meantime, consider using community tools like docker-volume to manage them more easily.
What’s next
In the next part of this series, we will investigate the various Docker networking options and see how that fits in with a multi-host MongoDB replica set. Stay tuned!