I need to back up a few things nightly, so I thought I would share how I do it using the following:
- DigitalOcean Spaces (object storage)
- GitLab
- MongoDB
- Python (for simplicity)
To start, I have a template for connecting to DigitalOcean that I reuse in multiple places. We'll call it docker_util.yaml:
```yaml
image: docker:24.0.5

services:
  - docker:24.0.5-dind

before_script:
  - wget https://github.com/digitalocean/doctl/releases/download/v1.105.0/doctl-1.105.0-linux-amd64.tar.gz
  - tar xf doctl-1.105.0-linux-amd64.tar.gz
  - mv doctl /bin
  - doctl auth init -t ${DO_API_TOKEN}
  - doctl registry login
```
Notice the ${DO_API_TOKEN}; you will need to create a CI/CD variable for this. Also, depending on when you are reading this, you may need to update the doctl version.
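Since the version number appears twice in that before_script, one way to make updates less error-prone is to factor it into a variable. A small sketch (the version here is just the one pinned above):

```shell
# Build the doctl download URL from a single version variable,
# so bumping the version only touches one line.
DOCTL_VERSION=1.105.0
DOCTL_URL="https://github.com/digitalocean/doctl/releases/download/v${DOCTL_VERSION}/doctl-${DOCTL_VERSION}-linux-amd64.tar.gz"
echo "$DOCTL_URL"
```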
Next, let's look at our Python script, backup.py:
```python
import os

import boto3
import botocore.config

session = boto3.session.Session()
client = session.client(
    's3',
    config=botocore.config.Config(s3={'addressing_style': 'virtual'}),
    region_name='nyc3',
    endpoint_url='https://nyc3.digitaloceanspaces.com',
    aws_access_key_id=os.getenv('SPACES_KEY'),
    aws_secret_access_key=os.getenv('SPACES_SECRET'),
)

files = os.listdir("backup")
print(files)

for file in files:
    client.upload_file("backup/{}".format(file), 'your_bucket_here', 'backups/{}'.format(file))
```
All this does is take the files found in a folder named backup (produced as pipeline artifacts) and iterate over them, uploading each one through the S3-compatible API. Note that you need to replace 'your_bucket_here' with your bucket name. Also notice SPACES_KEY and SPACES_SECRET: these credentials come from DigitalOcean's Spaces access keys page, and should also be stored as CI/CD variables.
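The local-path-to-object-key mapping used above can be pulled out into a small helper if you want to reuse or test it. A sketch (`dest_key` is a hypothetical name, not part of the script above):

```python
import os


def dest_key(local_path, prefix="backups"):
    # Mirror the script's layout: backup/<file> -> backups/<file>
    return "{}/{}".format(prefix, os.path.basename(local_path))


print(dest_key("backup/backup-2024-01-01-00-00.tar.bz2"))
# backups/backup-2024-01-01-00-00.tar.bz2
```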
Here is our requirements file. The script itself only needs boto3, which pulls in botocore:

```
boto3==1.28.70
botocore==1.31.70
```
Now for our GitLab runner script:

```yaml
include:
  - project: 'gitlab_group_path/scripts'
    file:
      - '/docker_util.yaml'

stages:
  - mongodump
  - upload

mongodump:
  image: ubuntu:24.04
  stage: mongodump
  services:
    - docker:dind
  before_script:
    - apt update
    - apt-get install -y wget libgssapi-krb5-2 libkrb5-3 libk5crypto3 libkrb5support0 libkeyutils1 bzip2
    - wget https://fastdl.mongodb.org/tools/db/mongodb-database-tools-ubuntu2204-x86_64-100.13.0.deb
    - dpkg -i mongodb-database-tools-ubuntu2204-x86_64-100.13.0.deb
    - rm -f mongodb-database-tools-ubuntu2204-x86_64-100.13.0.deb
  script:
    - mongodump --uri="$MONGO_HOST" -d sidr --gzip
    - mkdir backup
    - tar -cjf "backup/backup-$(date +"%Y-%m-%d-%H-%M").tar.bz2" dump
    - rm -rf dump
  artifacts:
    paths:
      - backup/*

upload:
  image: python:3.11-slim
  stage: upload
  before_script:
    - pip install virtualenv
    - virtualenv venv
    - source venv/bin/activate
  script:
    - pip install boto3
    - python3 backup.py
```
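The timestamped archive name produced in the mongodump stage can be reproduced on its own. A quick sketch of just that naming step:

```shell
# Same naming scheme as the tar command in the mongodump stage:
# backup/backup-YYYY-MM-DD-HH-MM.tar.bz2
STAMP=$(date +"%Y-%m-%d-%H-%M")
ARCHIVE="backup/backup-${STAMP}.tar.bz2"
echo "$ARCHIVE"
```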
Now, when you run this pipeline, it goes through two stages: first capturing your backup, then uploading it. You can apply this pattern to lots of different things.
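One last piece: to make this actually run nightly, create a pipeline schedule in GitLab (CI/CD → Schedules). If you also want the backup jobs to run *only* from the schedule and not on every push, a rule like this sketch can be added to each job:

```yaml
# Hypothetical restriction: only run this job when triggered by a pipeline schedule
mongodump:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
```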