GitLab to S3 Backup

I need to back up a few things nightly, and I thought I'd share how I do it using the following:

  1. DigitalOcean Object Storage (Spaces)
  2. GitLab
  3. Mongo
  4. Python (for simplicity)

To start, I have a template for connecting to DigitalOcean that I reuse in multiple projects. We'll call it docker_util.yaml:

image: docker:24.0.5
services:
  - docker:24.0.5-dind
before_script:
  - wget https://github.com/digitalocean/doctl/releases/download/v1.105.0/doctl-1.105.0-linux-amd64.tar.gz
  - tar xf doctl-1.105.0-linux-amd64.tar.gz
  - mv doctl /bin
  - doctl auth init -t ${DO_API_TOKEN}
  - doctl registry login

Notice the ${DO_API_TOKEN}; you will need to create a CI/CD variable (secret) for this. Also, depending on when you are reading this, you may need to update the doctl version.

Next, let's look at our Python script,

backup.py
import os

import boto3
import botocore

# Connect to DigitalOcean Spaces through its S3-compatible API
session = boto3.session.Session()
client = session.client(
    's3',
    config=botocore.config.Config(s3={'addressing_style': 'virtual'}),
    region_name='nyc3',
    endpoint_url='https://nyc3.digitaloceanspaces.com',
    aws_access_key_id=os.getenv('SPACES_KEY'),
    aws_secret_access_key=os.getenv('SPACES_SECRET'),
)

# Upload every file from the "backup" artifacts folder
files = os.listdir("backup")
print(files)

for file in files:
    client.upload_file(f"backup/{file}", 'your_bucket_here', f'backups/{file}')

All this does is take the files found in a folder named backup (passed along as job artifacts), iterate over each file, and push it to object storage via the S3 API. Note that you need to update this with your bucket name. Also note the SPACES_KEY and SPACES_SECRET environment variables: these credentials come from DigitalOcean's Spaces access keys page, and should be stored as CI/CD variables as well.
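Over time the backups/ prefix will accumulate one archive per nightly run. As a sketch of how you might cap that growth (the select_expired helper and keep_days cutoff are my additions, not part of the original script; the backup-YYYY-MM-DD-HH-MM naming matches the tar command in the pipeline below), you could pick out expired archives with pure date math and then remove them with client.delete_object:

```python
from datetime import datetime, timedelta


def select_expired(keys, now, keep_days=30):
    """Return backup object keys older than keep_days, based on the
    backup-YYYY-MM-DD-HH-MM timestamp embedded in each filename."""
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for key in keys:
        name = key.rsplit('/', 1)[-1]  # strip the backups/ prefix
        if not (name.startswith('backup-') and name.endswith('.tar.bz2')):
            continue  # ignore unrelated objects in the bucket
        stamp = name[len('backup-'):-len('.tar.bz2')]
        try:
            taken = datetime.strptime(stamp, '%Y-%m-%d-%H-%M')
        except ValueError:
            continue  # malformed timestamp; leave the object alone
        if taken < cutoff:
            expired.append(key)
    return expired
```

Feeding this the keys returned by client.list_objects_v2(Bucket=..., Prefix='backups/') and calling client.delete_object on each result keeps storage bounded; note that list_objects_v2 returns at most 1,000 keys per call, so a paginator is needed once the bucket grows past that.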

Here is our requirements file. Only boto3 and its dependency chain are needed:

boto3==1.28.70
botocore==1.31.70
jmespath==1.0.1
python-dateutil==2.8.1
s3transfer==0.7.0
six==1.16.0
urllib3==1.26.5

Now for our GitLab CI pipeline (.gitlab-ci.yml):

include: 
  - project: 'gitlab_group_path/scripts'
    file: 
      - '/docker_util.yaml'

stages:
  - mongodump
  - upload

mongodump:
  image: ubuntu:24.04
  stage: mongodump
  services:
    - docker:dind
  before_script:
    - apt-get update
    - apt-get install -y wget libgssapi-krb5-2 libkrb5-3 libk5crypto3 libkrb5support0 libkeyutils1 bzip2
    - wget https://fastdl.mongodb.org/tools/db/mongodb-database-tools-ubuntu2204-x86_64-100.13.0.deb
    - dpkg -i mongodb-database-tools-ubuntu2204-x86_64-100.13.0.deb
    - rm -f mongodb-database-tools-ubuntu2204-x86_64-100.13.0.deb
  script:
    - mongodump --uri="$MONGO_HOST" -d sidr --gzip
    - mkdir backup
    - tar -cjf "backup/backup-$(date +"%Y-%m-%d-%H-%M").tar.bz2" dump
    - rm -rf dump
  artifacts:
    paths:
      - backup/*

upload:
  image: python:3.11-slim
  stage: upload
  before_script:
    - pip install -r requirements.txt
  script:
    - python3 backup.py
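One thing worth adding before the upload step pushes anything: check that each archive actually opens, since a corrupt tarball discovered at restore time is the worst outcome for a backup pipeline. A minimal sketch, assuming the same backup/ folder layout as above (the verify_archive helper is my own addition, not part of the pipeline):

```python
import tarfile


def verify_archive(path):
    """Return the member names of a .tar.bz2 archive, raising an
    exception if it is corrupt or unreadable. Running this on each
    file in backup/ before upload acts as a pre-upload sanity check."""
    with tarfile.open(path, mode='r:bz2') as tar:
        return tar.getnames()
```

Calling this on every file in backup/ before client.upload_file, and letting any exception fail the job, stops a truncated dump from ever reaching object storage.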

Now, when you run this, the pipeline should go through its stages: first capturing your backup, then uploading it. You can apply the same pattern to plenty of other services.
