
Context

In the current environment, where almost every company provides software as a service (SaaS), you may need to isolate your customers’ data. Chances are that you’re here because you searched for this exact problem and found this blog. Maybe this is the answer to your problem… Maybe not. But in my humble opinion, it is nonetheless interesting to explore this approach.

Now, every Python developer knows or has heard about Django. It’s a very (very) powerful framework for developing web applications with very little effort (not really but … you get my gist). One of the reasons for Django’s simplicity is that it is very opinionated about how to do things and, in my (honest) personal opinion, this results in consistency across a team (in terms of programming, architectural style etc). But that’s a topic for another time. Going back to the topic at hand: say you love Django and want to develop a SaaS based on your shiny new app idea. After a while, you run into the problem that you want to separate your different customers’ data. Sure, you can add a foreign key to the customer at the database level, but if you’ve ever tried building something like this, you’d know that while that is the simplest solution, it makes it very difficult (not impossible) to scale (at least for me).

So, what can you do?

While there are many approaches, I chose this one for one primary reason: I wanted to achieve complete isolation between different customers' data. A client had asked me to do something similar for him, I had wanted to use Docker for a while, and this seemed like a perfect opportunity. At the time my career was in its infancy, so I didn’t have much exposure to architectural design principles and such, which made it very interesting to try this out. Now that the introduction and my ramblings are out of the way, let’s move on to the actual explanation.

Let’s get started now, shall we?

First, we need a simple app that we can provide as a service to multiple customers. For this, we recreated some of the functionality of the Polls app from the tutorial on the official Django website. Since the point is not to develop the whole thing but to discuss this approach to multi-client architecture, I’ve only created the creation and retrieval views for our poll Question model.

The Poll App

Our Polls app directory looks something like this:

multi-client
 ├────Dockerfile
 ├────README.md
 ├────db.sqlite3
 ├────docker-compose.yml
 ├────manage.py
 ├────multi
 │    ├────__init__.py
 │    ├────asgi.py
 │    ├────settings.py
 │    ├────urls.py
 │    └────wsgi.py
 ├────nginx.conf
 ├────pg_hba.conf
 ├────polls
 │    ├────__init__.py
 │    ├────admin.py
 │    ├────apps.py
 │    ├────migrations
 │    │    ├────0001_initial.py
 │    │    └────__init__.py
 │    ├────models.py
 │    ├────tests.py
 │    ├────urls.py
 │    └────views.py
 ├────postgresql.conf
 ├────postgresql.conf.custom
 ├────requirements.txt
 └────wait-for-it.sh

NOTE: when setting this up, we called this project multi-client, but it is essentially the mysite project which you can create step by step through the Django tutorial.

Inside the multi-client directory we have the multi directory that Django auto-generated for the project and the polls directory, which holds our app logic (an app in the Django sense).

The following snippets show the files inside the polls directory that we actually care about for this approach.

polls/models.py

from django.db import models


class Question(models.Model):
    """Question Model."""

    question = models.CharField(max_length=256)
    publish_date = models.DateTimeField("date published")


class Choice(models.Model):
    """Choice model."""

    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice = models.CharField(max_length=256)
    votes = models.IntegerField(default=0)

polls/views.py

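from datetime import datetime
from django.http import HttpResponseNotAllowed, JsonResponse
from . import models
import json


def index(request):
    """Index for Polls: list questions on GET, create one on POST."""
    if request.method == "GET":
        return list_questions(request)
    if request.method == "POST":
        return create_question(request)
    # A Django view must always return a response; reject other methods with 405.
    return HttpResponseNotAllowed(["GET", "POST"])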


def list_questions(request):
    questions = models.Question.objects.all().values()

    return JsonResponse(
        {
            "questions": list(questions),
        }
    )


def create_question(request):
    """Create request for creating a new poll."""
    question = json.loads(request.body).get("question")
    pub_date = datetime.now()
    q = models.Question(question=question, publish_date=pub_date)
    q.save()
    ret_obj = q.__dict__.copy()
    ret_obj.pop("_state")
    return JsonResponse({"saved_object": ret_obj})

polls/urls.py

from django.urls import path
from . import views

urlpatterns = [
    path("", views.index, name="index"),
]

These are the only files that we really care about - I’ve kept every other file in the polls directory as it was when Django scaffolded it.

The polls URLs are wired into the project through the multi/urls.py file, which looks like this

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path("polls/", include("polls.urls")),
    path("admin/", admin.site.urls),
]

and finally the relevant changes to settings.py inside the multi directory, which look like the following

...
ALLOWED_HOSTS: list = ["*"]

# Application definition

INSTALLED_APPS = [
    "polls.apps.PollsConfig",
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
]

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    # "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
]

# disabled csrf middleware because we have a very simple rest api with no cookies
# no cookies = no csrf necessary

...

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DATABASE_NAME"),
        "USER": os.environ.get("DATABASE_USER"),
        "PASSWORD": os.environ.get("DATABASE_PASSWORD"),
        "HOST": os.environ.get("DATABASE_HOST"),
        "PORT": 5432,
    }
}

...

For brevity, I’ve truncated settings.py to show only the relevant bits.
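Since the containers for each client differ only in these environment variables, a missing one is easy to overlook: os.environ.get quietly returns None and the failure only shows up later as a connection error. Here is a minimal sketch of a fail-fast helper, assuming the same four variable names as above. The function name database_settings is hypothetical and not part of the original project.

```python
import os


def database_settings(env=os.environ):
    """Build the DATABASES["default"] dict from per-client environment variables.

    Hypothetical helper: raises early when a variable is missing instead of
    letting Django fail later with a less obvious connection error.
    """
    required = ["DATABASE_NAME", "DATABASE_USER", "DATABASE_PASSWORD", "DATABASE_HOST"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": env["DATABASE_NAME"],
        "USER": env["DATABASE_USER"],
        "PASSWORD": env["DATABASE_PASSWORD"],
        "HOST": env["DATABASE_HOST"],
        "PORT": 5432,
    }
```

In settings.py this could be used as DATABASES = {"default": database_settings()}.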

Now, everything is set, right? Hmmm… not quite.

Dockerize the application

Before we can run separate containers per client, we need to build an image for our application. To do that, create a Dockerfile in the project root - next to your manage.py file.

Our Dockerfile looks something like this

FROM python:3
ENV PYTHONUNBUFFERED 1
RUN mkdir /main
WORKDIR /main

ADD requirements.txt /main/
RUN pip install -r requirements.txt

ADD . /main/

To build the image, run

docker build -t taj/multi-client .

This builds an image with the tag taj/multi-client which can then be used inside the docker-compose.yml.

For the sake of completeness, I also want to show the nginx.conf file I ended up using

nginx.conf

events {
    worker_connections 1024;
}

http {
    server {
        server_name client1.polls.local;
        location / {
            proxy_pass http://client1_app:8000;
            proxy_set_header HOST $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }

    server {
        server_name client2.polls.local;
        location / {
            proxy_pass http://client2_app:8000;
            proxy_set_header HOST $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}

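Notice the proxy_pass directives in the conf. If you know nginx then you probably know that clientx_app is essentially a hostname that has to resolve somewhere (usually through /etc/hosts). Luckily for us, Docker Compose puts all the services of a project on a shared default network, where each container can be reached by its service name, so nginx can refer to client1_app and client2_app directly. The depends_on property only controls the order in which the containers are started.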
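The two server blocks differ only in the client name, so the file can be rendered from a list of clients rather than edited by hand. The following is only a sketch under that assumption. render_nginx_conf and SERVER_BLOCK are hypothetical names, and the output simply mirrors the nginx.conf shown above.

```python
# Template for one client's server block; doubled braces are literal braces.
SERVER_BLOCK = """\
    server {{
        server_name {client}.polls.local;
        location / {{
            proxy_pass http://{client}_app:8000;
            proxy_set_header HOST $host;
            proxy_set_header X-Real-IP $remote_addr;
        }}
    }}
"""


def render_nginx_conf(clients):
    """Render an nginx.conf with one server block per client name."""
    blocks = "\n".join(SERVER_BLOCK.format(client=client) for client in clients)
    return "events {\n    worker_connections 1024;\n}\n\nhttp {\n" + blocks + "}\n"
```

Calling render_nginx_conf(["client1", "client2"]) should reproduce the structure of the file above; adding a third client is then a one-word change.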

Finally we’re getting somewhere

So, now we’re done with our preamble and can start containerising our application and its database.

Taking a look at our docker-compose.yml

version: "3.7"

services:
  nginx-proxy:
    image: jwilder/nginx-proxy
    container_name: nginx-proxy
    ports:
      - "80:80"
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock:ro
      - ./nginx.conf:/etc/nginx/nginx.conf # replace the default nginx.conf from the image with our nginx.conf
    depends_on:
      - client1_app
      - client2_app

  # client 1 block
  client1_db:
    image: postgres
    volumes:
      - ./pg_hba.conf:/usr/local/var/postgres/pg_hba.conf
    environment:
      - POSTGRES_USER=client1
      - POSTGRES_DB=client1
      - POSTGRES_PASSWORD=password
  client1_migration:
    image: taj/multi-client
    command: ./wait-for-it.sh client1_db:5432 -- python manage.py migrate
    volumes:
      - .:/main
      - ./pg_hba.conf:/usr/local/var/postgres/pg_hba.conf
    depends_on:
      - client1_db
    environment:
      - DATABASE_USER=client1
      - DATABASE_NAME=client1
      - DATABASE_PASSWORD=password
      - DATABASE_HOST=client1_db
  client1_app:
    restart: always
    image: taj/multi-client # this is why we ran the docker build command
    command: ./wait-for-it.sh client1_db:5432 -- python manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/main
      - ./pg_hba.conf:/usr/local/var/postgres/pg_hba.conf
    depends_on:
      - client1_db
    environment:
      - DATABASE_USER=client1
      - DATABASE_NAME=client1
      - DATABASE_PASSWORD=password
      - DATABASE_HOST=client1_db

  # client 2 block
  client2_db:
    image: postgres
    volumes:
      - ./pg_hba.conf:/usr/local/var/postgres/pg_hba.conf
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_USER=client2
      - POSTGRES_DB=client2
  client2_migration:
    image: taj/multi-client
    command: ./wait-for-it.sh client2_db:5432 -- python manage.py migrate
    volumes:
      - .:/main
      - ./pg_hba.conf:/usr/local/var/postgres/pg_hba.conf
    depends_on:
      - client2_db
    environment:
      - DATABASE_USER=client2
      - DATABASE_NAME=client2
      - DATABASE_PASSWORD=password
      - DATABASE_HOST=client2_db
  client2_app:
    restart: always
    image: taj/multi-client # this is why we ran the docker build command
    command: ./wait-for-it.sh client2_db:5432 -- python manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/main
      - ./pg_hba.conf:/usr/local/var/postgres/pg_hba.conf
    depends_on:
      - client2_db
    environment:
      - DATABASE_USER=client2
      - DATABASE_NAME=client2
      - DATABASE_PASSWORD=password
      - DATABASE_HOST=client2_db

As you can see, there is nothing fancy here: we have 3 main blocks (a total of 7 services).

The first service uses the nginx-proxy image by jwilder, which is not the official nginx image. I used it because my knowledge of networking through nginx is very limited and this was a proof of concept, so the problems I was facing were not trivial for me to solve in a day.[1]

The services client1_db, client1_migration and client1_app form a separate block (in my mind). This is essentially the cluster for client1.polls.local, which we defined in the nginx.conf file. The client1_db service starts a PostgreSQL database with the provided configuration: database name, user and password. The client1_migration and client1_app services use the image that we built earlier with the tag taj/multi-client. Since our Django app needs to talk to the database service, we explicitly declared a dependency on client1_db for client1_app and client1_migration through the depends_on property. We also provide the env variables for the database configuration directly through the docker-compose.yml. There are probably better ways to do this, but I went with env variables as it was the easiest approach and I was working on a proof of concept, so security wasn’t a concern at this point; it should always be considered in a production environment. Compose puts these containers on a shared network and depends_on starts the database container first, so theoretically everything should’ve worked, but unfortunately it didn’t.

See, the problem was that client1_migration and client1_app were executing before the database inside the client1_db container was ready to accept connections; depends_on only waits for the container to start, not for PostgreSQL to finish initialising. Therefore, client1_migration failed and client1_app was not able to use the models because the migrations had not been applied. I was going to write a script to sleep for about 1 minute before running the app and migration containers, but ended up using the open source wait-for-it.sh script, which does the job. So I had to include it in the app and migration containers and change their command entries to go through wait-for-it.sh. All of this applies to the services client2_db, client2_app and client2_migration as well, making them a similar, separate cluster (I call this a block - not sure about the actual terminology).
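To make the idea behind wait-for-it.sh concrete, here is a rough Python sketch of the same technique: keep trying to open a TCP connection to the database port until it succeeds or a timeout expires. This is not the script itself. wait_for_port is a hypothetical name, and the timeout and interval defaults are arbitrary assumptions.

```python
import socket
import time


def wait_for_port(host, port, timeout=15.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Keep in mind that an open port only tells you that something is listening; it does not guarantee the database is fully ready to serve queries, which is a limitation this kind of check shares with any TCP-level wait.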

Now, at this point, when we execute the docker-compose command

docker-compose up -d

It spins up all the necessary containers and voila! We have everything set up for two clients at the same time. Pat yourself on the back, you’ve reached the final stage. Everything should be working - so it’s time to roll out the red carpet and test your application. To begin testing, we have to modify our system hosts file. Simply adding something like this does the trick

/etc/hosts

127.0.0.1	localhost client1.polls.local client2.polls.local
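With the hosts entry in place, one way to check the isolation is to create a question for one client and confirm it does not show up for the other. The sketch below uses only the standard library and assumes the stack is running on port 80 as configured above. The helper names build_create_request, fetch_questions and check_isolation are hypothetical.

```python
import json
import urllib.request


def build_create_request(client, question):
    """Build (but do not send) a POST that creates a question for one client."""
    url = f"http://{client}.polls.local/polls/"
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )


def fetch_questions(client):
    """Fetch the questions stored for one client (needs the stack to be running)."""
    with urllib.request.urlopen(f"http://{client}.polls.local/polls/") as resp:
        return json.load(resp)["questions"]


def check_isolation(question="Is my data isolated?"):
    """Create a question for client1 and report whether client2 can see it."""
    urllib.request.urlopen(build_create_request("client1", question)).close()
    seen_by_client2 = [q["question"] for q in fetch_questions("client2")]
    return question not in seen_by_client2
```

If the two databases are really separate, check_isolation() should return True once both app containers are up.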

Conclusion

To be honest, this is not the complete solution that I ended up using: I used Go with Docker to achieve something similar with fewer problems, as I could just as easily deploy those containers to Google’s App Engine. But that’s a story for another time.

Going back to the topic at hand. The primary benefit of this approach is the complete isolation of databases. Although it’s a bit tricky, I think a bit of code generation logic could speed up the onboarding setup for new clients. What’s left is to talk about whether you really need this complete isolation. From my experience, in most cases it suffices to have schema-level isolation, where each customer has its own schema.
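As a rough illustration of what such code generation could look like, the sketch below renders the db, migration and app services for a new client as YAML text that follows the docker-compose.yml above. It is only a starting point under stated assumptions: render_client_services is a hypothetical name, the volume mounts are left out for brevity, and the password is passed in as a plain argument, which would not be acceptable in production.

```python
CLIENT_TEMPLATE = """\
  # {client} block
  {client}_db:
    image: postgres
    environment:
      - POSTGRES_USER={client}
      - POSTGRES_DB={client}
      - POSTGRES_PASSWORD={password}
  {client}_migration:
    image: taj/multi-client
    command: ./wait-for-it.sh {client}_db:5432 -- python manage.py migrate
    depends_on:
      - {client}_db
    environment:
{env}
  {client}_app:
    restart: always
    image: taj/multi-client
    command: ./wait-for-it.sh {client}_db:5432 -- python manage.py runserver 0.0.0.0:8000
    depends_on:
      - {client}_db
    environment:
{env}
"""


def render_client_services(client, password):
    """Render the db, migration and app services for one client as YAML text."""
    env = "\n".join(
        f"      - {key}={value}"
        for key, value in [
            ("DATABASE_USER", client),
            ("DATABASE_NAME", client),
            ("DATABASE_PASSWORD", password),
            ("DATABASE_HOST", f"{client}_db"),
        ]
    )
    return CLIENT_TEMPLATE.format(client=client, password=password, env=env)
```

The output of render_client_services("client3", "password") would be pasted (or appended by a script) under services. The nginx.conf and the depends_on list of the nginx-proxy service would still need a matching entry for the new client.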

The drawback of this approach is that developers would continually have to modify their hosts file to add new clients. That is not viable. Maybe there is a way to use nginx on the host machine with wildcard matches locally and proxy the requests to the nginx container. Anyway, that’s an avenue to explore on its own with a network expert at your company (or Google). Another con that I can think of is that it can get tricky for the people who manage your infrastructure. There is quite a bit of manual work, although it can be reduced by approaches such as generating the infrastructure code on demand from some data. But then that adds another thing they will need to manage, and they might not be happy about it.

So the moral I’m getting at is that this approach makes the life of the developer easier because they only have to maintain one application, but it might not be rainbows and stars for the people managing your infrastructure. So it may not be something that would be viable long term.

Alright mates, that’s the end of our explanation. If you think there is a better way to approach this kind of architecture, feel free to open an issue on the GitHub repository and we can discuss further improvements there.


  1. There was an issue where the host was not able to reach <client>.polls.local, even though those names were hardcoded in the /etc/hosts file to the localhost address 127.0.0.1. I tried many different solutions but to no avail. Finally, I used an unofficial image because it did what I wanted to achieve.