In the current environment, where every company provides software as a service (SaaS), there may be a need to isolate your customers’ data. Chances are that you’re here because you searched for this exact problem and found this blog. Maybe this is the answer to your problem… maybe not. But in my humble opinion, it is nonetheless an interesting approach to explore.
Now, every Python developer knows or has heard about Django. It’s a very (very) powerful framework for developing web applications with very little effort (not really, but… you get my gist). One of the main reasons for Django’s simplicity is that it is very opinionated about how to do things, and in my (honest) personal opinion, this results in consistency within a team (in programming style, architecture, etc.). But that’s a topic for another time. Going back to the topic at hand: we all love Django, and you want to develop a SaaS based on your shiny new app idea. But after a while, you run into the problem that you want to separate your different customers’ data. Sure, you can add a foreign key constraint at the database level, but if you’ve ever tried building something like this, you’d know that while that is the simplest solution, it makes it very difficult (not impossible) to scale (at least for me).
While there are many approaches, I chose this one for one primary reason: I wanted to achieve complete isolation between different customers' data. A client had asked me to build something similar for him, I had wanted to use Docker for a while, and this seemed like a perfect opportunity to get it done with Docker. At the time my career was in its infancy, so I didn’t have much exposure to architectural design principles and such, and it was very interesting to try this out. Now that the introduction is out of the way and I’ve digressed from my ramblings, let’s move on to the actual explanation.
First, we need a simple app that we can provide as a service to multiple customers. For this, we recreate some of the functionality of the Polls app from the tutorial on the official Django website. Since the point is not to build the whole thing but to discuss this approach to multi-client architecture, I’ve only created the creation and retrieval views for our poll Question model.
Our Polls app directory looks something like this:
multi-client
├── Dockerfile
├── README.md
├── db.sqlite3
├── docker-compose.yml
├── manage.py
├── multi
│   ├── __init__.py
│   ├── asgi.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── nginx.conf
├── pg_hba.conf
├── polls
│   ├── __init__.py
│   ├── admin.py
│   ├── apps.py
│   ├── migrations
│   │   ├── 0001_initial.py
│   │   └── __init__.py
│   ├── models.py
│   ├── tests.py
│   ├── urls.py
│   └── views.py
├── postgresql.conf
├── postgresql.conf.custom
├── requirements.txt
└── wait-for-it.sh
NOTE: when setting this up, we called this project multi-client, but it is essentially the my-website app which you can create step by step through the Django tutorial.
Inside the multi-client directory we have the Django auto-generated multi directory and also our polls app’s logic - in the Django sense. The following are snippets of the files inside the polls directory that matter for this approach.
polls/models.py
from django.db import models
class Question(models.Model):
    """Question Model."""

    question = models.CharField(max_length=256)
    publish_date = models.DateTimeField("date published")


class Choice(models.Model):
    """Choice model."""

    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice = models.CharField(max_length=256)
    votes = models.IntegerField(default=0)
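If you want to poke at these models before wiring up any views, a quick session in python manage.py shell is enough. This is just a small sketch; the question text, choice and date are illustrative:

from django.utils import timezone
from polls.models import Question, Choice

# create a question and one choice, then confirm both were persisted
q = Question.objects.create(question="What's new?", publish_date=timezone.now())
Choice.objects.create(question=q, choice="Not much", votes=0)
print(Question.objects.count(), q.choice_set.count())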
polls/views.py
from datetime import datetime
from django.http import JsonResponse
from . import models
import json
def index(request):
    """Index for Polls."""
    if request.method == "GET":
        return list_questions(request)
    if request.method == "POST":
        return create_question(request)


def list_questions(request):
    questions = models.Question.objects.all().values()
    return JsonResponse(
        {
            "questions": list(questions),
        }
    )


def create_question(request):
    """Create request for creating a new poll."""
    question = json.loads(request.body).get("question")
    pub_date = datetime.now()
    q = models.Question(question=question, publish_date=pub_date)
    q.save()
    ret_obj = q.__dict__.copy()
    ret_obj.pop("_state")
    return JsonResponse({"saved_object": ret_obj})
polls/urls.py
from django.urls import path
from . import views
urlpatterns = [
    path("", views.index, name="index"),
]
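Just to make the request and response shapes concrete, here is a minimal sketch using Django’s test client from python manage.py shell. It assumes the project-level urls.py (shown below) mounts these routes under /polls/:

import json
from django.test import Client

c = Client()
# create_question expects a JSON body with a "question" key
created = c.post(
    "/polls/",
    data=json.dumps({"question": "Do you like Django?"}),
    content_type="application/json",
)
print(created.json())           # {"saved_object": {"id": ..., "question": ..., ...}}
print(c.get("/polls/").json())  # {"questions": [...]}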
These are the only files that we really care about - I’ve kept every other file in the polls directory as it was when Django scaffolded it.
The URL routing is wired up through the multi/urls.py file, which looks like this
from django.contrib import admin
from django.urls import path, include
urlpatterns = [
    path("polls/", include("polls.urls")),
    path("admin/", admin.site.urls),
]
and finally the relevant changes to settings.py inside the multi directory, which look like the following
...
ALLOWED_HOSTS: list = ["*"]

# Application definition
INSTALLED_APPS = [
    "polls.apps.PollsConfig",
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
]

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    # "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
]
# disabled csrf middleware because we have a very simple rest api with no cookies
# no cookies = no csrf necessary
...

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DATABASE_NAME"),
        "USER": os.environ.get("DATABASE_USER"),
        "PASSWORD": os.environ.get("DATABASE_PASSWORD"),
        "HOST": os.environ.get("DATABASE_HOST"),
        "PORT": 5432,
    }
}
...
For brevity, I’ve truncated the snippet to show only the relevant bits of the settings.py file (note that the database block relies on import os at the top of settings.py).
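One small tweak worth considering (not in the repository): since these settings come from the environment, a forgotten variable silently becomes None and only blows up when Django tries to connect. A tiny helper like this sketch makes the container fail fast at startup instead:

import os

def require_env(name: str) -> str:
    """Return the environment variable's value, or fail loudly at startup."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": require_env("DATABASE_NAME"),
        "USER": require_env("DATABASE_USER"),
        "PASSWORD": require_env("DATABASE_PASSWORD"),
        "HOST": require_env("DATABASE_HOST"),
        "PORT": 5432,
    }
}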
Now, everything is set, right? Hmmm… not quite.
Before we spin up different containers, we actually need to build an image for our application. To do that, create a Dockerfile in the project root - next to your manage.py file.
Our Dockerfile looks something like this
FROM python:3
ENV PYTHONUNBUFFERED 1
RUN mkdir /main
WORKDIR /main
ADD requirements.txt /main/
RUN pip install -r requirements.txt
ADD . /main/
To build the image, run
docker build -t taj/multi-client .
(note the trailing dot - it tells Docker to use the current directory as the build context). This creates an image tagged taj/multi-client, which we can then reference inside the docker-compose.yml.
For the sake of completeness, I also want to show the nginx.conf
file I ended up using
nginx.conf
events {
    worker_connections 1024;
}

http {
    server {
        server_name client1.polls.local;
        location / {
            proxy_pass http://client1_app:8000;
            proxy_set_header HOST $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }

    server {
        server_name client2.polls.local;
        location / {
            proxy_pass http://client2_app:8000;
            proxy_set_header HOST $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
Notice the proxy_pass directives in the conf. If you know nginx, then you probably know that clientX_app is essentially a hostname set somewhere (usually through /etc/hosts). Luckily for us, Docker Compose puts all of these services on a shared network where each service name resolves as a hostname, so the nginx container (which depends_on the app services) can refer to them by name.
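If you want to convince yourself of this hostname resolution once the stack is running, a one-off check from inside one of the app containers does the trick (a sketch; start the interpreter with docker-compose exec client1_app python):

# Run inside the client1_app container: the database service name
# should resolve on the Compose network.
import socket
print(socket.gethostbyname("client1_db"))  # prints client1_db's internal IP address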
So, now we’re done with our preamble and can start containerising our application and its database containers.
Let’s take a look at our docker-compose.yml
version: "3.7"

services:
  nginx-proxy:
    image: jwilder/nginx-proxy
    container_name: nginx-proxy
    ports:
      - "80:80"
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock:ro
      - ./nginx.conf:/etc/nginx/nginx.conf # replace the default nginx.conf from the image with our nginx.conf
    depends_on:
      - client1_app
      - client2_app

  # client 1 block
  client1_db:
    image: postgres
    volumes:
      - ./pg_hba.conf:/usr/local/var/postgres/pg_hba.conf
    environment:
      - POSTGRES_USER=client1
      - POSTGRES_DB=client1
      - POSTGRES_PASSWORD=password

  client1_migration:
    image: taj/multi-client
    command: ./wait-for-it.sh client1_db:5432 -- python manage.py migrate
    volumes:
      - .:/main
      - ./pg_hba.conf:/usr/local/var/postgres/pg_hba.conf
    depends_on:
      - client1_db
    environment:
      - DATABASE_USER=client1
      - DATABASE_NAME=client1
      - DATABASE_PASSWORD=password
      - DATABASE_HOST=client1_db

  client1_app:
    restart: always
    image: taj/multi-client # this is why we ran the docker build command
    command: ./wait-for-it.sh client1_db:5432 -- python manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/main
      - ./pg_hba.conf:/usr/local/var/postgres/pg_hba.conf
    depends_on:
      - client1_db
    environment:
      - DATABASE_USER=client1
      - DATABASE_NAME=client1
      - DATABASE_PASSWORD=password
      - DATABASE_HOST=client1_db

  # client 2 block
  client2_db:
    image: postgres
    volumes:
      - ./pg_hba.conf:/usr/local/var/postgres/pg_hba.conf
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_USER=client2
      - POSTGRES_DB=client2

  client2_migration:
    image: taj/multi-client
    command: ./wait-for-it.sh client2_db:5432 -- python manage.py migrate
    volumes:
      - .:/main
      - ./pg_hba.conf:/usr/local/var/postgres/pg_hba.conf
    depends_on:
      - client2_db
    environment:
      - DATABASE_USER=client2
      - DATABASE_NAME=client2
      - DATABASE_PASSWORD=password
      - DATABASE_HOST=client2_db

  client2_app:
    restart: always
    image: taj/multi-client # this is why we ran the docker build command
    command: ./wait-for-it.sh client2_db:5432 -- python manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/main
      - ./pg_hba.conf:/usr/local/var/postgres/pg_hba.conf
    depends_on:
      - client2_db
    environment:
      - DATABASE_USER=client2
      - DATABASE_NAME=client2
      - DATABASE_PASSWORD=password
      - DATABASE_HOST=client2_db
You can see that there is nothing fancy here: we have 3 main blocks (7 services in total).
The first service uses the jwilder/nginx-proxy image, which is not an official nginx image, but I used it because my knowledge of networking through nginx is very limited and this was a proof of concept, so the problems I was facing were not trivial for me to solve in a day.1
The services client1_db, client1_migration and client1_app form a separate block (in my mind). This is essentially the cluster for client1.polls.local, which we defined in the nginx.conf file.
The client1_db service instantiates a PostgreSQL database with the provided configuration: database name, user and password. The client1_migration and client1_app services use the image we created earlier with the tag taj/multi-client.
Since our Django app needs to talk to the database service, we explicitly enforce a dependency for client1_app and client1_migration through the depends_on property. We also provide the environment variables for the database configuration directly through the docker-compose.yml.
There are probably better ways to handle this configuration, but I went with environment variables as the easiest approach; I wanted a proof of concept, so security wasn’t a concern at this point - something that should always be considered in a production environment. This depends_on property creates a simple network bridge between these containers, and theoretically everything should have worked, but unfortunately it didn’t.
See, the problem was that client1_migration and client1_app were starting before the client1_db container was ready. Therefore, client1_migration failed to execute, and client1_app could not use the models because the migrations had never run. I was going to write a script to sleep for about a minute before running the app and migration containers, but I ended up using the open-source wait-for-it.sh script, which does the job. So I had to include that in my app and migration containers and change their command in docker-compose.yml to go through the wait-for-it.sh script. All of this applies to the services client2_db, client2_app and client2_migration as well, making them a similar separate cluster (I call this a block - not sure about the actual terminology).
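For intuition, this is roughly what wait-for-it.sh does, sketched in Python (the real thing is a shell script with more options; this is just to show the idea):

import socket
import sys
import time

def wait_for(host: str, port: int, timeout: float = 60.0) -> None:
    """Block until host:port accepts TCP connections, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return  # the database is ready, let the real command run
        except OSError:
            time.sleep(1)  # not up yet, retry
    sys.exit(f"{host}:{port} did not become available within {timeout} seconds")

if __name__ == "__main__":
    wait_for("client1_db", 5432)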
Now, at this point, we execute the docker-compose command
docker-compose up -d
This spins up all the necessary containers and voilà! We have everything set up for two clients at the same time. Pat yourself on the back, you’ve reached the final stage. Everything should be working - so it’s time to roll out the red carpet and test your application. To begin testing, we have to modify our system hosts file. Simply adding something like this does the trick
/etc/hosts
127.0.0.1 localhost client1.polls.local client2.polls.local
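With the hosts entries in place, a quick smoke test shows the isolation we were after: a question created for client 1 never shows up for client 2. Here is a small sketch (it assumes the compose stack is up and nginx is listening on port 80):

import json
import urllib.request

def post_question(host: str, text: str) -> dict:
    """Create a poll question for the given client hostname."""
    req = urllib.request.Request(
        f"http://{host}/polls/",
        data=json.dumps({"question": text}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def list_questions(host: str) -> dict:
    """Fetch all poll questions for the given client hostname."""
    with urllib.request.urlopen(f"http://{host}/polls/") as resp:
        return json.load(resp)

post_question("client1.polls.local", "Visible only to client 1?")
print(list_questions("client1.polls.local"))  # contains the new question
print(list_questions("client2.polls.local"))  # does not - it lives in a separate database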
To be honest, this is not the complete solution that I ended up using, because I eventually used Go with Docker to achieve something similar with fewer problems, as I could just as easily deploy those containers to Google’s App Engine. But that’s a story for another time.
Going back to the topic at hand: the primary benefit of this approach is the complete isolation of the databases. Although it’s a bit tricky, I think a bit of code-generation logic could speed up the onboarding setup for new clients. What’s left is to ask whether you really need this complete isolation. From my experience, in most cases it suffices to have schema-level isolation, where each customer has its own schema.
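To make the code-generation idea slightly more concrete, here is a rough sketch of a hypothetical helper (not part of the repository) that renders the three per-client services for docker-compose.yml, so onboarding a new client becomes one function call:

# Hypothetical onboarding helper: render one client's db/migration/app services
# from a template string, mirroring the blocks in docker-compose.yml above.
CLIENT_BLOCK = """
  {name}_db:
    image: postgres
    environment:
      - POSTGRES_USER={name}
      - POSTGRES_DB={name}
      - POSTGRES_PASSWORD={password}
  {name}_migration:
    image: taj/multi-client
    command: ./wait-for-it.sh {name}_db:5432 -- python manage.py migrate
    depends_on: [{name}_db]
    environment:
      - DATABASE_USER={name}
      - DATABASE_NAME={name}
      - DATABASE_PASSWORD={password}
      - DATABASE_HOST={name}_db
  {name}_app:
    restart: always
    image: taj/multi-client
    command: ./wait-for-it.sh {name}_db:5432 -- python manage.py runserver 0.0.0.0:8000
    depends_on: [{name}_db]
    environment:
      - DATABASE_USER={name}
      - DATABASE_NAME={name}
      - DATABASE_PASSWORD={password}
      - DATABASE_HOST={name}_db
"""

def render_client_block(name: str, password: str) -> str:
    """Return the docker-compose services for one new client."""
    return CLIENT_BLOCK.format(name=name, password=password)

print(render_client_block("client3", "password"))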
The drawback of this approach is that developers would continually have to modify their hosts file to add new clients - not something that is viable. Maybe there is a way to run nginx on our host machine with wildcard matches on the .local domain and proxy requests to the nginx container. Anyway, that’s an avenue to explore on its own with a network expert at your company (or Google).
Another con that I can think of is that this approach can get tricky for the people who manage your infrastructure. There is quite a bit of manual work; it can be reduced by approaches such as code-generating the infrastructure on demand based on some data, but then that adds another thing that they will need to manage. They might not be happy about it.
So the moral I’m getting at is that this approach makes the developer’s life easier, because they only have to maintain one application, but it might not be rainbows and stars for the people managing your infrastructure. So it may not be viable long term.
Alright mates, that’s the end of our explanation. If you think there is a better way to approach this kind of architecture, feel free to open an issue on the GitHub repository and we can discuss further improvements there.
There was an issue where the host was not able to access the <client>.polls.local hostnames, which were hardcoded in the /etc/hosts file to the localhost address 127.0.0.1. I tried many different solutions, but to no avail. Finally, I used the unofficial image because it did what I wanted to achieve. ↩︎