Posted by Hitul Mistry
22 May 21
Tagged under: #django,#python,#PostgreSQL
Django is a full-stack framework capable of scaling to millions of users. As we approach that scale, we have to follow best practices in Django for effective performance and development. We have listed 70+ best practices which should be followed.
While scaling Django, most of the unexpected time is spent waiting for responses from third-party tools and databases. If we manage them better and also understand the internal workings of Django, we can scale Django.
Use the cache_page decorator; it takes the cache time in seconds as an argument.

from django.views.decorators.cache import cache_page

@cache_page(60 * 10)
def user_data(request):
    ...

On the first request, user_data will be called and its output will be stored in the cache; subsequent requests within the timeout are served from the cache.

Template fragments can be cached as well:

{% load cache %}
{% cache 60 menu_items %}
<p>Menu Name: {{ menu_item.name }}</p>
{% endcache %}
The cached fragment can also vary on a variable, for example per user:

{% load cache %}
{% cache 60 user_details user_instance %}
<p>First name: {{ user_instance.first_name }}</p>
<p>Last name: {{ user_instance.last_name }}</p>
{% endcache %}
Here a separate copy of the fragment is cached for each user_instance value.

Set the CONN_MAX_AGE parameter to a couple of minutes so that database connections are reused. The value of the CONN_MAX_AGE parameter should be in sync with the database's connection timeout parameter.

Configure OPTIONS in the DATABASES configuration. OPTIONS contains database-specific settings and should be configured where required to gain better performance.
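A minimal sketch of a PostgreSQL DATABASES entry combining both settings (the credentials, the 120-second value and the statement_timeout option are placeholder assumptions, not values from the article):

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "appdb",
        "USER": "appuser",
        "PASSWORD": "secret",
        "HOST": "127.0.0.1",
        "PORT": "5432",
        # keep connections open for up to 2 minutes instead of reconnecting per request
        "CONN_MAX_AGE": 120,
        "OPTIONS": {
            # passed to psycopg2/libpq; aborts queries running longer than 5 seconds
            "options": "-c statement_timeout=5000",
        },
    }
}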
Django loads INSTALLED_APPS, middlewares, the database, system checks, template engines and much more at startup. Keep this list lean and load debug-only apps conditionally:

INSTALLED_APPS = [
"app1",
"app2"
]
if DEBUG:
INSTALLED_APPS.append("app_debug1")
app_debug1 will be loaded only when DEBUG is True.

Avoid creating records one by one in a loop:

import json
# Getting the user_details from the request.
# Let's assume user_details is a JSON array with around 1000 records in it.
# Sample schema
# [{"name": "hitul"}, {"name": "rex"}]
user_details = request.POST.get("user_details")
user_details_dict = json.loads(user_details)
for detail in user_details_dict:
    models.UserDetails.objects.create(
        name=detail.get("name")
    )
If user_details_dict has 1000 records, then 1000 insert queries will be initiated. Build the instances in memory and insert them in a single query instead:

user_details = request.POST.get("user_details")
user_details_dict = json.loads(user_details)
user_details_instances = []
for detail in user_details_dict:
    user_details_instances.append(
        models.UserDetails(
            name=detail.get("name")
        )
    )
models.UserDetails.objects.bulk_create(user_details_instances)
bulk_create in Django does just a single insert query into the database, which makes it roughly 10x faster.

Use iterator() when looping over large querysets:

user_details = models.User.objects.filter()
for user_instance in user_details:
    print(user_instance.first_name)
If models.User.objects.filter() matches 100 users, Django runs a single query like select user.id, user.first_name, user.last_name from user, then loads and caches all 100 rows on the queryset at once. For large result sets this wastes memory and adds mapping overhead.

user_details = models.User.objects.filter().iterator()
for user_instance in user_details:
    print(user_instance.first_name)
iterator streams the rows instead of caching the whole result of filter in memory.

user_details = models.User.objects.filter().iterator(chunk_size=10)
for user_instance in user_details:
    print(user_instance.first_name)
chunk_size should be used because if Django tries to load all the records in one shot, then after receiving the data it has to map everything onto model instances at once, which can add much more delay in processing.

Use select_related / prefetch_related to avoid extra queries for related objects:
user_details = models.User.objects.filter().iterator()
# Here, let's suppose profile is a foreign key to the Profile table.
for user_instance in user_details:
    print(user_instance.profile.profile_picture)
On each loop cycle Django will make an extra query to fetch profile_picture from the database, because profile is a foreign key and Django fetches the related row at runtime when it is first accessed.

user_details = models.User.objects.filter().prefetch_related('profile')
# Here, let's suppose profile is a foreign key to the Profile table.
for user_instance in user_details:
    print(user_instance.profile.profile_picture)
Here, Django will not make an extra query for profile_picture on each cycle because the related Profile rows have already been fetched. Note that iterator is not used in this example: prefetch_related and iterator cannot be used together (before Django 4.1, iterator() simply ignores prefetch_related). Choose iterator or select_related / prefetch_related based on which results in fewer queries for your case.

prefetch_related also avoids per-row queries on reverse foreign key lookups:
class Student(models.Model):
    pass

class College(models.Model):
    a = models.ForeignKey(Student, on_delete=models.CASCADE)

for student_instance in Student.objects.all():
    print(student_instance.college_set.all())
On each for loop cycle Django runs a separate college_set (reverse foreign key lookup) query. Prefetch it instead:

class Student(models.Model):
    pass

class College(models.Model):
    a = models.ForeignKey(Student, on_delete=models.CASCADE)

for student in Student.objects.prefetch_related('college_set').all():
    print(student.college_set.all())
Fetch only the columns you need. Instead of:

models.User.objects.all()

use values:

models.User.objects.all().values('first_name', 'last_name')
This will fetch only the first_name and last_name columns. The only difference from a normal queryset is that values returns dicts instead of model instances. values fetches just the required column data; it avoids the unnecessary overhead of fetching the extra columns and mapping them onto the queryset, and as a result boosts performance. values_list does a similar job to values, with the difference that it returns tuples. Prefer values and values_list wherever full model instances are not needed.
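For example, a quick values_list sketch (the field names are just the ones used above):

names = models.User.objects.values_list('first_name', 'last_name')
# -> <QuerySet [('Hitul', 'Mistry'), ...]> (tuples instead of dicts)

first_names = models.User.objects.values_list('first_name', flat=True)
# flat=True flattens single-field results into plain values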
only and defer are other ways to optimize Django performance by querying only the required columns.

models.User.objects.all().defer('first_name')
defer will fetch all the columns except first_name. Whenever we try to access the deferred column on an instance, it will be fetched from the database with an extra query for that specific instance.

models.User.objects.all().only('first_name')
only is the inverse of defer. only will fetch just first_name, and whenever we try to access any other column, a database query will be initiated to fetch the data.

Use update_fields when calling save(). Consider this update:

user_instance = models.User.objects.filter(id=10).last()
user_instance.first_name = "Hitul"
user_instance.save()
UPDATE core_user SET core_user.first_name='hitul', core_user.last_name='mistry', password='sfu7Hdsfsdf76', username='hitul', email='hitul@digiqt.com' where id = 10;
This SQL is generated by the save method. Notice that even though we have not changed any column other than first_name, all the other columns get written again as well.

user_instance = models.User.objects.filter(id=10).last()
user_instance.first_name = "Hitul"
user_instance.save(update_fields=["first_name"])
Pass the update_fields argument to update specific columns. The generated SQL becomes:

UPDATE core_user SET core_user.first_name='hitul' where id = 10;
The update_fields argument makes sure that only the passed columns are updated in the database.

Redis stores its data in RAM. Prefer Redis operations with O(1) complexity; their performance stays consistent at any data volume. Where O(1) is not possible, go with data structures that have the nearest lower complexity. In any case, we should have a fair idea of the worst-case complexities of the commands we use.
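As an illustration, a minimal sketch using the redis-py client (the host, port and key names are assumptions, not from the article):

import redis

r = redis.Redis(host="127.0.0.1", port=6379)

# Plain key reads/writes and hash field access are O(1)
r.set("user:10:first_name", "Hitul")
print(r.get("user:10:first_name"))

r.hset("user:10", "last_name", "Mistry")
print(r.hget("user:10", "last_name"))

# Commands such as KEYS, or LRANGE over a whole list, are O(N)
# and should be kept off hot paths.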
Keep the size of each HttpResponse small, within a few kb where possible.

Use lazy for variables which can take time to evaluate. The lazy function in Django will only evaluate the variable's value when it is actually used.
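A minimal sketch of django.utils.functional.lazy, assuming a hypothetical expensive_lookup helper:

from django.utils.functional import lazy

def expensive_lookup():
    # imagine this hits the database or an external service
    return "computed value"

# Nothing is executed here; lazy() only wraps the callable
lazy_value = lazy(expensive_lookup, str)()

# expensive_lookup runs only when the value is actually used,
# for example when it is rendered or cast to str
print(str(lazy_value))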
Avoid plain print statements; they give no sense of what is going on, where the message originated, the time, etc. Use Django's loggers, which give much more insight in the logs.
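For instance, a small sketch of module-level logging (the view name is just an example):

import logging

logger = logging.getLogger(__name__)

def user_data(request):
    # the configured formatter adds level, logger name, timestamp and line number
    logger.info("Fetching user data")
    ...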
A CDN (Content Delivery Network) is a service which copies files across different parts of the globe. Whenever a file is accessed later, it is served from the CDN server location nearest to the client. CDN networks have petabyte-scale bandwidth available. There are many CDN services available, for example Amazon CloudFront, Google Cloud CDN, Cloudflare CDN, Azure CDN, etc.
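One common way to put Django static files behind a CDN is to point STATIC_URL at the CDN domain; the domain below is a placeholder:

# settings.py
STATIC_URL = "https://cdn.example.com/static/"

# {% static 'css/app.css' %} in templates now renders the CDN URL,
# while the files themselves are uploaded/synced to the CDN origin.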
Pre-commit hooks run on every git commit. If a pre-commit hook finds any problem with the code, it does not let the developer commit and forces them to fix the issues first. Pre-commit hook practices build good habits in developers. A sample .pre-commit-config.yaml:
practices lead to habbits for the developers.repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.3.0
hooks:
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- id: check-ast
- id: check-json
- id: check-merge-conflict
- id: check-symlinks
- id: debug-statements
- id: mixed-line-ending
- id: requirements-txt-fixer
- id: check-added-large-files
- id: detect-private-key
- id: flake8
args: [--max-line-length=170]
- repo: https://github.com/asottile/pyupgrade
rev: v2.7.3
hooks:
- id: pyupgrade
- repo: https://github.com/psf/black
rev: 19.3b0
hooks:
- id: black
Do not use runserver in production; it is Django's development server and is not designed to handle production load.

Keep requirements.txt up to date; pre-commit hooks can be used to update requirements.txt before each commit.

Like a .env
file, Django projects commonly use a local_settings.py. local_settings.py can be added to the .gitignore file so that git does not track changes to it. local_settings.py should hold all the settings variables which are environment (development, prod and qa) specific. Keeping it out of version control can be enforced through pre-commit hooks or CI/CD health checks.
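A common way to wire this up, assuming local_settings.py sits next to settings.py, is to import it at the very end of settings.py:

# settings.py (last lines)
try:
    from .local_settings import *  # noqa: F401,F403
except ImportError:
    # no local_settings.py in this environment; fall back to the defaults above
    pass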
Auto reload on code change should be disabled in production.

Choose the application server worker type (eventlet, gevent, gthread etc.) carefully; each one has its own pros and cons.

Avoid using multithreading or multiprocessing inside Django views or utils.

Configure Django's LOGGING
setting and tailor it to the production needs. A sample configuration:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'filters': {
        'require_debug_false': {
            '()': 'django.utils.log.RequireDebugFalse'
        },
        'require_debug_true': {
            '()': 'django.utils.log.RequireDebugTrue'
        }
    },
    'formatters': {
        'main_formatter': {
            'format': '%(levelname)s:%(name)s: %(message)s '
                      '(%(asctime)s; %(filename)s:%(lineno)d)',
            'datefmt': "%Y-%m-%d %H:%M:%S",
        },
    },
    'handlers': {
        'mail_admins': {
            'level': 'ERROR',
            'filters': ['require_debug_false'],
            'class': 'django.utils.log.AdminEmailHandler'
        },
        'console': {
            'level': 'DEBUG',
            'filters': ['require_debug_true'],
            'class': 'logging.StreamHandler',
            'formatter': 'main_formatter',
        }
    },
    'loggers': {
        'django.request': {
            'handlers': ['console'],
            'level': 'ERROR',
            'propagate': True,
        },
        'django': {
            'handlers': ['console'],
        },
        'py.warnings': {
            'handlers': ['console'],
        },
        '': {
            'handlers': ['console'],
            'level': "DEBUG",
        },
    }
}
Use named URLs and reverse instead of hardcoding paths:

from django.urls import reverse
from django.http import HttpResponse

# urls.py
url(r'abc/', views.UserView, name="user-details"),

# views.py
def view(request):
    route_url = reverse("user-details")
    return HttpResponse(route_url)

# template
<a href="{% url 'user-details' %}">User details</a>
If the URL pattern changes in urls.py, nothing needs to be modified in the views or templates because they refer to it by name.

Avoid count() when you only need to know whether matching rows exist:

users_count = models.User.objects.filter(age__gte=22).count()
if users_count:
print("User exists")
Use exists() instead:

users_exists = models.User.objects.filter(age__gte=22).exists()
if users_exists:
print("User exists")
exists() is cheaper because the database only has to find one matching row instead of counting them all.

Never run production with DEBUG set to True. When DEBUG is True, Django does debugging logs and runs utilities to capture debug insights, which adds memory and CPU overhead.
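A minimal production-settings sketch (the domain is a placeholder):

# settings.py for production
DEBUG = False
# required once DEBUG is False; list the domains the site is served from
ALLOWED_HOSTS = ["example.com"]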