21 Mar 2024

Django Streaming HTTP Responses

In this blog post, I’ll explain how and when to use Django’s StreamingHttpResponse, what you can accomplish with it, and when it might not be a good idea. Let’s start with the how.

How to create a Streaming HTTP Response in Django

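import time

from django.http import StreamingHttpResponse
from django.urls import path


def streaming(_):
    def generate():
        for i in range(3):
            time.sleep(1)  # simulate slow work so chunks arrive one by one
            yield f"Chunk: {i}\n".encode()

    return StreamingHttpResponse(generate())


# Django looks for this exact name in a URLconf module.
urlpatterns = [
    path("stream/", streaming),
]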

If you are using Django 4.2 or later, you can use async iterators:

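import asyncio

from django.http import StreamingHttpResponse
from django.urls import path


def astreaming(_):
    async def stream():
        for i in range(3):
            await asyncio.sleep(1)  # non-blocking pause between chunks
            yield f"Chunk: {i}\n".encode()

    # Under ASGI, pass an async iterator; under WSGI, use a sync one.
    return StreamingHttpResponse(stream())


urlpatterns = [
    path("astreaming/", astreaming),
]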

That’s it! In both versions, the inner function is a generator that yields bytes, and each chunk is sent to the client as soon as it’s yielded. I’ve added the sleep calls so that you can watch the chunks arrive one at a time.

Django doesn’t render an HTTP response itself. Instead, it follows the WSGI (or ASGI) spec. It’s then up to your WSGI (or ASGI) server to create the HTTP response.
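
To make that hand-off concrete, here is a minimal sketch of a streaming WSGI application written without Django. The names app, collect and fake_start_response are made up for illustration; a real server such as Gunicorn supplies its own start_response and writes each chunk to the socket as it iterates.

```python
def app(environ, start_response):
    """A bare WSGI application that returns an iterable of byte chunks."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])

    def body():
        for i in range(3):
            yield f"Chunk: {i}\n".encode()

    # The server iterates over this; no chunk is produced until it does.
    return body()


def collect(application):
    """Stand-in for a WSGI server: call the app and gather what it sends."""
    sent = {}

    def fake_start_response(status, headers):
        sent["status"] = status
        sent["headers"] = headers

    chunks = list(application({"REQUEST_METHOD": "GET"}, fake_start_response))
    return sent["status"], chunks


status, chunks = collect(app)
print(status, chunks)
```

The application only describes the response. How the chunks end up on the wire is entirely the server’s business.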

For example, Gunicorn receives Django’s streaming HTTP response intent and creates an HTTP/1.1 Chunked Response from it. Gunicorn’s response is then streamed to your browser. See How Does Django’s StreamingHttpResponse Work, Exactly? by Andrew for more details.
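
For a feel of what a chunked response looks like on the wire, here is a small sketch of the framing that HTTP/1.1 chunked transfer coding uses. This is not Gunicorn’s code; encode_chunked is a hypothetical helper that only illustrates the format: each chunk is prefixed with its length in hexadecimal, and a zero-length chunk ends the body.

```python
def encode_chunked(chunks):
    """Frame byte chunks using HTTP/1.1 chunked transfer coding."""
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would signal the end of the body
        yield f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
    yield b"0\r\n\r\n"  # terminating chunk, no trailers


wire = b"".join(encode_chunked([b"Chunk: 0\n", b"Chunk: 1\n"]))
print(wire)
```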

Even though chunked responses are not supported by HTTP/2, Nginx (and other modern proxy servers) have no issues converting the Chunked HTTP Response from Gunicorn to an HTTP/2 (or HTTP/3) equivalent, so even when your website is serving HTTP/2 requests, your response will still be streamed.

When to use Streaming HTTP Response

A good use case for Streaming HTTP Responses is generating a large CSV file. If the file has several GBs of data, you could generate it on the fly, line by line, ensuring your view never exceeds a few KBs of memory.
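
Here is a sketch of how the line-by-line generation could look. It uses the pseudo-buffer trick that Django’s documentation also shows for streaming CSV: csv.writer is given an object whose write() simply returns the formatted line. The fake_rows data source and the iter_csv name are assumptions made up for this example.

```python
import csv


class Echo:
    """Pseudo-buffer: write() hands the value back instead of storing it."""

    def write(self, value):
        return value


def iter_csv(rows):
    """Yield one CSV-formatted line per row, never holding the whole file."""
    writer = csv.writer(Echo())
    for row in rows:
        # writerow() returns whatever Echo.write() returned: the formatted line.
        yield writer.writerow(row)


def fake_rows(n):
    """Hypothetical data source; in practice this might be a queryset iterator."""
    yield ["id", "square"]
    for i in range(n):
        yield [i, i * i]


lines = iter_csv(fake_rows(1_000_000))
print(next(lines), next(lines))
```

In a view, you would pass iter_csv(...) to StreamingHttpResponse with content_type="text/csv". Only one row is in memory at a time, no matter how many rows the source produces.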

Streaming HTTP Responses are also used to implement Server-Sent Events (SSE). SSE is a one-way communication channel that can be a good fit for implementing notifications, live updates, etc. Check out django-eventstream for a Django package that makes it easy to use SSE.
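
If you’d rather see what SSE looks like underneath before reaching for a package, here is a rough sketch of the text/event-stream message format. The format_sse helper and its parameter names are invented for this example and are not part of django-eventstream.

```python
def format_sse(data, event=None, event_id=None, retry_ms=None):
    """Serialize one message in the text/event-stream wire format."""
    lines = []
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")  # reconnection delay hint
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" field.
    for line in str(data).split("\n"):
        lines.append(f"data: {line}")
    return ("\n".join(lines) + "\n\n").encode()  # a blank line ends the message


print(format_sse("You have mail", event="notice", event_id=7))
```

A streaming view would yield these messages from its generator and set content_type="text/event-stream" on the StreamingHttpResponse.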

I’ve been playing around with creating a ChatGPT-like user interface with zero lines of JavaScript. While this isn’t the best use case for streaming responses (see reasons below), it’s a fun example and seems to work pretty well:

Sadly, I couldn’t use Django’s templating system to get this example to work because Django’s Template.render can’t be streamed. There is a Trac issue for this, and there was even a PR opened with the change implemented before work stalled. With no activity on it in the last five years, I wonder whether it’s actually worth adding.
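
One workaround is to build the HTML by hand inside the generator. The sketch below shows the general idea only; it is not the code behind the demo, and stream_page and the token list are invented for illustration.

```python
from html import escape


def stream_page(title, tokens):
    """Yield an HTML page piece by piece instead of rendering it in one go."""
    yield f"<!doctype html><html><head><title>{escape(title)}</title></head><body><p>"
    for token in tokens:
        # Escape by hand: no template engine is auto-escaping for us here.
        yield escape(token)
    yield "</p></body></html>"


page = "".join(stream_page("Chat", ["Hello", ", ", "<world>"]))
print(page)
```

The browser starts rendering as soon as the opening markup arrives, and each later chunk is appended to the paragraph that is still open.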

Things to know about Streaming HTTP Response

Besides being unable to use Django’s templating system, there are a few other things to remember when using Streaming HTTP Responses.

When using Streaming HTTP Responses, be aware that the connection is kept open while the response is generated. This can make your life miserable in several ways:

The thread/worker generating the response will be blocked until it’s done streaming the response. If the response takes a long time (minutes or even longer), you need to provision many concurrent workers to make sure there are always workers available for new requests. Using async solves this problem, since an async worker can handle hundreds or thousands of concurrent requests.
Most systems assume requests resolve fast. Fly will kill your request in 60 seconds, while Heroku will kill it in 30 seconds. Even Gunicorn with a sync worker will kill the request after 30 seconds (or whatever your --timeout setting is) while the worker is streaming data! Increasing the timeout and ensuring you use gthread workers might be necessary to avoid this issue, but it might not always be possible. Also, increasing or disabling timeouts globally because of one or two streaming views is not the best idea.
Browsers typically allow only six simultaneous HTTP/1.1 connections to a single domain. If you are trying to stream into multiple windows/tabs simultaneously, you might encounter this problem. This is resolved if you use HTTP/2, which allows multiple streams in a single connection.
Some Django middlewares might not work as expected. The GZipMiddleware, for example, will buffer the entire response before compressing it.
Keeping the connection open for an extended period could be challenging. Your clients might disconnect, or your server might kill the connection because you are doing an upgrade. To solve this, you can look at Server-Sent Events that have a built-in reconnection mechanism. A proxy like Pushpin can also help reduce the complexity of your app.
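
To illustrate the first point, here is a small sketch of why async helps: a single event loop can keep many slow streams going at once, because each one gives up control while it waits. It runs without Django. The names slow_stream, consume and main, and the numbers used, are assumptions for the demo.

```python
import asyncio


async def slow_stream(n_chunks, delay):
    """One streaming response: sleeps between chunks without blocking the loop."""
    for i in range(n_chunks):
        await asyncio.sleep(delay)
        yield f"Chunk: {i}\n".encode()


async def consume(stream):
    """Stand-in for a client draining one response."""
    return [chunk async for chunk in stream]


async def main(n_clients=100, n_chunks=3, delay=0.01):
    # All streams make progress on a single thread.
    return await asyncio.gather(
        *(consume(slow_stream(n_chunks, delay)) for _ in range(n_clients))
    )


results = asyncio.run(main())
print(len(results), "streams,", sum(len(r) for r in results), "chunks")
```

With time.sleep in a sync worker, the same hundred responses would have to be served one after another, or by a hundred workers.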

Because of all of these potential issues, use StreamingHttpResponse only when you really have no other choice. Sticking to the traditional way of generating a response will help you stay out of trouble.

To finish up

I hope this post has helped you understand when and how to use Streaming HTTP Responses in Django. To finish up, I’ll leave you with another example video of me creating a loader using a Streaming HTTP Response:

Anže's Blog

Python, Django, and the Web
