Running Julia in Production (kinda)

Lessons Learned from Trying to Host an Oxygen App on Digital Ocean
Published: August 28, 2024

Introduction

I’ve been thinking a lot about Bayesian Statistics recently. I’ve been working my way through two[1] books[2] on the subject, and it’s not dramatic to say that these books have reignited my interest in going back to school to pursue further studies in statistics (more on that at some point in the future 😄).

In particular, I’m really drawn to the Turing library in Julia for Bayesian stats projects. I hear a lot from random internet denizens and my coworkers that there’s no real point in adopting Julia because there are other options for everything it does. Why use Turing when you can use PyMC? Or brms?

This is not meant to be a blog post about why Turing is a great framework–but off the top of my head, there are a few reasons that I want to draw attention to it:

  1. Its syntax is great, and very close to how models are defined in textbooks/scientific papers. For example, the coinflip example from the official Turing docs looks like this:

    @model function coinflip(; N::Int)
        p ~ Beta(1, 1)
    
        y ~ filldist(Bernoulli(p), N)
    
        return y
    end;

    which defines our prior distribution for the variable p, the probability of heads in a coin toss, and the distribution of the observations y.

  2. It’s pure Julia! So the library is composable with other samplers, model families, and more.

  3. It’s fast, and fairly easy to work with.
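To make the first point concrete, here’s a sketch of how the coinflip model above gets conditioned on data and sampled. The data vector and sampler settings are illustrative, following the pattern in the Turing docs:

```julia
using Turing

@model function coinflip(; N::Int)
    p ~ Beta(1, 1)
    y ~ filldist(Bernoulli(p), N)
    return y
end

# 7 heads out of 10 flips (illustrative data)
flips = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

# condition the model on the observed flips, then sample the posterior
model = coinflip(; N = length(flips)) | (; y = flips)
chain = sample(model, NUTS(), 1_000; progress = false)

# posterior mean of p; with a flat Beta(1, 1) prior and 7/10 heads,
# conjugacy says this should land near (1 + 7)/(2 + 10) ≈ 0.67
mean(chain[:p])
```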

My desire to use Turing for more than just toy problems led me to try and build something that lightly introduces users to the power of Julia without getting into the technical nitty-gritty–basically, an app powered by Julia that any end user could operate, and that I could build with my baseline knowledge of Bayesian stats.

So, I set out to do just that and built a simple Bayesian A/B testing app. The output is a pretty visual of your control and variant, as well as a formatted table explaining the results, all served via a Shiny app.

an example A/B test, run through my app

Though the frontend is written in R, all of the heavy lifting is done in the backend by Julia. All R does is send a POST request to an Oxygen server, grab the results, and handle the frontend work with RShiny.

Getting this running in the cloud was challenging at first, but pretty rewarding. And honestly, it built my confidence that I could write a Julia microservice and use it in production if I chose to.

Note

Does that mean I would recommend moving all your microservices away from FastAPI to Oxygen? Probably not…if we think of the concept of “cool tokens” (that is, the limited amount of social or political capital that a person or team has to spend on new, innovative, or “cool” technologies), your job may have a limited supply of these tokens for exploring things like this…and in that case it’s better to stick to an industry standard like FastAPI.

Since this is a personal project, and the predictive code itself already leverages Julia, it made sense to just use a Julia framework for deploying the web app! And if you’re building something in Julia and want to deploy it as a web app, I’d for sure recommend Oxygen in that case.

This post will explore, at a high level, the challenges and fun of getting this app to work “in production.” Let’s start with some details about how the server code actually works.

Writing Code is Easy (it works on my machine!)

The code which powers the app is fairly simple, and relies heavily on two Julia libraries and their dependencies: Oxygen and Turing. The server logic has a single POST endpoint which returns the formatted results (which display in the table) and vectors containing the probability densities for each variant, for visualization in R.

The test endpoint, defined in Oxygen, is fairly simple. There’s a bit of data validation I added later because my friend’s first instinct was apparently to try and break things…

thanks, Matt!

and of course there’s some code which translates the user’s input in the app into the inputs for a Bayesian A/B test. The test gets run, and the results are returned to the user in the frontend. Ultimately, without all the other server code, it looks like this:

@post "/test" function(req::HTTP.Request)
    data = JSON3.read(String(req.body))

    ## data validation
    if !haskey(data, "a") || !haskey(data, "b")
        return Response(400, "You must provide two variants: 'a' and 'b'.")
    end

    if !(data["a"]["conversions"] isa Int) || !(data["b"]["conversions"] isa Int)
        return Response(400, "Conversions must be an integer.")
    end

    if data["a"]["conversions"] < 0 || data["b"]["conversions"] < 0
        return Response(400, "Conversions must be a non-negative number.")
    end

    if !(data["a"]["samples"] isa Int) || !(data["b"]["samples"] isa Int)
        return Response(400, "Samples must be an integer.")
    end

    if data["a"]["samples"] < 0  || data["b"]["samples"] < 0
        return Response(400, "Samples must be a non-negative number.")
    end

    #=
    note that TestControl, TestVariant, and ABTest are just custom structs,
    defined elsewhere in the server code.
    =#
    control = TestControl(data["a"]["conversions"], data["a"]["samples"])
    variant = TestVariant(data["b"]["conversions"], data["b"]["samples"])
    ab_test = ABTest(data["name"], control, variant)

    println("Running test...")
    results = run_test(ab_test.a, ab_test.b)

    return serialize_results(results, control, variant, ab_test.name)
end

As mentioned before, I wrote the frontend in RShiny just because that’s what I was familiar with–plus it’s easy to deploy to shinyapps.io. httr2 is used to fetch the results from the /test endpoint. Locally, this worked great! I could submit a test via the app and quickly get results back from Julia, which were then visualized with ggplot2.
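As an aside, you don’t need the R frontend to exercise the endpoint. Here’s a sketch of hitting it from a Julia session, with the payload shape inferred from the validation code above (the host, port, test name, and numbers are all illustrative):

```julia
using HTTP, JSON3

# payload shape matching the /test endpoint's validation:
# a name plus "a" (control) and "b" (variant), each with conversions and samples
payload = Dict(
    "name" => "homepage-cta",                              # illustrative test name
    "a" => Dict("conversions" => 120, "samples" => 1000),  # control
    "b" => Dict("conversions" => 145, "samples" => 1000),  # variant
)

# assumes the Oxygen server is running locally on port 8080
resp = HTTP.post("http://localhost:8080/test";
                 body = JSON3.write(payload))
results = JSON3.read(String(resp.body))
```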

So, now that I had the bones set up, it was time to get it working “in production.”

Deploying…can be hard

The simplest way to get this up and running somewhere other than my machine was to containerize the server with Docker, so it could theoretically go anywhere Docker can. The Dockerfile looks like this:

FROM julia:1.10.4
USER root

WORKDIR /app
# copy only the project files first, so dependency installation and
# precompilation happen at build time and are cached as their own layer
COPY Project.toml Manifest.toml ./
RUN julia --project -e 'using Pkg; Pkg.instantiate(); Pkg.precompile();'
COPY . ./src

EXPOSE 8080
CMD ["julia", "--project", "src/server.jl"]

Without going too in-depth, the main thing to note here is that pre-compilation is specifically front-loaded to the Docker build step by copying over the project’s Project.toml and Manifest.toml before the rest of the source. Anyone who has used Julia knows that precompilation can take a while, especially on machines where RAM is limited…so being able to pre-compile all the libraries here is a big plus, and something I’d recommend doing for any project.

Theoretically, this Dockerfile setup means that all the packages are already pre-compiled when the script referenced in the CMD step is run. So it should have been smooth sailing from here, right…? Right…?

Attempt #1: DigitalOcean’s App Platform

My friend recommended that I use DigitalOcean’s App Platform service to deploy the Oxygen server. After all, I just had to point it to the Dockerfile above and it should have worked like magic. Through the app platform, I’d have access to some niceties like automatic rollbacks and a simpler deployment.

I immediately ran into problems. Deploys would fail without much information from DigitalOcean, regardless of how the cluster was scaled or how much RAM it had. Just…silent failures that read as Error: []! I added more logging to the script run by the Dockerfile to figure out at least where the code was failing, and it pointed me directly to a single line: using Turing.

I was flummoxed by this, and I still kind of am. I don’t have a definitive answer as to why this specifically failed other than the fact that whenever this line of code ran, it began consuming an ungodly amount of CPU.

As mentioned in the Dockerfile overview above, I attempted to front-load Julia’s precompilation during the Docker build stage so that it didn’t hog all the CPU and slow the cluster to a crawl…but confusingly, the build step would run successfully and the deployment would only fail while loading Turing.

The flakiness of the failure suggests to me that perhaps I was hitting some kind of DigitalOcean CPU limit, but since I couldn’t replicate the failure consistently, and scaling up still didn’t help, I can’t confirm that.

So, after multiple failed attempts at deploying the service via the app platform, I decided to go with a simpler option.

Attempt #2: A DigitalOcean Droplet

Despite the niceties offered by the app platform, I really just wanted something to work. I realized that I could use a small Droplet (basically DO’s Linux virtual machine) to host the server code, and run the code on the droplet via Docker.

The steps were something like this:

  • spin up the droplet
  • install Docker
  • SCP the Julia server code over to the root folder
  • spin up a docker container from the Dockerfile above, and set this container to automatically restart if the code crashes or the droplet restarts
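Concretely, the last three steps looked something like this on the command line (the IP address, directory names, and image tag are placeholders):

```shell
# from the local machine: copy the server code to the droplet
scp -r ./julia-ab-server root@<droplet-ip>:/root/julia-ab-server

# on the droplet: build the image from the Dockerfile above
docker build -t ab-server /root/julia-ab-server

# run it detached, restarting automatically after crashes or droplet reboots
docker run -d --name ab-server --restart unless-stopped -p 8080:8080 ab-server
```

The `--restart unless-stopped` policy is what makes the container come back on its own whenever the droplet boots, without any extra orchestration.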

For the initial docker build, where the pre-compilation step happens, I did scale up the droplet a bit to ensure it got through that step smoothly (again, pre-compilation in Julia is a CPU hog). Once that ran successfully, and I was able to make requests to the endpoint, I scaled back down to a smaller droplet with 1GB of memory. Because the Docker container was set to auto-restart, this was as simple as shutting down the droplet, sizing down, and turning it back on.

And voilà–this is how the app currently delivers its predictions cheaply, for around $7 a month.

Learnings

Despite the failed attempts to use the App Platform, I was surprised by how easy it was to spin up a minimal web service using Oxygen–and delighted that I got a metrics dashboard for free out of the box!

Despite some hiccups, the droplet route ended up being pretty straightforward to get up and running. I really would have liked to get the App Platform working, and I’m convinced it would work for a small Julia service that wasn’t loading a library for running MCMC…but at least in this case, it didn’t.

Ultimately, though this was not really an on-the-job “win” in terms of deploying Julia, it did raise my confidence that Julia can be used reliably for real-world applications or services even if deployment is somewhat unconventional. Though the droplet approach was perhaps not as ideal as the app platform from a setup perspective, it’s been stable and reliable.

Footnotes

  1. McElreath, R. (2016). Statistical Rethinking: A Bayesian Course with Examples in R and Stan (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9781315372495

  2. Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., & Rubin, D.B. (2013). Bayesian Data Analysis (3rd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b16018