The engineering team of RainforestQA (YC S12) is remote and distributed around the globe, with developers in America, Europe, and Asia. We’re using PagerDuty to assign developers on-call and for contacting/escalating people in case of incidents.

Our working hours cover almost all time-zones, about 22 hours from full time-zone coverage. Yet for distributing our on-call schedule, we were unhappy with PagerDuty’s standard daily rotation. For a distributed team, why does somebody need to be on-call at night when other team members are working right now?

Follow the sun scheduling from PagerDuty which is their solution for distributed teams didn’t work for us because it assumes you can group people by blocks of time that don’t overlap. This didn’t work for us because we have many people with many different working times.

In 2017 we started an internal project called Tyrant to **automatically solve this problem of fairly managing on-call scheduling across time-zones**.

Finally, in 2019 during a hackathon session in Kuala Lumpur (learn how to run off-sites for distributed teams) we created the latest version of Tyrant which used discrete optimization.

Tyrant runs as a cron job every Saturday. It gets the working hour preferences of the developers, builds the on-call schedules, pushes them to PagerDuty, then emails the schedule to everyone.

This blog post is an introduction to discrete optimization for building the kind of on-call schedule we needed.

Discrete optimization is the selection of a best element (answer) with regard to some criterion, and where some of the variables are restricted to be discrete (the number of permitted values is either finite or countably infinite).

I had used discrete optimization in hobby projects and for programming challenges. But for a long time I couldn’t find a use case at work; every time I had a project that was solvable through discrete optimization the naive algorithms were better because they got a good enough result faster and without the complexity of adding a discrete optimization system.

Tyrant is a completely different story. As it creates the schedule only once per week, speed isn’t important - the quality of the result is.

Tyrant’s core is implemented in MiniZinc. MiniZinc is a high-level modelling language and compiler. MiniZinc compiles into FlatZinc, a low-level language that is understood by a wide range of discrete optimization solvers. For us it means faster development, self-explanatory model, and ability to switch solvers.

Let’s explore a simple problem to familiarize ourselves with the basic MiniZinc syntax. Imagine you run a small car factory named “miniTesla”. You have 100 lithium batteries, and 190000 kilograms of metal. Building a “Cybertruck” requires 1 battery and 2500 kg of metal, while building “Model3” requires also 1 battery but only 1500 kg of metal. If you can sell a “Cybertruck” for $40k and “Model3” for $35k you’d like to know how many of each vehicle to produce to maximize your profits.

miniTesla example in MiniZinc playground

The amount of words to describe the problem in English would be about the same as in MiniZinc. The code is self-explanatory even without knowing a lot about the syntax.

There’s a few nuances:

To reuse the model we can split it into 2 separate files: a file for the model and a file for the data. In the model file we have variable definition without values like this:

Source code of hello world with data file

Now we’re prepared to build our first simple on-call scheduling model. For input, the model takes developers preferences (does developer X want to be on-call at hour Y). The output: assign a developer to every hour next week. The final assignment should follow only one constraint: assign a developer to an hour according to their preferences.

Niklaus Wirth wrote the famous book Algorithms + Data Structures = Programs. For discrete optimization modeling, MiniZinc does Algorithms for us, so we only need to worry about the Data Structures part. For this problem we have 2 core data structures to define: the Input and the Output.

For output we use an array **assignment** which records which developer is on-call for every hour in the week and so has a size of 168 elements (because there are 7 days × 24 hours in a week).

Besides **array**, we’ve introduced **set of int **which is an enumeration of the numbers from start to end inclusive. MiniZinc arrays have named indexes. In our example **assignment** has named indexes based on the range **HOURS** (1..168). The statement **array[HOURS] of var DEVELOPERS: assignment = [7, 8, 9, ...] **therefore means developer 7 will be on-call at hour 1 (the first index of **HOURS** range).

We’ll make the input data a multi-dimensional array, with size **DEVELOPERS** × **HOURS**. This represents the mapping of developer and an hour to an integer in the set { 0, 1 }. Where a 1 means a developer is available for on-call.

Example of a data file (notice how rows separated with** | **symbol):

One note about **nb_developers**. Theoretically we could get the value of **nb_developers** from **working_hours**, but in practice it’s a chicken and egg problem and MiniZinc has to be told **nb_developers** to enumerate properly over **working_hours** array.

We’re going to add our first constraint now: if a developer is not available at some hour in working_hours array, then we should not assign them to be on-call.

**->** means implication. For me it was easier to understand by thinking it as being equivalent to this pseudocode:

Unlike the miniTesla example, here we don’t need optimization, as we don’t have an expression to optimize. Instead, we simply ask MiniZinc to satisfy all the constraints:

If you run it then a solution will be available in a few milliseconds.

The solution says it’s not so much fun to be the first developer.

First unfair model in MiniZinc Playground or on GitHub

Let’s add fairness to the model. We’re going to define fairness as the total on-call hours for each developer should be as similar as possible. To make life easier later we can create an intermediate array total_hours, which holds the numbers of on-call hours for every developer.

Here we can see new syntax, a list comprehension. We iterate for every developer, and in the inner list comprehension, calculate the actual number of working hours.

To solve for fairness, we need an expression to convert **total_hours** to a single number so different solutions can be compared. One option is **min-max**:

The minimal possible objective expression is 0 when all developers has equal total on-call hours per week. The expression is provably correct but in practice it’s pretty slow with default miniZinc optimizer, as the approach it takes is to first assign the first developer to all hours as initial solution and then try to improve it.

One option to improve performance is to tell MiniZinc to select developers randomly during search with search annotation. You can read more about this in the MiniZinc tutorial, here we’ll focus on another solution.

Min-max model in MiniZinc Playground or on GitHub

The min-max objective is simple to define, but unfortunately, in some edge cases, it will not work. Minimum and maximum create boundaries and the algorithm will try to push them together as much as possible. But imagine if minimum and maximum boundaries stack at some values because some constraint, in such case min-max objective will NOT even try to optimize developers between boundaries.

Example: we have 4 developers and fair distribution of on-call hours would be 42 total hours per week for everyone [42,42,42,42], objective here equals to 0 because maximum and minimum are the same. Let’s assume we have only 1 developer in Asia timezone, so only she could cover some number of hours (let’s say 84). Now fair distribution would look like [84,28,28,28]. The model cannot decrease maximum but still assigns to all other developers equal number of hours. So far so good… but one developer is spending a week in Bahamas without internet. Now min(total_hours) equals to 0. So distributions [84,84,0,0] and [84,42,42,0] have the same objective equal to 84 but different fairness. Our objective does not represent fairness anymore.

Instead, we can use more complicated objective expression that measures the difference between all combinations of DEVELOPERS:

Again there is new syntax, but it’s slightly different list comprehension with an aggregation function on top. We iterate for all unique DEVELOPERS pairs and for each pair, calculate the absolute difference. As the last step, we sum all such differences.

Only one line left, to ask to minimize the sum of absolute differences:

Model with absolute difference objective in MiniZinc Playground or on GitHub

What happens if no developers can cover some hours? Remember that assignment is defined as **array[HOURS] of var DEVELOPERS: assignment;** it **requires** a developer for every hour, otherwise MiniZinc will report an error:

The solution is to allow some values to have a **nil** value. MiniZinc natively supports optional types, but I prefer to use 0 to represent optional value (because some solvers don’t have optional type; plus for future performance optimization I prefer model the problem as low-level as possible).

We need define a new range **DEVELOPERS0** which includes an extra 0, and use it as a possible values of **assignment** array for every hour.

This actually is not enough. The objective tries to minimize the difference, and the easiest optimal result is to set all on-call hours to 0, so the objective will be 0 too. Instant optimal solution…

One way to fix this is to add a constraint: if somebody is available at some hour, then an hour should not be 0.

Another complexity with discrete optimization is that you should understand what type of solver will be used before writing the model. By default, MiniZinc uses Gecode which is a smart depth-first search. It’s good enough for many models, but not for this one. Here decision tree is wide and our objective function cannot direct the search of Gecode.

Instead, from the beginning, I specifically optimized the model for Mixed Integer Programming. By optimize, I mean I tried to use boolean variables and array representation instead of sets. On the command line, we need to specify the solver to use. I use Coin-bc, a great open-sourced mixed integer programming solver included by default in MiniZinc:

Coin-bc returns the result faster for the model and in many cases proves optimal solution.

Notice how Gecode solver finds an optimal solution in a few seconds, and then still tries to improve. Coin-bc finds the solution slightly faster and proves optimality. This doesn’t mean that Coin-bc is better than Gecode - both have strengths and weaknesses. To properly explain the difference we need to dig into constraint-based algorithms(Gecode) and mixed integer programming(Coin-bc) first, and it’s too much for this article. We actually use both in production. Gecode finds first initial solution, then Coin-bc optimizes it.

Model with absolute difference objective in MiniZinc Playground (Playground only supports Gecode solver) or on GitHub

I found online course Discrete Optimization at the right time for me. I took time off between 2 jobs and was going to just spend a few hours to watch the course, but the leaderboard made some adjustment. I spent about 2 months full-time on solving the problems, and eventually I managed get an A on all of them. The hardest and most rewarding course in my life. Thanks to Prof. Pascal Van Hentenryck and Dr. Carleton Coffrin!

To start in this field, I’d suggest 4 online courses in order:

- Discrete Optimization from Prof. Pascal Van Hentenryck and Dr. Carleton Coffrin.
- Basic Modeling for Discrete Optimization from Prof. Peter James Stuckey and Prof. Jimmy Ho Man Lee.
- Advanced Modeling for Discrete Optimization from Prof. Peter James Stuckey and Prof. Jimmy Ho Man Lee.
- Solving Algorithms for Discrete Optimization from Prof. Peter James Stuckey and Prof. Jimmy Ho Man Lee.

Don’t forget to check out the full MiniZinc tutorial (pdf). It covers basic syntax upto advanced topics like search annotation, best modelling practices, and so on with many examples.