Verifiable sortition

In previous posts I have highlighted the role of sortition (selection by lot) as a means of improving democracy. In this post I will focus on the mathematics of how to arrange systems that are (hopefully) difficult to game. The first system is computerized while the second system is considerably more low-tech, relying mostly on dice rolls.

The are two assumptions necessary to make these systems work: that there exists a list of the people eligible and willing for the sortition, and that each such person appears exactly once on the list.

Call the number of people on the list n and the number of people to choose k. The number of combinations is M=(nk) and the total amount of entropy is E=log2M . To draw for example a body of 1001 seats from 10 billion people requires 24713.1 bits of entropy. Drawing candidates one-by-one requires somewhat more entropy: log2(n!/(n-k)!) = 33252.5 bits.

Gathering entropy

The main problem is gathering the necessary amount of entropy. Specifically the system must come up with a number N between 0 and (nk)-1 in a way that is not biased. Luckily this is quite straightforward: combine entropy from multiple sources using modular addition. As long as at least one entropy source is unbiased then the resulting number will also be unbiased. Participants must also be prevented from seeing the other submissions.

When the sortition happens, the process should be broadcast to encourage public scrutiny. Various groups should take part in the process, each coming up with their own N and submitting it to the system within say a day. Universities and schools could be involved, or just any body of people concerned with the democratic process. Each N should then be encrypted, thus "sealing" them, and submitted to some central server. After some time, say a day, decryption keys should be provided, with a deadline of say one week. Submissions that don't meet the deadline are discarded. Things should be arranged so that anyone can download the relevant files and re-run the sortition program, thus verifying the results.

Further inspiration can be had from ICANN's key signing ceremonies.

Computing winners

The act of turning N into an assignment of seats in the body is a combinatorial operation known as unranking. Wikipedia's article on combinatorial number systems describes an overview of the process. Each ci can be found by binary search. This means evaluating O(klogn) binomial coefficients, each one no larger than M. Each coefficient takes no longer than O(k2(logn)2) time to compute, using long multiplication. Thus the entire process takes at most O(k3(logn)3)=O˜(k3) time. With FFT multiplication the bound shrinks to O˜(k2) .

Lower-tech methods

The method described above may be too high-tech to count as sufficiently scrutable. The simplest method is likely printing paper slips of every person on the list of candidates and drawing them out of a hat on live TV. This suffers from the risk of some names not having been printed. There is also that if each slip is the size of a grain of rice then the hat must have a volume of 750 m³.

Another way is rolling dice. Picking candidates from a pool of 10,000,000,000 using 6-sided dice (d6) requires log61010=13 rolls each, or 5 rolls if using d100's. Using d10's or d100's has the benefit of being easier to understand compared to working in base-6. This is more dice rolls than necessary for computing N, 5005 rolls vs 2730 rolls if using d100's. Once selected, each candidate is stricken off the list.

Some care needs to be taken how to convert the generated numbers into candidates, to avoid bias. First the result of each d100 should be reduced mod 100. The result of 5 rolls is a number between 0 and 9,999,999,999. If there are 9,876,543,210 candidates then values between 9,876,543,210 and 9,999,999,999 means another roll needs to be performed, yielding a number between 9,876,543,210 and 999,999,999,999. Of these only values between 997,530,864,211 and 999,999,999,999 cannot be reduced mod 9,876,543,210 without bias. This process terminates quickly with high likelyhood, and only requires basic knowledge of modular arithmetic to understand. If a candidate that has already been selected is selected in this process then another dice can simply be rolled.

Every candidate can watch/listen to the proceedings via TV, their number in the lot being printed on their "ballot" and verifiable by what's going on in the broadcast. This could be an excellent opportunity to bring on mathematics communicators like Matt Parker, to make the whole thing more interesting. Printed tables can be prepared in advance to assist in the necessary long divisions, including upper limits for when extra rolls are necessary. Performing the computations by hand should instill more confidence in the results, computers only being used to verify the calculations.

An even simpler method would be to re-roll all dice until a number less than n is arrived at. Note that it is not sufficient to re-roll only dice that cause the result to be greater than or equal to n. Doing so introduces a bias. All dice must be re-rolled. For example if n is 9801 then for the first die only numbers in the range 0-98 would be accepted. This heavily biases the result towards the lot with number 9800 because that is the only one that starts with 98. This means lot #9800 has probability 1/99 whereas lots #0000 to #9799 each only have probability 1/9900.

Then what?

Selecting what set of candidates should make up a body is only part of the problem. How should this new set of people actually be integrated into the body? Should there be overlapping sortitions so that only part of the body is replaced at a time? Should there be mandatory education as part of the process, to learn the ropes? These things may likely be different depending on the responsibilities of the body. Local bodies may have lower requirements than say one that decides on questions that are of global concern.