Elections are an important and interesting example of how the random behavior of human society plays out in practice. Their results depend not only on support for candidates, which is, generally speaking, random and variable, but also to a certain extent on voter turnout, which is also random and becomes decisive when candidates have approximately equal support.
In fact, election processes in many countries, Ukraine and the Russian Federation in particular, show that electoral fraud can also have a noticeable or (in the case of the Russian Federation) decisive influence on the election results. It ranges from relatively legal technologies (such as "black PR", motivating "own" voters to visit the polling stations, demotivating "other" voters, or presenting "own" candidate as the "lesser evil" to undecided voters) to direct falsifications (stuffing ballots for "own" candidates, spoiling ballots cast for rival candidates, falsifying final protocols).
The use of legal election technologies (in particular, motivating "own" voters or demotivating "other" voters through social networks and contextual political advertising) could be observed in recent election processes in such stable democracies as Great Britain (the 2016 United Kingdom European Union membership referendum; the technologies used in this referendum are described in the film "Brexit: The Uncivil War") and the USA (the 2016 United States presidential election, in which Donald Trump was elected President; the technologies used in both of these campaigns are described in the film "The Great Hack").
Direct falsifications of election results could be observed, according to the conclusions of international independent observers, in several elections in Ukraine (in particular, the presidential elections of 2004, when massive falsifications [
Several approaches have been proposed by researchers to detect possible electoral fraud [
First, the authors look at the frequency distribution of turnout or of a candidate's results across electoral districts. If the data are genuine, we would typically expect this distribution to be normal. Deviations from normality, such as bimodality or a substantial tail of districts with very high turnout, give cause for suspicion. An excess of round values (for example, multiples of 5 or 10) in such a frequency distribution of turnout is also noticeable [
Second, the authors look at how each candidate's share of the total electorate correlates with the turnout. If voting figures are reported accurately, we would expect a candidate's share of the electorate to increase with turnout in rough proportion to his or her overall vote share: if turnout increases by 100 votes, we would expect a candidate who scores 50% of the vote overall to pick up around 50 of these votes. If, by contrast, turnout is inflated by ballot-box stuffing, the proportion of additional votes captured by the favored candidate will be far higher.
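The second check can be illustrated with a simple least-squares fit. The following is a minimal sketch on synthetic precinct data; the helper name `ols_slope` and all numbers are hypothetical and are not taken from any real election. In the honest case the slope of the candidate's share of the electorate against turnout stays close to the overall vote share (0.5 here), while added ballots for the favored candidate push it upward.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the illustration is reproducible
honest_turnout, honest_share = [], []
stuffed_turnout, stuffed_share = [], []
for _ in range(2000):                    # 2000 synthetic precincts
    t = rng.uniform(0.5, 0.8)            # true turnout (fraction of electorate)
    s = 0.5 + rng.gauss(0.0, 0.03)       # candidate's support among actual voters
    honest_turnout.append(t)
    honest_share.append(s * t)           # candidate's share of the electorate
    e = rng.uniform(0.0, 0.2)            # stuffed ballots, all for the candidate
    stuffed_turnout.append(t + e)
    stuffed_share.append(s * t + e)

honest_slope = ols_slope(honest_turnout, honest_share)
stuffed_slope = ols_slope(stuffed_turnout, stuffed_share)
print(f"honest slope:  {honest_slope:.3f}")
print(f"stuffed slope: {stuffed_slope:.3f}")
```

A slope well above the candidate's overall vote share is only a warning sign in this sketch, since regional differences in support can produce a similar effect.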
The third approach is the analysis of the ratio of votes cast by candidates/parties, depending on the turnout. It was first used by Kiesling [
Another approach [
A more detailed review of statistical approaches can be found in [
However, all proposed statistical approaches to the assessment of election fraud are subject to legitimate criticism, because the deviation of the empirical frequency distribution from the normal distribution can be caused by other reasons (for example, regional differences in voter activity or support for certain candidates).
This paper proposes a new approach to assessing the presence of election fraud. The main idea is to construct, after certain processing of statistical data, an arctangent regression of the obtained data and estimate the parameters of this arctangent regression as parameters of the Cauchy distribution according to the results by Krykun [
We recall [
and cumulative distribution function, respectively,
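In the standard parametrization, stated here for reference with $\alpha \in \mathbb{R}$ and $\gamma > 0$, these two functions are

$$f(x)=\frac{1}{\pi}\cdot\frac{\gamma}{{\left(x-\alpha \right)}^{2}+{\gamma }^{2}},\qquad F(x)=\frac{1}{2}+\frac{1}{\pi}\arctan \frac{x-\alpha }{\gamma },\qquad x\in \mathbb{R}.$$

In particular, the median of the distribution equals $\alpha$ and its quartiles are $\alpha \pm \gamma$.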
The parameters α and γ are called, respectively, a location parameter and a scale one.
It is well known [
Based on the fact that the cumulative distribution function for the Cauchy distribution is the arctangent function, Krykun in [
One more well-known result [
On the other hand, many electoral indicators (turnout, percentage of votes for certain candidates, percentage of spoiled ballots or votes "against all") can plausibly be considered to have random deviations that follow a normal distribution. Thus, if we find the mathematical expectation and variance of these quantities, we can normalize them.
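The link between normalized indicators and the Cauchy distribution rests on a standard fact: the ratio of two independent standard normal variables follows the Cauchy distribution with parameters 0 and 1. The sketch below checks this numerically on simulated data (not election data); the sample size and seed are arbitrary choices. It uses the fact that for a Cauchy distribution the median equals the location parameter and half the interquartile range equals the scale parameter.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed for reproducibility
n = 200_000              # arbitrary simulation size

ratios = []
for _ in range(n):
    u = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    if v != 0.0:                     # guard against division by zero
        ratios.append(u / v)

q1, median, q3 = statistics.quantiles(ratios, n=4)
half_iqr = (q3 - q1) / 2.0           # estimates the scale parameter gamma

print(f"median (estimate of alpha):   {median:.3f}")
print(f"half IQR (estimate of gamma): {half_iqr:.3f}")
```

The sample mean is deliberately not used here, because the Cauchy distribution has no finite expectation and the mean of such a sample does not stabilize as the sample grows.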
During the 2004 Ukrainian presidential election, 23 constituencies were formed in Donetsk region and 12 in Luhansk region, a total of 35 constituencies (out of 225 constituencies plus the foreign electoral district). In that election, Donetsk and Luhansk regions were the power base of the pro-government candidate Victor Yanukovych, who won 39,5 % of all his votes in these regions, namely 4,35 million out of 11,0 million, according to the results of the first round [
At these elections, observers recorded mass violations in favor of the government-supported candidate, the notorious politicalFigure
Turnout in the three rounds of that election (percentages are relative to the 1st round voter list).
| Region | Registered voters, 1st round | Turnout, 1st round | Turnout, 2nd round | Turnout, re-voting of 2nd round |
|---|---|---|---|---|
| Donetsk region | 3,685 million | 2,878 million (78,1 %) | 3,712 million (100,7 %) | 3,144 million (85,3 %) |
| Luhansk region | 1,946 million | 1,472 million (75,6 %) | 1,754 million (90,1 %) | 1,638 million (84,2 %) |
Large-scale falsifications in Donetsk and Luhansk regions are also clearly evidenced by the statistical results of these three rounds of voting, shown in the following histograms.
Voter turnout and number of PECs in the 1st round [14]
Voter turnout and number of PECs in the 2nd round [14]
Voter turnout and number of PECs in the re-voting of the 2nd round [14]
According to these histograms, the turnout in the 2nd round is absolutely impossible, which indicates total falsification at the PECs. The histogram of the 1st round shows a bimodal distribution and several turnout values in excess of 100 %, which indicates massive falsification at the PECs. The re-voting of the 2nd round gives a fairly plausible picture; a certain excess of PECs with a high turnout is caused by special precincts (hospitals, military units, prisons) with a small number of voters and a high turnout rate.
The fantastic turnout during the 2nd round of elections in the entire Donetsk region (look atTable
Therefore, in the 1st round, we can assume the presence of falsifications associated with an increase in voter turnout (due to multiple voting or direct ballot stuffing). We thus choose two indicators for the analysis: the percentage of voter turnout (which may be overstated) and the percentage of votes against all (which may be understated, since the falsifiers probably added ballots for "their" candidate, and not against all).
Further, in order to avoid the influence of inflated turnout, we will calculate the statistical indicators of the 1st round of this election without the votes from Donetsk and Luhansk regions. We have (calculations by the author according to official data [
Mean and variance of the turnout percentage and of the percentage of votes against all in the 1st round of this election, excluding the votes from Donetsk and Luhansk regions.
| Statistic | Turnout | Votes against all |
|---|---|---|
| Mean | 74,502 % | 2,027 % |
| Variance | 46,376 sq. % | 1,857 sq. % |
| σ | 6,810 % | 1,363 % |
Next, we centre and normalize the final indicators of the 1st round of this election (turnout and votes against all) in the 35 constituencies of Donetsk and Luhansk regions:
Later we calculate their ratio according to the formula
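The centring, normalization and ratio steps can be sketched as follows. This is a minimal illustration, assuming the reference mean and σ computed for the 1st round outside Donetsk and Luhansk regions (74,502 % and 6,810 % for turnout, 2,027 % and 1,363 % for votes against all). The function name `standardized_ratio`, the choice of turnout as the numerator, and the example constituency are assumptions made for illustration only.

```python
# Reference mean and standard deviation (in %) of the 1st round
# outside Donetsk and Luhansk regions.
TURNOUT_MEAN, TURNOUT_SIGMA = 74.502, 6.810
AGAINST_MEAN, AGAINST_SIGMA = 2.027, 1.363

def standardized_ratio(turnout_pct, against_all_pct):
    """Centre and normalize both indicators, then return their ratio.

    ASSUMPTION: standardized turnout is the numerator; the order used
    in the original formula may be the opposite.
    """
    z_turnout = (turnout_pct - TURNOUT_MEAN) / TURNOUT_SIGMA
    z_against = (against_all_pct - AGAINST_MEAN) / AGAINST_SIGMA
    return z_turnout / z_against

# Hypothetical constituency (made-up values, not official data):
# 85 % turnout and 1 % of votes against all.
x = standardized_ratio(85.0, 1.0)
print(f"standardized ratio: {x:.3f}")
```

Applying the same function to every constituency yields one value per constituency, which together form the sample to be analysed.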
For the obtained sample ${x}_{i}$, we apply the procedure of [
After the specified data processing, we will get aTable
Estimates of parameters.
| Estimate | 1 item rejected | 3 items rejected | 7 items rejected | 9 items rejected |
|---|---|---|---|---|
| α | -0,9371 | -1,0211 | -1,0366 | -1,0281 |
| γ | 0,7564 | 1,0249 | 1,1813 | 1,2596 |
So, we can see that the point estimate of the location parameter α is quite stable and, once 3 or more items are rejected, lies within $\left[-1,04;-1,02\right]$, while the point estimate of the scale parameter γ increases as the sample size decreases and lies within $\left[1,02;1,26\right]$ over the same range.
Therefore, the parameter γ is close to the theoretical value equal to 1, but the parameter α is significantly different from the theoretical value equal to 0.
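One simple way to obtain estimates of this kind can be sketched as follows. This is not claimed to be the procedure used above: instead of fitting the arctangent form of the distribution function directly, the sketch fits its inverse, the quantile function $Q(p)=\alpha +\gamma \tan \left(\pi \left(p-1/2\right)\right)$, by ordinary least squares on the central order statistics. The function name, the trimmed fraction and the simulated sample are all hypothetical.

```python
import math
import random

def fit_cauchy_by_quantile_regression(sample, trim_fraction=0.2):
    """Estimate (alpha, gamma) by least squares on central quantiles.

    Swapped-in technique: regression of sorted observations on
    tan(pi * (p - 1/2)), the inverse of the arctangent CDF.
    """
    xs = sorted(sample)
    n = len(xs)
    k = int(n * trim_fraction)        # observations rejected in each tail
    points = []
    for i in range(k, n - k):
        p = (i + 0.5) / n             # plotting position of the order statistic
        points.append((math.tan(math.pi * (p - 0.5)), xs[i]))
    m = len(points)
    mean_t = sum(t for t, _ in points) / m
    mean_x = sum(x for _, x in points) / m
    s_tt = sum((t - mean_t) ** 2 for t, _ in points)
    s_tx = sum((t - mean_t) * (x - mean_x) for t, x in points)
    gamma = s_tx / s_tt               # slope estimates the scale
    alpha = mean_x - gamma * mean_t   # intercept estimates the location
    return alpha, gamma

# Simulated Cauchy(-1, 1.2) sample via the inverse-CDF method (not election data).
rng = random.Random(2)
sample = [-1.0 + 1.2 * math.tan(math.pi * (rng.random() - 0.5))
          for _ in range(20_000)]
alpha_hat, gamma_hat = fit_cauchy_by_quantile_regression(sample)
print(f"alpha estimate: {alpha_hat:.3f}, gamma estimate: {gamma_hat:.3f}")
```

Rejecting the extreme observations matters in such a fit because the largest and smallest values of a heavy-tailed sample have very high leverage in the regression.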
Let us consider a sample of observations of a random variable distributed according to the Cauchy law with an unknown parameter α and, for simplicity, a known parameter γ. The sample mean is $\overline{x}=0,1965$. We then calculate the theoretical probability that the parameter α lies in an interval wider than the range of the obtained estimates, for example in the interval $\left[-1,1;-0,95\right]$.
For this purpose we will use such property of the Cauchy distribution [
So we obtain (for $\gamma =1$)
Let us compare the result above with the result for $\gamma =1,26$:
And two more probabilities (for $\gamma =1$):
and for $\gamma =1,26$
Therefore, one can say that the results in Donetsk and Luhansk regions during the 1st round of these elections are extremely unlikely, which gives grounds for asserting the presence of significant fraud.
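An interval probability of this type can be computed as in the sketch below. It assumes that the property being used is the standard one: the mean of independent Cauchy observations with parameters α and γ is again Cauchy with the same parameters, so that $\overline{x}-\alpha$ is Cauchy with parameters 0 and γ. The function names are hypothetical, and the printed values are outputs of this sketch only.

```python
import math

def cauchy_cdf(x, alpha=0.0, gamma=1.0):
    """Cumulative distribution function of the Cauchy(alpha, gamma) law."""
    return 0.5 + math.atan((x - alpha) / gamma) / math.pi

def prob_location_in(a, b, sample_mean, gamma):
    """P(a <= alpha <= b), ASSUMING sample_mean - alpha ~ Cauchy(0, gamma)."""
    upper = cauchy_cdf(sample_mean - a, 0.0, gamma)
    lower = cauchy_cdf(sample_mean - b, 0.0, gamma)
    return upper - lower

x_bar = 0.1965                        # sample mean quoted in the text
p_gamma_1 = prob_location_in(-1.1, -0.95, x_bar, 1.0)
p_gamma_126 = prob_location_in(-1.1, -0.95, x_bar, 1.26)
print(f"gamma = 1.00: {p_gamma_1:.4f}")
print(f"gamma = 1.26: {p_gamma_126:.4f}")
```

Under these assumptions both probabilities come out at only a few percent, and they change little between the two values of γ.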
The main advantages of the proposed method are that it can use election indicators that are difficult for falsifiers to predict, such as the number of spoiled ballots or the number of votes "against all", and that it takes into account the variance of election data. The usual statistical indicators, namely turnout and support for certain candidates, can also be used.
The author assumes that with a rational (that is, taking into account the specifics of possible violations) pairwise analysis of the mentioned indicators of the election process, a noticeable deviation of the distribution of the ratio of these values from the Cauchy distribution with parameters 0 and 1 gives reason to conclude the presence of election fraud.
In addition, for a numerical assessment of the influence of falsifications on the election results, the author considers it appropriate to use the Cauchy distribution curve instead of the normal one, because the Cauchy distribution, as a distribution with "heavy tails", can better take into account the specifics of election processes (for example, close to 100 % turnout in special precincts such as hospitals, prisons and military units, or greatly increased support for a candidate in his hometown).
The proposed model will be enhanced using recent studies of random processes [
The author also gratefully acknowledges the project "Electoral Memory" (https://ukr.vote) of the public organization "Ukrainian Center for Social Data" (https://socialdata.org.ua) and personally Mr. Serhij Vasylchenko, as well as Mr. Roman Udot, Co-chairman of the Board of The Movement for the Defense of Voters' Rights "Golos", for their help in finding the statistical data.