New Approach to Statistical Analysis of Election Results

Abstract: In this paper, a new method for detecting election fraud is proposed. The method is based on calculating the ratio of two standard normal random variables, estimating the parameters of the resulting sample, and comparing these estimates with the known theoretical values of the parameters. The paper also includes an example of the application of the described method.


Introduction
Elections are an important and interesting example of random behavior in human society. The results of an election depend not only on the (generally speaking, random and variable) support for candidates, but also, to a certain extent, on the equally random voter turnout, which becomes decisive when support for the candidates is approximately equal.
In fact, election processes in many countries, Ukraine and the Russian Federation in particular, show that electoral fraud can also have a noticeable or (in the case of the Russian Federation) decisive influence on the election results. Such practices range from relatively legal ones (technologies such as "black PR", motivating "own" voters to visit the polling stations, demotivating "other" voters, or presenting one's "own" candidate as the "lesser evil" to undecided voters) to direct falsifications (stuffing ballots for "own" candidates, spoiling ballots cast for rival candidates, falsifying final protocols).
The use of legal election technologies (in particular, motivating "own" voters or demotivating "other" voters through social networks and contextual political advertising) could be observed in recent election processes in such stable democracies as Great Britain (the 2016 United Kingdom European Union membership referendum; the technologies used in this referendum are described in the film "Brexit: The Uncivil War") and the USA (the 2016 United States presidential election, in which Donald Trump was elected President; the technologies used in both campaigns are described in the film "The Great Hack").
Direct falsifications of election results could be observed, according to the conclusions of independent international observers, in several elections in Ukraine (in particular, the presidential elections of 2004, when massive falsifications [1] in favor of the pro-government candidate Victor Yanukovych led to mass national protests, known as the "Orange Revolution", and to the re-voting of the decisive second round, which ended in the victory of the opposition candidate Victor Yushchenko) and in practically all election processes in the Russian Federation for at least the last 20 years.
Several approaches have been proposed by researchers to detect possible electoral fraud [2].
First, the authors look at the frequency distribution of turnout or of a candidate's results across electoral districts. If the data are genuine, we would typically expect this distribution to be normal. Deviations from normality, such as bimodality or a substantial tail of districts with very high turnouts, give cause for suspicion. Such frequency distributions of turnout may also show a noticeable excess of round numbers (e.g., multiples of 5 or 10) [3].
Second, the authors look at how each candidate's share of the total electorate correlates with the turnout. If voting figures are reported accurately, we would expect a candidate's share of the electorate to increase with turnout in rough proportion to his or her overall vote share: if turnout increases by 100 votes, we would expect a candidate who scores 50% of the vote overall to pick up around 50 of these votes. If, by contrast, turnout is inflated by ballot-box stuffing, the proportion of additional votes captured by the favored candidate will be far higher.
The third approach is the analysis of the ratio of votes cast for candidates or parties, depending on the turnout. It was first used by Kiesling [4] to analyze elections in Armenia. The method has proven useful in Russia for identifying fraud in favor of the ruling party and its candidates [5,6].
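As a rough illustration of this second check, the sketch below fits a least-squares line to a candidate's share of the electorate against turnout. All precinct figures and the names `slope`, `precincts` and `stuffed` are hypothetical and are not taken from any real election.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical precincts: (registered voters, ballots cast, votes for candidate A).
# Candidate A receives exactly half of the ballots in every precinct.
precincts = [(1000, 500, 250), (1000, 600, 300), (1000, 700, 350), (1000, 800, 400)]
turnout = [cast / reg for reg, cast, _ in precincts]
share_of_electorate = [votes / reg for reg, _, votes in precincts]
fair_slope = slope(turnout, share_of_electorate)

# Same precincts, but 100 extra ballots for A are added in the last two (assumed scenario).
stuffed = [(1000, 500, 250), (1000, 600, 300), (1000, 800, 450), (1000, 900, 500)]
turnout_s = [cast / reg for reg, cast, _ in stuffed]
share_s = [votes / reg for reg, _, votes in stuffed]
stuffed_slope = slope(turnout_s, share_s)

print(fair_slope, stuffed_slope)
```

In the clean data the slope matches the candidate's overall vote share; in the altered data it rises above that share, which is the pattern this approach treats as suspicious.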
Another approach [7,8] is the use of Benford's law (or the first-digit law) to compare the obtained frequency of appearance of the first (or second) digit in the final election documentation with the theoretical distribution.
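A minimal sketch of the first-digit comparison might look as follows. It covers only the first digit, uses a plain Pearson chi-square statistic as one possible measure of discrepancy, and runs on hypothetical counts (powers of 2, which are known to follow Benford's law closely); the function names are illustrative.

```python
import math
from collections import Counter

def benford_expected(digit):
    """Benford probability that the leading digit equals `digit` (1..9)."""
    return math.log10(1 + 1 / digit)

def first_digit_frequencies(counts):
    """Observed relative frequency of each leading digit among positive integers."""
    digits = [int(str(c)[0]) for c in counts if c > 0]
    tally = Counter(digits)
    return {d: tally.get(d, 0) / len(digits) for d in range(1, 10)}

def chi_square_vs_benford(counts):
    """Pearson chi-square statistic of observed first digits against Benford's law."""
    positive = [c for c in counts if c > 0]
    observed = first_digit_frequencies(positive)
    n = len(positive)
    return sum(
        n * (observed[d] - benford_expected(d)) ** 2 / benford_expected(d)
        for d in range(1, 10)
    )

# Hypothetical "vote counts": powers of 2, used here only as Benford-like test data.
vote_counts = [2 ** k for k in range(1, 200)]
stat = chi_square_vs_benford(vote_counts)
print(stat)
```

A large value of the statistic, relative to a chi-square distribution with 8 degrees of freedom, would indicate a first-digit pattern that departs from Benford's law.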
A more detailed review of statistical approaches can be found in [2]. However, all proposed statistical approaches to the assessment of election fraud are subject to legitimate criticism, because the deviation of the empirical frequency distribution from the normal distribution can be caused by other reasons (for example, regional differences in voter activity or support for certain candidates).
This paper proposes a new approach to assessing the presence of election fraud. The main idea is to construct, after certain processing of statistical data, an arctangent regression of the obtained data and estimate the parameters of this arctangent regression as parameters of the Cauchy distribution according to the results by Krykun [9,10].

Main result
We recall [11,12] that the Cauchy distribution with parameters α and γ (where γ>0) has probability density function and cumulative distribution function, respectively,
$$f(x) = \frac{1}{\pi} \cdot \frac{\gamma}{(x-\alpha)^2 + \gamma^2}, \qquad F(x) = \frac{1}{2} + \frac{1}{\pi} \arctan\frac{x-\alpha}{\gamma}.$$
The parameters α and γ are called, respectively, the location parameter and the scale parameter. It is well known [11,12] that the Cauchy distribution has neither a mathematical expectation nor moments of higher order. The Cauchy distribution is a distribution with "heavy tails", and the law of large numbers does not hold for it. Because of these properties, the method of moments cannot be applied to estimate its parameters, and maximum likelihood estimation has no closed-form solution, so several different estimation methods have been proposed.
Based on the fact that the cumulative distribution function of the Cauchy distribution is expressed through the arctangent function, Krykun in [10] (Prop. 2) proposed a simple approach to estimating the unknown parameters of the Cauchy distribution, based on the use of empirical arctangent regression.
One more well-known result [11,12] is that the ratio of two independent standard normal random variables (that is, normal with parameters 0 and 1) is a random variable having a Cauchy distribution with parameters 0 and 1. On the other hand, for many electoral indicators (turnout, percentage of votes for certain candidates, percentage of spoiled ballots or of votes "against all") it is plausible to assume that their random deviations are normally distributed. Thus, if we find the mathematical expectation and variance of these quantities, we can normalize them.
Proposition 1.
After normalizing two selected election indicators, we calculate their ratio. Next, we perform statistical processing of the obtained data according to [10] (Prop. 3 and Lemma 1) and obtain empirical estimates of the distribution parameters. We then compare the obtained estimates with the theoretical parameters of the Cauchy distribution (that is, with the values 0 and 1) and, in case of noticeable differences, try to estimate the probability of such a deviation of the parameters from the theoretical ones and draw statistical conclusions about the plausibility of the election result.
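For reference, the standard density and distribution function of the Cauchy law with location α and scale γ can be written as small helper functions. The function names below are illustrative; the interval probability is simply the difference of two values of the distribution function.

```python
import math

def cauchy_pdf(x, alpha=0.0, gamma=1.0):
    """Density of the Cauchy distribution with location alpha and scale gamma > 0."""
    return gamma / (math.pi * ((x - alpha) ** 2 + gamma ** 2))

def cauchy_cdf(x, alpha=0.0, gamma=1.0):
    """Distribution function: 1/2 + arctan((x - alpha) / gamma) / pi."""
    return 0.5 + math.atan((x - alpha) / gamma) / math.pi

def cauchy_interval_probability(a, b, alpha=0.0, gamma=1.0):
    """Probability that a Cauchy(alpha, gamma) variable falls in [a, b]."""
    return cauchy_cdf(b, alpha, gamma) - cauchy_cdf(a, alpha, gamma)

# Half of the probability mass lies within one scale unit of the location.
half_mass = cauchy_interval_probability(-1.0, 1.0)
print(half_mass)
```

The last line reflects a convenient interpretation of the parameters: α is the median, and α ± γ are the lower and upper quartiles.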
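To give a feel for the idea, the sketch below fits the arctangent-shaped distribution function to an empirical one. It is not the estimator of [10]: it is a simple substitute that inverts F(x) = 1/2 + arctan((x − α)/γ)/π into x = α + γ·tan(π(F − 1/2)) and runs an ordinary least-squares line through the central order statistics. The trimming fraction, the plotting positions and the synthetic data are all assumptions made for illustration.

```python
import math
import random

def fit_cauchy_by_arctan_regression(sample, trim=0.1):
    """Estimate (alpha, gamma) by a linearized fit of the Cauchy CDF to the empirical CDF."""
    xs = sorted(sample)
    n = len(xs)
    k = int(n * trim)  # number of order statistics discarded at each end
    # Plotting position (i + 0.5) / n for the order statistic with 0-based index i.
    points = [
        (math.tan(math.pi * ((i + 0.5) / n - 0.5)), xs[i])
        for i in range(k, n - k)
    ]
    m = len(points)
    mean_t = sum(t for t, _ in points) / m
    mean_x = sum(x for _, x in points) / m
    stt = sum((t - mean_t) ** 2 for t, _ in points)
    stx = sum((t - mean_t) * (x - mean_x) for t, x in points)
    gamma = stx / stt               # slope of the line estimates the scale
    alpha = mean_x - gamma * mean_t  # intercept estimates the location
    return alpha, gamma

# Synthetic Cauchy(0.5, 2.0) sample generated by inverting the distribution function.
rng = random.Random(42)
sample = [0.5 + 2.0 * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(2000)]
alpha_hat, gamma_hat = fit_cauchy_by_arctan_regression(sample)
print(alpha_hat, gamma_hat)
```

Discarding the extreme order statistics keeps a few very large observations from dominating the fit, which is the usual motivation for trimming with heavy-tailed data.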
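The normalization and ratio step can be sketched on simulated data. The indicator names, means and standard deviations below are made up, the two indicators are generated as independent normal variables, and the choice of which indicator goes in the numerator is an assumption of this sketch.

```python
import random
import statistics

def normalized_ratio(xs, ys, mean_x, sd_x, mean_y, sd_y):
    """Ratio of two normalized indicators, observation by observation."""
    ratios = []
    for x, y in zip(xs, ys):
        u = (x - mean_x) / sd_x
        v = (y - mean_y) / sd_y
        if v != 0:  # skip observations where the ratio is undefined
            ratios.append(u / v)
    return ratios

rng = random.Random(2004)
n = 20000
# Hypothetical indicators: independent normal values with made-up parameters.
turnout = [rng.gauss(70.0, 8.0) for _ in range(n)]
against_all = [rng.gauss(2.0, 0.5) for _ in range(n)]

ratios = normalized_ratio(turnout, against_all, 70.0, 8.0, 2.0, 0.5)
q1, median, q3 = statistics.quantiles(ratios, n=4)
print(q1, median, q3)
```

For a Cauchy distribution with parameters 0 and 1 the median is 0 and the quartiles are −1 and 1, so the three printed values give a quick check of how close the simulated ratios are to the theoretical law.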

Remark 1.
To see the picture more clearly, it is advisable to single out the set of polling stations for which there is evidence of mass violations of the same type (for example, stuffing ballots for "own" candidate or spoiling ballots cast for "other" candidates) and to analyze the data from these stations separately.

Remark 2.
If there are reasons to believe that the increase in turnout was massively falsified (through ballot stuffing or multiple voting), then it is advisable to calculate the overall election indicators (turnout, votes against all and invalid ballots) without the data from such precincts.

Background
During the Ukrainian presidential elections in 2004, 23 constituencies were formed in Donetsk region and 13 constituencies in Luhansk region, a total of 35 constituencies (out of a total of 225 constituencies plus the foreign electoral district). In those elections, Donetsk and Luhansk regions were the base of support for the pro-government candidate Victor Yanukovych, who won in these regions 39.5% of all votes cast for him, namely 4.35 million out of 11.0 million, according to the results of the first round [13].
At these elections, observers recorded mass violations in favor of the government-supported candidate, the notorious political figure Victor Yanukovych. These violations were recorded during the 1st round and especially during the 2nd round of elections, which led to mass protests in November-December 2004 (the "Orange Revolution") and to the re-voting of the decisive 2nd round, which took place on December 26, 2004. The falsifications were especially large-scale in Donetsk and Luhansk regions, and Table 1 is highly indicative (data from [13]). Large-scale falsifications in Donetsk and Luhansk regions are also clearly evidenced by the statistical results of these three rounds of voting, which can be seen in Figures 1-3 [14] (here PEC = precinct election commission). According to these histograms, the turnout in the 2nd round is absolutely impossible, which indicates total falsification at the PECs. The histogram of the 1st round shows a bimodal distribution and several turnout values in excess of 100%, which indicates massive falsification at the PECs. The re-voting of the 2nd round gives a fairly plausible picture: a certain excess of PECs with high turnout is explained by the presence of special precincts (hospitals, military units, prisons) with a small number of voters and a high turnout rate.
The fantastic turnout of more than 100% (relative to the number of voters on the lists of the 1st round) during the 2nd round of elections in the entire Donetsk region (see Table 1) confirms that the falsifications were not merely massive but total.

Estimation of parameters
Thus, in the 1st round we can assume the presence of falsifications associated with an increase in voter turnout (due to multiple voting or direct ballot stuffing). We therefore choose two indicators for the analysis: the percentage of voter turnout (which may be overstated) and the percentage of votes against all (which may be understated, since the falsifiers probably added ballots for "their" candidate rather than against all).
Further, in order to avoid the influence of inflated turnout, we calculate the statistical indicators of the 1st round of this election without the votes from Donetsk and Luhansk regions. The resulting values were calculated by the author from official data [13,15]. We then normalize the two indicators and calculate their ratio. To the obtained sample we apply the procedure of [10] (Prop. 2) and use [10] (eq. (10), (11)). In addition, following [10] (Prop. 3), we discard a certain number of the largest and smallest values from the sample.
After this data processing, we obtain the parameter estimates given in Table 3.

Plausibility of the election results
Thus, the estimate of the parameter γ is close to its theoretical value of 1, but the estimate of the parameter α differs significantly from its theoretical value of 0.
Let us consider a sample consisting of observations of a random variable that follows the Cauchy law with an unknown parameter α and, for simplicity, a known parameter γ. The sample mean is 0.1965. We then calculate the theoretical probability that the parameter α lies in an interval wider than the range of the obtained estimates, for example in the interval [−1.1; −0.95].
For this purpose we use the following property of the Cauchy distribution [11,12]: for a sample $\{x_i\}_{i=1}^{n}$ of independent observations distributed by the Cauchy law with parameters α and γ, the variable $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is also distributed by the Cauchy law with the same parameters.
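This property follows from a standard characteristic-function argument, sketched here for convenience. The notation ($\varphi$ for the characteristic function, $n$ for the sample size, $\bar{x}$ for the sample mean) is introduced for this illustration, and the observations are assumed to be independent.

```latex
\begin{align*}
\varphi_{x}(t) &= \mathbb{E}\, e^{itx} = \exp\bigl(i\alpha t - \gamma |t|\bigr), \\
\varphi_{\bar{x}}(t) &= \mathbb{E} \exp\Bigl(\frac{it}{n}\sum_{j=1}^{n} x_j\Bigr)
  = \Bigl[\varphi_{x}\bigl(t/n\bigr)\Bigr]^{n}
  = \Bigl[\exp\Bigl(\frac{i\alpha t}{n} - \frac{\gamma |t|}{n}\Bigr)\Bigr]^{n}
  = \exp\bigl(i\alpha t - \gamma |t|\bigr).
\end{align*}
```

Since the characteristic function of the sample mean coincides with that of a single observation, averaging does not reduce the spread of a Cauchy sample, in contrast to distributions with finite variance.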
Evaluating this probability for γ = 1, one can say that the results in Donetsk and Luhansk regions during the 1st round of these elections are extremely unlikely, which gives grounds for asserting the presence of significant fraud.

General conclusions and prospects for further research
The main advantages of the proposed method are the possibility of using election indicators that are difficult for falsifiers to predict, such as the number of spoiled ballots and the number of votes "against all", and the fact that it takes into account the variance of election data. It is also possible to use the usual statistical indicators: turnout and support for certain candidates.
The author assumes that, with a rational (that is, taking into account the specifics of possible violations) pairwise analysis of the mentioned indicators of the election process, a noticeable deviation of the distribution of the ratio of these values from the Cauchy distribution with parameters 0 and 1 gives reason to conclude that election fraud is present.
In addition, for a numerical assessment of the influence of falsifications on the election results, the author considers it appropriate to use the Cauchy distribution curve instead of the normal one, because the Cauchy distribution, as a distribution with "heavy tails", can better take into account the specifics of election processes (for example, close to 100% turnout in special precincts such as hospitals, prisons and military units, or greatly increased support for a candidate in his hometown).