## DEMOCRACY THREAT INDEX

The Democracy Threat Index summarizes democracy experts' ratings of threats to American democracy into a single, daily-updated score. Each weekday we survey a randomly selected sub-sample of experts, contacting each individual respondent once every two months. As described here, respondents rate each of six sub-components on its current level of threat to democracy, using a 1-5 scale. We discard responses that repeat an IP address (to prevent re-voting), that report 5s on every sub-component, or that omit two or more sub-component scores. For responses that omit exactly one sub-component, we impute the missing value using a regression on the remaining sub-components. We then average the sub-components to produce an individual Democracy Threat score and re-scale it to 0-100.
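The per-response steps above can be sketched as follows. This is a minimal illustration, not the production pipeline: the function name `individual_score` is hypothetical, the linear mapping of 1 to 0 and 5 to 100 is an assumption about how the re-scaling works, and the single missing value is filled with the mean of the answered sub-components as a simple stand-in for the regression-based imputation. IP de-duplication is omitted.

```python
from statistics import mean


def individual_score(ratings):
    """Return a 0-100 Democracy Threat score, or None if the response is discarded.

    `ratings` is a list of six sub-component ratings, each an integer 1-5,
    or None where the respondent omitted that sub-component.
    """
    answered = [r for r in ratings if r is not None]
    n_missing = len(ratings) - len(answered)

    if n_missing >= 2:
        return None  # two or more sub-components omitted
    if all(r == 5 for r in answered):
        return None  # 5s on every (answered) sub-component

    if n_missing == 1:
        # Assumption: mean imputation stands in for the regression
        # on the remaining sub-components described in the text.
        answered = answered + [mean(answered)]

    average = mean(answered)           # still on the 1-5 scale
    return (average - 1) / 4 * 100     # assumed linear re-scaling to 0-100


print(individual_score([1, 1, 1, 1, 1, 1]))     # 0.0
print(individual_score([3, 3, 3, 3, 3, None]))  # 50.0
print(individual_score([5, 5, 5, 5, 5, 5]))     # None
```

Swapping in a real regression model would only change the imputation line; the discard rules and the re-scaling would stay the same.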

Our Democracy Threat Index is a weighted average of these individual Democracy Threat scores over the prior 60 days. The core of our method is weighting each response by its age: today's responses get weight 1, and the weight declines with the number of days since the response was given (∆Days). We use the following simple formula:

Weight = 1/(1+ln(1+∆Days))

where ln(·) is the natural log. For instance, a response from 1 day ago gets weight 0.59, whereas a response from 60 days ago gets weight 0.20 (1/(1+ln 61) ≈ 0.196). This technique allows the weighted average to respond to new information without over-relying on small response samples. As a final step, the weights are scaled to sum to 1 and the weighted average is computed.
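As a rough sketch of the time weighting, the snippet below computes the weight for a given ∆Days, normalizes the weights to sum to 1, and forms the weighted average. The function names and the `(delta_days, score)` input layout are illustrative assumptions.

```python
import math


def time_weight(delta_days):
    """Weight = 1 / (1 + ln(1 + delta_days)); today's responses get 1."""
    return 1.0 / (1.0 + math.log(1.0 + delta_days))


def weighted_index(responses):
    """Weighted average of scores.

    `responses` is a list of (delta_days, score) pairs, assumed to be
    already restricted to the prior 60 days.
    """
    weights = [time_weight(days) for days, _ in responses]
    total = sum(weights)
    # Scale the weights to sum to 1, then average the scores.
    return sum(w / total * score for w, (_, score) in zip(weights, responses))


print(time_weight(0))            # 1.0
print(round(time_weight(1), 2))  # 0.59

# A fresh response of 50 pulls the Index above the midpoint of 30 and 50.
print(weighted_index([(0, 50.0), (30, 30.0)]))
```

Because the weight decays logarithmically, older responses lose influence quickly at first and then very slowly, so the 60-day window still contributes meaningfully to the average.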

The shortcoming of relying exclusively on this rolling average is that it adjusts slowly if a major event happens and scores change rapidly. We thus apply a technique to test for discrete breaks in the survey responses and then reweight the Index if an event is detected. This serves the dual purposes of producing a more accurate daily estimate and identifying significant events. To be clear, our technique is entirely data-driven and does not rely on anyone supplying candidate events.

Our method of event testing is a novel version of "changepoint analysis" adapted to the rolling survey and relatively small sample. The central challenge is to simultaneously identify whether a break has occurred and when. Each day, we examine the previous 12 days in the sample. For each of these 12 days, we calculate two quantities: (1) the significance of the difference between the weighted Democracy Threat Index up to the prior day and the simple average of individual Democracy Threat scores from that day onward, and (2) the sum of squared errors between the individual scores and the weighted average if we assume an event occurred that day. An event is confirmed on a specific day if the difference in (1) is significant at the p < 0.005 level, that day yields the smallest sum of squared errors in (2) among the candidate days, and there are at least 8 observations after the day.
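The three confirmation conditions can be illustrated with a simplified sketch. Several details here are assumptions rather than the method itself: the text does not name the significance test, so a two-sided normal-approximation test is used; the unweighted mean of earlier scores stands in for the time-weighted Index; and the function name `detect_event` and its input layout are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def detect_event(scores_by_day, window=12, alpha=0.005, min_after=8):
    """Return the index of the confirmed event day, or None.

    scores_by_day[i] holds the individual 0-100 scores received on day i;
    the last entry is today.
    """
    n_days = len(scores_by_day)
    candidates = []  # (sse, day, p_value, n_after)
    for day in range(max(1, n_days - window), n_days):
        before = [s for d in scores_by_day[:day] for s in d]
        after = [s for d in scores_by_day[day:] for s in d]
        if not before or len(after) < 2:
            continue  # not enough data to evaluate this candidate day

        baseline = mean(before)  # assumption: stand-in for the weighted Index
        after_mean = mean(after)

        # Quantity (1): significance of the before/after difference.
        se = stdev(after) / math.sqrt(len(after))
        if se > 0:
            z = (after_mean - baseline) / se
            p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
        else:
            p_value = 1.0 if after_mean == baseline else 0.0

        # Quantity (2): sum of squared errors if the event fell on this day.
        sse = (sum((s - baseline) ** 2 for s in before)
               + sum((s - after_mean) ** 2 for s in after))
        candidates.append((sse, day, p_value, len(after)))

    if not candidates:
        return None
    # The confirmed day must minimize SSE, be significant, and have
    # enough observations after it.
    sse, day, p_value, n_after = min(candidates)
    return day if p_value < alpha and n_after >= min_after else None


flat = [[28, 32] for _ in range(20)]
shifted = [[28, 32] for _ in range(14)] + [[58, 62] for _ in range(6)]
print(detect_event(flat))     # None
print(detect_event(shifted))  # 14
```

Requiring the significant day to also be the SSE minimizer is what pins down when the break happened, not just whether one occurred.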

This fairly conservative test requires both that a break is detected and that placing the event on that day maximizes model fit. If an event is detected, we reweight all responses prior to the event. Although we could weight them 0, this is too extreme, as it ignores all prior information and would sharply limit the sample size. Instead, we again let the data speak. The reweighting is based on the magnitude of the break from before to after the event, relative to the data's prior variance. Specifically, if Break is the magnitude of the shift and Noise is the standard error of individual deviations from the Index prior to the event, we reweight as:

Weight(Prior to Event) = (1-Break/(Noise+Break))^2

In constructing the rolling average, this is multiplied by the weight due to time elapsed.
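A small sketch of the reweighting makes its behavior easier to see. Note that the formula simplifies to (Noise/(Noise+Break))^2, so the prior weight is 1 when there is no break and shrinks toward 0 as the break grows relative to the noise. The function names are illustrative, and treating Break as a non-negative magnitude is an assumption.

```python
import math


def time_weight(delta_days):
    """Time-decay weight: 1 / (1 + ln(1 + delta_days))."""
    return 1.0 / (1.0 + math.log(1.0 + delta_days))


def prior_event_weight(break_size, noise):
    """Weight(Prior to Event) = (1 - Break / (Noise + Break))^2."""
    return (1.0 - break_size / (noise + break_size)) ** 2


def combined_weight(delta_days, break_size, noise):
    """Pre-event responses get the event discount times the time weight."""
    return prior_event_weight(break_size, noise) * time_weight(delta_days)


print(prior_event_weight(0.0, 10.0))   # 1.0 (no break, no discount)
print(prior_event_weight(10.0, 10.0))  # 0.25 (break equal to the noise)
print(prior_event_weight(30.0, 10.0))  # 0.0625 (break three times the noise)
```

Squaring the ratio makes the discount steep: a break equal in size to the noise already cuts the weight of pre-event responses to a quarter.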

Simulations: To analyze our method's performance, we tested simulated response data. For 100 "days," we first determined the number of responses by generating 12 binomial variables with a 0.2 probability of 1. For each 1, we then generated a normal variable with mean 30 and standard deviation 10 (close to our observed data). This produced a 100-day simulated sample with no event. We then applied our averaging and event-detection technique to each day in succession. This simulation process was repeated 300 times. We found a false positive rate for events of only 1.30% per day.
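The no-event data-generating process described above can be sketched as follows. The function name, parameter names, and the fixed seed are illustrative assumptions; the parameter values are those given in the text.

```python
import random


def simulate_no_event(n_days=100, draws=12, p=0.2, mu=30.0, sigma=10.0, seed=0):
    """Return a list of n_days lists of simulated individual scores."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    days = []
    for _ in range(n_days):
        # Number of responses: 12 Bernoulli draws, each 1 with probability 0.2.
        n_responses = sum(rng.random() < p for _ in range(draws))
        # One normal score (mean 30, sd 10) per response.
        days.append([rng.gauss(mu, sigma) for _ in range(n_responses)])
    return days


sample = simulate_no_event()
print(len(sample))                                     # 100
print(sum(len(day) for day in sample) / len(sample))   # average responses per day
```

With these settings the expected number of responses per day is 12 × 0.2 = 2.4, which is why the detection rule has to work with small daily samples.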

To gauge the technique's sensitivity, we then repeated this simulation with a break on day 50 of varying magnitudes. For each shift size, we repeated the simulation 100 times. The results are encouraging: if we define successful detection as an event being identified somewhere between days 45 and 55, we get a 28% success rate for a shift of 2.5, 53% for 5, 79% for 7.5, and 96% for 10. Thus, the technique will primarily pick up substantively large shifts.