Data Collection Update

We have changed how we report testing and case rates on the Data by State page. Previously, we used the 2018 year-end prison population numbers released April 30, 2020, by the Bureau of Justice Statistics as the denominator. Because these data are old and prison populations have changed recently, we now use more recent population data collected by the Vera Institute of Justice. For all states except those listed below, the population data are for April 30/May 1, 2020. For the following states we use population data for March 31, 2020: Montana, South Dakota, Tennessee, and Washington. For the following states we use population data for December 31, 2019: Illinois, Maryland, Minnesota, Montana, New Mexico, and Virginia. These represent the most recent data available.

Our testing rates represent the number of tests administered relative to the prison population. It is possible that some states are testing the same person more than once. Only one state is reporting the number of people tested separately from the number of tests given. As an example, Michigan has now reported more tests for COVID-19 (37,885) than there are people currently incarcerated in its prisons (36,980), likely due to releases from prison, widespread testing practices (including repeat testing), and deaths.

Last updated June 4, 2020.

Data Collection Methods

In this blog post, we detail how we are collecting data and how our data may be different from other sources. If you still have questions after viewing this blog post, please contact us.

How are you collecting data?

We are collecting data manually. Every day of the week (Sunday through Saturday), a team member checks the websites of all 52 prison systems (50 states, Federal BOP, ICE). As of now, we are not including media reports, which may have more accurate and timely data. The links to the source websites where we find our data are posted on the state data page of our website.

We enter the data into a spreadsheet, aggregate it, and then post it to the website. The one exception is New Mexico, which has not yet started reporting its data regularly on its website, so our staff called on May 4 to ask for the data. The “last updated” date is not the date that the latest data was released by the institution but the latest date that we checked for data on its website. Some systems are releasing data every day, some every business day, and some every few days. Because institutions sometimes change the way they report data, we check them all every day. We are working on automating this process using web scraping, but we will continue to check the data manually due to some caveats, which are outlined below.

What kind of data are you collecting?

We are collecting everything! If a system is reporting it, we are recording it. We are purposeful in the way that we aggregate data for public presentation. Every system is reporting data in different ways, so we had to make some decisions in order to standardize the data. For example, some systems are reporting confirmed COVID-19 deaths, suspected COVID-19 deaths, COVID-19 deaths pending autopsy, and/or confirmed COVID-19 deaths with underlying causes. Some systems only report COVID-19 “deaths” without any further information. As of now, we aggregate all COVID-19 related deaths. As another example, some departments of correction include probation and parole and report the number of cases in that population too. We have intentionally decided to exclude those case counts from our totals for now. We are still collecting them, though, and will include these numbers later when we have more capacity to report publicly all the data we have collected. We are in the process of editing our data dictionary for public use and will post it soon. The data dictionary includes case definitions and all the data peculiarities for each institution.
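As a rough illustration of the death aggregation described above, the sketch below sums whichever death categories a system reports. The field names and the dictionary layout are hypothetical and are not taken from our spreadsheet. The sketch also assumes the categories do not overlap; if a system counts "confirmed with underlying causes" as a subset of "confirmed", the subset would need to be dropped before summing.

```python
# Hypothetical field names for the death categories named in the text.
DEATH_CATEGORIES = (
    "confirmed",
    "suspected",
    "pending_autopsy",
    "confirmed_with_underlying_causes",
    "unspecified",  # systems that report only "deaths" with no detail
)


def total_covid_deaths(report: dict) -> int:
    """Sum every reported COVID-19 death category for one system.

    Categories a system does not report (missing or None) count as zero.
    Assumes the categories are mutually exclusive.
    """
    return sum(report.get(category) or 0 for category in DEATH_CATEGORIES)


# Example with made-up numbers for two differently structured reports.
detailed = {"confirmed": 3, "suspected": 1, "pending_autopsy": 2}
minimal = {"unspecified": 4}
print(total_covid_deaths(detailed), total_covid_deaths(minimal))
```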

What are the strengths and limitations of your data?

All data has strengths and limitations, and it is important to understand these when using data. We have been collecting data at the facility level for every state and the BOP every day since April 22. We have some data for some institutions prior to April 22, but it is not consistent. Since our data is collected manually, we are able to set criteria and include or exclude cases based on those criteria. We are also able to see when institutions change the way they are reporting data and correct it in our files. For example, some states changed their reporting of COVID-19 cases so that the counts are no longer cumulative. We have to use our historic data and the new data they present to make sure that we are presenting cumulative counts. These are some of the strengths of our data. However, relying on humans to do this work, rather than computers, also creates some limitations. Humans make typos and other errors. We do have some redundancy built into our process to catch errors as soon as possible.
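The cumulative-count correction mentioned above can be sketched as follows. This example assumes that, after the change, a state reports only the new cases for each day; states that switch to reporting active cases instead would need different handling. The function name and the numbers are made up for illustration.

```python
from typing import List


def rebuild_cumulative(last_cumulative: int, daily_new: List[int]) -> List[int]:
    """Extend a cumulative series from per-day new-case counts.

    last_cumulative: the final cumulative total recorded before the
        state stopped reporting cumulative counts (from historic data).
    daily_new: new cases reported for each later day, in date order.
    """
    totals = []
    running = last_cumulative
    for new_cases in daily_new:
        if new_cases < 0:
            raise ValueError("daily new-case counts cannot be negative")
        running += new_cases
        totals.append(running)
    return totals


# Historic total of 120, then three days of newly reported cases.
print(rebuild_cumulative(120, [5, 0, 12]))  # [125, 125, 137]
```

A cumulative series built this way never decreases, which makes a data-entry error such as a dropped digit easier to spot.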

How is your data different from other collections?

Each dataset on COVID-19 and corrections will have some differences. For example, the Marshall Project also collected data on COVID-19 and corrections. Their methodology was very different from ours, however. They called the institutions once a week to collect cumulative data over the phone. They also only collected data for a finite period of time, rather than continuously. It might be good practice to use multiple data sources to triangulate your findings. We too are working to see if there are ways to combine our data with the Marshall Project's to give a more robust and complete picture here at the COVID Prison Project.

Last updated May 19, 2020.


This project was launched by a group of interdisciplinary scientists who work at the intersection of public health and criminal justice. We hope to fill a major gap in how COVID-19 in correctional settings is reported, tracked, and analyzed. 

The risk posed by infectious diseases in prisons and jails is significantly higher than in the community, in terms of risk of transmission, exposure, and harm to individuals who become infected. This is also true of other group living facilities, such as nursing homes and assisted living facilities. In correctional facilities, risk is driven by close-quarter, unsanitary living conditions and limited access to hygiene products. Moreover, people in prisons tend to have a higher burden of health conditions that may make them more susceptible to COVID-19, because they overwhelmingly come from our most marginalized neighborhoods.

Some jails and prisons have worked to change policy to release people and modify their policies on the inside to prevent disease transmission. Some have not. Regardless of these actions, many jails and prisons across the country have become epicenters of the COVID-19 pandemic in America, with outbreaks occurring in large urban jails (e.g., Rikers Island, Cook County Jail) and prison facilities (in Ohio, North Carolina, Michigan, and other states).

While we are reporting these data, it is important to keep in mind a few things. First, these data are preliminary. Just like reporting by departments of public health, it will likely be months (or years) before we have a complete set of valid data. Still, it is important to monitor these trends as they develop over time. Second, we caution against comparing the number of cases and the number of tests within departments of correction across states. States vary in the type of information they are reporting. The number of cases correlates directly with the number of tests, meaning that states that are doing more testing will have a higher number of positive cases. Testing guidelines for prisons are likely shaped by other state agencies (e.g., departments of public health). In addition, the inmate population size and correctional management strategies (e.g., inmate housing) vary by state. We think it is more useful to compare what is happening in prisons to what is happening in the community within the same state. Third, it is important to keep in mind that there is a natural lag in reporting: from the time someone is tested, to when results are received, to when they are reported internally, and, finally, to when they are reported publicly. Our data come from the information that is publicly reported by departments of correction. The most common way that states report information about prisons is through their websites, but this is not always the case.

Over the coming weeks, we hope to grow our website and make it more interactive. So please check back often for new content. If you are a prison administrator, and you have more accurate data that you want to share with us, please contact us! We also welcome ideas and questions from viewers.