Data Collection Methods

In this blog post, we detail how we are collecting data and how our data may be different from other sources. If you still have questions after viewing this blog post, please contact us.

How are you collecting data?

We are collecting data manually. Every day of the week (Sunday through Saturday), a team member checks the website of each of the 52 prison systems we track (the 50 states, the Federal BOP, and ICE). As of now, we are not including media reports, which may have more accurate and timely data. The links to the source websites where we find our data are posted on the state data page of our website.

We enter the data into a spreadsheet, aggregate it, and then post it to the website. The one exception is New Mexico, which has not yet started reporting its data regularly on its website, so our staff called them on May 4 to ask for their data. The “last updated” date is not the date that the institution released its latest data; it is the latest date that we checked for data on its website. Some systems release data every day, some every business day, and some every few days. Because institutions sometimes change the way they report data, we check them all every day. We are working on automating this process using web scraping, but we will continue to check the data manually because of some caveats, which are outlined below.

What kind of data are you collecting?

We are collecting everything! If a system is reporting it, we are recording it. We are purposeful in the way that we aggregate data for public presentation. Every system reports data differently, so we had to make some decisions in order to standardize the data. For example, some systems report confirmed COVID-19 deaths, suspected COVID-19 deaths, COVID-19 deaths pending autopsy, and/or confirmed COVID-19 deaths with underlying causes. Other systems only report COVID-19 “deaths” without any further information. As of now, we aggregate all COVID-19 related deaths. As another example, some departments of corrections include probation and parole and report the number of cases in that population too. We have intentionally decided to exclude those case counts from our totals for now. We are still collecting them, though, and will include these numbers later when we have more capacity to publicly report all the data we have collected. We are in the process of editing our data dictionary for public use and will post it soon. The data dictionary includes case definitions and all the data peculiarities for each institution.
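The two aggregation rules described above can be sketched in a few lines of Python. This is only an illustration, not our actual workflow, which runs through a manually maintained spreadsheet. The category names, population labels, and function names are hypothetical, and the numbers in the usage note are made up.

```python
# Hypothetical labels for the death categories that systems report.
# The real column names in the spreadsheet may differ.
DEATH_CATEGORIES = (
    "confirmed",
    "suspected",
    "pending_autopsy",
    "confirmed_with_underlying_causes",
    "unspecified",  # systems that only report "deaths" with no detail
)

# Hypothetical labels for populations left out of the public totals for now.
EXCLUDED_POPULATIONS = {"probation", "parole"}


def total_covid_deaths(deaths_by_category):
    """Add up every COVID-19 related death category a system reports.

    Categories a system does not report are treated as zero.
    """
    return sum(deaths_by_category.get(category, 0) for category in DEATH_CATEGORIES)


def total_cases(cases_by_population):
    """Add up case counts, skipping the excluded populations."""
    return sum(
        count
        for population, count in cases_by_population.items()
        if population not in EXCLUDED_POPULATIONS
    )
```

With made-up numbers, `total_covid_deaths({"confirmed": 3, "suspected": 1})` gives 4, and `total_cases({"incarcerated": 10, "parole": 5})` gives 10, because the parole count is recorded but left out of the total.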

What are the strengths and limitations of your data?

All data has strengths and limitations, and it is important to understand these when using data. We have been collecting data at the facility level for every state and the BOP every day since April 22. We have some data for some institutions prior to April 22, but it is not consistent. Since our data is collected manually, we are able to set criteria and include or exclude cases based on those criteria. We are also able to see when institutions change the way they report data and correct for it in our files. For example, some states changed their reporting of COVID-19 cases so that the counts are no longer cumulative. We have to use our historic data together with the new data they present to make sure that we are presenting cumulative counts. These are some of the strengths of our data. However, relying on humans to do this work, rather than computers, also creates some limitations. Humans make typos and other errors. We do have some redundancy built into our process to catch errors as soon as possible.
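To make the cumulative-count correction concrete, here is a minimal sketch in Python. It assumes one specific kind of reporting change: the system stops publishing a running total and starts publishing only the number of new cases since its previous report. Other changes (for example, a switch to reporting only active cases) would need different handling. The function name and the numbers in the usage note are hypothetical.

```python
def rebuild_cumulative(last_known_cumulative, new_case_counts):
    """Rebuild running totals after a switch to non-cumulative reporting.

    last_known_cumulative: the final cumulative total recorded before the
        system changed its reporting (taken from historic data).
    new_case_counts: new cases from each later report, in date order.

    Returns one cumulative total per later report.
    """
    running_total = last_known_cumulative
    cumulative_totals = []
    for new_cases in new_case_counts:
        running_total += new_cases
        cumulative_totals.append(running_total)
    return cumulative_totals
```

For example, with a last recorded total of 40 and later reports of 3, 0, and 5 new cases, `rebuild_cumulative(40, [3, 0, 5])` returns `[43, 43, 48]`.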

How is your data different from other collections?

Each dataset on COVID-19 and corrections will have some differences. For example, the Marshall Project also collected data on COVID-19 and corrections. Their methodology was very different from ours, however. They called the institutions once a week to collect cumulative data over the phone. They also collected data only for a finite period of time, rather than continuously. It might be good practice to use multiple data sources to triangulate your findings. We are also working to see whether we can combine our data with the Marshall Project's to build a more robust and complete data picture here at the COVID Prison Project.

Last updated May 19, 2020.