How we build the data behind the research.
Independent, reproducible, and documented. The Eviction Research Network mines civil court filings, cleans defendant records, estimates race at the neighborhood level, and deduplicates cases — so researchers, advocates, and policymakers can trust what the numbers say.
Five stages, court filing to research-ready dataset
Every ERN state profile follows the same five-stage pipeline. The steps below link to the technical details further down the page.
- Acquisition: Partner with data providers & court systems to obtain raw filings.
- Geocoding: Address → latitude/longitude → census geography.
- Name Cleaning: Extract first/last names; filter out businesses & unnamed tenants.
- Demographic Estimation: Bayesian imputation of race using surname + neighborhood.
- Deduplication: Collapse repeat entries of the same case over time.
The full pipeline — from raw court records, through OCR and address geocoding, to Bayesian demographic estimation and validation — is documented as a general methodology in Thomas, Hepburn, Graetz, and Desmond, "Estimating Eviction Prevalence Across the United States" (Cityscape, 2024). It is the baseline reference for applying this approach to any jurisdiction's court data, in any format.
What each stage does — and why it matters
Data acquisition
Civil court records aren't centralized in the United States. We source them from five distinct partners covering 30+ states — each with a different data format and a different level of completeness.
Geocoding & redistribution
Raw addresses vary wildly in precision. We geocode to the highest feasible scale (tract > ZIP > county) and redistribute low-precision cases into the "primary geography" of their county.
Defendant name cleaning
We use regular expressions and string cleaning to extract individual tenants and exclude commercial cases, so state profiles reflect household evictions only.
Demographic estimation
Using Imai & Khanna's Bayesian ecological inference, we combine surname and neighborhood racial composition to compute probabilistic race estimates for each tenant. We never assign race — we estimate group rates.
Deduplication
Court systems often log the same case multiple times across different events. We collapse duplicates at the tract level by matching on name and address, keeping the earliest row.
Aggregation & analysis
Clean records feed eviction-rate calculations by tract, county, state, race, and month — the backbone of every state profile, dashboard, and the Housing Precarity Risk Model.
Full methodology, start to finish
Data Sources
ERN's research is only as rigorous as the court records we can access. We work with five data partners, each providing records through a different pipeline: FOIA requests, state court administrative offices, direct sheriff-department data, and NLP-scraped court record systems. The sections below describe each partner and the states each covers.
Civil court data is fragmented across 50 state court systems and thousands of county jurisdictions. ERN's value is stitching these sources together so researchers and policymakers can compare trends across regions.
Legal Services Corporation
States included: Alaska, Arizona, Arkansas, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii, Indiana, Kansas, Kentucky, Maine, Minnesota, Mississippi, Missouri, New York, North Dakota, Ohio, Oklahoma, Pennsylvania, Puerto Rico, South Carolina, Tennessee, Texas, Utah, Vermont, Virgin Islands, Virginia, Wisconsin.
Portland State University
Eviction data for the entire state of Oregon were provided by Lisa Bates, PhD, director of EvictedInOregon at Portland State University. The records contained fields for case number, date of filing, each party listed on a case, the side of the listed party, type of eviction, and whether the filing occurred during a moratorium.
Chicago Legal Aid / ACLU
Eviction data for Cook County, DuPage County, Kane County, McHenry County, and Will County were provided through FOIA requests, web scraping, and Chicago Legal Aid. The information available varied by county, and not all records contained sufficient information for reporting. The discrepancies are noted in the Illinois state profile.
Baltimore City Sheriff's Department
Baltimore eviction data consist of Sheriff service calls and completions, otherwise known as writs of restitution. Writs are executed after the filing if the tenant is still on the premises. These data were provided by the Baltimore City Sheriff's Department in collaboration with the Public Justice Center.
Washington State unlawful detainer data
Washington eviction data consist primarily of unlawful detainers (eviction filings). The ERN team conducted a multi-stage process to collect, clean, and analyze these data. First, case number IDs, judgments, names, and county of filing were requested from the WA State Administrative Office of the Courts. Because these data did not contain addresses, which are necessary to map evictions and estimate demographics, ERN asked the county clerks who hold the case file images for online access to their record systems, then scraped those records by case number. Next, ERN digitized the court records and used Natural Language Processing to mine each defendant's address from them. (Future research will include mining the reason for eviction and other case characteristics to determine causes and consequences of eviction.) Finally, addresses were geocoded so that we could map evictions and estimate the demographics of those facing eviction.
County-level data cover the entire state, while tract-level data cover King, Pierce, Snohomish, and Whatcom counties.
Geocoding and Geographic "Redistribution"
Geocoding is the process of creating spatial data by establishing the latitude and longitude of individual addresses. While the Legal Services Corporation geocoded their data before sending it to us, datasets from other sources required that we geocode them ourselves using a combination of US Census Bureau, ArcGIS, and OpenStreetMap geocoding services. We first used the US Census Bureau's service — which is capable of processing up to 10,000 addresses per request — and then used either ArcGIS or OpenStreetMap (or both) to geocode leftover addresses.
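The cascade described above can be sketched as follows. This is a minimal illustration, not ERN's pipeline code: the `Geocoder` callables are hypothetical stand-ins for whatever client wraps each service, and the only detail taken from the text is the 10,000-address cap per Census Bureau request.

```python
from typing import Callable, Iterable, Optional

Coordinates = tuple[float, float]
# Hypothetical interface: address in, (lat, lon) out, or None when unmatched.
Geocoder = Callable[[str], Optional[Coordinates]]

BATCH_LIMIT = 10_000  # per-request cap of the Census Bureau service, per the text


def chunk(addresses: list[str], size: int = BATCH_LIMIT) -> Iterable[list[str]]:
    """Yield consecutive slices of at most `size` addresses."""
    for start in range(0, len(addresses), size):
        yield addresses[start:start + size]


def geocode_cascade(
    addresses: list[str], geocoders: list[Geocoder]
) -> dict[str, Optional[Coordinates]]:
    """Try each geocoder in order, passing along only still-unmatched addresses."""
    results: dict[str, Optional[Coordinates]] = {a: None for a in addresses}
    remaining = list(results)  # unique addresses, original order
    for geocode in geocoders:
        for batch in chunk(remaining):
            for address in batch:
                results[address] = geocode(address)
        remaining = [a for a in remaining if results[a] is None]
        if not remaining:
            break
    return results
```

Ordering the list as Census first, then ArcGIS or OpenStreetMap, reproduces the "leftover addresses" behavior: later services only ever see what earlier ones failed to match.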
While we would like to be able to aggregate all evictions to the census tract level, the quality and specificity of the address field provided in the original data varies. It is not always possible to determine the census tract the eviction occurred in since some addresses list only a zip code or county. In these cases, the latitude and longitude that result from geocoding are the central coordinates of whichever geographic entity is available and do not accurately represent the exact location of the eviction. For example, an eviction with only the zip code listed (instead of a specific street address) would be assigned the latitude and longitude of the zip code's centroid, which may be located outside of the census tract that the eviction actually occurred in. To address this issue, we devised a system to (1) determine the appropriate geographic scale at which to map eviction rates, and (2) geographically "redistribute" evictions into smaller geographies when necessary.
For each county within a state, we determined the geographic scale (census tract, zip code, or county) at which the plurality of eviction cases were available — we called this the county's "primary geography." When the primary geography was the census tract, we mapped the county's eviction rates at the tract level. If the primary geography was the zip code, we mapped the county's eviction rates by zip code.
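As a rough sketch, picking a county's primary geography amounts to counting cases by the finest scale each one geocoded to and taking the plurality. The scale labels and the tie-breaking rule (prefer the finer scale) are assumptions made for illustration; the text does not say how ties were handled.

```python
from collections import Counter

# Finest scale first, so ties resolve toward the finer geography (an assumption).
SCALES = ("tract", "zip", "county")


def primary_geography(case_scales: list[str]) -> str:
    """Return the scale at which the plurality of a county's cases are available.

    `case_scales` holds one label per eviction case: the finest geography
    that case could be geocoded to.
    """
    if not case_scales:
        raise ValueError("county has no cases")
    counts = Counter(case_scales)
    # max() returns the first maximal item, so SCALES order breaks ties.
    return max(SCALES, key=lambda scale: counts[scale])


# Six tract-level cases, three ZIP-level, one county-level.
print(primary_geography(["tract"] * 6 + ["zip"] * 3 + ["county"]))
```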
Mapping eviction at the wrong geographic scale can dramatically distort racial-disparity estimates. A neighborhood-level trend looks very different from a county average, especially in counties with segregated housing patterns.
However, when the plurality of evictions in a county are available at a certain geographic scale, this does not mean that all of the evictions in that county are available at that scale. For example, a county whose primary geography is the census tract might have some evictions that are only available at the zip code level, and a county whose primary geography is the zip code might have a number of evictions that are only available at the county level. In order to map all the evictions in a county at the same geographic scale (i.e., the "primary geography"), we "redistributed" these evictions into the appropriate geographic entities.
In counties where the primary geography was the census tract, evictions that were available at only the zip code level were distributed equally into census tracts within their respective zip codes. For example, if there were 5 tracts in a zip code and 10 eviction cases in the zip code needing "redistribution", each tract would be assigned 2 cases (except for tracts with zero renters according to the census, which would not be assigned any cases). Similarly, in counties where the primary geography was the zip code, evictions that were available only at the county level were distributed equally into zip codes within the county.
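The equal-split rule can be sketched in a few lines. One point the text leaves open is whether zero-renter units still count toward the divisor; this sketch assumes they do not, so the full case count is preserved across the eligible units. The function and argument names are illustrative.

```python
def redistribute_equally(
    n_cases: float, renters_by_unit: dict[str, int]
) -> dict[str, float]:
    """Split low-precision cases equally across the smaller units of a geography.

    `renters_by_unit` maps each tract (or ZIP code) to its census renter count.
    Units with zero renters receive no cases. Assumption: the split is taken
    over eligible units only, so the allocated cases still sum to `n_cases`.
    """
    eligible = [unit for unit, renters in renters_by_unit.items() if renters > 0]
    if not eligible:
        # Nowhere to place the cases; the caller decides how to handle this.
        return {unit: 0.0 for unit in renters_by_unit}
    share = n_cases / len(eligible)
    return {
        unit: (share if renters > 0 else 0.0)
        for unit, renters in renters_by_unit.items()
    }


# The example from the text: 10 ZIP-level cases, 5 tracts, all with renters.
tracts = {"t1": 120, "t2": 80, "t3": 45, "t4": 300, "t5": 10}
print(redistribute_equally(10, tracts))  # 2.0 cases per tract
```

The same function covers the coarser case: pass ZIP codes as the units and the county-only cases as `n_cases`.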
Defendant Name Cleaning
After geocoding, we used regular expressions and other string manipulation methods to clean and extract the first and last names of individual defendants. The data include information about eviction filings among (1) individual households with first and last names, (2) businesses, and (3) unnamed tenants. For these state profiles, we are only interested in analyzing evictions of individual households, not commercial evictions, so we filtered out cases where the name suggested the defendant was a business rather than a person.
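A simplified sketch of this step is shown below. The keyword lists are illustrative assumptions, not ERN's actual patterns, and real court data would need a much longer list plus manual review of edge cases.

```python
import re
from typing import Optional

# Illustrative keyword lists (assumptions, not ERN's actual patterns).
BUSINESS_RE = re.compile(
    r"\b(LLC|INC|CORP|LTD|LLP|COMPANY|ENTERPRISES|PROPERTIES|RESTAURANT)\b",
    re.IGNORECASE,
)
UNNAMED_RE = re.compile(
    r"\b(UNAUTHORIZED|UNKNOWN|OCCUPANTS?|TENANTS?|JOHN DOE|JANE DOE)\b",
    re.IGNORECASE,
)


def parse_defendant(raw: str) -> Optional[tuple[str, str]]:
    """Return (first, last) for an individual, or None for businesses,
    unnamed tenants, and names that cannot be split."""
    # Keep letters, commas, apostrophes, hyphens, and spaces; collapse whitespace.
    name = re.sub(r"[^A-Za-z,'\- ]", " ", raw)
    name = re.sub(r"\s+", " ", name).strip()
    if not name or BUSINESS_RE.search(name) or UNNAMED_RE.search(name):
        return None
    if "," in name:
        # "LAST, FIRST MIDDLE" layout
        last, _, rest = name.partition(",")
        parts = rest.split()
        first = parts[0] if parts else ""
        last = last.strip()
    else:
        # "FIRST [MIDDLE] LAST" layout
        parts = name.split()
        if len(parts) < 2:
            return None
        first, last = parts[0], parts[-1]
    if not first or not last:
        return None
    return first.title(), last.title()


for raw in ["JACKSON, MARY A.", "Acme Properties LLC", "UNAUTHORIZED OCCUPANT"]:
    print(raw, "->", parse_defendant(raw))
```

Returning `None` rather than dropping rows keeps the original case in the dataset, which matters later when total case counts are compared against the subset with usable names.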
Demographic Estimation
The canonical write-up of this stage, including accuracy validation against legal-aid intake data, is Thomas et al., "Estimating Eviction Prevalence Across the United States" (Cityscape, 2024) — see our ERN summary.
Using the surname extracted from the defendant name field, we estimated the race of each defendant with a valid human name using a Bayesian prediction model. This ecological inference method, developed by Imai and Khanna, uses Bayes' rule to combine two inputs: the racial distribution of frequently occurring surnames in Census name data, and the racial composition of the neighborhood (census tract) where the evicted defendant lived. Using these two pieces of information, we computed the predicted probability of each racial category (White, Black, Latine, Asian, or other) for any given individual. For example, a person with the last name Jackson, a common Black surname, living in a neighborhood where a large share of the population is Black would have a higher likelihood of being estimated as Black compared to a person living in a neighborhood where a smaller share of the population is Black. Neighborhood racial composition is defined by the 2020 Decennial Census tract geography.
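The core Bayes' rule update can be sketched as below: the surname supplies a prior, P(race | surname), and the neighborhood supplies a likelihood, P(tract | race), meaning the share of each group's population that lives in the tract. Their product is normalized across the five categories. All numbers here are invented for illustration and are not Census values; in practice a package such as wru handles the lookup tables and the calculation.

```python
RACES = ("White", "Black", "Latine", "Asian", "other")


def race_posterior(
    p_race_given_surname: dict[str, float],
    p_tract_given_race: dict[str, float],
) -> dict[str, float]:
    """P(race | surname, tract), proportional to P(race | surname) * P(tract | race).

    Assumes surname and tract are conditionally independent given race.
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_tract_given_race[race]
        for race in RACES
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no racial category has support for this surname and tract")
    return {race: value / total for race, value in unnormalized.items()}


# Invented inputs: one surname prior, two tracts with different composition.
surname_prior = {"White": 0.40, "Black": 0.53, "Latine": 0.03, "Asian": 0.01, "other": 0.03}
tract_a = {"White": 0.0001, "Black": 0.0010, "Latine": 0.0002, "Asian": 0.0001, "other": 0.0002}
tract_b = {"White": 0.0010, "Black": 0.0001, "Latine": 0.0002, "Asian": 0.0001, "other": 0.0002}

in_a = race_posterior(surname_prior, tract_a)
in_b = race_posterior(surname_prior, tract_b)
print(round(in_a["Black"], 3), round(in_b["Black"], 3))
```

With the same surname, the predicted probability of being Black is higher in the tract where a larger share of the Black population lives, which is the pattern the Jackson example describes.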
ERN never assigns race to any individual. Demographic estimation produces probabilistic group-level rates — the share of evictions across a tract or county attributable to each racial group. Individual-level race is never reported.
To determine eviction rates by race at the tract and county level:
- We first summed the predicted probabilities of each race for all the individuals in the tract/county by month to determine the predicted number of evictions for each racial group.
  - For example, if there were three individuals in a tract/county in June 2017, and their predicted probabilities of being Asian were 0.3, 0.8, and 0.2 respectively, we would say that there were (0.3 + 0.8 + 0.2) = 1.3 evictions among Asians in that tract/county in that month.
- We then estimated the proportion of evictions filed against each racial group by dividing these predicted race-specific evictions by the predicted sum of evictions for all racial groups.
  - For example, if there were 1.3 evictions among Asians in a tract/county in June 2017, and 16 evictions among all racial groups (Asian + Black + Latine + White + other), we would say that 1.3 / 16 = approximately 8% of evictions in June 2017 were among Asians.
- However, because we could not successfully perform demographic estimation for all individuals listed in the data (e.g., when the defendant name was something like "UNAUTHORIZED OCCUPANT"), simply counting the cases for which demographic estimation was successful misrepresents the real eviction counts. To remedy this, we multiplied the estimated proportions (explained in the paragraph above) by the total number of unique eviction cases included in the data (calculated before demographic estimation was conducted) to again estimate the number of evictions for each racial group.
  - For example, if we determined that 8% of evictions in the tract/county in June 2017 were among Asians, and there were 19 total evictions (according to pre-demographic estimation calculations), we would say that there were actually 0.08 * 19 = 1.52 evictions among Asians.
- Finally, we calculated eviction rates by race, or the share of renters in each racial group (i.e., the universe of people who could potentially face eviction) who were evicted. To do this, we divided the updated estimated eviction counts by the total number of renters in each racial group, according to the 2020 census.
  - For example, if we calculated 1.52 evictions among Asians in the tract/county in June 2017, and there were 70 Asian renters in the tract/county according to the 2020 census, we would say that the eviction rate among Asians was 1.52 / 70 = approximately 2.2%.
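The four steps above chain together as in the sketch below, which reuses the numbers from the worked example. The function and argument names are illustrative. One difference from the prose: the code carries the unrounded share (1.3 / 16 = 0.08125) forward instead of rounding to 8% first, so the adjusted count comes out near 1.54 rather than 1.52. The final rate still rounds to 2.2%.

```python
def eviction_rate_for_group(
    group_probabilities: list[float],
    predicted_all_groups: float,
    total_unique_cases: int,
    group_renters: int,
) -> dict[str, float]:
    """Walk one racial group through the four steps for one tract/county-month."""
    # Step 1: predicted evictions = sum of individual predicted probabilities.
    predicted = sum(group_probabilities)
    # Step 2: the group's share of all predicted evictions.
    share = predicted / predicted_all_groups
    # Step 3: rescale to the full case count, which includes cases that
    # could not be run through demographic estimation.
    adjusted = share * total_unique_cases
    # Step 4: divide by the number of renters in the group.
    rate = adjusted / group_renters
    return {"predicted": predicted, "share": share, "adjusted": adjusted, "rate": rate}


# Numbers from the worked example: three individuals, 16 predicted evictions
# across all groups, 19 unique cases, 70 Asian renters.
result = eviction_rate_for_group([0.3, 0.8, 0.2], 16, 19, 70)
print(f"share = {result['share']:.1%}, rate = {result['rate']:.1%}")
```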
Deduplication
In some of the datasets we received, there were many instances of multiple rows with identical defendant names and street addresses, each row corresponding to a different date and with a different case ID. These cases presumably do not represent multiple separate evictions, but a single case being entered into the court's system at different points in time. While deduplication could generally not be done for county-level data because the datasets did not contain enough information, we did deduplicate tract-level data when valid defendant names and addresses were available, keeping the earliest row for each unique name and address.
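A minimal sketch of this rule, assuming each row is a dict with hypothetical `name`, `address`, and `filed` fields: rows are grouped on a normalized (name, address) key and only the earliest filing date survives. Rows missing a usable name or address are passed through untouched, since the text deduplicates only where both are available.

```python
from datetime import date


def _normalize(text: str) -> str:
    """Uppercase and collapse whitespace so trivial formatting differences match."""
    return " ".join(text.upper().split())


def deduplicate(rows: list[dict]) -> list[dict]:
    """Keep the earliest row for each unique (name, address) pair."""
    earliest: dict[tuple[str, str], dict] = {}
    unmatched: list[dict] = []  # rows without a usable name or address
    for row in rows:
        name, address = _normalize(row["name"]), _normalize(row["address"])
        if not name or not address:
            unmatched.append(row)
            continue
        key = (name, address)
        if key not in earliest or row["filed"] < earliest[key]["filed"]:
            earliest[key] = row
    return list(earliest.values()) + unmatched


rows = [
    {"case_id": "A2", "name": "Mary Jackson", "address": "1 Main St", "filed": date(2017, 7, 1)},
    {"case_id": "A1", "name": "MARY  JACKSON", "address": "1 main st", "filed": date(2017, 6, 1)},
    {"case_id": "B1", "name": "", "address": "9 Elm St", "filed": date(2017, 6, 5)},
]
print([row["case_id"] for row in deduplicate(rows)])
```

Exact matching after normalization is deliberately conservative. Fuzzy matching would catch more duplicates but also risks merging distinct households at the same address.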
The organizations that make this research possible
Legal Services Corporation (LSC)
Federally funded nonprofit coordinating civil legal aid across the U.S. LSC provided geocoded eviction records spanning Alaska to Wisconsin.
Portland State University
Dr. Lisa Bates' EvictedInOregon project shares case-level filings including party names, filing dates, and pandemic moratorium status.
Chicago Legal Aid & ACLU
FOIA requests, web scraping, and direct data sharing for Cook, DuPage, Kane, McHenry, and Will counties.
Baltimore City Sheriff's Department
Sheriff writs of restitution — executed after filing — shared via partnership with Public Justice Center.
WA State Administrative Office of the Courts
Case ID, judgment, and name data provided by AOC. ERN scrapes county-clerk systems for address fields and applies NLP to extract addresses from record images.
Bring your state to ERN
We're actively seeking partners in under-studied states. If your court system, legal aid organization, or research team wants to stand up a state pipeline, we'd like to talk.
Questions researchers ask us
Can I replicate your methodology on a new dataset?
Yes. The pipeline stages — acquisition, geocoding, name cleaning, demographic estimation, deduplication — are documented above and implemented with standard tools (U.S. Census geocoder, wru or predictrace R packages for Bayesian demographic imputation). If you're replicating on a new state, we'd encourage you to reach out — replicating the redistribution logic for mixed-precision addresses takes care to get right.
How do you handle missing or ambiguous race data?
ERN never assigns race to an individual. Demographic estimation produces probabilistic rates at the tract and county level. When a defendant's name cannot be parsed (e.g., "UNAUTHORIZED OCCUPANT"), we drop them from the race-estimation step but account for them in total counts via the proportion adjustment described under Demographic Estimation above.
What's the error rate on demographic estimation?
The Bayesian method that Imai & Khanna developed and we extend, which builds on Bayesian Improved Surname Geocoding (BISG), typically achieves aggregate group-level accuracy within 1–3 percentage points against ground-truth validation when both surname and neighborhood signal are strong. Accuracy degrades in neighborhoods with low segregation or with less common surnames. We report group-level eviction rates, not individual assignments, to remain within the method's validated use case.
Why aggregate to census tract instead of ZIP code?
Census tracts are designed to approximate ~4,000 people of similar socioeconomic characteristics, making them the standard unit for neighborhood-level analysis and racial-disparity research. ZIP codes, by contrast, are postal-delivery zones with no social meaning and can span populations from 100 to 100,000. Where tract-level geocoding is not possible, we fall back to ZIP and document the redistribution logic.
Is your code public?
Our pipeline code lives in ~37 state-specific research repositories within the ERN GitHub organization. State profiles typically include their cleaning and analysis scripts as Quarto or R Markdown notebooks. If you need a specific repo or method reference, please email us.
How often are state datasets updated?
Update cadence varies by data partner. LSC-sourced states refresh roughly annually; Washington and Oregon refresh quarterly during active research cycles; Baltimore and Illinois refresh on a case-by-case basis tied to grant timelines. State profile pages show the last refresh date.
Use the citation below when referencing ERN methods
@techreport{ERN_methodology_2026,
author = {Thomas, Timothy and {Eviction Research Network}},
title = {Methodology: Data Collection, Cleaning, and
Demographic Estimation for Eviction Court Records},
institution = {University of California, Berkeley —
Department of Sociology},
year = {2026},
url = {https://evictionresearch.net/methodology.html}
}
Want to stand up a new state pipeline?
If you're a court administrator, researcher, or legal aid organization with access to eviction data in a state we haven't reached yet, we'd like to build with you.