3 Data and Methods
This study uses eviction court records to understand the number and demographics of those who are evicted. Evictions in this report are defined as the count of unlawful detainer court records in the State of Washington. An unlawful detainer is a court-ordered eviction process filed by a landlord to remove a tenant, most commonly because the tenant has fallen behind on rent. In short, a landlord posts an eviction notice requiring the tenant to pay or leave within 3 days of notice. If the tenant does not comply, the landlord gives the tenant a summons and complaint, to which the tenant must respond within a week. Next, the parties go to court, where either a writ of restitution is issued (removal of the tenant by the Sheriff) or the tenant wins the case and it is dismissed (see the Losing Home report, pg. 13, for a concise description of the most common unlawful detainer practices).
Given this definition, we consider all eviction counts in this study to be “formal evictions.” Data on informal evictions, in which tenants vacate within the 3-day notice period or before any notice because of a rent increase, are not available.
Eviction counts are analyzed using a full list of unlawful detainer court cases and names provided by the Washington State Superior Court. We also combine population, rental, income, and market data from several sources: US Decennial Census and American Community Survey population and rental unit estimates, Housing and Urban Development fair market rent trends, and the Washington State Department of Commerce Homelessness Point in Time Counts. We use the Bureau of Labor Statistics Consumer Price Index to adjust yearly monetary amounts to 2017 dollars.
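The adjustment to 2017 dollars can be illustrated with a minimal sketch, assuming the standard ratio-of-indices approach. The function name and the index values below are hypothetical placeholders chosen for easy arithmetic, not actual Bureau of Labor Statistics figures and not the code used in this study.

```python
def to_2017_dollars(amount, year, cpi_by_year, base_year=2017):
    """Rescale a nominal dollar amount from `year` into base-year dollars.

    Uses the usual ratio of price indices: real = nominal * CPI_base / CPI_year.
    """
    return amount * cpi_by_year[base_year] / cpi_by_year[year]


# Placeholder index values for illustration only; NOT actual BLS CPI figures.
example_cpi = {2004: 80.0, 2017: 100.0}

# A hypothetical $1,200 judgment from 2004 expressed in 2017 dollars.
judgment_2017 = to_2017_dollars(1200.0, 2004, example_cpi)
print(round(judgment_2017, 2))
```

With real index values substituted for the placeholders, the same function would apply to each judgment amount and rent figure by year.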
To better understand both who is evicted and how broader contexts affect evictions, we developed a multi-level approach to collecting and processing novel eviction data. The analysis is based on automatically processing the actual pages of the eviction court documents, converting the eviction addresses to census tracts, and estimating race and sex based on name and tract. We have two types of information available to us: first, the summary tables of all eviction cases provided by the State Court, and second, the eviction lawsuit court records for selected counties provided by each county court clerk’s office. The list of all evictions is organized by county, covers 2004 to 2017, and includes case numbers, the names of all parties involved (defendants, plaintiffs, and attorneys), the case resolution, and judgment amounts. Because this list lacks the causes of evictions and demographic detail, we review the actual court records to obtain more detailed information. The court records are in the form of unstructured photocopies. Broadly, we perform the following tasks in this order:
First, we download the records from the court information systems of the corresponding counties.[24] Because these systems do not have easy-to-access web APIs, this involves developing custom web scraping scripts. Courts keep their documents in either PDF or tagged image file (TIF) format. We convert these documents into high-resolution (250 dpi) individual page images, and then convert the images into text using Tesseract 4.0 OCR software.
Second, we extract the eviction addresses from the text. Our current approach uses regular expression-based matching to detect addresses. We chose this approach partly because of its simplicity and partly because it requires less training data than alternatives such as neural-network-based named entity recognition. Regular expressions and related methods are widely used for similar tasks.
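As an illustration of regular expression-based address detection, the following is a minimal sketch and not the pattern used in this study. The pattern, the list of street suffixes, the function name, and the sample text are all assumptions made for the example; a production pattern would need to handle many more formats and OCR errors.

```python
import re

# Hypothetical pattern for Washington street addresses in OCR text.
ADDRESS_RE = re.compile(
    r"""
    \b(?P<number>\d{1,6})\s+                        # house number
    (?P<street>(?:[A-Za-z0-9.]+\s+){1,4}?           # one to four street-name words
       (?:St|Street|Ave|Avenue|Rd|Road|Dr|Drive|Blvd|Boulevard|
          Ln|Lane|Way|Ct|Court|Pl|Place)\b\.?)      # street-type suffix
    (?:,?\s+(?:Apt|Unit|\#)\.?\s*(?P<unit>[A-Za-z0-9-]+))?   # optional unit
    ,?\s+(?P<city>[A-Za-z]+(?:\s+[A-Za-z]+)?)       # city, one or two words
    ,?\s+(?P<state>WA|Washington)\b                 # state
    (?:,?\s+(?P<zip>\d{5})(?:-\d{4})?)?             # optional ZIP code
    """,
    re.VERBOSE | re.IGNORECASE,
)


def extract_addresses(text):
    """Return every address-like match in `text` as a dict of named parts."""
    return [m.groupdict() for m in ADDRESS_RE.finditer(text)]


# Made-up sample text containing a premises address and an attorney address.
sample = (
    "the premises located at 1234 N 5th Street, Apt 3, Bellingham, WA 98225. "
    "Attorney for Plaintiff: 500 Example Ave, Bellingham, WA 98229"
)

for found in extract_addresses(sample):
    print(found)
```

Note that the sample yields two candidate addresses, only one of which is the premises; telling them apart is a separate classification problem.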
Because a document may contain addresses other than that of the premises, such as the address of the attorney or landlord, we evaluate the likelihood that each extracted address is the correct one. We collect the words surrounding the address in the text (10 words before and 10 words after) and classify the type of the address from this bag of words using Naive Bayes. The approach works very well for addresses extracted from eviction summonses, which are written in a fairly standard format. If the algorithm cannot find the correct address in the summons file, it scans all the other available documents and picks the address it considers most likely. In those cases our address selection algorithm is less robust.
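The context-window classification can be sketched as follows. This is a minimal multinomial Naive Bayes with add-one smoothing written for illustration; the class labels, the training snippets, and all function and class names are hypothetical, and the study's actual features and training data are not shown here.

```python
import math
from collections import Counter, defaultdict


def context_window(tokens, start, end, width=10):
    """Return up to `width` tokens before index `start` and after index `end`."""
    return tokens[max(0, start - width):start] + tokens[end:end + width]


class NaiveBayes:
    """Multinomial Naive Bayes over bags of words with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.class_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def fit(self, examples):
        for words, label in examples:
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def log_scores(self, words):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in words:
                if w in self.vocab:      # words never seen in training are skipped
                    score += math.log((counts[w] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, words):
        scores = self.log_scores(words)
        return max(scores, key=scores.get)


# Made-up training windows labelled by address type.
training = [
    ("tenant in possession of the premises located at".split(), "premises"),
    ("defendant occupies the rental premises at".split(), "premises"),
    ("attorney for plaintiff law office wsba".split(), "other"),
    ("plaintiff landlord mailing address attorney".split(), "other"),
]
model = NaiveBayes().fit(training)

tokens = "tenant in possession of the premises located at 1234 Main St Bellingham WA".split()
window = context_window(tokens, 8, 13)  # the address occupies token indices 8 to 12
print(model.predict(window))
```

Working in log space avoids numerical underflow when the window contains many words, and the smoothing term keeps a single unseen word from zeroing out a class.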
The accuracy of the full address extraction for about 40,000 cases is approximately 75%. The main issues are a) OCR errors that introduce wrong names (e.g., Bellmgham instead of Bellingham, or sth street instead of 5th street), b) missing ZIP codes in documents, and c) picking the wrong address out of several addresses. The last issue usually arises when the address is handwritten in the summons (or sometimes does not exist in printed form at all), because our address correctness estimate is less robust for addresses extracted from other files.
Third, the extracted addresses are geocoded with ESRI’s ArcMap software, which uses an “address locator” database to match each input address to a set of geographic coordinates. ArcMap also assigns each input address an “Address Type”, which specifies the point of reference that ArcMap uses to determine the coordinates. Several “Address Type” categories reflect administrative units: “Admin” (state), “SubAdmin” (county), “Postal” (ZIP code), and “Locality” (municipality). The category indicates address-matching accuracy, because for these categories the coordinates are the centroid of an administrative jurisdiction. If an address is assigned an “Admin” match type, for example, the coordinates provided by ArcMap represent the geographic center of a state rather than the precise address location. Similarly, “SubAdmin”, “Postal”, and “Locality” matches represent the centroids of counties, ZIP codes, and municipalities, respectively. We therefore count an address match as successful only if its type is “Street Address” or “Point Address”, which indicates that the coordinates closely reflect the actual address location.
With this metric for a successful address match, we carry out geocoding in three steps. First, we geocode all extracted addresses in ArcMap. Second, we take any addresses that are not successfully matched and standardize their format using Google’s geocoding API. Finally, these standardized addresses are geocoded once again in ArcMap. Any addresses without a successful match after these two rounds of geocoding are discarded. The first round of geocoding had a match success rate of approximately 60%. After the Google standardization, the overall success rate increased to 93.6%. Tract FIPS codes were then spatially joined to each case with a successfully matched address, allowing us to estimate the sex and race of individuals using their name and location.
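The control flow of the two geocoding rounds can be sketched as below. The `geocode` and `standardize` arguments are hypothetical stand-ins for the ArcMap and Google steps, and the dictionary-based fakes exist only to make the example runnable; none of this reflects the actual interfaces of those tools.

```python
# Match types counted as successful, written as they appear in the text above.
SUCCESS_TYPES = {"Street Address", "Point Address"}


def is_successful(match):
    """True if a geocoding result points at the address, not a centroid."""
    return match is not None and match.get("address_type") in SUCCESS_TYPES


def geocode_two_rounds(addresses, geocode, standardize):
    """Geocode, standardize the failures, geocode again, discard the rest."""
    matched, retry, discarded = {}, [], []

    # Round 1: geocode every extracted address as-is.
    for addr in addresses:
        result = geocode(addr)
        if is_successful(result):
            matched[addr] = result
        else:
            retry.append(addr)

    # Round 2: standardize the format of failed addresses, then geocode again.
    for addr in retry:
        cleaned = standardize(addr)
        result = geocode(cleaned) if cleaned else None
        if is_successful(result):
            matched[addr] = result
        else:
            discarded.append(addr)

    return matched, discarded


# Hypothetical stand-ins for the geocoder and the standardization service.
fake_locator = {
    "100 Main St, Tacoma, WA": {"address_type": "Point Address"},
    "200 Pine Ave, Everett, WA": {"address_type": "Street Address"},
    "Tacoma, WA": {"address_type": "Locality"},
}
fake_cleanup = {"200 pine avenue everett": "200 Pine Ave, Everett, WA"}

matched, discarded = geocode_two_rounds(
    ["100 Main St, Tacoma, WA", "200 pine avenue everett", "Tacoma, WA"],
    geocode=fake_locator.get,
    standardize=fake_cleanup.get,
)
print(sorted(matched), discarded)
```

In this toy run the “Locality” result is treated as a failure in both rounds, mirroring the rule that centroid matches do not count.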
Using these addresses, race is estimated through a Bayesian prediction model based on surname and geolocation. This ecological inference method, developed by Imai and Khanna,[25] uses Bayes’ rule to combine the racial distribution of frequently occurring surnames in Census name data with the racial composition of the neighborhood (tract) where the evicted defendant lived. Using these two pieces of information, we compute the predicted probability of each racial category (White, Black, Latinx, Asian, or other) for any given individual. For example, a person with the last name Jackson, a common Black surname, living in a high-Black neighborhood would have a higher likelihood of being Black, whereas the same name found in a high-White neighborhood would have a lower probability of being Black. Neighborhood racial composition is defined by the 2010 Decennial Census tract geography. Sex is inferred by cross-referencing the first name of the individual with the Social Security Administration (SSA) Name Registry from 1932 to 2012 and the US Census Integrated Public Use Microdata Series (IPUMS).
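The Bayes’ rule combination of surname and tract can be sketched as follows, assuming that surname and tract are independent given race, so that P(race | surname, tract) is proportional to P(tract | race) × P(race | surname). The function name and every probability below are hypothetical illustration values, not Census figures, and the sketch is not the implementation used in this study.

```python
def race_posterior(p_race_given_surname, p_tract_given_race):
    """Posterior over race categories given a surname and a tract.

    Assumes surname and tract are independent given race, so the posterior is
    proportional to P(tract | race) * P(race | surname).
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_tract_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    return {race: value / total for race, value in unnormalized.items()}


# Made-up P(race | surname) for a hypothetical surname.
surname_probs = {"White": 0.40, "Black": 0.50, "Latinx": 0.04, "Asian": 0.01, "Other": 0.05}

# Made-up P(tract | race): the share of each group's population living in the tract.
tract_high_black = {"White": 0.001, "Black": 0.010, "Latinx": 0.002, "Asian": 0.001, "Other": 0.002}
tract_high_white = {"White": 0.010, "Black": 0.001, "Latinx": 0.002, "Asian": 0.001, "Other": 0.002}

posterior_a = race_posterior(surname_probs, tract_high_black)
posterior_b = race_posterior(surname_probs, tract_high_white)
print(round(posterior_a["Black"], 3), round(posterior_b["Black"], 3))
```

The same surname yields a much higher predicted probability of being Black in the first tract than in the second, which is the behavior described in the Jackson example.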
[24] Currently, we have completed the download and processing of data for Pierce, Snohomish, and Whatcom counties and for part of King County (i.e., the city of Seattle). Access to these data is costly and requires establishing a relationship with each county clerk in the state.
[25] Imai, Kosuke, and Kabir Khanna. “Improving ecological inference by predicting individual ethnicity from voter registration records.” Political Analysis 24, no. 2 (2016): 263-272.