P2-243 Geospatially Explicit Synthetic Poultry Data: Filling in Data Gaps with Synthetic Populations

Monday, July 27, 2015
Exhibit Hall (Oregon Convention Center)
William Wheaton, RTI International, Research Triangle Park, NC
Mark Bruhn, RTI International, Research Triangle Park, NC
James Rineer
Barbara Kowalcyk, RTI International, Research Triangle Park, NC
Introduction: Agent-based models are an important tool for analyzing the spread of infectious animal disease outbreaks and evaluating potential control interventions. However, these models require geospatial information that is often unavailable in the public domain. Synthetic populations are an increasingly powerful source of data that can fill this gap.

Purpose: A synthetic poultry farm database was created for use in infectious disease models and evaluated for accuracy. The results of that evaluation are described here.

Methods: The database was developed using Census of Agriculture county data along with spatial data on land use/land cover, transportation, elevation/slope, and hydrography. These inputs were used to generate farm probability maps and to create spatial features representing farms with appropriate characteristics such as farm type and size. Farm locations were generated in places where the suitability layer indicates a high probability of a poultry farm. Farm counts, sizes, and types were derived from the Census of Agriculture counts by county.
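
The placement step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the suitability layer is available as a small grid of non-negative scores for one county, and that farms are drawn with probability proportional to those scores. The function name `place_farms`, its parameters, and the example grid are hypothetical.

```python
import random


def place_farms(suitability, n_farms, seed=0):
    """Draw n_farms grid cells with probability proportional to suitability.

    suitability: list of rows of non-negative scores (hypothetical county grid).
    n_farms: number of farms to place (e.g., a county-level census count).
    Returns a list of (row, col) cell indices; cells may repeat.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    cells = [(r, c) for r, row in enumerate(suitability) for c in range(len(row))]
    weights = [suitability[r][c] for r, c in cells]
    if sum(weights) <= 0:
        raise ValueError("suitability grid has no suitable cells")
    # Weighted sampling with replacement: cells scored 0 are never chosen.
    return rng.choices(cells, weights=weights, k=n_farms)


# Hypothetical 2 x 3 grid: higher score = more suitable for a poultry farm.
grid = [
    [0.0, 0.6, 0.9],
    [0.1, 0.0, 0.4],
]
farms = place_farms(grid, n_farms=25, seed=42)
print(len(farms), "farms placed")
```

A real pipeline would convert the sampled cells to latitude/longitude coordinates and attach farm type and size; those steps are omitted here.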

Results: The U.S. synthetic poultry database contains 145,903 commercial poultry farms. The dataset is geospatial, containing an estimated latitude/longitude coordinate for each farm as well as farm characteristics. Locations from the synthetic farm suitability layer (i.e., places deemed to be suitable locations for poultry farms) were compared to random placement of farms in suitable places using an agreement matrix statistical methodology. Total accuracy is 78.8% (95% CI: 74.8%, 82.2%), suggesting that the synthetic location model matches likely poultry farm locations well.
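
For readers unfamiliar with agreement matrices, the sketch below shows how a total accuracy and a confidence interval can be computed from one. The abstract does not say how its interval was derived; the Wilson score interval is used here as an assumption, and the counts in the example are invented for illustration, not the study's data. The function name `overall_accuracy` is hypothetical.

```python
import math


def overall_accuracy(matrix, z=1.96):
    """Return (accuracy, lower, upper) for a square agreement matrix.

    matrix[i][j] counts sites assigned class i by the model and class j by
    the reference data. Accuracy is the diagonal total over the grand total.
    The interval is a Wilson score interval (an assumption, see lead-in).
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return p, centre - half, centre + half


# Invented counts: rows = model (farm / no farm), columns = reference.
example = [
    [40, 10],
    [10, 40],
]
acc, lower, upper = overall_accuracy(example)
print(f"accuracy = {acc:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```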

Significance: Realistic, geospatially explicit, statistically accurate synthetic data provide a source of open, readily available, de-identified data for use in a wide variety of analyses relevant to food safety. These datasets are more reliable than those created using random placement.