Purpose: A synthetic poultry farm database was created for use in infectious disease models, and its accuracy was evaluated. The database and the results of that evaluation are described here.
Methods: The database was developed from Census of Agriculture county data combined with spatial data on land use/land cover, transportation, elevation/slope, and hydrography. These inputs were used to generate farm probability maps (the suitability layer) and to create spatial features representing farms with appropriate characteristics, such as farm type and size. Farm locations are generated where the suitability layer indicates a high probability of a poultry farm. Farm counts, sizes, and types are derived from the county-level Census of Agriculture counts.
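The placement step can be illustrated with a minimal Python sketch. It assumes the suitability layer is a raster of probabilities in [0, 1], that each raster cell is tagged with a county identifier, and that "high probability" means a value at or above a chosen threshold. The function name `place_farms`, the 0.5 threshold, and the use of Efraimidis-Spirakis weighted sampling without replacement are illustrative assumptions, not the authors' documented procedure.

```python
import random
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, col) index into the suitability raster


def place_farms(
    suitability: List[List[float]],
    county_of_cell: List[List[str]],
    farms_per_county: Dict[str, int],
    threshold: float = 0.5,  # hypothetical cut-off for "high probability"
    seed: int = 0,
) -> Dict[str, List[Cell]]:
    """Choose farm cells per county, favoring high-suitability cells.

    Uses weighted sampling without replacement (Efraimidis-Spirakis):
    each eligible cell gets the key u ** (1 / w), with u ~ Uniform(0, 1)
    and w its suitability; the k cells with the largest keys are kept.
    If a county has fewer eligible cells than k, all of them are returned.
    """
    rng = random.Random(seed)

    # Group eligible (at or above threshold, strictly positive) cells by county.
    candidates: Dict[str, List[Tuple[float, Cell]]] = {}
    for r, row in enumerate(suitability):
        for c, w in enumerate(row):
            if w >= threshold and w > 0:
                county = county_of_cell[r][c]
                candidates.setdefault(county, []).append((w, (r, c)))

    placed: Dict[str, List[Cell]] = {}
    for county, k in farms_per_county.items():
        # k plays the role of the county farm count from the census.
        keyed = [
            (rng.random() ** (1.0 / w), cell)
            for w, cell in candidates.get(county, [])
        ]
        keyed.sort(reverse=True)  # largest keys first
        placed[county] = [cell for _, cell in keyed[:k]]
    return placed


# Toy 2 x 2 raster: county A occupies the top row, county B the bottom row.
suit = [[0.9, 0.1], [0.6, 0.8]]
counties = [["A", "A"], ["B", "B"]]
example = place_farms(suit, counties, {"A": 1, "B": 2})
print(example)
```

In this toy raster, county A has only one cell above the threshold, so it receives that cell, while both of county B's cells qualify and are both selected. Weighting by suitability, rather than sampling uniformly among eligible cells, makes higher-probability cells more likely to receive a farm whenever a county's farm count is smaller than its number of eligible cells.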
Results: The U.S. synthetic poultry database contains 145,903 commercial poultry farms. The dataset is geospatial: each farm has an estimated latitude/longitude coordinate along with its farm characteristics. Locations in the synthetic farm suitability layer (i.e., places deemed suitable for poultry farms) were compared with random placement of farms in suitable places using an agreement matrix. Total accuracy is 78.8% (95% CI: 74.8%, 82.2%), suggesting that the synthetic location model matches likely poultry farm locations well.
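Total (overall) accuracy from an agreement matrix is the share of all cases that fall on its diagonal. The minimal Python sketch below shows that calculation. The matrix counts are hypothetical, not data from this study, and because the method used for the 95% confidence interval is not stated here, the Wilson score interval for a binomial proportion is an assumed, commonly used choice.

```python
import math
from typing import List, Tuple


def overall_accuracy(matrix: List[List[int]]) -> float:
    """Proportion of cases on the diagonal of a square agreement matrix."""
    total = sum(sum(row) for row in matrix)
    agree = sum(matrix[i][i] for i in range(len(matrix)))
    return agree / total


def wilson_ci(p_hat: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical 2 x 2 agreement matrix: rows = reference, columns = model.
matrix = [[40, 10], [5, 45]]
acc = overall_accuracy(matrix)  # 0.85
n = sum(sum(row) for row in matrix)
lo, hi = wilson_ci(acc, n)
print(f"Total accuracy: {acc:.1%} (95% CI: {lo:.1%}, {hi:.1%})")
```

The Wilson interval is generally preferred over the simple normal (Wald) approximation because it stays within [0, 1] and is not forced to be symmetric around the point estimate.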
Significance: Realistic, geospatially explicit, statistically accurate synthetic data provide an open, readily available, de-identified data source for a wide variety of analyses relevant to food safety. These datasets are more reliable than those created using random placement.