START team will debut new data integration tools at annual APSA conference

May 12, 2017 | Jessica Rivinius

On Wednesday, Aug. 30, a team of START researchers will debut two innovative software tools, geomerge and MELTT, that implement best practices for spatial data integration. The team will also teach a short course that introduces participants to the tools through hands-on exercises using illustrative datasets and code. Part of the American Political Science Association (APSA) Annual Meeting short course series, the course will give participants an overview of the foundations of spatial data integration, including key conceptual and technical challenges, followed by specific applications.

The geomerge package provides a standardized set of methods for integrating geographic data into Geographic Information Systems (GIS) layers that can be used for any style of quantitative and spatial statistical analysis. It accepts the R spatial classes SpatialPolygonsDataFrame, SpatialPointsDataFrame, SpatialLinesDataFrame and RasterLayer, and conducts spatial joins of variable attributes in a projected coordinate system. Users can specify aggregation over time to produce panel (time-series) data, or run geomerge in a static (cross-sectional) mode. In doing so, geomerge contributes to efforts within the research community toward standardization, transparency and reproducibility.
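For readers unfamiliar with spatial joins, the idea of aggregating point data to polygons in either a panel or a static mode can be sketched in a few lines of Python. This is only an illustration of the concept under simplifying assumptions: it is not geomerge's interface or implementation, the function names and the toy "districts" are invented for this example, and the polygons are assumed to be simple, non-overlapping rings in a projected (planar) coordinate system.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices in projected units."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray cast to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points(polygons, points, by_period=True):
    """Count points per polygon, per period (panel) or overall (static)."""
    counts = Counter()
    for x, y, period in points:
        for name, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                key = (name, period) if by_period else name
                counts[key] += 1
                break  # assumption: polygons do not overlap
    return counts


# Invented example: two unit-square "districts" and three dated points.
polygons = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
points = [(0.5, 0.5, 2015), (0.2, 0.8, 2016), (1.5, 0.5, 2016)]

panel = count_points(polygons, points)                    # keyed by (district, year)
static = count_points(polygons, points, by_period=False)  # keyed by district only
```

The panel result keeps one count per district and year, while the static result collapses the time dimension into a single cross-section.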

The forthcoming R package MELTT (Matching Event data by Location, Time, and Type) provides an automated, transparent and reproducible process for integrating and disambiguating multiple event datasets. MELTT offers an efficient, effective and versatile alternative to integrating event data by hand, a task that is often opaque and, for large datasets, impractical.

MELTT formalizes an innovative procedure for integrating event data that is (1) automated, to make it feasible to integrate large datasets that would be too resource intensive to combine by hand; (2) transparent, written in the form of a protocol and program, to clearly communicate how integration is undertaken and to guarantee reproducibility; and (3) adaptable, to accommodate any choice of datasets and parameters of comparison. MELTT is an iterative procedure that systematically compares all cases from multiple event datasets that fall within the same spatio-temporal window, then examines other available information about the identified cases to evaluate whether they are indeed matches.
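The general idea of matching within a spatio-temporal window can be illustrated with a short Python sketch. It is not MELTT's actual algorithm or interface: the `Event` record, the `candidate_matches` function, the distance and day thresholds, and the sample records are all assumptions invented for this example. The comparison on event type stands in for the "other available information" described above.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Event:
    source: str  # which dataset the record came from
    day: date
    lat: float
    lon: float
    kind: str    # coarse event type, e.g. "protest"


def haversine_km(a, b):
    """Great-circle distance between two events in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def candidate_matches(left, right, max_km=5.0, max_days=1):
    """Pair events that share a spatio-temporal window and the same type."""
    pairs = []
    for a in left:
        for b in right:
            close_in_time = abs((a.day - b.day).days) <= max_days
            close_in_space = haversine_km(a, b) <= max_km
            # Secondary check: only keep pairs whose types also agree.
            if close_in_time and close_in_space and a.kind == b.kind:
                pairs.append((a, b))
    return pairs


# Invented records for illustration only.
set_a = [Event("A", date(2017, 1, 5), 33.31, 44.36, "protest")]
set_b = [
    Event("B", date(2017, 1, 6), 33.32, 44.37, "protest"),  # same window, same type
    Event("B", date(2017, 1, 6), 33.32, 44.37, "riot"),     # same window, different type
    Event("B", date(2017, 3, 1), 33.32, 44.37, "protest"),  # outside the time window
]

pairs = candidate_matches(set_a, set_b)
```

Only the first record in `set_b` is paired with the record in `set_a`. The other two are rejected on type and on date respectively, which is the kind of decision that becomes opaque when made by hand across thousands of records.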

The START teams who developed these tools will debut them during a pre-conference online short course, “SC13: Tools & Best Practices for the Integration of Spatial Data,” offered as part of the APSA Annual Meeting at 1:30 p.m. on Aug. 30.

The course is intended for any researchers who use spatial data. It assumes a general knowledge of spatial data analysis, as well as some familiarity with GIS software and the R programming language.

The short courses are intended to provide diverse opportunities, either half day or full day, for professional development and offer attendees the chance to connect with scholars from a range of backgrounds. They are sponsored by APSA Organized Sections and other affiliated organizations.

Pre-registration for short courses is required and costs $25 per course. Participants can sign up for short courses on the Annual Meeting registration page as part of the registration process. All short course participants must also be registered for the conference.