Social scientists have begun to collect large datasets that are heterogeneous and semi-structured, but the ability to analyze such data has lagged behind its collection. We design a process to map such datasets to a numerical form, apply singular value decomposition clustering, and explore the impact of individual attributes or fields by overlaying visualizations of the clusters. This provides a new path for understanding such datasets, which we illustrate with three real-world examples: the Global Terrorism Database, which records details of every terrorist attack since 1970; a Chicago police dataset, which records details of every drug-related incident over a period of approximately a month; and a dataset describing members of a Hezbollah crime/terror network in the U.S.
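The three steps named above (numerical mapping, singular value decomposition, attribute overlay) can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy records, the field names, the one-hot encoding, the column standardization, and the helper names `encode` and `svd_embed` are all assumptions made for illustration, and only NumPy is used.

```python
import numpy as np

# Hypothetical toy records; field names and values are illustrative only
# and are not drawn from any of the datasets named in the abstract.
records = [
    {"weapon": "firearm", "region": "north", "casualties": 3},
    {"weapon": "firearm", "region": "north", "casualties": 4},
    {"weapon": "explosive", "region": "south", "casualties": 20},
    {"weapon": "explosive", "region": "south", "casualties": 18},
]


def encode(records, categorical, numeric):
    """One-hot encode categorical fields and append numeric fields as columns.

    One-hot encoding is an assumed choice; the chapter may map fields differently.
    """
    columns, names = [], []
    for field in categorical:
        for value in sorted({r[field] for r in records}):
            columns.append([1.0 if r[field] == value else 0.0 for r in records])
            names.append(f"{field}={value}")
    for field in numeric:
        columns.append([float(r[field]) for r in records])
        names.append(field)
    return np.array(columns).T, names


def svd_embed(matrix, k=2):
    """Standardize each column, then project rows onto the top-k singular directions."""
    centered = matrix - matrix.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled to avoid division by zero
    z = centered / std
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]


matrix, names = encode(records, ["weapon", "region"], ["casualties"])
coords = svd_embed(matrix, k=2)

# Overlay step: pair each record's low-dimensional position with one attribute,
# so that clusters can be inspected (or colored in a plot) by that attribute.
for label, point in zip((r["weapon"] for r in records), coords):
    print(label, np.round(point, 2))
```

In a sketch like this, records that share most field values land close together in the projected space, and repeating the overlay with a different attribute shows how that attribute lines up with the same clusters.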
Publication Information
Skillicorn, D. B., and Christian Leuprecht. 2018. "Clustering Heterogeneous Semi-structured Social Science Datasets for Security Applications." In Security by Design, ed. Anthony J. Masys. Cham, Switzerland: Springer. https://link.springer.com/chapter/10.1007/978-3-319-78021-4_9