This talk concerns the development of three in-use systems for publishing and studying Finnish 20th-century war history: WarSampo, WarVictimSampo 1914-1922, and WarMemoirSampo. Each system comprises a Linked Open Data (LOD) service with a SPARQL endpoint on the Linked Data Finland platform (https://ldf.fi), and an in-use semantic portal on top of it. The systems are based on the so-called Sampo Model and are part of the larger Sampo series of systems (https://seco.cs.aalto.fi/applications/sampo/).
The idea in Sampo systems is to aggregate and enrich heterogeneous, distributed datasets into harmonized knowledge graphs, based on a shared ontology infrastructure. The resulting services are used both for data analyses in Digital Humanities with tools such as Google Colab and Jupyter Notebooks, and for developing ready-to-use applications such as the Sampo portals, where faceted search and browsing are integrated seamlessly with data-analytic tools.
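As a hedged illustration of the notebook-based workflow above, the minimal Python sketch below queries a SPARQL endpoint on the Linked Data Finland platform using the standard SPARQL 1.1 Protocol over HTTP. The endpoint URL (https://ldf.fi/warsa/sparql, assumed here to serve the WarSampo dataset) and the generic class-counting query are assumptions for the example, not a prescribed API.

```python
import requests

# Assumed WarSampo SPARQL endpoint on the Linked Data Finland platform;
# any other ldf.fi dataset endpoint would work the same way.
ENDPOINT = "https://ldf.fi/warsa/sparql"

# A deliberately generic query: list the classes used in the
# knowledge graph and how many instances each one has.
QUERY = """
SELECT ?class (COUNT(?s) AS ?instances)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?instances)
LIMIT 10
"""

# Standard SPARQL 1.1 Protocol request over HTTP GET.
response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
)
response.raise_for_status()

for row in response.json()["results"]["bindings"]:
    print(row["class"]["value"], row["instances"]["value"])
```

The same pattern runs unchanged in Google Colab or a local Jupyter Notebook, so the LOD service can feed further analysis with, e.g., pandas.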
WarSampo aggregates data about the Second World War (WW2) in Finland from some 20 data sources and several collaborating organizations. The core dataset includes all 95 000 death records of the fallen Finnish soldiers from the National Archives. A key innovation of WarSampo is the automatic re-assembly of the soldiers' life stories by linking data about them across the datasets. The portal has had over a million distinct users, typically trying to find information about their lost relatives. The data has also been used for data analyses. WarSampo won the international LODLAM Open Data Prize in 2017.
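At the core of the life-story idea is record linkage: records referring to the same soldier in different datasets must be identified and merged. The sketch below is a heavily simplified, hypothetical illustration of that principle, grouping records by a normalized name and birth date; the actual WarSampo linking pipeline is more elaborate and builds on the shared ontology infrastructure.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Record:
    source: str       # e.g. "death_records", "war_diaries" (hypothetical names)
    name: str
    birth_date: str   # ISO date, e.g. "1920-05-17"

def link_key(record: Record) -> tuple[str, str]:
    """Blocking key: normalized name plus birth date. A real pipeline
    would also compare ranks, units, places, and event dates."""
    normalized = " ".join(record.name.lower().split())
    return (normalized, record.birth_date)

def link_records(records: list[Record]) -> dict[tuple[str, str], list[Record]]:
    """Group records assumed to describe the same person."""
    groups: defaultdict[tuple[str, str], list[Record]] = defaultdict(list)
    for record in records:
        groups[link_key(record)].append(record)
    return dict(groups)

# Toy data only; not real WarSampo records.
records = [
    Record("death_records", "Matti  Virtanen", "1920-05-17"),
    Record("war_diaries", "matti virtanen", "1920-05-17"),
    Record("death_records", "Juho Korhonen", "1918-01-02"),
]
for key, group in link_records(records).items():
    print(key, "->", [r.source for r in group])
```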
WarVictimSampo 1914-1922 is a smaller related system based on the death records and battles of the Civil War in Finland and the Kindred Wars during 1914-1922.
WarMemoirSampo demonstrates the idea of publishing and watching videos on the Semantic Web, with a focus on the memoirs of WW2 veterans. The system enables scene segments of videos to be searched by their semantic content, and while a video is being watched, additional contextual information is provided dynamically. The system is based on the WarSampo infrastructure and on a knowledge graph extracted automatically from timestamped natural language descriptions of the video contents.
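As a hedged sketch of how such a graph could be derived, the hypothetical code below turns timestamped free-text description lines into scene segments with start and end times, ready for subsequent entity linking; the input format and the segment structure are assumptions for illustration, not the actual WarMemoirSampo extraction pipeline.

```python
import re
from dataclasses import dataclass

@dataclass
class Segment:
    start: int        # seconds from the beginning of the video
    end: int | None   # None for the last segment (runs to the end)
    description: str  # free text to be annotated with ontology concepts

# Hypothetical input format: "MM:SS description of the scene".
TIMESTAMP = re.compile(r"^(\d+):(\d{2})\s+(.*)$")

def parse_segments(lines: list[str]) -> list[Segment]:
    """Turn timestamped description lines into scene segments.
    Each segment ends where the next one begins."""
    segments: list[Segment] = []
    for line in lines:
        match = TIMESTAMP.match(line.strip())
        if not match:
            continue
        minutes, seconds, text = match.groups()
        start = int(minutes) * 60 + int(seconds)
        if segments:
            segments[-1].end = start
        segments.append(Segment(start=start, end=None, description=text))
    return segments

# Toy example; real descriptions would mention people, places, and events
# that are then linked to the WarSampo knowledge graph.
lines = [
    "0:00 The veteran introduces himself and his unit.",
    "2:15 Memories of the winter of 1939 at the front.",
    "7:40 Reflections on returning home after the war.",
]
for segment in parse_segments(lines):
    print(segment.start, segment.end, segment.description)
```

Segments of this kind, once annotated with ontology concepts, are what make scene-level semantic search and dynamic contextual recommendations possible during playback.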