The third Swiss Open Cultural Data Hackathon took place on 15-16 september 2017 at the University of Lausanne. It was organised by the OpenGLAM.ch working group in cooperation with the Cantonal and University Library of Lausanne, infoclio.ch and other partners.
A complete list of Hackathons projects as well as a list of available datasets can be found on the Open Glam Hackathon website.
Les Archives photographiques de la Société des Nations (SDN) — ancêtre de l'actuelle Organisation des Nations Unies (ONU) — consistent en une collection de plusieurs milliers de photographies des assemblées, délégations, commissions, secrétariats ainsi qu'une série de portraits de diplomates. Si ces photographies, numérisées en 2001, ont été rendues accessible sur le web par l'Université de l'Indiana, celles-ci sont dépourvues de métadonnées et sont difficilement exploitables pour la recherche.
Notre projet est de valoriser cette collection en créant une infrastructure qui soit capable de détecter les visages des individus présents sur les photographies, de les classer par similarité et d'offrir une interface qui permette aux historiens de valider leur identification et de leur ajouter des métadonnées.
Le projet s'est déroulé sur deux sessions de travail, en mai (Geneva Open Libraries) et en septembre 2017 (3rd Swiss Open Cultural Data Hackathon), séparées ci-dessous.
Université de Lausanne | United Nations Archives | EPITECH Lyon | Université de Genève |
---|---|---|---|
Martin Grandjean | Blandine Blukacz-Louisfert | Gregoire Lodi | Samuel Freitas |
Colin Wells | Louis Schneider | ||
Adrien Bayles | |||
Sifdine Haddou |
Dans le cadre de la troisième édition du Swiss Open Cultural Data Hackathon, l’équipe qui s’était constituée lors du pre-event de Genève s’est retrouvée à l’Université de Lausanne les 15 et 16 septembre 2017 dans le but de réactualiser le projet et poursuivre son développement.
Les discussions de la matinée se sont concentrées sur les stratégies de conception d’un système permettant de relier les images aux métadonnées, et de la pertinence des informations retenues et visibles directement depuis la plateforme. La question des droits reposant sur les photographies de la Société des Nations n’étant pas clairement résolue, il a été décidé de concevoir une plateforme pouvant servir plus largement à d’autres banques d’images de nature similaire.
Découverte : Wikimedia Commons dispose de son propre outil d'annotation : ImageAnnotator. Voir exemple ci-contre.
Organisation
https://github.com/PictureAnnotation
Repositories
https://github.com/PictureAnnotation/Annotation
https://github.com/PictureAnnotation/Annotation-API
Images en ligne sur Wikimedia Commons avec identification basique pour tests :
A (1939) https://commons.wikimedia.org/wiki/File:League_of_Nations_Commission_075.tif
B (1924-1927) https://commons.wikimedia.org/wiki/File:League_of_Nations_Commission_067.tif
Université de Lausanne | United Nations Archives | EPITECH Lyon | Archives d'Etat de Genève |
---|---|---|---|
Martin Grandjean martin.grandjean@unil.ch | Blandine Blukacz-Louisfert bblukacz-louisfert@unog.ch | Adam Krim adam.krim@epitech.eu | Anouk Dunant Gonzenbach anouk.dunant-gonzenbach@etat.ge.ch |
Colin Wells cwells@unog.ch | Louis Schneider louis.schneider@epitech.eu | ||
Maria Jose Lloret mjlloret@unog.ch | Adrien Bayles adrien.bayles@epitech.eu | ||
Paul Varé paul.vare@epitech.eu |
Ce projet fait partie du Geneva Open Libraries Hackathon.
Lancement du hackathon Geneva Open Libraries à la Bibliothèque de l'ONU (présentation du week-end, pitch d'idées de projets, …)
Premières idées de projets:
- Site avec tags collaboratifs pour permettre l'identification des personnes sur des photos d'archives.
- Identification des personnages sur des photos d'archives de manière automatisée.
→ Identifier automatiquement toutes les photos où se situe la même personne et permettre l'édition manuelle de tags qui s'appliqueront sur toutes les photos du personnage (plus besoin d'identifier photo par photo les personnages photographiés).
Idéalisation du projet: que peut-on faire de plus pour que le projet ne soit pas qu'un simple plugin d'identification ? Que peut-on apporter de novateur dans la recherche collaborative ? Que peut-on faire de plus que Wikipédia ?
Travailler sur la photo, la manière dont les données sont montrées à l'utilisateur, etc…
Problématique de notre projet: permettre une collaboration sur l'identification de photos d'archives avec une partie automatisée et une partie communautaire et manuelle.
Analyser les photos → Identifier les personnages → Afficher la photo sur un site avec tous les personnages marqués ainsi que tous les liens et notes en rapports.
Utilisateur → Création de tags sur la photographie (objets, scènes, liens historiques, etc..) → Approbation de la communauté de l'exactitude des tags proposés.
Travail en cours sur le P.O.C.:
- Front du site: partie graphique du site, survol des éléments…
- Prototype de reconnaissance faciale: quelques défauts à corriger, exportation des visages…
- Poster du projet
Le projet ayant été sélectionné pour représenter le hackathon Geneva Open Libraries lors de la cérémonie de clôture de l'Open Geneva Hackathons (un projet pour chacun des hackathons se tenant à Genève ce week-end), il est présenté sur la scène du Campus Biotech.
Version PDF grande taille disponible ici. Version PNG pour le web ci-dessous.
Version PNG
Joint project bringing together three separate projects: Rilke correspondance, Jung correspondance and ETH Library.
Objectives:
ACTUAL INPUT DATA
Comment: The Rilke data is cleaner than the Jung data. Some cleaning needed to make them match:
1) separate sender and receiver; clean up and cluster (OpenRefine)
2) clean up dates and put in a format that IT developpers need (Perl)
3) clean up placenames and match to geolocators (Dariah-DE)
4) match senders and receivers to Wikidata where possible (Openrefine, problem with volume)
METADATA STRUCTURE
The follwing fields were included in the common basic data structure:
sysID; callNo; titel; sender; senderID; recipient; recipientID; place; placeLat; placeLong; datefrom, dateto; language
DATA CLEANSING AND ENRICHMENT
* Description of steps, and issues, in Process (please correct and refine).
Issues with the Jung correspondence is data structure. Sender and recipient in one column.
Also dates need both cleaning for consistency (e.g. removal of “ca.”) and transformation to meet developper specs. (Basil using Perl scripts)
For geocoding the placenames: OpenRefine was used for the normalization of the placenames and DARIAH GeoBrowser for the actual geocoding (there were some issues with handling large files). Tests with OpenRefine in combination with Open Street View were done as well.
The C.G. Jung dataset contains sending locations information for 16,619 out of 32,127 letters; 10,271 places were georeferenced. In the Rilke dataset all the sending location were georeferenced.
For matching senders and recipients to Wikidata Q-codes, OpenRefine was used. Issues encountered with large files and with recovering Q codes after successful matching, as well as need of scholarly expertise to ID people without clear identification. Specialist knowledge needed. Wikidata Q codes that Openrefine linked to seem to have disappeared? Instructions on how to add the Q codes are here https://github.com/OpenRefine/OpenRefine/wiki/reconciliation.
Doing this all at once poses some project management challenges, since several people may be working on same files to clean different data. Need to integrate all files.
DATA after cleaning:
https://github.com/basimar/hackathon17_jungrilke
DATABASE
Issues with the target database:
Fields defined, SQL databases and visuablisation program being evaluated.
How - and whether - to integrate with WIkidata still not clear.
Issues: letters are too detailed to be imported as Wikidata items, although it looks like the senders and recipients have the notability and networks to make it worthwhile. Trying to keep options open.
As IT guys are building the database to be used with the visualization tool, data is being cleaned and Q codes are being extracted.
They took the cleaned CVS files, converted to SQL, then JSON.
Additional issues encountered:
- Visualization: three tools are being tested: 1) Paladio (Stanford) concerns about limits on large files? 2) Viseyes and 3) Gephi.
- Ensuring that the files from different projects respect same structure in final, cleaned-up versions.
Heatmap of Rainer Maria Rilke’s correspondence (visualized with Google Fusion Tables)
Correspondence from and to C. G. Jung visualized as a network. The two large nodes are Carl Gustav Jung (below) and his secretary’s office (above). Visualized with the tool Gephi
The goal of the project is to try to ingest all performances of the Schauspielhaus Theater in Zurich held between 1938 and 1968 in Wikidata. In a further step, data from the www.performing-arts.eu Platform, Swiss Theatre Collection and other data providers could be ingested as well.
Step 3 (you can get the Q number from there)
This projects aims at providing a directory of Swiss Video Games and metadata about them.
The directory is a platform to display and promote Swiss Games for publishers, journalists, politicians or potential buyers, as well as a database aimed at the game developers community and scientific researchers.
Our work is the continuation of a project initiated at the 1st Open Data Hackathon in 2015 in Bern by David Stark.
The list was created on Aug. 11 2014 by David Javet, game designer and PhD student at UNIL. He then opened it to the Swiss game dev community which collaboratively fed it. At the start of this hackathon, the list was composed of 241 games, starting from 2004. It was turned into an open data set on the opendata.swiss portal by Oleg Lavrovsky.
We try to analyse bibliographical data using big data technology (flink, elasticsearch, metafacture).
Here a first sketch of what we're aiming at:
We use bibliographical metadata:
Swissbib bibliographical data https://www.swissbib.ch/
crossref https://www.crossref.org/
Librarian:
- For prioritizing which of our holdings should be digitized most urgently, I want to know which of our holdings are nowhere else to be found.
- We would like to have a list of all the DVDs in swissbib.
- What is special about the holdings of some library/institution? Profile?
Data analyst:
- I want to get to know better my data. And be faster.
→ e.g. I want to know which records don‘t have any entry for ‚year of publication‘. I want to analyze, if these records should be sent through the merging process of CBS. Therefore I also want to know, if these records contain other ‚relevant‘ fields, defined by CBS (e.g. ISBN, etc.). To analyze the results, a visualization tool might be useful.
Goal: Enrichment. I want to add missing identifiers (e.g. DOIs, ORCID, funder IDs) to the edoc dataset.
→ Match the two datasets by author and title
→ Quality of the matches? (score)
elasticsearch https://www.elastic.co/de/
JAVA based search engine, results exported in JSON
Flink https://flink.apache.org/
open-source stream processing framework
Metafacture https://culturegraph.github.io/,
https://github.com/dataramblers/hackathon17/wiki#metafacture
Tool suite for metadata-processing and transformation
Zeppelin https://zeppelin.apache.org/
Visualisation of the results
Data Ramblers Project Wiki https://github.com/dataramblers/hackathon17/wiki
Finding connections and pathways between book and object collections of the University Institute for History of Medecine and Public Heatlh (Institute of Humanities in Medicine since 2018) of the CHUV.
The project started off with data sets concerning two collections: book collection and object collection, both held by the University Institute for History of Medicine and Public Health in Lausanne. They were metadata of the book collection, and metadata plus photographs of object collection. These collections are inventoried in two different databases, the first one accessible online for patrons and the other not.
The idea was therefore to find a way to offer a glimpse into the objet collection to a broad audience as well as to highlight the areas of convergence between the two collections and thus to enhance the patrimony held by our institution.
Juxtaposing the library classification and the list of categories used to describe the object collection, we have established a table of concordance. The table allowed us to find corresponding sets of items and to develop a prototype of a tool that allows presenting them conjointly: https://medicalhistorycollection.github.io/glam2017/.
Finally, we’ve seized the opportunity and uploaded photographs of about 100 objects on Wikimedia: https://commons.wikimedia.org/wiki/Category:Institut_universitaire_d%27histoire_de_la_m%C3%A9decine_et_de_la_sant%C3%A9_publique.
https://github.com/MedicalHistoryCollection/glam2017/tree/master/data
Christkatholische Landeskirche der Schweiz: historische Dokumente
Description
Der sog. “Kulturkampf” (1870-1886) (Auseinandersetzung zwischen dem modernen liberalen Rechtsstaat und der römisch-katholischen Kirche, die die verfassungsmässig garantierte Glaubens- und Gewissensfreiheit so nicht akzeptieren wollte) wurde in der Schweiz besonders heftig ausgefochten.
Ausgewählte Dokumente in den Archiven der christkatholischen Landeskirche bilden diese interessante Phase zwischen 1870 und 1886/1900 exemplarisch ab. Als lokale Fallstudie (eine Kirchgemeinde wechselt von der römisch-katholischen zur christkatholischen Konfession) werden in der Kollektion die Protokolle der Kirchgemeinde Aarau (1868-1890) gemeinfrei publiziert (CC-BY Lizenz). Dazu werden die digitalisierten Zeitschriften seit 1873 aus der Westschweiz publiziert. Die entsprechenden Dokumente wurden von den Archivträgern (Eigner) zur gemeinfreien Nutzung offiziell genehmigt und freigegeben. Allfällige Urheberrechte sind abgelaufen (70 Jahre) mit Ausnahme von wenigen kirchlichen Zeitschriften, die aber sowieso Öffentlichkeitscharakter haben.
Zielpublikum sind Historiker und Theologen sowie andere Interessierte aus Bildungsinstitutionen. Diese OpenData Kollektion soll andere christkatholische Gemeinden ermutigen weitere Quellen aus der Zeit des Kulturkampfes zu digitalisieren und zugänglich zu machen.
Overview
Bestände deutsche Schweiz :
• Kirchgemeinde Aarau
Fonds Suisse Romande:
• Journaux 1873-2016
• Canton de Neuchâtel
• Paroisse Catholique-Chrétienne de Genève: St.Germain (not yet published)
The data will be hosted on christkatholisch.ch; the publication date will be communicated. Prior to this the entry (national register) on opendata.swiss must be available and approved.
* Présence (2007-2016): http://www.catholique-chretien.ch/publica/archives.php
Match linked persons of the media database of the Swiss Social Archives with Wikidata.
The internet and the world wide web are often referred to as being disruptive. In fact, every new technology has a disruptive potential. 550 years ago the invention of the modern printing technology by Johannes Gutenberg in Germany (and, two decades later, by William Caxton in England) was massively disruptive. Books, carefully bound manuscripts written and copied by scribes during weeks, if not months, could suddenly be mass-produced at an incredible speed. As such the invention of moveable types, along with other basic book printing technologies, had a huge impact on science and society.
And yet, 15th century typographers were not only businessmen, they were artists as well. Early printing fonts reflect their artistic past rather than their industrial potential. The font design of 15th century types is quite obviously based on their handwritten predecessors. A new book, although produced by means of a new technology, was meant to be what books had been for centuries: precious documents, often decorated with magnificent illustrations. (Incunables – books printed before 1500 – often show a blank square in the upper left corner of a page so that illustrators could manually add artful initials after the printing process.)
Memory, also known as Match or Pairs, is a simple concentration game. Gutenberg Memory is an HTML 5 adaptation of the common tile-matching game. It can be played online and works on any device with a web browser. By concentrating on the tiles in order to remember their position the player is forced to focus on 15th (or early 16th) century typography and thus will discover the ageless elegance of the ancient letters.
Gutenberg Memory, Screenshot
Johannes Gutenberg: Biblia latina, part 2, fol. 36
Gutenberg Memory comes with 40 cards (hence 20 pairs) of syllables or letter combinations. The letters are taken from high resolution scans (>800 dpi) of late medieval book pages digitized by the university library of Basel. Given the original game with its 52 cards (26 pairs), Gutenberg Memory has been slightly simplified. Nevertheless it is rather hard to play as the player's visual orientation is constrained by the cards' typographic resemblance.
In addition, the background canvas shows a geographical map of Europe visualizing the place of printing. Basic bibliographic informations are given in the caption below, including a link to the original scan.
Click on the cards in order to turn them face up. If two of them are identical, they will remain open, otherwise they will turn face down again. A counter indicates the number of moves you have made so far. You win the game as soon as you have successfully found all the pairs. Clicking (or tapping) on the congratulations banner, the close button or the restart button in the upper right corner will reshuffle the game and proceed to a different font, the origin of which will be displayed underneath.
2017/09/15 v1.0: Prototype, basic game engine (5 fonts)
2017/09/16 v2.0: Background visualization (place of printing)
2017/09/19 v2.1: Minor fixes
Elias Kreyenbühl (left), Thomas Weibel at the «Génopode» building on the University of Lausanne campus.
This is a game about guessing and learning about geography through images and maps, made with Swisstopo's online maps of Switzerland. For several years this game was developed as open source, part of a series of GeoAdmin StoryMaps: you can try the original SwissGuesser game here, based on a dataset from the Swiss Federal Archives now hosted at Opendata.swiss.
The new version puts your orienteering of Swiss museums to the test.
Demo: OpenGuesser Demo
Encouraged by an excellent Wikidata workshop (slides) at #GLAMhack 2017, we are testing a new dataset of Swiss museums, their locations and photos, obtained via the Wikidata Linked Data endpoint (see app/data/*.sparql in the source). Visit this Wikidata Query for a preview of the first 10 results. This opens the possibility of connecting other sources, such as datasets tagged 'glam' on Opendata.swiss, and creating more custom games based on this engine.
We play-tested, revisited the data, then forked and started a refresh of the project. All libraries were updated, and we got rid of the old data loading mechanism, with the goal of connecting (later in real time) to open data sources. A bunch of improvement ideas are already proposed, and we would be glad to see more ideas and any contributions: please raise an Issue or Pull Request on GitHub if you're so inclined!
A small tool to get a quick feeling of an ontology on Wikidata.
(None, but hopefully this helps you do stuff with your data :) )