The Swiss professional portal for the historical sciences

Swiss Open Cultural Data Hackathons

Visual Exploration of Vesalius' Fabrica

Saturday, 2 July 2016

Screenshots of the prototype




Description

We are using cultural data from a rare copy of DE HUMANI CORPORIS FABRICA, combined with a quantitative and qualitative analysis of its visual content. All of this is done without reading a single page of text. In other words, we are trying to create a content-independent method of visual analysis that gives the public an overview and quick insight.

Process

work in progress
manual sketches

Clickable Prototype

Some GIFs

Data

Team

Sprichort

Saturday, 2 July 2016


sprichort is an application that lets users travel in time. Its basis is a map and historical photographs, such as Spelterini's photographs of his voyages to Egypt, across the Alps or to Russia. Each historical photograph is paired with an image of what the place looks like today, so people can see the change. The photographs are complemented with people's stories about these places and literary descriptions of them.

An important aspect is participation. Users should be able to upload their own historical photographs, and they should be able to contribute present-day photos matching the historical ones.

The user interface of the application:





Web

Data

Team

SFA Metadata (Swiss Federal Archives) at EEXCESS

Saturday, 2 July 2016

The goal of our "hack" is to reuse existing search and visualization tools on dedicated datasets provided by the Swiss Federal Archives.

The project EEXCESS (an EU-funded research project, www.eexcess.eu) has the vision of unfolding the treasure of cultural, educational and scientific long-tail content for the benefit of all users. To this end, the project has developed software components that connect databases, provide both automated (recommender engine) and active search queries, and offer a set of visualization tools to access the results.

The Swiss Federal Archives host a huge variety of digitized objects. We aim to connect that data to the EEXCESS infrastructure (using a Google Chrome extension) and thus find out whether different kinds of visualization can support intuitive access to the data (e.g. creating virtual landscapes from socio-economic data, or browsing historical photographs via timelines or maps).

Let's keep fingers crossed …:-)




Data

Team

  • Louis Gantner
  • Daniel Hess
  • Marco Majoleth
  • André Ourednik
  • Jörg Schlötterer

#GlamHackClip2016

Saturday, 2 July 2016

A short clip edited to document GLAMHack 2016, featuring short interviews with hackathon participants recorded on site, as well as additional material from the Open Cultural Data Sets made available for the hackathon.

Data

Music - Public Domain Music Recordings and Metadata / Swiss Foundation Public Domain

- Amilcare Ponchielli (1834-1886), La Gioconda, Dance of the Hours (part 2), recorded in Manchester, 29 July 1941

- Joseph Haydn, Trumpet Concerto in E flat, recorded on 19 June 1946

Aerial Photographs by Eduard Spelterini / Swiss National Library

- Eduard Spelterini, Basel between 1893 and 1923. See picture.

- Eduard Spelterini, Basel between 1893 and 1902. See picture.

Images 1945-1973 / Dodis

- 1945: Arrival of Jewish refugees

- 1945: Refugees at the border

Team

  • Jan Baumann (infoclio.ch)
  • Enrico Natale (infoclio.ch)

Collaborative Face Recognition and Picture Annotation for Archives

Saturday, 16 September 2017

The project

The photographic archives of the League of Nations (LoN), the predecessor of today's United Nations (UN), consist of a collection of several thousand photographs of assemblies, delegations, commissions and secretariats, as well as a series of portraits of diplomats. Although these photographs were digitized in 2001 and made accessible on the web by Indiana University, they lack metadata and are difficult to use for research.

Our project is to valorize this collection by building an infrastructure capable of detecting the faces of the individuals in the photographs, clustering them by similarity, and offering an interface that lets historians validate identifications and add metadata.
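
The implementation stack is not specified here; as an illustration, the detection and comparison step could be sketched in Python with the open-source face_recognition library (filename invented):

import face_recognition

image = face_recognition.load_image_file("lon_assembly_1920.jpg")
encodings = face_recognition.face_encodings(image)  # one 128-d vector per face

validated = []  # encodings already confirmed by historians
for enc in encodings:
    matches = face_recognition.compare_faces(validated, enc, tolerance=0.6)
    # Unmatched faces would be queued for manual identification in the interface
    print("known face" if True in matches else "new face")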

The project took place over two working sessions, in May (Geneva Open Libraries) and September 2017 (3rd Swiss Open Cultural Data Hackathon), reported separately below.


Session 2 (September 2017)

The team

  • Université de Lausanne: Martin Grandjean
  • United Nations Archives: Blandine Blukacz-Louisfert, Colin Wells
  • EPITECH Lyon: Gregoire Lodi, Louis Schneider, Adrien Bayles, Sifdine Haddou
  • Université de Genève: Samuel Freitas

Report

For the third edition of the Swiss Open Cultural Data Hackathon, the team formed at the Geneva pre-event met again at the University of Lausanne on 15 and 16 September 2017 to take up the project again and continue its development.

Friday, 15 September 2017

The morning's discussions focused on strategies for designing a system that links the images to metadata, and on which information should be retained and made visible directly on the platform. Since the rights status of the League of Nations photographs is not clearly resolved, it was decided to design a platform that could also serve other image collections of a similar nature.


Saturday, 16 September 2017

Discovery: Wikimedia Commons has its own annotation tool, ImageAnnotator. See the example opposite.


Code

Data

Session 1 (May 2017)

The team

  • Université de Lausanne: Martin Grandjean (martin.grandjean@unil.ch)
  • United Nations Archives: Blandine Blukacz-Louisfert (bblukacz-louisfert@unog.ch), Colin Wells (cwells@unog.ch), Maria Jose Lloret (mjlloret@unog.ch)
  • EPITECH Lyon: Adam Krim (adam.krim@epitech.eu), Louis Schneider (louis.schneider@epitech.eu), Adrien Bayles (adrien.bayles@epitech.eu), Paul Varé (paul.vare@epitech.eu)
  • Archives d'Etat de Genève: Anouk Dunant Gonzenbach (anouk.dunant-gonzenbach@etat.ge.ch)

This project is part of the Geneva Open Libraries hackathon.


Report

Friday, 12 May 2017

Launch of the Geneva Open Libraries hackathon at the UN Library (presentation of the weekend, pitches of project ideas, …).

First project ideas:

- A website with collaborative tags to allow the identification of people in archival photos.

- Automated identification of people in archival photos.

→ Automatically identify all the photos in which the same person appears, and allow manual editing of tags that then apply to all photos of that person (no more need to identify the photographed people photo by photo).



Saturday, 13 May 2017

Scoping the project: what more can we do so that it is not just a simple identification plugin? What innovation can we bring to collaborative research? What can we do beyond Wikipedia?
Work on the photo itself, on the way the data is shown to the user, etc.

Our project's goal: enable collaboration on the identification of archival photos, with an automated part and a manual, community-driven part.


Analyze the photos → identify the people → display the photo on a website with all the people tagged, together with all related links and notes.

User → creates tags on the photograph (objects, scenes, historical links, etc.) → the community approves the accuracy of the proposed tags.


Work in progress on the proof of concept:

- Site front end: the graphical part of the site, hover behaviour of elements, …

- Face recognition prototype: a few flaws to correct, export of the detected faces, …

- Project poster


Sunday, 14 May 2017

The project was selected to represent the Geneva Open Libraries hackathon at the closing ceremony of the Open Geneva Hackathons (one project for each of the hackathons held in Geneva that weekend) and was presented on stage at Campus Biotech.


Data

Poster Geneva - May 2017

A large-size PDF version is available here. The PNG version for the web is below.

Poster Lausanne - September 2017

PNG version

Code

Jung - Rilke Correspondence Network

Saturday, 16 September 2017

Joint project bringing together three separate projects: the Rilke correspondence, the Jung correspondence and the ETH Library.

Objectives:

  • agree on a common metadata structure for correspondence datasets
  • clean and enrich the existing datasets
  • build a database that can be used not just by these two projects but by others as well, and that works well with visualisation software in order to see correspondence networks
  • experiment with existing visualization tools

Data

ACTUAL INPUT DATA

Comment: The Rilke data is cleaner than the Jung data. Some cleaning is needed to make them match:
1) separate sender and receiver; clean up and cluster (OpenRefine)
2) clean up dates and put them into the format the IT developers need (Perl)
3) clean up placenames and match them to geolocators (Dariah-DE)
4) match senders and receivers to Wikidata where possible (OpenRefine; problem with volume)
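
Step 2 was done with Perl scripts; an equivalent sketch in Python (the input date format is an assumption):

import re

def clean_date(raw: str) -> str:
    """Strip markers like 'ca.' and turn '31.12.1912' into '1912-12-31'."""
    s = re.sub(r"^ca\.?\s*", "", raw.strip())  # drop uncertainty marker
    m = re.fullmatch(r"(\d{1,2})\.(\d{1,2})\.(\d{4})", s)
    if m:
        day, month, year = m.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    return s  # leave unparseable values for manual review

print(clean_date("ca. 31.12.1912"))  # -> 1912-12-31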

METADATA STRUCTURE

The following fields were included in the common basic data structure:

sysID; callNo; titel; sender; senderID; recipient; recipientID; place; placeLat; placeLong; datefrom; dateto; language
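
For illustration, a record in this structure might look like this (all values, including the Q-codes, are invented):

sysID;callNo;titel;sender;senderID;recipient;recipientID;place;placeLat;placeLong;datefrom;dateto;language
sys000123;Hs 55:17;Brief an R. M. Rilke;Jung, Carl Gustav;Q123;Rilke, Rainer Maria;Q456;Küsnacht;47.318;8.583;1912-03-01;1912-03-01;de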

DATA CLEANSING AND ENRICHMENT

* Description of steps, and issues, in Process (please correct and refine).

The main issue with the Jung correspondence is its data structure: sender and recipient are in one column.
Dates also need both cleaning for consistency (e.g. removal of "ca.") and transformation to meet developer specs (Basil, using Perl scripts).

For geocoding the placenames, OpenRefine was used for the normalization of the placenames and the DARIAH GeoBrowser for the actual geocoding (there were some issues with handling large files). Tests with OpenRefine in combination with OpenStreetMap were done as well.

The C. G. Jung dataset contains sending-location information for 16,619 of 32,127 letters; 10,271 places were georeferenced. In the Rilke dataset all the sending locations were georeferenced.

For matching senders and recipients to Wikidata Q-codes, OpenRefine was used. We ran into issues with large files and with recovering the Q-codes after successful matching, and identifying people without clear identification requires specialist scholarly expertise. The Wikidata Q-codes that OpenRefine linked to seem to have disappeared. Instructions on how to add the Q-codes are here: https://github.com/OpenRefine/OpenRefine/wiki/reconciliation.

Doing all of this at once poses some project-management challenges, since several people may be working on the same files to clean different data; all the files need to be integrated in the end.

DATA after cleaning:

https://github.com/basimar/hackathon17_jungrilke

DATABASE

Issues with the target database:
Fields are defined; SQL databases and visualisation programs are being evaluated.
How, and whether, to integrate with Wikidata is still not clear.

Issues: the letters are too detailed to be imported as Wikidata items, although it looks like the senders and recipients have the notability and networks to make it worthwhile. We are trying to keep our options open.

While the IT team builds the database to be used with the visualization tool, the data is being cleaned and the Q-codes are being extracted.
They took the cleaned CSV files and converted them to SQL, then to JSON.
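
A sketch of that conversion chain, reduced to three fields (file and column names are assumptions):

import csv, json, sqlite3

conn = sqlite3.connect("letters.db")
conn.execute("CREATE TABLE IF NOT EXISTS letters (sysID TEXT, sender TEXT, recipient TEXT)")

with open("letters.csv", newline="", encoding="utf-8") as f:
    rows = [(r["sysID"], r["sender"], r["recipient"])
            for r in csv.DictReader(f, delimiter=";")]
conn.executemany("INSERT INTO letters VALUES (?, ?, ?)", rows)
conn.commit()

# JSON export for the visualization tool
print(json.dumps([dict(zip(("sysID", "sender", "recipient"), r)) for r in rows[:3]],
                 ensure_ascii=False, indent=2))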

Additional issues encountered:

- Visualization: three tools are being tested: 1) Palladio (Stanford), with concerns about limits on large files; 2) Viseyes; and 3) Gephi.

- Ensuring that the files from different projects respect the same structure in the final, cleaned-up versions.

Visualization (examples)

Heatmap of Rainer Maria Rilke’s correspondence (visualized with Google Fusion Tables)

Correspondence from and to C. G. Jung visualized as a network. The two large nodes are Carl Gustav Jung (below) and his secretary’s office (above). Visualized with the tool Gephi

Team

  • Flor Méchain (Wikimedia CH): working on cleaning and matching with Wikidata Q codes using OpenRefine.
  • Lena Heizman (Dodis / histHub): Mentoring with OpenRefine.
  • Hugo Martin
  • Samantha Weiss
  • Michael Gasser (Archives, ETH Library): provider of the dataset C. G. Jung correspondence
  • Irina Schubert
  • Sylvie Béguelin
  • Basil Marti
  • Jérome Zbinden
  • Deborah Kyburz
  • Paul Varé
  • Laurel Zuckerman
  • Christiane Sibille (Dodis / histHub)
  • Adrien Zemma
  • Dominik Sievi

Schauspielhaus Zürich performances in Wikidata

infoclio.ch Vorträge
Samstag, 16. September 2017

The goal of the project is to ingest all performances held at the Schauspielhaus theatre in Zurich between 1938 and 1968 into Wikidata. In a further step, data from the www.performing-arts.eu platform, the Swiss Theatre Collection and other data providers could be ingested as well.

  1. load the data into OpenRefine
  2. column after column (starting with the easier ones):
    1. reconcile against Wikidata
    2. manually match entries that matched multiple Wikidata items
    3. find out which items are missing in Wikidata
    4. load them into Wikidata using QuickStatements (QuickStatements 2 lets you retrieve the Q numbers of the newly created items)
    5. reconcile again in OpenRefine

Raw Data

Reconcile in OpenRefine

Choose corresponding type

For Work, you can use the author as an additional property

Manually match multiple matches

Import into Wikidata with QuickStatements

Step 1

  • Len : english label
  • P31 : instance of
  • Q5 : human
  • P106 : occupation
  • Q1323191 : costume designer
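
Pasted into QuickStatements, a batch built from the codes above might look as follows (V1 syntax, tab-separated; the person is invented):

CREATE
LAST	Len	"Jane Doe"
LAST	P31	Q5
LAST	P106	Q1323191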

Step 2

Step 3 (you can get the Q number from there)

Team

  • Renate Albrecher
  • Julia Beck
  • Flor Méchain
  • Beat Estermann
  • Birk Weiberg
  • Lionel Walter

Swiss Video Game Directory

infoclio.ch Vorträge
Samstag, 16. September 2017

This project aims to provide a directory of Swiss video games and metadata about them.

The directory is a platform to display and promote Swiss games for publishers, journalists, politicians and potential buyers, as well as a database aimed at the game developer community and scientific researchers.

Our work is the continuation of a project initiated at the 1st Open Data Hackathon in 2015 in Bern by David Stark.

  1. An open spreadsheet contains the data, with around 300 entries describing the games.
  2. Every once in a while, the data is exported to the directory website (not publicly available yet).
  3. At any moment, game devs or volunteer editors can edit the spreadsheet to add games or correct information.

The list was created on 11 August 2014 by David Javet, game designer and PhD student at UNIL. He then opened it up to the Swiss game dev community, which fed it collaboratively. At the start of this hackathon, the list comprised 241 games, dating back to 2004. It was turned into an open data set on the opendata.swiss portal by Oleg Lavrovsky.

Big Data Analytics (bibliographical data)

Saturday, 16 September 2017

We are trying to analyse bibliographical data using big data technology (Flink, Elasticsearch, Metafacture).

Here is a first sketch of what we're aiming at:

We use bibliographical metadata:

Swissbib bibliographical data https://www.swissbib.ch/

  • Catalog of all the Swiss University Libraries, the Swiss National Library, etc.
  • 960 Libraries / 23 repositories (Bibliotheksverbunde)
  • ca. 30 million records
  • MARC21 XML Format
  • → raw data stored in Mongo DB
  • → transformed and clustered data stored in CBS (central library system)

edoc http://edoc.unibas.ch/

  • Institutional repository of the University of Basel (document server, open access publications)
  • ca. 50'000 records
  • JSON File

crossref https://www.crossref.org/

  • Digital Object Identifier (DOI) Registration Agency
  • ca. 90 million records (we only use 30 million)
  • JSON scraped from API

Swissbib

Librarian:

- To prioritize which of our holdings should be digitized most urgently, I want to know which of our holdings can be found nowhere else.

- We would like to have a list of all the DVDs in Swissbib.

- What is special about the holdings of a given library or institution? What is its profile?

Data analyst:

- I want to get to know my data better. And be faster.

→ E.g. I want to know which records don't have any entry for 'year of publication'. I want to analyze whether these records should be sent through the merging process of CBS. For that I also want to know whether these records contain other 'relevant' fields defined by CBS (e.g. ISBN, etc.). To analyze the results, a visualization tool might be useful.
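
As a sketch of that first query with Elasticsearch's Python client (index and field names are assumptions, not the project's actual schema):

from elasticsearch import Elasticsearch  # elasticsearch-py, 7.x style API

es = Elasticsearch("http://localhost:9200")  # assumed local test instance

# Records lacking any 'year of publication' entry
query = {"query": {"bool": {"must_not": {"exists": {"field": "publicationYear"}}}}}
result = es.search(index="swissbib", body=query, size=5)
print(result["hits"]["total"])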

edoc

Goal: enrichment. I want to add missing identifiers (e.g. DOIs, ORCID iDs, funder IDs) to the edoc dataset.

→ Match the two datasets by author and title

→ Assess the quality of the matches (score)
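
A minimal matching sketch (standard library only; the weights and threshold are arbitrary assumptions):

from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(edoc_rec: dict, crossref_rec: dict) -> float:
    # Weighted mix of author and title similarity
    return (0.4 * similarity(edoc_rec["author"], crossref_rec["author"])
            + 0.6 * similarity(edoc_rec["title"], crossref_rec["title"]))

score = match_score({"author": "Muster, Anna", "title": "On Enrichment"},
                    {"author": "Anna Muster", "title": "On enrichment"})
print(score, score > 0.8)  # copy the DOI only above a confidence threshold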

elasticsearch https://www.elastic.co/de/

A Java-based search engine; results are exported as JSON

Flink https://flink.apache.org/

open-source stream processing framework

Metafacture https://culturegraph.github.io/,
https://github.com/dataramblers/hackathon17/wiki#metafacture

Tool suite for metadata-processing and transformation

Zeppelin https://zeppelin.apache.org/

Visualisation of the results

Use case 1: Swissbib

Use case 2: edoc

Team

  • Dominique Blaser
  • Jean-Baptiste Genicot
  • Günter Hipler
  • Jacqueline Martinelli
  • Rémy Meja
  • Andrea Notroff
  • Sebastian Schüpbach
  • T
  • Silvia Witzig

Medical History Collection

Saturday, 16 September 2017


Finding connections and pathways between the book and object collections of the University Institute for History of Medicine and Public Health (Institute of Humanities in Medicine since 2018) of the CHUV.

The project started off with datasets concerning two collections held by the University Institute for History of Medicine and Public Health in Lausanne: a book collection and an object collection. We had metadata for the book collection, and metadata plus photographs for the object collection. The collections are inventoried in two different databases, the first accessible online to patrons, the other not.

The idea was therefore to find a way to offer a broad audience a glimpse into the object collection, to highlight the areas of convergence between the two collections, and thus to enhance the patrimony held by our institution.

By juxtaposing the library classification and the list of categories used to describe the object collection, we established a table of concordance. The table allowed us to find corresponding sets of items and to develop a prototype of a tool that presents them conjointly: https://medicalhistorycollection.github.io/glam2017/.

Finally, we seized the opportunity and uploaded photographs of about 100 objects to Wikimedia Commons: https://commons.wikimedia.org/wiki/Category:Institut_universitaire_d%27histoire_de_la_m%C3%A9decine_et_de_la_sant%C3%A9_publique.

Data

https://github.com/MedicalHistoryCollection/glam2017/tree/master/data

Team

  • Magdalena Czartoryjska Meier
  • Rae Knowler
  • Arturo Sanchez
  • Roxane Fuschetto
  • Radu Suciu

Old Catholic Church of Switzerland: Historical Collection

Saturday, 16 September 2017

Christkatholische Landeskirche der Schweiz: historical documents

Description

The so-called "Kulturkampf" (1870-1886), the conflict between the modern liberal constitutional state and the Roman Catholic Church, which refused to accept the constitutionally guaranteed freedom of faith and conscience, was fought out particularly fiercely in Switzerland.
Selected documents from the archives of the Christian Catholic (christkatholisch) national church exemplify this interesting phase between 1870 and 1886/1900. As a local case study (a parish switching from the Roman Catholic to the Christian Catholic confession), the collection includes the minutes of the parish of Aarau (1868-1890), published in the public domain (CC BY licence). In addition, the digitized journals from French-speaking Switzerland, from 1873 onwards, are published. The documents were officially approved and released for public use by the archive owners; any copyrights have expired (70 years), with the exception of a few church journals, which are in any case of a public character.
The target audience is historians, theologians and other interested people from educational institutions. This open data collection is meant to encourage other Christian Catholic parishes to digitize more sources from the Kulturkampf period and make them accessible.

Overview

Holdings in German-speaking Switzerland:

• Kirchgemeinde Aarau

  1. Minutes of the parish council (Kirchgemeinderat), 1868-1890
  2. Monograph (1900): Xaver Fischer, Abriss der Geschichte der katholischen (christkatholischen) Kirchgemeinde Aarau 1806-1895

Holdings in French-speaking Switzerland:

• Journals 1873-2016

  1. Le Vieux-Catholique 1873
  2. Le Catholique-Suisse 1873-1875
  3. Le Catholique National 1876-1878
  4. Le Libéral 1878-1879
  5. La Fraternité 1883-1884
  6. Le Catholique National 1891-1908
  7. Le Sillon de Genève 1909-1910
  8. Le Sillon 1911-1970
  9. Présence 1971-2016

• Canton de Neuchâtel

  1. Le Buis 1932-2016

• Paroisse Catholique-Chrétienne de Genève: St. Germain (not yet published)

  1. Répertoire des archives (1874-1960)
  2. Conseil Supérieur - Arrêtés - 16 mai 1874 au 3 septembre 1875
  3. Conseil Supérieur Président - Correspondence - 2 janv 1875 - 9 sept 1876

The data will be hosted on christkatholisch.ch; the publication date will be announced. Before that, the entry in the national register on opendata.swiss must be available and approved.

Data

Team

Swiss Social Archives - Wikidata entity match

Saturday, 16 September 2017

Match the persons linked in the media database of the Swiss Social Archives with Wikidata.

Data

  • Metadata of the media database of the Swiss Social Archives

Team

Hacking Gutenberg: A Moveable Type Game

Saturday, 16 September 2017

The internet and the world wide web are often referred to as disruptive. In fact, every new technology has disruptive potential. 550 years ago the invention of modern printing technology by Johannes Gutenberg in Germany (and, two decades later, by William Caxton in England) was massively disruptive. Books, carefully bound manuscripts written and copied by scribes over weeks, if not months, could suddenly be mass-produced at incredible speed. As such, the invention of moveable type, along with other basic book printing technologies, had a huge impact on science and society.

And yet, 15th century typographers were not only businessmen, they were artists as well. Early printing fonts reflect their artistic past rather than their industrial potential. The font design of 15th century types is quite obviously based on their handwritten predecessors. A new book, although produced by means of a new technology, was meant to be what books had been for centuries: precious documents, often decorated with magnificent illustrations. (Incunables – books printed before 1500 – often show a blank square in the upper left corner of a page so that illustrators could manually add artful initials after the printing process.)

Memory, also known as Match or Pairs, is a simple concentration game. Gutenberg Memory is an HTML5 adaptation of the common tile-matching game. It can be played online and works on any device with a web browser. By concentrating on the tiles in order to remember their positions, the player is forced to focus on 15th (or early 16th) century typography and will thus discover the ageless elegance of the ancient letters.

Gutenberg Memory, Screenshot

Johannes Gutenberg: Biblia latina, part 2, fol. 36

Gutenberg Memory comes with 40 cards (hence 20 pairs) of syllables or letter combinations. The letters are taken from high-resolution scans (>800 dpi) of late medieval book pages digitized by the university library of Basel. Compared to the original game with its 52 cards (26 pairs), Gutenberg Memory has been slightly simplified. Nevertheless it is rather hard to play, as the player's visual orientation is constrained by the cards' typographic resemblance.

In addition, the background canvas shows a geographical map of Europe visualizing the place of printing. Basic bibliographic information is given in the caption below, including a link to the original scan.

Click on the cards to turn them face up. If two of them are identical they remain open; otherwise they turn face down again. A counter indicates the number of moves you have made so far. You win the game as soon as you have found all the pairs. Clicking (or tapping) the congratulations banner, the close button or the restart button in the upper right corner reshuffles the game and proceeds to a different font, the origin of which is displayed underneath.
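
The core matching rule is simple enough to sketch in a few lines of Python (illustrative only; the actual engine is HTML5/JavaScript):

import random

cards = list(range(20)) * 2  # 40 cards, 20 pairs
random.shuffle(cards)
solved, moves = set(), 0

def flip(i: int, j: int) -> bool:
    """Flip cards i and j; they stay face up only if they match."""
    global moves
    moves += 1
    if i != j and cards[i] == cards[j]:
        solved.update({i, j})
    return len(solved) == len(cards)  # True once all pairs are found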

2017/09/15 v1.0: Prototype, basic game engine (5 fonts)

2017/09/16 v2.0: Background visualization (place of printing)

2017/09/19 v2.1: Minor fixes

Elias Kreyenbühl (left) and Thomas Weibel at the «Génopode» building on the University of Lausanne campus.

OpenGuesser

Saturday, 16 September 2017

This is a game about guessing and learning about geography through images and maps, made with Swisstopo's online maps of Switzerland. For several years this game was developed as open source as part of a series of GeoAdmin StoryMaps: you can try the original SwissGuesser game here, based on a dataset from the Swiss Federal Archives now hosted on Opendata.swiss.

The new version puts your knowledge of Swiss museums and their locations to the test.

Demo: OpenGuesser Demo

Encouraged by an excellent Wikidata workshop (slides) at #GLAMhack 2017, we are testing a new dataset of Swiss museums, their locations and photos, obtained via the Wikidata linked data endpoint (see app/data/*.sparql in the source). Visit this Wikidata query for a preview of the first 10 results. This opens up the possibility of connecting other sources, such as datasets tagged 'glam' on Opendata.swiss, and of creating more custom games based on this engine.
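
The kind of query involved might look like this (a sketch; the exact queries are in app/data/*.sparql):

# Swiss museums with coordinates and an image
SELECT ?museum ?museumLabel ?coord ?image WHERE {
  ?museum wdt:P31/wdt:P279* wd:Q33506 .  # instance of (a subclass of) museum
  ?museum wdt:P17 wd:Q39 .               # country: Switzerland
  ?museum wdt:P625 ?coord .              # coordinate location
  OPTIONAL { ?museum wdt:P18 ?image . }  # image, if available
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10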

We play-tested, revisited the data, then forked the project and started a refresh. All libraries were updated, and we got rid of the old data-loading mechanism, with the goal of connecting (later in real time) to open data sources. A bunch of improvement ideas have already been proposed, and we would be glad to see more ideas and any contributions: please raise an issue or pull request on GitHub if you're so inclined!

Latest commit to the master branch on 17 September 2017

Download as zip

Wikidata Ontology Explorer

Saturday, 16 September 2017

A small tool to get a quick feeling of an ontology on Wikidata.

Data

(None, but hopefully this helps you do stuff with your data :) )

Team

Dario Donati (Swiss National Museum) & Beat Estermann (Opendata.ch)

New Frontiers in Graph Queries

Sunday, 28 October 2018

We begin with the observation that a typical SPARQL endpoint is not friendly to the average user. Typical users of cultural databases include researchers in the humanities, museum professionals and the general public. Few of these people have coding experience, and few would feel comfortable translating their questions into a SPARQL query.

Moreover, the majority of the users expect searches of online collections to take something like the form of a regular Google search (names, a few words, or at the top end Boolean operators). This approach to search does not make use of the full potential of the graph-type databases that typically make SPARQL endpoints available. It simply does not occur to an average user to ask the database a query of the type “show me all book authors whose children or grandchildren were artists.”
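
On Wikidata, however, such a question takes only a few lines of SPARQL, for example (a sketch; the choice of occupation classes is ours):

SELECT DISTINCT ?author ?authorLabel WHERE {
  ?author wdt:P106 wd:Q36180 .            # occupation: writer
  ?author wdt:P40/wdt:P40? ?descendant .  # child or grandchild
  ?descendant wdt:P106 wd:Q483501 .       # occupation: artist
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100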

The extensive possibilities that graph databases offer researchers in the humanities go unexplored because of a lack of awareness of their capabilities and a shortage of information about how to exploit them. Even for those academics who understand the potential of these resources and have some experience in using them, it is often difficult to get an overview of the semantics of complex datasets.

We therefore set out to develop a tool that:

  • simplifies the entry point of a SPARQL query into a form that is accessible to any user
  • opens ways to increase the awareness of users about the possibilities for querying graph databases
  • moves away from purely text-based searches to interfaces that are more visual
  • gives an overview to a user of what kinds of nodes and relations are available in a database
  • makes it possible to explore the data in a graphical way
  • makes it possible to formulate fundamentally new questions
  • makes it possible to work with the data in new ways
  • can eventually be applied to any SPARQL endpoint

https://github.com/sparqlfish/sparqlfish

Data

Wikidata

Team

Sex, Crime and Pub Brawls in Early Modern Zurich

Sunday, 28 October 2018

Minutes kept by pastors in early modern Zurich.


Make the "Stillstandsprotokolle" searchable, georeferenced and browsable, and display them on a map.

For more info, see our GitHub repository.

Access the documents: archives-quickaccess.ch/search/stazh/stpzh

Data

Team

Wikidata-based multilingual library search

Sunday, 28 October 2018

In Switzerland each linguistic region works with different authority files for authors and organizations, a situation which creates difficulties for end users when searching.

Goal of the hackathon: work on an innovative solution, as the library landscape's search platforms will change in the coming years.
Possible solution: a multilingual entity file which links to the GND, BnF and ICCU authority files and to Wikidata, bringing end users information about authors in the language they want.

Steps:

  1. analyse the coverage of the RERO authority file by Wikidata (20-30%)
  2. test an approach for loading some RERO authorities into Wikidata (learn the process)
  3. create an intermediate process that uses the GND ID to get descriptive information and the Wikidata ID
  4. from Wikidata, get the other identifiers (BnF, RERO, etc.)
  5. analyse which elements from the GND are in Wikidata; same for BnF, VIAF and ICCU
  6. create a multilingual search prototype (based on the Swissbib model)

Data

Number of IDs in VIAF

  • BNF: 4847978
  • GND: 8922043
  • RERO: 255779

Number of IDs in Wikidata from

  • BNF: 432273
  • GND: 693381
  • RERO: 2145
  • ICCU: 30047
  • VIAF: 1319031 (many duplicates: a WD entity can have more than one VIAF ID)

Query model:

Wikidata items with a RERO ID:

# All items with a property
# Sample to query all values of a property
# Property talk pages on Wikidata include basic queries adapted to each property
SELECT
   ?item ?itemLabel
   ?value ?valueLabel
# valueLabel is only useful for properties with item-datatype
WHERE
{
   ?item wdt:P3065 ?value
   # change P3065 (RERO ID) to another property as needed
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
# remove or change limit for more results
LIMIT 10000

Email from the GND:

There is currently no process that guarantees 100% coverage of GND entities in Wikidata. The existing links between Wikidata and GND entries come mostly from manually edited Wikipedia entries.

User Interface

There are several different target users: the librarians, who currently use all kinds of different systems, and the end users, who want to search for information or locate a book in a nearby library.

Librarian: The question of process is the key challenge for the librarian user. At present some Swiss librarians create authority records and some don't. New rules and processes for creating authority files in the GND, BNF, etc. will change their work methods. The process of creating local Swiss authority files will be entirely revamped: fragmented Swiss regional authority files will disappear and be replaced either by the German, French, Italian, American, etc. national authority files or by direct creation in Wikidata by the local librarian. (Wikidata will serve as the central repository for all authority IDs.)

End user:
The model for the multilingual user interface is Swissbib, the "catalog of Swiss university libraries, the Swiss National Library, several cantonal libraries and other institutions". The objective is to keep the look and functionality of the existing website, which includes the multilingual display of labels in English, French, German and Italian.

What changes is the source of the information about the author, which will in future be taken from the BNF for French, the GND for German and the LCCN for English. (In the proof-of-concept pilot, only the author name is concerned.)

The list of books and libraries will continue to function as before, with no changes.

In the full project, several pages must be modified:

* The search page (example with Joel Dicker): https://www.swissbib.ch/Search/Results?lookfor=joel+dicker&type=AllFields

* The Advanced search page https://www.swissbib.ch/Search/Advanced

* The Record Page: https://www.swissbib.ch/Record/48096257X

The proof-of-concept will focus exclusively on the basic search page.

Open issues

The issue of keywords remains open (at present they come from the DNB, which works for German and English but not for French).

The question of an author photo and bio is also open. At present very few authors have a short bio paragraph associated with their names. Should each author have a photo and bio? If so, where should they go on the page?

Another design question: should the selection of the language of the book be moved up on the page?

Prototype

(Translations from Wikidata into French)

1. Schweizerisches Landesmuseum

http://feature.swissbib.ch/Record/110393589

2. Wikimedia Foundation

http://feature.swissbib.ch/Record/070092974

3. Chocoladefabriken Lindt & Sprüngli AG

http://feature.swissbib.ch/Record/279360789

ATTENTION: Multiple authors

4. Verband schweizerischer Antiquare und Kunsthändler

ATTENTION: NO French label in Wikidata

http://feature.swissbib.ch/Record/107734591

Methods

How to get BNF records from Wikidata via SRU

http://catalogue.bnf.fr/api/SRU?version=1.2&operation=searchRetrieve&query=aut.ark%20all%20%22ark:/12148/cb118806093%22&recordSchema=unimarcxchange&maximumRecords=20&startRecord=1

Instruction: add the prefix "ark:/12148/cb" to the BNF ID in order to obtain the ARK ID.
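
A small helper applying that rule (a sketch; the ID below is the one from the example URL):

from urllib.parse import quote

def bnf_sru_url(bnf_id: str) -> str:
    ark = f"ark:/12148/cb{bnf_id}"  # prefix rule from the instruction above
    query = quote(f'aut.ark all "{ark}"', safe=":/")
    return ("http://catalogue.bnf.fr/api/SRU?version=1.2&operation=searchRetrieve"
            f"&query={query}&recordSchema=unimarcxchange"
            "&maximumRecords=20&startRecord=1")

print(bnf_sru_url("118806093"))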

Look up the Q-code from a GND ID

SELECT DISTINCT ?item ?itemLabel  WHERE {
  ?item wdt:P227 "1027690041".
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Integration of RERO person/organisation data into Wikidata

Methodology

4 cases

1. RERO authorities are in Wikidata with RERO ID

  • 2145 items

2. RERO authorities are in Wikidata without RERO ID but with VIAF ID

  • 1316347 items (without deduplication)
  • only add IDs where a match is possible (PetScan)

3. RERO authorities are in Wikidata without RERO or VIAF ID

  • reconciliation with OpenRefine

4. RERO authorities are not in Wikidata

  • QuickStatements or mass import

Demo / Code / Final presentation

Team

  • Elena Gretillat
  • Nicolas Prongué
  • Lionel Walter
  • Laurel Zuckerman
  • Jacqueline Martinelli

Zurich Historical Photo Tours

Sunday, 28 October 2018

We would like to enable our users to discover historical pictures of Zurich and go to the places where they were taken. They can take the perspective of the photographer from around 100 years ago and see how the places have changed. They can also share their photographs with the community.
We have planned two thematic tours, one with historical photographs by Adolphe Braun and one with photographs connected to the subject of silk fabrication. The tours are enhanced with some historical information.
In the collections of the ETH, the Baugeschichtliches Archiv, and the Graphische Sammlung of the Zentralbibliothek Zürich, we found pictures to match the topics above and to set up a nice tour for the users.
In a second step we went to the actual spots to verify whether the pictures could be taken and to determine the exact geodata.
Meanwhile, our programmers placed the photographers' stops on a map. As soon as users reach the proximity of a spot, their phone starts vibrating. At this point the historical photo shows up, and the task is to find the right angle from which the historical photograph was taken. Users are then asked to take their own picture. The app allows users to overlay the historical picture with the current one so that a comparison can be made. Users are provided with additional information such as the name of the photographer of the historical picture, links to the collection the picture comes from, the building itself, its connection to the silk industry, etc.
Here is the link to our prototype: https://glamhistorytour.github.io/HistoryTourApp/
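
The proximity trigger can be sketched as follows (Python for illustration; the radius and coordinates are placeholders):

from math import radians, sin, cos, asin, sqrt

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

SPOT = (47.3769, 8.5417)  # one photographer's stop (hypothetical geodata)

def near_spot(user_lat, user_lon, radius_m=25):
    # The app would vibrate and show the historical photo when this is True
    return distance_m(user_lat, user_lon, *SPOT) <= radius_m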

Data

Team

  • Maya Beer
  • Rafael Arizcorreta
  • Tina Tomovic
  • Annabelle Wiegart
  • Lothar Schmitt
  • Thomas Bochet
  • Marina Petrova
  • Kenny Floria