histograph

Graph-based exploration

Discover related documents

HistoGraph treats multimedia collections as networks. The underlying assumption is simple: if two people are mentioned together in a document, we assume that they may have something to do with each other. Whether or not such a relationship is interesting is in the eye of the beholder. Co-occurrence networks become huge and unwieldy very quickly, which forces us to filter them based on another simple assumption: the more often entities co-occur, the more likely it is that they have a meaningful relationship with each other. We combine these two assumptions with mathematical models (co-occurrence frequencies weighted by tf-idf specificity and Jaccard distances) which allow us to rank the list of co-occurrences. This tells us who appears with whom and in which documents.
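The ranking idea can be illustrated with a minimal sketch. It is not HistoGraph's actual model: it leaves out the tf-idf weighting and ranks pairs by Jaccard similarity (shared documents divided by all documents mentioning either entity), with the raw co-occurrence count as a tie-breaker. The function name, the input format and the sample data ("Person A", "Person B", "doc1" and so on) are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def rank_cooccurrences(docs):
    """Rank entity pairs by how strongly they co-occur.

    `docs` maps a document id to the set of entities mentioned in it
    (a hypothetical input format, not HistoGraph's data model).
    """
    # Invert the mapping: entity -> set of documents mentioning it.
    docs_by_entity = defaultdict(set)
    for doc_id, entities in docs.items():
        for entity in entities:
            docs_by_entity[entity].add(doc_id)

    ranked = []
    for a, b in combinations(sorted(docs_by_entity), 2):
        shared = docs_by_entity[a] & docs_by_entity[b]
        if not shared:
            continue  # the pair never appears together
        union = docs_by_entity[a] | docs_by_entity[b]
        jaccard = len(shared) / len(union)
        ranked.append((a, b, len(shared), jaccard, sorted(shared)))

    # Highest similarity first, then highest count, then names for stability.
    ranked.sort(key=lambda r: (-r[3], -r[2], r[0], r[1]))
    return ranked


# Invented sample data.
example_docs = {
    "doc1": {"Pierre Werner", "Person A"},
    "doc2": {"Pierre Werner", "Person A", "Person B"},
    "doc3": {"Pierre Werner", "Person B"},
    "doc4": {"Pierre Werner"},
}

for a, b, count, score, doc_ids in rank_cooccurrences(example_docs):
    print(f"{a} + {b}: {count} shared document(s), Jaccard {score:.2f}, in {doc_ids}")
```

Each result row answers both questions from the text: who appears with whom, and in which documents.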

Filters by entity, date range and document type

For those who already have an idea of what they are looking for, filters help to narrow down the number of documents by entity, date range and document type. For example, a user can look for photos from a given date range in which Pierre Werner appears.
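A combined filter of this kind could look roughly like the sketch below. The `Document` record, its field names and the sample entries (including the dates) are invented for illustration and do not reflect HistoGraph's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Document:
    # Hypothetical record layout, used only for this sketch.
    doc_id: str
    doc_type: str
    created: date
    entities: set = field(default_factory=set)


def filter_documents(documents, doc_type=None, entity=None, start=None, end=None):
    """Keep documents matching every filter that is set (None = no filter)."""
    result = []
    for doc in documents:
        if doc_type is not None and doc.doc_type != doc_type:
            continue
        if entity is not None and entity not in doc.entities:
            continue
        if start is not None and doc.created < start:
            continue
        if end is not None and doc.created > end:
            continue
        result.append(doc)
    return result


# Invented sample collection.
sample = [
    Document("photo-1", "photo", date(1975, 5, 1), {"Pierre Werner"}),
    Document("photo-2", "photo", date(1985, 5, 1), {"Pierre Werner"}),
    Document("letter-1", "letter", date(1975, 6, 1), {"Pierre Werner"}),
    Document("photo-3", "photo", date(1976, 1, 1), {"Person A"}),
]

hits = filter_documents(
    sample,
    doc_type="photo",
    entity="Pierre Werner",
    start=date(1970, 1, 1),
    end=date(1979, 12, 31),
)
print([doc.doc_id for doc in hits])
```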

Ego networks provide a bird’s-eye perspective on these relationships. They reveal the structure of the co-occurrence network around a given person, namely the relationships between all those who appear together with that person. Clicking on an edge generates a list of documents in which both people appear. On this basis the user can decide whether or not this relationship is indeed of interest.
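As a rough sketch of how such an ego network could be derived from co-occurrence data: collect everyone who shares a document with the chosen person, then keep every edge among that group, storing the supporting documents on each edge. The function name, the input format and the sample data are assumptions, not HistoGraph's implementation.

```python
from itertools import combinations


def ego_network(docs, ego):
    """Return {(person_a, person_b): [document ids]} for the ego network.

    `docs` maps a document id to the set of people mentioned in it
    (a hypothetical input format).
    """
    # Alters: everyone who appears in at least one document with the ego.
    alters = set()
    for people in docs.values():
        if ego in people:
            alters |= people - {ego}
    members = alters | {ego}

    # Edges among all members, each carrying the documents that support it.
    edges = {}
    for doc_id in sorted(docs):
        present = sorted(docs[doc_id] & members)
        for a, b in combinations(present, 2):
            edges.setdefault((a, b), []).append(doc_id)
    return edges


# Invented sample data.
example = {
    "doc1": {"Ego", "A"},
    "doc2": {"Ego", "B"},
    "doc3": {"A", "B"},
    "doc4": {"B", "C"},
}

network = ego_network(example, "Ego")
# "Clicking" the edge between A and B corresponds to looking up its documents.
print(network[("A", "B")])
```

In this sample, C is left out because C never appears together with the ego, while the edge between A and B is kept even though its only document does not mention the ego.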

A timeline provides an additional filter on documents and shows how networks change over time.

Reveal relations between people

What connects a group of people? HistoGraph retrieves all documents featuring a group of people and presents them both as a list and as an interactive graph which reveals the co-occurrences among them.

Keep track of relevant documents

Users can keep track of documents they find useful by adding them to their favourites. HistoGraph displays this list of favourite documents both as a list and as a graph which reveals the relationships among them.

Crowd-sourced indexation

User input

Three different systems are in place to collect user input: questions on the overall validity of an entity (“Is this a person?”), questions on the validity of an entity annotation in an object (“Is this person mentioned here?”) and personalised notifications based on the previous actions of a user (“User x added a person to a document you worked on. Can you confirm this annotation?”).

Confirmation and Annotation

Every annotation is in one of three states: not validated, validated or disputed. In addition, users are encouraged to fix mistakes themselves by annotating new entities and by flagging wrong entity types, fragments, duplicates or erroneous annotations. To avoid accidental annotations and reduce the risk of vandalism, HistoGraph treats every annotation as a suggestion pending confirmation by other users.
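The three states can be pictured with a small sketch. The rules used here are assumptions, not HistoGraph's documented behaviour: an author cannot confirm their own annotation, a single rejection marks the annotation as disputed, and one confirmation by another user is enough to validate it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    NOT_VALIDATED = "not validated"
    VALIDATED = "validated"
    DISPUTED = "disputed"


@dataclass
class Annotation:
    # Hypothetical record layout, used only for this sketch.
    entity: str
    document_id: str
    author: str
    confirmed_by: set = field(default_factory=set)
    rejected_by: set = field(default_factory=set)

    def vote(self, user, agrees):
        """Record a vote; the author's own votes are ignored."""
        if user == self.author:
            return
        # A user's latest vote replaces any earlier one.
        self.confirmed_by.discard(user)
        self.rejected_by.discard(user)
        (self.confirmed_by if agrees else self.rejected_by).add(user)

    def status(self, required_confirmations=1):
        """Derive the state from the votes (thresholds are assumptions)."""
        if self.rejected_by:
            return Status.DISPUTED
        if len(self.confirmed_by) >= required_confirmations:
            return Status.VALIDATED
        return Status.NOT_VALIDATED


annotation = Annotation("Pierre Werner", "doc1", author="alice")
annotation.vote("alice", True)   # ignored: authors cannot self-confirm
print(annotation.status().value)
annotation.vote("bob", True)
print(annotation.status().value)
annotation.vote("carol", False)
print(annotation.status().value)
```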

Generic and expert crowds

We operate with two types of crowd task: tasks targeted at a generic crowd, which means that anyone is able to provide input, and harder, more challenging tasks, which target expert users. Users qualify for these expert tasks on the basis of their previous actions. For example, a user who annotates many documents associated with Pierre Werner will be asked to validate related annotations by others and to identify unknown entities in related documents.
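One way to picture this qualification step is sketched below. The threshold of three annotated documents, the function names and the record layout are invented for illustration; the source does not state how HistoGraph decides who qualifies.

```python
def expert_entities(annotated, min_documents=3):
    """Entities for which a user has annotated enough distinct documents.

    `annotated` is a list of (document_id, entity) pairs describing the
    user's previous actions (a hypothetical format).
    """
    docs_by_entity = {}
    for document_id, entity in annotated:
        docs_by_entity.setdefault(entity, set()).add(document_id)
    return {e for e, d in docs_by_entity.items() if len(d) >= min_documents}


def expert_tasks(user, annotated, pending, min_documents=3):
    """Pending annotations by others on documents related to the user's topics."""
    topics = expert_entities(annotated, min_documents)
    return [
        task
        for task in pending
        if task["author"] != user and task["document_entities"] & topics
    ]


# Invented sample data.
alice_history = [
    ("d1", "Pierre Werner"),
    ("d2", "Pierre Werner"),
    ("d3", "Pierre Werner"),
    ("d3", "Person A"),
]
pending_annotations = [
    {"author": "bob", "document_id": "d9",
     "document_entities": {"Pierre Werner", "Person B"}},
    {"author": "alice", "document_id": "d8",
     "document_entities": {"Pierre Werner"}},
    {"author": "bob", "document_id": "d10",
     "document_entities": {"Person A"}},
]

tasks = expert_tasks("alice", alice_history, pending_annotations)
print([task["document_id"] for task in tasks])
```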

Named Entity Recognition & Disambiguation

HistoGraph combines tools like YAGO-AIDA for the automatic detection and disambiguation of named entities (people, places, institutions and dates) with crowd-based annotations. Thanks to the enrichment with DBpedia and VIAF links, HistoGraph can handle multilingual documents. By default, every automatically detected entity is pending validation by a human user.

Automatic entity detection works well overall but will always remain imperfect in places. To address this, HistoGraph relies on human validation and error correction.

For institutions

HistoGraph is available as open source under the MIT licence. The application is designed to serve two purposes: to facilitate the non-hierarchical exploration of multimedia collections based on existing metadata and automatic entity detection, and to support the crowd-based indexation of such collections. HistoGraph can handle any digitised text and image documents. To learn more about custom developments for your institution, contact us at histoGraph (at) cvce (dot) eu.

For developers

HistoGraph is designed as a lightweight browser-based application built on top of a Neo4j graph database and uses the Node, Angular and SigmaJS frameworks.

The code is available on our GitHub under the MIT licence.

A GitHub wiki is available but not yet complete.

We look forward to your feedback, issues and comments!

About

HistoGraph is developed at the CVCE Digital Humanities Lab by Lars Wieneke, Daniele Guido and Marten Düring. HistoGraph builds on a demonstrator application developed for the EC-funded collaborative project CUbRIK (grant agreement number 287704, 2011-14). Further information about the project and the demonstrator can be found on the website of our CUbRIK partner EIPCM, and the demonstrator itself is accessible on the site of our colleagues from ENGINEERING. To learn more about HistoGraph or to explore demo applications, contact us at histograph [ät] cvce.eu.

Terms of use

Copyright

The site, together with the information and documents that it contains (texts, images, photographs, etc.), is protected by intellectual property laws and by copyright legislation in all countries. You undertake to respect these intellectual property regulations and the copyright usage restrictions provided with each document without prejudice to the rights and exceptions envisaged by any binding provisions of applicable law. All rights of reproduction, public communication, adaptation, distribution or rebroadcasting via the Internet, intranet or any other means are strictly reserved in all countries. The documents available on this website are the exclusive property of the CVCE and/or of their authors or right holders. You undertake not to remove or alter the copyright notice indicating the author or the source of a document, nor to circumvent the digital rights management protection of the documents, such as restrictions on printing or downloading and visible or invisible watermarking. Any infringement may give rise to civil or criminal proceedings.