Two years back, Dr. Alida Metcalf, in the Rice history department, came looking for some help in integrating a few of her research collections: a large Zotero library, an Airtable that linked various sources and entities, her image collection, and the image collection of one of her graduate students.

This set off a series of experiments in representing her rich humanities data. In the first iteration, our OIT interns built a wonderful Flask + Bootstrap application backed by a SQLite database and Rice’s large-scale networked storage device, which allowed us to replicate the graduate student’s collection, OCR his 40,000 pages on the supercomputers, and make the collection both text-searchable and browsable. You can see the research computing paper on facilitation that came out of that project here: “Digital Humanities Application Development in the Cloud.”

We quickly outgrew our Flask app, however. It became clear that generating high-quality metadata and flexibly adding new classes of entities to our collection were going to break the bank.

Alida and I realized that what we were making was essentially a personal research archive, and we decided to use Omeka-S as our back end for the service. This would allow us to create a diversity of entities, capturing everything from documents, to notes on those documents, to images, to people and places. We developed custom ontologies to facilitate this process, and enlisted some of Rice’s brilliant undergraduates to help us with designing a front-end interface in React.

After a year of development, and with the assistance of small grants from Rice’s HRC and the Ken Kennedy Institute, we were able to hire an outside developer (Domingos Dellamonica) to bring the project to the next level. We now have the prototype of a system for managing collections of diverse entities and linking semantically between them, within controlled vocabularies. This allows us to do things like:

Present the data in flexible, searchable, and editable tables according to the entity class’s custom schema:

Represent the semantically linked data across entity classes as an interactive, dynamic network graph:

The data behind the above images was generated in the graduate history class, Port Cities of the Atlantic World. It draws primarily on a shared Zotero library, but also includes entities generated through named entity recognition scripts I wrote for this purpose. Those scripts surfaced numerous topical connections between the students’ writing and the sources that were assigned in the course.
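
To give a rough sense of how extracted entities can surface connections between student writing and assigned sources, here is a minimal Python sketch. It is not the project's actual code: the document names, the entity sets, and the function `shared_entity_edges` are all hypothetical, and the entity sets are typed in by hand where a real pipeline would use the output of a named entity recognition step. The idea is simply that two documents are linked whenever they mention at least one entity in common.

```python
from itertools import product

# Hypothetical entity sets, standing in for the output of an NER step.
# None of these names or values come from the actual course data.
student_entities = {
    "student_essay_1": {"Lisbon", "Salvador", "sugar"},
    "student_essay_2": {"Havana", "tobacco"},
}
source_entities = {
    "assigned_source_A": {"Salvador", "sugar", "Luanda"},
    "assigned_source_B": {"Havana", "Lisbon"},
}


def shared_entity_edges(left, right):
    """Map each (left_id, right_id) pair to the entities both documents mention.

    Pairs with no entity in common are left out, so every key in the
    result is an edge in a graph linking the two collections.
    """
    edges = {}
    for (l_id, l_ents), (r_id, r_ents) in product(left.items(), right.items()):
        shared = l_ents & r_ents  # set intersection
        if shared:
            edges[(l_id, r_id)] = shared
    return edges


edges = shared_entity_edges(student_entities, source_entities)
for (essay, source), shared in sorted(edges.items()):
    print(f"{essay} -- {source}: {sorted(shared)}")
```

Each key in `edges` could then be drawn as a line in a network graph, with the shared entities as its label. Comparing every pair is fine for a class-sized collection; a larger archive would probably want an index from entity to documents instead.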

We are about to start our second round of development on the project, and will have more exciting news soon!

Code available at: https://github.com/JohnMulligan/Special-Collections/