Machine learning in the library: Developing an inter-departmental core solution to manage data
Abstract
The Oklahoma State University Archives identified the need for an updated, comprehensive inventory of its digital assets to guide the development of digital preservation priorities. Creating this inventory was complicated by sparse records, limited staffing, dependence on fading institutional memory and poor data management. A strategic planning process was launched to address these deficiencies. Machine learning (ML) was identified as a promising tool to minimise the labour-intensive work of sorting artefacts and identifying records that needed to be augmented, cleaned or eliminated from the collection. A pilot project was then implemented to explore how effectively ML could be used to curate a high-value archival collection. This paper describes the nature of ML, its promise and limitations for use in archives, and the outcomes of the pilot project. In particular, the pilot showed promising results in applying facial recognition techniques. Collaboration with interested colleagues in other departments suggests that ML can be applied widely to projects throughout the library.
Author's Biography
Patrice-Andre Prud’homme is the Director of Digital Curation at the Oklahoma State University Library, where he provides leadership and management in the areas of digital preservation, curation and discovery of digital resources. He manages the processing of digital materials and their associated metadata and experiments with machine learning to increase the visibility and relevance of digital collections for research and education.
Kay K. Bjornen is the Research Data Initiatives Librarian at the Oklahoma State University Library, where she teaches and consults on data management and data literacy. She is an analytical chemist by training, and her interest in data management began during her years as a corporate research manager, when she was responsible for organising and maintaining research and technical records.
Phillip Doehle is the Digital Services Librarian at Oklahoma State University. He coordinates the university’s Carpentries initiative, teaching introductory data-science skills to researchers. He has been actively involved in the Carpentries since 2015. Phillip holds a master’s degree in applied mathematics from Oklahoma State University.