Reasoning about unstructured data de-identification
Abstract
We frame the problem of de-identifying unstructured text within the broader landscape of privacy-enhancing technologies. We then examine what background knowledge can be gained from the stylistic information in a written document alone, and how research on authorship attribution and author profiling can improve our understanding of the inferences that can be made from an otherwise de-identified text. Finally, we provide a risk score for estimating the likelihood that a message will be attributed to a particular author within a dataset using only author profiling tools.
The full article is available to subscribers to the journal.
Authors' Biographies
Patricia Thaine is a Computer Science PhD Candidate at the University of Toronto and a Postgraduate Affiliate at the Vector Institute; she is doing research on privacy-preserving natural language processing, with a focus on applied cryptography. Her research interests also include computational methods for lost language decipherment. She is a recipient of the Natural Sciences and Engineering Research Council of Canada (NSERC) Postgraduate Scholarship, the Royal Bank of Canada (RBC) Graduate Fellowship, the Beatrice ‘Trixie’ Worsley Graduate Scholarship in Computer Science and the Ontario Graduate Scholarship. She has eight years of research and software development experience, including at the McGill Language Development Lab, the University of Toronto’s Computational Linguistics Lab, the University of Toronto’s Department of Linguistics and the Public Health Agency of Canada. Patricia is the co-founder and Chief Executive Officer of Private AI, a Toronto- and Berlin-based start-up creating a suite of privacy tools that make it easy to comply with data-protection regulations, mitigate cybersecurity threats and maintain customer trust. She is also a member of the Board of Directors of Equity Showcase, one of Canada’s oldest not-for-profit charitable organisations.
Gerald Penn is a Professor of Computer Science at the University of Toronto, where he studies spoken language processing and computational linguistics. He has over 100 publications, with the top one accruing 1,581 citations. He is a senior member of the Institute of Electrical and Electronics Engineers (IEEE) and Association for the Advancement of Artificial Intelligence (AAAI) and a past recipient of the Ontario Early Researcher Award. His lab revolutionised speech recognition with its work on neural networks, which received the IEEE Signal Processing Society’s Best Paper Award. He has led numerous research projects, including ones funded by Avaya, Bell Canada, the Connaught Fund, Microsoft, NSERC, the German Ministry for Training and Research, SMART Technologies, the US Army and the US Office of the Director of National Intelligence. Gerald has also worked at Bell Labs and the National Aeronautics and Space Administration (NASA).