Substructure searching, similarity calculations and fingerprints

Published on May 31, 2011   44 min

0:00
Hello and welcome to this talk. My name is Mark Rijnbeek. I work in the Cheminformatics Research Group at the European Bioinformatics Institute in Hinxton, near Cambridge in the UK. By background, I'm a database and Java developer. I've contributed to a number of cheminformatics projects, such as JChemPaint, a molecular editor, and OrChem, which is a database search engine for Oracle. This talk will be about substructure searching in chemistry databases, about similarity calculations and similarity searching, and about how to create fingerprints and how you can use them to search databases.
0:40
Here, you can see the contents of the talk. There are four main parts. First, as an introduction, I will go through the various concepts that will be detailed later on: introduce substructure searching briefly, tell you what fingerprinting is, and point out what similarity searching entails. Second, we'll look at substructure searching in a bit more detail. Thirdly, we'll see how fingerprinting can help substructure searching, and from that we'll jump to similarity searching, which can also make use of fingerprints.
1:14
First, we have to establish what it is we're going to talk about. Let's assume we have a dataset of chemical structures, that is, molecules or compounds. Realistically, it would be a database, but you could also store it in a file format. It doesn't really matter. There will be a set of molecules, and we want to search those compounds. We can do a substructure search. The substructure search will yield all the molecules that contain the specified query structure. The user draws or formulates a query in whatever format, and then the cheminformatics software will translate that into something it can search with. It will scan the database and try to find any compound that has that particular structure as a substructure. In other words, it will look for the superstructures of the query. A similarity search, on the other hand, gives molecules similar to the query, ranked by their similarity. Similarity is a bit of a fuzzy concept, as we'll see. Substructure searching is more formal: a molecule either contains the substructure or it doesn't, whereas whether or not it's similar is a matter of degree. You have to give it a score. You can say, well, it's very similar, let's say 80% similar, or it's very dissimilar, only 20% similar. But we have to come to terms with what those percentages really mean.
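
To make the contrast between the two kinds of search concrete, here is a minimal sketch in Python. It is not the method used in the talk or in OrChem. It assumes each molecule has already been reduced to a fingerprint, modelled here as a plain set of made-up feature labels, and it assumes the Tanimoto coefficient (shared features divided by total distinct features) as the similarity score, which is one common choice among several. The molecule names, feature labels and function names are all hypothetical. Note that the subset test is only a screen: a molecule that fails it cannot contain the query, but one that passes still needs a real atom-by-atom substructure match.

```python
# Toy fingerprints: each molecule is reduced to a set of feature labels.
# Molecule names and feature labels are invented for illustration only.
DATASET = {
    "mol_a": {"C-C", "C=O", "C-O", "ring6"},
    "mol_b": {"C-C", "C=O", "C-N"},
    "mol_c": {"C-C", "ring6"},
}

QUERY = {"C-C", "C=O"}


def passes_screen(query, target):
    """Yes/no screen: every query feature must also occur in the target.

    Passing is necessary but not sufficient for a true substructure match.
    """
    return query <= target


def tanimoto(a, b):
    """Assumed similarity score: shared features / all distinct features."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def screen_candidates(query, dataset):
    """Names of molecules that survive the substructure screen."""
    return sorted(name for name, fp in dataset.items() if passes_screen(query, fp))


def similarity_search(query, dataset):
    """All molecules with their score, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in dataset.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print("screen candidates:", screen_candidates(QUERY, DATASET))
    for name, score in similarity_search(QUERY, DATASET):
        print(f"{name}: {score:.0%} similar")
```

The screen gives a yes/no answer per molecule, while the similarity search returns every molecule with a score. Under this particular scoring, a score such as 50% simply means that half of the distinct features across the two fingerprints are shared.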
