Substructure searching, similarity calculations and fingerprints

Rijnbeek, Mark

Navigable Slide Index

Introduction
Talk content
Substructure search and similarity search
Substructure searching
Fingerprints
Fingerprint filter example
Fingerprint captures chemical aspects
Similarity searching
Fingerprint similarity example
Substructure searching
Query input
R-Group searching
Data structures for structures
Subgraph isomorphism
Ullmann algorithm
VF2 algorithm
Molecular graphs
Tuning example for VF2
Isomorphism considerations: aromaticity
Isomorphism considerations: hydrogens
Isomorphism considerations: stereoisomers
Fingerprinting
2D fingerprint types
Structural keys versus Hashing
PubChem
Hashed fingerprint example
Ring perception
SSSR versus Exhaustive
SSSR versus Exhaustive pros and cons
Substructure with screening step
Similarity searching
Similarity example (1)
Similarity calculation
Similarity example (2)
Alternative calculations
Similarity search with boundaries
Similarity searching algorithm
What is a good similarity score?
What similarity method to choose?
References

Topics Covered

Searching chemical compounds in data sets
Querying for a fragment (substructure)
Similarity to the query input
Algorithms and performance aspects
Concept of fingerprinting
Use of fingerprinting for searching data sets

Links

Series:

Introduction to Cheminformatics

Categories:

Methods

Talk Citation

Rijnbeek, M. (2011, May 31). Substructure searching, similarity calculations and fingerprints [Video file]. In The Biomedical & Life Sciences Collection, Henry Stewart Talks. Retrieved July 27, 2025, from https://doi.org/10.69645/TYJY9976.

Publication History

Published on May 31, 2011

Financial Disclosures

Mr. Mark Rijnbeek has not informed HSTalks of any commercial/financial relationship that it is appropriate to disclose.

Mr. Mark Rijnbeek – European Bioinformatics Institute (EBI), UK

Published on May 31, 2011. Duration: 44 min.

Transcript

0:00

Hello and welcome to this talk. My name is Mark Rijnbeek. I work in the Cheminformatics Research Group at the European Bioinformatics Institute in Hinxton, near Cambridge in the UK. By background, I'm a database and Java developer. I've contributed to a number of cheminformatics projects, such as JChemPaint, the molecular editor, and OrChem, which is a database search engine for Oracle. This talk is about substructure searching in chemistry databases, about similarity calculations and similarity searching, and about how to create fingerprints and how you can use them to search databases.

0:40

Here, you can see the content of the talk. There are four main parts. First, as an introduction, I will go through the various concepts that will be detailed later on: substructure searching briefly, what fingerprinting is, and what similarity searching entails. Second, we'll look at substructure searching in more detail. Third, we'll see how fingerprinting can help substructure searching, and from that we'll jump to similarity searching, which can also make use of fingerprints.

1:14

First, we have to establish what it is we're going to talk about. Let's assume we have a set of chemical structures, molecules or compounds, in some sort of dataset. Realistically, it would be a database, but you could also store it in files; it doesn't really matter. There is a set of molecules, and we want to search those compounds. We can do a substructure search. A substructure search yields all the molecules that contain the specified query structure. The user draws or formulates a query in whatever format, and the cheminformatics software translates that into something it can search with. It scans the database and tries to find every compound that has that particular structure as a substructure, so it looks for the superstructures of the query. A similarity search, on the other hand, gives molecules similar to the query, ranked by their similarity. Similarity is a bit of a fuzzy concept, as we'll see. Substructure searching is more formal: it's either a substructure or it isn't, whereas whether or not something is similar is a fuzzy question. You have to give it a score. You can say it's very similar, let's say 80% similar, or it's very dissimilar, only 20% similar, but we have to come to terms with what those percentages really mean.
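
The contrast between the two kinds of search can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method used in the talk or in any particular toolkit: molecules are hypothetical hand-written graphs (heavy atoms only, bond orders ignored), the substructure test is a naive backtracking match, and the similarity score is a Tanimoto coefficient over a deliberately crude feature set (elements plus bonded element pairs). All names (`MOLECULES`, `is_substructure`, `features`, `tanimoto`) are made up for the example.

```python
# Toy molecules: (atom symbols, bonds as index pairs). Hydrogens and bond
# orders are ignored; these records are illustrative assumptions.
MOLECULES = {
    "ethanol":    (["C", "C", "O"], [(0, 1), (1, 2)]),
    "propanol":   (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)]),
    "propane":    (["C", "C", "C"], [(0, 1), (1, 2)]),
    "ethylamine": (["C", "C", "N"], [(0, 1), (1, 2)]),
}


def neighbours(n_atoms, bonds):
    """Adjacency sets for an undirected molecular graph."""
    adj = [set() for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def is_substructure(query, target):
    """True if every query atom and bond can be mapped into the target."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    q_adj = neighbours(len(q_atoms), q_bonds)
    t_adj = neighbours(len(t_atoms), t_bonds)

    def extend(mapping):
        # mapping[q] is the target atom chosen for query atom q
        q = len(mapping)
        if q == len(q_atoms):
            return True
        for t in range(len(t_atoms)):
            if t in mapping or t_atoms[t] != q_atoms[q]:
                continue
            # bonds to already-placed query atoms must exist in the target
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n < q):
                if extend(mapping + [t]):
                    return True
        return False

    return extend([])


def features(mol):
    """Crude feature set: element symbols plus bonded element pairs."""
    atoms, bonds = mol
    feats = set(atoms)
    for i, j in bonds:
        feats.add("-".join(sorted((atoms[i], atoms[j]))))
    return feats


def tanimoto(a, b):
    """Shared features divided by all features seen in either molecule."""
    fa, fb = features(a), features(b)
    return len(fa & fb) / len(fa | fb)


def substructure_search(query, dataset):
    """Yes/no answer per molecule: names of all superstructures."""
    return [name for name, mol in dataset.items() if is_substructure(query, mol)]


def similarity_search(query, dataset):
    """Every molecule gets a score; results are ranked, best first."""
    scored = [(name, tanimoto(query, mol)) for name, mol in dataset.items()]
    return sorted(scored, key=lambda pair: -pair[1])


if __name__ == "__main__":
    c_o_fragment = (["C", "O"], [(0, 1)])
    print(substructure_search(c_o_fragment, MOLECULES))
    for name, score in similarity_search(MOLECULES["ethanol"], MOLECULES):
        print(f"{name}: {score:.0%}")
```

With these toy features, ethanol and propanol end up with identical feature sets and therefore score as 100% similar, even though they are different molecules. That is one small way to see why a similarity percentage only means something relative to the features and the formula behind it.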
