Slide Index
- Introduction
- Talk content
- Substructure search and similarity search
- Substructure searching
- Fingerprints
- Fingerprint filter example
- Fingerprint captures chemical aspects
- Similarity searching
- Fingerprint similarity example
- Substructure searching
- Query input
- R-Group searching
- Data structures for structures
- Subgraph isomorphism
- Ullmann algorithm
- VF2 algorithm
- Molecular graphs
- Tuning example for VF2
- Isomorphism considerations: aromaticity
- Isomorphism considerations: hydrogens
- Isomorphism considerations: stereoisomers
- Fingerprinting
- 2D fingerprint types
- Structural keys versus Hashing
- PubChem
- Hashed fingerprint example
- Ring perception
- SSSR versus Exhaustive
- SSSR versus Exhaustive pros and cons
- Substructure with screening step
- Similarity searching
- Similarity example (1)
- Similarity calculation
- Similarity example (2)
- Alternative calculations
- Similarity search with boundaries
- Similarity searching algorithm
- What is a good similarity score?
- What similarity method to choose?
- References
Topics Covered
- Searching chemical compounds in data sets
- Querying for a fragment (substructure)
- Similarity to the query input
- Algorithms and performance aspects
- Concept of fingerprinting
- Use of fingerprinting for searching data sets
Talk Citation
Rijnbeek, M. (2011, May 31). Substructure searching, similarity calculations and fingerprints [Video file]. In The Biomedical & Life Sciences Collection, Henry Stewart Talks. Retrieved October 16, 2024, from https://doi.org/10.69645/TYJY9976.
Financial Disclosures
- Mr. Mark Rijnbeek has not informed HSTalks of any commercial/financial relationship that it is appropriate to disclose.
Transcript
0:00
Hello and welcome to this talk. My name is Mark Rijnbeek. I work at the Cheminformatics Research Group at the European Bioinformatics Institute in Hinxton, near Cambridge in the UK. My background is in database and Java development. I've contributed to a number of cheminformatics projects, such as the molecular editor JChemPaint and OrChem, which is a database search engine for Oracle. This talk will be about substructure searching in chemistry databases, about similarity calculations and similarity searching, and about how to create fingerprints and how you can use them to search databases.
0:40
Here you can see the content of the talk. There are four main parts. First, as an introduction, I will present the concepts that are detailed later on: I'll briefly introduce substructure searching, tell you what fingerprinting is, and point out what similarity searching entails. Second, we'll look at substructure searching in more detail. Third, we'll see how fingerprinting can help substructure searching, and from there we'll move on to similarity searching, which can also make use of fingerprints.
1:14
First, we have to establish what it is we're really going to talk about. Let's assume we have a dataset of chemical structures: molecules or compounds. Realistically it would be a database, but you could also store it in a file format; it doesn't really matter. There will be a set of molecules, and we want to search those compounds. We can do a substructure search, which yields all the molecules that contain the specified query structure. The user draws or formalizes a query in whatever format, and the cheminformatics software translates that into something it can search with. It then scans the database and tries to find every compound that contains that particular structure as a substructure; in other words, it looks for the superstructures of the query.
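To make the "find the superstructures" idea concrete, here is a minimal Python sketch. It assumes a hypothetical `Molecule` class that stores only element symbols and bonds between heavy atoms (no bond orders, charges, hydrogens or aromaticity). The matcher is a plain backtracking search for a mapping of query atoms onto distinct target atoms; it is not one of the optimized algorithms (such as Ullmann or VF2) that production systems use, and all names here are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Hypothetical minimal molecule: element symbols plus undirected bonds.

    Bond orders, charges, aromaticity and hydrogens are ignored in this sketch.
    """
    name: str
    atoms: list[str]
    bonds: list[tuple[int, int]]
    adjacency: dict[int, set[int]] = field(init=False)

    def __post_init__(self) -> None:
        # Build a neighbour lookup from the bond list.
        self.adjacency = {i: set() for i in range(len(self.atoms))}
        for a, b in self.bonds:
            self.adjacency[a].add(b)
            self.adjacency[b].add(a)


def is_substructure(query: Molecule, target: Molecule) -> bool:
    """True if each query atom maps to a distinct target atom of the same
    element such that every query bond maps onto a target bond."""
    mapping: dict[int, int] = {}  # query atom index -> target atom index

    def extend(q: int) -> bool:
        if q == len(query.atoms):
            return True  # every query atom has been mapped consistently
        for t in range(len(target.atoms)):
            if t in mapping.values() or target.atoms[t] != query.atoms[q]:
                continue
            # Every already-mapped neighbour of q must be bonded to t.
            if all(mapping[n] in target.adjacency[t]
                   for n in query.adjacency[q] if n in mapping):
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]  # backtrack and try the next candidate
        return False

    return extend(0)


def substructure_search(query: Molecule, database: list[Molecule]) -> list[str]:
    """Scan the whole database and return the names of all superstructures."""
    return [mol.name for mol in database if is_substructure(query, mol)]


if __name__ == "__main__":
    query = Molecule("C-O", ["C", "O"], [(0, 1)])
    database = [
        Molecule("ethanol", ["C", "C", "O"], [(0, 1), (1, 2)]),
        Molecule("propane", ["C", "C", "C"], [(0, 1), (1, 2)]),
        Molecule("dimethyl ether", ["C", "O", "C"], [(0, 1), (1, 2)]),
    ]
    print(substructure_search(query, database))  # ['ethanol', 'dimethyl ether']
```

Note that every molecule in the database goes through the full atom-by-atom match here, which is the cost that a fingerprint-based screening step is meant to reduce.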
A similarity search, on the other hand, returns molecules that are similar to the query, ranked by their similarity. Similarity is a bit of a fuzzy concept, as we'll see. Substructure searching is more formal: a molecule either contains the substructure or it doesn't. Whether or not a molecule is similar is much fuzzier, so you have to give it a score. You can say it's very similar, say 80% similar, or that it's very dissimilar, only 20% similar, but we have to come to terms with what those percentages really mean.
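To show what a score like "80% similar" can look like in practice, here is a minimal Python sketch that ranks molecules by the Tanimoto coefficient between binary fingerprints. The choice of Tanimoto (one common measure among several), the representation of a fingerprint as a set of "on" bit positions, and the toy bit sets are all assumptions for illustration; they are not real fingerprints of real molecules.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient: bits set in both / bits set in either."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints: treat as no evidence of similarity
    return len(fp_a & fp_b) / union


def similarity_search(query_fp: set[int],
                      database: dict[str, set[int]],
                      threshold: float = 0.0) -> list[tuple[str, float]]:
    """Score every entry against the query, drop scores below the threshold,
    and return (name, score) pairs with the most similar first."""
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    query_fp = {1, 2, 3, 4}  # hypothetical fingerprint bits
    database = {
        "distant": {1, 5},        # 1 shared bit of 5 in the union
        "identical": {1, 2, 3, 4},
        "close": {1, 2, 3, 4, 5},  # 4 shared bits of 5 in the union
    }
    print(similarity_search(query_fp, database))
    # [('identical', 1.0), ('close', 0.8), ('distant', 0.2)]
    print(similarity_search(query_fp, database, threshold=0.5))
    # [('identical', 1.0), ('close', 0.8)]
```

The threshold parameter is where the fuzziness shows up: the numbers are well defined, but whether 0.8 counts as "similar enough" is a judgement the user has to make.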