Navigable Slide Index
- Introduction
- Talk content
- Substructure search and similarity search
- Substructure searching
- Fingerprints
- Fingerprint filter example
- Fingerprint captures chemical aspects
- Similarity searching
- Fingerprint similarity example
- Substructure searching
- Query input
- R-Group searching
- Data structures for structures
- Subgraph isomorphism
- Ullmann algorithm
- VF2 algorithm
- Molecular graphs
- Tuning example for VF2
- Isomorphism considerations: aromaticity
- Isomorphism considerations: hydrogens
- Isomorphism considerations: stereoisomers
- Fingerprinting
- 2D fingerprint types
- Structural keys versus Hashing
- PubChem
- Hashed fingerprint example
- Ring perception
- SSSR versus Exhaustive
- SSSR versus Exhaustive pros and cons
- Substructure with screening step
- Similarity searching
- Similarity example (1)
- Similarity calculation
- Similarity example (2)
- Alternative calculations
- Similarity search with boundaries
- Similarity searching algorithm
- What is a good similarity score?
- What similarity method to choose?
- References
Topics Covered
- Searching chemical compounds in data sets
- Querying for a fragment (substructure)
- Similarity to the query input
- Algorithms and performance aspects
- Concept of fingerprinting
- Use of fingerprinting for searching data sets
Talk Citation
Rijnbeek, M. (2011, May 31). Substructure searching, similarity calculations and fingerprints [Video file]. In The Biomedical & Life Sciences Collection, Henry Stewart Talks. Retrieved October 31, 2025, from https://doi.org/10.69645/TYJY9976.
Publication History
- Published on May 31, 2011
Financial Disclosures
- Mr. Mark Rijnbeek has not informed HSTalks of any commercial/financial relationship that it is appropriate to disclose.
Transcript
0:00

Hello and welcome to this talk. My name is Mark Rijnbeek. I work at the Cheminformatics Research Group at the European Bioinformatics Institute in Hinxton. That's near Cambridge, in the UK. By background, I'm a database and Java developer. I've contributed to a number of cheminformatics projects, such as JChemPaint, the molecular editor, and OrChem, which is a database search engine for Oracle. This talk will be about substructure searching in chemistry databases, about similarity calculations and similarity searching, and about how to create fingerprints and how you can use them to search databases.

0:40

Here, you can see the content of the talk. There are four main parts. By way of introduction, I will go over the various concepts that will be detailed later on: I'll briefly introduce substructure searching, tell you what fingerprinting is, and point out what similarity searching entails. Then we'll look at substructure searching in a bit more detail. Thirdly, we'll see how fingerprinting can help substructure searching, and from that we'll jump to similarity searching, which can also make use of fingerprints.

1:14

First, we have to establish what it is we're going to talk about, really. Let's assume we have a dataset of chemical structures, molecules or compounds in some sort of a dataset. Realistically, it would be a database, but you could also store it in file format. It doesn't really matter. There will be a set of molecules, and we want to search those compounds.

We can do a substructure search. The substructure search will yield all the molecules that contain the specified query structure. The user draws or formulates a query in whatever format, and then the cheminformatics software will translate that into something it can search with. It will scan the database and try to find any compound that has that particular structure as a substructure, so it will look for the superstructures of the query.
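
To make the idea of scanning a dataset for superstructures concrete, here is a minimal sketch in Python. It is not the method used by any particular cheminformatics package: the toy molecule representation (element symbols plus undirected bonds), the names `is_substructure` and `substructure_search`, and the three example compounds are all assumptions made for illustration. Bond orders, hydrogens, aromaticity and stereochemistry are deliberately ignored, and the matching is a naive backtracking search rather than a tuned algorithm.

```python
from typing import Dict, List, Set, Tuple

# Toy molecule: element symbol per atom index, plus undirected bonds.
# Bond orders, hydrogens, aromaticity and stereochemistry are ignored.
Molecule = Tuple[List[str], Set[Tuple[int, int]]]


def neighbours(mol: Molecule) -> Dict[int, Set[int]]:
    """Build an adjacency map (atom index -> bonded atom indices)."""
    atoms, bonds = mol
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def is_substructure(query: Molecule, target: Molecule) -> bool:
    """True if every query atom and bond can be mapped into the target."""
    q_atoms, t_atoms = query[0], target[0]
    q_adj, t_adj = neighbours(query), neighbours(target)

    def extend(mapping: Dict[int, int]) -> bool:
        if len(mapping) == len(q_atoms):
            return True  # all query atoms placed
        q = len(mapping)  # query atoms are mapped in index order
        for t in range(len(t_atoms)):
            if t in mapping.values() or q_atoms[q] != t_atoms[t]:
                continue
            # Each already-mapped query neighbour must also be bonded in the target.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]  # backtrack
        return False

    return extend({})


def substructure_search(query: Molecule, dataset: Dict[str, Molecule]) -> List[str]:
    """Scan the whole dataset and keep the names of the superstructures."""
    return [name for name, mol in dataset.items() if is_substructure(query, mol)]


# Hypothetical three-compound dataset (heavy atoms only).
DATASET: Dict[str, Molecule] = {
    "ethanol": (["C", "C", "O"], {(0, 1), (1, 2)}),
    "propane": (["C", "C", "C"], {(0, 1), (1, 2)}),
    "acetic acid": (["C", "C", "O", "O"], {(0, 1), (1, 2), (1, 3)}),
}
QUERY: Molecule = (["C", "O"], {(0, 1)})  # a carbon bonded to an oxygen

hits = substructure_search(QUERY, DATASET)
print(hits)  # ['ethanol', 'acetic acid']
```

The answer for each compound is a plain yes or no: propane has no oxygen, so it cannot contain the query, while the other two do.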
                  
A similarity search, on the other hand, gives molecules similar to the query, ranked by their similarity. Similarity is a bit of a fuzzy concept, as we'll see. Substructure searching is more formal: a structure either is a substructure or it isn't, whereas whether or not it's similar is a bit of a fuzzy topic. You have to give it a score. You can say it's very similar, let's say 80% similar, or it's very dissimilar, only 20% similar, but we have to come to terms with what those percentages really mean.
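
As a rough illustration of where scores such as 80% or 20% can come from, the sketch below ranks a small dataset by similarity to a query. The talk has not named a similarity measure at this point, so the Tanimoto coefficient used here is an assumption (it is one common choice for fingerprint comparison). The fingerprints are hypothetical sets of "on" bit positions, and the names `tanimoto`, `similarity_search`, `QUERY_FP` and `FINGERPRINTS` are made up for this example.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # positions of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared bits divided by the bits set in either fingerprint."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch; others are possible
    return len(a & b) / len(a | b)


def similarity_search(
    query: Fingerprint,
    dataset: Dict[str, Fingerprint],
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Score every compound against the query and rank by similarity."""
    scored = [(name, tanimoto(query, fp)) for name, fp in dataset.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints, chosen so the scores mirror the 80% / 20% example.
QUERY_FP: Fingerprint = frozenset({1, 2, 3, 4})
FINGERPRINTS: Dict[str, Fingerprint] = {
    "distant": frozenset({1, 5}),            # 1 shared bit of 5 in total
    "close": frozenset({1, 2, 3, 4, 5}),     # 4 shared bits of 5 in total
    "identical": frozenset({1, 2, 3, 4}),    # all bits shared
}

ranked = similarity_search(QUERY_FP, FINGERPRINTS)
for name, score in ranked:
    print(f"{name}: {score:.0%}")
```

Unlike the substructure case, every compound gets a score rather than a yes or no, and the optional `threshold` argument is where a judgment about "similar enough" has to be made.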