ISSN:
1573-4951
Keywords:
Similarity; Comparison; Database; Ring; Ring-cluster; Combinatorial
Source:
Springer Online Journal Archives 1860-2000
Topics:
Chemistry and Pharmacology
Notes:
Abstract: We present some new ideas for characterizing and comparing large chemical databases. The comparison of the contents of large databases is not trivial since it implies pairwise comparison of hundreds of thousands of compounds. We have developed methods for categorizing compounds into groups or series based on their ring-system content, using precalculated structure-based hashcodes. Two large databases can then be compared by simply comparing their hashcode tables. Furthermore, the number of distinct ring-system combinations can be used as an indicator of database diversity. We also present an independent technique for diversity assessment called the 'saturation diversity' approach. This method is based on picking as many mutually dissimilar compounds as possible from a database or a subset thereof. We show that both methods yield similar results. Since the two methods measure very different properties, this probably says more about the properties of the databases studied than about the methods.
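Illustration (not part of the original record): the two ideas in the abstract can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each compound already carries a set of integer ring-system hashcodes (the values here are made up), and it uses a greedy pass with Tanimoto similarity on feature sets as a stand-in for the 'saturation diversity' selection, whose actual procedure the abstract does not specify. All function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def combination_table(db: Dict[str, Iterable[int]]) -> Set[FrozenSet[int]]:
    """Collect the distinct ring-system combinations found in a database.

    Each compound maps to its (assumed, precalculated) ring-system hashcodes.
    """
    return {frozenset(hashes) for hashes in db.values()}


def compare_tables(a: Set[FrozenSet[int]], b: Set[FrozenSet[int]]) -> Dict[str, int]:
    """Compare two databases by set operations on their combination tables."""
    return {
        "shared": len(a & b),
        "only_first": len(a - b),
        "only_second": len(b - a),
    }


def tanimoto(x: FrozenSet[int], y: FrozenSet[int]) -> float:
    """Tanimoto similarity of two feature sets (assumed similarity measure)."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def saturation_pick(features: Dict[str, FrozenSet[int]], max_similarity: float) -> List[str]:
    """Greedily pick compounds that are mutually dissimilar.

    A compound is kept only if its similarity to every compound already
    picked is below max_similarity. Greedy selection is an assumption; it
    does not guarantee the largest possible set.
    """
    picked: List[str] = []
    for name in sorted(features):
        if all(tanimoto(features[name], features[p]) < max_similarity for p in picked):
            picked.append(name)
    return picked


# Hypothetical toy data: compound id -> ring-system hashcodes.
db_a = {"a1": [101, 202], "a2": [101], "a3": [202, 101]}
db_b = {"b1": [101], "b2": [303]}

table_a = combination_table(db_a)
table_b = combination_table(db_b)
overlap = compare_tables(table_a, table_b)

# Number of distinct ring-system combinations as a crude diversity indicator.
diversity_a = len(table_a)

toy_features = {
    "c1": frozenset({1, 2, 3}),
    "c2": frozenset({1, 2, 3, 4}),
    "c3": frozenset({7, 8}),
}
picked = saturation_pick(toy_features, max_similarity=0.5)
```

In this sketch the size of the picked set plays the role of the saturation diversity measure: a more diverse database lets more compounds pass the dissimilarity threshold before the selection saturates.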
Type of Medium:
Electronic Resource
URL:
http://dx.doi.org/10.1023/A:1007937308615