GLORIA

GEOMAR Library Ocean Research Information Access

  • 1
    Electronic Resource
    Springer
    Journal of computer aided molecular design 8 (1994), S. 323-340 
    ISSN: 1573-4951
    Keywords: Atom pairs ; PLS ; SAMPLS ; Topological descriptors ; QSAR
    Source: Springer Online Journal Archives 1860-2000
    Topics: Chemistry and Pharmacology
    Notes: Summary: Trend vector analysis [Carhart, R.E. et al., J. Chem. Inf. Comput. Sci., 25 (1985) 64], in combination with topological descriptors such as atom pairs, has proved useful in drug discovery for ranking large collections of chemical compounds in order of predicted biological activity. The compounds with the highest predicted activities, upon being tested, often show a several-fold increase in the fraction of active compounds relative to a randomly selected set. A trend vector is simply the one-dimensional array of correlations between the biological activity of interest and a set of properties or ‘descriptors’ of compounds in a training set. This paper examines two methods for generalizing the trend vector to improve the predicted rank order. The trend matrix method finds the correlations between the residuals and the simultaneous occurrence of descriptors, which are stored in a two-dimensional analog of the trend vector. The SAMPLS method derives a linear model by partial least squares (PLS), using the ‘sample-based’ formulation of PLS [Bush, B.L. and Nachbar, R.B., J. Comput.-Aided Mol. Design, 7 (1993) 587] for efficiency in treating the large number of descriptors. PLS accumulates a predictive model as a sum of linear components. Expressed as a vector of prediction coefficients on properties, the first PLS component is proportional to the trend vector. Subsequent components adjust the model toward full least squares. For both methods the residuals decrease, while the risk of overfitting the training set increases. We therefore also describe statistical checks to prevent overfitting. These methods are applied to two data sets: a small homologous series of disubstituted piperidines, tested on the dopamine receptor, and a large set of diverse chemical structures, some of which are active at the muscarinic receptor. Each data set is split into a training set and a test set, and the activities in the test set are predicted from a fit on the training set. Both the trend matrix and the SAMPLS approach improve the predictions over the simple trend vector. The SAMPLS approach is superior to the trend matrix in that it requires much less storage and CPU time. It also provides a useful set of axes for visualizing properties of the compounds. We describe a randomization method for determining the optimum number of PLS components; for large training sets it is very much faster than leave-one-out cross-validation.
    Type of Medium: Electronic Resource
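
The abstract's definition of a trend vector (per-descriptor correlations with activity) and its remark that the first PLS component is proportional to it can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data, and the choices to score compounds by a dot product with mean-centered descriptors and to standardize descriptors before the PLS step are all assumptions made for illustration.

```python
import numpy as np

def trend_vector(X, y):
    """Pearson correlation of each descriptor column of X with activity y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Assumption: a constant descriptor carries no signal, so report 0 for it.
    return np.divide(num, denom, out=np.zeros_like(num, dtype=float), where=denom > 0)

def rank_by_trend(X_train, y_train, X_new):
    """Score new compounds against the trend vector; highest score first."""
    t = trend_vector(X_train, y_train)
    scores = (X_new - X_train.mean(axis=0)) @ t
    return np.argsort(-scores), scores

def first_pls_direction(X, y):
    """First PLS weight vector on standardized descriptors (no constant columns)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    w = Xs.T @ (y - y.mean())
    return w / np.linalg.norm(w)

# Hypothetical toy data: 40 compounds, 5 count-valued descriptors.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(40, 5)).astype(float)
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=40)

t = trend_vector(X, y)
order, scores = rank_by_trend(X, y, X)
w1 = first_pls_direction(X, y)
```

Under these assumptions, `w1` points in the same direction as `t`, which is the sense in which the first PLS component reproduces the trend vector; later components would be fitted to the residuals.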
  • 2
    Electronic Resource
    New York, NY : Wiley-Blackwell
    Proteins: Structure, Function, and Genetics 14 (1992), S. 16-28 
    ISSN: 0887-3585
    Keywords: BLAST3 ; protein sequence ; sequence motif ; protein sequence database ; Chemistry ; Biochemistry and Biotechnology
    Source: Wiley InterScience Backfile Collection 1832-2000
    Topics: Medicine
    Notes: Signature sequences are contiguous patterns of amino acids 10-50 residues long that are associated with a particular structure or function in proteins. These may be of three types (by our nomenclature): superfamily signatures, remnant homologies, and motifs. We have performed a systematic search through a database of protein sequences to automatically and preferentially find remnant homologies and motifs. This was accomplished in three steps: (1) We generated a nonredundant sequence database. (2) We used BLAST3 (Altschul and Lipman, Proc. Natl. Acad. Sci. U.S.A. 87:5509-5513, 1990) to generate local pairwise and triplet sequence alignments for every protein in the database vs. every other. (3) We selected “interesting” alignments and grouped them into clusters. We find that most of the clusters contain segments from proteins which share a common structure or function. Many of them correspond to signatures previously noted in the literature. We discuss three previously recognized motifs in detail (FAD/NAD-binding, ATP/GTP-binding, and cytochrome b5-like domains) to demonstrate how the alignments generated by our procedure are consistent with previous work and make structural and functional sense. We also discuss two signatures (for N-acetyltransferases and glycerol-phosphate binding) which to our knowledge have not been previously recognized. © 1992 Wiley-Liss, Inc.
    Additional Material: 6 Ill.
    Type of Medium: Electronic Resource