Abstract
A total of 200 properties related to structural characteristics were used to represent the structures of 400 HA proteins of influenza virus, which served as training samples. Recognition models for HA proteins of avian influenza virus (AIV) were developed using support vector machine (SVM) and linear discriminant analysis (LDA). With LDA, the identification accuracy (R_ia) is 99.8% for the training samples and 99.5% by leave-one-out cross validation. With SVM, R_ia is 99.8% for the training samples and 99.3% by leave-one-out cross validation. A further 200 HA proteins of influenza virus were used to validate the external predictive power of the resulting models. The external R_ia is 95.5% by LDA and 96.5% by SVM, which shows that HA proteins of AIVs are well recognized by both SVM and LDA, and that SVM performs better than LDA on the external set.
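The abstract reports identification accuracy on training samples, under leave-one-out cross validation, and on an external set. The sketch below illustrates how such figures could be computed, assuming identification accuracy is simply the percentage of correctly classified samples (the abstract does not define it). The nearest-centroid classifier is a stand-in chosen to keep the example dependency-free; it is not the LDA or SVM model of the paper. The function names and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)
Predictor = Callable[[Sequence[float]], int]


def identification_accuracy(true_labels: Sequence[int],
                            predicted_labels: Sequence[int]) -> float:
    """Percentage of samples whose predicted label equals the true label
    (assumed definition of identification accuracy)."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


def nearest_centroid_fit(train: List[Sample]) -> Predictor:
    """Stand-in classifier (NOT the paper's LDA or SVM):
    store one mean vector per class, predict the closest one."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x: Sequence[float]) -> int:
        # Squared Euclidean distance to each class centroid.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def leave_one_out_accuracy(samples: List[Sample],
                           fit: Callable[[List[Sample]], Predictor]) -> float:
    """Hold out each sample once, train on the rest, predict the held-out one."""
    predictions = []
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        predictions.append(fit(train)(samples[i][0]))
    return identification_accuracy([y for _, y in samples], predictions)


# Hypothetical two-feature toy data; the paper uses 200 properties per protein.
toy = [
    ([0.0, 0.1], 0), ([0.2, 0.0], 0), ([0.1, 0.3], 0),
    ([1.0, 0.9], 1), ([0.9, 1.1], 1), ([1.2, 1.0], 1),
]
loo = leave_one_out_accuracy(toy, nearest_centroid_fit)

# Under the assumed definition, 191 correct out of 200 external samples
# corresponds to an accuracy of 95.5%.
external = identification_accuracy([1] * 200, [1] * 191 + [0] * 9)
```

Any classifier with the same fit-then-predict shape could be passed to `leave_one_out_accuracy` in place of the stand-in, which is why the fitting function is a parameter rather than hard-coded.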
Additional information
Supported by Foundations of National High Technology (863) Programme (Grant No. 2006AA02Z312), Innovative Group Programme for Graduates of Chongqing University, Science and Innovation Fund (Grant No. 200711C1A0010260), National 111 Programme Introducing Talents of Discipline to Universities (Grant No. 0507111106), Chongqing Municipality Basic and Applied Fundamental Science Fund (Grant No. 01-3-6), National Chunhui Project Foundation (Grant No. 99-4-4+3-7), State Key Laboratory of Chemo/Biosensing and Chemometrics Fund (Grant No. 2005012), Fok-Yingtung Educational Foundation (Grant No. 98-7-6)
Cite this article
Liang, G., Chen, Z., Yang, S. et al. Recognition for avian influenza virus proteins based on support vector machine and linear discriminant analysis. Sci. China Ser. B-Chem. 51, 166–170 (2008). https://doi.org/10.1007/s11426-008-0006-7
DOI: https://doi.org/10.1007/s11426-008-0006-7