In:
PLOS Computational Biology, Public Library of Science (PLoS), Vol. 17, No. 9 (2021-09-22), p. e1009345
Abstract:
Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short- and long-term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, at the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models encodes sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences that are particularly informative for classification. As such, this novel approach can avoid the pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof of concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a Python package) and https://github.com/EESI/seq2att (a command-line tool).
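Illustrative sketch (not part of the catalogued record, and not the Read2Pheno implementation): the abstract describes two ideas that a few lines of code may make more concrete, namely an attention layer that weights positions within a read, and aggregation of read-level predictions into a sample-level prediction. The minimal, standard-library-only Python below assumes one-hot vectors as a stand-in for the convolutional and recurrent hidden states, a hand-picked context vector instead of a learned one, and simple averaging as the aggregation rule. All function names (`one_hot`, `attention_pool`, `aggregate_reads`) are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"


def one_hot(read):
    """Encode a DNA read as 4-dimensional one-hot vectors (unknown bases become all zeros)."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in read.upper()]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Score each position by its dot product with `context`, then return
    (attention weights, attention-weighted sum of the hidden states).

    In a trained model the hidden states would come from convolutional and
    recurrent layers and `context` would be learned; both are assumptions here.
    """
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, pooled


def aggregate_reads(read_probs):
    """Average per-read class probabilities into one sample-level distribution
    (averaging is an assumed aggregation rule chosen for illustration)."""
    n_reads = len(read_probs)
    n_classes = len(read_probs[0])
    return [sum(p[j] for p in read_probs) / n_reads for j in range(n_classes)]


# Example: a context vector that favours 'G' puts most attention on positions 2 and 3.
weights, pooled = attention_pool(one_hot("ACGGT"), context=[0.0, 0.0, 2.0, 0.0])
sample_level = aggregate_reads([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]])
```

The attention weights sum to one over the positions of a read, so they can be read as a per-nucleotide importance profile, which is the property the abstract relies on when it says the attention layer highlights informative regions.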
Type of Medium:
Online Resource
ISSN:
1553-7358
DOI:
10.1371/journal.pcbi.1009345
DOI:
10.1371/journal.pcbi.1009345.g001
DOI:
10.1371/journal.pcbi.1009345.g002
DOI:
10.1371/journal.pcbi.1009345.g003
DOI:
10.1371/journal.pcbi.1009345.g004
DOI:
10.1371/journal.pcbi.1009345.g005
DOI:
10.1371/journal.pcbi.1009345.g006
DOI:
10.1371/journal.pcbi.1009345.g007
DOI:
10.1371/journal.pcbi.1009345.t001
DOI:
10.1371/journal.pcbi.1009345.t002
DOI:
10.1371/journal.pcbi.1009345.t003
DOI:
10.1371/journal.pcbi.1009345.t004
DOI:
10.1371/journal.pcbi.1009345.s001
DOI:
10.1371/journal.pcbi.1009345.s002
DOI:
10.1371/journal.pcbi.1009345.s003
DOI:
10.1371/journal.pcbi.1009345.s004
DOI:
10.1371/journal.pcbi.1009345.s005
DOI:
10.1371/journal.pcbi.1009345.s006
DOI:
10.1371/journal.pcbi.1009345.s007
DOI:
10.1371/journal.pcbi.1009345.s008
DOI:
10.1371/journal.pcbi.1009345.s009
DOI:
10.1371/journal.pcbi.1009345.s010
DOI:
10.1371/journal.pcbi.1009345.s011
DOI:
10.1371/journal.pcbi.1009345.s012
DOI:
10.1371/journal.pcbi.1009345.s013
DOI:
10.1371/journal.pcbi.1009345.s014
DOI:
10.1371/journal.pcbi.1009345.s015
DOI:
10.1371/journal.pcbi.1009345.s016
DOI:
10.1371/journal.pcbi.1009345.s017
DOI:
10.1371/journal.pcbi.1009345.s018
DOI:
10.1371/journal.pcbi.1009345.s019
DOI:
10.1371/journal.pcbi.1009345.s020
DOI:
10.1371/journal.pcbi.1009345.s021
DOI:
10.1371/journal.pcbi.1009345.s022
DOI:
10.1371/journal.pcbi.1009345.s023
DOI:
10.1371/journal.pcbi.1009345.s024
DOI:
10.1371/journal.pcbi.1009345.s025
Language:
English
Publisher:
Public Library of Science (PLoS)
Publication Date:
2021
ZDB-ID:
2193340-6