DE NOVO TRANSCRIPTOME ANALYSIS OF MULBERRY (MORUS L.) UNDER DROUGHT STRESS USING RNA-SEQ TECHNOLOGY

© 2014 Heng Wang^a, Wei Tong^a, Li Feng^a, Qian Jiao^a, Li Long^a,b, Rongjun Fang^a,c, and Weiguo Zhao^a,b,d,#

^a School of Biology and Chemical Engineering, Jiangsu University of Science and Technology, Zhenjiang Jiangsu, 212018 PR China
^b Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang Jiangsu, 212018 PR China
^c School of Life Sciences, Nanjing University, Nanjing Jiangsu, 210093 PR China
^d South Jiangsu Sericultural Research Institute, Liyang Jiangsu, 213300 PR China

Received January 9, 2014; in final form, February 3, 2014

Large-scale RNA sequencing (RNA-seq) of mulberry (Morus L.) was carried out on two samples, one grown under regular conditions and one under drought stress. De novo assembly was performed, and a total of 54736 contigs, including the scaffolded regions, were obtained from the reads. We identified 1051 genes that were significantly differentially expressed between the two samples. Using Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway mapping, 10110 GO terms and 247 pathways were assigned and then analyzed. The thousands of SSR markers produced in this study will enable the construction of genetic linkage maps and gene-based association studies. Seven unique genes showing different expression levels in the control and drought stress groups were subsequently analyzed and identified by real-time PCR. Because whole-genome information for mulberry is lacking, the transcriptome and de novo analysis of the two samples will provide important information for later research and support the genetic breeding of mulberry.

Keywords: mulberry, drought stress, RNA-seq, de novo assembly, transcriptome, gene expression

DOI: 10.7868/S0132342314040034

INTRODUCTION

Mulberry (Morus L.), a perennial tree or shrub, is an important economic plant, used not only for sericulture as the sole food plant of the domesticated silkworm (Bombyx mori), but also for a variety of other purposes such as the production of edible fruits or useful timber [1]. The growth and productivity of mulberry are adversely affected by abiotic and biotic stresses [2]. However, plants have various response and defense systems at the molecular, cellular, and physiological levels in order to survive. Genomics endeavors in mulberry research have been undertaken with the characterization of the mulberry chloroplast genome [3] and the generation of expressed sequence tags (ESTs) from native and water-stressed leaves [4, 5]. However, for further progress, new gene targets need to be identified, which requires extensive genomics studies. A genome-wide transcription analysis in response to drought is essential to provide effective genetic engineering strategies to improve stress tolerance in mulberry and other crop plants. Investigations of the transcriptome using different approaches are gradually leading to a better understanding of the molecular mechanisms of drought tolerance.

1 The first two authors contributed equally.
# Corresponding author (e-mail: wgzsri@126.com).

Recent advances in high-throughput sequencing, also called next-generation sequencing (NGS), including the Roche GS FLX [6], ABI-SOLiD [7], Illumina Solexa [8], and Helicos [9] technologies, have provided a fast, cost-effective, and reliable approach to generating large expression datasets for functional genomic analysis, which is especially suitable for non-model species with unsequenced genomes. High-throughput sequencing provides a more thorough qualitative and quantitative description of gene expression than microarray-based assays [10–13]. RNA-seq, a revolutionary advance in genome-scale sequencing, is a more comprehensive and efficient way to measure transcriptome composition, obtain RNA expression patterns, and discover new exons and genes [14–19]. It avoids laborious cloning steps, and its higher sequencing depth adds further to its potentially superior accuracy and precision compared with previous methods [12, 19]. In addition, it is very sensitive, allowing the detection of low-abundance transcripts. Transcriptome studies on yeast, Arabidopsis thaliana, mouse, and human cells have demonstrated that this approach is well suited for surveying the complexity of transcription in eukaryotes [20–26]. Since RNA-seq is not limited to detecting transcripts that correspond to existing genomic sequences, it is particularly attractive for non-model organisms with genomic sequences that are yet to be determined [27–30].

Table 1. Clean summary and contig measurements after assembly

Clean summary                               Treatment    Control
Number of raw reads                         20121904     25320773
Number of discarded reads                   224089       222270
Number of effective reads after trimming    19897815     25098503
Effective reads ratio                       98.89%       99.12%

Contig measurement    Including scaffolded    Excluding scaffolded
Total reads           45678067                44505649
Contigs               54736                   62646
Average/bp            835                     710
Maximum/bp            12174                   12174
Minimum/bp            200                     1
N75(50.25)            4564                    4866

In this report, to gain a deeper understanding of the mechanisms of drought tolerance in mulberry, we conducted a transcriptome analysis of gene expression under a drought model with Solexa RNA-seq. The analyzed transcriptome datasets will serve as a public information platform for gene expression, genomics, and functional genomics in mulberry, providing reference sequences and information for follow-up studies and laying the foundation for molecular breeding in mulberry and other plants.

RESULTS

Sequencing and de novo Assembly

Qualified RNA and libraries were obtained before sequencing. For a global view of the mulberry transcriptome after drought stress, we used the Illumina HiSeq 2000 to perform high-throughput sequencing of two libraries (control and treatment). De novo assembly was performed with the Scaffolding Contig Algorithm of CLC Genomics Workbench (version 5.5) using the following parameters: word size = 45 (the minimum unit, equivalent to the k-mer of other assembly algorithms) and minimum contig length >200. Assembling the pooled reads allows more complete contigs to be obtained, which gives more accurate and comprehensive calculations of expression levels.

A total of 54736 contigs, including the scaffolded regions, were obtained from the reads, with an average length of 835 bp, a maximum length of 12174 bp, and a GC content of 40.4%. The size distribution of these contigs is displayed in Fig. 1. Cleaning statistics for the raw reads were computed with fastx. The clean summary and the contig measurements are shown in Table 1. After trimming, 98.89% and 99.12% of the raw reads in the treatment and control groups, respectively, were effective.
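The effective-read ratios follow directly from the raw and discarded read counts in Table 1. A minimal sketch of that arithmetic is given here; the helper name `effective_ratio` is illustrative only and is not part of fastx or of the authors' pipeline.

```python
# Read counts taken from Table 1 (treatment and control libraries).
READ_COUNTS = {
    "treatment": {"raw": 20121904, "discarded": 224089},
    "control": {"raw": 25320773, "discarded": 222270},
}


def effective_ratio(raw: int, discarded: int) -> float:
    """Return the percentage of raw reads that remain after trimming."""
    effective = raw - discarded
    return 100.0 * effective / raw


for library, counts in READ_COUNTS.items():
    effective = counts["raw"] - counts["discarded"]
    ratio = effective_ratio(counts["raw"], counts["discarded"])
    print(f"{library}: {effective} effective reads ({ratio:.2f}%)")
```

Run on the Table 1 counts, this reproduces the 19897815 and 25098503 effective reads and the 98.89% and 99.12% ratios reported for the two libraries.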

Database Annotation and Functional Classification

Database annotation. The nucleotide sequences of the scaffolded contigs were translated and compared against the UniProt database (May 2012) using BLASTX. UniProt is a comprehensive, high-quality, freely accessible database of protein sequences and functional information, with many entries derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature. Hits were screened with an E-value of <1e-5 and a protein similarity of >30% to produce the UniProt annotations. For each hit, the result gives the homologous protein ID, length, annotation information, and match length. Of the 54736 total contigs, 27013 were annotated, an annotation rate of about 50%. The species distribution of the annotated proteins shows that mulberry proteins are mostly homologous with those of Vitis vinifera, Ricinus communis, and Populus trichocarpa. This will allow numerous homology-based and comparative studies of these plants in the future.
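The screening step can be sketched as a simple filter over BLASTX hits. The record layout (`contig_id`, `protein_id`, `evalue`, `similarity`), the example hits, and the choice to keep the lowest E-value hit per contig are all assumptions made for illustration; they are not the authors' actual output format, data, or selection rule. Only the two thresholds and the contig counts come from the text.

```python
from dataclasses import dataclass

# Thresholds stated in the text: E-value < 1e-5 and protein similarity > 30%.
MAX_EVALUE = 1e-5
MIN_SIMILARITY = 30.0


@dataclass
class Hit:
    """One BLASTX hit (hypothetical record layout)."""
    contig_id: str
    protein_id: str
    evalue: float
    similarity: float  # percent


def passes_screen(hit: Hit) -> bool:
    """Apply the E-value and similarity cutoffs from the text."""
    return hit.evalue < MAX_EVALUE and hit.similarity > MIN_SIMILARITY


def annotate(hits):
    """Keep, per contig, the passing hit with the lowest E-value (an assumption)."""
    best = {}
    for hit in hits:
        if not passes_screen(hit):
            continue
        current = best.get(hit.contig_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.contig_id] = hit
    return best


# Made-up hits, for illustration only.
hits = [
    Hit("contig_1", "P_A", 1e-50, 82.0),
    Hit("contig_1", "P_B", 1e-20, 65.0),
    Hit("contig_2", "P_C", 1e-3, 90.0),   # rejected: E-value too high
    Hit("contig_3", "P_D", 1e-30, 25.0),  # rejected: similarity too low
]
annotated = annotate(hits)
print(sorted(annotated))

# Annotation rate from the contig counts reported in the text.
annotation_rate = 100.0 * 27013 / 54736
print(f"annotation rate: {annotation_rate:.1f}%")
```

With the reported counts, the computed rate is a little under one half, in line with the stated figure of about 50%.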

Table 2. Summary of SSR searching results

SSR type                 Number
Mixed SSR                4,282
Single SSR               16,411
Di-nucleotide SSR        8,247
Tri-nucleotide SSR       3,833
Tetra-nucleotide SSR     342
Penta-nucleotide SSR     96
Hexa-nucleotide SSR      74
Total SSR                33,285

GO functional classification. To functionally categorize the contigs, Gene Ontology (GO) terms were assigned to each assembled contig. GO is an international standard gene functional classification system that offers a dynamically updated controlled vocabulary and strictly defined concepts to comprehensively describe the properties of genes and their products in any organism. GO has three ontologies: molecular function, cellular component, and biological process [31]. GO functional classification was carried out based on the UniProt annotation results. In total, 10110 annotations were assigned to GO classes with functional terms. Of these, assignments to biological process made up the majority (6416, 63.46% of the total), followed by cellular component (2037, 20.15% of the total) and molecular function (1657, 16.39% of the total). For each sequence, the specific annotated GO terms provide a broad overview of the groups of genes cataloged in the transcriptome. GO functions are arranged in a tree structure. Generally, deeper levels describe more detailed features, with correspondingly increased redundancy. Because of this, functions at the second level (level 2) are the most widely used at present.
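The shares of the three ontologies follow from the annotation counts given in the text. A minimal sketch of that calculation is shown here; the function name `go_shares` is illustrative and not taken from any GO tool.

```python
# Annotation counts per ontology, as reported in the text.
GO_COUNTS = {
    "biological process": 6416,
    "cellular component": 2037,
    "molecular function": 1657,
}


def go_shares(counts):
    """Return each category's share of all annotations, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


total_annotations = sum(GO_COUNTS.values())
shares = go_shares(GO_COUNTS)

print(f"total annotations: {total_annotations}")
for name, pct in shares.items():
    print(f"{name}: {GO_COUNTS[name]} ({pct:.2f}%)")
```

The three counts sum to the 10110 annotations reported, and the computed shares match the stated 63.46%, 20.15%, and 16.39%.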

We also obtained the number of sequences distributed across the various levels of the three GO categories. This gives the level of each GO term, its category, the number of contigs, and the genes annotated with it. The GO functional classification results were tallied, and the 10110 terms were summarized into the three main GO categories and 54 sub-categories. The distribution of the level 2 sub-categories in each category is shown in Fig. 2. The assigned functions of the contigs covered a broad range of GO categories. Under the biological process category, metabolic process (11027 contigs, 31.70%) and cellular process (9888 contigs,
