This dataset accompanies the paper: Chen, J.*#, Huang, Y.*#, Brachi, B.*#, Yun, Q.*#, Zhang, W., Lu, W., Li, H., Li, W., Sun, X., Wang, G., He, J., Zhou, Z., Chen, K., Ji, Y., Shi, M., Sun, W., Yang, Y.*, Zhang, R.#, Abbott, R. J.*, & Sun, H.* (2019). Genome-wide analysis of Cushion willow provides insights into alpine plant divergence in a biodiversity hotspot. Nature Communications, 10(1), 5230. doi:10.1038/s41467-019-13128-y.

The dataset contains the genome assembly of the alpine species Salix brachista from the Tibetan Plateau. It includes DNA, RNA, and protein sequence files in FASTA format and the annotation file in GFF format.

Assembly level: draft genome at chromosome level
Genome representation: full genome
Reference genome: yes
Assembly method: SMARTdenovo 1.0; CANU 1.3
Sequencing and coverage: PacBio 125.0; Illumina HiSeq X Ten 43.0; Oxford Nanopore Technologies 74.0

Statistics of genome assembly:
Genome size (bp): 339,587,529
GC content: 34.15%
Number of chromosome sequences: 19
Number of organelle sequences: 2
Number of genome sequences: 30
Maximum genome sequence length (bp): 39,688,537
Minimum genome sequence length (bp): 57,080
Average genome sequence length (bp): 11,319,584
Genome sequence N50 (bp): 17,922,059
Genome sequence N90 (bp): 13,388,179

Annotation of whole genome assembly:
Protein: 30,209
tRNA: 784
rRNA: 118
ncRNA: 671

Please see the attachments for more details of the annotation. The tables in the Supplementary Information of the article are also included in this dataset; the list of tables is given in the attachments. The accession number of the genome assembly is GWHAAZH00000000 (https://bigd.big.ac.cn/gwh/Assembly/663/show).
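The N50 and N90 values in the record above are standard contiguity statistics: Nx is the length of the shortest sequence in the smallest set of longest sequences that together cover at least x% of the assembly. The sketch below shows how such values could be computed from a list of sequence lengths. The function name `nx` and the toy lengths are illustrative assumptions, not part of the dataset, and the record does not say which tool produced its statistics. The last lines only check that the reported average length agrees with the reported genome size and sequence count.

```python
def nx(lengths, percent):
    """Return the Nx statistic for a list of sequence lengths.

    Nx is the length L of the sequence at which the running total,
    taken from the longest sequence downwards, first reaches
    `percent` % of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding issues.
        if running * 100 >= percent * total:
            return length


# Toy example (hypothetical lengths, not taken from the dataset).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(nx(toy_lengths, 50))  # 70
print(nx(toy_lengths, 90))  # 30

# Consistency check on values reported in the record above.
genome_size_bp = 339_587_529
sequence_count = 30
print(genome_size_bp // sequence_count)  # 11319584
```

Applied to the real assembly, `lengths` would be the lengths of the 30 genome sequences read from the FASTA file.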
I. Data overview
This is the second data submission of "Genomics research on the drought tolerance mechanism of typical desert plants in the Heihe basin", a key project of the major research program "Integrated research on eco-hydrological processes in the Heihe basin". The main goal of the project is to take sand holly, a typical desert plant, as the study material and use new-generation sequencing technology to decode its whole-genome and transcriptome sequences, in order to identify genes and gene groups related to drought resistance and to verify their drought resistance by transgenic experiments in model plants (such as Arabidopsis and rice).

II. Data content
1. Sequencing of the genome and transcriptome of sand holly. The genome size of Mongolian sand holly was predicted to be about 926 Mb, with a GC content of 36.88%, a repeat sequence proportion of 66%, and a heterozygosity rate of 0.56%, indicating a complex genome with many repeat sequences and high heterozygosity. Based on this prediction, we carried out deep sequencing of the sand holly genome. The data were assembled into a 937 Mb genome sequence (Table 1), which is essentially the same as the predicted genome size. Transcriptome sequencing and assembly (Table 2) yielded more than 77,000 gene coding sequences (unigenes). Annotation showed that most of these sequences are highly similar to sequences from legumes such as soybean, chickpea, and bean (Figure 1), consistent with the fact that sand holly is a leguminous plant.
2. Discovery of simple sequence repeat (SSR) molecular markers in sand holly. A transcriptome dataset of sand holly is available in a public online database; its samples were collected in Zhongwei City, Ningxia. The samples of this project were collected in Minqin County, Gansu Province. To study whether sand holly from different areas shows sequence polymorphism, we first identified SSR markers in the genome of the Minqin plant samples (Table 3) and then compared them with the publicly available transcriptome sequences, finding that some of the SSR markers are polymorphic (Table 4). These markers can be used for genetic map construction, QTL mapping, and genetic diversity analysis in this species.

III. Data processing
Sample collection site: Minqin County, Gansu Province; coordinates: 38°34'25.93" N, 103°08'36.77" E.
Genome sequencing: a total of 8 genomic DNA libraries of different sizes were constructed and sequenced on an Illumina HiSeq 2500 instrument.
Transcriptome sequencing: 24 transcriptome (mRNA) libraries were constructed and sequenced on an Illumina HiSeq 4000 instrument.

IV. Data use and significance
We selected a typical desert plant as the research object and, from a genomics perspective, decoded its genome and transcriptome sequences in order to mine its valuable drought-resistance gene resources and study its drought-resistance mechanism. This work benefits the utilization of sand holly, an ancient and important plant resource, as well as the genetic breeding of drought-resistant plants, ecological restoration, and sustainable development in the Heihe River basin.
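The SSR identification step described in section II can be illustrated with a small sketch. An SSR is a short motif repeated in tandem, so a regular expression with a backreference is enough to find perfect repeats in a sequence. The record does not state which software or thresholds the project used; the function name `find_ssrs`, the motif length range of 2 to 6 bp, and the minimum of 4 repeats are all assumptions chosen for illustration, and the test sequence is invented.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Find perfect tandem repeats (SSRs) in a DNA sequence.

    Returns a list of (start, motif, repeat_count) tuples, where
    `start` is the 0-based position of the first repeat unit.
    Thresholds are illustrative assumptions, not the project's settings.
    """
    # ([ACGT]{2,6}?) captures the shortest possible motif first;
    # \1{n,} requires at least n further copies of that same motif.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Skip homopolymer runs such as AAAAAAAA reported as "AA".
            continue
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


# Invented test sequence: (AG)x5 and (ATG)x5 embedded in flanking bases.
example = "TTGAC" + "AG" * 5 + "CC" + "ATG" * 5 + "TT"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

Polymorphism between two samples could then be assessed by comparing the repeat counts found for the same motif at corresponding loci, for example between the Minqin genome and the public transcriptome.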
Background: This is the first data submission of "Genomics research on the drought tolerance mechanism of typical desert plants in the Heihe basin", a key project of the program "Integrated study of eco-hydrological processes in the Heihe basin". The main goal of the project is to take sand holly, a typical desert plant, as the study material and use new-generation sequencing technology to decode its whole-genome and transcriptome sequences, in order to identify genes and gene groups related to drought resistance and to verify their drought resistance by transgenic experiments in model plants.

Process and content: Genome sequencing requires specialized equipment and is a large, complex undertaking (mainly genome library construction, sequencing, data analysis, and genome assembly), so it has to be carried out by a professional sequencing company. After contacting sequencing companies, we learned that the size and complexity of an unknown genome should be predicted before sequencing, as a prerequisite for designing the sequencing scheme and strategy. Therefore, in 2013 we mainly predicted the chromosome composition, genome size, and complexity of sand holly, and successfully established a method for extracting and purifying its genomic DNA. The results showed that the plant is diploid, with a genome composed of 9 chromosomes (18 in the diploid) and a genome size of 1.07 Gb. Quality testing showed that the extracted genomic DNA meets the requirements for sequencing; it has been sent to the sequencing company for library construction and sequencing, which is now in progress. In addition, to obtain a large amount of uniform plant material, we explored callus induction, which was successful. For these reasons, we were unable to complete the genome sequencing and submit the corresponding sand holly data this year as originally planned, mainly because the original plan did not account for the genome prediction work.

Data usage: The data obtained this year on the ploidy, karyotype composition, and genome size of sand holly, together with the successful callus induction, provide a high-quality material basis for the subsequent transcriptome sequencing and drought-resistance mechanism experiments, and are also a new contribution to the cytological and physiological research of the plant.