Abstract

We report the whole-genome sequence of the Arabian honeybee (Apis mellifera jemenetica). Seven A. m. jemenetica samples were sequenced representing three distinct subpopulations. Generated sequence reads were mapped to the reference honeybee Apis mellifera genome (Amel_HAv3.1). Data revealed genome-wide patterns of genetic variation which can be useful in the characterization and assessment of positive selection of the Arabian honeybee using different genetic markers. In total, 75.16 Gb of clean bases were generated, and the GC content of samples ranged between 31.9 and 35.3%. The effective reference genome size is 223,937,270 bp. The mapping rate of samples varied from 88.97% to 96.19%, and the effective mapping depth was between 41.80 and 48.84X. Single-nucleotide polymorphisms (SNPs) among sequenced individuals ranged between 2379499 and 2396116 with respect to the reference A. mellifera genome (Amel_HAv3.1), and 2% of the SNPs were nonsynonymous. Genome Analysis Toolkit (GATK) detected 1097962–1109829 InDels and 10090–11962 structural variations (SV) from which 22.1 to 33.8% were in the form of deletions. Copy number variation (CNV) ranged between 550 and 2824, and 45–91% of them were downregulated. These variations among interbreeding individuals or groups of the same species may reflect an adaptive environmental response and fitness among different subpopulations and can be very useful for subspecies characterization, conservation, and selection of the Arabian honeybee.

1. Introduction

The honeybee Apis mellifera (Linnaeus, 1758) (Hymenoptera: Apidae) comprises 31 geographical subspecies distributed in Asia, Africa, and Europe [1]. Based on molecular and/or morphometric analyses, these subspecies have been classified into separate evolutionary lineages and sublineages: the O and Y lineages occurring in Western Asia, the M lineage in Eurasia, the C lineage in Europe, and the A lineage and Z sublineage in Africa [1, 2]. The Arabian honeybee A. m. jemenetica is the native honeybee subspecies of the Arabian Peninsula and tropical Africa. It is the only A. mellifera subspecies that is naturally occurring in Africa and Asia, exhibiting distinctive traits associated with different geographical ranges [2]. It is assumed that beekeeping in Saudi Arabia started since for more than 4000 years ago [3]. A. m. jemenetica has been kept in diverse climatic zones of the country and acquired unique morphological and behavioral aspects among other A. mellifera subspecies. For example, it is the smallest and the most adapted to extreme temperatures compared to other European or African A. mellifera subspecies [2, 4, 5]. Each honeybee lineage or sublineage can include several A. mellifera subspecies. Based on morphometric characterization and mitochondrial DNA analysis, A. m. jemenetica of Saudi Arabia belongs to the Y lineage (previously called Z sublineage or O lineage) and is separated into 3 distinctive subpopulations, with significant morphological/molecular variations [68]. Recently, a comprehensive phylogenetic analysis of the Arabian honeybee, A. m. jemenetica, based on complete mitochondrial DNA sequences (n = 14), revealed three distinct subclusters within Saudi Arabia [8], with very close evolutionary relationship with the Syrian honeybee, A. m. syriaca, the Egyptian honeybee A. m. lamarckii, and east African subspecies such as A. m. capensis and A. m. scutellata [8]. These subpopulations may exhibit different adaptive traits [9]. Nevertheless, huge importations of exotic honeybee subspecies may describe Saudi Arabia as the foremost package bee importer worldwide with about 1.3 million packages being imported in 2021 [10], which may impact the presence and characteristics of A. m. jemenetica and entails urgent conservation strategies of this honeybee subspecies.

The completion of the honeybee genome (A. m. ligustica) [11] and subsequent releases and updates opens the door for genome-wise comparisons with other A. mellifera subspecies and ecotypes. From the 31 reported A. mellifera subspecies, only a few A. mellifera subspecies genomes were published (A. m. ligustica, A. m. carnica, A. m. mellifera, A. m. caucasica, and A. m. intermissa). Recently, by analyzing 251 genomes of 18 native honeybee subspecies, scientists suggested a West Asian origin of A. mellifera with at least three adaptive radiations given to African and European lineages; consequently, 145 genes with unique signatures of selection across all bee lineages were documented [12]. These findings increased the focus on the characterization and conservation of the West Asian honeybee subspecies such as A. m. jemenetica and A. m. syriaca. In this study, the whole-genome sequences of seven honeybee samples representing three distinct subpopulations of the Arabian honeybee A. m. jemenetica have been reported, which will enrich our understanding of the evolutionary relationship among A. mellifera subspecies and provide a wealth of knowledge on molecular characterization, conservation, and selection of A. m. jemenetica [13].

2. Materials and Methods

2.1. Sampling and DNA Extraction

Seven native nonmigratory honeybee apiaries (location (sample code): Gis): Al-Madina (MD101.1): 25.411561, 37.520384; Najran (N21): 17.625917, 43.754611; Asir (A43) 18.256722, 42.229028; Al-baha (B31): 19.852472, 41.585611; Makkah (MK163): 21.911416, 39.753706; Jazan (J61): 17.517472, 43.075000; and Tabuk (Tu21): 28.387028, 36.855194 were selected from purebred beekeeping areas within Saudi Arabia. From each apiary, one sample consisting of 15 bees was collected, 10 bees were characterized morphometrically using body dimensions (proboscis, femur, tibia, metatarsus, hind leg, tergite-3, and tergite-4 cubital index 1 and 2) and wing characters (wing angles: a4, b4, d7, c9, g18, j10, j16, k19, l13, n23, and o26 and 20 wing landmarks) [2, 14] to ensure accurate subspecies affiliation. After that, three adult workers were taken from each sample. Bee samples were preserved in ethanol (96%) and stored in the freezer. Genomic DNA was extracted from one honeybee and purified using Qiagen extraction and purification kits (CAT: 96504; 28106, respectively: Qiagen, Hilden, Germany). The genomic DNA was then sent to BGI (https://www.bgi.com) for sequencing. The genomic DNA was first subjected to ultrasound shearing, end repairing, and end adaptor ligation and fragment selection for amplification and library construction.

2.2. Sequencing Assessment and Raw Data Processing

Figure 1 charts the steps of sequencer raw data processing, sequence alignment, and sequence variation assessments. Raw sequencing data produced from the DNBseq-G400 pine line were subjected to three-step filtration using SOAPnuke v1.5.6 software [15]. Data processing started with adaptor trimming, and any reads with an adaptor mapping rate higher than 50% were removed. Then, low-quality reads with more than 50% of low-quality bases (Q20 < 50%) were removed. Finally, contiguous reads with more than 2% N bases were removed.

2.3. Alignment

Sequence reads were aligned to the reference genome using the BWA analysis tool (https://bio-bwa.sourceforge.net/bwa.shtml) [11]. The honeybee Amel_HAv3.1 (https://www.ncbi.nlm.nih.gov/assembly/gcf_003254395.5) reference genome was used for alignment of sample sequence reads. The reference genome size is 225,250,884 bp while the effective size is 223,937,270 bp (N base excluded), with GC content at 32.34%. The mapping rate of samples varied from 32.96% to 99.47%, and the effective mapping depth ranged between 26.51X and 59.33X (according to the sample selected). The short-sequence reads aligned against the long reference genome were created in SAM (Sequence Alignment/Map) format. Picard tools (v1.118) were used to sort the SAM files by coordinate and convert them to BAM files. Picard tools software (v1.118) (https://broadinstitute.github.io/picard/) was also used to mark duplicate reads.

2.4. Sequence Variation Assessment

GATK (version 3.7-0-gcfedb67: https://www.broadinstitute.org/gatk/, RealnTimes = 1) was used in single-nucleotide polymorphism (SNPs) and short insertions and deletions (InDels) calling. SNPs annotation started with a consensus sequence which consisted of all bases of the target genome. SNPs that existed in both genotypes (our genotype and the reference genomes) were screened, and after that, a highly reliable SNP dataset was generated. Nonsynonymous mutation and large-effect SNPs were also recorded. Annotation and statistics of selected SNPs and InDels BGI were performed using an in-house Perl script and ReSeqTools (https://github.com/BGI-shenzhen/Reseqtools). The BreakDancer/CREST method (https://breakdancer.sourceforge.net [11]). Based on reference genome, the BreakDncer/CREST method (https//breakdancer.sourceforge.net (chen et al., 2009)) enabled us to produce a list of structural variations (SV) that were detected across the whole genome. Copy number variations (CNVs) were detected according to Zheng et al. [16] method. Using SOAP alignment results, the depth of each base was calculated and standardized by the mean depth of its chromosome to calculate the copy number variation according to the standard method [16].

3. Results

3.1. Sequences and Alignment Assessment

In total, 75.16 G of clean bases were generated. About 96% of the clean data have quality values higher than 20 (q20 ≥ 96%) and from 85.2 to 90.4% of the data have quality values higher than 30 (q30 ≥ 96%) (Table 1). For alignment, the Apis mellifera ligustica reference genome (assemblyAmel_HAv3.1: https://www.ncbi.nlm.nih.gov/genome/48?genome_assembly_id=403979) was used. The reference genome size is 225,250,884 bp, while the effective size is 223,937,270 bp (N base excluded), with GC content at 32.34%. The mapping rate of samples varies from 88.97% to 96.19%, and the effective mapping depth is between 41.80 and 48.84X (Table 2). GC content of samples ranged between 31.9 and 35.3 (Table 1).

3.2. SNPs Detection and Annotation

All seven samples represented morphologically one honeybee subspecies A. m. jemenetica and three distinct subpopulations. The polymorphic loci in A. m. jemenetica samples revealed a high total number of SNPs ranging between 2379499 and 2396116 in relation to the reference A. mellifera genome (Amel_HAv3.1) and 35.1–36.3% of which were heterozygous. Nonsynonymous SNPs counted about 2% of the total number of SPNs and ranged between 48892 and 45559. Polymorphic sites of exons ranged between 529533 and 540719. A total number of 2041833–2050427 genes had at least one polymorphic site among different honeybee samples. Sample MD101 showed the highest number of nonsynonymous SNPs and SNPs on exons. These variations may have a great impact on the biological or behavioral traits of the Arabian honeybee A. m. jemenetica. The total number and distribution of SNPs are listed in Table 3.

3.3. Insertions and Deletions (InDels)

Genome Analysis Toolkit (GATK) detected 1097962–1109829 insertions and deletions (InDels) among different A. m. jemenetica samples, which were distributed almost equally between insertions and deletions. Table 4 shows the distribution and annotation of InDels. Insertion and deletions at coding regions (CDS-InDels) were 0.94–0.99 of the total InDels in the genomes of different samples. About 0.86% of the InDels occurred within genes. InDel annotation can reveal more details about the evolutionary history among A. mellifera subspecies.

3.4. SV and CNV

Structural variation is a large sequence variation (1 kb or even larger) that can include sequence duplication, inversion, and transposition or insertions and deletions, commonly referred to as copy number variants (CNVs). These CNVs often overlap with segmental duplications, and regions of DNA>1 kb exist more than once in the genome, copies of which are >90% identical [17]. If present at >1% in a population, a CNV may be referred to as copy number polymorphism (CNP). The highest number of structural variations (SVs) and copy number variations occurred in the samples MD101 and B31. The copy number variation in the MD101 genome was about 4 times more compared with other A. m. jemenetica genomes. About 22.1 to 33.8 of the structural variation in the samples’ sequences were in the form of deletion, and Table 5 lists the detailed distribution and annotation of the structural variations in all A. m. jemenetica samples. Copy number variations among individuals of the same species may reflect environmental response, for example, sensory perception and immunity [18, 19], and is considered an impotent genetic variation among populations. About 45–91% of copy number variations were downregulated. Table 6 lists the detailed distribution and annotation of the copy number variation in the honeybee A. mellifera.

4. Discussion

One of the striking features of A. mellifera is the high recombination between chromosomes, which favors effective natural selection. Data presented here are the whole-genome resequencing of the native honeybee subspecies of Saudi Arabia A. m. jemenetica. Results demonstrated high genetic variation in SNPs, SV, InDels, and CVN among A. m. jemenetica samples and in comparison to the reference A. mellifera genome sequence. These genetic variations can be used in the molecular characterization of A. m. jemenetica and related subpopulations [20]. Furthermore, adaptive structural and behavioral traits associated with the natural occurrence of A. m. jemenetica can be explored when data are subjected to further analysis [11]. The very high CNV in two samples (MD101 and B31) can be related to nonallelic homologous recombination during meiotic recombination and may demonstrate direct interaction with different environments [19]. Furthermore, high GC content in some genomes of the native honeybee of Saudi Arabia may indicate higher recombination rates [11] or higher genetic diversity [11]. Single-nucleotide polymorphisms (SNPs) are variations caused by the changing of a single nucleotide (A, T, C, or G) in the genome. The SNPs, including switch and reverse of single-nucleotide bases, are responsible for genome diversity between species and among individuals of the sample species. Analysis of SNPs among honeybee subspecies across genomes has been used in testing hypotheses concerning the ancestral origin of the western honeybee A. mellifera and the relationship among a large number of its subspecies [2]. SNP data created from this publication may help in further understanding of A. m. jemenetica evolution and natural selection. Additionally, a positive correlation between SNP number and GC content can indicate many remarkable evolutionary features [11] and molecular adaptive characteristics [11]. Previous studies indicated that the GC-rich regions have more mutations [11], and this may be the case for the native honeybee populations in some regions of Saudi Arabia. InDels refers to insertion mutation, deletion mutation, or both, including events in the early stage of evolution, and might help in explaining levels of gene expression that disrupt the production of essential proteins [21]. Copy number variation (CNV) is an important form of structural variation among individuals of the same species which may indicate long-term adaption to the unique environment of the Arabian Peninsula; CNVs of genes coding to vitellogenin or heat shock proteins or drought tolerance may unveil molecular aspects of adaptation of this honeybee subspecies compared to others.

5. Conclusion

This is the first genome resequencing of the Arabian honeybee A. m. jemenetica. A honeybee that is characterized as unique in many aspects (size, heat tolerance, and natural distribution). Whereas the complete mitochondrial analysis of the Arabian honeybee from Saudi Arabia focuses on subspecies affiliation [8], the whole-genome sequences focus on in-depth genetic variation among the Arabian honeybee populations within Saudi Arabia. The outcome of this study can be used in the characterization, selection, and breeding of A. m. jemenetica.

Data Availability

The genomes have been uploaded to the National Center for Biotechnology Information (NCBI: Sequence Reads Archives) and were assigned the following accession numbers: https://www.ncbi.nlm.nih.gov/search/all/?term=SAMN16378534 to SAMN16378541.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded by the National Plan for Science, Technology, and Innovation (MAARIFAH), King Abdul-Aziz City for Science and Technology, Kingdom of Saudi Arabia, grant no. 13-AGR1443-02. We are grateful to local beekeepers who allowed us to sample their apiaries and to Iftikhar Rasool and Arif Single in morphological slide preparation.