Assembly and Annotation of an Ashkenazi Reference Genome

We describe the assembly and annotation of a new, population-specific human reference genome. We used publicly available data for HG002, an individual from an Ashkenazi Jewish trio, available from the Genome in a Bottle (GIAB) project. The new reference, which we call Ash1, is more complete than the human reference genome GRCh38. Whereas GRCh38 is a mosaic of five different individual genomes, our reference represents a single individual. The Ashkenazi reference genome has 2,973,118,650 nucleotides placed on the chromosomes, compared with 2,937,639,212 in GRCh38. We annotated the genome by transferring the CHESS annotation from the GRCh38 genome. The new annotation identified 20,157 protein-coding genes, of which 19,563 are >99% identical to their counterparts on GRCh38. Forty of the protein-coding genes in GRCh38 are missing from Ash1; however, all of these genes are members of multi-gene families for which Ash1 contains other copies. Aligning DNA sequences from an unrelated, part-Ashkenazi (~70%) individual to Ash1 identified ~1 million fewer homozygous SNPs than aligning those same sequences to the more distant GRCh38 genome, illustrating one of the benefits of population-specific reference genomes.
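As a quick check on the size and annotation figures quoted in the abstract, the short Python sketch below derives the absolute and relative differences from the reported totals. It only restates those totals as constants and reads no genome data; the function name `summarize` is a hypothetical helper chosen for illustration.

```python
# Totals reported in the abstract (restated here as constants, not recomputed from data).
ASH1_PLACED_BASES = 2_973_118_650     # nucleotides placed on chromosomes in Ash1
GRCH38_PLACED_BASES = 2_937_639_212   # nucleotides placed on chromosomes in GRCh38
ASH1_PROTEIN_CODING = 20_157          # protein-coding genes annotated on Ash1
ASH1_NEAR_IDENTICAL = 19_563          # of those, >99% identical to GRCh38 counterparts


def summarize():
    """Hypothetical helper: derive simple comparisons from the reported totals."""
    extra = ASH1_PLACED_BASES - GRCH38_PLACED_BASES
    pct_extra = 100 * extra / GRCH38_PLACED_BASES
    pct_identical = 100 * ASH1_NEAR_IDENTICAL / ASH1_PROTEIN_CODING
    below_threshold = ASH1_PROTEIN_CODING - ASH1_NEAR_IDENTICAL
    return extra, pct_extra, pct_identical, below_threshold


if __name__ == "__main__":
    extra, pct_extra, pct_identical, below = summarize()
    print(f"Ash1 places {extra:,} more bases on chromosomes ({pct_extra:.2f}% more than GRCh38)")
    print(f"{pct_identical:.2f}% of protein-coding genes are >99% identical; {below} are not")
```

Running it prints a difference of 35,479,438 placed bases (about 1.21% more than GRCh38) and shows that about 97.05% of the annotated protein-coding genes meet the >99% identity threshold, leaving 594 that do not.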


Alexey Zimin

Johns Hopkins University, Baltimore, MD, USA

