Centromere sequence, structure, and evolution

Centromeres are specialized chromosomal regions that ensure the accurate inheritance of genetic information. At the sequence level, human centromeres are composed of near-identical repeats known as alpha-satellite, which are 171 bp long, organized in tandem, and can span multiple megabases on each chromosome. Because these regions are so repetitive, centromeres have posed an enormous challenge to standard short-read sequencing and assembly methods, and consequently, all centromeres are absent from the human reference genome. During my postdoctoral training, I developed a sequence assembly method that combines two long-read sequencing data types (Oxford Nanopore Technologies ultra-long reads and Pacific Biosciences high-fidelity reads) to generate the first complete sequence of a centromere on a human autosome, chromosome 8. I also applied this method to resolve every remaining gap on chromosome 8, thereby generating the first telomere-to-telomere sequence of a human autosomal chromosome.

The sequence, structure, epigenetic, and evolutionary map of the human chromosome 8 centromere. This centromere consists of a 2.08 megabasepair (Mbp) D8Z2 alpha-satellite higher-order repeat (HOR) array flanked by blocks of monomeric/divergent alpha-satellite. The D8Z2 HOR array is heavily methylated, except for a small, 73 kbp region that is hypomethylated. This hypomethylated region is centered within the 632 kbp centromeric chromatin domain, marked by the presence of the histone H3 variant, CENP-A. A pairwise sequence identity heat map reveals five major evolutionary layers and a mirror symmetry characteristic of active sequence homogenization in the core of the HOR array.

The sequence, structure, and evolutionary map of the chromosome 8 centromeres in chimpanzee, orangutan, and macaque. All three centromeres have a layered and symmetrical sequence organization similar to that observed in humans.

Analysis of the human chromosome 8 centromere revealed that it is composed of five major evolutionary layers arranged symmetrically. To better understand the evolution of this centromere, I generated complete sequence assemblies of the chromosome 8 centromere in chimpanzee, orangutan, and rhesus macaque and used these assemblies to reconstruct its evolutionary history over the last 25 million years. I found that each centromere has the same layered and symmetrical organization observed in the human ortholog. Additionally, I confirmed that the alpha-satellite HOR structure evolved after apes diverged from Old World monkeys less than 25 million years ago. Phylogenetic comparisons revealed that the chromosome 8 centromere is evolving at least 2.2-3.8 times faster than the rest of the human genome and is one of the most rapidly evolving regions identified. These findings support a model of centromere evolution in which highly identical alpha-satellite repeats expand in the core of the centromere and push older, more divergent repeats to the edges in an assembly-line fashion.

Human artificial chromosomes with non-repetitive centromeres

Metaphase chromosome spreads containing non-repetitive human artificial chromosomes (HACs; green). Non-repetitive HACs are able to form a functional centromere (marked by the histone H3 variant CENP-A; red) that ensures their stable propagation in cells for long periods of time. Scale bar = 10 microns.

Human artificial chromosomes (HACs) are engineered mini-chromosomes that acquire a functional centromere and are stably maintained in human cells. They have the potential to transform synthetic biology and enable numerous advances in medicine because they can be used to deliver genes or other DNA elements without integration into the host genome. Despite their utility, HACs are considered difficult to engineer because they typically require repetitive centromeric DNA sequences, which complicate cloning and handling and are unstable during propagation in bacteria. Overcoming the barrier of repetitive centromeric DNA would accelerate HAC development for use in the clinic. During my Ph.D. training, I developed a new type of HAC that is completely devoid of repetitive DNA. I identified a sequence from chromosome 4q21 that forms a functional centromere on HAC DNA, which enables its stable propagation in cells for months. This new type of HAC surmounts barriers that have limited progress toward the construction of a synthetic human genome.