At the Wellcome Collection near Euston, the human genome sequence is printed in multiple volumes, filling an entire bookcase. On opening a book I read line after line of ACTG but can’t make any sense of it. What is this book and what do these letters mean?
The Human Genome Project Consortium published the first draft sequence of a human genome in 2001, but it contained many gaps, and the order of sequences was uncertain in some areas, particularly in repetitive parts. Thanks to improved technologies, many of these gaps have been filled in and several ‘whole genomes’ have been sequenced. Because these genomes differ from one person to the next, there isn’t really any single ‘human genome sequence’. The sequence which comes closest is the reference genome, the current release of which is known as GRCh37. This is assembled by the Genome Reference Consortium and is not the genome of a single person, but a consensus based on four individuals from Buffalo, New York. In most cases it contains a single sequence at each chromosomal location.
This reference genome is not meant to be an ‘average’ genome, but is designed as a map to show the location of genes and other genomic regions relative to one another. Because newly sequenced fragments can be lined up against this map rather than pieced together from scratch, the reference genome, along with improved sequencing technology and assembly software, is part of the reason the time taken to sequence a genome has dropped from years to days. This opens the door for genomes to be sequenced in a clinical setting as an alternative to genetic tests in diagnosis, or to determine the best treatment for a patient.
Some of the largest remaining gaps in the reference genome are caused by heterochromatin: a tightly packaged form of DNA that is involved in silencing genes and in chromosome structure. While most known genes are in euchromatic (loosely packed) regions, around 10% of the genome is heterochromatic and cannot be sequenced with current technologies. Work is ongoing to develop technologies that can read this hidden sequence and begin to reveal its secrets.
So what lies ahead for the human genome? GRCh38 is due to be released this summer with even more of the gaps filled in and improved accuracy in chromosome co-ordinates. One area still to explore is the function of the different parts of the genome. Traditional protein-coding genes account for just 2% of the genetic material, meaning there is 98% left to decipher. The ENCODE (Encyclopaedia of DNA Elements) project aims to do that, using high-throughput techniques to find out which areas of the genome are associated with certain proteins and functions. Its first results have just been published and show a wide variety of functionality across the genome, but there are many mysteries still to be solved about how our genes determine who we are.