Round-up Jan 16th – 31st

Updates to the human germline editing saga


  • In a long piece on paleogenomics in the New York Times Magazine (takehomes here), Gideon Lewis-Kraus points to the field’s major recent successes, but also highlights how some archeologists fear that it traffics in “grand intellectual narratives” of the kind that history warns us against. This is one indication of the culture wars between geneticists and others.


  • A study has doubled the number of microbial genomes available by producing ~150,000 new genomes from metagenomic studies, including those covering non-Westernized countries. The authors grouped their ~150,000 new sequences into 5000 “species bins”, 77% of which were novel. The new sequences dramatically increase the mappability of samples, to 87%.
  • A fine-grained (682 bp) map of where crossovers occur on chromosomes, from Decode. Dr Stefansson, the CEO of Decode and an author of the study, said (source): “The classic premise of evolution is that it is powered first by random genetic change. But we see here in great detail how this process is in fact systematically regulated – by the genome itself and by the fact that recombination and de novo mutation are linked. We have identified 35 sequence variants affecting recombination rate and location, and show that de novo mutations are more than fifty times more likely at recombination sites than elsewhere in the genome. Furthermore, women contribute far more to recombination and men to de novo mutation, and it is the latter that comprise a major source of rare diseases of childhood. What we see here is that the genome is an engine for generating diversity within certain bounds. This is clearly beneficial to the success of our species but at great cost to some individuals with rare diseases, which are therefore a collective responsibility we must strive to address.”
  • Polygenic score for lifespan, explaining 1% of the phenotypic variance, which is 5% of the heritability. Those in the top 10% of the score can expect, on average, to live 5 years longer than those in the bottom 10%.
  • GWAS for risk tolerance and risky behaviors, with genetic overlaps found between different “risky” phenotypes, and with various personality traits.



  • More powers for DNA forensics. As of the beginning of January, the Rapid DNA Act has come into force. Rapid DNA machines sitting inside police stations allow police officers to obtain sequence results in 90 minutes. The Act allows police to upload this data to CODIS, the national DNA database, and look for matches to, e.g., previous crime scenes.

In other news: bioRxiv, the preprint server for biology, just turned five. In 2018, about 1711 preprints were posted per month, and in October there were over 1 million downloads. A project called Rxivist allows users to see which preprints are generating the most Twitter attention. I have added this to my bookmarks, and will be using it to help inform this round-up from now on!


Round-up Dec 22 2018 – Jan 15 2019

Focus on polygenic risk scores

First, let’s look at a paradigmatic example of a polygenic score publication. Inouye et al. constructed a polygenic risk score for coronary artery disease (CAD), obtaining a hazard ratio of 4.17 for those in the top 20% of their score compared to the bottom 20%. As a single predictor it is better (based on area under the ROC curve, also known as the C-statistic) than any one of smoking, diabetes, hypertension, body mass index, self-reported high cholesterol, and family history, although it does not do as well as all of them put together. They conclude that their score “strengthens the concept of using genomic information to stratify individuals for CAD risk in general populations and demonstrates the potential for genomic screening in early life to complement conventional risk prediction.”
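To make the C-statistic concrete: it can be read as the probability that a randomly chosen person who develops the disease has a higher score than a randomly chosen person who does not. The sketch below computes it by brute force over all case/control pairs. The function name and the scores are made up for illustration and have nothing to do with the data in the Inouye paper.

```python
def c_statistic(case_scores, control_scores):
    """Fraction of (case, control) pairs in which the case has the higher score.

    Ties count as half a concordant pair. A value of 0.5 means the score is
    no better than a coin flip; 1.0 means it separates the groups perfectly.
    """
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical risk scores, for illustration only.
cases = [0.9, 0.4, 0.7, 0.2]
controls = [0.1, 0.5, 0.3, 0.6]
print(c_statistic(cases, controls))  # 11 of 16 pairs are concordant: 0.6875
```

Comparing predictors "based on the C-statistic" then just means comparing this number across predictors on the same set of people.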

In reaction to articles such as this, several clear lines of criticism have emerged:

  1. The scores are only applicable in the ancestral population that they were developed in. Combine this with the well publicised fact that almost all studies are on Caucasian populations (reviewed here), and that the assays used are SNP chips whose genetic variants were chosen based on frequencies of variants within European populations, and several issues are immediately apparent. As an alternative to producing scores separately for each ancestral population, one suggestion is that studies based on African populations would be less biased and more generalizable to other populations. This is based on the simple fact that non-African populations have been subject to more genetic drift, i.e. change in genetic variant frequency because of small population sizes. It is also the case that there are hazards aplenty in using differences between populations to infer anything about genetics, and particularly about natural selection (see e.g. this article).
  2. Hazard ratios have to be very high to be useful as screening tools. In an article that has been well circulated on Twitter, “The illusion of polygenic disease risk prediction”, the authors point out that “the paradox is largely explained by the fact that odds ratios or hazard ratios typically compare risks in the tails of a single risk distribution, but these ratios ignore the proportions of individuals who will or will not develop the disease that fall in the region between the tails of the distribution”. The first author, Nicholas Wald, has been pointing this out for a long time. Note that this does not just apply to genomics: his 1999 paper takes cholesterol levels for heart disease as its case study, and shows how poor a screening test this is. (The authors state that no future polygenic risk score will produce a high enough relative risk; it would be good to check this, based on a score that captured the full heritability estimates for a given trait.) This argument ought not to be news. I enjoyed this slide deck by epidemiologist Cecile Janssens that traces the history of the prospect of predictive genomic tests, and some of the known pitfalls.
  3. The role of the environment means that genetics is often not as useful as these scores would suggest. If the environment changes, e.g. all people stop smoking, then the polygenic scores change too. If the genetics is mediated by an independently measurable and modifiable intermediate phenotype, e.g. cholesterol levels, then it is much less useful to know the genetics. Though see the Inouye paper showing that their score is relatively independent of other known risk factors.
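The argument in point 2 is easier to see with numbers. The sketch below is my own back-of-envelope version, not the calculation from the Wald paper. It assumes the score is normally distributed with the same spread in people who do and do not develop the disease, that the disease is rare enough for the fifths of the population to be defined on the unaffected distribution, and it treats the 4.17 hazard ratio from the Inouye paper as if it were an odds ratio between the top and bottom fifths. All function names are made up.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def quintile_odds_ratio(separation):
    """Odds ratio, top fifth vs bottom fifth, when the affected group's mean
    score sits `separation` standard deviations above the unaffected mean.

    Assumes a rare disease, so fifths are cut on the unaffected distribution
    and each holds 20% of unaffected people; their share cancels out.
    """
    cut = STD_NORMAL.inv_cdf(0.8)
    affected_in_top = 1.0 - STD_NORMAL.cdf(cut - separation)
    affected_in_bottom = STD_NORMAL.cdf(-cut - separation)
    return affected_in_top / affected_in_bottom


def separation_for_odds_ratio(target, lo=0.0, hi=5.0, tol=1e-9):
    """Bisection search for the separation that gives the target odds ratio."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if quintile_odds_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def detection_rate(separation, false_positive_rate=0.05):
    """Share of future cases above the cutoff that flags 5% of non-cases."""
    cutoff = STD_NORMAL.inv_cdf(1.0 - false_positive_rate)
    return 1.0 - STD_NORMAL.cdf(cutoff - separation)


d = separation_for_odds_ratio(4.17)
print(f"separation: {d:.2f} SD, detection rate: {detection_rate(d):.1%}")
```

Under these assumptions, a ratio of about 4 between the extreme fifths corresponds to the two distributions sitting only about half a standard deviation apart, so a cutoff that flags 5% of people who stay healthy picks up only roughly 13% of those who go on to develop the disease. That is the gap between an impressive-sounding ratio and a useful screening test.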

These skeptical voices are not preventing a full-scale rush to applications. Color announced a 100,000-person initiative to use low-coverage whole genome data to provide individuals with polygenic scores.



Science and Applications

  • Antonio Regalado has summed up the top advances in Genetics from 2018 — seeing it all in one place is definitely impressive.
  • The latest chapter on heritability, from a study of Aetna’s database of insurance claims covering about 45 million individuals. The dataset has over 56,000 pairs of twins born since 1985, and over 700,000 sibling pairs. The authors connect zipcodes to environmental factors of interest: socioeconomic status (SES), air pollution and weather/climate. They found that the variance explained by these measures was much lower than that from genetics and shared environment, with obesity being the phenotype with the strongest link to SES (var=0.027). Monthly cost of data was estimated at 29% heritable and 30% due to shared environment. The respective figures for co-morbidities were 43% and 24%.
  • There is often concern that receiving ambiguous results can lead to increased worry for individuals. But a new study based on a sample of over 5000 women receiving genetic risk testing for hereditary breast and ovarian cancer (HBOC) found that receiving uncertain results did not increase worry compared to a negative result.
  • The BabySeq project reports on results of exome sequencing of 159 newborns (127 healthy and 32 in the NICU). Of these, 15 (9.4%) had genetic variants associated with a disease that could be managed in childhood. Genomic sequencing for newborns remains a contentious area.
  • An AP poll found 70% of Americans supportive of genetic editing “to prevent an incurable or fatal disease a child otherwise would inherit, such as cystic fibrosis or Huntington’s disease”, and about two thirds supportive of editing to “prevent a child from inheriting a non-fatal condition such as blindness, and even to reduce the risk of diseases that might develop later in life, such as cancers”. About 70% oppose “using gene editing to alter capabilities such as intelligence or athletic talent, and to alter physical features such as eye color or height.” I can’t find any original data, just reports, e.g. here.
  • I thought this was an interesting story about how much of an impact the classification of a disease can make — in this case, the efforts to have schizophrenia classified as a brain disease so that it was covered by a new CDC program. Why does this matter? Mental conditions receive less funding and health insurance is often less generous. Strong echoes of dualism here.