Coronavirus: up until May 1

A vast amount of research has begun on how human genetics interacts with the new coronavirus.

What follows is shaped by an ever-growing interest in my bioethics work: the significance of individual biological difference, how we let it influence our lives, and the difficulties that stem from the use of the population concept. I have drawn attention below to the all-too-easy-to-make claims of differences between populations that are appearing in the literature.

Humans react very differently to infection with SARS-CoV-2. Some of this is because of genetic differences. Understanding these differences could help our understanding of disease processes — severity, outcomes, how the immune system interacts with the virus. This could lead to better disease management, possibly by suggesting therapeutic approaches. It could also help us understand patterns of infection, possibly informative for vaccine development. These use cases are the biggest motivation. They don’t rely on any of us learning our own relevant genetic information.

Understanding what makes some people more and less susceptible to severe disease could also be used directly. Genetic information is often used in the prescription of pegylated interferon α and ribavirin for chronic Hep C infection. Finding who is least susceptible could be useful for clinical trial design, particularly if we opt to purposefully infect research subjects in a challenge trial design. Finding who is most susceptible could help identify who has to take extra precautions. Recognizing that a diverse array of genotypes can have large impacts could ensure that individuals with all those genotypes are represented in therapy trials and vaccine trials, so we know that results generalize.

Before we turn to human genomics, what do we know about the virus’s genome?

Before SARS-CoV-2, 6 human coronaviruses were known: SARS, which killed 774 of 8096 infected in 2002-3; MERS, which killed ~600 of ~2000 infected starting in 2012; and four others that cause milder symptoms, which collectively account for a third of all colds. There are also hundreds of coronaviruses that infect other animals, notably bats. In 2013, a bat virus very close to SARS was identified, suggesting direct transmission from bats.

The first genome of the new coronavirus was published on January 10: ~30,000 bases, 14 ORFs encoding 27 proteins. The NYT produced a beautiful tour of the viral genome.

On 21 January, a group posted to bioRxiv a theory that the virus may have been engineered in a lab. Others pointed out mistakes in the analysis, and it was quickly retracted.

The first report of viral sequences from nine Wuhan patients showed 99.8% sequence identity to each other, and hence a recent common origin. It also showed sufficient divergence from SARS-CoV to be classified as a different virus. 

Based on homology to SARS-CoV, SARS-CoV-2 was predicted to also use the human protein ACE2 to enter cells. The receptor binding domain on the spike protein has a strong effect on infectivity. (It is known which mutations might lead to greater infectivity; it is hoped these will not be selected for.)

Multiple sequence alignment has been used to trace how the virus is spreading. This is a key technique in “precision public health”, which relies on slight differences between pathogen genomes to trace how an infection spreads through populations. See the global effort called Nextstrain, which has lots of beautiful graphics, and a recent (but non-covid) review of pathogen genomics. The CDC has been a late adopter compared to e.g. the UK, but there are several very large scale efforts using next generation sequencing for pathogen tracking in the US. The UK announced a £20m effort to track the pathogen using viral genomics. Another great graphical piece from the NYT shows how the virus has gained mutations, and how this can let us track the virus.
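To make the idea of tracing spread through "slight changes between pathogen genomes" concrete, here is a minimal sketch that counts differences between sequences that are assumed to be already aligned. The sample names and the 15-base sequences are hypothetical toy data (real genomes are ~30,000 bases), and real pipelines build full phylogenetic trees instead of relying on raw pairwise counts.

```python
from itertools import combinations

# Hypothetical, already-aligned toy sequences (not real viral data).
aligned = {
    "sample_A": "ATGGCTAGCTAACGT",
    "sample_B": "ATGGCTAGCTAACGT",
    "sample_C": "ATGGTTAGCTAACGT",
    "sample_D": "ATGGTTAGCTGACGT",
}


def differences(seq1, seq2):
    """Count positions at which two equal-length aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


def percent_identity(seq1, seq2):
    """Percentage of aligned positions that are identical."""
    return 100.0 * (1 - differences(seq1, seq2) / len(seq1))


# Pairwise difference counts: fewer differences suggest a closer link
# in the chain of transmission.
pairwise = {
    (x, y): differences(aligned[x], aligned[y])
    for x, y in combinations(sorted(aligned), 2)
}

for (x, y), count in pairwise.items():
    identity = percent_identity(aligned[x], aligned[y])
    print(f"{x} vs {y}: {count} differences, {identity:.1f}% identity")
```

In this toy data, sample_D carries the change seen in sample_C plus one more, which is the kind of nested pattern that lets a lineage be ordered.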

An April 29 preprint showed two clades, six subclades, and some evidence of convergent evolution to affect how strongly the spike protein binds to ACE2. Mutations in some subclades could evade some current tests.

(Of interest: As reported by MIT Technology Review, 20 years ago it was demonstrated that, starting with viral DNA, a virus would “reboot” in a cell. That meant that as soon as the viral genome was published, any lab with access to something that could “print” DNA, and that had the know-how to go from there, could make the virus (and any variant thereof). Only a few places can print DNA. They can choose whether to fulfill an order. Part of the process is that they compare incoming orders with a database of known pathogens (e.g. polio), to give one layer of control.)

Host genomics

COVID-19, the disease that can result from infection with SARS-CoV-2, is a product of the virus, the host, and the environment. The genome of the host can affect the impact of a pathogen, with both rare and common human variation known to play a role. The most famous example is the allele for sickle cell anemia: heterozygotes are protected from malaria. Another famous example is CCR5‐Δ32 which confers resistance to some HIV strains (this was the edit made to the first genetically engineered humans, born in 2018). 

A 2017 review stated “Despite their limited application in the field, GWASs [genome wide association studies] have provided valuable insights by pinpointing associations to both innate and adaptive immune response loci, as well as novel unexpected risk factors for infection susceptibility.” The review also stresses that heterogeneity across populations is particularly important in the setting of host response to infectious disease. Things also get complicated by the presence of viral strain – host genotype interactions. We have evidence of this in Hep C.

A 2016 study titled “Genetic Ancestry and Natural Selection Drive Population Differences in Immune Responses to Pathogens” showed that, when exposed to listeria and salmonella, many genes showed differential expression in white blood cells between individuals of European ancestry and individuals of African ancestry. They traced these differences to genetic variants, and further argue that the differences between populations are due to recent natural selection.

How important are genetic differences in understanding the differences in how SARS-CoV-2 affects individuals? An early estimate of the heritability of susceptibility to SARS-CoV-2 infection puts it at about 50%. This is hard to estimate as there are so many confounders, for example poverty.

The standard tools for identifying which genetic variation makes a difference are genome wide association studies. A preprint shows that early variants that show up as genome wide significant in the UK are associated with higher educational attainment and better health, and hence probably capture who was traveling, rather than anything about disease biology.

Which genes are known to be involved in how SARS-CoV-2 affects the body? 

What human variation has already been identified that makes a difference for COVID-19?

  • ACE2
  • The Interferon Lambda Region has a role in controlling the expression of ACE2. Polymorphisms in the region have been linked to ACE2 expression levels in diseased tissues (preprint from an Oxford group). Variation in this region has previously been associated with Hep C infection outcomes (but with effect varying by virus genotype). The protective allele is found more frequently in East Asians, and is more often absent in those of African ancestry. The authors conclude “the overall impact of this polymorphism on the clinical course should be assessed, especially given the very variable distribution of IFNL4 alleles in different ethnic groups”.
  • TMPRSS2
    • A preprint from Italy found different haplotypes between East Asians and Italians, with two rare alleles of interest predicted to induce higher levels of TMPRSS2 in Italians. One suggests possible regulation through androgens (and hence a possible link to sex differences); the other has already been linked to increased susceptibility to flu.
    • A preprint from the LungMap consortium found evidence of increased expression of TMPRSS2 with aging, and identified a regulatory SNP that contributes to expression levels.
  • Interferon-induced transmembrane protein 3 (IFITM3) – a variant linked to more severe disease. The study from China compared mild to severe cases and found the homozygous variant rs12252 was much more common in the severe cases (p = 0.00093; OR = 6.37). The variant was previously associated with flu severity, and is found at much higher rates in those of East Asian ancestry (e.g. carried by ~26% of the Beijing population). 
  • HLA proteins. A class of human leukocyte antigen (HLA) proteins sit at the cell surface, presenting short sections of protein (peptides) for recognition by T-cells. If the T-cells recognize non-self, they react appropriately. Which peptides are shown? That depends on what’s in the cell: if a virus is present, they can include sections of virus proteins. And it depends on the precise structure of the protein’s binding groove, which is hyper-variable between humans. The extent to which an individual’s HLA proteins have binding grooves that bind bits of the viral proteins could therefore affect the body’s immune response.
  • Blood groups (A more susceptible, O less so)
  • Meanwhile, polygenic scores for disease severity have already been produced.
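For readers unfamiliar with the odds ratios quoted in results like the IFITM3 one above, here is a minimal sketch of how an odds ratio and an approximate 95% confidence interval come out of a 2x2 table of carriers and non-carriers among severe and mild cases. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the interval uses the standard Woolf (log) approximation, which may not be what the study used.

```python
import math


def odds_ratio(severe_carriers, severe_noncarriers, mild_carriers, mild_noncarriers):
    """Odds of carrying the variant among severe cases divided by the odds among mild cases."""
    odds_severe = severe_carriers / severe_noncarriers
    odds_mild = mild_carriers / mild_noncarriers
    return odds_severe / odds_mild


def woolf_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval for the odds ratio on the log scale (Woolf method)."""
    log_or = math.log((a * d) / (b * c))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * standard_error), math.exp(log_or + z * standard_error)


# Hypothetical counts, NOT taken from the IFITM3 study.
counts = dict(severe_carriers=10, severe_noncarriers=20, mild_carriers=5, mild_noncarriers=45)

estimate = odds_ratio(**counts)
low, high = woolf_ci(*counts.values())
print(f"odds ratio = {estimate:.2f}, approximate 95% CI {low:.2f} to {high:.2f}")
```

With counts this small the interval is wide, which is one reason early findings of this kind need replication.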

Several large scale studies are starting

There have also been some commitments to the ideal of data sharing


  • Some of these references are to preprints, which are not peer reviewed. Others have been rushed through printing. I am pretty confident that not all of this will stand the test of time.
  • So much is standardized in the genomics workflows, and so much of the data is publicly available, that it is very easy to put out papers that look vaguely sensible. Here is a preprint looking at ancestry differences, which I remain very unconvinced by.

Catch-up Dec 9th – April 30th

I will cover coronavirus separately. In the meantime, a lot has been happening in genomics more broadly. I’ll draw your attention to:

  1. Ongoing controversy about Precision Medicine. Two pieces question the prominence it receives: “Precision medicine: course correction urgently needed” and “Promises and perils of using genetic tests to predict risk of disease”. Meanwhile The Personalized Medicine Coalition has helped put together a new bipartisan caucus to advance support for personalized medicine in Washington.
  2. A host of new common disease/complex trait work (separated into its own section below), including GWAS for self-reported childhood maltreatment, income, alcohol intake, lifespan. A few papers that have looked at including polygenic scores in clinical models, with mixed results.
  3. A remarkable number of legislative efforts, including a Bill on the Governor of Florida’s desk that would close some GINA loopholes, and efforts in Utah, New Jersey, and California.
  4. DNA from detained immigrants is being added to the federal national DNA database, CODIS, which was originally established to track violent federal offenders



  • From October, prime-editing, akin to “search and replace” rather than “cut and paste”, which doesn’t require a double strand break: “Prime editing substantially expands the scope and capabilities of genome editing, and in principle could correct up to 89% of known genetic variants associated with human diseases.” In the first few weeks after the publication of this article the researchers were inundated with requests for constructs.
  • Some updates from All Of Us: 339,000 consented participants, 265,000 completed the full protocol, 50% from racial and ethnic minorities, 80% from populations underserved in biomedical research. “This is more than just a data resource. It’s an ecosystem” that includes data analysis tools and participant outreach.
  • Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, reporting on the matched tumor-normal sequencing of 2,658 individuals. Key findings included: an average of 4-5 driver mutations per cancer; only 5% of tumors lacked any identified driver mutation; high rates of chromoplexy (17.8% of tumors) and chromothripsis (22.3%); change of mutational signature over time in 40% of tumors; driver mutations can occur years before cancer diagnosis.
  • The NHGRI have published their draft Genomics 2020 strategic plan. They invite comments.
  • Anti-CRISPRs exist in the wild, as part of the viral counter-attack to the bacterial counter-attack that is the main function of CRISPR systems in bacteria. There are also several other attempts to produce a “stop button” for gene editing, which should have as a by-product editing that can be better targeted to particular cell types and that reduces off-(sequence)-target effects. (Link to an easy-to-read Nature news feature summary.)


Polygenics: Common diseases/complex traits




Regulation etc

Catch-up Sept 3rd – Dec 8th

I started a fellowship at Harvard this fall, and it’s taken me a while to find the time to sit down to do a round-up. There’s been a torrent of news, covering DTC player expansions, major new initiatives, and law enforcement’s use of DNA technology – as well as a host of other topics. We’re a year out from the birth of the first genetically modified babies, and full swing into the polygenic turn. Diversity remains a governing watchword.








Regulation etc

Round-up July 23rd – Sept 2nd

The polygenics of sexuality (see below) has been in the limelight. I appreciated that the dialogue around this study got into some of the “so what” of this type of work. In a post on Medium I reflect on the fact that we seem to be a bit picky and choosy about which traits we are willing to embrace as having a genetic etiology. A recently published study reveals some other biases: antisocial behavior is judged as less genetically influenced than prosocial behavior, probably because people want to hold others blameworthy for their actions.

I continue to think that we’re racing into a future where we have some handle on biological differences between individuals (via probabilistically predictive scores of various types, genetic and otherwise), and that we lack the necessary frameworks for knowing what to do with this information. Putting my time where my worries are, starting today I’m officially employed to think about exactly these questions. 



  • The “no single gay gene” study in Science, covered by numerous news outlets, found that the trait “ever versus never had same-sex sex” is ~32% (95% confidence interval 11-54%) heritable, and is polygenic. From a scientific point of view, that result is completely unsurprising, just showing that this trait is like all others studied. One thing that was different about this trait: male-female genetic correlations were considerably lower than the usual value of about one. Secondary results from the study included that personality traits, mental health disorders, and risky behaviors were genetically correlated with the trait, but physical traits were not. They also looked at the proportion of same-sex to total partners, and found that this trait was not that highly correlated with the ever/never trait (0.73 for men, 0.52 for women). Their conclusion? Sexuality is complex, and can’t be captured by a single dimension, as scales like Kinsey’s try to do. The study is an interesting case example of how ethics was engaged with.
  • One study argues that polygenic scores cannot be interpreted as a “genetic endowment” because the predictive power can come from a number of other sources: population stratification, assortative mating, and genotype-environment correlation (rGE). The authors found that the predictive power of polygenic scores for cognitive traits was higher (by 60%) between families than when looking at non-identical twins. But if socioeconomic status was controlled for, much of the difference disappeared. They conclude that socioeconomic status contributes to the between-family predictive power of cognitive polygenic scores, acting as an “environmentally mediated parental genetic effect”, a type of genotype-environment correlation.
  • A preprint that covers the potential effects of selecting embryos on the basis of polygenic scores. When selecting between ten embryos, with 20% of the variance captured by the polygenic score, the gain (selecting for highest score embryo versus average embryo) would be ~0.5 of a standard deviation, with a 95% confidence interval of -1.2 to 2.3. Compare to the standard deviation of 6cm for height and 15 IQ points. 
  • A preprint of a large (135,000) Finnish study on polygenic scores for a few major diseases. Comparing the top 2.5% to the middle 20-80%, hazard ratios were 2.0-4.3 (depending on disease). They conclude that the performance of the polygenic scores was similar to established clinical risk scores.
  • An example of sociogenomic research: finding that some polygenic scores are associated with exposure to bullying.
  • A preprint that points to a way polygenic scores could be made clinically relevant: by focusing on the variants that contribute to polygenic scores that also fall within gene sets tied to particular drugs. 
  • A polygenic score for alcohol abuse.
  • More evidence for the common genetic underpinnings of various psychiatric disorders.
  • Some genes in humans are up-regulated after death
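The embryo-selection figure in the list above (~0.5 of a standard deviation when choosing among ten embryos with a score capturing 20% of the variance) can be roughly sanity-checked with a small simulation. This is a sketch under a simplifying assumption that is mine, not necessarily the preprint's: sibling embryos' scores are normally distributed around the family mean with half the population-level variance of the score. Function and parameter names are hypothetical.

```python
import math
import random


def expected_gain_sd(n_embryos, variance_explained, n_trials=50_000, seed=0):
    """Monte Carlo estimate of the mean gain (in trait SD units) from picking
    the top-scoring embryo instead of an average one."""
    rng = random.Random(seed)
    # Assumption: within a family, scores vary with half the population
    # variance of the score, so the SD is sqrt(variance_explained / 2).
    within_family_sd = math.sqrt(variance_explained / 2)
    total = 0.0
    for _ in range(n_trials):
        scores = [rng.gauss(0.0, within_family_sd) for _ in range(n_embryos)]
        # The family average is 0 by construction, so the best score is the gain.
        total += max(scores)
    return total / n_trials


gain = expected_gain_sd(n_embryos=10, variance_explained=0.20)
print(f"expected gain: {gain:.2f} SD")
print(f"height (SD 6cm): about {gain * 6:.1f} cm; IQ (SD 15): about {gain * 15:.1f} points")
```

Under these assumptions the estimate lands close to half a standard deviation. The sketch only covers the average gain; the wide confidence interval quoted above reflects how much of the trait the score does not capture, which this code does not model.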



  • Erik Parens and Paul Appelbaum consider the evidence for the widely predicted negative psychosocial impacts of genetic testing. Years of study have shown that individuals do not end up more depressed or anxious or stressed (though there may be transitory effects; note the recent Hastings Center Special Report containing many papers on this topic). They see a path forward “Using more subtle measures than merely asking people if they feel depressed or anxious” to show that “genetic results can affect people in real ways.” Here are the examples of this newer generation of work that they point to:
    • A study showing that those who were APOE e4+ and knew it rated their memory worse, and performed worse on a memory test, than those who were positive and didn’t know.
    • A study in which individuals were told, independently of their actual genetic results, whether they had a genetic variant associated with decreased exercise capacity found that those told they had the variant performed worse.
    • A study that shows the impact of perceived genetic cause of obesity, with subjects more likely to eat more if they thought it was genetic.



Round-up May 29th – July 22nd

In addition to various news sites (GenomeWeb, STAT) and newsletters (e.g. GA4GH) I usually use Rxivist to see what’s new. It shows which preprints have been generating the most tweets. Except this month that functionality is broken, so it’s showing the most downloaded articles instead, many of which were not published recently. The top hits in genetics/genomics include face prediction, a tutorial on how to construct polygenic scores, ancestry, the effects of an extra X, and single-cell methods.

The stories that have been grabbing my attention are those that link genetics and identity. An excellent interview with Dorothy Roberts, author of Fatal Invention: How Science, Politics, and Big Business Re-create Race in the Twenty-First Century, discussed the relationship between genetic ancestry and race. Some selected excerpts:

  • “Some people think it’s harmless to believe in biological differences between races as long as we don’t value one over another, but the whole point of dividing humans into races is to value some more than others.” 
  • “Racism isn’t a product of race. Race is a product of racism. People think it’s OK to categorize people by race as long as they’re not racist, but any division of people into supposedly natural races promotes a racist agenda, whether we intend it to or not.” 
  • On the eugenics movement: “If certain groups of people are at a disadvantage, the thinking went, it must be because of their biological inferiority, not due to state violence and structural inequalities…. Such thinking is still used to explain social inequality in the present: If we believe biology produces these unequal social and economic conditions, then how can they be immoral and in need of change? They are “natural.” The situation can’t be changed.” 
  • On where you look for explanations: “Have you looked into the fact that black children get expelled and arrested at far higher rates than white children for the same behaviors, like missing school, talking back to a teacher, or roughhousing? Have you looked for any explanations other than within the gray matter of their brains?”
  • “If you add a new technology to an already racist system, you’ll get another racist product.” 
  • And her conclusion: “I don’t believe we should be “color-blind,” that we shouldn’t pay any attention to race. As a political invention, race continues to determine power arrangements and is not going to just go away. We have to dismantle racist institutions to affirm our common humanity. And to do that, we need to understand how the concept of race really functions.” 

An interesting study on what happens when white nationalists find out some of their ancestry is not European. A study looked at the responses individuals in a white nationalist group got when they posted their unexpected results — the vast majority of comments they received focused on potential inaccuracies of the testing. The authors conclude: “White nationalism is not simply an identity community or political movement but should be understood as bricoleurs with genetic knowledge displaying aspects of citizen science.”

Genetic identity is key to the debates over donor conception. In the news recently have been revelations, based on genetic testing, of prior mix-ups in sperm donor conception. The reaction of one couple, whose children were conceived with a donor they didn’t select, is revealing of attitudes within the process: “I didn’t choose someone who has a history of brain cancer in the family. I would never have chosen this donor. They should be ashamed to even have this donor on the website.” The position of the courts is that there are no grounds to sue if the child is healthy. Meanwhile a Singaporean court defined a new type of loss, a loss of genetic affinity, to deal with a case of a couple who unintentionally ended up with a bi-racial child.



Regulation etc

An interview with George Church touching on many future tech possibilities: “Just being different at all from the middle of the bell curve gives you an advantage in a part of society that cherishes innovation and out-of-the-box thinking.” 

Round-up April 26 – May 28

I’ve started this round-up with recent papers focused on two scientific themes that will dominate the near term progress in understanding links between genotype and phenotype, 1) trouble ahead for polygenic scores, and 2) the coming together of rare and common variation analysis.

Trouble for polygenic scores

Previous work on the genetics of human height, based on the GIANT consortium, had identified evolutionary adaptation signatures to explain the North-South height gradient. But two new studies applying the same methodology to the more homogeneous and larger UK Biobank data found no such evidence. They did find that the same SNPs were identified, and with similar effect sizes. But population structure biases these effect sizes. This is particularly problematic for meta-analyses that combine heterogeneous data sources, and is much worse if sub-significant SNPs are included. This should prompt extreme caution when a) looking for signals of polygenic adaptation, or b) comparing populations. Additionally, this population structure can be “an additional source of error in polygenic scores and affect their applicability even within populations” (paper 1), as “even small differences in ancestry will be inadvertently translated into large differences in predicted phenotype” (paper 2). These results are nicely put in context here, with a quote from a former teacher of mine: “The methods developed so far really think about genetics and environment as separate and orthogonal, as independent factors. When in truth, they’re not independent. The environment has had a strong impact on the genetics, and it probably interacts with the genetics,” said Gil McVean, a statistical geneticist at the University of Oxford. “We don’t really do a good job of … understanding [that] interaction.”
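A toy simulation can show how population structure alone can bias an estimated effect size. This is a sketch, not the method of either paper: the two groups, their allele frequencies, and the environmental shift are all hypothetical, and the variant is given no causal effect at all. Any association in the pooled data therefore comes purely from structure.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys on xs (the naive per-allele effect estimate)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


def simulate(n_per_group=5000, seed=1):
    """Two hypothetical groups differing in allele frequency and in environment."""
    rng = random.Random(seed)
    # (allele frequency, environmental shift in the trait); the SNP itself does nothing.
    groups = {"north": (0.7, 1.0), "south": (0.3, 0.0)}
    data = {}
    for name, (freq, env_shift) in groups.items():
        genotypes = [(rng.random() < freq) + (rng.random() < freq) for _ in range(n_per_group)]
        phenotypes = [env_shift + rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        data[name] = (genotypes, phenotypes)
    return data


data = simulate()
pooled_genotypes = data["north"][0] + data["south"][0]
pooled_phenotypes = data["north"][1] + data["south"][1]

pooled_effect = slope(pooled_genotypes, pooled_phenotypes)
within_effects = {name: slope(g, p) for name, (g, p) in data.items()}

print(f"pooled estimate (confounded by structure): {pooled_effect:.2f}")
for name, effect in within_effects.items():
    print(f"within {name}: {effect:.2f}")
```

The pooled estimate is clearly non-zero while the within-group estimates hover around zero. Summing many such slightly biased estimates into a polygenic score is one way small ancestry differences could turn into large predicted differences.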

Question: what would the same analysis show for the Educational Attainment polygenic score, which stands as the other score based on very large heterogeneous data, and which used many non-significant SNPs?

A separate preprint shows how, even within an ancestry group, porting polygenic scores has its challenges. “The prediction accuracy of polygenic scores depends on characteristics such as the age or sex composition of the individuals in which the GWAS and the prediction were conducted, and on the GWAS study design.”


Coming together of rare and common variation.

We’re getting to the stage of having very large cohorts of NGS data. What will we learn of how common and rare variation jointly contribute to disease? And what implications does this have for the clinic?

One new paper reports an exome cohort of over 20,000 T2D cases and 24,000 controls, representing one of the largest studies yet using NGS data. For 76% of their cohort they also had array data plus imputation. The broad relevance of this type of study led me to read this paper fairly closely. What did they find?

  • Looking exome wide, of the 6.3 million variants in their data set, 15 were exome-wide significant. They were powered to find variants with an effect size of OR 2.5 at a frequency of 0.2%
  • They aggregated to the gene level and found 3 significant genes. Looking at the near misses in other datasets leads them to think that these will become exome wide significant in the future. They estimated that the top 100 gene level signals would capture a mere ~2% of the genetic variance in their sample.
  • Then they aggregate another level up, at the gene set level, only drawing the weak conclusion that this line of work “can be used as a potential metric to prioritize candidate genes relevant to T2D.”
  • Almost all the variants found in the exome data were also found in the array data (8 of the 10 single variants), along with 14 more non-coding variants in the array data. The vast majority of their overall variants were not imputable. Because the array data identified common variants, it explained more of the genetic liability.
  • The basic issue is that they continue to be underpowered to a) find rarer variants (<<0.2%), b) accurately estimate their effect sizes. They suggest, as an antidote to (a), relying on prior suspicions of a gene-disease connection to narrow search space (and hence lower threshold for detecting significance).
  • They conclude that for research, GWAS are best for “locus discovery and fine mapping”, and NGS for gene characterization and confidence in gene-disease connections.
  • And for personalized medicine, the very rare variants of large effect sizes may be useful, but these are so rare as to complement (rather than replace) polygenic scores based on array data.



  • Large datasets such as the UK Biobank are showing that a lot of the candidate gene work was spurious. Here is a piece from Ed Yong focusing on SLC6A4 and its connections with depression, which about 450 papers investigated. Now many are claiming that there is no evidence that the connection exists (and indeed that this has been clear for years now). But some are saying that we know the effects of this gene depend on the environment, and the new studies do not measure the environment anywhere near as accurately as needed.
  • Sarah Zhang at the Atlantic points out another enduring legacy of some of the candidate gene work. A gene called MTHFR was associated with adverse results following smallpox vaccination in a small 2008 study. Just like other candidate genes, it hasn’t stood the test of time. But MTHFR is the single gene that 23andMe gets the most questions about, from anti-vaxxers hoping to find their child has a variant that will get them a medical exemption from vaccination (to do this they have to download their raw data and upload it somewhere else).
  • In a preprint Plomin et al argue on the basis of ~7000 twin pairs for the existence of a substantially heritable (50-60%) p-factor, a polygenic general psychopathology factor.
  • A study in PNAS and a write-up in the NY Times locates several more cases where an extreme difference in smell perception can be linked to single SNPs.
  • A polygenic score for obesity, with those in the top decile 13kg heavier than those in the bottom decile by age 18.
  • A large (~30k cases, ~170k controls) GWAS of bipolar disorder identifies 30 genome wide significant variants. One scary thing: their first analysis was of a subset of the data (20k cases, 31k controls), in which they found 19 variants, 8 of which did not replicate in the combined analysis.
  • Most people who are at a 50% risk of developing Huntington’s disease do not want to know if they carry the genetic variant. Why? A study based on data from 1999-2008 found the two biggest reasons were no effective cure/treatment (66%) and inability to undo knowledge (66%).





What next for human germline editing?

The issues are somewhat clearly identified. The lack of concrete proposals is deafening.


Yesterday I attended an event at Harvard, “Editorial Humility: A Moratorium on Human Germline Editing?”, sparked by the recently published call for a moratorium on human germline editing that Eric Lander co-authored with 17 others.

Back in 2015 there was a clear call for a moratorium, with a focus on whether this is a road we want to go down at all. In 2017, the National Academies of Science and Medicine published a report that watered down this call, focusing on questions of safety and efficacy (I argued at the time against this watering down). He Jiankui pointed to this NASAM report in claiming that there was no clear writ against his decision to pursue the human germline editing that led to the birth of Lulu and Nana. In other words, the absence of a clearly called for moratorium likely had a role in the actual use of the technology.

Is a moratorium the right approach? Eric Lander, speaking first, explained that the main aim of the new call was to seed the debate. He is backed by the NIH, represented at the event by Carrie Wolinetz. In the US there is a ban against germline editing already in place, but the world’s largest funder of biomedical research nonetheless thought using their “bully pulpit” position in support of a moratorium was the right thing to do.

A moratorium is a lightweight solution. It would be time limited from the get go. And it would leave eventual decisions with each sovereign power on a country by country basis. For panellists Betsy Bartholet and Sheila Jasanoff, this does not go far enough, and we should be aspiring to an International Treaty. Betsy Bartholet called on Eric Lander and the concerned scientific community to start the work to get a treaty in place. Eric Lander called on Bartholet and the lawyers to instead do this work. Both claimed a lack of expertise. This was concerning. An example of how things can truly fall between the cracks.

Playing devil’s advocate, I. Glenn Cohen argued that moratoriums can be “sticky”, even if they have a sunset clause. Moreover, bioethicists have been talking about this for years; we’ve done all the thinking we need to. He also argued that we need to de-exceptionalize genetic modification as a technology. Many technologies, e.g. smartphones, have disturbing ethical implications. I agree with Lander’s response: yes, we have issues across the board when it comes to new technologies, but that’s a reason to engage with all of them, not to disengage with this one.

Steve Hyman, ex-provost of Harvard, said that he was much more concerned about the use of cognitive polygenic scores for selecting embryos. Given my research interests, it is no surprise that I strongly agree. I’ve also argued in print (jointly with Sarah Polcz) that, although scientists are making a big distinction between heritable and non-heritable edits, the bigger distinction is between therapy and enhancement.

I took two main things away from the panel.

First, everybody was horrified that the scientific community so thoroughly dropped the ball. Many others knew what He Jiankui was up to, and no-one raised the alarm. (Note that the scientific community is divided about whether blame should fall entirely on He or not.) What should be done? Eric Lander made the case that you can’t expect scientists to self-regulate because they have an inherent conflict of interest. When you work on anything you have to be the biggest believer in the upsides, the optimist. Jasanoff reminded Lander that the metro between Harvard and Kendall “runs both ways”, and that he should come to the Kennedy School more often to explore the issues from a different point of view. The question of the role of scientists seems pertinent in light of the debate over the extent to which the social media giants can and should self-regulate. In this arena, bioethicists are given heat for being too conservative, for overplaying the risks of a technology and failing to see the potential benefits. So what is the right balance of roles?

Second, as ever, there were broad calls for public debate. But when it came to concrete proposals for “deliberative explorations”, nothing. There was reference to learning about how other countries have done this. The UK is always held up as the shining example (along with the inevitable “How do they manage to achieve consensus on the use of reproductive technologies, but end up in such a mess over Brexit?”). I’ve added to my To Do list a comparative look at the various approaches to public engagement taken around the world with respect to mitochondrial replacement therapy. Approaches that have caught my eye include the Moral Machines work, where the public were invited to state who they thought a self-driving car could kill, and something like what the folks at World Wide Views are up to. I’m very interested to hear of deliberation and public engagement that others are enthusiastic about.

Round-up March 13 – April 25

Three topics have dominated genomics happenings. First, polygenic scores: the science continues to mature, there are calls for their rapid clinical integration, there are concerns about their use, and there are commercially available products. Second, how to regulate human germline modification. Third, use of genomics in forensics.


Polygenic scores

  • Polygenic scores for cognitive traits like IQ and educational attainment are confounded by gene-environment effects, especially socioeconomic status (SES). Preprint from Plomin and team. Within-family predictions were ~60% lower than between-family predictions for these traits, but not for traits like BMI and height. The difference disappeared after accounting for SES, suggesting that SES is part of “passive gene environment correlation” or “genetic nurture”. All genetic influences operate via the environment. There are three genotype-environment correlation (rGE) mechanisms:

Passive rGE: “Parents generate family environments consistent with their own genotypes, which in turn facilitate the development of the offspring trait, thus inducing a correlation between offspring genotype and family environment”

Active rGE: children select, modify and create experiences

Evocative rGE: children evoke responses in their environment (correlated to their genetic propensities.)

Within-family genetic differences can include active and evocative rGE effects, but not passive rGE effects, which are shared within the family.

If the aim is just to maximize trait prediction, using between-family scores (i.e. calculated from unrelated individuals) is legitimate. But for causal analysis, including the use of Mendelian Randomization, within-family designs are necessary.

  • Another preprint examining the “nature of nurture” (gene-environment correlations in the context of educational attainment), which also had access to the polygenic scores for the mothers. They show that both mothers’ and children’s polygenic scores are predictive of parenting style, and also that mothers’ genetics predicted childhood educational attainment beyond direct transmission, mostly via providing a stimulating cognitive environment.
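
To make the within-family versus between-family contrast concrete, here is a minimal simulation sketch. It is a toy model, not the method of either preprint: the effect sizes `direct` and `nurture`, the variance split, and every function name are assumptions chosen for illustration. Each family gets a shared environment that tracks the mid-parent score (passive rGE), and each of two siblings gets a polygenic score equal to the mid-parent score plus random segregation noise. The between-family slope (one child per family) absorbs the passive rGE effect, while the sibling-difference slope cancels it out.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate(n_families=20000, direct=0.3, nurture=0.3, seed=1):
    """Return (between_family_slope, within_family_slope) for a toy model.

    direct  : hypothetical causal effect of a child's own score on the trait
    nurture : hypothetical effect of the mid-parent score on the family
              environment (passive rGE), shared by both siblings
    """
    rng = random.Random(seed)
    half_sd = 0.5 ** 0.5  # so that a child's score has variance 0.5 + 0.5 = 1
    scores, traits = [], []      # one child per family (between-family)
    d_scores, d_traits = [], []  # sibling differences (within-family)
    for _ in range(n_families):
        midparent = rng.gauss(0, half_sd)
        family_env = nurture * midparent  # shared by both siblings
        sibs = []
        for _ in range(2):
            score = midparent + rng.gauss(0, half_sd)  # segregation noise
            trait = direct * score + family_env + rng.gauss(0, 1)
            sibs.append((score, trait))
        scores.append(sibs[0][0])
        traits.append(sibs[0][1])
        d_scores.append(sibs[0][0] - sibs[1][0])
        d_traits.append(sibs[0][1] - sibs[1][1])
    return slope(scores, traits), slope(d_scores, d_traits)


between, within = simulate()
print(f"between-family slope: {between:.2f}")
print(f"within-family slope:  {within:.2f}")
```

Under these assumptions the between-family slope should land near `direct + nurture / 2` and the within-family slope near `direct`, because the shared family environment drops out of the sibling difference. Setting `nurture=0` makes the two estimates agree, which is the pattern the Plomin preprint reports for height and BMI.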


Germline genetic modification: moratorium or no?



Genomics in forensics

  • Rwanda is proposing a DNA database of all its citizens for the purpose of fighting crime. The plan is in its earliest stages; no legislation has yet been passed. In 2015 Kuwait proposed a similar database for fighting terrorism, but it was later struck down by the constitutional court.
  • FamilyTreeDNA let the FBI access genetic data without telling its customers, and faced a backlash for it. Now customers can opt out of law enforcement access to their data. But the company doesn’t want its customers to do that, and has launched ad campaigns on that premise, stating it feels it has a “moral responsibility” to help solve cases.
  • Meanwhile, GEDmatch is the platform to which consumers can choose to upload their genetic test results in full knowledge that law enforcement has access to it. Its growth, coupled with the fact that relatives as distant as 4th cousins can be identified from these uploads, is leading to a “National DNA database by default”. So states Natalie Ram in Slate, who calls for ending familial searching, and points to a bill in Maryland that hopes to do just that.
  • An example of DNA being used for more than an ID. In a murder case, police sequenced DNA found on the victim and found that it belonged to a black man, which changed their search strategy. They then asked nearly 400 black men who had been taken into custody in the region for DNA samples, as part of a “Race-biased dragnet”.
  • An apartment complex on Long Island is setting up a registry of the DNA of residents’ dogs, and will test dog poop to punish those dog owners that do not clean up after their pets.







Round-up Feb 19 – March 12

Two truly ginormous releases of data

  • The UK Biobank, ~500,000 individuals with extensive phenotypes, has released the first ~50,000 whole exome sequences (complementing the chip data that has been around for longer).
  • The National Heart, Lung, and Blood Institute’s Trans-Omics for Precision Medicine (TOPMed) program has a data set of ~50,000 whole genomes (of a planned 145,000). An exciting fact about this data set is that ~30% are from individuals with African ancestry. The individuals are extensively phenotyped. Much of the genetic data (I’m unclear how much) is available on dbGaP.

The size of NGS data has truly exploded. Here’s hoping that this sort of size dataset will allow us to peel back the curtain on the clinical relevance of rare variation.

Controversy — mostly China



  • 23andMe have launched a Type 2 Diabetes score. Using a freshly developed polygenic score based on their 2.5 million customers, it adjusts the score based on ethnicity and age to give not just a relative odds, but a percentage chance of developing the condition in the next x years. I was unable to confirm this as it doesn’t work on my report — perhaps because it only works for the latest chip.
  • The PeopleSeq consortium has partnered with the major projects that offer genome sequencing to healthy individuals (“predispositional screening”). They send out surveys to participants before and after screening. In their first published results covering several hundred people, they found that while most individuals discussed the results with their doctor, only 13.5% made an appointment specifically for that purpose. About 40% reported that they learnt something new about their health, but fewer than 10% made any changes. More than half were disappointed that they did not receive more actionable information. One message the authors want us to take home: patients felt empowered rather than distressed by their results.
  • A group of 8 institutions wants to see whole genome sequencing in the clinic, and has formed the Medical Genome Initiative to help establish the best practices needed to make this happen.


And in other interesting things, here is a nice write-up of the extent to which humans are innately violent — tracing the debates, and pointing to the question, do we need an answer to this question? Also, whether our views on this question affect our beliefs about peace-keeping efforts.

Round-up Feb 1 – Feb 18


  • An opinion piece in STAT by Michael Joyner and Nigel Paneth against the genetic reductionism of precision medicine: “We are calling for an open debate, in all centers of biomedical research, about the best way forward, and about whether precision medicine is really the most promising avenue for progress. It is time for precision medicine supporters to engage in debate — to go beyond asserting the truism that all individuals are unique, and that the increase in the volume of health data and measurements combined with the decline in the cost of studying the genome constitute sufficient argument for the adoption of the precision medicine program.” Their piece references a 1999 lecture from Francis Collins. He explains the background to the Human Genome Project (“a public science initiative focused so sharply on the molecular essence of humankind was too intriguing and too promising to forgo”) and then lays out his vision for what we now call Personalized Medicine, including an imagined 2010 encounter where a young smoker learns of his increased risk of lung cancer, which “provides the key motivation for him to join a support group of persons at genetically high risk for serious complications of smoking, and he successfully kicks the habit.”


  • Race as a biological variable. A paper that finds differences in Alzheimer biomarkers between African-Americans and non-Hispanic whites: lower cerebrospinal fluid concentrations of tau, with results varying by APOE status. In addition to a race-by-biomarker interaction, this is a race-by-genotype interaction. They argue that these uncovered links mean that a) any attempts to use biomarkers in diagnosis should adjust for race, and b) there is hope for better treatment based on incorporating “race dependent biological mechanisms”.
  • A review of polygenic risk scores in psychiatry. Reviews use of GWAS for “increasingly informative individual-level genetic risk prediction” for psychiatric disorders, which are not yet ready for the clinic. Does a nice job of showing that the idea of summing up the effects of many genetic variants to explain a continuous phenotype goes back to the very beginning of the study of heredity. “To understand how to incorporate PRS into clinical practice for patients with heritable psychiatric disorders, studies will need to assess health outcomes for various behavioral interventions, treatment regimens, and/or differential diagnoses.” They make an interesting observation as regards height: “prediction accuracy is not distributed evenly; it performs particularly poorly at the extreme short end of the height distribution, indicating a larger contribution of environmental factors, large-effect rare variants, and/or other factors in these individuals”.
  • Polygenic scores are trained on cohorts of diagnosed individuals. A Danish study shows that polygenic scores for depression are also predictive in a general cohort. A score of one standard deviation above the mean gives a 30% increased chance of a diagnosis of depression.


  • A look at how consumer genetic testing companies market testing for Native American ancestry, focusing on the claimed links to identity. They wonder, whilst acknowledging it is beyond the scope of their paper, “Are companies changing consumer behavior, or are consumers already coming in with certain expectations of verifying tribal ancestry and using the results as a means to accomplish this goal?” (See this paper for qualitative interviews with 100 people looking at this question). The paper references that the US government required tribes in 1934 to have a minimum “blood quantum” for enrollment. There is clear, and problematic, precedent for the use of genetics/blood to define identities. Rewind the clock to 2013, and in giving the majority opinion against a Native American father in a complicated court case, Justice Alito chose to start with “This case is about a little girl (Baby Girl) who is classified as an Indian because she is 1.2% (3/256) Cherokee.”, seemingly drawing attention to the genetic component of her Indian-ness, and in particular that it was only 1.2% (I recommend the More Perfect podcast episode about this case). Reading more about these considerations convinces me of just how counterproductive it was for Elizabeth Warren to act upon Trump’s goad to be genetically tested to prove her claims of Native ancestry.
  • A piece on SpectrumNews, about the benefits of genetic testing. I was struck by this quote from the mother of a child who received a pathogenic finding when her daughter was an adolescent: “Instead of trying to change her behaviors, we’re modifying how we take care of her… It has given me a lot of relief knowing where her autism comes from, and that there was nothing I could have done differently.” Her daughter was the “way she is because of biology”. Routine genetic testing, if it had been available at the time, could have saved her “years of worry and guilt”.
  • Antonio Regalado reports on a start-up working towards designer babies. One of the founders “is skeptical of the role regulations can play—a lesson he says he learned working with Bitcoin, a digital currency outside the control of any central bank.” I was alarmed at this: “Bishop told me none of the ethicists he e-mailed had ever gotten back to him.”
  • The FBI can now send a sample to FamilyTreeDNA, and they will sequence it and see if it matches anything in their database.
  • Speaking of polygenic scores, did you see that Scripps has an app for that?
  • Some lovely visual explainers of the difference between NGS and genotyping from NYT.


  • Designing babies with high IQs may be far off, but selecting between embryos based on polygenic scores for IQ is more or less upon us (witness Genomic Prediction). Erik Parens, Paul Appelbaum, and Wendy Chung argue that profiling embryos for IQ would be unethical. Parents have two competing ethical obligations: to accept their children as they are, and to shape their children. The market is producing a “grotesque” pull towards the latter. “Placing limits on the genetic selection of embryos is one small way for our society to affirm the importance of achieving a balance between the ethical obligations to shape our children and to accept them as they are — and the importance of closing, rather than widening, the gap between the rich and the poor.” They argue for regulation, pointing out that the UK manages to do just that.
  • Ethics dumping? The Economist asks whether Deem (the American scientist who was He’s thesis advisor and who had some role in the notorious experiments) is guilty.

The Backstreet Boys have a new album called DNA. Why? Watch this cringeworthy explainer.