Catch-up Sept 3rd – Dec 8th

I started a fellowship at Harvard this fall, and it’s taken me a while to find the time to sit down and do a round-up. There’s been a torrent of news, covering expansions by DTC players, major new initiatives, and law enforcement’s use of DNA technology, as well as a host of other topics. We’re a year out from the birth of the first genetically modified babies, and in full swing with the polygenic turn. Diversity remains a governing watchword.








Regulation etc

Round-up July 23rd – Sept 2nd

The polygenics of sexuality (see below) has been in the limelight. I appreciated that the dialogue around this study got into some of the “so what” of this type of work. In a post on Medium I reflect on the fact that we seem to be a bit picky and choosy about which traits we are willing to embrace as having a genetic etiology. A recently published study reveals some other biases: antisocial behavior is judged as less genetically influenced than prosocial behavior, probably because people want to hold others blameworthy for their actions.

I continue to think that we’re racing into a future where we have some handle on biological differences between individuals (via probabilistically predictive scores of various types, genetic and otherwise), and that we lack the necessary frameworks for knowing what to do with this information. Putting my time where my worries are, starting today I’m officially employed to think about exactly these questions. 



  • The “no single gay gene” study in Science, covered by numerous news outlets, found that the trait “ever versus never had same-sex sex” is ~32% heritable (95% confidence interval 11-54%), and is polygenic. From a scientific point of view, that result is completely unsurprising, simply showing that this trait is like all others studied. One thing that was different about this trait: the male-female genetic correlation was considerably lower than the usual value of close to one. Secondary results from the study included that personality traits, mental health disorders, and risky behaviors were genetically correlated with the trait, but physical traits were not. The authors also looked at the proportion of same-sex to total partners, and found that this measure was not that highly correlated with ever/never having had same-sex sex (0.73 for men, 0.52 for women). Their conclusion? Sexuality is complex, and can’t be captured by a single dimension, as scales like Kinsey’s try to do. The study is also an interesting case example of how researchers engaged with ethics.
  • Polygenic scores cannot be interpreted as a “genetic endowment” because their predictive power can come from a number of other sources: population stratification, assortative mating, and genotype-environment correlation (rGE). One study found that the predictive power of polygenic scores for cognitive traits was 60% higher between families than within families (comparing non-identical twins). But when socioeconomic status was controlled for, much of the difference disappeared. The authors conclude that socioeconomic status contributes to the between-family predictive power of cognitive polygenic scores, acting as an “environmentally mediated parental genetic effect”, a type of genotype-environment correlation.
  • A preprint that models the potential effects of selecting embryos on the basis of polygenic scores. When selecting among ten embryos, with 20% of the variance captured by the polygenic score, the gain (selecting the highest-scoring embryo versus an average embryo) would be ~0.5 of a standard deviation, with a 95% confidence interval of -1.2 to 2.3. For comparison, the standard deviation is about 6cm for height and 15 points for IQ.
  • A preprint of a large (135,000-person) Finnish study on polygenic scores for a few major diseases. Comparing the top 2.5% of scores to the middle 20-80%, hazard ratios ranged from 2.0 to 4.3 (depending on the disease). The authors conclude that the performance of the polygenic scores was similar to that of established clinical risk scores.
  • An example of sociogenomic research: finding that some polygenic scores are associated with exposure to bullying.
  • A preprint that points to a way polygenic scores could be made clinically relevant: by focusing on the variants that contribute to polygenic scores that also fall within gene sets tied to particular drugs. 
  • A polygenic score for alcohol abuse.
  • More evidence for the common genetic underpinnings of various psychiatric disorders.
  • Some genes in humans are up-regulated after death.
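The ~0.5 SD figure in the embryo-selection bullet can be roughly reproduced with a back-of-envelope calculation. This is a minimal sketch under my own assumptions, not necessarily the preprint's model: embryos from one couple vary in polygenic score with half the population score variance, the score's predictive power carries over fully within families (which the rGE results above suggest is optimistic for cognitive traits), and the noise in realized outcomes is ignored. The expected gain is then the expected maximum of N standard normal draws times sqrt(r²/2), where r² is the variance explained. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def expected_max_of_normals(n, lo=-10.0, hi=10.0, steps=20_000):
    """E[max of n iid standard normals], by trapezoid-rule integration of
    x * n * pdf(x) * cdf(x) ** (n - 1) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        f = x * n * STD_NORMAL.pdf(x) * STD_NORMAL.cdf(x) ** (n - 1)
        total += f / 2 if i in (0, steps) else f
    return total * h

def expected_gain_sd(n_embryos, r2):
    """Expected gain (in trait SDs) from picking the top-scoring embryo vs. an average one.
    Assumption: sibling scores vary with half the population score variance (r2 / 2)."""
    return expected_max_of_normals(n_embryos) * sqrt(r2 / 2)

gain = expected_gain_sd(n_embryos=10, r2=0.20)
print(f"gain: {gain:.2f} SD, ~{gain * 6:.1f} cm of height, ~{gain * 15:.1f} IQ points")
```

With ten embryos and r² = 0.2 this gives roughly 0.49 SD, in line with the ~0.5 SD quoted, or about 3cm of height under the 6cm SD.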



  • Erik Parens and Paul Appelbaum consider the evidence for the widely predicted negative psychosocial impacts of genetic testing. Years of study have shown that individuals do not end up more depressed, anxious, or stressed (though there may be transitory effects; note the recent Hastings Center Special Report containing many papers on this topic). They see a path forward in “Using more subtle measures than merely asking people if they feel depressed or anxious” to show that “genetic results can affect people in real ways.” Here are the examples of this newer generation of work that they point to:
    • A study showing that those who were APOE e4+ and knew it rated their memory worse, and performed worse on a memory test, than those who were positive and didn’t know.
    • A study in which individuals were told whether they had a genetic variant associated with decreased exercise capacity, independently of their actual genetic results, found that those told they had the variant performed worse.
    • A study showing the impact of a perceived genetic cause of obesity, with subjects likely to eat more if they thought their obesity was genetic.



Round-up May 29th – July 22nd

In addition to various news sites (GenomeWeb, STAT) and newsletters (e.g. GA4GH), I usually use Rxivist to see what’s new. It shows which preprints have been generating the most tweets. This month, though, that functionality is broken, so it is showing the most downloaded articles instead, many of which were not published recently. The top hits in genetics/genomics include face prediction, a tutorial on how to construct polygenic scores, ancestry, the effects of an extra X chromosome, and single-cell methods.

The stories that have been grabbing my attention are those that link genetics and identity. An excellent interview with Dorothy Roberts, author of Fatal Invention: How Science, Politics, and Big Business Re-create Race in the Twenty-First Century, discussed the relationship between genetic ancestry and race. Some selected excerpts:

  • “Some people think it’s harmless to believe in biological differences between races as long as we don’t value one over another, but the whole point of dividing humans into races is to value some more than others.” 
  • “Racism isn’t a product of race. Race is a product of racism. People think it’s OK to categorize people by race as long as they’re not racist, but any division of people into supposedly natural races promotes a racist agenda, whether we intend it to or not.” 
  • On the eugenics movement: “If certain groups of people are at a disadvantage, the thinking went, it must be because of their biological inferiority, not due to state violence and structural inequalities…. Such thinking is still used to explain social inequality in the present: If we believe biology produces these unequal social and economic conditions, then how can they be immoral and in need of change? They are “natural.” The situation can’t be changed.” 
  • On where you look for explanations: “Have you looked into the fact that black children get expelled and arrested at far higher rates than white children for the same behaviors, like missing school, talking back to a teacher, or roughhousing? Have you looked for any explanations other than within the gray matter of their brains?”
  • “If you add a new technology to an already racist system, you’ll get another racist product.” 
  • And her conclusion: “I don’t believe we should be “color-blind,” that we shouldn’t pay any attention to race. As a political invention, race continues to determine power arrangements and is not going to just go away. We have to dismantle racist institutions to affirm our common humanity. And to do that, we need to understand how the concept of race really functions.” 

An interesting study looked at what happens when white nationalists find out that some of their ancestry is not European. It examined the responses individuals in a white nationalist group got when they posted their unexpected results: the vast majority of comments they received focused on potential inaccuracies of the testing. The authors conclude: “White nationalism is not simply an identity community or political movement but should be understood as bricoleurs with genetic knowledge displaying aspects of citizen science.”

Genetic identity is key to the debates over donor conception. Recently in the news have been revelations, based on genetic testing, of past mix-ups in sperm donor conception. The reaction of one couple whose children were conceived with a donor they didn’t select is revealing of attitudes within the process: “I didn’t choose someone who has a history of brain cancer in the family. I would never have chosen this donor. They should be ashamed to even have this donor on the website.” The position of the courts is that there are no grounds to sue if the child is healthy. Meanwhile, a Singaporean court defined a new type of loss, a loss of genetic affinity, to deal with the case of a couple who unintentionally ended up with a biracial child.



Regulation etc

An interview with George Church touching on many future tech possibilities: “Just being different at all from the middle of the bell curve gives you an advantage in a part of society that cherishes innovation and out-of-the-box thinking.” 

Round-up April 26 – May 28

I’ve started this round-up with recent papers on two scientific themes that will dominate near-term progress in understanding links between genotype and phenotype: 1) trouble ahead for polygenic scores, and 2) the coming together of rare and common variation analysis.

Trouble for polygenic scores

Previous work on the genetics of human height, based on the GIANT consortium, had identified signatures of evolutionary adaptation to explain the North-South height gradient in Europe. But two new studies applying the same methodology to the larger and more homogeneous UK Biobank data found no such evidence. They did find that the same SNPs were identified, with similar effect sizes. But population structure biases these effect sizes. This is particularly problematic in meta-analyses that combine heterogeneous data sources, and much worse if sub-significant SNPs are included. This should prompt extreme caution when a) looking for signals of polygenic adaptation, and b) interpreting between-population differences. Additionally, this population structure can be “an additional source of error in polygenic scores and affect their applicability even within populations” (paper 1), as “even small differences in ancestry will be inadvertently translated into large differences in predicted phenotype” (paper 2). These results are nicely put in context here, with a quote from a former teacher of mine: “The methods developed so far really think about genetics and environment as separate and orthogonal, as independent factors. When in truth, they’re not independent. The environment has had a strong impact on the genetics, and it probably interacts with the genetics,” said Gil McVean, a statistical geneticist at the University of Oxford. “We don’t really do a good job of … understanding [that] interaction.”
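To make the point about population structure biasing effect sizes concrete, here is a toy simulation (my own illustration, not the method of either paper). A SNP with no effect on the trait has different allele frequencies in two hypothetical sub-populations, which also differ in the trait for purely environmental reasons. A naive GWAS-style regression finds a spurious “effect”; regressing within each sub-population removes it. The frequencies, environmental shift, and sample sizes are all made-up values.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n_per_pop=10_000, freqs=(0.2, 0.6), env_shift=(0.0, 1.0), seed=42):
    """Genotypes (0/1/2 allele counts) and a trait with NO genetic effect, for two
    hypothetical sub-populations differing in allele frequency and environment."""
    rng = random.Random(seed)
    geno, trait, pop = [], [], []
    for k, (p, shift) in enumerate(zip(freqs, env_shift)):
        for _ in range(n_per_pop):
            geno.append(sum(rng.random() < p for _ in range(2)))
            trait.append(shift + rng.gauss(0, 1))  # trait depends on environment only
            pop.append(k)
    return geno, trait, pop

def demean_within(values, groups):
    """Subtract each group's mean: a simple fixed-effect adjustment for population."""
    means = {}
    for g in set(groups):
        members = [v for v, grp in zip(values, groups) if grp == g]
        means[g] = sum(members) / len(members)
    return [v - means[g] for v, g in zip(values, groups)]

geno, trait, pop = simulate()
naive = ols_slope(geno, trait)
adjusted = ols_slope(demean_within(geno, pop), demean_within(trait, pop))
print(f"naive 'effect' of a null SNP: {naive:.3f}")
print(f"after adjusting for population: {adjusted:.3f}")
```

Here the population labels are known exactly; in real data, structure is only approximately captured (e.g. by principal components), which is why subtle residual stratification can survive and accumulate across many sub-significant SNPs.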

Question: what would the same analysis show for the Educational Attainment polygenic score? It is the other score built on very large, heterogeneous data, and it uses many non-significant SNPs.

A separate preprint shows how, even within an ancestry group, porting polygenic scores has its challenges. “The prediction accuracy of polygenic scores depends on characteristics such as the age or sex composition of the individuals in which the GWAS and the prediction were conducted, and on the GWAS study design.”


Coming together of rare and common variation

We’re getting to the stage of having very large cohorts of NGS data. What will we learn of how common and rare variation jointly contribute to disease? And what implications does this have for the clinic?

A new paper reports an exome cohort of over 20,000 T2D cases and 24,000 controls, one of the largest studies yet using NGS data. For 76% of the cohort they also had array data plus imputation. The broad relevance of this type of study led me to read the paper fairly closely. What did they find?

  • Looking exome-wide, of the 6.3 million variants in their data set, 15 were exome-wide significant. They were powered to find variants with an effect size of OR 2.5 at a frequency of 0.2%.
  • They aggregated to the gene level and found 3 significant genes. Evidence from other datasets for their near misses leads them to think that these genes will become exome-wide significant in the future. They estimated that the top 100 gene-level signals would capture a mere ~2% of the genetic variance in their sample.
  • They then aggregated another level up, at the gene-set level, drawing only the weak conclusion that this line of work “can be used as a potential metric to prioritize candidate genes relevant to T2D.”
  • Almost all the variants found in the exome data were also found in the array data (8 of the 10 single variants), along with 14 more non-coding variants found only in the array data. The vast majority of their variants overall were not imputable. Because the array data identified common variants, it explained more of the genetic liability.
  • The basic issue is that they remain underpowered to a) find rarer variants (<<0.2%), and b) accurately estimate their effect sizes. As an antidote to (a), they suggest relying on prior suspicions of a gene-disease connection to narrow the search space (and hence lower the threshold for detecting significance).
  • They conclude that for research, GWAS are best for “locus discovery and fine mapping”, and NGS for gene characterization and confidence in gene-disease connections.
  • And for personalized medicine, the very rare variants of large effect size may be useful, but these are so rare that they will complement (rather than replace) polygenic scores based on array data.
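The suggestion to narrow the search space using prior suspicions is, at heart, a multiple-testing argument. A minimal sketch using a simple Bonferroni correction (my choice for illustration; the paper’s thresholds may be derived differently), with a hypothetical ~20,000 genes tested exome-wide versus a hypothetical shortlist of 100 candidate genes:

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold keeping the family-wise error rate at alpha."""
    return alpha / n_tests

# Hypothetical search-space sizes, for illustration only
exome_wide = bonferroni_threshold(20_000)  # every gene tested
candidates = bonferroni_threshold(100)     # only genes with a prior suspicion

p_gene = 1e-4  # a hypothetical gene-level association p-value
print(f"exome-wide threshold: {exome_wide:.1e}, passes: {p_gene < exome_wide}")
print(f"candidate threshold:  {candidates:.1e}, passes: {p_gene < candidates}")
```

The same gene-level signal fails the exome-wide threshold but passes the candidate-gene one. The trade-off is that genes outside the shortlist are never tested, so the gain depends entirely on the prior suspicions being good ones.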



  • Large datasets such as the UK Biobank are showing that a lot of the candidate gene work was spurious. Here is a piece from Ed Yong focusing on SLC6A4 and its connection with depression, which about 450 papers investigated. Many now claim that there is no evidence that the connection exists (and indeed that this has been clear for years). But some argue that the effects of this gene are known to depend on the environment, and that the new studies do not measure the environment anywhere near as accurately as needed.
  • Sarah Zhang at the Atlantic points out another enduring legacy of some of the candidate gene work. A gene called MTHFR was associated with adverse reactions following smallpox vaccination in a small 2008 study. Like other candidate genes, it hasn’t stood the test of time. But MTHFR is the single gene that 23andMe gets the most questions about, from anti-vaxxers hoping to find their child has a variant that will get them a medical exemption from vaccination (to do this they have to download their raw data and upload it somewhere else).
  • In a preprint, Plomin et al. argue on the basis of ~7,000 twin pairs for the existence of a substantially heritable (50-60%) p-factor, a polygenic general psychopathology factor.
  • A study in PNAS and a write-up in the NY Times locates several more cases where an extreme difference in smell perception can be linked to single SNPs.
  • A polygenic score for obesity, with those in the top decile 13kg heavier than those in the bottom decile by age 18.
  • A large (~30k cases, ~170k controls) GWAS of bipolar disorder identifies 30 genome-wide significant variants. One scary thing: their first analysis was of a subset of the data (20k cases, 31k controls), in which they found 19 variants, 8 of which did not replicate in the combined analysis.
  • Most people who are at 50% risk of developing Huntington’s disease do not want to know if they carry the genetic variant. Why? A study based on data from 1999-2008 found the two biggest reasons were no effective cure/treatment (66%) and inability to undo knowledge (66%).





What next for human germline editing?

The issues are somewhat clearly identified. The lack of concrete proposals is deafening.


Yesterday I attended an event at Harvard, “Editorial Humility: A Moratorium on Human Germline Editing?”, sparked by the recently published call for a moratorium on human germline editing that Eric Lander co-authored with 17 others.

Back in 2015 there was a clear call for a moratorium, with a focus on whether this is a road we want to go down at all. In 2017, the National Academies of Sciences, Engineering, and Medicine (NASEM) published a report that watered down this call, focusing on questions of safety and efficacy (I argued at the time against this watering down). He Jiankui pointed to this NASEM report in claiming that there was no clear writ against his decision to pursue the human germline editing that led to the birth of Lulu and Nana. In other words, the absence of a clearly called-for moratorium likely had a role in the actual use of the technology.

Is a moratorium the right approach? Eric Lander, speaking first, explained that the main aim of the new call was to seed the debate. He is backed by the NIH, represented at the event by Carrie Wolinetz. In the US there is a ban against germline editing already in place, but the world’s largest funder of biomedical research nonetheless thought using their “bully pulpit” position in support of a moratorium was the right thing to do.

A moratorium is a lightweight solution. It would be time-limited from the get-go, and it would leave eventual decisions to each sovereign power on a country-by-country basis. For panellists Betsy Bartholet and Sheila Jasanoff, this does not go far enough: we should be aspiring to an international treaty. Betsy Bartholet called on Eric Lander and the concerned scientific community to start the work to get a treaty in place. Eric Lander called on Bartholet and the lawyers to do this work instead. Both claimed a lack of expertise. This was concerning: an example of how things can truly fall between the cracks.

Playing devil’s advocate, I. Glenn Cohen argued that moratoriums can be “sticky”, even if they have a sunset clause. Moreover, bioethicists have been talking about this for years, so we have already done all the thinking we need to. He also argued that we need to de-exceptionalize genetic modification as a technology: many technologies, e.g. smartphones, have disturbing ethical implications. I agree with Lander’s response: yes, we have issues across the board when it comes to new technologies, but that’s a reason to engage with all of them, not to disengage from this one.

Steve Hyman, ex-provost of Harvard, said that he was much more concerned about the use of cognitive polygenic scores for selecting embryos. Given my research interests, it is no surprise that I strongly agree. I’ve also argued in print (jointly with Sarah Polcz) that although scientists make a big distinction between heritable and non-heritable modification, the bigger distinction is between therapy and enhancement.

I took two main things away from the panel.

First, everybody was horrified that the scientific community so thoroughly dropped the ball. Many others knew what He Jiankui was up to, and no one raised the alarm. (Note that the scientific community is divided about whether blame should fall entirely on He.) What should be done? Eric Lander made the case that you can’t expect scientists to self-regulate because they have an inherent conflict of interest: when you work on something, you have to be among the biggest believers in its upsides, the optimists. Jasanoff reminded Lander that the metro between Harvard and Kendall “runs both ways”, and that he should come to the Kennedy School more often to explore the issues from a different point of view. The question of the role of scientists seems timely in light of the debate over the extent to which the social media giants can and should self-regulate. In this arena, bioethicists get heat for being too conservative, for overplaying the risks of a technology and failing to see its potential benefits. So what is the right balance of roles?

Second, as ever, there were broad calls for public debate. But when it came to concrete proposals for “deliberative explorations”, there was nothing. There was reference to learning from how other countries have done this. The UK is always held up as the shining example. (And the inevitable “How do they manage to achieve consensus on the use of reproductive technologies, but end up in such a mess over Brexit?”) I’ve added to my To Do list a comparison of the various approaches to public engagement taken around the world with respect to mitochondrial replacement therapy. Approaches that have caught my eye include the Moral Machine work, where the public were invited to state whom they thought a self-driving car should kill, and something like what the folks at World Wide Views are up to. I’m very interested to hear of deliberation and public engagement efforts that others are enthusiastic about.

Round-up March 13 – April 25

Three topics have dominated genomics happenings. First, polygenic scores: the science continues to mature, there are calls for their rapid clinical integration, there are concerns about their use, and there are commercially available products. Second, how to regulate human germline modification. Third, use of genomics in forensics.


Polygenic scores

  • Polygenic scores for cognitive traits like IQ and Educational Attainment are confounded by genotype-environment correlation, especially via socioeconomic status (SES). Preprint from Plomin and team. Within-family predictions were ~60% lower than between-family predictions for these traits, but not for traits like BMI and height. The difference disappeared after accounting for SES, suggesting that SES is part of “passive gene environment correlation” or “genetic nurture”. All genetic influences operate via the environment. There are three genotype-environment correlation (rGE) mechanisms:

Passive rGE: “Parents generate family environments consistent with their own genotypes, which in turn facilitate the development of the offspring trait, thus inducing a correlation between offspring genotype and family environment.”

Active rGE: children select, modify and create experiences

Evocative rGE: children evoke responses from their environment (correlated with their genetic propensities).

Within family genetic differences can include active and evocative rGE effects, but not passive rGE effects, which are shared within the family.

If the aim is just to maximize trait prediction, using between-family scores (i.e. those calculated from unrelated individuals) is legitimate. But for causal analysis, including the use of Mendelian Randomization, within-family designs are necessary.

  • Another preprint examines the “nature of nurture”, i.e. gene-environment correlations, in the context of educational attainment, with access to polygenic scores for the mothers as well as the children. It shows that both mothers’ and children’s polygenic scores are predictive of parenting style, and that mothers’ genetics predicted children’s educational attainment beyond direct transmission, mostly via providing a stimulating cognitive environment.
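The between-family versus within-family contrast can be illustrated with a toy simulation (my own sketch, not the models used in these preprints). Parents’ polygenic scores shape the family environment (passive rGE, or “genetic nurture”), and each child’s score is the parental average plus Mendelian segregation noise. Regressing the trait on the score across unrelated children picks up both the direct effect and the genetic-nurture path; regressing sibling differences on score differences recovers only the direct effect, because everything siblings share cancels. The effect sizes are hypothetical values chosen for illustration.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_sibships(n_families, beta_direct, gamma_nurture, seed=1):
    """Two siblings per family. Parental scores ~ N(0, 1); each child's score is the
    mid-parent score plus segregation noise (variance 0.5), so child scores have
    variance 1. The family environment depends on the mid-parent score."""
    rng = random.Random(seed)
    seg_sd = 0.5 ** 0.5
    g1, g2, y1, y2 = [], [], [], []
    for _ in range(n_families):
        mid_parent = (rng.gauss(0, 1) + rng.gauss(0, 1)) / 2
        family_env = gamma_nurture * mid_parent  # shared by both siblings (passive rGE)
        for g_list, y_list in ((g1, y1), (g2, y2)):
            g = mid_parent + rng.gauss(0, seg_sd)
            g_list.append(g)
            y_list.append(beta_direct * g + family_env + rng.gauss(0, 1))
    return g1, g2, y1, y2

# Hypothetical effect sizes, chosen only for illustration
g1, g2, y1, y2 = simulate_sibships(20_000, beta_direct=0.2, gamma_nurture=0.4)

# Between families: one child per family, like a score applied to unrelated people
between = ols_slope(g1, y1)  # expected ~ beta_direct + gamma_nurture / 2 = 0.4
# Within families: sibling differences cancel everything the siblings share
within = ols_slope([a - b for a, b in zip(g1, g2)],
                   [a - b for a, b in zip(y1, y2)])  # expected ~ beta_direct = 0.2
print(f"between-family slope: {between:.2f}, within-family slope: {within:.2f}")
```

With these made-up parameters the within-family slope is about half the between-family one, and setting `gamma_nurture=0` makes the two agree, mirroring how the gap closed once SES was accounted for.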


Germline genetic modification: moratorium or no?



  • Rwanda is proposing a DNA database of all its citizens for crime-fighting purposes. The plan is in its earliest stages; no legislation has yet been passed. In 2015 Kuwait proposed a similar database for fighting terrorism, but it was later struck down by the constitutional court.
  • FamilyTreeDNA let the FBI access genetic data without telling its customers, and faced a backlash for it. Now customers can opt out of law enforcement access to their data. But the company doesn’t want its customers to do that, and has launched ad campaigns on that premise, stating it feels it has a “moral responsibility” to help solve cases.
  • Meanwhile, the growth of GEDMatch, the platform to which consumers can choose to upload their genetic test results in full knowledge that law enforcement has access to them, coupled with the fact that relatives as distant as 4th cousins can be identified from these uploads, is leading to a “National DNA database by default”. So argues Natalie Ram in Slate, who calls for an end to familial searching, and points to a bill in Maryland that aims to do just that.
  • An example of DNA being used for more than identification. In a murder case, police sequenced DNA found on the victim and found that it belonged to a black man, which changed their search strategy. They then asked nearly 400 black men who had been taken into custody in the region for DNA samples, as part of a “Race-biased dragnet”.
  • An apartment complex on Long Island is setting up a registry of the DNA of residents’ dogs, and will test dog poop to punish those dog owners who do not clean up after their pets.
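The 4th-cousin reach is striking given how little DNA such distant relatives are expected to share. A back-of-envelope sketch using the textbook expectation for full nth cousins: they have two common ancestors, each linked to both cousins by a path of 2n + 2 meioses, so the expected fraction of the autosomal genome shared identical by descent is 2 × (1/2)^(2n+2). These are averages only; actual sharing varies widely around them, and whether a match is detected depends on the matching software.

```python
ORDINALS = {1: "1st", 2: "2nd", 3: "3rd", 4: "4th"}

def expected_shared_fraction(n):
    """Expected fraction of the autosomal genome shared IBD by full nth cousins:
    two common ancestors, each reached by a path of 2n + 2 meioses."""
    return 2 * 0.5 ** (2 * n + 2)

for n in range(1, 5):
    print(f"{ORDINALS[n]} cousins: {100 * expected_shared_fraction(n):.3f}% shared on average")
```

The expectation falls from 12.5% for first cousins to under 0.2% for 4th cousins, which is why relatives that distant give investigators only a starting point for genealogical work rather than an identification.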







Round-up Feb 19 – March 12

Two truly ginormous releases of data

  • The UK Biobank, ~500,000 individuals with extensive phenotypes, has released the first ~50,000 whole exome sequences (complementing the chip data that has been around for longer).
  • The National Heart, Lung, and Blood Institute’s Trans-Omics for Precision Medicine (TOPMed) program has a data set of ~50,000 whole genomes (of a planned 145,000). An exciting fact about this data set is that ~30% are from individuals with African ancestry. The individuals are extensively phenotyped. Much of the genetic data (I’m unclear how much) is available on dbGaP.

The size of NGS datasets has truly exploded. Here’s hoping that datasets of this size will allow us to peel back the curtain on the clinical relevance of rare variation.

Controversy — mostly China



  • 23andMe has launched a Type 2 Diabetes score. Using a freshly developed polygenic score based on data from its 2.5 million customers, it adjusts for ethnicity and age to give not just relative odds, but a percentage chance of developing the condition in the next x years. I was unable to check this as it doesn’t work on my report, perhaps because it only works for the latest chip.
  • The PeopleSeq consortium has partnered with the major projects that offer genome sequencing to healthy individuals (“predispositional screening”). They send out surveys to participants before and after screening. In their first published results covering several hundred people, they found that while most individuals discussed the results with their doctor, only 13.5% made an appointment specifically for that purpose. About 40% reported that they learnt something new about their health, but fewer than 10% made any changes. More than half were disappointed that they did not receive more actionable information. One message the authors want us to take home: patients felt empowered rather than distressed by their results.
  • A group of 8 institutions wants to see whole genome sequencing in the clinic, and have formed the Medical Genome Initiative to help establish best practices etc to make this happen.
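The step from relative odds to a percentage chance needs a baseline risk, which is where age and ethnicity come in. A minimal sketch, assuming (hypothetically; this is not 23andMe’s published method) that the polygenic score yields an odds ratio relative to an average person, and that an age- and ancestry-specific baseline risk over the time window is known: convert the baseline risk to odds, scale by the odds ratio, and convert back.

```python
def absolute_risk(baseline_risk, odds_ratio):
    """Convert a baseline absolute risk and a relative odds into an absolute risk."""
    if not 0 < baseline_risk < 1:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    baseline_odds = baseline_risk / (1 - baseline_risk)
    odds = baseline_odds * odds_ratio
    return odds / (1 + odds)

# Hypothetical inputs: 10% baseline risk over the window, score implies 2x the odds
print(f"{absolute_risk(0.10, 2.0):.1%}")
```

Doubling the odds does not double the risk: a 10% baseline becomes about 18%, not 20%, and the gap widens as the baseline grows.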


And among other interesting things, here is a nice write-up on the extent to which humans are innately violent. It traces the debates and raises the question of whether we need an answer to this question at all, and whether our views on it affect our beliefs about peace-keeping efforts.