I’ve started this round-up with recent papers on two scientific themes that will dominate near-term progress in understanding the links between genotype and phenotype: 1) trouble ahead for polygenic scores, and 2) the coming together of rare and common variation analysis.
Trouble for polygenic scores
Previous work on the genetics of human height, based on the GIANT consortium, had identified signatures of evolutionary adaptation to explain the North-South height gradient. But two new studies applying the same methodology to the larger and more homogeneous UK Biobank data found no such evidence. They did find that the same SNPs were identified, with similar effect sizes, but population structure biases those effect sizes. This is particularly problematic in meta-analyses that combine heterogeneous data sources, and it is much worse if sub-significant SNPs are included. It should prompt extreme caution when a) looking for signals of polygenic adaptation, and b) interpreting between-population differences. Additionally, this population structure can be “an additional source of error in polygenic scores and affect their applicability even within populations” (paper 1), as “even small differences in ancestry will be inadvertently translated into large differences in predicted phenotype” (paper 2). These results are nicely put in context here, with a quote from a former teacher of mine: “The methods developed so far really think about genetics and environment as separate and orthogonal, as independent factors. When in truth, they’re not independent. The environment has had a strong impact on the genetics, and it probably interacts with the genetics,” said Gil McVean, a statistical geneticist at the University of Oxford. “We don’t really do a good job of … understanding [that] interaction.”
Question: what would the same analysis show for the Educational Attainment polygenic score? It stands as the other score based on very large heterogeneous data, and it uses many non-significant SNPs.
A separate preprint shows how, even within an ancestry group, porting polygenic scores has its challenges. “The prediction accuracy of polygenic scores depends on characteristics such as the age or sex composition of the individuals in which the GWAS and the prediction were conducted, and on the GWAS study design.”
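To make the population-structure point above concrete, here is a minimal toy simulation, not the method of either height paper. It assumes two hypothetical subpopulations whose allele frequencies differ slightly by drift, a phenotype that differs between them for purely environmental reasons, and no causal SNPs at all. The function names, sample sizes, and the `drift_sd` and `env_shift` parameters are my own illustrative choices. A naive GWAS with no ancestry adjustment picks up small spurious effects, and a score summing every SNP (significant or not) then predicts a "genetic" gap between the groups in fresh individuals.

```python
import random


def simulate_genotypes(freqs, n, rng):
    """Draw n diploid genotypes (0, 1 or 2 allele copies per SNP)."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n)]


def marginal_slope(xs, ys):
    """OLS slope of ys on xs: a one-SNP 'GWAS' with no ancestry covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx if sxx > 0 else 0.0


def stratified_score_gap(n_snps=300, n_disc=400, n_target=200,
                         drift_sd=0.05, env_shift=1.0, seed=1):
    """Mean polygenic score in group B minus group A, in new individuals.

    No SNP is causal (an assumption of this toy model), so any gap in the
    score is an artefact of uncorrected population structure.
    """
    rng = random.Random(seed)
    # Group B's allele frequencies are group A's plus a little random drift.
    freq_a = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
    freq_b = [min(0.95, max(0.05, p + rng.gauss(0, drift_sd))) for p in freq_a]

    # Discovery sample: both groups pooled, as in a heterogeneous meta-analysis.
    geno = (simulate_genotypes(freq_a, n_disc, rng)
            + simulate_genotypes(freq_b, n_disc, rng))
    # The phenotype differs between groups only through the environment.
    pheno = ([rng.gauss(0.0, 1.0) for _ in range(n_disc)]
             + [rng.gauss(env_shift, 1.0) for _ in range(n_disc)])

    # Estimate one marginal effect per SNP and keep all of them.
    betas = [marginal_slope([row[j] for row in geno], pheno)
             for j in range(n_snps)]

    def mean_score(freqs):
        target = simulate_genotypes(freqs, n_target, rng)
        return sum(sum(b * g for b, g in zip(betas, row))
                   for row in target) / n_target

    return mean_score(freq_b) - mean_score(freq_a)


if __name__ == "__main__":
    print("score gap with environmental shift:", stratified_score_gap())
    print("score gap without it:", stratified_score_gap(env_shift=0.0))
```

Setting `env_shift=0.0` removes the confounding and should leave only noise in the score gap. Real analyses adjust for ancestry (for example with principal components or mixed models), which this sketch deliberately leaves out.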
Coming together of rare and common variation
We’re getting to the stage of having very large cohorts of NGS data. What will we learn of how common and rare variation jointly contribute to disease? And what implications does this have for the clinic?
One paper analyzes an exome cohort of over 20,000 T2D cases and 24,000 controls, one of the largest studies yet to use NGS data. For 76% of the cohort the authors also had array data plus imputation. The broad relevance of this type of study led me to read the paper fairly closely. What did they find?
- Looking exome-wide, of the 6.3 million variants in their data set, 15 were exome-wide significant. They were powered to find variants with an effect size of OR 2.5 at a frequency of 0.2%.
- They aggregated to the gene level and found 3 significant genes. Looking at the near misses in other datasets leads them to think that these will become exome-wide significant in the future. They estimated that the top 100 gene-level signals would capture a mere ~2% of the genetic variance in their sample.
- They then aggregate another level up, to the gene-set level, drawing only the weak conclusion that this line of work “can be used as a potential metric to prioritize candidate genes relevant to T2D.”
- Almost all the variants they had found in the exome data also turned up in the array data (8 of the 10 single variants), along with 14 more non-coding variants in the array data. The vast majority of their overall variants were not imputable. Because the array data identified common variants, it explained more of the genetic liability.
- The basic issue is that they continue to be underpowered to a) find rarer variants (<<0.2%), and b) accurately estimate their effect sizes. They suggest, as an antidote to (a), relying on prior suspicions of a gene-disease connection to narrow the search space (and hence lower the threshold for detecting significance).
- They conclude that for research, GWAS are best for “locus discovery and fine mapping”, and NGS for gene characterization and confidence in gene-disease connections.
- And for personalized medicine, the very rare variants of large effect sizes may be useful, but these are so rare as to complement (rather than replace) polygenic scores based on array data.
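The power and search-space points in the list above can be put into rough numbers. The sketch below is my own back-of-the-envelope approximation, not the paper's power calculation: it uses a Bonferroni threshold and a normal-approximation allelic test, and it applies the same single-variant test at every threshold purely to show what narrowing the search space does. The 6.3 million variants, the case and control counts, the OR of 2.5 and the 0.2% frequency come from the summary above; the figures of 20,000 genes and 100 prioritized genes are assumptions for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


def case_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def approx_power(control_freq, odds_ratio, n_cases, n_controls, p_threshold):
    """Approximate power of a two-sided allelic two-proportion z-test.

    A normal approximation; it is crude when expected allele counts are small.
    """
    p1 = case_frequency(control_freq, odds_ratio)
    p0 = control_freq
    a1, a0 = 2 * n_cases, 2 * n_controls  # two alleles per person
    pooled = (a1 * p1 + a0 * p0) / (a1 + a0)
    se = (pooled * (1 - pooled) * (1 / a1 + 1 / a0)) ** 0.5
    z_effect = abs(p1 - p0) / se
    z_crit = -_STD_NORMAL.inv_cdf(p_threshold / 2)
    return _STD_NORMAL.cdf(z_effect - z_crit)


if __name__ == "__main__":
    search_spaces = {
        "all 6.3M variants (from the paper)": 6_300_000,
        "about 20,000 genes (assumed)": 20_000,
        "100 prioritized genes (hypothetical)": 100,
    }
    for label, n_tests in search_spaces.items():
        thr = bonferroni_threshold(n_tests)
        for freq in (0.002, 0.0002):
            power = approx_power(freq, 2.5, 20_000, 24_000, thr)
            print(f"{label}: threshold={thr:.2e}, freq={freq}, power={power:.3f}")
```

The design choice here is to hold the effect size fixed and vary only the allele frequency and the number of tests, so the two levers the authors discuss (rarer variants lose power, a smaller search space regains some) can be read off separately.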
Science
- Large datasets such as the UK Biobank are showing that a lot of the candidate gene work was spurious. Here is a piece from Ed Yong focusing on SLC6A4 and its connections with depression, which about 450 papers investigated. Now many are claiming that there is no evidence that the connection exists (and indeed that this has been clear for years now). But some are saying that we know the effects of this gene depend on the environment, and the new studies do not measure the environment anywhere near as accurately as needed.
- Sarah Zhang at the Atlantic points out another enduring legacy of some of the candidate gene work. A gene called MTHFR was associated with adverse results following smallpox vaccination in a small 2008 study. Just like other candidate genes, it hasn’t stood the test of time. But MTHFR is the single gene that 23andMe gets the most questions about, from anti-vaxxers hoping to find their child has a variant that will get them a medical exemption from vaccination (to do this they have to download their raw data and upload it somewhere else).
- In a preprint, Plomin et al. argue, on the basis of ~7000 twin pairs, for the existence of a substantially heritable (50-60%) p-factor: a polygenic general psychopathology factor.
- A study in PNAS, with a write-up in the NY Times, locates several more cases where an extreme difference in smell perception can be linked to single SNPs.
- A polygenic score for obesity, with those in the top decile 13kg heavier than those in the bottom decile by age 18.
- A large (~30k cases, ~170k controls) GWAS of bipolar disorder identifies 30 genome-wide significant variants. One scary thing: their first analysis was of a subset of the data (20k cases, 31k controls), in which they found 19 variants, 8 of which did not replicate in the combined analysis.
- Most people who are at a 50% risk of developing Huntington’s disease do not want to know if they carry the genetic variant. Why? A study based on data from 1999-2008 found the two biggest reasons were no effective cure/treatment (66%) and inability to undo knowledge (66%).
Applications
- DNA testing of genetic parent-child relationships has started at the US-Mexico border. The claim is that some children are “recycled” to help get adults into the US, though there is no evidence that this is happening on any scale. The technology is the same as that used by police to see if a suspect’s DNA has a database match.
- GEDmatch, the main source of genetic data used by the police, has changed its policy such that people have to opt in to this use.
- A new start-up, Verve Therapeutics, aims to simultaneously edit many variants affecting heart disease. They’ve nabbed Sekar Kathiresan, who has published some of the most significant polygenic scores, as their CEO.
- An ambitious $1.7bn Russian government push for gene edited plants and animals.
- 23andMe and Airbnb have teamed up for travel bookings connecting people with their heritage. The press release shows that “heritage travel” is indeed a large market, and they have some quirky stats to go with it, e.g. “57% of survey respondents in the United States would give up alcohol for a year for a free heritage trip”.
- Ancestry.com have continued to stir controversies about the links between identity and DNA. They just pulled an advert that showed a white man trying to entice a black woman to leave the South during the pre-Civil War era with an engagement ring.
Regulation
- The National Academies and The Royal Society are spearheading an International Commission to investigate appropriate use of human germline editing. They are set to report in Spring 2020.
- A German ethics council has expressed that there is nothing intrinsically wrong with germline modification, but that we’re not there yet. This non-hardline approach is particularly significant coming from Germany. Due to the history of eugenic Nazi practices, Germany is usually the most conservative country when it comes to any reproductive technology that could influence which children are born.
- China is rewriting its civil code, and a section has been added that would make those who carried out the editing of embryos or adults responsible for any negative effects. This is in addition to draft regulations that would impose an approval process on those who wanted to modify embryos.
- The LawSeq project is wrapping up and a couple of good articles have used that as an excuse to cover the legal state of play when it comes to genomics. One, in Science, covers medical malpractice cases involving genetics, showing that plaintiffs have a much higher success rate in this than in other areas of medicine. An example is a family who successfully sued a doctor, who had treated a man who died of a heart condition, for not recommending that they get their son genetically tested. The doctor never treated the son, who also died. The other, in Wired, focuses on the shortcomings of the patchwork of laws that cover genetic privacy.
- A Bachelorette contestant’s claim to have 114 children as a sperm donor has highlighted issues with the field of donor conception, specifically its lack of regulation. Other countries limit the number of children per donor, but there are only vague, non-binding guidelines in the US. The main fears are of accidental incest between siblings, cases where the donor has a genetic disease that only shows up later in life, and unknown psychological impacts on children of having that many siblings.
- Controversial biohacker Josiah Zayner, famous for self-injecting a DIY gene therapy, has been called to interview for suspected unlawful practice of medicine by the California Department of Consumer Affairs.
- (Not really regulation) In an op-ed in the Daily News, Kathryn Paige Harden says we need to accept the genetics that helps us get ahead in life as a form of privilege, and that refusing to do so “feeds the myth” that America is a meritocracy.