CSC guest lecture: Michael Linderman
Thursday, November 20, 2025 12:15-1:10 p.m.
Location:
Seelye Hall 106
For:
Smith College Community
Computer Science Seminar Series: "Applying machine learning and the 'unreasonable effectiveness of simulation' to structural variant genotyping in whole genome sequencing"
Michael Linderman
Associate Professor
Department of Computer Science
Middlebury College
Structural variants (SVs), defined here as genetic variants 50 nucleotides or larger, play a causal role in numerous diseases. However, due to their larger size, SVs are difficult to accurately genotype (that is, to determine whether a person has 0, 1, or 2 copies of an SV) with widely used short-read genome sequencers (SRS). Existing SV genotypers can only partially account for the sample-, SV-, and analysis-pipeline-specific biases that reduce genotyping accuracy. Instead of trying to model those complex and interconnected effects, we use SRS simulation to generate the data we should expect. I will present a series of simulation-based machine learning tools for improving SV genotyping, developed with Middlebury Computer Science students. Our experiments have led us to think about SV genotyping as an image similarity problem, akin to facial recognition, for which we can leverage simulation, variation graphs, and ongoing advances in deep learning.