Algorithms and the Human Genome
Data science and genetics are closely linked and have been for some time. But now, data science is playing an even larger role in genetics, a trend that is prompting researchers to look hard at their ethical responsibilities, says Chiara Sabatti, a professor of biomedical data science and statistics at Stanford University.
As is the case in many other fields, geneticists have access to much more data than in the past, and because it is digitized, it can be mined. “Scientists rely on statisticians to mine this data and help them formulate hypotheses,” Sabatti said during an interview recorded for this year’s Women in Data Science podcast at Stanford. Understanding and interpreting this data correctly will become increasingly important for the public good as the tension between accessibility and privacy continues to grow, she noted.
With such a wealth of data, a single study can in some cases suggest thousands of hypotheses, far too many to explore. Data scientists need to determine which of the hypotheses drawn from the data are worth pursuing, says Sabatti. And that means developing new tools “to be able to confidently say to the scientist, ‘these are the hypotheses that you should follow up.’”
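To make the idea of choosing hypotheses worth pursuing concrete, here is a minimal sketch of one standard statistical approach, the Benjamini-Hochberg procedure, which selects hypotheses while controlling the expected share of false leads. The interview does not say which tools Sabatti has in mind, so this is an illustration only; the function name `benjamini_hochberg` and the p-values are made up for the example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return indices of the hypotheses selected by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    # Sort hypothesis indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose p-value is at most (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Every hypothesis ranked at or below the cutoff is selected for follow-up.
    return sorted(order[:cutoff])


# Hypothetical p-values, one per candidate hypothesis (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(p_values, alpha=0.05)
print(selected)
```

With these made-up numbers, only the two smallest p-values pass, even though five of the ten fall below the conventional 0.05 threshold. That gap is the point: when many hypotheses are tested at once, a fixed per-test threshold lets through more false leads than a procedure that accounts for the number of tests.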
Sabatti voiced her concerns about the public’s confidence in science. “I am really worried that as scientists we contribute to this by putting forward results that are not as solid as they should be,” she says. “The idea that data speaks by itself is an illusion. It’s very important for us to find a way to communicate to the general public what are the challenges of the data analysis.” This is particularly true in genetics, especially in light of increasing fascination with commercial DNA testing, says Sabatti. “I think the public is not aware of all the consequences of putting their data, genetic or not, online and available for mining. I think it’s up to us as scientists to try to communicate clearly what it is that we can do with this data and what are the opportunities that come from data sharing,” she says.
Beyond genetics, Sabatti cited the need for “algorithmic fairness,” a new concept that seeks to eliminate biases and contribute to a more equitable understanding of data. She is also hopeful about the next generation of statisticians. “I actually look at this field in a very optimistic view. I am amazed by the intelligence and the knowledge of young people coming into it. I cannot keep up with my students or the students in other people’s labs. There is a lot of energy, and there’s going to be a lot of interesting knowledge that comes out of this investigation,” she says.
About the Host
Stanford Professor [Emerita] Margot Gerritsen is the Executive Director and co-founder of Women in Data Science Worldwide (WiDS). Born and raised in the Netherlands, Margot received her MSc in Applied Mathematics from Delft University of Technology before moving to the US in search of sunnier and hillier places. In 1996 she completed her PhD in Scientific Computing & Computational Mathematics at Stanford University and moved farther west to New Zealand, where she spent 5 years at the University of Auckland as a lecturer in Engineering Science. In 2001, she returned to Stanford as a faculty member in Energy Resources Engineering. Margot was the Director of the Institute for Computational & Mathematical Engineering (ICME) at Stanford from 2010 to 2018 and the Senior Associate Dean for Educational Affairs in Stanford’s School of Earth Sciences from 2015 to 2020. In 2022, Margot took Emerita status to devote herself to WiDS full time. Margot is a Fellow of the Society for Industrial and Applied Mathematics and has received honorary doctorates from Uppsala University, Sweden, and the Eindhoven University of Technology in the Netherlands. She now lives in Oregon with her husband Paul.