WiDS Posts | October 18, 2022

Mitigating Bias in Machine Learning and Data Science

Popular online services and apps that use machine learning to personalize experiences need to consider bias in how recommendations are developed. Nadia Fawaz, Senior Staff Applied Research Scientist and Tech Lead Inclusive AI at Pinterest, discusses some ways that Pinterest is addressing bias in machine learning in her WiDS talk Inclusive Search and Recommendations. She describes how machine learning technologies are paving the way for more inclusive inspirations in Search and in Pinterest’s augmented reality technology, Try-On, and are also driving advances for more diverse recommendations across the platform. She explains how developing inclusive AI in production requires an end-to-end iterative and collaborative approach.

Online ads and speech recognition are other areas where bias in design can cause unintended consequences. Allison Koenecke, an assistant professor at Cornell, describes her research on fairness in algorithmic and online systems, such as speech-to-text and online ads, and on causal inference in public health in her WiDS podcast, Researching Algorithmic Fairness and Causal Inference in Public Health. One of her research projects investigated how Google ads are used to enroll people in food stamps and how to make decisions about fairness when it costs more to show those ads to Spanish speakers than to English speakers. She also conducted fairness research on racial disparities in speech-to-text systems developed by large tech companies, to ensure these systems are usable for African American populations that speak a variety of English different from standard English. She’s hoping to bring awareness to different blind spots to make sure technology actually works for everyone.

Bias also shows up when using data science to design and optimize public transportation. Many questions can be addressed using travel demand analysis tools. For example, what mix of transportation improvements will offer the greatest boost in accessibility for travelers who most need it? Should regions invest in more buses on existing transit routes, or in new bus routes, to provide greater transportation accessibility for vulnerable communities? Tierra Bills, Assistant Professor of Civil and Environmental Engineering and Public Policy at UCLA, discusses this topic in her WiDS talk, Confronting Data Bias in Travel Demand Modeling. She describes various biases in travel data that arise from underrepresentation of vulnerable populations, how they may come to be, and how such biases can influence travel modeling outcomes.

Problems of bias in algorithms are often framed in terms of a lack of representative data or of formal fairness optimization constraints to be applied to automated decision-making systems. However, this framing sidesteps deeper issues with data used in AI, including problems with categorizations and the extractive logics of crowd work and data mining. In her WiDS talk Algorithmic Unfairness, Infrastructure, and Genealogies of Data, Alex Hanna, Director of Research at the DAIR Institute, reframes data as a form of infrastructure, implicating politics and power in the construction of datasets. She also discusses the development of a research program around the genealogy of datasets used in machine learning and AI systems.