Category: Data Wrangling

Triple Negative Breast Cancer Datathon – A Tutorial

Working with the WiDS Datathon dataset over the past week has been a thrilling exercise. This dataset offers an opportunity to learn about interesting, real-world modeling challenges, and it differs from the curated datasets found in textbooks and classic machine learning exercises. For that reason, I discuss some of the challenges you may encounter around missing data, multicollinearity, and linear/nonlinear approaches. I will also provide resources to help you with these topics.
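The tutorial's own code is not reproduced here. As a minimal sketch of two first checks it mentions, the code below reports the share of missing values per column and flags highly correlated column pairs, a quick early signal of multicollinearity. The column names, toy values, and the 0.9 threshold are illustrative assumptions, not datathon data.

```python
import math

# Toy table: the column names and values are hypothetical, not datathon data.
ROWS = [
    {"bmi": 22.0, "weight_kg": 60.0, "age": 34},
    {"bmi": 27.5, "weight_kg": 80.0, "age": None},
    {"bmi": None, "weight_kg": 72.0, "age": 51},
    {"bmi": 31.0, "weight_kg": 95.0, "age": 47},
    {"bmi": 24.0, "weight_kg": 68.0, "age": 29},
]
COLUMNS = ["bmi", "weight_kg", "age"]


def missing_fraction(rows, columns):
    """Fraction of missing (None) values in each column."""
    return {c: sum(r.get(c) is None for r in rows) / len(rows) for c in columns}


def pearson(xs, ys):
    """Pearson correlation; NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(rows, columns, threshold=0.9):
    """Flag column pairs with |r| >= threshold, computed only on rows where
    both values are present (pairwise deletion)."""
    flagged = []
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            pairs = [(r[a], r[b]) for r in rows
                     if r.get(a) is not None and r.get(b) is not None]
            if len(pairs) < 3:
                continue  # too few complete rows to say anything
            xs, ys = zip(*pairs)
            r = pearson(xs, ys)
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged


print(missing_fraction(ROWS, COLUMNS))  # {'bmi': 0.2, 'weight_kg': 0.0, 'age': 0.2}
print(correlated_pairs(ROWS, COLUMNS))  # [('bmi', 'weight_kg', 0.999)]
```

Pairwise correlations only catch collinearity between two variables; a variance inflation factor (VIF) check is the usual follow-up when several predictors are jointly collinear.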

Embrace the journey: learnings & inspiration from a non-linear path into Data | Gabriela de Queiroz

Gabriela de Queiroz, Principal Cloud Advocate, Microsoft, presents the Technical Vision Talk “Embrace the journey: learnings and inspiration from a non-linear path into Data Science”. This talk focuses on the importance of embracing non-linear career paths and the cumulative effect of seemingly disparate skills in becoming a successful data scientist. It highlights the power of a learning and growth mindset in overcoming obstacles and unlocking one’s full potential. Attendees will leave the talk feeling empowered to embrace their unique backgrounds and experiences and to approach their careers with openness, honesty, and a willingness to learn and grow. Whether you are just starting out on your data science journey or looking to take your skills to the next level, this talk is an opportunity to be inspired, connect with like-minded individuals, and explore the many possibilities of a career in data science.

Biography:
Gabriela leads and manages the Global AI/ML/Data team in Education Advocacy. Before that, she worked at IBM as a Program Director on Open Source, Data & AI Technologies and then as Chief Data Scientist, leading AI Strategy and Innovations.

Gabriela is the founder of AI Inclusive, a global organization that is helping increase the representation and participation of gender minorities in Artificial Intelligence. She is also the founder of R-Ladies, a worldwide organization for promoting diversity in the R community with more than 200 chapters in 55+ countries.

Preparing for a career in DS | Montse Cordero, Adriana Velez Thames, Elaine Yi Xu, Sanne Smith

Panel: Preparing for a career in data science

Moderator:
Sanne Smith, Director of Master’s Program, Education Data Science, Stanford University, directs the master’s program in Education Data Science and is a lecturer at the Stanford Graduate School of Education. She teaches courses that introduce students to coding, data wrangling and visualization, various statistical methods, and the interpretation of quantitative research. She studies social networks and thriving in diverse contexts.

Panelists:
Montse Cordero, Mathematics Designer, youcubed, is a mathematics designer for youcubed, a center at Stanford University that aims to inspire, educate, and empower teachers of mathematics by transforming the latest research on maths learning into accessible and practical forms. Montse is a co-author of, and professional development provider for, youcubed’s Explorations in Data Science high school curriculum and has participated in multiple national summits for the advancement of data science in K-12 education (Data Science 4 Everyone Coalition; National Academies of Sciences, Engineering, and Medicine). Montse is also a mathematician interested in work at the intersection of combinatorics, algebra, and geometry. In every facet of this work, Montse endeavors to change the ways our culture thinks and talks about mathematics.

Adriana Velez Thames, Geophysicist-Data Scientist, Springboard alumna. Adriana recently completed a transition to data science after many years in the oil and gas industry as a Senior Geophysicist. Her primary focus was seismic data processing for imaging the Earth’s subsurface to guide energy exploration projects. From 2012 to 2019, she worked at TGS, where her responsibilities included QC of deliverables, testing of internal software updates, and conducting test projects and benchmarks. This work involved extensive analysis and manipulation of terabyte-sized digital subsurface data using sophisticated algorithms. She believes that data-driven decisions are the best way to solve problems in any industry. Born in Colombia and holding postgraduate degrees from Russia, she is fluent in English and Spanish and has working proficiency in Russian. She is currently continuing her studies in data science and spatial data science.

Elaine Yi Xu, Staff Business Data Analyst, Intuit, is a passionate data analytics and data science practitioner who applies her undergraduate degree in Statistics and her MS in Information Systems and Data Science to everyday business decision-making. She has worked in-house in web analytics, product analytics, and marketing analytics across multiple industries, including retail (lululemon), automotive (Kelley Blue Book), and most recently Intuit, the global technology platform. She specializes in measuring go-to-market marketing strategies, assessing marketing campaign effectiveness, optimizing user experience, and A/B testing. She strives to be the connective tissue between business, analytics, engineering, and data science, combining all of these disciplines to help arrive at the best possible business decisions.

Making Biosignal Interfaces Accessible | Momona Yamagami

Momona Yamagami, Incoming Assistant Professor, Electrical and Computer Engineering, Rice University, presents the Technical Vision Talk “Making Biosignal Interfaces Accessible”. Biosignal interfaces that use electromyography sensors, accelerometers, and other biosignals as inputs show promise for improving accessibility for people with disabilities. However, generalized models that are not personalized to an individual’s abilities, body size, and skin tone may not perform well. Interfaces personalized to the individual and their abilities could significantly enhance accessibility.

In this talk, I discuss how continuous (i.e., 2-dimensional trajectory-tracking) and discrete (i.e., gesture) electromyography (EMG) interfaces can be personalized to the individual. For the continuous task, we used methods from game theory to iteratively optimize a linear model that mapped EMG input to cursor position. For the discrete task, we developed a dataset of participants with and without disabilities performing gestures that are accessible to them. As biosignal interfaces become more commonly available, it is important to ensure that such interfaces have high performance across a wide spectrum of users.

Biography:
Momona will be an Assistant Professor at Rice University Electrical & Computer Engineering starting summer 2023 as part of the Digital Health Initiative. Her research focuses on modeling and enhancing human-machine interaction (HMI) to support accessibility and health, using biosignals and control theory applied to the field of HCI (human-computer interaction). She is currently a CREATE postdoctoral scholar at the University of Washington in Seattle, WA, advised by Prof. Jennifer Mankoff.

Momona’s dissertation research leveraged control theory methods to model and enhance continuous HMIs and explore biosignals like electromyography (EMG) as accessible machine inputs for people with and without disabilities. Her current research interests include how multi-input biosignals can improve HMI accessibility for new and emerging technology like virtual reality and support the health of people with disabilities.

How data visualization helps people understand and explore data

Photographs of Fernanda Viégas, Nicole Crosdale, Jenn Schilling, and Pariza Kamboj.

With the massive amounts of data that are generated and collected today, data visualization is an invaluable tool to help people explore and understand what it all means. Data visualizations can be exploratory, to help analyze the data, and explanatory, to present insights to a broader audience. Both art and science, data visualization turns information into images and helps people see patterns, trends, and outliers in large data sets. Here is a sampling of recent WiDS talks and workshops that delve into different aspects of data visualization.

Introduction to Precision Medicine: From Statistics to Society

Precision medicine aims to learn from data how to match the right treatment to the right person at the right time. One common goal in precision medicine is the estimation of optimal dynamic treatment regimens (DTRs): sequences of decision rules that recommend treatments to patients in a way that, if followed, would optimize outcomes both for each individual and for the targeted population overall. In this presentation, we will describe how the precision medicine framework formalizes sequential clinical decision-making and briefly review a subset of the most popular strategies for learning optimal dynamic treatment regimes. We will then invite the workshop group to ideate and discuss the critical opportunities and challenges for the translation of DTRs to clinical and community care, the role of stakeholder engagement and cross-disciplinary collaboration, and considerations for evaluating DTRs in practice.

This workshop was conducted by Nikki Freeman and Anna Kahkoska from the University of North Carolina at Chapel Hill.

Slides and resources used in this workshop: https://bit.ly/precision_medicine_slides

Earth observation & machine learning for agroecological applications

The use of machine learning (ML) has been growing rapidly. Its power to generalize, together with the large amount of available data, makes machine learning indispensable. In parallel, humanity is focused more than ever on space exploration, developing cutting-edge Earth Observation (EO) technology. Have you ever wondered how these two can be combined?

One domain that can benefit greatly from this combination is agriculture. With climate change and population growth, maintaining natural ecosystems while enhancing agricultural productivity and supporting farmers is of primary importance. ML and EO technologies are key enablers in developing actionable recommendations that help farmers and policymakers achieve resilient agriculture. In this workshop, we discuss the use of ML for EO-related applications, focusing on agriculture and ecosystem services. We will present two applications of how ML bridges the gap between scientific knowledge and actionable advice for farmers and policymakers. The first is a predictive ML model of pest occurrence in cotton fields. The second combines a geographical model and an ML algorithm to identify the location-specific contribution of agricultural management to ecosystem services. Both applications include live demonstrations using Python and R. By the end of this workshop, we hope you will be able to establish the link between machine learning, earth observation, and sustainable agriculture, and that the tools provided here help you start a fruitful exploration of this field!

This workshop was conducted by Roxanne Suzette Lorilla and Ornela Nanushi from the National Observatory of Athens.

Slides and materials used in this workshop: https://bit.ly/agroecological_applica…

A Data Scientist’s Deep Dive into the WiDS Datathon

Working with the WiDS Datathon dataset over the past week has been a thrilling exercise. This dataset offers an opportunity to learn about interesting, real-world modeling challenges, and it differs from the curated datasets found in textbooks and classic machine learning exercises. For that reason, I discuss some of the challenges you may encounter around missing data, multicollinearity, and linear/nonlinear approaches. I will also provide resources to help you with these topics.

Linear Least Squares | Abeynaya Gnanasekaran

The least squares method is one of the most widely used techniques in data science and is used to fit a linear model to data. In this workshop, we will study least squares problems from a linear algebraic perspective and discuss the techniques to solve them.

This workshop assumes that you have a basic understanding of linear algebra including concepts such as matrices, rank, range space, orthogonality, and matrix decompositions (Cholesky, QR, SVD).

This workshop was conducted by Abeynaya Gnanasekaran, a Senior Research Engineer at Raytheon Technologies Research Center.
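As a minimal sketch of one standard technique in this area (the abstract lists QR among the prerequisites), the code below solves min ||Ax - b||_2 through a thin QR factorization computed with modified Gram-Schmidt, followed by back substitution on R x = Q^T b. It is a textbook illustration in plain Python, not the workshop's own material; in practice a library routine such as `numpy.linalg.lstsq` would normally be used. QR is often preferred over forming the normal equations A^T A x = A^T b because A^T A squares the condition number of A.

```python
def qr_mgs(A):
    """Thin QR of an m x n matrix A (list of rows, m >= n, full column rank)
    via modified Gram-Schmidt. Returns Q as a list of n orthonormal columns
    and R as an n x n upper-triangular matrix (list of rows)."""
    m, n = len(A), len(A[0])
    V = [[A[i][j] for i in range(m)] for j in range(n)]  # columns of A
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = sum(v * v for v in V[j]) ** 0.5
        Q.append([v / R[j][j] for v in V[j]])
        # Immediately remove the new direction from the remaining columns.
        for k in range(j + 1, n):
            R[j][k] = sum(q * v for q, v in zip(Q[j], V[k]))
            V[k] = [v - R[j][k] * q for v, q in zip(V[k], Q[j])]
    return Q, R


def lstsq_qr(A, b):
    """Least-squares solution of A x ~= b: solve R x = Q^T b."""
    Q, R = qr_mgs(A)
    n = len(R)
    qtb = [sum(q * bi for q, bi in zip(Q[j], b)) for j in range(n)]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = qtb[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / R[i][i]
    return x


# Fit y = c0 + c1 * t to four noisy points.
t = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
A = [[1.0, ti] for ti in t]
print(lstsq_qr(A, y))  # approximately [1.09, 1.94]
```

For this straight-line case the result matches the closed-form simple regression estimates (slope = covariance / variance).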

Low-Code AI: Making AI accessible to everyone | MathWorks

Learn how you can apply AI in your field without extensive programming knowledge. This hands-on session includes a quick recap of the fundamentals of AI and two exercises in which you will learn how to classify human activities using MATLAB® interactive tools and apps:

– Accessing and preprocessing data acquired from a mobile device
– Classifying the labeled data using two apps: The Classification Learner app and the Deep Network Designer app

At the end of the workshop, you will be able to design and train different machine learning and deep learning models without extensive programming knowledge. You will also learn how to automatically generate code from the interactive workflow. This not only lets you reuse the models without manually repeating every step, but also helps you learn programming or advance your coding skills.

This workshop was conducted by Gaby Arellano Bello and Neha Sardesai, Senior Application Engineers in Education at MathWorks.

Access resources for this workshop: https://bit.ly/low_code_ai_resources

Introduction to Explainable AI | Supreet Kaur

Responsible AI is receiving more attention than ever. Companies have started exploring Explainable AI as a way to better explain model results to senior leadership and increase their trust in AI algorithms. This workshop gives an overview of the area, explains why it matters today, and presents some practical techniques you can use to implement it. As a bonus, it also covers some industry use cases and the limitations of these techniques. Join me in unboxing this black box!

This workshop was conducted by Supreet Kaur, Assistant Vice President at Morgan Stanley.

Slides for this workshop: https://bit.ly/explainableai_slides
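The abstract does not name the techniques covered, so the following is only an illustration of one widely used model-agnostic method, permutation importance, not necessarily one from the workshop. The sketch shuffles one feature at a time and measures how much accuracy drops; the toy "black box" model and the random data are hypothetical.

```python
import random


def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Model-agnostic importance: mean drop in accuracy when one feature
    column is shuffled, which breaks its link to the target.
    `model` is any callable mapping a feature list to a label."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical "black box" that, unknown to the analyst, only uses feature 0.
def black_box(row):
    return int(row[0] > 0.5)


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [black_box(row) for row in X]
baseline, importances = permutation_importance(black_box, X, y)
print(baseline, importances)  # feature 1 gets importance 0.0
```

In real use the importances should be computed on held-out data; scikit-learn offers a ready-made version in `sklearn.inspection.permutation_importance`.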

Baby steps towards building your first ML model | Manogna Mantripragada

This workshop aims to help young data scientists start their first ML project. It walks through the process from gathering data to building an ML model. Building an ML model is easy, but building it the correct way is a lot harder than it looks.

This workshop was conducted by Manogna Mantripragada, Data Scientist at Greenlink Analytics.

Access resources for this workshop: https://bit.ly/energy_burden_analysis…

Alternative approaches to A/B Experiments – 3 Causal Impact Approaches | Jennifer Vlasiu

Make answering ‘what if’ analysis questions a whole lot easier by learning about state-of-the-art, end-to-end applied frameworks for causal inference.

We will cover:
Microsoft’s DoWhy package: causal impact in Python (DoWhy, an end-to-end library for causal inference; documentation at microsoft.github.io)
Bayesian Causal Impact in R
MLE Causal Impact in Python
Bonus: A/A testing, when to use it and why it matters
We will apply these models to understanding the impact of a marketing rewards campaign, as well as the impact of a product/feature upgrade.

This workshop was conducted by Jennifer Vlasiu, Data Science & Big Data Instructor at York University.

Useful resources for this workshop:
– https://bit.ly/github_casual_impact
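The A/A testing item can be illustrated with a small simulation. As a minimal sketch, assuming a conversion-rate metric and a pooled two-proportion z-test (both are assumptions, not details from the workshop), the code below runs many A/A tests in which both arms share the same true rate. Roughly a fraction alpha of them should come out "significant"; a markedly higher rate in a real A/A test would point to a problem in randomization or measurement.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(n_tests=1000, n_per_arm=500, rate=0.1, alpha=0.05, seed=42):
    """Run A/A tests (both arms share the same true conversion rate) and
    return the fraction flagged as significant, i.e. the false positive rate."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_tests):
        conv_a = sum(rng.random() < rate for _ in range(n_per_arm))
        conv_b = sum(rng.random() < rate for _ in range(n_per_arm))
        if two_proportion_p_value(conv_a, n_per_arm, conv_b, n_per_arm) < alpha:
            false_positives += 1
    return false_positives / n_tests


fpr = simulate_aa_tests()
print(f"A/A false positive rate: {fpr:.3f} (expected to be near 0.05)")
```

The sample sizes, base rate, and alpha are arbitrary illustration values.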

Predicting customer choice: A case study on integrating AI within a discrete choice model | Kathryn

Neural networks have been widely celebrated for their power to solve difficult problems across a number of domains. We explore an approach for leveraging this technology within a statistical model of customer choice. Conjoint-based choice models are used to support many high-value decisions at GM. In particular, we test whether using a neural network to model customer utility enables us to better capture non-compensatory behavior (i.e., decision rules where customers only consider products that meet acceptable criteria) in the context of conjoint tasks. We find the neural network can improve hold-out conjoint prediction accuracy for synthetic respondents exhibiting non-compensatory behavior only when trained on very large conjoint data sets. Given the limited amount of training data (conjoint responses) available in practice, a mixed logit choice model with a traditional linear utility function outperforms the choice model with the embedded neural network.

This workshop was conducted by Kathryn Schumacher, Staff Researcher in the Advanced Analytics Center of Expertise within General Motors’ Chief Data and Analytics Office.
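The abstract contrasts a mixed logit with a linear (compensatory) utility against non-compensatory decision rules. As a minimal sketch of both ideas, and explicitly not GM's model, the code below computes simulated mixed logit choice probabilities by averaging logit probabilities over random draws of taste coefficients, and then shows how a conjunctive screening rule can change the chosen product. The attributes, coefficients, and the $30k price cutoff are hypothetical.

```python
import math
import random


def logit_probabilities(utilities):
    """Multinomial logit choice probabilities (a softmax over utilities)."""
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def linear_utility(beta, attributes):
    """Compensatory utility: a weighted sum, so a weak attribute can be
    offset by a strong one."""
    return sum(b * a for b, a in zip(beta, attributes))


def mixed_logit_probabilities(mean_beta, sd_beta, products, n_draws=2000, seed=0):
    """Simulated mixed logit: average the logit probabilities over random
    draws of independent normally distributed taste coefficients."""
    rng = random.Random(seed)
    totals = [0.0] * len(products)
    for _ in range(n_draws):
        beta = [rng.gauss(mu, sd) for mu, sd in zip(mean_beta, sd_beta)]
        probs = logit_probabilities([linear_utility(beta, p) for p in products])
        totals = [t + pr for t, pr in zip(totals, probs)]
    return [t / n_draws for t in totals]


def screened_choice(products, utilities, acceptable):
    """Non-compensatory (conjunctive) rule: only products that pass the
    screen are considered; the best of those is chosen."""
    candidates = [i for i, p in enumerate(products) if acceptable(p)]
    return max(candidates, key=lambda i: utilities[i]) if candidates else None


# Hypothetical attributes per product: [price in $10k, range in 100 miles].
products = [[3.0, 3.0], [4.0, 4.5], [2.5, 2.0]]
mean_beta, sd_beta = [-1.0, 0.8], [0.3, 0.2]
utilities = [linear_utility(mean_beta, p) for p in products]
print(mixed_logit_probabilities(mean_beta, sd_beta, products))
# A buyer who never considers anything above $30k picks product 0,
# even though product 1 has the highest linear utility.
print(screened_choice(products, utilities, lambda p: p[0] <= 3.0))  # 0
```

A linear-utility logit cannot reproduce the hard cutoff exactly, which is the kind of behavior the workshop tests a neural-network utility against.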

Demystifying Data Pre-processing & Data Wrangling for Data Science | Pariza Kamboj

In the current era, Data Science is rapidly evolving and proving decisive in ERP (Enterprise Resource Planning). The data required to build an analytical model is collected from various sources such as government, academic, web scraping, APIs, databases, files, sensors, and many more. Such real-world data cannot be used directly for analysis because it is often inconsistent, incomplete, and likely to contain many errors. We often hear the phrase “garbage in, garbage out”. Dirty or messy data, riddled with inaccuracies and errors, results in a poorly trained model, which in turn can lead to poor business decisions and can even be hazardous in some domains. Even a powerful algorithm fails to produce a correct analysis when applied to bad data. Therefore, data must be curated, cleaned, and refined before it is used in data science and in products built on it. These tasks fall under “Data Preparation”, which includes two methods: Data Pre-processing and Data Wrangling. Most data scientists spend the majority of their time on data preparation.

This workshop was conducted by Pariza Kamboj, Professor at Sarvajanik College of Engineering & Technology (SCET).

Useful resources for this workshop:
– https://bit.ly/jupyter_code
– https://bit.ly/cars3_dataset
– https://bit.ly/execution_google_colab
– https://bit.ly/anaconda_installation_…
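As a minimal sketch of a few typical preparation steps (not the workshop's notebook, and the field names are illustrative rather than taken from the linked dataset), the code below normalizes text, coerces numeric strings, drops rows that become duplicates once formatting is cleaned up, and imputes missing prices with the median.

```python
import statistics


def to_float(value):
    """Parse values such as ' 21,500 ' or 30; return None if the value
    cannot be parsed as a float."""
    if value is None:
        return None
    try:
        return float(str(value).replace(",", "").strip())
    except ValueError:
        return None


def clean_records(raw_rows):
    """Normalize text, coerce numeric fields, drop rows that become exact
    duplicates, then impute missing prices with the median known price."""
    cleaned, seen = [], set()
    for row in raw_rows:
        make = (row.get("make") or "").strip().lower() or None
        record = (make, to_float(row.get("mpg")), to_float(row.get("price")))
        if record in seen:
            continue  # duplicate once formatting differences are removed
        seen.add(record)
        cleaned.append({"make": record[0], "mpg": record[1], "price": record[2]})
    known = [r["price"] for r in cleaned if r["price"] is not None]
    median_price = statistics.median(known) if known else None
    for r in cleaned:
        if r["price"] is None:
            r["price"] = median_price
    return cleaned


# Hypothetical messy input; the field names are illustrative only.
raw = [
    {"make": " Toyota ", "mpg": "30", "price": "21,500"},
    {"make": "toyota", "mpg": 30, "price": "21500"},
    {"make": "Honda", "mpg": "n/a", "price": "19,000"},
    {"make": "Ford", "mpg": "25", "price": None},
    {"make": "BMW", "mpg": "22", "price": "41,000"},
]
for r in clean_records(raw):
    print(r)
```

The missing `mpg` value is deliberately left as `None`: whether and how to impute is a per-column decision.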

Keynote: The Rigorous and Human Life of Data | Cecilia Aragon | University of Washington

Cecilia Aragon, Professor, Human Centered Design & Engineering, University of Washington, presents a Keynote at the WiDS Worldwide conference.

Very often, the words ‘rigorous’ and ‘human-centered’ have been used as opposites in technical fields, with the implication that a focus on human aspects makes science ‘soft’ or ‘insufficiently technical’. In this talk, Cecilia will argue that this is a false dichotomy.

While extraordinary advances in our ability to collect, analyze, and interpret vast amounts of data have been transforming the fundamental nature of data science, the human aspects of data science, including how to support scientific creativity and human insight, how to address ethical concerns, and the consideration of societal impacts, have been less studied. Yet these human issues are becoming increasingly vital to the future of data science. Cecilia will reflect on a 30-year career in data science in industry, government, and academia, discuss what it means for data science to be both rigorous and human-centered, and speculate upon future directions for data science.

Confronting Data Bias in Travel Demand Modeling | Tierra Bills | UCLA | WiDS 2022

Tierra Bills, Assistant Professor of Civil and Environmental Engineering and Public Policy, UCLA, presents a Technical Vision Talk at the WiDS Worldwide conference.

Should regions invest in more buses on transit routes, or new bus routes to provide greater transportation accessibility for vulnerable communities? What mix of transportation improvements will offer the greatest boost in accessibility for travelers who most need it? Such questions can be addressed using travel demand analysis tools.

This presentation will summarize various biases in travel data that arise due to underrepresentation of vulnerable populations, how they may come to be, and how such biases can influence travel modeling outcomes.

Panel: Algorithms and Data for Equity | WiDS 2022

WiDS Worldwide panel: Algorithms and Data for Equity

Moderated by Jenny Suckale, Associate Professor, Stanford University

Panelists:
– Tierra Bills, Assistant Professor of Civil and Environmental Engineering and Public Policy, UCLA
– Jessica Granderson, Director for Building Technology, White House Council on Environmental Quality
– Ling Jin, Research Scientist, Lawrence Berkeley National Laboratory

Career Panel | WiDS 2022

WiDS 2022 Career Panel

Moderated by Suzanne Weekes, Executive Director, SIAM

Panelists:
– Cecilia Aragon, Professor, Human Centered Design & Engineering, University of Washington
– Sharon Hutchins, VP & Chief of Operations, Intuit AI+Data
– Tamara Kolda, Mathematical Consultant, MathSci.ai
– Maggie Wang, Robotics Software Engineer, Skydio

LIVE: Women in Data Science (WiDS) Worldwide Conference 2022

Join us online on March 7, 2022, for the Women in Data Science (WiDS) Worldwide conference, a technical conference featuring outstanding women doing exceptional work in data science and related fields, in a wide variety of domains. Everyone is welcome and encouraged to attend. Broadcast LIVE from Stanford University 8am – 5pm PST.

Adapting to Climate Change Bit by Bit w/Planetary Health Informatics & Machine Learning, Sara Khalid

Living through a pandemic in the era of climate change, it can be easy to sense doom and gloom. Yet in the era of data science, there has never been a better time for the machine learning community to act. This talk will introduce the audience to planetary health and some of the most pressing issues facing us (and our planet), review the state of the art in artificial intelligence and data science methods in planetary health informatics, present a summary of the latest research, and finally highlight opportunities for budding and experienced data scientists in this rapidly growing and pertinent field.

This workshop was conducted by Sara Khalid, University Research Lecturer and Senior Research Associate at University of Oxford.

Dealing with Missing Data

Photographs of Fatima Abu Salem, Maria Gargiulo, Madeleine Udell, and Megan Price.

We live in an era of big data, with data sets that require computational analysis to yield insights and knowledge. The volume of big data has been increasing steadily and will only continue to climb. Statista estimates that since we started the WiDS initiative in 2015, the volume of data has increased from 15.5 to 74 zettabytes, and it forecasts that data volume will double again by 2024.

Yet with all of this data, one of the biggest challenges that data scientists and researchers face is dealing with missing data. In some cases, data is missing because the data sets required for the analysis are not readily accessible; in others, the data sets are incomplete and unevenly populated.

Graph Theory for Data Science, Part I: What Is a Graph and What Can We Do With It?

Graph theory provides an effective way to study relationships between data points, and is applied to everything from deep learning models to social networks. This workshop is part I in a series of three workshops. Throughout the series we will progress from introductory explanations of what a graph is, through the most common algorithms performed on graphs, and end with an investigation of the attributes of large-scale graphs using real data.

In particular, for Part I:
Graphs are structures that represent pairwise connections, and they are used for everything from finding the shortest route between two locations to Google’s PageRank algorithm. Are you interested in learning about graph theory but don’t know where to start? In this workshop we will introduce graphs, develop comfort with their associated terminology, and investigate real-world applications with a focus on intuitive explanations and examples.

This workshop was conducted by Julia Olivieri, a PhD student at Stanford ICME.
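To make the "shortest route between two locations" example concrete, here is a minimal sketch (not from the workshop) that stores a graph as an adjacency list and finds a route with the fewest edges using breadth-first search. The node names and edges are hypothetical.

```python
from collections import deque


def build_undirected(edges):
    """Adjacency list (dict: node -> list of neighbours) from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, []).append(u)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search: in an unweighted graph, the first time BFS
    reaches `goal` it has done so along a path with the fewest edges.
    Returns that path as a list of nodes, or None if goal is unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical road network between five locations.
roads = build_undirected([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")])
print(shortest_path(roads, "A", "E"))  # ['A', 'B', 'D', 'E']
```

If the edges carried distances, Dijkstra's algorithm would replace BFS.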

Machine Learning for Scientific R&D: Why it’s Hard and Why it’s Fun | Julia Ling

Julia Ling, CTO at Citrine Informatics hosts a workshop on ‘Machine Learning for Scientific R&D: Why it’s Hard and Why it’s Fun’ in which she covers some of the key challenges in machine learning for R&D applications: the small, often-messy, sample-biased datasets; the exploratory nature of scientific discovery; and the curious, hands-on approach of scientific users. Julia discusses potential solutions to these challenges, including transfer learning, integration of scientific domain knowledge, uncertainty quantification, and machine learning model interpretability.

Evolution of Applied Recommender Systems | Walmart

Debanjana Banerjee, Data Scientist, and Sinduja Subramaniam, Staff Data Scientist, of Walmart host a workshop, ‘Evolution of Applied Recommender Systems’, that takes you through the whirlwind journey of recommender systems: from GroupLens in the 1990s, through content-based filtering, matrix factorization, and hybrid recommender systems in the late 2000s, all the way to today’s deep-learning-based recommenders. The workshop addresses foundational concepts such as the user-item interaction matrix, user/item profiles, the cold-start problem, sparsity, and scalability, along with the mathematical formulation of different types of recommender systems, using applications in retail.
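As a minimal sketch of the matrix factorization idea mentioned above (a textbook SGD variant without bias terms, not Walmart's system), the code below learns user and item factor vectors from the observed cells of a sparse user-item rating matrix; their dot products then fill in the missing cells. The ratings and hyperparameters are hypothetical.

```python
import random


def predict(P, Q, u, i):
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Approximate the rating matrix as P Q^T using SGD on the observed
    (user, item, rating) triples only; missing cells are never used."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # L2-regularized step
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Hypothetical 4 users x 4 items; four cells are unobserved.
RATINGS = [
    (0, 0, 5), (0, 1, 4), (0, 3, 1),
    (1, 0, 4), (1, 2, 1), (1, 3, 1),
    (2, 1, 1), (2, 2, 5), (2, 3, 4),
    (3, 0, 1), (3, 2, 4), (3, 3, 5),
]
P, Q = factorize(RATINGS, n_users=4, n_items=4)
print("training RMSE:", round(rmse(P, Q, RATINGS), 3))
print("predicted rating, user 0 / item 2:", round(predict(P, Q, 0, 2), 2))
```

Because only observed cells drive the updates, the sparsity and cold-start issues the workshop lists show up directly: a user or item with no ratings keeps its random initial factors.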

An introduction to Data Mesh | Zhamak Dehghani

Zhamak Dehghani, Director, Emerging Technologies, North America at Thoughtworks, hosts a workshop, ‘An introduction to Data Mesh: a paradigm shift in analytical data management’, in which she shares her observations on the failure modes of the centralized data lake paradigm and its predecessor, the data warehouse. She introduces Data Mesh, a paradigm shift in big data management that draws from modern distributed architecture: treating domains as the first-class concern, applying self-sovereignty to distribute data ownership, applying platform thinking to create self-serve data infrastructure, and treating data as a product.

Data Processing & Statistical Models to Impute Missing Perpetrator Information | HRDAG

Megan Price, Executive Director, and Maria Gargiulo, Statistician, of the Human Rights Data Analysis Group (HRDAG) host a workshop, ‘Data Processing and Statistical Models to Impute Missing Perpetrator Information’, in which they show how they use methods from statistics and computer science to answer questions about mass violence using the incomplete and unrepresentative datasets found in the contexts where HRDAG works, and explain why open-source tools are crucial to their analytical projects.

Tech Talk: The joys and perils of leveraging mechanistic models in health ML | Emily Fox | WiDS 2021

Emily Fox, Distinguished Engineer at Apple and Professor at the University of Washington, explores hybrid approaches that combine the domain knowledge of mechanistic models with the flexibility and expressivity of machine learning methods. She explores these ideas through two use cases: glucose forecasting in Type 1 diabetes, and modeling the relationship between mobility and transmission in the COVID-19 pandemic.

Panel: The Democratization of Data | Mary Gray, Zhamak Dehghani & Amanda Obidike | WiDS 2021

Panel discussion on ‘The Democratization of Data’

Moderator: Margot Gerritsen, Professor at Stanford University
Panelists:
– Mary Gray, Senior Principal Researcher at Microsoft Research and Associate Professor, The School of Informatics, Computing, and Engineering at Indiana University
– Zhamak Dehghani, Director of Next Tech Incubation, Thoughtworks
– Amanda Obidike, Executive Director, STEMi Makers Africa

Tech Talk: Doing Data Science in Data Deserts | Fatima Abu Salem | WiDS 2021

Fatima Abu Salem, Associate Professor at the American University of Beirut, reports on a series of works associated with the Syrian conflict, drawing on data obtained from the Violations Documentation Center (VDC). Fatima presents work on fake news detection, predicting primary health care demand among Syrian refugees in Lebanon, and understanding some aspects of Syrian refugee mobility in Turkey, all viewed as instigated by “peaks” in the Syrian war revealed through the VDC. She also gives a brief overview of in-progress projects with social impact, with applications to smart irrigation, predicting birth defects in Lebanon using air pollution data, and quantifying anti-refugee bias across Lebanese news corpora.

Panel: Ethics & Responsible Data Science | WiDS 2021

Panel discussion on ‘Ethics and Responsible Data Science’

Moderator: Shir Meir Lador, Data Science Group Manager, Intuit
Panelists:
– Andrea Martin, Leader IBM Watson Center Munich & EMEA Client Centers, IBM Distinguished Engineer, IBM
– Monica Scannapieco, Head of the Division “Information and Application Architecture”, Italian National Institute of Statistics
– Nazareen Ebrahim, AI Ethics Officer, Socially Acceptable – South Africa

UPDATED VIDEO: WiDS Datathon 2020 Webinar: Lessons Learned + Best Practices with Health Data

This open-to-all, on-demand webinar explores the challenges and opportunities of working with healthcare data and discusses distinct issues around both the technology and the clinical aspects of healthcare machine learning. The panel covers privacy and compliance, reproducibility, data sensitivity, data complexity, and the end-to-end workflow of AI-based solutions that impact healthcare in the United States and globally.

Speakers:
– Vani Mandava, Director, Data Science, Microsoft Research
– Carly Eckert MD MPH, Director of Clinical Informatics, KenSci
– Leo Anthony Celi MD MS MPH, MIT, Beth Israel Deaconess Medical Center
– Marzyeh Ghassemi PhD, Assistant Professor, University of Toronto
– Meredith Lee PhD, Executive Director, West Big Data Innovation Hub

Download webinar slides: bit.ly/wids_datathon_webinar_slides
More information: widsconference.org/datathon

Don’t Look. See! Are We Blinded by Data (Visualization)? | Fanny Chevalier | WiDS 2020

Fanny Chevalier, Assistant Professor at the University of Toronto, delivers a Technical Vision Talk at WiDS Stanford University on March 2, 2020:

We are constantly required to make decisions about the world we live in. But are we good judges of how things work and what is best to do in each situation? Dr. Chevalier’s talk will explore why we may not always make well-informed decisions, even with the best intentions, and even when our motivations are driven by careful examination of data. She will challenge the ways we leverage data for analysis and communication, and propose strategies that embrace the imperfect, subjective nature of human perception.

Why a World with AI Needs More EQ | Tsu-Jae King Liu | WiDS 2020

Tsu-Jae King Liu, Dean of the College of Engineering at the University of California, Berkeley, delivers a Keynote presentation at WiDS Stanford University on March 2, 2020:

Today we live in a dynamic and unpredictable world that is increasingly dependent on engineered devices, processes and systems. A 2017 workforce report by the McKinsey Global Institute indicates that all workers will need to adapt as their occupations evolve with increasingly capable machines. In the age of artificial intelligence (AI) and data science, workers will spend more time on activities that require social and emotional skills, creativity, high-level cognitive capabilities and other skills that are relatively hard to automate.

There is growing evidence of the importance of a high emotional quotient (EQ) as a predictor of success and organizational performance. In this talk, Professor Liu will share insights gained from her personal career journey and describe initiatives being undertaken in the College of Engineering at the University of California, Berkeley, to cultivate EQ in its students and to advance equity and inclusion, toward a brighter future for all.

Talithia Williams, Harvey Mudd College | Stanford Women in Data Science (WiDS) Conference 2020

Talithia Williams, Host of NOVA Wonders PBS & Associate Professor of Mathematics, Harvey Mudd College | @Dr_TalithiaW sits down with Sonia Tagare for WiDS 2020 in Stanford, CA.

#WiDS2020 #WomenInTech #theCUBE

https://siliconangle.com/2020/03/05/i…

Harvey Mudd College professor highlights importance of personal health data, diversity in tech

There’s no doubt that the use of data is valuable for businesses. But it’s not just companies that can benefit from data insights.

Individuals can and should also collect their own body data and use it to have a better life, according to Talithia Williams (pictured), associate dean and associate professor of mathematics at Harvey Mudd College.

“We have so many devices that collect data automatically for us, and often we don’t pause long enough to actually look at that history,” she said. “It’s really challenging people to think about how they can use data that they collect about their bodies to help make better health decisions.”

Williams spoke with Sonia Tagare, host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Women in Data Science conference in Stanford, California. They discussed the ways in which people can obtain their own data, the importance of including women of color in the technology industry and the privacy challenges related to using data for business purposes.

Understanding the information
Examples of information people can collect about themselves include blood pressure, blood sugar, and temperature. But actively interpreting the data is just as important as collecting it, according to Williams.

“It’s not like if you take this data, you will be healthier or you will live to 100,” she added. “It’s really a matter of challenging people to own the data that they have and get excited about understanding it.”

Data is also important in enabling individuals to set goals to change their lifestyle practices.

“When I take my heart rate data or my pulse, I’m really trying to see if I can get lower than how it was before,” Williams said. “So, the push is really how my exercise and my diet are changing so that I can bring my resting heart rate down.”

Diversity in STEM fields
Williams, who holds a doctorate in statistics, is also the host of a PBS program called “NOVA Wonders,” which “follows researchers as they tackle unanswered questions about life and the cosmos.” She also wrote the book “Power in Numbers: The Rebel Women of Mathematics,” which aims to inspire women of color to work in technology-related industries.

“I really wanted to highlight sort of where we have been, but also where we are going and the amazing women that are doing work on it,” she explained.

It’s the responsibility of those in STEM fields to find ways to advocate for women and especially for women of color, according to Williams.

“Often it takes someone who’s already at the table to invite other people to the table,” she said. “I think the onus is more on people who occupy those spaces already to think about how they can be more intentional in bringing diversity.”

Newsha Ajami, Stanford University | Stanford Women in Data Science (WiDS) Conference 2020

Newsha Ajami, Director of Urban Water Policy, Stanford University sits down with Sonia Tagare for WiDS 2020 at Stanford, CA.

#WiDS2020 #WomenInTech #theCUBE

https://siliconangle.com/2020/03/16/c…

Creating resilient, sustainable water supplies means flipping the management paradigm

Humanity is dependent on water, but modern methods have major flaws. Perhaps today’s technology can help.

For millennia, settlements were centered on water supplies, and drought signaled disaster. The Egyptians were the first to manage the critical resource, diverting the Nile flood waters through a series of dams and canals in a major irrigation project that turned a seasonal lake into a water reservoir. The Romans later exploited the concept with great success; their empire became known for its sophisticated aqueduct system, which transported water from rural areas to towns.

What was basically the same idea continued into the twentieth century, and dams and reservoirs are still being built to supply our ever-expanding cities. However, despite being the definitive method for water management for most of human civilization, the top-down model has a major flaw.

“People were not part of the loop. The way that they behaved, their decision-making process, what they use, how they use it, wasn’t necessarily part of the process,” said Newsha Ajami (pictured), director of urban water infrastructure and policy at Stanford University’s Water in the West program. “We assume there’s enough water out there to bring water to people and they can do whatever they want with it.”

In addition to Water in the West, Ajami works with the Stanford-based National Science Foundation Engineering Research Center for Re-Inventing the Nation’s Urban Water Infrastructure (ReNUWIt), and is in her second term serving on the San Francisco Bay Regional Water Quality Control Board.

Ajami spoke with Sonia Tagare, host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Women in Data Science conference in Stanford, California. They discussed how Ajami is working to bridge the gap between science and policy in water management, building solutions for water resilient cities, and changing the traditional top-down water management model to a more collaborative bottom-up approach.

This week, theCUBE spotlights Newsha Ajami in its Women in Tech feature.

Girls and boys are equal in STEM
Ajami was born in Tehran, Iran, to a family of engineers. She was encouraged in her love of math and problem solving and recalls spending hours building Legos and playing mathematical games. She credits her mother with being her biggest fan and mentor and is quoted as saying she was “raised gender blind” and taught “to be fearless, open-minded, and resilient.”

Thanks to her family’s recognition and support of her math and science abilities, Ajami attended Amirkabir University of Technology, one of Iran’s top universities. She graduated with a bachelor of science in civil and environmental engineering.

“We are all equal. Our brains are all made the same way. It doesn’t matter what’s on the surface,” is Ajami’s message to those who want to study for a career in math, science, technology or engineering. “I encourage all girls to study hard and not get discouraged. Fail as many times as you can, because failing is an opportunity to become more resilient and learn how to grow,” she said.

After living in Tehran during the Iran-Iraq war, Ajami personally understands how water shortages can affect daily life for a city’s inhabitants. “Demand management and public awareness was a centerpiece in dealing with scarcity,” she said, recalling how the experience inspired her to focus on sustainable resource management.

Ajami moved to the United States to attend graduate school at George Washington University, but soon switched to the University of Arizona’s hydrology and water resources program to study under Soroosh Sorooshian, founding director of the university’s National Science Foundation center on sustainability of semi-arid hydrology and riparian areas. “It was one of the best decisions I made,” she said.

Her time at the University of Arizona inspired a love for public policy and applied interdisciplinary research. After obtaining her master’s degree, Ajami followed Sorooshian to the University of California, Irvine, to pursue a doctorate in civil and environmental engineering. She continued her education with postdoctoral research at the University of California, Berkeley, on “the impacts of hydrological uncertainty on efficient and sustainable water resources management and planning.”

Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the Women in Data Science conference:

Filling in Missing Data with Low Rank Models | Madeleine Udell | WiDS 2019

Madeleine Udell, Assistant Professor of Operations Research & Information Engineering, & Richard and Sybil Smith Sesquicentennial Fellow, Cornell University

Data scientists are often faced with the challenge of understanding a high dimensional data set organized as a table. These tables may have columns of different (sometimes, non-numeric) types, and often have many missing entries. In this talk, we discuss how to use low rank models to analyze these big messy data sets.

Low rank models perform well across a wide range of data science applications, including recommender systems, movie preferences, topic models, medical records, and genomics. In this talk, we introduce the mathematics of low rank models, demonstrate a few surprising applications of low rank models in data science, and present a simple mathematical explanation for their efficacy.
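To make the idea of filling missing entries with a low rank model concrete, here is a minimal sketch, not the method presented in the talk. It assumes an all-numeric table with missing entries marked as NaN, fits a rank-k factorization X ≈ U @ V.T by alternating ridge-regularized least squares over the observed entries only, and uses that product to fill the gaps. The function name `low_rank_impute` and its parameters are hypothetical, and unlike the talk's setting this squared-loss version does not handle non-numeric column types.

```python
import numpy as np


def low_rank_impute(X, rank=2, reg=0.1, n_iters=100, seed=0):
    """Hypothetical helper: fill NaN entries of X with a rank-`rank`
    approximation X ~= U @ V.T, fitted by alternating ridge-regularized
    least squares on the observed entries only."""
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)  # True where an entry is observed
    m, n = X.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(m, rank))
    V = rng.normal(scale=0.1, size=(n, rank))
    ridge = reg * np.eye(rank)  # keeps each small solve well-posed

    for _ in range(n_iters):
        # Update each row factor using only the columns observed in that row.
        for i in range(m):
            cols = mask[i]
            Vi = V[cols]
            U[i] = np.linalg.solve(Vi.T @ Vi + ridge, Vi.T @ X[i, cols])
        # Update each column factor using only the rows observed in that column.
        for j in range(n):
            rows = mask[:, j]
            Uj = U[rows]
            V[j] = np.linalg.solve(Uj.T @ Uj + ridge, Uj.T @ X[rows, j])

    X_hat = U @ V.T
    # Keep the observed values and fill only the missing ones.
    return np.where(mask, X, X_hat)


if __name__ == "__main__":
    # A small exactly rank-1 table with three entries hidden.
    truth = np.outer([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0, -1.0, 2.0, 0.5, 3.0])
    X = truth.copy()
    X[0, 1] = X[2, 3] = X[4, 0] = np.nan
    filled = low_rank_impute(X, rank=1, reg=1e-3)
    print(np.round(filled - truth, 4))
```

Keeping observed values and replacing only the NaNs mirrors the imputation use case; the `reg` term also keeps each solve well-posed for a row or column with no observed entries.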

Better Reinforcement Learning for Human in the Loop Systems | Emma Brunskill | WiDS 2019

Emma Brunskill, Assistant Professor, Computer Science, Stanford University

There is increasing excitement about reinforcement learning, a subarea of machine learning for enabling an agent to learn to make good decisions. Yet numerous questions and challenges remain before reinforcement learning can help support progress in important high-stakes domains like education, consumer marketing and healthcare. I’ll discuss some recent advances in these areas, and our work towards creating transparent, accountable reinforcement learning approaches that can interact beneficially with people.

Building Trust in the Digital Age | Yinglian Xie | WiDS 2019

Yinglian Xie is the CEO and co-founder of DataVisor, the leading AI and big data analytics company that protects consumer-facing enterprises from a variety of fraud, abuse, and money laundering activities.

She shares her insights into the growing problem of highly sophisticated fraud and how DataVisor’s innovative technology helps companies fight back. Yinglian founded the AI-based fraud-detection company with an ambitious vision: to stop fraudsters in their tracks and to restore online trust with the help of big data and machine learning. She is set to present DataVisor’s quarterly Fraud Index Report, focusing on how cybercriminals’ techniques are ever-evolving, from basic attacks to the most sophisticated and highly organized ones. She will also speak on fraud detection and share which methods work best at each stage, as well as her vision of how fraud detection will unfold in the coming decade. Yinglian, who is passionate about helping entrepreneurs succeed, will also share business advice and tips to help others launch, operate, and grow their own businesses.

Madeleine Udell, Cornell University | WiDS 2019

Madeleine Udell, Assistant Professor, Cornell University, @madeleineudell sits down with Lisa Martin at Stanford University for WiDS 2019.

#WiDS2019 #CornellUniversity #theCUBE

https://siliconangle.com/2019/03/08/t…

This professor is cleaning up tech’s ‘messy data’ problem

Strong data sets are table stakes for any organization today. Data insights can provide the tentpoles for building a strategic roadmap and offer unexpected learnings for businesses to leverage as new market opportunities. But even the most valuable data set can prove worthless if its insights are entangled in the unstructured digital void.

An estimated 80 percent of all data is unstructured, which renders the intel buried in its complex documents and media files inaccessible without an alternative method of analysis. As information floods the tech industry faster than new talent is prepared to make sense of it, the unstructured data challenge is posing a formidable hurdle for businesses in the digital age.

Madeleine Udell (pictured), assistant professor of operations research and information engineering at Cornell University, is educating a new generation of technologists to decode this so-called “messy data” with a more effective approach to tech collaboration.

“Oftentimes people only learn about big, messy data when they go to industry,” Udell said. “I’m interested in understanding low dimensional structure in large, messy data sets [to] figure out ways of … making them seem cleaner, smaller and easier to work with.”

Udell spoke with Lisa Martin, host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the recent Stanford Women in Data Science event at Stanford University.

This week, theCUBE spotlights Madeleine Udell in its Women in Tech feature.

The unstructured data challenge
The rise of messy data can be attributed in large part to the influx of information from a growing number of digital endpoints. Internet of things devices deliver a stream of “messy” data, but the clutter can also come from images, videos, social media, emails, and other data sets not already formatted for simple analysis.

Though more complex and tedious to decipher, these data sources are some of the most highly valued in a market focused on individual user targeting. That gap between ability and potential innovation is what drives Udell’s interest in unstructured data, an area of technology the assistant professor says people entering the tech industry are not adequately prepared for. In her own classes, Udell teaches optimization for machine learning from a messy data perspective.

“[The class] introduces undergraduates to what messy data sets look like, which they often don’t see in their undergraduate curriculum, and ways to wrangle them into forms they could use with other tools they have learned as undergraduates,” she said.

Udell’s interest in messy data was piqued when she met the challenge head on working in the Obama 2012 presidential campaign. She was tasked with analyzing voter information but found the unstructured data sets too cumbersome to yield valuable insight.

“They had hundreds of millions of rows, one for every voter in the United States, and tens of thousands of columns about things that we knew about those voters,” Udell said. “Gender … education level, approximate income, whether or not they had voted in the last elections, and much of the data was missing. How do you even visualize this kind of data set?”

When Udell returned to work on her Ph.D., she was intent on discovering a more efficient method for parsing out value from unstructured data sets. “I wanted to figure out the right way of approaching this, because a lot of people will just sort of hack it,” she said. “I wanted to understand what’s really going on.”

Making an impact with communication
Udell is as interested in the technical architectures that enable data analysis as she is in supporting organizations through the implementation processes that will allow them to benefit from her work. A comprehensive answer to data management requires both math and communication, and Udell says her broad skill set is part of what has enabled her to make sense of messy data.

“If you want your technical work to have an impact, you need to be able to communicate it to other people,” Udell stated.

The social aspect of her role is crucial to finding solutions that actually address user problems and work within existing processes. “You need to make … sure you’re working on the right problems, which means talking with people to figure out what the right problems are,” she said. “This is … fundamental to my career, talking to people about problems they’re facing that they don’t know how to solve.”

Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the Stanford Women in Data Science event:

Kristina Draper, Wells Fargo | WiDS 2019

Kristina Draper, Technology Division Executive, Consumer Bank & Services Technology, Wells Fargo | @kristinadraper sits down with Lisa Martin at Stanford University for WiDS 2019.

#WiDS2019 #WellsFargo #theCUBE

https://siliconangle.com/2019/03/05/q…

Q&A: Wells Fargo aims for 100-percent data transparency in new era of consumer trust

The big data explosion has created transformative innovation opportunities for technology, as well as businesses across industries. As consumers better understand their piece in that data puzzle and the market begins to find its footing in a data-driven digital landscape, companies must adopt a responsibility around transparency to maintain trust and efficiency.

Greater visibility around data-driven processes can also lead to more comprehensive solutions through interdisciplinary collaboration, according to Kristina Draper (pictured), chief technology officer at Wells Fargo & Co.

Draper spoke with Lisa Martin (@LisaMartinTV), host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Stanford Women in Data Science event in Stanford, California. They discussed the role data is playing in a new era of accountability at Wells Fargo, as well as how Draper is reaching beyond the financial industry for greater innovation opportunities.

[Editor’s note: The following answers have been condensed for clarity.]

Tell a little about your involvement in WiDS, as well as Wells Fargo’s involvement as a sponsor.

Draper: We believe so strongly that in the consumer bank space we have a tremendous opportunity and responsibility to understand how our customers interact with Wells Fargo, and that will require a discipline around data science. We had an opportunity this year to be an executive sponsor and jumped at it. I think we’ll continue to be at that sponsor level in future years.

You were recently named one of the 50 most powerful women in technology. What are some of the [ways] Wells Fargo is re-imagining data and trust? What have you seen of the evolution of females in technology and leadership roles?

Draper: The recognition [of] women in technology … is an opportunity to demonstrate that we should be very confident in the value that we bring as leaders, and that confidence as a woman is hard to come by. I think of my own personal career and the way that doors were opened for me along the way; often we are our own worst enemies. We second guess ourselves, we second guess our value, and we have to really work for that seat at the table.

My coming back to Wells was really … as a leader in technology. I felt I could make a real impact. When I think about what we can do as women leaders in technology and in data science, a lot of it is owning that accountability to leadership and paving the way for leaders behind us. There comes a part in a career, certainly mine, where you’re no longer thinking about the next job for yourself.

We’re in a consumer banking space and financial services, so there’s certainly a lot of places to innovate [and] think about how technology can help to serve a Wells Fargo customer. You need your bank throughout your entire life. Whether you are thinking about a home purchase, an auto purchase, college for your children, retirement, there’s so many big markers in life. And that’s where I get excited about not only the leadership role that I have now, but I have the opportunity to bring a team with me to contribute real value.

You have a pay-it-forward attitude. How are you using that to expand your team … to continue this big re-imagining that Wells Fargo as a business is undergoing?

Draper: WiDS is … a tremendous network opportunity. [I’m] so inspired about how they’re turning data science and really thinking about different problems [and] ways we can improve not only our lives, but the lives of future generations to come.

I come from a financial services background, but the problems that our future generations will face can’t be solved with just one lens. You can’t solve problems with just a financial services expertise or just a technical expertise. It’s the space in between art and science. It’s an ability to think across industry and apply solutions and innovation that have been brought forward through other industries, through other companies, through other academia, and thinking about how that could apply in solving the problems that we’re faced with in the financial services space.

If I turned some of the problems that we’re faced with upside down and thought about it with that perspective, and invited some collaboration to help solve problems, we might come up with a better answer.

How can financial services and the data that you deal with help customers?

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the Stanford Women in Data Science event.

Janet George, Western Digital | WiDS 2019

Janet George, “Fellow” Chief Data Officer/Scientist/Big Data/Cognitive Computing, Western Digital sits down with Lisa Martin at Stanford University for WiDS 2019.

#WiDS2019 #WesternDigital #theCUBE

https://siliconangle.com/2019/03/07/q…

Q&A: How AI is cultivating a responsible community to better mankind

Artificial intelligence initiatives powered by big data are propelling businesses beyond the capacity of human labor. While AI tech offers an undeniable opportunity for innovation, it has also sparked a debate around potential misuse through the vast reach of programmed biases and other problematic behaviors.

The power of AI can be comprehensively harnessed for good by fostering diverse teams focused on ethical solutions and working in tandem with policymakers to ensure responsible scale, according to Janet George (pictured), fellow and chief data officer at WD, a Western Digital Company.

George spoke with Lisa Martin (@LisaMartinTV), host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Stanford Women in Data Science event in Stanford, California. They discussed the range of possibilities in AI and how WD is leveraging the technology toward sustainability.

[Editor’s note: The following answers have been condensed for clarity.]

Tell us about Western Digital’s continued sponsorship and what makes this important to you.

George: Western Digital has recently transformed itself … and we are a data-driven … data-infrastructure company. This momentum of AI is a foundational shift in the way we do business. Businesses are realizing that they’re going to be in two categories, the ‘have’ and the ‘have not.’ In order to be in the have category, you have to embrace AI … data … [and] scale. You have to transform yourself to put yourself in a competitive position. That’s why Western Digital is here.

How has Western Digital transformed to harness AI for good?

George: We are not just a company that focuses on business for AI. One of the initiatives we are doing is AI for Good and … Data for Good … working with the UN. We’ve been focusing on trying to figure out the data that impacts climate change. Collecting data and providing infrastructure to store massive amounts of species data in the environment that we’ve never actually collected before. Climate change is a huge area for us, education … [and] diversity. We’re using all of these areas as a launching pad for Data for Good and trying to use data … and AI to better mankind.

Now we have the data to put out massively predictive models that can help us understand what the change would look like 25 years from now and take corrective action. We know carbon emissions are causing very significant damage to our environment and there’s something we can do about it. Data is helping us do that. We have the infrastructure, economies of scale. We can build massive platforms that can store this data and then we can analyze this data at scale. We have enough technology now to adapt to our ecosystem … and be better in the next 10 years.

What are your thoughts on data scientists taking something like a Hippocratic Oath to start owning accountability for the data that they’re working with?

George: We need a diversity of data scientists to have multiple models that are completely diverse, and we have to be very responsible when we start to create. Creators have to be responsible for their creation. Where we get into tricky areas are when you are the human creator of an AI model, and now the AI model has self-created because it has self-learned. Who owns the copyright to those when AI becomes the creator? The group of people that are responsible for creating the environment, creating the models, the question comes into how do we protect the authors, the users, the producers, and the new creators of the original piece of art.

You can use the creation for good or bad. The creation recreates itself, like AI learning, on its own with massive amounts of data after an original data scientist has created the model. Laws have to change; policies have to change. Innovation has to go, and at the same time, we have to be responsible about what we innovate.

Where are we as a society in starting to understand the different principles and practices that have to be implemented in order for proper management of data to enable innovation?

George: We’re debating the issues. We’re coming together as a community. We’re having discussions with experts. What are we seeing as the longevity of that AI model in a business setting, in a non-business setting? How does the AI perform? We are now able to see the sustained performance of the AI model.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the Stanford Women in Data Science event.

Liza Donnelly, The New Yorker | WiDS 2019

Liza Donnelly, Writer & Cartoonist, The New Yorker sits down with Lisa Martin at Stanford University for WiDS 2019.

#WiDS2019 #TheNewYorker #theCUBE

https://siliconangle.com/2019/03/07/q…

Q&A: Cartoons illustrate what’s possible in a more accessible tech industry

The real value of data is in its ability to tell a story through the technologists working to analyze and implement it in new creative solutions. Telling stories through a more accessible medium is what Liza Donnelly, staff cartoonist at “The New Yorker,” does in her visual journalism work by sharing sketches that condense a rich experience into a single snapshot.

Donnelly spoke with Lisa Martin (@LisaMartinTV), host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Stanford Women in Data Science event in Stanford, California. They discussed how cartoons can be used to tell stories from different perspectives and why illustrating women working in technology is quietly revolutionary.

[Editor’s note: The following answers have been condensed for clarity.]

Tell us a little … about visual journalism.

Donnelly: I am somebody who goes to events — political, social or cultural — and draws what I see. I’m not a court reporter. I’m an impressionist. I give people the feeling that they’re there with me by what I draw. I try to capture that person’s essence. Oftentimes I try to capture a sentence that they’re saying that has a more universal appeal, something that brings a layman into the subject a little bit. This visual journalism is more like reportage. I do behind the scenes too. At the Oscars I’ll do the stars if I can get them … but then I also do the people taking out the trash, the guy painting the sideboard, the cameraman. I try to give a sense of what it’s like to be there.

I do them on my iPad, and I send them out on social media almost immediately so they feel like they’re there. It gives people a different perspective of what’s going on, and I think that my background as a cartoonist for “The New Yorker” for 40 years informs these drawings in an indirect background kind of way because I’ve been watching culture [and] politics for a very long time.

I’d love to understand, from your perspective, the evolution of cartoons and the impact they can make in society.

Donnelly: Cartoons can be very controversial and problematic. That’s been true through the course of the history of our country … but it’s compounded now because of the internet. Cartoons can be misunderstood. They can be used as weapons.

I’m going to be talking about this at South by Southwest … about political cartoons and what their impact has been in the past, and how they create an impact now, and why that is, and how we can use it to good effect. I think a problem we’re dealing with right now in our culture is everybody is so divided, and so opinionated, and so hateful towards each other. Can we use cartoons not to perpetuate that but to make things better in some way?

There are more and more cartoons on the internet now. There’s a lot of webcomics, and young cartoonists are using the internet effectively to put out their ideas. The internet is just a dialogue with people. I think this new generation is really trying to find ways to use these tools in a good way. They’re trying to make a better world.

Tell us about how you got involved with Women in Data Science.

Donnelly: A big part of what I want to do with my work is promoting equal rights for women around the world, and so I thought, “This sounds terrific.” Plus it’s global, and I do a lot of work globally to help women and freedom of speech as well. It seemed to be a great fit, and it seems even more to be a good fit in that it’s a way to get the information out there in a visual way, because people hear the word “data,” and they probably just glaze over. But if they see it connected with a cartoon or a drawing, it humanizes it for them a little bit.

Today I was drawing a woman speaker talking about really technical data science. I put it on the internet and I thought, it’s just a constant reminder to people that women are doing this. If you see it, it resonates a little bit more quickly and more forcefully in your brain. I think more women are stepping into this field and being recognized for doing so.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the Stanford Women in Data Science event.

When Data Science IS the Business! | Leda Braga | WiDS 2018

Leda Braga, Chief Executive Officer at Systematica Investments, delivers a Keynote presentation at the WiDS 2018 Conference held at Stanford University.

Objective analysis of relevant data can improve the execution of most businesses. From the simple client feedback form through to production statistics, listening to the data helps. In the investment management industry, by contrast, data analysis IS the business. Investment management is information management and data science is not an aid to decision making, but rather the essence of it.

This talk will explore the reality of investment management, how recent developments in data and AI are shaping the fund management industry, and the challenges of dealing with financial data. In the context of the WiDS forum and its clear focus on diversity, trends such as ethical investing (or socially responsible investing, SRI) will also be discussed.

Integrating Data Science and Cyber Security | Bhavani Thuraisingham | WiDS 2018

Bhavani Thuraisingham, Professor of Computer Science at University of Texas at Dallas, presents Integrating Data Science and Cyber Security at the WiDS 2018 Conference held at Stanford University on March 5, 2018.

The collection, storage, manipulation, analysis and retention of massive amounts of data have resulted in serious security and privacy considerations. Various regulations are being proposed to handle big data so that the privacy of individuals is not violated. For example, even if personally identifiable information is removed from the data, an individual can still be identified when that data is combined with other data. While collecting massive amounts of data raises security and privacy concerns, big data analytics applications in cyber security are exploding. For example, an organization can outsource activities such as identity management, intrusion detection and malware analysis to the cloud. The question is, how can developments in data science techniques be used to solve security problems? Furthermore, how can we ensure that such techniques are secure and adapt to adversarial attacks? This presentation will first describe our research in data science, including stream data analytics and novel class detection, and discuss its applications to insider threat detection. Second, it will discuss the emerging research area of adversarial machine learning. Finally, it will discuss why women should pursue careers in data science.

Career Panel | WiDS 2018

A Career Panel moderated by Margot Gerritsen, with questions from the audience. Panelists include:

– Elena Grewal, Head of Data Science at Airbnb
– Bhavani Thuraisingham, Professor of Computer Science at University of Texas at Dallas
– Ziya Ma, VP, Software and Services Group and Director of Big Data Technologies at Intel Corporation
– Jennifer Prendki, Head of Data Science at Atlassian

More Data, More (Statistical) Problems | Daniela Witten | WiDS 2018

Daniela Witten, Associate Professor of Statistics and Biostatistics at University of Washington, presents More Data, More (Statistical) Problems at the WiDS 2018 Conference held at Stanford University on March 5, 2018.

By now, virtually every field has become inundated with big data. We have been promised that this data will usher in a new era of previously unimaginable societal and scientific progress. While it is certainly true that more data brings with it incredible opportunities, it is also true that more data can bring new and previously unimaginable statistical challenges. I will talk about some of those statistical challenges, as well as statistical ways to solve them. Examples will be taken from biomedical research.

Healthcare Beyond the Horizon — Going Digital to Improve People’s Lives | Mala Anand | WiDS 2018

Mala Anand, EVP, President of Leonardo, Data & Analytics at SAP, presents Healthcare Beyond the Horizon — Going Digital to Improve People’s Lives at the WiDS 2018 Conference held at Stanford University on March 5, 2018.

Never before have there been so many promising breakthrough technologies available, and with them, opportunities to dramatically change the way we live every day. Nowhere is this more evident than in Healthcare, where we see technologies like Analytics, IoT, Machine Learning, Big Data and Blockchain playing significant roles in transforming people’s lives all over the planet. Mala Anand, President of SAP Leonardo, Data & Analytics, will provide some insight on how the healthcare industry is going digital and how far we can possibly go to improve patient outcomes.

Data-Driven Storytelling | Nathalie Henry Riche | WiDS 2018

Nathalie Henry Riche, Researcher at Microsoft Research, presents Data-Driven Storytelling at the WiDS 2018 Conference held at Stanford University on March 5, 2018.

Data visualization is a powerful medium to make sense of large amounts of data and communicate insights gained from analyses to a general audience. Research in the field of information visualization aims at designing interactive visual interfaces to augment human cognition for exploring and communicating with data.

In this talk, I will present our latest research efforts in the field of information visualization and data-driven storytelling. Stories supported by facts extracted from data analysis (data-driven storytelling) proliferate in many different forms from static infographics shared on social media to dynamic and interactive applications available on leading news media outlets. I will present research shedding light on what makes visual stories compelling and share insights on how to empower people to build these experiences without programming.

Data Science Supporting National Security | Dr. Deborah Frincke | WiDS 2017

Dr. Deborah Frincke leads the Research Directorate of the National Security Agency (NSA), the largest “in-house” research organization in the U.S. Intelligence Community. She also serves as the NSA Science Advisor and Innovation Champion, and is a recipient of the President’s Meritorious Rank Award. In her presentation, Dr. Frincke will discuss NSA’s unclassified research programs and describe how the Research Directorate supports national-level missions. She will provide key insights on data science challenges facing NSA and the nation.
Dr. Frincke talks about Mission-Oriented Research; how rock climbing is similar to sorting through messy data; and how adversarial machine learning is an area of active research.

A Quest for Visual Intelligence in Computers | Fei-Fei Li | WiDS 2017

It took nature and evolution more than five hundred million years to develop a powerful visual system in humans. The journey for AI and computer vision is about fifty years. In this talk, I will briefly discuss the key ideas and the cutting-edge advances in the quest for visual intelligence in computers. I will particularly focus on the latest work developed in my lab for both image and video understanding, powered by big data and deep learning (a.k.a. neural network) architectures.

Fei-Fei Li, Chief Scientist, AI/ML, Google Cloud; Professor of Computer Science and Director of the Artificial Intelligence Lab, Stanford University

Making a complete toolbox for quantitative biological data analyses | Susan Holmes | WiDS 2017

Dr. Holmes shares a survey of the current challenges in the analyses of heterogeneous biological data. Combining networks, contingency tables and data from multiple omics domains provides analysts with multiple choices. The result can be an erroneous p-value or a complicated workflow; both can be irreproducible. I will survey some of the recent approaches to this challenge.

Dr. Susan Holmes, Professor of Statistics, describes processes for analyzing large messy microbiome data sets, and the importance of reproducibility.

Data Science, Making the World Run Better | Sinead Kaiya | WiDS 2017

SAP is recognized as the most purposeful technology brand in the world by the Fit for Purpose index. It is our vision to help the world run better and improve people’s lives. With 74% of the world’s transaction revenue touching an SAP system, we see tremendous opportunity to apply leading big data technologies to tackle some of the world’s most pressing and complex challenges. Learn how data science is being applied by global enterprises to support the UN Sustainable Development Goals and to positively impact the economy, society and the environment.
Sinead Kaiya, COO at SAP, describes how data science is transforming businesses worldwide, and driving the fourth industrial revolution.

What Machine Learning Can Do for Healthcare | Finale Doshi-Velez | WiDS 2017

Healthcare is an area where data science and artificial intelligence have tremendous potential to improve lives and where significant methodological advances are needed to achieve that promise. In this talk, I will highlight clinical needs in which data science can help — accurate diagnosis, long-term disease management, and personalized treatment — and also the hard, interesting methodological challenges — particularly in robust inference and interpretability — that will be part of the solution. I will do so by sharing examples of work from our group, which focuses on learning time-series and sequential decision-making models for health applications ranging from better understanding autism spectrum disorder to managing patients with HIV or in the ICU.
Dr. Finale Doshi-Velez from Harvard University describes how machine learning is optimizing treatment for HIV patients, and beyond.

Beware what you ask for: The secret life of predictive models | Claudia Perlich | WiDS 2017

Predictive modeling and its variants are at the core of an increasing number of technical advances that touch us in every aspect of our lives. Today, nobody doubts the ability of machines to learn from historical data and predict with far higher accuracy than any human. But real-world applications of machine learning are often a far cry from the well-understood academic assurances of how these algorithms should behave. In this talk I will share some practical lessons from cases where models had a surprising secret life and did something very different from what I thought I had asked them to do. As the creators of machine learning solutions, it is our responsibility to pay attention to the often subtle symptoms and to let our human intuition be the gatekeeper deciding when our models are ready to be released ‘into the wild’.

Claudia Perlich, Chief Scientist at Dstillery, talks about how data scientists need to use a combination of data science and intuition to deliver accurate insights from data sets.

Designing Visualizations: A Systematic Approach | Miriah Meyer | WiDS 2017

Designing visualizations requires a problem-driven approach, beginning with a deep understanding of the need, and continuing with a close collaboration with domain experts to guide the design of algorithms, visual encodings, and interaction mechanisms. I’ll describe this process, and the role of visualization tools for deriving meaning and providing insights with complex, multivariate datasets.

Miriah Meyer, University of Utah

Big Graph Data Science | Lise Getoor | WiDS 2015

One of the challenges in big data analytics lies in being able to reason collectively about extremely large, heterogeneous, incomplete, and noisy interlinked data. We need data science techniques that can represent and reason effectively with this form of rich and multi-relational graph data. In this talk, I will describe some common inference patterns needed for graph data including: collective classification (predicting missing labels for nodes), link prediction (predicting potential edges), and entity resolution (determining when two nodes refer to the same underlying entity). I will describe some key capabilities required to solve these problems, and finally I will describe a highly scalable open-source probabilistic programming language being developed within my group to solve these challenges.

Enabling Breakthrough Insights | Diane Bryant | WiDS 2015

The vast ocean of data created in today’s digital world offers enormous potential. However, the key to unlocking that potential lies not in the data itself, but in the science that refines it. The well-defined processes and toolsets designed for legacy BI solutions do not meet the needs of today’s big data analytics environments. Diane will share Intel’s investments in both the technology and the ecosystem to enable the next breakthrough insights.
