Category

Data Generation/Collection

LIVE: Women in Data Science (WiDS) Worldwide Conference 2022

Join us online on March 7, 2022, for the Women in Data Science (WiDS) Worldwide conference, a technical conference featuring outstanding women doing exceptional work in data science and related fields, in a wide variety of domains. Everyone is welcome and encouraged to attend. Broadcast live from Stanford University, 8am – 5pm PST.

Using AI to Fight the Climate Crisis

Priya Donti, Executive Director of Climate Change AI, explains multiple ways that machine learning and AI can be used to mitigate climate change. Her work at the intersection of climate change, computer science, and data science led her to co-found Climate Change AI, our partner for the WiDS Datathon 2023 challenge to improve long-term weather forecasting.

Data in Seismology and Genomics Research

Finding new ways to collect data – and a willingness to share it – are the hallmarks of a career in academia, according to Eileen Martin and Nila Monnier Ioannidis, who were at Stanford at the time as a PhD student and postdoc, respectively. Now, Eileen is an Assistant Professor at Virginia Tech, moving to become an Assistant Professor at Colorado School of Mines in January 2022. Nila is an Assistant Professor at UC Berkeley.

WiDS Welcome | Margot Gerritsen | WiDS Stanford 2023

Margot Gerritsen, Professor Emerita & WiDS Executive Director, Stanford University opens WiDS Stanford 2023.

Biography:
Margot is Professor [Emerita] in the Department of Energy Science & Engineering at Stanford. Her specialties are data analysis, computer simulation and mathematical analysis of natural and engineering processes. From 2010 to 2018, she directed the Institute for Computational and Mathematical Engineering. From 2015 to 2020, she was the Senior Associate Dean for Educational Affairs in the School of Earth, Energy and Environmental Sciences. She co-founded WiDS in 2015. Margot is the Executive Director of WiDS Worldwide, and co-hosts the WiDS Podcast series.

Opening Address | Srinija Srinivasan | WiDS Stanford 2023

Srinija Srinivasan, Co-Founder, Loove opens WiDS Stanford 2023.

Biography:
Born in India and raised in Lawrence, Kansas, Srinija Srinivasan followed her siblings to college in California. Having studied artificial intelligence at Stanford and worked at a large-scale AI project after graduating, Srinija joined Yahoo! in 1995 as their fifth employee and self-titled Ontological Yahoo. She served as Vice President, Editor-in-Chief at Yahoo! for over 15 years, where her work centered on the human experience, from the categorization system of the Yahoo! Directory to editorial and policy issues globally. During that time she also chaired the board of non-profit SFJAZZ, and these experiences together inspired her to co-found Loove, a music venture exploring how commerce and technology can be guided by artistic values rather than letting our culture be led by market values. She’s a board member of the On Being Project and a vice chair of Stanford University’s Board of Trustees. She lives in Palo Alto, CA and Brooklyn, NY.

Putting our values into practice in data science | Megan Price, Jennifer Pan, Trina Reynolds-Tyler

Panel: Putting our values into practice in data science work

Moderator:
Megan Price, Executive Director, Human Rights Data Analysis Group (HRDAG). As the Executive Director of the Human Rights Data Analysis Group, Megan drives the organization’s overarching strategy, leads scientific projects, and presents HRDAG’s work to diverse audiences. Her scientific work includes analyzing documents from the National Police Archive in Guatemala and contributing analyses submitted as evidence in multiple court cases in Guatemala. Her work in Syria includes collaborating with the Office of the United Nations High Commissioner of Human Rights (OHCHR) and Amnesty International on several analyses of conflict-related deaths in that country. In 2022 she was named a Fellow in the American Statistical Association.

Panelists:
Jennifer Pan is a Professor of Communication and Senior Fellow at the Freeman Spogli Institute at Stanford University. Her research resides at the intersection of political communication and authoritarian politics. Using large-scale datasets on political activity in China and other authoritarian countries, her work answers questions about how autocrats perpetuate their rule; how political censorship, propaganda, and information manipulation work in the digital age; and how preferences and behaviors are shaped as a result. Her papers have appeared in peer-reviewed publications such as Science, the American Political Science Review, the American Journal of Political Science, and Journal of Politics. She graduated from Princeton University, summa cum laude, and received her Ph.D. from Harvard University’s Department of Government.

Trina Reynolds-Tyler, Data Director at the Invisible Institute, is an abolitionist and a native of the South Side of Chicago. She leads Beneath the Surface, a project employing machine learning to identify gender-based violence at the hands of Chicago police. Trina works to document how communities unable to depend on the police are creating safety and accountability outside of the carceral state. As a data scientist, she centers the practice of narrative justice in her inquiries.

Trina organizes with Not Me We, and is serving on a University of Chicago council attempting to measure the institution’s impact on the south side population. She developed the skills to use data science for real world problems as a Pozen Center for Human Rights intern with the Human Rights Data Analysis Group (HRDAG), and was a Pearson Institute Fellow. Trina holds a masters degree in public policy from the University of Chicago.

Real World Successes and Lessons Learned in Deploying ML Models | Wendy Ku

Wendy Ku, Computer Vision Tech Lead, Senior Data Scientist, Getty Images presents the Technical Vision Talk “ML through a wide-angle lens: Real World Successes and Lessons Learned in Deploying ML Models”. Image search has been a well-established problem area across industries, with a wide range of applications including e-commerce, social media and search engines. As we collectively create and consume more visual content, image search capabilities are becoming increasingly important. In recent years, multiple large-scale image-text models have been released, reinventing the performance of image-text understanding tasks. However, applying these generalized models out-of-the-box often results in less-than-desired performance. In practice, deploying and maintaining an image search system presents a different set of challenges.

Wondering what else is involved in a machine learning solution besides training and deployment? Or how real world model evaluations differ from Kaggle scoreboards? This talk will cover the less discussed journey of bringing language and image-text models to production.

Biography:
Wendy is a Senior Data Scientist at Getty Images, where she develops multilingual and visual-language representation models to improve users’ search experience. She leads Getty Images’ efforts on diagnosing bias and improving fairness in machine learning systems. Prior to joining Getty Images, Wendy was involved in product and operations optimization projects in cybersecurity, consumer finance and restaurant companies. When she’s not working, Wendy enjoys working on her art and running.

Killing Diseases with Really Big Computers: Building Analysis Tools to Solve Disease | Marisa Torres

Marisa Torres, Bioinformatics Lead, Lawrence Livermore National Lab (LLNL) presents the Technical Vision Talk “Killing Diseases with Really Big Computers: Building Analysis Tools to Solve Disease”. Our team at LLNL has been working on improving bioinformatics tools and models for COVID-19 and for cancer. We think rapid response to a disease outbreak should be a national security priority. We’ve run huge gene simulations for drug discovery and built machine learning models on the results. For COVID-19, we’ve aimed to design viral inhibitors with no adverse health reactions, and we’ve successfully released most of this work publicly as a searchable and usable tool. We’re now scaling up our work from a few COVID-19 genes, up to tens of thousands of human genes for the American Heart Association. When designing therapeutics, we use every informatic and statistical tool available. We create new machine learning strategies for scaling virtual screens for the human or microbe. We use 3D models and docking poses of protein structures to make predictions. We screen for safety properties, because we don’t want detrimental interactions.

Biography:
Marisa designs, implements, and integrates data in relational databases, provides software engineering support for DNA signature discovery, and responds to internal and external customer requests for signature analysis. She has provided signature development and bioinformatics analysis for the Environmental Protection Agency and the National Bio-Forensics Analysis Center. In 2000, Marisa designed DNA signatures that were promoted for use in the BASIS program. She has taken the lead on signature erosion checking, which during the most recent DHS proposal cycle was recognized as important for continued reliable detection of pathogens, and she supports public health and biosecurity customers combining her versatile skill set of software engineering and biology background.

Harnessing AI and Data Science for Health Equity within Communities | Irene Dankwa-Mullan

Irene Dankwa-Mullan, Chief Health Equity Officer at Merative & Affiliate Professor at GWU Milken Institute School of Public Health presents Technical Vision Talk “Harnessing AI and Data Science for Health Equity within Communities”. A robust data science agenda can help support communities in their interventions to achieve health equity, and measure progress toward ensuring quality and optimal health for all. However, there are challenges for data science in promoting community-engaged interventions addressing health disparities. This talk will provide a background on the role of data science in promoting a vision for a productive health AI ecosystem of research, technology development and implementation to improve community health and advance health equity.

Biography:
Irene Dankwa-Mullan is an affiliate professor in the Department of Health Policy and Management, Milken Institute School of Public Health at The George Washington University. She is a nationally recognized industry physician, scientist, thought leader, and author with over 20 years of diverse leadership experience in primary care, healthcare, businesses, and the community. She also serves in a strategic advisory role for various health technology start-ups. Irene most recently served as Chief Health Equity Officer at IBM Watson Health and provided leadership for the data and evidence strategy for implementation of technology and clinical decision-support solutions. She was previously Deputy Director for extramural scientific programs at the National Institute. Irene has published widely on health equity, community and public health and building AI technologies for social good.

Keynote: A Sparkle in the Dark: The Outlandish Quest for Dark Matter | Maria Elena Monzani

Maria Elena Monzani, Lead Scientist, SLAC National Accelerator Laboratory and Kavli Institute for Particle Astrophysics and Cosmology, Stanford University leads the Keynote Address “A Sparkle in the Dark: The Outlandish Quest for Dark Matter.” The nature and origin of dark matter are among the most compelling mysteries of contemporary science. There is strong evidence for dark matter from its role in shaping the galaxies and galaxy clusters that we observe in the universe. Still, for over three decades, physicists have been trying to detect the dark matter particles themselves with little success.

This talk will describe the leading effort in that search, the LUX-ZEPLIN (LZ) detector. LZ is an instrument that is superlative in many ways. It consists of 10 tons of liquified xenon gas, maintained at almost atomic purity and stored in a refrigerated titanium cylinder a mile underground in a former gold mine in Lead, South Dakota.

During its science run, LZ is projected to accumulate a massive dataset, consisting of many petabytes of data and recording several billions of particle interactions, only a handful of which might be produced by potential dark matter candidates (if nature cooperates). Identifying the dark matter signals in this mass of data represents an extreme “needle in a haystack” problem, and requires leveraging advanced detector design and state-of-the-art machine learning algorithms. The talk will present some of the challenges in constructing this large-scale underground experiment and interpreting its data, along with the prospects LZ presents for finally discovering the dark matter particle, and recently released results from its initial search for new physics.

Biography:
Maria Elena Monzani is a dark matter data wrangler. Her research field is Astroparticle physics, which focuses on topics at the intersection between particle physics and astrophysics/cosmology, using the tools of data intensive science. She received a dual PhD from University of Milano and University of Paris 7, performing research with the Borexino experiment that measured neutrinos produced by the Sun. She then held a postdoctoral position at Columbia University before joining SLAC in 2007 to work on the Fermi Gamma-ray Space Telescope. Today, Monzani is a lead scientist at SLAC and a senior member of the Kavli Institute for Particle Astrophysics and Cosmology at Stanford. She leads the software computing effort for the LZ Dark Matter Experiment and the science operations team for the Fermi satellite. She is also an Adjunct Scholar at the Vatican Observatory, and enjoys discussing the shared philosophical foundations of the scientific and religious endeavors.

Preparing for a career in DS | Montse Cordero, Adriana Velez Thames, Elaine Yi Xu, Sanne Smith

Panel: Preparing for a career in data science

Moderator:
Sanne Smith, Director of Master’s Program, Education Data Science, Stanford University, is the director of the master’s program Education Data Science and a lecturer at the Stanford Graduate School of Education. She teaches courses that introduce students to coding, data wrangling and visualization, various statistical methods, and the interpretation of quantitative research. She studies social networks and thriving, diverse contexts.

Panelists:
Montse Cordero, Mathematics Designer, youcubed, is a mathematics designer for youcubed, a center at Stanford University that aims to inspire, educate and empower teachers of mathematics, transforming the latest research on maths learning into accessible and practical forms. Montse is a co-author and professional development provider for youcubed’s Explorations in Data Science high school curriculum and has participated in multiple national summits for the advancement of data science in K-12 education (Data Science 4 Everyone Coalition, National Academies of Sciences Engineering and Medicine). Montse is also a mathematician interested in work at the intersection of combinatorics, algebra, and geometry. In all facets of their work, Montse endeavors to change the ways our culture thinks and talks about mathematics.

Adriana Velez Thames, Geophysicist-Data Scientist, Springboard Alumni. Adriana recently completed a transition to Data Science after many years in the Oil and Gas industry as a Senior Geophysicist. Her primary focus was in seismic data processing for imaging the Earth’s subsurface to guide energy exploration projects. From 2012-2019, she worked at TGS where her responsibilities included QC of deliverables, testing of internal software updates, and conducting test projects and benchmarks. This involved extensive analysis and manipulation of terabyte-sized digital subsurface data using sophisticated algorithms. She believes that data-driven decisions are the best way to solve problems in any industry. Having been born in Colombia and attained post-graduate degrees in Russia, she is fluent in English, Spanish, and has working proficiency in Russian. Currently she continues educational studies in data science and spatial data science.

Elaine Yi Xu, Staff Business Data Analyst, Intuit, is a passionate data analytics and data science practitioner, putting her undergrad degree in Statistics and MS in Info Sys and DS into everyday business decision-making. She’s been working in-house in web analytics, product analytics, and marketing analytics for multiple industries, including retail (lululemon), automotive (Kelley Blue Book), and most recently at Intuit, the global technology platform. She specializes in the measurement of Go-To-Market marketing strategies, assessment of marketing campaign effectiveness, optimization of user experience, and A/B Testing. She strives to be the connective tissue between business, analytics, engineering, and data science, combining all facets of science to help arrive at the best business decisions.

Bringing Motion Diffusion Models to Immersive Entertainment | Jhanvi Shriram and Ketaki Shriram

Jhanvi Shriram, Co-Founder and CEO, Krikey, and Ketaki Shriram, Co-Founder and CTO, Krikey, present the Technical Vision Talk “Bringing Motion Diffusion Models to Immersive Entertainment”. Most generative models thus far have focused on utilizing LLMs for consumer products. The introduction of motion diffusion models to this space provides a novel avenue to engage consumers, especially in the field of entertainment. This talk will cover a text-to-animation motion diffusion model. This model generates animations in less than 5 minutes. These animations can be applied to any 3D file and utilized with any 3D software. Practical applications include optimizing production pipelines for gaming, film, and immersive learning. We will also cover the implications for these industries as they adopt new generative tools in production workflows. To learn more about our tool and try it for yourself, please visit krikey.ai.

Jhanvi Biography:
Jhanvi is currently the CEO of Krikey, an AI gaming tools service that she co-founded with her sister. Krikey recently closed their Series A round, led by Reliance Jio, India’s biggest telecom operator. Prior to Krikey, Jhanvi worked at YouTube as a Production Strategist on operations and creator community programs, which sparked her interest in working with content creators. She also worked at JauntVR and Participant Media. In 2014, Jhanvi and her sister, Ketaki Shriram, co-produced a feature film titled “True Son,” which followed a 22-year-old’s political campaign in Stockton, CA. The film premiered at the 2014 Tribeca Film Festival and was acquired by FusionTV/Univision. Jhanvi holds a BA (Political Science and African Studies) and MBA from Stanford University, and an MFA (Producing) from USC. You can learn more here: krikey.ai.

Ketaki Shriram Biography:
Dr. Shriram is a scientist, film producer, and wildlife photographer interested in the impact of immersive worlds on human behavior. She is currently the Chief Technology Officer at Krikey, an AI gaming tools service that she co-founded with her sister. Krikey recently closed their Series A round, led by Reliance Jio, India’s biggest telecom operator. Dr. Shriram received her BA, MA, and PhD at the Stanford Virtual Human Interaction Lab. She previously worked at Google [x] and at Meta’s Reality Labs. Dr. Shriram was selected for the Forbes 30 Under 30 2020 Class in the Gaming category. You can learn more here: krikey.ai.

Optimization in the loop machine learning for energy and climate | Priya Donti

Priya Donti, Co-Founder and Executive Director, Climate Change AI presents Technical Vision Talk “Optimization-in-the-loop machine learning for energy and climate”. Addressing climate change will require concerted action across society, including the development of innovative technologies. While machine learning (ML) methods have the potential to play an important role, these methods often struggle to contend with the physics, hard constraints, and complex decision-making processes that are inherent to many climate and energy problems. To address these limitations, I present the framework of “optimization-in-the-loop ML,” and show how it can enable the design of ML models that explicitly capture relevant constraints and decision-making processes. For instance, this framework can be used to design learning-based controllers that provably enforce the stability criteria or operational constraints associated with the systems in which they operate. It can also enable the design of task-based learning procedures that are cognizant of the downstream decision-making processes for which a model’s outputs will be used. By significantly improving performance and preventing critical failures, such techniques can unlock the potential of ML for operating low-carbon power grids, improving energy efficiency in buildings, and addressing other high-impact problems of relevance to climate action.

Biography:
Priya Donti is the Co-founder and Executive Director of Climate Change AI, a global non-profit initiative to catalyze impactful work at the intersection of climate change and machine learning, which she is currently running through the Cornell Tech Runway Startup Postdoc Program. She will also join MIT EECS as an Assistant Professor in Fall 2023. Her research focuses on developing physics-informed machine learning methods for forecasting, optimization, and control in high-renewables power grids. Priya received her Ph.D. in Computer Science and Public Policy from Carnegie Mellon University, and is a recipient of the MIT Technology Review’s 2021 “35 Innovators Under 35” award, the ACM SIGEnergy Doctoral Dissertation Award, the Siebel Scholarship, the U.S. Department of Energy Computational Science Graduate Fellowship, and best paper awards at ICML (honorable mention), ACM e-Energy (runner-up), PECI, the Duke Energy Data Analytics Symposium, and the NeurIPS workshop on AI for Social Good.

Data Democratization Panel | Priya Donti, Julia Stewart Lowndes, Nikki Tulley, Michela Taufer

Panel: Data democratization: a powerful means for creating sustainable and equitable communities

Moderator:
Michela Taufer is an ACM Distinguished Scientist and holds the Dongarra Professorship in High-Performance Computing in the Department of Electrical Engineering and Computer Science at the University of Tennessee Knoxville (UTK). She earned her undergraduate degree (Laurea) in Computer Engineering from the University of Padova (Italy) and her doctoral degree (Ph.D.) in Computer Science from the Swiss Federal Institute of Technology or ETH (Switzerland). From 2003 to 2004, she was a La Jolla Interfaces in Science Training Program (LJIS) Postdoctoral Fellow at the University of California San Diego (UCSD) and The Scripps Research Institute (TSRI), where she worked on interdisciplinary projects in computer systems and computational chemistry.

Michela is well-known for her work in establishing trustworthy scientific discoveries on heterogeneous cyberinfrastructures. Throughout her career, she has put the principle of trustworthiness into practice. She has promoted scientific computing for the general population through volunteer computing, defined accurate scientific applications on accelerators and GPUs, and developed in situ analysis methods for scientific workflows on converging HPC and Cloud platforms. She has been serving as the principal investigator of several NSF collaborative projects. She has significant experience in mentoring a diverse population of students on interdisciplinary research and establishing long-lasting workforce development.

Panelists:
Priya Donti is the Co-Founder and Executive Director of Climate Change AI (CCAI), a global non-profit initiative to catalyze impactful work at the intersection of climate change and machine learning, which she is currently running through the Cornell Tech Runway Startup Postdoc Program. She will also join MIT EECS as an Assistant Professor in Fall 2023. Her research focuses on developing physics-informed machine learning methods for forecasting, optimization, and control in high-renewables power grids. Priya received her Ph.D. in Computer Science and Public Policy from Carnegie Mellon University, and is a recipient of the MIT Technology Review’s 2021 “35 Innovators Under 35” award, the ACM SIGEnergy Doctoral Dissertation Award, the Siebel Scholarship, the U.S. Department of Energy Computational Science Graduate Fellowship, and best paper awards at ICML (honorable mention), ACM e-Energy (runner-up), PECI, the Duke Energy Data Analytics Symposium, and the NeurIPS workshop on AI for Social Good.

Julia Stewart Lowndes, Director, Openscapes is a marine ecologist working at the intersection of actionable environmental science, data science, and open science. Julia’s main focus is mentoring teams to develop technical and leadership mindsets and skills for data-intensive research, grounded in climate solutions, inclusion, and kindness. She founded Openscapes in 2018 as a Mozilla Fellow and Senior Fellow at the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California Santa Barbara (UCSB), having earned her PhD from Stanford University in 2012 studying drivers and impacts of Humboldt squid in a changing climate.

Nikki Tulley, Doctoral Student, University of Arizona; Indigenous Researcher, NASA Ames Research Center. Nikki is from the Navajo Nation (NN), an Indigenous Nation located in the United States. The work and research Nikki does is influenced by her upbringing. Born and raised on the NN Reservation, she has seen firsthand the impacts of water access and water quality challenges rural communities face. The NN has wicked water problems related to anthropogenic activities and climate change. Now, as an Indigenous Scientist, she recognizes the opportunity to braid traditional ecological knowledge and western science together to address water challenges. Taking a step beyond braiding the two knowledge systems together, she has begun to use Earth Observation satellite imagery to tell a story of the changes being monitored from space and those observed from the landscapes. Nikki’s passion is empowering communities through data access and capacity building. She believes that community involvement in research can significantly aid in seeking solutions for resilient and sustainable communities.

Keynote: Why ‘users first’ is important for a good monetization | Gayatree Ganu

Gayatree Ganu, Vice President, Data Science, Facebook presents Keynote Address “Put the horse before the cart: Why ‘users first’ is important for a good monetization strategy”.
Meta has over 3B users on our platform engaging with our different products and services. Meta also makes over $100B annually through advertising. There is a strong connection between user engagement on our platform and how we build a sustainable business. Our mission statement for ads at Meta is “Make meaningful connections between people and businesses”. Connecting users to monetization or ads is an important part of Meta’s long term success. In this talk I will describe the frameworks to connect user engagement and revenue potential, allowing us to focus our products and services. We will also discuss how high quality and relevant ads can actually bring more engagement to our platform, making it a win-win situation. We will cover a lot of fun and challenging data science topics from weighted metrics, producer-consumer experimental setups, counterfactuals, incrementality, all at an extraordinary scale of 3B users and $100B!

Biography:
Gayatree Ganu leads the Engagement Ecosystem and Monetization Data Science teams at Facebook. The Engagement Ecosystem team’s mission is to inform Facebook’s strategy through better understanding and forecasting the health of the app. The Monetization team’s mission is to give everyone a voice and to champion economic prosperity. Gayatree leads a Data Science team with a diverse portfolio spanning modeling and machine learning, product optimizations of user experience, and strategic innovations. Gayatree has a PhD in Computer Science in Search and Recommendations from Rutgers University. She joined Facebook (now Meta) in 2013 and has worked on several problems and product areas through the last 10 years.

Gayatree believes deeply in fairness and equality in opportunity and is passionate about bringing more representation and providing sustained support to women and under-represented minorities in Tech. She leads recruiting for all Data Science roles at Meta, and is helping build an organization that values diverse perspectives as well as strong technical and analytical skills.

Uncovering Online Censorship and Propaganda in China | Jennifer Pan

Jennifer Pan, Professor of Communication and FSI Senior Fellow, Stanford University presents the Technical Vision Talk “Uncovering Online Censorship and Propaganda in China.” Although digital communication technologies have revolutionized the way information can flow across borders and national boundaries, governments all over the world impose restrictions on access to digital information. Nowhere is the effort to control and manipulate the flow of digital information more sophisticated, more extensive and more sustained than in China. Controlling China’s digital ecosystem involves a huge organizational effort that is obviously designed to suppress information, but this effort paradoxically reveals the goals, intentions, and actions of the Chinese regime when its footprints are analyzed at scale.

Biography:
Jennifer Pan is a Professor of Communication and Senior Fellow at the Freeman Spogli Institute at Stanford University. Her research resides at the intersection of political communication and authoritarian politics. Using large-scale datasets on political activity in China and other authoritarian countries, her work answers questions about how autocrats perpetuate their rule; how political censorship, propaganda, and information manipulation work in the digital age; and how preferences and behaviors are shaped as a result. Her papers have appeared in peer-reviewed publications such as Science, the American Political Science Review, the American Journal of Political Science, and Journal of Politics. She graduated from Princeton University, summa cum laude, and received her Ph.D. from Harvard University’s Department of Government.

Read More

Openscapes Supporting Kinder Science for Future Us | Julia Stewart Lowndes

Thumbnail for Openscapes Supporting Kinder Science for Future Us | Julia Stewart Lowndes

Julia Stewart Lowndes, Director, Openscapes presents the Technical Vision Talk “Openscapes: Supporting Kinder Science for Future Us”. At Openscapes, we believe open science can accelerate interoperable, data-driven solutions and increase diversity, equity, inclusion, and belonging in research and beyond. Our main activity is mentoring environmental and Earth science teams in open science, and connecting and elevating these researchers both through tech like R, Python, Quarto, and JupyterHubs and communities like RLadies, Black Women in Ecology, Evolution, and Marine Science, Ladies of Landsat, and NASA. We will share stories and approaches about open science as a daily practice – better science for future us – and welcome you to join the movement.

Biography:
Julia Stewart Lowndes, PhD, is a marine ecologist working at the intersection of actionable environmental science, data science, and open science. Julia’s main focus is mentoring teams to develop technical and leadership mindsets and skills for data-intensive research, grounded in climate solutions, inclusion, and kindness. She founded Openscapes in 2018 as a Mozilla Fellow and Senior Fellow at the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California Santa Barbara (UCSB), having earned her PhD from Stanford University in 2012 studying drivers and impacts of Humboldt squid in a changing climate.

Read More

Productizing Data for Humanitarian Aid Applications | Kathryn Hymes

Thumbnail for Productizing Data for Humanitarian Aid Applications | Kathryn Hymes

Kathryn Hymes, Lead of Product and Innovation, Médecins Sans Frontières-USA presents the Technical Vision Talk “Productizing Data for Humanitarian Aid Applications”. In humanitarian efforts focused on delivering medical interventions in low-resource settings, there are many opportunities for data science to improve decision-making and produce valuable insights, both on the ground and in long-term operations. This talk will focus on product approaches to data that support insights for long-term engagement with some of the work of Médecins Sans Frontières, a global aid organization focused on public health.

Biography:
Kathryn Hymes is a technologist, computational linguist, and game designer. She currently serves as the lead of product and innovation at Médecins Sans Frontières-USA. She leads a humanitarian tech team building new products rooted in modern engineering practice to aid in MSF’s global work. Previously she was the head of international product expansion at Slack and an advisor at Airtable. She is a fellow at the Berkman Klein Center for Internet and Society with a focus on how playful design can contribute to a better digital life. Kathryn is a co-founder of Thorny Games (https://thornygames.com/), an award-winning design studio that regularly collaborates with universities, nonprofits and museums to apply playful design to hard problems. Her writing has appeared in The Atlantic, Wired, and The New York Times. Kathryn holds an MS in Computational and Mathematical Engineering from Stanford, an MA in Linguistics from Stanford, and a BS in Math from UCLA.

Read More

Principles of Good Data Viz | Jenn Schilling

Thumbnail for Principles of Good Data Viz | Jenn Schilling

What key principles of design and data viz do you need to know to create effective and clear graphs? This talk will cover preattentive attributes, Gestalt principles, and principles of color use. It will provide the key concepts from design and data viz research that you need to know to communicate data effectively. The talk will include examples to demonstrate applying the concepts and comparing data viz effectiveness.

This workshop was conducted by Jenn Schilling, Founder of Schilling Data Studio.

Read More

Introduction to Linear Regression | Laura Lyman

Thumbnail for Introduction to Linear Regression | Laura Lyman

Linear regression is a fundamental tool in statistics and data science for modeling the relationship between different parameters. It can be used for prediction, forecasting and error reduction by fitting a predictive model between a response variable and a collection of explanatory variables based on an observed data set. Through linear regression analysis, we can quantify the strength of the linear relationship between the response and different explanatory variables, and we can identify parameters that may contain redundant information.

This workshop introduces the basics of simple and multiple linear regression. We will present both mathematical theory and applications in the context of real data sets — ranging from survey results collected by the US National Center for Health Statistics (NHANES), to real estate listings in Sacramento, CA. After the talk, the R code used will be provided, so attendees can revisit examples of how to apply this foundational modeling method.

This workshop was conducted by Laura Lyman, Instructor of Mathematics, Statistics, and Computer Science (MSCS) at Macalester College
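
As a rough illustration of the simple (one-variable) case described above, the closed-form least squares fit can be sketched in a few lines. The workshop itself used R; the Python below, the function names, and the house-size and price numbers are all assumptions made up for this sketch, not material from the workshop.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Fraction of the variance in y explained by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: house size (hundreds of sq ft) and price (thousands of dollars).
sizes = [10, 15, 20, 25, 30]
prices = [155, 195, 250, 305, 345]

intercept, slope = fit_simple_linear_regression(sizes, prices)
fit_quality = r_squared(sizes, prices, intercept, slope)
print(f"price = {intercept:.1f} + {slope:.1f} * size, R^2 = {fit_quality:.4f}")
```

The R^2 value is one way to quantify the strength of the linear relationship mentioned in the abstract; multiple regression extends the same idea to several explanatory variables.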

Read More

Introduction to Precision Medicine: From Statistics to Society

Thumbnail for Introduction to Precision Medicine: From Statistics to Society

Precision medicine aims to learn from data how to match the right treatment to the right person at the right time. One common goal in precision medicine is the estimation of optimal dynamic treatment regimens (DTRs), sequences of decision rules that recommend treatments to patients in a way that, if followed, would optimize outcomes for each individual and overall, in the targeted population. In this presentation, we will describe how the precision medicine framework formalizes sequential clinical decision-making and briefly review a subset of the most popular strategies for learning optimal dynamic treatment regimes. We will then invite the workshop group to ideate and discuss the critical opportunities and challenges for the translation of DTRs to clinical and community care, the role of stakeholder engagement and cross-disciplinary collaboration, and considerations for evaluating DTRs in practice.

This workshop was conducted by Nikki Freeman and Anna Kahkoska from the University of North Carolina at Chapel Hill.

Slides and resources used in this workshop: https://bit.ly/precision_medicine_slides

Read More

Earth observation & machine learning for agroecological applications

Thumbnail for Earth observation & machine learning for agroecological applications

The usage of machine learning (ML) has been growing exponentially. Its significant power in generalization and a large amount of available data make machine learning indispensable. In parallel, humanity is focused more than ever on space exploration, developing cutting-edge Earth Observation (EO) technology. Have you ever wondered how these two can be combined?

One domain that can benefit greatly from this combination is agriculture. With climate change and population rise, maintaining natural ecosystems while enhancing agricultural productivity and supporting farmers is of primary importance. In this sense, ML and EO technologies are the key enablers in developing actionable recommendations for farmers and policymakers to achieve resilient agriculture. In this workshop, we discuss the usage of ML for EO-related applications, focusing on agriculture and ecosystem services. We will present two applications of how ML bridges the gap between scientific knowledge and actionable advice for farmers and policymakers. The first application will consist of a predictive ML model related to the occurrence of pests in cotton fields. The second application will showcase the combination of a geographical model and an ML algorithm to identify the local-specific contribution of agricultural management to ecosystem services. For both applications, there will be live demonstrations using Python and R. By the end of this workshop, we hope you will be acquainted with establishing the link between machine learning, earth observation, and sustainable agriculture, and that you will have the tools you need to start your own journey in this field. We wish you a fruitful exploration!

This workshop was conducted by Roxanne Suzette Lorilla and Ornela Nanushi from the National Observatory of Athens.

Slides and materials used in this workshop: https://bit.ly/agroecological_applica…

Read More

Catching Fire: Autonomous Drones to Detect and Track Wildfires | Mathworks

Thumbnail for Catching Fire: Autonomous Drones to Detect and Track Wildfires | Mathworks

Can drones help prevent natural disasters? Wildfires have become highly destructive in recent years, ravaging the environment and human lives. In this hands-on workshop, build a wildfire detection system with autonomous drones. Explore cutting-edge methods to detect fire outbreaks and predict their direction of spread. Gain skills in simulation and AI that you can apply to life-saving problems.

This workshop was conducted by Shweta Singh, Sheeba Ransing and Arushi Kapurwan from MathWorks.

Resources used for this workshop can be accessed on Github: https://bit.ly/wids_catching_fire
Slides for this workshop: https://bit.ly/3DvenVR

Read More

Linear Least Squares | Abeynaya Gnanasekaran

Thumbnail for Linear Least Squares | Abeynaya Gnanasekaran

The least squares method is one of the most widely used techniques in data science and is used to fit a linear model to data. In this workshop, we will study least squares problems from a linear algebraic perspective and discuss the techniques to solve them.

This workshop assumes that you have a basic understanding of linear algebra including concepts such as matrices, rank, range space, orthogonality, and matrix decompositions (Cholesky, QR, SVD).

This workshop was conducted by Abeynaya Gnanasekaran, a Senior Research Engineer at Raytheon Technologies Research Center.
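
To make the linear algebraic view above concrete, here is a minimal sketch that solves a small least squares problem through a QR decomposition. It is an illustration under stated assumptions, not the workshop's material: the function names, the modified Gram-Schmidt variant, and the tiny data set are chosen for this example, and the matrix is assumed to have full column rank.

```python
def qr_gram_schmidt(A):
    """Thin QR of an m x n matrix (list of rows) via modified Gram-Schmidt.

    Assumes full column rank. Returns Q as a list of n orthonormal columns
    and R as an n x n upper-triangular matrix (list of rows).
    """
    m, n = len(A), len(A[0])
    columns = [[A[i][j] for i in range(m)] for j in range(n)]
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = columns[j][:]
        # Remove the components along the orthonormal columns found so far.
        for k in range(j):
            R[k][j] = sum(Q[k][i] * v[i] for i in range(m))
            v = [v[i] - R[k][j] * Q[k][i] for i in range(m)]
        R[j][j] = sum(vi * vi for vi in v) ** 0.5
        Q.append([vi / R[j][j] for vi in v])
    return Q, R


def solve_least_squares(A, b):
    """Minimize ||Ax - b||_2 by solving R x = Q^T b with back substitution."""
    Q, R = qr_gram_schmidt(A)
    n = len(R)
    c = [sum(q[i] * b[i] for i in range(len(b))) for q in Q]  # c = Q^T b
    x = [0.0] * n
    for j in reversed(range(n)):
        tail = sum(R[j][k] * x[k] for k in range(j + 1, n))
        x[j] = (c[j] - tail) / R[j][j]
    return x


# Fit y = x0 + x1 * t to four hypothetical points (t, y).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 2.0, 2.0, 4.0]
x = solve_least_squares(A, b)
print(x)
```

Working with QR avoids forming the normal equations explicitly, which is one reason it is often preferred when the matrix is poorly conditioned.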

Read More

Low-Code AI: Making AI accessible to everyone | Mathworks

Thumbnail for Low-Code AI: Making AI accessible to everyone | Mathworks

Learn how you can apply AI in your field without extensive knowledge in programming. This hands-on session includes a quick recap on the fundamentals of AI and two exercises where you will learn how to classify human activities using MATLAB® interactive tools and apps:

– Accessing and preprocessing data acquired from a mobile device
– Classifying the labeled data using two apps: The Classification Learner app and the Deep Network Designer app

At the end of the workshop, you will be able to design and train different machine learning and deep learning models without extensive programming knowledge. In addition, you will also learn how to automatically generate code from the interactive workflow. This will not only help you to reuse the models without manually going through all the steps but also to learn programming or advance your coding skills.

This workshop was conducted by Gaby Arellano Bello and Neha Sardesai, Senior Application Engineers in Education at MathWorks.

Access resources for this workshop: https://bit.ly/low_code_ai_resources

Read More

Introduction to Explainable AI | Supreet Kaur

Thumbnail for Introduction to Explainable AI | Supreet Kaur

Responsible AI is reaching new heights these days. Companies have started exploring Explainable AI as a means to explain results better to senior leadership and increase their trust in AI algorithms. This workshop will give an overview of this area, explain why it matters today, and introduce some practical techniques that you can use to implement it. As a bonus, it will also cover some industry use cases and the limitations of these techniques. Join me in unboxing this black box!

This workshop was conducted by Supreet Kaur, Assistant Vice President at Morgan Stanley.

Slides for this workshop: https://bit.ly/explainableai_slides

Read More

Baby steps towards building your first ML model | Manogna Mantripragada

Thumbnail for Baby steps towards building your first ML model | Manogna Mantripragada

This workshop aims to enable young data scientists to start their first ML project. It will help them understand the process from gathering data to building their ML model. Building an ML model is easy, but building it the correct way is a lot harder than it looks.

This workshop was conducted by Manogna Mantripragada, Data Scientist at Greenlink Analytics.

Access resources for this workshop: https://bit.ly/energy_burden_analysis…

Read More

Exploratory data analysis using personal data from Strava and Apple Watch | Deepnote

Thumbnail for Exploratory data analysis using personal data from Strava and Apple Watch | Deepnote

During the workshop, we walk through a simple exploratory data analysis using Deepnote. We will focus on personal data from a Camino de Santiago pilgrimage, which we retrieved through the Strava API, and show you how to get the same kind of data from your own device. Using this data, we explain the theory behind Exploratory Data Analysis and show some use cases.

This workshop was conducted by Tereza Vaňková and Alleanna Clark of Deepnote.

Resources used in this workshop:
– https://bit.ly/deepnote_notebook
– https://bit.ly/deepnote_slides

Read More

Exploring Hidden Markov Models | Julia Christina Costacurta

Thumbnail for Exploring Hidden Markov Models | Julia Christina Costacurta

Hidden Markov Models (HMMs) are used to describe and analyze sequential data in a wide range of fields, including handwriting recognition, protein folding, and computational finance. In this workshop, we will cover the basics of how HMMs are defined, why we might want to use one, and how to implement an HMM in Python. This workshop might be of particular interest to attendees from May 25’s “Intro to Markov Chains and Bayesian Inference” session. Introductory background in probability, statistics, and linear algebra is assumed.

This workshop was conducted by Julia Christina Costacurta, PhD Candidate at Stanford University
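
As a small taste of what implementing an HMM in Python can look like, the sketch below defines a two-state model and computes the probability of an observation sequence with the forward algorithm. The weather and activity model, its probabilities, and all names are a textbook-style toy assumed for this illustration; they are not taken from the workshop notebook.

```python
def forward(observations, states, start_prob, trans_prob, emit_prob):
    """Return P(observations) under the HMM, using the forward algorithm."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_prob[s] * emit_prob[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_prob[s][obs]
            * sum(alpha[prev] * trans_prob[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical model: the hidden state is the weather, the observation is an activity.
STATES = ("Rainy", "Sunny")
START = {"Rainy": 0.6, "Sunny": 0.4}
TRANS = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
EMIT = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

likelihood = forward(("walk", "shop", "clean"), STATES, START, TRANS, EMIT)
print(f"P(walk, shop, clean) = {likelihood:.6f}")
```

The forward recursion sums over all hidden state paths in time proportional to the sequence length times the square of the number of states, instead of enumerating every path.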

Useful resources for this workshop:
– https://bit.ly/hmm_presentation
– https://bit.ly/hmm_tutorial_notebook

Read More

Alternative approaches to A/B Experiments – 3 Causal Impact Approaches | Jennifer Vlasiu

Thumbnail for Alternative approaches to A/B Experiments - 3 Causal Impact Approaches | Jennifer Vlasiu

Make answering ‘what if’ analysis questions a whole lot easier by learning about state-of-the-art, end-to-end applied frameworks for causal inference.

We will cover:
– Microsoft’s DoWhy package, an end-to-end library for causal inference in Python (documentation at microsoft.github.io)
– Bayesian Causal Impact in R
– MLE Causal Impact in Python
– Bonus: A/A testing, when to use it and why it matters

We will apply these models to understand the impact of a marketing rewards campaign, as well as the impact of a product or feature upgrade.

This workshop was conducted by Jennifer Vlasiu, Data Science & Big Data Instructor at York University

Useful resources for this workshop:
– https://bit.ly/github_casual_impact

Read More

Open-sourced Propensity Model Package: From Modeling to Activation (Workshop #2) | Google

Thumbnail for Open-sourced Propensity Model Package: From Modeling to Activation (Workshop #2) | Google

A propensity model attempts to estimate the propensity (probability) of a behavior (e.g., conversion, churn, purchase, etc.) happening during a well-defined time period into the future based on historical data. It is a widely used technique by organizations or marketing teams for providing targeted messages, products or services to customers. This workshop shares an open-sourced package developed by Google, for building an end-to-end Propensity Modeling solution using datasets like GA360, Firebase or CRM and using the propensity predictions to design, activate and measure the impact of a media campaign. The package has enabled companies from e-commerce, retail, gaming, CPG and other industries to make accelerated data-driven marketing decisions.

This workshop was conducted by Lingling Xu, Bingjie Xu, Shalini Pochineni and Xi Li, data scientists on the Google APAC team.

Useful resources for this workshop:
– Workshop #1: https://youtu.be/rQhQca8RCuM
– https://bit.ly/propensity_modeling_pa…
– https://bit.ly/bigquery_export_schema
– https://bit.ly/ga_sample_dataset
– https://bit.ly/ml_windowing_pipeline

Read More

Predicting customer choice: A case study on integrating AI within a discrete choice model | Kathryn

Thumbnail for Predicting customer choice: A case study on integrating AI within a discrete choice model | Kathryn

Neural networks have been widely celebrated for their power to solve difficult problems across a number of domains. We explore an approach for leveraging this technology within a statistical model of customer choice. Conjoint-based choice models are used to support many high-value decisions at GM. In particular, we test whether using a neural network to model customer utility enables us to better capture non-compensatory behavior (i.e., decision rules where customers only consider products that meet acceptable criteria) in the context of conjoint tasks. We find the neural network can improve hold-out conjoint prediction accuracy for synthetic respondents exhibiting non-compensatory behavior only when trained on very large conjoint data sets. Given the limited amount of training data (conjoint responses) available in practice, a mixed logit choice model with a traditional linear utility function outperforms the choice model with the embedded neural network.

This workshop was conducted by Kathryn Schumacher, Staff Researcher in the Advanced Analytics Center of Expertise within General Motors’ Chief Data and Analytics Office.

Read More

Open-sourced Propensity Model Package: Accelerating Data-Driven Decisions (Workshop #1) | Google

Thumbnail for Open-sourced Propensity Model Package: Accelerating Data-Driven Decisions (Workshop #1) | Google

A propensity model attempts to estimate the propensity (probability) of a behavior (e.g., conversion, churn, purchase, etc.) happening during a well-defined time period into the future based on historical data. It is a widely used technique by organizations or marketing teams for providing targeted messages, products or services to customers. This workshop shares an open-sourced package developed by Google, for building an end-to-end Propensity Modeling solution using datasets like GA360, Firebase or CRM and using the propensity predictions to design, activate and measure the impact of a media campaign. The package has enabled companies from e-commerce, retail, gaming, CPG and other industries to make accelerated data-driven marketing decisions.

This workshop was conducted by Lingling Xu, Bingjie Xu, Shalini Pochineni and Xi Li, data scientists on the Google APAC team.

Useful resources for this workshop:
– https://bit.ly/github_propensity_mode…
– https://bit.ly/bigquery_export_schema
– https://bit.ly/ga_sample_dataset
– https://bit.ly/ml_windowing_pipeline

Read More

Basic to Intermediate Level SQL | Sreelaxmi Chakkadath

Thumbnail for Basic to Intermediate Level SQL | Sreelaxmi Chakkadath

This workshop covers basic to intermediate SQL. We will start with querying a database and using filters to clean the data, then move on to joining tables, aggregate functions, and the use of ‘CASE WHEN’ for better query performance. We will also cover subqueries and Common Table Expressions (CTEs) with a comparison between them, window functions, lead and lag functions and the scenarios where they can be used, and pivot tables and when not to use them!

This workshop was conducted by Sreelaxmi Chakkadath, Data Science Master’s student at Indiana University Bloomington.
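
A few of the topics listed above (a CTE, ‘CASE WHEN’, and the LAG window function) can be tried out in one short query. The workshop used PostgreSQL; to keep this sketch self-contained it runs the SQL through Python’s built-in sqlite3 module instead, which is an assumption of this example, as are the hypothetical orders table and its contents. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer TEXT,
        order_day INTEGER,
        amount REAL
    );
    INSERT INTO orders (customer, order_day, amount) VALUES
        ('ana', 1, 20.0), ('ana', 3, 120.0), ('ana', 7, 60.0),
        ('bo', 2, 15.0), ('bo', 5, 200.0);
    """
)

query = """
WITH labeled AS (
    -- CTE: tag each order as large or small with CASE WHEN
    SELECT customer, order_day,
           CASE WHEN amount >= 100 THEN 'large' ELSE 'small' END AS order_size
    FROM orders
)
SELECT customer, order_day, order_size,
       -- LAG: day of the same customer's previous order (NULL for the first one)
       LAG(order_day) OVER (PARTITION BY customer ORDER BY order_day) AS previous_day
FROM labeled
ORDER BY customer, order_day;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same query text should also run on PostgreSQL once the table exists, since it relies only on standard SQL features.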

Useful resources for this workshop:
– PostgreSQL install link: https://www.postgresql.org/
– https://bit.ly/sql_workshop_script
– https://bit.ly/sql_workshop_codes
– https://bit.ly/sql_ppt_slides

Read More

How can we make sense of the unseen world? Using AI, sensors & IoT for scene exploration | Mathworks

Thumbnail for How can we make sense of the unseen world? Using AI

Have you wondered about being able to detect buried objects? Do you think your mobile device can be used to detect them? Metal is all around us, often buried and out of sight, and metal detection is used in many places on Earth. In fact, it is connected to a variety of applications, such as providing insight into land use, detecting historic artifacts, determining the presence of various devices, and more.

In our workshop, we will explore using your own mobile device as a metal detector in your local environment. During this workshop we will provide an overview of the basics of sensors, AI, and IoT which will be required for building a prototype of our application. We’ll do hands-on exercises where you will acquire data from sensors, obtain summary statistics on the acquired data, and train a human activity classifier to understand what was done while data was being collected. We will also have an engaged discussion regarding topics to be mindful of with respect to this application such as considerations regarding the collection and usage of location data. You will leave motivated and ready to use sensors, AI, and IoT in your own projects via MATLAB!

Workshop presenters:
– Louvere Walker-Hannon, Application Engineering Senior Team Lead, MathWorks
– Loren Shure, Consulting Application Engineer, MathWorks
– Sarah Mohamed, Senior Software Engineer, MathWorks
– Shruti Karulkar, Quality Engineering Manager, MathWorks

Read More

Data Storytelling for Data Scientists | Hana M.K

Thumbnail for Data Storytelling for Data Scientists | Hana M.K

Workshop presented by Hana M.K., Data Storytelling and Presentation Instructor and Host of “The Art of Communicating Data” show.

As humans, we enjoy stories. But as data practitioners, we sometimes forget that we need a compelling data story to accompany our work when sharing with others. In this workshop you’ll learn why it’s necessary for data scientists to also be data storytellers and how to craft a data story.

Read More

A Turing Test for Chest Radiology AI | Tanveer Syeda-Mahmood | IBM | WiDS 2022

Tanveer Syeda-Mahmood, IBM Fellow, IBM Research Center, presents a Technical Vision Talk at the WiDS Worldwide conference.

Chest radiographs are the most common imaging exams in hospitals and clinics, comprising 60% of x-rays in the US. They are also one of the hardest to interpret due to their low resolution in reflecting 2D projections of 3D volumes, and cognitive biases leading to interpretation errors. AI assistance with automated preliminary reads can expedite clinical workflows, reduce bias and increase diagnostic throughput of radiologists.

Read More

Estimating Undocumented Human Rights Violations in Conflict Settings | Maria Gargiulo | HRDAG

Thumbnail for Estimating Undocumented Human Rights Violations in Conflict Settings | Maria Gargiulo | HRDAG

Maria Gargiulo, Statistician, Human Rights Data Analysis Group, presents a Technical Vision Talk at the WiDS Worldwide conference.

Collecting data on human rights violations in conflict settings is difficult and dangerous, and the data that results is often incomplete on multiple levels. Some victims’ stories are never recorded, and those whose stories are documented may still be missing critical information about the victim, the perpetrator, or other contextual details about the violation. Furthermore, the data that is documented may not be statistically representative of the victim population as a whole. Drawing population-level inferences from this data without correcting for the missingness risks incorrectly answering questions about patterns of violence.

This talk will demonstrate how multiple systems estimation and multiple imputation can be used together to address both levels of missingness in order to draw population level inferences that are statistically valid and include a measure of uncertainty.
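
The intuition behind multiple systems estimation can be seen in its simplest two-list special case, the Lincoln-Petersen capture-recapture estimator: if two independent lists document the same population, the size of their overlap indicates how much was missed by both. The sketch below is only that toy case, with hypothetical record identifiers. It assumes independent lists and perfect record matching, and it is not the method presented in the talk, which handles more lists and adds imputation and uncertainty estimates.

```python
def lincoln_petersen(list_a, list_b):
    """Estimate total population size from two overlapping lists of records."""
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap between the lists; the estimate is undefined")
    return len(a) * len(b) / overlap


# Hypothetical identifiers for records documented by two separate sources.
source_a = {"v01", "v02", "v03", "v04", "v05", "v06"}
source_b = {"v04", "v05", "v06", "v07", "v08", "v09", "v10", "v11"}

estimated_total = lincoln_petersen(source_a, source_b)
documented = len(source_a | source_b)
print(f"documented: {documented}, estimated total: {estimated_total:.0f}")
print(f"estimated undocumented: {estimated_total - documented:.0f}")
```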

Read More

Panel: Data Science in Healthcare: Opportunities & Challenges | WiDS 2022

Thumbnail for Panel: Data Science in Healthcare: Opportunities & Challenges | WiDS 2022

WiDS Worldwide panel: Data Science in Healthcare: Opportunities & Challenges

Moderated by Tina Hernandez Boussard, Associate Professor, Stanford University

Panelists:
– Sylvia K. Plevritis, Chair of Biomedical Data Science, Stanford University
– Tanveer Syeda-Mahmood, IBM Fellow, IBM Research Center
– Jinoos Yazdany, Chief of Rheumatology, Zuckerberg San Francisco General Hospital

Read More

Panel: Algorithms and Data for Equity | WiDS 2022

Thumbnail for Panel: Algorithms and Data for Equity | WiDS 2022

WiDS Worldwide panel: Algorithms and Data for Equity

Moderated by Jenny Suckale, Associate Professor, Stanford University

Panelists:
– Tierra Bills, Assistant Professor of Civil and Environmental Engineering and Public Policy, UCLA
– Jessica Granderson, Director for Building Technology, White House Council on Environmental Quality
– Ling Jin, Research Scientist, Lawrence Berkeley National Laboratory

Read More

Career Panel | WiDS 2022

Thumbnail for Career Panel | WiDS 2022

WiDS 2022 Career Panel

Moderated by Suzanne Weekes, Executive Director, SIAM

Panelists:
– Cecilia Aragon, Professor, Human Centered Design & Engineering, University of Washington
– Sharon Hutchins, VP & Chief of Operations, Intuit AI+Data
– Tamara Kolda, Mathematical Consultant, MathSci.ai
– Maggie Wang, Robotics Software Engineer, Skydio

Read More

Skydio Autonomy: Data-Driven Approaches Towards Real-Time 3D Reconstruction in Drones | Maggie Wang

Thumbnail for Skydio Autonomy: Data-Driven Approaches Towards Real-Time 3D Reconstruction in Drones | Maggie Wang

Maggie Wang, Robotics Software Engineer at Skydio, presents a Technical Vision Talk at the WiDS Worldwide conference.

Skydio is the leading US drone company and the world leader in autonomous flight. Our drones are used for everything from capturing amazing video, to inspecting bridges, to tracking progress on construction sites. Using six 4K navigational cameras, our drones create a 3D model of their surroundings that updates at a rate of over one million data points per second, and run up to nine deep neural networks to predict into the future.

In this talk, Maggie will discuss how data-driven processes are used in Skydio 3D Scan, a revolutionary adaptive scanning software that enables Skydio drones to autonomously generate 3D models with comprehensive coverage and ultra-high resolution.

Read More

WiDS Educational Outreach 2022

Thumbnail for WiDS Educational Outreach 2022

The WiDS Educational Outreach program aspires to take data science to secondary school students. Through the program we strive to educate and inspire young minds by facilitating relevant courses and paths to consider future careers involving data science, artificial intelligence (AI) and other related areas.

Watch this video to learn of the Education Outreach collaborations with schools around the world from Hyderabad, India to Dar es Salaam, Tanzania, and more.

Read More

WiDS Datathon 2022

Thumbnail for WiDS Datathon 2022

The WiDS Datathon is an initiative to provide a platform for data science enthusiasts to learn, apply and hone their data science skills through the social impact challenges presented to them. Participants are trained and mentored by partners, ambassadors, and data enthusiasts.

Watch how the WiDS Datathon has evolved over the past four years and get an insight into the 2022 challenge, which focused on climate change.

Read More

Adapting to Climate Change Bit by Bit w/Planetary Health Informatics & Machine Learning, Sara Khalid

Thumbnail for Adapting to Climate Change Bit by Bit w/Planetary Health Informatics & Machine Learning

Living through a pandemic in the era of climate change, it can be easy to sense doom and gloom. Yet living in the era of data science, there has never been a better time for the machine learning community to act. This talk will introduce the audience to planetary health and some of the most pressing issues facing us (and our planet), review the state of the art in artificial intelligence and data science methods in planetary health informatics, present a summary of the latest research, and finally highlight opportunities for budding and experienced data scientists in this rapidly growing and pertinent field.

This workshop was conducted by Sara Khalid, University Research Lecturer and Senior Research Associate at University of Oxford.

Read More

AI & Neuroscience: Combining Real-Time Brain Imaging and Machine Learning | Romy Lorenz

Thumbnail for AI & Neuroscience: Combining Real-Time Brain Imaging and Machine Learning | Romy Lorenz

Cognitive neuroscientists are often interested in broad research questions, yet use overly narrow experimental designs by considering only a small subset of possible experimental conditions. This limits the generalizability and reproducibility of many research findings. In this workshop, I present an alternative approach, “The AI Neuroscientist”, that resolves these problems by combining real-time brain imaging with a branch of machine learning, Bayesian optimization. Neuroadaptive Bayesian optimization is an active sampling approach that makes it possible to search intelligently through large experiment spaces with the aim of optimizing an unknown objective function. It thus provides a powerful strategy to efficiently explore many more experimental conditions than is currently possible with standard brain imaging methodology. Alongside methodological details on non-parametric Bayesian optimization using Gaussian process regression, I will present results from a clinical study where we applied the method to map cognitive dysfunction in stroke patients. Our results demonstrate that this technique is both feasible and robust for clinical cohorts as well. Moreover, our study highlights the importance of moving beyond traditional ‘one-size-fits-all’ approaches where patients are treated as one group. Our approach can be combined with brain stimulation or other therapeutics, thereby opening new avenues for precision medicine targeting a diverse range of neurological and psychiatric conditions.

This workshop was conducted by Romy Lorenz, Postdoctoral Fellow at Stanford University and University of Cambridge

Read More

How do I get started with Machine Learning? | Mathworks

Thumbnail for How do I get started with Machine Learning? | Mathworks

Data Science workflows typically entail using Machine Learning.

Machine Learning can provide insight into various datasets and can assist with automating various types of analysis.

In this workshop you will explore a process for getting started with implementing Machine Learning interactively to train a model to predict tsunami intensity and implement other relevant tasks.

This workshop was conducted by Louvere Walker-Hannon and Heather Gorr from MathWorks.

Read More

Pocket AI and IoT, or How to be a Data Scientist using Your Mobile Device | Mathworks

Thumbnail for Do You See What I See: Exploration of Using AI and AR | Mathworks

Want to learn more about trends like AI, IoT and wearable tech? In one hour, we will cut through the hype by building a “smart” fitness tracker using your own mobile device. We’ll do hands-on exercises: you’ll acquire data from sensors, design a step counter and train a human activity classifier. You will leave motivated and ready to use machine learning and sensors in your own projects!

This workshop was conducted by Louvere Walker-Hannon, Shruti Karulkar, & Sarah Mohamed from MathWorks.

Read More

Telling and Sharing Stories | Izzy Aguiar

Thumbnail for Machine Learning for Scientific R&D: Why it's Hard and Why it's Fun | Julia Ling

How can sharing stories help us as a community? How do we learn how to find a story from the events of someone else’s life or our own? How can this relate to our own tendency as data scientists to connect the dots, to find meaning through patterns? Join us in this WiDS workshop on telling and sharing stories where we will address these questions and learn how our stories are important in shaping the community we want to see in Data Science.

This workshop was conducted by Izzy Aguiar, PhD student at Stanford University, ICME.

Read More

Recommender Systems | Walmart

Thumbnail for Spelling Correction for 100+ Languages | Jingwen Lu

Recommender systems play a major role in the e-commerce industry. They keep users engaged by recommending relevant content and have a significant role in driving digital revenue.

Following tremendous gains in computer vision and natural language processing with deep neural networks in the past decade, the recent years have seen a shift from traditional recommender systems to deep neural network architectures in research and industry.

In this workshop, we focus on the temporal domain from the perspective of both traditional recommender systems and deep neural networks. We first start with the classic latent factor model. We introduce temporal dynamics in the latent factor model and show how this improves performance. We then move into sequential modelling using deep neural networks by presenting the state of the art in the field and discussing the advantages and disadvantages.
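As a rough illustration of what "temporal dynamics in the latent factor model" can mean, the sketch below adds a time-binned item bias to a standard biased matrix-factorization prediction and applies one stochastic gradient step. This is one common formulation chosen here as an assumption, not necessarily the model presented in the workshop; the function names, the 30-day bin width, and all numbers are hypothetical.

```python
def time_bin(day, days_per_bin=30):
    """Map a day index to a coarse time bin (the bin width is an assumption)."""
    return day // days_per_bin


def predict(mu, b_user, b_item, b_item_bin, p_user, q_item):
    """Predicted rating: mu + b_u + b_i + b_{i,bin(t)} + p_u . q_i."""
    dot = sum(p * q for p, q in zip(p_user, q_item))
    return mu + b_user + b_item + b_item_bin + dot


def sgd_step(rating, mu, params, lr=0.01, reg=0.02):
    """One gradient step on squared error with L2 regularization.

    params holds the biases and factor vectors for a single
    (user, item, time bin) observation; a new dict is returned.
    """
    err = rating - predict(
        mu,
        params["b_user"],
        params["b_item"],
        params["b_item_bin"],
        params["p_user"],
        params["q_item"],
    )
    new = dict(params)
    for key in ("b_user", "b_item", "b_item_bin"):
        new[key] = params[key] + lr * (err - reg * params[key])
    pairs = list(zip(params["p_user"], params["q_item"]))
    new["p_user"] = [p + lr * (err * q - reg * p) for p, q in pairs]
    new["q_item"] = [q + lr * (err * p - reg * q) for p, q in pairs]
    return new, err


# Made-up parameters for one user, one item, and the bin containing day 45.
mu = 3.5
params = {
    "b_user": 0.1,
    "b_item": -0.2,
    "b_item_bin": 0.05,  # item bias specific to time_bin(45)
    "p_user": [1.0, 0.5],
    "q_item": [0.2, 0.4],
}
updated, err_before = sgd_step(5.0, mu, params)
```

The time-binned bias lets an item's popularity drift over time while the user and item factors stay static; richer variants also make the user terms time-dependent.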

This workshop was conducted by Aleksandra Cerekovic & Selene Xu at Walmart Global Tech.

Read More

Do You See What I See: Exploration of Using AI and AR | MathWorks

Thumbnail for Data Analysis for Health

Welcome to the world of artificial intelligence (AI) and augmented reality (AR)! This workshop explains AI and AR via hands-on exercises where you will interact with your augmented world. You will learn about applications where the technologies of AI+AR are combined, their limitations, and their impacts in society. You’ll leave armed with code, inspiration, and an ethical framework for your own projects!

Artificial intelligence (AI) is used in a variety of industries for many applications. AI can be combined with other technologies to assist with understanding implications of certain aspects of applications. In this workshop, you explore how pose estimation results implemented using Deep Learning are impacted based on a location which is provided using augmented reality. These combined technologies provide insight into how poses could be interpreted differently based on a scene. This workshop also raises awareness regarding consequences of using AI for applications that are different from its originally intended use, which could lead to both technical and ethical challenges.

Specific topics that will be covered in this workshop are listed below:
• understand how AI and AR can be used for applications
• explore how to implement AI and AR
• discover what tools can be used to implement AI and AR
• review code that implements pose estimation using AI and changing background scenes using AR
• gain guidance regarding challenges to address societal impacts of the results from applications that use AI and AR

In addition to receiving an overview of terminology and an understanding of the workflows for each topic, code will be provided to demonstrate how to implement these workflows with tools from MathWorks.

This workshop was conducted by Louvere Walker-Hannon, Shruti Karulkar, & Sarah Mohamed from MathWorks.

Read More

Graph Theory for Data Science, Part III: Characterizing Graphs in the Real World

Thumbnail for Why we love arrays for data science | Eileen Martin

Graph theory provides an effective way to study relationships between data points, and is applied to everything from deep learning models to social networks. This workshop is part III in a series of three workshops. Throughout the series we will progress from introductory explanations of what a graph is, through the most common algorithms performed on graphs, and end with an investigation of the attributes of large-scale graphs using real data.

And in particular for Part III:
Many of the systems we study today can be represented as graphs, from social media networks to phylogenetic trees to airplane flight paths. In this workshop we will explore real-world examples of graphs, discussing how to extract graphs from real data, data structures for storing graphs, and measures to characterize graphs. We will work with real examples of graph data to create a table of values that summarize different example graphs, exploring values such as the centrality, assortativity, and diameter of each graph. Python code will be provided so that attendees can get hands-on experience analyzing graph data.
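To make the summary values mentioned above a little more concrete, here is a minimal standard-library sketch that computes the diameter and degree centrality of a small undirected graph stored as an adjacency list. It is not the workshop's code; the toy graph and function names are invented for illustration, and assortativity is left out for brevity.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first distances (in edges) from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter(adj):
    """Longest shortest path in a connected, unweighted graph."""
    return max(max(shortest_path_lengths(adj, v).values()) for v in adj)


def degree_centrality(adj):
    """Degree of each vertex divided by the largest possible degree, n - 1."""
    n = len(adj)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


# Hypothetical toy graph: an adjacency list is one simple way to store a graph.
toy = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b"],
    "d": ["b", "e"],
    "e": ["d"],
}
toy_diameter = diameter(toy)
toy_centrality = degree_centrality(toy)
```

For real datasets, a graph library such as NetworkX provides these measures (for example `diameter`, `degree_centrality`, and `degree_assortativity_coefficient`), which is usually preferable to hand-rolled code.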

This workshop was conducted by Stanford ICME PhD student, Julia Olivieri.

Read More

Graph Theory for Data Science, Part II: Graph Algorithms: Traversing the tree and beyond

Thumbnail for Why I love Linear Algebra

Graph theory provides an effective way to study relationships between data points, and is applied to everything from deep learning models to social networks. This workshop is part II in a series of three workshops. Throughout the series we will progress from introductory explanations of what a graph is, through the most common algorithms performed on graphs, and end with an investigation of the attributes of large-scale graphs using real data.

And in particular for Part II:
Graph-based algorithms are essential for everything from tracking relationships in social networks to finding the shortest driving distance on Google Maps. In this workshop we will explore some of the most useful graph algorithms, from both the breadth-first and depth-first methods for searching graphs, to Kruskal’s algorithm for finding a minimum spanning tree of a weighted graph, to approximation methods for solving the traveling salesman problem. We will use hands-on examples in Python to explore the computational complexity and accuracy of these algorithms, and discuss their broader applications.
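The algorithms named above can be sketched in a few lines of Python. The following is an illustrative, standard-library version of breadth-first search, depth-first search, and Kruskal’s algorithm (using a simple union-find); it is not taken from the workshop materials, and the small example graph is made up.

```python
from collections import deque


def bfs_order(adj, start):
    """Vertices in breadth-first order from start."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order


def dfs_order(adj, start):
    """Vertices in depth-first (preorder) order from start."""
    seen = set()
    order = []
    stack = [start]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        order.append(v)
        # Push neighbors in reverse so the first-listed neighbor is visited first.
        for w in reversed(adj[v]):
            if w not in seen:
                stack.append(w)
    return order


def kruskal(num_vertices, edges):
    """Minimum spanning tree of a connected weighted graph.

    edges is a list of (weight, u, v) with vertices numbered 0..num_vertices-1.
    """
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so no cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Hypothetical 4-cycle: 0-1, 1-3, 3-2, 2-0 with made-up weights.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
edges = [(1, 0, 1), (4, 0, 2), (2, 1, 3), (3, 2, 3)]
mst = kruskal(4, edges)
```

Both searches visit each vertex and edge once, so they run in time linear in the size of the graph; Kruskal’s algorithm is dominated by sorting the edges.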

This workshop was conducted by Stanford ICME PhD student, Julia Olivieri.

Read More

Natural Language Processing | Riyanka Bhowal, Walmart

Thumbnail for Automating Machine Learning | Madeleine Udell

Natural language processing has direct real-world applications, from speech recognition to automatic text generation, from lexical semantics understanding to question answering. In just a decade, neural machine learning models became widespread, largely displacing statistical methods because of their requirement for elaborate feature engineering. Popular techniques include the use of word embeddings to capture semantic properties of words. In this workshop, we take you through the ever-changing journey of neural models while addressing their boons and banes.

The workshop will address concepts of word-embedding, frequency-based and prediction-based embedding, positional embedding, multi-headed attention and application of the same in unsupervised context.

This workshop was conducted by Riyanka Bhowal, Senior Data Scientist at Walmart Global Tech.

Read More

Pocket AI and IoT, or How to be a Data Scientist using your Mobile Device | MathWorks

Thumbnail for Tackling the WiDS Datathon Challenge 2021 | Usha Rengaraju

Want to learn more about trends like AI, IoT and wearable tech? In less than one hour, we will cut through the hype by building a “smart” fitness tracker using your own mobile device.

We’ll do hands-on exercises: you’ll acquire data from sensors, design a step counter and train a human activity classifier. You will leave motivated and ready to use machine learning and sensors in your own projects!

This workshop was conducted by Louvere Walker-Hannon, Shruti Karulkar, & Sarah Mohamed from MathWorks.

Read More

Machine Learning for Scientific R&D: Why it’s Hard and Why it’s Fun | Julia Ling

Thumbnail for Machine Learning for Scientific R&D: Why it's Hard and Why it's Fun | Julia Ling

Julia Ling, CTO at Citrine Informatics hosts a workshop on ‘Machine Learning for Scientific R&D: Why it’s Hard and Why it’s Fun’ in which she covers some of the key challenges in machine learning for R&D applications: the small, often-messy, sample-biased datasets; the exploratory nature of scientific discovery; and the curious, hands-on approach of scientific users. Julia discusses potential solutions to these challenges, including transfer learning, integration of scientific domain knowledge, uncertainty quantification, and machine learning model interpretability.

Read More

Evolution of Applied Recommender Systems | Walmart

Thumbnail for Evolution of Applied Recommender Systems | Walmart

Debanjana Banerjee, Data Scientist, and Sinduja Subramaniam, Staff Data Scientist, with Walmart host a workshop ‘Evolution of Applied Recommender Systems’ where they take you through the whirlwind journey of the recommender system from GroupLens in the 1990s, Content Based Filtering, Matrix Factorization and Hybrid Recommender Systems in the late 2000s all the way to the deep learning-based recommenders of today. The workshop will address foundational concepts such as user-item interaction matrix, user/item profiles, cold-start problem, sparsity, scalability, etc. along with mathematical formulation for different types of recommender systems using applications in Retail.

Read More

Data Processing & Statistical Models to Impute Missing Perpetrator Information | HRDAG

Thumbnail for Data Processing & Statistical Models to Impute Missing Perpetrator Information | HRDAG

Megan Price, Executive Director and Maria Gargiulo, Statistician with Human Rights Data Analysis Group (HRDAG) host a workshop on ‘Data Processing and Statistical Models to Impute Missing Perpetrator Information’ where they use methods from statistics and computer science to help answer questions about mass violence using incomplete and unrepresentative datasets from the context in which HRDAG works and how open-source tools are crucial to their analytical projects.

Read More

Closing Address: Why data science needs more women | Andrea Goldsmith | WiDS 2021

Thumbnail for Closing Address: Why data science needs more women | Andrea Goldsmith | WiDS 2021

Andrea Goldsmith, Dean of Engineering and Applied Science at Princeton University, discusses why data scientists with diverse perspectives, experiences, and knowledge are needed for the field to thrive and achieve maximum impact. She paints a vision for a diverse and inclusive culture in data science, and proposes how to achieve that vision.

Read More

Tech Talk: What does it mean to have a robust algorithm? | Cindy Orozco Bohorquez | WiDS 2021

Thumbnail for Tech Talk: What does it mean to have a robust algorithm? | Cindy Orozco Bohorquez | WiDS 2021

Cindy Orozco Bohorquez, Ph.D. Candidate in Computational and Mathematical Engineering at Stanford University studies which choice is the correct one for a classical problem in computer graphics and satellite communication called point-set registration. She focuses on the special case of recovering the rotation that aligns two data sets that belong to the d-dimensional sphere. She explores combining results from statistics, optimization, and differential geometry, to compare the solutions given by these algorithms.

Read More

Panel: Energy and Sustainability | Rosalind Archer, Xin Ma, Lesly Goh, Nida Rizwan Farid | WiDS 2021

Thumbnail for Panel: Energy and Sustainability | Rosalind Archer

Panel Discussion on ‘Energy and Sustainability’

Moderator: Rosalind Archer, Professor, University of Auckland
Panelists:
-Xin Ma, Managing Director, Asia Platform, TOTAL
-Lesly Goh, Senior Fellow, National University of Singapore Lee Kuan Yew School of Public Policy
-Nida Rizwan Farid, Aerospace Engineer and Energy Efficiency Consultant, Save Joules

Read More

Tech Talk: Improving Livestock Health with Deep Learning | Dina Machuve | WiDS 2021

Thumbnail for Tech Talk: Improving Livestock Health with Deep Learning | Dina Machuve | WiDS 2021

Dina Machuve, Lecturer and Researcher, Nelson Mandela African Institution of Science and Technology, discusses how, with the help of CNNs, farmers will have the potential to better diagnose poultry diseases and improve livestock health in small to medium scale farming (crop and livestock), which accounts for 70% of the food production of the developing world and supports over 380 million farming households.

Read More

Tech Talk: Doing Data Science in Data Deserts | Fatima Abu Salem | WiDS 2021

Thumbnail for Tech Talk: Doing Data Science in Data Deserts | Fatima Abu Salem | WiDS 2021

Fatima Abu Salem, Associate Professor at the American University of Beirut, reports on a series of works associated with the Syrian conflict, with help from data obtained from the Violations Documentation Center (VDC). Fatima presents on fake news detection, predicting primary health care demand by Syrian refugees in Lebanon, and understanding some notions of Syrian refugee mobility in Turkey, all seen as instigated by “peaks” in the Syrian war, revealed through the VDC. She also presents a brief overview of in-progress projects with a social impact, in application to smart irrigation, predicting birth defects in Lebanon using air pollution data, and quantifying anti-refugee bias across Lebanese news corpora.

Read More

Panel: Ethics & Responsible Data Science | WiDS 2021

Thumbnail for Panel: Ethics & Responsible Data Science | WiDS 2021

Panel discussion on ‘Ethics and Responsible Data Science’

Moderator: Shir Meir Lador, Data Science Group Manager, Intuit
Panelists:
-Andrea Martin, Leader IBM Watson Center Munich & EMEA Client Centers, IBM Distinguished Engineer, IBM
-Monica Scannapieco, Head of the Division “Information and Application Architecture”, Italian National Institute of Statistics
-Nazareen Ebrahim, AI Ethics Officer, Socially Acceptable – South Africa

Read More

Building Reproducible, Reusable & Robust Deep Reinforcement Learning Systems | Joëlle Pineau

Thumbnail for Building Reproducible

Joëlle Pineau, Computer Scientist and Associate Professor at McGill University and Lead of Facebook’s Artificial Intelligence Research lab, talks about challenges that arise in experimental techniques and reporting procedures in deep learning, with a particular focus on reinforcement learning and applications to healthcare. She describes several recent results and guidelines designed to make future results more reproducible, reusable and robust!

Read More

Panel: Paths to Leadership in Data Science | WiDS 2021

Thumbnail for Panel: Paths to Leadership in Data Science | WiDS 2021

Panel discussion on ‘Paths to Leadership in Data Science’

Moderator: Martina Lauchengco, Operating Partner, Costanoa Ventures
Panelists:
-Afua Bruce, Chief Program Officer, DataKind
-Daniela Braga, Founder and CEO, DefinedCrowd
-Aishwarya Agrawal, Professor, University of Montreal; Research Scientist, DeepMind
-Michelle Rodriguez, Dean, Engineering School Universidad del Pacífico

Read More

The Emerging Role of Cryptography in Trustworthy AI | Shafi Goldwasser | WiDS 2021

Thumbnail for The Emerging Role of Cryptography in Trustworthy AI | Shafi Goldwasser | WiDS 2021

Shafi Goldwasser, Director of the Simons Institute for the Theory of Computing, Professor of Electrical Engineering and Computer Science at the University of California Berkeley, Professor of Electrical Engineering and Computer Science at MIT and Professor of Computer Science and Applied Mathematics at the Weizmann Institute of Science Israel, speaks on how cryptographic models and tools can and should play a role in ensuring the trustworthiness of AI and machine learning and address problems such as privacy of training input, model verification and robustness against adversarial examples.

Read More

WiDS Next Gen Overview Video

Thumbnail for WiDS Next Gen Overview Video

The WiDS Next Gen program inspires secondary school students to consider careers involving data science, artificial intelligence (AI), and related fields. We particularly encourage young women and girls by showing examples of successful women who are having an impact in the field.

This introductory video provides an overview of what data science is and how data science is being applied in the real world. The video also features the day-in-the-life of four data scientists who are having a positive impact in the field today.

Read More

UPDATED VIDEO: WiDS Datathon 2020 Webinar: Lessons Learned + Best Practices with Health Data

Thumbnail for UPDATED VIDEO: WiDS Datathon 2020 Webinar: Lessons Learned + Best Practices with Health Data

This open-to-all, on-demand webinar explores challenges and opportunities from working with healthcare data, and discusses distinct issues around the technology and the clinical aspect of healthcare machine learning. The panel discusses privacy and compliance, reproducibility, data sensitivity, data complexity, and the end-to-end workflow of AI-based solutions that impact healthcare in the United States and globally.

Speakers:
– Vani Mandava, Director, Data Science, Microsoft Research
– Carly Eckert MD MPH, Director of Clinical Informatics, KenSci
– Leo Anthony Celi MD MS MPH, MIT, Beth Israel Deaconess Medical Center
– Marzyeh Ghassemi PhD, Assistant Professor, University of Toronto
– Meredith Lee PhD, Executive Director, West Big Data Innovation Hub

Download webinar slides: bit.ly/wids_datathon_webinar_slides
More information: widsconference.org/datathon

Read More

Why a World with AI Needs More EQ | Tsu-Jae King Liu | WiDS 2020

Thumbnail for Why a World with AI Needs More EQ | Tsu-Jae King Liu | WiDS 2020

Tsu-Jae King Liu, Dean of Berkeley School of Engineering at University of California, Berkeley delivers a Keynote presentation at WiDS Stanford University on March 2, 2020:

Today we live in a dynamic and unpredictable world that is increasingly dependent on engineered devices, processes and systems. A 2017 workforce report by the McKinsey Global Institute indicates that all workers will need to adapt as their occupations evolve with increasingly capable machines. In the age of artificial intelligence (AI) and data science, workers will spend more time on activities that require social and emotional skills, creativity, high-level cognitive capabilities and other skills that are relatively hard to automate.

There is growing evidence of the importance of a high emotional quotient (EQ) as a predictor of success and organizational performance. In this talk, Professor Liu will share insights gained from her personal career journey and describe initiatives being undertaken in the College of Engineering at the University of California, Berkeley to cultivate EQ in their students and to advance equity and inclusion, toward a brighter future for all.

Read More

Polyglot AI: The Role of Natural Language Processing (NLP) | Rama Akkiraju | WiDS 2020

Thumbnail for Polyglot AI: The Role of Natural Language Processing (NLP) | Rama Akkiraju | WiDS 2020

Rama Akkiraju, IBM Fellow and Director of AI Operations at IBM, delivers a Technical Vision Talk at WiDS Stanford University on March 2, 2020:

AI applications are proliferating in consumer and business domains these days around the world. Have you ever wondered how Siri, Google Home, Google Maps or Amazon Echo speaks to users in different countries in their local languages? How does an automated customer support chat bot that you are speaking with or texting with speak or understand your local language to resolve your problems? AI models that power these applications have to speak the language of the user and the language of the business for them to be useful and relevant. Polyglot AI is not magic, just as AI itself is not magic! It takes a lot of hard work to teach AI to understand and speak new languages. In this talk, I’ll take you through some of the behind-the-scenes hard work of building multilingual natural language processing systems that enable AI to speak multiple languages.

Read More

Data Science in a Cloud World: What Every Data Scientist Needs to Know | Nhung Ho | WiDS 2020

Thumbnail for Data Science in a Cloud World: What Every Data Scientist Needs to Know | Nhung Ho | WiDS 2020

Nhung Ho, Director of Data Science at Intuit, delivers a Technical Vision Talk at WiDS Stanford University on March 2, 2020:

In today’s digital world, cloud adoption is mainstream. Whether for business or higher education, organizations are on an accelerated migration path to achieve greater flexibility, speed and cost efficiencies. For data scientists, the cloud can serve as an underlying platform to speed AI innovation by offering the processing capability needed to manage massive amounts of data, sophisticated algorithms and complex models that must be as performant as possible. In this talk, Nhung Ho, Director of Data Science for Intuit AI, will draw upon real-world experiences in academia and building and modernizing Intuit Mint’s categorization models to describe what every data scientist needs to know about data science in a cloud world.

Read More

Interpretability For Everyone | Been Kim | WiDS 2020

Thumbnail for Interpretability For Everyone | Been Kim | WiDS 2020

Been Kim, Research Scientist at Google Brain delivers a Technical Vision Talk at WiDS Stanford University on March 2, 2020:

In this talk, Been will reflect on some of the progress made in the field of interpretable machine learning, on where we are going as a field, and on what we need to be aware of and careful about as we make progress. With that perspective, she will then discuss some of her recent work: 1) sanity checking popular methods and 2) developing more layperson-friendly interpretability methods.

Read More

Creating Global Economic Opportunity with Responsible Data Science | Ya Xu | WiDS 2020

Thumbnail for Creating Global Economic Opportunity with Responsible Data Science | Ya Xu | WiDS 2020

Ya Xu, Head of Data Science at LinkedIn delivers a Technical Vision Talk at WiDS Stanford University on March 2, 2020:

At LinkedIn, data plays an essential role in achieving our vision of creating economic opportunity for every member of the global workforce. It is critical that we are not just using data to create opportunities, but creating them responsibly. This goes beyond just complying with regulations. It starts with taking data privacy protection seriously with Differential Privacy, and avoiding unintended consequences in both our products and ML models to ensure fairness. In this talk, Ya will share perspectives from her experience addressing these challenges at LinkedIn.

Read More

Building Water Security From the Bottom Up by Leveraging Big Data | Newsha Ajami | WiDS 2020

Thumbnail for Building Water Security From the Bottom Up by Leveraging Big Data | Newsha Ajami | WiDS 2020

Newsha Ajami, Director of Urban Water Policy at Stanford University, delivers a Technical Vision Talk at WiDS Stanford University on March 2, 2020:

Access to safe and reliable water is the foundation of social, economic, and environmental wellbeing; however, it is being threatened by the impacts of climate change, environmental degradation, population growth, and aging infrastructure. According to the United Nations, water security is one of the 21st century’s greatest challenges. Our current water infrastructure networks have been designed and governed under the assumption of abundance and stationarity, believing that by harnessing nature we could deliver unlimited amounts of water to various sectors. There was limited accounting for uncertainties in human dynamics in managing these complex networks. Building climate-resilient and sustainable water systems under such structural and environmental pressures requires an integrated and holistic approach that would better represent the interlinks among science/engineering, society, and policy.

While this is easier said than done, some of the emerging data sources from digital platforms, online aggregators, social media, and measurement technologies (e.g. sensors and remotely sensed data), combined with improved computing power, are offering new opportunities to unfold and better define the new frontiers of water sustainability and resiliency. In this talk I will share a portfolio of innovative water management tools that harness new data sources to assess both evolving water demand trends and modern supply regimes to achieve water security.

Read More

How Data Science Can Unlock Teaching & Learning at Scale | Emily Glassberg Sands | WiDS 2020

Thumbnail for How Data Science Can Unlock Teaching & Learning at Scale | Emily Glassberg Sands | WiDS 2020

Emily Glassberg Sands, Head of Data Science at Coursera delivers a Technical Vision Talk at WiDS Stanford University on March 2, 2020:

Coursera is the world’s largest platform for higher education, providing 50 million learners access to life-transforming skills and credentials. With the rich data generated as over 50 million learners engage on the platform, we have the unique opportunity to use data science and machine learning to unlock high-quality teaching and learning at scale. This talk will take you behind-the-scenes of some of our latest data products — from the personalized coaching that motivates and unblocks learners, to the algorithmic skill scores that track real-time progress against career goals, to the human-in-the-loop systems accelerating grading and student support. We’ll touch on the math, the product, the impact, and our own learnings along the way.

Read More

Nhung Ho, Intuit | Stanford Women in Data Science (WiDS) Conference 2020

Thumbnail for Nhung Ho

Nhung Ho, Director of Data Science, Intuit sits down with Sonia Tagare for WiDS 2020 in Stanford, CA.

#WiDS2020 #WomenInTech #theCUBE

https://siliconangle.com/2020/03/03/c…

Customer diversity plays important role for Intuit in product design

Diversity applies to more than just the makeup of any given workforce. It’s become especially important in understanding how products will be received by customers.

Intuit Inc. develops and sells accounting and tax preparation software, and its QuickBooks product serves more than 2.5 million users worldwide. With a customer base that large, being attuned to the diverse needs of its subscribers has become a key ingredient in the company’s overall success.

“We serve consumers, small businesses, and the self-employed,” said Nhung Ho (pictured), director of data science at Intuit. “The diversity of our customers mirrors the general population. You need to bring in diverse perspectives so you can build the best products possible because the people who are using those products come from a diverse background as well.”

Ho spoke with Sonia Tagare, host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Women in Data Science conference in Stanford, California. They discussed how testing in the cloud has helped Intuit build products for a highly diverse customer base and ways the company can tailor its solutions for specific users.

Cloud enhances testing
One way to know how users might respond to newly introduced products is through a process known as A/B testing. It’s a process of showing two variants of a product to different groups of people to determine which will achieve better results.
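As an aside for readers new to A/B testing, one simple way to judge whether two variants really differ is a two-proportion z-test on their conversion rates. The sketch below is a generic textbook illustration, not a description of Intuit’s process; the function name and the visitor counts are invented.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert equally."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 100 of 1,000 visitors convert on A, 130 of 1,000 on B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
```

A small p-value suggests the observed difference is unlikely to be due to chance alone; running many tests at once, as described here, usually calls for additional care about multiple comparisons.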

Before the cloud, testing was usually limited to once a year — just before a new product would ship. Times have changed.

“Now that we’re in the cloud, it allows us to test continuously via A/B testing,” Ho explained. “You turn what was once a one-time change-management process to one that’s distributed throughout the entire year. At any one time we’re running hundreds of tests to make sure we’re shipping the best things for our customers.”

This process has also allowed Intuit to personalize products in ways that weren’t often possible before.

“We have enough data and we have enough compute where we can build a model tailored just for you,” Ho said. “That means I can help a cupcake shop owner actually manage her cash flow to help her succeed. That’s really powerful, and that’s where data science is headed.”

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the Women in Data Science conference.

Read More

Ya Xu, LinkedIn | Stanford Women in Data Science (WiDS) Conference 2020

Thumbnail for Ya Xu

Ya Xu, Head of Data Science, LinkedIn sits down with Sonia Tagare for WiDS 2020 in Stanford, CA.

#WiDS2020 #WomenInTech #theCUBE

https://siliconangle.com/2020/03/04/l…

LinkedIn pursues its vision to leverage data for global economic opportunity

It would be a mistake to simply characterize LinkedIn Corp. as merely a job or networking website.

With 660 million users, LinkedIn has the ability to leverage a tremendous amount of data in ways that go far beyond the latest job switch or promotion. The field of data science is helping it transform economic structures on a worldwide scale.

“Everybody can benefit from better data and better data access,” said Ya Xu (pictured), head of data science at LinkedIn. “We truly believe in the vision that we are working towards, which is creating economic opportunity for every member of the global workforce.”

Xu spoke with Sonia Tagare, host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Women in Data Science conference in Stanford, California. They discussed using data responsibly, the importance of diversity, and advice for women seeking a career in the data science field.

Data privacy is fundamental
Speaking at the WiDS conference this week, Xu addressed ways that responsible data can create global opportunity. Data privacy and diversity are fundamental components of that strategy, according to Xu.

“The fundamental thing that we have to start with is to be able to preserve the privacy of our members,” Xu said. “If you have a diverse team that is a representation of the customers you are serving, then you are able to come up with better features that are able to serve the needs of a population. That’s just the right thing to do.”

In her role as a data science leader for LinkedIn, Xu has advice for other women who may be seeking to follow their own careers in the field.

“Just have that ‘can do’ attitude,” Xu said. “We’re not any less than a man, and there are certainly many strong and talented women that we have in the field. Don’t let people’s perceptions or biases around you bring you down.”

Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the Women in Data Science conference:

Read More

Emily Glassberg Sands, Coursera | Stanford Women in Data Science (WiDS) Conference 2020

Thumbnail for Emily Glassberg Sands

Emily Glassberg Sands, Head of Data Science, Coursera sits down with Sonia Tagare for WiDS 2020 at Stanford, CA.

#WiDS2020 #WomenInTech #theCUBE

https://siliconangle.com/2020/03/13/q…

Q&A: Coursera uses student skill tracking data to help companies create a more diverse workforce

Distance learning has been around since the late 18th century, when students received assignments via mail, completed them, and sent them back for grading. Today, massive open online courses, known as MOOCs, can have hundreds of thousands of students. Some of the most popular free lectures on YouTube, such as Stanford University’s lecture on Einstein’s theory of relativity, have millions of views.

MOOCs started to gain popularity back in 2012, when Stanford professors Daphne Koller and Andrew Ng decided to make their lectures available online. Those courses became the foundation for Coursera Inc., which today has around 50 million students and is the world’s largest platform for higher education.

Emily Glassberg Sands (pictured), senior director of data science at Coursera, joined Sonia Tagare, host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Women in Data Science conference in Stanford, California. They discussed the changes happening in MOOC structure, and how tracking student skill data can help companies hire a more diverse workforce.

[Editor’s note: The following content has been condensed for clarity.]

How has Coursera changed from when it started in 2012?

Glassberg Sands: It’s evolved a lot. We’ve moved from partnering exclusively with universities to recognizing that a lot of the most important education for folks in the labor market is being taught within companies. So, we’ve expanded to including education that’s provided not just by top institutions like Stanford, but also by top institutions that are companies like Amazon and Google.

The second big change is we’ve recognized that while for many learners an individual course or a MOOC is sufficient, some learners need access to a full degree, a diploma-bearing credential. We now have 14 degrees live on the platform, including master’s degrees in computer science and data science.

The third major change is that we launched Coursera enterprise, which is about providing learning content through employers and through governments so we can reach a wider swath of individuals who might not be able to afford it themselves.

Could you explain how Coursera uses data science to track individual user preferences and user behavior?

Glassberg Sands: We personalize throughout the learner journey. So, in discovery up-front when you first join the platform, we ask: What’s your career goal? What role are you in today? And then we help you find the right content to close the gap.

As you’re moving through courses, we predict whether or not you need some additional support. So, we identify for each individual what type of human touch might they need and we serve up to support staff recommendations for who they should reach out to, whether it’s a counselor reaching out to a degree student who hasn’t logged in for a while or a teaching assistant reaching out to a degree student who’s struggling with an assignment. Data really powers all of that, understanding someone’s goals, their backgrounds, the content that’s going to close the gap, as well as understanding where they need additional support and what type of help we can provide.

Tell us about Coursera’s latest data products.

Glassberg Sands: We’ve launched three data products over the last couple of years. The first is predicting when learners are going to need additional nudges and intervening in fully automated ways to get them back on track.

The second is about identifying learners who need human support and serving up really easily interpretable insights to support staff so they can reach out to the right learner with the right help.

Then the third is a little bit different. It’s about once learners are out in the labor market, how can they credibly signal what they know so that they can be rewarded for that learning on the job. And this is a product called skill scoring, where we’re actually measuring what skills each learner has up to what level so I can, for example, compare that to the skills required in my target career or show it to my employer so I can be rewarded for what I know.

That would be really helpful when people are creating resumes, by ranking the level of skills that they have.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the Women in Data Science conference.

Read More

Daphne Koller, insitro | Stanford Women in Data Science (WiDS) Conference 2020

Thumbnail for Daphne Koller

Daphne Koller, CEO and Founder, insitro sits down with Sonia Tagare at Stanford University for WiDS 2020.

#WiDS2020 #WomenInTech #theCUBE

https://siliconangle.com/2020/03/12/a…

AI works to slash drug development costs as technology and biology join forces to defeat Eroom’s Law

The convergence of previously discrete fields is a hallmark of the digital era. Remember the divide between development and operations teams? That gap vanished into the cloud, as DevOps became the new way of working.

Now technology is becoming incorporated into other disciplines. In the 1990s, quantitative biology took a leap from a descriptive science to gene sequencing, thanks to technology such as microarrays. At the same time, big data was revolutionizing information technology.

“What I think is coming now, 30 years later, is the convergence of those two fields into one field that I like to think of as digital biology,” said Daphne Koller (pictured), founder and chief executive officer of Insitro Inc.

Koller spoke with Sonia Tagare, host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Women in Data Science conference in Stanford, California. They discussed how applying machine-learning techniques to traditionally biological research fields — such as drug research — could bring down the costs of medicine.

Applying ML models to drug development
Measuring biology has taken on new levels of detail, fidelity and scale thanks to new technology, according to Koller. Artificial intelligence and machine learning allow scientists to interpret what they are seeing and engineer new solutions that “will have implications in biomaterials, in energy, in the environment, in agriculture, and I think also in human health,” Koller said.

One of the biggest problems in the health field is the negative trend in number of drugs approved versus dollars spent on research. This is known as Eroom’s Law, because it is the opposite of Moore’s Law.

“Despite many important advancements, the costs just keep going up and up and up,” Koller said.

Approach problems with diversity
Machine learning could hold the key to breaking this trend, but it requires a cross-discipline approach, according to Koller. “One needs to really build a culture of people who work together from different disciplines, each bringing their own insights and their own ideas into the mix,” she said.

The team she has created at Insitro is half life scientists and half machine learning and data science experts.

“They start from the very beginning to understand what are the problems that one could solve together: How do you design the experiment? How do you build the model? And how do you drive insights that can help us make better medicines for people?” she said.

Using a data-driven approach, collecting and analyzing huge amounts of data will reveal new hypotheses, according to Koller.

“Hopefully, we’ll be able to create enough data and apply machine learning to address key bottlenecks in the drug discovery and development process,” she said. “[Then] we can bring better drugs to people, and we can do it faster and, hopefully, at much lower cost.”

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the Women in Data Science conference.

Read More

Technology Driven Business Opportunities for the Next Decade | Padmasree Warrior | WiDS 2019

Thumbnail for Technology Driven Business Opportunities for the Next Decade | Padmasree Warrior | WiDS 2019

Padmasree shares technology-driven business creation opportunities for the next decade. Over the next few years, we will see the emergence of new global mega businesses that will be built on innovations around Data, AI/ML and Blockchain. This will transform many industries including transportation, healthcare, content, education, transactions and others. Existing models of social networks and digital advertising will face significant headwinds. In light of heightened user awareness about the value of personal data, concerns regarding privacy, and increased pressure for governmental regulation, how do we as technologists proactively address these issues? What significant leadership roles can women occupy in these technical domains, as entrepreneurs and business leaders?

Padmasree Warrior is the former Chief Executive Officer of NIO U.S., and former Chief Development Officer and Board Member of NIO, a manufacturer of electric and autonomous vehicles.

Read More

Career Panel | WiDS 2019

Thumbnail for Career Panel | WiDS 2019

WiDS 2019 Career Panel moderated by Margot Gerritsen, Senior Associate Dean, Stanford University

Panelists:
– Natalie Evans Harris; Co-founder and Head of Strategy Initiatives, BrightHive Inc.
– Marzyeh Ghassemi; Assistant Professor, University of Toronto
– Emily Glassberg Sands; Head of Data Science and Data Engineering, Coursera
– Yinglian Xie; CEO and Co-Founder, DataVisor

Read More

Better Reinforcement Learning for Human in the Loop Systems | Emma Brunskill | WiDS 2019

Thumbnail for Better Reinforcement Learning for Human in the Loop Systems | Emma Brunskill | WiDS 2019

Emma Brunskill, Assistant Professor, Computer Science, Stanford University

There is increasing excitement about reinforcement learning, a subarea of machine learning for enabling an agent to learn to make good decisions. Yet numerous questions and challenges remain for reinforcement learning to help support progress in important high-stakes domains like education, consumer marketing and healthcare. I’ll discuss some recent advances in these areas, and our work towards creating transparent, accountable reinforcement learning approaches that can interact beneficially with people.

Read More

Srujana Kaddevarmuth, Accenture | WiDS 2019

Thumbnail for Srujana Kaddevarmuth

Srujana Kaddevarmuth, Senior Manager, Data Science, Accenture & Women in Data Science Ambassador, Bengaluru | @Srujanadev sits down with Lisa Martin at Stanford University for WiDS 2019.

#WiDS2019 #Accenture #theCUBE

https://siliconangle.com/2019/03/11/w…

WiDS Datathon mixes up data science with collaborative teams

If only a data set and some pre-packaged data-analytics software were all it takes to solve real-world problems. The reality is that tools require hands to ply them. And just like a comprehensive data set is better than a limited one, a comprehensive set of skills helps people design better solutions.

“Looking at the problem from different perspectives and collaboration are the keys to be able to be successful in data science,” said Srujana Kaddevarmuth (pictured), data science and analytics executive at Accenture LLP and ambassador for the Women in Machine Learning & Data Science team in Bengaluru (formerly Bangalore).

Take a problem like deforestation from palm-oil plantations. Consider all the factors that might be involved: agriculture, climate, ecology, economics, politics, etc. What are the odds that one random data expert can ask all the right questions, pull together all the necessary data, and derive actionable insight? Probably not great.

This is the thinking behind collaborative data-science projects, like the Women in Data Science, or WiDS, Datathon. This year, it organized several teams to collaborate and use data and satellite imagery to analyze this particular problem.

Kaddevarmuth spoke with Lisa Martin (@LisaMartinTV), host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Stanford Women in Data Science event in Stanford, California. They discussed this year’s Datathon and why collaboration results in better outcomes for data scientists.

From clueless to Kaggle code in three weeks
At the WiDS Bengaluru regional event, organizers set up a community workshop. The goal was to form teams to participate in the Datathon. They would submit the fruits of their endeavors to something called Kaggle, a platform for data-science projects and competitions. In India, Kaggle participation is very male heavy “despite that region having amazing female data scientists who are innovators in their space with multiple patents, publications and innovations to their credit,” Kaddevarmuth said.

WiDS teamed mentors with participating teams to work together for three weeks. One team from the engineering division that was brand new to Kaggle learned new concepts, honed skills in deep learning and neural networks, and submitted original code to the Kaggle leaderboard.

“They were not the top-scoring team, but this entire experience of being able to collaborate, look at the problem from different perspectives, and be able to submit the code despite a lot of these challenges, and also navigate the platform in itself, was a decent achievement from my perspective,” Kaddevarmuth concluded.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the Stanford Women in Data Science event.

Read More

Madeleine Udell, Cornell University | WiDS 2019

Thumbnail for Madeleine Udell

Madeleine Udell, Assistant Professor, Cornell University, @madeleineudell sits down with Lisa Martin at Stanford University for WiDS 2019.

#WiDS2019 #CornellUniversity #theCUBE

https://siliconangle.com/2019/03/08/t…

This professor is cleaning up tech’s ‘messy data’ problem

Strong data sets are table stakes for any organization today. Data insights can provide the tentpoles for building a strategic roadmap and offer unexpected learnings for businesses to leverage as new market opportunities. But even the most valuable data set can prove worthless if its insights are entangled in the unstructured digital void.

An estimated 80 percent of all data is unstructured, which renders the intel buried in its complex documents and media files inaccessible without an alternative method of analysis. As information floods the tech industry faster than new talent is prepared to make sense of it, the unstructured data challenge is posing a formidable hurdle for businesses in the digital age.

Madeleine Udell (pictured), assistant professor of operations research and information engineering at Cornell University, is educating a new era of technologists to decode this so-called “messy data” with a more effective approach to tech collaboration.

“Oftentimes people only learn about big, messy data when they go to industry,” Udell said. “I’m interested in understanding low dimensional structure in large, messy data sets [to] figure out ways of … making them seem cleaner, smaller and easier to work with.”

Udell spoke with Lisa Martin, host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the recent Stanford Women in Data Science event at Stanford University.

This week, theCUBE spotlights Madeleine Udell in its Women in Tech feature.

The unstructured data challenge
The rise of messy data can be attributed in large part to the influx of information from a growing number of digital endpoints. Internet of things devices deliver a stream of “messy” data, but the clutter can also come from images, videos, social media, emails, and other data sets not already formatted for simple analysis.

Though more complex and tedious to decipher, these data sources are some of the most highly valued in a market focused on individual user targeting. That gap between ability and potential innovation is what drives Udell’s interest in unstructured data, an area of technology the assistant professor says people entering the tech industry are not adequately prepared for. In her own classes, Udell teaches optimization for machine learning from a messy data perspective.

“[The class] introduces undergraduates to what messy data sets look like, which they often don’t see in their undergraduate curriculum, and ways to wrangle them into forms they could use with other tools they have learned as undergraduates,” she said.

Udell’s interest in messy data was piqued when she met the challenge head-on while working on the Obama 2012 presidential campaign. She was tasked with analyzing voter information but found the unstructured data sets too cumbersome to yield valuable insight.

“They had hundreds of millions of rows, one for every voter in the United States, and tens of thousands of columns about things that we knew about those voters,” Udell said. “Gender … education level, approximate income, whether or not they had voted in the last elections, and much of the data was missing. How do you even visualize this kind of data set?”

When Udell returned to work on her Ph.D., she was intent on discovering a more efficient method for parsing out value from unstructured data sets. “I wanted to figure out the right way of approaching this, because a lot of people will just sort of hack it,” she said. “I wanted to understand what’s really going on.”

Making an impact with communication
Udell is as interested in the technical architectures that enable data analysis as she is in supporting organizations through the implementation processes that will allow them to benefit from her work. A comprehensive answer to data management requires both math and communication, and Udell says her broad skill set is part of what has enabled her to make sense of messy data.

“If you want your technical work to have an impact, you need to be able to communicate it to other people,” Udell stated.

The social aspect of her role is crucial to finding solutions that actually address user problems and work within existing processes. “You need to make … sure you’re working on the right problems, which means talking with people to figure out what the right problems are,” she said. “This is … fundamental to my career, talking to people about problems they’re facing that they don’t know how to solve.”

Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the Stanford Women in Data Science event:

Read More

Kristina Draper, Wells Fargo | WiDS 2019

Thumbnail for Kristina Draper

Kristina Draper, Technology Division Executive, Consumer Bank & Services Technology, Wells Fargo | @kristinadraper sits down with Lisa Martin at Stanford University for WiDS 2019.

#WiDS2019 #WellsFargo #theCUBE

https://siliconangle.com/2019/03/05/q…

Q&A: Wells Fargo aims for 100-percent data transparency in new era of consumer trust

The big data explosion has created transformative innovation opportunities for technology, as well as businesses across industries. As consumers better understand their piece in that data puzzle and the market begins to find its footing in a data-driven digital landscape, companies must adopt a responsibility around transparency to maintain trust and efficiency.

Greater visibility around data-driven processes can also lead to more comprehensive solutions through interdisciplinary collaboration, according to Kristina Draper (pictured), chief technology officer at Wells Fargo & Co.

Draper spoke with Lisa Martin (@LisaMartinTV), host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Stanford Women in Data Science event in Stanford, California. They discussed the role data is playing in a new era of accountability at Wells Fargo, as well as how Draper is reaching beyond the financial industry for greater innovation opportunities.

[Editor’s note: The following answers have been condensed for clarity.]

Tell a little about your involvement in WiDS, as well as Wells Fargo’s involvement as a sponsor.

Draper: We believe so strongly that in the consumer bank space we have a tremendous opportunity and responsibility to understand how our customers interact with Wells Fargo, and that will require a discipline around data science. We had an opportunity this year to be an executive sponsor and jumped at it. I think we’ll continue to be at that sponsor level in future years.

You were recently named one of the 50 most powerful women in technology. What are some of the [ways] Wells Fargo is re-imagining data and trust? What have you seen of the evolution of females in technology and leadership roles?

Draper: The recognition [of] women in technology … is an opportunity to demonstrate that we should be very confident in the value that we bring as leaders, and that confidence as a woman is hard to come by. I think of my own personal career and the way that doors were opened for me along the way; often we are our own worst enemies. We second guess ourselves, we second guess our value, and we have to really work for that seat at the table.

My coming back to Wells was really … as a leader in technology. I felt I could make a real impact. When I think about what we can do as women leaders in technology and in data science, a lot of it is owning that accountability to leadership and paving the way for leaders behind us. There comes a part in a career, certainly mine, where you’re no longer thinking about the next job for yourself.

We’re in a consumer banking space and financial services, so there’s certainly a lot of places to innovate [and] think about how technology can help to serve a Wells Fargo customer. You need your bank throughout your entire life. Whether you are thinking about a home purchase, an auto purchase, college for your children, retirement, there’s so many big markers in life. And that’s where I get excited about not only the leadership role that I have now, but I have the opportunity to bring a team with me to contribute real value.

You have a pay-it-forward attitude. How are you using that to expand your team … to continue this big re-imagining that Wells Fargo as a business is undergoing?

Draper: WiDS is … a tremendous network opportunity. [I’m] so inspired about how they’re turning data science and really thinking about different problems [and] ways we can improve not only our lives, but the lives of future generations to come.

I come from a financial services background, but the problems that our future generations will face can’t be solved with just one lens. You can’t solve problems with just a financial services expertise or just a technical expertise. It’s the space in between art and science. It’s an ability to think across industry and apply solutions and innovation that have been brought forward through other industries, through other companies, through other academia, and thinking about how that could apply in solving the problems that we’re faced with in the financial services space.

If I turned some of the problems that we’re faced with upside down and thought about it with that perspective, and invited some collaboration to help solve problems, we might come up with a better answer.

How can financial services and the data that you deal with help customers?

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the Stanford Women in Data Science event.

Read More

Janet George, Western Digital | WiDS 2019

Thumbnail for Janet George

Janet George, “Fellow” Chief Data Officer/Scientist/Big Data/Cognitive Computing, Western Digital sits down with Lisa Martin at Stanford University for WiDS 2019.

#WiDS2019 #WesternDigital #theCUBE

https://siliconangle.com/2019/03/07/q…

Q&A: How AI is cultivating a responsible community to better mankind

Artificial intelligence initiatives powered by big data are propelling businesses beyond the capacity of human labor. While AI tech offers an undeniable opportunity for innovation, it has also sparked a debate around potential misuse through the vast reach of programmed biases and other problematic behaviors.

The power of AI can be comprehensively harnessed for good by fostering diverse teams focused on ethical solutions and working in tandem with policymakers to ensure responsible scale, according to Janet George (pictured), fellow and chief data officer at WD, a Western Digital Company.

George spoke with Lisa Martin (@LisaMartinTV), host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Stanford Women in Data Science event in Stanford, California. They discussed the range of possibilities in AI and how WD is leveraging the technology toward sustainability.

[Editor’s note: The following answers have been condensed for clarity.]

Tell us about Western Digital’s continued sponsorship and what makes this important to you.

George: Western Digital has recently transformed itself … and we are a data-driven … data-infrastructure company. This momentum of AI is a foundational shift in the way we do business. Businesses are realizing that they’re going to be in two categories, the ‘have’ and the ‘have not.’ In order to be in the have category, you have to embrace AI … data … [and] scale. You have to transform yourself to put yourself in a competitive position. That’s why Western Digital is here.

How has Western Digital transformed to harness AI for good?

George: We are not just a company that focuses on business for AI. One of the initiatives we are doing is AI for Good and … Data for Good … working with the UN. We’ve been focusing on trying to figure out the data that impacts climate change. Collecting data and providing infrastructure to stow massive amounts of species data in the environment that we’ve never actually collected before. Climate change is a huge area for us, education … [and] diversity. We’re using all of these areas as a launching pad for Data for Good and trying to use data … and AI to better mankind.

Now we have the data to put out massively predictive models that can help us understand what the change would look like 25 years from now and take corrective action. We know carbon emissions are causing very significant damage to our environment and there’s something we can do about it. Data is helping us do that. We have the infrastructure, economies of scale. We can build massive platforms that can stow this data and then we can analyze this data at scale. We have enough technology now to adapt to our ecosystem … and be better in the next 10 years.

What are your thoughts on data scientists taking something like a Hippocratic Oath to start owning accountability for the data that they’re working with?

George: We need a diversity of data scientists to have multiple models that are completely diverse, and we have to be very responsible when we start to create. Creators have to be responsible for their creation. Where we get into tricky areas are when you are the human creator of an AI model, and now the AI model has self-created because it has self-learned. Who owns the copyright to those when AI becomes the creator? The group of people that are responsible for creating the environment, creating the models, the question comes into how do we protect the authors, the users, the producers, and the new creators of the original piece of art.

You can use the creation for good or bad. The creation recreates itself, like AI learning, on its own with massive amounts of data after an original data scientist has created the model. Laws have to change; policies have to change. Innovation has to go, and at the same time, we have to be responsible about what we innovate.

Where are we as a society in starting to understand the different principles and practices that have to be implemented in order for proper management of data to enable innovation?

George: We’re debating the issues. We’re coming together as a community. We’re having discussions with experts. What are we seeing as the longevity of that AI model in a business setting, in a non-business setting? How does the AI perform? We are now able to see the sustained performance of the AI model.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the Stanford Women in Data Science event.

Read More

Liza Donnelly, The New Yorker | WiDS 2019

Thumbnail for Liza Donnelly

Liza Donnelly, Writer & Cartoonist, The New Yorker sits down with Lisa Martin at Stanford University for WiDS 2019.

#WiDS2019 #TheNewYorker #theCUBE

https://siliconangle.com/2019/03/07/q…

Q&A: Cartoons illustrate what’s possible in a more accessible tech industry

The real value of data is in its ability to tell a story through the technologists working to analyze and implement it in new creative solutions. Telling stories through a more accessible medium is what Liza Donnelly (pictured), staff cartoonist at “The New Yorker,” does in her visual journalism work by sharing sketches that condense a rich experience into a single snapshot.

Donnelly spoke with Lisa Martin (@LisaMartinTV), host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Stanford Women in Data Science event in Stanford, California. They discussed how cartoons can be used to tell stories from different perspectives and why illustrating women working in technology is quietly revolutionary.

[Editor’s note: The following answers have been condensed for clarity.]

Tell us a little … about visual journalism.

Donnelly: I am somebody who goes to events — political, social or cultural — and draws what I see. I’m not a court reporter. I’m an impressionist. I give people the feeling that they’re there with me by what I draw. I try to capture that person’s essence. Oftentimes I try to capture a sentence that they’re saying that has a more universal appeal that somehow brings like a layman into the subject a little bit. This visual journalism is more like reportage. I do behind the scenes too. At the Oscars I’ll do the stars if I can get them … but then I also do the people taking out the trash, the guy painting the sideboard, the cameraman. I try to give a sense of what it’s like to be there.

I do them on my iPad, and I send them out on social media almost immediately so they feel like they’re there. It gives people a different perspective of what’s going on, and I think that my background as a cartoonist for “The New Yorker” for 40 years informs these drawings in an indirect background kind of way because I’ve been watching culture [and] politics for a very long time.

I’d love to understand, from your perspective, the evolution of cartoons and the impact they can make in society.

Donnelly: Cartoons can be very controversial and problematic. That’s been true through the course of the history of our country … but it’s compounded now because of the internet. Cartoons can be misunderstood. They can be used as weapons.

I’m going to be talking about this at South by Southwest … about political cartoons and what their impact has been in the past, and how they create an impact now, and why that is, and how we can use it to good effect. I think a problem we’re dealing with right now in our culture is everybody is so divided, and so opinionated, and so hateful towards each other. Can we use cartoons not to perpetuate that but to make things better in some way?

There are more and more cartoons on the internet now. There’s a lot of webcomics, and young cartoonists are using the internet effectively to put out their ideas. The internet is just a dialogue with people. I think this new generation is really trying to find ways to use these tools in a good way. They’re trying to make a better world.

Tell us about how you got involved with Women in Data Science.

Donnelly: A big part of what I want to do with my work is promoting equal rights for women around the world, and so I thought, “This sounds terrific.” Plus it’s global, and I do a lot of work globally to help women and freedom of speech as well. It seemed to be a great fit, and it seems even more to be a good fit in that it’s a way to get the information out there in a visual way, because people, they hear the word “data,” and they probably just glaze over. But they see it connected with a cartoon or a drawing, it humanizes it for them a little bit.

Today I was drawing a woman speaker talking about really technical data science. I put it on the internet and I thought, it’s just a constant reminder to people that women are doing this. If you see it, it resonates a little bit more quickly and more forcefully in your brain. I think more women are stepping into this field and being recognized for doing so.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the Stanford Women in Data Science event.

Read More

When Data Science IS the Business! | Leda Braga | WiDS 2018

Thumbnail for When Data Science IS the Business! | Leda Braga | WiDS 2018

Leda Braga, Chief Executive Officer at Systematica Investments, delivers a Keynote presentation at the WiDS 2018 Conference held at Stanford University.

Objective analysis of relevant data can improve the execution of most businesses. From the simple client feedback form through to production statistics, listening to the data helps. In the investment management industry, by contrast, data analysis IS the business. Investment management is information management and data science is not an aid to decision making, but rather the essence of it.

This talk will explore the reality of investment management, how recent developments in data and AI are shaping the fund management industry and the challenges of dealing with financial data. In the context of the WiDS forum and its clear focus on diversity, trends such as ethical investing (or Socially Responsible Investing, SRI) will also be discussed.

Read More

Integrating Data Science and Cyber Security | Bhavani Thuraisingham | WiDS 2018

Thumbnail for Integrating Data Science and Cyber Security | Bhavani Thuraisingham | WiDS 2018

Bhavani Thuraisingham, Professor of Computer Science at University of Texas at Dallas, presents Integrating Data Science and Cyber Security at the WiDS 2018 Conference held at Stanford University on March 5, 2018.

The collection, storage, manipulation, analysis and retention of massive amounts of data have resulted in serious security and privacy considerations. Various regulations are being proposed to handle big data so that the privacy of individuals is not violated. For example, even if personally identifiable information is removed from the data, when data is combined with other data, an individual can be identified. While collecting massive amounts of data causes security and privacy concerns, the use of big data analytics applications in cyber security is exploding. For example, an organization can outsource activities such as identity management, intrusion detection and malware analysis to the cloud. The question is, how can the developments in data science techniques be used to solve security problems? Furthermore, how can we ensure that such techniques are secure and adapt to adversarial attacks? This presentation will first describe our research in data science, including stream data analytics and novel class detection, and discuss its applications to insider threat detection. Second, it will discuss the emerging research area of adversarial machine learning. Finally, it will discuss why women should pursue careers in data science.

Read More

Career Panel | WiDS 2018

Thumbnail for Career Panel | WiDS 2018

A Career Panel moderated by Margot Gerritsen, with questions from the audience. Panelists include:

– Elena Grewal, Head of Data Science at Airbnb
– Bhavani Thuraisingham, Professor of Computer Science at University of Texas at Dallas
– Ziya Ma, VP Software and Services Group and Director of Big Data Technologies at Intel Corporation
– Jennifer Prendki, Head of Data Science at Atlassian

Read More

Dynamic Pricing and Matching in Ride-Sharing | Dawn Woodard | WiDS 2018

Thumbnail for Dynamic Pricing and Matching in Ride-Sharing | Dawn Woodard | WiDS 2018

Dawn Woodard, Senior Data Science Manager of Maps at Uber, presents Dynamic Pricing and Matching in Ride-Sharing at the WiDS 2018 Conference held at Stanford University on March 5, 2018.

Ride-sharing platforms like Uber, Lyft, Didi Chuxing, and Ola are transforming urban mobility by connecting riders with drivers via the sharing economy. These platforms have achieved explosive growth, in part by dramatically improving the efficiency of matching, and by calibrating the balance of supply and demand through dynamic pricing. The dynamic adjustment of prices ensures a reliable service for riders, and incentivizes drivers to provide rides at peak times and locations. Dynamic pricing is particularly important for ride-sharing, because pricing too low causes pickup ETAs to get very long, which reduces the efficiency of the platform and causes a poor experience for riders and drivers. We review the literature on matching and pricing techniques in ride-sharing. We also discuss how to estimate several key inputs to those algorithms: predictions of demand, supply, and travel time in the road network.

Read More

More Data, More (Statistical) Problems | Daniela Witten | WiDS 2018

Thumbnail for More Data, More (Statistical) Problems | Daniela Witten | WiDS 2018

Daniela Witten, Associate Professor of Statistics and Biostatistics at University of Washington, presents More Data, More (Statistical) Problems at the WiDS 2018 Conference held at Stanford University on March 5, 2018.

By now, virtually every field has become inundated with big data. We have been promised that this data will usher in a new era of previously unimaginable societal and scientific progress. While it is certainly true that more data brings with it incredible opportunities, it is also true that more data can bring new and previously unimaginable statistical challenges. I will talk about some of those statistical challenges, as well as statistical ways to solve them. Examples will be taken from biomedical research.

Read More

How Cascades Grow | Lada Adamic | WiDS 2018

Thumbnail for How Cascades Grow | Lada Adamic | WiDS 2018

Lada Adamic, Research Scientist Manager at Facebook, presents How Cascades Grow at the WiDS 2018 Conference held at Stanford University on March 5, 2018.

This talk will describe several studies of information diffusion in social networks. The early spread of the cascade can be used to predict features such as size and shape, as well as whether it will recur. The information itself may change in the course of propagation, revealing the evolutionary structure of memes. Finally, the structure of the cascade can vary depending on the underlying mechanism by which the information is shared.

Read More

Data-Driven Storytelling | Nathalie Henry Riche | WiDS 2018

Thumbnail for Data-Driven Storytelling | Nathalie Henry Riche | WiDS 2018

Nathalie Henry Riche, Researcher at Microsoft Research, presents Data-Driven Storytelling at the WiDS 2018 Conference held at Stanford University on March 5, 2018.

Data visualization is a powerful medium to make sense of large amounts of data and communicate insights gained from analyses to a general audience. Research in the field of information visualization aims at designing interactive visual interfaces to augment human cognition for exploring and communicating with data.

In this talk, I will present our latest research efforts in the field of information visualization and data-driven storytelling. Stories supported by facts extracted from data analysis (data-driven storytelling) proliferate in many different forms from static infographics shared on social media to dynamic and interactive applications available on leading news media outlets. I will present research shedding light on what makes visual stories compelling and share insights on how to empower people to build these experiences without programming.

Read More

Data Science Supporting National Security | Dr. Deborah Frincke | WiDS 2017

Thumbnail for Data Science Supporting National Security | Dr. Deborah Frincke | WiDS 2017

Dr. Deborah Frincke leads the Research Directorate of the National Security Agency (NSA), the largest “in-house” research organization in the U.S. Intelligence Community. She also serves as the NSA Science Advisor and Innovation Champion, and is a recipient of the President’s Meritorious Rank Award. In her presentation, Dr. Frincke will discuss NSA’s unclassified research programs and describe how the Research Directorate supports national-level missions. She will provide key insights on data science challenges facing NSA and the nation.
Dr. Frincke talks about Mission-Oriented Research; how rock climbing is similar to sorting through messy data; and how adversarial machine learning is an area of active research.

Read More

How Data Science is Revolutionizing the Oil & Gas Industry | Stephanie Gottlieb-Zeh | WiDS 2017

Thumbnail for How Data Science is Revolutionizing the Oil & Gas Industry | Stephanie Gottlieb-Zeh | WiDS 2017

The upstream oil & gas industry (i.e. the exploration for and production of hydrocarbons) needs to reap the benefits of new technology to improve efficiency. Making more effective use of increasing amounts of collected data is on the verge of transforming the business.

Transformation through data analytics is equally relevant on both the operational and financial sides of the business.
On the upstream operational side: for decades now, we have been inventing new and increasingly sophisticated tools (both hardware and software) to generate new data types that extend the boundaries of geoscience knowledge, and allow us to understand our hydrocarbon reservoirs in ever increasing detail. Historically, we have processed only a fraction of the data collected, but that is changing. Now, among the most important criteria governing the efficiency of oil and gas companies are not only the hugely increased volume of data collected but also the variety, velocity and veracity of information that can be extracted from that data. That’s data science! Data analytics as a discipline is now increasingly integrated within our upstream workflows in drilling, reservoir characterization and the actual production (extraction) of hydrocarbons in the most economically efficient ways possible. To this end, one goal is the development of an analytics platform that will perform a key role in increasing productivity through the simultaneous optimization of drilling planning and execution, the improvement of asset utilization and the overall reduction of non-productive time.

On the financial side: the oil and gas industry has a long history of being secretive and, as a result, judgment of the quality and accuracy of non-technical data has proved very difficult. In general, insufficient attention has been paid to addressing these challenges leading to unnecessary volatility in price movements through inadequate or conflicting data, and this volatility impacts decision-making within companies. In the information age, where markets react instantaneously to a multitude of data sources, it is time to understand better this key driver of our industry. Decision-enabling information is extremely critical to the efficient functioning of an industry that is driven by the signals coming from commercial markets. Understanding the quality and accuracy of that information through data science is a key enabler in filling a major gap currently preventing more effective management of oil and gas company assets.

Digital transformation, implying the transition from desktop to the cloud and mobile devices, easy access to information, new scalable online services and automated industrial workflows, is about to radically change the way we work in any industry (oil and gas, defense, transport, automotive, medicine, telecom, logistics, etc). This is no longer a trend, but a reality clearly demonstrated by the world’s most valuable companies adopting expanded and enhanced data analytics in response to common drivers of operational efficiency, operational safety and accuracy of real-time decision-making. That’s the promise of Big Data, to really understand the systems that make our technological industry. As you begin to understand the interactions of all the constituent components then you can build systems that are better and more effective at addressing the key industry drivers, irrespective of the industry. New technology is increasingly playing a huge new role. Data is the new oil!

Dr. Gottlieb-Zeh describes how data science is transforming the oil and gas industry for better planning and efficiency, for both drilling and production.

Read More

Making a complete toolbox for quantitative biological data analyses | Susan Holmes | WiDS 2017

Thumbnail for Making a complete toolbox for quantitative biological data analyses | Susan Holmes | WiDS 2017

Dr. Holmes shares a survey of the current challenges in the analyses of heterogeneous biological data. Combining networks, contingency tables and data from multiple omics domains provides the analysts with multiple choices. The result can be an erroneous p-value or a complicated workflow, both of which can be irreproducible. I will survey some of the recent approaches to this challenge.

Dr. Susan Holmes, Professor of Statistics, describes processes for analyzing large messy microbiome data sets, and the importance of reproducibility.

Read More

Beware what you ask for: The secret life of predictive models | Claudia Perlich | WiDS 2017

Thumbnail for Beware what you ask for: The secret life of predictive models | Claudia Perlich | WiDS 2017

Predictive modeling and its variants are at the core of an increasing number of technical advances that touch us in every aspect of our life. Today, nobody doubts the ability of machines to learn from historical data and predict with far higher accuracy than any human. But real world applications of machine learning are often a far cry from the well understood academic assurances of how these algorithms should behave. In this talk I will share some practical lessons when models had a surprising secret life and did something very different from what I thought I had asked them to do. As the creators of machine learning solutions it is our responsibility to pay attention to the often subtle symptoms and to let our human intuition be the gate keeper deciding when our models are ready to be released ‘into the wild’.

Claudia Perlich, Chief Scientist at Dstillery, talks about how data scientists need to use a combination of data science and intuition to deliver accurate insights from data sets.

Read More

Susan Athey: Data science is about using data to answer questions and test hypotheses

Thumbnail for Susan Athey: Data science is about using data to answer questions and test hypotheses

Susan Athey, the Economics of Technology Professor at Stanford Graduate School of Business, has always been interested in the intersection of economics and computer science. As an undergraduate she was a math, computer science and economics triple major. She explains that combining economics, social science, engineering and machine-learning tools allows you to answer questions in a way that wasn’t possible before.

Read More

Jennifer Chayes: Being an engineer is not a lonely, geeky job; it’s collaborative and creative

Thumbnail for Jennifer Chayes: Being an engineer is not a lonely, geeky job; it’s collaborative and creative

Jobs in engineering have the potential to have a huge impact on the world, says Jennifer Chayes, managing director of Microsoft Research. The merging of data science and healthcare is particularly exciting to Chayes. As we begin to use technology to monitor our health more consistently, we’ll be able to use this data to figure out how to treat disease in a more personalized fashion. This will revolutionize healthcare and the quality of our lives.

Read More

Netflix: A confluence of metrics, algorithms, and experimentation | Caitlin Smallwood | WiDS 2015

Thumbnail for Netflix: A confluence of metrics, algorithms, and experimentation | Caitlin Smallwood | WiDS 2015

Whether it’s personalizing recommendations of movies and TV shows or optimizing the streaming of video bits to people’s households, Netflix relies heavily on data science techniques. We believe in continuous learning through predictive modeling & algorithms, experimentation, and principled metric design. This talk will highlight Netflix’s core data science strategies and uses, with particular focus on our successes and challenges in experimenting with personalization algorithms.

Read More

Network Science: From the Online World to Cancer Genomics | Jennifer Chayes | WiDS 2015

Thumbnail for Network Science: From the Online World to Cancer Genomics | Jennifer Chayes | WiDS 2015

Everywhere we turn these days, we find that networks can be used to describe relevant interactions. In the high tech world, we see the Internet, the World Wide Web, mobile phone networks, and a variety of online social networks. In economics, we are increasingly experiencing both the positive and negative effects of a global networked economy. In epidemiology, we find disease spreading over our ever-growing social networks, complicated by mutation of the disease agents. In biomedical research, we are beginning to understand the structure of gene regulatory networks, with the prospect of using this understanding to manage many human diseases. In this talk, I look quite generally at some of the models we are using to describe these networks, processes we are studying on the networks, algorithms we have devised for the networks, and finally, methods we are developing to indirectly infer network structure from measured data. I’ll discuss in some detail particular applications to cancer genomics, applying network algorithms to suggest possible drug targets for certain kinds of cancer.

Read More

Enabling Breakthrough Insights | Diane Bryant | WiDS 2015

Thumbnail for Enabling Breakthrough Insights | Diane Bryant | WiDS 2015

The vast ocean of data created in today’s digital world offers enormous potential. However, the key to unlocking that potential lies not in the data itself, but in the science that refines it. The well-defined processes and toolsets designed for legacy BI solutions do not meet the needs of today’s big data analytics environments. Diane will share Intel’s investments in both the technology and the ecosystem to enable the next breakthrough insights.

Read More