Preparing for a career in DS | Montse Cordero, Adriana Velez Thames, Elaine Yi Xu, Sanne Smith

Panel: Preparing for a career in data science

Moderator:
Sanne Smith is the director of the Education Data Science master’s program at Stanford University and a lecturer at the Stanford Graduate School of Education. She teaches courses that introduce students to coding, data wrangling and visualization, various statistical methods, and the interpretation of quantitative research. She studies social networks and thriving in diverse contexts.

Panelists:
Montse Cordero is a Mathematics Designer at youcubed, a center at Stanford University that aims to inspire, educate and empower teachers of mathematics, transforming the latest research on maths learning into accessible and practical forms. They are a co-author and professional development provider for youcubed’s Explorations in Data Science high school curriculum and have participated in multiple national summits for the advancement of data science in K-12 education (Data Science 4 Everyone Coalition, National Academies of Sciences Engineering and Medicine). Montse is also a mathematician interested in work at the intersection of combinatorics, algebra, and geometry. In all facets of their work, Montse endeavors to change the ways our culture thinks and talks about mathematics.

Adriana Velez Thames, Geophysicist-Data Scientist, Springboard Alumni. Adriana recently completed a transition to data science after many years in the oil and gas industry as a Senior Geophysicist. Her primary focus was seismic data processing for imaging the Earth’s subsurface to guide energy exploration projects. From 2012 to 2019, she worked at TGS, where her responsibilities included QC of deliverables, testing of internal software updates, and conducting test projects and benchmarks. This involved extensive analysis and manipulation of terabyte-sized digital subsurface data using sophisticated algorithms. She believes that data-driven decisions are the best way to solve problems in any industry. Born in Colombia and holding post-graduate degrees from Russia, she is fluent in English and Spanish and has working proficiency in Russian. She is currently continuing her studies in data science and spatial data science.

Elaine Yi Xu, Staff Business Data Analyst, Intuit, is a passionate data analytics and data science practitioner, putting her undergrad degree in Statistics and MS in Info Sys and DS into everyday business decision-making. She has been working in-house in web analytics, product analytics, and marketing analytics across multiple industries, including retail (lululemon), automotive (Kelley Blue Book), and most recently at Intuit, the global technology platform. She specializes in the measurement of go-to-market marketing strategies, assessment of marketing campaign effectiveness, optimization of user experience, and A/B testing. She strives to be the connective tissue between business, analytics, engineering, and data science, combining all facets of science to help arrive at optimal business decisions.

Bringing Motion Diffusion Models to Immersive Entertainment | Jhanvi Shriram and Ketaki Shriram

Jhanvi Shriram, Co-Founder and CEO of Krikey, and Ketaki Shriram, Co-Founder and CTO of Krikey, present the Technical Vision Talk “Bringing Motion Diffusion Models to Immersive Entertainment”. Most generative models thus far have focused on utilizing LLMs for consumer products. The introduction of motion diffusion models to this space provides a novel avenue to engage consumers, especially in the field of entertainment. This talk covers a text-to-animation motion diffusion model that generates animations in less than 5 minutes. These animations can be applied to any 3D file and used with any 3D software. Practical applications include optimizing production pipelines for gaming, film, and immersive learning. We will also cover the implications for these industries as they adopt new generative tools in production workflows. To learn more about our tool and try it for yourself, please visit krikey.ai.

Jhanvi Biography:
Jhanvi is currently the CEO of Krikey, an AI gaming tools service that she co-founded with her sister. Krikey recently closed their Series A round, led by Reliance Jio, India’s biggest telecom operator. Prior to Krikey, Jhanvi worked at YouTube as a Production Strategist on operations and creator community programs, which sparked her interest in working with content creators. She also worked at JauntVR and Participant Media. In 2014, Jhanvi and her sister, Ketaki Shriram, co-produced a feature film titled “True Son,” which followed a 22-year-old’s political campaign in Stockton, CA. The film premiered at the 2014 Tribeca Film Festival and was acquired by FusionTV/Univision. Jhanvi holds a BA (Political Science and African Studies) and MBA from Stanford University, and an MFA (Producing) from USC. You can learn more here: krikey.ai.

Ketaki Shriram Biography:
Dr. Shriram is a scientist, film producer, and wildlife photographer interested in the impact of immersive worlds on human behavior. She is currently the Chief Technology Officer at Krikey, an AI gaming tools service that she co-founded with her sister. Krikey recently closed their Series A round, led by Reliance Jio, India’s biggest telecom operator. Dr. Shriram received her BA, MA, and PhD at the Stanford Virtual Human Interaction Lab. She previously worked at Google [x] and at Meta’s Reality Labs. Dr. Shriram was selected for the Forbes 30 Under 30 2020 Class in the Gaming category. You can learn more here: krikey.ai.

Openscapes Supporting Kinder Science for Future Us | Julia Stewart Lowndes

Julia Stewart Lowndes, Director, Openscapes, presents the Technical Vision Talk “Openscapes: Supporting Kinder Science for Future Us”. At Openscapes, we believe open science can accelerate interoperable, data-driven solutions and increase diversity, equity, inclusion, and belonging in research and beyond. Our main activity is mentoring environmental and Earth science teams in open science, and connecting and elevating these researchers both through tech like R, Python, Quarto, and JupyterHubs and through communities like RLadies, Black Women in Ecology, Evolution and Marine Science, Ladies of Landsat, and NASA. We will share stories and approaches about open science as a daily practice – better science for future us – and welcome you to join the movement.

Biography:
Julia Stewart Lowndes, PhD, is a marine ecologist working at the intersection of actionable environmental science, data science, and open science. Julia’s main focus is mentoring teams to develop technical and leadership mindsets and skills for data-intensive research, grounded in climate solutions, inclusion, and kindness. She founded Openscapes in 2018 as a Mozilla Fellow and Senior Fellow at the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California Santa Barbara (UCSB), having earned her PhD from Stanford University in 2012 studying drivers and impacts of Humboldt squid in a changing climate.

Principles of Good Data Viz | Jenn Schilling

What key principles of design and data viz do you need to know to create effective and clear graphs? This talk will cover preattentive attributes, Gestalt principles, and principles of color use. It will provide the key concepts from design and data viz research that you need to know to communicate data effectively. The talk will include examples to demonstrate applying the concepts and comparing data viz effectiveness.

This workshop was conducted by Jenn Schilling, Founder of Schilling Data Studio.

Creating Data Visualizations with Spotify Data | Nicole Crosdale

This workshop is targeted toward those who are new to coding. It will teach you how to analyze your personal Spotify data, create visualizations, and prepare the data for use in business processes. The demonstration uses Python, so a new coder will learn foundational coding syntax that carries over to other languages.
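
As a rough illustration of this kind of analysis (not the workshop's own notebook), the sketch below loads a hypothetical Spotify streaming-history export and plots listening time per artist. The file name and field names are assumptions based on the typical "Download your data" export and may differ for your account.

    import json
    import pandas as pd
    import matplotlib.pyplot as plt

    # Assumed file and field names from a typical Spotify data export.
    with open("StreamingHistory0.json", encoding="utf-8") as f:
        history = pd.DataFrame(json.load(f))

    # Convert play time to minutes and aggregate by artist.
    history["minutesPlayed"] = history["msPlayed"] / 60_000
    top_artists = (history.groupby("artistName")["minutesPlayed"]
                   .sum()
                   .sort_values(ascending=False)
                   .head(10))

    top_artists.plot(kind="barh", title="Top 10 artists by listening time (minutes)")
    plt.tight_layout()
    plt.show()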

This workshop was conducted by Nicole Crosdale, a Graduate student at the University of Florida.

Resources and slides for this workshop: https://bit.ly/spotify_resources

Using MATLAB and Python Together | MathWorks

You’ve heard it before – Python vs MATLAB vs R – but in reality, programming languages are often used together! In this hands-on workshop, you’ll learn how to use MATLAB and Python together with practical examples. Specifically, you’ll learn how to:
– Call Python libraries from MATLAB
– Call user-defined Python commands, scripts, and modules
– Manage and convert data between languages
– Package MATLAB algorithms to be called from Python
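
One direction of this interop, calling MATLAB from Python, can be sketched with the MATLAB Engine API for Python. This is a minimal illustration under my own assumptions, not the workshop's material, and it requires MATLAB plus the engine package installed for your Python environment.

    # Requires MATLAB and the "MATLAB Engine API for Python" installed
    # (see MATLAB's extern/engines/python directory).
    import matlab.engine

    eng = matlab.engine.start_matlab()   # launch a MATLAB session

    # Call built-in MATLAB functions from Python; scalars convert automatically.
    print(eng.sqrt(42.0))

    # nargout controls how many outputs MATLAB returns.
    m, idx = eng.max(matlab.double([1.0, 5.0, 3.0]), nargout=2)
    print(m, idx)

    eng.quit()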

This workshop was conducted by Heather Gorr, Senior Product Marketing Manager, MATLAB and Grace Woolson, Student Competitions Technical Evangelist – Data Science at Mathworks.

Resources and slides for this workshop: https://bit.ly/matlab_python_slides

Low-Code AI: Making AI accessible to everyone | Mathworks

Learn how you can apply AI in your field without extensive knowledge in programming. This hands-on session includes a quick recap on the fundamentals of AI and two exercises where you will learn how to classify human activities using MATLAB® interactive tools and apps:

– Accessing and preprocessing data acquired from a mobile device
– Classifying the labeled data using two apps: The Classification Learner app and the Deep Network Designer app

At the end of the workshop, you will be able to design and train different machine learning and deep learning models without extensive programming knowledge. In addition, you will also learn how to automatically generate code from the interactive workflow. This will not only help you to reuse the models without manually going through all the steps but also to learn programming or advance your coding skills.

This workshop was conducted by Gaby Arellano Bello and Neha Sardesai, Senior Application Engineers in Education at Mathworks.

Access resources for this workshop: https://bit.ly/low_code_ai_resources

Introduction to Explainable AI | Supreet Kaur

Responsible AI is reaching new heights these days, and companies have started exploring Explainable AI as a means to better explain results to senior leadership and increase their trust in AI algorithms. This workshop gives an overview of the area, its importance today, and some practical techniques that you can use to implement it. As a bonus, it also covers some industry use cases and the limitations of these techniques. Join me in unboxing this black box!

This workshop was conducted by Supreet Kaur, Assistant Vice President at Morgan Stanley.

Slides for this workshop: https://bit.ly/explainableai_slides

Exploratory data analysis using personal data from Strava and Apple Watch | Deepnote

During the workshop, we show a simple exploratory data analysis using Deepnote. We focus on personal data from the Camino de Santiago pilgrimage, which we retrieved through the Strava API, and show you how to get the same data from your own device. Using this data, we explain the theory behind exploratory data analysis and walk through some use cases.

This workshop was conducted by Tereza Vaňková and Alleanna Clark of Deepnote.

Resources used in this workshop:
– https://bit.ly/deepnote_notebook
– https://bit.ly/deepnote_slides

Dashboard Design Thinking | Jenn Schilling

Best practices in data visualization and dashboard design are numerous and sometimes contradictory, but a straightforward method to apply design thinking to creating dashboards is effective and universally applicable. This session will cover the details of design thinking and how it can be applied to dashboard development to create impactful dashboards that meet user needs and provide valuable insights.

This workshop was conducted by Jenn Schilling, Senior Research Analyst at the University of Arizona.

Exploring Hidden Markov Models | Julia Christina Costacurta

Hidden Markov Models (HMMs) are used to describe and analyze sequential data in a wide range of fields, including handwriting recognition, protein folding, and computational finance. In this workshop, we will cover the basics of how HMMs are defined, why we might want to use one, and how to implement an HMM in Python. This workshop might be of particular interest to attendees from May 25’s “Intro to Markov Chains and Bayesian Inference” session. Introductory background in probability, statistics, and linear algebra is assumed.
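
For a taste of the implementation side, here is a minimal NumPy sketch, independent of the workshop materials, of the forward algorithm, which computes the likelihood of an observation sequence under an HMM with toy parameters chosen for illustration.

    import numpy as np

    # Toy 2-state HMM (parameters are illustrative assumptions).
    pi = np.array([0.6, 0.4])              # initial state distribution
    A = np.array([[0.7, 0.3],              # transition matrix A[i, j] = P(s_j | s_i)
                  [0.4, 0.6]])
    B = np.array([[0.9, 0.1],              # emission matrix B[i, k] = P(obs_k | s_i)
                  [0.2, 0.8]])

    def forward(obs):
        """Return P(observation sequence) via the forward algorithm."""
        alpha = pi * B[:, obs[0]]          # alpha_1(i) = pi_i * B_i(o_1)
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]  # alpha_t = (alpha_{t-1} A) * B(o_t)
            # (in practice, rescale alpha at each step to avoid underflow)
        return alpha.sum()

    print(forward([0, 1, 1, 0]))           # likelihood of a short observation sequence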

This workshop was conducted by Julia Christina Costacurta, PhD Candidate at Stanford University

Useful resources for this workshop:
– https://bit.ly/hmm_presentation
– https://bit.ly/hmm_tutorial_notebook

Alternative approaches to A/B Experiments – 3 Causal Impact Approaches | Jennifer Vlasiu

Make answering ‘what if’ analysis questions a whole lot easier by learning about state-of-the-art, end-to-end applied frameworks for causal inference.

We will cover:
– Microsoft’s DoWhy package for causal impact in Python (DoWhy: An end-to-end library for causal inference – documentation at microsoft.github.io)
– Bayesian Causal Impact in R
– MLE Causal Impact in Python
– Bonus: A/A testing, when to use it and why it matters

We will apply these models in the context of understanding the impact of a marketing rewards campaign, as well as the impact of a product/feature upgrade.
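
As a minimal sketch of the first item, the DoWhy workflow (identify, estimate, refute) looks roughly like this. The data and column names are placeholders of my own, not the workshop's dataset.

    import numpy as np
    import pandas as pd
    from dowhy import CausalModel

    # Placeholder data: a binary "rewards campaign" treatment and a spend outcome,
    # both influenced by a confounder (age).
    rng = np.random.default_rng(0)
    n = 5_000
    age = rng.normal(40, 10, n)
    treated = rng.binomial(1, 1 / (1 + np.exp(-(age - 40) / 10)), n)
    spend = 50 + 5 * treated + 0.5 * age + rng.normal(0, 5, n)
    df = pd.DataFrame({"treated": treated, "age": age, "spend": spend})

    model = CausalModel(data=df, treatment="treated", outcome="spend",
                        common_causes=["age"])
    estimand = model.identify_effect()
    estimate = model.estimate_effect(
        estimand, method_name="backdoor.propensity_score_matching")
    print("estimated effect:", estimate.value)

    # Sanity-check the estimate with a refutation test.
    refute = model.refute_estimate(estimand, estimate,
                                   method_name="placebo_treatment_refuter")
    print(refute)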

This workshop was conducted by Jennifer Vlasiu, Data Science & Big Data Instructor at York University

Useful resources for this workshop:
– https://bit.ly/github_casual_impact

Introduction to Deep Learning for Image Classification | Cindy Gonzales

Image classification is a task in the computer vision domain that takes in an image as input and outputs a label for that image. Deep learning is the most effective modern method for modeling this task. In this interactive workshop, we will walk through a Jupyter Notebook that gives an overview of how to perform multi-class image classification in Python using the PyTorch library. The intention is to give the audience a broad overview of the classification task and inspire participants to explore the vast fields of visual recognition and computer vision at large.
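
The core training loop for multi-class image classification in PyTorch looks roughly like the sketch below (not the workshop notebook itself). The dataset path and epoch count are placeholders, and the pretrained-weights string assumes a recent torchvision.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    # Placeholder dataset laid out as data/train/<class_name>/*.jpg
    transform = transforms.Compose([transforms.Resize((224, 224)),
                                    transforms.ToTensor()])
    train_ds = datasets.ImageFolder("data/train", transform=transform)
    loader = DataLoader(train_ds, batch_size=32, shuffle=True)

    # Small pretrained backbone with a new classification head.
    model = models.resnet18(weights="IMAGENET1K_V1")
    model.fc = nn.Linear(model.fc.in_features, len(train_ds.classes))

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    model.train()
    for epoch in range(3):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: loss {loss.item():.3f}")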

This workshop was conducted by Cindy Gonzales, Data Science Team Lead for the Biosecurity and Data Science Applications Group at Lawrence Livermore National Laboratory

Useful resources for this workshop:
– https://bit.ly/deep_learning_files
– https://bit.ly/deep_learning_notebook

Counterfactual Explanations: The Future of Explainable AI | Aviv Ben Arie

As data scientists, the ability to understand our models’ decisions is important, especially for models that could have a high impact on people’s lives. This may pose several challenges, as most models used in the industry are not inherently explainable. Today, the most popular explainability methods are SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanation). Each method offers convenient APIs, backed by solid mathematical foundations, but falls short in intuitiveness and actionability.

In this workshop/article, I will introduce a relatively new model explanation method – Counterfactual Explanations (CFs). CFs are explanations based on minimal changes to a model’s input features that lead the model to output a different (mostly opposite) predicted class. CFs have been shown to be more intuitive for humans to comprehend and provide actionable feedback, compared to traditionalSHAP and LIME methods. I will review the challenges in this novel field (such as how to ensure that the CF proposes changes which are feasible), provide a birds-eye view of the latest research and give my perspective, based on my research in collaboration with Tel Aviv University, on the various aspects in which CFs can transform the way data science practitioners understand their ML models.
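
To make the idea concrete, here is a small, library-agnostic sketch of my own (not a production CF method and not the speaker's approach) that searches for a minimal perturbation of one instance that flips a classifier's prediction.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=4, random_state=0)
    clf = LogisticRegression().fit(X, y)

    x0 = X[0]                                   # instance to explain
    target = 1 - clf.predict([x0])[0]           # the "opposite" class

    # Naive counterfactual search: random perturbations that reach the target
    # class while staying as close to x0 as possible.
    best, best_dist = None, np.inf
    rng = np.random.default_rng(0)
    for _ in range(5_000):
        cand = x0 + rng.normal(0, 0.5, size=x0.shape)
        if clf.predict([cand])[0] == target:
            dist = np.linalg.norm(cand - x0)
            if dist < best_dist:
                best, best_dist = cand, dist

    if best is None:
        print("no counterfactual found; widen the search")
    else:
        print("original prediction:", clf.predict([x0])[0])
        print("counterfactual prediction:", clf.predict([best])[0])
        print("feature changes:", best - x0)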

This workshop was conducted by Aviv Ben Arie, Data Science Manager at Intuit

Effective & Ideal Data Presentation using Visualization Techniques & Lucid Perceptions | Pariza

Research shows that the human brain processes visual information more readily than text, and good data visualizations demonstrate this every day.

Data visualization is the last phase in the data life cycle. It is the art and science of making data easy to understand and consume for the end user. Data visualizations present clusters of data in an easy-to-understand layout, which is why visualization becomes essential for large amounts of complex data. Ideal data visualization shows the right amount of data, in the right order, in the right visual form, to convey the highest-priority information to the right audience and for the right purpose. If the data is presented in too much detail, the consumer of that data might lose both interest and the insight.

There are innumerable types of visual graphing techniques available for visualizing data. The right visualization arises from an understanding of the totality of the situation in context of the business domain’s functioning, consumers’ needs, nature of data, and the appropriate tools and techniques to present data. Ideal data visualization should tell a true, complete and simple story backed by data effectively, while keeping it insightful and engaging.

This workshop was conducted by Pariza Kamboj, Professor at Sarvajanik College of Engineering & Technology (SCET).

Useful resources for this workshop:
– Workshop #1: https://youtu.be/lRBuknaPRNI
– Jupyter code: https://bit.ly/jupyter_notebook2
– https://bit.ly/cars3_data
– https://bit.ly/execution_google_colab
– https://bit.ly/anaconda_installation_…

Open-sourced Propensity Model Package: From Modeling to Activation (Workshop #2) | Google

A propensity model attempts to estimate the propensity (probability) of a behavior (e.g., conversion, churn, purchase) happening during a well-defined future time period, based on historical data. It is a technique widely used by organizations and marketing teams for providing targeted messages, products or services to customers. This workshop shares an open-sourced package developed by Google for building an end-to-end propensity modeling solution using datasets like GA360, Firebase or CRM, and for using the propensity predictions to design, activate and measure the impact of a media campaign. The package has enabled companies from e-commerce, retail, gaming, CPG and other industries to make accelerated data-driven marketing decisions.
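
Stripped of the productionized pieces (GA360, BigQuery, activation), the statistical core of a propensity model is a classifier trained on features from a historical window to predict a conversion label in a future window. A minimal sklearn sketch with hypothetical file and column names, not the Google package itself:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Hypothetical snapshot: features from the lookback window, "converted" is
    # whether the user converted in the following prediction window.
    df = pd.read_csv("customer_snapshot.csv")
    features = ["sessions_30d", "pageviews_30d", "days_since_last_visit"]
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df["converted"], test_size=0.2, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]      # propensity scores in [0, 1]
    print("AUC:", roc_auc_score(y_test, scores))

    # Scores can then be bucketed (e.g., into deciles) to pick audiences for activation.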

This workshop was conducted by Lingling Xu, Bingjie Xu, Shalini Pochineni and Xi Li, data scientists on the Google APAC team.

Useful resources for this workshop:
– Workshop #1: https://youtu.be/rQhQca8RCuM
– https://bit.ly/propensity_modeling_pa…
– https://bit.ly/bigquery_export_schema
– https://bit.ly/ga_sample_dataset
– https://bit.ly/ml_windowing_pipeline

Predicting customer choice: A case study on integrating AI within a discrete choice model | Kathryn

Neural networks have been widely celebrated for their power to solve difficult problems across a number of domains. We explore an approach for leveraging this technology within a statistical model of customer choice. Conjoint-based choice models are used to support many high-value decisions at GM. In particular, we test whether using a neural network to model customer utility enables us to better capture non-compensatory behavior (i.e., decision rules where customers only consider products that meet acceptable criteria) in the context of conjoint tasks. We find the neural network can improve hold-out conjoint prediction accuracy for synthetic respondents exhibiting non-compensatory behavior only when trained on very large conjoint data sets. Given the limited amount of training data (conjoint responses) available in practice, a mixed logit choice model with a traditional linear utility function outperforms the choice model with the embedded neural network.
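
For readers unfamiliar with conjoint-based choice models, the probability that a respondent picks alternative i from a choice set is typically modeled as a softmax over utilities. The sketch below is my own illustration, not GM's model; it contrasts a linear utility with a small neural-network utility whose weights would normally be learned from conjoint responses.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 3))          # 4 alternatives x 3 conjoint attributes

    def choice_probs(utilities):
        """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
        e = np.exp(utilities - utilities.max())
        return e / e.sum()

    # Linear utility: V_i = x_i . beta
    beta = np.array([0.8, -0.5, 0.3])
    print(choice_probs(X @ beta))

    # Neural-network utility: V_i = w2 . tanh(W1 x_i)  (illustrative random weights)
    W1, w2 = rng.normal(size=(5, 3)), rng.normal(size=5)
    print(choice_probs(np.tanh(X @ W1.T) @ w2))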

This workshop was conducted by Kathryn Schumacher, Staff Researcher in the Advanced Analytics Center of Expertise within General Motors’ Chief Data and Analytics Office.

Basic to Intermediate Level SQL | Sreelaxmi Chakkadath

The workshop focuses on basic to intermediate SQL. We will start with querying a database and using filters to clean the data, then cover joining different tables; aggregate functions and the use of ‘CASE WHEN’ for better query performance; subqueries and Common Table Expressions (CTEs), with a comparison between them; window functions, including LEAD and LAG and the scenarios in which they are useful; and pivot tables, including when not to use them!
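
A compact way to try a few of these features without installing a database server is SQLite from Python; the sketch below (my example, not the workshop's PostgreSQL scripts) demonstrates a CTE, CASE WHEN, and a window function on a toy table.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE sales (id INTEGER, region TEXT, amount REAL);
    INSERT INTO sales VALUES
      (1, 'west', 120), (2, 'west', 80), (3, 'east', 200), (4, 'east', 50);
    """)

    query = """
    WITH regional AS (                       -- a Common Table Expression
      SELECT region,
             amount,
             CASE WHEN amount >= 100 THEN 'large' ELSE 'small' END AS size,
             RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
      FROM sales
    )
    SELECT * FROM regional ORDER BY region, rank_in_region;
    """
    for row in con.execute(query):
        print(row)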

This workshop was conducted by Sreelaxmi Chakkadath, Data Science Master’s student at Indiana University Bloomington.

Useful resources for this workshop:
– PostgreSQL install link: https://www.postgresql.org/
– https://bit.ly/sql_workshop_script
– https://bit.ly/sql_workshop_codes
– https://bit.ly/sql_ppt_slides

Demystifying Data Pre-processing & Data Wrangling for Data Science | Pariza Kamboj

In the current era, data science is evolving rapidly and proving decisive in ERP (Enterprise Resource Planning). The datasets required for building analytical models are collected from many sources, such as government data, academic data, web scraping, APIs, databases, files, sensors, and more. Such real-world data cannot be used directly for analysis because it is often inconsistent, incomplete, and likely to contain many errors. We often hear the phrase “garbage in, garbage out”: dirty or messy data riddled with inaccuracies and errors results in a badly trained model, which in turn can lead to poor business decisions and is sometimes even hazardous to the domain. Even a powerful algorithm fails to provide correct analysis when applied to bad data. Data must therefore be curated, cleaned and refined before it can be used in data science and in products based on data science. This is the role of “data preparation”, which includes two methods: data pre-processing and data wrangling. Most data scientists spend the majority of their time on data preparation.
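
A few of these recurring preparation steps, shown as a small pandas sketch with a hypothetical file and column names:

    import pandas as pd

    df = pd.read_csv("raw_survey.csv")                 # hypothetical messy input

    df = df.drop_duplicates()                          # remove duplicate records
    df.columns = df.columns.str.strip().str.lower()    # normalize column names
    df["age"] = pd.to_numeric(df["age"], errors="coerce")       # fix wrong dtypes
    df["income"] = df["income"].fillna(df["income"].median())   # impute missing values
    df = df[df["age"].between(0, 110)]                 # drop impossible values

    print(df.info())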

This workshop was conducted by Pariza Kamboj, Professor at Sarvajanik College of Engineering & Technology (SCET).

Useful resources for this workshop:
– https://bit.ly/jupyter_code
– https://bit.ly/cars3_dataset
– https://bit.ly/execution_google_colab
– https://bit.ly/anaconda_installation_…

How can we make sense of the unseen world? Using AI, sensors & IoT for scene exploration | Mathworks

Have you ever wondered how buried objects can be detected, and whether your mobile device could be used to detect them? Metal is all around us, often unseen because it is buried, and it is detected in many places on Earth. In fact, metal detection is connected to a variety of applications: providing insight into land use, locating historic artifacts, determining the presence of various devices, and more.

In our workshop, we will explore using your own mobile device as a metal detector in your local environment. We will provide an overview of the basics of sensors, AI, and IoT, which are required for building a prototype of our application. We’ll do hands-on exercises where you will acquire data from sensors, obtain summary statistics on the acquired data, and train a human activity classifier to understand what was done while the data was being collected. We will also discuss topics to be mindful of with this kind of application, such as considerations around the collection and usage of location data. You will leave motivated and ready to use sensors, AI, and IoT in your own projects via MATLAB!

Workshop presenters:
– Louvere Walker-Hannon, Application Engineering Senior Team Lead, MathWorks
– Loren Shure, Consulting Application Engineer, MathWorks
– Sarah Mohamed, Senior Software Engineer, MathWorks
– Shruti Karulkar, Quality Engineering Manager, MathWorks

A Beginner’s Tutorial for the WiDS Datathon 2022 challenge

This beginner’s tutorial walks through two methodologies that you might consider when deciding how to develop a model.

Climate change is one of the critical challenges facing humanity today. Over the past few years, there have been widespread climate-driven disruptive events such as floods and wildfires. The devastation caused by these events has resulted in an awareness of the urgency of the issue. Indeed, people and governments have started working together in the direction of climate-focused coordinated action. At WiDS, we believe that it will be important for future data scientists to gain familiarity with mathematical and statistical models used to model climate data. For this reason, the focus of the WiDS Datathon this year is a climate-focused challenge: prediction of building energy consumption.

AI & Neuroscience: Combining Real-Time Brain Imaging and Machine Learning | Romy Lorenz

Cognitive neuroscientists are often interested in broad research questions, yet use overly narrow experimental designs by considering only a small subset of possible experimental conditions. This limits the generalizability and reproducibility of many research findings. In this workshop, I present an alternative approach, “The AI Neuroscientist”, that resolves these problems by combining real-time brain imaging with a branch of machine learning, Bayesian optimization. Neuroadaptive Bayesian optimization is an active sampling approach that makes it possible to intelligently search through large experiment spaces with the aim of optimizing an unknown objective function. It thus provides a powerful strategy to efficiently explore many more experimental conditions than is currently possible with standard brain imaging methodology. Alongside methodological details on non-parametric Bayesian optimization using Gaussian process regression, I will present results from a clinical study where we applied the method to map cognitive dysfunction in stroke patients. Our results demonstrate that this technique is both feasible and robust for clinical cohorts as well. Moreover, our study highlights the importance of moving beyond traditional ‘one-size-fits-all’ approaches where patients are treated as one group. Our approach can be combined with brain stimulation or other therapeutics, thereby opening new avenues for precision medicine targeting a diverse range of neurological and psychiatric conditions.
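
The optimization loop at the heart of this approach can be sketched with a Gaussian process surrogate and an expected-improvement acquisition function. This is a generic illustration, not the speaker's pipeline; the objective function below is a synthetic stand-in for a measured brain response.

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def objective(x):                      # synthetic stand-in for a real-time response
        return -(x - 0.3) ** 2 + 0.05 * np.sin(20 * x)

    grid = np.linspace(0, 1, 200).reshape(-1, 1)       # candidate experiment conditions
    X = np.array([[0.1], [0.9]])                       # initial observations
    y = objective(X).ravel()

    for _ in range(10):
        gp = GaussianProcessRegressor(kernel=RBF(0.1), alpha=1e-3).fit(X, y)
        mu, sigma = gp.predict(grid, return_std=True)
        # Expected improvement over the best observation so far.
        improve = mu - y.max()
        z = improve / np.maximum(sigma, 1e-9)
        ei = improve * norm.cdf(z) + sigma * norm.pdf(z)
        x_next = grid[np.argmax(ei)].reshape(1, -1)
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next).ravel())

    print("best condition found:", X[np.argmax(y)], "response:", y.max())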

This workshop was conducted by Romy Lorenz, Postdoctoral Fellow at Stanford University and University of Cambridge

How do I get started with Machine Learning? | Mathworks

Data Science workflows typically entail using Machine Learning.

Machine Learning can provide insight into various datasets and can assist with automating various types of analysis.

In this workshop you will explore a process for getting started with machine learning: interactively training a model to predict tsunami intensity and carrying out other related tasks.

This workshop was conducted by Louvere Walker-Hannon and Heather Gorr from MathWorks.

Pocket AI and IoT, or How to be a Data Scientist using Your Mobile Device | Mathworks

Want to learn more about trends like AI, IoT and wearable tech? In one hour, we will cut through the hype by building a “smart” fitness tracker using your own mobile device. We’ll do hands-on exercises: you’ll acquire data from sensors, design a step counter and train a human activity classifier. You will leave motivated and ready to use machine learning and sensors in your own projects!
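
As a rough illustration of the step-counter piece, shown here in Python rather than the workshop's MATLAB tooling (an assumption for this sketch), peaks in the acceleration magnitude can be counted as steps. The log file and its layout are hypothetical.

    import numpy as np
    from scipy.signal import find_peaks

    # Hypothetical accelerometer log exported from a phone: columns ax, ay, az
    # (m/s^2), sampled at roughly 50 Hz.
    acc = np.loadtxt("walk_log.csv", delimiter=",", skiprows=1)
    magnitude = np.linalg.norm(acc, axis=1)
    magnitude = magnitude - magnitude.mean()           # remove the gravity offset

    # Each sufficiently large, well-separated peak is counted as one step.
    peaks, _ = find_peaks(magnitude, height=1.0, distance=15)
    print("estimated steps:", len(peaks))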

This workshop was conducted by Louvere Walker-Hannon, Shruti Karulkar, & Sarah Mohamed from MathWorks.

Telling and Sharing Stories | Izzy Aguiar

How can sharing stories help us as a community? How do we learn how to find a story from the events of someone else’s life or our own? How can this relate to our own tendency as data scientists to connect the dots, to find meaning through patterns? Join us in this WiDS workshop on telling and sharing stories, where we will address these questions and learn how our stories are important in shaping the community we want to see in data science.

This workshop was conducted by Izzy Aguiar, PhD student at Stanford University, ICME.

Bayesian Machine Learning & Sampling Methods | Walmart

In this workshop, you will learn about the core concepts of BML (Bayesian machine learning) – how it differs from frequentist approaches, the building blocks of Bayesian inference, and what familiar ML techniques look like in a Bayesian setup. You will also learn how to use various sampling techniques for Bayesian inference and why we need such techniques in the first place. The workshop will also provide links and materials to continue your Bayesian journey afterwards.

This workshop is meant as an introduction to select BML modules – we strongly recommend that you continue exploring the Bayesian world once you have taken this first step.
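
To give a flavor of the sampling portion, here is a minimal random-walk Metropolis sampler for the posterior of a coin's bias given some observed flips. It is a toy example of my own, not the workshop's code.

    import numpy as np

    rng = np.random.default_rng(0)
    flips = rng.binomial(1, 0.7, size=50)      # observed data from a biased coin

    def log_posterior(theta):
        if not 0 < theta < 1:
            return -np.inf                     # flat prior on (0, 1)
        return np.sum(flips * np.log(theta) + (1 - flips) * np.log(1 - theta))

    samples, theta = [], 0.5
    for _ in range(20_000):
        proposal = theta + rng.normal(0, 0.05)         # random-walk proposal
        if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
            theta = proposal                           # accept the move
        samples.append(theta)

    posterior = np.array(samples[5_000:])              # discard burn-in
    print("posterior mean:", posterior.mean(),
          "95% interval:", np.percentile(posterior, [2.5, 97.5]))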

This workshop was conducted by Ashwini Chandrashekharaiah & Debanjana Banerjee at Walmart Global Tech.

Recommender Systems | Walmart

Recommender systems play a major role in the e-commerce industry. They keep users engaged by recommending relevant content and have a significant role in driving digital revenue.

Following tremendous gains in computer vision and natural language processing with deep neural networks over the past decade, recent years have seen a shift from traditional recommender systems to deep neural network architectures in research and industry.

In this workshop, we focus on the temporal domain from the perspective of both traditional recommender systems and deep neural networks. We start with the classic latent factor model, introduce temporal dynamics into it, and show how this improves performance. We then move into sequential modelling with deep neural networks, presenting the state of the art in the field and discussing its advantages and disadvantages.
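
The classic latent factor model predicts a rating as the dot product of user and item factors plus bias terms; the temporal extension mentioned here lets a bias vary over time. A minimal NumPy SGD sketch on synthetic data, as an illustration rather than Walmart's implementation:

    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_items, n_bins, k = 100, 50, 4, 8

    # Toy interaction log: (user, item, time_bin, rating)
    data = [(rng.integers(n_users), rng.integers(n_items),
             rng.integers(n_bins), rng.uniform(1, 5)) for _ in range(2_000)]

    P = rng.normal(0, 0.1, (n_users, k))        # user factors
    Q = rng.normal(0, 0.1, (n_items, k))        # item factors
    b_it = np.zeros((n_items, n_bins))          # time-dependent item bias

    lr, reg = 0.02, 0.05
    for _ in range(20):
        for u, i, t, r in data:
            pred = P[u] @ Q[i] + b_it[i, t]
            err = r - pred
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
            b_it[i, t] += lr * (err - reg * b_it[i, t])

    u, i, t, r = data[0]
    print("true:", round(r, 2), "predicted:", round(P[u] @ Q[i] + b_it[i, t], 2))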

This workshop was conducted by Aleksandra Cerekovic & Selene Xu at Walmart Global Tech.

Do You See What I See: Exploration of Using AI and AR | MathWorks

Welcome to the world of artificial intelligence (AI) and augmented reality (AR)! This workshop explains AI and AR via hands-on exercises where you will interact with your augmented world. You will learn about applications where the technologies of AI+AR are combined, their limitations, and their impacts on society. You’ll leave armed with code, inspiration, and an ethical framework for your own projects!

Artificial intelligence (AI) is used in a variety of industries for many applications, and it can be combined with other technologies to help understand the implications of certain aspects of an application. In this workshop, you will explore how pose estimation results, implemented using deep learning, are affected by a location provided through augmented reality. These combined technologies provide insight into how poses could be interpreted differently depending on the scene. This workshop also raises awareness of the consequences of using AI for applications different from its originally intended use, which could lead to both technical and ethical challenges.

Specific topics that will be covered in this workshop are listed below:
• understand how AI and AR can be used for applications
• explore how to implement AI and AR
• discover what tools can be used to implement AI and AR
• review code that implements pose estimation using AI and changing background scenes using AR
• gain guidance regarding challenges to address societal impacts of the results from applications that use AI and AR

In addition to receiving an overview of terminology and an understanding of the workflows for each topic, code will be provided to demonstrate how to implement these workflows with tools from MathWorks.

This workshop was conducted by Louvere Walker-Hannon, Shruti Karulkar, & Sarah Mohamed from MathWorks.

Graph Theory for Data Science, Part III: Characterizing Graphs in the Real World

Graph theory provides an effective way to study relationships between data points, and is applied to everything from deep learning models to social networks. This workshop is part III in a series of three workshops. Throughout the series we will progress from introductory explanations of what a graph is, through the most common algorithms performed on graphs, and end with an investigation of the attributes of large-scale graphs using real data.

And in particular for Part III:
Many of the systems we study today can be represented as graphs, from social media networks to phylogenetic trees to airplane flight paths. In this workshop we will explore real-world examples of graphs, discussing how to extract graphs from real data, data structures for storing graphs, and measures to characterize graphs. We will work with real examples of graph data to create a table of values that summarize different example graphs, exploring values such as the centrality, assortativity, and diameter of each graph. Python code will be provided so that attendees can get hands-on experience analyzing graph data.
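
With NetworkX, the summary measures mentioned above take only a few lines; a sketch on a built-in example graph (not the workshop's own datasets):

    import networkx as nx

    G = nx.karate_club_graph()                       # a small built-in social network

    print("nodes:", G.number_of_nodes(), "edges:", G.number_of_edges())
    print("diameter:", nx.diameter(G))               # longest shortest path
    print("degree assortativity:", nx.degree_assortativity_coefficient(G))

    # Top nodes by degree centrality.
    centrality = nx.degree_centrality(G)
    print(sorted(centrality, key=centrality.get, reverse=True)[:3])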

This workshop was conducted by Stanford ICME PhD student, Julia Olivieri.

Graph Theory for Data Science, Part II: Graph Algorithms: Traversing the tree and beyond

Graph theory provides an effective way to study relationships between data points, and is applied to everything from deep learning models to social networks. This workshop is part II in a series of three workshops. Throughout the series we will progress from introductory explanations of what a graph is, through the most common algorithms performed on graphs, and end with an investigation of the attributes of large-scale graphs using real data.

And in particular for Part II:
Graph-based algorithms are essential for everything from tracking relationships in social networks to finding the shortest driving distance on Google Maps. In this workshop we will explore some of the most useful graph algorithms, from both the breadth-first and depth-first methods for searching graphs, to Kruskal’s algorithm for finding a minimum spanning tree of a weighted graph, to approximation methods for solving the traveling salesman problem. We will use hands-on examples in python to explore the computational complexity and accuracy of these algorithms, and discuss their broader applications.
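
A short NetworkX sketch of two of the algorithms mentioned, breadth-first search and Kruskal's minimum spanning tree, on a small weighted graph of my own (not the workshop's example):

    import networkx as nx

    G = nx.Graph()
    G.add_weighted_edges_from([("A", "B", 4), ("A", "C", 1), ("B", "C", 2),
                               ("B", "D", 5), ("C", "D", 8)])

    # Breadth-first search order starting from node A.
    print("BFS edges:", list(nx.bfs_edges(G, "A")))

    # Kruskal's algorithm for the minimum spanning tree.
    mst = nx.minimum_spanning_tree(G, algorithm="kruskal")
    print("MST edges:", sorted(mst.edges(data="weight")))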

This workshop was conducted by Stanford ICME PhD student, Julia Olivieri.

Hands On Deep Learning and IoT Workshop | Mathworks

In this workshop, we engage beginner and intermediate participants interested in getting started with Deep Learning and the Internet of Things (IoT). We’ll do hands-on exercises where you’ll use a webcam and a neural network to recognize images, aggregate data, and run real-time IoT analytics. Our goal is to get you excited about IoT and Deep Learning, and to set you up for success with various types of projects for work, school, and beyond.

This workshop was conducted by Louvere Walker-Hannon, Shruti Karulkar, & Sarah Mohamed from MathWorks.

Natural Language Processing | Riyanka Bhowal, Walmart

Natural language processing has direct real-world applications, from speech recognition to automatic text generation, and from lexical semantics understanding to question answering. In just a decade, neural machine learning models have become widespread, largely displacing statistical methods, which require elaborate feature engineering. Popular techniques include the use of word embeddings to capture the semantic properties of words. In this workshop, we take you through the ever-changing journey of neural models while addressing their boons and banes.

The workshop will address the concepts of word embeddings, frequency-based and prediction-based embeddings, positional embeddings, multi-headed attention, and their application in an unsupervised context.
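
Of the concepts listed, scaled dot-product attention (the building block of multi-headed attention) is easy to show directly; a minimal NumPy sketch, independent of the workshop's own materials:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
        d = Q.shape[-1]
        weights = softmax(Q @ K.T / np.sqrt(d))
        return weights @ V

    rng = np.random.default_rng(0)
    seq_len, d = 4, 8                           # 4 tokens, 8-dimensional embeddings
    Q = K = V = rng.normal(size=(seq_len, d))   # self-attention over one sequence
    print(attention(Q, K, V).shape)             # (4, 8): one context vector per token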

This workshop was conducted by Riyanka Bhowal, Senior Data Scientist at Walmart Global Tech.

Using Natural Language Processing to Analyze US History Textbooks

In this workshop, Dora Demszky, a Stanford PhD student, illustrates how natural language processing (NLP) can be used to answer social science questions. The workshop focuses on applying NLP to the content of 15 US history textbooks used in Texas, analyzing the representation of historically marginalized people and groups.

The workshop is based on a paper (https://journals.sagepub.com/doi/pdf/…) that also has an associated toolkit, and it will provide examples of how the toolkit can be used, via a Jupyter notebook that will be made available.

Graph Theory for Data Science, Part I: What is a graph and What Can We Do With It?

Graph theory provides an effective way to study relationships between data points, and is applied to everything from deep learning models to social networks. This workshop is part I in a series of three workshops. Throughout the series we will progress from introductory explanations of what a graph is, through the most common algorithms performed on graphs, and end with an investigation of the attributes of large-scale graphs using real data.

And in particular for Part I:
Graphs are structures that represent pairwise connections, and are used for everything from finding the shortest route between two locations to Google’s PageRank algorithm. Are you interested in learning about graph theory but don’t know where to start? In this workshop we will introduce graphs, develop comfort with their associated terminology, and investigate real-world applications with a focus on intuitive explanations and examples.

This workshop was conducted by Stanford ICME PhD student, Julia Olivieri.

Pocket AI and IoT, or How to be a Data Scientist using your Mobile Device | Mathworks

Want to learn more about trends like AI, IoT and wearable tech? In less than one hour, we will cut through the hype by building a “smart” fitness tracker using your own mobile device.

We’ll do hands-on exercises: you’ll acquire data from sensors, design a step counter and train a human activity classifier. You will leave motivated and ready to use machine learning and sensors in your own projects!

This workshop was conducted by Louvere Walker-Hannon, Shruti Karulkar, & Sarah Mohamed from MathWorks.

Machine Learning for Scientific R&D: Why it’s Hard and Why it’s Fun | Julia Ling

Julia Ling, CTO at Citrine Informatics hosts a workshop on ‘Machine Learning for Scientific R&D: Why it’s Hard and Why it’s Fun’ in which she covers some of the key challenges in machine learning for R&D applications: the small, often-messy, sample-biased datasets; the exploratory nature of scientific discovery; and the curious, hands-on approach of scientific users. Julia discusses potential solutions to these challenges, including transfer learning, integration of scientific domain knowledge, uncertainty quantification, and machine learning model interpretability.

An introduction to Data Mesh | Zhamak Dehghani

Zhamak Dehghani, Director, Emerging Technologies, North America at Thoughtworks, hosts a workshop, ‘An introduction to Data Mesh: a paradigm shift in analytical data management’, where she shares her observations on the failure modes of the centralized paradigm of a data lake and its predecessor, the data warehouse. She introduces Data Mesh, a paradigm shift in big data management that draws from modern distributed architecture: considering domains as the first-class concern, applying self-sovereignty to distribute the ownership of data, applying platform thinking to create self-serve data infrastructure, and treating data as a product.

Data Processing & Statistical Models to Impute Missing Perpetrator Information | HRDAG

Megan Price, Executive Director, and Maria Gargiulo, Statistician, with the Human Rights Data Analysis Group (HRDAG), host a workshop on ‘Data Processing and Statistical Models to Impute Missing Perpetrator Information’. They discuss how they use methods from statistics and computer science to help answer questions about mass violence using incomplete and unrepresentative datasets from the contexts in which HRDAG works, and how open-source tools are crucial to their analytical projects.

Panel: Energy and Sustainability | Rosalind Archer, Xin Ma, Lesly Goh, Nida Rizwan Farid | WiDS 2021

Panel Discussion on ‘Energy and Sustainability’

Moderator: Rosalind Archer, Professor, University of Auckland
Panelists:
-Xin Ma, Managing Director, Asia Platform, TOTAL
-Lesly Goh, Senior Fellow, National University of Singapore Lee Kuan Yew School of Public Policy
-Nida Rizwan Farid, Aerospace Engineer and Energy Efficiency Consultant, Save Joules

Why a World with AI Needs More EQ | Tsu-Jae King Liu | WiDS 2020

Tsu-Jae King Liu, Dean of the College of Engineering at the University of California, Berkeley, delivers a keynote presentation at WiDS Stanford University on March 2, 2020:

Today we live in a dynamic and unpredictable world that is increasingly dependent on engineered devices, processes and systems. A 2017 workforce report by the McKinsey Global Institute indicates that all workers will need to adapt as their occupations evolve with increasingly capable machines. In the age of artificial intelligence (AI) and data science, workers will spend more time on activities that require social and emotional skills, creativity, high-level cognitive capabilities and other skills that are relatively hard to automate.

There is growing evidence of the importance of a high emotional quotient (EQ) as a predictor of success and organizational performance. In this talk, Professor Liu will share insights gained from her personal career journey and describe initiatives being undertaken in the College of Engineering at the University of California, Berkeley to cultivate EQ in their students and to advance equity and inclusion, toward a brighter future for all.

Infusing Structure into Machine Learning Algorithms | Anima Anandkumar | WiDS 2019

Anima Anandkumar, Professor of Computing and Mathematical Sciences at Caltech and Director of Research in Machine Learning at NVIDIA.

Standard deep-learning algorithms are based on a function-fitting approach that does not exploit any domain knowledge or constraints. This makes them unsuitable in applications that have limited data or require safety or stability guarantees, such as robotics. By infusing structure and physics into deep-learning algorithms, we can overcome these limitations. There are several ways to do this. For instance, we use tensorized neural networks to encode multidimensional data and higher-order correlations. We infuse symbolic expressions into deep learning to obtain strong generalization. We utilize spectral normalization of neural networks to guarantee stability and apply it to stable landing of quadrotor drones. These instances demonstrate that building structure into ML algorithms can lead to significant gains.

Janet George, Western Digital | WiDS 2019

Janet George, Fellow and Chief Data Officer/Scientist for Big Data and Cognitive Computing at Western Digital, sits down with Lisa Martin at Stanford University for WiDS 2019.

https://siliconangle.com/2019/03/07/q…

Q&A: How AI is cultivating a responsible community to better mankind

Artificial intelligence initiatives powered by big data are propelling businesses beyond the capacity of human labor. While AI tech offers an undeniable opportunity for innovation, it has also sparked a debate around potential misuse through the vast reach of programmed biases and other problematic behaviors.

The power of AI can be comprehensively harnessed for good by fostering diverse teams focused on ethical solutions and working in tandem with policymakers to ensure responsible scale, according to Janet George (pictured), fellow and chief data officer at WD, a Western Digital Company.

George spoke with Lisa Martin (@LisaMartinTV), host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Stanford Women in Data Science event in Stanford, California. They discussed the range of possibilities in AI and how WD is leveraging the technology toward sustainability.

[Editor’s note: The following answers have been condensed for clarity.]

Tell us about Western Digital’s continued sponsorship and what makes this important to you.

George: Western Digital has recently transformed itself … and we are a data-driven … data-infrastructure company. This momentum of AI is a foundational shift in the way we do business. Businesses are realizing that they’re going to be in two categories, the ‘have’ and the ‘have not.’ In order to be in the have category, you have to embrace AI … data … [and] scale. You have to transform yourself to put yourself in a competitive position. That’s why Western Digital is here.

How has Western Digital transformed to harness AI for good?

George: We are not just a company that focuses on business for AI. One of the initiatives we are doing is AI for Good and … Data for Good … working with the UN. We’ve been focusing on trying to figure out the data that impacts climate change. Collecting data and providing infrastructure to stow massive amounts of species data in the environment that we’ve never actually collected before. Climate change is a huge area for us, education … [and] diversity. We’re using all of these areas as a launching pad for Data for Good and trying to use data … and AI to better mankind.

Now we have the data to put out massively predictive models that can help us understand what the change would look like 25 years from now and take corrective action. We know carbon emissions are causing very significant damage to our environment and there’s something we can do about it. Data is helping us do that. We have the infrastructure, economies of scale. We can build massive platforms that can stow this data and then we can analyze this data at scale. We have enough technology now to adapt to our ecosystem … and be better in the next 10 years.

What are your thoughts on data scientists taking something like a Hippocratic Oath to start owning accountability for the data that they’re working with?

George: We need a diversity of data scientists to have multiple models that are completely diverse, and we have to be very responsible when we start to create. Creators have to be responsible for their creation. Where we get into tricky areas are when you are the human creator of an AI model, and now the AI model has self-created because it has self-learned. Who owns the copyright to those when AI becomes the creator? The group of people that are responsible for creating the environment, creating the models, the question comes into how do we protect the authors, the users, the producers, and the new creators of the original piece of art.

You can use the creation for good or bad. The creation recreates itself, like AI learning, on its own with massive amounts of data after an original data scientist has created the model. Laws have to change; policies have to change. Innovation has to go, and at the same time, we have to be responsible about what we innovate.

Where are we as a society in starting to understand the different principles and practices that have to be implemented in order for proper management of data to enable innovation?

George: We’re debating the issues. We’re coming together as a community. We’re having discussions with experts. What are we seeing as the longevity of that AI model in a business setting, in a non-business setting? How does the AI perform? We are now able to see the sustained performance of the AI model.

Machine Learning to Determine How Many People Have Been Killed in Syria | Megan Price | WiDS 2017

The Human Rights Data Analysis Group (HRDAG) uses methods from statistics and computer science to quantify mass violence. As part of that work, we rely on open-source tools, including Python and R, for data processing, management, analysis, and visualization. This talk will highlight how we use those methods and tools to estimate how many people have been killed in the ongoing conflict in Syria.

Megan Price, Human Rights Data Analysis Group

The Multi-Facets of a Data Science Project to Answer: How Are Organs Formed | Bin Yu | WiDS 2015

In this talk, I present results from a current project co-led by biologist Erwin Frise from Lawrence Berkeley National Lab (LBNL) to answer the fundamental systems biology question in the talk title. We design a lens to decompose spatial gene expression data into meaningful regions that can be used to, for example, automatically label image pixels and construct local gene networks. Our team consists of statisticians, biologists, and computer scientists, and we are also working on CRISPR knock-out experiments and open-source software.

Big Graph Data Science | Lise Getoor | WiDS 2015

One of the challenges in big data analytics lies in being able to reason collectively about extremely large, heterogeneous, incomplete, and noisy interlinked data. We need data science techniques that can represent and reason effectively with this form of rich and multi-relational graph data. In this talk, I will describe some common inference patterns needed for graph data including: collective classification (predicting missing labels for nodes), link prediction (predicting potential edges), and entity resolution (determining when two nodes refer to the same underlying entity). I will describe some key capabilities required to solve these problems, and finally I will describe a highly scalable open-source probabilistic programming language being developed within my group to solve these challenges.

Enabling Breakthrough Insights | Diane Bryant | WiDS 2015

The vast ocean of data created in today’s digital world offers enormous potential. However, the key to unlocking that potential lies not in the data itself, but in the science that refines it. The well-defined processes and toolsets designed for legacy BI solutions do not meet the needs of today’s big data analytics environments. Diane will share Intel’s investments in both the technology and the ecosystem to enable the next breakthrough insights.
