DoWhy: Microsoft’s new Python library for causal inference

Microsoft released a library named DoWhy earlier this week to promote the widespread use of causal inference. Causal inference is the process of drawing conclusions about cause and effect: deciding whether, and by how much, one factor is responsible for an outcome, based on the conditions under which that outcome occurs. Simply put, causal inference attempts to work out why something happened.

“DoWhy” is a Python library that aims to spark causal thinking and analysis. It provides a unified interface for causal inference methods and automatically tests many of the assumptions behind them, which makes causal inference more accessible to non-experts.

According to Microsoft, “Our motivation for creating DoWhy comes from our experiences in causal inference studies — ranging from estimating the impact of a recommender system to predicting likely outcomes given a life event — we found ourselves repeating the common steps of finding the right identification strategy, devising the most suitable estimator, and conducting robustness checks, all from scratch”.

DoWhy highlights the critical assumptions that lie beneath a causal inference analysis. It is organized around four main steps:

  1. Model the causal inference problem using assumptions.
  2. Identify an expression for the causal effect (the “causal estimand”).
  3. Estimate the expression using statistical methods.
  4. Verify the validity of the estimate.

How does DoWhy work?

First, DoWhy builds an underlying causal graphical model for every problem, which makes each causal assumption explicit. The graph does not have to be complete: you can provide a partial graph that represents your prior knowledge about some of the variables. DoWhy automatically treats the remaining variables as potential confounders.
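To make the idea of a partial graph concrete, here is a minimal sketch in plain Python. It does not use DoWhy and is not DoWhy's implementation. The variable names (`age`, `income`, `region`), the graph itself, and the helper functions are all hypothetical. The sketch finds variables that cause both the treatment and the outcome, then adds any observed variable missing from the graph as a potential confounder, which is one reading of the behaviour described above.

```python
# Hypothetical partial graph: each key lists the variables it directly causes.
partial_graph = {
    "age": ["treatment", "outcome"],
    "income": ["treatment"],
    "treatment": ["outcome"],
}

# Hypothetical observed columns; "region" does not appear in the graph.
observed = ["age", "income", "region", "treatment", "outcome"]


def descendants(graph, node):
    """Return every variable reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def common_causes(graph, treatment, outcome):
    """Variables that reach the treatment and also reach the outcome
    by a path that does not pass through the treatment."""
    # Drop the treatment's outgoing edges so that paths running only
    # through the treatment are not counted as confounding paths.
    pruned = {k: v for k, v in graph.items() if k != treatment}
    return sorted(
        v for v in graph
        if v not in (treatment, outcome)
        and treatment in descendants(graph, v)
        and outcome in descendants(pruned, v)
    )


def unmodelled(graph, columns):
    """Observed variables that the partial graph says nothing about."""
    nodes = set(graph) | {c for children in graph.values() for c in children}
    return sorted(c for c in columns if c not in nodes)


known = common_causes(partial_graph, "treatment", "outcome")
potential_confounders = known + unmodelled(partial_graph, observed)
print(potential_confounders)
```

In this toy graph `income` affects the outcome only through the treatment, so it is not flagged as a confounder, while `region` is flagged simply because nothing is known about it.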

Secondly, DoWhy separates identification from estimation. Identifying a causal effect means combining assumptions about the data-generating process with counterfactual expressions to specify a target estimand. DoWhy uses the Bayesian graphical model framework to represent these assumptions formally, so users can state what they know, and what they don’t know, about the data-generating process.

Thirdly, for estimation, DoWhy provides methods based on the potential-outcomes framework, including matching, stratification, and instrumental variables.
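The split between identification and estimation can be illustrated with a small standard-library sketch that does not use DoWhy. It assumes synthetic data with a single binary confounder `w` and a true treatment effect of 2.0; every name and number here is made up for illustration. Identification is the assumption that conditioning on `w` is enough to remove confounding. Estimation is then done by stratification: compare treated and control units within each level of `w`, and average the gaps weighted by stratum size.

```python
import random
from collections import defaultdict


def simulate(n=20000, seed=0):
    """Synthetic rows (w, t, y): w raises both the chance of treatment and y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        t = int(rng.random() < (0.8 if w else 0.2))
        y = 2.0 * t + 3.0 * w + rng.gauss(0, 1)  # true effect of t is 2.0
        rows.append((w, t, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Treated-minus-control mean, ignoring the confounder."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def stratified_effect(rows):
    """Stratification: per-stratum gaps, weighted by each stratum's share of rows."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for w, t, y in rows:
        strata[w][t].append(y)
    effect = 0.0
    for groups in strata.values():
        size = len(groups[0]) + len(groups[1])
        effect += (mean(groups[1]) - mean(groups[0])) * size / len(rows)
    return effect


rows = simulate()
print("naive:", round(naive_difference(rows), 2))
print("stratified:", round(stratified_effect(rows), 2))
```

Under these assumptions the naive comparison is biased upward (about 3.8 in expectation, because treated units are more likely to have `w = 1`), while the stratified estimate should land close to the true value of 2.0.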

Lastly, DoWhy offers robustness tests and sensitivity checks for verifying the reliability of an obtained estimate. With these, you can test how the estimate changes as the assumptions vary. The library can also automatically check the validity of an obtained estimate against the assumptions in the graphical model.
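One common robustness check of this kind is a placebo test: replace the real treatment with randomly shuffled labels and re-run the estimator. If the method is sound, the placebo effect should be close to zero. The sketch below is a plain-Python illustration on synthetic data, not DoWhy's own refutation code. The data-generating numbers and function names are assumptions chosen for the example.

```python
import random


def adjusted_effect(rows):
    """Stratify on w and average the treated-minus-control gap by stratum size."""
    effect = 0.0
    for level in {w for w, _, _ in rows}:
        stratum = [(t, y) for w, t, y in rows if w == level]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        gap = sum(treated) / len(treated) - sum(control) / len(control)
        effect += gap * len(stratum) / len(rows)
    return effect


def placebo_effect(rows, seed=1):
    """Re-estimate after shuffling treatment labels, which breaks any real link to y."""
    rng = random.Random(seed)
    labels = [t for _, t, _ in rows]
    rng.shuffle(labels)
    shuffled = [(w, p, y) for (w, _, y), p in zip(rows, labels)]
    return adjusted_effect(shuffled)


# Synthetic data with a binary confounder w and a true effect of 2.0.
rng = random.Random(0)
rows = []
for _ in range(20000):
    w = int(rng.random() < 0.5)
    t = int(rng.random() < (0.8 if w else 0.2))
    rows.append((w, t, 2.0 * t + 3.0 * w + rng.gauss(0, 1)))

original = adjusted_effect(rows)
placebo = placebo_effect(rows)
print("original estimate:", round(original, 2))
print("placebo estimate:", round(placebo, 2))
```

A placebo estimate far from zero would suggest that the estimator is picking up something other than the treatment, and that the original estimate should not be trusted.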

DoWhy supports Python 3+ and requires packages such as numpy, scipy, scikit-learn, pandas, pygraphviz (for plotting causal graphs), networkx (for analyzing causal graphs), matplotlib (for general plotting), and sympy (for rendering symbolic expressions).

Microsoft plans to add more features to the DoWhy library. These include improved estimation support, sensitivity methods, and interoperability with existing estimation software.

For more information, check out the official DoWhy documentation.

Natasha Mathur

Tech writer at the Packt Hub. Dreamer, book nerd, lover of scented candles, karaoke, and Gilmore Girls.
