
Salesforce Einstein team open sources TransmogrifAI, their automated machine learning library


Salesforce has open sourced TransmogrifAI, its end-to-end automated machine learning library for structured data. The library is currently used in production to help power the Salesforce Einstein AI platform, where it enables data scientists at Salesforce to transform customer data into meaningful, actionable predictions. The project is now open source so that other developers and data scientists can build machine learning solutions at scale, fast.

TransmogrifAI is built on Scala and SparkML. It automates data cleansing, feature engineering, and model selection to arrive at a performant model, and it encapsulates five main components of the machine learning process:

Source: Salesforce Engineering

Feature Inference:

TransmogrifAI allows users to specify a schema for their data, from which it automatically extracts the raw predictor and response signals as “Features”. In addition to accepting user-specified types, TransmogrifAI also infers types on its own. Because features are strongly typed, developers can catch a majority of errors at compile time rather than at run time.
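To make the idea of type inference concrete, here is a minimal Python sketch that guesses a column's feature type from its raw string values. It is an illustration only and does not use TransmogrifAI's Scala API: the type names, the regular expressions, and the helpers `infer_value_type` and `infer_column_type` are all assumptions made for this example.

```python
import re

# Hypothetical type names and patterns, chosen for illustration only;
# they are not TransmogrifAI's own type hierarchy or rules.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,20}$")


def infer_value_type(value):
    """Classify a single raw string value."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        float(text)  # numeric check runs first, so bare digit strings count as "real"
        return "real"
    except ValueError:
        pass
    if EMAIL_RE.match(text):
        return "email"
    if PHONE_RE.match(text):
        return "phone"
    return "text"


def infer_column_type(values):
    """Return the one type shared by all non-empty values, else fall back to text."""
    kinds = {infer_value_type(v) for v in values} - {"empty"}
    if len(kinds) == 1:
        return kinds.pop()
    return "text"


columns = {
    "age": ["34", "51", "", "27"],
    "contact": ["a@example.com", "b@example.org"],
    "phone": ["+1 415-555-0100", "(650) 555-0199"],
    "notes": ["called twice", "42"],
}
inferred = {name: infer_column_type(vals) for name, vals in columns.items()}
print(inferred)
```

A column with mixed value types, such as `notes` here, falls back to plain text, which is a conservative choice for a sketch like this.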

Transmogrification or automated feature engineering:

TransmogrifAI comes with a myriad of techniques for all the supported feature types, ranging from phone numbers, email addresses, and geo-location to text data. It also optimizes the transformations to make it easier for machine learning algorithms to learn from the data.
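The general pattern of type-specific feature engineering can be sketched in a few lines of Python. This is not how TransmogrifAI implements it; the transformations below (mean-style imputation with a null indicator, one-hot encoding of an email domain, bag-of-words counts) are common techniques picked as assumptions for this example, and every function and field name is hypothetical.

```python
def real_features(value, fill):
    """Impute a missing number and add a null-indicator column."""
    return [fill, 1.0] if value is None else [float(value), 0.0]


def email_features(value, domains):
    """One-hot encode the email domain; the last slot means 'other or missing'."""
    domain = value.rsplit("@", 1)[-1].lower() if value and "@" in value else None
    vec = [0.0] * (len(domains) + 1)
    idx = domains.index(domain) if domain in domains else len(domains)
    vec[idx] = 1.0
    return vec


def text_features(value, vocab):
    """Count occurrences of each vocabulary word (a tiny bag of words)."""
    tokens = (value or "").lower().split()
    return [float(tokens.count(word)) for word in vocab]


def transmogrify(row, fill_age, domains, vocab):
    """Apply one transformation per field type and concatenate the results."""
    return (
        real_features(row["age"], fill_age)
        + email_features(row["email"], domains)
        + text_features(row["notes"], vocab)
    )


row = {
    "age": None,
    "email": "Ann@Example.com",
    "notes": "Renewal call, asked about renewal pricing",
}
vector = transmogrify(
    row,
    fill_age=40.0,
    domains=["example.com", "example.org"],
    vocab=["renewal", "pricing"],
)
print(vector)
```

The point of the sketch is that each feature type gets its own default transformation, and the outputs are combined into a single numeric vector that a learning algorithm can consume.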

Automated Feature Validation:

TransmogrifAI has algorithms that perform automatic feature validation to remove features with little to no predictive power. These algorithms are useful when working with high-dimensional and unknown data. They apply statistical tests based on feature types and additionally make use of feature lineage to detect and discard bias.
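One simple statistical test of this kind is to drop features whose correlation with the label is close to zero. The Python sketch below assumes Pearson correlation and a hypothetical cut-off of 0.1; the article does not say which tests or thresholds TransmogrifAI actually applies, and the lineage-based bias detection it mentions is not covered here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(features, label, min_abs_corr=0.1):
    """Keep features whose absolute correlation with the label meets the cut-off."""
    kept = {}
    for name, values in features.items():
        corr = pearson(values, label)
        if abs(corr) >= min_abs_corr:
            kept[name] = corr
    return kept


# Toy data, invented for this example.
label = [0, 0, 1, 1, 0, 1]
features = {
    "usage_hours": [1.0, 2.0, 8.0, 9.0, 1.5, 7.0],  # tracks the label closely
    "constant": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],     # no variance at all
    "noise": [5.0, 1.0, 4.0, 2.0, 3.0, 3.0],        # uncorrelated with the label
}
kept = select_features(features, label)
print(sorted(kept))
```

Only `usage_hours` survives: the constant column carries no information, and the noise column has zero correlation with the label.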

Automated Model Selection:

The TransmogrifAI Model Selector runs several different machine learning algorithms on the data and uses the average validation error to automatically choose the best one. It also automatically deals with the problem of imbalanced data by appropriately sampling the data and recalibrating predictions to match true priors.
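Choosing a model by average validation error can be illustrated with plain k-fold cross-validation. The sketch below is written in Python with two toy classifiers invented for the example (`MajorityClass` and `ThresholdRule`); it does not reproduce TransmogrifAI's Model Selector, its candidate algorithms, or its handling of imbalanced data.

```python
from collections import Counter


class MajorityClass:
    """Baseline that always predicts the most common training label."""

    def fit(self, xs, ys):
        self.label = Counter(ys).most_common(1)[0][0]

    def predict(self, x):
        return self.label


class ThresholdRule:
    """Splits a 1-D feature midway between the class means (assumes class 1 is higher)."""

    def fit(self, xs, ys):
        zeros = [x for x, y in zip(xs, ys) if y == 0]
        ones = [x for x, y in zip(xs, ys) if y == 1]
        self.threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

    def predict(self, x):
        return 1 if x > self.threshold else 0


def avg_validation_error(make_model, xs, ys, k=3):
    """Average misclassification rate over k held-out folds."""
    errors = []
    for start in range(k):
        fold = list(range(start, len(xs), k))  # every k-th row forms one fold
        held = set(fold)
        train = [i for i in range(len(xs)) if i not in held]
        model = make_model()
        model.fit([xs[i] for i in train], [ys[i] for i in train])
        wrong = sum(model.predict(xs[i]) != ys[i] for i in fold)
        errors.append(wrong / len(fold))
    return sum(errors) / len(errors)


def select_model(candidates, xs, ys, k=3):
    """Score every candidate and return the name with the lowest average error."""
    scores = {name: avg_validation_error(make, xs, ys, k) for name, make in candidates.items()}
    return min(scores, key=scores.get), scores


# Toy data, invented for this example.
xs = [1.0, 1.5, 2.0, 2.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
best, scores = select_model(
    {"majority_class": MajorityClass, "threshold_rule": ThresholdRule}, xs, ys
)
print(best, scores)
```

Averaging over folds, rather than trusting a single split, makes the comparison less sensitive to which rows happen to land in the validation set.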

Hyperparameter Optimization:

TransmogrifAI automatically tunes hyperparameters and also offers advanced tuning techniques.
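The article does not describe which tuning techniques are offered, so the Python sketch below shows only the simplest one, an exhaustive grid search scored on a validation set. The one-dimensional ridge model, the `lam` grid, and all function names are assumptions made for this example rather than anything taken from TransmogrifAI.

```python
from itertools import product


def fit_ridge(xs, ys, lam):
    """Closed-form 1-D ridge regression through the origin: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(weight, xs, ys):
    """Mean squared error of the predictions weight * x."""
    return sum((weight * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(grid, score):
    """Try every combination in the grid and return the one with the lowest score."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        current = score(**params)
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Toy data, invented for this example.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
valid_x, valid_y = [5.0, 6.0], [10.1, 11.9]


def validation_error(lam):
    """Fit on the training rows, then score on the held-out rows."""
    return mse(fit_ridge(train_x, train_y, lam), valid_x, valid_y)


best_params, best_score = grid_search({"lam": [0.0, 0.1, 1.0, 10.0]}, validation_error)
print(best_params, best_score)
```

Because `grid_search` takes a dictionary of value lists, adding a second hyperparameter only requires another key in the grid and a matching argument in the scoring function.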

This large-scale automation has brought down the total time taken to train models from weeks and months to a few hours with just a few lines of code. You can check out the project to get started with TransmogrifAI. For detailed information, read the Salesforce Engineering Blog.


Sugandha Lahoti

Content Marketing Editor at Packt Hub. I blog about new and upcoming tech trends ranging from Data science, Web development, Programming, Cloud & Networking, IoT, Security and Game development.
