2 min read

Salesforce has open sourced TransmogrifAI, their end-to-end automated machine learning library for structured data. This library is currently used in production to help power Salesforce Einstein AI platform. TransmogrifAI enables data scientists at Salesforce to transform customer data into meaningful, actionable predictions.  Now, they have open-sourced this project to enable other developers and data scientists to build machine learning solutions at scale, fast.

TransmogrifAI is built on Scala and SparkML that automates data cleansing, feature engineering, and model selection to arrive at a performant model. It encapsulates five main components of the machine learning process:

salesforce TransmogrifAI workflow

Source: Salesforce Engineering

Feature Inference:

TransmogrifAI allows users to specify a schema for their data to automatically extract the raw predictor and response signals as “Features”. In addition to allowing for user-specified types, TransmogrifAI also does inference of its own. The strongly-typed features allow developers to catch a majority of errors at compile-time rather than run-time.

Transmogrification or automated feature engineering:

TransmogrifAI comes with a myriad of techniques for all the supported feature types ranging from phone numbers, email addresses, geo-location to text data. It also optimizes the transformations to make it easier for machine learning algorithms to learn from the data.

Automated Feature Validation:

TransgmogrifAI has algorithms that perform automatic feature validation to remove features with little to no predictive power. These algorithms are useful when working with high dimensional and unknown data. They apply statistical tests based on feature types, and additionally, make use of feature lineage to detect and discard bias.

Automated Model Selection:

The TransmogrifAI Model Selector runs several different machine learning algorithms on the data and uses the average validation error to automatically choose the best one. It also automatically deals with the problem of imbalanced data by appropriately sampling the data and recalibrating predictions to match true priors.

Hyperparameter Optimization:

It automatically tunes hyperparameters and offers advanced tuning techniques.

This large-scale automation has brought down the total time taken to train models from weeks and months to a few hours with just a few lines of code. You can check out the project to get started with TransmogrifAI. For detailed information, read the Salesforce Engineering Blog.

Read Next

Salesforce Spring 18 – New features to be excited about in this release!

How to secure data in Salesforce Einstein Analytics

How to create and prepare your first dataset in Salesforce Einstein

Content Marketing Editor at Packt Hub. I blog about new and upcoming tech trends ranging from Data science, Web development, Programming, Cloud & Networking, IoT, Security and Game development.

LEAVE A REPLY

Please enter your comment!
Please enter your name here