Last week, Google Cloud announced the release of Feast, a new open source feature store that helps organizations manage, store, and discover features for their machine learning projects.
Feast, a collaboration between Google Cloud and GO-JEK (an Indonesian tech startup), is an open, extensible, and unified platform for feature storage. “Feast is an essential component in building end-to-end machine learning systems at GO-JEK. We’re very excited to release it to the open source community,” says Peter Richens, Senior Data Scientist at GO-JEK.
It was developed to address common challenges faced by machine learning development teams. These challenges include:
- Machine learning features are not reused (features representing similar business concepts get redeveloped many times when existing work from other teams could have been reused).
- Feature definitions vary (teams define features differently, and there is often no easy access to a feature's documentation).
- Up-to-date features are hard to serve (teams are hesitant to use real-time data).
- Training and serving are inconsistent (training requires historical data, whereas prediction requires the latest values; when the data is split across independent systems that each need separate tooling, inconsistencies arise).
Feast addresses these challenges by providing a centralized platform that lets teams easily reuse features developed by other teams across different projects. As more features are added to the store, it also becomes cheaper to build models.
In addition, Feast manages data ingestion, unifying data from both batch and streaming sources (using Apache Beam) into the feature warehouse and feature serving stores. Users can then query features in the warehouse using the same set of feature identifiers. Feast also gives users easy access to historical feature data, which can be used to produce datasets for training models. Finally, Feast lets teams capture documentation, metadata, and metrics about features, so that they can communicate clearly about them.
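To make the idea of querying historical and up-to-date values through one set of feature identifiers more concrete, here is a minimal in-memory sketch. It is not Feast's API: the class `ToyFeatureStore`, its methods `ingest`, `get_as_of`, and `get_latest`, and the feature names such as `driver.completed_trips` are all hypothetical and invented for illustration only.

```python
from bisect import bisect_right
from collections import defaultdict


class ToyFeatureStore:
    """Hypothetical in-memory stand-in for a feature store (not Feast's API)."""

    def __init__(self):
        # (feature_id, entity_id) -> list of (timestamp, value), kept sorted by timestamp
        self._history = defaultdict(list)

    def ingest(self, feature_id, entity_id, timestamp, value):
        """Accept a value from any source (batch or stream) under one identifier."""
        rows = self._history[(feature_id, entity_id)]
        rows.append((timestamp, value))
        rows.sort(key=lambda row: row[0])

    def get_as_of(self, feature_ids, entity_id, timestamp):
        """Training path: the value of each feature as known at `timestamp`."""
        result = {}
        for feature_id in feature_ids:
            rows = self._history.get((feature_id, entity_id), [])
            idx = bisect_right([ts for ts, _ in rows], timestamp)
            result[feature_id] = rows[idx - 1][1] if idx else None
        return result

    def get_latest(self, feature_ids, entity_id):
        """Serving path: the most recent value of each feature."""
        result = {}
        for feature_id in feature_ids:
            rows = self._history.get((feature_id, entity_id), [])
            result[feature_id] = rows[-1][1] if rows else None
        return result


store = ToyFeatureStore()
store.ingest("driver.completed_trips", "driver_1", 1, 10)   # e.g. from a batch job
store.ingest("driver.avg_rating", "driver_1", 2, 4.7)       # e.g. from a batch job
store.ingest("driver.completed_trips", "driver_1", 5, 12)   # e.g. from a stream

# The same identifiers are used for both training and serving lookups.
FEATURES = ["driver.completed_trips", "driver.avg_rating"]
training_row = store.get_as_of(FEATURES, "driver_1", timestamp=3)
serving_row = store.get_latest(FEATURES, "driver_1")
```

In this toy version, the training lookup returns the values that were known at time 3, while the serving lookup returns the newest ones. Both go through the same feature names, which is the property that helps avoid inconsistencies between training and serving.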
In the future, Feast aims to be deployable on Kubeflow and to integrate seamlessly with other Kubeflow components, including a Python SDK for use with Kubeflow’s Jupyter notebooks, and Kubeflow Pipelines. The two are complementary, since Kubeflow focuses on improving the packaging, training, serving, orchestration, and evaluation of models. “We hope that Feast can act as a bridge between your data engineering and machine learning teams,” says the Feast team.
For more information, check out the official Google Cloud announcement.