Yesterday, Facebook announced its contribution to MLPerf, a benchmark suite for measuring AI training and inference speed. Facebook also announced that it is open-sourcing Mask R-CNN2Go, a leading-edge computer vision model optimized for embedded and mobile devices.
Why is Facebook supporting MLPerf?
MLPerf aims to build a common set of industry-wide benchmarks for measuring the system-level performance of machine learning frameworks, cloud platforms, and hardware accelerators. The benchmark suite covers application use cases such as object detection, image classification, and speech-to-text translation. With industry-standard ML models and benchmarks, researchers and engineers can evaluate and demonstrate the impact of their work.
How will this collaboration prove to be beneficial?
The Facebook team and the MLPerf Edge Inference working group are working together to provide benchmark references, trained on open-source data sets, for the Edge Inference category. For the image classification use case, they will provide an implementation of the state-of-the-art ShuffleNet model. For the pose estimation use case, they will provide an implementation of the Mask R-CNN2Go model. Representative benchmarks for edge inference use cases will be defined to characterize the performance bottlenecks of on-device inference execution.
The human detection and segmentation model is based on Mask R-CNN, a simple, flexible, and general framework for object detection and segmentation. Mask R-CNN detects objects in an image while predicting key points and generating a segmentation mask for each object. To run Mask R-CNN models in real time on mobile devices, researchers and engineers from the Camera, AML, and Facebook AI Research (FAIR) teams came together and built a lightweight framework, Mask R-CNN2Go.
Mask R-CNN2Go forms the basis of on-device ML use cases such as person segmentation, object detection, classification, and body pose estimation, enabling accurate, real-time inference. It is designed and optimized primarily for mobile devices and is used to create entertaining experiences such as hand tracking in the “Control the Rain” augmented reality (AR) effect in Facebook Camera.
The Mask R-CNN2Go model consists of the following components:
- Trunk model: It contains multiple convolutional layers that generate deep feature representations of the input image.
- Region proposal network (RPN): It proposes candidate objects at predefined scales and aspect ratios.
- Detection head: It contains a set of pooling, convolution, and fully-connected layers. The detection head refines the bounding box coordinates and groups neighboring boxes with non-max suppression.
- Key point head: It predicts a mask for each predefined key point on the body.
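Two of the steps above, proposing boxes at predefined scales and aspect ratios and merging neighboring boxes with non-max suppression, can be illustrated with a small sketch. This is not Facebook's implementation. It is a minimal pure-Python version of the standard techniques, and the function names (`make_anchors`, `iou`, `non_max_suppression`), the `(x1, y1, x2, y2)` box layout, and the 0.5 overlap threshold are all assumptions chosen for illustration.

```python
from itertools import product


def make_anchors(center_x, center_y, scales, aspect_ratios):
    """Hypothetical helper: one (x1, y1, x2, y2) box per scale/ratio pair.

    Assumes ratio = width / height and that each box has area scale ** 2.
    """
    anchors = []
    for scale, ratio in product(scales, aspect_ratios):
        width = scale * ratio ** 0.5
        height = scale / ratio ** 0.5
        anchors.append((center_x - width / 2, center_y - height / 2,
                        center_x + width / 2, center_y + height / 2))
    return anchors


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it.

    Returns the indices of the kept boxes, highest score first.
    The 0.5 default threshold is an assumption, not a value from the article.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

For example, given two heavily overlapping boxes and one distant box, `non_max_suppression` keeps the higher-scoring box of the overlapping pair plus the distant one. A production model would run these steps on tensors inside the network rather than on Python lists.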
Currently, Mask R-CNN2Go runs on Caffe2, but it might soon run on PyTorch 1.0, as that framework is adding capabilities to give developers a seamless path from research to production.
Read more about this news in the post by Facebook.