The Facebook AI team announced a new tool called Aroma last week. Aroma is a code-to-code search and recommendation tool that uses machine learning (ML) to simplify the process of gaining insights from large codebases.
Aroma lets engineers find common coding patterns by running a search query, without having to browse through code snippets manually, which saves time in their development workflow. For example, a developer who has written a piece of code and wants to see how others have implemented similar functionality can run a search query to find similar code in related projects.
After the search query is run, the results are returned as code ‘recommendations’. Each recommendation is built from a cluster of similar code snippets found in the repository.
Aroma is more advanced than traditional code search tools. For instance, it performs the search on syntax trees: instead of looking for string-level or token-level matches, Aroma finds instances that are syntactically similar to the query code. It can then highlight the matching code by pruning away unrelated syntax structures.
How does Aroma work?
Aroma follows a three-step process to make code recommendations: feature-based search, re-ranking and clustering, and intersecting.
For feature-based search, Aroma indexes the code corpus as a sparse matrix. It parses each method in the corpus and then creates its parse tree. It further extracts a set of structural features from the parse tree of each method.
These features capture information about variable usage, method calls, and control structures. Finally, a sparse vector is created for each method according to its features. When a query arrives, it is converted into a sparse vector in the same way, and the top 1,000 method bodies whose dot products with the query vector are highest are retrieved as the candidate set for the recommendation.
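The retrieval step can be illustrated with a minimal sketch. It is not Aroma's implementation: it assumes Python's built-in ast module as a stand-in parser, uses a much cruder feature set than the one described above, and the helper names extract_features and top_candidates are hypothetical. With binary feature vectors, the dot product of two vectors is simply the number of features they share.

```python
import ast
import heapq

def extract_features(source: str) -> frozenset:
    """Collect simple structural features from a snippet's parse tree.

    Assumption: calls, control structures, and variable names are used
    as a rough approximation of the features the article describes.
    """
    features = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                features.add(f"call:{func.attr}")
            elif isinstance(func, ast.Name):
                features.add(f"call:{func.id}")
        elif isinstance(node, (ast.If, ast.For, ast.While, ast.Try, ast.With)):
            features.add(f"ctrl:{type(node).__name__}")
        elif isinstance(node, ast.Name):
            features.add(f"var:{node.id}")
    return frozenset(features)

def top_candidates(query: str, index: dict, k: int = 1000) -> list:
    """Return up to k (score, method_name) pairs, highest score first.

    `index` maps a method name to its feature set. Each feature set is a
    sparse binary vector, so the dot product is the intersection size.
    """
    q = extract_features(query)
    scored = ((len(q & feats), name) for name, feats in index.items())
    return heapq.nlargest(k, scored)

# Hypothetical two-method corpus, indexed once up front.
corpus = {
    "read_lines": "def read_lines(path):\n    with open(path) as f:\n"
                  "        for line in f:\n            print(line.strip())\n",
    "add": "def add(a, b):\n    return a + b\n",
}
index = {name: extract_features(src) for name, src in corpus.items()}
query = "with open(path) as f:\n    data = f.read()\n"
print(top_candidates(query, index, k=1))
```

In this toy corpus, the file-reading method ranks first because it shares the with statement and the open call with the query, while the arithmetic method shares nothing.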
In the re-ranking and clustering step, Aroma first re-ranks the candidate methods by their similarity to the query code snippet. Since the sparse vectors contain only abstract information about which features are present, the dot product score underestimates the actual similarity of a code snippet to the query. To correct for this, Aroma applies ‘pruning’ to the method syntax trees. Pruning discards the irrelevant parts of a method body and retains the parts that best match the query snippet. Aroma then re-ranks the candidate code snippets by their actual similarity to the query.
Next, Aroma runs an iterative clustering algorithm to find clusters of code snippets that are similar to each other and that contain extra statements useful for making code recommendations.
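The idea of pruning before re-ranking can be sketched as follows. This is a deliberately simplified illustration, not Aroma's algorithm: it prunes whole source lines rather than syntax subtrees, uses identifier tokens as a crude stand-in for structural features, and scores with cosine similarity over binary token sets. The function names tokens, prune, similarity, and rerank are all hypothetical.

```python
import math
import re

def tokens(text: str) -> set:
    """Identifier-like tokens in the text (a crude stand-in for features)."""
    return set(re.findall(r"[A-Za-z_]\w*", text))

def prune(method: str, query: str) -> list:
    """Keep only the lines of `method` that share a token with `query`."""
    q = tokens(query)
    return [line for line in method.splitlines() if tokens(line) & q]

def similarity(method: str, query: str) -> float:
    """Cosine similarity between the pruned method and the query."""
    q = tokens(query)
    kept = tokens("\n".join(prune(method, query)))
    if not q or not kept:
        return 0.0
    return len(q & kept) / math.sqrt(len(q) * len(kept))

def rerank(candidates: list, query: str) -> list:
    """Order candidate method bodies by similarity to the query, best first."""
    return sorted(candidates, key=lambda m: similarity(m, query), reverse=True)

query = "f = open(path)\ndata = f.read()"
load = ("def load(path):\n    f = open(path)\n    data = f.read()\n"
        "    log('done')\n    return data")
add = "def add(a, b):\n    return a + b"
print(rerank([add, load], query)[0] == load)
```

Because the unrelated logging line is pruned away before scoring, it does not drag down the similarity of an otherwise close match.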
In the intersecting step, one code snippet is first taken as the “base” code, and ‘pruning’ is applied to it iteratively with respect to every other method in the cluster. The code that remains after pruning is the code common to all methods in the cluster, and it becomes the code recommendation.
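The intersecting step can be sketched in a similarly simplified way. The assumptions here are significant: a "method" is treated as a list of source lines, two lines match only if they are identical after whitespace normalization, and the helpers normalize, prune_against, and intersect are hypothetical names. Aroma itself prunes syntax trees, which tolerates differences such as variable names that this line-based version does not.

```python
def normalize(line: str) -> str:
    """Collapse whitespace so indentation differences do not matter."""
    return " ".join(line.split())

def prune_against(base_lines: list, other: str) -> list:
    """Keep only the base lines that also appear in `other`."""
    other_lines = {normalize(line) for line in other.splitlines()}
    return [line for line in base_lines if normalize(line) in other_lines]

def intersect(cluster: list) -> str:
    """Prune the first snippet against every other snippet in the cluster.

    What survives is the code common to all snippets, which serves as
    the recommendation in this sketch.
    """
    if not cluster:
        return ""
    base, *others = cluster
    remaining = [line for line in base.splitlines() if line.strip()]
    for other in others:
        remaining = prune_against(remaining, other)
    return "\n".join(remaining)

# Hypothetical cluster of three similar snippets.
cluster = [
    "f = open(path)\ndata = f.read()\nprint(data)\nf.close()",
    "f = open(path)\ndata = f.read()\nf.close()\nreturn data",
    "log('start')\nf = open(path)\ndata = f.read()\nf.close()",
]
print(intersect(cluster))
```

Here the print, return, and log lines each appear in only one snippet, so they are pruned, and the open, read, and close sequence shared by all three remains.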
“We believe that programming should become a semiautomated task in which humans express higher-level ideas and detailed implementation is done by the computers themselves,” states the Facebook AI team.
For more information, check out the official Facebook AI blog.