Programming

Microsoft Bling introduces Fire: a Finite state machine and regular expression manipulation library

1 min read

A Microsoft team named Bling (Beyond Language Understanding) announced a Finite State machine and regular expression manipulation library called Fire, yesterday.

Fire has been developed to use in case of different linguistic operations inside Bing including Tokenization, Multi-word expression matching, Unknown word-guessing, and Stemming/Lemmatization among others.

Under Fire comes a tokenizer, which has been designed for fast-speed and quality tokenization of Natural Language text. Fire tokenization uses the tokenization logic of NLTK (Natural Language Toolkit), with an exception that hyphenated words can be split and only a few errors can be fixed. Also, when compared with other popular NLP libraries, Bling Fire becomes 10X faster speed in tokenization task.

The latest release of Bling Fire model is enabled to support most languages including East Asian (Chinese Simplified, Traditional, Japanese, Korean, Thai). The tokenizer’s high-level API is friendly to use from languages such as Python, Perl, C#, Java, etc. Also, the tokenizer has been designed in a way that it requires 0 zero configurations, or initialization, or additional files. The reason Tokenizer is very fast is because it makes use of deterministic finite state machines underneath.

In order to use the Bling Fire Library and Finite State Machine manipulation tools, the project can be built on Windows/Linux using CMake, which allows you to create your own tokenization/segmentation, stemming, etc. To use the Bling Fire Library in Python, users can install the release with the help of using: pip install blingfire

For more information, check out Bling Fire on GitHub.

Read Next

Microsoft reveals certain Outlook.com user accounts were hacked for months

Microsoft makes the first preview builds of Chromium-based Edge available for testing

Microsoft announces the general availability of Live Share and brings it to Visual Studio 2019

Natasha Mathur

Tech writer at the Packt Hub. Dreamer, book nerd, lover of scented candles, karaoke, and Gilmore Girls.

Share
Published by
Natasha Mathur

Recent Posts

Top life hacks for prepping for your IT certification exam

I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…

3 years ago

Learn Transformers for Natural Language Processing with Denis Rothman

Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…

3 years ago

Learning Essential Linux Commands for Navigating the Shell Effectively

Once we learn how to deploy an Ubuntu server, how to manage users, and how…

3 years ago

Clean Coding in Python with Mariano Anaya

Key-takeaways:   Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…

3 years ago

Exploring Forms in Angular – types, benefits and differences   

While developing a web application, or setting dynamic pages and meta tags we need to deal with…

3 years ago

Gain Practical Expertise with the Latest Edition of Software Architecture with C# 9 and .NET 5

Software architecture is one of the most discussed topics in the software industry today, and…

3 years ago