
Developing Proprietary Models using Natural Language Processing and Machine Learning Strategies

Supplier Spotlight


Crux’s mission is to help data flow efficiently between data suppliers and data consumers, and we look to highlight major trends and developments impacting both parties. Today’s spotlight features Brain, which just released a new alternative dataset, “Brain Language Metrics on Company Filings,” based on Natural Language Processing analysis of 10-K and 10-Q reports for the largest US stocks. Our Q&A was conducted with Francesco Cricchio, PhD, and Matteo Campellone, PhD.

What is Brain?

Brain is a research company that develops proprietary signals and algorithms for investment strategies. Brain also supports clients in developing, optimizing and validating their own proprietary models.

The Brain platform includes Natural Language Processing (NLP) and Machine Learning (ML) infrastructures which enable clients to integrate state-of-the-art approaches into their strategies. All of our software is highly customizable to support the investment approach of our clients.

Our system incorporates alternative data and evaluates its relevance to financial models. 

What are some examples of signals that investors should know about?

Two good examples are the Brain Sentiment Indicator (BSI) and the Brain Machine Learning Stock Ranking (BSR).

The BSI is a sentiment indicator of global stocks produced by an automated engine that scans the financial news flow to gain a deeper understanding of the dynamic factors driving investor sentiment. This indicator relies on various NLP techniques to score financial news by company and extracts aggregated metrics on financial sentiment.
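As a rough illustration of the idea of scoring news by company and then aggregating, here is a minimal Python sketch. It is not Brain's engine: the tiny word lists, the scoring formula, and the function names `score_headline` and `aggregate_by_company` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy lexicon, for illustration only (not Brain's dictionary).
POSITIVE = {"beat", "growth", "upgrade", "record", "strong"}
NEGATIVE = {"miss", "lawsuit", "downgrade", "loss", "weak"}


def score_headline(text):
    """Score one headline in [-1, 1] as (pos - neg) / (pos + neg); 0.0 if no lexicon hits."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def aggregate_by_company(news):
    """Aggregate (ticker, headline) pairs into a mean score and item count per ticker."""
    scores = defaultdict(list)
    for ticker, headline in news:
        scores[ticker].append(score_headline(headline))
    return {
        ticker: {"mean_score": sum(vals) / len(vals), "n_items": len(vals)}
        for ticker, vals in scores.items()
    }


# Made-up tickers and headlines.
news = [
    ("AAA", "AAA posts record growth after strong quarter"),
    ("AAA", "Analysts downgrade AAA on weak guidance"),
    ("BBB", "BBB faces lawsuit over accounting loss"),
]
metrics = aggregate_by_company(news)
for ticker, m in sorted(metrics.items()):
    print(ticker, m)
```

A production system would likely weight sources, handle negation, and average over a time window; the sketch only shows the score-then-aggregate shape.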

The incorporation of BSI rankings helps clients build quantitative strategies that include both sentiment and short-term momentum indicators. On a longer time horizon, the application of BSI adds value to strategies that seek companies that are under- or over-priced due to very low or very high sentiment. 

The BSR is used to generate a daily stock ranking based on the predicted future returns of a universe of stocks for various time horizons. BSR relies on machine learning classifiers that non-linearly combine a variety of features, together with a series of techniques aimed at mitigating the well-known overfitting problem in financial data with a low signal-to-noise ratio. The model uses a dynamic universe that is updated each year to avoid survivorship bias.
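The dynamic-universe point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the membership table `UNIVERSE_BY_YEAR`, the tickers, and the predicted scores are invented, and `rank_stocks` is a hypothetical helper, not part of BSR. The classifier that would produce the scores is out of scope here.

```python
# Hypothetical point-in-time universe membership, refreshed once per year.
UNIVERSE_BY_YEAR = {
    2018: {"AAA", "BBB", "CCC"},
    2019: {"AAA", "CCC", "DDD"},  # BBB dropped out, DDD entered
}


def rank_stocks(predicted_scores, year, universe_by_year=UNIVERSE_BY_YEAR):
    """Rank only the stocks that belonged to the universe in the given year.

    predicted_scores maps ticker -> model score (higher is better).
    Returns ticker -> rank, where rank 1 is the highest score.
    Ties are broken alphabetically so the output is deterministic.
    """
    universe = universe_by_year[year]
    eligible = {t: s for t, s in predicted_scores.items() if t in universe}
    ordered = sorted(eligible, key=lambda t: (-eligible[t], t))
    return {ticker: position + 1 for position, ticker in enumerate(ordered)}


# Made-up model outputs for illustration.
scores = {"AAA": 0.61, "BBB": 0.48, "CCC": 0.55, "DDD": 0.70}
ranks_2018 = rank_stocks(scores, 2018)
ranks_2019 = rank_stocks(scores, 2019)
print(ranks_2018)
print(ranks_2019)
```

Using the membership as it stood in each year keeps stocks that later left the universe in the historical rankings, which is the point of avoiding survivorship bias in a backtest.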

Incorporating BSR enhances quantitative models and long/short strategies by adding a stock ranking that uses advanced ML techniques to non-linearly combine stock-specific market data with market regime indicators and calendar anomalies.

How are you different than other firms? 

We’ve developed a scientific and rigorous approach based on our years of research and our experience implementing statistical models in state-of-the-art software.

We try to be as rigorous as possible in our models. This is especially important when extracting information from financial time-series data, where the signal-to-noise ratio is very low and the risk of overfitting makes it hard to validate which signals are meaningful.

What are some use cases for your data?

We offer a two-fold solution for clients. Financial firms can combine our systematic signals (such as BSI) with their own proprietary signals or algorithms to create a more complete model and validate it through backtesting.

Alternatively, clients can come to us as a consultancy to support or validate a specific methodology or create a signal they can backtest for their hypotheses using ML or other advanced statistical techniques.

We also develop proprietary signals based on market and economic factors. One example of this is asset allocation models that try to capture risk-on and risk-off phases in the market.

What trends are you seeing in the market?

There are a number of providers of similar signals today, and we see NLP-based sentiment signals being increasingly adopted in the market. Some providers are moving towards offering integrated platforms built on their technology, often with graphical interfaces. Our differentiator is that we are focused on continuously enhancing our algorithms. When deploying our integrated solutions, there is a lot of value in customizing the product for each client.

Brain’s proprietary NLP algorithm applies semantic rules and dictionary-based approaches to financial news to calculate sentiment on stocks. Beyond traditional sentiment data, we have also developed other language metrics (such as language complexity in earnings calls or similarity of language in regulatory filings) to investigate how these metrics correlate with a company’s financial performance.
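To make "similarity of language in regulatory filings" concrete, the sketch below compares two text excerpts using cosine similarity over word counts. This is one common way to measure textual similarity and is an assumption for illustration; the interview does not say which similarity measure Brain uses. The sample sentences and the helper names `tokenize` and `cosine_similarity` are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts (0.0 to 1.0)."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text has no tokens
    return dot / (norm_a * norm_b)


# Made-up excerpts standing in for the same section of two consecutive filings.
previous_filing = "We face risks related to competition and regulation."
current_filing = "We face risks related to competition, regulation and litigation."
similarity = cosine_similarity(previous_filing, current_filing)
print(round(similarity, 3))
```

A low similarity between consecutive filings from the same company would flag that the language changed materially, which is the kind of metric one could then test against subsequent performance.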

Great, so who are your target clientele?

We have two main client groups. Large, global quant hedge funds look to us for our raw datasets. Other investment companies look to us for customized solutions, which we build on our platform and integrate for them.

What are your backgrounds? 

As the co-founders of Brain, we share a common background in physics and research. We have focused on nurturing this as a common thread throughout the team.

Matteo, Executive Chairman and Head of Research, worked as a theoretical physicist in the field of statistical mechanics of complex systems and non-linear stochastic equations. After receiving his PhD in physics and dedicating years to research, he obtained an MBA at IMD Business School. He then went on to work in various areas of finance, from financial engineering to risk management and investing.

Francesco, CEO and Chief Technology Officer, obtained a PhD in Computational Physics with a focus on solving complex computational problems with a wide range of techniques. He then focused on using ML methods and advanced statistics in the industrial space. Francesco’s technological know-how underpins the industrial machine learning solutions we deploy in our robust production environment.

What led you to work with Crux?

We believe that the partnership with Crux is a particularly good fit, since Brain develops alternative datasets based on NLP and ML techniques while Crux builds and manages pipelines from data suppliers to its cloud platform. We are excited that our datasets will be delivered effectively to clients without a separate integration procedure for each new client we onboard. We rely on the Crux platform to help us scale our products more efficiently.

Thanks so much! Really enjoyed chatting with you both.

About Supplier Spotlight Series


In our mission to help data flow efficiently between data suppliers and data consumers, we look to highlight major trends and developments impacting both parties. The ‘Supplier Spotlight’ series is an impactful content series focused on sharing the latest developments by suppliers and their datasets delivered by Crux.



To receive these updates, join our community.
