How can new machine learning capabilities enable the mining of stock documents for financial data?




One of the exciting new frontiers of machine learning and AI is the use of entirely new types of resources, such as text, to predict stock movements and investment outcomes. Scientists and engineers are exploring a variety of ways to do this. It is a potential game-changer in the financial world, one that could reshape investment strategies in profound ways.

One of the basic ideas behind this expanded stock research is computational linguistics, the computational modeling of natural language. Experts are investigating how to use text documents, from SEC filings to shareholder letters to other peripheral text-based resources, to augment or fine-tune existing stock analysis or to develop entirely new analyses.

The important caveat is that all of this has become feasible only through recent advances in neural networks, machine learning and natural language processing. Before these advances, computing technologies mostly relied on conventional, explicitly programmed logic to "read" inputs, and that logic expects structured data. Text documents were too unstructured to be useful. With the progress made in natural language processing over the last few years, however, scientists are finding that it is possible to "mine" natural language for quantifiable results, in other words, results that can be computed in some way.
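To make the idea of "quantifiable results" concrete, here is a minimal sketch of one of the simplest ways to turn text into numbers: a bag-of-words document-term matrix, in which each document becomes a vector of word counts. It assumes a crude regex tokenizer and two made-up sentences standing in for filing excerpts. The function names are hypothetical, and this is an illustration of the general idea, not the method used in any of the work cited in this article.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of letters (a deliberately crude tokenizer)
    return re.findall(r"[a-z]+", text.lower())

def document_term_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    # Shared, sorted vocabulary across all documents
    vocab = sorted(set().union(*counts))
    # Counter returns 0 for missing keys, so absent words become zeros
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

# Two made-up sentences standing in for filing excerpts (not real data)
docs = [
    "Revenue increased and margins improved.",
    "Revenue declined; litigation risk increased.",
]
vocab, matrix = document_term_matrix(docs)
print(vocab)  # ['and', 'declined', 'improved', 'increased', 'litigation', 'margins', 'revenue', 'risk']
for row in matrix:
    print(row)
# [1, 0, 1, 1, 0, 1, 1, 0]
# [0, 1, 0, 1, 1, 0, 1, 1]
```

Each row is one document and each column one vocabulary word, so even a modest corpus yields very wide, mostly-zero vectors. That is the "high dimensional" quality of text data that makes statistical modeling of it difficult.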

Some of the best evidence and most useful examples of this come from dissertations and other doctoral work available on the web. In a paper, "Applications of Machine Learning and Computational Linguistics in Financial Economics," published in April 2016, Lili Gao capably explains the processes involved in mining corporate SEC filings, shareholder calls, and social media messages.

"Extracting meaningful signals from unstructured and high dimensional text data is not an easy task," Gao writes. "However, with the development of machine learning and computational linguistic techniques, processing and statistically analyzing textual documents tasks can be accomplished, and many applications of statistical text analysis in social sciences have proven to be successful." Beyond the modeling and calibration Gao outlines in the abstract, the full document shows in detail how this type of analysis works.

Other sources on active projects include a GitHub project brief and an IEEE resource that discusses extracting valuable financial information through "Twitter sentiment analysis."
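Twitter sentiment analysis can be approached in many ways. As a hedged illustration only, the sketch below scores each message by counting words from two small positive and negative word lists, then averages the scores by day to form a simple time series that could be lined up against price moves. The word lists, messages, and function names are all hypothetical; the GitHub and IEEE projects mentioned above may use entirely different approaches, such as trained classifiers.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny word lists for illustration only
POSITIVE = {"beat", "bullish", "growth", "upgrade", "strong"}
NEGATIVE = {"miss", "bearish", "lawsuit", "downgrade", "weak"}

def message_score(text: str) -> float:
    """Net tone in [-1, 1]: (positive - negative) / (positive + negative)."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # Messages containing no lexicon words are treated as neutral
    return (pos - neg) / total if total else 0.0

def daily_sentiment(messages: list[tuple[str, str]]) -> dict[str, float]:
    """Average message scores per day; messages are (date, text) pairs."""
    by_day: dict[str, list[float]] = defaultdict(list)
    for day, text in messages:
        by_day[day].append(message_score(text))
    return {day: sum(scores) / len(scores) for day, scores in sorted(by_day.items())}

# Made-up messages, not real data
messages = [
    ("2016-04-01", "Strong quarter, earnings beat"),
    ("2016-04-01", "Analyst downgrade, but demand is strong"),
    ("2016-04-02", "Bullish on growth"),
]
print(daily_sentiment(messages))  # {'2016-04-01': 0.5, '2016-04-02': 1.0}
```

Word counting like this is transparent but brittle: it ignores negation, sarcasm, and context. Those weaknesses are part of why researchers turn to machine learning models for this kind of text.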

The bottom line is that these new NLP models are driving rapid innovation in the use of all sorts of text documents, not just for financial analysis but for other kinds of cutting-edge discovery, blurring the traditional line between "language" and "data."

Written by Justin Stoltzfus
Justin Stoltzfus is a freelance writer for various Web and print publications. His work has appeared in online magazines including Preservation Online, a project of the National Historic Trust, and many other venues.