R Soc Open Sci. 2021 Jul 28;8(7):202321.
doi: 10.1098/rsos.202321. eCollection 2021 Jul.

On the impact of publicly available news and information transfer to financial markets

Metod Jazbec et al. R Soc Open Sci. 2021.

Abstract

We quantify the propagation and absorption of large-scale publicly available news articles from the World Wide Web to financial markets. To extract publicly available information, we use the news archives from the Common Crawl, a non-profit organization that crawls a large part of the web. We develop a processing pipeline to identify news articles associated with the constituent companies in the S&P 500 index, an equity market index that measures the stock performance of US companies. Using machine learning techniques, we extract sentiment scores from the Common Crawl News data and employ tools from information theory to quantify the information transfer from public news articles to the US stock market. Furthermore, we analyse and quantify the economic significance of the news-based information with a simple sentiment-based portfolio trading strategy. Our findings support the claim that information in publicly available news on the World Wide Web has a statistically and economically significant impact on events in financial markets.

Keywords: complex systems; financial markets; machine learning; sentiment analysis; transfer entropy.

Figures

Figure 1.
The pipeline deployed to process and transform the Common Crawl News dataset into the dataset used by the sentiment model. Each box represents one stage of the pipeline where data transformation and filtering steps are applied. The numbers next to the arrows show how many articles are passed on from one stage to the next. The percentages in the brackets after each filtering step show the proportion of articles removed in that specific step.
Figure 2.
Process chart of the sentiment model. The two assumptions underlying the sentiment model are depicted in the middle. The data used in fitting the model is shown at the top. We apply the predicted sentiment scores (bottom-left corner) to analyse transfer entropy and simulate several simple trading strategies.
Figure 3.
Summary of the news dataset used in this article. (a) The most frequently mentioned companies as measured by the number of distinct articles. (b) The most frequent news sources as measured by the number of distinct articles associated with each source. (c) The median number of articles published per company and month. The companies are divided into top and bottom halves by the total number of articles published about them. The shaded regions span the 25th to 75th percentiles of each half.
Figure 4.
(a) Companies and corresponding significant Shannon transfer entropy (and effective transfer entropy) from hourly sentiment score differences to hourly price returns. The unit of transfer entropy is bits (logarithm with base 2), corresponding to the reduction of the average optimal code length needed to encode stock returns with lagged sentiment. Transfer entropy was calculated for the period from January 2018 to February 2020 using time series of hourly returns from 9.30 to 15.30 Eastern Time and corresponding lagged average sentiment scores. The statistical significance (p-value < 0.01) of transfer entropy was estimated with 300 bootstrap samples and 100 shuffles to obtain the effective transfer entropy. (b) Box and whisker plots of estimated distributions of the p-values for selected company tickers. The box and whisker plots show Q1, median, Q3, minimum, maximum and estimated outliers.
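
The quantities named in the Figure 4 caption can be illustrated with a minimal sketch. It assumes that both series have already been discretized into a small set of symbols, and that the lag and history length are both one; the article's actual binning, lag choice and bootstrap procedure are not stated on this page. The function names are hypothetical. Transfer entropy is estimated from plug-in frequencies, and the effective transfer entropy subtracts the average estimate obtained from shuffled copies of the source series.

```python
from collections import Counter
import math
import random


def transfer_entropy(source, target):
    """Plug-in Shannon transfer entropy (bits) from source to target.

    Assumes discrete symbols, lag 1 and history length 1 (an
    illustrative choice, not necessarily the article's setup).
    """
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("series must have equal length of at least 2")
    # Each triple is (target at t+1, target at t, source at t).
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_hist_src = Counter((y0, x0) for _, y0, x0 in triples)
    c_next_hist = Counter((y1, y0) for y1, y0, _ in triples)
    c_hist = Counter(y0 for _, y0, _ in triples)
    te = 0.0
    for (y1, y0, x0), count in c_full.items():
        p_joint = count / n
        p_given_both = count / c_hist_src[(y0, x0)]
        p_given_hist = c_next_hist[(y1, y0)] / c_hist[y0]
        te += p_joint * math.log2(p_given_both / p_given_hist)
    return te


def effective_transfer_entropy(source, target, n_shuffles=100, seed=0):
    """Raw transfer entropy minus the mean over shuffled source series."""
    rng = random.Random(seed)
    raw = transfer_entropy(source, target)
    shuffled = list(source)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroys the time ordering of the source
        total += transfer_entropy(shuffled, target)
    return raw - total / n_shuffles
```

Shuffling the source removes its temporal relation to the target, so the subtracted term approximates the small-sample bias of the estimator. The bootstrap significance test mentioned in the caption is not covered by this sketch.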
Figure 5.
Cumulative returns of trading strategies and benchmarks. ‘Day 1’ represents the cumulative returns of the Day 1 sentiment strategy based on the Common Crawl News dataset from January 2018 to February 2020. SPY is the SPDR S&P 500 trust. ‘Random’ denotes the average of the random strategies along with 1 s.d. confidence bands obtained from 500 simulations. ‘Day 0’ and ‘Day −1’ are the ‘look-ahead’ sentiment strategies relying on future information.
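
The cumulative-return curves and the 'Random' band in Figure 5 can be illustrated with a short sketch. The rule used here for a random strategy (each period, go long one randomly chosen asset and short another with equal weight) is an assumption made for illustration only; this page does not specify how the article's random strategies are constructed. The function names and the input layout are hypothetical.

```python
import random
import statistics


def cumulative_returns(period_returns):
    """Compound simple per-period returns into a cumulative return path."""
    path = []
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
        path.append(growth - 1.0)
    return path


def random_strategy_band(asset_returns, n_simulations=500, seed=0):
    """Mean and 1 s.d. of cumulative returns over random strategies.

    asset_returns[t][i] is the simple return of asset i in period t.
    The long/short random rule below is an illustrative assumption.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_simulations):
        strategy_returns = []
        for row in asset_returns:
            long_i, short_i = rng.sample(range(len(row)), 2)
            strategy_returns.append(0.5 * (row[long_i] - row[short_i]))
        paths.append(cumulative_returns(strategy_returns))
    # Aggregate across simulations, period by period.
    mean = [statistics.fmean(col) for col in zip(*paths)]
    sd = [statistics.stdev(col) for col in zip(*paths)]
    return mean, sd
```

Plotting `mean` together with `mean` plus and minus `sd` gives a band of the kind described in the caption, against which a sentiment strategy's cumulative returns can be compared.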
