PLoS One. 2015 Jun 17;10(6):e0128193.

doi: 10.1371/journal.pone.0128193. eCollection 2015.

Computational Fact Checking from Knowledge Networks

Giovanni Luca Ciampaglia¹, Prashant Shiralkar¹, Luis M Rocha², Johan Bollen¹, Filippo Menczer¹, Alessandro Flammini¹

Affiliations

¹ Center for Complex Networks and Systems Research, Indiana University, Bloomington, Indiana, United States of America.
² Center for Complex Networks and Systems Research, Indiana University, Bloomington, Indiana, United States of America; Instituto Gulbenkian de Ciencia, Oeiras, Portugal.

PMID: 26083336
PMCID: PMC4471100
DOI: 10.1371/journal.pone.0128193

Giovanni Luca Ciampaglia et al. PLoS One. 2015.

Erratum in

Correction: Computational Fact Checking from Knowledge Networks.
Ciampaglia GL, Shiralkar P, Rocha LM, Bollen J, Menczer F, Flammini A. PLoS One. 2015 Oct 27;10(10):e0141938. doi: 10.1371/journal.pone.0141938. eCollection 2015. PMID: 26505751.

Abstract

Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem, this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Fig 1. Using Wikipedia to fact-check statements.**
**(a)** To populate the knowledge graph with facts we use structured information contained in the ‘infoboxes’ of Wikipedia articles (in the figure, the infobox of the article about *Barack Obama*). **(b)** Using the Wikipedia Knowledge Graph, computing the truth value of a subject-predicate-object statement amounts to finding a path between subject and object entities. In the diagram we plot the shortest path returned by our method for the statement “*Barack Obama* is a *muslim*.” Numbers in parentheses indicate the degree of the nodes. The path traverses high-degree nodes representing generic entities, such as *Canada*, and is assigned a low truth value.
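The path-based scoring described in panel (b) can be illustrated with a minimal sketch. The caption only says that paths through high-degree, generic nodes receive low truth values; the specific weighting used here, 1 / (1 + sum of log-degrees of the intermediate nodes), is an assumption for illustration and is not stated in this summary. The function names `build_graph` and `truth_value` and the toy node labels are hypothetical.

```python
import heapq
import math


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def truth_value(graph, subject, obj):
    """Score a (subject, object) pair by its least 'generic' connecting path.

    ASSUMPTION: a path is penalised by the sum of log(degree) over its
    intermediate nodes, and the score is 1 / (1 + penalty). A direct edge
    has no intermediate nodes and therefore scores 1.0.
    """
    if subject not in graph or obj not in graph:
        return 0.0
    if subject == obj:
        return 1.0
    best = {subject: 0.0}
    heap = [(0.0, subject)]
    while heap:  # Dijkstra over non-negative node penalties
        cost, node = heapq.heappop(heap)
        if node == obj:
            return 1.0 / (1.0 + cost)
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nbr in graph[node]:
            # Entering the target is free; any other node costs log(degree).
            step = 0.0 if nbr == obj else math.log(len(graph[nbr]))
            new_cost = cost + step
            if new_cost < best.get(nbr, math.inf):
                best[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr))
    return 0.0  # no path between subject and object


# Toy graph (hypothetical labels): "specific" has degree 2, "hub" has degree 5.
toy = build_graph([
    ("s", "specific"), ("specific", "o1"),
    ("s", "hub"), ("hub", "o2"),
    ("hub", "x1"), ("hub", "x2"), ("hub", "x3"),
])
print(truth_value(toy, "s", "o1"))  # path via the low-degree node
print(truth_value(toy, "s", "o2"))  # path via the high-degree hub
```

In this toy graph the pair connected through the low-degree node scores higher than the pair connected through the hub, which mirrors the caption's point that a path through a generic entity such as *Canada* yields a low truth value.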

**Fig 2. Ideological classification of the US Congress based on truth values.**
**(a)** Ideological network of the 112th US Congress. The plot shows a subset of the WKG constituted by paths between Democratic or Republican members of the 112th US Congress and various ideologies. Red and blue nodes correspond to members of Congress, gray nodes to ideologies, and white nodes to vertices of any other type. The position of the nodes is computed using a force-directed layout [33], which minimizes the distance between nodes connected by an edge weighted by a higher truth value. For clarity only the most significant paths, whose values rank in the top 1% of truth values, are shown. **(b)** Ideological classification of members of the 112th US Congress. The plot shows on the x axis the party label probability given by a Random Forest classification model trained on the truth values computed on the WKG, and on the y axis the reference score provided by dw-nominate. Red triangles are members of Congress affiliated to the Republican party and blue circles to the Democratic party. Histograms and density estimates of the two marginal distributions, color-coded by actual affiliation, are shown on the top and right axes.

**Fig 3. Automatic truth assessments for simple factual statements.**
In each confusion matrix, rows represent subjects and columns represent objects. The diagonals represent true statements. Higher truth values are mapped to colors of increasing intensity. **(a)** Films winning the Oscar for Best Movie and their directors, grouped by decade of award (see the complete list in the S1 Text). **(b)** US presidents and their spouses, denoted by initials. **(c)** US states and their capitals, grouped by US Census Bureau-designated regions. **(d)** World countries and their capitals, grouped by continent.

**Fig 4. Receiver Operating Characteristic for the multiple questions task.**
For each confusion matrix depicted in Fig 3 we compute ROC curves where true statements correspond to the diagonal and false statements to off-diagonal elements. The red dashed line represents the performance of a random classifier.
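The evaluation in this caption can be sketched as follows, under the assumption that each matrix is square with true statements on the diagonal. Rather than tracing the full ROC curve, the sketch computes the area under it directly as the probability that a randomly chosen true statement scores higher than a randomly chosen false one, with ties counted as one half. The function name `diagonal_auc` and the example scores are hypothetical and are not results from the paper.

```python
def diagonal_auc(matrix):
    """Area under the ROC curve for a square matrix of truth values.

    Diagonal entries are treated as true statements (positives) and
    off-diagonal entries as false statements (negatives).
    """
    n = len(matrix)
    positives = [matrix[i][i] for i in range(n)]
    negatives = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    # Pairwise comparison: 1 for a correct ranking, 0.5 for a tie.
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up truth values for three subjects and three objects.
scores = [
    [0.90, 0.20, 0.10],
    [0.30, 0.80, 0.40],
    [0.20, 0.85, 0.60],
]
print(diagonal_auc(scores))
```

A value of 0.5 corresponds to the random classifier shown by the dashed line, and 1.0 means every true statement outranks every false one. For the made-up matrix above, 16 of the 18 true/false pairs are ranked correctly, giving an area of about 0.89.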

**Fig 5. Real-world fact-checking scenario.**
**(a)** A document from the ground truth corpus. **(b)** Statement to fact-check: *Did Steve Tesich graduate from Indiana University, Bloomington?* This information is not present in the infobox, and thus it is not part of the WKG. **(c)** Annotations from five human raters. In this case, the majority of raters believe that the statement is true, and thus we consider it as such for classification purposes. **(d)** Receiver operating characteristic (ROC) curve of the classification for subject-predicate-object statements in which the predicate is “institution” (e.g., “Albert Einstein,” “institution,” “Institute for Advanced Studies”). A true positive rate above the false positive rate (dashed line), and correspondingly an area under the curve (AUC) above 0.5, indicate better than random performance. **(e)** ROC curve for statements with “degree” predicate (e.g., “Albert Einstein,” “degree,” “University Diploma”).

References

1. Mendoza M, Poblete B, Castillo C. Twitter Under Crisis: Can We Trust What We RT? In: Proceedings of the First Workshop on Social Media Analytics SOMA’10. New York, NY, USA: ACM; 2010. p. 71–79.
2. Ratkiewicz J, Conover M, Meiss M, Gonçalves B, Flammini A, Menczer F. Detecting and Tracking Political Abuse in Social Media. In: Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media. Barcelona, Spain: AAAI; 2011.
3. Cranor LF, LaMacchia BA. Spam! Commun ACM. 1998 Aug;41(8):74–83. doi: 10.1145/280324.280336
4. Jagatic TN, Johnson NA, Jakobsson M, Menczer F. Social Phishing. Commun ACM. 2007 Oct;50(10):94–100. doi: 10.1145/1290958.1290968
5. Friggeri A, Adamic LA, Eckles D, Cheng J. Rumor Cascades. In: Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media. Ann Arbor, MI: AAAI; 2014.

Grants and funding

R01 LM011945/LM/NLM NIH HHS/United States