PLoS One. 2015 Jun 17;10(6):e0128193.
doi: 10.1371/journal.pone.0128193. eCollection 2015.

Computational Fact Checking from Knowledge Networks

Giovanni Luca Ciampaglia et al. PLoS One. 2015.

Abstract

Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem, this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Using Wikipedia to fact-check statements.
(a) To populate the knowledge graph with facts we use structured information contained in the ‘infoboxes’ of Wikipedia articles (in the figure, the infobox of the article about Barack Obama). (b) Using the Wikipedia Knowledge Graph, computing the truth value of a subject-predicate-object statement amounts to finding a path between subject and object entities. In the diagram we plot the shortest path returned by our method for the statement “Barack Obama is a Muslim.” Numbers in parentheses indicate the degree of the nodes. The path traverses high-degree nodes representing generic entities, such as Canada, and is assigned a low truth value.
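
The idea in panel (b) can be illustrated with a minimal sketch. The caption does not give the exact proximity metric, so the code below makes an assumption: each intermediate node on a path costs the logarithm of its degree, and the truth value is 1 / (1 + total cost). The function name `truth_value`, the helper `build_graph`, and the toy graph are hypothetical and serve only to show why a path through a generic, high-degree node scores lower than a path through a specific one.

```python
import heapq
import math


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def truth_value(graph, subject, obj):
    """Return (truth value, path) for the cheapest subject-to-object path.

    Assumed metric (not taken from the caption): entering an intermediate
    node costs log(degree); the object itself costs nothing.
    """
    if subject == obj:
        return 1.0, [subject]
    best = {subject: 0.0}
    prev = {}
    heap = [(0.0, subject)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == obj:
            break
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nb in graph[node]:
            step = 0.0 if nb == obj else math.log(len(graph[nb]))
            new_cost = cost + step
            if new_cost < best.get(nb, math.inf):
                best[nb] = new_cost
                prev[nb] = node
                heapq.heappush(heap, (new_cost, nb))
    if obj not in best:
        return 0.0, []  # no path: no support for the statement
    path = [obj]
    while path[-1] != subject:
        path.append(prev[path[-1]])
    path.reverse()
    return 1.0 / (1.0 + best[obj]), path


# Toy graph: "hub" is a generic node with many links, "s" is a specific one.
toy = build_graph([
    ("A", "hub"), ("hub", "B"), ("hub", "x1"), ("hub", "x2"), ("hub", "x3"),
    ("A", "s"), ("s", "B"),
])
value, path = truth_value(toy, "A", "B")
print(path, round(value, 3))
```

In the toy graph both candidate paths have two hops, but the one through `s` (degree 2) is preferred over the one through `hub` (degree 5), mirroring the caption's point that paths through generic entities give weak support.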
Fig 2. Ideological classification of the US Congress based on truth values.
(a) Ideological network of the 112th US Congress. The plot shows a subset of the WKG constituted by paths between Democratic or Republican members of the 112th US Congress and various ideologies. Red and blue nodes correspond to members of Congress, gray nodes to ideologies, and white nodes to vertices of any other type. The position of the nodes is computed using a force-directed layout [33], which minimizes the distance between nodes connected by an edge weighted by a higher truth value. For clarity only the most significant paths, whose values rank in the top 1% of truth values, are shown. (b) Ideological classification of members of the 112th US Congress. The plot shows on the x axis the party label probability given by a Random Forest classification model trained on the truth values computed on the WKG, and on the y axis the reference score provided by dw-nominate. Red triangles are members of Congress affiliated to the Republican party and blue circles to the Democratic party. Histograms and density estimates of the two marginal distributions, color-coded by actual affiliation, are shown on the top and right axes.
Fig 3. Automatic truth assessments for simple factual statements.
In each confusion matrix, rows represent subjects and columns represent objects. The diagonals represent true statements. Higher truth values are mapped to colors of increasing intensity. (a) Films winning the Oscar for Best Movie and their directors, grouped by decade of award (see the complete list in the S1 Text). (b) US presidents and their spouses, denoted by initials. (c) US states and their capitals, grouped by US Census Bureau-designated regions. (d) World countries and their capitals, grouped by continent.
Fig 4. Receiver Operating Characteristic for the multiple questions task.
For each confusion matrix depicted in Fig 3 we compute ROC curves where true statements correspond to the diagonal and false statements to off-diagonal elements. The red dashed line represents the performance of a random classifier.
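
As a rough illustration of this evaluation, the sketch below computes the area under the ROC curve from a square matrix of truth values, treating diagonal entries as true statements and off-diagonal entries as false ones. It uses the rank-based (Mann-Whitney) form of the AUC, which is a standard equivalent of integrating the ROC curve; the function name `auc_from_matrix` and the example matrix are hypothetical, not data from the paper.

```python
def auc_from_matrix(scores):
    """AUC for a square score matrix: diagonal = positives, off-diagonal = negatives.

    Equals the probability that a randomly chosen true statement scores
    higher than a randomly chosen false one (ties count as one half).
    """
    n = len(scores)
    pos = [scores[i][i] for i in range(n)]
    neg = [scores[i][j] for i in range(n) for j in range(n) if i != j]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical 3x3 matrix of truth values (rows: subjects, columns: objects).
example = [
    [0.90, 0.20, 0.10],
    [0.30, 0.80, 0.20],
    [0.10, 0.85, 0.40],
]
print(auc_from_matrix(example))
```

A value of 0.5 corresponds to the random classifier shown by the dashed line, and 1.0 means every true statement outscores every false one.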
Fig 5. Real-world fact-checking scenario.
(a) A document from the ground truth corpus. (b) Statement to fact-check: Did Steve Tesich graduate from Indiana University, Bloomington? This information is not present in the infobox, and thus it is not part of the WKG. (c) Annotations from five human raters. In this case, the majority of raters believe that the statement is true, and thus we consider it as such for classification purposes. (d) Receiver operating characteristic (ROC) curve of the classification for subject-predicate-object statements in which the predicate is “institution” (e.g., “Albert Einstein,” “institution,” “Institute for Advanced Studies”). A true positive rate above the false positive rate (dashed line), and correspondingly an area under the curve (AUC) above 0.5, indicate better than random performance. (e) ROC curve for statements with “degree” predicate (e.g., “Albert Einstein,” “degree,” “University Diploma”).

