PLoS One. 2022 Jul 19;17(7):e0270872.
doi: 10.1371/journal.pone.0270872. eCollection 2022.

A protocol to gather, characterize and analyze incoming citations of retracted articles

Ivan Heibi et al. PLoS One. 2022.

Abstract

In this article, we present a methodology which takes as input a collection of retracted articles, gathers the entities citing them, characterizes such entities according to multiple dimensions (disciplines, year of publication, sentiment, etc.), and applies a quantitative and qualitative analysis on the collected values. The methodology is composed of four phases: (1) identifying, retrieving, and extracting basic metadata of the entities which have cited a retracted article, (2) extracting and labeling additional features based on the textual content of the citing entities, (3) building a descriptive statistical summary based on the collected data, and finally (4) running a topic modeling analysis. The goal of the methodology is to generate data and visualizations that help in understanding possible behaviors related to retraction cases. We present the methodology in a structured step-by-step form following its four phases, discuss its limits and possible workarounds, and list the planned future improvements.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. A graphical schema representing the methodology in its four phases (from left to right): (1) identifying, retrieving, and characterizing the citing entities, (2) defining additional features based on the contents of the citing entities, (3) building a descriptive statistical summary, and (4) applying a topic modeling (TM) analysis.
Fig 2. The decision model for the selection of a CiTO citation function to use for the annotation of the citation intent of an examined in-text citation based on its context.
The first large row contains the three macro-categories: (1) “Reviewing …”, (2) “Affecting …”, and (3) “Referring …”. Each macro-category has at least two subcategories, and each subcategory refers to a set of citation functions. The first row defines the suitable citation functions for it with the help of a guiding sentence to be completed according to the chosen sub-category and citation function.
Fig 3. A graphical representation for the distribution of the citing entities in the PERIOD-SET.
The graphic has two different versions sketched according to the categories of the retracted articles, i.e. RET-A (A) and RET-B (B). It also highlights the citing entities that have/have not mentioned the retraction, along with the citing entities that do not have an accessible full text.
Fig 4. A pie chart used to represent the distribution of the citing entities across the subject areas.
The chart shows the 10 most representative subject areas and groups the rest under the “Other subject areas” category. The graphic also shows the absolute number of entities for each category, along with the percentages of entities which have/have not mentioned the retraction.
Fig 5. A graphical representation for the distribution of the in-text citations in the PERIOD-SET.
The periods P0, P2, and P4 are split in fifths, while P1 and P3 are each represented using one slice. The graphic has two different versions based on the categories of the retracted articles, i.e. RET-A (at the top) and RET-B (at the bottom). The graphic also highlights the neutral, negative and positive in-text citations.
Fig 6. A horizontal bar chart representing the distribution of the in-text citations according to their citation functions.
The graphic highlights the negative/neutral/positive percentages of in-text citations and mentions, between brackets, the total number of in-text citations annotated with such a citation function. The length (total percentage value) of the bars is in relation to the total number of in-text citations for the period in the PERIOD-SET shown by the graph.
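The percentages described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sample annotations, the function name `sentiment_shares`, and the citation function labels are hypothetical and chosen only to show how each bar segment relates to the total number of in-text citations.

```python
from collections import Counter, defaultdict

# Hypothetical annotated in-text citations: (citation function, sentiment).
# The labels are CiTO-style names used only for illustration.
annotations = [
    ("cites as evidence", "positive"),
    ("cites as evidence", "neutral"),
    ("critiques", "negative"),
    ("critiques", "negative"),
    ("obtains background from", "neutral"),
]

SENTIMENTS = ("negative", "neutral", "positive")

def sentiment_shares(rows):
    """Return {function: (count, {sentiment: % of all in-text citations})}."""
    total = len(rows)
    per_function = defaultdict(Counter)
    for function, sentiment in rows:
        per_function[function][sentiment] += 1
    summary = {}
    for function, counts in per_function.items():
        # Each segment is a share of ALL in-text citations, so the segments
        # of one bar add up to that function's share of the total.
        shares = {s: 100.0 * counts[s] / total for s in SENTIMENTS}
        summary[function] = (sum(counts.values()), shares)
    return summary

summary = sentiment_shares(annotations)
# Print the functions from the longest bar to the shortest.
for function, (count, shares) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{function} ({count}): {shares}")
```

Expressing every segment as a share of all in-text citations, rather than of the citations within one function, keeps the bar lengths comparable across citation functions.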
Fig 7. A horizontal bar chart representing the distribution of the in-text citations according to the section in which they appear.
The graphic highlights the percentages of negative/neutral/positive in-text citations and mentions, between brackets, the total number of in-text citations annotated with such a section. The length (total percentage value) of the bars is in relation to the total number of in-text citations for the period in the PERIOD-SET shown by the graph and is used to sort the values of the sections.
Fig 8. A plot example of the coherence score of different LDA topic models built using a variable number of topics, from 1 to 40.
The orange line is the average value, and it plateaus around 22–23 topics.
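One way to locate such a plateau programmatically is sketched below. The caption only reports that the average line plateaus; the tolerance-based heuristic, the function names, and the toy scores here are assumptions made for illustration and are not taken from the article.

```python
from statistics import mean

def average_coherence(runs):
    """Average the coherence score per topic count across several runs.

    `runs` is a list of dicts mapping number of topics -> coherence score;
    every run is assumed to cover the same topic counts.
    """
    topic_counts = sorted(runs[0])
    return {k: mean(run[k] for run in runs) for k in topic_counts}

def plateau_start(avg, tolerance=0.01):
    """Smallest topic count after which no single step raises the average
    coherence by more than `tolerance` (a hypothetical heuristic)."""
    ks = sorted(avg)
    if not ks:
        raise ValueError("no coherence scores given")
    for i, k in enumerate(ks):
        later_gains = [avg[b] - avg[a] for a, b in zip(ks[i:], ks[i + 1:])]
        if all(gain <= tolerance for gain in later_gains):
            return k  # always reached at the last topic count at the latest

# Toy scores for two runs and five topic counts (made-up values).
runs = [
    {1: 0.30, 2: 0.38, 3: 0.44, 4: 0.45, 5: 0.45},
    {1: 0.32, 2: 0.40, 3: 0.46, 4: 0.45, 5: 0.46},
]
best_k = plateau_start(average_coherence(runs), tolerance=0.01)
print(best_k)
```

Averaging over several runs before looking for the plateau reduces the effect of the run-to-run variation that LDA training typically shows.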
Fig 9. The LDAvis interface.
The left side of the visualization plots the topics in a two-dimensional plane whose centers are determined by computing the distance between topics. On the right side, LDAvis lists 30 terms ranked using the term saliency measure; if a specific topic is selected in the left graphic, the list shows instead the 30 terms ranked using the relevancy measure of that topic.
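For reference, the two ranking measures named in this caption are usually defined as below. These definitions come from the literature on which LDAvis is based rather than from this record, and the notation is chosen here: $P(w)$ is the overall probability of term $w$, $P(t \mid w)$ the probability of topic $t$ given $w$, $P(t)$ the overall probability of topic $t$, $\phi_{tw}$ the probability of $w$ within topic $t$, and $\lambda \in [0, 1]$ a weight set by the user.

```latex
\begin{align}
\operatorname{saliency}(w) &= P(w) \sum_{t} P(t \mid w) \, \log \frac{P(t \mid w)}{P(t)} \\
\operatorname{relevance}(w, t \mid \lambda) &= \lambda \log \phi_{tw} + (1 - \lambda) \log \frac{\phi_{tw}}{P(w)}
\end{align}
```

With $\lambda = 1$ the terms of a topic are ranked by their probability within that topic alone, while lower values of $\lambda$ favor terms that are more specific to the topic than to the corpus as a whole.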
Fig 10. The MTMvis interface.
On the left side, users can modify some visual and filtering parameters to dynamically change the main visualization. Each topic is colored differently. The chart plots the topics as a function of an established metadata attribute (X-axis values), e.g., the PERIOD-SET.
Fig 11. The MITAO workflow used for building an LDA topic model (rectangle A), generating the datasets (rectangle B), and generating the visualizations (rectangle C). The workflow takes two inputs: the documents and the metadata of the documents.
