PLoS One. 2022 Jul 19;17(7):e0270872.

doi: 10.1371/journal.pone.0270872. eCollection 2022.

A protocol to gather, characterize and analyze incoming citations of retracted articles

Ivan Heibi¹ ², Silvio Peroni¹ ²

Affiliations

¹ Department of Classical Philology and Italian Studies, Research Centre for Open Scholarly Metadata, University of Bologna, Bologna, Italy.
² Department of Classical Philology and Italian Studies, Digital Humanities Advanced Research Centre (/DH.arc), University of Bologna, Bologna, Italy.

PMID: 35853087
PMCID: PMC9295990
DOI: 10.1371/journal.pone.0270872

Ivan Heibi et al. PLoS One. 2022.

Abstract

In this article, we present a methodology that takes as input a collection of retracted articles, gathers the entities citing them, characterizes those entities along multiple dimensions (disciplines, year of publication, sentiment, etc.), and applies a quantitative and qualitative analysis to the collected values. The methodology is composed of four phases: (1) identifying, retrieving, and extracting basic metadata of the entities which have cited a retracted article, (2) extracting and labeling additional features based on the textual content of the citing entities, (3) building a descriptive statistical summary based on the collected data, and finally (4) running a topic modeling analysis. The goal of the methodology is to generate data and visualizations that help in understanding possible behaviors related to retraction cases. We present the methodology in a structured step-by-step form following its four phases, discuss its limits and possible workarounds, and list planned future improvements.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1**
A graphical schema representing the methodology in its four phases (from left to right): (1) identifying, retrieving, and characterizing the citing entities, (2) defining additional features based on the citing entities' contents, (3) building a descriptive statistical summary, and (4) applying a topic modeling (TM) analysis.

**Fig 2. The decision model for the selection of a CiTO citation function to use for the annotation of the citation intent of an examined in-text citation based on its context.**
The first large row contains the three macro-categories: (1) “Reviewing …”, (2) “Affecting …”, and (3) “Referring …”. Each macro-category has at least two subcategories, and each subcategory refers to a set of citation functions. The first row defines the suitable citation functions for it with the help of a guiding sentence to be completed according to the chosen sub-category and citation function.

**Fig 3. A graphical representation for the distribution of the citing entities in the PERIOD-SET.**
The graphic has two versions, one for each category of retracted articles, i.e., RET-A (A) and RET-B (B). It also highlights the citing entities that have or have not mentioned the retraction, along with the citing entities that do not have an accessible full text.

**Fig 4. A pie chart used to represent the distribution of the citing entities across the subject areas.**
The chart shows the 10 most representative subject areas and groups the rest under the “Other subject areas” category. The graphic also shows the absolute number of entities for each category, along with the percentages of entities which have or have not mentioned the retraction.
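The grouping behind such a chart can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `summarize_subject_areas`, the input shape (pairs of subject area and a flag for whether the retraction is mentioned), and the sample data are all assumptions made for the example.

```python
from collections import Counter

def summarize_subject_areas(entities, top_n=10, other_label="Other subject areas"):
    """Group citing entities by subject area, keeping the top_n areas.

    `entities` is an iterable of (subject_area, mentions_retraction) pairs
    (a hypothetical input shape chosen for this sketch).
    """
    totals = Counter()
    mentioning = Counter()
    for area, mentions in entities:
        totals[area] += 1
        if mentions:
            mentioning[area] += 1

    # Areas outside the top_n most frequent ones are merged into one bucket.
    top_areas = {area for area, _ in totals.most_common(top_n)}
    rows = {}
    for area, count in totals.items():
        label = area if area in top_areas else other_label
        row = rows.setdefault(label, {"count": 0, "mentioning": 0})
        row["count"] += count
        row["mentioning"] += mentioning[area]

    # Percentage of entities in each category that mention the retraction.
    for row in rows.values():
        row["pct_mentioning"] = round(100 * row["mentioning"] / row["count"], 1)
    return rows

# Illustrative data only (not taken from the article).
sample = [
    ("Medicine", True), ("Medicine", False), ("Medicine", False),
    ("Biology", True), ("Biology", True),
    ("Physics", False),
]
summary = summarize_subject_areas(sample, top_n=2)
```

With `top_n=2`, the single "Physics" entity ends up in the "Other subject areas" bucket, mirroring how the pie chart collapses the less represented areas.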

**Fig 5. A graphical representation for the distribution of the in-text citations in the PERIOD-SET.**
The periods P0, P2, and P4 are split into fifths, while P1 and P3 are each represented as one slice. The graphic has two versions based on the categories of the retracted articles, i.e., RET-A (at the top) and RET-B (at the bottom). The graphic also highlights the neutral, negative, and positive in-text citations.

**Fig 6. A horizontal bar chart representing the distribution of the in-text citations according to their citation functions.**
The graphic highlights the negative/neutral/positive percentages of in-text citations and mentions, between brackets, the total number of in-text citations annotated with such a citation function. The length (total percentage value) of the bars is in relation to the total number of in-text citations for the period in the PERIOD-SET shown by the graph.
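As a rough sketch of the numbers behind such a bar chart, the snippet below computes, for one period, the share of in-text citations per citation function and its negative/neutral/positive breakdown, all relative to the period total. The function name, the input shape, and the sample annotations are assumptions for illustration; the citation function labels are borrowed from CiTO property names only as examples.

```python
from collections import Counter, defaultdict

SENTIMENTS = ("negative", "neutral", "positive")

def citation_function_distribution(in_text_citations):
    """Build one bar per citation function for a single period.

    `in_text_citations` is an iterable of (citation_function, sentiment)
    pairs (a hypothetical input shape chosen for this sketch).
    """
    per_function = defaultdict(Counter)
    for function, sentiment in in_text_citations:
        per_function[function][sentiment] += 1

    # Every percentage is relative to all in-text citations of the period.
    total = sum(sum(counts.values()) for counts in per_function.values())
    bars = []
    for function, counts in per_function.items():
        n = sum(counts.values())
        bars.append({
            "function": function,
            "count": n,  # the number shown between brackets
            "share": round(100 * n / total, 1),  # bar length
            "segments": {s: round(100 * counts[s] / total, 1) for s in SENTIMENTS},
        })
    bars.sort(key=lambda bar: bar["count"], reverse=True)
    return bars

# Illustrative annotations only (not taken from the article).
sample = [
    ("obtainsBackgroundFrom", "neutral"),
    ("obtainsBackgroundFrom", "neutral"),
    ("obtainsBackgroundFrom", "positive"),
    ("critiques", "negative"),
]
bars = citation_function_distribution(sample)
```

Expressing the segments as fractions of the period total, rather than of each function's own count, is one reading of the caption; it makes the segment lengths add up to the bar length.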

**Fig 7. A horizontal bar chart representing the distribution of the in-text citations according to the section in which they appear.**
The graphic highlights the percentages of negative/neutral/positive in-text citations and mentions, between brackets, the total number of in-text citations annotated with such a section. The length (total percentage value) of the bars is in relation to the total number of in-text citations for the period in the PERIOD-SET shown by the graph and is used to sort the values of the sections.

**Fig 8. A plot example of the coherence score of different LDA topic models built using a variable number of topics, from 1 to 40.**
The orange line is the average value, and it plateaus around 22–23 topics.
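In the figure the plateau is read off visually. A simple way to approximate that reading in code, assuming coherence scores from repeated runs are already available, is sketched below. The heuristic (a tolerance on the gain over the next few topic counts), the function names, and the sample scores are all assumptions for illustration and are not part of the described methodology.

```python
from statistics import mean

def average_coherence(runs):
    """Average the coherence scores of repeated runs.

    `runs` maps a number of topics to the list of coherence scores
    obtained for it; returns [(num_topics, mean_score)] sorted by num_topics.
    """
    return sorted((k, mean(scores)) for k, scores in runs.items())

def plateau_start(averages, tolerance=0.005, window=3):
    """Return the smallest number of topics after which the average
    coherence gains less than `tolerance` over each of the next `window`
    topic counts, or None if no such point exists."""
    for i in range(len(averages) - window):
        k, score = averages[i]
        later = [s for _, s in averages[i + 1 : i + 1 + window]]
        if all(s - score < tolerance for s in later):
            return k
    return None

# Illustrative scores only (not taken from the article).
runs = {
    1: [0.30, 0.32],
    2: [0.40, 0.42],
    3: [0.45, 0.47],
    4: [0.460, 0.462],
    5: [0.461, 0.463],
    6: [0.460, 0.464],
    7: [0.462, 0.462],
}
averages = average_coherence(runs)
best_k = plateau_start(averages)
```

The tolerance and window are tuning knobs; a visual check of the plotted average remains a useful sanity test for whatever value the heuristic returns.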

**Fig 9. The LDAvis interface.**
The left side of the visualization plots the topics in a two-dimensional plane whose centers are determined by computing the distance between topics. On the right side, LDAvis lists 30 terms ranked using the *term saliency* measure. If a specific topic is selected in the left graphic, the list instead shows the 30 terms ranked using the *relevancy* measure for that topic.

**Fig 10. The MTMvis interface.**
On the left side, users can modify some visual and filtering parameters to dynamically change the main visualization. Each topic is colored differently. The chart plots the topics as a function of an established metadata attribute (X-axis values), e.g., the PERIOD-SET.

**Fig 11**
The MITAO workflow used for building an LDA topic model (rectangle A) and generating the datasets (rectangle B) and the visualizations (rectangle C). The workflow takes two inputs: the documents and the metadata of the documents.
