Lang Resour Eval. 2023;57(1):415-448.
doi: 10.1007/s10579-021-09574-0. Epub 2022 Feb 2.

The ParlaMint corpora of parliamentary proceedings

Tomaž Erjavec et al. Lang Resour Eval. 2023.

Abstract

This paper presents the ParlaMint corpora, which contain transcriptions of sessions of 17 European national parliaments and total half a billion words. The corpora are uniformly encoded, contain rich metadata about 11 thousand speakers, and are linguistically annotated with named entities and with syntactic analyses following the Universal Dependencies formalism. Samples of the corpora and the conversion scripts are available from the project's GitHub repository. The complete corpora are openly available for download from the CLARIN.SI repository, and for online exploration and analysis through the NoSketch Engine and KonText concordancers and the Parlameter interface.

Keywords: Comparable corpora; Parliamentary proceedings; TEI.

Figures

Fig. 1. Encoding of the start of a corpus header
Fig. 2. Example of encoding of a legislature taxonomy category
Fig. 3. Example of parliament, political party and coalition/opposition encoding
Fig. 4. Example of a speaker encoding
Fig. 5. Example of encoding the start of a corpus component header
Fig. 6. Example of encoded text with speeches
Fig. 7. Example of a linguistically analysed text
