Data Brief. 2023 Feb 8;47:108960.
doi: 10.1016/j.dib.2023.108960. eCollection 2023 Apr.

HMPLMD: Handwritten Malayalam palm leaf manuscript dataset



B J Bipin Nair et al. Data Brief.

Abstract

Achieving high recognition rates on degraded documents such as palm leaf manuscripts relies primarily on document enhancement. In document enhancement, deep learning models have come to play a major role compared with non-deep-learning approaches such as thresholding methods. Because preparing ground truth data is a highly time-consuming task, making such data readily available for building deep learning models is of paramount importance. Ground truth preparation is especially complex for ancient documents, which suffer from degradations such as fungi, humidity, uneven illumination, discoloration, holes, cracks, and other damage. We propose a Handwritten Malayalam Palm Leaf Manuscript Dataset (HMPLMD), together with its ground truth data, to support advances in palm leaf image analysis. We use palm leaf manuscripts of the Kambaramayanam and the Jathakas for our experiments. The proposed ground truth samples of degraded palm leaves play a crucial role in building specialized deep/transfer learning models that address the challenges of binarization.

Keywords: Binarization; Ground truth; Malayalam; Photoshop; Sauvola.
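The abstract contrasts deep learning with thresholding methods, and the keywords list Sauvola. As a rough illustration of what such a thresholding baseline does, here is a minimal NumPy sketch of Sauvola-style local thresholding, where each pixel's threshold is T = m * (1 + k * (s / R - 1)), with m and s the mean and standard deviation of a local window. The function names, the integral-image implementation, and the default 25-pixel window are assumptions of this sketch; k = 0.5 and R = 128 are the values commonly used with this method for 8-bit grayscale images. It is not presented as the procedure the HMPLMD authors followed.

```python
import numpy as np


def local_mean_std(img: np.ndarray, window: int) -> tuple[np.ndarray, np.ndarray]:
    """Mean and standard deviation over a window x window neighbourhood,
    computed with integral images (summed-area tables)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    r = window // 2
    f = img.astype(np.float64)
    # Reflect-pad so that border pixels also get a full neighbourhood.
    p = np.pad(f, r, mode="reflect")
    # Integral images with a leading row/column of zeros: S[i, j] = sum(p[:i, :j]).
    s1 = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    s2 = np.pad(p * p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = f.shape

    def box(s: np.ndarray) -> np.ndarray:
        # Sum of p[y:y+window, x:x+window] for every output pixel (y, x).
        return (s[window:window + h, window:window + w]
                - s[:h, window:window + w]
                - s[window:window + h, :w]
                + s[:h, :w])

    n = window * window
    mean = box(s1) / n
    # Clamp tiny negative variances caused by floating-point error.
    var = np.maximum(box(s2) / n - mean ** 2, 0.0)
    return mean, np.sqrt(var)


def sauvola_binarize(gray: np.ndarray, window: int = 25,
                     k: float = 0.5, R: float = 128.0) -> np.ndarray:
    """Binarize an 8-bit grayscale image: ink -> 0, background -> 255."""
    mean, std = local_mean_std(gray, window)
    threshold = mean * (1.0 + k * (std / R - 1.0))
    return np.where(gray > threshold, 255, 0).astype(np.uint8)
```

Dark ink pixels map to 0 and background to 255. On real palm leaf images the window size would likely need tuning to the stroke width and leaf texture, and strong stains or holes can still survive as false ink, which is part of why learned models trained on ground truth are of interest.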


Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.

Figures

Fig. 1
(a) Malayalam vowels and consonants (b) Word level samples.
Fig. 2
(a) Sample 1 - Jathakas (b) Sample 2 - Jathakas (c) Sample 3 - Jathakas (d) Sample 1 - Kambaramayanam (e) Sample 2 - Kambaramayanam (f) Sample 3 - Kambaramayanam.
Fig. 3
Degradations in the documents.
Fig. 4
Dataset capturing setup.
Fig. 5
Samples of ground truth images (a) Original image (b) White-balanced image (c) Ground truth image – Final (d) Original image – Kambaramayanam (e) White-balanced image (f) Ground truth image – Final.
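Fig. 5 shows each original image passing through a white-balanced intermediate before the final ground truth image. The keywords suggest the authors worked in Photoshop; the sketch below instead uses the gray-world assumption (scale each RGB channel so its mean matches the mean over all channels), a standard automatic white-balance technique swapped in here purely for illustration. The function name and the clipping to the 0-255 range are assumptions of this sketch.

```python
import numpy as np


def gray_world_white_balance(rgb: np.ndarray) -> np.ndarray:
    """Gray-world white balance for an H x W x 3 uint8 RGB image."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB image")
    img = rgb.astype(np.float64)
    # Per-channel means over all pixels.
    channel_means = img.reshape(-1, 3).mean(axis=0)
    # Target: the average intensity across the three channels.
    gray = channel_means.mean()
    # Per-channel gains; guard against division by zero for empty channels.
    gains = gray / np.maximum(channel_means, 1e-6)
    balanced = img * gains
    # Clip back into the valid 8-bit range.
    return np.clip(balanced, 0, 255).round().astype(np.uint8)
```

On a yellowish-brown palm leaf the gray-world assumption only partly holds, so a manual or tool-assisted adjustment such as the one the keywords hint at may give cleaner results; the sketch just shows the kind of colour-cast removal the intermediate image represents.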

