PeerJ Comput Sci. 2023 Dec 15:9:e1465.
doi: 10.7717/peerj-cs.1465. eCollection 2023.

A deep learning based approach for extracting Arabic handwriting: applied calligraphy and old cursive

Saber Zerdoumi et al. PeerJ Comput Sci.

Abstract

This research presents a new method for separating offline Arabic text. The method finds the core splitter between the "Middle" and "Lower" zones by looking for sharp character degeneration in those zones. Apart from script localization and the essential feature of determining the direction in which a starting point is pointing, the baseline also serves as a delimiter for horizontal projections. The bottom half of the features is used to differentiate the modifiers in zones, whereas the top half is not. The method works best when the baseline can divide features into the lower zone and the middle zone in a complex pattern where the alphabet is hard to find, as in ancient scripts. It also performed well at distinguishing Arabic text, including calligraphy. The aim of the zoning system is to reduce the number of distinct element classes relative to the total number of alphabets used in Arabic cursive writing. The components are identified using the pixel value origin and center reign (CR) technique, combined with letter morphology, to achieve complete word-level identification. By using the upper baseline and lower baseline together, the proposed technique produces a consistent Arabic pattern, which is intended to improve identification rates by increasing the number of matches. For Mediterranean keywords (cities in Algeria and Tunisia), the reported accuracy for the Othmani and Arabic scripts is greater than 98.14 percent and 90.16 percent, respectively, based on 84 and 117 verses. The auditing method and the structure and software of the assessment section identified the major problems, a few of which are specifically highlighted.
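The abstract does not spell out how the baseline delimits the horizontal projection, so the following is only a minimal sketch of the classic projection-profile idea, not the authors' algorithm. It assumes a binary image given as a list of rows (1 = ink, 0 = background); the function names and the `ratio` threshold are hypothetical choices made for illustration.

```python
def horizontal_projection(image):
    """Count the ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def estimate_baselines(profile, ratio=0.5):
    """Return (upper, baseline, lower) row indices from a projection profile.

    Assumptions (not taken from the paper):
    - the baseline is the row with the most ink;
    - the upper and lower baselines are the first and last rows whose
      ink count reaches `ratio` times the peak.
    """
    peak = max(profile)
    baseline = profile.index(peak)
    dense = [i for i, count in enumerate(profile) if count >= ratio * peak]
    return dense[0], baseline, dense[-1]


def split_zones(image, upper, lower):
    """Split the image rows into (upper zone, middle zone, lower zone)."""
    return image[:upper], image[upper:lower + 1], image[lower + 1:]
```

Calling `estimate_baselines(horizontal_projection(image))` and passing the first and last index to `split_zones` yields three row bands; ascenders and upper modifiers fall in the first band, and descenders and lower modifiers in the last.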

Keywords: Pattern Recognition; Recognition.


Conflict of interest statement

Noor Zaman Jhanjhi is an Academic Editor for PeerJ. The authors declare there are no competing interests.

Figures

Figure 1. Diagram of the four main recognition steps for Arabic recognition systems.
Figure 2. Top row: original Arabic script (unlabeled). Bottom row: Arabic script with labels marking the upper baseline, baseline, and down line of the Arabic cursive writing.
Figure 3. Steps of zone segmentation-based windows segmentation approach.
0, original image; 1, detection of junction and end points; 2, sliding window; 2.1, detection of ligature (applying algorithms 1 and 2); 3, application of algorithm (baseline detection for splitting the image into upper, middle, and lower zones); 4, segments resulting from baseline detection.
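Step 1 of the caption (detection of junction and end points) is not detailed on this page. One common way to do it on a one-pixel-wide skeleton is the crossing number: the count of 0-to-1 transitions when walking once around a pixel's eight neighbours. The sketch below uses that heuristic as a stand-in; it is an assumption, not the paper's Algorithm 1 or 2, and all names are hypothetical.

```python
# Eight neighbours in clockwise order, starting from north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def pixel(skel, r, c):
    """Return the pixel value, treating everything outside the image as 0."""
    if 0 <= r < len(skel) and 0 <= c < len(skel[0]):
        return skel[r][c]
    return 0


def crossing_number(skel, r, c):
    """Count 0 -> 1 transitions around the 8-neighbour ring of (r, c)."""
    ring = [pixel(skel, r + dr, c + dc) for dr, dc in RING]
    return sum(1 for a, b in zip(ring, ring[1:] + ring[:1]) if a == 0 and b == 1)


def end_and_junction_points(skel):
    """Classify skeleton pixels: crossing number 1 = end point, >= 3 = junction."""
    ends, junctions = [], []
    for r, row in enumerate(skel):
        for c, value in enumerate(row):
            if not value:
                continue
            cn = crossing_number(skel, r, c)
            if cn == 1:
                ends.append((r, c))
            elif cn >= 3:
                junctions.append((r, c))
    return ends, junctions
```

The crossing number is used here instead of a plain neighbour count because pixels adjacent to a crossing can have three or more neighbours without being branch points themselves.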
Figure 4. Overall vision of proposed approach.
Figure 5. Complex zone segmentation for extracting from Othmani script: (A) original image; (B) detection of junction, end points and sliding window; (C) detection of ligature (application of algorithms 1 and 2).
The figure demonstrates the use of an advanced deep learning technique to extract complex segments from Othmani script. The three phases, A, B, and C, correspond to the original image, junction and end-point detection with a sliding window, and ligature identification using Algorithms 1 and 2, respectively. The framework performs zone segmentation, so that Arabic word characteristics can be perceived directly from calligraphy without an explicit representation of character proportions. It can process characters written in single or multiple strokes and handles cursive calligraphy independently. The strategy defines word boundaries either from specific shape input signals or by recognizing limits based on dialect attributes; it dissects calligraphy through discontinuity, division, function recognition, and dialect visualization, which may occur simultaneously. The input is Arabic script/Quran in the Othmani style; compressed features are extracted and converted into text values, which are used to authenticate Quranic verses with a string matching algorithm proposed by the research team. The recognition system interface (Output (string value)) distinguishes Arabic and Quranic text values, which is crucial for verse authentication and aligns with the research outlined in Hakak et al. (2017).
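The research team's string matching algorithm is not described on this page. As a rough illustration of the final authentication step only, the sketch below compares a recognised string against a list of reference verses after stripping diacritics. Normalised exact matching is a swapped-in stand-in, and `normalise`, `authenticate`, and `REFERENCE_VERSES` are hypothetical names.

```python
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation character, carries no meaning

# Hypothetical reference list; a real system would load the full verified text.
REFERENCE_VERSES = [
    "بِسْمِ اللَّهِ",
    "الْحَمْدُ لِلَّهِ",
]


def normalise(text):
    """Drop combining marks (harakat) and tatweel, and collapse whitespace."""
    stripped = "".join(
        ch for ch in text
        if not unicodedata.combining(ch) and ch != TATWEEL
    )
    return " ".join(stripped.split())


def authenticate(recognised, reference_verses):
    """Return the index of the matching reference verse, or None if no match."""
    target = normalise(recognised)
    for index, verse in enumerate(reference_verses):
        if normalise(verse) == target:
            return index
    return None
```

Stripping diacritics makes the comparison tolerant of recognisers that output bare letters, at the cost of treating differently vocalised strings as equal; a stricter check would compare the marks as well.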
Figure 6. Plot of SVM classification results of lower and upper zone modifiers.
Figure 7. The comparative training of successive improvements of entire word.

References

    1. Abdelaziz I, Abdou S, Al-Barhamtoshy H. A large vocabulary system for Arabic online handwriting recognition. Pattern Analysis and Applications. 2016;19:1129–1141. doi: 10.1007/s10044-015-0526-7.
    2. Al-Dmour A, Fraij F. Segmenting Arabic handwritten documents into text lines and words. International Journal of Advancements in Computing Technology. 2014;6(3):109–119.
    3. Al-Ma’adeed S, Elliman D, Higgins CA. A data base for Arabic handwritten text recognition research. Frontiers in Handwriting Recognition, 2002. Proceedings. Eighth International Workshop on. Piscataway; 2002. pp. 485–489.
    4. Ali A, Ahmad M, Rafiq N, Akber J, Ahmad U, Akmal S. Language independent optical character recognition for hand written text. Multitopic Conference, 2004. Proceedings of INMIC 2004. 8th International. Piscataway; 2004. pp. 79–84.
    5. Amin A, Al-Sadoun H, Fischer S. Hand-printed Arabic character recognition system using an artificial network. Pattern Recognition. 1996;29(4):663–675. doi: 10.1016/0031-3203(95)00110-7.
