Nature. 2025 Sep;645(8079):141-147.
doi: 10.1038/s41586-025-09292-5. Epub 2025 Jul 23.

Contextualizing ancient texts with generative neural networks


Yannis Assael et al. Nature. 2025 Sep.

Abstract

Human history is born in writing. Inscriptions are among the earliest written forms, and offer direct insights into the thought, language and history of ancient civilizations. Historians capture these insights by identifying parallels (inscriptions with shared phrasing, function or cultural setting), which allow texts to be contextualized within broader historical frameworks and support key tasks such as restoration and geographical or chronological attribution [1]. However, current digital methods are restricted to literal matches and narrow historical scopes. Here we introduce Aeneas, a generative neural network for contextualizing ancient texts. Aeneas retrieves textual and contextual parallels, leverages visual inputs, handles arbitrary-length text restoration, and advances the state of the art in key tasks. To evaluate its impact, we conduct a large study with historians using outputs from Aeneas as research starting points. The historians find the parallels retrieved by Aeneas to be useful research starting points in 90% of cases, improving their confidence in key tasks by 44%. For restoration and geographical attribution, historians paired with Aeneas outperformed both historians and artificial intelligence working alone. For dating, Aeneas achieved a 13-year distance from ground-truth ranges. We demonstrate Aeneas' contribution to historical workflows through an analysis of key traits in the renowned Roman inscription Res Gestae Divi Augusti, showing how integrating science and the humanities can create transformative tools to assist historians and advance our understanding of the past.
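The abstract reports dating performance as a distance from ground-truth ranges. One plausible reading, assumed here and not stated in the abstract, is that a predicted year inside the ground-truth range scores zero and a prediction outside it scores its gap in years to the nearest bound. A minimal sketch of that assumed metric, averaged over a few inscriptions, follows; the function names and example dates are hypothetical.

```python
def range_distance(pred_year: int, lo: int, hi: int) -> int:
    """Years between a predicted date and a ground-truth range [lo, hi].

    Assumed metric: 0 if the prediction falls inside the range, otherwise
    the gap to the nearest bound. BCE years are negative integers, with no
    correction for the missing year 0.
    """
    if lo > hi:
        raise ValueError("range lower bound exceeds upper bound")
    if pred_year < lo:
        return lo - pred_year
    if pred_year > hi:
        return pred_year - hi
    return 0


def mean_range_distance(preds, ranges) -> float:
    """Average range_distance over paired predictions and (lo, hi) ranges."""
    if len(preds) != len(ranges) or not preds:
        raise ValueError("need equal-length, non-empty inputs")
    total = sum(range_distance(p, lo, hi) for p, (lo, hi) in zip(preds, ranges))
    return total / len(preds)


# Hypothetical predictions and ground-truth ranges for three inscriptions.
preds = [205, 120, -20]
ranges = [(200, 220), (100, 110), (-10, 14)]
print(mean_range_distance(preds, ranges))  # (0 + 10 + 10) / 3
```

Scoring any year inside the range as correct matches the fact that many inscriptions are dated only to an interval, not to a single year.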


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Restoration of a damaged inscription.
Fragment of a bronze military diploma from Sardinia, issued by the emperor Trajan to a sailor on a warship. 113/14 CE (CIL XVI, 60, The Metropolitan Museum of Art, Public Domain).
Fig. 2. Processing of a textual transcription by the Aeneas architecture.
Processing of the phrase Senatus populusque Romanus (‘The Senate and the people of Rome’) by Aeneas. Given the image and textual transcription of an inscription (with damaged sections of unknown length marked with ‘#’), Aeneas uses a transformer-based decoder (the torso) to process the text. Specialized networks (heads) handle character restoration, date attribution and geographical attribution (which also incorporates visual features). The torso’s intermediate representations are merged into a unified, historically enriched embedding to retrieve similar inscriptions from the LED, ranked by relevance. Photograph of the arch of Titus by T.S.
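The caption describes retrieving similar inscriptions from the LED by ranking them against a unified embedding. The sketch below shows a generic version of that ranking step. It assumes cosine similarity over fixed-size vectors and uses a small NumPy array as a stand-in corpus. Neither the similarity measure nor the names (`rank_parallels`, `corpus`) come from the paper.

```python
import numpy as np


def rank_parallels(query: np.ndarray, corpus: np.ndarray, k: int = 5):
    """Return indices and scores of the k corpus rows most similar to query.

    Assumption: cosine similarity between one query embedding (shape [d])
    and a matrix of corpus embeddings (shape [n, d]); no zero vectors.
    """
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity for each corpus row
    k = min(k, len(scores))
    top = np.argsort(-scores)[:k]       # highest similarity first
    return top, scores[top]


# Hypothetical 3-dimensional embeddings for four inscriptions.
corpus = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.0, 0.0])
idx, sims = rank_parallels(query, corpus, k=2)
print(idx)  # [0 2]
```

Normalising both sides first reduces cosine similarity to a dot product, so the whole corpus can be scored with a single matrix-vector product.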
Fig. 3. Aeneas’ hypotheses for attribution of the RGDA, aggregated across its 35 chapters.
The top-5 parallels retrieved by Aeneas were TM 262102, TM 558342, TM 224699, TM 535818 and TM 273657. Owing to length limitations, each chapter was processed individually. The resulting distributions were then averaged across all chapters. We report the maximum value from this averaged distribution, as it is less susceptible to noise arising from inter-chapter variance.
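The aggregation step in this caption produces one distribution per chapter, averages the distributions, and then reports the maximum. A minimal sketch follows. The chapter distributions are made-up placeholders over three hypothetical labels, not Aeneas outputs, and `aggregate_chapters` is an illustrative name.

```python
import numpy as np


def aggregate_chapters(chapter_probs, labels):
    """Average per-chapter distributions and report the top label.

    chapter_probs: shape [num_chapters, num_labels], each row summing to 1.
    Returns the label with the highest averaged probability and that value.
    """
    probs = np.asarray(chapter_probs, dtype=float)
    if not np.allclose(probs.sum(axis=1), 1.0):
        raise ValueError("each chapter row must be a probability distribution")
    mean = probs.mean(axis=0)           # averaging damps chapter-to-chapter noise
    best = int(np.argmax(mean))
    return labels[best], float(mean[best])


# Placeholder distributions for three chapters over three hypothetical labels.
labels = ["Province A", "Province B", "Province C"]
chapters = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.7, 0.2, 0.1],
])
print(aggregate_chapters(chapters, labels))
```

In this toy input, chapter 2 alone would favour Province B. After averaging, Province A wins. This shows why the averaged maximum is less sensitive to a single outlying chapter.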
Extended Data Fig. 1. Aeneas vs. T5 embeddings using UMAP.
UMAP visualisation of chronological and geographical attribution using Aeneas' historically enriched embeddings, compared with traditional T5 textual embeddings. Geographical labels were omitted because of their length; instead, a colour gradient encodes geographical coordinates: yellow for northern, red for western, green for eastern and blue for southern Roman provinces. (a) Chronological attribution, Aeneas; (b) chronological attribution, T5; (c) geographical attribution, Aeneas; (d) geographical attribution, T5.
Extended Data Fig. 2. Aeneas’ outputs for contextualising the inscription CIL XIII 6665 (TM 211813 = HD54789), a votive altar from Mainz (ancient Mogontiacum, in the province of Germania superior), dated 15 July 211 CE.
For this inscription, Aeneas provides: (a) the retrieved textual and contextual parallels; (b) the chronological attribution predictions; (c) the geographical attribution predictions; (d) the image saliency map for geographical attribution; (e) the textual saliency map for geographical attribution; (f) the restoration hypotheses for a lacuna of unknown length. Photograph of inv. no. S 553, courtesy of GDKE-Landesmuseum Mainz (ph. Ursula Rudischer). Map reproduced from CartoDB basemaps under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license.
Extended Data Fig. 3. Geographical attribution performance analysis (LED test set).
Geographical attribution accuracy per Roman province (LED test set). Provinces with no inscriptions in the test set are left empty.
Extended Data Fig. 4. Geographical attribution: distribution of inscriptions per province (LED training set).
Extended Data Fig. 5. Chronological attribution performance analysis (LED test set).
Chronological attribution: date loss per decade (LED test set). Decades with no inscriptions in the test set are left empty.
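A per-decade breakdown like this one groups test inscriptions by decade and averages their date loss within each group. Decades without test inscriptions stay empty. The sketch below assumes two things: each inscription is binned by a single reference year (for example, the lower bound of its ground-truth range), and the losses are already computed. Both choices and the function name are illustrative, not the paper's procedure.

```python
from collections import defaultdict


def loss_per_decade(years, losses, start: int, end: int):
    """Mean loss per decade over [start, end), with None for empty decades.

    years: one reference year per inscription (BCE as negative integers).
    losses: the date loss for each inscription, in years.
    start should be a multiple of 10; years outside [start, end) are ignored.
    """
    buckets = defaultdict(list)
    for year, loss in zip(years, losses):
        decade = (year // 10) * 10      # floor division puts -5 in the decade starting at -10
        buckets[decade].append(loss)
    result = {}
    for decade in range(start, end, 10):
        vals = buckets.get(decade)
        result[decade] = sum(vals) / len(vals) if vals else None
    return result


# Hypothetical test inscriptions: reference year and date loss in years.
years = [-5, 3, 8, 27]
losses = [12.0, 4.0, 6.0, 20.0]
print(loss_per_decade(years, losses, start=-10, end=30))
# {-10: 12.0, 0: 5.0, 10: None, 20: 20.0}
```

Python's floor division rounds towards negative infinity. This keeps BCE years in the correct decade bin without special-casing negative values.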
Extended Data Fig. 6. Chronological attribution: distribution of inscriptions per decade (LED training set).

References

    1. Robert, L. in Les Épigraphies et l’Épigraphie Grecque et Romaine (ed. Samaran, C.) 453–497 (Gallimard, 1961).
    2. Panciera, S. What is an inscription? Problems of definition and identity of an historical source. Z. Papyrol. Epigr. 183, 1–10 (2012).
    3. Bodel, J. in Epigraphic Culture and the Epigraphic Mode (eds Benefiel, R. & Keesling, C.) 11–44 (Brill, 2023).
    4. Alföldy, G. Il futuro dell’epigrafia. in XI Congresso Internazionale di Epigrafia Greca e Latina 87–102 (Edizioni Quasar, 1999).
    5. Cooley, A. The Cambridge Manual of Latin Epigraphy (Cambridge Univ. Press, 2012).
