KILDA: identifying KIV-2 repeats from kmers
- PMID: 40453649
- PMCID: PMC12123407
- DOI: 10.1093/nargab/lqaf070
Abstract
High concentration of lipoprotein(a) [Lp(a)], a lipoprotein with proatherogenic properties, is an important risk factor for cardiovascular disease. This concentration is mostly genetically determined by a complex interplay between the number of kringle IV type 2 repeats and Lp(a)-affecting variants. Besides Lp(a) plasma concentration, there is an unmet need to identify individuals most at risk based on their LPA genotype. We developed KILDA (KIv2 Length Determined from a kmer Analysis), a Nextflow pipeline, to identify the number of kringle IV type 2 repeats and Lp(a)-affecting variants directly from kmers generated from FASTQ files. The pipeline was tested on the 1000 Genomes Project (n = 2459) and results were equivalent to DRAGEN-LPA (R² = 0.92). In silico datasets proved the robustness of KILDA's predictions under different scenarios of sequencing coverage and quality. In brief, KILDA is a robust, open-source, and free-to-use pipeline that can identify the number of kringle IV type 2 repeats and Lp(a)-associated variants even when inputting low-coverage libraries.
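The general idea of estimating a repeat copy number from kmer counts can be illustrated with a minimal Python sketch. This is not KILDA's implementation: the abstract does not specify the formula, so the depth-ratio estimate below is an assumption, and the function names (`count_kmers`, `estimate_copy_number`), the `ploidy` parameter, and the toy kmers are hypothetical. The sketch assumes one set of kmers specific to the repeat unit and one set drawn from a single-copy normalisation region, and it ignores reverse complements and sequencing errors.

```python
from collections import Counter
from statistics import mean


def count_kmers(reads, k):
    """Count every k-length substring across a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_copy_number(counts, repeat_kmers, norm_kmers, ploidy=2):
    """Assumed estimator: ploidy * (mean repeat-kmer depth / mean single-copy depth).

    The result is the total number of repeat copies summed over all alleles.
    """
    repeat_depth = mean(counts.get(km, 0) for km in repeat_kmers)
    norm_depth = mean(counts.get(km, 0) for km in norm_kmers)
    if norm_depth == 0:
        raise ValueError("no coverage over the normalisation kmers")
    return ploidy * repeat_depth / norm_depth


if __name__ == "__main__":
    # Toy counts: the repeat kmer is seen 20x more often than the single-copy kmer.
    toy_counts = Counter({"AAAAA": 40, "CCCCC": 2})
    print(estimate_copy_number(toy_counts, ["AAAAA"], ["CCCCC"]))
```

Because the estimate is a ratio of depths, it is largely independent of the absolute sequencing coverage, which is one plausible reason a kmer-based approach could remain usable on low-coverage libraries.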
© The Author(s) 2025. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
Conflict of interest statement
None declared.