Application of the LDA Model for Topic Identification: A Case Study on Dengue Diagnosis

Authors

Portilla-Yela, J., Palomino-Montezuma, A. F., Manotas-Duque, D. F., & Tovar-Cuevas, J. R.

DOI:

https://doi.org/10.15359/ru.39-1.19

Keywords:

Text mining, Statistical learning, Text processing, Text recognition, LDA Model, Dengue

Abstract

[Objective]: This research aimed to identify and analyze topics in the scientific literature on dengue, with a focus on diagnosis, signs, and symptoms, using the Latent Dirichlet Allocation (LDA) model. [Methodology]: Articles published between 2000 and 2024 were collected from several databases: VHL, Web of Science, Ovid, Scopus, PubMed, Health & Medical, ScienceDirect, and Google Scholar. The search equation combined key terms such as “dengue,” “signs,” “symptoms,” and “diagnosis” with MeSH terms to ensure the inclusion of relevant articles. The LDA model was then applied to the collected articles. [Results]: The LDA model identified four main topics: 1) dengue diagnosis and clinical presentation, 2) research and control interventions, 3) severe dengue and its clinical manifestations, and 4) virus detection, including dengue, zika, and chikungunya. This thematic analysis helped organize the literature and gave an overview of the predominant themes in dengue research. [Conclusions]: Beyond organizing the retrieved articles, the approach provided insight into the predominant themes in the dengue literature, which may guide future research and improve diagnostic and treatment strategies.
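To make the methodology more concrete, the following is a minimal sketch of how an LDA model can assign the words of a small corpus to four topics. The abstract does not state which inference algorithm or software was used, so everything here is an assumption for illustration: the collapsed Gibbs sampler, the hyperparameters `alpha` and `beta`, the function names `fit_lda` and `top_words`, and the toy documents, which are invented and are not the study's data.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    alpha and beta are illustrative Dirichlet hyperparameters (assumed values).
    Returns the document-topic counts and the topic-word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # n_dk
    topic_word = [Counter() for _ in range(num_topics)]   # n_kw
    topic_total = [0] * num_topics                        # n_k
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Return up to n most frequent words for each topic."""
    return [
        [w for w, count in counts.most_common(n) if count > 0]
        for counts in topic_word
    ]


# Invented toy corpus (not the study's articles), already tokenized.
toy_docs = [
    "dengue diagnosis fever rash clinical presentation".split(),
    "dengue diagnosis clinical signs symptoms fever".split(),
    "vector control intervention community research".split(),
    "control intervention research community program".split(),
    "severe dengue plasma leakage shock hemorrhage".split(),
    "severe dengue shock hemorrhage manifestations".split(),
    "zika chikungunya dengue virus detection assay".split(),
    "virus detection assay zika chikungunya".split(),
]

doc_topic, topic_word = fit_lda(toy_docs, num_topics=4)
for k, words in enumerate(top_words(topic_word)):
    print(f"Topic {k}: {', '.join(words)}")
```

On a corpus this small the topics depend heavily on the random seed and the hyperparameters, so the printed words should be read as an illustration of the mechanics rather than as a reproduction of the four topics reported above.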

Published

2025-11-30

Issue

Section

Original scientific papers (evaluated by academic peers)

How to Cite

Portilla-Yela, J., Palomino-Montezuma, A. F., Manotas-Duque, D. F., & Tovar-Cuevas, J. R. (2025). Application of the LDA Model for Topic Identification: A Case Study on Dengue Diagnosis. Uniciencia, 39(1), 1-21. https://doi.org/10.15359/ru.39-1.19