Application of the LDA Model for Topic Identification: A Case Study on Dengue Diagnosis
DOI: https://doi.org/10.15359/ru.39-1.19
Keywords: Text mining, Statistical learning, Text processing, Text recognition, LDA Model, Dengue
Abstract
[Objective]: This research aimed to identify and analyze topics in the scientific literature related to dengue, with a focus on diagnosis, signs, and symptoms, using the Latent Dirichlet Allocation (LDA) model. [Methodology]: Articles published between 2000 and 2024 were collected from several databases: VHL, Web of Science, Ovid, Scopus, PubMed, Health & Medical, ScienceDirect, and Google Scholar. The search equation combined key terms such as “dengue,” “signs,” “symptoms,” and “diagnosis” with MeSH terms to ensure the inclusion of relevant articles. The LDA model was then applied to the collected articles. [Results]: The LDA model identified four main themes: 1) Dengue diagnosis and clinical presentation, 2) Research and control interventions, 3) Severe dengue and its clinical manifestations, and 4) Virus detection, including dengue, Zika, and chikungunya. This thematic analysis helped organize the literature and gave an overview of the predominant themes in dengue research. [Conclusions]: Beyond organizing the retrieved articles, the approach provided insights into the predominant themes in the dengue literature, which may guide future research and improve diagnostic and treatment strategies.
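The abstract does not describe how the LDA model was implemented, so the following is only a minimal illustrative sketch of how LDA can assign topics to a small corpus. It uses collapsed Gibbs sampling in plain Python; the function name `fit_lda`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all assumptions made for illustration and are not taken from the study.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents (illustrative)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Count tables: document-topic, topic-word, and per-topic token totals.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed per-document topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    # Most frequent words per topic, ignoring words whose count fell to zero.
    top_words = [
        [w for w, c in tw.most_common() if c > 0][:3] for tw in topic_word
    ]
    return theta, top_words


# Hypothetical toy corpus; the study analyzed full scientific articles.
docs = [
    "dengue diagnosis fever rash clinical diagnosis".split(),
    "dengue fever clinical signs symptoms diagnosis".split(),
    "virus detection zika chikungunya pcr detection".split(),
    "zika chikungunya virus pcr assay detection".split(),
]
theta, top_words = fit_lda(docs, n_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

In practice, the number of topics (four in this study) is a modeling choice, and an established library would normally be used rather than a hand-written sampler.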
License
Copyright (c) 2025 Shared by Journal and Authors (CC-BY-NC-ND)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish with this journal agree to the following terms:
1. Authors grant the journal the right of first publication of the work, licensed under a Creative Commons License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal: https://creativecommons.org/licenses/by-nc-nd/4.0/.
2. Authors can enter into separate, additional agreements for non-exclusive distribution of the version of the work published in the journal (e.g., placing it in an institutional repository or publishing it in a book), with an acknowledgment of its initial publication in this journal.
3. The authors declare that they hold all permissions to use the resources they provided in the paper (images, tables, among others) and assume full responsibility for any damages to third parties.
4. The opinions expressed in the paper are the exclusive responsibility of the authors and do not necessarily represent the opinion of the editors or the Universidad Nacional.
Uniciencia Journal and all its productions are under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 Unported.
There is neither a fee for access nor an Article Processing Charge (APC).
