A Comprehensive Analysis of Online Reviews in the Srem Region through Topic Modeling
Olivera Grljević - University of Novi Sad, Faculty of Economics in Subotica, Segedinski put 9-11, 24000 Subotica, Serbia
Mirjana Marić - University of Novi Sad, Faculty of Economics in Subotica, Segedinski put 9-11, 24000 Subotica, Serbia
DOI: https://doi.org/10.31410/tmt.2023-2024.291
8th International Thematic Monograph - Modern Management Tools and Economy of Tourism Sector in Present Era, Belgrade, 2023/2024; Published by: Association of Economists and Managers of the Balkans in cooperation with the Faculty of Tourism and Hospitality, Ohrid, North Macedonia; ISSN 2683-5673, ISBN 978-86-80194-81-3; Editors: Vuk Bevanda, associate professor, Faculty of Social Sciences, Belgrade, Serbia; Snežana Štetić, full-time professor, The College of Tourism, Belgrade, Serbia; Printed by: SKRIPTA International, Belgrade
Abstract: This chapter employs topic modeling to reveal the public stance on and perception of tourist destinations in the Srem region, providing insights into their diverse appeal and the variety of tourist profiles. Social media is a touchpoint with visitors to tourist attractions and consumers of tourist services. The authors collected online reviews of tourist attractions in the Srem region and used them to model hidden thematic structures. Through extensive experimentation, they identified an optimal model with 14 distinct topics centered around nature, relaxation, shrines, and museum history. The topics indicate various tourist profiles: active, gastronomic, leisure-seeking, history-loving, and family-oriented tourists. Knowledge of the audience is valuable for targeted marketing strategies. The authors also extracted and analyzed subsets of reviews related to Monasteries, Museums, Nature, and Nature Reserves. These subsets indicate specific tourist preferences within each category, such as the historical relevance of museums, the use of modern technologies in exhibitions, or child-friendly content in nature reserves, as well as areas for improvement, such as road conditions, control of forest cutting, or garbage disposal management. This research offers clearly defined methodological steps and valuable insights for marketing and tourism development in the Srem region.
Keywords: Topic modeling; Online reviews; Tourist preferences
