
Enhancing Focus Topic Findings through Corpus Classifier Algorithm

Reina Setiawan*, Widodo Budiharto, Iman Herwidiana Kartowisastro, and Harjanto Prabowo
Bina Nusantara University, Jl. K. H. Syahdan No. 9, Jakarta 11480, Indonesia

In a learning management system (LMS), a discussion forum in which students and lecturers participate actively as part of the learning method enriches the context of communication, thereby enhancing students’ learning and performance. This paper aims to determine appropriate topics for discussion forums in learning management systems through probabilistic latent semantic analysis (PLSA) enhanced with a corpus classifier algorithm. The methods used were PLSA and a classification process that groups the documents into corpora using a word-similarity approach, in which word similarity is influenced by the term frequency of each word in a document. The novel contribution of this paper is the corpus classifier algorithm. The experiment used three approaches to discover topics and drew on 4,868 distinct words from 234 documents belonging to three thread subjects, with each discussion forum post treated as a text document. Performance was measured by the F-measure, calculated for each thread subject. The corpus classifier algorithm, used in the second and third approaches, increased the average F-measure values for the second and third thread subjects by approximately 24% and 17%, respectively.

Topic Findings, PLSA, Corpus Classification, Similarity Word, Discussion Forum
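The abstract states only that documents are classified into corpora by word similarity influenced by term frequency; it does not give the corpus classifier algorithm itself. As a stand-in illustration, and not the authors' algorithm, the sketch below groups tokenised posts greedily by the cosine similarity of their term-frequency vectors. Cosine similarity, the `threshold` value of 0.3, the toy posts, and the names `tf_vector`, `cosine`, and `group_posts` are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Term-frequency vector of one tokenised post."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group_posts(posts, threshold=0.3):
    """Hypothetical stand-in for corpus classification (not the paper's method).

    Each post joins the most similar existing group, compared against the
    summed TF vector of that group's members; if no group reaches the
    threshold, the post starts a new group. Returns lists of post indices.
    """
    groups = []  # list of (summed TF Counter, member indices)
    for i, tokens in enumerate(posts):
        vec = tf_vector(tokens)
        best, best_sim = None, threshold
        for g, (centroid, _) in enumerate(groups):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = g, sim
        if best is None:
            groups.append((Counter(vec), [i]))
        else:
            groups[best][0].update(vec)  # grow the group's vocabulary
            groups[best][1].append(i)
    return [members for _, members in groups]


# Made-up posts for illustration only.
posts = [
    ["forum", "lecture", "forum"],
    ["lecture", "forum"],
    ["stemming", "affix"],
    ["affix", "stemming", "word"],
]
print(group_posts(posts))  # [[0, 1], [2, 3]]
```

Comparing each post with a group's summed term frequencies, rather than with a single seed post, lets the group's vocabulary grow as posts are added.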
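The abstract does not spell out how PLSA was implemented in the paper. As a rough illustration only, the sketch below fits the standard asymmetric PLSA model, P(w|d) = Σ_z P(z|d) P(w|z), with expectation-maximization (EM), treating each forum post as a bag of words with term frequencies n(d, w). The function names `plsa` and `top_words`, the toy documents, the random initialisation, and the default of 100 iterations are assumptions for this example; the corpus classifier step is not reproduced here.

```python
import random


def _normalized(values):
    """Scale non-negative numbers to sum to 1 (uniform if they are all zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit the asymmetric PLSA model P(w|d) = sum_z P(z|d) P(w|z) by EM.

    counts: one dict per document mapping word -> term frequency n(d, w).
    Returns (p_z_d, p_w_z, vocab) with p_z_d[d][k] = P(z_k|d) and
    p_w_z[k][j] = P(vocab[j] | z_k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in counts for w in doc})
    index = {w: j for j, w in enumerate(vocab)}
    # Random positive start so that EM can break the symmetry between topics.
    p_z_d = [_normalized([rng.random() + 0.01 for _ in range(n_topics)]) for _ in counts]
    p_w_z = [_normalized([rng.random() + 0.01 for _ in vocab]) for _ in range(n_topics)]

    for _ in range(n_iter):
        # Expected counts n(d, w) * P(z|d, w), accumulated for the M-step.
        acc_w_z = [[0.0] * len(vocab) for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in counts]
        for d, doc in enumerate(counts):
            for word, n in doc.items():
                j = index[word]
                # E-step: P(z|d, w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][k] * p_w_z[k][j] for k in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                for k in range(n_topics):
                    expected = n * joint[k] / total
                    acc_w_z[k][j] += expected
                    acc_z_d[d][k] += expected
        # M-step: renormalise expected counts into P(w|z) and P(z|d).
        p_w_z = [_normalized(row) for row in acc_w_z]
        p_z_d = [_normalized(row) for row in acc_z_d]
    return p_z_d, p_w_z, vocab


def top_words(p_w_z, vocab, topic, n=3):
    """The n most probable words of one topic, as candidate topic labels."""
    ranked = sorted(range(len(vocab)), key=lambda j: p_w_z[topic][j], reverse=True)
    return [vocab[j] for j in ranked[:n]]


# Toy usage with made-up term frequencies (illustrative only).
docs = [
    {"forum": 3, "lecture": 2, "exam": 1},
    {"forum": 2, "lecture": 3},
    {"stemming": 4, "affix": 3, "word": 1},
    {"stemming": 2, "affix": 2, "word": 2},
]
p_z_d, p_w_z, vocab = plsa(docs, n_topics=2)
for k in range(2):
    print(k, top_words(p_w_z, vocab, k))
```

EM only guarantees a local maximum of the likelihood, so in practice one would typically run several random seeds and keep the best fit.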
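The abstract reports an F-measure per thread subject but does not say how discovered topics are matched to the expected ones. Assuming, for illustration, that the words of a discovered topic are compared with a reference word set for the same thread subject, precision, recall, and the balanced F-measure (F1, assumed here) can be computed as below. The function name `f_measure` and the sample word lists are hypothetical.

```python
def f_measure(found, reference):
    """Return (precision, recall, F1) of found topic words against a reference set."""
    found, reference = set(found), set(reference)
    hits = len(found & reference)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


p, r, f = f_measure(["forum", "lecture", "exam", "quiz"], ["forum", "lecture", "grade"])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```

Here two of the four found words match, giving precision 2/4 and recall 2/3, so F1 = 2(1/2)(2/3) / (1/2 + 2/3) = 4/7.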



