Document Clustering with Added Metadata Using the COATES Algorithm
DOI:
https://doi.org/10.15575/kubik.v2i2.1859
Keywords:
text mining, metadata, text clustering, k-means algorithm, COATES algorithm
Abstract
Text mining is the process of extracting patterns, in the form of useful information and knowledge, from large volumes of unstructured data. One line of development in text mining is the use of "side information" to make the clustering process more efficient. The side information attached to the data can help text mining, provided that it is informative. Metadata is one kind of side information that the data carries. For this reason, classical partitioning clustering algorithms and probabilistic models in text mining have been extended to process the data together with its side information, using the Content and Auxiliary attribute Based Text Clustering (COATES) algorithm. The clustering process described here initializes the clusters with the k-means algorithm, based on Euclidean distance.
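The initialization step mentioned in the abstract, k-means with Euclidean distance, can be illustrated with a minimal sketch. This is not the authors' implementation, and it does not cover the COATES step that brings in side information. The toy term-frequency vectors in `docs`, the function names `euclidean` and `kmeans`, and the `iterations` and `seed` parameters are all assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (labels, centroids) for a list of vectors."""
    rng = random.Random(seed)
    # Pick k distinct input points as the starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical term-frequency vectors for four short documents:
# the first two share vocabulary, as do the last two.
docs = [[2, 0, 0], [3, 1, 0], [0, 0, 2], [0, 1, 3]]
labels, centroids = kmeans(docs, k=2)
print(labels)
```

In a COATES-style pipeline, the labels produced this way would serve only as a starting partition, which later iterations refine using both the text content and the auxiliary attributes (metadata).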
License
Authors who publish in KUBIK: Jurnal Publikasi Ilmiah Matematika agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under an Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).