现代图书情报技术 (New Technology of Library and Information Service) 2009, 3(2): 1-8   DOI:   ISSN: 1003-3513   CN: 11-2856/G2


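A Survey of Research on the Description of Text Clustering Results*

Zhang Chengzhi1,2

1(Institute of Scientific and Technical Information of China, Beijing 100038)
2(Department of Information Management, Nanjing University of Science and Technology, Nanjing 210094)

Abstract

This paper first describes the research background and related work on describing text clustering results, analyzes how automatic indexing, automatic summarization, and conceptual clustering relate to this task, and delimits its research scope. It then formalizes the problem according to the specific requirements that a cluster description must meet. Finally, it presents methods for evaluating descriptions of text clustering results.

Keywords: Document clustering description; Text clustering; Text mining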

Survey on Document Clustering Description

Zhang Chengzhi1,2

1(Institute of Scientific and Technical Information of China, Beijing 100038, China)
2(Department of Information Management, Nanjing University of Science and Technology, Nanjing 210094, China)

Abstract:

This paper presents the research background and related work on Document Clustering Description (DCD). It explains the relationship between DCD and automatic indexing, automatic summarization, and conceptual clustering, and defines the research scope of DCD. The tasks of DCD are then formalized according to its requirements, and methods for evaluating DCD are described.

Keywords: Document clustering description; Document clustering; Document mining
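Received: 2008-11-18   Revised:    Published online: 2009-02-25
Classification codes:

TP391   G252

Funding:

* This work is one of the research outcomes of the China Postdoctoral Science Foundation project "Research on Key Technologies of Multilingual Domain Ontology Learning" (No. 20080430463), the Nanjing University of Science and Technology Research Start-up Fund project "Research on Key Technologies of Topic Clustering" (No. AB41123), and the key project "Research on Key Technologies of Multilingual Information Service Environments" (No. 2006BAH03B02) of the National Key Technology R&D Program during the 11th Five-Year Plan.

Corresponding author: Zhang Chengzhi   E-mail: zhangchz@istic.ac.cn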
 


