New Technology of Library and Information Service (现代图书情报技术) 2008, 24(8): 31-36  ISSN: 1003-3513  CN: 11-2856/G2


A Method for Automatic Keyword Extraction and Filtration from Medical Texts*

Yin Shumei1   Zhang Zhixiong2   Wu Zhenxin2

1 (Peking University Health Science Library, Beijing 100083, China)
2 (National Science Library, Chinese Academy of Sciences, Beijing 100190, China)

Abstract

Because important keywords and key phrases strongly represent the content of a text, keyword extraction and filtration play an important role in information retrieval, information extraction and knowledge discovery. After surveying current keyword extraction methods, this paper combines existing thesauri and tools in the medical field with the BM25F weighted term frequency formula to propose a method for extracting and filtering important keywords from medical texts. The method addresses two key problems: identifying and extracting keywords, and measuring keyword importance in order to filter the keywords. The method is applied to a collection of osteoarthritis literature published from 2001 to 2007 to test its practical effectiveness, and it offers a workable approach to extracting important keywords in knowledge discovery.

Keywords: Keyword extraction   Keyword filtration   BM25F   MMTx   Text mining   Medical data mining
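The abstract names BM25F as the weighting formula used to measure keyword importance but does not give the formula or its parameters. As a rough illustration only, the sketch below computes a field-weighted, length-normalized term frequency per document, applies BM25 saturation, and multiplies by an inverse document frequency. The field names (`title`, `abstract`), the field weights, the `b` and `k1` values, and the function names `bm25f_weights` and `top_keywords` are all assumptions made for this example, not values or names taken from the paper. The idf uses a non-negative variant, `log(1 + (N - df + 0.5) / (df + 0.5))`, which is also an assumption.

```python
import math
from collections import Counter

# Hypothetical settings: the source does not state field weights or parameters.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 1.0}   # assumed boost per field
FIELD_B = {"title": 0.5, "abstract": 0.75}        # assumed length normalization
K1 = 1.2                                          # assumed saturation parameter


def bm25f_weights(docs):
    """Return one {term: weight} dict per document.

    docs is a list of dicts mapping a field name to a list of terms,
    e.g. terms already mapped to thesaurus concepts by an upstream tool.
    """
    if not docs:
        return []
    n_docs = len(docs)

    # Average field length over the collection, used for normalization.
    avg_len = {
        f: sum(len(d.get(f, [])) for d in docs) / n_docs for f in FIELD_WEIGHTS
    }

    # Document frequency: in how many documents each term occurs (any field).
    df = Counter()
    for d in docs:
        seen = set()
        for f in FIELD_WEIGHTS:
            seen.update(d.get(f, []))
        df.update(seen)

    results = []
    for d in docs:
        # Field-weighted, length-normalized pseudo term frequency.
        pseudo_tf = Counter()
        for f, weight in FIELD_WEIGHTS.items():
            terms = d.get(f, [])
            if not terms:
                continue
            norm = (1 - FIELD_B[f]) + FIELD_B[f] * len(terms) / avg_len[f]
            for term, tf in Counter(terms).items():
                pseudo_tf[term] += weight * tf / norm

        # BM25 saturation multiplied by a non-negative idf.
        weights = {}
        for term, x in pseudo_tf.items():
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            weights[term] = idf * x / (K1 + x)
        results.append(weights)
    return results


def top_keywords(weights, n):
    """Keep the n highest-weighted terms of one document."""
    return sorted(weights, key=weights.get, reverse=True)[:n]


if __name__ == "__main__":
    sample = [
        {"title": ["osteoarthritis", "cartilage"],
         "abstract": ["cartilage", "glucosamine", "cartilage"]},
        {"title": ["osteoarthritis"],
         "abstract": ["pain", "osteoarthritis"]},
    ]
    for doc_weights in bm25f_weights(sample):
        print(top_keywords(doc_weights, 2))
```

In this sketch a term that appears in the title and repeatedly in the abstract outranks a term that appears once in the abstract, while a term present in every document is discounted by the idf factor. Filtering then reduces to keeping the highest-weighted terms per document or applying a threshold.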
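Received: 2008-06-16   Revised:    Published online: 2008-08-25
Classification number: G250.73

Funding:

*This paper is one of the research outcomes of the National Social Science Fund of China project "Research on Theories and Methods of Knowledge Extraction from Digital Information Resources" (Project No. 05BTQ006).

Corresponding author: Yin Shumei   E-mail: Yinshumei@lib.bjmu.edu.cn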
 


