Knowledge Graph Datasets

Original link: https://blog.csdn.net/qq_21097885/article/details/104562276

DBpedia

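Website: https://wiki.dbpedia.org/

Overview:
DBpedia is a notable example of a Semantic Web application. It extracts structured data from Wikipedia articles in order to improve search over Wikipedia and to link other datasets to it. These semantic techniques have enabled many new applications on top of Wikipedia's sprawling content, such as mobile versions, map integration, faceted search, relationship queries, and document classification and annotation. DBpedia is also one of the largest multi-domain ontologies in the world and is part of Linked Data. The US technology outlet ReadWriteWeb named DBpedia the best Semantic Web application of 2009.

The DBpedia 2014 dataset describes more than 4.58 million things, including 1,445,000 persons, 735,000 places, 123,000 music albums, 87,000 films, 19,000 video games, 241,000 organizations, 251,000 species, and 6,000 diseases. Its data is used by the BBC, Reuters, and The New York Times, and it is also indexed by search engines such as Google and Yahoo.

The 2016 release contains 9.5 billion RDF triples: 1.3 billion extracted from the English edition of Wikipedia, 5.0 billion from other language editions, and 3.2 billion from DBpedia Commons and Wikidata.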
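An RDF triple is a (subject, predicate, object) statement. The following is a minimal sketch of reading one such statement, assuming a simplified N-Triples line in which all three terms are IRIs in angle brackets. The helper name `parse_iri_triple` and the sample IRIs are illustrative assumptions, and the pattern deliberately ignores literals and blank nodes, which real dumps also contain.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Simplified pattern: three <IRI> terms followed by a terminating dot.
# Real N-Triples also allows literals and blank nodes; this sketch does not.
_IRI_TRIPLE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')


def parse_iri_triple(line: str) -> Triple:
    """Parse one IRI-only triple line (hypothetical helper, not a DBpedia API)."""
    match = _IRI_TRIPLE.match(line)
    if match is None:
        raise ValueError(f"not an IRI-only triple: {line!r}")
    return Triple(*match.groups())


# Illustrative sample line; the IRIs are assumptions chosen for the example.
line = ('<http://dbpedia.org/resource/Berlin> '
        '<http://dbpedia.org/ontology/country> '
        '<http://dbpedia.org/resource/Germany> .')
triple = parse_iri_triple(line)
print(triple.subject, triple.predicate, triple.obj, sep="\n")
```

For real dumps, a dedicated RDF parser is likely a better choice than a regular expression, since it handles the full syntax.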

Reference:

@article{DBLP:journals/ws/BizerLKABCH09,
  author    = {Christian Bizer and Jens Lehmann and Georgi Kobilarov and S{\"{o}}ren Auer and Christian Becker and Richard Cyganiak and Sebastian Hellmann},
  title     = {DBpedia - {A} crystallization point for the Web of Data},
  journal   = {J. Web Semant.},
  volume    = {7},
  number    = {3},
  pages     = {154--165},
  year      = {2009},
  url       = {https://doi.org/10.1016/j.websem.2009.07.002},
  doi       = {10.1016/j.websem.2009.07.002},
  timestamp = {Fri, 27 Dec 2019 21:12:44 +0100},
  biburl    = {https://dblp.org/rec/journals/ws/BizerLKABCH09.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Yago

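Website: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/

Summary (translated from Chinese):
YAGO is an open-source dataset whose data is automatically extracted from multiple sources, including Wikipedia, WordNet, and GeoNames. As of 2012, it contained more than 10 million entities and 120 million facts.

English description: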

YAGO (Yet Another Great Ontology) is an open source knowledge base developed at the Max Planck Institute for Computer Science in Saarbrücken. It is automatically extracted from Wikipedia and other sources.

As of 2012, YAGO3 has knowledge of more than 10 million entities and contains more than 120 million facts about these entities. The information in YAGO is extracted from Wikipedia (e.g., categories, redirects, infoboxes), WordNet (e.g., synsets, hyponymy), and GeoNames. The accuracy of YAGO was manually evaluated to be above 95% on a sample of facts. To integrate it into the linked data cloud, YAGO has been linked to the DBpedia ontology and to the SUMO ontology.

YAGO3 is provided in Turtle and tsv formats. Dumps of the whole database are available, as well as thematic and specialized dumps. It can also be queried through various online browsers and through a SPARQL endpoint hosted by OpenLink Software. The source code of YAGO3 is available on GitHub.

YAGO has been used in the Watson artificial intelligence system.
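Tab-separated fact files of the kind mentioned above can be read with Python's standard `csv` module. The sketch below assumes a hypothetical three-column layout (subject, predicate, object); the actual column layout of the YAGO3 dumps may differ, so check the files before relying on it. The sample rows and the function name `load_facts` are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an ASSUMED three-column layout:
# subject <TAB> predicate <TAB> object. Not copied from a real dump.
SAMPLE_TSV = (
    "<Albert_Einstein>\t<wasBornIn>\t<Ulm>\n"
    "<Albert_Einstein>\trdf:type\t<physicist>\n"
    "<Ulm>\t<isLocatedIn>\t<Germany>\n"
)


def load_facts(handle):
    """Group (predicate, object) pairs by subject, skipping malformed rows."""
    facts = defaultdict(list)
    # QUOTE_NONE keeps quote characters inside values untouched.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 3:
            continue
        subject, predicate, obj = row
        facts[subject].append((predicate, obj))
    return dict(facts)


facts = load_facts(io.StringIO(SAMPLE_TSV))
for predicate, obj in facts["<Albert_Einstein>"]:
    print(predicate, obj)
```

Passing an open file handle instead of `io.StringIO` would work the same way, which keeps the function usable on large dumps without loading them into a string first.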

Reference:

@inproceedings{DBLP:conf/www/SuchanekKW07,
  author    = {Fabian M. Suchanek and Gjergji Kasneci and Gerhard Weikum},
  editor    = {Carey L. Williamson and Mary Ellen Zurko and Peter F. Patel{-}Schneider and Prashant J. Shenoy},
  title     = {Yago: a core of semantic knowledge},
  booktitle = {Proceedings of the 16th International Conference on World Wide Web, {WWW} 2007, Banff, Alberta, Canada, May 8-12, 2007},
  pages     = {697--706},
  publisher = {{ACM}},
  year      = {2007},
  url       = {https://doi.org/10.1145/1242572.1242667},
  doi       = {10.1145/1242572.1242667},
  timestamp = {Wed, 14 Nov 2018 10:55:41 +0100},
  biburl    = {https://dblp.org/rec/conf/www/SuchanekKW07.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Freebase

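Website: http://www.freebase.be/

Overview:
Like Wikipedia, Freebase consists of structured knowledge contributed by community members. In addition to manual entry, Freebase also imports structured knowledge from sources such as Wikipedia.
Freebase has since been acquired by Google.

Papers often use its subset FB13; see: https://blog.csdn.net/qq_21097885/article/details/103519703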

Reference:

@inproceedings{DBLP:conf/sigmod/BollackerEPST08,
  author    = {Kurt D. Bollacker and Colin Evans and Praveen Paritosh and Tim Sturge and Jamie Taylor},
  editor    = {Jason Tsong{-}Li Wang},
  title     = {Freebase: a collaboratively created graph database for structuring human knowledge},
  booktitle = {Proceedings of the {ACM} {SIGMOD} International Conference on Management of Data, {SIGMOD} 2008, Vancouver, BC, Canada, June 10-12, 2008},
  pages     = {1247--1250},
  publisher = {{ACM}},
  year      = {2008},
  url       = {https://doi.org/10.1145/1376616.1376746},
  doi       = {10.1145/1376616.1376746},
  timestamp = {Tue, 27 Nov 2018 10:40:37 +0100},
  biburl    = {https://dblp.org/rec/conf/sigmod/BollackerEPST08.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

WordNet

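Website: https://wordnet.princeton.edu/

Summary (translated from Chinese):
WordNet is a large lexical database of English. Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms called synsets, each representing a distinct concept. Synsets are linked by conceptual-semantic and lexical relations. WordNet is a widely used tool in computational linguistics and natural language processing.
For Chinese, a comparable resource is HowNet.

Papers often use its subset WN11; see: https://blog.csdn.net/qq_21097885/article/details/103519635
and WN18; see: https://blog.csdn.net/qq_21097885/article/details/103519750

English description: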
WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet’s structure makes it a useful tool for computational linguistics and natural language processing.

WordNet superficially resembles a thesaurus, in that it groups words together based on their meanings. However, there are some important distinctions. First, WordNet interlinks not just word forms (strings of letters) but specific senses of words. As a result, words that are found in close proximity to one another in the network are semantically disambiguated. Second, WordNet labels the semantic relations among words, whereas the groupings of words in a thesaurus do not follow any explicit pattern other than meaning similarity.
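The two distinctions above (links between senses rather than word forms, and labeled relations) can be pictured with a toy data structure. This is a minimal sketch, not WordNet's actual storage format or API: the synset identifiers, glosses, and relation targets below are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One concept: a set of synonymous word forms plus labeled relations."""
    synset_id: str
    lemmas: frozenset
    gloss: str
    relations: dict = field(default_factory=dict)  # label -> list of synset ids


# Toy data; identifiers and glosses are invented for this example.
SYNSETS = {
    "bank#finance": Synset(
        "bank#finance",
        frozenset({"bank", "banking company"}),
        "an institution that accepts deposits",
        {"hypernym": ["institution#1"]},
    ),
    "bank#river": Synset(
        "bank#river",
        frozenset({"bank", "riverbank"}),
        "sloping land beside a body of water",
        {"hypernym": ["slope#1"]},
    ),
}


def senses(word):
    """Return the ids of every synset that contains this word form."""
    return sorted(s.synset_id for s in SYNSETS.values() if word in s.lemmas)


def related(synset_id, label):
    """Follow one labeled relation from a specific sense, not from a word."""
    return SYNSETS[synset_id].relations.get(label, [])


print(senses("bank"))
print(related("bank#river", "hypernym"))
```

Because relations hang off synsets rather than strings, the word form "bank" alone is ambiguous, while each of its senses has its own unambiguous neighbors.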

Reference:

@article{DBLP:journals/cacm/Miller95,
  author    = {George A. Miller},
  title     = {WordNet: {A} Lexical Database for English},
  journal   = {Commun. {ACM}},
  volume    = {38},
  number    = {11},
  pages     = {39--41},
  year      = {1995},
  url       = {http://doi.acm.org/10.1145/219717.219748},
  doi       = {10.1145/219717.219748},
  timestamp = {Wed, 14 Nov 2018 10:22:30 +0100},
  biburl    = {https://dblp.org/rec/journals/cacm/Miller95.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

PDD

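Website: http://pdd.wangmengsd.com/

Summary (translated from Chinese):
PDD, short for Patient-Disease-Drug, is a medical dataset that captures the links among patients, diseases, and drugs.

English description: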
What is PDD Graph (Patient-Disease-Drug Graph):

Electronic medical records (EMRs) contain multi-format electronic medical data that hold an abundance of medical knowledge. Faced with patients' symptoms, experienced caregivers make the right medical decisions based on professional knowledge that accurately grasps the relationships between symptoms, diagnoses, and treatments. We aim to capture these relationships by constructing a large and high-quality heterogeneous graph linking patients, diseases, and drugs (PDD) in EMRs.

Specifically, we extract important medical entities from MIMIC-III (Medical Information Mart for Intensive Care III) and automatically link them with the existing biomedical knowledge graphs, including ICD-9 ontology and DrugBank. The PDD graph presented is accessible on the Web via the SPARQL endpoint, and provides a pathway for medical discovery and applications, such as effective treatment recommendations.
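A heterogeneous graph of this kind has typed nodes and labeled edges. The following is a minimal sketch of that idea only; the class `HeteroGraph`, the node identifiers, and the relation names `diagnosed_with` and `prescribed` are hypothetical placeholders and do not reflect the actual PDD schema, MIMIC-III records, ICD-9 codes, or DrugBank identifiers.

```python
from collections import defaultdict


class HeteroGraph:
    """Tiny typed graph: nodes carry a type, edges carry a relation label."""

    def __init__(self):
        self.node_type = {}
        self.edges = defaultdict(set)  # (source, relation) -> set of targets

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, source, relation, target):
        if source not in self.node_type or target not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[(source, relation)].add(target)

    def neighbors(self, source, relation):
        """Targets reachable from source over one relation, in sorted order."""
        return sorted(self.edges.get((source, relation), ()))


# Placeholder identifiers, invented for illustration.
graph = HeteroGraph()
graph.add_node("patient:001", "Patient")
graph.add_node("disease:D1", "Disease")
graph.add_node("drug:R1", "Drug")
graph.add_node("drug:R2", "Drug")
graph.add_edge("patient:001", "diagnosed_with", "disease:D1")
graph.add_edge("patient:001", "prescribed", "drug:R2")
graph.add_edge("patient:001", "prescribed", "drug:R1")

print(graph.neighbors("patient:001", "prescribed"))
```

In the published graph, the disease and drug nodes would be the points where links to external vocabularies attach; here they are plain strings to keep the sketch self-contained.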

Reference:

@inproceedings{DBLP:conf/semweb/WangZLHWLL17,
  author    = {Meng Wang and Jiaheng Zhang and Jun Liu and Wei Hu and Sen Wang and Xue Li and Wenqiang Liu},
  editor    = {Claudia d'Amato and Miriam Fern{\'{a}}ndez and Valentina A. M. Tamma and Freddy L{\'{e}}cu{\'{e}} and Philippe Cudr{\'{e}}{-}Mauroux and Juan F. Sequeda and Christoph Lange and Jeff Heflin},
  title     = {{PDD} Graph: Bridging Electronic Medical Records and Biomedical Knowledge Graphs via Entity Linking},
  booktitle = {The Semantic Web - {ISWC} 2017 - 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part {II}},
  series    = {Lecture Notes in Computer Science},
  volume    = {10588},
  pages     = {219--227},
  publisher = {Springer},
  year      = {2017},
  url       = {https://doi.org/10.1007/978-3-319-68204-4\_23},
  doi       = {10.1007/978-3-319-68204-4\_23},
  timestamp = {Tue, 14 May 2019 10:00:53 +0200},
  biburl    = {https://dblp.org/rec/conf/semweb/WangZLHWLL17.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

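In recent years, Chinese institutions have also released knowledge graphs that focus on Chinese, such as XLore from Tsinghua University, zhishi.me from Shanghai Jiao Tong University, and CN-DBpedia from Fudan University.


XLore (Tsinghua University)

Website: https://xlore.org/

Overview:
XLORE is a multilingual knowledge graph built by structuring and cross-lingually linking encyclopedic knowledge from Chinese and English Wikipedia, French Wikipedia, and Baidu Baike. It is a large-scale multilingual knowledge graph with a relatively balanced amount of Chinese and English knowledge. XLORE contains 16,284,901 instances, 2,466,956 concepts, and 446,236 properties, along with rich semantic relations.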

Reference:

@inproceedings{DBLP:conf/semweb/WangLWLLZSLZT13,
  author    = {Zhigang Wang and Juanzi Li and Zhichun Wang and Shuangjie Li and Mingyang Li and Dongsheng Zhang and Yao Shi and Yongbin Liu and Peng Zhang and Jie Tang},
  editor    = {Eva Blomqvist and Tudor Groza},
  title     = {XLore: {A} Large-scale English-Chinese Bilingual Knowledge Graph},
  booktitle = {Proceedings of the {ISWC} 2013 Posters {\&} Demonstrations Track, Sydney, Australia, October 23, 2013},
  series    = {{CEUR} Workshop Proceedings},
  volume    = {1035},
  pages     = {121--124},
  publisher = {CEUR-WS.org},
  year      = {2013},
  url       = {http://ceur-ws.org/Vol-1035/iswc2013\_demo\_31.pdf},
  timestamp = {Wed, 12 Feb 2020 16:44:51 +0100},
  biburl    = {https://dblp.org/rec/conf/semweb/WangLWLLZSLZT13.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

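Zhishi.me (Shanghai Jiao Tong University)

Website:

Overview:
Zhishi.me extracts structured data from open encyclopedia data and is the first attempt to build a general-purpose Chinese knowledge graph. It currently integrates data from three major Chinese encyclopedias: Baidu Baike, Hudong Baike, and Chinese Wikipedia.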

Reference:

@inproceedings{DBLP:conf/semweb/NiuSWRQY11,
  author    = {Xing Niu and Xinruo Sun and Haofen Wang and Shu Rong and Guilin Qi and Yong Yu},
  editor    = {Lora Aroyo and Chris Welty and Harith Alani and Jamie Taylor and Abraham Bernstein and Lalana Kagal and Natasha Fridman Noy and Eva Blomqvist},
  title     = {Zhishi.me - Weaving Chinese Linking Open Data},
  booktitle = {The Semantic Web - {ISWC} 2011 - 10th International Semantic Web Conference, Bonn, Germany, October 23-27, 2011, Proceedings, Part {II}},
  series    = {Lecture Notes in Computer Science},
  volume    = {7032},
  pages     = {205--220},
  publisher = {Springer},
  year      = {2011},
  url       = {https://doi.org/10.1007/978-3-642-25093-4\_14},
  doi       = {10.1007/978-3-642-25093-4\_14},
  timestamp = {Thu, 28 Nov 2019 10:44:37 +0100},
  biburl    = {https://dblp.org/rec/conf/semweb/NiuSWRQY11.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

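CN-DBpedia (Fudan University)

Website: http://kw.fudan.edu.cn/cndbpedia/intro/

Overview:
CN-DBpedia takes the accumulation of general encyclopedic knowledge as its main line and the accumulation of vertical, domain-specific graphs as a secondary line. It aims to provide rich background knowledge for machine semantic understanding and the support needed for machine language cognition.
CN-DBpedia has expanded from the encyclopedia domain to more than ten vertical domains, including law, business registration, finance, entertainment, technology, military, education, and healthcare, providing foundational knowledge services for intelligent applications across industries. Nearly a hundred organizations currently use it.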

Reference:

@inproceedings{DBLP:conf/ieaaie/XuXLXLCX17,
  author    = {Bo Xu and Yong Xu and Jiaqing Liang and Chenhao Xie and Bin Liang and Wanyun Cui and Yanghua Xiao},
  editor    = {Salem Benferhat and Karim Tabia and Moonis Ali},
  title     = {CN-DBpedia: {A} Never-Ending Chinese Knowledge Extraction System},
  booktitle = {Advances in Artificial Intelligence: From Theory to Practice - 30th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, {IEA/AIE} 2017, Arras, France, June 27-30, 2017, Proceedings, Part {II}},
  series    = {Lecture Notes in Computer Science},
  volume    = {10351},
  pages     = {428--438},
  publisher = {Springer},
  year      = {2017},
  url       = {https://doi.org/10.1007/978-3-319-60045-1\_44},
  doi       = {10.1007/978-3-319-60045-1\_44},
  timestamp = {Tue, 14 May 2019 10:00:37 +0200},
  biburl    = {https://dblp.org/rec/conf/ieaaie/XuXLXLCX17.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
