Introduction to the SICK Dataset



Official website: http://clic.cimec.unitn.it/composes/sick.html


SICK stands for Sentences Involving Compositional Knowledge.

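The SICK dataset contains 10,000 English sentence pairs, built from two existing paraphrase datasets:
one is the 8K ImageFlickr dataset (http://nlp.cs.illinois.edu/HockenmaierGroup/data.html),
the other is the SEMEVAL-2012 Semantic Textual Similarity Video Descriptions dataset (http://www.cs.york.ac.uk/semeval-2012/task6/index.php?id=data).
Each sentence pair is annotated for relatedness in meaning and for the entailment relation between the two sentences.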


SICK is released under the following license:
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
(http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US)

When using SICK in published research, please cite:
M. Marelli, S. Menini, M. Baroni, L. Bentivogli, R. Bernardi and R. Zamparelli. 2014. A SICK cure
for the evaluation of compositional distributional semantic models. Proceedings of LREC 2014,
Reykjavik (Iceland): ELRA.


The SICK dataset was used in SemEval 2014 - Task 1:
Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment


File structure: tab-separated text file

Field definitions:

- pair_ID: sentence pair ID

- sentence_A: sentence A

- sentence_B: sentence B

- entailment_label: gold textual entailment label (NEUTRAL, ENTAILMENT, or CONTRADICTION)

- relatedness_score: gold semantic relatedness score (on a 1-5 continuous scale)

- entailment_AB: entailment for the A-B order (A_neutral_B, A_entails_B, or A_contradicts_B)

- entailment_BA: entailment for the B-A order (B_neutral_A, B_entails_A, or B_contradicts_A)

- sentence_A_original: original sentence from which sentence A is derived

- sentence_B_original: original sentence from which sentence B is derived

- sentence_A_dataset: dataset from which the original sentence A was extracted (FLICKR vs. SEMEVAL)

- sentence_B_dataset: dataset from which the original sentence B was extracted (FLICKR vs. SEMEVAL)

- SemEval_set: set including the sentence pair in SemEval 2014 Task 1 (TRIAL, TRAIN, or TEST)
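
The fields listed above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader. It assumes that the file starts with a header row containing exactly these field names; the function names parse_sick and split_by_set are made up for illustration, and the sample rows in the demo are invented, not taken from SICK.

```python
import csv
import io
from collections import defaultdict

# Column names as given in the field definitions.
SICK_FIELDS = [
    "pair_ID", "sentence_A", "sentence_B", "entailment_label",
    "relatedness_score", "entailment_AB", "entailment_BA",
    "sentence_A_original", "sentence_B_original",
    "sentence_A_dataset", "sentence_B_dataset", "SemEval_set",
]

VALID_LABELS = {"NEUTRAL", "ENTAILMENT", "CONTRADICTION"}


def parse_sick(handle):
    """Parse a tab-separated SICK file object into a list of dicts.

    Assumption: the first line is a header row with the field names above.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        if row["entailment_label"] not in VALID_LABELS:
            raise ValueError(f"unexpected label in pair {row['pair_ID']}")
        # The relatedness score is a continuous value on a 1-5 scale.
        row["relatedness_score"] = float(row["relatedness_score"])
        rows.append(row)
    return rows


def split_by_set(rows):
    """Group rows by their SemEval_set value (TRIAL, TRAIN, or TEST)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["SemEval_set"]].append(row)
    return dict(groups)


if __name__ == "__main__":
    # Invented sample rows for illustration only.
    sample = "\n".join([
        "\t".join(SICK_FIELDS),
        "\t".join(["1", "A dog runs", "A dog is running", "ENTAILMENT", "4.5",
                   "A_entails_B", "B_entails_A", "orig A", "orig B",
                   "FLICKR", "FLICKR", "TRAIN"]),
        "\t".join(["2", "A man sings", "Nobody sings", "CONTRADICTION", "3.2",
                   "A_contradicts_B", "B_contradicts_A", "orig A", "orig B",
                   "SEMEVAL", "SEMEVAL", "TEST"]),
    ]) + "\n"
    rows = parse_sick(io.StringIO(sample))
    for name, group in split_by_set(rows).items():
        print(name, len(group))
```

To read a real copy of the data, pass an open file instead of the StringIO object, for example open(path, encoding="utf-8", newline=""), where path is wherever the downloaded file was saved. Grouping by SemEval_set is one way to recover the trial, training, and test partitions used in the shared task.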

