Wednesday, 21 November 2012

Named Entity Recognition and Question Answering Data



source: http://users.dsic.upv.es/~ybenajiba
Named Entity Recognition (NER) task (zip)
ANERCorp: a corpus of more than 150,000 words annotated for the NER task.
ANERGazet: a collection of 3 gazetteers: (i) Locations: a gazetteer containing names of continents, countries, cities, etc.; (ii) People: a gazetteer containing names of people collected manually from different Arabic websites; and finally (iii) Organizations: a gazetteer containing names of organizations such as companies, football teams, etc.
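To make the role of the gazetteers concrete, here is a minimal Python sketch of a gazetteer lookup. It is only an illustration: the one-name-per-line layout, the function names, the `LOC`/`ORG` labels, and the sample entries are assumptions of mine, not something taken from the ANERGazet files.

```python
def load_gazetteer(lines):
    """Build a set of names from an iterable of lines.

    Assumption: one name per line; blank lines are skipped.
    """
    return {line.strip() for line in lines if line.strip()}


def tag_tokens(tokens, gazetteers):
    """Label each token with the first gazetteer type that contains it, else 'O'."""
    tagged = []
    for token in tokens:
        label = "O"
        for entity_type, names in gazetteers.items():
            if token in names:
                label = entity_type
                break
        tagged.append((token, label))
    return tagged


# Hypothetical entries, for illustration only (not copied from ANERGazet).
gazetteers = {
    "LOC": load_gazetteer(["مصر", "باريس"]),
    "ORG": load_gazetteer(["اليونسكو"]),
}

print(tag_tokens(["زار", "الوفد", "باريس"], gazetteers))
```

This sketch only matches single tokens exactly. Multi-word names and Arabic clitics (for example an attached conjunction or preposition) would need extra handling.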
Test-bed for Passage Retrieval (PR) and Question Answering (QA) tasks (zip)
Documents: more than 11,000 Arabic Wikipedia articles in SGML format (the format adopted in CLEF and the one accepted by the JIRS system).
List of Questions: a list of 200 questions of different types. The proportion of each question type matches the proportion used in CLEF.
List of Correct Answers: for each question in the list of questions, I provide a list of correct answers. This list is essential for automatic evaluation.
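As a rough illustration of how a list of correct answers enables automatic evaluation, the Python sketch below scores a system by checking whether each returned answer contains any of the accepted answers. The question ids, the in-memory dictionaries, and the containment-based matching are assumptions of mine; the actual file layout of the question and answer lists is not described here.

```python
def normalize(text):
    """Collapse whitespace and ignore case so trivial differences do not count as errors."""
    return " ".join(text.split()).casefold()


def is_correct(system_answer, gold_answers):
    """True if any accepted answer appears inside the system's answer."""
    answer = normalize(system_answer)
    return any(normalize(gold) in answer for gold in gold_answers if gold.strip())


def accuracy(system_answers, gold):
    """Fraction of questions answered correctly; unanswered questions count as wrong."""
    if not gold:
        return 0.0
    hits = sum(
        1
        for question_id, gold_answers in gold.items()
        if is_correct(system_answers.get(question_id, ""), gold_answers)
    )
    return hits / len(gold)


# Hypothetical data, for illustration only.
gold = {"q1": ["Paris"], "q2": ["1945"]}
system_answers = {"q1": "The capital is Paris", "q2": "1939"}

print(accuracy(system_answers, gold))
```

Containment matching is lenient: it accepts a long passage that merely mentions the answer, which suits passage retrieval better than strict QA scoring.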
Doc
Arabic language rules (in Arabic): Somebody mailed me this pps file, which summarizes all the Arabic language rules. Unfortunately, there is no English version of the file. I would have translated it myself because it's really worth it, but the file contains 812 slides!
