Named Entity Recognition (NER) Task |
zip |
ANERCorp: a corpus of more than 150,000 words annotated for the NER task. |
|
ANERGazet: a collection of three gazetteers: (i) Locations, a
gazetteer containing names of continents, countries, cities, etc.; (ii)
People, a gazetteer containing names of people collected manually from
different Arabic websites; and (iii) Organizations, containing
names of organizations such as companies, football teams, etc. |
|
Test-Bed for Passage Retrieval (PR) and Question Answering (QA) tasks |
zip |
Documents: more than 11,000 Arabic Wikipedia articles in SGML
format (the format adopted in CLEF and also the one accepted by the
JIRS system). |
|
List of Questions: a list of 200 questions of
different types. The proportion of each question type is the same
as the proportion adopted in CLEF. |
|
List of Correct Answers: for each question
in my list of questions, I give a list of correct
answers. This list is very important for automatic evaluation. |
|
Doc |
- |
Arabic language rules (in Arabic): Somebody mailed me
this PPS file, which summarizes all the Arabic rules. Unfortunately, there
is no English version of the file. I would have translated it myself
because it's really worth it, but the file contains 812 slides! |
|
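To give an idea of how the list of correct answers can be used for automatic evaluation of a QA system, here is a minimal Python sketch. It is only an illustration: the file formats of the test-bed are not described here, so the data is held in plain dictionaries. The question IDs, the toy answers, and the function names `normalize` and `accuracy` are all assumptions made for this example, not part of the test-bed.

```python
def normalize(text: str) -> str:
    """Case-fold the text and collapse runs of whitespace into single spaces."""
    return " ".join(text.casefold().split())


def accuracy(system_answers: dict, correct_answers: dict) -> float:
    """Return the fraction of questions whose system answer matches
    any of the accepted answers (exact match after normalization)."""
    if not correct_answers:
        return 0.0
    hits = 0
    for qid, accepted in correct_answers.items():
        answer = system_answers.get(qid)
        if answer is None:
            continue  # an unanswered question counts as wrong
        accepted_norm = {normalize(a) for a in accepted}
        if normalize(answer) in accepted_norm:
            hits += 1
    return hits / len(correct_answers)


# Hypothetical toy data: question id -> list of accepted answers.
correct = {"q1": ["القاهرة", "Cairo"], "q2": ["1969"]}
# Hypothetical system output: question id -> one answer string.
system = {"q1": " cairo ", "q2": "1970"}

score = accuracy(system, correct)
print(f"accuracy = {score:.2f}")
```

Exact matching is the simplest choice here; a real evaluation would probably need a more tolerant comparison (for example, substring matching or Arabic-specific normalization).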