Wednesday, 21 November 2012

Named Entity Recognition and Question Answering Data



source: http://users.dsic.upv.es/~ybenajiba
Named Entity Recognition (NER) Task (zip)
ANERCorp: a corpus of more than 150,000 words annotated for the NER task.
ANERGazet: a collection of 3 gazetteers: (i) Locations: a gazetteer containing names of continents, countries, cities, etc.; (ii) People: a gazetteer containing names of people collected manually from different Arabic websites; and (iii) Organizations: a gazetteer containing names of organizations such as companies, football teams, etc.
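To give an idea of how gazetteers like these are typically used, here is a minimal sketch of a greedy longest-match lookup over a tokenized sentence. It assumes each gazetteer can be read as one name per line, which may not match the actual files in the zip. The labels `LOC` and `PERS`, the function names, and the sample entries are made up for illustration and are not taken from ANERGazet or ANERCorp.

```python
def load_gazetteer(lines):
    """Build a set of names from an iterable of lines, skipping blank lines.

    Assumption: one name per line. Check the real files before relying on this.
    """
    return {line.strip() for line in lines if line.strip()}


def tag_tokens(tokens, gazetteers, max_len=3):
    """Label tokens by greedy longest-match lookup in the gazetteers.

    gazetteers maps a (hypothetical) label to a set of names.
    Returns one label per token, with "O" for tokens that match nothing.
    """
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            for label, names in gazetteers.items():
                if phrase in names:
                    labels[i:i + n] = [label] * n
                    i += n
                    matched = True
                    break
            if matched:
                break
        if not matched:
            i += 1
    return labels


# Illustrative entries only, not taken from the real gazetteers.
gazetteers = {
    "LOC": load_gazetteer(["القاهرة", "نيو يورك"]),
    "PERS": load_gazetteer(["نجيب محفوظ"]),
}
tokens = "ولد نجيب محفوظ في القاهرة".split()
labels = tag_tokens(tokens, gazetteers)
```

A lookup like this is only a baseline or a feature source. It cannot resolve names that are ambiguous between types or names missing from the lists, which is why an annotated corpus is useful alongside the gazetteers.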
Test-Bed for Passage Retrieval (PR) and Question Answering (QA) tasks (zip)
Documents: more than 11,000 Arabic Wikipedia articles in SGML format (the format adopted in CLEF and also the one accepted by the JIRS system).
List of Questions: a list of 200 questions of different types. The proportion of each question type is the same as the one adopted in CLEF.
List of Correct Answers: for each question in my list of questions, I give here a list of correct answers. This list is very important for automatic evaluation.
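As a rough illustration of how a list of correct answers enables automatic evaluation, the sketch below scores a system's answers against the accepted answers for each question. The data layout (dictionaries keyed by a question id), the function names, and the sample answers are assumptions made for this example. Exact matching after whitespace normalization is also an assumption, not the scoring procedure used by CLEF or JIRS.

```python
def normalize(text):
    """Collapse runs of whitespace and case-fold, so trivial differences do not count."""
    return " ".join(text.split()).casefold()


def is_correct(candidate, gold_answers):
    """True if the candidate equals any accepted answer after normalization."""
    cand = normalize(candidate)
    return any(normalize(gold) == cand for gold in gold_answers)


def accuracy(system_answers, gold):
    """Fraction of questions answered correctly.

    system_answers: {question_id: answer string}
    gold:           {question_id: [accepted answer strings]}
    Questions the system did not answer count as wrong.
    """
    if not gold:
        return 0.0
    hits = sum(
        1
        for qid, accepted in gold.items()
        if qid in system_answers and is_correct(system_answers[qid], accepted)
    )
    return hits / len(gold)


# Illustrative data only, not taken from the test-bed.
gold = {"q1": ["القاهرة"], "q2": ["1945", "سنة 1945"]}
system = {"q1": "  القاهرة ", "q2": "1950"}
score = accuracy(system, gold)
```

Listing several accepted answers per question is what makes this kind of matching workable, since the same answer can be phrased in more than one way.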
Doc -
Arabic language rules (in Arabic): Somebody mailed me this pps file, which summarizes all the Arabic rules. Unfortunately, there is no English version of the file. I would have translated it myself because it's really worth it, but the file contains 812 slides!

King Saud University Corpus of Classical Arabic (KSUCCA)

Corpus Files

The files of the Corpus of Classical Arabic are distributed across six folders that represent the main branches of the corpus. You can download the files from here:
Religion 1 (A1)
Religion 2 (A2)
Linguistics (B)
Literature (C)
Science (D)
Sociology (E)
Biography (F)

Source:
http://ksucorpus.ksu.edu.sa/ar/?author=2