Morphological Features

QuranTree.jl provides a complete set of types for the morphological features and parts of speech of the Quranic Arabic Corpus.

Parsing

The features of each token are stored as a raw String. To parse that string into a morphological feature object, use parse(QuranFeatures, x), where x is the raw String input. For example, the following parses the 2nd part of the 3rd word of the 1st verse of Chapter 1:

julia> using QuranTree
julia> using Yunir
julia> crps, tnzl = load(QuranData());
julia> crpsdata = table(crps);
julia> tnzldata = table(tnzl);
julia> crpsdata[1][1][3][2]
Chapter 1 ٱلْفَاتِحَة (The Opening)
Verse 1

1×5 DataFrame
 Row │ word   part   form        tag     features
     │ Int64  Int64  String      String  String
─────┼─────────────────────────────────────────────────────────────────────
   1 │     3      2  r~aHoma`ni  ADJ     STEM|POS:ADJ|LEM:r~aHoma`n|ROOT:…

julia> token = crpsdata[1][1][3][2].data[!, :features]
1-element Vector{String}:
 "STEM|POS:ADJ|LEM:r~aHoma`n|ROOT:rHm|MS|GEN"

julia> mfeat = parse(QuranFeatures, token[1])
Stem(:ADJ, ADJ, AbstractQuranFeature[Lemma("r~aHoma`n"), Root("rHm"), M, S, GEN])

julia> typeof(mfeat)
Stem
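
The raw feature string shown above is a pipe-delimited list of fields: some are bare tags (such as STEM or GEN) and some are key:value pairs (such as ROOT:rHm). The following is a minimal plain-Julia sketch that only inspects this layout. It is not how QuranTree.jl parses features, and the names keyed and bare are chosen here purely for illustration.

```julia
# Illustration only: inspect the layout of a raw feature string with Base Julia.
# This does not reproduce QuranTree.jl's parser.
raw = "STEM|POS:ADJ|LEM:r~aHoma`n|ROOT:rHm|MS|GEN"

fields = split(raw, '|')              # one entry per pipe-separated field

keyed = Dict{String,String}()         # fields of the form KEY:value
bare = String[]                       # fields with no colon

for f in fields
    if occursin(':', f)
        k, v = split(f, ':'; limit=2) # split on the first colon only
        keyed[k] = v
    else
        push!(bare, f)
    end
end

println(keyed["ROOT"])
println(bare)
```

In practice parse(QuranFeatures, x) should be preferred, since it returns typed features rather than plain strings.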

Extracting Detailed Description

Use the @desc macro to see a detailed description of the features.

julia> @desc mfeat
Stem
────
Adjective:
 ├ data: ADJ
 ├ desc: Adjective
 └ ar_label: صفة
Lemma:
 └ data: r~aHoma`n
Root:
 └ data: rHm
Masculine:
 ├ data: M
 ├ desc: Masculine
 └ ar_label: الجنس
Singular:
 ├ data: S
 ├ desc: Singular
 └ ar_label: العدد
Genetive:
 ├ data: GEN
 ├ desc: Genetive case
 └ ar_label: مجرور

Julia's dump function shows how to access the properties of the Stem object.

julia> dump(mfeat)
Stem
  data: Symbol ADJ
  pos: Adjective
    data: Symbol ADJ
    desc: String "Adjective"
    ar_label: String "صفة"
  feats: Array{AbstractQuranFeature}((5,))
    1: Lemma
      data: String "r~aHoma`n"
    2: Root
      data: String "rHm"
    3: Masculine
      data: Symbol M
      desc: String "Masculine"
      ar_label: String "الجنس"
    4: Singular
      data: Symbol S
      desc: String "Singular"
      ar_label: String "العدد"
    5: Genetive
      data: Symbol GEN
      desc: String "Genetive case"
      ar_label: String "مجرور"

julia> # access other feats of the token
       mfeat.feats
5-element Vector{AbstractQuranFeature}:
 Lemma("r~aHoma`n")
 Root("rHm")
 M
 S
 GEN

Checking Parts of Speech

isfeat(token, pos) checks whether the parsed token carries a particular feature or part of speech (pos). For example, the following checks, among other things, that mfeat above is indeed Masculine and Singular.

julia> isfeat(mfeat, Masculine)
true

julia> isfeat(mfeat, Feminine)
false

julia> isfeat(mfeat, Singular)
true

julia> isfeat(mfeat, Adjective) && isfeat(mfeat, Genetive)
true

Another example checks whether the token has both Root and Lemma features:

julia> isfeat(mfeat, Root) && isfeat(mfeat, Lemma)
true

Lemma, Root and Special

The root, lemma, and special functions extract the Root, Lemma, and Special morphological features, respectively.

julia> root(mfeat)
"rHm"

julia> lemma(mfeat)
"r~aHoma`n"

julia> arabic(root(mfeat))
"رحم"

julia> arabic(lemma(mfeat))
"رَّحْمَٰن"
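
The root and lemma are stored in transliterated form, and arabic converts them to Arabic script. As a rough illustration of the idea, the sketch below decodes the root with a hand-written three-letter lookup table. The table DEMO_TABLE and the helper decode_demo are assumptions made for this example only; they cover just the letters in "rHm" and are not part of QuranTree.jl or Yunir.

```julia
# Hypothetical partial lookup table: only the three letters of the root "rHm".
const DEMO_TABLE = Dict('r' => 'ر', 'H' => 'ح', 'm' => 'م')

# Replace each transliterated character with its Arabic letter.
# Characters missing from the table are passed through unchanged.
decode_demo(s::AbstractString) = join(get(DEMO_TABLE, c, c) for c in s)

decoded = decode_demo("rHm")
println(decoded)
```

For real text the arabic function shown above should be used, since it handles the full character set, including diacritics.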

The following example shows a token with a Special feature:

julia> token2 = crpsdata.data[!, :features][53]
"STEM|POS:NEG|LEM:laA|SP:<in~"

julia> mfeat2 = parse(QuranFeatures, token2)
ERROR: DomainError with QuranFeatures("STEM|POS:NEG|LEM:laA|SP:<in~"):
Expected SUFFIX feature, got STEM.

julia> special(mfeat2)
ERROR: UndefVarError: mfeat2 not defined

julia> arabic(special(mfeat2))
ERROR: UndefVarError: mfeat2 not defined

Implied Verb Features

Some features of Quranic Arabic verbs are implied. For example, the Voice feature of a Verb defaults to Active voice, the Mood feature defaults to Indicative mood, and the Verb form feature defaults to First form.

julia> token3 = crpsdata.data[!, :features][27]
"STEM|POS:V|IMPF|(X)|LEM:{sotaEiynu|ROOT:Ewn|1P"

token3 is a Verb with no Mood or Voice features stated (its Verb form, X, is given explicitly). Parsing it automatically adds the default values of the missing features, as shown below:

julia> mfeat3 = parse(QuranFeatures, token3)
Stem(:V, V, AbstractQuranFeature[Lemma("{sotaEiynu"), Root("Ewn"), IMPF, X, 1, P, IND, ACT])

julia> @desc mfeat3
Stem
────
Verb:
 ├ data: V
 ├ desc: Verb
 └ ar_label: فعل
Lemma:
 └ data: {sotaEiynu
Root:
 └ data: Ewn
Imperfect:
 ├ data: IMPF
 ├ desc: Imperfect verb
 └ ar_label: فعل مضارع
VerbFormX:
 ├ data: X
 ├ desc: Tenth verb form
 └ ar_label: فعل
FirstPerson:
 ├ data: 1
 ├ desc: First person
 └ ar_label: الاسناد
Plural:
 ├ data: P
 ├ desc: Plural
 └ ar_label: العدد
Indicative:
 ├ data: IND
 ├ desc: Indicative mood (default)
 └ ar_label: مرفوع
Active:
 ├ data: ACT
 ├ desc: Active voice (default)
 └ ar_label: مبني للمعلوم

The following is another example in which the Voice feature of the Verb is implied:

julia> token4 = crpsdata.data[!, :features][27]
"STEM|POS:V|IMPF|(X)|LEM:{sotaEiynu|ROOT:Ewn|1P"

julia> mfeat4 = parse(QuranFeatures, token4)
Stem(:V, V, AbstractQuranFeature[Lemma("{sotaEiynu"), Root("Ewn"), IMPF, X, 1, P, IND, ACT])

julia> @desc mfeat4
Stem
────
Verb:
 ├ data: V
 ├ desc: Verb
 └ ar_label: فعل
Lemma:
 └ data: {sotaEiynu
Root:
 └ data: Ewn
Imperfect:
 ├ data: IMPF
 ├ desc: Imperfect verb
 └ ar_label: فعل مضارع
VerbFormX:
 ├ data: X
 ├ desc: Tenth verb form
 └ ar_label: فعل
FirstPerson:
 ├ data: 1
 ├ desc: First person
 └ ar_label: الاسناد
Plural:
 ├ data: P
 ├ desc: Plural
 └ ar_label: العدد
Indicative:
 ├ data: IND
 ├ desc: Indicative mood (default)
 └ ar_label: مرفوع
Active:
 ├ data: ACT
 ├ desc: Active voice (default)
 └ ar_label: مبني للمعلوم
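
The idea of filling in implied features can be sketched in plain Julia. The snippet below works on bare tag strings rather than on QuranTree.jl types, and the helper with_defaults_demo and the constants MOOD_TAGS and VOICE_TAGS are hypothetical names introduced for this example. It handles only Mood and Voice, and it is not a description of how the package implements its defaults.

```julia
# Hypothetical tag groups, using the mood and voice tags listed on this page.
const MOOD_TAGS  = ("IND", "SUBJ", "JUS")
const VOICE_TAGS = ("ACT", "PASS")

# Return a copy of `tags` with a default mood and voice appended when absent.
function with_defaults_demo(tags::Vector{String})
    out = copy(tags)
    any(t -> t in MOOD_TAGS, out)  || push!(out, "IND")  # Indicative by default
    any(t -> t in VOICE_TAGS, out) || push!(out, "ACT")  # Active by default
    return out
end

tags = ["IMPF", "X", "1", "P"]        # no mood or voice stated
filled = with_defaults_demo(tags)
println(filled)
```

A tag list that already states a mood or voice, for example one containing "PASS", keeps that value and only receives the default that is still missing.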

POS Abstract Types

The table below lists every part of speech and feature together with its corresponding type. Each type has a parent type, its supertype in the type hierarchy, which is useful for grouping. For example, instead of using || (or) to check for all tokens that are either FirstPerson, SecondPerson, or ThirdPerson, the parent type AbstractPerson can be used.

julia> # without using parent type
       function allpersons(row)
           rfeat = parse(QuranFeatures, row.features)
           is1st = isfeat(rfeat, FirstPerson)
           is2nd = isfeat(rfeat, SecondPerson)
           is3rd = isfeat(rfeat, ThirdPerson)
       
           return is1st || is2nd || is3rd
       end
allpersons (generic function with 1 method)

julia> tbl1 = filter(allpersons, crpsdata.data);

julia> tbl1[!, [:form, :features]]
44092×2 DataFrame
   Row │ form         features
       │ String       String
───────┼────────────────────────────────────────────────
     1 │ <iy~aAka     STEM|POS:PRON|LEM:<iy~aA|2MS
     2 │ naEobudu     STEM|POS:V|IMPF|LEM:Eabada|ROOT:…
     3 │ <iy~aAka     STEM|POS:PRON|LEM:<iy~aA|2MS
     4 │ nasotaEiynu  STEM|POS:V|IMPF|(X)|LEM:{sotaEiy…
     5 │ {hodi        STEM|POS:V|IMPV|LEM:hadaY|ROOT:h…
     6 │ naA          SUFFIX|PRON:1P
     7 │ >anoEamo     STEM|POS:V|PERF|(IV)|LEM:>anoEam…
     8 │ ta           SUFFIX|PRON:2MS
   ⋮   │      ⋮                      ⋮
 44086 │ >aEuw*u      STEM|POS:V|IMPF|LEM:Eu*o|ROOT:Ew…
 44087 │ xalaqa       STEM|POS:V|PERF|LEM:xalaqa|ROOT:…
 44088 │ waqaba       STEM|POS:V|PERF|LEM:waqaba|ROOT:…
 44089 │ Hasada       STEM|POS:V|PERF|LEM:Hasada|ROOT:…
 44090 │ qulo         STEM|POS:V|IMPV|LEM:qaAla|ROOT:q…
 44091 │ >aEuw*u      STEM|POS:V|IMPF|LEM:Eu*o|ROOT:Ew…
 44092 │ yuwasowisu   STEM|POS:V|IMPF|LEM:wasowasa|ROO…
                                    44077 rows omitted

julia> # using parent type
       tbl2 = filter(row -> isfeat(parse(QuranFeatures, row.features), AbstractPerson), crpsdata.data);

julia> tbl2[!, [:form, :features]]
44092×2 DataFrame
   Row │ form         features
       │ String       String
───────┼────────────────────────────────────────────────
     1 │ <iy~aAka     STEM|POS:PRON|LEM:<iy~aA|2MS
     2 │ naEobudu     STEM|POS:V|IMPF|LEM:Eabada|ROOT:…
     3 │ <iy~aAka     STEM|POS:PRON|LEM:<iy~aA|2MS
     4 │ nasotaEiynu  STEM|POS:V|IMPF|(X)|LEM:{sotaEiy…
     5 │ {hodi        STEM|POS:V|IMPV|LEM:hadaY|ROOT:h…
     6 │ naA          SUFFIX|PRON:1P
     7 │ >anoEamo     STEM|POS:V|PERF|(IV)|LEM:>anoEam…
     8 │ ta           SUFFIX|PRON:2MS
   ⋮   │      ⋮                      ⋮
 44086 │ >aEuw*u      STEM|POS:V|IMPF|LEM:Eu*o|ROOT:Ew…
 44087 │ xalaqa       STEM|POS:V|PERF|LEM:xalaqa|ROOT:…
 44088 │ waqaba       STEM|POS:V|PERF|LEM:waqaba|ROOT:…
 44089 │ Hasada       STEM|POS:V|PERF|LEM:Hasada|ROOT:…
 44090 │ qulo         STEM|POS:V|IMPV|LEM:qaAla|ROOT:q…
 44091 │ >aEuw*u      STEM|POS:V|IMPF|LEM:Eu*o|ROOT:Ew…
 44092 │ yuwasowisu   STEM|POS:V|IMPF|LEM:wasowasa|ROO…
                                    44077 rows omitted

julia> sum(tbl1[!, :features] .!== tbl2[!, :features])
0
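
Grouping by parent type relies on Julia's subtype relation: a value of a concrete type is also an instance of every abstract supertype. The sketch below shows this with a small stand-in hierarchy. All Demo* types and the helper hasfeat_demo are hypothetical and defined only for this example; they are not QuranTree.jl's definitions, and whether isfeat is implemented this way is not stated here.

```julia
# Stand-in hierarchy for illustration; NOT the types defined by QuranTree.jl.
abstract type DemoFeature end
abstract type DemoPerson <: DemoFeature end
abstract type DemoNumber <: DemoFeature end

struct DemoFirst  <: DemoPerson end
struct DemoSecond <: DemoPerson end
struct DemoThird  <: DemoPerson end
struct DemoPlural <: DemoNumber end

# True if any element of `feats` is an instance of `T` (concrete or abstract).
hasfeat_demo(feats, T::Type) = any(f -> f isa T, feats)

feats = DemoFeature[DemoFirst(), DemoPlural()]

# Checking each concrete type in turn ...
by_concrete = hasfeat_demo(feats, DemoFirst) ||
              hasfeat_demo(feats, DemoSecond) ||
              hasfeat_demo(feats, DemoThird)

# ... gives the same answer as one check against the abstract parent.
by_parent = hasfeat_demo(feats, DemoPerson)

println(by_concrete == by_parent)
```

The single check against the parent type also keeps working if further person subtypes are added later, whereas the chained || version would need to be extended by hand.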

Part of Speech Types

| Type | Parent Type | Tag | Description | Arabic Name |
|---|---|---|---|---|
| Noun | AbstractNoun | Symbol("N") | Noun | اسم |
| ProperNoun | AbstractNoun | Symbol("PN") | Proper noun | اسم علم |
| Adjective | AbstractDerivedNominal | Symbol("ADJ") | Adjective | صفة |
| ImperativeVerbalNoun | AbstractDerivedNominal | Symbol("IMPN") | Imperative verbal noun | اسم فعل أمر |
| Personal | AbstractPronoun | Symbol("PRON") | Personal pronoun | ضمير |
| Demonstrative | AbstractPronoun | Symbol("DEM") | Demonstrative pronoun | اسم اشارة |
| Relative | AbstractPronoun | Symbol("REL") | Relative pronoun | اسم موصول |
| Time | AbstractAdverb | Symbol("T") | Time adverb | ظرف زمان |
| Location | AbstractAdverb | Symbol("LOC") | Location adverb | ظرف مكان |
| Preposition | AbstractPreposition | Symbol("P") | Preposition | حرف جر |
| EmphaticLam | AbstractPrefix | Symbol("EMPH") | Emphatic lam prefix | لام التوكيد |
| ImperativeLam | AbstractPrefix | Symbol("IMPV") | Imperative lam prefix | لام الامر |
| PurposeLam | AbstractPrefix | Symbol("PRP") | Purpose lam prefix | لام التعليل |
| EmphaticNun | AbstractPrefix | Symbol("+n:EMPH") | Emphatic lam prefix | لام التوكيد |
| Coordinating | AbstractConjunction | Symbol("CONJ") | Coordinating conjunction | حرف عطف |
| Subordinating | AbstractConjunction | Symbol("SUB") | Subordinating particle | حرف مصدري |
| Accusative | AbstractParticle | Symbol("ACC") | Accusative particle | حرف نصب |
| Amendment | AbstractParticle | Symbol("AMD") | Amendment particle | حرف استدراك |
| Answer | AbstractParticle | Symbol("ANS") | Answer particle | حرف جواب |
| Aversion | AbstractParticle | Symbol("AVR") | Aversion particle | حرف ردع |
| Cause | AbstractParticle | Symbol("CAUS") | Particle of cause | حرف سببية |
| Certainty | AbstractParticle | Symbol("CERT") | Particle of certainty | حرف تحقيق |
| Circumstantial | AbstractParticle | Symbol("CIRC") | Circumstantial particle | حرف حال |
| Comitative | AbstractParticle | Symbol("COM") | Comitative particle | واو المعية |
| Conditional | AbstractParticle | Symbol("COND") | Conditional particle | حرف شرط |
| Equalization | AbstractParticle | Symbol("EQ") | Equalization particle | حرف تسوية |
| Exhortation | AbstractParticle | Symbol("EXH") | Exhortation particle | حرف تحضيض |
| Explanation | AbstractParticle | Symbol("EXL") | Explanation particle | حرف تفصيل |
| Exceptive | AbstractParticle | Symbol("EXP") | Exceptive particle | أداة استثناء |
| Future | AbstractParticle | Symbol("FUT") | Future particle | حرف استقبال |
| Inceptive | AbstractParticle | Symbol("INC") | Inceptive particle | حرف ابتداء |
| Interpretation | AbstractParticle | Symbol("INT") | Inceptive particle | حرف تفسير |
| Interogative | AbstractParticle | Symbol("INTG") | Interogative particle | حرف استفهام |
| Negative | AbstractParticle | Symbol("NEG") | Negative particle | حرف نفي |
| Preventive | AbstractParticle | Symbol("PREV") | Preventive particle | حرف كاف |
| Prohibition | AbstractParticle | Symbol("PRO") | Prohibition particle | حرف نهي |
| Resumption | AbstractParticle | Symbol("REM") | Resumption particle | |
| Restriction | AbstractParticle | Symbol("RES") | Restriction particle | أداة حصر |
| Retraction | AbstractParticle | Symbol("RET") | Retraction particle | حرف اضراب |
| Result | AbstractParticle | Symbol("RSLT") | Result particle | حرف واقع في جواب الشرط |
| Supplemental | AbstractParticle | Symbol("SUP") | Suplemental particle | حرف زائد |
| Surprise | AbstractParticle | Symbol("SUR") | Surprise particle | حرف فجاءة |
| Vocative | AbstractParticle | Symbol("VOC") | Vocative particle | حرف نداء |
| DisconnectedLetters | AbstractDisLetters | Symbol("INL") | Quranic initials | حروف مقطعة |
| FirstPerson | AbstractPerson | Symbol("1") | First person | الاسناد |
| SecondPerson | AbstractPerson | Symbol("2") | Second person | الاسناد |
| ThirdPerson | AbstractPerson | Symbol("3") | Third person | الاسناد |
| Masculine | AbstractGender | Symbol("M") | Masculine | الجنس |
| Feminine | AbstractGender | Symbol("F") | Feminine | الجنس |
| Singular | AbstractNumber | Symbol("S") | Singular | العدد |
| Dual | AbstractNumber | Symbol("D") | Dual | العدد |
| Plural | AbstractNumber | Symbol("P") | Plural | العدد |
| Verb | AbstractPartOfSpeech | Symbol("V") | Verb | فعل |
| Perfect | AbstractAspect | Symbol("PERF") | Perfect verb | فعل ماض |
| Imperfect | AbstractAspect | Symbol("IMPF") | Imperfect verb | فعل مضارع |
| Imperative | AbstractAspect | Symbol("IMPV") | Imperative verb | فعل أمر |
| Indicative | AbstractMood | Symbol("IND") | Indicative mood (default) | مرفوع |
| Subjunctive | AbstractMood | Symbol("SUBJ") | Subjunctive mood | منصوب |
| Jussive | AbstractMood | Symbol("JUS") | Jussive mood | مجزوم |
| Active | AbstractVoice | Symbol("ACT") | Active voice (default) | مبني للمعلوم |
| Passive | AbstractVoice | Symbol("PASS") | Passive voice | مبني للمجهول |
| VerbFormI | AbstractForm | Symbol("I") | First verb form (default) | فعل |
| VerbFormII | AbstractForm | Symbol("II") | Second verb form | فعل |
| VerbFormIII | AbstractForm | Symbol("III") | Third verb form | فعل |
| VerbFormIV | AbstractForm | Symbol("IV") | Fourth verb form | فعل |
| VerbFormV | AbstractForm | Symbol("V") | Fifth verb form | فعل |
| VerbFormVI | AbstractForm | Symbol("VI") | Sixth verb form | فعل |
| VerbFormVII | AbstractForm | Symbol("VII") | Seventh verb form | فعل |
| VerbFormVIII | AbstractForm | Symbol("VIII") | Eighth verb form | فعل |
| VerbFormIX | AbstractForm | Symbol("IX") | Ninth verb form | فعل |
| VerbFormX | AbstractForm | Symbol("X") | Tenth verb form | فعل |
| VerbFormXI | AbstractForm | Symbol("XI") | Eleventh verb form | فعل |
| VerbFormXII | AbstractForm | Symbol("XII") | Twelfth verb form | فعل |
| ActiveParticle | AbstractDerivedNoun | Symbol("ACT PCPL") | Active particle | اسم فاعل |
| PassiveParticle | AbstractDerivedNoun | Symbol("PASS PCPL") | Passive particle | اسم مفعول |
| VerbalNoun | AbstractDerivedNoun | Symbol("VN") | Verbal noun | مصدر |
| Definite | AbstractState | Symbol("DEF") | Definite state | معرفة |
| Indefinite | AbstractState | Symbol("INDEF") | Indefinite state | نكرة |
| Nominative | AbstractCase | Symbol("NOM") | Nominative case | مرفوع |
| Genetive | AbstractCase | Symbol("GEN") | Genetive case | مجرور |