Indexing the Corpus

QuranTree.jl offers intuitive indexing for both the Quranic Arabic Corpus and the Tanzil Data. Indexing follows this pattern:

# for Quranic Arabic Corpus
crpsdata[<chapters>][<verses>][<words>][<parts>]

# for Tanzil Data
tnzldata[<chapters>][<verses>]

The following are the options supported for each index:

  • Chapters:
    • Int64 - crpsdata[1] (extracts Chapter 1)
    • UnitRange - crpsdata[15:24] (extracts Chapters 15 to 24)
    • Array{Int64,1} - crpsdata[[3,9,10]] (extracts Chapters 3, 9 and 10)
    • end (special) - crpsdata[end-3:end] (extracts Chapters 111 to 114)
  • Verses:
    • Int64 - crpsdata[1][1] (extracts Verse 1 of Chapter 1)
    • UnitRange - crpsdata[2][15:24] (extracts verses 15 to 24 of Chapter 2)
    • Array{Int64,1} - crpsdata[10][[3,9,10]] (extracts verses 3, 9 and 10 of Chapter 10)
  • Words: (not applicable for TanzilData, only CorpusData)
    • Int64 - crpsdata[1][1][1] (extracts Word 1 of Verse 1 of Chapter 1)
    • UnitRange - crpsdata[2][8][1:3] (extracts words 1 to 3 of Verse 8 of Chapter 2)
    • Array{Int64,1} - crpsdata[2][8][[1,3]] (extracts words 1 and 3 of Verse 8 of Chapter 2)
  • Parts: (not applicable for TanzilData, only CorpusData)
    • Int64 - crpsdata[1][1][1][1] (extracts Part 1 of Word 1 of Verse 1 of Chapter 1)
    • UnitRange - crpsdata[2][9][1][1:2] (extracts Part 1 to Part 2 of Word 1 of Verse 9 of Chapter 2)
    • Array{Int64,1} - crpsdata[2][9][1][[1,2]] (extracts Part 1 and Part 2 of Word 1 of Verse 9 of Chapter 2)

As an example, the following will extract Verse 9 of Chapter 2 in both TanzilData and CorpusData:

julia> using QuranTree
julia> data = QuranData();
julia> crps, tnzl = load(data);
julia> crpsdata = table(crps);
julia> tnzldata = table(tnzl);
julia> crpsdata[2][9]
Chapter 2 ٱلْبَقَرَة (The Cow)
Verse 9

18×5 DataFrame
 Row │ word   part   form       tag     features
     │ Int64  Int64  String     String  String
─────┼────────────────────────────────────────────────────────────────────
   1 │     1      1  yuxa`diEu  V       STEM|POS:V|IMPF|(III)|LEM:yuxa`d…
   2 │     1      2  wna        PRON    SUFFIX|PRON:3MP
   3 │     2      1  {ll~aha    PN      STEM|POS:PN|LEM:{ll~ah|ROOT:Alh|…
   4 │     3      1  wa         CONJ    PREFIX|w:CONJ+
   5 │     3      2  {l~a*iyna  REL     STEM|POS:REL|LEM:{l~a*iY|MP
   6 │     4      1  'aAmanu    V       STEM|POS:V|PERF|(IV)|LEM:'aAmana…
   7 │     4      2  wA@        PRON    SUFFIX|PRON:3MP
   8 │     5      1  wa         REM     PREFIX|w:REM+
  ⋮  │   ⋮      ⋮        ⋮        ⋮                     ⋮
  12 │     7      1  <il~aA^    RES     STEM|POS:RES|LEM:<il~aA
  13 │     8      1  >anfusa    N       STEM|POS:N|LEM:nafos|ROOT:nfs|FP…
  14 │     8      2  humo       PRON    SUFFIX|PRON:3MP
  15 │     9      1  wa         CIRC    PREFIX|w:CIRC+
  16 │     9      2  maA        NEG     STEM|POS:NEG|LEM:maA
  17 │    10      1  ya$oEuru   V       STEM|POS:V|IMPF|LEM:ya$oEuru|ROO…
  18 │    10      2  wna        PRON    SUFFIX|PRON:3MP
                                                            3 rows omitted

julia> tnzldata[2][9]
Chapter 2 ٱلْبَقَرَة (The Cow)
Verse 9

1×1 DataFrame
 Row │ form
     │ String
─────┼───────────────────────────────────
   1 │ يُخَٰدِعُونَ ٱللَّهَ وَٱلَّذِينَ ءَامَنُوا۟ وَمَا يَخْ…

As shown above, the output of the indexing includes a label with the chapter name in both Arabic and English.

Combinations of Indices

Combinations of these indices are also supported. For example, the following will extract verses 1 and 3 from each of Chapters 111 to 114:

julia> crpsdata[111:114][[1,3]]
Chapters 111-114: ٱلْمَسَد-ٱلنَّاس (Palm Fibre-People)
Verses 1, 3

41×7 DataFrame
 Row │ chapter  verse  word   part   form      tag     features                ⋯
     │ Int64    Int64  Int64  Int64  String    String  String                  ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │     111      1      1      1  tab~ato   V       STEM|POS:V|PERF|LEM:tab ⋯
   2 │     111      1      2      1  yadaA^    N       STEM|POS:N|LEM:yad|ROOT
   3 │     111      1      3      1  >abiY     N       STEM|POS:N|LEM:>abN|ROO
   4 │     111      1      4      1  lahabK    N       STEM|POS:N|LEM:lahab|RO
   5 │     111      1      5      1  wa        CONJ    PREFIX|w:CONJ+          ⋯
   6 │     111      1      5      2  tab~a     V       STEM|POS:V|PERF|LEM:tab
   7 │     111      3      1      1  sa        FUT     PREFIX|sa+
   8 │     111      3      1      2  yaSolaY`  V       STEM|POS:V|IMPF|LEM:yaS
  ⋮  │    ⋮       ⋮      ⋮      ⋮       ⋮        ⋮                     ⋮       ⋱
  35 │     114      1      3      1  bi        P       PREFIX|bi+              ⋯
  36 │     114      1      3      2  rab~i     N       STEM|POS:N|LEM:rab~|ROO
  37 │     114      1      4      1  {l        DET     PREFIX|Al+
  38 │     114      1      4      2  n~aAsi    N       STEM|POS:N|LEM:n~aAs|RO
  39 │     114      3      1      1  <ila`hi   N       STEM|POS:N|LEM:<ila`h|R ⋯
  40 │     114      3      2      1  {l        DET     PREFIX|Al+
  41 │     114      3      2      2  n~aAsi    N       STEM|POS:N|LEM:n~aAs|RO
                                                    1 column and 26 rows omitted

julia> tnzldata[111:114][[1,3]]
Chapters 111-114: ٱلْمَسَد-ٱلنَّاس (Palm Fibre-People)
Verses 1, 3

8×3 DataFrame
 Row │ chapter  verse  form
     │ Int64    Int64  String
─────┼───────────────────────────────────────────────────
   1 │     111      1  بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ تَبَّتْ يَدَآ أَ…
   2 │     111      3  سَيَصْلَىٰ نَارًا ذَاتَ لَهَبٍ
   3 │     112      1  بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ هُوَ ٱللَّ…
   4 │     112      3  لَمْ يَلِدْ وَلَمْ يُولَدْ
   5 │     113      1  بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ أَعُوذُ بِ…
   6 │     113      3  وَمِن شَرِّ غَاسِقٍ إِذَا وَقَبَ
   7 │     114      1  بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ أَعُوذُ بِ…
   8 │     114      3  إِلَٰهِ ٱلنَّاسِ

Note

The special index end also works in combinations: crpsdata[111:114][[1,3]] is the same as crpsdata[end-3:end][[1,3]], and tnzldata[111:114][[1,3]] is equivalent to tnzldata[end-3:end][[1,3]].
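
The end shorthand relies on ordinary Julia behaviour: inside square brackets, end is lowered to a call to lastindex on the object being indexed. The following is a minimal, self-contained sketch of that mechanism. It uses a hypothetical Chapters type and is not QuranTree.jl's implementation; the names Chapters, ids and all_chapters are assumptions made purely for illustration.

```julia
# Hypothetical toy container; NOT QuranTree.jl's actual implementation.
struct Chapters
    ids::Vector{Int}   # chapter numbers currently held
end

# Integer, range, and vector indices each select positions in `ids`
# and return a new, smaller Chapters selection.
Base.getindex(c::Chapters, i::Int) = Chapters([c.ids[i]])
Base.getindex(c::Chapters, r::UnitRange{Int}) = Chapters(c.ids[r])
Base.getindex(c::Chapters, v::Vector{Int}) = Chapters(c.ids[v])

# Julia rewrites `end` inside c[...] as lastindex(c),
# so defining this method is what makes c[end-3:end] work.
Base.lastindex(c::Chapters) = length(c.ids)

all_chapters = Chapters(collect(1:114))

last_four = all_chapters[end-3:end]   # lowered to all_chapters[(114-3):114]
same_four = all_chapters[111:114]

println(last_four.ids == same_four.ids)
```

Because end is resolved against the object immediately to its left, each bracket in a chained expression resolves its own end independently.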

As another example, the following will extract Part 1 of Words 1 to 3 from the CorpusData output above:

julia> crpsdata[111:114][[1,3]][1:3][1]
Chapters 111-114: ٱلْمَسَد-ٱلنَّاس (Palm Fibre-People)
Verses 1, 3

23×7 DataFrame
 Row │ chapter  verse  word   part   form     tag     features                 ⋯
     │ Int64    Int64  Int64  Int64  String   String  String                   ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │     111      1      1      1  tab~ato  V       STEM|POS:V|PERF|LEM:tab~ ⋯
   2 │     111      1      2      1  yadaA^   N       STEM|POS:N|LEM:yad|ROOT:
   3 │     111      1      3      1  >abiY    N       STEM|POS:N|LEM:>abN|ROOT
   4 │     111      3      1      1  sa       FUT     PREFIX|sa+
   5 │     111      3      2      1  naArFA   N       STEM|POS:N|LEM:naAr|ROOT ⋯
   6 │     111      3      3      1  *aAta    N       STEM|POS:N|LEM:*uw|FS|AC
   7 │     112      1      1      1  qulo     V       STEM|POS:V|IMPV|LEM:qaAl
   8 │     112      1      2      1  huwa     PRON    STEM|POS:PRON|3MS
  ⋮  │    ⋮       ⋮      ⋮      ⋮       ⋮       ⋮                     ⋮        ⋱
  17 │     113      3      2      1  $ar~i    N       STEM|POS:N|LEM:$ar~|ROOT ⋯
  18 │     113      3      3      1  gaAsiqK  N       STEM|POS:N|ACT|PCPL|LEM:
  19 │     114      1      1      1  qulo     V       STEM|POS:V|IMPV|LEM:qaAl
  20 │     114      1      2      1  >aEuw*u  V       STEM|POS:V|IMPF|LEM:Eu*o
  21 │     114      1      3      1  bi       P       PREFIX|bi+               ⋯
  22 │     114      3      1      1  <ila`hi  N       STEM|POS:N|LEM:<ila`h|RO
  23 │     114      3      2      1  {l       DET     PREFIX|Al+
                                                     1 column and 8 rows omitted