Indexing the Corpus

QuranTree.jl offers an intuitive indexing scheme for both the Quranic Arabic Corpus and the Tanzil data. Specifically, indexing follows this pattern:

# for Quranic Arabic Corpus
crpsdata[<chapters>][<verses>][<words>][<parts>]

# for Tanzil Data
tnzldata[<chapters>][<verses>]

The following are the options supported for each index:

Chapters:
- Int64 - crpsdata[1] (extracts Chapter 1)
- UnitRange - crpsdata[15:24] (extracts Chapters 15 to 24)
- Array{Int64,1} - crpsdata[[3,9,10]] (extracts Chapters 3, 9 and 10)
- end (special) - crpsdata[end-3:end] (extracts Chapters 111 to 114)
Verses:
- Int64 - crpsdata[1][1] (extracts Verse 1 of Chapter 1)
- UnitRange - crpsdata[2][15:24] (extracts verses 15 to 24 of Chapter 2)
- Array{Int64,1} - crpsdata[10][[3,9,10]] (extracts verses 3, 9 and 10 of Chapter 10)
Words: (CorpusData only, not applicable to TanzilData)
- Int64 - crpsdata[1][1][1] (extracts Word 1 of Verse 1 of Chapter 1)
- UnitRange - crpsdata[2][8][1:3] (extracts words 1 to 3 of Verse 8 of Chapter 2)
- Array{Int64,1} - crpsdata[2][8][[1,3]] (extracts words 1 and 3 of Verse 8 of Chapter 2)
Parts: (CorpusData only, not applicable to TanzilData)
- Int64 - crpsdata[1][1][1][1] (extracts Part 1 of Word 1 of Verse 1 of Chapter 1)
- UnitRange - crpsdata[2][9][1][1:2] (extracts Part 1 to Part 2 of Word 1 of Verse 9 of Chapter 2)
- Array{Int64,1} - crpsdata[2][9][1][[1,2]] (extracts Part 1 and Part 2 of Word 1 of Verse 9 of Chapter 2)
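
The options above can be put side by side in one short session. The following is a minimal sketch that reuses only the loading calls and index expressions shown in this section; the variable names on the left (chapter, verses, words, parts, tverse) are illustrative and not part of the package.

```julia
using QuranTree

# Load both datasets and convert them to their tabular form,
# using the same calls as the examples in this section.
data = QuranData()
crps, tnzl = load(data)
crpsdata = table(crps)
tnzldata = table(tnzl)

# Each additional bracket narrows the previous selection by one level.
chapter = crpsdata[2]             # Int64: Chapter 2
verses  = crpsdata[2][15:24]      # UnitRange: verses 15 to 24 of Chapter 2
words   = crpsdata[2][8][[1,3]]   # Array{Int64,1}: words 1 and 3 of Verse 8
parts   = crpsdata[2][9][1][1:2]  # UnitRange: parts 1 to 2 of Word 1 of Verse 9

# The Tanzil data stops at the verse level.
tverse = tnzldata[2][9]           # Verse 9 of Chapter 2
```

The three index types can be mixed freely across levels, so the choice at one level does not constrain the next.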

As an example, the following will extract Verse 9 of Chapter 2 in both TanzilData and CorpusData:

julia> using QuranTree
julia> data = QuranData();
julia> crps, tnzl = load(data);
julia> crpsdata = table(crps);
julia> tnzldata = table(tnzl);
julia> crpsdata[2][9]
Chapter 2 ٱلْبَقَرَة (The Cow)
Verse 9

18×5 DataFrame
 Row │ word   part   form       tag     features
     │ Int64  Int64  String     String  String
─────┼────────────────────────────────────────────────────────────────────
   1 │     1      1  yuxa`diEu  V       STEM|POS:V|IMPF|(III)|LEM:yuxa`d…
   2 │     1      2  wna        PRON    SUFFIX|PRON:3MP
   3 │     2      1  {ll~aha    PN      STEM|POS:PN|LEM:{ll~ah|ROOT:Alh|…
   4 │     3      1  wa         CONJ    PREFIX|w:CONJ+
   5 │     3      2  {l~a*iyna  REL     STEM|POS:REL|LEM:{l~a*iY|MP
   6 │     4      1  'aAmanu    V       STEM|POS:V|PERF|(IV)|LEM:'aAmana…
   7 │     4      2  wA@        PRON    SUFFIX|PRON:3MP
   8 │     5      1  wa         REM     PREFIX|w:REM+
  ⋮  │   ⋮      ⋮        ⋮        ⋮                     ⋮
  12 │     7      1  <il~aA^    RES     STEM|POS:RES|LEM:<il~aA
  13 │     8      1  >anfusa    N       STEM|POS:N|LEM:nafos|ROOT:nfs|FP…
  14 │     8      2  humo       PRON    SUFFIX|PRON:3MP
  15 │     9      1  wa         CIRC    PREFIX|w:CIRC+
  16 │     9      2  maA        NEG     STEM|POS:NEG|LEM:maA
  17 │    10      1  ya$oEuru   V       STEM|POS:V|IMPF|LEM:ya$oEuru|ROO…
  18 │    10      2  wna        PRON    SUFFIX|PRON:3MP
                                                            3 rows omitted
julia> tnzldata[2][9]
Chapter 2 ٱلْبَقَرَة (The Cow)
Verse 9

1×1 DataFrame
 Row │ form
     │ String
─────┼───────────────────────────────────
   1 │ يُخَٰدِعُونَ ٱللَّهَ وَٱلَّذِينَ ءَامَنُوا۟ وَمَا يَخ…

As shown above, the output of the indexing includes a label with the chapter name, in both Arabic and English.

Combinations of Indices

Combinations of these indices are also supported. For example, the following will extract verses 1 and 3 from each of Chapters 111 to 114:

julia> crpsdata[111:114][[1,3]]
Chapters 111-114: ٱلْمَسَد-ٱلنَّاس (Palm Fibre-People)
Verses 1, 3

41×7 DataFrame
 Row │ chapter  verse  word   part   form      tag     features                ⋯
     │ Int64    Int64  Int64  Int64  String    String  String                  ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │     111      1      1      1  tab~ato   V       STEM|POS:V|PERF|LEM:tab ⋯
   2 │     111      1      2      1  yadaA^    N       STEM|POS:N|LEM:yad|ROOT
   3 │     111      1      3      1  >abiY     N       STEM|POS:N|LEM:>abN|ROO
   4 │     111      1      4      1  lahabK    N       STEM|POS:N|LEM:lahab|RO
   5 │     111      1      5      1  wa        CONJ    PREFIX|w:CONJ+          ⋯
   6 │     111      1      5      2  tab~a     V       STEM|POS:V|PERF|LEM:tab
   7 │     111      3      1      1  sa        FUT     PREFIX|sa+
   8 │     111      3      1      2  yaSolaY`  V       STEM|POS:V|IMPF|LEM:yaS
  ⋮  │    ⋮       ⋮      ⋮      ⋮       ⋮        ⋮                     ⋮       ⋱
  35 │     114      1      3      1  bi        P       PREFIX|bi+              ⋯
  36 │     114      1      3      2  rab~i     N       STEM|POS:N|LEM:rab~|ROO
  37 │     114      1      4      1  {l        DET     PREFIX|Al+
  38 │     114      1      4      2  n~aAsi    N       STEM|POS:N|LEM:n~aAs|RO
  39 │     114      3      1      1  <ila`hi   N       STEM|POS:N|LEM:<ila`h|R ⋯
  40 │     114      3      2      1  {l        DET     PREFIX|Al+
  41 │     114      3      2      2  n~aAsi    N       STEM|POS:N|LEM:n~aAs|RO
                                                    1 column and 26 rows omitted
julia> tnzldata[111:114][[1,3]]
Chapters 111-114: ٱلْمَسَد-ٱلنَّاس (Palm Fibre-People)
Verses 1, 3

8×3 DataFrame
 Row │ chapter  verse  form
     │ Int64    Int64  String
─────┼───────────────────────────────────────────────────
   1 │     111      1  بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ تَبَّتْ يَدَآ أ…
   2 │     111      3  سَيَصْلَىٰ نَارًا ذَاتَ لَهَبٍ
   3 │     112      1  بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ هُوَ ٱلل…
   4 │     112      3  لَمْ يَلِدْ وَلَمْ يُولَدْ
   5 │     113      1  بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ أَعُوذُ ب…
   6 │     113      3  وَمِن شَرِّ غَاسِقٍ إِذَا وَقَبَ
   7 │     114      1  بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ أَعُوذُ ب…
   8 │     114      3  إِلَٰهِ ٱلنَّاسِ

Note

The special index end also works in combinations: crpsdata[111:114][[1,3]] is equivalent to crpsdata[end-3:end][[1,3]], and tnzldata[111:114][[1,3]] is equivalent to tnzldata[end-3:end][[1,3]].
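
To see why end-3:end lines up with 111:114, it helps to recall that Julia rewrites end inside brackets to lastindex of the indexed object. The sketch below uses a plain vector of chapter numbers as a stand-in (an assumption for illustration, not a QuranTree object); it only demonstrates the arithmetic, on the premise that the chapter axis has 114 entries.

```julia
# Stand-in for the chapter axis: a plain vector holding chapter numbers 1 to 114.
# This is an illustrative assumption, not a QuranTree object.
chapters = collect(1:114)

# Inside brackets, `end` is lowered to `lastindex(chapters)`, which is 114 here,
# so `end-3:end` becomes the range 111:114.
last_four = chapters[end-3:end]

# The explicit range selects the same elements.
explicit = chapters[111:114]

println(last_four == explicit)  # prints true
```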

As another example, the following will extract Part 1 of Words 1 to 3 from the above CorpusData output:

julia> crpsdata[111:114][[1,3]][1:3][1]
Chapters 111-114: ٱلْمَسَد-ٱلنَّاس (Palm Fibre-People)
Verses 1, 3

23×7 DataFrame
 Row │ chapter  verse  word   part   form     tag     features                 ⋯
     │ Int64    Int64  Int64  Int64  String   String  String                   ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │     111      1      1      1  tab~ato  V       STEM|POS:V|PERF|LEM:tab~ ⋯
   2 │     111      1      2      1  yadaA^   N       STEM|POS:N|LEM:yad|ROOT:
   3 │     111      1      3      1  >abiY    N       STEM|POS:N|LEM:>abN|ROOT
   4 │     111      3      1      1  sa       FUT     PREFIX|sa+
   5 │     111      3      2      1  naArFA   N       STEM|POS:N|LEM:naAr|ROOT ⋯
   6 │     111      3      3      1  *aAta    N       STEM|POS:N|LEM:*uw|FS|AC
   7 │     112      1      1      1  qulo     V       STEM|POS:V|IMPV|LEM:qaAl
   8 │     112      1      2      1  huwa     PRON    STEM|POS:PRON|3MS
  ⋮  │    ⋮       ⋮      ⋮      ⋮       ⋮       ⋮                     ⋮        ⋱
  17 │     113      3      2      1  $ar~i    N       STEM|POS:N|LEM:$ar~|ROOT ⋯
  18 │     113      3      3      1  gaAsiqK  N       STEM|POS:N|ACT|PCPL|LEM:
  19 │     114      1      1      1  qulo     V       STEM|POS:V|IMPV|LEM:qaAl
  20 │     114      1      2      1  >aEuw*u  V       STEM|POS:V|IMPF|LEM:Eu*o
  21 │     114      1      3      1  bi       P       PREFIX|bi+               ⋯
  22 │     114      3      1      1  <ila`hi  N       STEM|POS:N|LEM:<ila`h|RO
  23 │     114      3      2      1  {l       DET     PREFIX|Al+
                                                     1 column and 8 rows omitted