Indexing the Corpus
QuranTree.jl offers intuitive indexing for both the Quranic Arabic Corpus and the Tanzil data, following this pattern:
# for Quranic Arabic Corpus
crpsdata[<chapters>][<verses>][<words>][<parts>]
# for Tanzil Data
tnzldata[<chapters>][<verses>]
The following options are supported for each index:
- Chapters:
  - Int64: crpsdata[1] (extracts Chapter 1)
  - UnitRange: crpsdata[15:24] (extracts Chapters 15 to 24)
  - Array{Int64,1}: crpsdata[[3,9,10]] (extracts Chapters 3, 9 and 10)
  - end (special): crpsdata[end-3:end] (extracts Chapters 111 to 114)
- Verses:
  - Int64: crpsdata[1][1] (extracts Verse 1 of Chapter 1)
  - UnitRange: crpsdata[2][15:24] (extracts Verses 15 to 24 of Chapter 2)
  - Array{Int64,1}: crpsdata[10][[3,9,10]] (extracts Verses 3, 9 and 10 of Chapter 10)
- Words (CorpusData only; not applicable to TanzilData):
  - Int64: crpsdata[1][1][1] (extracts Word 1 of Verse 1 of Chapter 1)
  - UnitRange: crpsdata[2][8][1:3] (extracts Words 1 to 3 of Verse 8 of Chapter 2)
  - Array{Int64,1}: crpsdata[2][8][[1,3]] (extracts Words 1 and 3 of Verse 8 of Chapter 2)
- Parts (CorpusData only; not applicable to TanzilData):
  - Int64: crpsdata[1][1][1][1] (extracts Part 1 of Word 1 of Verse 1 of Chapter 1)
  - UnitRange: crpsdata[2][9][1][1:2] (extracts Parts 1 to 2 of Word 1 of Verse 9 of Chapter 2)
  - Array{Int64,1}: crpsdata[2][9][1][[1,2]] (extracts Parts 1 and 2 of Word 1 of Verse 9 of Chapter 2)
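The nesting rules above can be pictured with a small, self-contained model in plain Julia. This is a hypothetical sketch, not QuranTree.jl's implementation: ToySelection, as_keys and the rows below are made up for illustration (they are not real corpus entries), and treating each pair of brackets as a filter on the next column is an assumption inferred from the examples in this section. Each row stands for one (chapter, verse, word, part) entry; a TanzilData-style table would simply stop after the second level.

```julia
# Hypothetical toy model, NOT QuranTree.jl's implementation: a selection is a
# list of (chapter, verse, word, part) rows plus the column that the next
# pair of brackets will filter on.
struct ToySelection
    rows::Vector{NTuple{4,Int}}   # (chapter, verse, word, part)
    level::Int                    # 1 = chapter, 2 = verse, 3 = word, 4 = part
end

ToySelection(rows::Vector{NTuple{4,Int}}) = ToySelection(rows, 1)

# Normalise the documented index forms (Int64, UnitRange, Array{Int64,1})
# into a collection that supports `in`.
as_keys(i::Integer) = (i,)
as_keys(v::AbstractVector{<:Integer}) = v   # covers both UnitRange and Vector

# Keep only the rows whose current-level value was requested, then move
# on to the next level for the following pair of brackets.
function Base.getindex(s::ToySelection, idx)
    s.level > 4 && throw(ArgumentError("chapter, verse, word and part are already indexed"))
    wanted = as_keys(idx)
    kept = filter(row -> row[s.level] in wanted, s.rows)
    return ToySelection(kept, s.level + 1)
end

# Made-up rows for illustration only (not real corpus data).
rows = NTuple{4,Int}[
    (1, 1, 1, 1), (1, 1, 2, 1),
    (2, 8, 1, 1), (2, 8, 2, 1), (2, 8, 3, 1),
    (2, 9, 1, 1), (2, 9, 1, 2), (2, 9, 2, 1),
    (3, 1, 1, 1),
]

t = ToySelection(rows)
t[2][8][[1, 3]].rows   # [(2, 8, 1, 1), (2, 8, 3, 1)]
t[2][9][1][1:2].rows   # [(2, 9, 1, 1), (2, 9, 1, 2)]
t[[1, 3]][1].rows      # [(1, 1, 1, 1), (1, 1, 2, 1), (3, 1, 1, 1)]
```

Because each step returns a narrower selection, the indices compose left to right in the same order as crpsdata[<chapters>][<verses>][<words>][<parts>], and selecting several chapters first and then a verse number keeps that verse from every selected chapter.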
As an example, the following will extract Verse 9 of Chapter 2 from both TanzilData and CorpusData:
julia> using QuranTree
julia> data = QuranData();
julia> crps, tnzl = load(data);
julia> crpsdata = table(crps);
julia> tnzldata = table(tnzl);
julia> crpsdata[2][9]
Chapter 2 ٱلْبَقَرَة (The Cow)
Verse 9

18×5 DataFrame
 Row │ word   part   form       tag     features
     │ Int64  Int64  String     String  String
─────┼─────────────────────────────────────────────────────────────────────
   1 │     1      1  yuxa`diEu  V       STEM|POS:V|IMPF|(III)|LEM:yuxa`d…
   2 │     1      2  wna        PRON    SUFFIX|PRON:3MP
   3 │     2      1  {ll~aha    PN      STEM|POS:PN|LEM:{ll~ah|ROOT:Alh|…
   4 │     3      1  wa         CONJ    PREFIX|w:CONJ+
   5 │     3      2  {l~a*iyna  REL     STEM|POS:REL|LEM:{l~a*iY|MP
   6 │     4      1  'aAmanu    V       STEM|POS:V|PERF|(IV)|LEM:'aAmana…
   7 │     4      2  wA@        PRON    SUFFIX|PRON:3MP
   8 │     5      1  wa         REM     PREFIX|w:REM+
  ⋮  │   ⋮      ⋮        ⋮         ⋮                    ⋮
  12 │     7      1  <il~aA^    RES     STEM|POS:RES|LEM:<il~aA
  13 │     8      1  >anfusa    N       STEM|POS:N|LEM:nafos|ROOT:nfs|FP…
  14 │     8      2  humo       PRON    SUFFIX|PRON:3MP
  15 │     9      1  wa         CIRC    PREFIX|w:CIRC+
  16 │     9      2  maA        NEG     STEM|POS:NEG|LEM:maA
  17 │    10      1  ya$oEuru   V       STEM|POS:V|IMPF|LEM:ya$oEuru|ROO…
  18 │    10      2  wna        PRON    SUFFIX|PRON:3MP
                                                           3 rows omitted
julia> tnzldata[2][9]
Chapter 2 ٱلْبَقَرَة (The Cow)
Verse 9

1×1 DataFrame
 Row │ form
     │ String
─────┼───────────────────────────────────
   1 │ يُخَٰدِعُونَ ٱللَّهَ وَٱلَّذِينَ ءَامَنُوا۟ وَمَا يَخْ…
As shown above, the output is labelled with the chapter name in both Arabic and English, along with the selected verse number.
Combinations of Indices
Combinations of these indices are also supported. For example, the following extracts Chapters 111 to 114, keeping only verses 1 and 3 of each chapter:
julia> crpsdata[111:114][[1,3]]
Chapters 111-114: ٱلْمَسَد-ٱلنَّاس (Palm Fibre-People)
Verses 1, 3

41×7 DataFrame
 Row │ chapter  verse  word   part   form      tag     features                 ⋯
     │ Int64    Int64  Int64  Int64  String    String  String                   ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │     111      1      1      1  tab~ato   V       STEM|POS:V|PERF|LEM:tab  ⋯
   2 │     111      1      2      1  yadaA^    N       STEM|POS:N|LEM:yad|ROOT
   3 │     111      1      3      1  >abiY     N       STEM|POS:N|LEM:>abN|ROO
   4 │     111      1      4      1  lahabK    N       STEM|POS:N|LEM:lahab|RO
   5 │     111      1      5      1  wa        CONJ    PREFIX|w:CONJ+           ⋯
   6 │     111      1      5      2  tab~a     V       STEM|POS:V|PERF|LEM:tab
   7 │     111      3      1      1  sa        FUT     PREFIX|sa+
   8 │     111      3      1      2  yaSolaY`  V       STEM|POS:V|IMPF|LEM:yaS
  ⋮  │    ⋮       ⋮      ⋮      ⋮       ⋮        ⋮               ⋮              ⋱
  35 │     114      1      3      1  bi        P       PREFIX|bi+               ⋯
  36 │     114      1      3      2  rab~i     N       STEM|POS:N|LEM:rab~|ROO
  37 │     114      1      4      1  {l        DET     PREFIX|Al+
  38 │     114      1      4      2  n~aAsi    N       STEM|POS:N|LEM:n~aAs|RO
  39 │     114      3      1      1  <ila`hi   N       STEM|POS:N|LEM:<ila`h|R  ⋯
  40 │     114      3      2      1  {l        DET     PREFIX|Al+
  41 │     114      3      2      2  n~aAsi    N       STEM|POS:N|LEM:n~aAs|RO
                                                   1 column and 26 rows omitted
julia> tnzldata[111:114][[1,3]]
Chapters 111-114: ٱلْمَسَد-ٱلنَّاس (Palm Fibre-People)
Verses 1, 3

8×3 DataFrame
 Row │ chapter  verse  form
     │ Int64    Int64  String
─────┼───────────────────────────────────────────────────
   1 │     111      1  بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ تَبَّتْ يَدَآ أَ…
   2 │     111      3  سَيَصْلَىٰ نَارًا ذَاتَ لَهَبٍ
   3 │     112      1  بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ هُوَ ٱللَّ…
   4 │     112      3  لَمْ يَلِدْ وَلَمْ يُولَدْ
   5 │     113      1  بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ أَعُوذُ بِ…
   6 │     113      3  وَمِن شَرِّ غَاسِقٍ إِذَا وَقَبَ
   7 │     114      1  بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ أَعُوذُ بِ…
   8 │     114      3  إِلَٰهِ ٱلنَّاسِ
The special index end is also applicable here: for example, crpsdata[111:114][[1,3]] is the same as crpsdata[end-3:end][[1,3]], and tnzldata[111:114][[1,3]] is equivalent to tnzldata[end-3:end][[1,3]].
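The end form relies on ordinary Julia syntax rather than on anything specific to this package: inside square brackets, x[end-3:end] is lowered to x[lastindex(x)-3:lastindex(x)]. The equivalences above therefore suggest that lastindex of the top-level tables evaluates to 114, the number of chapters. The sketch below isolates that mechanism with a hypothetical ToyChapters type, which is not part of QuranTree.jl.

```julia
# Hypothetical container, NOT part of QuranTree.jl, used only to show how
# Julia rewrites `end` inside square brackets.
struct ToyChapters
    n::Int   # number of chapters (114 in the Quran)
end

# Julia calls this to resolve `end` in expressions such as c[end-3:end].
Base.lastindex(c::ToyChapters) = c.n

# Return the selected chapter numbers so the effect of `end` is visible.
function Base.getindex(c::ToyChapters, idx::AbstractVector{<:Integer})
    all(i -> 1 <= i <= c.n, idx) || throw(BoundsError(c, idx))
    return collect(idx)
end

chapters = ToyChapters(114)
chapters[end-3:end]   # lowered to chapters[lastindex(chapters)-3:lastindex(chapters)]
# returns [111, 112, 113, 114], the same as chapters[111:114]
```

Any type that defines Base.lastindex gets this behaviour for free, which is why end can stand in for the last chapter number without a dedicated index form.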
As another example, the following extracts Part 1 of Words 1 to 3 from the CorpusData output above:
julia> crpsdata[111:114][[1,3]][1:3][1]
Chapters 111-114: ٱلْمَسَد-ٱلنَّاس (Palm Fibre-People)
Verses 1, 3

23×7 DataFrame
 Row │ chapter  verse  word   part   form      tag     features                  ⋯
     │ Int64    Int64  Int64  Int64  String    String  String                    ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │     111      1      1      1  tab~ato   V       STEM|POS:V|PERF|LEM:tab~  ⋯
   2 │     111      1      2      1  yadaA^    N       STEM|POS:N|LEM:yad|ROOT:
   3 │     111      1      3      1  >abiY     N       STEM|POS:N|LEM:>abN|ROOT
   4 │     111      3      1      1  sa        FUT     PREFIX|sa+
   5 │     111      3      2      1  naArFA    N       STEM|POS:N|LEM:naAr|ROOT  ⋯
   6 │     111      3      3      1  *aAta     N       STEM|POS:N|LEM:*uw|FS|AC
   7 │     112      1      1      1  qulo      V       STEM|POS:V|IMPV|LEM:qaAl
   8 │     112      1      2      1  huwa      PRON    STEM|POS:PRON|3MS
  ⋮  │    ⋮       ⋮      ⋮      ⋮       ⋮        ⋮               ⋮               ⋱
  17 │     113      3      2      1  $ar~i     N       STEM|POS:N|LEM:$ar~|ROOT  ⋯
  18 │     113      3      3      1  gaAsiqK   N       STEM|POS:N|ACT|PCPL|LEM:
  19 │     114      1      1      1  qulo      V       STEM|POS:V|IMPV|LEM:qaAl
  20 │     114      1      2      1  >aEuw*u   V       STEM|POS:V|IMPF|LEM:Eu*o
  21 │     114      1      3      1  bi        P       PREFIX|bi+                ⋯
  22 │     114      3      1      1  <ila`hi   N       STEM|POS:N|LEM:<ila`h|RO
  23 │     114      3      2      1  {l        DET     PREFIX|Al+
                                                    1 column and 8 rows omitted