Transliteration

For transliteration, we will use Yunir.jl, a lightweight Arabic NLP toolkit. By default, Yunir.jl uses Buckwalter transliteration, following the encoding of the Quranic Arabic Corpus. Transliteration is done with the encode function; for example, the following transliterates the first verse of Chapter 1:

julia> using QuranTree
julia> using Yunir
julia> crps, tnzl = load(QuranData());
julia> crpsdata = table(crps);
julia> tnzldata = table(tnzl);
julia> vrs = verses(tnzldata[1][1])
1-element Vector{String}:
 "بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ"
julia> encode(vrs[1])
"bisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi"
Note

You need to install Yunir.jl to run the above code. To install it, run:

using Pkg
Pkg.add("Yunir")

The verses function used above extracts the verses from Qur'an data of type AbstractQuran.
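To make the idea behind encode concrete, here is a minimal sketch of what a character-level transliteration such as Buckwalter does: each Arabic letter or diacritic is replaced by a single ASCII character. The table TOY_BW and the function toy_encode are hypothetical, cover only the five characters of بِسْمِ, and are not Yunir.jl's implementation; the codes b, s, m, i, o are taken from the encode output above.

```julia
# Hypothetical subset of a Buckwalter-style table (not Yunir.jl's code).
const TOY_BW = Dict{Char,Char}(
    '\u0628' => 'b',  # U+0628 Ba
    '\u0633' => 's',  # U+0633 Seen
    '\u0645' => 'm',  # U+0645 Meem
    '\u0650' => 'i',  # U+0650 Kasra
    '\u0652' => 'o',  # U+0652 Sukun
)

# Replace each character by its ASCII counterpart; characters missing
# from the table are passed through unchanged.
toy_encode(s::AbstractString) = map(c -> get(TOY_BW, c, c), s)

bismi = "\u0628\u0650\u0633\u0652\u0645\u0650"  # بِسْمِ
println(toy_encode(bismi))  # prints bisomi
```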

Tips

By default, verses returns only the verse form (the text) of the table, but it can also return the corresponding verse numbers instead, for example:

verses(tnzldata, number=true, start_end=true)
verses(tnzldata, number=true, start_end=false)
Tips

To extract the words of the corpus, use the function words instead.

Since verses always returns an Array, multiple verses can be encoded at once with Julia's . (dot) broadcasting. For example, the following transliterates all verses of Chapter 114:

julia> vrs = verses(tnzldata[114])
6-element Vector{String}:
 "بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ أَعُوذُ بِرَبِّ ٱلنَّاسِ"
 "مَلِكِ ٱلنَّاسِ"
 "إِلَٰهِ ٱلنَّاسِ"
 "مِن شَرِّ ٱلْوَسْوَاسِ ٱلْخَنَّاسِ"
 "ٱلَّذِى يُوَسْوِسُ فِى صُدُورِ ٱلنَّاسِ"
 "مِنَ ٱلْجِنَّةِ وَٱلنَّاسِ"
julia> encode.(vrs)
6-element Vector{String}:
 "bisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi qulo >aEuw*u birab~i {ln~aAsi"
 "maliki {ln~aAsi"
 "<ila`hi {ln~aAsi"
 "min \$ar~i {lowasowaAsi {loxan~aAsi"
 "{l~a*iY yuwasowisu fiY Suduwri {ln~aAsi"
 "mina {lojin~api wa{ln~aAsi"
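The dot in encode.(vrs) is ordinary Julia broadcasting: the function is applied to every element and the results are collected into a new array of the same shape. A sketch with a hypothetical two-letter toy encoder (toy_table and toy_enc are illustrative, not part of Yunir.jl):

```julia
# Hypothetical toy table: only Meem and Noon are mapped.
toy_table = Dict('\u0645' => 'm', '\u0646' => 'n')   # م => m, ن => n
toy_enc(s::AbstractString) = map(c -> get(toy_table, c, c), s)

vrs_toy = ["\u0645\u0646", "\u0646\u0645"]           # "من", "نم"

# toy_enc.(vrs_toy) calls toy_enc on each element and collects the
# results into a new Vector{String} of the same length.
encoded = toy_enc.(vrs_toy)
println(encoded)  # prints ["mn", "nm"]
```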

Decoding

To decode transliterated text back to Arabic script, Yunir.jl provides the arabic function. For example, the following decodes the transliterated verses of Chapter 114 above back to Arabic:

julia> arabic.(encode.(vrs))
6-element Vector{String}:
 "بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ أَعُوذُ بِرَبِّ ٱلنَّاسِ"
 "مَلِكِ ٱلنَّاسِ"
 "إِلَٰهِ ٱلنَّاسِ"
 "مِن شَرِّ ٱلْوَسْوَاسِ ٱلْخَنَّاسِ"
 "ٱلَّذِى يُوَسْوِسُ فِى صُدُورِ ٱلنَّاسِ"
 "مِنَ ٱلْجِنَّةِ وَٱلنَّاسِ"

Alternatively, using the corpus data (crpsdata), whose verses are already stored in Buckwalter form:

julia> vrs = verses(crpsdata[114])
6-element Vector{String}:
 "qulo >aEuw*u birab~i {ln~aAsi"
 "maliki {ln~aAsi"
 "<ila`hi {ln~aAsi"
 "min \$ar~i {lowasowaAsi {loxan~aAsi"
 "{l~a*iY yuwasowisu fiY Suduwri {ln~aAsi"
 "mina {lojin~api wa{ln~aAsi"
julia> avrs = arabic.(vrs)
6-element Vector{String}:
 "قُلْ أَعُوذُ بِرَبِّ ٱلنَّاسِ"
 "مَلِكِ ٱلنَّاسِ"
 "إِلَٰهِ ٱلنَّاسِ"
 "مِن شَرِّ ٱلْوَسْوَاسِ ٱلْخَنَّاسِ"
 "ٱلَّذِى يُوَسْوِسُ فِى صُدُورِ ٱلنَّاسِ"
 "مِنَ ٱلْجِنَّةِ وَٱلنَّاسِ"
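Decoding can be pictured as the same lookup run in reverse. The sketch below assumes the table is one-to-one (each Arabic character has its own code, as Buckwalter is designed to be), so it can be inverted; fwd, inv_tbl, enc, and dec are hypothetical names covering only a few characters, not Yunir.jl's arabic.

```julia
# Hypothetical one-to-one subset of a Buckwalter-style table.
fwd = Dict{Char,Char}('\u0628' => 'b', '\u0633' => 's', '\u0645' => 'm',
                      '\u0650' => 'i', '\u0652' => 'o')

# Decoding uses the inverted table (value => key).
inv_tbl = Dict(v => k for (k, v) in fwd)

enc(s) = map(c -> get(fwd, c, c), s)      # Arabic -> ASCII
dec(s) = map(c -> get(inv_tbl, c, c), s)  # ASCII -> Arabic

word = "\u0628\u0650\u0633\u0652\u0645\u0650"  # بِسْمِ
println(dec(enc(word)) == word)  # prints true
```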
Tips

The . (dot) broadcasting is only needed for arrays. For a single String input (rather than an array of Strings), call arabic(...) without the dot, for example:

arabic(vrs[1])
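The difference in return types can be checked with a hypothetical toy decoder (toy_dec is illustrative, not Yunir.jl's arabic): called directly on a String it returns a String, and broadcast over a Vector{String} it returns a Vector{String}.

```julia
# Hypothetical toy decoder: maps the code 'b' back to Ba (U+0628).
toy_dec(s::AbstractString) = map(c -> c == 'b' ? '\u0628' : c, s)

single  = toy_dec("b")          # one String: call without the dot
several = toy_dec.(["b", "bb"]) # Vector{String}: broadcast with the dot

println(typeof(single))
println(typeof(several))
```

Julia treats a String as a single value, not a collection of characters, when broadcasting, so a dotted call on one String would also return a String; calling the function without the dot is simply the direct form.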

Custom Transliteration

Creating a custom transliteration requires only an input encoding in the form of a dictionary (Dict). For example, Yunir.jl's Buckwalter encoding is provided by the constant BW_ENCODING, shown below:

julia> BW_ENCODING
Dict{Symbol, Symbol} with 61 entries:
  Symbol("ۣ") => Symbol(";")
  :ة          => :p
  :ذ          => :*
  :ۥ          => Symbol(",")
  :ء          => Symbol("'")
  Symbol("ۜ") => :(:)
  Symbol("َ") => :a
  Symbol("ٰ") => Symbol("`")
  :ي          => :y
  :ت          => :t
  :ن          => :n
  :ب          => :b
  :ص          => :S
  :ا          => :A
  :ث          => :v
  :إ          => :<
  :ج          => :j
  :ى          => :Y
  Symbol("ٍ") => :K
  ⋮           => ⋮
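Because BW_ENCODING maps Symbol to Symbol, each key and value holds the text of a single character. The sketch below assumes a hypothetical three-entry table of the same shape (the entries ب => b, ت => t, and fatha => a appear in the listing above) and shows one way such a table could be turned into a per-character lookup; toy_sym, char_map, and tr_sym are illustrative names, and this is not how Yunir.jl applies the table internally.

```julia
# Hypothetical three-entry table with the same Symbol => Symbol shape.
toy_sym = Dict{Symbol,Symbol}(Symbol("\u0628") => :b,   # Ba    => b
                              Symbol("\u062A") => :t,   # Ta    => t
                              Symbol("\u064E") => :a)   # Fatha => a

# String(::Symbol) recovers the underlying text; `only` extracts the single
# Char, giving a Char => Char lookup table.
char_map = Dict(only(String(k)) => only(String(v)) for (k, v) in toy_sym)

tr_sym(s) = map(c -> get(char_map, c, c), s)
println(tr_sym("\u0628\u064E\u062A"))  # بَت, prints bat
```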

Suppose we want to create a new transliteration by simply reversing the order of the dictionary's values. This is done as follows:

julia> old_keys = collect(keys(BW_ENCODING));
julia> new_vals = reverse(collect(values(BW_ENCODING)));
julia> my_encoder = Dict(old_keys .=> new_vals)
Dict{Symbol, Symbol} with 61 entries:
  Symbol("ۣ") => :q
  :ة          => Symbol("(")
  :ذ          => :l
  :ۥ          => :u
  :ء          => :D
  Symbol("ۜ") => :f
  Symbol("َ") => Symbol("[")
  Symbol("ٰ") => :z
  :ي          => Symbol("#")
  :ت          => :r
  :ن          => :k
  :ب          => :Z
  :ص          => Symbol("]")
  :ا          => Symbol("\"")
  :ث          => :~
  :إ          => :i
  :ج          => :m
  :ى          => :_
  Symbol("ٍ") => :h
  ⋮           => ⋮
julia> @transliterator my_encoder "MyEncoder"

The macro @transliterator updates the active transliteration and takes two inputs: the dictionary (my_encoder) and the name of the encoding ("MyEncoder"). With this new encoding, the avrs above get a new transliteration:

julia> new_vrs = encode.(avrs);
julia> new_vrs
6-element Vector{String}:
 ";,*g ^[},-l, Z<t[Zv< H*kv[\"s<"
 "j[*<n< H*kv[\"s<"
 "i<*[zK< H*kv[\"s<"
 "j<k %[tv< H*g-[sg-[\"s< H*g+[kv[\"s<"
 "H*v[l<_ #,-[sg-<s, :<_ ],!,-t< H*kv[\"s<"
 "j<k[ H*gm<kv[(< -[H*kv[\"s<"
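Reversing the values only permutes which code goes with which character: no code is lost or duplicated, so the new table can still be inverted for decoding. A sketch on a hypothetical four-entry table using Base Julia only (it mirrors the steps above but is not Yunir.jl code); it relies on keys and values of an unmodified Dict iterating in the same order, and to_chars, fwd, and back are illustrative names.

```julia
# Hypothetical four-entry table with the same Symbol => Symbol shape.
tbl = Dict{Symbol,Symbol}(Symbol("\u0628") => :b, Symbol("\u062A") => :t,
                          Symbol("\u0633") => :s, Symbol("\u0645") => :m)

# keys and values of an unmodified Dict iterate in the same order, so
# reversing the values just reassigns the existing codes.
new_tbl = Dict(collect(keys(tbl)) .=> reverse(collect(values(tbl))))

# Convert to Char => Char lookups for encoding and decoding.
to_chars(d) = Dict(only(String(k)) => only(String(v)) for (k, v) in d)
fwd  = to_chars(new_tbl)
back = Dict(v => k for (k, v) in fwd)

# Every code is still used exactly once, so decoding is unambiguous.
@show allunique(values(fwd))

# Encoding followed by decoding returns the original text.
text  = "\u0628\u062A\u0633\u0645"   # بتسم
coded = map(c -> fwd[c], text)
@show map(c -> back[c], coded) == text
```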

To confirm the new transliteration, decoding it back to Arabic should recover the original text:

julia> arabic.(new_vrs)
6-element Vector{String}:
 "قُلْ أَعُوذُ بِرَبِّ ٱلنَّاسِ"
 "مَلِكِ ٱلنَّاسِ"
 "إِلَٰهِ ٱلنَّاسِ"
 "مِن شَرِّ ٱلْوَسْوَاسِ ٱلْخَنَّاسِ"
 "ٱلَّذِى يُوَسْوِسُ فِى صُدُورِ ٱلنَّاسِ"
 "مِنَ ٱلْجِنَّةِ وَٱلنَّاسِ"

To reset the transliteration, simply run the following:

julia> @transliterator :default

This falls back to the Buckwalter transliteration, as shown below:

julia> bw_vrs = encode.(avrs);
julia> bw_vrs
6-element Vector{String}:
 "qulo >aEuw*u birab~i {ln~aAsi"
 "maliki {ln~aAsi"
 "<ila`hi {ln~aAsi"
 "min \$ar~i {lowasowaAsi {loxan~aAsi"
 "{l~a*iY yuwasowisu fiY Suduwri {ln~aAsi"
 "mina {lojin~api wa{ln~aAsi"
julia> arabic.(bw_vrs)
6-element Vector{String}:
 "قُلْ أَعُوذُ بِرَبِّ ٱلنَّاسِ"
 "مَلِكِ ٱلنَّاسِ"
 "إِلَٰهِ ٱلنَّاسِ"
 "مِن شَرِّ ٱلْوَسْوَاسِ ٱلْخَنَّاسِ"
 "ٱلَّذِى يُوَسْوِسُ فِى صُدُورِ ٱلنَّاسِ"
 "مِنَ ٱلْجِنَّةِ وَٱلنَّاسِ"
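Conceptually, switching and resetting a transliteration amounts to swapping which table is active while keeping the default around. The sketch below is hypothetical: DEFAULT_TABLE, CURRENT_TABLE, set_table!, reset_table!, and enc_current are illustrative names, not Yunir.jl internals, and the real @transliterator is a macro rather than a function.

```julia
# Hypothetical stand-in for the default (Buckwalter) table.
const DEFAULT_TABLE = Dict('\u0628' => 'b')
# The currently active table, stored in a mutable Ref.
const CURRENT_TABLE = Ref(DEFAULT_TABLE)

set_table!(t::Dict{Char,Char}) = (CURRENT_TABLE[] = t; nothing)
reset_table!() = (CURRENT_TABLE[] = DEFAULT_TABLE; nothing)
enc_current(s) = map(c -> get(CURRENT_TABLE[], c, c), s)

set_table!(Dict('\u0628' => 'Z'))   # switch to a custom table
println(enc_current("\u0628"))      # prints Z
reset_table!()                      # fall back to the default
println(enc_current("\u0628"))      # prints b
```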

Simple Encoding

Another feature supported in QuranTree.jl is Simple Encoding, which describes each letter and its diacritics by name. For example, the following Simple-encodes the first verse of Chapter 1, and then the first five verses using broadcasting:

julia> vrs = verses(tnzldata[1][1:5])
5-element Vector{String}:
 "بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ"
 "ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ"
 "ٱلرَّحْمَٰنِ ٱلرَّحِيمِ"
 "مَٰلِكِ يَوْمِ ٱلدِّينِ"
 "إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ"
julia> parse(SimpleEncoding, vrs[1])
"Ba+Kasra | Seen+Sukun | Meem+Kasra | <space> | AlifHamzatWasl | Lam | Lam+Shadda+Fatha | Ha+Kasra | <space> | AlifHamzatWasl | Lam | Ra+Shadda+Fatha | HHa+Sukun | Meem+Fatha+AlifKhanjareeya | Noon+Kasra | <space> | AlifHamzatWasl | Lam | Ra+Shadda+Fatha | HHa+Kasra | Ya | Meem+Kasra"
julia> parse.(SimpleEncoding, vrs)
5-element Vector{String}:
 "Ba+Kasra | Seen+Sukun | Meem+Kasra | <space> | AlifHamzatWasl | Lam | Lam+Shadda+Fatha | Ha+Kasra | <space> | AlifHamzatWasl | Lam | Ra+Shadda+Fatha | HHa+Sukun | Meem+Fatha+AlifKhanjareeya | Noon+Kasra | <space> | AlifHamzatWasl | Lam | Ra+Shadda+Fatha | HHa+Kasra | Ya | Meem+Kasra"
 "AlifHamzatWasl | Lam+Sukun | HHa+Fatha | Meem+Sukun | Dal+Damma | <space> | Lam+Kasra | Lam+Shadda+Fatha | Ha+Kasra | <space> | Ra+Fatha | Ba+Shadda+Kasra | <space> | AlifHamzatWasl | Lam+Sukun | Ain+Fatha+AlifKhanjareeya | Lam+Fatha | Meem+Kasra | Ya | Noon+Fatha"
 "AlifHamzatWasl | Lam | Ra+Shadda+Fatha | HHa+Sukun | Meem+Fatha+AlifKhanjareeya | Noon+Kasra | <space> | AlifHamzatWasl | Lam | Ra+Shadda+Fatha | HHa+Kasra | Ya | Meem+Kasra"
 "Meem+Fatha+AlifKhanjareeya | Lam+Kasra | Kaf+Kasra | <space> | Ya+Fatha | Waw+Sukun | Meem+Kasra | <space> | AlifHamzatWasl | Lam | Dal+Shadda+Kasra | Ya | Noon+Kasra"
 "AlifHamzaBelow+Kasra | Ya+Shadda+Fatha | Alif | Kaf+Fatha | <space> | Noon+Fatha | Ain+Sukun | Ba+Damma | Dal+Damma | <space> | Waw+Fatha | AlifHamzaBelow+Kasra | Ya+Shadda+Fatha | Alif | Kaf+Fatha | <space> | Noon+Fatha | Seen+Sukun | Ta+Fatha | Ain+Kasra | Ya | Noon+Damma"
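The output above pairs each base letter with the diacritics that follow it and separates the clusters with |. A minimal sketch of that grouping, assuming a hypothetical name table covering only a few characters (LETTERS, MARKS, and simple_encode are illustrative, not QuranTree.jl's parse(SimpleEncoding, ...)); it reproduces the first three clusters of the first verse:

```julia
# Hypothetical name tables for a handful of letters and diacritics.
const LETTERS = Dict('\u0628' => "Ba", '\u0633' => "Seen", '\u0645' => "Meem",
                     ' ' => "<space>")
const MARKS = Dict('\u0650' => "Kasra", '\u0652' => "Sukun",
                   '\u064E' => "Fatha", '\u0651' => "Shadda")

function simple_encode(s::AbstractString)
    parts = String[]
    for c in s
        if haskey(MARKS, c) && !isempty(parts)
            # A diacritic is attached to the cluster of the preceding letter.
            parts[end] *= "+" * MARKS[c]
        else
            # Any other character starts a new cluster.
            push!(parts, get(LETTERS, c, string(c)))
        end
    end
    return join(parts, " | ")
end

println(simple_encode("\u0628\u0650\u0633\u0652\u0645\u0650"))
# prints Ba+Kasra | Seen+Sukun | Meem+Kasra
```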