Part-of-Speech Tagger
In this section, we use CAMeL Tools for part-of-speech (POS) tagging. To start, load the data as follows:
julia> using QuranTree
julia> crps, tnzl = load(QuranData());
julia> crpsdata = table(crps);
julia> tnzldata = table(tnzl);
For this task, we are going to use the second verse of Chapter 1.
julia> avrs1 = verses(tnzldata[1][2])[1]
"ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ"
Then we load the pretrained model, calling CAMeL Tools from Julia through PyCall:
julia> using Pkg
julia> Pkg.add("PyCall")
Resolving package versions...
No Changes to `~/work/QuranTree.jl/QuranTree.jl/docs/Project.toml`
No Changes to `~/work/QuranTree.jl/QuranTree.jl/docs/Manifest.toml`
julia> using PyCall
julia> @pyimport camel_tools.disambig.mle as camel_disambig
julia> @pyimport camel_tools.tagger.default as camel_tagger
julia> mled = camel_disambig.MLEDisambiguator.pretrained()
PyObject <camel_tools.disambig.mle.MLEDisambiguator object at 0x7fc56037be20>
Tagging
From the model, we instantiate the DefaultTagger and finally call its tag method to tag each token of the verse:
julia> tagger = camel_tagger.DefaultTagger(mled, "pos")
PyObject <camel_tools.tagger.default.DefaultTagger object at 0x7fc562e307c0>
julia> tagger.tag(split(avrs1))
4-element Array{String,1}:
"noun"
"noun_prop"
"noun"
"noun"
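To see which tag belongs to which word, the tokens and the tags can be paired up. The following is a minimal sketch in plain Julia. It assumes the tags are copied by hand from the output above rather than obtained from the tagger, so it runs without PyCall or CAMeL Tools. The names verse, tokens, tags and tagged are illustrative only and are not part of QuranTree or CAMeL Tools.

```julia
# Verse text copied from the example above (Chapter 1, verse 2).
verse = "ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ"

# split with no delimiter breaks on whitespace, as in tagger.tag(split(avrs1)).
tokens = split(verse)

# Assumption: tags are hard-coded from the output shown above, not computed here.
tags = ["noun", "noun_prop", "noun", "noun"]

# Pair each token with its tag, in order.
tagged = collect(zip(tokens, tags))

for (token, tag) in tagged
    println(token, " => ", tag)
end
```

In a live session, the hard-coded tags vector could presumably be replaced by the result of tagger.tag(split(avrs1)), since that call returns one tag per token in the same order.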