Part-of-Speech Tagger

In this section, we use CAMeL Tools for part-of-speech (POS) tagging. To start, load the data as follows:

julia> using QuranTree

julia> crps, tnzl = load(QuranData());

julia> crpsdata = table(crps);

julia> tnzldata = table(tnzl);

For this task, we are going to use the second verse of Chapter 1.

julia> avrs1 = verses(tnzldata[1][2])[1]
"ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ"

Next, install PyCall and load the pretrained CAMeL Tools disambiguation model:

julia> using Pkg

julia> Pkg.add("PyCall")
  Resolving package versions...
No Changes to `~/work/QuranTree.jl/QuranTree.jl/docs/Project.toml`
No Changes to `~/work/QuranTree.jl/QuranTree.jl/docs/Manifest.toml`

julia> using PyCall

julia> @pyimport camel_tools.disambig.mle as camel_disambig

julia> @pyimport camel_tools.tagger.default as camel_tagger

julia> mled = camel_disambig.MLEDisambiguator.pretrained()
PyObject <camel_tools.disambig.mle.MLEDisambiguator object at 0x7fc56037be20>

Tagging

From the model, we instantiate the DefaultTagger, passing "pos" as the feature to tag, and then call its tag method on the whitespace-separated tokens of the verse:

julia> tagger = camel_tagger.DefaultTagger(mled, "pos")
PyObject <camel_tools.tagger.default.DefaultTagger object at 0x7fc562e307c0>

julia> tagger.tag(split(avrs1))
4-element Array{String,1}:
 "noun"
 "noun_prop"
 "noun"
 "noun"
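
The tagger returns one tag per input token, in order, so the result is easier to read when each tag is shown next to its word. The following is a minimal sketch of that pairing in plain Julia. It does not call CAMeL Tools: the `tags` vector is copied by hand from the output above, and the names `tokens`, `tags`, and `tagged` are illustrative only.

```julia
# Tokens of the verse, split on whitespace exactly as in the call to `tagger.tag`.
tokens = split("ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ")

# Assumption: tags copied from the tagger output shown above, not recomputed here.
tags = ["noun", "noun_prop", "noun", "noun"]

# Pair each token with its tag; `zip` stops at the shorter of the two inputs.
tagged = [tok => tag for (tok, tag) in zip(tokens, tags)]

# Print one token and its tag per line, separated by a tab.
for (tok, tag) in tagged
    println(tok, "\t", tag)
end
```

In a live session, `tags` could presumably be replaced by the value returned from `tagger.tag(split(avrs1))`, provided PyCall has converted it to a Julia vector as in the output above.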