Bindings for other languages

Perl

Simple tokenization of Polish text:
use PSIToolkit::Simple;

my $psi = PSIToolkit::Simple::PipeRunner->new("tokenize --lang pl ! perl-simple-writer");
my $result = $psi->run_for_perl('Pan prof. dr hab. Jan Nowak.');

# $result = ['Pan', 'prof.', 'dr', 'hab.', 'Jan', 'Nowak', '.'];

Tokenization and segmentation of Polish text:
use PSIToolkit::Simple;

my $psi = PSIToolkit::Simple::PipeRunner->new(
  "tp-tokenizer --lang pl ! srx-segmenter --lang pl ! perl-simple-writer --tags token --spec segment");
my $result = $psi->run_for_perl('Ala ma kota. Kot ma mysz.');

# $result = [
#     ['Ala', 'ma', 'kota','.'],
#     ['Kot', 'ma', 'mysz','.'],
# ];

Lemmatization with Morfologik:
use PSIToolkit::Simple;

my $psi = PSIToolkit::Simple::PipeRunner->new(
  "tp-tokenizer --lang pl ! morfologik ! perl-simple-writer --with-args --tags form");
my $result = $psi->run_for_perl('ma');

# $result = [
#   [
#     {
#       'text' => 'ma',
#       'category' => 'verb',
#       'values' => {
#         'tense' => 'fin',
#         'number' => 'sg',
#         'person' => 'ter',
#         'aspect' => 'imperf'
#       }
#     },
#     {
#       'text' => 'ma',
#       'category' => 'adj',
#       'values' => {
#         'number' => 'sg',
#         'degree' => 'pos',
#         'case' => 'nom',
#         'gender' => 'f'
#       }
#     },
#     {
#       'text' => 'ma',
#       'category' => 'adj',
#       'values' => {
#         'number' => 'sg',
#         'degree' => 'pos',
#         'case' => 'voc',
#         'gender' => 'f'
#       }
#     }
#   ]
# ];

Python

Simple tokenization of Polish text:
import PSIToolkit

psi = PSIToolkit.PipeRunner('tp-tokenizer --lang pl')
result = psi.run('Pan prof. dr hab. Jan Nowak.')

# result = 'Pan\nprof.\ndr\nhab.\nJan\nNowak\n.\n'
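
Because `run` returns a single newline-separated string here, a caller will often want a list of tokens instead. The following is a minimal sketch of that post-processing step. It assumes the output has exactly the shape shown in the comment above (one token per line, with a trailing newline); the helper name `split_tokens` is hypothetical and not part of PSIToolkit, and the sample string is copied from the documented result so the snippet runs without the bindings installed.

```python
# Documented output of psi.run('Pan prof. dr hab. Jan Nowak.'), copied
# verbatim so that this sketch does not require PSIToolkit itself.
raw = 'Pan\nprof.\ndr\nhab.\nJan\nNowak\n.\n'


def split_tokens(output):
    """Return the non-empty lines of the pipeline output as a list.

    Hypothetical helper (not part of PSIToolkit); it assumes one token
    per line, as in the documented result.
    """
    return [line for line in output.splitlines() if line]


tokens = split_tokens(raw)
print(tokens)  # ['Pan', 'prof.', 'dr', 'hab.', 'Jan', 'Nowak', '.']
```

With the bindings installed, `split_tokens(psi.run(text))` should give the same kind of list, provided the writer at the end of the pipeline keeps emitting one token per line.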

Ruby

By default, the bindings are built for Ruby 1.8.7, but they can also be used with Ruby 1.9. To use them with a particular Ruby version, you have to compile PSI-Toolkit against that version.

Simple tokenization of Polish text:
require 'psitoolkit'

psi = PSIToolkit::PipeRunner.new("tp-tokenizer --lang pl")
result = psi.run('Pan prof. dr hab. Jan Nowak.')

# result = "Pan\nprof.\ndr\nhab.\nJan\nNowak\n.\n"