
Specification for parse

gobio

A deep parser based on the parser used in the Translatica machine translation system.

Gobio operates on morphologically annotated text.

Gobio ships with predefined rule sets for several languages, but you can supply your own rule file with the --rules option. Gobio rules are, broadly, context-free grammar rules. A tutorial on writing your own rule sets for gobio is being prepared.

Explanation of symbols used to denote grammatical categories for Polish:

  • C (czasownik) - verb
  • LG (liczebnik główny) - cardinal numeral
  • P (przymiotnik) - adjective
  • PR (przyimek) - preposition
  • PS (przysłówek) - adverb
  • R (rzeczownik) - noun
  • S (spójnik) - conjunction
  • ZP (zaimek przymiotny) - adjective pronoun
  • ZRn (zaimek rzeczowny nieokreślony) - indefinite noun pronoun
  • ZRo (zaimek rzeczowny osobowy) - personal noun pronoun
  • ZRs (zaimek „się”) - pronoun “się”
  • ZRw (zaimek wskazujący) - demonstrative pronoun
  • ZS (zaimek przysłowny) - adverbial pronoun
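
As a rough illustration of what context-free rules over these category symbols can express, the following Python sketch checks whether a sequence of categories can be derived from a phrase symbol. It is not gobio's rule syntax or its actual grammar: the RULES table, the helper names, and the three FR rules are hypothetical, loosely modelled on the bracketings shown in the example output on this page.

```python
# Toy context-free rules: phrase symbol -> list of possible right-hand sides.
# Hypothetical rule set, NOT taken from gobio's rule files.
RULES = {
    "FR": [("R",), ("ZP", "R"), ("LG", "R")],
}


def derives(symbol, cats):
    """Return True if `symbol` can derive exactly the category sequence `cats`."""
    cats = tuple(cats)
    # A symbol trivially derives itself.
    if len(cats) == 1 and cats[0] == symbol:
        return True
    return any(_matches(rhs, cats) for rhs in RULES.get(symbol, []))


def _matches(rhs, cats):
    """Check whether the right-hand side `rhs` can be split over `cats`."""
    if not rhs:
        return not cats
    head, rest = rhs[0], rhs[1:]
    # Each symbol must cover at least one category; try every split point.
    for i in range(1, len(cats) - len(rest) + 1):
        if derives(head, cats[:i]) and _matches(rest, cats[i:]):
            return True
    return False


print(derives("FR", ["ZP", "R"]))  # True
print(derives("FR", ["R", "ZP"]))  # False
```

The sketch assumes the rule table has no cycles of single-symbol rules (such as A -> B together with B -> A), which would make the recursion loop forever.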

Aliases

parse, parse-generator, parser

Languages

de, pl, test

Examples

--line-by-line gobio --lang pl --terminal-tag parse-terminal ! bracketing-writer --disamb --tags parse --opening-bracket %c[

Parse Polish sentences line by line and print a simplified constituent tree for each sentence.

in:
Komputer czyta zdania.
Każde zdanie ma swoje drzewo składniowe.
Zrobiłem już trzy zdania.
out:
FR[R[Komputer]] fin[czyta] FR[R[zdania]].
FR[ZP[Każde] R[zdanie]] FP[P[ma]] FR[ZP[swoje] R[drzewo] FP[P[składniowe]]].
praet[Zrobiłem] FPS[PS[już]] FR[LG[trzy] R[zdania]].
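
For downstream processing, the bracketed output above can be read back into a nested structure. The following Python sketch is one way to do it. It assumes that every constituent is written as a label immediately followed by `[`, that it is closed by `]`, and that square brackets never occur inside the tokens themselves. The function names `parse_bracketing` and `leaves` are illustrative and not part of the toolkit.

```python
import re

# "LABEL[" opens a constituent, "]" closes one, anything else is a plain token.
TOKEN = re.compile(r"([^\[\]\s]+)\[|\]|[^\[\]\s]+")


def parse_bracketing(line):
    """Parse one output line into a list of (label, children) tuples and strings."""
    root = []
    stack = [root]  # stack of child lists that are currently open
    for match in TOKEN.finditer(line):
        token = match.group(0)
        if match.group(1) is not None:
            node = (match.group(1), [])
            stack[-1].append(node)
            stack.append(node[1])
        elif token == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced closing bracket")
            stack.pop()
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unclosed bracket")
    return root


def leaves(nodes):
    """Return the plain tokens of a parsed line in their original order."""
    result = []
    for node in nodes:
        if isinstance(node, str):
            result.append(node)
        else:
            result.extend(leaves(node[1]))
    return result


tree = parse_bracketing("FR[R[Komputer]] fin[czyta] FR[R[zdania]].")
print(tree[0])       # ('FR', [('R', ['Komputer'])])
print(leaves(tree))  # ['Komputer', 'czyta', 'zdania', '.']
```

Punctuation that appears outside any bracket, such as the final full stop, ends up as a plain string at the top level of the result.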

Options

Allowed options:
  --lang arg (=guess)                   language
  --force-language                      force using specified language even if 
                                        a text was recognised otherwise
  --edge-number-limit arg (=-1)         maximal number of edges inserted 
                                        between each two vertices
  --rules arg (=%ITSDATA%/%LANG%/rules.g)
                                        file with rules in text format
  --terminal-tag arg (=parse-terminal)  tag for terminal

puddle

A shallow parser based on the Spejd shallow parser originally developed at IPI PAN (http://zil.ipipan.waw.pl/Spejd/). As input, Puddle requires morphologically annotated text, as produced, for instance, by the morfologik processor. It can serve as a disambiguation tool on its own or be chained with a POS tagger (e.g. the metagger processor).

Note that text needs to be annotated morphologically before passing it to puddle.

Currently, rules and tagsets are available for Polish only; they are used by default unless others are specified. The Polish parsing rules are for demonstration purposes only and are by no means complete.

For other languages, you need to provide custom rules and tagsets compatible with the morphological processor that runs before puddle in the processing pipeline. A tutorial on the rule format and tagsets is being prepared. See the Polish rules and tagset for examples.

Aliases

parse, parse-generator, parser

Languages

fr, pl

Options

Allowed options:
  --lang arg (=guess)                   language
  --force-language                      force using specified language even if 
                                        a text was recognised otherwise
  --tagset arg (=%ITSDATA%/%LANG%/tagset.%LANG%.cfg)
                                        tagset file
  --rules arg (=%ITSDATA%/%LANG%/rules.%LANG%)
                                        rules file
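
The default values above contain the placeholders %ITSDATA% and %LANG%. The following Python sketch shows how such a template would presumably expand, assuming that %ITSDATA% stands for the processor's data directory and %LANG% for the language code. The helper name `expand_default` and the directory `/opt/psi/puddle` are hypothetical and serve only as an illustration.

```python
def expand_default(template: str, its_data: str, lang: str) -> str:
    """Substitute the %ITSDATA% and %LANG% placeholders in a default path."""
    return template.replace("%ITSDATA%", its_data).replace("%LANG%", lang)


# Hypothetical data directory; the real location depends on the installation.
data_dir = "/opt/psi/puddle"

print(expand_default("%ITSDATA%/%LANG%/tagset.%LANG%.cfg", data_dir, "pl"))
# /opt/psi/puddle/pl/tagset.pl.cfg
print(expand_default("%ITSDATA%/%LANG%/rules.%LANG%", data_dir, "pl"))
# /opt/psi/puddle/pl/rules.pl
```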
