LvAuthor is a text-analysis platform. It creates the XML files used by gnosly.com and English Listening Game.
The main tasks and algorithms are the following:
- text cleaning: UTF-8 conversion, removal of special characters
- paragraph splitting
- period splitting
- phrase analysis: phrase splitting, phrase type determination (incisive, direct, indirect, etc.)
- word splitting
- part-of-speech (POS) assignment for each word: automatic disambiguation with human supervision
- audio matching: the words of the text are located in the audiobook file, determining the time at which each word begins and ends
- word translation: words are translated automatically
- English-Italian phrase correlation: each English phrase is matched with the corresponding Italian phrase by comparing the original text of the story with the human-translated one
- applying revisions: fixes made by the professor to word translations or phrase correlations are applied
- building the final XML file
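
As an illustration of the first steps in the list above (text cleaning, paragraph splitting, period splitting and word splitting), here is a minimal Java sketch. It is not LvAuthor's actual code: the class name TextPipelineSketch, the method names and the regular expressions are assumptions chosen for the example, and the real platform may use different rules (for instance for abbreviations or quoted speech).

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch of the first pipeline steps; names and rules are assumptions.
class TextPipelineSketch {

    // A word is a run of letters/digits, optionally joined by apostrophes (e.g. "Don't").
    private static final Pattern WORD =
            Pattern.compile("[\\p{L}\\p{N}]+(?:['\u2019]\\p{L}+)*");

    /** Text cleaning: Unicode normalization, uniform line endings, no control characters. */
    public static String clean(String raw) {
        String text = Normalizer.normalize(raw, Normalizer.Form.NFC);
        text = text.replace("\r\n", "\n").replace('\r', '\n').replace('\t', ' ');
        // Drop ASCII control characters except the newline used for paragraph detection.
        return text.replaceAll("[\\p{Cntrl}&&[^\\n]]", "");
    }

    /** Paragraph splitting: paragraphs are assumed to be separated by blank lines. */
    public static List<String> splitParagraphs(String text) {
        return Arrays.stream(text.split("\\n\\s*\\n"))
                .map(String::trim)
                .filter(p -> !p.isEmpty())
                .collect(Collectors.toList());
    }

    /** Period splitting: naive rule, a period ends at '.', '!' or '?' followed by whitespace. */
    public static List<String> splitPeriods(String paragraph) {
        return Arrays.stream(paragraph.split("(?<=[.!?])\\s+"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** Word splitting: collect every match of the WORD pattern, discarding punctuation. */
    public static List<String> splitWords(String period) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(period);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String cleaned = clean("It rained. Did she stay?\r\n\r\nDon't stop, Alice.");
        for (String paragraph : splitParagraphs(cleaned)) {
            for (String period : splitPeriods(paragraph)) {
                System.out.println(period + " -> " + splitWords(period));
            }
        }
    }
}
```

The regular-expression rules keep the sketch self-contained. A period splitter of this kind would wrongly break after abbreviations such as "Mr.", which is one reason a real pipeline would need more careful rules or human review.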
The technologies used for this app are the following:
- NetBeans Platform
- NLP algorithms
- Ant with a custom Ant task, XSLT
- Java
- PostgreSQL, JPA (Hibernate)
- Git, Maven