TalkBank | MOR and UD Grammars
We are currently transitioning the TalkBank system for morphosyntactic analysis from the MOR/POST/MEGRASP system to the UD (Universal Dependencies) system, which is described in detail here. We apply UD taggers to TalkBank files using Stanford's Stanza system, which has been built into the Batchalign2 program created by Houjun Liu, as described in this report under review.
The great advantage of UD over MOR is that it is available for many more languages. It also seems to perform better than MOR for computing dependency relations on the %gra line. However, its morphological analysis on the %mor line is not yet as analytic as that of MOR. For this reason, we will retain the use of MOR for English and Spanish. For English only, the UD tiers are called %umod and %ugra, leaving the names %mor and %gra for the tiers created by MOR.
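For readers who want to work with the dependency tier programmatically, the following is a minimal sketch of reading one such line. It assumes the usual CHAT layout in which each item on the tier has the form `index|head|relation` and a head of 0 marks the root. The function name `parse_gra_tier`, the `GraRelation` type, and the sample line are hypothetical and written only for illustration; they are not part of MOR, Stanza, or Batchalign2.

```python
from typing import List, NamedTuple


class GraRelation(NamedTuple):
    """One dependency item from a %gra-style tier (hypothetical helper type)."""
    index: int      # position of the word in the utterance, starting at 1
    head: int       # position of the governing word; 0 is assumed to mark the root
    relation: str   # dependency label, e.g. NSUBJ or ROOT


def parse_gra_tier(line: str) -> List[GraRelation]:
    """Parse a dependency tier line into a list of relations.

    Assumes items are whitespace-separated and shaped like index|head|relation.
    """
    body = line
    if line.startswith("%"):
        # Drop the tier label such as "%gra:" or "%ugra:".
        _, _, body = line.partition(":")
    relations = []
    for item in body.split():
        index, head, relation = item.split("|", 2)
        relations.append(GraRelation(int(index), int(head), relation))
    return relations


# Illustrative sample line, not taken from any TalkBank corpus.
sample = "%gra:\t1|2|NSUBJ 2|0|ROOT 3|2|OBJ 4|2|PUNCT"
relations = parse_gra_tier(sample)
root = next(r for r in relations if r.head == 0)
```

Because the tier label is stripped before parsing, the same function could be pointed at a %ugra line, provided it follows the same item layout.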
As of March 2024, we have tagged these languages in CHILDES using UD: Afrikaans, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, French, German, Icelandic, Irish, Italian, Japanese, Korean, Mandarin, Norwegian, Polish, Portuguese, Serbian, Slovenian, Spanish, Swedish, Turkish, and Welsh. Once UD grammars become available, we hope to apply UD through Batchalign to languages such as Sesotho or Nungon. Currently, application to Arabic, Bulgarian, Farsi, Greek, Hebrew, Russian, and Tamil is blocked because the transcripts were done in a non-standard romanization that UD does not support. Application to Danish and Hungarian will require extensive cleanup of the transcripts. Users may still wish to rely on the MOR grammars for English, Spanish, and Hebrew and on the word segmenter for Chinese.