Franz Josef Och

Yet Another Small MaxEnt Toolkit: YASMET (Wed, 07 Oct 2015)

This is a tiny toolkit for training maximum entropy models. It is a generic toolkit, applicable to conditional maximum entropy models. A special feature of this program is that it is written in only 132 lines of code (it can be printed on one sheet of paper). It includes training of model parameters, evaluation, perplexity and error-rate computation, count-based feature reduction, Gaussian priors, and feature count normalization. In addition, it is easy to use and efficient enough to deal with millions of features.
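The kind of computation such a toolkit performs can be sketched with a tiny conditional maximum entropy trainer using gradient ascent and a Gaussian prior. This is an independent illustration, not YASMET's actual code; the feature function, data, and hyperparameters are invented:

```python
import math

# Toy conditional maximum entropy model trained by gradient ascent with
# a Gaussian prior -- a sketch of the kind of training YASMET performs.

def features(x, y):
    # Hypothetical binary feature: the last two letters of x paired with class y.
    return {(x[-2:], y): 1.0}

def score(w, x, y):
    return sum(w.get(f, 0.0) * v for f, v in features(x, y).items())

def train(data, classes, epochs=200, lr=0.5, sigma2=10.0):
    w = {}
    for _ in range(epochs):
        grad = {}
        for x, y in data:
            zs = [math.exp(score(w, x, c)) for c in classes]
            Z = sum(zs)                                  # normalizer for p(y'|x)
            for f, v in features(x, y).items():          # observed feature counts
                grad[f] = grad.get(f, 0.0) + v
            for c, s in zip(classes, zs):                # expected feature counts
                for f, v in features(x, c).items():
                    grad[f] = grad.get(f, 0.0) - v * s / Z
        for f in grad:                                   # Gaussian prior: -w / sigma^2
            w[f] = w.get(f, 0.0) + lr * (grad[f] - w.get(f, 0.0) / sigma2)
    return w

data = [("running", "VERB"), ("walking", "VERB"), ("tables", "NOUN")]
classes = ["VERB", "NOUN"]
w = train(data, classes)
```

The gradient is the observed minus the expected feature counts, shrunk toward zero by the prior; this is the standard training criterion for conditional maximum entropy models.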

YASMET is free software under the GNU General Public License and is distributed without any warranty.

If you find YASMET useful in your work, please cite this page.

Here it is: source code, gcc 3.x compatible source code, (incomplete) documentation, and a short tutorial written by Deepak Ravichandran.

Feature selection can be done with yasmetFS from Deepak Ravichandran.

GIZA++: Training of statistical translation models. (Wed, 07 Oct 2015) GIZA++ is an extension of the program GIZA (part of the SMT toolkit EGYPT), which was developed by the Statistical Machine Translation team during the 1999 summer workshop at the Center for Language and Speech Processing, Johns Hopkins University (CLSP/JHU). GIZA++ includes many additional features. The extensions in GIZA++ were designed and written by Franz Josef Och.

About GIZA++

The program includes the following extensions to GIZA:

  • Model 4;
  • Model 5;
  • Alignment models depending on word classes (software for producing word classes can be downloaded here);
  • Implements the HMM alignment model: Baum-Welch training, Forward-Backward algorithm, empty word, dependency on word classes, transfer to fertility models, …;
  • Includes a variant of Model 3 and Model 4 which allow the training of the parameter p_0;
  • Various smoothing techniques for fertility, distortion/alignment parameters;
  • Significantly more efficient training of the fertility models;
  • Correct implementation of pegging as described in (Brown et al. 1993), a series of heuristics in order to make pegging sufficiently efficient;
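GIZA++ itself trains Models 1 through 5 and the HMM model; the shared EM training loop can be sketched with the simplest member of the family, IBM Model 1. This is a minimal illustration on an invented toy corpus, not GIZA++'s implementation:

```python
from collections import defaultdict

# Minimal EM training of IBM Model 1 word-translation probabilities t(e|f).
# Toy German-English corpus invented for illustration.
corpus = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"]),
          (["ein", "buch"], ["a", "book"])]

f_vocab = {f for fs, _ in corpus for f in fs}
e_vocab = {e for _, es in corpus for e in es}
# Uniform initialization of t(e|f).
t = {(e, f): 1.0 / len(e_vocab) for e in e_vocab for f in f_vocab}

for _ in range(20):
    count = defaultdict(float)
    total = defaultdict(float)
    for fs, es in corpus:
        for e in es:
            # E-step: distribute each target word's count over the source words.
            norm = sum(t[(e, f)] for f in fs)
            for f in fs:
                c = t[(e, f)] / norm
                count[(e, f)] += c
                total[f] += c
    # M-step: renormalize the fractional counts.
    for (e, f) in t:
        if total[f]:
            t[(e, f)] = count[(e, f)] / total[f]
```

After a few iterations the probability mass concentrates on the correct word pairs; the fertility and distortion parameters of Models 3-5 are estimated on top of such alignments.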

In order to compile GIZA++ you may need:

  • a recent version of the GNU compiler (2.95 or higher)
  • a recent version of assembler and linker which do not have restrictions with respect to the length of symbol names

It is known to compile on Linux, Irix, and SunOS systems. Many older compiler versions do not fully support all the STL features used by GIZA++, so compiler, assembler, or linker problems occur frequently, mostly due to the intensive use of STL within the program. If a compilation problem occurs, please first try the newest compiler version. Patches to the code are most welcome. Feel free to send me mail asking for help, but please do not necessarily expect me to have time to help.

It is released under the GNU General Public License (GPL).


You are welcome to use the code under the terms of the license for research or commercial purposes; however, please acknowledge its use with a citation:

  • Franz Josef Och, Hermann Ney. “A Systematic Comparison of Various Statistical Alignment Models”, Computational Linguistics, volume 29, number 1, pp. 19–51, March 2003.


Here is a BibTeX entry (citation key illustrative):

@ARTICLE{Och2003,
AUTHOR = {Franz Josef Och and Hermann Ney},
TITLE = {A Systematic Comparison of Various Statistical Alignment Models},
JOURNAL = {Computational Linguistics},
VOLUME = 29,
NUMBER = 1,
YEAR = 2003,
PAGES = {19--51}}


Newest version:


  • various bug fixes/improved efficiency
  • allows generation of n-best lists of alignments (-nbestalignments N)
  • compiles with gcc versions 2.95 – 3.3 and on Mac OS X
  • more memory efficient (if compiled with -DBINARY_SEARCH_FOR_TTABLE)

GIZA++.2001-01-30.tar.gz (old version)


This work was supported by the National Science Foundation under Grant No. IIS-9820687 through the 1999 Workshop on Language Engineering, Center for Language and Speech Processing, Johns Hopkins University.

mkcls: Training of word classes. (Wed, 07 Oct 2015) mkcls is a tool to train word classes using a maximum-likelihood criterion. The resulting word classes are especially suited for language models or statistical translation models. The program mkcls was written by Franz Josef Och.

Usage of mkcls:

mkcls [-cnum] [-nnum] [-pfile] [-Vfile] opt

-c number of word classes to generate

-V filename for the output word classes

-n number of optimization runs (Default: 1); a larger number gives better results

-p filename of the training corpus (Default: ‘train’)


mkcls -c80 -n10 -pkorpus -Vkats opt

(generates 80 classes for the corpus ‘korpus’ and writes the classes to ‘kats’)
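The maximum-likelihood criterion mkcls optimizes can be illustrated with a minimal exchange-algorithm sketch: each word is greedily moved to the class that most increases the likelihood of a class bigram model. This is an independent toy illustration with an invented corpus, not mkcls's implementation:

```python
import math
from collections import Counter

# Greedy word clustering in the spirit of mkcls: move each word to the
# class that maximizes the likelihood of a class bigram model.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
NUM_CLASSES = 3

def log_likelihood(cls):
    # Class-bigram ML log-likelihood (up to a constant):
    #   sum N(c1,c2) log N(c1,c2) - 2 sum N(c) log N(c)
    bi = Counter(zip((cls[w] for w in corpus[:-1]),
                     (cls[w] for w in corpus[1:])))
    uni = Counter(cls[w] for w in corpus)
    return (sum(n * math.log(n) for n in bi.values())
            - 2 * sum(n * math.log(n) for n in uni.values()))

# Initialize round-robin, then run a few exchange passes.
cls = {w: i % NUM_CLASSES for i, w in enumerate(vocab)}
init_ll = log_likelihood(cls)
for _ in range(5):
    for w in vocab:
        cls[w] = max(range(NUM_CLASSES),
                     key=lambda c: log_likelihood({**cls, w: c}))
```

Because the current class is always among the candidates, each exchange step never decreases the likelihood; the `-n` option of mkcls corresponds to restarting such an optimization several times from different initializations.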

In order to compile mkcls you may need:

  • a recent version of the GNU compiler (2.95 or higher)

It is released under the GNU General Public License (GPL).




Source code:

Newest version:


  • now also compiles with gcc versions 2.95 – 3.3 and on Mac OS X

mkcls.2001-01-12.tar.gz (old version)