Confusion matrices

The confusion matrices were constructed from the input to and the output from the Croatian spelling checker ispravi.me.
More: https://www.mdpi.com/2073-431X/13/2/39


X - the letter that labels a row
Y - the letter that labels a column

Types of edits

  • insertionCondOnFollowing - inserting a wrong letter Y in front of X (X -> YX)
  • insertionCondOnPrevious - inserting a wrong letter Y after X (X -> XY)
  • deletionCondOnFollowing - deleting the letter Y in front of X (YX -> X)
  • deletionCondOnPrevious - deleting the letter Y after X (XY -> X)
  • substitution - substituting the wrong letter Y for the correct letter X (X -> Y)
  • transposition - transposing two consecutive letters (XY -> YX)
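
The edit types above can be illustrated with a minimal Python sketch. The function names mirror the edit names but are hypothetical, not part of ispravi.me, and the sketch assumes one Python character per letter (Croatian digraphs such as lj, nj and dž would need their own tokenization). In each function, word[i] plays the role of X.

```python
# Illustrative only: applies each edit type at position i of a correct word.
# word[i] is the context letter X; y is the wrong letter Y.

def insertion_cond_on_following(word, i, y):
    """X -> YX: insert wrong letter y in front of word[i]."""
    return word[:i] + y + word[i:]

def insertion_cond_on_previous(word, i, y):
    """X -> XY: insert wrong letter y after word[i]."""
    return word[:i + 1] + y + word[i + 1:]

def deletion_cond_on_following(word, i):
    """YX -> X: delete the letter in front of word[i]."""
    if i < 1:
        raise ValueError("no letter in front of position 0")
    return word[:i - 1] + word[i:]

def deletion_cond_on_previous(word, i):
    """XY -> X: delete the letter after word[i]."""
    if i + 1 >= len(word):
        raise ValueError("no letter after the last position")
    return word[:i + 1] + word[i + 2:]

def substitution(word, i, y):
    """X -> Y: replace correct letter word[i] with wrong letter y."""
    return word[:i] + y + word[i + 1:]

def transposition(word, i):
    """XY -> YX: swap word[i] and word[i + 1]."""
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]
```

The same misspelling can be described by either insertion type (or either deletion type); the two variants differ only in which neighboring letter is treated as the context X.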

Types of matrices

  • Occurrences - number of records for each letter combination
  • Relative frequency (letter level) - relative frequency of wrong letter Y in the context of letter X
  • Relative frequency (whole dataset) - relative frequency of wrong letter Y in the context of the edit type within the dataset
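
As a rough sketch of how the two relative-frequency matrices might be derived from an occurrences matrix: the exact denominators are not stated here, so this assumes that the letter-level matrix divides each cell by its row total (all records for letter X) and that the whole-dataset matrix divides each cell by the total for one edit type. The function names and the toy counts are made up for illustration.

```python
# Assumed normalizations; see the linked paper for the authors' definitions.

def letter_level(occurrences):
    """Divide each cell by its row total; all-zero rows stay zero."""
    result = []
    for row in occurrences:
        total = sum(row)
        result.append([count / total if total else 0.0 for count in row])
    return result

def whole_dataset(occurrences):
    """Divide each cell by the total count of the whole matrix."""
    total = sum(sum(row) for row in occurrences)
    return [[count / total if total else 0.0 for count in row]
            for row in occurrences]

# Toy occurrences matrix (rows = X, columns = Y), not real data.
toy = [
    [0, 3, 1],
    [2, 0, 2],
    [0, 0, 0],
]
by_letter = letter_level(toy)
by_dataset = whole_dataset(toy)
```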

Special characters

Apart from the letters of the Croatian alphabet, the matrices include two additional characters:

  • space - in bigrams
  • @ - word boundary
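
One plausible way to use the word-boundary character, sketched below, is to pad a word with @ so that its first and last letters also have a neighboring context symbol. This is an assumption about how the marker is applied, and context_pairs is a hypothetical helper name.

```python
BOUNDARY = "@"  # word-boundary marker used in the matrices

def context_pairs(word):
    """Return adjacent symbol pairs of a word padded with boundary markers."""
    padded = BOUNDARY + word + BOUNDARY
    return [(padded[i], padded[i + 1]) for i in range(len(padded) - 1)]
```

For example, context_pairs("da") yields the pairs (@, d), (d, a) and (a, @), so an edit at the start or end of a word can still be looked up under a row or column labeled @.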