Confusion matrices
The confusion matrices were constructed from the input to and the output of the Croatian spelling checker ispravi.me.
More: https://www.mdpi.com/2073-431X/13/2/39
X - letter in a row
Y - letter in a column
Types of edits
- insertionCondOnFollowing - inserting wrong letter Y in front of X (X -> YX)
- insertionCondOnPrevious - inserting wrong letter Y after X (X -> XY)
- deletionCondOnFollowing - deleting letter Y in front of X (YX -> X)
- deletionCondOnPrevious - deleting letter Y after X (XY -> X)
- substitution - substituting wrong letter Y for correct letter X (X -> Y)
- transposition - transposition of two consecutive letters (XY -> YX)
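
The six edit types can be illustrated with a short Python sketch. It is not code from ispravi.me: the function name `apply_edit`, its arguments, and the example word are assumptions made for illustration. The left side of each arrow is read as the correct text and the right side as the misspelled text, and every Python character is treated as one letter.

```python
def apply_edit(word: str, i: int, edit: str, y: str = "") -> str:
    """Return the misspelling obtained by applying `edit` at X = word[i].

    Hypothetical helper: `y` is the wrong letter for insertions and
    substitution; it is ignored for deletions and transposition, where
    Y is already a neighbour of X in the correct word.
    """
    x = word[i]
    if edit == "insertionCondOnFollowing":   # X -> YX
        return word[:i] + y + x + word[i + 1:]
    if edit == "insertionCondOnPrevious":    # X -> XY
        return word[:i] + x + y + word[i + 1:]
    if edit == "deletionCondOnFollowing":    # YX -> X (Y precedes X)
        if i == 0:
            raise ValueError("no letter in front of X to delete")
        return word[:i - 1] + word[i:]
    if edit == "deletionCondOnPrevious":     # XY -> X (Y follows X)
        if i + 1 >= len(word):
            raise ValueError("no letter after X to delete")
        return word[:i + 1] + word[i + 2:]
    if edit == "substitution":               # X -> Y
        return word[:i] + y + word[i + 1:]
    if edit == "transposition":              # XY -> YX
        if i + 1 >= len(word):
            raise ValueError("no letter after X to swap with")
        return word[:i] + word[i + 1] + x + word[i + 2:]
    raise ValueError(f"unknown edit type: {edit}")


# X is the letter "o" at index 1 of the correct word "more".
print(apply_edit("more", 1, "insertionCondOnFollowing", "i"))  # miore
print(apply_edit("more", 1, "insertionCondOnPrevious", "i"))   # moire
print(apply_edit("more", 1, "deletionCondOnFollowing"))        # ore
print(apply_edit("more", 1, "deletionCondOnPrevious"))         # moe
print(apply_edit("more", 1, "substitution", "a"))              # mare
print(apply_edit("more", 1, "transposition"))                  # mroe
```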
Types of matrices
- Occurrences - number of records for each letter combination
- Relative frequency (letter level) - relative frequency of wrong letter Y in the context of letter X
- Relative frequency (whole dataset) - relative frequency of wrong letter Y in the context of the edit type within the dataset
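
One possible reading of the three matrix types is sketched below in Python. The normalisations are assumptions, since the exact formulas are not stated here: the letter-level matrix is taken to divide each cell by the total of its row (letter X), and the whole-dataset matrix is taken to divide each cell by the total of all cells for that edit type. The counts are invented toy values, not figures from the dataset, and the function names are hypothetical.

```python
def letter_level(occurrences):
    """Divide each cell by its row total (assumed letter-level normalisation)."""
    result = {}
    for x, row in occurrences.items():
        row_total = sum(row.values())
        result[x] = {y: (n / row_total if row_total else 0.0)
                     for y, n in row.items()}
    return result


def whole_dataset(occurrences):
    """Divide each cell by the grand total of the matrix (assumed normalisation)."""
    grand_total = sum(n for row in occurrences.values() for n in row.values())
    return {x: {y: (n / grand_total if grand_total else 0.0)
                for y, n in row.items()}
            for x, row in occurrences.items()}


# Toy occurrence matrix for one edit type: rows are X, columns are Y.
substitution_counts = {
    "a": {"e": 3, "o": 1},
    "s": {"š": 4, "z": 2},
}

print(letter_level(substitution_counts)["a"]["e"])   # 3 of the 4 records in row "a"
print(whole_dataset(substitution_counts)["a"]["e"])  # 3 of the 10 records overall
```

Under these assumptions every row of the letter-level matrix sums to 1, while the whole-dataset matrix sums to 1 over all of its cells.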
Special characters
Apart from the letters of the Croatian alphabet, the matrices include two additional characters:
- space - in bigrams
- @ - word boundary
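
The role of the word-boundary character can be sketched as follows. This is an illustration under assumptions, not the procedure used to build the matrices: it supposes that a word is padded with @ on both sides before letter pairs are read off, and the function name `boundary_bigrams` is hypothetical.

```python
def boundary_bigrams(word: str, boundary: str = "@"):
    """Return consecutive letter pairs of `word`, padded with the boundary mark."""
    padded = boundary + word + boundary
    return [(padded[i], padded[i + 1]) for i in range(len(padded) - 1)]


print(boundary_bigrams("da"))
# [('@', 'd'), ('d', 'a'), ('a', '@')]
```

With such padding, an edit on the first or last letter of a word still has a previous or following context, namely @.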