Analysis Section ( 3/5 ) - Word structure

3.1 Introduction

Several authors have identified structures in the composition of words in the Voynich MS. These are, roughly in chronological order:

3.1.1 Tiltman's split in roots and suffixes (1)

Tiltman observed that many words in the stars or recipes section (which was the only sample he had available for a detailed analysis) were composed of two parts. He set up a table in Plate 17 of his publication, which is shown below, converted to Eva:

Roots          Suffixes
ok    of       an    ain    aiin    aiiin
ot    op       ar    air    aiir    aiiir
qok   qof      al    ail    aiil    aiiil
qot   qop      or
ch             ol
Sh             ey    eey    eeey
d              edy   eedy   eeedy

Every combination of a 'root' and a 'suffix' gives a valid word. He roughly subdivided the suffixes into three groups, depending on whether they contain a, o or e. Tiltman observes that the suffixes (which he also calls 'finals') are often found standing alone in the stars section of the MS.
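The claim that every root combines with every suffix can be made concrete with a minimal Python sketch. The two lists are read off the table above (treating of, op, qof and qop as roots alongside ok, ot, qok and qot); the names ROOTS, SUFFIXES and tiltman_words are illustrative only and do not come from Tiltman's paper.

```python
from itertools import product

# Roots and suffixes as read off the table above (Eva transcription).
ROOTS = ["ok", "ot", "qok", "qot", "of", "op", "qof", "qop", "ch", "Sh", "d"]
SUFFIXES = [
    "an", "ain", "aiin", "aiiin",
    "ar", "air", "aiir", "aiiir",
    "al", "ail", "aiil", "aiiil",
    "or", "ol",
    "ey", "eey", "eeey",
    "edy", "eedy", "eeedy",
]

def tiltman_words():
    """Return every root + suffix combination allowed by the table."""
    return [root + suffix for root, suffix in product(ROOTS, SUFFIXES)]

words = tiltman_words()
```

With 11 roots and 20 suffixes this yields 220 candidate words, among them qokaiin and chedy. Whether each of them is actually attested in the MS would have to be checked against a transcription.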

Some additional observations by Tiltman related to this are:

Tiltman also presents an observation, apparently offered to him by one Peter Long, "that the a groups might represent Roman numerals. Thus aiin might represent 'iij', and ar ar al 'xxv', but this, if true would only present one with a set of numbered categories, which doesn't solve the problem. In any case, though it accounts for the properties of the commoner combinations, it produces many impossible ones."

3.1.2 Mike Roe's generic word

The following pattern was contributed to the Voynich MS mailing list by one of its original participants, Michael Roe. His system is represented here translated to the EVA alphabet. Each path represents a valid word, and Mike suggested that this could perhaps present evidence of grammar of the Voynich language:

                      +- o  --+  +- r -+
 o   --+           +--+       +--+     +--+
       |  +- t -+  |  +- cho -+  +- l -+  |
 qo  --+--+     +--+                      |
       |  +- k -+  |  +- e ---+           |
 cho --+           |  |       |           |
                   |  +- ee --+           |
                   |  |       |           |
                   +--+- che -+-- y ------+------>
                   |  |       |           |
                   |  +- ch --+           |
                   |  |       |           |
                   |  +- sh --+           |
                   |  |       |           |
                   |  +-------+           |
                   |                      |
                   |  +- al ---+          |
                   |  |        |          |
                   +--+- am ---+----------+
                      |        |
                      +- ain --+
                      |        |
                      +- aiin -+
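
One possible reading of the diagram above can be written as a regular expression. This is only a sketch: it assumes that every word starts with o, qo or cho, continues with t or k, and then follows exactly one of the three branches shown (o/cho plus r/l, an optional e/ee/che/ch/sh plus y, or one of al/am/ain/aiin). The names ROE_WORD, roe_words and is_roe_word are illustrative and are not Roe's own.

```python
import re
from itertools import product

PREFIXES = ["o", "qo", "cho"]
GALLOWS = ["t", "k"]
TAILS = (
    [a + b for a, b in product(["o", "cho"], ["r", "l"])]    # upper branch
    + [m + "y" for m in ["", "e", "ee", "che", "ch", "sh"]]  # middle branch
    + ["al", "am", "ain", "aiin"]                            # lower branch
)

# The same three-stage structure expressed as a single pattern.
ROE_WORD = re.compile(
    r"(?:o|qo|cho)(?:t|k)"
    r"(?:(?:o|cho)(?:r|l)|(?:e|ee|che|ch|sh)?y|al|am|ain|aiin)"
)

def roe_words():
    """Enumerate every path through the diagram under the reading above."""
    return ["".join(parts) for parts in product(PREFIXES, GALLOWS, TAILS)]

def is_roe_word(word):
    """True if the whole word follows one path through the diagram."""
    return ROE_WORD.fullmatch(word) is not None
```

Under this reading the diagram generates 3 x 2 x 14 = 84 distinct words, for example qokaiin and otchey, while a word such as daiin falls outside it.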

3.1.3 Robert Firth's split into odd and even groups

A slightly different approach was taken by another early mailing list member, Robert Firth. Ignoring the word spaces, he was able to define two lists of characters and character groups such that the text in the Voynich MS consists of alternating items from these lists. The split is not entirely unambiguous, but reportedly it works for most of the MS. It is explained in his >> Note Nr.24.

3.2 Jorge Stolfi's ground-breaking work

3.2.1 Split into 'soft' and 'hard' characters

Jorge Stolfi discovered a new structure in the words of the Voynich MS by grouping all characters into 'soft' and 'hard', and showing that the vast majority of words consist of one, two or three groups, which he calls prefix, stem or midfix, and suffix. The first and last consist of 'soft' characters, and the stem or midfix of 'hard' characters. Stolfi has since been able to set up a more detailed (and slightly more complicated) description of the words (see 'word grammar' below). However, the simple principle behind the prefix-stem-suffix structure makes it worthwhile to look at it first.

It is explained in detail on >> a page at his web site. It may be summarised as follows:

It explains that the following characters are 'hard' characters and build the optional stem of a word: ch Sh t k p f e cTh cKh cPh cFh (to be checked for completeness).

All other characters are soft, and build the prefix and suffix. The rule states that the vast majority of Voynich words are made up as one of:

The distribution of these patterns throughout the MS, and the possible patterns for each word part should provide further interesting clues about the language of the Voynich MS.
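
The soft/hard rule can be illustrated with a short Python sketch that splits a word into prefix, stem and suffix. It uses only the hard characters listed above, lower-cased for simplicity (so Sh becomes sh and cTh becomes cth), which is an assumption of this sketch. The function names are illustrative, and the code is not Stolfi's own procedure. A word whose hard characters do not form one contiguous run is reported as not fitting the pattern.

```python
# Hard glyphs from the list above, lower-cased; longer glyphs come first
# so that "cth" is matched before "ch" or "t".
HARD = ["cth", "ckh", "cph", "cfh", "ch", "sh", "t", "k", "p", "f", "e"]

def glyphs(word):
    """Split a word into (glyph, is_hard) pairs."""
    word = word.lower()
    out, i = [], 0
    while i < len(word):
        for g in HARD:
            if word.startswith(g, i):
                out.append((g, True))
                i += len(g)
                break
        else:
            # No hard glyph starts here: treat the single letter as soft.
            out.append((word[i], False))
            i += 1
    return out

def split_word(word):
    """Return (prefix, stem, suffix), or None if the hard glyphs are not contiguous."""
    gs = glyphs(word)
    text = lambda part: "".join(g for g, _ in part)
    hard_positions = [i for i, (_, hard) in enumerate(gs) if hard]
    if not hard_positions:
        # All soft: no stem; by convention here the word is returned as prefix.
        return (text(gs), "", "")
    first, last = hard_positions[0], hard_positions[-1]
    if any(not hard for _, hard in gs[first:last + 1]):
        return None
    return (text(gs[:first]), text(gs[first:last + 1]), text(gs[last + 1:]))
```

For example, split_word("qokeedy") gives ("qo", "kee", "dy"), while split_word("okoke") gives None, because a soft o sits between two hard characters.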

Jorge Stolfi later found that the distribution of the 'hard' characters seems to be governed by a strict and simple rule. First of all, they can include something that has been called a pedestal or a plateau (i.e. typically the character ch) and independently of that they can include a gallows character ( t k p f ). The result is explained at his >> web site at a location I still need to check.

3.2.2 Fine structure

Later, Stolfi analysed a 'fine structure' of words in the Voynich MS. This is also known as the 'OKOKO' paradigm. It is also explained in detail at >> a page on his web site.

3.2.3 Word grammar

Most of the features found by Jorge Stolfi were later combined into what he calls the grammar of Voynich words. It is the most complete and most accurate breakdown of the word structure into a set of rules, and includes core, mantle and crust characters. It is explained in great detail on >>this page at his web site, while the >>formal grammar definition is here. The interested reader is advised to study this page directly, as it is quite difficult to summarise.

3.3 Hidden Markov Modelling

This topic will be described on a later page but its results also reflect on the word structure.

3.4 Summary

The results presented on this page are critically important for anyone interested in translating the text of the Voynich MS. The fact that structures like the ones introduced on this page exist tells us that the MS text is not one that was encrypted from an Indo-European plain text using the type of encryption available in the early 15th Century. Any tentative solution working along these lines will necessarily fail.

The exact word structure has not been identified definitively. This page shows several paradigms, and in general one may observe that the simpler ones 'cover' or 'explain' a smaller percentage of the word types in the MS, while the more complicated ones cover a larger percentage.

The word structure is also likely to completely explain the anomalously low entropy values of the Voynich MS text, though what is cause and what is effect is not yet fully understood.


