Creating word lists for Newton OS 1.x

This page describes how to create new system-wide recognition word lists ("dictionaries") on Newton OS 1.x. I have been using my own Slovene word list created using this method since 2012, so I am finally describing it in the unlikely case someone else needs it. If you need my help in creating a word list, contact me.

Back up your Newton before you try this. All of this is undocumented, so things can go wrong.

Officially, there is no way to do this, and while Apple promised to release information on "dict" parts, they never did. Paul Guyot did reverse engineer the word list format for Newton OS 2.x, but it is unknown whether that is compatible with Newton OS 1.x.

That is not quite true, however. The operating system has functions for creating a new word list and adding words to it, and with one simple trick, the word list becomes usable in all applications.

In Newton Toolkit, create a new "auto" part and use the following text file:

constant TheID := 199;

DefConst('TheDict, LoadDataFile("Data.bin", 'AirusA));

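InstallScript := func(partFrame, removeFrame)
begin
	NewDictionary(nil);	// the returned ID (200 or higher) is discarded
	dictionaries[Length(dictionaries) - 1].dictID := TheID;
	SetDictionaryData(TheID, TheDict);
	ReadDomainOptions();
end;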

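RemoveScript := func(removeFrame)
begin
	// Presumed cleanup (the reverse of InstallScript); not guaranteed correct.
	DisposeDictionary(TheID);
	ReadDomainOptions();
end;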

First, we call NewDictionary. The Newton Programmer's Guide says its parameter should be 'custom, but the function doesn't seem to care what it is given. It returns the new word list's ID, which is curiously always 200 or higher (the word lists in ROM have much lower IDs). We discard the returned value, because it turns out that the ID is the only thing that distinguishes a "custom" word list from a system-wide one, and we are about to change it. (The status slot, contrary to what Paul Guyot says, does not seem to matter on OS 1.x.)

Knowing that the new word list will get added to the end of the global dictionaries array, we simply change its ID to an integer below 200. A safe value is 199, but having multiple word lists with the same ID is probably not a good idea, as the ID is then used to refer to the word list.

We then call SetDictionaryData so that the word list uses the data stored in the package (here loaded from the Data.bin file into a binary object called TheDict). Normally, the word list would reside in RAM, which wastes heap space.

Finally, we call ReadDomainOptions, which, according to Q&A, tells the recognition system to forget any hints it may have stored, forcing it to recheck viewFlags when necessary. As Paul Guyot says, the domainType slot is the same as the viewFlags slot on views, and in this case specifies where the recognizer should consider this word list. By default, this is vCharsAllowed, which is usually what you want. You can also add other flags, for example vNameField if the word list is a list of proper names.

The RemoveScript should be self-explanatory. Note: I do not guarantee its correctness.

How do you generate the Data.bin file? The only way I know of is to create a temporary word list in RAM, transfer it to your desktop somehow, call DisposeDictionary on it (as you don't need it anymore), compile the package, and download it. This means you must have enough free space in the heap to hold the whole word list; 20K should be enough for a few thousand heavily inflected words.

Assuming you already have a list of words you'd like to generate a word list from, one way to add them to the word list is to programmatically generate a list of calls to AddWordToDictionary and execute a screenful of them at a time in an NTK Inspector connection (do not execute all of them at once as the Newton might not be able to handle it):

vars.TempDict := NewDictionary(nil);
AddWordToDictionary(TempDict, "aardvark");
// ... thousands of words later...
AddWordToDictionary(TempDict, "zygote");
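
The calls themselves can be generated on the desktop. The following Python sketch is one possible way to do it and is not part of the method described above: the function names, the batch size of 20 and the escaping rules are all assumptions. It only escapes backslashes and double quotes, so non-ASCII letters may need extra handling depending on how NTK reads the text.

```python
def escape_newtonscript(word):
    """Escape backslashes and double quotes for a NewtonScript string literal."""
    return word.replace("\\", "\\\\").replace('"', '\\"')


def batches_of_calls(words, batch_size=20, dict_var="TempDict"):
    """Return a list of text blocks, each holding up to batch_size calls.

    Empty entries in words are skipped. dict_var is the name of the
    variable that holds the temporary word list's ID on the Newton.
    """
    calls = [
        'AddWordToDictionary({}, "{}");'.format(dict_var, escape_newtonscript(w))
        for w in words
        if w
    ]
    return [
        "\n".join(calls[i:i + batch_size])
        for i in range(0, len(calls), batch_size)
    ]


# Example: print each batch separately so it can be pasted into the
# Inspector and executed one screenful at a time.
for number, batch in enumerate(batches_of_calls(["aardvark", "zygote"]), 1):
    print("// batch {}".format(number))
    print(batch)
    print()
```

Keeping the batches small follows the advice above about not executing everything at once; the right size for a given Newton has to be found by trial.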

Then you can write a function to dump the dictionary in hexadecimal and write a program on your desktop to create a binary Data.bin file from it to use in the package. That is left as an exercise for the reader.
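
For the desktop half of that exercise, a minimal Python sketch might look like the following. It assumes the dump consists of plain hexadecimal digits (two per byte), optionally separated by spaces or line breaks; that format depends entirely on how you write the dump function on the Newton. The names hex_to_bytes and write_data_bin and the input file name are hypothetical.

```python
def hex_to_bytes(hex_text):
    """Convert a hexadecimal dump into bytes, ignoring all whitespace."""
    digits = "".join(hex_text.split())
    # bytes.fromhex raises ValueError on an odd number of digits
    # or on any character that is not a hexadecimal digit.
    return bytes.fromhex(digits)


def write_data_bin(hex_path, bin_path="Data.bin"):
    """Read a hex dump from hex_path and write it as a binary file.

    Returns the number of bytes written.
    """
    with open(hex_path, "r", encoding="ascii") as src:
        data = hex_to_bytes(src.read())
    with open(bin_path, "wb") as dst:
        dst.write(data)
    return len(data)
```

Calling write_data_bin("dump.txt") would produce Data.bin in the current directory. Comparing the returned byte count with the length of the binary object on the Newton is a cheap sanity check before compiling the package.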

