Manuscript Transcription Tool

Welcome to Glyph Machina!

What is Glyph Machina?

Glyph Machina is a model that combines convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) to transcribe medieval Latin legal manuscripts for research and pedagogy. It takes insular legal manuscripts from 1150-1500, written in abbreviated Latin documentary hand, and produces readable Latin text. This means that it does not produce a diplomatic transcription; rather, it silently expands the abbreviations in the style of most Selden Society and Ames Foundation transcriptions of these rolls. It does not currently work well on other languages.

Below is a step-by-step introduction to using the tool, as well as tips for getting the best results for your project.

1. Choose the membrane you want to transcribe and download the image onto your own computer. A few tips:

a. This model is trained on Latin only. It will not work well on other languages.

b. The model is trained on manuscripts from the Anglo-American Legal Tradition digitization of the National Archives in Kew, so images taken from there will often work best.

c. Feel free to try other sources, as well as images you have taken yourself, but make sure the image quality is high enough for the model to read. A good rule of thumb: if you can make out the letters, the model will be able to make them out too.

d. Here are some examples of hands that the model is explicitly trained to handle (the example at the bottom right is an edge case in image quality):

Examples of hands that work best with the model

2. Upload your image to the model. Drag and drop works here.

3. Crop the image to include only the text you want to start with. The model works best when you crop to a full paragraph or case and every line is complete in the image (neither letters nor sentences are cut off).

a. Crop out marginalia and bring the crop as close as you can to the edge of the text. This might mean taking smaller chunks of text one at a time.

b. Crop out any non-manuscript space (like the desk in the recording below).

c. Use the squares at the edge of the image to bring the edges inward like so:

d. You can try again by pressing "Revert Crop."

e. Sometimes it can work best if you crop it on your computer first and then upload the cropped image, because you can adjust the tilt of the image and the contrast (see Troubleshooting, below).

4. Click "Identify Lines."

a. If the lines are not correct, you can use the "Edit Lines" tool to add, change, or delete lines. To add a line, click the start of the line, then click each inflection point along it, and press "Finish Line" when you are done with that line.

b. If you correct the lines, make sure to press "Save Edit."

5. Click "Extract Text." This will run the model and give you what the model was able to extract from the image. If you like, you can stop here, as this is the last step in which the model refers directly to the manuscript itself. The lines are numbered for easier reference.

6. If you want, you can run the spelling and grammar check, which corrects the extracted text based on the model's knowledge of medieval legal Latin.

Spell and Grammar Check button interface

a. Please note that this step does not refer back to the manuscript. Like an English grammar check, it knows the likelihood of certain spellings and grammatical formulae and substitutes where it finds something very unlikely. This means that it can correct some mistakes that the model made, but it can also introduce new ones. We have found that, on average, the checked text has a 3-5% higher accuracy rate than the extracted text.

Example of fixing mistakes in spell check

b. Be patient. This step usually takes at least 30 seconds and can take up to 5 minutes. Do not refresh.

7. If you want, you can run the translator by clicking "Contemporary English." This works on the same principle as Google Translate and is not meant to be used for research, but it can give students still learning Latin a quick check of the content.

a. Be patient. This step usually takes at least 30 seconds and can take up to 5 minutes. Do not refresh.

8. Once you have completed all the steps you want to, you can click back and forth to compare any two steps. For example, you could compare the spell-checked text to the manuscript, or the translation to the extracted text. This way you can make the decision about which is most useful to you, and learners can put the transcription and the manuscript side-by-side.

9. You can download the files from all steps to your computer at any time by clicking the "Download All Files" button at the top of the page.
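
If you prefer to compare two steps offline after downloading them, the comparison described in step 8 can be approximated with a short script. The following is only a sketch using Python's standard `difflib` module. It assumes that both texts have the same number of lines in the same order, and the function names and the sample Latin are invented for illustration; they are not part of Glyph Machina.

```python
import difflib


def word_changes(before_line, after_line):
    """Return (old_words, new_words) pairs for each place two lines differ."""
    before_words = before_line.split()
    after_words = after_line.split()
    matcher = difflib.SequenceMatcher(a=before_words, b=after_words, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((" ".join(before_words[i1:i2]),
                            " ".join(after_words[j1:j2])))
    return changes


def compare_steps(extracted, checked):
    """Map 1-based line numbers to the word-level changes on that line.

    Assumption: line N of `extracted` corresponds to line N of `checked`.
    """
    report = {}
    pairs = zip(extracted.splitlines(), checked.splitlines())
    for number, (old, new) in enumerate(pairs, start=1):
        changes = word_changes(old, new)
        if changes:
            report[number] = changes
    return report


# Invented sample text, standing in for two downloaded steps.
extracted_text = "Johannes de Bello Campo queritur\nversus Ricardum de placto terre"
checked_text = "Johannes de Bello Campo queritur\nversus Ricardum de placito terre"

for line_number, changes in compare_steps(extracted_text, checked_text).items():
    for old, new in changes:
        print(f"line {line_number}: {old!r} -> {new!r}")
```

Every change the script reports still needs to be checked against the manuscript image, since the spelling and grammar check does not look at the manuscript.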

If you're still having trouble with any of these steps, our Troubleshooting guide may be able to help.
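
If you want to measure how the spelling and grammar check (step 6) performs on your own material, one common approach is to compare each output against a transcription you have corrected by hand. The sketch below computes a character-level accuracy from the Levenshtein edit distance. This guide does not say how its own 3-5% figure was measured, so treat this metric as an assumption; it is not the tool's definition. The function names and sample words are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def character_accuracy(reference, hypothesis):
    """1 minus the edit distance divided by the reference length.

    Can drop below zero if the hypothesis is much longer than the reference.
    """
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 1 - edit_distance(reference, hypothesis) / len(reference)


# Invented example: a hand-corrected reading versus two tool outputs.
reference = "placito terre"
print(round(character_accuracy(reference, "placto terre"), 3))
print(round(character_accuracy(reference, "placito terre"), 3))
```

Running this on a handful of lines from your own manuscripts, once for the extracted text and once for the spell-checked text, gives a rough sense of whether the check helps for your hand and period.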

1. Raw Image

Crop image to a flag shape (wider than tall).

Drag edges to define text you want to transcribe (less is more)
Exclude non-manuscript space and marginalia
Crop on your own image editor before uploading for best results
When you are satisfied with your crop, press "Identify Lines"
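
If you crop in your own image editor or in a script before uploading, a quick check of the crop rectangle can catch the most common problems listed above. This is a minimal sketch in plain Python. The `CropBox` type and `crop_warnings` function are hypothetical helpers and do not belong to the tool, and the only rule taken from this guide is that the crop should be wider than it is tall.

```python
from typing import NamedTuple


class CropBox(NamedTuple):
    """Pixel rectangle: (left, top) is inclusive, (right, bottom) exclusive."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self):
        return self.right - self.left

    @property
    def height(self):
        return self.bottom - self.top


def crop_warnings(image_size, box):
    """Return a list of human-readable problems with a proposed crop."""
    image_width, image_height = image_size
    if box.width <= 0 or box.height <= 0:
        return ["crop box is empty or inverted"]
    warnings = []
    if (box.left < 0 or box.top < 0
            or box.right > image_width or box.bottom > image_height):
        warnings.append("crop box extends beyond the image")
    if box.width <= box.height:
        warnings.append("crop is not wider than tall; try a smaller chunk of text")
    return warnings


# Invented dimensions for a 4000 x 3000 pixel photograph.
print(crop_warnings((4000, 3000), CropBox(200, 400, 3200, 1500)))
print(crop_warnings((4000, 3000), CropBox(200, 400, 1200, 2800)))
```

The check says nothing about tilt, contrast, or whether a line of text is cut off, so it complements the visual inspection described above and does not replace it.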

2. Image with Underlines

Edit lines

If the model has missed any lines, press "Edit Lines"
Add lines by pressing "Draw Lines," clicking on all inflection points, and pressing "Finish Line"
Move lines by pressing "Select/Move" and dragging the line
Delete lines by pressing "Delete" and clicking on the line you wish to delete
When finished, press either "Save Lines" to save your changes or "Cancel Edit" to revert to the model's lines
If you want to draw the lines yourself in a PageXML editor program, you can do so by downloading and uploading the documents in this step.
When you are satisfied, press "Extract Text"
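
If you work with the line data outside the tool, it can help to inspect the baselines before uploading them again. PageXML files usually store each text line as a `TextLine` element containing a `Baseline` whose `points` attribute lists `x,y` pairs. The sketch below reads those pairs with Python's standard library. The exact layout of the files this tool exports is an assumption here, the sample document is invented, and `read_baselines` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented sample; real files carry more metadata and region coordinates.
SAMPLE_PAGE_XML = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="membrane.jpg" imageWidth="3000" imageHeight="1100">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Baseline points="120,210 900,198 1800,205 2850,214"/>
      </TextLine>
      <TextLine id="l2">
        <Baseline points="118,320 2845,331"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix so any PageXML schema version matches."""
    return tag.rsplit("}", 1)[-1]


def read_baselines(xml_text):
    """Map each TextLine id to its baseline as a list of (x, y) points."""
    root = ET.fromstring(xml_text)
    baselines = {}
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        for child in element:
            if local_name(child.tag) == "Baseline":
                pairs = child.get("points", "").split()
                # Assumes integer pixel coordinates.
                points = [tuple(int(v) for v in pair.split(",")) for pair in pairs]
                baselines[element.get("id")] = points
    return baselines


for line_id, points in read_baselines(SAMPLE_PAGE_XML).items():
    print(line_id, len(points), "points, from", points[0], "to", points[-1])
```

Each point after the first corresponds to an inflection point of the kind you click when drawing a line by hand, so a line with only two points is a straight segment.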

Click and drag to draw lines. Right-click to delete a line.
