Starting in August 2019, the joint project of historians, linguists and computer scientists will examine the importance of the imperial city of Nuremberg for the exchange of information in the Holy Roman Empire and research the contribution of this municipal chancellery to the development of the New High German written language. An important component is the computer-based document analysis. In order to work effectively with the large text corpus, the automatic handwritten text recognition (HTR), writer identification or other methods must be of very high quality.
Automatic HTR made major improvements in the last couple of years. An essential reason is the use of deep learning methods. The applicants have already gained initial experience in a pilot project on the oldest book of the Nuremberg letters of correspondence (1404-1408). In order to use the existing technology optimally, however, more extensive training data than the previous database is necessary. Therefore, the project develops best practices to create sufficient data quickly and cost-efficiently. On the basis of our preliminary work, we want to achieve this goal by combining HTR methods with lower-quality transcriptions. The proven method can then be applied to other sources.