Research Bulletin – April / May 2021

Hello again all,

Sorry for the delay in this research update from Canal-side Towers; a few busy projects have required my attention. Sometimes you can only do so much, and these definitely are such times.

The Disabled People’s Archive

One of the recent additions has been the Judy and Paul Hunt Collection. While the pandemic closures have led to a backlog of boxes here in the Towers, the upsides are that there is less carpet to keep clean, and one gets more of a preview than usual. It’s a modest but powerful collection of family papers in five boxes – and such treasures! Then I noticed that Paul Hunt (1937-1979) didn’t yet have a Wikipedia page, which I felt was quite wrong, so recently I’ve made a start on one, based on some of these papers. Please feel free to add to it and correct it – anyone can, as it is open source and a community creation. And many thanks to Judy Hunt for creating this new Collection.

Access for all, and hybrid files

This one is a bit more tech-y.

One of my recent interests has been to try to find a single format for a digital file that can be used for sharing copies of archived born-analogue papers in a fully accessible manner.

At the moment we often use multiple file formats for the same record to cover different access requirements (usually PDF and Word). This isn’t efficient, and it risks the multiple files for one document becoming separated, which would compromise their accessibility.

Previously I’ve used OCR (optical character recognition) to try to improve the accessibility of PDF copies, but the accuracy is only around 80% and this isn’t good enough. In time the quality of OCR programs and apps may improve enough, but for now another method is still needed. Born-analogue documents with columns, text boxes and other complex layouts can confuse OCR programs, with the exception of some commercial ones (eg those used to scan in old copies of city directories while skipping headers, ads, side-bars, etc).
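
For anyone wanting to put a number on OCR accuracy for a particular document, one option is to compare the raw OCR text with the manually corrected text, character by character. The following is only a minimal Python sketch of that idea, and not the method behind the 80% figure above. The function name character_accuracy and the sample strings are made up for illustration.

```python
from difflib import SequenceMatcher


def character_accuracy(ocr_text: str, corrected_text: str) -> float:
    """Share of the corrected text's characters that the OCR text reproduced, in order."""
    if not corrected_text:
        # Nothing to recognise: perfect only if the OCR output is empty too.
        return 1.0 if not ocr_text else 0.0
    # autojunk=False stops difflib from ignoring very common characters in long texts.
    matcher = SequenceMatcher(None, ocr_text, corrected_text, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(corrected_text)


if __name__ == "__main__":
    # Hypothetical sample: OCR has misread "ss" as "55" and "ll" as "11".
    corrected = "Access for all"
    ocr = "Acce55 for a11"
    print(f"Character accuracy: {character_accuracy(ocr, corrected):.0%}")
```

A score like this only counts matching characters. It says nothing about reading order, which is where columns and text boxes tend to cause the most trouble.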

One approach I am testing is to append the PDF document to the Word document, and save the resulting file as a single, larger PDF. The Word document is usually created by starting with an OCR version, then manually correcting the errors by checking against the original item. This can be called a hybrid file format.

This method works in that it produces a single file, and for visually impaired people the navigation seems to work best when the plain text (from Word) is at the start of the file and the imaged pages of the original document follow afterwards.
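
The hybrid file can also be assembled with a short script instead of by hand. The sketch below is one possible way of doing it, not a description of the workflow used here. It assumes the corrected Word text has already been exported to its own PDF, and that the third-party pypdf package is installed. The function names and file names are hypothetical.

```python
from pathlib import Path


def hybrid_part_order(text_pdf: str, scan_pdf: str) -> list[str]:
    """Put the corrected plain text first and the page images of the original second."""
    return [text_pdf, scan_pdf]


def build_hybrid_pdf(text_pdf: str, scan_pdf: str, output_pdf: str) -> None:
    """Join the two PDFs into one hybrid file, text first."""
    parts = hybrid_part_order(text_pdf, scan_pdf)
    for part in parts:
        if not Path(part).is_file():
            raise FileNotFoundError(part)

    # Assumption: pypdf (third-party) is available; it is not named in this bulletin.
    from pypdf import PdfWriter

    writer = PdfWriter()
    for part in parts:
        writer.append(part)  # adds every page of this file, in order
    writer.write(output_pdf)
    writer.close()


if __name__ == "__main__":
    # Hypothetical file names for a single archived record.
    try:
        build_hybrid_pdf("record-text.pdf", "record-scan.pdf", "record-hybrid.pdf")
    except FileNotFoundError as missing:
        print(f"Missing input file: {missing}")
```

Whether alt-text and other accessibility information survive a merge like this would need checking with a screen reader before relying on it.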

Another approach to creating a single file is to use a photographic image of each sheet of the archived paper, along with an alt-text description, and then add the plain text (from Word) into this combined, larger PDF. Happily the PDF format retains items such as alt-text descriptions of images. Whether this is an improvement on the first method remains to be seen.

Perhaps one of the long-term solutions will be made possible by the screen-reading programs such as JAWS that now sometimes include their own OCR function. However, having a good copy of the digital text already embedded within each file does also help with the search function used as a finding aid by archivists and researchers, and by search engines.

One current drawback with using PDFs for these hybrid files is that JAWS requires the reader to manually command a next-page at every step to read the full PDF, in comparison with Word where JAWS will ignore page breaks and read it out continuously (thanks Linda for pointing this out).

All comments and pointers gratefully received,

Keep staying safe,


PS – some people may find this of interest:

