About the Project
The 'Banks Digitisation Project' is about improving intellectual access to a significant personal archive. The aim of the Project has been to digitise the papers of Sir Joseph Banks and make them available on the Internet, indexed so that the user can gain quick access to a single document or to a related series of documents. The Project has also been an opportunity to experiment with applying a new technology to archives and manuscripts collections. Most imaging projects to date have concentrated on printed material, or on single items or treasures, rather than on interrelated accumulations of documents.
The Project has been made possible by the Sir Joseph Banks Memorial Fund, which dates back to 1905 and was the initiative of J.H. Maiden, FRS. Funds were raised through public subscription and the sale of Maiden's book, Sir Joseph Banks: the father of Australia (1909), and have been vested in the State Library since 1946. They were to be used towards editing, publishing and distributing the Banks Papers.
The Library previously used the Fund to publish Banks' journal on the Endeavour in 1962, to subsidise Averil Lysaght's Joseph Banks in Newfoundland and Labrador in 1971, and to support the publication in 1979 of Banks' correspondence, mainly held in the Sutro Library, California, relating to sheep and wool.
Acquisition of the papers
The Project is limited to those papers of Sir Joseph Banks which are held in the Mitchell and Dixson collections at the State Library of New South Wales, in Sydney, Australia. These amount to approximately 10,000 manuscript pages and include correspondence, principally letters received, but also reports, invoices and accounts, journals, plus a small quantity of maps, charts and watercolours.
While many of these documents have been published before in various works, the collection is published here for the first time in facsimile, and is extensively indexed.
When Sir Joseph Banks died in 1820, he left behind a well organised archive which documented his influential career. It was the most comprehensive archive of its kind in Britain, and perhaps in the world.
The custodial history of the archive is confused, but essentially the dismemberment of the archive, which resulted in the loss of much information, began in the 1880s when Lord Brabourne, Banks' collateral descendant, started selling the papers.
The main consignment of Banks papers held in the Mitchell Library was acquired in 1884. The papers did not come to the Library immediately but were first used in compiling the History of New South Wales from the records, the first volume of which appeared in 1889, published by the New South Wales Government Printer. They were used again for the multi-volume Historical records of New South Wales, which appeared from 1892 until 1901, also issued by the Government Printer. The papers, which had already been reorganised earlier, were annotated and rearranged during this process. They were transferred to the Mitchell Library when it opened as the Australiana collection in 1910, and have become known as the Brabourne collection, a name which obscured their origin as Banks' own archive.
Banks papers, including the Endeavour journal, were also included in the bequests of the Library's founding benefactors, David Scott Mitchell and Sir William Dixson.
In the ensuing 100 years, numerous smaller accessions have been made as estrays from the now far-flung Banks archive appeared in auction sales and dealers' catalogues. The latest acquisitions have been a single letter written by the Comte de Lauraguais, acquired in 1989 for $7,500, and, most recently, a single letter written by George Caley, acquired in 1996 for $850.
Organisation of the papers
The papers were rearranged to reflect the way Banks used and accumulated them. In some cases this was fairly straightforward; in others it was very complex. We know that Banks kept papers relating to specific activities together, often under titles he assigned himself, a practice which he began quite early.
Papers relating to the equipping of the Bounty voyage of William Bligh, for example, are grouped together under Banks' own title 'Plan for the Voyage with Letters from various persons who interferd in the management of it'. Banks himself grouped the letters he received from Bligh during the Bounty voyage under the title 'Correspondence Bounty'. Papers relating to the equipping of the Investigator voyage under Matthew Flinders are grouped together under the heading 'Correspondence relating to the fitting out of the Investigator for a voyage of discovery'. Banks kept these separate from another series titled 'Correspondence with Matthew Flinders'.
These archival series, of course, retain their existing arrangement. The only change has been the reinsertion of some fugitive documents which had been removed by previous owners. None of the series represented in this Project can be considered definitively complete, however, given that Banks' papers have been dispersed and are now held by many different institutions worldwide.
It seems probable, from information contained in several documents in the collection titled 'Index to names of writers', that Banks often numbered his correspondence and arranged it in volumes, alphabetically by correspondent's name and, within each name, chronologically in order of receipt. He also separated foreign correspondence from domestic correspondence. Each volume covered a single year, after which it was closed off and a new one started. Many of the letters he received from his numerous correspondents also carry a folio number from these volumes, written in ink in the top right-hand corner. Sometimes a single document will carry three or four folio numbers, variously added by Banks or a clerk, the Government Printer, the Library or previous owners.
It is difficult to determine how far this arrangement extended across Banks' entire archive, and for this reason it is not possible to replicate his original arrangement entirely. A decision was therefore taken to arrange letters that are not clearly part of a series devised by Banks into series by correspondent, with the letters in each series arranged chronologically. This practical arrangement approximates Banks' probable original arrangement.
Parts of the papers have become known to researchers by shelf location numbers which have appeared in published citations but no longer apply under the new arrangement. The previous shelf location number of every document has therefore been recorded for inclusion on the Internet.
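The fallback arrangement (one series per correspondent, letters in date order within each series) amounts to a two-key sort followed by grouping. The Python sketch below illustrates this under stated assumptions: the `Letter` record and the `arrange_into_series` function are hypothetical, and correspondent names are assumed to be already normalised to a single authorised form, since variant spellings would otherwise split one correspondent across several series. It does not model Banks' own yearly volumes or folio numbering.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass(frozen=True)
class Letter:
    # Hypothetical record: assumes names are already normalised
    # to one authorised heading, e.g. "Caley, George".
    correspondent: str
    written: date


def arrange_into_series(letters: list[Letter]) -> dict[str, list[Letter]]:
    """Group letters into one series per correspondent.

    Series come out in alphabetical order of correspondent, and the
    letters within each series are in chronological order.
    """
    ordered = sorted(letters, key=lambda letter: (letter.correspondent, letter.written))
    return {
        name: list(group)
        for name, group in groupby(ordered, key=lambda letter: letter.correspondent)
    }
```

Because `groupby` only merges adjacent items, the sort must come first; sorting on the pair (correspondent, date) produces both the grouping and the chronological order in one pass.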
On the surface, at least, there seemed to be various options for the Project, including transcription of the papers. This had some appeal because of the perceived potential for full text retrieval of the documents. The cost, however, proved prohibitive: around $50,000 for an unedited first draft, to be completed by clerical staff rather than by people with any specific knowledge of the period or of 18th century manuscripts.
Transcription would also marginally increase storage requirements, because both the original document in facsimile and the transcription would need to be stored. Transcription is the option taken by a separate digitisation project with which the State Library has had useful contact. This project, at the University of Bergen in Norway, involves digitising the archive of the philosopher Ludwig Wittgenstein. The Banks Archive Project at the Natural History Museum in London, which I visited recently, is still undecided about hard copy transcription of Banks' correspondence.
Transcription has not been our approach except in the case of the 1,200-page journal kept by Banks on Cook's Endeavour voyage. The complete text of the transcription may be downloaded. Only the journal appears both in facsimile and in transcription, partly because the State Library owns copyright in Beaglehole's transcription of the journal, and partly because of the length of the document: our indexing is to document level only, not to the page within a document, and indexing at that level is of little help in a document of 1,200 pages.
The Endeavour transcription can readily be compared with the original, allowing researchers to judge its accuracy for themselves rather than relying solely on the transcriber's interpretation. A further point about transcription is the desirability of transcribing documents exactly as found, preserving idiosyncrasies of spelling, punctuation and so on, since these details may convey information important to the history of the text.
I have dwelt at length on the transcription option, even though the Banks Digitisation Project decided against it, because it raises important issues. In a project such as this it can reasonably be taken for granted that relying on free text searching of document transcriptions, quite apart from the cost of producing the transcriptions in the first place, is neither efficient nor effective for searching a body of data of any size, particularly when searching for conceptual information. Anyone searching the Endeavour journal in this Project will quickly realise what I mean. A lot of indexing work remains to be done on the journal to make up for the inadequacies of free text searching in a document such as this.
A concept which is not mentioned literally in a document is overlooked in a free text search, even though the subject may have been discussed at length. Terms used commonly in indexing or by current researchers are not necessarily found in the documents themselves. Good examples from the Banks papers are the terms First Fleet and Rum Rebellion, both later names that were unknown when the events occurred and are therefore not found in documents created at the time.
William Bligh's emotive reference to John Macarthur as 'that extraordinary Hydra of New South Wales', referring to Macarthur's role in the Rum Rebellion, would be completely missed in a free text search on the name Macarthur.
Language and spelling change over time. For example, the development and use of chronometers for determining longitude easily and accurately in navigation is obscured by 18th century references to timekeepers; the term chronometer came into use much later.
The spelling of personal names was often inconsistent in the past. William Bligh is often referred to as Blythe. Arthur Phillip appears variously as Phillip, Philip or Phillips.
A free text search may oblige the searcher to scan many hundreds of entries with no indication of which aspect of the topic is referred to. For example, a free text search on James Cook will find all references to Cook but will not distinguish between Cook's first, second and third voyages. Similarly, a free text search on William Bligh will not distinguish between the first and second breadfruit voyages, the mutiny, his governorship of New South Wales, the Rum Rebellion, or his various other naval commands.
Free text searching cannot accommodate changes in place names, such as Van Diemen's Land and Tasmania, or New Holland and Australia, let alone the many less obvious examples which occur in the Banks Papers.
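The contrast between the two approaches can be sketched in a few lines of Python. Everything here is hypothetical and invented for illustration: the three-document `CATALOGUE`, its headings (loosely modelled on authority-style name headings), and the `SEE_REFERENCES` table that maps variant forms to one authorised heading. Free text search only matches words that literally occur in a document, while the subject search goes through assigned headings and cross-references.

```python
# Hypothetical miniature catalogue: texts and headings are invented.
CATALOGUE = {
    "doc-1": {"text": "that extraordinary Hydra of New South Wales",
              "subjects": {"Macarthur, John", "Rum Rebellion"}},
    "doc-2": {"text": "Governor Philip and the ships at Botany Bay",
              "subjects": {"Phillip, Arthur", "First Fleet"}},
    "doc-3": {"text": "a survey of the coast of Van Diemen's Land",
              "subjects": {"Tasmania"}},
}

# Hypothetical see-references: variant or former names -> authorised heading.
SEE_REFERENCES = {
    "philip": "Phillip, Arthur",
    "phillip": "Phillip, Arthur",
    "phillips": "Phillip, Arthur",
    "macarthur": "Macarthur, John",
    "blythe": "Bligh, William",
    "bligh": "Bligh, William",
    "van diemen's land": "Tasmania",
}


def free_text_search(term: str) -> list[str]:
    """Match only words that literally appear in the document text."""
    needle = term.casefold()
    return sorted(doc_id for doc_id, rec in CATALOGUE.items()
                  if needle in rec["text"].casefold())


def subject_search(term: str) -> list[str]:
    """Resolve the term to an authorised heading, then match assigned subjects."""
    heading = SEE_REFERENCES.get(term.casefold(), term).casefold()
    return sorted(doc_id for doc_id, rec in CATALOGUE.items()
                  if heading in {s.casefold() for s in rec["subjects"]})
```

In this toy catalogue, a free text search for Macarthur, First Fleet or Tasmania returns nothing, while the subject search finds the relevant document in each case, including when the searcher types a variant spelling such as Phillips.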
Optical Character Recognition (OCR) was investigated but was never a real option for a manuscripts digitisation project of this nature. At its current stage of development, OCR simply could not be considered for 18th and 19th century manuscripts written in many different hands and styles, with 18th century spellings and so on. It may become an option in the future as machine recognition of handwriting improves and becomes more commonplace, but the disadvantages of free text searching would remain.
This left only one realistic option: manually creating a bibliographic database for the entire collection, based on two record types, one describing and indexing each series and the other describing and indexing each individual document within a series.
The series level description includes:
Series title, with date range.
Number of documents in the series
Provenance note, which describes the custodial history of each document in the series, including previous Mitchell and Dixson Library shelf location numbers.
Background note, equivalent to a Biographical Note or Administrative History.
Subjects: index entries for persons' names, ships' names, events and other topics common to the entire series. Very few documents can be fully understood by themselves; it is their accumulation into a meaningful context, the series, which provides the subject. The purpose of these entries, based mainly on ABN (Australian Bibliographic Network) authorities, is to describe the content common to each document within the context of the series.
In addition to the series level description, each individual document in the series is described. This document level description is what distinguishes the level of description and indexing required for this Project from our usual practice. Each document description includes:
Document title, including the date of the document
Notes with any additional information, eg about enclosures or the language in which the document is written
Author of document
Date of document
Title of the series to which the document belongs
Subjects, including additional persons' names, ships' names, events and other topics which are unique to the document described and which therefore do not duplicate subject entries at the series level.
All the fields, at series and document description level, are free-text searchable.
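The two record types can be sketched as a pair of linked records. This is an illustration only: the class names `SeriesRecord` and `DocumentRecord` and the `search` function are hypothetical, and free-text searchability is assumed here to mean a simple case-insensitive substring match over every text field. The fields follow the two lists above, and, as in the text, document-level subjects hold only what the series-level subjects do not already cover, so the two are combined at search time.

```python
from dataclasses import dataclass, field


@dataclass
class SeriesRecord:
    title: str            # series title, with date range
    document_count: int   # number of documents in the series
    provenance_note: str  # custodial history, incl. previous shelf locations
    background_note: str  # biographical note or administrative history
    subjects: list[str] = field(default_factory=list)  # common to every document


@dataclass
class DocumentRecord:
    title: str            # document title, including its date
    notes: str            # enclosures, language, etc
    author: str
    date: str
    series: SeriesRecord  # the series this document belongs to
    subjects: list[str] = field(default_factory=list)  # unique to this document

    def all_subjects(self) -> list[str]:
        """Series subjects apply to every document, so combine both levels."""
        return self.series.subjects + self.subjects

    def searchable_text(self) -> list[str]:
        """Every text field at document and series level."""
        s = self.series
        return [self.title, self.notes, self.author, self.date, *self.subjects,
                s.title, s.provenance_note, s.background_note, *s.subjects]


def search(documents: list[DocumentRecord], term: str) -> list[DocumentRecord]:
    """Case-insensitive substring search across both description levels."""
    needle = term.casefold()
    return [d for d in documents
            if any(needle in text.casefold() for text in d.searchable_text())]
```

Linking each document to its series record, rather than copying the series subjects into every document, keeps the series-level indexing in one place while still letting a search on a series subject retrieve each document in that series.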
This very detailed level of description and indexing is the minimum level considered necessary to take advantage of the high speed retrieval and multiple simultaneous access capabilities of digital technology. If anything less than this level of access were provided, the benefits of reproducing a collection on the Internet would be barely greater than those gained by producing a high quality microfilm.
An institution considering this type of project as a means of providing access to collections must be fully aware of the high level of commitment required, particularly in terms of staff time, and must decide whether this can be justified by the outcome. The complexity of the task should not be underestimated. The adage that the estimated time for completing such a project should be doubled, and then doubled again, could not be more true.
As more people visit the web site and contact the Library in response to it, it has become clear that expectations of the Project's indexing are higher than they would be for a non-digital indexing project. Somehow, because the technology is more sophisticated, there is an expectation that the indexing will be too.
The Banks Project took the option of scanning the manuscripts from microfilm rather than exposing the documents themselves to the scanning process. This decision had the advantage of providing a high quality microfilm for readers who request it, as well as a human-readable backup.
Both the microfilming, done to the level of 160 line pairs per millimetre, and the digitisation are being carried out by the Library's usual microfilm contractors, W. & F. Pascoe Pty Ltd. Pascoes generate the original G4 TIF files, uncompressed, as well as the converted, compressed and adjusted JPEG files which are then supplied to Discovery Media. The original documents are generally low in contrast, brown ink on cream paper, but are filmed in black and white, so the option to enhance the contrast on the microfilm was taken. Individual images were then optimised in Photoshop for further contrast, and sharpened. Each image file averages 120 kilobytes after scanning and compression.
Limited enhancement of varying degrees is applied during scanning to reduce verso bleedthrough, staining and other residues. Over-enhancement of heritage materials, however, raises the issue of trading the authenticity of the document for legibility. A project digitising American Civil War documents did rely heavily on digital enhancement to improve the legibility of documents written in ink diluted with water, a response to chronic wartime shortages. While this may be a valid option, it unquestionably changes the effect the documents create and destroys some of the information they imply and convey.
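The trade-off between legibility and authenticity can be illustrated with a generic linear contrast stretch on 8-bit grey values. This is not the procedure used by the contractors or in Photoshop. It is a minimal sketch of a common technique, and the function names and percentile parameters are hypothetical. Clipping at tighter percentiles gives stronger contrast, but it merges distinct tones into pure black or white, which is exactly the kind of information loss described above.

```python
def percentile_value(values: list[int], pct: float) -> int:
    """Return the value at roughly pct percent of the way through the sorted list."""
    ordered = sorted(values)
    index = round(pct / 100 * (len(ordered) - 1))
    return ordered[index]


def stretch_contrast(pixels: list[int], low_pct: float = 0.0,
                     high_pct: float = 100.0) -> list[int]:
    """Linearly remap 8-bit grey values so the chosen range fills 0..255.

    Values below the low percentile clip to 0 and values above the high
    percentile clip to 255, so tighter percentiles mean stronger
    enhancement but more distinct tones collapsed into one.
    """
    low = percentile_value(pixels, low_pct)
    high = percentile_value(pixels, high_pct)
    if high == low:
        return list(pixels)  # flat image: nothing to stretch
    scale = 255 / (high - low)
    return [min(255, max(0, round((p - low) * scale))) for p in pixels]
```

With the full range kept, the values 100, 125 and 200 become 0, 64 and 255 and stay distinct. Clipping the top quarter of the values 100, 110, 120, 130 and 200 instead yields 0, 85, 170, 255 and 255, so the two lightest tones can no longer be told apart.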
Another important variable in digital image preservation is the resolution of the image, measured in dots per inch (dpi). The higher the dpi, the clearer the image but the greater the storage requirement, unlike microfilm, whose storage needs are unaffected by resolution. As image storage is one of the major capital costs of digitisation, the images will be stored compressed.
For the Banks Project it was decided to scan at a minimum resolution of 100 dpi, which is quite low, but using 256 levels of grey rather than simply black and white. It is the 256 levels of grey that make the difference between the average facsimile transmission and these facsimile images. The small amount of pictorial material in the Banks collection, mainly watercolours, will be scanned in full colour.
After filming and scanning, the microfilms are returned to the Library to be linked with the relevant data.
Part of the value of the Banks Project undoubtedly lies in the opportunity it provides to subject the Banks papers to thorough analysis in an attempt to reconstruct the original archive as far as possible. The close and detailed description required, and the re-creation of series, have led to the identification and correction of long-perpetuated misattributions and incorrect dates.
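The storage consequences of resolution and grey depth can be worked through with simple arithmetic. The sketch below computes raw (uncompressed) image sizes, ignoring file headers and row padding, for a hypothetical 8 by 10 inch page; the page size and function names are assumptions. The 10,000-page and 120-kilobyte figures come from the text, while treating each manuscript page as one image file is also an assumption.

```python
def raw_image_bytes(width_in: float, height_in: float, dpi: int,
                    bits_per_pixel: int) -> int:
    """Uncompressed size of one scanned page, ignoring headers and padding."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def collection_kilobytes(pages: int, average_kb: int) -> int:
    """Total compressed storage, assuming one image file per page."""
    return pages * average_kb


if __name__ == "__main__":
    page = (8.0, 10.0)  # hypothetical page size in inches
    print(raw_image_bytes(*page, dpi=100, bits_per_pixel=1))  # 100000 (black and white)
    print(raw_image_bytes(*page, dpi=100, bits_per_pixel=8))  # 800000 (256 grey levels)
    print(raw_image_bytes(*page, dpi=300, bits_per_pixel=8))  # 7200000
    print(collection_kilobytes(10_000, 120))                  # 1200000
```

Whatever the actual page size, moving from black and white to 256 levels of grey multiplies the raw data by eight, and tripling the resolution multiplies it by nine, which is why compression matters so much for storage costs.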
The Banks Digitisation Project is a small part of the so-called 'fourth revolution' in the history of human communication: the transmission of digital information via computer networks.
What this means for the people and institutions, such as libraries and archives, that have traditionally worked with the written word is yet to be measured. Most obviously, readers log on to their computers to find the information they need rather than visiting libraries and archives in person, which takes pressure off already stretched on-site resources. On average there are 2,000 visits a month to the still incomplete Banks Project on the Internet. The Library would barely have had 2,000 visits to its Macquarie Street site to view the Banks papers on microfilm or in the original during the last 100 years. Digitisation of collections opens the institution to users who would find it difficult or impossible to visit. It also assists in the preservation of paper-based collections, sometimes priceless or fragile, by providing alternative access.
For the State Library, such ease of access represents a loss of control over material it has housed and conserved for many years. Nevertheless, access and preservation remain the Library's highest priorities. One option for retaining some control is to deliberately degrade the quality of the digital images.
The Internet is not a substitute for the selection of quality material performed by libraries and archives. The question for information providers becomes one of determining the most appropriate mode of delivery for information, from hard copy through to digitisation. Reference materials, for instance, work well in electronic format which, at its best, lends itself to refined searches and multiple simultaneous access.
Digitisation is still expensive and time-consuming, not least in terms of staff time, and reading from a computer screen is still not as comfortable as reading from a printed page. Used selectively, however, it is a process with the potential to enhance access to collections considerably.