Metadata (cataloguing information)
Consider the example of a scan of a pencil sketch. If presented online with no accompanying information, it might easily be mistaken for a 'born-digital' image made on a computer. Catalogue information, or metadata, is used to describe data, and it is important information in its own right.
Metadata is often defined as 'data about data'. In the digital world, metadata is usually text that describes something about the creation, content or context of an individual file or a collection of many digital files. Metadata relating to a digital resource can come from one of two sources: it can be automatically derived from the digital resource itself (e.g. when Windows tells you the size of a file) or it can be created and associated with a resource by human beings (e.g. the subject of a photograph). Metadata created by humans is the most difficult and time-consuming metadata to create, but it is also usually the most important.
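The two sources of metadata described above can be illustrated with a minimal Python sketch. The function names, the field names ('size_bytes', 'subject') and the sample description are assumptions made for illustration only; they are not part of any standard.

```python
import os
import tempfile
from datetime import datetime, timezone


def derive_file_metadata(path):
    """Return metadata that can be read from the file system with no human input."""
    info = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }


def describe(path, human_fields):
    """Combine automatically derived metadata with fields supplied by a person."""
    record = derive_file_metadata(path)
    record.update(human_fields)
    return record


# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix=".tif", delete=False) as handle:
    handle.write(b"12345")
    sample_path = handle.name

# The subject is something only a person can supply (hypothetical value).
sample_record = describe(sample_path, {"subject": "Pencil sketch of a harbour"})
os.remove(sample_path)
```

The file size and modification date cost nothing to collect, whereas the subject has to be typed in by someone who has looked at the image.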
Metadata might take the form of carefully controlled words selected from pre-prepared lists, or it might be a simple 'free text' description or a set of keywords used to informally 'tag' a file to make it easy to find. It might describe something objective and straightforward, such as the dimensions of a digital image; or something much more complex, like the subject matter of a video or the legal rights associated with its use. Metadata is often held within databases, but it can take other forms - it can just as easily be found embedded within a digital file itself. In short, metadata provides the means for us to describe our digital resources in a structured way that enables us to share those resources with other people or with machines.
Metadata is usually structured in some way. Rather than randomly selecting aspects of a digital file to describe, it is common to use a set of agreed-upon fields or 'elements' (e.g. 'Creator:', 'Title:', 'Subject:'); a set of these is known as a metadata schema. Also, rather than selecting terms at random to enter into these fields, we can choose terms from predefined lists (known as controlled vocabularies). Expressing terms in a consistent way (such as entering a person's name as 'Surname, Forenames') also helps us control our vocabulary.
This approach has several advantages:
- It makes it easier to create the metadata, since our pre-defined fields tell us which information needs to be collected and recorded.
- It makes it easier to understand the metadata, making things clear to a user.
- It makes it easier to retrieve an item in a search, since the search query can be much more specific, targeting relevant fields, rather than searching across everything.
- It also makes it easier to share data and metadata - as long as common categories and terms have been used by all parties.
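As a rough illustration of these points, the sketch below defines a tiny schema, enforces the 'Surname, Forenames' convention and runs a search against a single field. The three-field schema, the function names and the sample records are all hypothetical.

```python
# A hypothetical minimal schema using the example fields mentioned above.
SCHEMA_FIELDS = ("Creator", "Title", "Subject")


def format_name(forenames, surname):
    """Express a personal name consistently as 'Surname, Forenames'."""
    return f"{surname.strip()}, {forenames.strip()}"


def make_record(**values):
    """Build a record, rejecting fields outside the schema and flagging gaps."""
    unknown = set(values) - set(SCHEMA_FIELDS)
    if unknown:
        raise ValueError(f"Fields not in schema: {sorted(unknown)}")
    missing = [field for field in SCHEMA_FIELDS if field not in values]
    if missing:
        raise ValueError(f"Missing fields: {missing}")
    return {field: values[field] for field in SCHEMA_FIELDS}


def search(records, field, term):
    """Return records whose given field contains the term (case-insensitive)."""
    return [record for record in records if term.lower() in record[field].lower()]


records = [
    make_record(Creator=format_name("Ada", "Example"),
                Title="Harbour at dawn", Subject="Harbours"),
    make_record(Creator=format_name("Ben", "Harbour"),
                Title="Street scene", Subject="Streets"),
]

# Searching only the Subject field ignores the photographer named Harbour.
subject_hits = search(records, "Subject", "harbour")
```

A search across every field would return both records here, because the second creator's surname happens to be 'Harbour'; targeting the Subject field returns only the photograph that is actually about a harbour.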
We may want to describe individual resources (e.g. a photo or an audio file), but sometimes we may prefer to describe collections of resources (e.g. a photo album or a music album). Or we might wish to describe just a part of a larger whole (e.g. an illustration found within a published book or part of a song). The challenge for those creating metadata is to decide what needs to be described, how much detail to go into and how metadata will be organised.
Metadata can tell us what something is ('descriptive metadata'). It might tell us where a thing has come from, who owns it and how it can be used ('rights metadata'). It might describe how a digital file was created ('technical metadata'), how it is managed ('administrative metadata') or how it can be kept into the future ('preservation metadata'). Or it might help us to relate this digital file to others ('structural metadata'). Descriptive metadata is the type most likely to be searched and displayed to the public.
These distinctions are useful to keep in mind when you are describing a collection of your own. Which metadata categories will you need to include to make your data usable? (For example, a 'Collaborator:' field isn't required if you always work alone.) If you plan to use a repository service to provide access to your research, does it have mandatory metadata requirements? Some researchers can afford to spend hours creating metadata for each resource; others cannot.
See 'A recommended approach to metadata' for a suggested set of fields that you might want to use.
So where do we put all this information? Metadata for digital collections can be held in several different places: within the digital file itself, within a database or in separate (but associated) text files. Most people will make use of a database to hold their metadata. Word-processing software is not recommended, as it can be very difficult to move data from a Word document (for instance) to elsewhere in the future; a dedicated database is the better choice.
There are many types of cataloguing software available, offering very different levels of functionality and complexity. A key point is to check that an 'export' function is included. This means that, if you need to move your metadata at some point in the future, you can do so. If you have a particular repository in mind that will eventually hold your research data, talk to them about the form in which they would like to import your metadata (for example, this might be as an OpenOffice spreadsheet).
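To show what an 'export' function amounts to in practice, here is a minimal sketch that holds a few records in an SQLite database and writes them out as CSV text, a format most spreadsheet programs can open. The table name 'items', its columns and the sample rows are assumptions for illustration; real cataloguing software will have its own structure and export options.

```python
import csv
import io
import sqlite3

# An in-memory database standing in for a real catalogue (hypothetical layout).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE items (identifier TEXT PRIMARY KEY, title TEXT, creator TEXT)"
)
connection.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("NY-1-1", "Harbour at dawn", "Example, Ada"),
        ("NY-1-2", "Street scene", "Example, Ada"),
    ],
)


def export_items_csv(conn):
    """Return every row of the items table as CSV text with a header row."""
    cursor = conn.execute(
        "SELECT identifier, title, creator FROM items ORDER BY identifier"
    )
    header = [column[0] for column in cursor.description]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cursor)
    return buffer.getvalue()


csv_text = export_items_csv(connection)
```

Because the creator names contain a comma, the CSV writer wraps them in quotation marks, so the 'Surname, Forenames' form survives the move to another system intact.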
Keywords are a popular way of providing information about the subject-matter of digital resources. To be really useful, though, you should pick words from a master keyword list (a 'controlled vocabulary'). If possible, use a 'pick list' rather than a free text box to avoid input errors. A master list of keywords ensures that a concept is described using only 'authorised' terms.
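Where a pick list is not available, a similar effect can be approximated by checking typed keywords against the master list. The sketch below is one possible approach under stated assumptions: the master list and the function name are hypothetical, and the suggestions come from Python's standard difflib module rather than from any cataloguing tool.

```python
import difflib

# A hypothetical master keyword list (the 'controlled vocabulary').
MASTER_KEYWORDS = ("architecture", "bridges", "parks", "street scenes")


def check_keywords(entered, vocabulary=MASTER_KEYWORDS):
    """Split entered keywords into authorised terms and rejected ones.

    Rejected keywords are mapped to the closest authorised terms, if any,
    so that the person entering data can correct them.
    """
    accepted = []
    rejected = {}
    for keyword in entered:
        term = keyword.strip().lower()
        if term in vocabulary:
            if term not in accepted:
                accepted.append(term)
        else:
            rejected[keyword] = difflib.get_close_matches(term, vocabulary, n=3)
    return accepted, rejected


accepted, rejected = check_keywords(["Bridges", "parks ", "bridge"])
```

Differences in capitalisation and stray spaces are tidied up automatically, while 'bridge' is rejected because only the plural form is authorised.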
If you are organising your files into groups, your system of naming things should reflect these groups. Often, we use a short letter code as an abbreviation for an overall collection and numeric references for each group and item. For example:
I might use 'NY' as the code for the whole of my New York Photographs collection.
NY-1 is a group of photos taken on an initial trip.
NY-1-1.tif is the first photo taken on that initial trip.
NY-1-2.tif is the second photo taken on that initial trip.
NY-2 is a group of photos taken on a second trip.
NY-2-1.tif is the first photo taken on the second trip.
NY-2-2.tif is the second photo taken on that second trip.
A file naming convention is not essential, but it does help to keep things organised.
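One benefit of a consistent convention is that file names can be generated and read by a program. The sketch below assumes the collection-group-item pattern used in the New York example; the function names and the regular expression are illustrative assumptions, not a prescribed standard.

```python
import re

# Matches names such as 'NY-1-2.tif': collection code, group number, item number.
NAME_PATTERN = re.compile(
    r"^(?P<collection>[A-Z]+)-(?P<group>\d+)-(?P<item>\d+)\.(?P<extension>[a-z0-9]+)$"
)


def build_name(collection, group, item, extension="tif"):
    """Compose a file name from its collection code, group and item numbers."""
    return f"{collection}-{group}-{item}.{extension}"


def parse_name(file_name):
    """Split a file name back into its parts, or raise ValueError if it does not fit."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"Name does not follow the convention: {file_name!r}")
    parts = match.groupdict()
    return {
        "collection": parts["collection"],
        "group": int(parts["group"]),
        "item": int(parts["item"]),
        "extension": parts["extension"],
    }


second_trip_first_photo = build_name("NY", 2, 1)
parsed = parse_name("NY-1-2.tif")
```

A name that does not fit the pattern, such as 'holiday photo.tif', raises an error, which makes stray files easy to spot.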