Essential Metadata Summary

This document summarizes the essential metadata recommendations for family history media; the full recommendations are maintained in a GitHub repository. If this summary and the full recommendations disagree, the full GitHub recommendations apply.

Purpose

This recommendation covers embedding the family history metadata that is essential for consistently capturing, sharing, and preserving information about what an image depicts (the traditional “writing on the back of the photo”), and for keeping that information interoperable. The metadata is stored within the digital image itself in a machine-readable (non-visual) format.

There are thousands of metadata properties in use today, many of which overlap in full or in part with other properties and many of which are incompletely specified and used in inconsistent ways. The primary goal of this recommendation is to identify and clarify the meaning of a small subset of these existing properties which we have identified as essential to family history.

Overarching principles

In the broadest terms, these recommendations assume an implementation that adheres to the following general principles:

Don’t Remove Metadata. Even if the metadata is not in a form you understand or duplicates other fields, some other tool may have a use for it. Leave it in the file.
At Least These Fields. There are hundreds of metadata properties and fields available, of which we recommend using just a few. You are welcome to read and write as many of the other fields as you wish, as long as you also use the ones we recommend.
Only Embed What You Collect. You are only expected to embed family history data that you already collect from your users. You are not expected to collect any new data that is not already collected by your software. However, you may need to modify the format of the data you collect in order to align with these recommendations.

The Metadata Pipeline

To aid in implementing our standard and overarching principles (see Implementation Levels below), it helps to understand how we describe the stages of metadata interaction within a given software data pipeline (“system”). We define five stages of interaction that the metadata of user-generated, uploaded, imported, or contributed files (e.g. media) may experience in a software system:

RETAIN: The first thing a system must decide is whether to retain or strip the metadata already embedded in media that is uploaded, imported or added by a user. There are many reasons for a business to choose to strip the metadata, from reducing file size, to protecting privacy, to managing liability. By retaining metadata, the system makes no modification to the original file or its metadata and stores that original for the duration of its relationship with the user or owner of that file.
READ: A system may read some or all of the embedded file metadata into a separate file, database or other data structure. At this step, the metadata is no longer bound uniquely to the original file and can be manipulated by the system independently. Reading metadata does not imply metadata retention. A given system might both read and then strip metadata.
DISPLAY: The system can present the metadata stored in the database to an end user. The system can be designed to present only a subset of the available metadata fields to the user and can modify the format of the metadata for presentation purposes.
EDIT: A system may allow a user to make changes to the metadata, whether outside of the original file or embedded within it. Typically, user-specified changes to metadata are stored first in some data structure, whether temporarily or indefinitely, and then potentially embedded into the file.
SEARCH: A system may allow users to search for files based on metadata fields associated with the file, either in a separate data structure or embedded within the file. This is not a metadata requirement but a beneficial user experience that can be created by the implementer of the metadata reading application.
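
As a rough illustration of the stages above, the sketch below models them as combinable flags and checks for the "read, then strip" case described under READ. The names `Stage` and `strips_after_reading`, and the two example systems, are hypothetical and not part of the recommendation.

```python
from enum import Flag, auto


class Stage(Flag):
    """The five pipeline stages named in this summary."""
    RETAIN = auto()
    READ = auto()
    DISPLAY = auto()
    EDIT = auto()
    SEARCH = auto()


def strips_after_reading(stages: Stage) -> bool:
    """True when a system reads metadata but does not retain it in the file."""
    return Stage.READ in stages and Stage.RETAIN not in stages


# Hypothetical system that indexes metadata for search, then strips it.
indexer = Stage.READ | Stage.SEARCH
# Hypothetical archive that keeps originals untouched and shows their metadata.
archive = Stage.RETAIN | Stage.READ | Stage.DISPLAY

print(strips_after_reading(indexer))
print(strips_after_reading(archive))
```

A system such as `indexer` can still offer search, but the metadata no longer travels with the file once it leaves that system.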

Core Goals

Any individual software system will have a metadata pipeline that achieves some if not all of these stages for some if not all possible metadata fields. However, the goal of the FHMWG essentials recommendations is to foster an interoperable ecosystem of software products that go beyond this internal pipeline and achieve two additional goals for image files specifically:

INTEROPERATE: systems must interoperate so that users can import or export files and their metadata following shared conventions for file and metadata formats. Metadata interoperability may be achieved by adherence to existing metadata standards for reading and writing metadata, as well as by any method of information transfer, including but not limited to direct API access, exportable metadata structures (.csv, .xls, .json, etc.), and embedding edited metadata into original files.
EMBED: systems must have the ability to write metadata that is stored outside of the original files, in independent data structures, back into the original files themselves, i.e. interoperability is achieved via file-embedded metadata. When the metadata is embedded, it travels with the photo whenever a user downloads it. This is the ultimate step for digital preservation.

Implementation Levels

No matter how thoroughly you choose to implement these FHMWG standards, you are helping to enhance the value of digital media for family history provided that you follow our overarching principles and the implementation guidance below. We have defined three levels to guide service providers to implement our recommendations. The goal is for providers to work towards reaching the highest level of implementation by starting from the lowest. Providers already close to achieving the requirements of a given level should ideally work towards coverage of that level before moving up to the next level.

Level 1: Preserve Metadata (“Retain”)

The most accessible level that any software system developer can achieve is to retain and preserve the metadata already present in a file provided by a user. A system at the preserve level retains the original file metadata undisturbed in whatever state it might exist, whether or not it matches our recommended structure. This step is absolutely critical to the preservation mission of the FHMWG.

Level 2: Present Metadata (“Display”)

The next level requires the system to read metadata that is already present in the file provided by a user and present it to the user via a user interface. At this level the system may choose to read data from any fields it chooses. However, the system should use the metadata fields in these recommendations when storing the data, enabling search and when presenting it back to the user.

Level 3: Put Metadata Back (“Edit and Embed”)

In this final level, the system allows the user to edit metadata, saves any changes made by the user, and writes the changes back into the files using the metadata fields in these recommendations, i.e. the metadata is embedded. These fields must be the ones presented to the user in the user interface when accepting edits.
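
The three levels can be read as a ladder. The sketch below is one way to express that, under the assumption (ours, not stated outright in this summary) that a level only counts once every level below it is met. The function name and its parameters are hypothetical.

```python
def implementation_level(retains: bool, displays: bool, embeds_edits: bool) -> int:
    """Return the highest level reached, treating the levels as cumulative.

    0 means the system does not yet preserve metadata (below Level 1).
    The cumulative reading is an assumption made for this sketch.
    """
    level = 0
    for reached in (retains, displays, embeds_edits):
        if not reached:
            break
        level += 1
    return level


print(implementation_level(retains=True, displays=True, embeds_edits=False))
```

Under this reading, a system that displays and edits metadata but strips it from the stored original would still sit below Level 1.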

General Background on Image Metadata

Photo metadata may be written in one of three formats:

Exchangeable Image File Format (EXIF), a standard for devices (like cameras and scanners) that embed metadata.
IPTC Information Interchange Model (IIM), IPTC’s first multi-media news exchange format.
Extensible Metadata Platform (XMP), an ISO standard for embedding metadata in a format that can be embedded, read, and interpreted consistently and is also extensible.

EXIF metadata is the oldest format and is written at an offset in the file with a known length that was agreed upon among device manufacturers. This format does not allow the set of metadata to be extended. It also requires that all fields fit within the given length, so values are either truncated or padded to make them fit.

IIM metadata allowed more properties to be stored in the file and stored them as key-value pairs. It duplicated some of the core EXIF data in addition to adding new fields, but did not allow new properties to be added. Best practices recommended syncing the IIM and EXIF properties that stored the same data.

The XMP format was developed by Adobe as an open standard and then adopted as an ISO standard. In addition to storing the data as key-value pairs, it allowed for extending the metadata properties by adding new schema definitions. IPTC defined its IIM metadata as core XMP metadata properties and recommended syncing between the XMP IPTC core fields and the corresponding IIM and EXIF data.
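
To make the XMP description more concrete, here is a minimal sketch that reads a title from an XMP packet using only the Python standard library. It assumes the packet has already been extracted from the image file (that step is not shown) and that the title is stored in the Dublin Core `dc:title` property as a language alternative. The sample packet, its title text, and the function name `read_default_title` are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URIs used by XMP packets for the properties read below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written sample packet; the title text is invented.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Grandpa's house</rdf:li>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_default_title(xmp_packet: str) -> Optional[str]:
    """Return the x-default dc:title value, or None if it is absent."""
    root = ET.fromstring(xmp_packet)
    for item in root.iterfind(".//dc:title/rdf:Alt/rdf:li", NS):
        if item.get(XML_LANG) == "x-default":
            return item.text
    return None


print(read_default_title(SAMPLE_XMP))
```

Because each property lives in a namespace, a reader that only knows `dc:title` can ignore properties from other schemas without disturbing them, which is what makes the format extensible.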

The International Press Telecommunications Council (IPTC) is the global standards body of the news media and publishes the IPTC Photo Metadata Standard. The IPTC Photo Metadata Standard is recognized and accepted industry wide and provides guidance on how to use IPTC defined XMP metadata in a way that achieves interoperability.

Essential Family History Metadata Compatibility

The essential family history metadata recommendations closely follow the IPTC Photo Metadata Standard, which provides clarity on how to read, write, and synchronize standard photo metadata. Most of the metadata that is important to family historians is also core to the IPTC photo metadata and has semantically equivalent metadata properties in the IIM and EXIF formats. IPTC provides guidelines for mapping between its XMP data and the older IIM and EXIF formats. Additionally, IPTC publishes interoperability tests to validate that the embedded metadata is written correctly and consistently to the IPTC standards.

This recommendation leverages and remains in accordance with existing popular metadata standards and guidelines that promote interoperability between software and formats for preserving the “writing on the back of the photo” which includes these elements:

Title
Description
Date
Location (names and geotags)
People (names and face tags)

In general, we require one XMP property per element and recommend also syncing it to the appropriate IIM or EXIF properties, in accordance with the IPTC Photo Metadata Standard. The goal is to define a consistent way to capture, share, and preserve these elements of essential family history metadata so that they can be consistently read and interpreted by software applications, even if the metadata was originally captured in one of the older (IIM or EXIF) formats and not in the XMP format.

Recommendation Summary

Title

An image may have a title.

The title should be a short human-readable name or reference for the digital file.

Description

An image may have a description.

The description may be of any length and contain any information the user cares to add. The description could also include a caption for the image.

Date

An image can record the date of the depicted scene. The precision and accuracy of the date may vary, e.g. the date may only be specified by a year, or a month within a year, etc.

Note that this is the date of the scene depicted in the image, and may not be the date of the image’s creation. While the depicted date and the creation date may be the same for digital photographs, they generally differ for scanned images or artistic representations.

Many other date fields exist in other metadata, such as image creation dates, image modification dates, etc. As with all metadata, implementations may choose to support those if they wish, but this recommendation only addresses the date of the depicted scene.
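
The varying precision described in this section can be sketched as follows. The example assumes depicted dates are exchanged as ISO 8601-style strings truncated to the known precision (`YYYY`, `YYYY-MM`, or `YYYY-MM-DD`). This summary does not specify a storage format, so treat that as an assumption. `DepictedDate` and `parse_depicted_date` are hypothetical names.

```python
import datetime
import re
from typing import NamedTuple, Optional


class DepictedDate(NamedTuple):
    """Date of the depicted scene; unknown parts stay None."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    @property
    def precision(self) -> str:
        if self.day is not None:
            return "day"
        if self.month is not None:
            return "month"
        return "year"


_PARTIAL_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_depicted_date(text: str) -> DepictedDate:
    """Parse YYYY, YYYY-MM, or YYYY-MM-DD without inventing missing parts."""
    match = _PARTIAL_DATE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported date format: {text!r}")
    year, month, day = (int(g) if g is not None else None for g in match.groups())
    # Validate by building a real date; unknown parts default to 1 for the check only.
    datetime.date(year, 1 if month is None else month, 1 if day is None else day)
    return DepictedDate(year, month, day)


for value in ("1923", "1923-05", "1923-05-17"):
    parsed = parse_depicted_date(value)
    print(value, parsed.precision)
```

Keeping the unknown parts as `None`, rather than defaulting them to January or the first of the month, avoids implying more accuracy than the user supplied.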

Location (Names and GeoTags)

An image can identify one location where the primary focus of the depicted scene is found.

A location can have names such as a full name, identifier, sublocation, city, state, country, as well as GPS coordinates. At least one of the location name elements is required. Location identifier and GPS coordinates are optional.

IPTC allows storing more than one location in a single image’s metadata. Because the meaning of multiple locations is not uniformly understood, we recommend against using multiple locations.

Many other location properties exist in the IPTC standard. As with all metadata, implementations may choose to support those if they wish, but this recommendation only includes fields in the IPTC locationShownInImage property. Best practices recommend syncing the location elements with the appropriate core XMP properties as well as IIM and EXIF properties.

GPS coordinates always identify a single precise point, but real locations may cover a larger area or be imprecisely located. The location names can help convey the scope of the GPS coordinates.
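
The location rules in this section (at least one name element, with identifier and GPS coordinates optional) can be sketched as a small validation routine. The class and field names below are hypothetical and do not correspond to actual IPTC property names. GPS coordinates are assumed to be decimal degrees.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ShownLocation:
    """One location for the primary focus of the depicted scene."""
    name: Optional[str] = None
    sublocation: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    country: Optional[str] = None
    identifier: Optional[str] = None            # optional
    gps: Optional[Tuple[float, float]] = None   # optional (latitude, longitude)

    def problems(self) -> List[str]:
        """Return a list of rule violations; an empty list means valid."""
        issues = []
        names = (self.name, self.sublocation, self.city, self.state, self.country)
        if not any(n and n.strip() for n in names):
            issues.append("at least one location name element is required")
        if self.gps is not None:
            lat, lon = self.gps
            if not (-90 <= lat <= 90 and -180 <= lon <= 180):
                issues.append("GPS coordinates are out of range")
        return issues


print(ShownLocation(city="Oslo", country="Norway").problems())
print(ShownLocation(gps=(59.91, 10.75)).problems())
```

The second example fails because coordinates alone do not convey how large or how precisely known the location is. The name elements carry that scope.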

People (Names and Face Tags)

An image may depict persons either visually in the image or by close association (e.g. my grandpa’s house is associated with my grandpa even though he does not appear in the photo).

The image may have a list of person names associated with the image.

A person in an image may be referenced by a face tag (coordinates within the image). Person face tags are encouraged, but optional. In addition to the face tag, the person name must be added to the list of persons in the image.
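
The rule that every face-tagged person must also appear in the list of person names can be sketched as below. `FaceTag`, its fractional-coordinate fields, and `sync_person_names` are hypothetical; how the region is actually stored is covered in the full recommendations.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FaceTag:
    """A named region in the image; coordinates are fractions of width/height (assumed)."""
    name: str
    x: float
    y: float
    width: float
    height: float


def sync_person_names(person_names: List[str], face_tags: List[FaceTag]) -> List[str]:
    """Return the person list with any face-tagged names appended.

    Existing order is preserved and names are not duplicated.
    """
    merged = list(person_names)
    for tag in face_tags:
        if tag.name not in merged:
            merged.append(tag.name)
    return merged


tags = [
    FaceTag("Grandma", 0.10, 0.20, 0.15, 0.20),
    FaceTag("Grandpa", 0.55, 0.18, 0.15, 0.20),
]
print(sync_person_names(["Grandpa"], tags))
```

Names without a face tag stay in the list, which keeps room for people associated with the image but not visible in it.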

Additional Technical Information (for software developers only)

A technical metadata overview for software developers can be viewed in the FHMWG GitHub Repository Overview.

More technical metadata details for software developers can be viewed in the FHMWG GitHub Repository Recommendations.
