Proposals for extending Ogg Vorbis comments

Metadata has become a ubiquitous part of digital audio formats, since attempting to encode all relevant information into a filename can quickly become unwieldy. MPEG audio attempted to do this after the fact, with ID3 tags, but subsequent standards have attempted to find a better way to build metadata into the files. One of the novel features that vorbis comments offer is that, unlike metadata formats that offer a fixed list of possible data types, vorbis comments are of the form “name=value”, placing no restrictions on the field name, making it easily extensible. Also, any comment field may appear any number of times, allowing for cleaner descriptions of multi-valued fields, such as when more than one artist contributed to a song. These extensible comments have found their way into standards besides Ogg Vorbis, such as the Free Lossless Audio Codec, so effective usage of vorbis comments can be relevant to anyone's audio collection.
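To make the “name=value” form and the repeated-field rule concrete, here is a minimal sketch of how a tool might read a list of comments. The function name parse_comments is hypothetical, and folding field names to upper case is an assumption taken from the base Vorbis comment specification (which treats names as case-insensitive), not from anything stated above.

```python
from collections import defaultdict


def parse_comments(lines):
    """Parse "name=value" strings into a dict mapping each field name to a list of values."""
    fields = defaultdict(list)
    for line in lines:
        # Split on the first "=" only, so a value may itself contain "=".
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a name=value comment; skip it
        # Assumption: names compare case-insensitively, so fold them to upper case.
        fields[name.upper()].append(value)
    return dict(fields)


comments = parse_comments([
    "SOURCE ARTIST=Dom",
    "source artist=Kemal",
    "TITLE=Moulin Rouge",
])
print(comments["SOURCE ARTIST"])  # ['Dom', 'Kemal']
```

Keeping every field as a list, even when it holds one value, means multi-valued fields need no special handling later.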

The developers of the vorbis format created a base set of comment field types, all of which are optional, leaving the task of specifying what should appear in metadata primarily up to the user. Essentially, all that is offered is a small amount of semantic information so that certain common fields can be understood among all vorbis users. Opinions on what role metadata should play vary widely, and the vorbis developers view comments as “the equivalent of a quick note scribbled on the bottom of a CDR,” so the current comment specification is not likely to change. This leaves those who desire a more complete, better defined comment system on their own to find that definition.

For the most common cases, the given set of comment fields is sufficient, but it is lacking for attempts to encode information from some types of audio, most notably with DJ mixes. Also, the fields are (intentionally) very generic, which makes it difficult to search for someone with a more specific role in the creation of the piece, such as a conductor or remixer. Finally, the roles of certain fields, such as artist and performer, become less clear when there is more than one class of either. Although all of these shortcomings are intentional decisions in the design of vorbis comments, they make it, in its barest form, unsuitable to concisely and clearly encode information about an audio work. The suggestions offered here attempt to alleviate this problem.

This document is a perpetual work in progress; music and its presentation change, and so should its metadata representation change to better encode relevant information. The primary goal is clarity. Any information that does not fit well into the base specification should be given its own field, and the meaning of old fields should be made more specific to prevent overlap. Field names should be kept as generic as is reasonable, so as to allow reuse when the same role is found in a different genre of audio, but the meaning of a field should be discernible from the name itself. In the case of multiple occurrences of a field, the order of the fields should carry no meaning. If additional information is needed, different field names should be used. Most importantly, I want your input. Feel free to email david@gophernet.org to discuss your ideas. It would be nice if, after writing all this on what vorbis comments could be, more than one person used it.

There is another attempt to extend vorbis comments, listed in the references [3], which has goals similar to mine, conveniently allowing me to steal several of its ideas, but it has some severe flaws. The concept of singleton tags, though logical for some fields, drastically limits the flexibility of the comments when applied to AUTHOR. Ironically, the example given for multi-valued fields in the original specification uses AUTHOR. The addition of a language tag smacks of localized fields, which could result in non-portable field names. I don't hate people who don't speak English, but we have to agree on something. As for the contents of the fields, most fields will be in the language of the artist, such as the artist's name, title of the work, &c, and thus most of the fields are effectively language-independent. Perhaps I don't quite understand the scope of the problem, but currently I think that the idea of language tags would only damage portability. My final gripe is that the author of the recommendations seems to have an irrational fear of spaces. Spaces are explicitly allowed in field names and should be used when appropriate instead of creating jumbled messes of letters.

Involved People

The base specification provides ARTIST and PERFORMER. I find this a bit sparse.

ARTIST
Perhaps the best description for this field is “the person or group whose name is on the CD cover.” This is perhaps the most ambiguous of the fields, since the person thought of as responsible for the work varies by genre. In popular music, this would be the performing band. In classical music, this is usually the composer, but it may sometimes be a performer. In spoken word tracks (e.g., audio books), this is the author of the work. In DJ mixes, this is the DJ. Essentially, this field allows for the most prominent person or persons involved in the work to be easily displayed.
SOURCE ARTIST
The artist of the work being performed, when different from ARTIST. This field applies for DJ mixes, since the person or group in the ARTIST field will be making a performance of a recording done by another artist.
COMPOSER
The author of the work. This may be a composer in classical works sold by performer or a songwriter in popular music if different from ARTIST.
PERFORMER
A performer of a work not acting in another specific role, such as conductor or artist. This would include the speaker of audio works and sometimes the performer of classical works in cases when ARTIST or ENSEMBLE isn't suitable. Instrument could be included in square brackets, such as in “PERFORMER[CLARINET]” or “PERFORMER[GUITAR]”, though this may make searching for a specific performer more difficult.
ENSEMBLE
The group performing the piece, when different from ARTIST. For example, this would be the orchestra in classical works.
CONDUCTOR
The leader of an ensemble performing the piece.
REMIXER
If the version of the sound recording is a remix, i.e., a new version created from the original tracks, the remixer should be included here, if known.
PRODUCER
The person responsible for the project, usually providing funding and some form of artistic direction.
ENGINEER
The person responsible for creating the mastered record from the recorded tracks, or, for live recordings, for keeping the sound levels properly balanced. Sometimes the sound engineer is also referred to as the “producer”; if the two are the same person, PRODUCER is the preferred field.
GUEST ARTIST
Sometimes when doing a collaboration with another artist, there will be a primary artist in charge of the recording and a guest performer or guest performers. This field should be used for artists in a collaboration whose role should be distinguished from that of the primary ARTIST.
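The concern that a bracketed instrument makes searching harder can be eased if the tool strips the subscript before comparing names. The following is only a sketch under that assumption; split_subscript and find_performers are hypothetical helpers, and the clarinetist's name is made up for illustration.

```python
import re

# Matches "NAME" or "NAME[SUBSCRIPT]"; the bracketed part is optional.
_FIELD_RE = re.compile(r"^(?P<name>[^\[\]]+?)(?:\[(?P<subscript>[^\[\]]*)\])?$")


def split_subscript(field_name):
    """Return (base_name, subscript); subscript is None when there are no brackets."""
    match = _FIELD_RE.match(field_name)
    if match is None:
        raise ValueError(f"malformed field name: {field_name!r}")
    return match.group("name"), match.group("subscript")


def find_performers(comments):
    """Collect (performer, instrument) pairs from a list of (field_name, value) tuples."""
    performers = []
    for field_name, value in comments:
        base, instrument = split_subscript(field_name)
        if base.upper() == "PERFORMER":
            performers.append((value, instrument))
    return performers


tags = [
    ("PERFORMER[CLARINET]", "A. Player"),  # hypothetical performer
    ("PERFORMER", 'Robin "Goldie" Goldwasser'),
    ("CONDUCTOR", "Andre Cluytens"),
]
print(find_performers(tags))
# [('A. Player', 'CLARINET'), ('Robin "Goldie" Goldwasser', None)]
```

A search for a performer then only looks at the base name and treats the instrument as optional extra detail.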

Album information

These fields should be the same for every track in an album. Vorbis provides ALBUM and ORGANIZATION. ORGANIZATION is a bit ambiguous, though, since different companies are often involved in production and publishing, so PUBLISHER was added. To provide a unique identifier for an album in the spirit of ISRC for tracks, PRODUCTNUMBER and CATALOGNUMBER were added.

I know I said earlier that I would use some spaces in field names (PRODUCTNUMBER, CATALOGNUMBER), but TRACKNUMBER kind of set the trend for these.

ALBUM
The name of the album or collection from which this track was taken
ORGANIZATION
The organization responsible for producing the album; i.e., the record label.
PUBLISHER
The organization responsible for publishing the album. This is often the same as ORGANIZATION. If this is the case, both should be given to allow for searching by PUBLISHER and CATALOGNUMBER, but if PUBLISHER is missing, it should be assumed to be the same as ORGANIZATION.
PRODUCTNUMBER
The Universal Product Code, EAN, or JAN code used to identify the album if it is a retail product. These are given in the form of a bar code. CDs will almost always use either the 12-digit UPC-A or 13-digit EAN-13 forms. The various bar-code systems are intended to be compatible with one another, so no identification of the code type should be necessary.
CATALOGNUMBER
The number used by the publisher to uniquely identify a recording. On CDs, this is usually printed along the spine and is occasionally the same as the PRODUCTNUMBER.
VOLUME
The volume number for a multi-volume work, such as a multi-disc album or boxed set. This doesn't necessarily have to be a number; it could, for example, be a subtitle for the volume.
RELEASE DATE
The date that the album was published. This can be used to distinguish among various remasters and re-releases of an album. See DATE for suggestions on format. This date is often also included, though in a different form, with the COPYRIGHT information.
SOURCE MEDIUM
The medium from which the track was ripped; e.g., CD, Radio, Cassette, Vinyl LP
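Since UPC-A and EAN-13 share one check-digit scheme, a PRODUCTNUMBER can be sanity-checked without knowing which kind of code it is. This is a sketch, not a full bar-code validator: the function name is hypothetical, and stripping spaces and hyphens before checking is my own assumption about how people might type the number.

```python
def is_valid_product_number(code):
    """Check the check digit of a 12-digit UPC-A or 13-digit EAN-13 code."""
    # Assumption: spaces and hyphens are cosmetic and can be dropped.
    digits = code.replace(" ", "").replace("-", "")
    if not (digits.isascii() and digits.isdigit()) or len(digits) not in (12, 13):
        return False
    # A UPC-A code is an EAN-13 code with a leading zero, so one check covers both.
    digits = digits.zfill(13)
    # Weights alternate 1, 3, 1, 3, ... across the first twelve digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


print(is_valid_product_number("074646601723"))  # True
print(is_valid_product_number("074646601724"))  # False
```

This only catches typing mistakes; it says nothing about whether the code belongs to the album in question.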

Track information

These fields are used to identify the specific song, and should usually appear only once in a track. The main addition I stole from [3] is OPUS, to more specifically identify a classical work. Vorbis provides TITLE, VERSION, TRACKNUMBER, DESCRIPTION, GENRE, DATE, LOCATION and ISRC.

TITLE
The title of the work
SUBTITLE
This field is intended for use with FLAC, in order to connect specific titles with an embedded cue sheet. A single file can effectively contain multiple works, indexed by TRACK and INDEX in the case of cue sheets, and they can be specified using subscripts like “SUBTITLE[TRACK 3]” or “SUBTITLE[TRACK 7:INDEX 2]”. This should only be used for the case of multiple works in the same file and not for cases where a single work has multiple titles.
PART
When a single work is spread across multiple files, this is to be used for the title of the portion of the work. For example, a symphony with multiple movements could use TITLE for the name of the symphony and PART for the movement.
VERSION
This field should be used to differentiate multiple versions of a track in a collection or provide remix information.
OPUS
The number of the work, if applicable. This is not always referred to as “Opus”, and as such should include the name of the numbering system. For example, a Bach work may be referred to as “BWV 872”, and a Mozart work might be referred to as “KV 339”. For clarity, the same abbreviation should be used for each system consistently (e.g., pick one of K and KV for Mozart), and, preferably, conflicts in abbreviation should be avoided (K could stand for Köchel or Kirkpatrick). I suggest you use “BWV” for the Bach-Werke-Verzeichnis, “KV” (Köchel-Verzeichnis) for Köchel's catalog of Mozart's works, “K” for Kirkpatrick's catalog of Scarlatti's works, “RV” for the Ryom-Verzeichnis when used for Vivaldi, and just spell out “Opus”.
SOURCE WORK
In the case of soundtracks or music inspired by a movie or play, this field is intended for the original work from which the music was taken.
TRACKNUMBER
The index number of the work in a collection.
SPARS
The Society of Professional Audio Recording Services designation of whether the recording process was analog or digital. It consists of three components, for the recording, mixing, and mastering of the recording. These designations are commonly seen on classical music CDs. DDD would be a completely digital process, ADD would be an analog recording that was digitally mixed and mastered, AAD would be analog recording and mixing, and so on.
DESCRIPTION
A short text description of the contents.
GENRE
A short text description of the genre
DATE
The date the track was recorded. I recommend using a variant of the ISO 8601 date and time representation, modified to look less stupid. Using YYYY/MM/DD HH:MM:SS as a basis, include as much information as known and omit the rest. “2004” would be a recording made in the year 2004, and “2002/04/01 13” would be a recording made at 1 PM on April 1, 2002. The time should be given in the time zone local to LOCATION. A season can be used by spelling it out and placing it in the same order of specificity; for example, “2004 Fall”. Ranges can be given as date1-date2.
LOCATION
The recording studio, venue, or other physical location where the recording took place.
ISRC
The International Standard Recording Code number for the track. This is used to identify the track for royalty purposes, and, if used, is included as part of the table of contents of audio CDs.
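The DATE format described above is regular enough to parse mechanically. Here is a rough sketch of one way to do it; the function names are hypothetical, the list of accepted season names is my assumption, and the code checks only the shape of the value (it will happily accept month 13).

```python
import re

# Each component is optional, but only once the component before it is present.
_DATE_RE = re.compile(
    r"^(?P<year>\d{4})"
    r"(?:/(?P<month>\d{2})"
    r"(?:/(?P<day>\d{2})"
    r"(?: (?P<hour>\d{2})"
    r"(?::(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2}))?)?)?)?)?$"
)
# Assumption: seasons are spelled exactly like this, directly after the year.
_SEASON_RE = re.compile(r"^(?P<year>\d{4}) (?P<season>Spring|Summer|Fall|Autumn|Winter)$")


def parse_date(text):
    """Parse one DATE value into a dict holding only the components that were given."""
    season = _SEASON_RE.match(text)
    if season:
        return {"year": int(season["year"]), "season": season["season"]}
    match = _DATE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized DATE value: {text!r}")
    return {key: int(value) for key, value in match.groupdict().items() if value is not None}


def parse_date_range(text):
    """Parse "date" or "date1-date2" into (start, end); end is None for a single date."""
    start, sep, end = text.partition("-")
    return parse_date(start), (parse_date(end) if sep else None)


print(parse_date("2002/04/01 13"))
# {'year': 2002, 'month': 4, 'day': 1, 'hour': 13}
print(parse_date_range("1951/06-1954/03"))
# ({'year': 1951, 'month': 6}, {'year': 1954, 'month': 3})
```

Using “/” inside dates is what makes “-” safe as the range separator, so the range can be split before either side is parsed.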

Copyright Information

Vorbis provides COPYRIGHT, LICENSE and CONTACT, and I see no reason to expand from these. I did, however, clarify LICENSE, since the inclusion of “All Rights Reserved” was more likely a half-hearted attempt to provide everyone a means of using LICENSE rather than a particular love of the Buenos Aires copyright convention. Also, for the COPYRIGHT field, I disagree with the method of display in vorbis-tools. The current version of vorbis-tools (1.1) will display COPYRIGHT fields as “Copyright field contents”, assuming that the field contents are always of the form “year copyright holder”, which is not always correct. Copyright statements on audio works often include the information for both the song's copyright and the recording's copyright, and they are often more complex than just a year and copyright holder, so I suggest that you include the appropriate symbol, © or ℗, and vorbis-tools should be modified to remove its assumption. (Windows seems to have trouble displaying the second symbol; it's a circle-P, the sound recording copyright symbol, Unicode U+2117.)

These fields may be different for each track in an album, and COPYRIGHT and LICENSE, at least, should probably only appear once in a track.
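A tagging tool could warn when a field that should appear only once shows up repeatedly. The sketch below assumes a particular set of single-valued fields; that set is just my reading of the text (COPYRIGHT and LICENSE, plus a few track-identifying fields), and both it and the function name are hypothetical.

```python
from collections import Counter

# Assumption: one possible choice of once-only fields; adjust to taste.
SINGLE_VALUED = {"TITLE", "TRACKNUMBER", "ISRC", "COPYRIGHT", "LICENSE"}


def repeated_single_fields(comments):
    """Return the sorted names of single-valued fields that occur more than once."""
    counts = Counter(name.upper() for name, _ in comments)
    return sorted(name for name in SINGLE_VALUED if counts[name] > 1)


tags = [
    ("COPYRIGHT", "© 1988, BMG Music"),
    ("COPYRIGHT", "℗ 1988, BMG Music"),
    ("ENGINEER", "Nathaniel S. Johnson"),
    ("ENGINEER", "James Nichols"),
]
print(repeated_single_fields(tags))  # ['COPYRIGHT']
```

Repeated ENGINEER fields pass without comment, since involved-people fields are meant to be multi-valued.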

COPYRIGHT
Copyright attribution; e.g., “© 2001 Nobody's Band” or “℗ 2001 Lightning Records”.
LICENSE
License information for redistributable works. For example, “Any use permitted” or a URL to a license such as a Creative Commons license (Distributed under the terms of the Creative Commons Attribution license. See http://creativecommons.org/licenses/by/2.0/ for details.) or the EFF Open Audio License. Works not licensed for redistribution should not include this field.
CONTACT
Contact information for the creators or distributors of the track. This could be a URL, an email address, or the physical address of the producing label.

Examples

Here are a few examples, taken from my CD collection. Of course, I don't listen to every possible type of music, so if you have some examples that you think would be helpful, feel free to send them along, especially if they aren't from a CD.

ARTIST=Johnny Cash
COMPOSER=S. Silverstein
PRODUCER=Bob Johnston
PRODUCER=Bob Irwin
ENGINEER=Neil Wilburn
ENGINEER=Bob Breault
ALBUM=Johnny Cash at San Quentin
ORGANIZATION=Columbia
PUBLISHER=Columbia/Legacy
PRODUCTNUMBER=074646601723
CATALOGNUMBER=CK 66017
RELEASE DATE=2000
TITLE=A Boy Named Sue
TRACKNUMBER=11
GENRE=Country
DATE=1969/02/24
LOCATION=San Quentin Prison
ISRC=USSM19901986
COPYRIGHT=© Sony Music Entertainment Inc., ℗ 2000 Sony Music Entertainment Inc.
SOURCE MEDIUM=CD
  
ARTIST=Dieselboy
SOURCE ARTIST=Dom
SOURCE ARTIST=Kemal
REMIXER=D. Higgins
REMIXER=C. Ritter
REMIXER=K. Danner
PRODUCER=Damian Higgins
PRODUCER=Eric Silver
PRODUCER=Louis Montorio
ENGINEER=Rick Essig @ The Master Cutting Room, NYC
ALBUM=The Dungeonmaster's Guide
ORGANIZATION=Human Imprint Recordings
ORGANIZATION=System Recordings
PUBLISHER=The Greenwich Music Group
PRODUCTNUMBER=820997800823
CATALOGNUMBER=HUMA8008-2
VOLUME=The Dungeon Master's Guide
RELEASE DATE=2004
TITLE=Moulin Rouge
VERSION=Dieselboy + Kaos + Karl K Remix
TRACKNUMBER=14
GENRE=Drum and Bass
DATE=2004
LOCATION=Philadelphia, D.Cell
COPYRIGHT=℗ & © Moving Shadow LTD.  Courtesy of Moving Shadow LTD.
SOURCE MEDIUM=CD
  
ARTIST=Wanda Landowska
COMPOSER=J.S. Bach
ENGINEER=Nathaniel S. Johnson
ENGINEER=James Nichols
ALBUM=The Well-Tempered Clavier, Book II
ORGANIZATION=RCA Victor
ORGANIZATION=Red Seal
PUBLISHER=BMG Classics
PRODUCTNUMBER=078635782523
CATALOGNUMBER=7825-2-RC
VOLUME=Disc 2
RELEASE DATE=1988
TITLE=Fugue IX in E Major
OPUS=BWV 878
TRACKNUMBER=2
GENRE=Baroque
DATE=1951/06-1954/03
LOCATION=Lakeville, Connecticut, Wanda Landowska's home
COPYRIGHT=© 1988, BMG Music, ℗ 1988, BMG Music.
SOURCE MEDIUM=CD
  
ARTIST=Beethoven
ENSEMBLE=Berlin Philharmonic Orchestra
CONDUCTOR=Andre Cluytens
PRODUCER=Ken Kahn
ALBUM=Best of the Great Composers
ORGANIZATION=Seraphim
PUBLISHER=CEMA Special Markets
PRODUCTNUMBER=077775786422
CATALOGNUMBER=S21-57864
RELEASE DATE=1992
TITLE=Symphony No. 9 in D minor
PART=II. Molto vivace
OPUS=Opus 125
TRACKNUMBER=3
GENRE=Classical
COPYRIGHT=℗ © 1992 CEMA Special Markets
SOURCE MEDIUM=CD
  
ARTIST=They Might Be Giants
PRODUCER=Danny Bramson
PRODUCER=Guy Oseary
PRODUCER=Pat Dillett
PERFORMER=Robin "Goldie" Goldwasser
ALBUM=Austin Powers, the Spy Who Shagged Me: More Music from the Motion Picture
ORGANIZATION=Maverick Recording Company
PUBLISHER=Maverick Recording Company
PRODUCTNUMBER=093624753827
CATALOGNUMBER=9 47538-2
RELEASE DATE=1999
TITLE=Dr. Evil
SOURCE WORK=Austin Powers: The Spy Who Shagged Me
TRACKNUMBER=11
GENRE=Pop
COPYRIGHT=Courtesy of New Line Music Co., ℗ 1999 New Line Productions, Inc.
SOURCE MEDIUM=CD
  

Remaining Ambiguities and Holes

Occasionally an artist will be given as something of the form “DJ Hejaz presents Sideproject Beatz”. This practice seems to be most common in electronic music, but might occur elsewhere. The artist is Sideproject Beatz, but it's really just DJ Hejaz performing under a different name. Since this form could, theoretically, appear in any of the involved people roles, encoding this would require side-project fields for each possible involved person field, which would be a big mess and would fail to connect an artist to his side project in the case of multiple artists. I recommend simply dropping the information about DJ Hejaz and using “Sideproject Beatz” as the field content, or, if you really want the whole thing, just use the entire “presents” form as the artist name.

References and further reading

Credits

Initial version, 2004/09/28, David Shea. Thanks to David Cantrell and Simon Fowler for suggestions and proofreading.

2004/11/13, removed SOLOIST after Attila Bogár pointed out that the difference between it and PERFORMER is unclear.

2004/11/21, added PART and a suggestion for making LOCATION sortable, again from Attila Bogár.