Tuesday, July 25, 2006

Thoughts on Music and Text

This isn't revolutionary, but I was thinking last night about the difficulties involved in music-scanning software. That is, software that scans an image of music (presumably that you scanned in from paper) and creates a representation of that music in a format that's understood by one or more music-writing programs, such as Lilypond or Sibelius.

Text-scanning software, as evidenced by the quality of the scans at Project Gutenberg's Distributed Proofreaders, is pretty good, at least if you have a nice, clean page. However, the last time I tried out music-scanning software, I was sorely disappointed. So, I began thinking about what the difficulties were with music, and how these differed from text.

Humans can easily recognize and interpret words in all different fonts and sizes, and with a standard font, like Times, very little time is spent actually interpreting the symbols (letters, punctuation). Instead, many common words and phrases are recognized by their structure and form, and the mind fills in any missing words or letters as needed, giving a very powerful method of interpretation, even with relatively minimal data. Does music operate in the same way?

First, let's look at fonts and sizes. While it is true that there are some different fonts, sizes, and sometimes even different symbols for musical 'letters' (but then, think of 'A' and 'a'), the variation is relatively small. An eighth note, for instance, is almost always a small, filled oval, with a line attached (called the 'stem') that has one flag. But then we have to consider the equivalent of ligatures.

A ligature, in typography, is when two separate letters are set together as one character; common examples are the pairs 'fi' and 'fl'. It's sometimes hard to see on a computer screen, but open up a professionally-printed book and you'll find them. Anyway, in music, ligatures can occur when flagged notes are next to one another. Two eighth notes, for example. The flags are dropped, and instead a beam runs between the stems.

Musical ligatures are very useful. The composer, arranger or typesetter can change the interpretation of the music based upon how the notes are joined together. For instance, two sets of three eighth notes are different from three sets of two eighth notes, even if they occur in the same time signature.

Anyway, one of the problems involved in scanning music is the lack of semantic understanding by the software. With text, it is much simpler; not only can you improve the recognition of each character, but you can include a dictionary of common words, which can be referenced to check each word as it is scanned. I don't know if any software actually does this, but the human mind does, which leads to the correcting mechanism above.
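The dictionary-checking idea above is easy to sketch. Here's a minimal, illustrative version (the word list and the one-edit threshold are my own assumptions, not a description of any particular OCR product): compare each scanned word against a dictionary and substitute the closest entry when it's only a small edit away.

```python
# Sketch of dictionary-based correction for scanned text. The DICTIONARY
# contents and max_dist threshold are illustrative assumptions.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

DICTIONARY = {"letters", "letter", "music", "scanning"}

def correct(word: str, max_dist: int = 1) -> str:
    """Return the nearest dictionary word within max_dist edits, else the original."""
    best = min(DICTIONARY, key=lambda w: edit_distance(word, w))
    return best if edit_distance(word, best) <= max_dist else word

print(correct("lettrs"))  # a scanner's "lettrs" becomes "letters"
```

This is roughly what the human reader does unconsciously: the garbled word is mapped onto the nearest familiar one.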

I wondered whether musicians do this with music, too. The answer is yes. At least, I do. Many times, there is a passage that I have played before, even if it's only a few notes. I glance at the music, see the starting point and recognize the relationship between the notes, and my brain sends the sequence to my fingers. Of course, this doesn't always work perfectly, but it is very helpful in sightreading, and practice corrects the times when it doesn't help. Additionally, musicians, while playing, can understand the context of the passage. They are aware whether the key is minor or major, and can sometimes guess the structure of the music, even if they're playing alone.

This ability to understand the minute pieces of music in context helps musicians read it, which is something that a computer could probably not do. However, music-scanning software could include samples and compare the relationship between notes to the samples.
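One way to make "compare the relationship between notes to the samples" concrete is to represent a passage by the intervals between consecutive notes, so a familiar figure matches at any starting pitch. This is only a sketch of the idea; the sample library below is made up for illustration.

```python
# Recognize a passage by its interval shape rather than its absolute
# pitches. Pitches are MIDI note numbers; SAMPLES is a hypothetical
# library of familiar figures.

def intervals(pitches):
    """Successive semitone differences between consecutive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

SAMPLES = {
    "major arpeggio": [4, 3, 5],     # e.g. C E G C
    "chromatic run":  [1, 1, 1, 1],
}

def recognize(pitches):
    """Return the name of a matching sample figure, or None."""
    shape = intervals(pitches)
    for name, pattern in SAMPLES.items():
        if shape == pattern:
            return name
    return None

print(recognize([60, 64, 67, 72]))  # C4 E4 G4 C5 -> major arpeggio
print(recognize([67, 71, 74, 79]))  # the same figure starting on G
```

Because both calls produce the same interval shape, the figure is recognized regardless of key, much as a musician recognizes a familiar passage wherever it starts.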

Another difficulty in scanning music is that it is much more cluttered than text. In text, each character stands by itself, and so the software can pick each one out separately. Of course, sometimes there are complications, such as spots of ink on the page or underlining, but in general text can be isolated.

This is not so with music. Each note is in relation to a staff, which determines its pitch, and usually other notes. For example, if we were to take a slurred, dotted eighth note connected to a staccato sixteenth note, we must consider first their pitches, then their durations (keeping in mind that there is a dot to the right of the eighth note that applies to it), their articulations (the slur passes from the eighth note to the sixteenth, but the staccato is only on the sixteenth--or is it a smudge?), and any text that might be connected to them, for example dynamic markings. Some software attempts to enforce duration constraints on the scanned music by making each measure have exactly the correct durations. Musicians do this, too, but they have the flexibility and intelligence to decide if one note is actually an over-sized grace note or where the extra needed rest should go. The software, free from context, has more difficulty doing this.
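The duration-constraint check is simple to state in code. Here's a sketch under my own assumptions about the encoding (durations as fractions of a whole note): sum each measure and flag any that don't fill the time signature exactly. Deciding *how* to repair a bad measure is the hard, context-dependent part the software struggles with.

```python
from fractions import Fraction

# Flag measures whose scanned durations don't fill the time signature.
# Durations are fractions of a whole note: Fraction(1, 4) is a quarter,
# Fraction(3, 16) a dotted eighth, and so on.

def check_measure(durations, time_signature=(4, 4)):
    """Return the surplus or deficit relative to the time signature (0 if exact)."""
    beats, unit = time_signature
    expected = Fraction(beats, unit)
    return sum(durations, Fraction(0)) - expected

# A correctly scanned 4/4 measure: quarter + dotted eighth + sixteenth + half.
ok = [Fraction(1, 4), Fraction(3, 16), Fraction(1, 16), Fraction(1, 2)]
print(check_measure(ok))   # 0

# A misread measure: the dot on the eighth note was missed.
bad = [Fraction(1, 4), Fraction(1, 8), Fraction(1, 16), Fraction(1, 2)]
print(check_measure(bad))  # -1/16, a sixteenth note short
```

The check itself is trivial; the intelligence lies in interpreting a nonzero result, which is exactly where the musician's context beats the software.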

Anyway, that's all I have for the time being on that topic.