Sunday, July 10, 2011

Ear-Training exercises for the iPod Shuffle

I recently had what seemed (to me, at least) like a neat idea: wouldn't it be cool to create a set of MP3 tracks for practising ear-training exercises on the bus, in the park, or even (why not?) at my desk during a quiet spell at work?


The background

The type of thing I had in mind would play a set of musical examples of a given kind (e.g. musical intervals, chords, scales, arpeggios); after each one, it would pause for a few seconds before announcing the correct name (e.g. "major third", "perfect fifth", "minor seventh chord", etc.). Each track could contain around 10 to 20 such examples.

Encouraged by the results of my previous efforts to develop an interactive ear-training program within Mathematica (published as a Demonstration here), I suspected that it could serve as a starting point for this new project.

Just to quickly review, here is a rough mock-up of how Mathematica can generate a MIDI object corresponding to a simple interval (in this case a perfect fifth), played in ascending order:

In[1]:= Sound[{SoundNote["C", {0,1}], SoundNote["G", {1,2}]}]

(To actually hear this within Mathematica, pass the object to the EmitSound function.)

It's also possible to sound multiple notes simultaneously; here is a C major triad:

In[2]:= Sound[{SoundNote[{"C","E","G"}, {0,1}]}]
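Since integer pitches in SoundNote count semitones from middle C (0), chords can also be built from semitone offsets above a root, which makes it easy to transpose them. A minimal sketch; the names `chordTypes` and `chord` and the particular chord list are hypothetical, not part of the original program:

```mathematica
(* Hypothetical table of chord qualities as semitone offsets above the root;
   extend it with whatever chord types you want to practise *)
chordTypes = {"major triad" -> {0, 4, 7}, "minor triad" -> {0, 3, 7},
   "diminished triad" -> {0, 3, 6}, "augmented triad" -> {0, 4, 8},
   "dominant seventh chord" -> {0, 4, 7, 10}};

(* Look up the offsets, add them to the root and sound all notes together
   for one second; root 0 is middle C *)
chord[name_String, root_Integer] :=
 Sound[{SoundNote[root + (name /. chordTypes), {0, 1}]}]
```

For example, `chord["major triad", 0]` gives the same pitches as the C major triad above, and `chord["minor triad", 2]` a D minor triad.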

With these basic building blocks, and Mathematica's powerful symbolic programming capabilities, it is easy enough to generate all sorts of musical examples with randomized features such as pitch and instrument.
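As an illustration of such randomization, here is a sketch that picks a random root, interval size and instrument, and returns the sound together with the interval's name (the answer to be spoken later). The helper `randomInterval`, the pitch range and the instrument list are assumptions for illustration only:

```mathematica
(* Names of ascending intervals, indexed by their size in semitones *)
intervalNames = {"minor second", "major second", "minor third",
   "major third", "perfect fourth", "tritone", "perfect fifth",
   "minor sixth", "major sixth", "minor seventh", "major seventh",
   "octave"};

(* Hypothetical helper: a random ascending interval within an octave of
   middle C, played on a randomly chosen instrument; returns {sound, name} *)
randomInterval[] :=
 Module[{root = RandomInteger[{-12, 12}], size = RandomInteger[{1, 12}],
   instrument = RandomChoice[{"Piano", "Violin", "Flute"}]},
  {Sound[{SoundNote[root, {0, 1}, instrument],
     SoundNote[root + size, {1, 2}, instrument]}],
   intervalNames[[size]]}]
```

Usage: `{snd, name} = randomInterval[]; EmitSound[snd]` plays the interval, and `name` holds the correct answer.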


We have ways of making you talk, OSX

The other piece of the puzzle, automated speech generation, fell into place when I stumbled, quite by accident, on the speech capabilities of Mac OSX. The "say" command, run from the command line (i.e. in the Terminal app), makes Mac OSX speak whatever text you give it. For example:

> say "fitter, happier, and more productive"

You can choose from a number of built-in voices (see System Preferences for the complete set):

> say -v Alex "fitter, happier, and more productive"

And, importantly for my purposes, you can tell it to save the output as a sound file (in the AIFF format):

> say -v Alex -o sound_bite.aiff "fitter, happier, and more productive"


So the building blocks are all there: Mathematica's MIDI capabilities to generate the musical sounds (intervals, chords, scales, arpeggios, or whatever else I am trying to recognize by ear), and the command-line "say" function to generate the spoken fragments that give the answer to each example after a suitable pause.

As an aside, it's worth noting that the same speech functionality as "say" (it's an OSX-wide service) is available from within Mathematica via the Speak function:

In[1]:= Speak["fitter, happier, and more productive"]

However, there doesn't seem to be any way to capture the output of Speak so that it can be combined with other sounds. So although it could be useful for a program running within a Mathematica session, it is no help for generating MP3 files.
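The workaround is to call "say" from Mathematica with Run, using the -v and -o options shown above, and import the AIFF file it writes. A minimal sketch; the helpers `answerFile` and `speakToSound` and the file-naming rule are hypothetical, and the text is assumed to contain no double quotes:

```mathematica
(* Hypothetical file-naming rule: "perfect fifth" -> "perfect_fifth.aiff" *)
answerFile[text_String] := StringReplace[text, " " -> "_"] <> ".aiff"

(* Render one answer with the OS X "say" command and import the result
   as a Sound object that can be combined with other sounds *)
speakToSound[text_String] := Module[{file = answerFile[text]},
  Run["say -v Alex -o " <> file <> " \"" <> text <> "\""];
  Import[file]]
```

Rendering each answer only once and reusing it keeps a long track from calling the synthesiser for every example, e.g. `answers = Map[# -> speakToSound[#] &, {"major third", "perfect fifth"}]` and then `"perfect fifth" /. answers`.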


Converting MIDI to AIFF

My initial expectation was that combining the above ingredients would be fairly straightforward and would have me on the road with my new MP3 test collection in no time. Unfortunately, as I started playing around I hit a major obstacle: although you can combine MIDI, WAVE and AIFF sounds within Mathematica, it is not possible to export such a mixture to a single file. Furthermore, Mathematica doesn't handle compressed formats like MP3 and AAC.

Now, you'd think that converting MIDI to WAVE or AIFF (or any format for that matter) should be quite a simple affair in this day and age, but that doesn't seem to be the case. Despondent, I began casting around for a workaround of any description. The best I could find was a shareware command-line program "midi2mp3" (available here). It's easy to download and use and, fortunately, it runs for free provided your sound file is no longer than 60 seconds (which doesn't bother me at all, since my individual musical examples are never more than a few seconds in duration). So, I could use midi2mp3 to convert to WAVE and then import that back into Mathematica.


Converting AIFF to AAC

There was just one remaining step: once Mathematica had created the final complete sound file, it could only export it in AIFF or WAVE format, so I needed another external utility to convert it to MP3 or AAC for easy import into iTunes. Thankfully, this time there was a built-in OSX command-line tool, "afconvert", that could do the trick.
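Since a full set of exercises means many tracks, the conversion is worth scripting. The sketch below runs afconvert with the same options as step 5 of the recipe, but passes the output file name explicitly; the helpers `m4aName` and `toAAC` are hypothetical names:

```mathematica
(* Hypothetical naming rule: "track01.aiff" -> "track01.m4a"
   (only a trailing ".aiff" is replaced) *)
m4aName[aiff_String] := StringReplace[aiff, ".aiff" ~~ EndOfString -> ".m4a"]

(* Convert one AIFF file to AAC (m4af container, 98304 bits/s) and
   return the name of the file afconvert should have written *)
toAAC[aiff_String] := (
  Run["afconvert -f m4af -d aac -b 98304 \"" <> aiff <> "\" \"" <>
    m4aName[aiff] <> "\""];
  m4aName[aiff])
```

`toAAC /@ FileNames["*.aiff"]` would then convert every AIFF file in the current directory in one go.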


Putting the pieces together

I now had all the pieces I needed to achieve my goal. It had been a long, dark, and at times almost unbearably geeky journey, but I was on the home straight.

So, without further ado, here is the complete solution that I ended up with (again, I'll just show a mock-up which demonstrates the key idea):

1. Generate a MIDI sound in Mathematica and export it as a MIDI file:

In[1]:= Export["sound.mid", Sound[{SoundNote["C", {0,1}], SoundNote["G", {1,2}]}]]

2. Use an external converter to get this into WAVE format (here I'm calling it from within Mathematica):

In[2]:= Run["midi2mp3 sound.mid -e wave"]; mySound = Import["sound.mid.wav"]

3. Generate the spoken answer with "say" and import it:

In[3]:= Run["say -v Alex -o speech.aiff \"the correct answer is perfect fifth\""]; myAnswer = Import["speech.aiff"]

4. Combine the two pieces and export in AIFF format:

In[4]:= Export["test.aiff", Sound[{mySound, myAnswer}]]

5. Convert from AIFF to AAC (I had to poke around the net to find out how to use afconvert; as far as I can tell it can read MP3 files but not write them, which is why I went with AAC):

In[5]:= Run["afconvert -f 'm4af' -d 'aac' -b 98304 \"test.aiff\""]

The end result should be a file called "test.m4a" containing our simple interval followed by its spoken name.
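To automate this, the five steps can be wrapped in a single function. This is a sketch rather than the exact code I use: the name `makeExercise` and its arguments are hypothetical, and it assumes midi2mp3, say and afconvert are on the PATH, that midi2mp3 writes `<name>.mid.wav` as in step 2, and that the answer text contains no double quotes:

```mathematica
(* Hypothetical wrapper around steps 1-5: one question followed by its
   spoken answer, saved as <base>.m4a; returns the output file name *)
makeExercise[question_Sound, answer_String, base_String] :=
 Module[{music, speech},
  (* 1. Export the question as a MIDI file *)
  Export[base <> ".mid", question];
  (* 2. MIDI -> WAVE with midi2mp3, then import the result *)
  Run["midi2mp3 " <> base <> ".mid -e wave"];
  music = Import[base <> ".mid.wav"];
  (* 3. Spoken answer from the OS X "say" command *)
  Run["say -v Alex -o " <> base <> "-answer.aiff \"" <> answer <> "\""];
  speech = Import[base <> "-answer.aiff"];
  (* 4. Concatenate the two sounds and export as AIFF *)
  Export[base <> ".aiff", Sound[{music, speech}]];
  (* 5. AIFF -> AAC, naming the output file explicitly *)
  Run["afconvert -f m4af -d aac -b 98304 " <> base <> ".aiff " <>
    base <> ".m4a"];
  base <> ".m4a"]
```

For example, `makeExercise[Sound[{SoundNote["C", {0, 1}], SoundNote["G", {1, 2}]}], "perfect fifth", "fifth01"]` should leave a "fifth01.m4a" file in the working directory.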


Conclusion

So after a fair amount of faffing around, I have been able to generate sets of ear-training tests to listen to on my iPod. I've been using them for a couple of weeks now and so far they're working a treat. At first I found it difficult to recognize intervals when I couldn't sing them out loud to myself (for obvious reasons, this isn't feasible on the bus or in the office) or replay them several times (rewinding on the iPod Shuffle is too clunky to be useful, although I can hit pause to give myself more time to think). But I'm getting better with practice.

Note that although I had recourse to several external command-line tools (midi2mp3, afconvert, say), I was still able to use Mathematica as the framework that ties them all together. This is very convenient, since it means almost the whole process can be automated.

Finally, I've glossed over many details (e.g. inserting a pause of the desired length between the sound and its answer, or randomly choosing pitches and intervals), but these were all fairly routine once the above steps had been worked out.
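For the pause, one option is a stretch of silent sampled audio placed between the question and the answer. A minimal sketch, assuming the hypothetical helper `silence` below; the default rate of 22050 Hz is an assumption, and it should match your other clips (`Import["speech.aiff", "SampleRate"]` should report the rate of the speech file):

```mathematica
(* Hypothetical helper: a silent gap of the given length as mono sampled
   audio, so it can be exported alongside the imported WAVE/AIFF sounds *)
silence[seconds_?NumericQ, sampleRate_Integer : 22050] :=
 Sound[SampledSoundList[ConstantArray[0., Round[seconds sampleRate]],
   sampleRate]]
```

With `mySound` and `myAnswer` from steps 2 and 3, three seconds of thinking time would go in the middle: `Export["test.aiff", Sound[{mySound, silence[3], myAnswer}]]`.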

Check back soon: I will endeavour to post a couple of sample files.

EDIT: I've uploaded a couple of examples here.
