Limitations of layering HTML5 Audio

I’ve been porting the strings app (Flash version in the video above) to HTML5. Canvas has handled the graphics well, but I have found some limitations with HTML5 audio.

The Flash Version

Here’s how the Flash version is set up. There are 38 plucked sound samples along a chromatic scale. When a string is created, it saves its pitch index (0 through 37) based on its length: low notes for long strings, high notes for short strings. Since multiple strings can share the same pitch, the app needs the ability to layer, or multishot, samples. Otherwise we’ll hear truncated glitches.
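
The length-to-pitch step could be sketched roughly like this in JavaScript. It is only an illustration: the function name pitchIndexForLength, the bounds minLen and maxLen, the linear mapping, and the assumption that index 0 is the lowest note are all hypothetical and not taken from the app's code.

```javascript
// Hypothetical sketch: map a string's length to one of 38 pitch indices.
// Assumes a linear mapping and that index 0 is the lowest note.
const NOTE_COUNT = 38;

function pitchIndexForLength(length, minLen, maxLen) {
  // Clamp the length into the expected range.
  const clamped = Math.min(Math.max(length, minLen), maxLen);
  // 0 for the shortest string, 1 for the longest.
  const ratio = (clamped - minLen) / (maxLen - minLen);
  // Long strings get low indices (low notes), short strings get high ones.
  return Math.round((1 - ratio) * (NOTE_COUNT - 1));
}
```

With these assumptions, the longest allowed string maps to index 0 and the shortest to index 37, and two strings of similar length can land on the same index, which is why layering matters.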

This happens smoothly in Flash. I reused code from my Consulate General Flash piece. 38 Sound objects are created at launch, one for each note. Then, each string maintains its own SoundChannel object, returned by Sound.play(), allowing for multishot layering.

The only Flash limitation is how many sounds can be layered at any given moment. (With Flash v4-5 I remember it being 8, but now it seems capable of around 16.) If you ask Flash to play too many sounds, the later sounds simply won’t fire, resulting in momentary glitches. That’s why I deliberately chose pizzicato samples with short sustain.

Also, I import a single MP3 with all 38 samples, storing the start times of each note in an array. I feel it’s cleaner to request one MP3 than make 38 HTTP requests. It also reduces load time by cutting out the in-between request lag. When a Sound.play() fires, a timer stops the sound just before the next sample would begin.

// these are the marker points for each note
public var arrNoteTime = [0, 3, 6, 8.25, 10.5, 12.75, 15, 17.25, 19.5, 21.75, 24, 26.25 ...

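public function playNote(rVol:Number):void {
    // interpolate the volume between vol0 and vol1
    trans.volume = vol0 + rVol*(vol1-vol0);
    // start playback at this note's start time (ms)
    chan = snd.play(t0Snd, 1, trans);
    // schedule a stop once this note's segment has elapsed
    var delay:Number = t1Snd-t0Snd;
    TweenLite.delayedCall(delay/1000, stopNote, [chan]);
}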
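
The single-MP3 lookup described above can be sketched in JavaScript as follows. This is a minimal illustration, assuming the marker points are in seconds and sorted in ascending order; noteSegment and its totalDuration parameter are hypothetical names, not part of the original code.

```javascript
// Hypothetical helper: find where a note starts in the combined MP3 and how
// long it may play before the next sample begins.
// noteTimes holds start times in seconds (assumed), in ascending order;
// totalDuration (seconds) is an assumed parameter that bounds the last note.
function noteSegment(noteTimes, index, totalDuration) {
  if (index < 0 || index >= noteTimes.length) {
    throw new RangeError("note index out of range: " + index);
  }
  const startSec = noteTimes[index];
  const endSec =
    index + 1 < noteTimes.length ? noteTimes[index + 1] : totalDuration;
  return {
    startMs: startSec * 1000,
    durationMs: (endSec - startSec) * 1000,
  };
}

// Only the first few marker points quoted above; 12.75 stands in as the end
// of this shortened list.
const firstNoteTimes = [0, 3, 6, 8.25, 10.5];
const seg = noteSegment(firstNoteTimes, 2, 12.75);
// seg.startMs is 6000 and seg.durationMs is 2250
```

The returned startMs and durationMs play the same roles as t0Snd and t1Snd-t0Snd in playNote.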

HTML5 Approach

The first thing I tested was the ability to multishot a single <audio> sample. It doesn’t work: the currently playing sample gets cut off. Next, while Flash only seems concerned with how many sounds are actually playing at once, HTML5 doesn’t even like the mere existence of too many <audio> elements. (I also tried using pure Audio objects instead; the same is true.) In Chrome, starting around the 9th <audio> element, they simply won’t play().

So importing 38 separate MP3 files, which would require an <audio> element for each note, is definitely out. Fortunately, I already have my above method of using a single MP3 with the start times of each note.

To allow layering of the samples, the only solution is to manually maintain a pool of 8 <audio> objects, which act like Flash’s SoundChannel. The playNote function looks like this:

// what channel to use?
this.auObj = document.getElementById("channel"+chan);
// set volume
this.auObj.volume = lerp(this.vol0, this.vol1, this.rStrength);
// is that channel in use? arrChan stores booleans
var inUse = arrChan[chan][0];
// seek to the start time
this.auObj.currentTime = this.t0Snd/1000;
// if that channel is in use, it will just seek there and let it keep playing
// otherwise, trigger play function
if (!inUse) { this.auObj.play(); }
// stop around 300 ms before next sample triggers - will trigger pause()
setChanTimeout(chan, this.t1Snd-this.t0Snd-300);
// go to next channel
chan = (chan + 1) % channels;
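
The channel bookkeeping in that function can be isolated into a small structure. The sketch below is a hypothetical refactoring, not the code used in the app: ChannelPool, acquire, and release are illustrative names, and it tracks only indices and in-use flags, leaving the <audio> elements to the caller.

```javascript
// Hypothetical round-robin pool that tracks which channel to use next and
// whether each channel is currently playing. It holds no <audio> elements
// itself; the caller maps the returned index to an element.
class ChannelPool {
  constructor(size) {
    this.size = size;
    this.next = 0;
    this.inUse = new Array(size).fill(false);
  }

  // Claim the next channel in rotation. wasInUse tells the caller whether
  // play() is still needed or a seek on the already-playing element suffices.
  acquire() {
    const index = this.next;
    const wasInUse = this.inUse[index];
    this.inUse[index] = true;
    this.next = (this.next + 1) % this.size;
    return { index, wasInUse };
  }

  // Mark a channel idle, for example from the timeout that pauses it.
  release(index) {
    this.inUse[index] = false;
  }
}
```

In the browser, the caller would look up the element with document.getElementById("channel" + index), set currentTime, and call play() only when wasInUse is false, mirroring the logic above.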

This performed pretty well. But we now have a major loading issue. Flash only needs to load the single MP3 (around 900 KB) once, and can create multiple Sound instances from it. However, every <audio> element triggers a new HTTP request, multiplying our load time dramatically. I also attempted to load a single <audio> element, then create the remaining 7 elements only after the MP3 was fully loaded, hoping the browser would be smart enough to use its cache. No luck: Safari’s activity window reports 8 HTTP requests.

Conclusion: Some music apps that can work around the channel and loading limitations might still work. Perhaps an 8-channel drum machine (though that introduces millisecond-level timing issues). But for this project, the best plan is to use Flash audio communicating with the HTML5 canvas through ExternalInterface. More on that next.

8 comments on ‘Limitations of layering HTML5 Audio’

  1. Tom says:

    Hi, really interesting post. It made me wonder what you see as the advantage of having some elements in HTML5 and some in Flash. I’ve been thinking around similar issues for progressive enhancement of sites and generally come to the conclusion that if you’re going to use Flash for some integral part of the app it doesn’t make sense to offer the other parts in HTML as the user without Flash will get a meaningless experience anyway…

    Of course I understand that if it’s just for the sake of experimenting it doesn’t really matter. Just interested in your thoughts.


  2. JP Sykes says:

    I found additional issues once you move to mobile. The main reason, as I saw it, for using HTML5 audio over Flash was that I could build an interface that would work on the desktop and on all iPhone/iPod/iPad browsers, rather than alienating that segment with “Flash won’t work”.

    But audio on iOS Mobile Safari is appalling once you get above playing one file at a time. Mobile Safari has no concept of playing multiple samples at the same time, and will only allow you to play a second one once the first has played all the way through.

    It puts HTML5 interfaces, games, and music apps that want to be browser-based and run on iOS out of the question.

    This leaves you with no option other than making an app, and audio in Objective-C doesn’t look like the easiest world to jump into either.

    Awesome projects by the way, I love the MTA work.

  3. Jenna Fox says:

    Yeah. It’s very troubling dealing with Google Chrome’s media issues. It seems that even in audio, they’re forcing developers to just use Flash instead.

    I recently ran into all of these issues in my game prototype “Ripple”, which runs great in Safari 5 and Firefox 4 Beta, but is basically unusable in Chrome due to the audio triggering reliability issues: http://creativepony.com/games/ripple/

    I did have the same issue with audio elements always wanting to load the sound from scratch, and worked around it in a terribly hacky way. I have a Ruby script which compiles all the audio and visual assets (WAVs, PNGs, and the like) into base64-encoded data URIs, and then packs those into a JSON file which I name assets.js. I then download that file using an XMLHttpRequest (with progress shown in that loading indicator pie chart thing), with the final 5% of the pie chart left unfilled as time for the data to load. The browser is incredibly quick at decoding the resulting multi-megabyte JSON file (it takes about a third of a second), leaving me with an object I can pull the audio data from. Audio objects seem to load data URIs up to a few hundred kilobytes essentially instantly, allowing me to create and destroy them without consequence.

    This process, as messy and wasteful as it is, also allowed me to use separate WAV files rather than trying to seek around in an audio spritesheet. It also allows for a very nice fallback when testing locally: if I skip loading assets.js, the code fails to find the needed files in the assets object and goes on to download them in the usual fashion.

    In conclusion, timing issues leave this game prototype unusable in Google Chrome. If I really wanted this to be a playable final version of my game, I’d need to port it to Flash, in which case I’d probably just rebuild the whole thing there rather than depend on ExternalInterface mucking about, as it’d open up the possibility of Flash game licensing. I’m leaving it for now though, as the prototype did its job well enough that I now feel confident in rebuilding the concept as an iPad game with cocos2d-iphone. :)

  4. richtaur says:

    We’ve had similar audio woes. Fortunately we’ve got the ear of the Chrome dev team and have filed some bugs; should see some improvements in future versions!

    Good stuff, thanks for posting this.

  5. [...] You can also use your mouse to pluck strings/subway lines on the site, though not on the video above, sadly. For all you techies who want to know how he did it, Chen explains it all here. [...]

  6. Are you able to put up the HTML5 audio version you said had poor performance? I’m sure the Firefox team would be interested in trying it out and seeing if there were bugs that needed filing.


  7. [...] has many limitations and bugs when layering multi-shot samples. (See this post for more details.) If the audio is triggered by Flash in the background, the [...]
