WWW2010 Talk
Title (256 chars max)
Exposing Audio Data to the Web: an API and prototype
Authors
- Corban Brook
- David Humphrey
- Al MacDonald
- Thomas Saunders
Abstract (2000 chars max)
The HTML5 specification introduces the <audio> and <video> media elements, and with them the opportunity to dramatically change the way we integrate media on the web. The current API provides ways to play audio and video and to obtain limited information about them, but offers no way to programmatically access or create the media data itself. We present a new, enhanced API for these media elements, along with a working Firefox prototype, that allows web developers to read and write raw audio data. We will demonstrate how our audio data API can be used to improve web accessibility, analyze audio streams, build in-browser synthesizers and instruments, process digital signals, create audio-based games, and drive animated visualizations. Along the way, we will walk through the JavaScript code web developers need to work with audio data and to implement audio algorithms such as the Fast Fourier Transform (FFT). Finally, we will explore further possibilities that such an API opens up, such as text-to-speech, speech-to-text analysis, and "seeing" in 3D using sound.
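For the talk itself, the FFT mentioned in the abstract could be illustrated with a short sketch like the one below, written in TypeScript (typed JavaScript). It is a minimal iterative radix-2 Cooley-Tukey FFT that assumes a frame length that is a power of two (e.g. 1024 samples). It works on a plain Float32Array of samples; how those samples are read from a media element through the prototype's API is deliberately not shown, and the helper names (fft, magnitudeSpectrum, sine, peakBin) are hypothetical, not part of the proposed API.

```typescript
// Hypothetical helpers for illustration only; not part of the proposed audio data API.
// Assumption: frame lengths are powers of two.

/** In-place radix-2 FFT: re/im hold the real and imaginary parts of the signal. */
function fft(re: Float64Array, im: Float64Array): void {
  const n = re.length;
  if (n !== im.length || (n & (n - 1)) !== 0) {
    throw new Error("fft: re/im must have the same power-of-two length");
  }
  // Reorder samples into bit-reversed index order.
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) {
      j ^= bit;
    }
    j ^= bit;
    if (i < j) {
      [re[i], re[j]] = [re[j], re[i]];
      [im[i], im[j]] = [im[j], im[i]];
    }
  }
  // Combine sub-transforms of size/2 into transforms of size (butterflies).
  for (let size = 2; size <= n; size <<= 1) {
    const half = size >> 1;
    const angle = (-2 * Math.PI) / size;
    for (let start = 0; start < n; start += size) {
      for (let k = 0; k < half; k++) {
        const wr = Math.cos(angle * k); // twiddle factor e^(-2*pi*i*k/size)
        const wi = Math.sin(angle * k);
        const a = start + k;
        const b = a + half;
        const tr = re[b] * wr - im[b] * wi;
        const ti = re[b] * wi + im[b] * wr;
        re[b] = re[a] - tr;
        im[b] = im[a] - ti;
        re[a] += tr;
        im[a] += ti;
      }
    }
  }
}

/** Magnitude spectrum (bins 0..n/2) of a real frame, normalized by the frame length. */
function magnitudeSpectrum(samples: Float32Array): Float64Array {
  const n = samples.length;
  const re = Float64Array.from(samples);
  const im = new Float64Array(n);
  fft(re, im);
  const mags = new Float64Array((n >> 1) + 1);
  for (let k = 0; k <= n >> 1; k++) {
    mags[k] = Math.sqrt(re[k] * re[k] + im[k] * im[k]) / n;
  }
  return mags;
}

/** A simple oscillator: `length` samples of a sine wave, as a synthesizer might produce. */
function sine(freqHz: number, sampleRate: number, length: number, amplitude = 1): Float32Array {
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    out[i] = amplitude * Math.sin((2 * Math.PI * freqHz * i) / sampleRate);
  }
  return out;
}

/** Index of the loudest bin. */
function peakBin(mags: Float64Array): number {
  let best = 0;
  for (let k = 1; k < mags.length; k++) {
    if (mags[k] > mags[best]) best = k;
  }
  return best;
}

// Usage: a 1024-sample frame at 44.1 kHz gives bins about 43 Hz apart.
const rate = 44100;
const frame = sine(440, rate, 1024);
const peakHz = peakBin(magnitudeSpectrum(frame)) * (rate / frame.length);
console.log(peakHz.toFixed(1)); // logs "430.7", the bin centre nearest 440 Hz
```

The frequency resolution is the sample rate divided by the frame length, so a longer frame locates a tone more precisely at the cost of responding more slowly to changes, a trade-off that matters for real-time visualizations.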
Ideas for Presentation
Just riffing; feel free to chop, cut and burn. - Al
Granting direct access to the audio stream in the browser opens up the web as a powerful tool for accessibility, innovation and creativity. There are many screen readers for the web, such as SuperNova, for the visually impaired, but having worked with people who have no vision at all, I have seen an obvious gap in usability. Someone who is partially sighted or color-blind can enlarge what is on screen and switch color palettes. But people with very little or no sight have to rely on text-to-speech. The problem with this method is simple to explain but difficult to solve. Have you ever tried to navigate a drop-down menu with your eyes closed using text-to-speech? This alone renders the web unusable within seconds. Imagine having absolutely no visual cues to tell you what to click on. Imagine having no idea whether you are in the navigation or in the content. What if the tab indexes have been neglected on a site you need to use to pay your bills or change your personal information with a company? You would have to rely on other people to help you on the web, and you would miss out on all the benefits the web can bring to your daily life.
This happens to be a wonderful use case for audio stream access in the browser. Audio cues can be used to assist blind users, allowing developers to create content that is navigable by a wider audience. But there are even deeper implications to using audio technology in the browser. Research and development of video-to-audio technology by physicist Peter Meijer has demonstrated that moving images can be converted into audio signals that the human brain can use to navigate in 3D space. In some cases this can create artificial synesthesia, allowing users to "really see" with sound as the brain adapts through regular use and begins filtering the relevant audio signals and routing them to the visual cortex.
Recruitment of Visual Cortex for Sound Encoded Object Identification in the Blind
"Individuals using a visual-to-auditory sensory substitution device (SSD) called "The vOICe" can identify objects in their environment through images encoded by sound. We have shown that identifying objects with this SSD is associated with activation of occipital visual areas. Here, we show that repetitive transcranial magnetic stimulation (rTMS) delivered to a specific area of occipital cortex (identified by functional MRI) profoundly impairs a blind user's ability to identify objects. rTMS delivered to the same site had no effect on a visual imagery task. The task and site-specific disruptive effect of rTMS in this individual suggests that the cross-modal recruitment of occipital visual areas is functional in nature and critical to the patient's ability to process and decode the image sounds using this SSD."
http://www.seeingwithsound.com/nr2009.html
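The vOICe's basic mapping, as Meijer describes it publicly, scans each image from left to right, turning vertical position into pitch (higher up means higher pitch) and brightness into loudness. A simplified mono sketch of that mapping is below, showing the kind of sample generation a raw audio data API would make possible in the browser. It is not Meijer's implementation: the names sonify and Gray are hypothetical, the defaults (44.1 kHz, a one-second scan, a 500-5000 Hz range) are assumptions chosen for illustration, and refinements of the real system such as stereo panning are omitted. Each video frame would be sonified in turn.

```typescript
// Simplified mono sketch of vOICe-style image sonification; NOT Meijer's code.
// Hypothetical names; the default sample rate, scan time and frequency range are assumptions.

/** Grayscale image as image[row][col], brightness in [0, 1]; row 0 is the top. */
type Gray = number[][];

function sonify(
  image: Gray,
  sampleRate = 44100,
  scanSeconds = 1,
  fLow = 500,
  fHigh = 5000
): Float32Array {
  const rows = image.length;
  const cols = rows > 0 ? image[0].length : 0;
  const total = Math.round(scanSeconds * sampleRate);
  const out = new Float32Array(total);
  if (rows === 0 || cols === 0) {
    return out; // empty image: silence
  }

  // Height -> pitch: exponentially spaced frequencies, top row highest.
  const freqs: number[] = [];
  for (let r = 0; r < rows; r++) {
    freqs.push(rows === 1 ? fLow : fHigh * Math.pow(fLow / fHigh, r / (rows - 1)));
  }

  // Left-to-right scan: each column sounds for an equal slice of the scan.
  const samplesPerCol = total / cols;
  for (let i = 0; i < total; i++) {
    const col = Math.min(cols - 1, Math.floor(i / samplesPerCol));
    const t = i / sampleRate;
    let sum = 0;
    for (let r = 0; r < rows; r++) {
      // Brightness -> loudness of this row's sinusoid.
      sum += image[r][col] * Math.sin(2 * Math.PI * freqs[r] * t);
    }
    out[i] = sum / rows; // divide by the row count so the mix stays within [-1, 1]
  }
  return out;
}

// Usage: one bright pixel in the top-right corner gives silence for three
// quarters of the scan, then a high (5 kHz) tone in the final quarter.
const corner: Gray = [
  [0, 0, 0, 1],
  [0, 0, 0, 0],
  [0, 0, 0, 0],
];
const cornerSamples = sonify(corner);
```

Because loudness changes abruptly at column boundaries, this sketch will click audibly; smoothing the transitions between columns would be an obvious next step.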
The potential applications of this technology on the web are staggering...