9th August 2014

New technology can extract audio from visual data

Researchers at MIT, Microsoft, and Adobe have developed an algorithm that can reconstruct an audio signal by analysing microscopic vibrations of objects depicted in video. In one set of experiments, they were able to recover intelligible speech from the vibrations of a crisp packet, filmed from 15 feet away through sound-proof glass.

 

 

In other experiments, the researchers extracted useful audio signals from videos of aluminium foil, the surface of a glass of water, and even the leaves of a potted plant. Their findings are presented at this year’s SIGGRAPH, the world's largest conference on computer graphics and interactive techniques.

“When sound hits an object, it causes the object to vibrate,” says Abe Davis, a graduate student in electrical engineering and computer science at MIT and first author on the new paper. “The motion of this vibration creates a very subtle visual signal that’s usually invisible to the naked eye. People didn’t realise that this information was there.”

Reconstructing audio from video requires that the frequency of the video samples (the number of frames of video captured per second) be higher than the frequency of the audio signal; by the sampling theorem, it must strictly be more than twice the highest audio frequency to be recovered. In some of their experiments, the researchers used a high-speed camera able to capture 2,000 to 6,000 frames per second. That’s much faster than the 60 frames per second possible with some smartphones, but well below the frame rates of the best commercial high-speed cameras, which can top 100,000 frames per second.
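To make the frame-rate requirement concrete, here is a minimal Python sketch. It is not taken from the researchers' work. It assumes only the standard sampling-theorem result that a tone is captured faithfully when it lies below half the frame rate, and the function names `nyquist_limit` and `apparent_frequency` are invented for this illustration.

```python
def nyquist_limit(frame_rate_hz):
    """Highest frequency (Hz) that can be captured without aliasing
    when sampling at frame_rate_hz frames per second."""
    return frame_rate_hz / 2.0


def apparent_frequency(tone_hz, frame_rate_hz):
    """Frequency (Hz) at which a pure tone appears after being sampled
    at frame_rate_hz. Tones above the Nyquist limit fold back down."""
    # Distance from the tone to the nearest whole multiple of the frame rate.
    nearest_multiple = round(tone_hz / frame_rate_hz) * frame_rate_hz
    return abs(tone_hz - nearest_multiple)


if __name__ == "__main__":
    for fps in (60, 2000, 6000):
        print(fps, "fps -> Nyquist limit", nyquist_limit(fps), "Hz,",
              "440 Hz tone appears at", apparent_frequency(440, fps), "Hz")
```

Under these assumptions, a 440 Hz tone filmed at 2,000 frames per second keeps its true frequency, while at 60 frames per second it would fold down to a misleading 20 Hz.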

In other experiments, however, they used an ordinary digital camera. Because of a quirk in the design of most cameras’ sensors (the rolling shutter, which records each frame one row of pixels at a time rather than all at once), the researchers were able to infer information about high-frequency vibrations even from video recorded at a standard 60 frames per second. While this audio reconstruction wasn’t as faithful as that with the high-speed camera, it may still be good enough to identify the gender of a speaker in a room, the number of speakers and even, given accurate enough information about the acoustic properties of the speakers’ voices, their identities.
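Assuming the quirk in question is the rolling shutter found in many CMOS sensors, where rows of pixels are read out one after another rather than all at once, each row of a frame is in effect a separate sample in time. The Python sketch below is a simplified model of that idea and not the researchers' algorithm. The names `row_sample_times` and `effective_row_rate`, and the `readout_fraction` parameter, are hypothetical.

```python
def row_sample_times(frame_rate_hz, rows, n_frames, readout_fraction=1.0):
    """Times (seconds) at which each row of each frame is read out.

    readout_fraction is an assumed parameter: the share of the frame
    period spent reading rows (1.0 means no idle gap between frames).
    """
    frame_period = 1.0 / frame_rate_hz
    row_delay = readout_fraction * frame_period / rows
    return [frame * frame_period + row * row_delay
            for frame in range(n_frames)
            for row in range(rows)]


def effective_row_rate(frame_rate_hz, rows, readout_fraction=1.0):
    """Row readings per second while a frame is being read out."""
    return frame_rate_hz * rows / readout_fraction


if __name__ == "__main__":
    print(effective_row_rate(60, 1080))
    print(row_sample_times(60, rows=4, n_frames=2))
```

In this idealised model, a 1,080-row sensor running at 60 frames per second produces 64,800 row readings per second. A real camera is likely to fall short of that, for instance if readout leaves idle gaps between frames.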

The researchers’ technique has obvious applications in law enforcement and forensics, but Davis is more enthusiastic about the possibility of what he describes as a new kind of imaging: “We’re recovering sounds from objects. That gives us a lot of information about the sound that’s going on around the object, but it also gives us a lot of information about the object itself, because different objects are going to respond to sound in different ways.”

 

In their experiments, the researchers have been measuring the material, mechanical, and structural properties of objects based on motions of less than a tenth of a micrometre. That corresponds to 1/5000th of a pixel in close-up images, but it's possible to infer motions smaller than a pixel by looking at the way a single pixel’s colour value fluctuates over time: as an edge shifts slightly across a pixel, the pixel's colour changes in proportion to the shift.
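The sub-pixel idea can be illustrated with a toy model in Python. This is a sketch under simple assumptions, not the method used in the paper. It treats a single pixel straddling a sharp edge as a linear blend of a dark and a bright region, so that a small shift of the edge shows up as a proportional change in brightness. The function names and the brightness levels 20 and 220 are made up for the example.

```python
def pixel_value(bright_coverage, dark=20.0, bright=220.0):
    """Brightness of a pixel when a fraction bright_coverage (0 to 1)
    of its area lies on the bright side of an edge."""
    return dark + (bright - dark) * bright_coverage


def estimate_shift(value_before, value_after, dark=20.0, bright=220.0):
    """Edge displacement, in pixels, inferred from a brightness change.
    Assumes the linear blend used in pixel_value."""
    return (value_after - value_before) / (bright - dark)


if __name__ == "__main__":
    before = pixel_value(0.50)   # edge halfway across the pixel
    after = pixel_value(0.51)    # edge moved by one hundredth of a pixel
    print(estimate_shift(before, after))
```

A single real pixel is noisy and coarsely quantised, so in practice an estimate like this would presumably need to be combined across many pixels to pick out very small motions.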

“This is new and refreshing. It’s the kind of stuff that no other group would do right now,” says Alexei Efros, an associate professor of electrical engineering and computer science at the University of California at Berkeley. “We’re scientists, and sometimes we watch these movies, like James Bond, and we think, ‘This is Hollywood theatrics. It’s not possible to do that. This is ridiculous.’ And suddenly, there you have it. This is totally out of some Hollywood thriller. You know that the killer has admitted his guilt because there’s surveillance footage of his potato chip bag vibrating.”

However, technology of this kind may raise concerns over privacy in the future — particularly with ongoing, exponential advances in screen resolution, computer power and sensing abilities. Imagine a miniaturised version, for instance, able to be incorporated into glasses or even bionic eyes. The use of surveillance drones and high-definition CCTV will also increase greatly in the coming years. Looking at the more distant future, the algorithms will be orders of magnitude more accurate and detailed, possibly combined with X-ray camera vision to peer through walls and other intervening obstacles. Perhaps by then, we will enter a world in which privacy becomes a thing of the past.

 
