MIT Media Lab Visit

Created: 24 Jan 2013. Updated: 22 May 2014.

The MIT Media Lab is a national treasure. At the end of January 2013, I toured the lab and got to experience the future.


For me, the highlight of the trip was meeting Pattie Maes. Dr. Maes founded and directs the Fluid Interfaces group at the MIT Media Lab, and before that she founded the Software Agents group. Her CV and Wikipedia page don’t do her justice. She was a pioneer of collaborative filtering and is leading the way in rethinking interfaces. You may be one of the 8.5 million people who have seen her 2009 TED talk (and if not, you should watch it).

Anyway, as you can see from the Fluid Interfaces page, Dr. Maes and her graduate students have created some seriously cool projects and technologies that I predict will appear in many future successful products. Take a moment to peruse their site and be inspired.

I also really enjoyed meeting Dr. Grace Woo (also here), whom Fast Company named in 2013 as one of the top 100 “Most Creative People.” Grace’s Ph.D. thesis, VRCodes: embedding unobtrusive data for new devices in visible light, and her earlier project, bokode, explored storing information in plain sight in ways that are visible to detectors such as cameras but invisible to the human eye. Her ideas are brilliant, and they are another piece of the Internet of Things (IoT) puzzle.

Some of the MIT Media Lab projects that particularly stood out to me include:

Smarter Objects

Smarter Objects, by Valentin Heun, Shunichi Kasahara, and Pattie Maes, is a pioneering IoT project. I saw an early iteration of it, called elecTron.


Smarter Objects explores these notable parts of the IoT:

  • detecting object identification codes with a single ordinary camera
  • detecting object position and orientation with a single ordinary camera
  • overlaying digital interfaces on and digitally controlling real-world objects

Valentin laser-cut custom patterns of triangles that act as unique object (or object class) identifiers and also provide enough information to reconstruct the geometric orientation of the object. Viewing the objects through an ordinary iPad, he then overlaid an interface for controlling speakers: changing the station or volume, or dragging sound from one speaker to another. The required computations run in near real time using the Vuforia framework. See their paper for additional information.
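To make "reconstructing orientation from a known pattern" a little more concrete, here is a minimal sketch of my own. It is not the Smarter Objects or Vuforia implementation, and it only handles the flat, in-plane case (rotation, scale, and shift in the image), whereas a real tracker recovers full 3D pose. The function name `estimate_marker_pose` and the triangle coordinates are made up for illustration. Given the known corner positions of a marker and where those corners were detected in the camera image, it fits the best rotation, scale, and translation by least squares.

```python
import cmath
import math


def estimate_marker_pose(reference, observed):
    """Least-squares 2D similarity transform mapping reference -> observed.

    Both arguments are lists of (x, y) tuples in matching order.
    Returns (rotation in degrees, scale, (tx, ty)).
    """
    # Treat each 2D point as a complex number so that rotation plus scale
    # becomes multiplication by a single complex number.
    p = [complex(x, y) for x, y in reference]
    q = [complex(x, y) for x, y in observed]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)

    # Closed-form least-squares solution on the centered points.
    num = sum((pi - p_mean).conjugate() * (qi - q_mean) for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = num / den                    # encodes rotation and scale
    t = q_mean - a * p_mean          # translation
    return math.degrees(cmath.phase(a)), abs(a), (t.real, t.imag)


# Hypothetical marker: one triangle with known corner positions.
reference_triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]

# Simulate what the camera sees: rotate 30 degrees, scale by 2, shift by (5, -3).
true_transform = cmath.rect(2.0, math.radians(30.0))
shift = complex(5.0, -3.0)
observed_triangle = []
for x, y in reference_triangle:
    z = true_transform * complex(x, y) + shift
    observed_triangle.append((z.real, z.imag))

angle_deg, scale, translation = estimate_marker_pose(
    reference_triangle, observed_triangle
)
print(angle_deg, scale, translation)
```

Once the transform is known, an overlay (a volume knob, say) can be drawn in the marker's own coordinates and mapped into the image with the same rotation, scale, and shift, which is the basic trick behind pinning an interface to a physical object.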

Technology similar to Smarter Objects is now available from Tangible Play in their first product, Osmo. Osmo attaches a small mirror and stand to a standard iPad, and then runs computer vision algorithms on what it sees. Initially, they’ve launched three applications: one to handle drawings, one to handle tangram arrangements, and one to play a word game. By using known objects and keeping the camera angle somewhat fixed, they’re able to recognize the objects with enough accuracy to make a fun, kid-friendly experience.

I expect this is just the very beginning for this kind of technology. Over the next few years, I hope we’ll have more and more integration of the physical and digital spaces.


FlexPad

FlexPad, by Jürgen Steimle, Andreas Jordt, and Pattie Maes, uses a ceiling-mounted Kinect and projector (and an ordinary PC to drive those and perform the necessary computations) to track and project information onto a deformed surface, such as a sheet of paper or foam. Note that such surfaces are essentially featureless. See the paper for details and usability test results.


It’s amazing to me that commercial hardware available in 2012 was already capable of this kind of interaction. Imagine what will be possible in a few years, as the sensors, projectors, and software improve! For example, you could put a PrimeSense or ToF sensor into a wearable headset and use a small form-factor display (such as a MicroVision projector, Google Glass, or Oculus Rift). With this configuration, any surface you look at (such as a blank piece of paper held in your hands) could display any information desired (search, news, books, charts, games, etc.) and respond to your touch. The potential applications are nearly limitless.

New interfaces like these will radically alter society and business. This is not hyperbole; it’s already started to happen. Thomas Caudell coined the term “Augmented Reality” and deployed the HUDset system at Boeing in 1992, which enabled one electrical worker to lay wires in an airplane in half the time it previously took two workers (one to hold all the manuals) to do the same job. The HUDset overlaid the worker’s field of view with a layout of where the wires needed to go. The scenarios I described above could enable similar applications, such as assisting a surgeon by projecting onto a patient’s body, guiding a craftsman laying tile or carving wood, or helping an information worker interact with a sheet of paper.

Bar of Soap

Bar of Soap (video, data) is a 2009 project by the Object-Based Media group led by Michael Bove. Michael is a brilliant wizard whose magic repertoire includes flinging light from one lamp to another across the room with a flick of his hand.

Bar of Soap has 48 capacitive sensors, a 3-axis accelerometer, and a Bluetooth transmitter (and battery) in a block. Together, these sensors generate a 51-dimensional vector (48 touch readings plus 3 accelerometer axes) about 3 times per second. The researchers labeled this data with how a person was using the block (e.g., as a phone, as a remote, as a bar of soap, etc.) and then used supervised machine learning to create a classifier. Whenever a user picks up the Bar of Soap and uses it in a way previously learned, the classifier recognizes that use with ~95% accuracy.
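Here is a minimal sketch of what "label the sensor vectors, then train a classifier" can look like. To be clear about what is assumed: I don't know which classifier the Object-Based Media group used, so this uses a simple nearest-centroid classifier as a stand-in, and the sensor readings are synthetic data I made up (the grip patterns in `GRIPS` and the gravity directions in `GRAVITY` are hypothetical). Only the vector layout, 48 touch values plus 3 accelerometer axes, comes from the project description.

```python
import random

N_TOUCH = 48                 # capacitive sensors (from the project description)
N_ACCEL = 3                  # accelerometer axes (from the project description)
DIM = N_TOUCH + N_ACCEL      # 51-dimensional reading

# Hypothetical grips: which touch sensors are covered for each use.
GRIPS = {"phone": range(0, 16), "remote": range(16, 32), "soap": range(32, 48)}
# Hypothetical direction of gravity for each use.
GRAVITY = {"phone": (0, 1, 0), "remote": (0, 0, 1), "soap": (1, 0, 0)}


def fake_reading(label, rng):
    """Synthetic noisy 51-dimensional reading for one labeled use."""
    touch = [1.0 if i in GRIPS[label] else 0.0 for i in range(N_TOUCH)]
    clean = touch + list(GRAVITY[label])
    return [v + rng.gauss(0.0, 0.2) for v in clean]


def train_centroids(samples):
    """samples: list of (vector, label). Returns {label: mean vector}."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids, vec):
    """Return the label whose centroid is closest to vec."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(center, vec))
    return min(centroids, key=lambda label: dist2(centroids[label]))


rng = random.Random(0)
train = [(fake_reading(label, rng), label) for label in GRIPS for _ in range(30)]
held_out = [(fake_reading(label, rng), label) for label in GRIPS for _ in range(20)]

centroids = train_centroids(train)
correct = sum(classify(centroids, vec) == label for vec, label in held_out)
accuracy = correct / len(held_out)
print("held-out accuracy:", accuracy)
```

The synthetic grips here are deliberately easy to tell apart, so the accuracy this prints says nothing about the real device. Real readings would be messier, which is presumably why the real system needed a proper learning algorithm.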


Similar technology is now available to consumers in the new Myo armband by Thalmic Labs. Myo contains custom EMG sensors and a 9-axis inertial measurement unit (IMU). Thalmic Labs has trained models on data from this device to distinguish among various hand gestures in free space, such as holding up a finger, making a fist, or pointing in a direction. Thalmic has basically built the “mouse” for the Internet of Things. Imagine wearing a Myo, pointing at your television, and saying “On” (or maybe, “Ok Glass, on” :) ).

Bar of Soap and Myo are just the beginning for this kind of technology. With Deep Learning, one could collect large, unlabeled datasets from multiple sensors (in an environment or in a device), train on that data, and then use a smaller, labeled dataset to identify which neurons have learned which behavior labels. The first training step could be done by the manufacturer, and the second could be done by the manufacturer or the customer to enable individualized accuracy and dynamic labels. For example, you could potentially:

  • implement this in any mobile phone to know when it’s being used as a phone, camera, remote, etc.
  • add sensors in a car cabin and also use data from its controls (steering, pedals, etc.) to detect driver impairment (falling asleep, DUI, distraction)
  • train on security camera data and build a system to detect shoplifting or other criminal behavior
  • train on a person’s game console inputs and map those to in-game actions

The possibilities are limitless.
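The two-step idea above (learn structure from lots of unlabeled data, then use a handful of labeled examples to put names on what was learned) can be sketched in a few lines. A big caveat: this is not deep learning. I've swapped in plain k-means clustering as a much simpler stand-in, so each cluster plays the role that a "neuron that learned a behavior" plays in the description above. The sensor data is synthetic, and the three behaviors in `PROTOTYPES` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical "typical" 3-value sensor reading for each way of using a phone.
PROTOTYPES = {"phone": [0, 0, 5], "camera": [5, 0, 0], "remote": [0, 5, 0]}


def sample(label, rng):
    """Synthetic noisy reading for one behavior."""
    return [v + rng.gauss(0.0, 0.3) for v in PROTOTYPES[label]]


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, p):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: dist2(centers[i], p))


def mean_vec(group):
    return [sum(col) / len(group) for col in zip(*group)]


def kmeans(points, k, iters=20):
    """Step 1 (unsupervised): find k cluster centers without using labels."""
    # Farthest-point initialization, deterministic for a fixed point order.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        centers = [mean_vec(g) if g else c for g, c in zip(groups, centers)]
    return centers


def name_clusters(centers, labeled):
    """Step 2 (a few labels): name each cluster by majority vote."""
    votes = [Counter() for _ in centers]
    for vec, label in labeled:
        votes[nearest(centers, vec)][label] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


rng = random.Random(0)
# Large unlabeled set: the labels are used only to generate fake data.
unlabeled = [sample(label, rng) for label in PROTOTYPES for _ in range(100)]
# Small labeled set, e.g. collected from the customer.
labeled = [(sample(label, rng), label) for label in PROTOTYPES for _ in range(3)]

centers = kmeans(unlabeled, k=3)
names = name_clusters(centers, labeled)


def predict(vec):
    return names[nearest(centers, vec)]


print(predict(sample("camera", rng)))
```

The split matters because the expensive part, step 1, needs no labels and could be done once by the manufacturer, while step 2 needs only a few examples, so a customer could redo it to add or rename behaviors.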


I’ve barely scratched the surface of the amazing work happening at the MIT Media Lab. If you have the opportunity to visit, you should (and if your employer is a sponsor, then you have royalty-free licensing rights to their work).