Representation Learning for Multisensory Perception and Planning