Summary
|
This project aims at aggregating spatial information with a camera-equipped walking
robot in real time. We want to develop a system that creates a 3-D representation from
moving camera images and in turn renders the observed scene from the perspective of a
virtual camera following the robot. In the computer games community, this is called a
"third-person perspective". We call it the "guardian angel perspective" because
it allows a remote operator to supervise and control the robot much more comfortably
than one could directly from the camera images. Additionally, we want to provide a
survey perspective and thus aim to aggregate a consistent global map.
One major application of such a system is urban search and rescue (USAR), where
a teleoperated robot searches for entombed victims in the remains of a collapsed building.
Such an environment is very confusing for the operator, especially since the robot's
cameras often shake heavily and their field of view rarely covers the
immediate surroundings of the robot. In such a situation, a real-time 3-D model that can
be viewed from arbitrary perspectives would be a great help for the operator.
On a methodological level, the main challenge is that the images shake
heavily, the motion is perturbed by the robot's tripping and tumbling, and obstacles are
often very close to the camera, so the images change rapidly while the robot moves. As
a result, it is very difficult to perform data association, i.e., to determine which visual
observations correspond to the same feature.
The first goal is to design and implement a statistical Markov Chain Monte Carlo
(MCMC) framework for visual SLAM that can defer data association decisions until
enough perceptual evidence is available and that can undo decisions that later turn out to
result in inconsistencies (lazy data association). Specifically, it should handle situations
with more possible data-association hypotheses than could be handled by tracking
them all (proactive data association).
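The idea of sampling over data-association hypotheses instead of tracking them all can be illustrated with a minimal sketch. It is a toy Metropolis sampler on 1-D observations, not the framework proposed here: the scoring function, the noise level sigma, the penalty per landmark, and all names (log_score, sample_associations) are assumptions made only for this illustration. Each observation carries a landmark label, a proposal reassigns one label at random, and because any label can be changed again later, an earlier association can be undone.

```python
import math
import random


def log_score(obs, assign, sigma=0.2, landmark_penalty=2.0):
    """Unnormalized log-probability of an association hypothesis (assumed model).

    Observations sharing a label are treated as noisy views of one landmark;
    every distinct landmark costs a fixed penalty.
    """
    clusters = {}
    for z, label in zip(obs, assign):
        clusters.setdefault(label, []).append(z)
    sse = 0.0
    for zs in clusters.values():
        mean = sum(zs) / len(zs)
        sse += sum((z - mean) ** 2 for z in zs)
    return -0.5 * sse / sigma ** 2 - landmark_penalty * len(clusters)


def sample_associations(obs, steps=2000, seed=0):
    """Metropolis sampling over label vectors; returns the best hypothesis seen."""
    rng = random.Random(seed)
    n = len(obs)
    assign = list(range(n))          # start: every observation is its own landmark
    score = log_score(obs, assign)
    best, best_score = list(assign), score
    for _ in range(steps):
        proposal = list(assign)
        proposal[rng.randrange(n)] = rng.randrange(n)   # symmetric single-label move
        new_score = log_score(obs, proposal)
        # Metropolis rule: always accept improvements, sometimes accept worse states.
        if rng.random() < math.exp(min(0.0, new_score - score)):
            assign, score = proposal, new_score
            if score > best_score:
                best, best_score = list(assign), score
    return best


if __name__ == "__main__":
    print(sample_associations([0.0, 0.1, 5.0, 5.1]))
```

With two well-separated pairs of observations, the best hypothesis found groups each pair under one label. Only a single hypothesis is stored at any time, which is the property that makes sampling attractive when the number of hypotheses is too large to enumerate.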
The second goal is a representation for textured 3-D data that can be rendered in
real time using established 3-D graphics tools and that can be updated in real time by
the visual SLAM algorithm. This update operation is difficult to realize because a visual
SLAM algorithm may change its estimates of past camera poses later on, especially when
closing a loop. So it is impossible to combine all images in 3-D once and for all according
to the current camera pose estimates. The other extreme, reprocessing the whole sequence of
images every time, is also impossible. So the second algorithmic goal is to devise
a representation that steers a middle course between both extremes. It will combine
local groups of images while still retaining the possibility to change the relative position of
these groups.
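A minimal sketch of such a middle course, under simplifying assumptions: geometry is reduced to 2-D points, and the names Submap, pose, and to_world are hypothetical and chosen only for this illustration. Each local group stores its fused content in its own coordinate frame together with one group pose. When the SLAM estimate changes, only the group pose is replaced and the fused local content is left untouched.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Submap:
    """A local group of fused observations with one adjustable group pose."""
    pose: tuple                                   # (x, y, theta) of the group in the world frame
    points: list = field(default_factory=list)    # fused content in the local frame

    def to_world(self):
        """Transform the local content into the world frame using the current pose."""
        x, y, theta = self.pose
        c, s = math.cos(theta), math.sin(theta)
        return [(x + c * px - s * py, y + s * px + c * py) for px, py in self.points]


if __name__ == "__main__":
    group = Submap(pose=(1.0, 2.0, 0.0), points=[(1.0, 0.0)])
    print(group.to_world())
    # A later correction (e.g. after loop closure) only replaces the group pose.
    group.pose = (0.0, 0.0, math.pi / 2)
    print(group.to_world())
```

In a renderer, the group pose would correspond to a per-object model transform, so a pose correction costs one transform update per group rather than a re-fusion of its images.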
|