Obstacle avoidance is one of the key functionalities necessary for the proper functioning of an
autonomous mobile robot. Smart & Kaelbling (2002) showed that the time required to learn this task
using reinforcement learning can be infeasibly long, even for simplified versions of the task.
Knowledge transfer and experience replay are two techniques that have been suggested to speed up
reinforcement learning. However, their application to the obstacle avoidance task has been very
limited. For instance, Lin (1991) applied experience replay to an obstacle avoidance task in which the
robot's environment was bounded by a wall, so the control agent only needed to learn to keep the
robot from colliding with that wall. Smart & Kaelbling (2002) applied a form of knowledge transfer
known as teaching to the obstacle avoidance task, but with only one obstacle in the environment.
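To make the experience replay mechanism concrete, the following is a minimal sketch under tabular
Q-learning, in which past transitions are stored in a buffer and re-presented to the learner. The buffer
size, learning parameters, and batch size are illustrative assumptions, not the settings used by Lin
(1991) or in this study.

    import random
    from collections import defaultdict, deque

    ALPHA, GAMMA = 0.1, 0.95          # illustrative learning rate and discount
    Q = defaultdict(float)            # Q[(state, action)] -> value estimate
    buffer = deque(maxlen=10000)      # stores (s, a, r, s2, actions_in_s2)

    def q_update(s, a, r, s2, actions_s2):
        # One-step Q-learning backup (terminal-state handling omitted).
        target = r + GAMMA * max(Q[(s2, a2)] for a2 in actions_s2)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])

    def replay(batch_size=32):
        # Re-present randomly sampled past transitions to the learner.
        batch = random.sample(list(buffer), min(batch_size, len(buffer)))
        for s, a, r, s2, actions_s2 in batch:
            q_update(s, a, r, s2, actions_s2)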
Policy reuse and policy transfer are two knowledge transfer techniques that have been shown to
improve learning performance on the Keepaway subtask of robotic soccer (Taylor & Stone, 2009;
Fernández et al., 2010). These techniques apply when a simpler version of the intended task can be
learned first, and the knowledge acquired there can then be reused to bootstrap learning in the
intended task. The simpler version is called the source task, while the intended task is called the
target task. The obstacle avoidance task can be structured in this way, for example by creating a
source task containing fewer actions than the intended target task.
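As an illustration of how the two mechanisms differ, the following minimal tabular sketch shows one
common formulation: policy transfer seeds the target task's Q-values from the source task, while
policy reuse probabilistically follows the source policy as an exploration bias, in the spirit of the
π-reuse strategy of Fernández et al. The function names, the default value for new actions, and the
fixed psi and epsilon values are illustrative assumptions, not the exact formulation used in this study.

    import random
    from collections import defaultdict

    def transfer_q(source_Q, target_actions, default=0.0):
        # Policy transfer: initialise the target Q-table from the source.
        # Actions shared with the source keep their learned values; actions
        # new to the target task start at an assumed default value.
        target_Q = defaultdict(lambda: default)
        for (s, a), v in source_Q.items():
            if a in target_actions:
                target_Q[(s, a)] = v
        return target_Q

    def pi_reuse_action(s, target_Q, target_actions, source_policy,
                        psi=0.5, epsilon=0.1):
        # Policy reuse: with probability psi follow the source policy as an
        # exploration bias; otherwise act epsilon-greedily on the target Q.
        # (A decaying psi schedule, as in pi-reuse proper, is omitted.)
        if random.random() < psi:
            a = source_policy(s)
            if a in target_actions:
                return a
        if random.random() < epsilon:
            return random.choice(list(target_actions))
        return max(target_actions, key=lambda a: target_Q[(s, a)])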
In this study, we investigated how policy transfer, policy reuse and experience replay can be
combined to speed up learning in an obstacle avoidance task, examining the performance of the
techniques both in isolation and in combination. The experiments were set up in a robotics
simulation environment, using two source/target task pairs. In the first pair, the source task had two
actions and the target task three; in the second pair, the source task had three actions and the target
task six. Performance was measured as the average number of times the robot reached the goal
position under the guidance of the learned policy, evaluated both at the initial level and at the
asymptotic level.
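Stated concretely, this metric is a success rate over evaluation episodes. The sketch below computes
it; the window of episodes used for the initial and asymptotic estimates is an illustrative assumption,
not the exact evaluation protocol of this study.

    def success_rates(episode_outcomes, window=50):
        # episode_outcomes: list of booleans, True if the episode ended at
        # the goal region. Returns (initial, asymptotic) success rates over
        # the first and last `window` episodes (assumes at least `window`
        # episodes were run).
        initial = sum(episode_outcomes[:window]) / window
        asymptotic = sum(episode_outcomes[-window:]) / window
        return initial, asymptotic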
Our findings are that in the first pair of tasks, policy transfer outperformed policy reuse at speeding
up learning in the target task, by about 30% at the initial level and about 10% at the asymptotic level.
Combining policy transfer with policy reuse yielded no significant improvement in the first pair of
tasks but a significant improvement in the second. The improved performance can be attributed to
the six-action task being more difficult and hence benefiting more from bootstrapping. The combined
techniques outperformed policy reuse alone by 10 percentage points in the six-action task, measured
by the number of episodes ending at the goal region. The combination also overcame the declining
performance observed under policy transfer from the three-action task to the six-action task.
It was also found that while experience replay led to a significant improvement in learning the source
task, it offered no improvement when combined with knowledge transfer in the target task; in most
cases it degraded performance. For instance, when experience replay was combined with policy
transfer in the six-action task, the initial performance was 89% while the asymptotic performance
was 59%. When policy reuse was combined with experience replay, the initial performance was 68%
while the final performance was 59%; this combination did show some initial improvement, but the
improvement was not sustained.
The main contribution of this work is a reinforcement learning framework that combines experience
replay, policy reuse and policy transfer in learning the obstacle avoidance task. We have also shown
when each of these techniques is most useful for improving learning performance in reinforcement
learning. These results should promote greater adoption and acceptance of reinforcement learning
techniques in the development of autonomous mobile robots.