Structure of the Learning Track
The general structure of the learning track will be as follows.
Competitors will submit two programs to the organizers before the
competition begins: a learner and a planner. The competition will then
be run in two stages. First, there will be a learning stage, where the
learning programs will be provided with the definition of, and example
problems from, each domain that will appear in the competition.
For each domain, the learning program will be given a certain amount of
learning time, after which it must output a domain-specific
control-knowledge file. Second, there will be an evaluation stage,
where, for each domain, the planner will be provided with the
appropriate domain-specific control-knowledge file and asked to solve a
sequence of test problems from that domain.
The organizers are not placing any constraints on what
style of learning approach might be used. For example, a system might
utilize statistical/inductive learning or purely deductive learning
techniques. In addition, the learning track provides a good venue for
entering approaches that might not traditionally be viewed as learning,
such as pure domain analysis. For example, domain analysis could be
conducted during the learning period and the resulting knowledge used
during the evaluation period. Ultimately, we hope to see a wide variety
of approaches that will help answer the following question: how can a
planner best use a learning, or domain-analysis, period in order to
improve its future performance?
Evaluation Schema
The learning track will have two distinct phases: a learning phase and
an evaluation phase. These phases will involve planning problems drawn
from two distinct distributions: the target distribution and the
bootstrap distribution.
These distributions are described below, followed by descriptions of
the learning and evaluation phases.
This schema is not yet finalized and we welcome feedback.
Problem Distributions:
For each planning domain there will be two distinct
distributions over problem instances: the target distribution and the
bootstrap distribution. The ultimate goal of the competition is to learn
knowledge that allows a planner to perform well on problems drawn from
the target distribution. The target distributions will be designed so
as to generate problems that are difficult for state-of-the-art
non-learning planners to solve within the evaluation timeframe. The
bootstrap distribution will generate significantly easier problems, in
that they can be solved by a number of state-of-the-art planners in a
reasonable amount of time. This distribution will be used to generate
problems for the learning phase of the competition, with the idea that
they will be more tractable to solve and learn from. It is difficult in
general to specify an exact relationship between the target and
bootstrap distributions. Informally, however, the organizers will scale
up the number of objects involved in the planning problems to move from
the bootstrap to the target distribution, while keeping other problem
characteristics the same. Since the ultimate goal is to do well on the
target distribution, we plan to provide the learners with a set of
problems from both the bootstrap and the target distribution
during the learning phase. The learners are free to use problems from
either or both distributions. Naturally, the set of target problems used
in the actual evaluation will not be made available to the learners
during the learning phase.
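As an illustration of the intended relationship between the two distributions, the following minimal sketch (in Python) samples problem sizes for a hypothetical domain; the size ranges and the blocksworld-style domain are assumptions for illustration only, not the official generator settings.

import random

# Hypothetical size ranges: both distributions share the same problem
# structure and differ only in the number of objects per problem.
SIZE_RANGES = {"bootstrap": (5, 15), "target": (40, 100)}

def sample_problem(distribution, rng=random):
    # Draw the object count from the chosen distribution's range; all other
    # problem characteristics would be generated identically for both.
    low, high = SIZE_RANGES[distribution]
    return {"domain": "blocksworld", "num_blocks": rng.randint(low, high)}

print(sample_problem("bootstrap"))  # small: solvable by current planners
print(sample_problem("target"))     # large: intended to be hard within the time limit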
Learning Phase:
- The learning phase will begin after the participants
deliver the final version of their code to the organizers. At this
point the participants must freeze their code. The tentative date for
this is June 2, 2008.
- After the code freeze the organizers will distribute
the set of competition domains. Along with each domain will be a set of
30 problems drawn from the bootstrap distribution and a set of 30
problems drawn from the target distribution, which will constitute the
training set for the learning phase. The bootstrap and target problems
will be in separate directories. The participants may choose to use
either or both problem sets, or to use no example problems at all
(as in the case of pure domain analysis).
- After the domains and training problems are
distributed, each participant will run their learning algorithm on each
domain to produce a "domain-specific knowledge file" for that domain. The
knowledge files will then be sent to the organizers. The timeframe for
running the learning algorithms remains to be determined, but we expect
to provide participants with at least a week.
- The participants must run the same learner that was
submitted during the code freeze. The organizers will randomly select
domains in which to run the learning algorithms locally to ensure that
the frozen learner produces the same knowledge as submitted by
participants.
Evaluation Phase:
- The organizers will conduct the evaluation phase on
their local machines. The planners will be evaluated in each domain
while being given access to the appropriate learned knowledge file. The
evaluation will be conducted on a set of problems drawn from the target
distribution. The number of problems has not yet been determined. Also,
if time permits, planners that can run without learned knowledge files
will be evaluated on the same problem set without that knowledge. The
no-knowledge evaluation will help provide insight into the impact that
learning had for each participant. The winners, however, will be
determined based only on the results with the learned knowledge files.
- The amount of time that each planner will be given to
solve each problem remains to be determined and depends on the final
number of systems participating. The organizers will record both the
time required to solve each problem and the solution quality.
Scoring:
- Two winners will be crowned: one based on a
planning-time metric and one based on a plan-quality metric.
- Planning Time Metric:
- For a given problem let T* be the minimum time
required by any planner to solve the problem. (If no planner solves
the problem, it is ignored for evaluation.)
- A planner that solves the problem in time T will
get a score of T*/T for the problem. Those that do not solve the
problem get a score of 0.
- The planning time metric for a planner is simply
the sum of scores received over all evaluation problems.
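- For example, if the fastest planner solves a problem in 20 seconds (T* = 20) and another planner takes 80 seconds, the slower planner scores 20/80 = 0.25 on that problem, while the fastest planner scores 1.0.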
- Plan Quality Metric:
- For a given problem let N* be the minimum number
of actions in any solution returned by a participant. (If no planner
solves the problem, it is ignored for evaluation.)
- A planner that returns a solution with N actions
will get a score of N*/N for the problem. Those that do not solve the
problem get a score of 0.
- The plan quality metric for a planner is simply
the sum of scores received over all evaluation problems.
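To make the two metrics concrete, the following is a minimal sketch (in Python) of how the scores could be computed, assuming that for each planner and evaluation problem we have recorded the solving time and the plan length, or None where the planner failed; the planner names, problem names, and numbers are purely illustrative and not part of the official rules.

def competition_scores(results, key):
    # results: {planner: {problem: {"time": seconds, "length": actions} or None}}
    # key: "time" for the planning-time metric, "length" for the plan-quality metric
    problems = {p for runs in results.values() for p in runs}
    totals = {planner: 0.0 for planner in results}
    for prob in problems:
        solved = [runs[prob][key] for runs in results.values()
                  if runs.get(prob) is not None]
        if not solved:              # no planner solved it: the problem is ignored
            continue
        best = min(solved)          # T* (fastest time) or N* (shortest plan)
        for planner, runs in results.items():
            run = runs.get(prob)
            if run is not None:     # unsolved problems contribute a score of 0
                totals[planner] += best / run[key]
    return totals

# Illustrative results only; planner-A does not solve problem p02.
results = {
    "planner-A": {"p01": {"time": 20.0, "length": 40}, "p02": None},
    "planner-B": {"p01": {"time": 80.0, "length": 30}, "p02": {"time": 5.0, "length": 12}},
}
print(competition_scores(results, "time"))    # planning-time metric per planner
print(competition_scores(results, "length"))  # plan-quality metric per planner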