Competition Details - 2008

Structure of the Learning Track


The general structure of the learning track is as follows. Before the competition begins, competitors will submit two programs to the organizers: a learner and a planner. The competition will then be run in two stages. First, there will be a learning stage, in which each learner will be provided with the domain definitions and example problems for every domain that will appear in the competition. For each domain, the learner will be given a certain amount of learning time, after which it must output a domain-specific control-knowledge file. Second, there will be an evaluation stage, in which, for each domain, the planner will be provided with the corresponding domain-specific control-knowledge file and asked to solve a sequence of test problems from that domain.
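
As a rough illustration of this two-stage structure, the flow for a single domain might look like the sketch below. The function names, interfaces, and file handling here are purely hypothetical; the actual invocation interface will be specified by the organizers.

    # Hypothetical sketch of the two-stage protocol for a single domain.
    # None of these names or interfaces are the official ones; they only
    # illustrate what each stage consumes and produces.

    def learning_stage(learner, domain_file, example_problems, time_limit):
        # The learner sees the domain definition and sample problems and must
        # produce a domain-specific control-knowledge (DCK) file within the limit.
        dck_file = learner.learn(domain_file, example_problems, time_limit)
        return dck_file

    def evaluation_stage(planner, domain_file, dck_file, test_problems):
        # The planner solves each test problem with access to the learned
        # knowledge; solve time and plan quality are recorded for scoring.
        results = []
        for problem in test_problems:
            plan, seconds = planner.solve(domain_file, problem, dck_file)
            results.append((problem, plan, seconds))
        return results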

The organizers are not placing any constraints on the style of learning approach that may be used. For example, a system might use statistical/inductive learning or purely deductive learning techniques. In addition, the learning track provides a good venue for approaches that might not traditionally be viewed as learning, such as pure domain analysis; for example, domain analysis could be conducted during the learning period and the resulting knowledge used during the evaluation period. Ultimately, we hope to see a wide variety of approaches that will help answer the following question: how can a planner best use a learning, or domain-analysis, period in order to improve future performance?

Evaluation Schema


The learning track will have two distinct phases: a learning phase and an evaluation phase. These phases will involve planning problems drawn from two distinct distributions: the target distribution and the bootstrap distribution.
These distributions are described below, followed by a description of the learning and evaluation phases.
This schema is not yet finalized and we welcome feedback.

Problem Distributions:

For each planning domain there will be two distinct distributions over problem instances: the target distribution and the bootstrap distribution. The ultimate goal of the competition is to learn knowledge that allows a planner to perform well on problems drawn from the target distribution. The target distributions will be designed to generate problems that are difficult for state-of-the-art non-learning planners to solve within the evaluation timeframe. The bootstrap distribution will generate significantly easier problems, in the sense that they can be solved by a number of state-of-the-art planners in a reasonable amount of time. This distribution will be used to generate problems for the learning phase of the competition, with the idea that such problems will be more tractable to solve and learn from. It is difficult in general to specify an exact relationship between the target and bootstrap distributions; informally, however, the organizers will scale the number of objects involved in the planning problems to move from the bootstrap to the target distribution, while keeping other problem characteristics the same. Since the ultimate goal is to do well on the target distribution, we plan to provide the learners with a set of problems from both the bootstrap and target distributions during the learning phase. The learners are free to use problems from either or both distributions. Naturally, the set of target problems used in the actual evaluation will not be made available to the learners during the learning phase.
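
As a purely illustrative example of this relationship (the real generators emit PDDL problem files, and the actual object-count ranges will be chosen by the organizers), the two distributions for a domain might differ only in instance size:

    import random

    # Illustrative only: the bootstrap and target distributions share all
    # problem characteristics and differ solely in the number of objects.
    BOOTSTRAP_OBJECTS = (5, 15)     # small enough for current non-learning planners
    TARGET_OBJECTS = (50, 150)      # hard to solve within the evaluation timeframe

    def sample_instance_size(distribution, rng=random):
        low, high = BOOTSTRAP_OBJECTS if distribution == "bootstrap" else TARGET_OBJECTS
        return rng.randint(low, high)   # number of objects in the generated problem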

Learning Phase:

  • The learning phase will begin after the participants deliver the final version of their code to the organizers. At this point the participants must freeze their code. The tentative date for this is June 2, 2008.
  • After the code freeze the organizers will distribute the set of competition domains. Along with each domain will be a set of 30 problems drawn from the bootstrap distribution and a set of 30 problems drawn from the target distribution, which will constitute the training set for the learning phase. The bootstrap and target problems will be in separate directories. The participants may choose to use either or both problem sets, or choose to not use any example problems (in the case of domain analysis).
  • After the domains and training problems are distributed each participant will run their learning algorithm for each domain to produce a "domain-specific knowledge file" for each one. The knowledge files will then be sent to the organizers. The timeframe for running the learning algorithms remains to be determined, but we expect to provide participants with at least a week.
  • The participants must run the same learner that was submitted during the code freeze. The organizers will randomly select domains in which to run the learning algorithms locally to ensure that the frozen learner produces the same knowledge as submitted by participants.

Evaluation Phase:

  • The organizers will conduct the evaluation phase on their local machines. The planners will be evaluated in each domain while being given access to the appropriate learned knowledge file. The evaluation will be conducted on a set of problems drawn from the target distribution; the number of problems has not yet been determined. Also, if time permits, planners that can run without learned knowledge files will be evaluated on the same problem set without that knowledge. The no-knowledge evaluation will help provide insight into the impact that learning had for each participant. The winners, however, will be determined based only on the results with the learned knowledge files.
  • The amount of time that each planner will be given to solve each problem remains to be determined and depends on the final number of participating systems. The organizers will record both the time required to solve each problem and the solution quality.

Scoring:

  • Two winners will be crowned: one based on a planning-time metric and one based on a plan-quality metric. A small scoring sketch follows this list.
  • Planning Time Metric:
    • For a given problem, let T* be the minimum time required by any planner to solve the problem. (If no planner solves the problem, it is ignored for evaluation.)
    • A planner that solves the problem in time T will get a score of T*/T for the problem. Those that do not solve the problem get a score of 0.
    • The planning time metric for a planner is simply the sum of scores received over all evaluation problems.
  • Plan Quality Metric:
    • For a given problem, let N* be the minimum number of actions in any solution returned by a participant. (If no planner solves the problem, it is ignored for evaluation.)
    • A planner that returns a solution with N actions will get a score of N*/N for the problem. Those that do not solve the problem get a score of 0.
    • The plan quality metric for a planner is simply the sum of scores received over all evaluation problems.
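
A minimal sketch of how both metrics could be tallied is given below. The results table used here (mapping each planner and problem to either None, meaning unsolved, or a (solve_time, plan_length) pair) is an assumption for illustration, not the organizers' actual evaluation harness.

    # Hypothetical scoring sketch. results[planner][problem] is None if the
    # planner failed to solve the problem, otherwise (solve_time, plan_length).
    def metric_scores(results, index):
        # index 0 scores planning time (T*/T); index 1 scores plan quality (N*/N).
        problems = {p for per_planner in results.values() for p in per_planner}
        totals = {planner: 0.0 for planner in results}
        for p in problems:
            solved = {planner: r[p][index]
                      for planner, r in results.items() if r.get(p) is not None}
            if not solved:
                continue                      # unsolved by every planner: ignored
            best = min(solved.values())       # T* (or N*) for this problem
            for planner, value in solved.items():
                totals[planner] += best / value   # non-solvers implicitly add 0
        return totals

    # Worked example: planner A solves problem p1 in 10 s with 20 actions,
    # planner B in 40 s with 25 actions, and planner C fails to solve it.
    example = {
        "A": {"p1": (10.0, 20)},
        "B": {"p1": (40.0, 25)},
        "C": {"p1": None},
    }
    print(metric_scores(example, 0))   # time scores:    {'A': 1.0, 'B': 0.25, 'C': 0.0}
    print(metric_scores(example, 1))   # quality scores: {'A': 1.0, 'B': 0.8, 'C': 0.0}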

 
