IOI 2010 - Home

Task Information for Language

Task Author: Gordon Cormack (CAN)

This kind of problem is new to the IOI. Its purpose is to draw attention to the field of information retrieval. The problem is discussed in detail in the book Information Retrieval: Implementing and Evaluating Search Engines by S. Büttcher, C.L.A. Clarke, and G.V. Cormack (MIT Press, to appear soon). See especially Chapter 10 on Categorization and Filtering.

One important observation is that excerpts from the same language version of Wikipedia share some statistical characteristics. Because many random excerpts are offered, the variability between excerpts from the same language plays a negligible role. It has been confirmed that the provided test input closely resembles the official grader input in a statistical sense, so behavior on one is a good predictor of behavior on the other.

Note that because the language codes and symbol codes are randomly re-coded, there is no opportunity to hard-code any specific (personal) language knowledge into a solution.

Many approaches are possible. Rocchio's method, which was informally described in the task description, suffices to solve Subtask 1.
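As an illustration only, a nearest-centroid classifier in the spirit of Rocchio's method might look like the sketch below. It is not the variant from the task description and not the official solution: the class name `CentroidClassifier`, the methods `guess` and `learn`, and the use of cosine similarity on raw symbol counts are all assumptions made for this example. The idea is to accumulate symbol counts per language and guess the language whose accumulated profile is most similar to the current excerpt.

```python
import math
from collections import Counter, defaultdict


class CentroidClassifier:
    """Hypothetical online nearest-centroid classifier over symbol counts."""

    def __init__(self):
        # language code -> symbol counts summed over every excerpt learned so far
        self.totals = defaultdict(Counter)

    def guess(self, excerpt):
        """Return the language whose profile has the highest cosine similarity."""
        counts = Counter(excerpt)
        norm_e = math.sqrt(sum(c * c for c in counts.values()))
        best_lang, best_score = 0, -1.0  # fall back to language 0 before any training
        for lang, total in self.totals.items():
            norm_t = math.sqrt(sum(v * v for v in total.values()))
            if norm_e == 0 or norm_t == 0:
                continue
            dot = sum(c * total[s] for s, c in counts.items())
            score = dot / (norm_e * norm_t)
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang

    def learn(self, excerpt, lang):
        """Fold the excerpt into the profile of its correct language."""
        self.totals[lang].update(excerpt)


# Toy usage with made-up symbol and language codes.
clf = CentroidClassifier()
clf.learn([1, 1, 2, 1], 3)
clf.learn([5, 6, 5, 5], 7)
print(clf.guess([1, 2, 2]))  # shares symbols only with language 3
```

Because cosine similarity ignores scale, the summed counts behave like a centroid without dividing by the number of excerpts seen.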

For Subtask 2, one needs to do more than simply look at symbol frequencies. Collecting statistics on bigrams (pairs of neighboring symbols) and trigrams (three consecutive symbols) yields higher accuracy.
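One way to collect such statistics is sketched below. This is an assumption-laden example rather than the official solution: the names `ngrams` and `NgramClassifier`, the set-membership scoring, and the choice to weight a match by its length are all hypothetical. It records which unigrams, bigrams, and trigrams have been seen for each language and guesses the language that has already seen the most of the excerpt's n-grams.

```python
from collections import defaultdict


def ngrams(excerpt, max_n=3):
    """Yield every run of 1..max_n consecutive symbols as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(excerpt) - n + 1):
            yield tuple(excerpt[i:i + n])


class NgramClassifier:
    """Hypothetical classifier based on previously seen n-grams."""

    def __init__(self, max_n=3):
        self.max_n = max_n
        # language code -> set of n-grams observed in that language so far
        self.seen = defaultdict(set)

    def guess(self, excerpt):
        grams = set(ngrams(excerpt, self.max_n))
        best_lang, best_score = 0, -1  # fall back to language 0 before any training
        for lang, known in self.seen.items():
            # A matching trigram counts 3, a bigram 2, a single symbol 1.
            score = sum(len(g) for g in grams if g in known)
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang

    def learn(self, excerpt, lang):
        self.seen[lang].update(ngrams(excerpt, self.max_n))


# Two toy "languages" that use the same symbols in a different order:
# symbol frequencies alone cannot separate them, but bigrams and trigrams can.
clf = NgramClassifier()
clf.learn([1, 2, 3, 1, 2, 3], 1)
clf.learn([3, 2, 1, 3, 2, 1], 2)
print(clf.guess([1, 2, 3]), clf.guess([3, 2, 1]))
```

Storing full tuples in sets keeps the sketch short. Under contest memory limits, a solution would more likely hash n-grams into a fixed-size table, which is again an assumption and not something stated in these notes.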