2.1. API Reference for Participants

Note

Please refer to the participant guide before reading this.

We provide a basic API that participants of TREC OpenSearch can use to perform several actions, such as obtaining a key, queries, documents, and feedback. The API can also be used to update runs. Everything is implemented as HTTP requests, using the request types GET, HEAD, PUT, and DELETE. We try to return appropriate 4XX errors where possible, and the content the API returns with an error should help locate the issue. Please let us know when error messages are not helpful and need clarification.

For all operations, an API key is required. This key is supplied as the username via HTTP basic authentication; the password should be left empty. We also require you to sign an agreement; details on that process will be shared once you sign up.

Note that participants are free to implement their own client to communicate with this API. An example client is provided in the participant guide under Implement a Client.

Our API is located at http://api.trec-open-search.org/api/v2.

Note

We have rate-limited the API to 300 calls per minute or 10 calls per second, whichever limit is hit first. Please do let us know if this is causing you any problems.

Note

We may sometimes restart our API. You may notice this because the API is down for a few seconds (up to a few minutes). Please implement your client in such a way that this will not cause problems (e.g., add a retry loop with a small sleep to all API calls).
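
A minimal Python client sketch along these lines (this is not the official example client from the participant guide; the requests library is assumed to be installed, and the API key is a placeholder):

import time

import requests

API_BASE = "http://api.trec-open-search.org/api/v2"
API_KEY = "YOUR-API-KEY"  # placeholder: use the key you obtained when signing up


def api_call(method, endpoint, json=None, retries=5, sleep=2):
    """Perform an API call, retrying while the API is briefly down."""
    url = API_BASE + endpoint
    for _ in range(retries):
        try:
            # The API key is sent as the username; the password is empty.
            response = requests.request(method, url, json=json,
                                        auth=(API_KEY, ""), timeout=10)
            if response.status_code < 500:
                return response
        except requests.ConnectionError:
            pass  # the API may be restarting; retry after a short sleep
        time.sleep(sleep)
    raise RuntimeError("API call failed after %d retries: %s" % (retries, url))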

2.1.1. Query

From each site that a participant signed up for (see the sites page for your user account on the dashboard), a sample of N=100 queries is made available. This endpoint allows for downloading these queries.

After the train phase, new queries (and doclists) will be made available.

Note

We kindly ask you not to enter any of the provided queries into the search engines for testing purposes (unless, of course, you have an actual information need that translates into one of the queries). As we are not aware of the IP addresses you may use for these requests, we have no means of filtering such queries out. In particular, for the smaller engines, such test queries might severely impact the usefulness of our challenge. We will, however, monitor for strange behavior.

GET /api/v2/participant/query

Obtain the query set for all sites that you have agreed to. If you update the sites you agree to through the dashboard, the query set will reflect this.

Each query is marked with its type. A query can be a train, test, or eval query. Eval queries are not evaluated online, so participants should not expect any feedback for them. The default query type is “train”.

Return:
{
    "queries": [
        {
            "creation_time": "Mon, 10 Nov 2014 17:42:24 -0000",
            "qid": "S-q1",
            "qstr": "jaguar",
            "type": "train"
        },
        {
            "creation_time": "Mon, 10 Nov 2014 17:42:24 -0000",
            "qid": "S-q2",
            "qstr": "apple",
            "type": "test"
        }
    ]
}
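
With the client sketch from the introduction, downloading the queries and grouping them by type might look like this (field names are as in the example return value above):

queries = api_call("GET", "/participant/query").json()["queries"]
by_type = {}
for query in queries:
    by_type.setdefault(query["type"], []).append(query)
print(len(by_type.get("train", [])), "train queries")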

2.1.2. Doclist

For each query, there is a fixed set of documents available. These documents are selected by the site, and this selection may change over time. Therefore, participants should update the doclist for a query on a regular (e.g., daily) basis.

For some use cases, the doclist will contain relevance signals (also referred to as features or ranking signals). These are always sparse representations; missing values can be assumed to be zero. The relevance signals can be query-only, document-only, or query-document dependent. For these use cases, the actual query and document content are generally not provided. Use cases that do not have relevance signals will need to provide query and document content.

GET /api/v2/participant/doclist/(qid)

Retrieve the document list for a query.

This doclist defines the set of documents that are returnable for a query.

Note

This document list may change over time.

Parameters:
  • qid – the query identifier
Return:
{
    "qid": "S-q22",
    "doclist": [
        {"docid": "S-d3"},
        {"docid": "S-d5"},
        {"docid": "S-d10"},
        ...
    ]
}

For use cases with relevance signals, the returned data looks like this:

Return:
{
    "qid": "S-q22",
    "doclist": [
        {"docid": "S-d3",
         "relevance_signals": [[1, .6], [4, .83]]},
        {"docid": "S-d5",
         "relevance_signals": [[3, .45], [4, .83]]},
        {"docid": "S-d10",
         "relevance_signals": [[1, .1], [4, .25]]},
        ...
    ]
}
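
Since the relevance signals are sparse [index, value] pairs, a client will usually expand them into dense vectors before feeding them to a ranker. A sketch, where NUM_SIGNALS is an assumed dimensionality (the actual number of signals depends on the use case, and 0-based indexing is assumed):

NUM_SIGNALS = 10  # assumption: the real dimensionality depends on the use case

def to_dense(relevance_signals):
    """Expand sparse [index, value] pairs; missing values are zero."""
    vector = [0.0] * NUM_SIGNALS
    for index, value in relevance_signals:
        vector[index] = value  # assumes 0-based signal indices
    return vector

doclist = api_call("GET", "/participant/doclist/S-q22").json()["doclist"]
features = {d["docid"]: to_dense(d.get("relevance_signals", []))
            for d in doclist}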

2.1.3. Doc

When a use case does not define relevance signals for each query-document pair, this is where the content of documents is made available.

GET /api/v2/participant/doc/(docid)

Retrieve a single document.

Note

Documents may change over time, which is currently reflected by a changing creation_time (documents are overwritten when they change, hence the changing creation time). Documents can even be deleted; requesting a deleted document results in a 404.

Parameters:
  • docid – the document identifier
Return:
{
     "content": {"description": "Lorem ipsum dolor sit amet",
                 "short_description": "Lorem ipsum",
                 ...},
     "creation_time": "Sun, 27 Apr 2014 23:40:29 -0000",
     "docid": "S-d1",
     "title": "Document Title",
     "site_id": "S"
}
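
Retrieving a document with the client sketch, treating a 404 as a deletion:

response = api_call("GET", "/participant/doc/S-d1")
if response.status_code == 404:
    print("S-d1 was deleted")  # deleted documents return a 404
else:
    doc = response.json()
    print(doc["docid"], doc["title"], doc["creation_time"])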
GET /api/v2/participant/docs

Retrieve all documents.

Note

Documents may change over time, which is currently reflected by a changing creation_time (documents are overwritten when they change, hence the changing creation time).

Return:
{"docs": [
    {
     "content": {"description": "Lorem ipsum dolor sit amet",
                 "short_description": "Lorem ipsum",
                 ...},
     "creation_time": "Sun, 27 Apr 2014 23:40:29 -0000",
     "docid": "S-d1",
     "title": "Document Title",
     "site_id": "S"
    }, ...]
}
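
Because documents change over time, a client may want to refresh its local copy periodically. A small sketch that indexes all documents by their identifier:

docs = api_call("GET", "/participant/docs").json()["docs"]
docs_by_id = {doc["docid"]: doc for doc in docs}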

2.1.4. Run

Runs (in TREC terminology) are simply rankings of document IDs as shown to actual users. Participants can keep updating their runs. They also have the option of updating an identifier for the run; this identifier is then used in the feedback that is returned.

PUT /api/v2/participant/run/(qid)

Submit a run (ranking) for a specific query. Note that the runid is used to identify queries within a specific run. It can be any string; you may want to use the version of your ranker or your team name. Submitting a run for a query with the same runid twice will overwrite it and save only the most recent one. You can view and activate your runs in the dashboard after submitting them via the API. Activating a run is required before your runs show up on the website, so do not forget to do that.

For test queries, a run can only be uploaded outside of test periods. An exception to this rule applies if you have never uploaded a run for a test query: in that case, it can be uploaded once during a test period, to allow participants to join at any moment. See the test periods here: http://living-labs.net/challenge/.

Parameters:
  • qid – the query identifier
Content:
{
    "qid": "U-q22",
    "runid": "82",
    "doclist": [
        {
            "docid": "U-d4"
        },
        {
            "docid": "U-d2"
        }, ...
    ]
}
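
Submitting such a run with the client sketch (the runid here is just an example string):

run = {
    "qid": "U-q22",
    "runid": "my-team-v1",  # any string, e.g., ranker version or team name
    "doclist": [{"docid": docid} for docid in ["U-d4", "U-d2"]],
}
response = api_call("PUT", "/participant/run/U-q22", json=run)
response.raise_for_status()  # a 4XX response indicates a problem with the run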

2.1.5. Feedback

DELETE /api/v2/participant/feedback/(qid)/(runid)
DELETE /api/v2/participant/feedback/(qid)

Remove feedback for a query. Only your own feedback will be removed.

Parameters:
  • qid – the query identifier
GET /api/v2/participant/feedback/(qid)/(runid)
GET /api/v2/participant/feedback/(qid)

Obtain feedback for a query. Only feedback for runs you submitted will be returned. So, first submit a run and wait a while to give a user the chance to enter the query for which you submitted the run. Then wait even longer to give the site the chance to feed the clicks back into our API. As soon as all this has happened, the feedback becomes available here.

You may specify “all” as the query identifier to obtain feedback for all queries.

Note that you may receive multiple feedback items for a single query, as it may have been shown to a user more than once. Even if you specify a runid, the rankings for this runid may have been presented to users multiple times.

Clicks can be either just present (true) or timestamped (a list of timestamps).

Feedback is never given for test queries.

Parameters:
  • qid – the query identifier, can be “all”
  • runid – optional, the runid
Return:
{
    "feedback": [
        {"qid": "S-q1",
         "runid": "baseline",
         "modified_time": "Sun, 27 Apr 2014 13:46:00 -0000",
         "doclist": [
             {"docid": "S-d1",
              "clicked": true},
             {"docid": "S-d2"},
             ...
         ]},
         ...
    ]
}
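
A sketch of aggregating this feedback into per-document click counts with the client from the introduction (since "clicked" may be true or a list of timestamps, it is only tested for truthiness):

from collections import Counter

feedback = api_call("GET", "/participant/feedback/S-q1").json()["feedback"]
impressions = Counter()
clicks = Counter()
for item in feedback:
    for doc in item["doclist"]:
        impressions[doc["docid"]] += 1
        if doc.get("clicked"):  # true or a non-empty list of timestamps
            clicks[doc["docid"]] += 1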

If Team Draft Interleaving was performed at the site, this is encoded as follows.

Return:
{
    "feedback": [
        {
            "qid": "S-q1",
            "runid": "baseline",
            "type": "tdi",
            "doclist": [
                {
                    "docid": "S-d1",
                    "clicked": true,
                    "team": "site"
                },
                {
                    "docid": "S-d4",
                    "clicked": true,
                    "team": "participant"
                }
            ]
        }, ...]
}
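
Anticipating the outcome computation in the next section, a sketch of deriving a win, loss, or tie from one interleaved feedback item:

def tdi_result(feedback_item):
    """Return 'win', 'loss', or 'tie' for one interleaved impression."""
    participant_clicks = site_clicks = 0
    for doc in feedback_item["doclist"]:
        if doc.get("clicked"):
            if doc["team"] == "participant":
                participant_clicks += 1
            elif doc["team"] == "site":
                site_clicks += 1
    if participant_clicks > site_clicks:
        return "win"
    if participant_clicks < site_clicks:
        return "loss"
    return "tie"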

2.1.6. Outcome

GET /api/v2/participant/outcome/(qid)
GET /api/v2/participant/outcome

Obtain outcome (optionally for a query).

You may omit the query identifier or specify “all” to obtain the outcome for all queries.

Outcomes are aggregated per site and separately for test and train queries. If you specify a query, you will only obtain the outcome for this query; otherwise, outcomes are aggregated over all queries.

The “outcome” is computed as #wins / (#wins + #losses), where a win is defined as the participant having more clicks on documents assigned to it by Team Draft Interleaving than clicks on documents assigned to the site. Ties do not enter the computation.

Outcome for test queries is restricted to the test period. Train queries are not restricted in time. See http://living-labs.net/challenge/.

Parameters:
  • qid – optional, the query identifier, can be “all”
Return:
{
    "outcomes": [
        {
            "type": "test",
            "test_period": {
                "start": "Fri, 01 May 2015 00:00:00 -0000",
                "end": "Sat, 16 May 2015 00:00:00 -0000",
                "name": "CLEF LL4IR Round #1"
            },
            "impressions": 10,
            "losses": 3,
            "ties": 5,
            "wins": 2,
            "outcome": "0.4"
        },
        {
            "qid": "all",
            "site_id": "S",
            "type": "train",
            "test_period": null,
            "impressions": 10,
            "losses": 3,
            "ties": 5,
            "wins": 2,
            "outcome": "0.4"
        },
        ...
    ]
}
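
To check the formula against the example above: with 2 wins, 3 losses, and 5 ties, the outcome is 2 / (2 + 3) = 0.4, and the 5 ties are ignored. In code:

wins, losses, ties = 2, 3, 5  # values from the example return above
outcome = wins / (wins + losses)  # ties are excluded by the formula
assert outcome == 0.4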

2.1.7. Historical Feedback

GET /api/v2/participant/historical/(qid)

Obtain historical feedback for a query. Historical feedback is always in the form of an aggregated click-through rate (CTR).

You may specify “all” as the query identifier to obtain feedback for all queries.

Parameters:
  • qid – the query identifier, can be “all”
Return:
{
    "feedback": [
        {"qid": "S-q1",
         "modified_time": "Sun, 27 Apr 2014 13:46:00 -0000",
         "type": "ctr",
         "doclist": [
             {"docid": "S-d1",
              "clicked": 0.6},
             {"docid": "S-d2"},
             ...
         ]},
         ...
    ]
}
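
A simple use of this endpoint is a historical-CTR baseline: rank a query's documents by their aggregated CTR and submit the result as a run. A sketch (the runid is an example string; documents without a "clicked" value are treated as CTR 0):

historical = api_call("GET", "/participant/historical/S-q1").json()["feedback"]
doclist = historical[0]["doclist"]
ranked = sorted(doclist, key=lambda d: d.get("clicked", 0.0), reverse=True)
run = {"qid": "S-q1",
       "runid": "ctr-baseline",  # example identifier
       "doclist": [{"docid": d["docid"]} for d in ranked]}
api_call("PUT", "/participant/run/S-q1", json=run)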