YouCook2 Dataset


Submit your results on CodaLab (captioning) and CodaLab (grounding)!

Evaluation Metric

We use the same evaluation code as here. We evaluate the model on both localizing and describing events. The metric first finds the proposals whose temporal IoU (tIoU) with any ground-truth (GT) segment is higher than a threshold (in our case 0.3, 0.5, 0.7, and 0.9). It then measures the quality of each matched proposal's caption against the GT caption (e.g., with BLEU and METEOR). Proposals without sufficient overlap receive language scores of 0. Up to 1000 proposals are considered. For more details, please refer to the original paper.
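The matching step described above can be sketched as follows. This is a minimal illustration and not the official evaluation code: the function names `tiou` and `matched_pairs` are hypothetical, segments are assumed to be `[start, end]` pairs in seconds, and the strict "higher than" comparison follows the wording above (the official script may treat ties differently).

```python
# Thresholds listed in the metric description above.
THRESHOLDS = (0.3, 0.5, 0.7, 0.9)


def tiou(seg_a, seg_b):
    """Temporal IoU of two [start, end] segments (standard definition)."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0


def matched_pairs(proposals, gt_segments, threshold):
    """Return (proposal_index, gt_index) pairs whose tIoU is above threshold.

    Only these pairs would go on to caption scoring (BLEU, METEOR, ...);
    a proposal that matches no GT segment gets a language score of 0.
    """
    return [
        (i, j)
        for i, prop in enumerate(proposals)
        for j, gt in enumerate(gt_segments)
        if tiou(prop, gt) > threshold
    ]


if __name__ == "__main__":
    proposals = [[70.0, 110.0], [0.0, 20.0]]   # hypothetical predictions
    gt_segments = [[74.59, 109.14]]            # hypothetical GT segment
    for t in THRESHOLDS:
        print(t, matched_pairs(proposals, gt_segments, t))
```

Scoring the captions themselves is left out here, since it depends on the language-metric implementation bundled with the evaluation code.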

Submission Format

Please create a zip file containing only a single file named result.json (the name must match exactly).
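Packaging a captioning submission could look roughly like the sketch below. It rests on several assumptions: the field names mirror the captioning example in this section, the archive is a zip file (the usual CodaLab upload format), and the helper names `check_results` and `write_submission` as well as the archive name `submission.zip` are hypothetical.

```python
import json
import os
import zipfile

# Hypothetical submission; field names follow the captioning example.
submission = {
    "version": "VERSION 1.0",
    "results": {
        "v_xHr8X2Wpmno": [
            {
                "sentence": "add some salt and pepper to the bowl and mix",
                "timestamp": [74.59, 109.14],
            }
        ]
    },
    "external_data": {"used": False, "details": ""},
}


def check_results(sub):
    """Basic sanity checks; the server-side validation may differ."""
    for video_id, events in sub["results"].items():
        for event in events:
            start, end = event["timestamp"]
            if not isinstance(event["sentence"], str):
                raise ValueError(f"{video_id}: sentence must be a string")
            if start > end:
                raise ValueError(f"{video_id}: start is after end")


def write_submission(sub, out_dir="."):
    """Write result.json and wrap it, alone, in a zip archive."""
    check_results(sub)
    json_path = os.path.join(out_dir, "result.json")
    zip_path = os.path.join(out_dir, "submission.zip")  # assumed name
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(sub, f)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps the file at the archive root with the exact name.
        zf.write(json_path, arcname="result.json")
    return zip_path
```

Calling `write_submission(submission)` writes both files into the current directory; the `arcname` argument ensures the archive contains exactly one entry named result.json regardless of `out_dir`.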

An example of the JSON format for the captioning task:

  {
    "version": "VERSION 1.0",
    "results": {
      "v_xHr8X2Wpmno": [
        {
          "sentence": "add some salt and pepper to the bowl and mix",
          "timestamp": [74.59, 109.14]
        }
      ]
    },
    "external_data": {
      "used": true,
      "details": "First fully-connected layer from VGG-16 pre-trained on ILSVRC-2012 training set"
    }
  }

An example of the JSON format for the grounding task:

  "database": {
    "recipe_type": "201",
    "segments": {
      "0": {
        "objects": [{
          "label": "chicken",
          "boxes": [{
            "occluded": 0,
            "outside": 0,
            "xbr": 326,
            "ybr": 288,
            "xtl": 38,
            "ytl": 19}]