YouTube-VOS

A Large-Scale Benchmark for Video Object Segmentation

News

The 7th Large-scale Video Object Challenge, held in conjunction with ICCV 2025, is ongoing!
In response to numerous requests from the community, we have released the ground-truth labels for the validation sets of VOS2019 and VIS[2019, 2021, 2022] at the corresponding CodaLab download links!
Due to maintenance issues with the old CodaLab website, we have migrated the VOS and VIS evaluation servers of the 2019 challenge to the new CodaLab site.

What is YouTube-VOS

YouTube-VOS is the first large-scale benchmark that supports multiple video object segmentation tasks.

Semi-supervised Video Object Segmentation
Video Instance Segmentation
Referring Video Object Segmentation

It also has the following features.

5000+ high-resolution YouTube videos
90+ semantic categories
7800+ unique objects
190k+ high-quality manual annotations
340+ minutes duration

Research papers

Please cite the following papers if you find our dataset useful.

Semi-supervised Video Object Segmentation

Video Instance Segmentation

Video Instance Segmentation. ICCV 2019

Referring Video Object Segmentation

URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark. ECCV 2020

Dataset examples