Video object segmentation has been studied extensively in the past decade due to its importance in understanding video spatial-temporal structures as well as its value in industrial applications. Recently, data-driven algorithms (e.g. deep learning) have become the dominant approach to computer vision problems, and one of the most important keys to their success is the availability of large-scale datasets. Unfortunately, there does not exist a large-scale dataset for video object segmentation. As a result, recent algorithms for video object segmentation have to train their models on image datasets, which do not contain any temporal information or video-specific characteristics, and thus settle for suboptimal solutions. Besides, previous benchmarks do not differentiate the object categories used in training and testing, and therefore cannot evaluate the true generalization ability of algorithms on unseen categories.

In this workshop, held in conjunction with a competition, we will present the first large-scale dataset for video object segmentation, which will allow participating teams to try novel and bold ideas that could not succeed with previous small-scale datasets. In contrast to previous video object segmentation datasets, our dataset has the following advantages:

  • Our dataset contains 4000+ high-resolution video clips, which are downloaded from YouTube and contain diverse content. It is more than 30 times larger than the largest existing dataset (i.e. DAVIS) for video object segmentation.
  • Our dataset consists of a total of 94 object categories which cover common objects such as animals, vehicles, accessories and persons in different activities.
  • The videos in our dataset are taken by both amateurs and professionals. Therefore, in addition to varied object motion, there is frequently significant camera motion.
  • Our segmentation masks are carefully labeled by human annotators to ensure high quality.

We expect that our new dataset will open new possibilities for generating novel ideas for dense-prediction video tasks, as well as provide a more comprehensive evaluation methodology for video segmentation technology.

Our workshop is co-located with another video segmentation workshop, “The Third International Workshop on Video Segmentation”. The other workshop will discuss a wide range of topics in video segmentation. You are welcome to attend their workshop as well, in the morning session.



  • Sep 14th: The workshop begins.
  • Aug 27th: The final competition results will be announced and high-performing teams will be invited.
  • Aug 10th-25th: Release the test dataset and open the submission of the test results.
  • Jul 18th: Set up the submission server on CodaLab and open the submission of the validation results.
  • Jun 18th: Release the training and validation dataset.


Ning Xu
Adobe Research
Linjie Yang
Snap Research
Yuchen Fan
Brian Price
Adobe Research
Weiyao Lin
Michael Ying Yang
U of Twente
Jianchao Yang
ByteDance AI Lab
Jiebo Luo
U of Rochester
Thomas S. Huang


Snap Adobe UIUC Chinese Academy of Sciences