Our method consists of two major modules: a mask propagation module and a re-identification module. The mask propagation module is built on FlowNet2 and takes multiple previous frames and their corresponding masks as guidance to train a flow-appearance propagation network that propagates masks between adjacent frames. For the re-identification module, we train a network that produces a 128-dimensional embedding for each pixel, following "Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning". For any two foreground objects, we count the pixels whose embedding distance to the foreground object in the first frame is smaller than their distance to the background pixels; this count measures the similarity of the two objects. During inference, the mask propagated from the previous frame and the masks generated by Mask R-CNN are treated as candidate masks. The FlowNet2 and re-identification modules are then used to compute motion and appearance consistency scores for each candidate, and the mask with the highest score is taken as the prediction for the current frame. No online training is used in our method. Our method achieves 69.7% on the validation set and 69.9% on the test set.
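
To make the re-identification similarity concrete, the sketch below counts the pixels of a candidate object whose distance in the 128-dimensional embedding space to the reference object's pixels is smaller than to the background pixels. The function name, the NumPy implementation, and the use of nearest-neighbour distance to each reference set are assumptions for illustration; the text does not specify these details.

```python
import numpy as np

def reid_similarity(query_emb, fg_ref, bg_ref):
    """Count query pixels closer (in embedding space) to the reference
    foreground than to the reference background.

    query_emb : (N, 128) embeddings of the candidate object's pixels
    fg_ref    : (F, 128) embeddings of the reference object's pixels (first frame)
    bg_ref    : (B, 128) embeddings of the reference background pixels

    Note: nearest-neighbour distance to each reference set is an assumed
    choice; the original text only says "distance to the foreground object".
    """
    # Nearest-neighbour distance from each query pixel to each reference set
    d_fg = np.min(np.linalg.norm(query_emb[:, None] - fg_ref[None], axis=-1), axis=1)
    d_bg = np.min(np.linalg.norm(query_emb[:, None] - bg_ref[None], axis=-1), axis=1)
    # A pixel "votes" for the reference object when it is closer to the
    # foreground embeddings than to the background embeddings
    return int(np.sum(d_fg < d_bg))
```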
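
A minimal sketch of the candidate-selection step during inference is given below. The text states only that motion and appearance consistency are computed for each candidate and the highest-scoring mask is chosen; the weighted-sum combination with `alpha` and the callables `score_motion` and `score_appearance` are hypothetical placeholders.

```python
def select_mask(candidates, propagated_mask, score_motion, score_appearance, alpha=0.5):
    """Pick the candidate with the highest combined consistency score.

    candidates      : binary masks (the propagated mask plus Mask R-CNN proposals)
    propagated_mask : mask warped from the previous frame via FlowNet2
    score_motion    : callable scoring agreement with the propagated mask
    score_appearance: callable scoring re-identification similarity
    alpha           : assumed weighting between the two terms (not given in the text)
    """
    best_mask, best_score = None, -float("inf")
    for mask in candidates:
        s = alpha * score_motion(mask, propagated_mask) \
            + (1 - alpha) * score_appearance(mask)
        if s > best_score:
            best_mask, best_score = mask, s
    return best_mask
```

Because the propagated mask is itself among the candidates, frames where Mask R-CNN produces no good proposal can fall back to pure propagation.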