We use a combined approach of meta-learning and mask propagation for the competition. The target objects in a video are tracked by a neural network fine-tuned on the given first-frame annotation. This fine-tuning is fast and effective because the network is first meta-trained with the Reptile meta-learning framework. The output mask of the fine-tuned network is then refined by a second network whose input is the channel-wise concatenation of the previous frame, the previous frame's prediction, the current frame, and the current frame's prediction from the fine-tuned network. Both networks are based on a PSPNet pretrained on PASCAL VOC 2012. For training, the video frames are preprocessed with various augmentations (cropping, reflection, etc.) and resized to 600×1200.
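As a rough illustration of the meta-training step, the sketch below shows one Reptile outer-loop update in PyTorch, where a "task" is a handful of (frame, mask) pairs sampled from one training video. The hyper-parameters, loss, and data loading are illustrative assumptions, not the exact settings used here.

```python
import copy
import torch

def reptile_step(meta_model, task_loader, loss_fn,
                 inner_lr=1e-3, meta_lr=0.1, inner_steps=5):
    """One Reptile outer-loop update (sketch; names and values are illustrative)."""
    # Clone the meta-parameters and adapt the clone to the sampled video with plain SGD.
    adapted = copy.deepcopy(meta_model)
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    batches = iter(task_loader)  # assumed to yield at least `inner_steps` batches
    for _ in range(inner_steps):
        frames, masks = next(batches)
        opt.zero_grad()
        loss_fn(adapted(frames), masks).backward()
        opt.step()
    # Reptile update: move the meta-parameters a fraction of the way
    # toward the parameters adapted to this task.
    with torch.no_grad():
        for p_meta, p_task in zip(meta_model.parameters(), adapted.parameters()):
            p_meta.add_(meta_lr * (p_task - p_meta))
```

At test time the same cheap inner loop is what fine-tunes the meta-trained network on the first-frame annotation of an unseen video.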
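The refinement network's input can be assembled by a simple channel-wise concatenation. The sketch below assumes batched RGB frames and soft single-channel masks, which is one plausible shaping; the exact tensor layout is an assumption.

```python
import torch

def refinement_input(prev_frame, prev_pred, cur_frame, cur_pred):
    """Build the refinement network input (sketch).

    prev_frame, cur_frame: (B, 3, H, W) RGB tensors.
    prev_pred, cur_pred:   (B, 1, H, W) soft masks; cur_pred is the output of
                           the fine-tuned network (shapes are assumptions).
    """
    # Channel-wise concatenation -> (B, 8, H, W); the refinement network's
    # first convolution must accept this channel count.
    return torch.cat([prev_frame, prev_pred, cur_frame, cur_pred], dim=1)
```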
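A minimal sketch of the kind of paired augmentation described above (crop, reflection, resize to 600×1200), assuming a recent torchvision; the crop size and transform parameters are illustrative, and the actual pipeline may differ.

```python
import random
import torchvision.transforms as transforms
import torchvision.transforms.functional as TF
from torchvision.transforms import InterpolationMode

def augment_pair(image, mask, out_size=(600, 1200)):
    """Apply the same random crop/reflection to a frame and its mask, then resize."""
    # Random horizontal reflection applied jointly to frame and mask.
    if random.random() < 0.5:
        image, mask = TF.hflip(image), TF.hflip(mask)
    # Random crop with a shared window (crop size is an illustrative choice).
    i, j, h, w = transforms.RandomCrop.get_params(image, output_size=(480, 960))
    image = TF.crop(image, i, j, h, w)
    mask = TF.crop(mask, i, j, h, w)
    # Resize to the fixed training resolution; nearest keeps mask labels discrete.
    image = TF.resize(image, list(out_size))
    mask = TF.resize(mask, list(out_size), interpolation=InterpolationMode.NEAREST)
    return image, mask
```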