We investigate the performance of a simple online tracker when paired with a strong instance segmentation network. We choose the Hybrid Task Cascade (HTC) network as our instance segmentation model because it achieves top results on COCO. Starting from an HTC network trained on COCO, we fine-tune on YTVIS with the backbone frozen, updating only the FPN neck and the output heads. We also use multi-scale training and image flipping as training-time augmentations. Note that our method uses no test-time tricks (ensembling, image flipping, multi-scale testing, etc.) during inference. Our simple online tracker first computes a matching score between each detection in the current frame and each previous track; candidate detections are then greedily assigned to previous tracks. The matching score is computed from the IoU, the category prediction, and the detection score.
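
The greedy assignment step described above can be illustrated with a minimal sketch. It is not the exact implementation: the text does not specify how the IoU, category prediction, and detection score are combined, so the multiplicative score, the `category_mismatch_factor` discount, the `min_score` threshold, and the use of box IoU (rather than mask IoU) are all assumptions made for illustration. The names `Detection`, `matching_score`, and `greedy_assign` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box
    category: int
    score: float  # detection confidence in [0, 1]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matching_score(det: Detection, track: Detection,
                   category_mismatch_factor: float = 0.5) -> float:
    """Assumed combination: IoU x detection score, discounted when the
    predicted category differs from the track's category."""
    category_term = 1.0 if det.category == track.category else category_mismatch_factor
    return box_iou(det.box, track.box) * det.score * category_term


def greedy_assign(detections: List[Detection],
                  tracks: Dict[int, Detection],
                  min_score: float = 0.1) -> Dict[int, int]:
    """Greedily match detections to tracks, best score first.

    `tracks` maps a track id to the last detection assigned to that track.
    Returns a mapping from detection index to track id; detections that
    are left out of the mapping were not matched to any previous track.
    """
    pairs = [
        (matching_score(det, trk), d_idx, t_id)
        for d_idx, det in enumerate(detections)
        for t_id, trk in tracks.items()
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)

    assignment: Dict[int, int] = {}
    used_tracks = set()
    for score, d_idx, t_id in pairs:
        if score < min_score:
            break  # all remaining pairs score even lower
        if d_idx in assignment or t_id in used_tracks:
            continue  # each detection and each track is used at most once
        assignment[d_idx] = t_id
        used_tracks.add(t_id)
    return assignment
```

In this sketch, how unmatched detections are handled (for example, starting new tracks) is left to the caller, since the text does not describe it. Greedy assignment is chosen over an optimal bipartite matching only because that is what the description states; it processes candidate pairs in descending score order and never revisits an earlier decision.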