Could you provide the video-level inference code?