The basic task is:
- to receive a video track and an audio track
- to insert a certain length of black at the start of the video track
- if the audio is longer, to freeze the last frame of the video track until the audio has completed
- to insert content into the audio track
- to merge the audio track into the video track

There are a couple of complications:
- sometimes we receive just a video track, in which case we have to split off the audio stream and use that
- the content is ultrasonic, so it's important this information is not lost (I believe we can use .aac with .mp4)
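
As a rough illustration of the video-side steps above, here is a minimal Python sketch that only builds an ffmpeg command line; it does not run anything. It makes several assumptions that are not in the brief: ffmpeg is the tool used, the black lead-in and the last-frame freeze are done with ffmpeg's `tpad` filter, the audio starts at time zero alongside the black section (it is not delayed), and all durations are already known in seconds. The names `freeze_duration` and `build_ffmpeg_args` are hypothetical. The "insert content into the audio track" step is left out because the brief does not say how the content is combined with the existing audio.

```python
from typing import List, Optional


def freeze_duration(video_s: float, audio_s: float, black_s: float) -> float:
    """Seconds the last frame must be held so the picture lasts as long as the audio.

    Assumption: the audio is compared against the video *after* the black
    lead-in has been added, and the audio itself is not shifted.
    """
    padded_video_s = black_s + video_s
    return max(0.0, audio_s - padded_video_s)


def build_ffmpeg_args(
    video_path: str,
    audio_path: Optional[str],
    out_path: str,
    video_s: float,
    audio_s: float,
    black_s: float,
    audio_cutoff_hz: Optional[int] = None,
) -> List[str]:
    """Build (but do not run) an ffmpeg argument list for the pad-and-merge step."""
    hold_s = freeze_duration(video_s, audio_s, black_s)

    # tpad: add black frames at the start, optionally clone the last frame at the end.
    tpad = f"tpad=start_mode=add:start_duration={black_s}:color=black"
    if hold_s > 0:
        tpad += f":stop_mode=clone:stop_duration={hold_s}"

    if audio_path is None:
        # Only a video file was supplied: take the audio stream from that same file.
        inputs = ["-i", video_path]
        audio_map = "0:a:0"
    else:
        inputs = ["-i", video_path, "-i", audio_path]
        audio_map = "1:a:0"

    args = [
        "ffmpeg", "-y", *inputs,
        "-filter_complex", f"[0:v:0]{tpad}[v]",
        "-map", "[v]",
        "-map", audio_map,
        "-c:a", "aac",
    ]
    if audio_cutoff_hz is not None:
        # AAC encoders may low-pass the signal; an explicit cutoff is one knob to
        # check when ultrasonic content must survive. Whether a given value is
        # sufficient is an assumption that needs testing on real material.
        args += ["-cutoff", str(audio_cutoff_hz)]
    args.append(out_path)
    return args
```

The resulting list could be passed to `subprocess.run(args, check=True)`. Because AAC is a lossy codec, it would be worth verifying on a test file (for example with a spectrogram) that the ultrasonic content is still present in the output before relying on this.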