Marrying video and audio from two sources

wbtczn wrote on 11/24/2012, 8:14 PM
I did a video recording of our church choir and want to marry up the .mp3 track that was recorded via our sound system instead of using what the camera recorded. It takes some nudging of the audio track, but I get the two synced up to start with. At that point I am looking at a close-up of one of the soloists.

A few seconds later a second soloist joins and as they sing I can tell that the audio and video are disjointed again. If I align them here, that's just going to screw up where I started the alignment.

Is this something based upon how the two recordings were made? Is there a way I can get them to sync?

Here's some tech details:

VIDEO
General
Type: MPEG-2 Transport Stream
Streams
Video: 00:05:55.856, 29.970 fps interlaced, 1920x1080x12, AVC
Audio 1: 00:05:55.856, 48,000 Hz, 5.1 Surround (stereo downmix), Dolby AC-3
Audio 2: 00:05:55.856, 48,000 Hz, 5.1 Surround, Dolby AC-3
Plug-In
Name: compoundplug.dll
Format: MPEG-2 Transport Stream
Version: Version 12.0 (Build 394) 64-bit

AUDIO
General
Type: MP3 Audio
Size: 11.00 MB
Streams
Audio: 00:05:51.847, 256 Kbps, 44,100 Hz, Stereo, MPEG Layer-3
Plug-In
Name: mp3plug2.dll
Format: MP3 Audio
Version: Version 12.0 (Build 394)


Could it be something related to the original audio being recorded on the video at 48,000 Hz vs. the audio being 44,100 Hz?
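
A quick arithmetic check on the sample-rate question, offered as a sketch rather than a diagnosis: if a 44,100 Hz file were somehow played back as though it were 48,000 Hz, the error would be large and constant rather than a slow slip. The function name is made up for illustration; the 351.847 s length is the MP3 duration listed above.

```python
import math

def misread_rate_effect(true_hz: float, assumed_hz: float, duration_s: float):
    """Return (speed ratio, new duration in seconds, pitch shift in semitones)
    for audio sampled at true_hz but played back at assumed_hz."""
    ratio = assumed_hz / true_hz
    return ratio, duration_s / ratio, 12 * math.log2(ratio)

ratio, new_len, semitones = misread_rate_effect(44_100, 48_000, 351.847)
print(f"speed x{ratio:.4f}, length {new_len:.1f} s, pitch {semitones:+.2f} semitones")
```

A mismatch of that kind would shorten the track by nearly half a minute and raise it by about a semitone and a half, which is far more than the gradual slip described here.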

Comments

videoITguy wrote on 11/24/2012, 8:22 PM
What were your recording devices?
Make and model of camera? Does it capture timecode?
Make and model of MP3 device recording audio?
Do you have some other source recorded audio - like .wav format or direct CD recorder to disk?
musicvid10 wrote on 11/24/2012, 8:32 PM
I split the mp3 into ~ten minute chunks (very carefully in quiet spots, at zero crossings), lock the camera audio, then sync the chunks with Pluraleyes. Some mini-crossfades or gaps will occur, easily covered with snippets of camera audio.

By way of explanation, no two clocks will stay in sync unless slaved by a wire. Mp3 recorders are notorious for drift.

I saved the first twenty minutes of a major musical with a backup mp3 this way.

I don't like the results of stretching / squeezing the audio, esp. when there is music involved.
Ron Windeyer wrote on 11/24/2012, 8:40 PM
While I have never done this, I enjoy tinkering with stuff as people pose questions. (Hey, it's how I learn too!) I do notice that your audio track is nominally 4 seconds shorter than the video track, for reasons unknown. That and the fact that they are getting out of sync suggests that the audio track may need to be stretched.
This from the Vegas help file:

Time-stretching or pitch-shifting an audio event
The Resample and stretch quality setting on the Audio tab of the Project Properties dialog determines the quality of processing when time-stretching audio events. For more information, see "Setting Project Properties."


Editing in the Event Properties dialog
Right-click the event and choose Properties from the shortcut menu.

On the Audio Event tab, choose a setting from the Method drop-down list.

Setting
Description

None
Turns off time stretching and pitch shifting.

élastique
The élastique method uses technology from zplane.development, and provides enhanced real-time time stretching and pitch-shifting capabilities.

The élastique method also allows you to preserve and shift a clip's formants, which are the characteristic resonant frequencies of a sound.

Choose a setting from the Stretch attributes drop-down list to choose the stretching method best suited to your media:

Pro – Provides the highest quality stretching but requires more RAM usage and CPU power.

Efficient – Uses fewer resources while still producing great time-stretching quality for polyphonic audio.

Soloist (Monophonic) or Soloist (Speech) – Provide good quality for monophonic audio with little effect on system resources.

Type the desired event length in the New length box.

The event's original length is displayed for reference in the Original length box.

Type the desired pitch shift (in semitones) in the Pitch change box.

If you want to change the event length without changing pitch, type 0 in the box.

If you want the pitch to be determined by the new event tempo, select the Lock to stretch box. For example, doubling an event's tempo will raise its pitch by one octave.

If the élastique Pro mode is selected in the Stretch attributes drop-down list, you can type a value in the Formant shift box to adjust the event's formants.

This option is only available when the Preserve formants check box is selected.

Formant shifting can be used to deepen the tone of a vocal performance without changing the pitch.

The formant shift amount represents the number of semitones to shift the timbre in addition to the offset required to compensate for any pitch shifting. For example, a setting of 0.000 applies formant correction with no additional shifting, while a setting of -7.000 will apply formant correction and deepen a sound by 7 semitones.

As I understand it, you could select the audio track, R-click, Properties, choose elastique Pro as the timeshift method, with pitch shift at 0. Specify the new length to match the video clip. This should automagically lengthen the clip and things will be in sync.
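
To put a number on the stretch, here is a minimal sketch using the two durations from the first post. It assumes both clips start and end at the same musical moments, which may not be true of untrimmed files; the helper names are hypothetical.

```python
def to_seconds(timecode: str) -> float:
    """Convert 'HH:MM:SS.mmm' to seconds."""
    hours, minutes, seconds = timecode.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def stretch_percent(original: str, target: str) -> float:
    """Percent by which the original length must change to reach the target."""
    return (to_seconds(target) / to_seconds(original) - 1) * 100

audio_len = "00:05:51.847"   # MP3 length from the first post
video_len = "00:05:55.856"   # camera clip length from the first post
change = stretch_percent(audio_len, video_len)
print(f"New length: {video_len}, change: {change:+.2f}%")
```

The target value is what would go in the New length box; the percentage is only there to show how small the adjustment is.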

Good luck!

musicvid10 wrote on 11/24/2012, 8:51 PM
As I implied, if one can accept the results of time stretching / squeezing, then great. Elastique' is good, but it's not magical. There will always be flange/echo artifacting, and Q-Noise, which I personally cannot tolerate. One needs to test such techniques on one's own material to be sure.
Ron Windeyer wrote on 11/24/2012, 9:29 PM
Good point. I guess another way would be to apply a velocity envelope to the video clip, and effectively speed it up (shorten it) by the required amount.

The only true test is whether it works for you.
wbtczn wrote on 11/24/2012, 9:37 PM
videoITguy:

The video was shot on a Panasonic HDC-TM900. I don't believe it captures timecode.

The recorder is a Marantz flash recorder. I'm not sure of the exact model number as it is mounted in a rack at the church and I am at home. In looking for something similar on the net, it appears to be a PMD 560.
wbtczn wrote on 11/24/2012, 9:39 PM
musicvid - the whole video is less than 6 minutes long. Although, I'm thinking a solution may be to cut the video at spots and fill in with stills or fades to cover the gaps.
riredale wrote on 11/24/2012, 9:41 PM
In my own experience I concluded it was NEVER a good idea to stretch or compress video, because the artifacts were obvious.

With Elastique, I can marry two recordings together very accurately. Just start with the two in sync at the beginning (perhaps a cough or a clap in the room--the sharper the better). Then just go down the timeline a ways, and find another good sound to sync on. If they don't match, split the lower track and stretch/compress using Elastique until they do. Then butt the after portion of the lower track to the split mark, and continue on down the timeline.

You'll find that you can get very good at syncing after a short while. When you've reached the end of the timeline, go back to the beginning and look at a suitable sync point midway between your first sync points. If it doesn't precisely match, split the lower track there and then time stretch/compress both portions of the split track.
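
The split-and-stretch routine above can be sketched as arithmetic: given pairs of matching moments on the two tracks, each segment between consecutive sync points gets its own stretch ratio. This is only an illustration, not something Vegas does for you; the function name and the sync-point times are hypothetical.

```python
def segment_stretches(sync_points):
    """sync_points: list of (video_time_s, audio_time_s) pairs in timeline order.
    Returns one (audio_start, audio_end, ratio) tuple per segment between
    consecutive sync points; ratio > 1 means stretch, ratio < 1 means compress."""
    segments = []
    for (v0, a0), (v1, a1) in zip(sync_points, sync_points[1:]):
        ratio = (v1 - v0) / (a1 - a0)
        segments.append((a0, a1, ratio))
    return segments

# Hypothetical sync points in seconds: a clap, a hard consonant, a final chord.
points = [(10.0, 10.0), (130.0, 129.4), (340.0, 340.9)]
for start, end, ratio in segment_stretches(points):
    print(f"audio {start:6.1f}-{end:6.1f} s: new length x{ratio:.4f}")
```

Adding a midway sync point, as described above, simply inserts another pair into the list and yields two shorter segments with their own ratios.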

I have found that mechanical devices such as my Minidisc recorders can drift maybe one frame out of 5 minutes. My Sony cameras, however, maybe change one frame out of one hour. Regardless, with Elastique you have no issues, just a bit more work.

A few months back I took two entirely different performances of the Durufle Requiem (same choir, performances one month apart) and was able, eventually, to turn them into one spectacular performance with a doubled choir. One performance had a terrific pipe organ, the other, a terrific ensemble. Together, a great recording.
wbtczn wrote on 11/24/2012, 9:42 PM
Ron - the reason they aren't the same length is that they were recorded by two different people/sources and I haven't trimmed them to length. I planned on doing it after I got the sources synced up.

Are you suggesting that I can use the time stretch / pitch shift to change the length of the audio without it being apparent to the listener?
wbtczn wrote on 11/24/2012, 9:46 PM
riredale - it sounds like you've done what I am trying to do. Instead of pulling the audio from a different video, it's coming from the flash recording. I'm thinking the toughest part for you would have been when parts of the performance were not done at the same tempo.

What I'm not understanding from these posts (and perhaps I just need to play with it) is the mechanics of making this elastique thing work.
musicvid10 wrote on 11/24/2012, 9:55 PM
"musicvid - the whole video is less than 6 minutes long."
Wow. Even with the cheapest pocket recorders I've never had drift that was apparent at six minutes. Can you sync it at the middle or solo part, and have the ends match acceptably? Maybe only one audio cut? Is there by chance some cutaway video you can use to realign the A-roll?
wbtczn wrote on 11/24/2012, 10:03 PM
Is there a way for me to post what I have so you all can see what I see?

Of course, I'm not sure I want you to see the video. I was using a really crappy tripod and I have a ton of jitter in the video! :)
musicvid10 wrote on 11/24/2012, 10:10 PM
use a sharing site -- dropbox, mediafire, google drive
wbtczn wrote on 11/24/2012, 10:20 PM
I've got Dropbox... what I'm not sure of is what to put in there -- the original 2 files? The .veg file? Or what?
fldave wrote on 11/24/2012, 10:21 PM
Yeah, I get a drift, but split a frame or overlap a frame about every 6-10 minutes. I was 10 frames off of a 100 minute concert overall. A few seconds shouldn't be noticeable, unless one of the devices dropped a frame or several seconds of audio.
musicvid10 wrote on 11/24/2012, 10:28 PM
put the original vid and the original mp3, neither should take up much space.
wbtczn wrote on 11/24/2012, 10:46 PM
I was just mentioning this to my wife. She's the Minister of Worship Arts / choir director, and she had a light bulb moment. The recording she gave me was from our second service. The video was from the first service! So that definitely fills in some gaps!

I've gone ahead and put the 2 files in my Dropbox public folder. I'd still be interested in how you would try to marry the two together.

They are located in this folder: https://www.dropbox.com/sh/0t5wzdgxvy610d8/06h9RUPBSE
Here's the video link: https://dl.dropbox.com/u/13030895/11-04-2012_094320.m2ts
Here's the audio link: https://dl.dropbox.com/u/13030895/I%20Will%20Rise.mp3

Ron Windeyer wrote on 11/24/2012, 11:12 PM
Hi wbtczn

Your wife's "light bulb" moment may be the answer - audio and video from different performances may well be quite out of sync, even by a second or two. Have you a recording from the first service? If not, you may have to fudge things a bit.

Yes, I am suggesting that you can stretch the audio without changing pitch or the listener noticing. Be aware though - I have just read the help file in Vegas; musicvid sounds like the voice of experience and suggests that you may end up with artifacts. I can only suggest that you try it and see.

Edit - found the audio, can't find the video link - I get a 404 error..
Cancel that - found the video with Internet Explorer. D/Ling now.
Chienworks wrote on 11/25/2012, 12:13 AM
Thing is though, you *DO* want to change the pitch!

I really don't understand where the idea came from in this forum that pitch shift is verboten. It is actually *required* to correct the problem. If the audio recording device ran at a different speed then the pitch has been changed, and therefore part of fixing the speed change is to shift the pitch back to where it started.

I think in the modern digital days we forget that pitch = speed and speed = pitch. It's only with the new digital tools that we can alter reality and change one without the other, and doing so is often inappropriate.
Ron Windeyer wrote on 11/25/2012, 4:04 AM
Well it's been an entertaining day - I have learned something and had the chance to try something new.
It seemed that the choir (which is pretty darn good, BTW) is human rather than being machines - the timing was ever so slightly out compared to the earlier version.
I cut the audio in 5 places, stretched the segments with élastique and pushed two together slightly.
I think the result is quite passable - check it out here
https://www.dropbox.com/s/idr7xbid74tndy7/I%20Will%20Rise.m2ts
musicvid10 wrote on 11/25/2012, 9:28 AM
Oh, different performances.
Had a director (hired gun) who liked the audio better from the second performance with the video from the first performance of two evenings of West Side Story.

He disappeared into the closet with two of those big Panny editing decks with the intention of crash-editing the whole 3 hr. performance to the best composite. I told him it wouldn't work.

We next saw him three weeks later -- disheveled, wandering aimlessly, had lost considerable weight; and the oddest glazed-over look on his face, something resembling dementia. We never saw the tape.

You'll probably make it with your six minute shoot. Then compare to the native video. Best of luck.
Andy_L wrote on 11/25/2012, 11:49 AM
"We next saw him three weeks later -- disheveled, wandering aimlessly, had lost considerable weight, and the oddest glazed-over look on his face, something resembling dementia. We never saw the tape."

This sounds like one of my editing sessions...

:)
riredale wrote on 11/25/2012, 3:31 PM
Chienworks: You are right, though it's a bit of a mind-bender. If I record audio simultaneously with a digital device and with a Minidisc device and then put all the audio on a single timeline, the Minidisc audio is almost always very slightly slower, at one frame in 8000 (4.5 minutes on average). This implies that the issue is with Minidisc record/playback modes, since it's pretty consistent. In any event, by shrinking the Minidisc track by that one frame in 8000, I guess in theory one SHOULD be allowing the pitch to increase back to the original value. But the increase should be very small, in fact unnoticeable. When I tweak a cappella choir recordings and goose up the pitch as they (usually) drift flat over time, I find that a ten-cent change is barely noticeable, while a twenty-cent (1/5th of a semitone) bump is. The pitch change I would get from adjusting my Minidisc audio would be only about one cent.
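
For anyone who wants to check the size of that pitch change, here is a minimal sketch of the speed-to-pitch arithmetic, using the usual definition of 1200 cents per octave. The function name is hypothetical.

```python
import math

def drift_to_cents(frames_off: float, frames_total: float) -> float:
    """Pitch change, in cents, from speeding audio up so that
    frames_total frames play back in frames_total - frames_off frames."""
    ratio = frames_total / (frames_total - frames_off)
    return 1200 * math.log2(ratio)

print(f"{drift_to_cents(1, 8000):.2f} cents")   # one frame in 8000
print(f"{1200 * math.log2(2):.0f} cents")       # doubling the speed: one octave
```

By this arithmetic the shift for one frame in 8000 comes out at roughly a fifth of a cent, so the "about one cent" estimate is, if anything, generous. Either way it is far below the ten-cent change described as barely noticeable.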

The one time I had to hold the pitch constant was for a documentary a few years back where we descended from the upper level of the Eiffel Tower via stairway. I sped up the video by a factor of about 5, and sped up a portion of the William Tell Overture to match the action. If I hadn't held the pitch constant the overture would have sounded like something out of Alvin and the Chipmunks (BTW it takes a long time to walk down the stairs of the Tower). Guaranteed to give one vertigo.

This is the "Pie Jesu" portion of the mashup I mentioned earlier. Same choir, two different venues one month apart. One performance had brass, strings, and percussion, the other a nice pipe organ.

I didn't spend a lot of time on this, and there are a couple of places where I could have gone in and tweaked the sync a bit more. But on the whole I think it came out okay.
wbtczn wrote on 11/25/2012, 7:54 PM
That looked pretty good, Ron. There was a spot early that I could tell was off, but I was looking for it to be off. :) I had to remind myself that the audio and video were from separate performances.