Render Time...Smoke on my bottom

Dunn wrote on 2/9/2005, 7:23 AM
To what degree is Vegas set up to take advantage of dual processors and RAM? I was told that using dual processors and 4 megs o' RAM would improve render times by 15% at most. That's 15% faster than slow. I love Vegas, but when users respond to complaints about Vegas's slow render times with "you don't need accelerator cards because Vegas can and does use the new, faster processors to speed up the process, and in fact you're lucky to get a 15% improvement," I feel smoke on my bottom! Sony, please spend more time on increasing the speed of Vegas renders and fewer resources on trying to make Vegas "After Effects Lite".

Comments

John_Cline wrote on 2/9/2005, 7:53 AM
Vegas produces the best looking output of any NLE I have used. I agree, an increase in render speed would be nice but NOT at the expense of quality.

John
Jimmy_W wrote on 2/9/2005, 7:55 AM
Second that!
Jimmy
GaryKleiner wrote on 2/9/2005, 8:07 AM
Hardware acceleration would be great, but:

You can speed up your render time with dual processors by rendering simultaneously from two Vegas projects (or by rendering two parts of the same project with two instances of Vegas open).

Also, you can set up two render nodes with network rendering, as well as network render using three computers (up to six nodes).

Can the other software NLEs do that?

Gary
B_JM wrote on 2/9/2005, 8:14 AM
I still say Sony should turn PlayStation 3s into render nodes!

DavidMcKnight wrote on 2/9/2005, 8:19 AM
Dual procs won't increase the performance of one app, from everything I've read. The concept of dual procs is so you can do two things at once, such as running Vegas and Photoshop, or as was mentioned two instances of Vegas.

I suppose you can do network rendering on one PC with two procs... Dunn, have you been able to try this configuration yet?
B_JM wrote on 2/9/2005, 8:47 AM
Dual procs WILL increase the performance of SOME apps ...

Even in Vegas, straight MPEG-2 encoding (no filters) will use both CPUs.

Digital Fusion, AE, RenderMan and many other apps properly use both CPUs.
Dunn wrote on 2/9/2005, 9:14 AM
No, I have not tried network rendering. The point of this post is that the onus of improving render time falls on the software designers, who should take advantage of multithreading on dual processors, not on me spending a lot of money on a network rendering configuration that still doesn't use the full potential of the CPU and RAM. P.S. I like the codec too! I just don't like waiting for it to render.
Liam_Vegas wrote on 2/9/2005, 9:22 AM
You don't have to spend a lot of money on network rendering. What is being said here is that you can set up two "nodes" of the network rendering service on your dual-processor PC. Then you have what you were looking for: much faster rendering, because your renders can be spread across the two CPUs.
Spot|DSE wrote on 2/9/2005, 9:30 AM
With the new chips they've got coming, I don't think that's too far off at all, B_JM.
johnmeyer wrote on 2/9/2005, 9:39 AM
Several posts here refer to a quality vs. speed tradeoff in the Vegas DV codec. I think this is a red herring. The real problem in Vegas is not the speed of the encoding, but the speed of the rendering. If you do something trivial, like change the opacity of a frame, and then encode to a DV file, it happens pretty fast. However, as soon as you do more complicated things and then render the project, it can take hours for just a few minutes of video. That time is rendering time, not encoding time.

So, my first point is that the excellent Sony DV encoder has little to do with the speed (or lack thereof) when rendering effects.

My second, and final, point is that rendering IS a separate process and should be able to take full advantage of dual processors. If it is not, then this is something that Sony engineers could change, if they choose to devote their energies in that direction. Dual-processor rendering sure would be a lot easier for someone to set up and use than network rendering. As someone who has used network rendering: it is extremely difficult to set up; often has problems; requires a 100 Mbps switched network; only works well when the network is lightly loaded; and presently gives back much of the potential time saving (when doing a shared render) due to the stupid way it handles stitching (something that could easily be fixed in one of two very clever ways, as discussed in many threads last summer).
RichMacDonald wrote on 2/9/2005, 11:32 AM
>rendering IS a separate process and should be able to take full advantage of dual processors. If it is not, then this is something that Sony engineers could change, if they choose to devote their energies in that direction.

It could be done. A quick search on "mpeg algorithm" and "parallel" turns up many things worth studying. Based on a simplistic understanding of the MPEG algorithm, I see several areas of interest:

1) Parallel algorithms for the various mathematical transformations.

(Quite a bit of current research in this plus historical experience because these are common engineering tasks.)

2) Better (more exhaustive) searches for the temporal compression.

(It seems to me that no one worries about this. Everyone just implemented simplistic, fast heuristics that don't exhaustively search the possibilities. Perhaps this is a legacy from the hardware approach. This would of course greatly slow down rendering, but you should be able to improve the quality/size ratio.)

3) Simplest of all, and almost perfectly scalable: separate the rendering into the independent "chunks" between the I-frames, then compress each chunk in parallel.

(One thread to compress the I-frames. Extract each chunk and delegate to another thread; as many threads as processors. One thread to combine the individual results into the final stream.)
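
Here is a bare-bones sketch of idea (3) in C++, just to show the thread layout. Note that compress_chunk() is made up (it stands in for a real encoder call, not anything Vegas actually exposes), and a real implementation would cap the worker count at the number of CPUs:

    // Sketch of idea (3), not Vegas code: split the stream at I-frame
    // boundaries, compress each chunk on its own thread, then stitch
    // the results back together in order. compress_chunk() is made up.
    #include <cstddef>
    #include <thread>
    #include <vector>

    using Bytes = std::vector<unsigned char>;

    // Stand-in for a real encoder call; here it just copies the data.
    Bytes compress_chunk(const Bytes& raw_frames) { return raw_frames; }

    int main()
    {
        // Pretend these are the raw frame runs between consecutive I-frames.
        std::vector<Bytes> chunks(8, Bytes(1024, 0));
        std::vector<Bytes> compressed(chunks.size());

        // One worker per chunk; a real version would cap this at the CPU count.
        std::vector<std::thread> workers;
        for (std::size_t i = 0; i < chunks.size(); ++i)
            workers.emplace_back([&, i] { compressed[i] = compress_chunk(chunks[i]); });
        for (auto& w : workers) w.join();

        // "Stitch": concatenate the compressed chunks in their original order.
        Bytes stream;
        for (const auto& piece : compressed)
            stream.insert(stream.end(), piece.begin(), piece.end());
        return 0;
    }

Because each chunk between I-frames is independent, this scales almost linearly with the number of processors, which is why I called it the simplest option.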
farss wrote on 2/9/2005, 1:21 PM
Except DV isn't MPEG! DV has NO temporal compression, and the whole frame is decompressed and compressed as a single entity, hence there is no way you can split the task up. Encoding to MPEG (as opposed to rendering DV) can be split because it's done in blocks, and the MC encoder does just this.
Bob.
B_JM wrote on 2/9/2005, 1:24 PM
You can split it up -- other apps do so quite nicely.
logiquem wrote on 2/9/2005, 1:27 PM
> As someone who has used network rendering, it is extremely difficult to set up; often has problems

It's funny, I finally decided yesterday to try network rendering. Surprisingly, I was up and running 30 min. later! About 4 simple steps to follow, in fact. And the results are stunning: I got more than a 50% reduction in rendering time with a second, slower renderer. It is true that the project was very heavy in terms of FX (chromakeying, color correction, chroma blur) and light on assets (just one stream of DV plus some graphics).

I wonder if the comment about the "extremely difficult setup" relates to a past build or to the present Vegas 5.0d build?
johnmeyer wrote on 2/9/2005, 3:03 PM
> I wonder if the comment about the "extremely difficult setup" relates to a past build or to the present Vegas 5.0d build?

I was the one who wrote that. The original 5.0 network rendering was really difficult to set up because all the mappings had to be entered by hand. 5.0b corrected that rough edge and helped ease the setup. However, I still classify network rendering setup as difficult because of the various posts that come up in this forum all the time. People are still having problems.

This is not entirely Sony's fault, but rather is the result of the complications of operating over networks, some of which are peer-to-peer, some of which are server-based, and all of which have different security schemes that affect which files can be shared and what rights another computer has to save, modify, and delete files remotely.

Another issue about network rendering: it is rendering, not encoding. Even after trying to clarify that difference in my earlier post, I note that many people are still confusing the two (I do as well sometimes). Encoding is the process of putting the video into a particular format, and it is done after all the rendering is finished. Encoding takes the same amount of time for each minute of video regardless of whether it takes 10 milliseconds to render each frame or 10 minutes to render each frame.

Since encoding is not distributed by the network rendering feature, MPEG encoding will not be helped by network rendering although you can encode to MPEG during a network render (the encoding is all done on the host computer).

I am pleasantly surprised by your report on the reduction you achieved in rendering time, after your thirty minute setup. Many people don't achieve such good results with just one render node. Because of the stitching overhead, network rendering does not work well for projects of long duration (as measured by the timeline), especially if the rendering chores are light, because it takes so long to move multi-gigabyte files around the network and then back to your main computer to stitch together. On the other hand, if you have a five-minute gem that is using bezier masks, multiple compositing channels, chroma key, supersampling, etc., and it takes ten hours to render, then network rendering is a godsend because relatively little data has to be shipped around and stitched.
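
To put rough numbers on it (back-of-the-envelope only): DV runs at about 3.6 MB per second, so an hour of rendered DV is roughly 13 GB. Even if your 100 Mbps network really delivers 8-10 MB per second, that is something like 25 minutes just to move one copy of the file one way. A five-minute project, by contrast, is only about 1 GB, so almost all of the ten hours of rendering you save is real savings.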
Dunn wrote on 2/10/2005, 8:44 AM
johnmeyer,
Would writing code for Vegas to utilize the full processing power of dual CPUs be a difficult task for Sony software engineers? (Is it that simple? A detailed explanation would be wasted on me.)

My understanding of "scalable" is that the software performs to the limits of the hardware.

Thank you.
B_JM wrote on 2/10/2005, 9:24 AM
I think the main bottleneck is the filters -- they all appear to be single-threaded ...
logiquem wrote on 2/10/2005, 10:31 AM
>On the other hand, if you have a five-minute gem that is using bezier masks, multiple compositing channels, chroma key, supersampling, etc., and it takes ten hours to render, then network rendering is a godsend because relatively little data has to be shipped around and stitched.

That's exactly my situation. It is indeed a godsend, because it costs nothing (when you already have a network) and it's potentially a tremendous way to upgrade your setup without losing the old CPU/motherboard power.
RichMacDonald wrote on 2/10/2005, 11:41 AM
>Would writing code for Vegas to utilize the full processing power of dual CPUs be a difficult task for Sony software engineers? (Is it that simple? A detailed explanation would be wasted on me.)

The simple explanation is that yes, it would be a difficult task for the Sony software engineers. But (a) that is a generalization that might be unfair to the software engineers and (b) the subject is near and dear to me, so I can't resist a more detailed explanation. I'll oversimplify a little ...:-)

Programmers talk about "threads", where each thread is a single sequential executing task. A single-threaded program does one thing at a time. Multi-threaded programs can do more than one thing at a time. In essence, a multi-threaded program is several little programs that share memory and cooperate while executing simultaneously. But even that is too simple a description, because a multi-threaded program can actually run on a "single-threaded machine", in which case only one thread can operate at a time, and the other threads are idle while waiting for their turn.

At the risk of oversimplifying (i.e., making wrong statements in order to keep it simple), a single CPU is a "single-threaded machine". This is actually a lie because single CPU computers can do several things at once, e.g., write a file and perform calculations at the same time. And today's CPUs can actually (within limitations) do multiple calculations at the same time. But a second CPU changes everything as you now can have two threads (one for each CPU) both running at full speed.

A "good" program will have at least two threads: One for controlling what happens in the GUI (the stuff you see on screen, mouse and keyboard, etc), and another for controlling the program as it reacts to things that the user does via the GUI. If there was only one thread, then every time you did something in the GUI, the computer would "lock you out" while it responded. So, for example, if you started to render a video clip, there would be no way for you to click the cancel button and stop it; you'd have to wait for the rendering to finish before doing anything else.

As an aside, that GUI thread is the laziest thread on your computer. It spends 99.99% of its time doing absolutely nothing while waiting for the incredibly slow user to do something :-)

Most programmers are familiar with this basic operation. Then it gets more sophisticated. For example, Vegas lets you add clips to a project and start working on them, while in the background it continues to build the audio peak files. This is an example of multi-threading, and it's fairly easy to do because the audio peak files can be built independently of the other things you're doing. But it's also tricky, because some things can only be done after the audio peak files are built. In programming terms, one thread may have to stop and wait for a second thread to complete before the first thread can continue.
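
Something like this sketch (build_peaks() is invented; it just stands in for the slow scan over the audio): the peaks get built on a background thread, the main thread keeps going, and the first operation that really needs the peaks is the one that stops and waits.

    // Background-task sketch, not Vegas code: build_peaks() is made up and
    // stands in for scanning the audio to build a peak file. The main thread
    // keeps working; the first thing that needs the peaks waits for them.
    #include <functional>
    #include <future>
    #include <iostream>
    #include <vector>

    std::vector<float> build_peaks(const std::vector<float>& samples)
    {
        // Pretend this is the slow scan over the whole audio file.
        return std::vector<float>(samples.size() / 100, 0.0f);
    }

    int main()
    {
        std::vector<float> samples(1000000, 0.25f);

        // Kick the peak building off on another thread and keep editing.
        std::future<std::vector<float>> peaks =
            std::async(std::launch::async, build_peaks, std::cref(samples));

        std::cout << "editing continues while peaks build...\n";

        // The first feature that actually needs the peaks blocks here until ready.
        std::vector<float> p = peaks.get();
        std::cout << "peak file ready: " << p.size() << " entries\n";
        return 0;
    }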

This is a programmer's baby step into multi-threading. And it's also the end of the baby steps, as the next steps are leaps into the deep end.

It turns out that it is easy (relatively easy) to program single-threaded. But multi-threaded programming is very difficult. The code is much harder to read, and most importantly it's very hard to reason about the code to ensure that it works correctly and/or to write the tests to catch any errors before the code is released into the big bad world of real users. Bugs are much more likely, especially the ones that are very hard to identify, diagnose, reproduce, and fix.
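
Just to show what one of those nasty bugs looks like in its simplest form (a toy sketch, nothing to do with Vegas): two threads bump the same counter with no locking, updates get lost, and the amount lost changes from run to run -- exactly the kind of bug that's hard to reproduce.

    // The classic multi-threading bug in miniature: a shared counter with
    // no locking. The total comes out short, and by a different amount on
    // every run.
    #include <iostream>
    #include <thread>

    long counter = 0;                  // shared and unprotected -- that's the bug

    void bump_a_million_times()
    {
        for (int i = 0; i < 1000000; ++i)
            ++counter;                 // read-modify-write, not atomic
    }

    int main()
    {
        std::thread t1(bump_a_million_times);
        std::thread t2(bump_a_million_times);
        t1.join();
        t2.join();
        std::cout << "expected 2000000, got " << counter << "\n";
        return 0;
    }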

In fact, most programmers spend their lives working in single-threaded environments, while only a few elite programmers deal with the multi-threaded issues. So most programmers don't know how to write good programs in multi-threaded environments. It can be learned, but it's difficult for everyone and impossible for some. I wouldn't be surprised if only 1 programmer in 100 is a good multi-threaded programmer.

Having said that, I know that the Sony software engineers are elite programmers. It's obvious from their product. They produce incredibly sophisticated tools with remarkably few bugs. Hats off to them, without a doubt.

And having said that, I'm also certain that moving to a "fully" multi-threaded environment (and I mean taking advantage of multiple CPUs) will cause major headaches for the programmers. This is true for any program, not just Vegas, as it requires major changes in the fundamental architecture of any program.

And even assuming that all the programmers are multi-threaded experts, there are many computer operations that simply don't lend themselves well to multi-threading. Some tasks are difficult or impossible to split into separate parallel subtasks. It's the old "9 women can't have a baby in 1 month" syndrome. Or it can turn out that you have to do a bunch of additional things to make something parallel, and those extra things can cost more time than you save. For example, as B_JM noted, the filters are single-threaded. And even if they weren't, filters operate in sequence, i.e., the output of one filter is the input into the second, so the second filter has to wait until the first filter has completed. Naively, the first filter can run on the first CPU and the second filter can run on the second CPU. Unfortunately, the filters will run at different speeds (some filters have simple calculations; other filters have time-consuming calculations), so one CPU will bottleneck the other. You can alleviate this by adding "buffers" between the filters to store information temporarily (like a warehouse between producers and consumers), but that takes additional memory, may need an additional thread, and always slows down overall operation.
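
For the curious, here is what that producer/consumer "warehouse" looks like in miniature (a sketch only; the two "filters" are trivial stand-ins, not real effects): filter A pushes frames into a small bounded buffer and filter B pulls them out, each on its own thread. If B is slower than A, A ends up waiting at the push, which is exactly the bottlenecking I described.

    // Pipeline sketch, not Vegas code: filter A and filter B run on their own
    // threads with a small bounded buffer (the "warehouse") between them.
    #include <condition_variable>
    #include <cstddef>
    #include <deque>
    #include <iostream>
    #include <mutex>
    #include <thread>

    struct Frame { int number; float value; };       // number == -1 marks end of stream

    class BoundedBuffer {
    public:
        void push(Frame f) {
            std::unique_lock<std::mutex> lock(m_);
            not_full_.wait(lock, [&] { return q_.size() < kCapacity; });
            q_.push_back(f);
            not_empty_.notify_one();
        }
        Frame pop() {
            std::unique_lock<std::mutex> lock(m_);
            not_empty_.wait(lock, [&] { return !q_.empty(); });
            Frame f = q_.front();
            q_.pop_front();
            not_full_.notify_one();
            return f;
        }
    private:
        static constexpr std::size_t kCapacity = 4;  // the small "warehouse"
        std::mutex m_;
        std::condition_variable not_full_, not_empty_;
        std::deque<Frame> q_;
    };

    int main()
    {
        BoundedBuffer buffer;
        float checksum = 0;

        std::thread filter_a([&] {                   // fast filter: produces frames
            for (int i = 0; i < 100; ++i)
                buffer.push({i, i * 0.5f});          // stand-in for a cheap effect
            buffer.push({-1, 0});                    // end-of-stream marker
        });

        std::thread filter_b([&] {                   // slow filter: consumes frames
            for (Frame f = buffer.pop(); f.number != -1; f = buffer.pop())
                checksum += f.value * 2;             // stand-in for an expensive effect
        });

        filter_a.join();
        filter_b.join();
        std::cout << "pipeline done, checksum " << checksum << "\n";
        return 0;
    }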

And if I lost you on that last paragraph, I may have meant to :-)

What it comes down to is that (1) some things are fundamentally hard to convert into multi-threading, and (2) even the easy things generally require severe changes to the code.

But now it gets interesting. There was a recent article in Dr. Dobb's (for programmers, and you need a subscription to see it, unfortunately) that stated: Moore's law is over for single-threaded code, and future performance improvements are only going to come from multi-threading. (Moore's law is the recognition that computer performance has historically doubled every 18 months.) We've become used to faster and faster CPU GHz speeds, but over the last 2 years the improvements have started to plateau. We may not see 4 GHz CPUs. The reason is heat: the CPUs generate so much heat that we can no longer cool the chips fast enough to push the higher speeds.

CPU GHz is "proportional" (true to a first approximation, a lie to the second) to the speed of single-threaded code. So this means that you're not going to be able to increase the speed of Vegas by buying a faster computer, because you won't be able to buy a "faster" computer.

The Dr Dobbs article was also interesting because it noted that programmers have used increasing CPU speeds as a crutch for "poor" programming. Rather than optimizing the code, programmers have "solved" performance problems by telling the user to get a faster computer. And programmers have used the increasing speed of CPUs to add "bloat", which in turn reduces performance further. For example, each new version of Windows has essentially sucked up the additional capability of the newest computers: Try running WinXP on your 10 year old computer.

Well those days of a "free lunch" are coming to a close. From now on, programmers are going to have to worry about performance and fix it in their code.

I need to quickly note that the Vegas programmers don't write "poor bloated" software (although 95+% of the rest of the world does). They work in a field where speed is crucial, so they have already spent a great deal of time optimizing for speed. However, their days of "free lunch" are almost over too. In the future, they'll only be able to get significant speed gains by re-architecting to take advantage of multi-CPUs.

So that's the bad news. The good news is that they know it better than you. And they are clearly addressing the issue, because we got a taste in the last release with network rendering. (Or is it network encoding, John :-?) But we need to be patient because, did I mention, it's hard and bug-prone.
DavidMcKnight wrote on 2/10/2005, 12:16 PM
Rich -

VERY well stated. As a programmer who has only dabbled in multi-threading, I agree with your assessments completely, especially the comments about how well done the current product is.
clyde2004 wrote on 2/10/2005, 12:32 PM
Thank you Rich for the post.
[Dumb question alert] Just to try and confuse your very clear explanations: how does the operating system (be it Windows, Mac, or other) affect the overall performance of real-world tasks (threading, hyper-threading, etc.)? The reason I ask is that sometimes people will say "the new version of my favorite operating system is a lot faster." Wouldn't it be true that a really efficient OS could speed up a task like rendering?
farss wrote on 2/10/2005, 12:48 PM
Having worked on real-time control systems, both the hardware and to a smaller extent the software, I'll second the comments about how hard and bug-prone this process is. What plays a big role in this is the design of the hardware and the design of the OS. We pretty much used to design our own hardware and write our own OS. Our microcoded CPUs only ran with a 10 MHz clock, core memory, and 4 MB disks, and yet they still did the job quicker than today's PCs. Of course, our displays weren't graphical, most of the code was written in assembler or Fortran, and it all cost literally millions of dollars.
Bob.
John_Cline wrote on 2/10/2005, 1:16 PM
Well put, Rich. I only have one minor comment about Moore's Law. Moore observed an exponential growth in the number of transistors per integrated circuit and predicted that this trend would continue. He said nothing about performance. Here is a link to his original paper from April 19, 1965.

John
RichMacDonald wrote on 2/10/2005, 1:49 PM
Clyde2004, I'm going to have to bail on a detailed answer to your question because I'm not a hardware guy. But there is a dance between the OS and the particular CPU and supporting hardware. Basically, the new CPU comes out with new feature X, which, if used, will speed up your particular application. However, the old OS does not support that particular feature on that CPU, so there is no speedup. Then the OS is rewritten to catch up and take advantage of that new feature. Then there is a speedup.

And it's not just new features; the OS can be rewritten to interact with the CPU in a different way that is faster than before.

Now, at the risk of smoking out my bottom, one might say that it's not that amazing how fast today's computers are; it's actually criminal how many cycles the low-level stuff (between OS and CPU) wastes because they're always stepping on each other's toes so badly. But that would be to point a hypocritical finger at our favorite M$, who does, after all, have to support all that old legacy DOS et al., preventing them from streamlining the new code to run faster. That old code is an albatross.

Stuff like:

if (dos) then x else if (win98) then y else if (win2k) then z else...

throughout one's code.

And if you think the above logic is bad, think how much worse it must be when dealing with all the hardware and driver combinations for a particular computer!

So next time Vegas drops backwards compatibility for an old OS, say "thank you" instead of "how dare you" :-)