|
|
| Sajal |
| Posted: Nov 3 2003, 09:50 AM |
 |
|
Advanced Member
  
Group: Members
Posts: 115
Member No.: 2239
Joined: 28-January 03

|
I just downloaded the latest VirtualDub-MPEG2 1.5.7 [18019], and it's working flawlessly with DivX 5.1.1 Beta 1. Earlier versions of VirtualDub-MPEG2 are also working fine.
The BUG has been crushed at last, so it's great news for people running a P4 with WinXP. I hope I'm not the only one.
-------------------- 'Veni, Vidi, Velcro' - I came, I saw, I stuck around |
 |
| fccHandler |
| Posted: Nov 4 2003, 03:53 AM |
 |
|
Administrator n00b
  
Group: Moderators
Posts: 3961
Member No.: 280
Joined: 13-September 02

|
Here is VirtualDub-MPEG2 1.5.8.
(Sheesh, four releases in one month! Hard to keep up with Avery these days...)
-------------------- May the FOURCC be with you... |
 |
| Sajal |
| Posted: Nov 4 2003, 06:46 AM |
 |
|
|
Thanks for the fast update, fccHandler.
Well, Avery is beyond any thanks.
-------------------- 'Veni, Vidi, Velcro' - I came, I saw, I stuck around |
 |
| meilin |
| Posted: Nov 4 2003, 09:13 AM |
 |
|
Unregistered

|
thx for your wonderful work |
 |
| fccHandler |
| Posted: Nov 4 2003, 08:35 PM |
 |
|
|
I uploaded a new build just now, because the main window crashed if you tried to close it while an MPEG was being parsed. (I hadn't properly incorporated Avery's fix into my version of the code.)
I also moved YV12 to the top of the list (as mentioned here) and fixed it so that 4:2:0 MPEG files can be passed as YV12 in "fast recompress" mode.
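For readers unfamiliar with the format: YV12 is a planar 4:2:0 layout, with a full-resolution Y plane followed by quarter-size V and U planes. The sketch below only illustrates that layout. It is not taken from the VirtualDub-MPEG2 source, the struct and function names are hypothetical, and it assumes even dimensions with no row padding.

```c
#include <stddef.h>

/* Hypothetical description of where each plane starts in a packed YV12 frame. */
typedef struct {
    size_t y_offset;    /* luma plane, width x height bytes */
    size_t v_offset;    /* V (Cr) plane comes first in YV12 */
    size_t u_offset;    /* U (Cb) plane follows the V plane */
    size_t total_size;  /* bytes needed for the whole frame */
} Yv12Layout;

/* Assumes even width/height and no per-row padding (an illustration only). */
static Yv12Layout yv12_layout(size_t width, size_t height)
{
    Yv12Layout l;
    size_t y_size = width * height;
    size_t c_size = (width / 2) * (height / 2);  /* each chroma plane is 1/4 of luma */

    l.y_offset   = 0;
    l.v_offset   = y_size;
    l.u_offset   = y_size + c_size;
    l.total_size = y_size + 2 * c_size;          /* 1.5 bytes per pixel overall */
    return l;
}
```

Since a 4:2:0 MPEG frame already has this chroma subsampling, handing it over as YV12 presumably avoids a colorspace conversion, which would fit the "fast recompress" case mentioned above.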
-------------------- May the FOURCC be with you... |
 |
| Sajal |
| Posted: Nov 5 2003, 10:56 AM |
 |
|
|
I have one request: please put the build number beside the download link on your webpage. I understand that requires more effort on your part, but it will help a lot of the users of your mod. It isn't always possible to follow this forum, so being able to just check the link would be enough.
Btw, your page is already bookmarked by me.
Keep it up.
-------------------- 'Veni, Vidi, Velcro' - I came, I saw, I stuck around |
 |
| fccHandler |
| Posted: Nov 10 2003, 10:11 PM |
 |
|
|
VirtualDub-MPEG2 1.5.9 is up.
-------------------- May the FOURCC be with you... |
 |
| Sajal |
| Posted: Nov 11 2003, 05:24 AM |
 |
|
|
Phew! That was quick.
I don't even download Avery's original VirtualDub now, though I know he deserves full credit.
fccHandler: Thanks for honoring my request and putting up the build number. We owe you a lot.
-------------------- 'Veni, Vidi, Velcro' - I came, I saw, I stuck around |
 |
| meilin |
| Posted: Nov 11 2003, 09:16 AM |
 |
|
Unregistered

|
thx |
 |
| fccHandler |
| Posted: Nov 20 2003, 09:50 PM |
 |
|
|
Build 18143 fixes (I hope) a random crash which can occur when you exit the program. Also I've been tweaking some of the MPEG code, which might make it a tiny bit faster.
-------------------- May the FOURCC be with you... |
 |
| NCC1701e |
| Posted: Nov 23 2003, 09:47 AM |
 |
|
Unregistered

|
fccHandler,
First off, thanks for writing VirtualDub-MPEG2! It's been a great boon to be able to work with VOB files directly.
I do have one question. If I've been reading the FAQs correctly, this version doesn't contain P4 optimizations, right?
I don't have much experience with C code (though I've done a fair amount of Pascal). Is it worth trying to recompile the code to optimize for P4/hyperthreading?
Thanks! |
 |
| fccHandler |
| Posted: Nov 23 2003, 03:54 PM |
 |
|
|
Nomadic hosts his own P4 build of VirtualDub-MPEG2 (but he's one build behind as I write this).
My MPEG-2 code isn't P4 optimized simply because I've never had a P4 and I don't know how to write for one. Someone with the knowledge, the experience, and the tools would have to rewrite the MPEG-2 code for a P4 build; otherwise I wouldn't expect much speed gain from just recompiling the existing code.
-------------------- May the FOURCC be with you... |
 |
| phaeron |
| Posted: Nov 23 2003, 11:44 PM |
 |
|

Virtualdub Developer
  
Group: Administrator
Posts: 7773
Member No.: 61
Joined: 30-July 02

|
The Intel compiler is much more aggressive at low-level optimization than Visual C++, so it's actually most likely to help on the MPEG-2 bit decoding. Here are some of the wonderful performance features of the P4:
- Shift of death. Shifts in scalar mode (shl/shr/sar) have a latency of 4 clocks.
- Slow LEA. Addressing modes with a scaled index ([eax+ecx*4]) generate a shift uop. See above.
- Slow IMUL. Scalar multiplies go through the FPU and are slow again.
- Slow MOVQ. Register-to-register MMX moves have a latency of 6 clocks. The equivalent pxor mm0,mm0/paddw mm0,mm1 sequence has a latency of 2 clocks.
- 64K aliasing. Cache lines whose addresses are a multiple of 64K apart alias to each other in the L1 cache. That means if two 64-byte lines are separated by a multiple of 64K, they cannot both be in L1 at the same time, so allocating your MPEG buffers with separate VirtualAlloc() calls (which return 64K-aligned addresses) is a bad idea. The stack is a big problem here on HT systems.
- Slow flag operations. Basic ALU operations run in 1/2 clock, but those that consume flags (sbb, cmovcc, setcc) are a lot slower.
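To make the 64K aliasing item concrete, here is a minimal sketch in C. It assumes the 64-byte line size and 64K distance given above; the helper names and the two-line stagger are hypothetical choices for illustration, not anything from VirtualDub or from Intel's documentation.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u      /* L1 line size, as stated in the post */
#define ALIAS_SPAN 65536u   /* lines this far apart (or a multiple) collide */

/* Nonzero when two addresses land on the same line position modulo 64K,
   i.e. address bits 6..15 are identical. */
static int aliases_64k(uintptr_t a, uintptr_t b)
{
    return ((a ^ b) & (ALIAS_SPAN - CACHE_LINE)) == 0;
}

/* Hypothetical helper: extra bytes to add to the start of the index-th
   buffer so that buffers with 64K-aligned bases begin on different lines. */
static size_t stagger_offset(unsigned index)
{
    return ((size_t)index * CACHE_LINE * 2u) % ALIAS_SPAN;
}
```

With two buffers whose bases are both 64K-aligned, aliases_64k() reports a conflict for equal offsets into each buffer. Carving the buffers out of one larger allocation and starting buffer i at base + stagger_offset(i) is one way to keep code that walks them in lockstep from touching aliased lines.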
You can probably get some gain out of profile-guided optimization mode, but I've never tried it. |
 |
| Darkfalz |
| Posted: Nov 24 2003, 03:13 AM |
 |
|
Unregistered

|
| QUOTE (phaeron @ Nov 23 2003, 05:44 PM) | The Intel compiler is much more aggressive at low-level optimization than Visual C++, so it's actually most likely to help on the MPEG-2 bit decoding. Here are some of the wonderful performance features of the P4:
- Shift of death. Shifts in scalar mode (shl/shr/sar) have a latency of 4 clocks.
- Slow LEA. Addressing modes with an index scaler ([eax+ecx*4]) generate a shift uop. See above.
- Slow IMUL. Scalar multiplies go through the FPU and are slow again.
- Slow MOVQ. Register-to-register MMX moves have a latency of 6 clocks. pxor mm0,mm0/paddw mm0,mm1 has a latency of 2 clocks.
- 64K aliasing. All lines in a set in the L1 cache share the upper 16 bits. That means if two 64-byte aligned sets are separated by 64K, they cannot both be in L1 at the same time, so allocating your MPEG buffers with VirtualAlloc() calls is a bad idea. The stack is a big problem here on HT systems.
- Slow flag operations. Basic ALU operations run in 1/2 clock, but those that consume flags (sbb, cmovcc, setcc) are a lot slower.
You can probably get some gain out of profile-guided optimization mode, but I've never tried it. |
Still with all that going on, my P4 seems incredibly fast all the same |
 |
| phaeron |
| Posted: Nov 24 2003, 04:07 AM |
 |
|

|
The P4 does have some advantages. Its double-clocked scalar ALU is, I believe, a first in mainstream CPUs, and its trace cache means that alignment and decode issues are nonexistent in inner loops. It's mainly built for scalability, which is why it has the prefetcher to take advantage of massive bandwidth: you can always increase parallelism to get more bandwidth, but dropping latency is much harder. So the fact that the P4 executes fewer instructions per cycle is not an issue if the P4 can clock 2-3x faster than the competition. We're starting to see this now, in that Intel is clocking NetBurst past 3GHz and aiming at 4GHz+, while AMD is still struggling around 2-2.5GHz. Which strategy will win remains to be seen; it will be interesting when AMD64 takes off and when Intel releases Prescott, which fixes some of the above mistakes.
The main annoyance with the P4 is that it invalidates many of the tuning techniques that were introduced and popularized for the Pentium and Pentium Pro architectures. In fact, some P4 tuning techniques haven't been seen since the days of the 286, such as replacing shifts with repetitive adds. What's frustrating is that you can get the P4 to execute code significantly faster if you work around the anomalies, but it's so painful to do that it's almost not worth it. In contrast, the main strategy for the PPro and Athlon architectures is usually much simpler: balance dependency chains and shove instructions through the decoders as fast as possible.
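As an illustration of the "shifts replaced with repetitive adds" trick, here is a minimal sketch in C; the function names are hypothetical. In practice this is done in hand-written assembly (add eax,eax in place of shl eax,1), because a C compiler is free to turn the adds back into a shift.

```c
#include <stdint.h>

/* x << 1 expressed as a single add, which avoids the slow shifter. */
static uint32_t shl1_by_add(uint32_t x)
{
    return x + x;
}

/* x << n expressed as n doublings. Unsigned arithmetic wraps the same
   way a left shift discards high bits, so this matches x << n for n < 32. */
static uint32_t shl_by_adds(uint32_t x, unsigned n)
{
    while (n--)
        x += x;
    return x;
}
```

Going by the latencies quoted earlier in the thread (half a clock for a basic ALU operation versus 4 clocks for a shift), a short chain of dependent adds can come out ahead for small shift counts, which is the trade-off being described.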
It also doesn't help that the P4 machines are usually the fastest in the spec for a product. How do you justify optimizing for the P4 on the high end when it penalizes the Pentium II on the low end that your product barely runs on as it is? |
 |