|
|
| farmdve |
| Posted: Feb 23 2012, 12:53 PM |
 |
|
Member
 
Group: Members
Posts: 24
Member No.: 24933
Joined: 21-January 09

|
I don't know whether OpenCL or CUDA have progressed far enough that we can compress with them, but a GPU is far faster than a CPU.
I was wondering, Phaeron, whether you have any desire to create an OpenCL kernel (as I am an AMD Radeon user) that uses the GPU to compress/edit videos? |
 |
| levicki |
| Posted: May 9 2012, 08:01 AM |
 |
|
Advanced Member
  
Group: Members
Posts: 167
Member No.: 22605
Joined: 13-December 07

|
My guess would be that he doesn't.
It only makes sense for you to write your own plugin for a specific editing purpose.
I don't see any issue with writing an OpenCL filter plugin -- it should be possible right now. As a matter of fact, I am currently looking into writing one myself.
Regarding compression, that task is done by a codec, and (at least for decoding) some codecs can already use GPU acceleration.
Creating compliant video streams of good quality is a rather complex task, and it doesn't make sense to duplicate that functionality in VirtualDub. |
 |
| phaeron |
| Posted: May 12 2012, 08:01 PM |
 |
|

Virtualdub Developer
  
Group: Administrator
Posts: 7773
Member No.: 61
Joined: 30-July 02

|
Getting integration with video codecs is difficult. Just about the only standard for this on Windows is DXVA, which is tricky to work with because it's (a) DirectShow-based and (b) relies on either DirectDraw or Direct3D, depending on the version. DirectDraw is a lost cause, so this would require DXVA 2.0 (Vista+), and that in turn requires messy interop. VirtualDub largely relies on external codecs to perform encoding and decoding, so this would be required to get GPU acceleration. Also, while hardware-accelerated decoders are common, I don't know of much availability for hardware-accelerated encoding.
As for accelerating filters, yes, it is possible, but the major concern is the cost of getting the video data to the graphics card and back. This is expensive both in cycles and in delay, and the problem with doing it in a filter is that you lock-step the pipeline: the CPU uploads the frame, waits for the GPU to process it, and then downloads it. This takes long enough that it's easy to end up slower than a plain old CPU-based filter. Thus, for good performance you really need assistance from VirtualDub to keep the frames on the graphics card as long as possible and to arrange for background download of the frames.
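To put rough numbers on the lock-step problem, here is a toy timing model. It is only a sketch: the per-frame costs are invented figures, and the "pipelined" case assumes upload, GPU processing and download can overlap perfectly on independent resources (in practice uploads and downloads may contend for the same bus).

```python
def lockstep_time(frames, upload, process, download):
    """Total time when each frame is uploaded, processed and downloaded
    before the next frame starts (the filter-local, lock-stepped case)."""
    return frames * (upload + process + download)


def pipelined_time(frames, upload, process, download):
    """Total time for an ideal three-stage pipeline in which frame N+1 is
    uploaded while frame N is processed and frame N-1 is downloaded.
    Assumes the three stages never contend with each other."""
    if frames == 0:
        return 0
    stages = (upload, process, download)
    # The first frame pays the full latency; after that the slowest stage sets the rate.
    return sum(stages) + (frames - 1) * max(stages)


if __name__ == "__main__":
    # Hypothetical per-frame costs in milliseconds, chosen only for illustration.
    up, proc, down, cpu_filter = 4.0, 1.0, 4.0, 6.0
    n = 1000
    print("lock-step GPU:", lockstep_time(n, up, proc, down), "ms")
    print("pipelined GPU:", pipelined_time(n, up, proc, down), "ms")
    print("plain CPU    :", n * cpu_filter, "ms")
```

With those made-up numbers the lock-step GPU filter (9000 ms) loses to the plain CPU filter (6000 ms), while the overlapped version finishes in about 4005 ms, which is why keeping frames on the card and downloading in the background matters.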
I did look at CUDA a while back and liked the language, but one big problem was that it was too slow on the video card I had at the time. The other problem was the need for a linked-in library for the most usable form of CUDA. OpenCL doesn't require this and potentially could work, but VirtualDub's existing 3D acceleration code is Direct3D 9/9Ex based and so I'd have to look into interop.
Needless to say, processing video is a lot more complex now than 10 years ago.... |
 |
| dloneranger |
| Posted: May 12 2012, 10:39 PM |
 |
|
Moderator
  
Group: Moderators
Posts: 2366
Member No.: 22158
Joined: 26-September 07

|
Funnily enough, there was a roundup of the current state of GPU encoding.
The title, "The wretched state of GPU transcoding", sums it up pretty well: http://www.extremetech.com/computing/12868...gpu-transcoding
-------------------- MultiAdjust JoinWav WavNormalize FFMPeg Input Plugin v1827 UnSharpMask Windows7/8 Codec Chooser All FccHandlers Stuff inc. Installers for acm codecs AAC, AC3, LameMp3 |
 |
| levicki |
| Posted: May 13 2012, 04:05 PM |
 |
|
Advanced Member
  
Group: Members
Posts: 167
Member No.: 22605
Joined: 13-December 07

|
| QUOTE (phaeron @ May 12 2012, 09:01 PM) | As for accelerating filters, yes it is possible, but the major concern is the cost of getting the video data to the graphics card and back. This is expensive both in cycles and in delay, and the problem with doing it in a filter is that you lock-step the pipeline: the CPU uploads the frame, waits for the GPU to process it, and then downloads it. This takes long enough that it's easy to end up being slower than a plain old CPU based filter. Thus, for good performance you really need assistance VirtualDub to keep the frames on the graphics card as long as possible and to arrange for background download of the frames.
I did look at CUDA a while back and liked the language, but one big problem was that it was too slow on the video card I had at the time. The other problem was the need for a linked-in library for the most usable form of CUDA. OpenCL doesn't require this and potentially could work, but VirtualDub's existing 3D acceleration code is Direct3D 9/9Ex based and so I'd have to look into interop.
Needless to say, processing video is a lot more complex now than 10 years ago.... | - To get the real benefit of acceleration and GPU processing I think it would be wise to have completely separate GPU filter pipeline where results from one filter would be passed directly to another filter in the chain without first downloading to the host memory.
- OpenCL requires runtime to be installed (it comes with GPU drivers usually) and each vendor has their own SDK (for NVIDIA you have to use CUDA to compile OpenCL).
- So true, and all the new language features do not help in extracting performance -- they add confusion. |
 |
| phaeron |
| Posted: May 19 2012, 10:47 PM |
 |
|

Virtualdub Developer
  
Group: Administrator
Posts: 7773
Member No.: 61
Joined: 30-July 02

|
| QUOTE | | To get the real benefit of acceleration and GPU processing I think it would be wise to have completely separate GPU filter pipeline where results from one filter would be passed directly to another filter in the chain without first downloading to the host memory. |
VirtualDub already does this. Any filters that are 3D acceleration enabled are run on a special thread dedicated to GPU processing, and frames are passed between 3D accelerated filters as D3D9 render target textures. Integrating OpenCL would involve piggybacking onto this pipeline.
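A toy way to see what a shared GPU pipeline buys is to count how many host/GPU transfers one frame needs on its way through a filter chain. This is a hypothetical model, not VirtualDub's code: the "cpu"/"gpu" tags and `count_transfers()` are made up for illustration, and the source and output are assumed to live in host memory.

```python
def count_transfers(chain, per_filter=False):
    """Count host<->GPU transfers for one frame through a filter chain.

    chain      -- sequence of "cpu" / "gpu" tags, one per filter (hypothetical)
    per_filter -- if True, every GPU filter uploads and downloads on its own;
                  if False, consecutive GPU filters share one upload and one
                  download because the frame stays on the card between them.
    """
    transfers = 0
    prev = "cpu"
    for kind in chain:
        if kind not in ("cpu", "gpu"):
            raise ValueError(f"unknown filter kind: {kind!r}")
        if kind == "gpu":
            if per_filter:
                transfers += 2  # this filter uploads and downloads by itself
            elif prev == "cpu":
                transfers += 2  # one upload at the start of a GPU run, one download at its end
        prev = kind
    return transfers


chain = ["cpu", "gpu", "gpu", "cpu", "gpu"]
print(count_transfers(chain), count_transfers(chain, per_filter=True))
```

For that chain the shared pipeline needs 4 transfers instead of 6; the longer the runs of GPU filters, the bigger the saving.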
| QUOTE | | OpenCL requires runtime to be installed (it comes with GPU drivers usually) and each vendor has their own SDK (for NVIDIA you have to use CUDA to compile OpenCL). |
I don't think this is a problem -- you just open the OpenCL library and feed it program source, with optional binary caching. CUDA is NVIDIA's proprietary GPU programming API, which was a predecessor to OpenCL, but it's basically a sibling path to OpenCL. AFAIK you do not need to use either the CUDA SDK or the CUDA API in order to use OpenCL on NVIDIA cards. This actually solves a nasty problem that would have arisen with integrating CUDA, which is the need for a possibly license-incompatible SDK and runtime.
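As a rough illustration of "just open the OpenCL library", the sketch below loads the OpenCL ICD loader at run time with Python's `ctypes` (no link-time dependency) and calls the real `clGetPlatformIDs` entry point to count platforms. The binary-caching part is only hinted at: `program_cache_key()` is a hypothetical helper showing what a cache key would need to cover. The idea is that on a miss you would build with `clCreateProgramWithSource`/`clBuildProgram` and save the result from `clGetProgramInfo(CL_PROGRAM_BINARIES)`, and on a hit load it with `clCreateProgramWithBinary`, with the device name and driver version coming from `clGetDeviceInfo` (`CL_DEVICE_NAME`, `CL_DRIVER_VERSION`).

```python
import ctypes
import ctypes.util
import hashlib
import os

CL_SUCCESS = 0


def count_opencl_platforms():
    """Load the OpenCL library at run time and ask how many platforms exist.

    Returns None if no OpenCL library can be found or loaded, otherwise the
    platform count (0 if the call fails, e.g. no vendor driver is registered).
    """
    path = ctypes.util.find_library("OpenCL")  # e.g. OpenCL.dll on Windows
    if path is None:
        return None
    # OpenCL entry points are __stdcall on 32-bit Windows, cdecl elsewhere.
    loader = ctypes.WinDLL if os.name == "nt" else ctypes.CDLL
    try:
        lib = loader(path)
    except OSError:
        return None
    get_ids = lib.clGetPlatformIDs
    get_ids.restype = ctypes.c_int32  # cl_int
    get_ids.argtypes = [ctypes.c_uint32, ctypes.c_void_p, ctypes.POINTER(ctypes.c_uint32)]
    count = ctypes.c_uint32(0)
    if get_ids(0, None, ctypes.byref(count)) != CL_SUCCESS:
        return 0
    return count.value


def program_cache_key(source, device_name, driver_version, build_options=""):
    """Hypothetical key for a cached program binary on disk.

    A compiled binary is only reusable for the same source, device, driver
    and build options, so all of them go into the hash.
    """
    h = hashlib.sha256()
    for part in (source, device_name, driver_version, build_options):
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator, so ("ab", "c") and ("a", "bc") hash differently
    return h.hexdigest()


if __name__ == "__main__":
    print("OpenCL platforms:", count_opencl_platforms())
```

Putting the driver version in the key matters because a binary built by one driver release is not guaranteed to load on another; a stale key simply causes a rebuild from source.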
However, there is a more annoying issue. I did a bit of research, and the only API that OpenCL is guaranteed to work with is OpenGL. This is a problem because VirtualDub's existing 3D filtering and display pipelines, as well as DirectX Video Acceleration (DXVA) 2.0, are Direct3D 9Ex based. This is how the D3D interop story looks:
- D3D9/9Ex: Separate conflicting NVIDIA and Intel extensions, no support from ATI, no standardized extension appears to be coming.
- D3D10: Standardized as Khronos extension, appears to be widely supported.
- D3D11: Only supported by NVIDIA, but Khronos extension is on the way.
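In practice, checking which of these paths a given card offers comes down to parsing the device's `CL_DEVICE_EXTENSIONS` string, which OpenCL reports as space-separated extension names. The table below is a sketch built from the list above; the exact extension names are my best understanding of what the vendors and Khronos used and should be checked against the OpenCL extension registry.

```python
# Interop extensions grouped by Direct3D version, following the list above.
# Names are assumptions to verify against the OpenCL extension registry.
D3D_INTEROP_EXTENSIONS = {
    "D3D9": ("cl_nv_d3d9_sharing", "cl_intel_dx9_media_sharing"),
    "D3D10": ("cl_khr_d3d10_sharing", "cl_nv_d3d10_sharing"),
    "D3D11": ("cl_khr_d3d11_sharing", "cl_nv_d3d11_sharing"),
}


def d3d_interop_paths(extensions):
    """Given a CL_DEVICE_EXTENSIONS string (space-separated names), return
    the Direct3D interop extensions the device advertises, per D3D version."""
    available = set(extensions.split())
    return {
        version: [name for name in names if name in available]
        for version, names in D3D_INTEROP_EXTENSIONS.items()
    }


print(d3d_interop_paths("cl_khr_fp64 cl_khr_d3d10_sharing cl_nv_d3d9_sharing"))
```

An empty list for "D3D9" is exactly the ATI situation described above: no direct path from a D3D9Ex surface into OpenCL.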
It might be possible to bridge across from D3D9Ex to OpenGL and from there to OpenCL, but that'd be a mess I don't even want to think about. |
 |
| levicki |
| Posted: Aug 4 2012, 09:35 PM |
 |
|
Advanced Member
  
Group: Members
Posts: 167
Member No.: 22605
Joined: 13-December 07

|
Well, OpenCL was Apple's idea and Apple only has OpenGL, not to mention that both are Khronos standards, so it makes sense for them to go together. You locked yourself into the Windows platform by using DirectX, whereas using OpenGL would let you port to Mac OS X and Linux.
|
 |
| DarrellS |
| Posted: Apr 8 2013, 04:29 AM |
 |
|
Advanced Member
  
Group: Members
Posts: 567
Member No.: 1061
Joined: 28-November 02

|
You can use the cuda.exe H264 encoder with the external encoder feature. The quality isn't as good as x264 in superfast mode, though. Hopefully that could be tweaked somehow, but there aren't a lot of options as far as I can tell.
The encoder can be found in Selur's Hybrid program.
The command argument that I used was...
--resolution %(width)x%(height) --input - --sar 1x1 --format IYUV --control_mode vbr_rest --bitrate 4000 --bitrate_peak 15000 --fps %(fps) --profile high --level auto --offload partial --measure FPS --showFrameStats 100 --deinterlace true --frame_typ frame --pframe_dist 1 --gop_max 250 --dynamicGOP true --pquant_min 20 --bquant_min 24 --iquant_min 20 --deblock true --cavlc false --nal_typ auto --sps_pps false --slices auto --output "%(tempvideofile)" |
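For anyone building a similar command line, here is a sketch of what filling in the `%(name)` placeholders amounts to, plus the mod2 frame-size check that cuda.exe's help output says the encoder requires. It is illustrative only: `expand_command()` is a hypothetical helper, not VirtualDub's implementation, and the exact text VirtualDub substitutes for each placeholder (for example, how it formats `%(fps)`) may differ.

```python
import re

_TOKEN = re.compile(r"%\((\w+)\)")


def expand_command(template, values):
    """Fill %(name) placeholders in an external-encoder argument string.

    values -- mapping from placeholder name to value (illustrative).
    Raises KeyError for a placeholder with no value, and ValueError if the
    frame width or height is odd (cuda.exe wants at least mod2).
    """
    for dim in ("width", "height"):
        if dim in values and int(values[dim]) % 2 != 0:
            raise ValueError(f"{dim}={values[dim]} is not a multiple of 2")
    return _TOKEN.sub(lambda m: str(values[m.group(1)]), template)


args = "--resolution %(width)x%(height) --input - --fps %(fps) --output \"%(tempvideofile)\""
print(expand_command(args, {"width": 1280, "height": 720, "fps": "25",
                            "tempvideofile": "out.264"}))
```

Failing early on an odd frame size is friendlier than letting the encoder reject the piped input halfway through a job.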
 |
| -vdub- |
| Posted: Apr 8 2013, 10:49 PM |
 |
|
Advanced Member
  
Group: Members
Posts: 613
Member No.: 27087
Joined: 24-February 10

|
--profile 444 --preset veryslow or placebo
For better quality, you could lower either one to where the quality starts to fall, then use the previous setting once again for the video encode? |
 |
| DarrellS |
| Posted: Apr 16 2013, 08:56 AM |
 |
|
Advanced Member
  
Group: Members
Posts: 567
Member No.: 1061
Joined: 28-November 02

|
| QUOTE (-vdub- @ Apr 8 2013, 03:49 PM) | --profile 444 --preset veryslow or placebo
For better quality, you could lower either one to where the quality starts to fall, then use the previous setting once again for the video encode? |
When I said "The quality isn't as good as x264 in superfast mode", I meant that x264 at superfast preset looked better than Cuda with the command argument that I posted.
Here is the help file from cuda.exe...
| CODE | C:\Documents and Settings\Darrell>C:\Tools\Hybrid\Cuda.exe help
Authors: netcasthd&Selur can both be contacted through the doom9 forum
Usage: cuda --input "path to input file" --resolution WIDTHxHEIGHT --output "path to output file"
example: 'cuda --input test.yuv --resolution 640x352 --output test.264'

Required Settings:
-----------------------
--resolution <value>x<value>  specify the resolution of the input file (needs to be at least mod2)
--output <value>  specify the input file (the encoder doesn't care about the extension)

Global Settings:
-----------------------
--input <value>  specify the raw input file or - for pipe input (-)
--sar <value>x<value>  set the pixel aspect ratio, e.g. 1x1, 16x11 (1x1)
--format <value>  set input color format: UYVY, YUY2, YV12, NV12, IYUV (IYUV)
--control_mode <value>  select the rate control method: allowed for h.264: cbr, vbr, cq, vbr_rest (vbr_rest) allowed for vc-1: cbr (cbr)
--bitrate <value>  target bitrate kBit/s (1500) - only for cbr, vbr, vbr_rest
--bitrate_peak <value>  maximal bitrate peak kBit/s (62000) - only for cbr, vbr, vbr_rest
--fps <value>  set the ouput frame rate (25/1)
--fps <numerator/denominator>
--profile <value>  select a profile for the output stream: allowed for h.264: baseline, main, high (high) allowed for vc-1: simple, main (main)
--level <value>  select a profile for the output stream: allowed for h.264: 10,11,12,13,20,21,22,30,31,32,40,41,42,50,51 (auto) allowed for vc-1: low, medium, high, auto (auto)
--offload <value>  specify gpu work offload, values: partial, full (partial)
--forceGPU <value>  set if you want to force a specific gpu to beused: 0/1 (NA)
--measure <value>  during encoding measure: FPS, NONE (FPS)
--showFrameStats <value>  set how often frame stats should be posted during encoding: 0- (0 = disable) (only when measure FPS is used)
--frame_typ <value>  set the outptu frame typ: frame, top, bottom (frame)
--pframe_dist <value>  set the minimum distance between two p-frames(1)
--gop_max <value>  set the maximum distance between two key frames (250)
--dynamicGOP <value>  set whether or not gop structure should be choosen dynamically: true/false (true)
--pquant_min <value>  set a min quantizer for p-frames (0) - only with vbr_rest
--bquant_min <value>  set a min quantizer for b-frames (0) - only with vbr_rest
--iquant_min <value>  set a min quantizer for i-frames (0) - only with vbr_rest
--deblock <value>  dis-/enable deblocking: true/false (true)
--deinterlace <value>  dis-/enable deinterlacing: true/false (false)
--preset <value>  select a preset: psps, ipod, avchd, bluray, hdv1440
--cavlc <value>  use CAVLC instead of CABAC entropy coding
--nal_typ <value>  select nal-unit type: auto, 1-4 (auto)
--sps_pps <value>  dis-/enable sps_pps flaf (true)
--slices <value>  set slice count: auto, 1-4 |
Not sure what to use to get better quality from cuda. If I set the bitrate higher, the output is almost acceptable, but the file is much larger than the input file and almost twice as big as the x264 file.
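Because the cuda encoder is run here with `--control_mode vbr_rest` and a target `--bitrate`, the output size is roughly bitrate times duration regardless of how good the picture looks. The quick arithmetic sketch below assumes the average bitrate lands near the target, counts 1 kbit as 1000 bits, and ignores audio and container overhead.

```python
def stream_size_bytes(bitrate_kbps, duration_s):
    """Approximate video stream size in bytes for an average bitrate."""
    return bitrate_kbps * 1000 * duration_s / 8


def bitrate_for_size(size_bytes, duration_s):
    """Average bitrate in kbit/s needed to hit a target stream size."""
    return size_bytes * 8 / duration_s / 1000


# 10 minutes at the 4000 kbit/s used in the command line above:
print(stream_size_bytes(4000, 600) / 1e6, "MB")
```

So doubling the bitrate doubles the file (here from about 300 MB per 10 minutes to about 600 MB), whereas x264 in CRF mode spends bits according to content, which is why it can end up much smaller at similar quality.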
Also, if I use an AVI as input, cuda encoding slows way down. In some of my tests, my Q6600 CPU smoked the cuda encoder running on my GTS450 graphics card. I was getting around 450fps with cuda, but after installing all my programs my encode speeds with cuda dropped for some reason. I am using an old driver for my card because I couldn't get cuda to work at all before; I had to do a clean install of XP with the original driver to get it working. Maybe upgrading to a newer driver would give better results, but I was told that the speed was crippled after a certain driver version, so I don't want to install the newest driver.
Anyway, my tests showed that it isn't worth using cuda unless you have a slow CPU and a fast CUDA card. Although you can get the quality almost as good as x264 at the superfast preset, you could never get it anywhere close to x264 at CRF 18 with the medium or slow preset. |
 |
| -vdub- |
| Posted: Apr 16 2013, 09:30 PM |
 |
|
Advanced Member
  
Group: Members
Posts: 613
Member No.: 27087
Joined: 24-February 10

|
As you may have seen, the latest VirtualDub beta has improvements to make CPU processing faster:
VirtualDub 1.10.4-test7
(test-1) Processing priority option now applies to all worker threads (filtering, compression)
Your 64-bit CPU has four physical cores rather than being a Hyper-Threading CPU, and phaeron has said elsewhere that physical cores should do better than Hyper-Threaded ones. Did the above VirtualDub beta increase your fps any further on your test files? |
 |
|