pixel.cpp VDConvertYCbCrToRGB, VDConvertRGBToYCbCr
dloneranger
Posted: Feb 8 2012, 04:53 PM


Moderator


Group: Moderators
Posts: 2366
Member No.: 22158
Joined: 26-September 07



Just checking: are these functions correct?

Testing a colourspace converter built on these gives strange results:

Going 709 > RGB > 601 seems OK.
Going 601 > RGB > 709 seems to give an incorrect transformation, e.g. faces go all purplish.

I'm comparing this to others, e.g. JPSDR's colour space converter.


--------------------
MultiAdjust JoinWav WavNormalize FFMPeg Input Plugin v1827 UnSharpMask
Windows7/8 Codec Chooser
All FccHandlers Stuff inc. Installers for acm codecs AAC, AC3, LameMp3
jpsdr
Posted: Feb 9 2012, 09:32 AM


Advanced Member


Group: Members
Posts: 335
Member No.: 20490
Joined: 23-December 06



Could the problem just be the display?
I don't know how video drivers work, so maybe I'm saying something stupid, but when you give YCbCr components to your video driver, how does it know whether they're 709 or 601?

A note about my filters: in theory, in VDub you can specify whether your YCbCr output is 601 or 709, but I haven't actually implemented that, so VDub treats any output from my filters as 601. This has no impact as long as you stay in YCbCr further down the filter chain, but if VDub makes a conversion somewhere in the chain, it will treat the data as 601, even if it isn't.

Why do it like this? Because at the moment VDub's format conversion only works with 601.

I often do the following:
Input is YV12 (either 601 or 709, depending on whether I'm working from a DVD or Blu-ray source).
My AutoYUYV filter to output YV16.
My RGB convert, with the correct matrix for the source.
Filters that only work in RGB.
My RGB convert back to YV24, almost always 709, as I'm targeting Blu-ray 720p or 1080p.
Convert to YV12 by VDub <----- Here's the problem.
Output YV12 with a lossless codec (UT Video).

The problem is that if I force the flag on my YV24 output to say it's 709, VDub's YV12 conversion will convert my 709 back to 601, and if I've selected 709 for the output, it will probably convert it back to 709 again. That's two useless color conversions, which can only degrade the picture and waste time.
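A rough way to see the cost of those extra conversions is to run 709 -> 601 -> 709 on 8-bit limited-range samples, rounding the intermediate 601 result to 8 bits as a filter chain would. The sketch below is a hypothetical double-precision model (Coeffs, decode, encode and doubleConversionLoss are made-up names, and only in-gamut samples are tested); it is not VDub's or jpsdr's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Hypothetical model of limited-range 8-bit Y'CbCr with standard Kr/Kb.
struct Coeffs { double kr, kb; };
const Coeffs kRec601{0.299, 0.114};
const Coeffs kRec709{0.2126, 0.0722};

// Limited-range 8-bit Y'CbCr -> normalised R'G'B' (0..1, unclamped).
void decode(const Coeffs& k, const int ycc[3], double rgb[3]) {
    const double y = (ycc[0] - 16) / 219.0;
    const double pb = (ycc[1] - 128) / 224.0, pr = (ycc[2] - 128) / 224.0;
    rgb[0] = y + 2.0 * (1.0 - k.kr) * pr;
    rgb[2] = y + 2.0 * (1.0 - k.kb) * pb;
    rgb[1] = (y - k.kr * rgb[0] - k.kb * rgb[2]) / (1.0 - k.kr - k.kb);
}

// Normalised R'G'B' -> limited-range Y'CbCr, rounded to 8 bits.
void encode(const Coeffs& k, const double rgb[3], int ycc[3]) {
    const double y = k.kr * rgb[0] + (1.0 - k.kr - k.kb) * rgb[1] + k.kb * rgb[2];
    ycc[0] = (int)std::lround(16.0 + 219.0 * y);
    ycc[1] = (int)std::lround(128.0 + 224.0 * (rgb[2] - y) / (2.0 * (1.0 - k.kb)));
    ycc[2] = (int)std::lround(128.0 + 224.0 * (rgb[0] - y) / (2.0 * (1.0 - k.kr)));
}

// 709 -> 601 -> 709 over every in-gamut sample: counts changed samples and
// records the largest per-channel change.
void doubleConversionLoss(long& changed, int& maxErr) {
    changed = 0;
    maxErr = 0;
    for (int y = 16; y <= 235; ++y)
        for (int cb = 16; cb <= 240; ++cb)
            for (int cr = 16; cr <= 240; ++cr) {
                int src[3] = {y, cb, cr}, mid[3], back[3];
                double rgb[3];
                decode(kRec709, src, rgb);
                bool inGamut = true;
                for (double v : rgb) inGamut = inGamut && v >= 0.0 && v <= 1.0;
                if (!inGamut) continue;       // skip colours RGB can't hold
                encode(kRec601, rgb, mid);    // first conversion, stored as 8 bits
                decode(kRec601, mid, rgb);
                encode(kRec709, rgb, back);   // second conversion back to 709
                bool diff = false;
                for (int c = 0; c < 3; ++c) {
                    const int e = std::abs(back[c] - src[c]);
                    if (e > maxErr) maxErr = e;
                    diff = diff || e != 0;
                }
                if (diff) ++changed;
            }
}
```

If the model is right, doubleConversionLoss should report that some samples change, but by no more than one code value per channel, since each 8-bit rounding error is amplified by at most about 1.3 on the way back.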

So my outputs are only tagged as plain YV12, the standard default, which is 601 by default, and the output is also set to standard YV12 (like my input). All YCbCr formats are tagged with their standard mode (YV16, ...), which defaults to 601. That way VDub won't do any color conversion; the only ones made are those in my filters, and I control them. Telling VDub that 709 data is 601 has no effect on the processing chain, as long as you don't do anything that forces VDub to apply a color matrix conversion. The only effect is on display, either output or preview: if one of my filters outputs 709 YCbCr, VDub will display it as if it were 601 (whether this has any effect on the video display/driver, I don't know).
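To see what that mislabelling does on screen, the sketch below encodes 8-bit RGB with the BT.709 matrix and decodes it with the BT.601 one. It is a hypothetical double-precision model (Matrix, kBT601, kBT709 and encodeThenDecode are illustrative names, limited range assumed), not VDub's display path.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model: 8-bit R'G'B' -> limited-range Y'CbCr with one matrix,
// then back to R'G'B' with another, in double precision.
struct Matrix { double kr, kb; };
const Matrix kBT601{0.299, 0.114};
const Matrix kBT709{0.2126, 0.0722};

void encodeThenDecode(Matrix enc, Matrix dec, const int in[3], int out[3]) {
    // Encode, rounding Y'CbCr to 8 bits as a filter would store it.
    const double r = in[0] / 255.0, g = in[1] / 255.0, b = in[2] / 255.0;
    const double y = enc.kr * r + (1.0 - enc.kr - enc.kb) * g + enc.kb * b;
    const double y8 = std::round(16.0 + 219.0 * y);
    const double cb8 = std::round(128.0 + 224.0 * (b - y) / (2.0 * (1.0 - enc.kb)));
    const double cr8 = std::round(128.0 + 224.0 * (r - y) / (2.0 * (1.0 - enc.kr)));

    // Decode with the (possibly different) matrix.
    const double yd = (y8 - 16.0) / 219.0;
    const double pb = (cb8 - 128.0) / 224.0, pr = (cr8 - 128.0) / 224.0;
    const double rd = yd + 2.0 * (1.0 - dec.kr) * pr;
    const double bd = yd + 2.0 * (1.0 - dec.kb) * pb;
    const double gd = (yd - dec.kr * rd - dec.kb * bd) / (1.0 - dec.kr - dec.kb);

    const double rgb[3] = {rd, gd, bd};
    for (int c = 0; c < 3; ++c) {  // back to 8 bits, clamped to 0..255
        const long v = std::lround(rgb[c] * 255.0);
        out[c] = (int)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
}
```

Greys come through unchanged, because for a grey Y' equals R', G' and B' under either matrix and Cb = Cr = 128, while saturated colours shift; for example, pure green (0, 255, 0) picks up some red and blue.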

dloneranger
Posted: Feb 9 2012, 04:55 PM





It was just something I was playing with.
I was thinking about adding a colourspace converter to one of my filters, so I was looking for the right maths on the net, but as there were these two handy functions in pixel.cpp, I thought I'd give them a try.

For testing I was just playing with various videos and comparing a few different 601>709 and 709>601 conversions.
The ones in pixel.cpp make 601>709 video look really purple though, so I thought I'd check in.

dloneranger
Posted: Feb 9 2012, 06:16 PM





I think there's something wrong with the 709 code.

A simple round trip like this:
CODE
 // (true, false) selects the BT.709, limited-range path
 sint32 rgb= VDConvertYCbCrToRGB(y0,b0,r0,true,false);
 sint32 ra = (rgb>>16) & 0xff;   // R
 sint32 ga = (rgb>>8) & 0xff;    // G
 sint32 ba = (rgb) & 0xff;       // B
 sint32 ryb = VDConvertRGBToYCbCr(ra,ga,ba,true,false);
 sint32 y1 = (ryb>>8) & 0xff;    // Y
 sint32 b1 = (ryb) & 0xff;       // Cb
 sint32 r1 = (ryb>>16) & 0xff;   // Cr

should give back y0==y1, b0==b1 and r0==r1 (apart from colours outside the RGB gamut, which get clipped).
The 601 code does, but the 709 code is way off.
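For comparison, here is a hypothetical double-precision reference (not the pixel.cpp code; RefMatrix, refYCbCrToRGB, refRGBToYCbCr and roundTripFailures are made-up names) that runs the same round trip over every limited-range sample. With exact arithmetic, an RGB rounding error of at most 0.5 maps back to at most about 0.44 of a Y, Cb or Cr step, so a correct converter should reproduce every in-gamut sample exactly, for both 601 and 709.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical reference for limited-range 8-bit Y'CbCr <-> R'G'B'.
struct RefMatrix { double kr, kb; };
const RefMatrix kRef601{0.299, 0.114};
const RefMatrix kRef709{0.2126, 0.0722};

// Y'CbCr -> R'G'B', rounded to 8 bits but NOT clamped, so callers can tell
// whether the colour is inside the RGB gamut.
void refYCbCrToRGB(const RefMatrix& m, int y, int cb, int cr, long rgb[3]) {
    const double yf = (y - 16) / 219.0;
    const double pb = (cb - 128) / 224.0, pr = (cr - 128) / 224.0;
    const double r = yf + 2.0 * (1.0 - m.kr) * pr;
    const double b = yf + 2.0 * (1.0 - m.kb) * pb;
    const double g = (yf - m.kr * r - m.kb * b) / (1.0 - m.kr - m.kb);
    rgb[0] = std::lround(r * 255.0);
    rgb[1] = std::lround(g * 255.0);
    rgb[2] = std::lround(b * 255.0);
}

// R'G'B' (0..255) -> limited-range Y'CbCr, rounded to 8 bits.
void refRGBToYCbCr(const RefMatrix& m, const long rgb[3], long ycc[3]) {
    const double r = rgb[0] / 255.0, g = rgb[1] / 255.0, b = rgb[2] / 255.0;
    const double yf = m.kr * r + (1.0 - m.kr - m.kb) * g + m.kb * b;
    ycc[0] = std::lround(16.0 + 219.0 * yf);
    ycc[1] = std::lround(128.0 + 224.0 * (b - yf) / (2.0 * (1.0 - m.kb)));
    ycc[2] = std::lround(128.0 + 224.0 * (r - yf) / (2.0 * (1.0 - m.kr)));
}

// The round trip from the post, over every 8-bit limited-range sample:
// returns how many in-gamut samples do not come back unchanged.
long roundTripFailures(const RefMatrix& m) {
    long failures = 0;
    for (int y = 16; y <= 235; ++y)
        for (int cb = 16; cb <= 240; ++cb)
            for (int cr = 16; cr <= 240; ++cr) {
                long rgb[3], ycc[3];
                refYCbCrToRGB(m, y, cb, cr, rgb);
                bool inGamut = true;
                for (long c : rgb) inGamut = inGamut && c >= 0 && c <= 255;
                if (!inGamut) continue;  // clipping would make the trip lossy
                refRGBToYCbCr(m, rgb, ycc);
                if (ycc[0] != y || ycc[1] != cb || ycc[2] != cr) ++failures;
            }
    return failures;
}
```

Both roundTripFailures(kRef601) and roundTripFailures(kRef709) should return 0; out-of-gamut samples are skipped because clamping RGB discards information.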

jpsdr
Posted: Feb 10 2012, 09:29 AM





If you're interested in the maths, my code is below. I got the information from the net/Wikipedia; I hope the values are right.

CODE

void JPSDR_RGBConvert::Compute_Lookup(void)
{
double kr,kg,kb;
double u1,u2,v1,v2,r1,g1,g2,b1;
signed short i;

if (mData.full_range)
{
 Min_Y=0;
 Max_Y=255;
 Min_U=0;
 Max_U=255;
 Min_V=0;
 Max_V=255;
 Coeff_Y=255.0;
 Coeff_U=255.0;
 Coeff_V=255.0;
}
else
{
 Min_Y=16;
 Max_Y=235;
 Min_U=16;
 Max_U=240;
 Min_V=16;
 Max_V=240;
 Coeff_Y=219.0;
 Coeff_U=224.0;
 Coeff_V=224.0;
}

switch (mData.color_matrix)
{
 case 0 : kr=0.2126; kb=0.0722; break; // BT.709
 case 1 : kr=0.299; kb=0.114; break;   // BT.601
 case 2 : kr=0.212; kb=0.087; break;   // SMPTE 240M
 case 3 : kr=0.3; kb=0.11; break;      // FCC
}
kg=1.0-kr-kb;

u1=-kr/(1.0-kb); u2=-kg/(1.0-kb);
v1=-kg/(1.0-kr); v2=-kb/(1.0-kr);

r1=2.0*(1.0-kr);
g1=-(2.0*(1.0-kb)*kb)/kg; g2=-(2.0*(1.0-kr)*kr)/kg;
b1=2.0*(1.0-kb);

Offset_Y=(Min_Y << 6)+32;
Offset_U=(128 << 6)+32;
Offset_V=(128 << 6)+32;

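// YV24toRGB32 shifts by 5, so its rounding bias is +16 (half of 32). The Min_Y
// and 128 chroma offsets are subtracted, but the 16 must be added; subtracting
// it makes outputs come out one step low (e.g. white gives 254 instead of 255).
Offset_R=(signed short)round(16.0-((32.0*255.0*Min_Y)/Coeff_Y)-((128.0*r1*255.0*32.0)/Coeff_V));
Offset_G=(signed short)round(16.0-((32.0*255.0*Min_Y)/Coeff_Y)-((128.0*g1*255.0*32.0)/Coeff_U)-((128.0*g2*255.0*32.0)/Coeff_V));
Offset_B=(signed short)round(16.0-((32.0*255.0*Min_Y)/Coeff_Y)-((128.0*b1*255.0*32.0)/Coeff_U));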

switch (convertion_mode)
{
 case 0 :
  for (i=0; i<=255; i++)
  {
   lookup[i]=(signed short)round((i*kr*Coeff_Y*64.0)/255.0);
   lookup[i+256]=(signed short)round((i*kg*Coeff_Y*64.0)/255.0);
   lookup[i+512]=(signed short)round((i*kb*Coeff_Y*64.0)/255.0);
   lookup[i+768]=(signed short)round((i*u1*Coeff_U*0.5*64.0)/255.0);
   lookup[i+1024]=(signed short)round((i*u2*Coeff_U*0.5*64.0)/255.0);
   lookup[i+1280]=(signed short)round((i*Coeff_U*0.5*64.0)/255.0);
   lookup[i+1536]=(signed short)round((i*Coeff_V*0.5*64.0)/255.0);
   lookup[i+1792]=(signed short)round((i*v1*Coeff_V*0.5*64.0)/255.0);
   lookup[i+2048]=(signed short)round((i*v2*Coeff_V*0.5*64.0)/255.0);
  }
  break;
 case 1 :
 case 2 :
 case 3 :
 case 4 :
 case 5 :
  for (i=0; i<=255; i++)
  {
   lookup[i]=(signed short)round((i*255.0*32.0)/Coeff_Y);
   lookup[i+256]=(signed short)round((i*r1*255.0*32.0)/Coeff_V);
   lookup[i+512]=(signed short)round((i*g1*255.0*32.0)/Coeff_U);
   lookup[i+768]=(signed short)round((i*g2*255.0*32.0)/Coeff_V);
   lookup[i+1024]=(signed short)round((i*b1*255.0*32.0)/Coeff_U);
   lookup[i+1280]=0;
   lookup[i+1536]=0;
   lookup[i+1792]=0;
   lookup[i+2048]=0;
  }
  break;
}

}


void JPSDR_RGBConvert::RGB32toYV24(const void *src_,void *dst_y_,void *dst_u_,void *dst_v_,sint32 w,sint32 h,ptrdiff_t src_pitch,ptrdiff_t dst_pitch_y,
 ptrdiff_t dst_pitch_u,ptrdiff_t dst_pitch_v)
{
const RGB32 *src;
unsigned char *dst_y,*dst_u,*dst_v;
sint32 i,j;
signed short y,u,v;
unsigned short r,g,b;

src=(RGB32 *)src_;
dst_y=(unsigned char *)dst_y_;
dst_u=(unsigned char *)dst_u_;
dst_v=(unsigned char *)dst_v_;

for (i=0; i<h; i++)
{
 for (j=0; j<w; j++)
 {
  b=src[j].b;
  g=src[j].g;
  r=src[j].r;
  y=(Offset_Y+lookup[r]+lookup[g+256]+lookup[b+512]) >> 6;
  u=(Offset_U+lookup[r+768]+lookup[g+1024]+lookup[b+1280]) >> 6;
  v=(Offset_V+lookup[r+1536]+lookup[g+1792]+lookup[b+2048]) >> 6;
  if (y<Min_Y) y=Min_Y;
  if (y>Max_Y) y=Max_Y;
  if (u<Min_U) u=Min_U;
  if (u>Max_U) u=Max_U;
  if (v<Min_V) v=Min_V;
  if (v>Max_V) v=Max_V;
  dst_y[j]=(unsigned char)y;
  dst_u[j]=(unsigned char)u;
  dst_v[j]=(unsigned char)v;
 }
 src=(RGB32 *)((char *)src+src_pitch);
 dst_y+=dst_pitch_y;
 dst_u+=dst_pitch_u;
 dst_v+=dst_pitch_v;
}
}


void JPSDR_RGBConvert::YV24toRGB32(const void *src_y_,const void *src_u_,const void *src_v_, void *dst_,sint32 w,sint32 h,ptrdiff_t src_pitch_y,ptrdiff_t src_pitch_u,
 ptrdiff_t src_pitch_v,ptrdiff_t dst_pitch)
{
RGB32 *dst;
const unsigned char *src_y,*src_u,*src_v;
sint32 i,j;
signed short r,g,b;
unsigned short y,u,v;

dst=(RGB32 *)dst_;
src_y=(unsigned char *)src_y_;
src_u=(unsigned char *)src_u_;
src_v=(unsigned char *)src_v_;

for (i=0; i<h; i++)
{
 for (j=0; j<w; j++)
 {
  y=src_y[j];
  u=src_u[j];
  v=src_v[j];
  r=(lookup[y]+lookup[v+256]+Offset_R) >> 5;
  g=(lookup[y]+lookup[u+512]+lookup[v+768]+Offset_G) >> 5;
  b=(lookup[y]+lookup[u+1024]+Offset_B) >> 5;
  if (r<0) r=0;
  if (r>255) r=255;
  if (g<0) g=0;
  if (g>255) g=255;
  if (b<0) b=0;
  if (b>255) b=255;
  dst[j].b=(unsigned char)b;
  dst[j].g=(unsigned char)g;
  dst[j].r=(unsigned char)r;
  dst[j].alpha=0;
 }
 dst=(RGB32 *)((char *)dst+dst_pitch);
 src_y+=src_pitch_y;
 src_u+=src_pitch_u;
 src_v+=src_pitch_v;
}

}


color_matrix values:
0 : BT709
1 : BT601
2 : SMPTE_240M
3 : FCC
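The coefficient names in Compute_Lookup (u1, u2, v1, v2, r1, g1, g2, b1) appear to follow from the standard Kr/Kb equations, with R, G, B and Y in [0, 1] and Cb, Cr in [-0.5, 0.5]:

```latex
\begin{aligned}
Y &= K_r R + K_g G + K_b B, & K_g &= 1 - K_r - K_b \\
C_b &= \frac{B - Y}{2(1 - K_b)} = \tfrac{1}{2}\left(u_1 R + u_2 G + B\right), & u_1 &= -\frac{K_r}{1 - K_b},\; u_2 = -\frac{K_g}{1 - K_b} \\
C_r &= \frac{R - Y}{2(1 - K_r)} = \tfrac{1}{2}\left(R + v_1 G + v_2 B\right), & v_1 &= -\frac{K_g}{1 - K_r},\; v_2 = -\frac{K_b}{1 - K_r} \\
R &= Y + r_1 C_r, & r_1 &= 2(1 - K_r) \\
B &= Y + b_1 C_b, & b_1 &= 2(1 - K_b) \\
G &= \frac{Y - K_r R - K_b B}{K_g} = Y + g_1 C_b + g_2 C_r, & g_1 &= -\frac{2(1 - K_b)K_b}{K_g},\; g_2 = -\frac{2(1 - K_r)K_r}{K_g}
\end{aligned}
```

The 8-bit values are then Y8 = Min_Y + Coeff_Y * Y and Cb8 = 128 + Coeff_U * Cb (Cr likewise with Coeff_V), which is where the 0.5 factors on Coeff_U and Coeff_V in the RGB -> YCbCr tables come from.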

Optimized x86 code (each inner-loop iteration converts two pixels, so w is the number of pixel pairs per line):
CODE

JPSDR_RGBConvert_RGB32toYV24_SSE2 proc src:dword,dst_y:dword,dst_u:dword,dst_v:dword,w:dword,h:dword,offset_Y:word,
offset_U:word,offset_V:word,lookup:dword,src_modulo:dword,dst_modulo_y:dword,dst_modulo_u:dword,dst_modulo_v:dword,
Min_Y:word,Max_Y:word,Min_U:word,Max_U:word,Min_V:word,Max_V:word

public JPSDR_RGBConvert_RGB32toYV24_SSE2

local i:dword

push esi
push edi
push ebx

xor eax,eax
pxor xmm3,xmm3
pxor xmm2,xmm2
pxor xmm1,xmm1
pxor xmm0,xmm0
movzx eax,offset_Y
pinsrw xmm1,eax,0
pinsrw xmm1,eax,4
movzx eax,offset_U
pinsrw xmm1,eax,1
pinsrw xmm1,eax,5
movzx eax,offset_V
pinsrw xmm1,eax,2
pinsrw xmm1,eax,6
movzx eax,Min_Y
pinsrw xmm2,eax,0
pinsrw xmm2,eax,4
movzx eax,Max_Y
pinsrw xmm3,eax,0
pinsrw xmm3,eax,4
movzx eax,Min_U
pinsrw xmm2,eax,1
pinsrw xmm2,eax,5
movzx eax,Max_U
pinsrw xmm3,eax,1
pinsrw xmm3,eax,5
movzx eax,Min_V
pinsrw xmm2,eax,2
pinsrw xmm2,eax,6
movzx eax,Max_V
pinsrw xmm3,eax,2
pinsrw xmm3,eax,6

mov esi,src

Boucle0_2:
mov eax,w
mov i,eax
Boucle1_2:
movzx edx,byte ptr[esi]
movzx ecx,byte ptr[esi+1]
movzx ebx,byte ptr[esi+2]; ebx=R ecx=G edx=B
mov esi,lookup
movzx eax,word ptr[esi+2*ebx]
add ax,word ptr[esi+2*ecx+512]
add ax,word ptr[esi+2*edx+1024]
pinsrw xmm0,eax,0
movzx eax,word ptr[esi+2*ebx+1536]
add ax,word ptr[esi+2*ecx+2048]
add ax,word ptr[esi+2*edx+2560]
pinsrw xmm0,eax,1
movzx eax,word ptr[esi+2*ebx+3072]
add ax,word ptr[esi+2*ecx+3584]
add ax,word ptr[esi+2*edx+4096]
pinsrw xmm0,eax,2
mov esi,src
movzx edx,byte ptr[esi+4]
movzx ecx,byte ptr[esi+5]
movzx ebx,byte ptr[esi+6]; ebx=R ecx=G edx=B
mov esi,lookup
movzx eax,word ptr[esi+2*ebx]
add ax,word ptr[esi+2*ecx+512]
add ax,word ptr[esi+2*edx+1024]
pinsrw xmm0,eax,4
movzx eax,word ptr[esi+2*ebx+1536]
add ax,word ptr[esi+2*ecx+2048]
add ax,word ptr[esi+2*edx+2560]
pinsrw xmm0,eax,5
movzx eax,word ptr[esi+2*ebx+3072]
add ax,word ptr[esi+2*ecx+3584]
add ax,word ptr[esi+2*edx+4096]
pinsrw xmm0,eax,6
paddsw xmm0,xmm1
psraw xmm0,6
pmaxsw xmm0,xmm2
pminsw xmm0,xmm3

mov edi,dst_y
pextrw eax,xmm0,0
mov byte ptr[edi],al
pextrw eax,xmm0,4
mov byte ptr[edi+1],al
mov edi,dst_u
add dst_y,2
pextrw eax,xmm0,1
mov byte ptr[edi],al
pextrw eax,xmm0,5
mov byte ptr[edi+1],al
mov edi,dst_v
add dst_u,2
pextrw eax,xmm0,2
mov byte ptr[edi],al
pextrw eax,xmm0,6
mov byte ptr[edi+1],al
add dst_v,2
add src,8
mov esi,src
dec i
jnz Boucle1_2
add esi,src_modulo
mov eax,dst_y
add eax,dst_modulo_y
mov dst_y,eax
mov eax,dst_u
add eax,dst_modulo_u
mov dst_u,eax
mov eax,dst_v
add eax,dst_modulo_v
mov dst_v,eax
mov src,esi
dec h
jnz Boucle0_2

pop ebx
pop edi
pop esi

ret

JPSDR_RGBConvert_RGB32toYV24_SSE2 endp


JPSDR_RGBConvert_YV24toRGB32_SSE2 proc src_y:dword,src_u:dword,src_v:dword,dst:dword,w:dword,h:dword,offset_R:word,
offset_G:word,offset_B:word,lookup:dword,src_modulo_y:dword,src_modulo_u:dword,src_modulo_v:dword,dst_modulo:dword

public JPSDR_RGBConvert_YV24toRGB32_SSE2

local i:dword

push esi
push edi
push ebx

xor eax,eax
pxor xmm2,xmm2
pxor xmm1,xmm1
pxor xmm0,xmm0
movzx eax,offset_B
pinsrw xmm1,eax,0
pinsrw xmm1,eax,4
movzx eax,offset_G
pinsrw xmm1,eax,1
pinsrw xmm1,eax,5
movzx eax,offset_R
pinsrw xmm1,eax,2
pinsrw xmm1,eax,6
mov edi,dst

Boucle0_4:
mov eax,w
mov i,eax
Boucle1_4:
mov esi,src_y
movzx ebx,byte ptr[esi]
mov esi,src_u
movzx ecx,byte ptr[esi]
mov esi,src_v
movzx edx,byte ptr[esi]; ebx=Y ecx=U edx=V
mov esi,lookup
movzx eax,word ptr[esi+2*ebx]
add ax,word ptr[esi+2*edx+512]
pinsrw xmm0,eax,2
movzx eax,word ptr[esi+2*ebx]
add ax,word ptr[esi+2*ecx+1024]
add ax,word ptr[esi+2*edx+1536]
pinsrw xmm0,eax,1
movzx eax,word ptr[esi+2*ebx]
add ax,word ptr[esi+2*ecx+2048]
pinsrw xmm0,eax,0

mov esi,src_y
movzx ebx,byte ptr[esi+1]
mov esi,src_u
add src_y,2
movzx ecx,byte ptr[esi+1]
mov esi,src_v
add src_u,2
movzx edx,byte ptr[esi+1]; ebx=Y ecx=U edx=V
mov esi,lookup
add src_v,2
movzx eax,word ptr[esi+2*ebx]
add ax,word ptr[esi+2*edx+512]
pinsrw xmm0,eax,6
movzx eax,word ptr[esi+2*ebx]
add ax,word ptr[esi+2*ecx+1024]
add ax,word ptr[esi+2*edx+1536]
pinsrw xmm0,eax,5
movzx eax,word ptr[esi+2*ebx]
add ax,word ptr[esi+2*ecx+2048]
pinsrw xmm0,eax,4
paddsw xmm0,xmm1
psraw xmm0,5
packuswb xmm0,xmm2
movq qword ptr[edi],xmm0

add edi,8
dec i
jnz Boucle1_4
add edi,dst_modulo
mov eax,src_y
add eax,src_modulo_y
mov src_y,eax
mov eax,src_u
add eax,src_modulo_u
mov src_u,eax
mov eax,src_v
add eax,src_modulo_v
mov src_v,eax
dec h
jnz Boucle0_4

pop ebx
pop edi
pop esi

ret

JPSDR_RGBConvert_YV24toRGB32_SSE2 endp


Optimized x64 code:
CODE

JPSDR_RGBConvert_RGB32toYV24_SSE2 proc public frame

w equ dword ptr[rbp+48]
h equ dword ptr[rbp+56]
offset_Y equ word ptr[rbp+64]
offset_U equ word ptr[rbp+72]
offset_V equ word ptr[rbp+80]
lookup equ qword ptr[rbp+88]
src_modulo equ qword ptr[rbp+96]
dst_modulo_y equ qword ptr[rbp+104]
dst_modulo_u equ qword ptr[rbp+112]
dst_modulo_v equ qword ptr[rbp+120]
Min_Y equ word ptr[rbp+128]
Max_Y equ word ptr[rbp+136]
Min_U equ word ptr[rbp+144]
Max_U equ word ptr[rbp+152]
Min_V equ word ptr[rbp+160]
Max_V equ word ptr[rbp+168]

push rbp
.pushreg rbp
mov rbp,rsp
push rdi
.pushreg rdi
push rsi
.pushreg rsi
push rbx
.pushreg rbx
push r12
.pushreg r12
push r13
.pushreg r13
push r14
.pushreg r14
push r15
.pushreg r15
.endprolog

xor rax,rax
pxor xmm3,xmm3
pxor xmm2,xmm2  
pxor xmm1,xmm1
pxor xmm0,xmm0
movzx eax,offset_Y
pinsrw xmm1,eax,0
pinsrw xmm1,eax,4
movzx eax,offset_U
pinsrw xmm1,eax,1
pinsrw xmm1,eax,5
movzx eax,offset_V
pinsrw xmm1,eax,2
pinsrw xmm1,eax,6
movzx eax,Min_Y
pinsrw xmm2,eax,0
pinsrw xmm2,eax,4
movzx eax,Max_Y
pinsrw xmm3,eax,0
pinsrw xmm3,eax,4
movzx eax,Min_U
pinsrw xmm2,eax,1
pinsrw xmm2,eax,5
movzx eax,Max_U
pinsrw xmm3,eax,1
pinsrw xmm3,eax,5
movzx eax,Min_V
pinsrw xmm2,eax,2
pinsrw xmm2,eax,6
movzx eax,Max_V
pinsrw xmm3,eax,2
pinsrw xmm3,eax,6

mov rsi,rcx
mov r10,lookup
mov rdi,rdx   ;rdi=dst_y
mov r11,r8   ;r11=dst_u
mov r12,r9   ;r12=dst_v
mov r13,2
mov r14,8
mov r8d,w
mov r9d,h
xor rcx,rcx
xor rdx,rdx
xor rbx,rbx
xor r15,r15

Boucle0_2:
mov ecx,r8d
Boucle1_2:
movzx edx,byte ptr[rsi]
movzx r15d,byte ptr[rsi+1]
movzx ebx,byte ptr[rsi+2]; rbx=R r15=G rdx=B
movzx eax,word ptr[r10+2*rbx]
add ax,word ptr[r10+2*r15+512]
add ax,word ptr[r10+2*rdx+1024]
pinsrw xmm0,eax,0
movzx eax,word ptr[r10+2*rbx+1536]
add ax,word ptr[r10+2*r15+2048]
add ax,word ptr[r10+2*rdx+2560]
pinsrw xmm0,eax,1
movzx eax,word ptr[r10+2*rbx+3072]
add ax,word ptr[r10+2*r15+3584]
add ax,word ptr[r10+2*rdx+4096]
pinsrw xmm0,eax,2
movzx edx,byte ptr[rsi+4]
movzx r15d,byte ptr[rsi+5]
movzx ebx,byte ptr[rsi+6]; rbx=R r15=G rdx=B
movzx eax,word ptr[r10+2*rbx]
add ax,word ptr[r10+2*r15+512]
add ax,word ptr[r10+2*rdx+1024]
pinsrw xmm0,eax,4
movzx eax,word ptr[r10+2*rbx+1536]
add ax,word ptr[r10+2*r15+2048]
add ax,word ptr[r10+2*rdx+2560]
pinsrw xmm0,eax,5
movzx eax,word ptr[r10+2*rbx+3072]
add ax,word ptr[r10+2*r15+3584]
add ax,word ptr[r10+2*rdx+4096]
pinsrw xmm0,eax,6

paddsw xmm0,xmm1
psraw xmm0,6
pmaxsw xmm0,xmm2
pminsw xmm0,xmm3

pextrw eax,xmm0,0
mov byte ptr[rdi],al
pextrw eax,xmm0,4
mov byte ptr[rdi+1],al
pextrw eax,xmm0,1
add rdi,r13
mov byte ptr[r11],al
pextrw eax,xmm0,5
mov byte ptr[r11+1],al
pextrw eax,xmm0,2
add r11,r13
mov byte ptr[r12],al
pextrw eax,xmm0,6
mov byte ptr[r12+1],al
add rsi,r14
add r12,r13
dec ecx
jnz Boucle1_2
add rsi,src_modulo
add rdi,dst_modulo_y
add r11,dst_modulo_u
add r12,dst_modulo_v
dec r9d
jnz Boucle0_2

pop r15
pop r14
pop r13
pop r12
pop rbx
pop rsi
pop rdi
pop rbp

ret

JPSDR_RGBConvert_RGB32toYV24_SSE2 endp

JPSDR_RGBConvert_YV24toRGB32_SSE2 proc public frame

w equ dword ptr[rbp+48]
h equ dword ptr[rbp+56]
offset_R equ word ptr[rbp+64]
offset_G equ word ptr[rbp+72]
offset_B equ word ptr[rbp+80]
lookup equ qword ptr[rbp+88]
src_modulo_y equ qword ptr[rbp+96]
src_modulo_u equ qword ptr[rbp+104]
src_modulo_v equ qword ptr[rbp+112]
dst_modulo equ qword ptr[rbp+120]

push rbp
.pushreg rbp
mov rbp,rsp
push rdi
.pushreg rdi
push rsi
.pushreg rsi
push rbx
.pushreg rbx
push r12
.pushreg r12
push r13
.pushreg r13
push r14
.pushreg r14
push r15
.pushreg r15
.endprolog

xor rax,rax
pxor xmm2,xmm2
pxor xmm1,xmm1
pxor xmm0,xmm0
movzx eax,offset_B
pinsrw xmm1,eax,0
pinsrw xmm1,eax,4
movzx eax,offset_G
pinsrw xmm1,eax,1
pinsrw xmm1,eax,5
movzx eax,offset_R
pinsrw xmm1,eax,2
pinsrw xmm1,eax,6
mov rsi,rcx    ;rsi=src_y
mov r11,rdx    ;r11=src_u
mov r12,r8    ;r12=src_v
mov rdi,r9
mov r8d,w
mov r9d,h
mov r10,lookup
mov r13,2
mov r14,8
xor rcx,rcx
xor rdx,rdx
xor rbx,rbx
xor r15,r15

Boucle0_4:
mov ecx,r8d
Boucle1_4:
movzx ebx,byte ptr[rsi]
movzx r15d,byte ptr[r11]
movzx edx,byte ptr[r12]; rbx=Y r15=U rdx=V
movzx eax,word ptr[r10+2*rbx]
add ax,word ptr[r10+2*rdx+512]
pinsrw xmm0,eax,2
movzx eax,word ptr[r10+2*rbx]
add ax,word ptr[r10+2*r15+1024]
add ax,word ptr[r10+2*rdx+1536]
pinsrw xmm0,eax,1
movzx eax,word ptr[r10+2*rbx]
add ax,word ptr[r10+2*r15+2048]
pinsrw xmm0,eax,0
movzx ebx,byte ptr[rsi+1]
add rsi,r13
movzx r15d,byte ptr[r11+1]
add r11,r13
movzx edx,byte ptr[r12+1]; rbx=Y r15=U rdx=V
add r12,r13
movzx eax,word ptr[r10+2*rbx]
add ax,word ptr[r10+2*rdx+512]
pinsrw xmm0,eax,6
movzx eax,word ptr[r10+2*rbx]
add ax,word ptr[r10+2*r15+1024]
add ax,word ptr[r10+2*rdx+1536]
pinsrw xmm0,eax,5
movzx eax,word ptr[r10+2*rbx]
add ax,word ptr[r10+2*r15+2048]
pinsrw xmm0,eax,4
paddsw xmm0,xmm1
psraw xmm0,5
packuswb xmm0,xmm2
movq qword ptr[rdi],xmm0

add rdi,r14
dec ecx
jnz Boucle1_4
add rsi,src_modulo_y
add r11,src_modulo_u
add r12,src_modulo_v
add rdi,dst_modulo
dec r9d
jnz Boucle0_4

pop r15
pop r14
pop r13
pop r12
pop rbx
pop rsi
pop rdi
pop rbp

ret

JPSDR_RGBConvert_YV24toRGB32_SSE2 endp
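The C++ and SSE2 routines share one fixed-point scheme: table entries are scaled by 32, offsets are folded in, and a final >> 5 drops the fraction. For that shift to round to nearest, the offset has to include +16 (half of 32). The sketch below isolates the R channel of limited-range BT.709 YCbCr -> RGB to check this; RLut, buildRLut, fixedPointR and referenceR are hypothetical names, and, like the original, it assumes >> on a negative int is an arithmetic shift.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Hypothetical reduction of the YV24 -> RGB32 fixed-point path to the R channel
// of limited-range BT.709 (R = Y + r1 * Cr). luma[] and crToR[] mirror
// lookup[0..255] and lookup[256..511]; values are scaled by 32 (5 fraction bits).
struct RLut {
    short luma[256];
    short crToR[256];
    int offsetR;
};

RLut buildRLut(double kr) {
    const double r1 = 2.0 * (1.0 - kr);
    RLut t;
    for (int i = 0; i < 256; ++i) {
        t.luma[i] = (short)std::lround(i * 255.0 * 32.0 / 219.0);
        t.crToR[i] = (short)std::lround(i * r1 * 255.0 * 32.0 / 224.0);
    }
    // Subtract the scaled 16 (luma) and 128 (chroma) offsets, and ADD 16,
    // half of one output step, so the final >> 5 rounds to nearest.
    t.offsetR = (int)std::lround(16.0 - 32.0 * 255.0 * 16.0 / 219.0
                                 - 128.0 * r1 * 255.0 * 32.0 / 224.0);
    return t;
}

// Fixed-point R; assumes >> on a negative int is arithmetic (as on MSVC/GCC/Clang).
int fixedPointR(const RLut& t, int y, int cr) {
    const int r = (t.luma[y] + t.crToR[cr] + t.offsetR) >> 5;
    return r < 0 ? 0 : (r > 255 ? 255 : r);
}

// Same R in double precision, rounded and clamped, for comparison.
int referenceR(double kr, int y, int cr) {
    const double r = (y - 16) / 219.0 + 2.0 * (1.0 - kr) * (cr - 128) / 224.0;
    const long v = std::lround(r * 255.0);
    return v < 0 ? 0 : (v > 255 ? 255 : (int)v);
}
```

With the +16 bias, limited-range white (Y = 235, Cr = 128) comes out at 255 and every result stays within one step of the double-precision value; subtracting the 16 instead would turn white into 254.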
dloneranger
Posted: Feb 10 2012, 03:01 PM





Cheers, hopefully I'll have some time on the weekend to take a look

PS: could you post the definition of lookup[]?

phaeron
Posted: Feb 11 2012, 10:00 PM


Virtualdub Developer


Group: Administrator
Posts: 7773
Member No.: 61
Joined: 30-July 02



Dammit, I almost got it out of Scilab correctly. There are two bugs in VDConvertRGBToYCbCr() in the limited-range 709 path: a missing minus sign on the 2639 coefficient (blue's contribution to Cr), and the Y bias being 0x8000 (0.5) instead of 0x108000 (16.5).
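The corrected numbers can be rederived from Kr and Kb. A sketch, assuming 16.16 fixed point (as the 0x108000 = 16.5 bias suggests) and the usual 219/255 and 224/255 scaling for limited range; FixedCoeffs, limitedRangeCoeffs and fixedLuma are hypothetical names, not pixel.cpp's.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 16.16 fixed-point coefficients for limited-range
// R'G'B' (0..255) -> Y'CbCr, derived from Kr/Kb. Each output is
// (cR*R + cG*G + cB*B + bias) >> 16.
struct FixedCoeffs {
    long yR, yG, yB, yBias;
    long cbR, cbG, cbB;
    long crR, crG, crB;
    long cBias;  // shared by Cb and Cr
};

FixedCoeffs limitedRangeCoeffs(double kr, double kb) {
    const double kg = 1.0 - kr - kb;
    const double ys = 219.0 / 255.0;  // luma span 16..235
    const double cs = 224.0 / 255.0;  // chroma span 16..240
    auto fx = [](double v) { return std::lround(v * 65536.0); };
    FixedCoeffs c;
    c.yR = fx(kr * ys);
    c.yG = fx(kg * ys);
    c.yB = fx(kb * ys);
    c.yBias = fx(16.5);                        // +16 offset, +0.5 rounding
    c.cbR = fx(-kr / (2.0 * (1.0 - kb)) * cs);
    c.cbG = fx(-kg / (2.0 * (1.0 - kb)) * cs);
    c.cbB = fx(0.5 * cs);
    c.crR = fx(0.5 * cs);
    c.crG = fx(-kg / (2.0 * (1.0 - kr)) * cs);
    c.crB = fx(-kb / (2.0 * (1.0 - kr)) * cs); // negative: blue opposes red
    c.cBias = fx(128.5);                       // +128 offset, +0.5 rounding
    return c;
}

// Y for one 8-bit pixel using the coefficients above.
int fixedLuma(const FixedCoeffs& c, int r, int g, int b) {
    return (int)((c.yR * r + c.yG * g + c.yB * b + c.yBias) >> 16);
}
```

For BT.709 this gives crB = -2639 and yBias = 0x108000; the Cr coefficients sum to zero so greys land on 128, and white and black map to 235 and 16. With a 0x8000 bias, black would come out as 0 instead of 16.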
dloneranger
Posted: Feb 11 2012, 10:03 PM





Thanks for the correction


jpsdr
Posted: Feb 12 2012, 10:16 AM





QUOTE (dloneranger @ Feb 10 2012, 03:01 PM)
ps could you post the definition of lookup[]


CODE

signed short lookup[2304];  // 9 sub-tables of 256 entries

#pragma pack(push,1)  // Pack struct members on 1-byte boundaries (same as /Zp1)

typedef struct _RGB32
{
unsigned char b;
unsigned char g;
unsigned char r;
unsigned char alpha;
} RGB32;

#pragma pack(pop)  // Restore the project's packing setting

#ifndef _INC_MATH
#include <math.h>
#endif

// Round to nearest (halves go up); parenthesised so expression arguments expand safely
#define round(x) ((signed long) floor((x)+0.5))


Actually, you can get the full sources here: http://dl.free.fr/vMMMz0su9

If the web page is in French, click on "Télécharger ce fichier" (Download this file).
dloneranger
Posted: Feb 12 2012, 12:13 PM





Thanks, now all I need is some free time to get the coding done... ;)

dloneranger
Posted: Feb 12 2012, 02:59 PM





@phaeron

The full-range 601 path of VDConvertRGBToYCbCr also seems off.
I think it's the minus sign again, this time on 5329:
cr = ( 32768*r - 27439*g + 5329*b + 0x808000);

phaeron
Posted: Feb 20 2012, 10:19 PM





No, that one is correct. Chroma red is opposed by green and blue, while chroma blue is opposed by red and green. The full-range 601 equations are the same ones used by JFIF:

QUOTE

Y = 0.299 R + 0.587 G + 0.114 B
Cb = - 0.1687 R - 0.3313 G + 0.5 B + 128
Cr = 0.5 R - 0.4187 G - 0.0813 B + 128


round(-0.0813 * 2^16) = -5328. The off-by-one is because the JFIF spec only gives coefficients to three significant digits; the exact value, -0.114 / (2 * (1 - 0.299)) * 2^16 = -5328.9, rounds to -5329.
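The sign question can also be settled with a quick check: a grey pixel (R = G = B) must give Cr = 128, so the three Cr coefficients have to sum to zero. The sketch below (hypothetical CrCoeffs, crFromKrKb and crFromJfifConstants helpers, 16.16 fixed point assumed) computes the full-range BT.601 Cr coefficients both from the exact Kr/Kb and from JFIF's printed constants.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers: full-range BT.601 (JFIF) Cr coefficients in 16.16
// fixed point, for Cr = (cR*R + cG*G + cB*B + 0x808000) >> 16.
struct CrCoeffs { long r, g, b; };

// From the exact definition Cr = (R - Y) / (2 * (1 - Kr)).
CrCoeffs crFromKrKb(double kr, double kb) {
    const double kg = 1.0 - kr - kb;
    return { std::lround(0.5 * 65536.0),
             std::lround(-kg / (2.0 * (1.0 - kr)) * 65536.0),
             std::lround(-kb / (2.0 * (1.0 - kr)) * 65536.0) };
}

// From the constants as printed in JFIF: Cr = 0.5 R - 0.4187 G - 0.0813 B + 128.
CrCoeffs crFromJfifConstants() {
    return { std::lround(0.5 * 65536.0),
             std::lround(-0.4187 * 65536.0),
             std::lround(-0.0813 * 65536.0) };
}
```

The exact constants give 32768, -27439 and -5329, which sum to zero; with +5329 on blue they would sum to 10658 and greys would pick up a red tint. The JFIF constant 0.0813 gives -5328, matching the off-by-one above.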
11 replies since Feb 8 2012, 04:53 PM