

Amd64 Compiler In Psdk Broken?, Should -7 / 4 = -2?
squid_80
Posted: May 6 2005, 12:11 PM


Advanced Member


Group: Members
Posts: 594
Member No.: 13813
Joined: 22-January 05



After spending many hours trying to get xvid64 to work with the new PSDK, I found out that integer division was causing problems. I made a small program to demonstrate:
CODE
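#include <stdio.h>
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
    int yref;
    scanf("%d", &yref);
    /* Print the input, its quotient by 4, and its remainder by 4. */
    printf("yref = %d, yref/4 = %d, yref%%4 = %d", yref, yref/4, yref%4);
    return 0;
}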

If I use -7 as input, it gives the following: yref = -7, yref/4 = -2, yref%4 = -3

The problems I was having are caused by assuming -7/4 = -1, not -2. I know the AMD64 compiler is using the SAR instruction, which rounds toward negative infinity, hence the -2. Is this correct behaviour for a C compiler? It seems strange that it would produce different results from x86 code.
 
fccHandler
Posted: May 6 2005, 03:07 PM


Administrator n00b


Group: Moderators
Posts: 3961
Member No.: 280
Joined: 13-September 02



MSVC6 gives me yref/4 = -1. Hmmm...

--------------------
May the FOURCC be with you...
 
squid_80
Posted: May 6 2005, 10:27 PM


Advanced Member


Group: Members
Posts: 594
Member No.: 13813
Joined: 22-January 05



Hmph. I dug out my C textbook just to check and it says this: "The direction of truncation for / and the sign of the result for % are machine-dependent for negative operands, as is the action taken on overflow or underflow."
So I guess you could argue that there's nothing wrong with the way it's behaving. But it still seems wrong that 4*(yref/4) + yref%4 is not always equal to the value of yref.
 
phaeron
Posted: May 7 2005, 03:44 AM


Virtualdub Developer


Group: Administrator
Posts: 7773
Member No.: 61
Joined: 30-July 02



I think it's guaranteed that (x/y)*y + (x%y) == x, given y!=0 and no overflow. The CPU most definitely defines these operations as truncating toward zero.

A bug was filed on this exact issue on the VS2005 beta 1 compiler:
http://lab.msdn.microsoft.com/productfeedb...e0-9d772e467744

It has to do with the optimization of a signed / and % pair with a constant power-of-two divisor. Another reason to prefer & and >>/<< when possible.
 
squid_80
Posted: May 7 2005, 04:12 AM


Advanced Member


Group: Members
Posts: 594
Member No.: 13813
Joined: 22-January 05



QUOTE (phaeron @ May 7 2005, 01:44 PM)
I think it's guaranteed that (x/y)*y + (x%y) == x, given y!=0 and no overflow. The CPU most definitely defines these operations as truncating toward zero.

According to AMD's docs, the SAR instruction doesn't truncate to zero for negative dividends:
QUOTE
Although the SAR instruction effectively divides the operand by a power of 2, the behaviour is different from the IDIV instruction. For example, shifting -11 (FFFFFFF5h) by two bits to the right (that is, divide -11 by 4), gives a result of FFFFFFFDh, or -3, whereas the IDIV instruction for dividing -11 by 4 gives the result of -2. This is because the IDIV instruction rounds off the quotient to zero, whereas the SAR instruction rounds off the remainder to zero for positive dividends and to negative infinity for negative dividends. So, for positive operands, SAR behaves like the corresponding IDIV instruction. For negative operands, it gives the same result if and only if all the shifted-out bits are zeroes; otherwise, the result is smaller by 1.


I did try using (yref&~3)>>2, but that didn't work either. Probably got optimized away. In the end I had to use the clumsy-looking (yref-yref%4)>>2.
 
phaeron
Posted: May 7 2005, 05:24 AM


Virtualdub Developer


Group: Administrator
Posts: 7773
Member No.: 61
Joined: 30-July 02



I meant it defines % and / as such in IDIV. SAR, of course, always rounds to negative infinity by virtue of being an arithmetic right shift. The compiler normally uses SAR to emulate a signed divide by a power of two by applying a correction if the number is negative; it just goofed it up in this case. You can do such a correction yourself, too:
CODE

x/(1<<n) == ((x + ((x>>31) & ((1<<n)-1))) >> n)

 
squid_80
Posted: May 7 2005, 06:39 AM


Advanced Member


Group: Members
Posts: 594
Member No.: 13813
Joined: 22-January 05



Ah, that's why (yref&~3)>>2 didn't work - forgot to take the sign into account. I was close though!

EDIT: Nope, I'm just retarded. Ignore this post.
 
squid_80
Posted: May 8 2005, 05:35 AM


Advanced Member


Group: Members
Posts: 594
Member No.: 13813
Joined: 22-January 05



OK, now I really do understand it. 2 reasons why I was confused:
- I hadn't slept in 30 hours. 2's complement arithmetic doesn't seem to work well for me in this state.
- The original code that was causing the bug in xvid looked like this:
CODE
y_int = yRef/4;

if (yRef < 0 && yRef % 4)
    y_int--;

I knew the division was producing a rounded result for negative numbers, so the if/decrement wasn't needed. But when I removed it the value for y_int was still incorrect: with the mod operation removed, the quotient was being truncated like it should be. Exactly like the bug report you described - the compiler smooshes the / and % together and gets it wrong. I just didn't grasp it at the time.
Solution was to change yRef/4 to yRef>>2 and drop the if/decrement.
 
phaeron
Posted: May 8 2005, 08:17 PM


Virtualdub Developer


Group: Administrator
Posts: 7773
Member No.: 61
Joined: 30-July 02



That's a really obfuscated (and slow) way of rounding toward negative infinity for what looks like a motion prediction vector. You might want to scan the XviD codebase for any other misuses of division and modulus!
 
squid_80
Posted: May 8 2005, 10:37 PM


Advanced Member


Group: Members
Posts: 594
Member No.: 13813
Joined: 22-January 05



Spot on: a qpel motion vector, to be exact. The same section of code is used in all 5 qpel interpolation functions. If there are any other instances like this they're far enough apart to not trigger the compiler bug, but I guess any possible speed increases are worth looking for.
 