squid_80
Posted: May 6 2005, 12:11 PM
Advanced Member
Group: Members
Posts: 594
Member No.: 13813
Joined: 22-January 05

After spending many hours trying to get xvid64 to work with the new PSDK, I found out that integer division was causing problems. I made a small program to demonstrate:
```c
#include <stdio.h>
#include <tchar.h>   /* _tmain / _TCHAR (MSVC) */

int _tmain(int argc, _TCHAR* argv[])
{
    int yref;
    scanf("%d", &yref);
    printf("yref = %d, yref/4 = %d, yref%%4 = %d\n", yref, yref / 4, yref % 4);
    return 0;
}
```
If I use -7 as input, it gives the following: yref = -7, yref/4 = -2, yref%4 = -3
The problems I was having are caused by code that assumes -7/4 = -1, not -2. I know the AMD64 compiler is using the SAR instruction, hence the result rounded toward negative infinity. Is this correct behaviour for a C compiler? It seems strange that it would produce different results from x86 code.
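For reference, a minimal sketch of the two rounding conventions in play. C99 (section 6.5.5) requires `/` to discard the fractional part, i.e. truncate toward zero, so -7/4 is -1 and -7%4 is -3. A floor-dividing helper rounds toward negative infinity and gives -2 instead. Both helper names are hypothetical, for illustration only; the sketch assumes y != 0 and no overflow (e.g. not INT_MIN / -1).

```c
/* Truncating division: what C99 mandates for / (round toward zero). */
int trunc_div(int x, int y)
{
    return x / y;
}

/* Floor division (round toward negative infinity), built on top of the
 * truncating operator. Hypothetical helper; assumes y != 0 and no overflow. */
int floor_div(int x, int y)
{
    int q = x / y;
    /* Step down when there is a remainder and the signs of x and y differ. */
    if ((x % y != 0) && ((x < 0) != (y < 0)))
        q--;
    return q;
}
```

With these, `trunc_div(-7, 4)` is -1 and `floor_div(-7, 4)` is -2; for non-negative operands the two agree.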

fccHandler
Posted: May 6 2005, 03:07 PM
Administrator n00b
Group: Moderators
Posts: 3961
Member No.: 280
Joined: 13-September 02

MSVC6 gives me yref/4 = -1. Hmmm...
-------------------- May the FOURCC be with you...

squid_80
Posted: May 6 2005, 10:27 PM

Hmph. I dug out my C textbook just to check, and it says this: "The direction of truncation for / and the sign of the result for % are machine-dependent for negative operands, as is the action taken on overflow or underflow." So I guess you could argue that there's nothing wrong with the way it's behaving. But it still seems wrong that 4*(yref/4) + yref%4 is not always equal to yref: with the output above, 4*(-2) + (-3) = -11, not -7.
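A small sketch of why the output is wrong even under the textbook's wording. C89 left the rounding direction of `/` implementation-defined, but it still required `(a/b)*b + a%b == a`, so `%` must round consistently with `/`. For -7 and 4, the truncating pair (-1, -3) and the flooring pair (-2, 1) both satisfy that identity; the pair reported above, (-2, -3), satisfies neither. The helper names are hypothetical, and the code assumes no overflow.

```c
#include <stdbool.h>

/* True if (q, r) is a consistent quotient/remainder pair for x / y,
 * i.e. q*y + r == x. Hypothetical helper; assumes no overflow. */
bool identity_holds(int x, int y, int q, int r)
{
    return q * y + r == x;
}

/* Checks the identity for the compiler's own / and % with a constant
 * divisor of 4 (the case the thread's compiler bug involves). */
bool check_div4_range(int lo, int hi)
{
    for (int x = lo; x <= hi; x++)
        if ((x / 4) * 4 + x % 4 != x)
            return false;
    return true;
}
```

On a correctly behaving compiler, `check_div4_range(-100, 100)` returns true.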

phaeron
Posted: May 7 2005, 03:44 AM
Virtualdub Developer
Group: Administrator
Posts: 7773
Member No.: 61
Joined: 30-July 02

I think it's guaranteed that (x/y)*y + (x%y) == x, given y != 0 and no overflow. The CPU most definitely defines these operations as truncating toward zero.
A bug was filed on this exact issue against the VS2005 beta 1 compiler: http://lab.msdn.microsoft.com/productfeedb...e0-9d772e467744
It has to do with the optimization of a signed / and % pair with a constant power-of-two divisor. Another reason to prefer & and >>/<< when possible.
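A sketch of the shift-and-mask pair for a divisor of 4. It assumes a two's-complement `int` and an arithmetic `>>` on negative values; that shift behaviour is implementation-defined in C, though MSVC and GCC both implement it as an arithmetic shift. Note that this pair rounds toward negative infinity, so it is not a drop-in replacement for `/` and `%`: for -7 it gives -2 and 1 rather than -1 and -3. It is, however, always self-consistent. The function names are hypothetical.

```c
/* Floor quotient and non-negative remainder for a divisor of 4, using
 * shift and mask instead of / and %. Assumes two's-complement int and an
 * arithmetic right shift for negative values (implementation-defined in C). */
int shr_div4(int x) { return x >> 2; }  /* rounds toward negative infinity */
int and_mod4(int x) { return x & 3; }   /* always in 0..3 */
```

For any x, `shr_div4(x) * 4 + and_mod4(x) == x` under those assumptions.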

squid_80
Posted: May 7 2005, 04:12 AM

QUOTE (phaeron @ May 7 2005, 01:44 PM): I think it's guaranteed that (x/y)*y + (x%y) == x, given y!=0 and no overflow. The CPU most definitely defines these operations as truncating toward zero.

According to AMD's docs, the SAR instruction doesn't truncate toward zero for negative dividends:
QUOTE: Although the SAR instruction effectively divides the operand by a power of 2, the behaviour is different from the IDIV instruction. For example, shifting -11 (FFFFFFF5h) by two bits to the right (that is, divide -11 by 4), gives a result of FFFFFFFDh, or -3, whereas the IDIV instruction for dividing -11 by 4 gives the result of -2. This is because the IDIV instruction rounds off the quotient to zero, whereas the SAR instruction rounds off the remainder to zero for positive dividends and to negative infinity for negative dividends. So, for positive operands, SAR behaves like the corresponding IDIV instruction. For negative operands, it gives the same result if and only if all the shifted-out bits are zeroes; otherwise, the result is smaller by 1.
I did try using (yref&~3)>>2, but that didn't work either. Probably got optimized away. In the end I had to use the clumsy-looking (yref-yref%4)>>2.
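A sketch comparing the two expressions from this post, again assuming two's-complement `int` and an arithmetic `>>`. Clearing the low two bits before shifting changes nothing, because those bits are shifted out anyway, so `(x & ~3) >> 2` gives the same floor result as a bare SAR (-3 for -11, matching the AMD example). Subtracting the remainder first moves x to the multiple of 4 next to it on the zero side, so the following shift is exact and yields the truncated quotient (-2 for -11). Function names are hypothetical.

```c
/* Same as x >> 2: the cleared bits would be shifted out anyway,
 * so this still rounds toward negative infinity. */
int mask_then_shift(int x) { return (x & ~3) >> 2; }

/* With C99 %, x % 4 has the sign of x, so x - x % 4 is the multiple of 4
 * between 0 and x nearest to x; the shift is then exact, giving trunc(x / 4). */
int sub_rem_then_shift(int x) { return (x - x % 4) >> 2; }
```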

phaeron
Posted: May 7 2005, 05:24 AM

I meant the CPU defines / and % that way (truncating toward zero) in IDIV. SAR, of course, always rounds toward negative infinity by virtue of being an arithmetic right shift. The compiler normally uses SAR to emulate a signed divide by a power of two, applying a correction if the number is negative; it just goofed it up in this case. You can do such a correction yourself, too:
```c
/* Assumes 32-bit int and arithmetic >>: x>>31 is -1 for negative x, else 0 */
x/(1<<n) == (x + ((x>>31) & ((1<<n)-1)) >> n)
```
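The correction expression can be wrapped in a function and checked against the compiler's own `/`. This sketch assumes a 32-bit two's-complement integer and an arithmetic `>>`, as the `x>>31` term already does; the name `div_pow2` is hypothetical. For negative x, the mask adds 2^n - 1 before shifting, which turns SAR's rounding toward negative infinity into rounding toward zero.

```c
#include <stdint.h>

/* Signed x / 2^n rounded toward zero, for 0 <= n <= 30.
 * x >> 31 is all ones for negative x and zero otherwise, so the bias
 * (2^n - 1) is only added to negative dividends. */
int32_t div_pow2(int32_t x, int n)
{
    int32_t bias = (x >> 31) & ((INT32_C(1) << n) - 1);
    return (x + bias) >> n;
}
```

For example, `div_pow2(-7, 2)` adds a bias of 3 to get -4, then shifts to -1, the same as -7/4.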

squid_80
Posted: May 7 2005, 06:39 AM

Ah, that's why (yref&~3)>>2 didn't work: I forgot to take the sign into account. I was close, though!
EDIT: Nope, I got that wrong. Ignore this post.

squid_80
Posted: May 8 2005, 05:35 AM

OK, now I really do understand it. Two reasons why I was confused:
- I hadn't slept in 30 hours. Two's complement arithmetic doesn't seem to work well for me in this state.
- The original code that was causing the bug in XviD looked like this:
```c
y_int = yRef/4;                     /* truncates toward zero */
if (yRef < 0 && yRef % 4) y_int--;  /* step down to the floor for negative non-multiples of 4 */
```
I knew the division was producing a floored result for negative numbers, so I figured the if/decrement wasn't needed. But when I removed it, the value of y_int was still incorrect: with the % operation removed, the quotient was truncated toward zero like it should be. That's exactly the bug report you described: the compiler smooshes the / and % together and gets it wrong. I just didn't grasp it at the time. The solution was to change yRef/4 to yRef>>2 and drop the if/decrement.
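A quick sketch confirming that the original fragment and its replacement compute the same value, floor(yRef / 4), on a compiler that handles the `/` and `%` pair correctly. It assumes an arithmetic `>>` for negative values; the function names are hypothetical wrappers around the two versions quoted above.

```c
/* Original XviD-style code: truncating divide, then step down for
 * negative dividends that are not multiples of 4. */
int floor_div4_original(int yRef)
{
    int y_int = yRef / 4;
    if (yRef < 0 && yRef % 4)
        y_int--;
    return y_int;
}

/* Replacement: an arithmetic shift already rounds toward negative infinity. */
int floor_div4_shift(int yRef)
{
    return yRef >> 2;
}
```

Besides being shorter, the shift version has no `/` and `%` pair for the optimizer to fuse.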

phaeron
Posted: May 8 2005, 08:17 PM

That's a really obfuscated (and slow) way of rounding toward negative infinity for what looks like a motion prediction vector. You might want to scan the XviD codebase for any other misuses of division and modulus!

squid_80
Posted: May 8 2005, 10:37 PM

Spot on: a qpel motion vector, to be exact. The same section of code is used in all 5 qpel interpolation functions. If there are any other instances like this, they're far enough apart not to trigger the compiler bug, but I guess any possible speed increases are worth looking for.
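For context, a hedged sketch of why a quarter-pel vector wants floor semantics: the component is stored in quarter-pixel units and is typically split into a whole-pixel offset (rounded toward negative infinity) and a fractional phase in 0..3 that selects the interpolation case. The struct and function names below are hypothetical, not XviD's API, and the code again assumes an arithmetic `>>`.

```c
/* Hypothetical split of one quarter-pel motion vector component. */
typedef struct {
    int full;  /* whole-pixel offset, floor(mv / 4) */
    int frac;  /* quarter-pixel phase, always 0..3 */
} qpel_split;

qpel_split split_qpel(int mv)
{
    qpel_split s;
    s.full = mv >> 2;  /* arithmetic shift: rounds toward negative infinity */
    s.frac = mv & 3;   /* low two bits, equal to mv - 4 * s.full */
    return s;
}
```

For mv = -7 this gives an offset of -2 pixels plus a phase of 1 quarter pixel, and 4*(-2) + 1 = -7.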