Division in CDFJ

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

Andrew Davie

Technically, division is not possible in the ARMv4T architecture as used in CDFJ bank-switching, and for the longest time I used lookup tables for simple divisions, or repeated subtract loops. But recently I've realised that it's actually possible to do reasonably accurate, extremely quick division simply by multiplying by the reciprocal.

Say you want to divide by 7.  Now if you wrote "x = y/7" you'd have a big fail.  You'd ALSO have a big fail if you wrote "x = y * 1/7", but for a different reason. The GCC compiler will replace 1/7 with a compile-time-calculated value which will effectively be 0 - as we don't have floating point either (at least, I work in integer arithmetic).  But what we CAN do is "x = y * (0x100/7)" -- and that will result in "x = y * 36" being compiled. Well, that's hardly dividing by 7, right?  It's multiplying by 36.  Yes, but if you look closely... the result will be 256 times bigger than we want (because of the 0x100 in the statement).  So if we then shift right by 8 bits, we'll have the answer.  So the full statement would be "x = (y * (0x100/7) >> 8" and what does that give us?  Yep x/7.  Let's work an example.

y = 12345678
so, x = (12345678 * (256/7)) >> 8
 = (12345678 * 36) / 256
 = 1736110

... which, when multiplied by 7 is definitely NOT 12345678.
It's 12152770. The correct answer should be 1763668.285714285714286... (or 1763668 integer).

SO at first glance it's completely wrong. That's because of our inaccuracy in representing 256/7 as 36, when in fact it's 36.57...

Now there's a bunch of round-offs and truncations that happen, so I twiddle the actual value of (256/7) in several ways to make sure my specific use-case works. I might use "(y * (0x10000/7) >> 16" for extra precision, for example. Since that comes to 9362.28... I might round that to 9363. Depends on circumstances/tests. I could use 37, which would give the result 1784336.

In actual use I tend to use the 0x10000 multiplier, so for the given example, we'd have...

x = 12345678 * ((0x10000/7)) >> 16
 = (12345678 * 9362) >> 16
 = 1763614

(remembering the acutal value 1763668.28...) this is certainly good enough for most uses.
Using 9363 would give 1763802 so it's a bit either-or in terms of which is best.


So there we go - a handy tip for dividing by a known constant value in ARM.
It saves heaps of time and space in Boulder Dash, and I use it a lot.


Al_Nafuur

Does this also workes in 6502 ASM?

Andrew Davie

Quote from: Al_Nafuur on 17 Apr 2024, 11:29 AMDoes this also workes in 6502 ASM?

Well, yes in principle.  To divide by a constant, multiply by (0x100/constant) and then shift down 8 bits. Limited bits though. Or you could extend it to 8 or more bits. But in 6502 you don't have hardware multiply so you might as well do a SOFTWARE division because you'll be doing lots of calculations to multiply anyway.

Al_Nafuur

Quote from: Andrew Davie on 17 Apr 2024, 11:38 AM
Quote from: Al_Nafuur on 17 Apr 2024, 11:29 AMDoes this also workes in 6502 ASM?

Well, yes in principle.  To divide by a constant, multiply by (0x100/constant) and then shift down 8 bits. Limited bits though. Or you could extend it to 8 or more bits. But in 6502 you don't have hardware multiply so you might as well do a SOFTWARE division because you'll be doing lots of calculations to multiply anyway.
Yes, I completely overlooked that the 6502 has no general multiplication, so it makes no sense to simulate one by the other.

Thomas Jentzsch

Multiplications can be done much faster than divisions when done in software. So this approach is good for 6507 too.

Fast multiplications can be done using square tables:

Example

This page compares multiple algorithms.