time to move to fixed point

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

Andrew Davie

Now that the 3D engine is approaching "maturity" and I have a pretty good idea where the bottleneck is, it's time to bite the bullet. I've used floating point calculations so far and it's now time to move to fixed-point instead. This will take me some time, but I expect the results to be dramatic. The stm32g070 uses software libraries for floating point operations (!!) and does not have a floating point unit (FPU). Imagine how much processing power is sunk into the 3D calculations - no wonder it's slow.  Clearly the bottleneck, and I predict a... let's say 10x speedup in the pipeline when I'm done. That would mean I could, say, run 100 space ships at 60Hz. Well, that's the dream. Watch this space, I guess.


Andrew Davie

Mmh. Disappointing, to say the least.

#define FPINT 0

#if FPINT
typedef int fpint;

#define Q 8
#define FIXED_ONE (1 << Q)
#define FLOAT_TO_FIXED(x) ((int)((x) * FIXED_ONE))
#define FIXED_TO_FLOAT(x) ((float)(x) / FIXED_ONE)
#define FPINT_ADD(a, b) ((a) + (b))
#define FPINT_SUB(a, b) ((a) - (b))
#define FPINT_MUL(a, b) ((int)(((long long)(a) * (b)) >> Q))
//DOESNOTWORK #define FPINT_DIV(a, b) ((int)((((long long)(a)) << Q) / (b)))

#else

typedef float fpint;

#define FLOAT_TO_FIXED(x) ((x))

#define FPINT_ADD(a, b) ((a) + (b))
#define FPINT_SUB(a, b) ((a) - (b))
#define FPINT_MUL(a, b) ((a) * (b))
//#define FPINT_DIV(a, b) ((a) / (b))

#endif

#define N(a) FLOAT_TO_FIXED(a)

I hit upon a "neat idea" as to how to go about converting from floating point calculations to fixed-point. I'd make macros for all of the math operations and for the type (float/int) and start with my working floating-point version and replace all the math with the macros. That is, where in floating point I had

float x = a * b;
I would replace this with

fpint x = FPINT_MUL(a, b);
The neat thing about this, so I thought, as that I don't have to add new lines, or change things (other than using the macro). If FPINT is defined as 1, then we get the fixed-point implementation, and if it's 0 we get the original floating point version.  After a fair bit of debugging I got all of this working pretty much perfectly - I could switch between floating point implementation or fixed point implementation just by changing "FPINT" to 1 or 0. And they both work!

The disappointing thing is there's no speedup noticeable. This is likely because of a couple of factors. Firstly, there aren't actually all that many floating point operations in the pipeline after all. But secondly, mostly, my architecture which tries to do things in small parts so that it all fits into the overblank -- is probably not efficient.  So, negligible speedup.

I did find one bug in *somewhere*.  The FPINT_DIV does not work for the fixed-point code. There's something fundamentally broken because Gopher will not even load/run a binary with this macro used *anywhere* in the code; even when the code is unreachable. Very strange, but a lot of testing indicated to me that the (long)(long) division just doesn't work. So, I left one single floating point divide in the fixed point implementation.

Anywho... it was a road worth travelling. Fortunately I did all this in a branch so I can switch back and forth easily enough - but for now I'll go back to the earlier code (without fixed point) and soldier on.  I'm not shattered, but I am surprised and a bit disappointed.


DirtyHairy

If you send me the binary using FPRINT_DIV I can likely tell you what's wrong with it.

JetSetIlly

Quote from: Andrew Davie on 01 Aug 2024, 03:13 AMI did find one bug in *somewhere*.  The FPINT_DIV does not work for the fixed-point code. There's something fundamentally broken because Gopher will not even load/run a binary with this macro used *anywhere* in the code; even when the code is unreachable.
What does the logging output for Gopher2600 say?
https://github.com/JetSetIlly/Gopher2600
@JetSetIlly@mastodon.gamedev.place
@jetsetilly.bsky.social

Andrew Davie

Quote from: JetSetIlly on 10 Aug 2024, 01:58 AM
Quote from: Andrew Davie on 01 Aug 2024, 03:13 AMI did find one bug in *somewhere*.  The FPINT_DIV does not work for the fixed-point code. There's something fundamentally broken because Gopher will not even load/run a binary with this macro used *anywhere* in the code; even when the code is unreachable.
What does the logging output for Gopher2600 say?

I've just come back to this, and tracked down the issue. It's a divide by zero.
Gopher says

panic: undefined 16-bit thumb-2 instruction (deff)

Essentially the macro/code is...

#define FPINT_DIV(a, b) ((int)((((long long)(a)) << Q) / (b)))

    debug[0] = FPINT_DIV(512/0); // this will crash Gopher
    debug[1] = FPINT_DIV(512,1); // this is OK

Really the FPINT_DIV macro is nothing to do with it.  x = 1/0; will do the trick.  Me bad, I guess. I really don't know what should happen if we have a divide by zero. Perhaps a nicer error message but other than that this is a user error.


Andrew Davie

#5
Quote from: DirtyHairy on 09 Aug 2024, 07:04 AMIf you send me the binary using FPRINT_DIV I can likely tell you what's wrong with it.

Many thanks for the offer, and sorry for the delay in response. A bit of down-in-the-dumps the past few weeks for me, I'm afraid. Haven't had the motivation.  Problem is now found - see my post (just now) responding to JetSetIlly.. See follow-up below.

Andrew Davie

OK, not as simple as I thought. Yes, a divide by 0 will cause the problem. But there are other cases.
Like this one. In the following code...

                fpint temp = (p1->x - p0->x) * ymax - p0->y;
                fpint temp2 = (p1->y - p0->y);

                if (!temp2)
                    x = 0;
                else
                    x = p0->x + FPINT_DIV(temp, temp2);

This is the ONLY usage of FPINT_DIV in the entire program.
With the FPINT_DIV there, Gopher will not run - here's the output.

➜  Gopher2600 git:(maintenance_work) ✗ ./gopher2600_darwin_arm64 ../sentinel/source/bin/sentinel.elf
* error in RUN mode: debugger: preview: cpu: starting a new instruction is invalid mid-instruction
➜  Gopher2600 git:(maintenance_work) ✗

That's binary #1. Note I have explicit trap for zero so there's no divide-by-zero happening. Could be an overflow I suppose.

If I replace the FPINT_DIV(temp, temp2) with "temp/temp2" all works OK. That's binary #2.
Note, I did check that "if (!temp2)" and tried "if (temp2 == 0)" just to be sure. No change.

The code is on the fixed-int branch, and pushed.
This problem code is in graphics.cpp around line #368


JetSetIlly

#7
Quote from: Andrew Davie on 11 Aug 2024, 01:43 AM➜  Gopher2600 git:(maintenance_work) ✗ ./gopher2600_darwin_arm64 ../sentinel/source/bin/sentinel.elf
* error in RUN mode: debugger: preview: cpu: starting a new instruction is invalid mid-instruction
➜  Gopher2600 git:(maintenance_work) ✗


This is an unhelpful error message from Gopher2600. What's happened here is that the cartridge loading has failed but the emulation is continuing and trying to execute nothing.

So, in other words, it's nothing to do with the code.


The root problem is actually a relocation type for the ARM architecture that I'm not currently supporting. For DirtyHairy's benefit, this is an R_ARM_REL32 relocation. I don't know if Stella supports that yet.

I'll read up on it and add it.

* also R_ARM_PREL31
https://github.com/JetSetIlly/Gopher2600
@JetSetIlly@mastodon.gamedev.place
@jetsetilly.bsky.social

JetSetIlly

Quote from: JetSetIlly on 11 Aug 2024, 05:02 AMThe root problem is actually a relocation type for the ARM architecture that I'm not currently supporting. For DirtyHairy's benefit, this is an R_ARM_REL32 relocation. I don't know if Stella supports that yet.

I'll read up on it and add it.

* also R_ARM_PREL31

I've added support for REL32. As it happens this relocation is for the __aeabi_ldiv0 function, which is a divide-by-zero handler. So that might be why it was crashing - jumping to illegal memory.

Not sure what the PREL31 is for in this case.

In "maintenance_work" branch
https://github.com/JetSetIlly/Gopher2600
@JetSetIlly@mastodon.gamedev.place
@jetsetilly.bsky.social

DirtyHairy

Stella already supports R_ARM_REL32, and it the binary works fine there.

R_ARM_PREL31 is in the section .ARM.exidx which is not used anywhere. According to the wisdom of the web this section contains metadata which is used when unwinding the stack, i.e. during exception handling. I am not sure what to do with it (it is not PROGBITS, but a special section type), but it is not referenced anywhere in the binary, so there's no reason to solve that riddle right now 😏

JetSetIlly

Quote from: DirtyHairy on 11 Aug 2024, 07:16 AMStella already supports R_ARM_REL32, and it the binary works fine there.

It looks to me like the Stella ELF parser is just ignoring any relocation type it doesn't know about and letting the original value through. The applyRelocationToSection() only switches on R_ARM_ABS32, R_ARM_TARGET1, R_ARM_THMCALL and R_ARM_JUMP24. The default case does nothing.

It works in this instance but what would happen in the case of __aeabi_ldiv0 being called
https://github.com/JetSetIlly/Gopher2600
@JetSetIlly@mastodon.gamedev.place
@jetsetilly.bsky.social

DirtyHairy

Quote from: JetSetIlly on 11 Aug 2024, 04:27 PMIt looks to me like the Stella ELF parser is just ignoring any relocation type it doesn't know about and letting the original value through.

Ouch, you're right. It was late and I didn't check the code --- I just tested the binary (and mixed it up in my head with R_ARM_TARGET1). Indeed, this is a bug, it should throw, and it does throw now.

Adding support for REL32 is trivial enough, but isn't very useful as long as the cartridge driver does not support it. However, as gcc seems to emit it as soon as a division by zero may be possible this makes a lot of sense to me. I'll ping Zack on the Stella ELF github issue.

DirtyHairy

R_ARM_REL32 is now implemented in Stella.