Experimental Cartridge Shim

JetSetIlly

I've been playing around with this idea of attaching a real 2600 cartridge to an emulator on the PC. It was inspired by the real-time Stella project but I wanted to try a different direction.

It's basically the same design as a cartridge dumper. It only accepts address and data bus values from the emulator. Those values are put on the bus for the benefit of the cartridge. The data bus is then read and the value returned back to the PC.
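As a rough illustration of that design, here is a minimal Go sketch of what the emulator side might look like. The `CartridgeBus` interface, the `Cycle` method and the `fakeROM` stand-in are all hypothetical names invented for this example and are not taken from Gopher2600. The stand-in assumes a plain 4k ROM that is selected when address line A12 is high, and it simply echoes the driven data value when the cartridge is not selected.

```go
package main

// CartridgeBus is a hypothetical abstraction of the shim: the emulator
// presents an address and a data value, and gets back whatever is on the
// data bus afterwards.
type CartridgeBus interface {
	Cycle(addr uint16, data byte) byte
}

// fakeROM stands in for a real 4k cartridge so the idea can be tried
// without hardware.
type fakeROM struct {
	rom [4096]byte
}

// Cycle returns the ROM byte when the cartridge is selected (A12 high).
// Otherwise it echoes the value the host drove, which is an assumption
// made for this sketch only.
func (f *fakeROM) Cycle(addr uint16, data byte) byte {
	if addr&0x1000 == 0 {
		return data
	}
	return f.rom[addr&0x0fff]
}
```

A real implementation would replace `fakeROM` with something that talks to the hardware over the serial link, leaving the emulator code unchanged.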

It's also very, very slow due to a massive bottleneck in the serial link. Even so, it's a fun idea and could be improved. I'll probably continue to work on it.

There's obviously lots of other things wrong with it but I had this itch and it needed scratching.


https://github.com/JetSetIlly/Gopher2600
@JetSetIlly@mastodon.gamedev.place
@jetsetilly.bsky.social

alex_79

Very cool seeing the disassembly being populated as the program runs.

Does every memory access appear on the cart reader address and data bus, or only those in cart space?

JetSetIlly

Every address and data bus change on every CPU cycle is presented to the cartridge by the reader. This adds significant overhead for cartridge types that don't need that level of granularity, but the goal is to support every cartridge without knowing anything about it.

The speed is a massive issue and I don't understand it. The USB line should be running a lot faster than that.

MachoDrone

5138008

JetSetIlly

Quote from: alex_79 on 19 Oct 2023, 05:36 AM
Very cool seeing the disassembly being populated as the program runs.

I enjoy watching that too. I've improved the visuals on this so that it's clearer what's happening.


The video shows the disassembly being created instruction-by-instruction and byte-by-byte when the debugging quantum is changed to "Colour Clock". (The emulation is always updated at the colour clock level but changing the debugging quantum means we interrupt the emulation at that frequency and not after every instruction.)

The Bus section of the Pinout window shows the state of the address and data bus at any time. In the case of a real cartridge these are the actual values that the cartridge is "seeing" at that moment.

Quote from: MachoDrone on 19 Oct 2023, 06:13 PM
this will be awesome

Thanks :D But I feel the need to manage expectations here. I honestly doubt this will ever run at close to full speed, and it's unlikely to work with more advanced cartridge types, but it could be a good educational tool I think.

JetSetIlly

Significant increase in speed by reducing the USB latency. Still too slow for anything useful but I'm in a better position than I was a couple of hours ago. And that's all I can hope for.

Andrew Davie

I'm following this and get the gist. But I'm not sure exactly what purpose this has. Essentially you're able to talk to a real cart as if you were an actual '2600 (via the emulator). You put addresses and data on the bus, and read data from the bus. So far I follow. You're able to watch in real-time as a program "runs" via the cartridge, and present a really cool "realtime" disassembly. Because you're talking to the cart via the address and data bus, it doesn't matter what is on the cart (some future quantum processor, for example). Your emulator will be able to run it, because all comms is still through the buses. Assuming that in future you can speed things up, this would effectively make your emulator a full emulation of actual hardware able to run any cartridge past or future. Is that reading it correctly?
How does this differ from the realtime Stella project?
It's fascinating, I'm just wrapping my head around what it does and the future.

JetSetIlly

Yes. It's the same idea as the realtime Stella project, but a different approach. I have no real expectation for it but I was curious as to how far I could push the concept using a regular PC and USB communication.

Andrew Davie

So the USB side of things is basically because there are very few options for communication with "the outside world" on a modern PC? If, for example, Gopher was running on a really fast ARM-based single board computer, then the address/data pins could be "hardwired" and you'd have a fully fledged reimplementation of a '2600. The current limitation being the serial comms over USB? I think I understand :)

JetSetIlly

Yes. That's what the realtime Stella project is doing. The Raspberry Pi can communicate with the cartridge like a real bus. In my setup, I have to stream the bus information to the microcontroller, which can then put it on the bus in parallel. The round trip over the serial connection is a significant bottleneck.

Andrew Davie

USB3 is potentially 5 Gbit/s. That seems at first guess to be sufficiently fast. So where's the actual bottleneck?

JetSetIlly

It's the round trip time I think. I'm only sending bursts of 4 bytes maximum but it happens more-or-less on every CPU cycle. In some cases, I can send the data and not wait for the result, but in the case of a read instruction I have to wait for the result before continuing the emulation. During that wait time, the emulation is not doing anything.

So if we just look at a simple CPU load statistic, running the emulator (on my flimsy hardware) in the normal way uses approx 80% of the CPU time available. When using the cartridge hardware it's 10%.
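To make the round-trip argument concrete, here is a small Go sketch of a back-of-the-envelope model. It assumes that some fraction of CPU cycles must block for one full round trip and that emulation time itself is negligible. The function names and the example latency are made up for illustration; no round-trip time has been measured here.

```go
package main

// cpuHz is the approximate 6507 clock rate quoted in this thread.
const cpuHz = 1_190_000.0

// cyclesPerSecond estimates how many emulated CPU cycles fit into one
// second when a fraction of cycles (blockingFraction, between 0 and 1)
// must wait for a full round trip lasting roundTripSeconds. The result
// is capped at real hardware speed.
func cyclesPerSecond(roundTripSeconds, blockingFraction float64) float64 {
	wait := roundTripSeconds * blockingFraction
	if wait <= 0 {
		return cpuHz // nothing blocks, so assume full speed
	}
	rate := 1 / wait
	if rate > cpuHz {
		return cpuHz
	}
	return rate
}

// speedRatio expresses that rate as a fraction of real hardware speed.
func speedRatio(roundTripSeconds, blockingFraction float64) float64 {
	return cyclesPerSecond(roundTripSeconds, blockingFraction) / cpuHz
}
```

With an illustrative round trip of 1 ms on every cycle, `cyclesPerSecond(0.001, 1.0)` comes out at about 1000 cycles per second, which is under 0.1% of real speed. In this model the per-transaction latency sets the ceiling, not the size of each burst.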

Andrew Davie

Not convinced! So I assume across the USB/serial connection you have to send address (12 bits) and data (8 bits), and then you read the data (8 bits) in return. And presumably you have some kind of inherent "wait" in there after setting those two... it would take some time for the cart to respond and place something on the bus. In that process, 5 Gbit/s is not sufficient speed to handle a 6507 accessing the bus at an absolute maximum of 1.19MHz but more likely less than half that? Do you have a hardwired delay in there waiting for data after you send an address, or is it just assumed that the delay of the USB will allow sufficient time for the data bus to stabilise? Or am I misunderstanding the issue? If you have a delay, could you replace that by a constant poll of the data bus, and as soon as it changes (plus a minor delay) you assume that you're good to go, and if it doesn't change after a set time then go anyway.
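The polling idea at the end of that post could look something like the following Go sketch. It is only an outline of the logic: `pollDataBus` and the `read` accessor are hypothetical, the real shim runs on a microcontroller and may well not be written in Go, and the extra settling delay after a change is left out.

```go
package main

// pollDataBus samples the data bus through read (a hypothetical accessor)
// until the value differs from prev or maxPolls samples have been taken.
// It returns the last sampled value and whether a change was seen.
// If no change is seen the caller goes ahead with the last value anyway.
func pollDataBus(read func() byte, prev byte, maxPolls int) (byte, bool) {
	v := prev
	for i := 0; i < maxPolls; i++ {
		v = read()
		if v != prev {
			return v, true
		}
	}
	return v, false
}
```

One weakness of this approach is that a new value identical to the previous one never registers as a change, so those reads always pay the full timeout.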

JetSetIlly

Quote from: Andrew Davie on 21 Oct 2023, 10:32 PM
Not convinced! So I assume across the USB/serial connection you have to send address (12 bits) and data (8 bits), and then you read the data (8 bits) in return. And presumably you have some kind of inherent "wait" in there after setting those two... it would take some time for the cart to respond and place something on the bus. In that process, 5 Gbit/s is not sufficient speed to handle a 6507 accessing the bus at an absolute maximum of 1.19MHz but more likely less than half that?

Latency is definitely a major factor contributing to the slow execution. When I reduce the latency of the device I get significantly more speed. But I take your point about even USB3 being insufficient to service 1.19MHz in real time.

Quote from: Andrew Davie on 21 Oct 2023, 10:32 PM
Do you have a hardwired delay in there waiting for data after you send an address, or is it just assumed that the delay of the USB will allow sufficient time for the data bus to stabilise?

There is no strict timing in the shim yet. There is a small and implicit delay between setting the address bus and reading the data bus that produces accurate results on a simple 4k cartridge.

Here's a simple sequence diagram without timings.

  Host PC                          Shim

  Send data flag >------.
                        -------> Receive data flag

  (Send data) >---------.
                        -------> (Receive data)

  Send Hi Address >---------.
                            .    Set data bus
                            .
  Send Lo Address >-----.   .
                        .   ---> Receive Hi Address
                        .
                        .
                        -------> Receive Lo Address


                                 Set Address bus


                        .------< Send data
  Receive data <--------

The (send data) and (receive data) steps are only performed if the data flag is true. For simple cartridges with no bank switching and no onboard RAM, we never need to set the data bus. This would be an excellent optimisation for those cartridges if there was a reliable way of discerning whether the additional communication was required.

Assuming that the optimisation was valid, it would mean that communication with the shim only needs to happen on a single cycle of a read instruction, in addition to the retrieval of the instruction via the program counter.
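The host side of the sequence diagram above could be framed roughly as in this Go sketch. The byte order follows the diagram (data flag, optional data byte, high address, low address), which also gives the four-byte maximum mentioned earlier in the thread. The flag values and the `encodeCycle` name are assumptions made for the example; the real encoding used by the shim is not described here.

```go
package main

// Hypothetical flag values. The real shim's encoding is not given in
// this thread.
const (
	flagNoData byte = 0x00
	flagData   byte = 0x01
)

// encodeCycle builds the bytes the host would send for one CPU cycle:
// data flag, optional data byte, high address byte, low address byte.
// When driveData is false the data byte is omitted, which is the
// optimisation described for simple cartridges.
func encodeCycle(addr uint16, data byte, driveData bool) []byte {
	addr &= 0x1fff // the 6507 exposes 13 address lines
	msg := make([]byte, 0, 4)
	if driveData {
		msg = append(msg, flagData, data)
	} else {
		msg = append(msg, flagNoData)
	}
	return append(msg, byte(addr>>8), byte(addr))
}
```

For example, `encodeCycle(0x1080, 0xAB, true)` produces four bytes, while the same call with `driveData` set to false produces three.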

Andrew Davie

Quote from: JetSetIlly on 22 Oct 2023, 07:44 PM
But I take your point about even USB3 being insufficient to service 1.19MHz in real time.

No. I was asking a question, not stating that it was not sufficient. I would be surprised if it was not sufficient, but I haven't done the calculations.
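For reference, the raw bandwidth calculation can be sketched in Go as below, using the figures quoted in this thread: 12 address bits and 8 data bits out, 8 data bits back, at 1.19MHz. The constant and function names are made up for the example, and the sum ignores protocol overhead, byte alignment and latency, so it only speaks to nominal throughput.

```go
package main

const (
	addressBits = 12 // address bits sent per cycle, as assumed in the thread
	dataOutBits = 8  // data bits sent per cycle
	dataInBits  = 8  // data bits read back per cycle

	// Nominal USB3 signalling rate quoted in the thread.
	usb3BitsPerSecond = 5_000_000_000
)

// requiredBitsPerSecond returns the raw bit rate needed to service every
// CPU cycle at the given clock rate.
func requiredBitsPerSecond(cpuHz int64) int64 {
	return (addressBits + dataOutBits + dataInBits) * cpuHz
}
```

`requiredBitsPerSecond(1_190_000)` gives 33,320,000 bits per second, which is under 1% of the nominal 5 Gbit/s. On those numbers raw bandwidth is not the limiting factor, which is consistent with round-trip latency being the main problem.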