Optimizing VRAM access

After discussing cycle pattern registers with VBT, I was wondering about the undocumented bits listed in the VDP2 manual, which are hidden (covered over) when the page is rendered:

$18000C - (Labeled as "RESERVED")

Bit 8 (VRAMCE)

0= No change function, VRAM-A and VRAM-B are display RAM

1= Change function, either VRAM-A or VRAM-B as display RAM

Bit 0 (VRAMSL)

0= Use VRAM-A for CPU RAM

1= Use VRAM-B for CPU RAM
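A minimal sketch of how these two bits could be set from C, assuming the register sits at 0x25F8000C (the $18000C offset taken relative to the VDP2 VRAM base 0x25E00000) and that the remaining bits of this "reserved" register can safely be written as zero. The bit positions come from the manual excerpt above; `vram_change_value` and `vram_change_apply` are hypothetical names, and none of this has been tested on hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED address of the undocumented register ($18000C from the VDP2 VRAM
 * base 0x25E00000). Other bits of this register are assumed to be zero. */
#define VDP2_REG_RESERVED_0C ((volatile uint16_t *)0x25F8000Cu)

#define VRAMCE_BIT (1u << 8) /* 1 = change function: one bank stops being display RAM */
#define VRAMSL_BIT (1u << 0) /* 0 = VRAM-A for CPU, 1 = VRAM-B for CPU */

/* Compose the register value from the two flags described in the manual. */
uint16_t vram_change_value(bool change_enable, bool cpu_uses_b)
{
    uint16_t v = 0;
    if (change_enable)
        v |= VRAMCE_BIT;
    if (cpu_uses_b)
        v |= VRAMSL_BIT;
    return v;
}

/* Write the value to the hardware register (only meaningful on a real Saturn). */
void vram_change_apply(bool change_enable, bool cpu_uses_b)
{
    *VDP2_REG_RESERVED_0C = vram_change_value(change_enable, cpu_uses_b);
}
```

Going by the manual's wording, `vram_change_apply(true, true)` would presumably leave VRAM-A as display RAM and hand VRAM-B to the CPU.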

I can't find the webpage that had schematics for the Saturn, but assuming the VRAMs are two separate chips (A, B), it sounds like you can 'disconnect' one from the display refresh logic and give the CPU unlimited access to it, without VDP2 having priority when it comes to sharing cycles.

Has anyone tried this and checked what kind of performance gains you can get?

I'm also interested in what happens if you set as many cycle pattern registers as possible to $EEEE (depending on the bank allocation) and try SCU DMA. I think "CPU access" refers to any of SH-2 DMA, SCU DMA, or SH-2 reads/writes. How closely does that speed match VRAM access time during V-Blank?
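The $EEEE experiment could be set up roughly as follows. This sketch assumes the eight cycle pattern registers (CYCA0L through CYCB1U) are consecutive 16-bit registers starting at 0x25F80010, that access command 0xE means CPU read/write and 0xF means no access, and that the first timing slot (T0) occupies the top nibble. Banks are numbered 0 to 3 for A0, A1, B0, B1; only banks named in the mask are overwritten, so banks still holding display data keep whatever pattern the caller already placed in the shadow copy. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CYC_CPU_RW    0xEu /* access command: CPU read/write */
#define CYC_NO_ACCESS 0xFu /* access command: no access */

/* Pack four timing-slot commands into one 16-bit cycle pattern register,
 * first slot in the top nibble (assumed layout). */
uint16_t cyc_pack(uint8_t t0, uint8_t t1, uint8_t t2, uint8_t t3)
{
    assert(t0 <= 0xF && t1 <= 0xF && t2 <= 0xF && t3 <= 0xF);
    return (uint16_t)((t0 << 12) | (t1 << 8) | (t2 << 4) | t3);
}

/* Shadow layout: [0]=CYCA0L [1]=CYCA0U [2]=CYCA1L [3]=CYCA1U
 *                [4]=CYCB0L [5]=CYCB0U [6]=CYCB1L [7]=CYCB1U
 * Bit n of bank_mask selects bank n (0=A0, 1=A1, 2=B0, 3=B1). */
void cyc_give_banks_to_cpu(uint16_t shadow[8], unsigned bank_mask)
{
    uint16_t all_cpu = cyc_pack(CYC_CPU_RW, CYC_CPU_RW, CYC_CPU_RW, CYC_CPU_RW);
    for (unsigned bank = 0; bank < 4; bank++) {
        if (bank_mask & (1u << bank)) {
            shadow[bank * 2] = all_cpu;     /* T0..T3 */
            shadow[bank * 2 + 1] = all_cpu; /* T4..T7 */
        }
    }
}

/* Copy the shadow values to the (assumed) hardware registers. */
void cyc_write_hw(const uint16_t shadow[8])
{
    volatile uint16_t *reg = (volatile uint16_t *)0x25F80010u; /* assumed CYCA0L */
    for (int i = 0; i < 8; i++)
        reg[i] = shadow[i];
}
```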
 
The Saturn schematic site can be found here (keep in mind that the so-labeled "SCU" is actually VDP1 and vice-versa); link courtesy of The Rockin'-B :D

Unfortunately I'm not in a position to test right now, but I have to agree with this just based on the fact that there's no real "CPU access" as such:

cgfm2 said:
I think "CPU access" refers to any of SH-2 DMA, SCU DMA, or SH-2 reads/writes.
 
Did you finally optimize the VRAM access of your entry to make it faster and more impressive?
 
cgfm2 said:
but assuming the VRAMs are two separate chips (A, B), it sounds like you can 'disconnect' one from the display refresh logic and give the CPU unlimited access to it, without VDP2 having priority when it comes to sharing cycles.

I recall it is similar to this with the VDP1 double frame buffer: one is displayed, the other one can be accessed by the CPU. BTW: CPU access to VDP1 VRAM while VDP1 is drawing causes VDP1 to pause its operation.

cgfm2 said:
Has anyone tried this and checked what kind of performance gains you can get?

Well, I've been trying to investigate this topic, too. Unfortunately, the results are not as clear as I'd like them to be. I made a little "benchmark" program to measure RAM access time for different memory regions, different word sizes, reads/writes and so on. Oh, you can set different cycle patterns for the CPU, too.

I don't know the reason for this (the method of measuring time, or the benchmark functions themselves?).
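For anyone wanting to reproduce this kind of measurement, here is a rough sketch of a timed read pass. It is not the code of the benchmark program; it assumes the SH-2 free running timer counter bytes (FRCH/FRCL) sit at 0xFFFFFE12/0xFFFFFE13, that the high byte must be read first so the low byte is latched from the same instant, and that one pass is short enough for the 16-bit counter to wrap at most once. The tick rate depends on the clock divider selected in the FRT control register, so results are only comparable between runs using the same setting. Reads go through a `volatile` pointer so every load is actually issued. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED addresses of the SH-2 on-chip FRT counter bytes. */
#define FRT_FRCH ((volatile uint8_t *)0xFFFFFE12u)
#define FRT_FRCL ((volatile uint8_t *)0xFFFFFE13u)

/* Read the 16-bit counter, high byte first. */
uint16_t frt_read(void)
{
    uint8_t hi = *FRT_FRCH;
    uint8_t lo = *FRT_FRCL;
    return (uint16_t)((hi << 8) | lo);
}

/* Ticks between two samples; unsigned 16-bit subtraction absorbs one wrap. */
uint16_t frt_elapsed(uint16_t start, uint16_t end)
{
    return (uint16_t)(end - start);
}

/* Read `count` 16-bit words, advancing `stride` words each time. The
 * volatile qualifier stops the compiler from merging or dropping loads;
 * the sum is returned so the result is observably used. */
uint32_t bench_read16(const volatile uint16_t *base, size_t count, size_t stride)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += base[i * stride];
    return sum;
}

/* Time one read pass over a region (hardware only). */
uint16_t bench_time_read16(const volatile uint16_t *base, size_t count,
                           size_t stride, uint32_t *sum_out)
{
    uint16_t t0 = frt_read();
    *sum_out = bench_read16(base, count, stride);
    return frt_elapsed(t0, frt_read());
}
```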

From my website http://www.rockin-b.de/saturn-benchmarkram.html:

This demo uses the free running timer (FRT) of the CPU to measure memory transfer time. You can run it directly with the predefined benchmarks, however you will most likely want to experiment and add new benchmark settings and recompile.

Though the feature list looks nice, the benchmark results are in some cases not really useful yet. But after I declared some variables as volatile, the results are much closer to what I expected and give a feel for which areas are fast and how big the difference between byte and word access is.
Features:
  • read, write access
  • byte, word, long word access
  • cache through
  • high work RAM, low work RAM, VDP1 VRAM, VDP2 VRAM, sound RAM (read only), cart RAM (auto-recognizes 1MB and 4MB carts)
  • any address increment
  • different time output formats
  • "unlimited" number of different benchmarks
  • menu to select all options
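The "cache through" option can be illustrated with the SH-2 address map: the same memory is visible through a cached area and a cache-through area that differ only in address bits 31 to 29. The base addresses below are the commonly cited ones for the listed regions and should be treated as assumptions; `saturn_regions` and `cache_through` are hypothetical helpers, and cart RAM is left out because its mapping depends on the cartridge.

```c
#include <assert.h>
#include <stdint.h>

struct mem_region {
    const char *name;
    uint32_t base; /* cached-area address (assumed) */
    uint32_t size; /* bytes */
};

const struct mem_region saturn_regions[] = {
    { "low work RAM",  0x00200000u, 0x100000u },
    { "high work RAM", 0x06000000u, 0x100000u },
    { "VDP1 VRAM",     0x05C00000u, 0x80000u  },
    { "VDP2 VRAM",     0x05E00000u, 0x80000u  },
    { "sound RAM",     0x05A00000u, 0x80000u  },
};

/* Address bits 31..29 = 001 select the SH-2 cache-through area, so clearing
 * those bits and setting 0x20000000 addresses the same memory uncached. */
uint32_t cache_through(uint32_t addr)
{
    return (addr & 0x1FFFFFFFu) | 0x20000000u;
}
```

Benchmarking both the cached and cache-through address of the same buffer shows how much of a measured speed difference comes from the cache rather than the memory itself.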