HELLSLAVE

HELLSLAVE : Project Z-Treme 0.361

OK, I saw the comments in zt_draw_nolink.c :
//CRITICAL!! You MUST change sl_def to include the SPRITE_T structure. It is the same as SPRITE except with a Uint32 pointer, NEXT.
//You must also change SpriteBuf and SpriteBuf2 to be of SPRITE_T * type.
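Here is a minimal sketch of the change those comments describe. It uses `<stdint.h>` types instead of SGL's Uint16/Uint32 so it compiles on its own. The SPRITE layout below assumes SGL's SPRITE mirrors the 32-byte VDP1 command table, so check it against the real sl_def.h before copying anything. Where NEXT sits and what it points to are also my assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the 32-byte VDP1 command table (field order per the VDP1 manual). */
typedef struct {
    uint16_t CTRL;  /* CMDCTRL: command type, jump mode, flip (Dir) bits */
    uint16_t LINK;  /* CMDLINK: jump target */
    uint16_t PMOD;  /* CMDPMOD: draw mode, color mode, clipping bits */
    uint16_t COLR;  /* CMDCOLR: color bank or lookup table address */
    uint16_t SRCA;  /* CMDSRCA: character (texture) address */
    uint16_t SIZE;  /* CMDSIZE: character width and height */
    int16_t  XA, YA, XB, YB, XC, YC, XD, YD; /* vertices A-D */
    uint16_t GRDA;  /* CMDGRDA: gouraud table address */
    uint16_t DMMY;  /* padding up to 32 bytes */
} SPRITE;

static_assert(sizeof(SPRITE) == 32, "SPRITE must match the 32-byte VDP1 command table");

/* Hypothetical SPRITE_T: the same fields, followed by a 32-bit NEXT link. */
typedef struct {
    uint16_t CTRL, LINK, PMOD, COLR, SRCA, SIZE;
    int16_t  XA, YA, XB, YB, XC, YC, XD, YD;
    uint16_t GRDA, DMMY;
    uint32_t NEXT;  /* assumed: address of the next command in the caller's list */
} SPRITE_T;
```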

Couldn't find a SpriteBuf2 variable, but it works without it. Nice!
 

Attachment: 1574630083758.png
I see that all the textures are monochrome. Did you try with untextured polygons to see if it made a difference in terms of performance ?
 
I don't think the rendering could be a bottleneck here since it does the preclipping disable trick and the quads are small overall. Ponut could try the wireframe mode instead to see if the slowdown point is the same.
 

I've done a lot more testing. Also, not all the textures are monochrome.
I can _conclusively_ state that is not the bottleneck.
If you disable backface culling, real system tops out at around 1400ish polygons _displayed_.
Of course this is in hi-res mode, and the pre-clipping disable trick is removed.
I guess I haven't really done the final bit of testing of wireframe or progressive scan modes.

That's thoroughly above the limit I expected to put on myself (1300 displayed polygons max).
Moreover, using MSH2 + SSH2 in tandem can't display any more polygons,
but it sure as hell can process more.

I also left a bug in where near-plane culling on the MSH2 was using the SSH2's vertex area. Oops. Easy fix.

I guess the polygon render max is low enough that I might have to consider re-adding the pre-clipping disable logic when the quads get big. I know it's a tiny cost to the SH2s, but increasing the total polygons in the scene is more important to me than increasing what's on the screen.
 
OK, 99% of the textures are monochrome to be more precise 🙂
Your performance results are surprising, I would have expected VDP1 to be more limiting in hi-res as it doubles the number of pixels it has to write.
 
Perhaps it doubles the memory accesses, which is the biggest cost to be fair, but in theory it does not increase the processing cost because the framebuffer is 8bpp rather than 16bpp.
Of course, we don't know how VDP1 really works under the hood ...
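Here is the byte arithmetic behind that. It assumes a 352x224 low-res mode at 16bpp and a 704x224 hi-res mode at 8bpp. The thread doesn't state the exact resolutions.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to cover every visible pixel once at a given bit depth. */
static uint32_t visible_fb_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    assert(bits_per_pixel % 8u == 0u); /* whole-byte pixels only in this sketch */
    return width * height * (bits_per_pixel / 8u);
}
```

Both modes come out to 157,696 bytes: twice the pixels at half the bytes each. Whether VDP1's per-pixel write cost actually drops at 8bpp is exactly what we can't tell from the outside.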
I think the only difference to VDP1 is that the size of each polygon has increased. The fields, transform parameters, and such are the same.

Another thing to consider on the framebuffer being 8bpp is that all pixels are color bank codes.
Because of that the order of access is simpler than in 4bpp CLUT with gouraud shading.

It should be noted that when VDP1 gets near its performance limit in hi-res mode, some polygons start being drawn only on the even scanlines, leaving their odd scanlines blank. The point at which this starts varies a lot, which shows that VDP1 performance isn't as simple as a polygon count. On top of that, the polygons nearest the screen are rendered last, so they are the first ones affected. They are also the biggest, and thus the most costly.

In reality, a maximum of 1400 mostly small polygons is not great, but it should be OK.
When I actually implement this we'll see how strong a rationale there is for the preclipping disable stuff that XL2 figured out.
It'll probably go back in.

And for any strangers wondering why I hijacked XL2's thread: 1. I'm sorry, 2. this has no bearing on XL2's rendering performance.
 
I don't know what your "performance" limit is.
I think you are referring to the pseudo end-of-draw the Saturn does at a fixed framerate when it sees it can't complete the frame in time.
It has nothing to do with being in high res, low res does that too.

Just switch to dynamic framerate to solve it. It will simply drop from 30 fps to 20.
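Here is a toy model of the difference between the two pacing modes. It assumes a 60 Hz field rate and a 2-field (30 fps) budget. None of this is SGL code; it only shows why an overrunning frame in dynamic mode lands on 20 fps instead of being cut off.

```c
#include <assert.h>

/* Toy model of frame pacing, not SGL code; assumes a 60 Hz (NTSC) field rate. */
#define FIELDS_PER_SECOND 60
#define FIELD_US (1000000 / FIELDS_PER_SECOND) /* about 16666 us per field */

/* Frame rate when every frame occupies `fields` display fields. */
static int fps_for_fields(int fields)
{
    assert(fields > 0);
    return FIELDS_PER_SECOND / fields;
}

/* Dynamic mode: a frame that overruns its budget is not cut off; it is shown at
   the first field boundary after it finishes, so the field count is rounded up. */
static int fields_used_dynamic(int budget_fields, int render_time_us)
{
    assert(budget_fields > 0 && render_time_us >= 0);
    int needed = (render_time_us + FIELD_US - 1) / FIELD_US; /* ceiling division */
    return needed > budget_fields ? needed : budget_fields;
}
```

With a 2-field budget, a 40 ms frame takes 3 fields, which is 20 fps. In fixed mode, the same overrun instead triggers the forced end-of-draw described above.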

Your polygons are really small, so it's not a proper VDP1 benchmark.

In my own game, it tends to slow down more often than not when you have like 10 polygons on screen, but these polygons usually take up a huge portion of the screen.
In your case, the polygons will be small on the y axis most of the time since you just display a huge landscape instead of walled rooms.
Disabling preclipping saves 5 cycles per scanline per quad, so that's a lot (the "famous" x*y*3 + y*5 formula). You might not need it, but I sure do. You need to do some kind of 2D culling anyway, so I don't see why you wouldn't just flip a bit to spare VDP1 some work.
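Here is a sketch of the two ideas in this post: the cost estimate, and setting the preclipping bit from the 2D culling pass. It rests on several assumptions:

- x is pixels per drawing line and y is the number of drawing lines.
- The y*5 term is the per-line overhead that disabling preclipping removes (my reading of the post).
- CMDPMOD bit 11 is the pre-clipping disable bit (check the VDP1 manual).

The Vec2/Window types and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: CMDPMOD bit 11 is "pre-clipping disable" (verify in the VDP1 manual). */
#define PMOD_PRECLIP_DISABLE 0x0800u

typedef struct { int16_t x, y; } Vec2;
typedef struct { int16_t left, top, right, bottom; } Window; /* inclusive bounds */

/* Rough cost of one quad from the forum formula x*y*3 + y*5. Assumed reading:
   x = pixels per drawing line, y = number of drawing lines, and the y*5 per-line
   term is what disabling pre-clipping saves. */
static uint32_t quad_cost_estimate(uint32_t x, uint32_t y, bool preclip_enabled)
{
    uint32_t cost = x * y * 3u;
    if (preclip_enabled)
        cost += y * 5u;
    return cost;
}

static bool inside(Vec2 p, Window w)
{
    assert(w.left <= w.right && w.top <= w.bottom);
    return p.x >= w.left && p.x <= w.right && p.y >= w.top && p.y <= w.bottom;
}

/* Reuse the 2D culling result: if all four vertices are on screen, no drawing line
   can leave the window, so pre-clipping has nothing to do and can be switched off. */
static uint16_t choose_pmod(uint16_t pmod, const Vec2 v[4], Window w)
{
    for (int i = 0; i < 4; i++) {
        if (!inside(v[i], w))
            return (uint16_t)(pmod & ~PMOD_PRECLIP_DISABLE); /* keep pre-clipping on */
    }
    return (uint16_t)(pmod | PMOD_PRECLIP_DISABLE);
}
```

The "all vertices inside" test is enough because every drawing line is interpolated between points on the quad's edges. If the four corners are inside the rectangular window, so is everything drawn between them.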
 
It's definitely in dynamic mode.
Are you 100% sure? When you do the slInitSystem and specify your resolution, for the framerate you can put a negative number to make it dynamic (say -2 for dynamic 30 fps).
Maybe the high-res mode has a bug that disregards the dynamic framerate and makes it fixed?
 
Actually, that issue was my fault:
You have to set the color RAM address offset registers every frame (at least within SGL as I am unaware of an SGL function to set these).
[There are clearly functions to do so for background layers, but not for the SPR layer?]
I was setting it within the frame, when in reality, it must be set at Vblank.

In progressive scan modes, this bug manifests as white or blank whole lines, because there's nothing or junk in color RAM there.
In interlaced modes, this results in half-blank or half-junk lines, because when it's scanning there, the color RAM address offset isn't set.
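Here is a sketch of the fix described above. The game keeps the desired value in a variable during the frame, and only a VBlank handler writes it to the register, every frame. The register is passed in as a pointer so this compiles on a PC; on hardware it would be the memory-mapped VDP2 color RAM address offset register.

The sprite offset field position (bits 6-4) and the helper names are assumptions. I'm not naming any SGL call for installing the handler.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the sprite-layer offset inside the VDP2 color RAM
   address offset register (bits 6-4); verify against the VDP2 manual. */
#define SPR_CAOS_SHIFT 4u
#define SPR_CAOS_MASK  (0x7u << SPR_CAOS_SHIFT)

/* Value the game wants in the register; may be updated at any time in the frame. */
static volatile uint16_t desired_cram_offset_reg;

/* Game code: choose the sprite-layer color RAM offset (0-7). Nothing touches the
   hardware here, so calling it mid-frame cannot split the picture. */
static void set_sprite_cram_offset(unsigned offset)
{
    assert(offset <= 7u);
    uint16_t v = desired_cram_offset_reg;
    v = (uint16_t)((v & ~SPR_CAOS_MASK) | (offset << SPR_CAOS_SHIFT));
    desired_cram_offset_reg = v;
}

/* VBlank handler: write the register every frame while no lines are scanned out.
   `reg` stands in for the memory-mapped VDP2 register. */
static void on_vblank(volatile uint16_t *reg)
{
    *reg = desired_cram_offset_reg;
}
```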

I also learned that the near-plane is a big deal (by near-plane I mean the lowest Z dist allowed). Too close and you can really destroy the frame-rate!
 
Did you try the latest "fix" I posted on Discord?
Reorienting your vertex 0 to be within the screen seems to fix the "polygon crossing the near plane causing your framerate to drop to 15 fps" issue.
I have no clue why the preclipping doesn't solve this as it's what it's supposed to do, but manually doing it solved my framerate issue when near walls.
 
That's a very interesting find. I checked VDP1 manual description of pre-clipping p.83 : about reversing the line drawing direction ("horizontal inversion"), it says "limited to vertical and horizontal lines". If we take that literally, that would only occur for drawing lines with constant x or constant y, and most clipped drawing lines don't fall in those categories in a 3D game.

If that is correct, that would mostly reduce the benefit of pre-clipping for random polygons to the drawing lines which are completely outside of the clipping area. So for a clipped polygon which has one side where drawing lines begin or end (so vertex A-D or B-C) completely in the drawing area, pre-clipping would have in most cases little to no benefit.
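To make that literal reading concrete, here is a toy classifier for a single drawing line. It encodes the interpretation above, not verified VDP1 behavior. It also assumes "completely outside" means both endpoints lie beyond the same window edge (a Cohen-Sutherland style trivial reject). That is my guess at what the hardware can cheaply detect.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int left, top, right, bottom; } ClipWin; /* inclusive bounds */

enum { OUT_LEFT = 1, OUT_RIGHT = 2, OUT_TOP = 4, OUT_BOTTOM = 8 };

typedef enum {
    LINE_FULLY_INSIDE,     /* nothing to clip */
    PRECLIP_SKIPS_LINE,    /* line rejected before any pixel is walked */
    PRECLIP_REVERSES_LINE, /* axis-aligned line starting outside: drawn from the inside end */
    PRECLIP_NO_HELP        /* any other partly visible line: outside pixels still walked */
} PreclipEffect;

/* Which window edges a point lies beyond. */
static unsigned outcode(int x, int y, ClipWin w)
{
    unsigned c = 0u;
    if (x < w.left)   c |= OUT_LEFT;
    if (x > w.right)  c |= OUT_RIGHT;
    if (y < w.top)    c |= OUT_TOP;
    if (y > w.bottom) c |= OUT_BOTTOM;
    return c;
}

/* Literal reading of the manual: whole-line rejection, plus direction reversal
   only for vertical or horizontal drawing lines. */
static PreclipEffect model_preclip(int x0, int y0, int x1, int y1, ClipWin w)
{
    assert(w.left <= w.right && w.top <= w.bottom);
    unsigned c0 = outcode(x0, y0, w);
    unsigned c1 = outcode(x1, y1, w);
    if ((c0 | c1) == 0u)
        return LINE_FULLY_INSIDE;
    if ((c0 & c1) != 0u)
        return PRECLIP_SKIPS_LINE; /* both ends beyond the same edge */
    bool axis_aligned = (x0 == x1) || (y0 == y1);
    if (axis_aligned && c0 != 0u && c1 == 0u)
        return PRECLIP_REVERSES_LINE;
    return PRECLIP_NO_HELP;
}
```

Under this reading, a typical clipped 3D quad is mostly made of diagonal drawing lines that fall in the "no help" case. That is the point being made about pre-clipping's limited benefit.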

If I understand your code correctly, you also reverse the drawing direction vertically when the bottom of a texture is in the drawing area and the top is outside. Pre-clipping won't handle that because it only reverses the line drawing direction, not the order in which lines are drawn. If there's a performance improvement from this, it would mean that pre-clipping completely stops drawing a polygon when one drawing line had some pixels inside the drawing area and the next is completely outside.
 
I guess the preclipping only rejects whole lines, and manually reorienting vertices lets drawing stop early, which solves the issue.
It still won't solve cases like a polygon whose vertices are all outside the screen, but you kind of need to keep your polygons small anyway, so it shouldn't be an issue.
Worst case scenario, you can actually clip your polygons, but it will look bad, as you will definitely create triangles if you do that.
 
The code you put on Discord seems to reverse the drawing direction horizontally only if vertex A is clipped and vertices B and C are not. I think it may miss some other cases where it would be interesting to reverse horizontally :
- Only D clipped.
- Only B not clipped.
- Only C not clipped.

To cover all cases, you could count the clipped vertices on each side of the polygon : if the A-D side has more clipped vertices than the B-C side, then reverse horizontally.
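Here is a minimal sketch of that counting rule. It assumes:

- v[0..3] hold vertices A, B, C, D.
- Drawing lines run from the A-D edge to the B-C edge.
- CMDCTRL bit 4 is the horizontal flip bit.

The types and function names are hypothetical; this is not XL2's Discord code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: CMDCTRL bit 4 is horizontal flip (Dir field), per the VDP1 manual. */
#define CTRL_HFLIP 0x0010u

typedef struct { int16_t x, y; } Pt;
typedef struct { int16_t left, top, right, bottom; } Win; /* inclusive bounds */
typedef struct { uint16_t ctrl; Pt v[4]; } Quad;          /* v[0..3] = A, B, C, D */

static bool clipped(Pt p, Win w)
{
    assert(w.left <= w.right && w.top <= w.bottom);
    return p.x < w.left || p.x > w.right || p.y < w.top || p.y > w.bottom;
}

/* If the A-D side has more off-screen vertices than the B-C side, mirror the quad
   so drawing lines start on the more visible side: swap A<->B and D<->C, and toggle
   the horizontal flip so the texture still lands where it did. */
static bool maybe_flip_horizontally(Quad *q, Win w)
{
    int ad = clipped(q->v[0], w) + clipped(q->v[3], w);
    int bc = clipped(q->v[1], w) + clipped(q->v[2], w);
    if (ad <= bc)
        return false;
    Pt t = q->v[0]; q->v[0] = q->v[1]; q->v[1] = t;
    t = q->v[3]; q->v[3] = q->v[2]; q->v[2] = t;
    q->ctrl = (uint16_t)(q->ctrl ^ CTRL_HFLIP);
    return true;
}
```

With this rule, the three extra cases listed (only D clipped, only B on screen, only C on screen) all trigger a flip, as does the original "A clipped, B and C on screen" case.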
 
I would have to take a longer look as I just wrote it in 5 minutes like 6 months ago and just commented it out as I thought I wouldn't need it, but vertex A is the most important afaik.
SGL seems to do it with its draw commands; I don't know if SBL did it too.
It would be interesting to mimic it, as I guess they spent a lot of time making sure it's as fast as possible.
Like, what did they do if only vertex D is clipped?
Or do they only care about vertex A?
I guess I would need to write a demo program with SGL and look at it carefully, unless I can find it in SBL's code, if SBL actually does it too.
 
I didn't know that SGL did this kind of inversion.
Vertex D clipping status is as useful as vertex A's, since the goal is to determine if the A-D side is more clipped than the B-C side.
 
I think the issue I was facing was situations, such as triangles, making it harder to determine where to flip.
I will have to try it with a counter and see whether it affects CPU performance noticeably.
Maybe the Japanese documentation explains better what exactly the preclipping does?

Edit : ok, from SGL020A.TXT :
"
4) Distorted Sprite Draw Optimization

When distorted sprites (also includes textures) are displayed, a check is
performed to see whether vertices exist within a window. The character is
flipped and vertices are swapped as necessary to reorient vertex 0 within
the window. The drawing efficiency of distorted sprites that go outside
the window has been improved by the implementation of this software
preclipping.
"
So yeah, they only make sure vertex 0 (A) is on screen.
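Here is a sketch of what that SGL description could look like. If vertex 0 (A) is off-screen, it picks an on-screen vertex and mirrors the quad so that vertex becomes A. It then toggles the matching flip bits so the texture still maps the same way. The order in which candidates are tried, the CMDCTRL flip bit positions (bit 4 horizontal, bit 5 vertical), and all names are assumptions. This is not SGL's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed CMDCTRL Dir bits: bit 4 = horizontal flip, bit 5 = vertical flip. */
#define CTRL_HFLIP 0x0010u
#define CTRL_VFLIP 0x0020u

typedef struct { int16_t x, y; } Pt;
typedef struct { int16_t left, top, right, bottom; } Win; /* inclusive bounds */
typedef struct { uint16_t ctrl; Pt v[4]; } Quad;          /* v[0..3] = A, B, C, D */

static bool on_screen(Pt p, Win w)
{
    assert(w.left <= w.right && w.top <= w.bottom);
    return p.x >= w.left && p.x <= w.right && p.y >= w.top && p.y <= w.bottom;
}

static void swap_pt(Pt *a, Pt *b) { Pt t = *a; *a = *b; *b = t; }

/* Returns true if vertex 0 ends up on screen. */
static bool reorient_vertex0(Quad *q, Win w)
{
    if (on_screen(q->v[0], w))
        return true;                                    /* A already on screen */
    if (on_screen(q->v[1], w)) {                        /* B -> A: mirror horizontally */
        swap_pt(&q->v[0], &q->v[1]);
        swap_pt(&q->v[3], &q->v[2]);
        q->ctrl = (uint16_t)(q->ctrl ^ CTRL_HFLIP);
        return true;
    }
    if (on_screen(q->v[3], w)) {                        /* D -> A: mirror vertically */
        swap_pt(&q->v[0], &q->v[3]);
        swap_pt(&q->v[1], &q->v[2]);
        q->ctrl = (uint16_t)(q->ctrl ^ CTRL_VFLIP);
        return true;
    }
    if (on_screen(q->v[2], w)) {                        /* C -> A: mirror both ways */
        swap_pt(&q->v[0], &q->v[2]);
        swap_pt(&q->v[1], &q->v[3]);
        q->ctrl = (uint16_t)(q->ctrl ^ (CTRL_HFLIP | CTRL_VFLIP));
        return true;
    }
    return false; /* no vertex on screen: the case this trick can't fix */
}
```

When no vertex is on screen, it leaves the quad untouched. That matches the limitation noted earlier for polygons whose vertices are all outside the screen.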
 