VDP1 3D?

A question I've had for years, ever since I first read the VDP1 user manual: how does the Saturn do 3D (x, y, z and face normals)?

I only see 2D hardware functions in the VDP1 user manual, yet SGL and SBL use 3D vertices and face normals like other 3D hardware does.

What does this mean? Does the VDP1 have built-in hardware for these functions, or is it done only in software?

I know the Saturn has some 3D hardware support: in the VDP2 manual I see 3D placement of the scroll and rotation screens.

Can anybody help me?
 
VDP1 is a strictly 2D chip. You have to translate into display coordinates in software. If it wasn't immediately obvious, this also implies there's no Z-buffer, perspective correct texture mapping or other depth-related functionality.

ETA: Also, while you can provide a transformation matrix to rotate and scale VDP2 layers, they're not "3D". They're drawn strictly in the specified priority order, and don't intersect each other etc.
 
To go into a little more depth, you need to do all the 3D rotation/translation yourself with the SH2 or DSP, then calculate the screen projection coords (x and y on screen) from x/y/z, and feed those coords to VDP1. You handle EVERYTHING right up to the point where you would start rasterizing the polygon in a software renderer.
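As a rough illustration of that pipeline, here is a minimal sketch in C of the per-vertex work: rotate and translate in fixed point, then project to screen x/y. The 16.16 format, the type names and the `focal` parameter are assumptions made for this example; they are not SGL, SBL or Yeti3D definitions.

```c
#include <stdint.h>

/* 16.16 fixed point. All names below are illustrative only. */
typedef int32_t fix16;
#define FIX(x)        ((fix16)((x) * 65536))
#define FIX_MUL(a, b) ((fix16)(((int64_t)(a) * (b)) >> 16))

typedef struct { fix16 x, y, z; } Vec3;
typedef struct { int16_t x, y; } ScreenPt;

/* Rotate about the Y axis (s = sin, c = cos of the angle), then translate by t. */
Vec3 transform_y(Vec3 v, fix16 s, fix16 c, Vec3 t)
{
    Vec3 r;
    r.x = FIX_MUL(v.x, c) + FIX_MUL(v.z, s) + t.x;
    r.y = v.y + t.y;
    r.z = FIX_MUL(v.z, c) - FIX_MUL(v.x, s) + t.z;
    return r;
}

/* Perspective projection: screen = centre + focal * (coord / z).
   The caller must make sure v.z > 0 (clip against the near plane first). */
ScreenPt project(Vec3 v, int focal, int cx, int cy)
{
    ScreenPt p;
    p.x = (int16_t)(cx + (int)(((int64_t)v.x * focal) / v.z));
    p.y = (int16_t)(cy - (int)(((int64_t)v.y * focal) / v.z));
    return p;
}
```

Four projected points per quad are all that VDP1 would need; everything before that point runs on the CPU.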

If you aren't familiar with this, a simple example of a 3D software renderer can be found in Yeti3D. There's a Saturn port you can search for that substitutes the VDP1 warped poly draw for the final step. That would demonstrate all the steps leading up to using VDP1 for 3D.

Note that this is not perspective correct - VDP1 cannot render perspective-correct polys. This means that as a poly gets closer to the viewer, you may wish to tessellate the poly to limit its size and make the affine rendering less noticeable. That's what later games on the PSX and Saturn did to minimize the "fish-eye" effect seen in many early games.
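To see why smaller polys hide the problem, this sketch compares affine and perspective-correct interpolation of a texture coordinate along one edge. The function names are made up for the example, and it uses doubles for clarity rather than anything a Saturn renderer would do.

```c
/* Affine: interpolate u linearly in screen space (what affine hardware does). */
double u_affine(double u0, double u1, double t)
{
    return u0 + (u1 - u0) * t;
}

/* Perspective correct: interpolate u/z and 1/z linearly, then divide. */
double u_perspective(double u0, double z0, double u1, double z1, double t)
{
    double inv_z    = (1.0 / z0) + ((1.0 / z1) - (1.0 / z0)) * t;
    double u_over_z = (u0 / z0) + ((u1 / z1) - (u0 / z0)) * t;
    return u_over_z / inv_z;
}
```

For an edge running from z = 1 to z = 4, the screen midpoint should sample u = 0.2, but affine mapping samples 0.5. If the edge only spans z = 1 to z = 1.5, the correct value is 0.4, so the error drops from 0.3 to 0.1. Tessellating keeps each piece's depth range small, which is what shrinks the distortion.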
 
The PSX can't do 3D either?

The Saturn has Gouraud shading in hardware, which seems to be a clue of 3D hardware.

If it doesn't process any z positioning, how does it render shadow calculations like it does?

Also, how does it process the normals (polygon face) data?

As for it not having a z-buffer, that doesn't mean much except that it's old. A z-buffer is just for not displaying overlapping pixels, but I heard the Saturn uses z-sort to handle overlapping polygons.

And again, for the VDP2 I'm pretty sure it does handle 3 axes (x, y, z); I'll look back into that. I was thinking that if the VDP2 can handle 3-axis space and the VDP1 can't, then maybe the z code gets passed to the VDP2 where it gets placed and handled, sort of like they do transparencies. People say the Saturn can't do transparencies, but clearly it can via VDP2 with color calculation's color ratio.
 
Coolgame said:
The PSX can't do 3D either?

It doesn't do PERSPECTIVE CORRECT 3D. Almost no early 3D hardware did - it's all affine mapping. The GPU in the PSX is like the Saturn in that you need to pass it already calculated x/y screen coords rather than the 3D coords, but the PSX also has the GTE (Geometry Transform Engine) that specifically deals with 3D calculations and perspective projection - it takes the 3D coords and gives the GPU just what it needs. That all has to be done in software on either the SH2 or the DSP on the Saturn.

The Saturn has Gouraud shading in hardware, which seems to be a clue of 3D hardware.

Shading has nothing to do with 3D. It's just an effect that some hardware uses at the same time as 3D rasterization of polygons, but it could just as easily be used on plain 2D drawing.
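To make the point concrete, here is a generic sketch of Gouraud-style shading on a purely 2D horizontal span: the color is simply interpolated between the two ends. The RGB555 layout (red in the low bits) and the function names are assumptions for the example, and this is not a description of how VDP1 implements its Gouraud mode internally.

```c
#include <stdint.h>

/* Linear blend of one 5-bit channel at step i of n. */
static int lerp5(int a, int b, int i, int n)
{
    return a + ((b - a) * i) / n;
}

/* Fill row[x0..x1] with a smooth blend from colour c0 to colour c1. */
void gouraud_span(uint16_t *row, int x0, int x1, uint16_t c0, uint16_t c1)
{
    int n = x1 - x0;
    if (n <= 0)
        return;
    for (int i = 0; i <= n; i++) {
        int r = lerp5(c0 & 31, c1 & 31, i, n);
        int g = lerp5((c0 >> 5) & 31, (c1 >> 5) & 31, i, n);
        int b = lerp5((c0 >> 10) & 31, (c1 >> 10) & 31, i, n);
        row[x0 + i] = (uint16_t)(r | (g << 5) | (b << 10));
    }
}
```

Nothing in it knows about depth or 3D coordinates; it only needs screen positions and colors.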

If it doesn't process any z positioning, how does it render shadow calculations like it does?

Also, how does it process the normals (polygon face) data?

In software. The GTE in the PSX helps with those calculations - the Saturn is all software.
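As an example of what "in software" means for face normals, this sketch computes a normal with a cross product and uses it for a back-face test. The integer vector type and the function names are invented for the example; a real engine would typically use fixed point and often precomputed normals.

```c
#include <stdint.h>

typedef struct { int32_t x, y, z; } Vec3i;

Vec3i vec_sub(Vec3i a, Vec3i b)
{
    Vec3i r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

/* Normal of triangle (a, b, c): cross product of two edges.
   Coordinates are assumed small enough not to overflow 32 bits. */
Vec3i face_normal(Vec3i a, Vec3i b, Vec3i c)
{
    Vec3i e1 = vec_sub(b, a), e2 = vec_sub(c, a), n;
    n.x = e1.y * e2.z - e1.z * e2.y;
    n.y = e1.z * e2.x - e1.x * e2.z;
    n.z = e1.x * e2.y - e1.y * e2.x;
    return n;
}

/* A face is front-facing when its normal points back toward the eye. */
int is_front_facing(Vec3i n, Vec3i point_on_face, Vec3i eye)
{
    Vec3i v = vec_sub(point_on_face, eye);
    int64_t d = (int64_t)n.x * v.x + (int64_t)n.y * v.y + (int64_t)n.z * v.z;
    return d < 0;
}
```

The same dot product against a light direction instead of the view vector gives a flat-shading intensity.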

As for it not having a z-buffer, that doesn't mean much except that it's old. A z-buffer is just for not displaying overlapping pixels, but I heard the Saturn uses z-sort to handle overlapping polygons.

In software... and the PSX is the same for that. The most common thing both do is the Painter's Algorithm - sort the polys by depth and draw them from back to front.
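A minimal sketch of that sort, assuming each polygon carries one representative depth (for example the average z of its vertices). The `Poly` struct and function names are hypothetical.

```c
#include <stdlib.h>

typedef struct {
    int id;  /* which polygon this is */
    int z;   /* representative view-space depth, larger = farther away */
} Poly;

/* Order farthest first so nearer polygons are drawn later, on top. */
static int cmp_far_first(const void *a, const void *b)
{
    const Poly *pa = a, *pb = b;
    return (pb->z > pa->z) - (pb->z < pa->z);
}

void sort_back_to_front(Poly *polys, size_t count)
{
    qsort(polys, count, sizeof *polys, cmp_far_first);
}
```

After sorting, the list is submitted in array order. Because each polygon gets a single depth, intersecting or mutually overlapping polygons can still be drawn in the wrong order, which is the usual limitation of this approach.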

And again, for the VDP2 I'm pretty sure it does handle 3 axes (x, y, z); I'll look back into that. I was thinking that if the VDP2 can handle 3-axis space and the VDP1 can't, then maybe the z code gets passed to the VDP2 where it gets placed and handled, sort of like they do transparencies. People say the Saturn can't do transparencies, but clearly it can via VDP2 with color calculation's color ratio.

The VDP2 manages a handful of 2D layers made of cells or plain bitmaps. A rotation plane may be rotated about the x/y/z axes, but it's still ONE SINGLE FULL PLANE. You can only have two rotation planes on VDP2, so if you are satisfied with TWO 3D ROTATED IMAGES, yes, it does indeed do 3D.
 
I'd like to get WinterSports Eins again; it's not on this site any more, or the link is broken.

But WinterSports uses 2D sprites; its 3D is prerendered.
 
Thank you Chilly Willy, this is the first time I've seen somebody talking about my Yeti3D Saturn port.

And as explained above, everything related to 3D is actually done in software. The VDP1 is only used to display quads (2D shapes) on screen.
 
I've a bit of an interest in Yeti3D... I did ports to the 32X and N64, and now I'm working on ports of Yeti3D-Pro, which is the next step up from the original Yeti3D. The two main differences between the original and the Pro versions are that Pro allows for slopes and uses models for objects and enemies, where the original is all flat and uses sprites. The other difference is that Pro uses a lot more memory. That's making it tough to port to the 32X, as just the level map uses ALL the RAM in the 32X. It would do well on the Saturn, though.
 
About 3D models: the non-Pro engine supports displaying 3D models too!

(See draw_entity_as_model function in draw.c)

Memory should not be a problem on Saturn, but the increased number of quads to process for 3D models would make it unplayable on Saturn...

Yeti3D Pro... I discovered it when I made the first public release of my Yeti Saturn adaptation ^^;

At that time, I googled the spelling of the original Yeti3D author's name (in order to write readme.txt or so), and saw that he had actually released the Pro engine sources some months before.

I have added some Pro features to ietx2, but it is still WIP (well, I haven't modified the source code for half a year, but let's say it is WIP anyway...)

According to my changelog, I have added the following features:

- Level editor + conversion of all levels found in Yeti3D Pro sources to Saturn.

- Better visuals from bilinear resizing of textures on PC when converting level data.

- Slopes.

- Transparency. (example here)

It is good to hear that somebody is interested in Yeti3D. It gives me some motivation to continue my Saturn port!

I plan to release my Saturn port after S.A.T.U.R.N. contest judging and prizes shipping.
 
cafe-alpha said:
About 3D models: the non-Pro engine supports displaying 3D models too!

(See draw_entity_as_model function in draw.c)

Support, yes, but it's not used. Yeti redirects all entity drawing to the sprite code. Given that there aren't any test models in the code, I'm not sure how complete the model code is. It may still have bugs, which led him to not use it until later versions which became the pro version.

Memory should not be a problem on Saturn, but the increased number of quads to process for 3D models would make it unplayable on Saturn...

Well, that would depend on how many, wouldn't it? And using VDP1 for the drawing certainly increases the number you can handle, since the processor doesn't have to draw the quads, just process them. Did you try changing the number of models, or which ones were used? There are comments in the Pro code about not using a couple of models because they increased the number of polys so much that it became too slow, mainly from the sheer number of the objects (I think it was the cactus that had that comment).

Well, it's good to discuss things with someone actually working on the code. I did various tests with the drawing parameters on the 32X when trying to get the speed up without resorting to large amounts of assembly. One thing I noticed in the 32X code that would affect the Saturn as well is to watch the signed vs unsigned integers in places where lots of shifting occurs. Unsigned ints use inline multi-bit shift opcodes, while signed ints call a subroutine that shifts the int one bit at a time (there are no multi-bit shifts for signed ints). There were places in the code I forced a cast to unsigned because I knew at those places the values were never negative and it was critical to use inline multi-bit shifting, not a subroutine. Depending on how many bits are shifted, it would actually be better to do something like this than to use signed shifting:

Code:
#define rshift(val, n) (((val) < 0) ? -(int)((unsigned int)(-(val)) >> (n)) : (int)((unsigned int)(val) >> (n)))

You avoid the jsr/rts, and several shifts. At least if you use a constant for the shift count, they have different subroutines for each constant shift count. If the shift count is unknown, it has to call a more generic routine to handle the unknown number of shifts, which makes it even slower.
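One thing worth checking before dropping a macro like this into existing code: because it negates, shifts and negates again, it rounds negative results toward zero, whereas an arithmetic right shift rounds toward negative infinity. The sketch below wraps a fully parenthesised version of the macro in a helper (`rshift_demo` is a name made up for the example) so the behaviour can be tested on a PC.

```c
/* Shift the magnitude with an unsigned (logical) shift, then restore the sign.
   Note: -(val) overflows if val is INT_MIN, so that input must be avoided. */
#define rshift(val, n) \
    (((val) < 0) ? -(int)((unsigned int)(-(val)) >> (n)) \
                 : (int)((unsigned int)(val) >> (n)))

/* Helper so the macro can be called like a function in tests. */
int rshift_demo(int v, int n)
{
    return rshift(v, n);
}
```

For example, rshift(-17, 4) gives -1, while an arithmetic shift of -17 by 4 gives -2 on compilers that implement signed >> as an arithmetic shift (GCC does). Results only match for non-negative values and for negative values that are exact multiples of the divisor.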

I got in the habit of compiling snippets of code to assembly for the SH2 to see if it needed a little help.

I ran into the same thing when making a version of Tremor for the 32X... which would probably be pretty good on the Saturn. It runs completely on the slave SH2. The 32X can handle 22kHz mono or 11kHz stereo with my current code, so the Saturn should be even better with its faster clock rate and 32-bit buses.
 
Sorry for the late reply,

Chilly Willy said:
Well, that would depend on how many, wouldn't it? And using VDP1 for the drawing certainly increases the number you can handle, since the processor doesn't have to draw the quads, just process them. Did you try changing the number of models, or which ones were used? There are comments in the Pro code about not using a couple of models because they increased the number of polys so much that it became too slow, mainly from the sheer number of the objects (I think it was the cactus that had that comment).

In my WIP Yeti3D Saturn adaptation, there is no Yeti3D Pro 3D model support yet.

This is because I am porting Pro features to Yeti3D GPL for Saturn little by little, and I haven't done anything concerning 3D models yet.

I will let you know if I add 3D models.

Thank you very much for the advice!

I tried your optimization on the yeti_build_vis function (draw.c) and it became a little faster.

In this function, f2i is used to compute an array index, so the values are never negative.

Code:
//        cell = &yeti->map.item[y = f2i(ray->y >> 2)][x = f2i(ray->x >> 2)];

        cell = &yeti->map.item[y = ((u32)ray->y) >> 10][x = ((u32)ray->x >> 10)];

For the same 3D scene, yeti_build_vis runs at 24 ms per frame with the commented-out code, and at 20 ms per frame when using logical shifts.

I don't know anything about assembly language, and I just discovered that sh-elf-gcc -c -g -Wa,-a,-ad <compilation flags> srcGPL/draw.c > srcGPL/draw.lst lets me see the assembly code produced. Only two constant right shifts are used to compute the array index:

Code:
 2611 0f38 4919     		shlr8	r9

 2612 0f3a 4909     		shlr2	r9

Don't hesitate to share other optimization advice.

Also, are your optimized sources for the 32X available for download? (If so, I would be very interested in adding your changes to my Saturn version.)
 
It's amazing that a tiny change like a (u32) cast can make a significant improvement in speed.

One of my Tremor tests is here: http://www.mediafire.com/?9acgq3givvi8kvd

and my double-pixel Yeti demo with music and sound is here: http://www.mediafire.com/?a9y2dnhm3e9dfrc

The Tremor-rockbox directory was just used for reference - it isn't needed for the demo. The demo uses the lowmem branch of the official Tremor with various optimizations, but it could use more on the 32X. It should actually be pretty decent on the Saturn.

The Yeti demo renders at 160x112 to a 320x112 15-bit mode display. It's a good example of how to set up the 32X to use only every other line in the display. The code has been modified to draw two pixels at once during rendering. I really need to just rewrite the entire polygon rasterizer in assembly. Anywho, drawing 160x112 really improved the performance of Yeti on the 32X. This demo also uses the Slave SH2 to mix and play MOD music with sound effects using DMA PWM audio.

Anyway, another generic optimization you may already know: weird shift lengths. The SH2 only does 1, 2, 8, and 16 bit shifts. Everything else must be done as multiples of those. However, there are times when you can be sneaky for better performance. Instead of

shlr2 r1

shlr2 r1

shlr2 r1

for a shift of 6, try this

shll2 r1

shlr8 r1

Assuming the left shift doesn't kill any significant upper bits, you save a cycle doing the left, then right shift. Most of the shifts not covered directly can be done in a similar manner to save a cycle or two.
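The same trick written in C, as a quick way to check on a PC when the two forms agree. The function names are just for this example.

```c
#include <stdint.h>

/* Plain logical shift right by 6 (three shlr2 on the SH2). */
uint32_t shr6_direct(uint32_t x)
{
    return x >> 6;
}

/* Shift left by 2, then right by 8 (shll2 + shlr8).
   Only equivalent when the top two bits of x are zero. */
uint32_t shr6_two_step(uint32_t x)
{
    return (x << 2) >> 8;
}
```

With 0x12345678 both return the same value. With 0xC0000040 the left shift discards the two set upper bits and the results differ, which is the "significant upper bits" caveat above.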
 
If you know the value is negative there are some tricks you can use, but some cycle counting may be needed to determine the fastest variant. For any shift amount, you can convert the arithmetic shift of a negative value into an unsigned shift by inverting the bits before and again afterwards, e.g.

Code:
; r0 >> 8

not     r0, r0

shlr8   r0

not     r0, r0
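A C rendering of the not/shift/not sequence, useful for convincing yourself where it works. The function name is invented for the example, and converting the final unsigned value back to int32_t relies on the usual two's-complement wrap-around that GCC provides.

```c
#include <stdint.h>

/* not, logical shift right by 8, not.
   Matches an arithmetic shift right by 8 only when x is negative. */
int32_t sar8_via_not(int32_t x)
{
    uint32_t inverted = (uint32_t)~x;   /* not   r0, r0 */
    uint32_t shifted  = inverted >> 8;  /* shlr8 r0     */
    return (int32_t)~shifted;           /* not   r0, r0 */
}
```

For x = -257 this returns -2, the same as an arithmetic shift. For a non-negative input such as 256 the result is wrong, because inverting first sets the upper bits and the logical shift then pulls zeros in above them.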

Shifting by 16 and 24 bits can be special-cased using the sign extension instructions. It can be faster to handle shift amounts slightly larger like this as well, but that's where the cycle counting comes in.

Code:
; r0 >> 24

shlr16  r0

shlr8   r0

exts.b  r0, r0

; r0 >> 17

shlr16  r0

exts.w  r0, r0

shar    r0
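The two sign-extension sequences can be mirrored in C like this. The function names are made up, and the narrowing casts plus the final signed shift assume the two's-complement, arithmetic-shift behaviour that GCC provides.

```c
#include <stdint.h>

/* r0 >> 24: shlr16, shlr8, then exts.b to restore the sign. */
int32_t sar24_via_exts(int32_t x)
{
    uint32_t u = (uint32_t)x >> 24;
    return (int32_t)(int8_t)u;
}

/* r0 >> 17: shlr16, exts.w, then a single shar. */
int32_t sar17_via_exts(int32_t x)
{
    uint32_t u = (uint32_t)x >> 16;
    int32_t s = (int32_t)(int16_t)u;
    return s >> 1;
}
```

Unlike the not/shift/not version, these work for both positive and negative inputs, since the sign is rebuilt from the top bit of what is left after the logical shift.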
 
Yeah, good points. I've done the logical shift/sign extend trick as well, just forgot to include that in the list. I haven't done the not/shift/not trick... I'll have to remember that one.
 
Sorry for taking so long. About two weeks after your first response I did some research and found you all were right: the VDP1 is a 2D chip. I was a bit too busy at the time to respond.

I also have a PlayStation development manual to compare the two consoles.

Thank you all for your help and the added knowledge!

I learned more about the console thanks to everybody's support.

You all are a big help.
 
(bump)

Recently I tried to speed up the Yeti3D code a little more, so I am adding a post to this topic.

As the code inside the loops of the yeti_build_vis function is executed roughly 5000~8000 times per frame, I focused optimization on this function only.

The "low risk, high return" optimization is to modify the f2i macro so that it uses logical shifts.

Code:
// Before (13 cycles)

#define f2i_old(A) ((A)>>8)

// After (7 cycles)

#define f2i(A) ( (A) >= 0 ? (int)(((u32)(A)) >> 8) : (int)(-( (((u32)(-((A)+1))) >> 8) + 1 )) )

(*) The optimized f2i needs a direct reference to a local variable in order to actually be faster.

Example: tmp = x+y; z = f2i(tmp); instead of z = f2i(x+y);
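A quick PC-side check of the optimized f2i macro against a division-based floor. Only the macro comes from the post; the `u32` typedef, `f2i_apply` and `f2i_floor_reference` are helpers made up for this test.

```c
#include <stdint.h>

typedef uint32_t u32;

#define f2i(A) ( (A) >= 0 ? (int)(((u32)(A)) >> 8) \
                          : (int)(-( (((u32)(-((A)+1))) >> 8) + 1 )) )

/* Wrapper that follows the advice above: the macro sees a plain local. */
int f2i_apply(int a)
{
    int tmp = a;
    return f2i(tmp);
}

/* floor(a / 256) using division only, as an independent reference. */
int f2i_floor_reference(int a)
{
    int q = a / 256;
    if ((a % 256) < 0)
        q--;
    return q;
}
```

Because -((A)+1) is the bitwise inverse of A, the negative branch is the not/shift/not idea expressed in C, so the macro rounds toward negative infinity like the original signed >>8 rather than toward zero.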

Another optimization was the most effective and actually the simplest:

yeti_build_vis heavily uses the CELL_IS_OPAQUE macro:

Code:
#define CELL_IS_SOLID(A)  ((A)->swi & CELL_SWI_SOLID)

#define CELL_IS_WARPED(A) ((A)->swi & CELL_SWI_WARPED)

#define CELL_IS_OPAQUE(A) (CELL_IS_SOLID(A) && !CELL_IS_WARPED(A))

Instead of performing ANDs, NOTs, etc. thousands of times every frame, I compute the opaque attribute at startup and update it only when needed.

Code:
#define CELL_UPDATE_OPAQUE(A) ((A)->opaque = CELL_IS_SOLID(A) && !CELL_IS_WARPED(A))

#define CELL_IS_OPAQUE(A)     ((A)->opaque)
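A self-contained sketch of how the cached flag could be kept in sync. The macros follow the ones above, but the `cell_t` layout, the flag values and the `cell_set_swi` setter are assumptions for this example, not Yeti3D's actual definitions.

```c
#include <stdint.h>

/* Assumed flag values and struct layout, for illustration only. */
#define CELL_SWI_SOLID   0x01
#define CELL_SWI_WARPED  0x02

typedef struct {
    uint8_t swi;     /* switch flags */
    uint8_t opaque;  /* cached result of "solid and not warped" */
} cell_t;

#define CELL_IS_SOLID(A)      ((A)->swi & CELL_SWI_SOLID)
#define CELL_IS_WARPED(A)     ((A)->swi & CELL_SWI_WARPED)
#define CELL_UPDATE_OPAQUE(A) ((A)->opaque = CELL_IS_SOLID(A) && !CELL_IS_WARPED(A))
#define CELL_IS_OPAQUE(A)     ((A)->opaque)

/* Every write to swi goes through here so the cache never goes stale. */
void cell_set_swi(cell_t *cell, uint8_t swi)
{
    cell->swi = swi;
    CELL_UPDATE_OPAQUE(cell);
}
```

The trade is one extra byte per cell and a rule that all flag changes refresh the cache, in exchange for a single load in the inner loop.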

Also, outside of the yeti_build_vis function, I optimized the vertex_project function, which is called around 500 times per frame.

-> Two additional reciprocal tables are added: reciprocal*WIDTH and reciprocal*HEIGHT, each of which saves one multiplication.

-> At the end of the computation, the final >>9 shift has been changed to a logical >>8 shift, so that only one "shlr8" instruction is used.

(The remaining >>1 shift is folded into the reciprocal tables and doesn't affect projection accuracy.)
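The idea can be sketched as below. This is a hypothetical reconstruction, not the actual vertex_project code: the screen size, the table length, the 65536/z reciprocal format and all function names are assumptions.

```c
#include <stdint.h>

#define WIDTH   320   /* assumed screen size */
#define HEIGHT  224
#define MAX_Z   1024  /* assumed depth range covered by the tables */

static uint32_t recip_w[MAX_Z];  /* (65536 / z) * WIDTH,  pre-shifted by 1 */
static uint32_t recip_h[MAX_Z];  /* (65536 / z) * HEIGHT, pre-shifted by 1 */

void init_recip_tables(void)
{
    for (uint32_t z = 1; z < MAX_Z; z++) {
        uint32_t r = 65536u / z;
        recip_w[z] = (r * WIDTH) >> 1;
        recip_h[z] = (r * HEIGHT) >> 1;
    }
}

/* Before: two multiplies and a 9-bit shift. */
uint32_t scale_x_reference(uint32_t x, uint32_t z)
{
    return (x * (65536u / z) * WIDTH) >> 9;
}

/* After: one multiply and a single 8-bit logical shift. */
uint32_t scale_x_table(uint32_t x, uint32_t z)
{
    return (x * recip_w[z]) >> 8;
}

uint32_t scale_y_table(uint32_t y, uint32_t z)
{
    return (y * recip_h[z]) >> 8;
}
```

In this sketch the pre-shift loses nothing because WIDTH and HEIGHT are even, so the table entries stay exact and both forms give identical results as long as the products fit in 32 bits.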

After the optimizations above, speed was around 7~10 FPS, but due to a VDP1 issue (the display flickers on real hardware), I had to limit speed to 3~5 FPS.

Hence, there are still a lot of things to investigate.
 
I assume that's 320x200-ish and not using the warped sprite drawing? If so, that's pretty good. You really need to redo the draw poly as assembly to really do better. I plan to do that on the 32X version for better speed.
 
Chilly Willy said:
I assume that's 320x200-ish and not using the warped sprite drawing? If so, that's pretty good. You really need to redo the draw poly as assembly to really do better. I plan to do that on the 32X version for better speed.

Well, the polygon drawing is performed by VDP1, not SH2.

I took some videos on real hardware so that you can get an idea of how my Yeti3D Pro adaptation looks:

ietx3 wacked level

ietx3 church level

ietx3 church level
 
television2000 said:
I was gonna ask you Cafe to show us a video before I saw this lol.

Excellent work.

I presume you aren't using SBL or SGL in Yeti. Right ?

I don't use SGL. The sources used as the base for my Yeti3D Saturn adaptation are Charles MacDonald's vdp1ex example program.

However, I use some SGL sources, especially for CD-ROM access.

The project is compilable from sources only, without the need of Sega precompiled libraries.
 