fast memset


vbt

Staff member
Joined
Dec 16, 2001
Messages
5,193
Points
63
Age
42
I'm still experimenting to get more speed out of emulators, and I find memset slow. Has anybody tried writing a fast memset? So far I've only found this:

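_memclrwh: ! area clear: memclrwh(byte* buf, long width, long height, long step)
! step = map width - brush width
    xor r2,r2        ! r2 = 0, the byte to store
lswh_1:
    mov r5,r1        ! r1 = width, the column counter for this row
lswh_2:
    mov.b r2,@r4     ! clear one byte
    add #1,r4
    dt r1            ! decrement the column counter, T = 1 when it reaches 0
    bf lswh_2
    add r7,r4        ! skip step bytes to reach the next row
    dt r6            ! decrement the row counter
    bf lswh_1
    rts
    nop              ! delay slot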

I'd like to replace calls like this:

memset(SpriteOnScreenMap, 255, 0x10000);
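
For reference, the assembly routine above can be read as the following C. This is a minimal sketch that assumes the parameter meanings given in the routine's own comment (`step` is the map width minus the brush width). The name `memclrwh_ref` is hypothetical, and unlike the assembly, which decrements its counters before testing them, this version simply does nothing when `width` or `height` is zero.

```c
#include <stddef.h>

/* Hypothetical C reference for the rectangle clear: zero `width` bytes on
   each of `height` rows, skipping `step` bytes between rows. */
void memclrwh_ref(unsigned char *buf, long width, long height, long step)
{
    for (long y = 0; y < height; y++) {
        for (long x = 0; x < width; x++)
            *buf++ = 0;      /* one byte per store, as in the assembly */
        buf += step;         /* skip to the start of the next row */
    }
}
```

A version like this is mostly useful as a known-good baseline to compare a hand-written routine against.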
 

mrkotfw

Member
Joined
Dec 30, 2002
Messages
838
Points
28
Age
31
I'm not sure that's as fast as you think it is. It writes one byte at a time and makes no attempt to eliminate even some trivial pipeline stalls. Newlib's implementation might be faster.
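
To illustrate the point about store width, here is a minimal C sketch of a fill that writes 32 bits at a time once the destination is aligned. `fill_words` is a hypothetical name, not Newlib's implementation, and whether it beats the library `memset` on a given target has to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word-wide fill with the same contract as memset(dst, value, n).
   Assumes the buffer may legitimately be accessed through uint32_t, as is
   usual for memset-style routines in low-level code. */
void *fill_words(void *dst, int value, size_t n)
{
    unsigned char *p = dst;
    unsigned char b = (unsigned char)value;

    /* Head: single bytes until p is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 3u) != 0) {
        *p++ = b;
        n--;
    }

    /* Body: replicate the byte into a 32-bit pattern, one store per 4 bytes. */
    uint32_t pattern = b * 0x01010101u;
    uint32_t *w = (uint32_t *)p;
    for (; n >= 4; n -= 4)
        *w++ = pattern;

    /* Tail: the remaining 0..3 bytes. */
    p = (unsigned char *)w;
    while (n > 0) {
        *p++ = b;
        n--;
    }

    return dst;
}
```

For a call such as `memset(SpriteOnScreenMap, 255, 0x10000)`, the head and tail loops would do nothing if the buffer is 4-byte aligned, since the length is a multiple of 4.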
 

antime

Extra Hard Mid Boss
Joined
Jan 24, 2002
Messages
2,577
Points
48
Website
www.iki.fi
If possible, use DMA. If you have to use CPU transfers, try the existing library code first. Smart people have spent a lot of time optimizing it. Other than that, write in larger chunks to reduce the amount of time spent waiting on the (S)DRAM. Use all the normal optimizing tricks, like unrolling your loops and taking advantage of the instruction set where possible. Also test on real hardware if you can; emulators rarely take things like memory wait states or pipeline stalls into account.

A naive attempt at a 2-way unrolled version (still writing only a single byte at a time):

Code:
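    .align  4
_memclrwh:                  ! r4 = buf, r5 = width, r6 = height, r7 = row advance
    mov     #0, r1          ! r1 = byte to store (0)
    cmp/eq  r1, r5
    bt      leave           ! nothing to do if width == 0
    cmp/eq  r1, r6
    bt      leave           ! nothing to do if height == 0
    mov     r5, r0          ! tst #imm only operates on r0
    mov     r4, r3
    add     #1, r3          ! r3 = row start + 1 (loop bound)
    add     r5, r4          ! r4 = row end (one past the last byte)
outer:
    tst     #1, r0          ! T = 1 if width is even
    bf/s    odd             ! odd width: enter the loop at the second store
    mov     r4, r2          ! (delay slot) r2 = write pointer, starts at row end
inner:
    mov.b   r1, @-r2        ! first store of the pair
odd:
    cmp/hi  r3, r2          ! T = 1 if bytes remain beyond the next store
    bt/s    inner
    mov.b   r1, @-r2        ! (delay slot) second store of the pair
    ! Note: r7 is added directly to the row start and row end here, so it
    ! acts as the distance between row starts, not map width - brush width.
    add     r7, r3
    dt      r6              ! decrement the row counter
    bf/s    outer
    add     r7, r4          ! (delay slot)
leave:
    rts
    nop                     ! delay slot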
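
The loop structure of the routine above may be easier to follow in C. The sketch below is a rough model of a single row, not a literal translation: the assembly handles an odd width by jumping into the middle of the loop body, while this version peels one store off before the loop. The name `clear_row_unrolled2` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical C model of one row of the 2-way unrolled clear: two byte
   stores per loop iteration, working backwards from the end of the row. */
void clear_row_unrolled2(unsigned char *row, size_t width)
{
    unsigned char *p = row + width;   /* one past the last byte */

    if (width == 0)
        return;
    if (width & 1)                    /* odd width: one store up front */
        *--p = 0;
    while (p > row) {                 /* the remaining count is even */
        *--p = 0;
        *--p = 0;
    }
}
```

The saving over the one-store-per-iteration loop is that the compare and branch run once per two bytes instead of once per byte.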
 