Author Topic: Prizm Useful Routines -- post here! (Read 33133 times)

calc84maniac · « **Reply #15 on:** May 26, 2011, 06:58:18 pm »

Quote from: z80man on May 26, 2011, 10:36:58 am

Alright, I have a conversion from C to asm. It's GetPixel, and it's as fast as possible and can't be optimized any further. Unless gcc automatically handles the zero extension, which I'll have to check.

Is it just me or does this routine not handle multiplying by 2 to index the array?

z80man · « **Reply #16 on:** May 26, 2011, 07:52:55 pm »

Quote from: calc84maniac on May 26, 2011, 06:58:18 pm

Quote from: z80man on May 26, 2011, 10:36:58 am
Alright, I have a conversion from C to asm. It's GetPixel, and it's as fast as possible and can't be optimized any further. Unless gcc automatically handles the zero extension, which I'll have to check.
Is it just me or does this routine not handle multiplying by 2 to index the array?

That is a very valid concern there. I was just, umm, testing you to see if you would, umm, catch it.

Unless your image buffer used 256 colors (one byte per pixel), it would fail. Routine now fixed. I'll disassemble the results to see how gcc handles inline functions so I can optimize it.

Code:

/* short GetPixel(short x, short y): x in r4, y in r5, result in r0 */
    .global _GetPixel
_GetPixel:
    mov.w   width,r2        /* r2 = screen width of 384 pixels * 2 bytes */
    mov.l   vram_ptr,r3     /* r3 = address of the VRAM_address global */
    mulu.w  r2,r5           /* MACL = y * 768, unsigned 16-bit multiplication */
    mov.l   @r3,r3          /* r3 = VRAM base, usually 0xA8000000, read from the global just in case */
    shll    r4              /* r4 = x * 2; also fills the slot before the MACL read */
    sts     macl,r2         /* r2 = y * 768 */
    add     r3,r2           /* add the VRAM base to the row offset */
    add     r4,r2           /* add the byte offset of x */
    mov.w   @r2,r0          /* load the pixel into r0; MOV.W sign-extends */
    rts                     /* delayed return; r0 is not read here, so the load can complete */
    extu.w  r0,r0           /* delay slot: clear the sign extension (may be redundant if gcc does it) */
    .balign 4               /* the .long must be 4-byte aligned for the PC-relative mov.l */
vram_ptr:
    .long   _VRAM_address   /* global variable initialized earlier to the VRAM base */
width:
    .word   384*2
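For comparison, here is a plain C sketch of what the routine computes. It assumes a 384-pixel-wide buffer of 16-bit pixels passed in as a pointer; the function names and signature are hypothetical (the thread's routine reads the base from a global instead). The second version spells out the byte arithmetic, which is exactly where the missing multiply by 2 would show up.

```c
#include <stddef.h>
#include <stdint.h>

#define LCD_WIDTH_PX 384  /* Prizm screen width in pixels, as used in the thread */

/* Index-based reference: indexing a uint16_t array scales by 2 bytes automatically. */
static uint16_t get_pixel_ref(const uint16_t *vram, int x, int y)
{
    return vram[(size_t)y * LCD_WIDTH_PX + (size_t)x];
}

/* Byte-based version that mirrors the asm: offset = y * 768 + x * 2. */
static uint16_t get_pixel_bytes(const uint16_t *vram, int x, int y)
{
    const unsigned char *base = (const unsigned char *)vram;
    size_t offset = (size_t)y * (LCD_WIDTH_PX * 2)  /* MULU.W with the 768-byte row width */
                  + (size_t)x * 2;                  /* SHLL: x * 2 */
    return *(const uint16_t *)(base + offset);      /* unsigned result, like EXTU.W */
}
```

Returning an unsigned 16-bit value means the caller never sees the sign extension that MOV.W introduces.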

calc84maniac · « **Reply #17 on:** May 26, 2011, 10:20:40 pm »

I made a little optimization that takes advantage of the move with offset:

Code:

/* short GetPixel(short x, short y): x in r4, y in r5, result in r0 */
    .global _GetPixel
_GetPixel:
    mov.w   width,r2        /* r2 = screen width of 384 pixels * 2 bytes */
    mov.l   vram_ptr,r0     /* r0 = address of the VRAM_address global */
    mulu.w  r2,r5           /* MACL = y * 768, unsigned 16-bit multiplication */
    mov.l   @r0,r0          /* r0 = VRAM base, usually 0xA8000000, read from the global just in case */
    shll    r4              /* r4 = x * 2; also fills the slot before the MACL read */
    sts     macl,r2         /* r2 = y * 768 */
    add     r4,r2           /* r2 = byte offset of (x, y) from the VRAM base */
    mov.w   @(r0,r2),r0     /* load the pixel at VRAM base + offset; MOV.W sign-extends */
    rts                     /* delayed return; r0 is not read here, so the load can complete */
    extu.w  r0,r0           /* delay slot: clear the sign extension (may be redundant if gcc does it) */
    .balign 4               /* the .long must be 4-byte aligned for the PC-relative mov.l */
vram_ptr:
    .long   _VRAM_address   /* global variable initialized earlier to the VRAM base */
width:
    .word   384*2

z80man · « **Reply #18 on:** May 27, 2011, 02:41:10 am »

Nice job catching that there calc84. Sometimes I forget possible optimizations when I don't have the full instruction set in front of me.
btw on the instructions that you replaced, I wanted to know if you're aware of the hidden slowdown when you try to access a register when the previous instruction loaded data from memory into it. The code is fine, but I was just wondering if it was by luck you had it formatted that way or if you knew the whole time, because you seem to be very skilled with SuperH asm.

calc84maniac · « **Reply #19 on:** May 27, 2011, 08:31:16 am »

Quote from: z80man on May 27, 2011, 02:41:10 am

Nice job catching that there calc84. Sometimes I forget possible optimizations when I don't have the full instruction set in front of me.
btw on the instructions that you replaced, I wanted to know if you're aware of the hidden slowdown when you try to access a register when the previous instruction loaded data from memory into it. The code is fine, but I was just wondering if it was by luck you had it formatted that way or if you knew the whole time, because you seem to be very skilled with SuperH asm.

I've gotten pretty used to ARM9 assembly at this point, which has similar memory load and multiplication delays. I studied up on SH3 when we started learning about the Prizm a while back (and personally I like ARM better, but that might just be me)

Edit: just for the lulz, here's how I might do this in ARM assembly:

Code:

GetPixel:
    ldr r2,=VRAM        @buffer = VRAM;
    add r3,r1,r1,lsl #1 @temp = y*3;
    add r3,r0,r3,lsl #7 @temp = x+temp*128;
    add r3,r2,r3,lsl #1 @temp = buffer+temp*2;
    ldrh r0,[r3]        @return *temp;
    bx lr
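The ARM version builds the row offset from shifts and adds, using 384 = 3 * 128. Here is a small C sketch of the same decomposition, assuming a 384-pixel-wide buffer with 2 bytes per pixel (the function name is hypothetical), so it can be checked against a direct multiplication. Note that y*3 is y + (y << 1); a shift of 2 would give y + 4y = 5y instead.

```c
#include <stdint.h>

/* Byte offset of pixel (x, y) in a 384-wide, 2-byte-per-pixel buffer, using only shifts and adds. */
static uint32_t pixel_offset_shifts(uint32_t x, uint32_t y)
{
    uint32_t t = y + (y << 1);  /* t = 3y */
    t = x + (t << 7);           /* t = x + 3y * 128 = x + 384y (pixel index) */
    return t << 1;              /* 2 bytes per pixel */
}
```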

AngelFish · « **Reply #20 on:** June 14, 2011, 06:20:16 pm »

Here are a couple small routines someone might find useful.

Switch register banks:

Code:

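    stc     sr,r8           /* r8 = current status register */
    mov.l   1f,r9           /* r9 = 0x20000000, the SR.RB (register bank) bit */
    or      r9,r8           /* set RB */
    ldc     r8,sr           /* write SR back: r0-r7 now refer to bank 1 (r8/r9 are not banked, so they survive) */
    /* ... code using bank 1 ... */
    /* The literal must sit where execution never falls into it, e.g. after the routine's rts: */
    .balign 4
1:  .long   0x20000000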
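On SH-3 the active register bank is chosen by the RB bit (bit 29) of SR, while MD (bit 30) is the privileged-mode bit, so setting RB does not change the processor mode. Below is a minimal C sketch of the mask arithmetic the snippet performs on the SR value, assuming the literal loaded into r9 is the RB mask. The helper names are hypothetical; on the calculator the value itself has to be read and written with stc/ldc.

```c
#include <stdint.h>

#define SR_RB_BIT (UINT32_C(1) << 29)  /* register bank select, = 0x20000000 */

/* Set RB: r0-r7 refer to bank 1 (what the or r9,r8 step does). */
static uint32_t sr_select_bank1(uint32_t sr) { return sr | SR_RB_BIT; }

/* Clear RB: r0-r7 refer to bank 0 again. */
static uint32_t sr_select_bank0(uint32_t sr) { return sr & ~SR_RB_BIT; }

/* 1 if the SR value selects bank 1, else 0. */
static int sr_active_bank(uint32_t sr) { return (sr & SR_RB_BIT) != 0; }
```

Switching back needs the AND with the inverted mask; OR-ing the bit in again has no effect.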

Two ASCII hex digits -> 1-byte value (r4 and r5 hold the ASCII digits for the high and low nibbles respectively)

Code:

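    mov     #9,r6           /* r6 = 9, the largest decimal digit value */
    add     #-0x30,r4       /* r4 = high digit - '0' */
    cmp/hi  r6,r4           /* T = (r4 > 9), unsigned */
    bf      1f              /* '0'-'9': r4 is already the nibble */
    add     #-7,r4          /* 'A'-'F': 0x41 - 0x30 - 7 = 10 */
1:  shll2   r4
    shll2   r4              /* r4 = high nibble << 4 */
          /* The same steps are repeated (unrolled) for the low digit; a loop would also work. */
    add     #-0x30,r5       /* r5 = low digit - '0' */
    cmp/hi  r6,r5           /* T = (r5 > 9), unsigned */
    bf      2f
    add     #-7,r5
2:  or      r5,r4           /* combine the two nibbles */
    rts
    mov     r4,r0           /* delay slot: the C return value goes in r0 */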
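As a reference for what the routine is meant to compute, here is a C sketch of the same conversion. It assumes uppercase digits only and does no validation of the input; the function names are hypothetical.

```c
#include <stdint.h>

/* Value of one ASCII hex digit: '0'-'9' -> 0-9, 'A'-'F' -> 10-15 (uppercase only). */
static uint8_t hex_digit_value(char c)
{
    uint8_t v = (uint8_t)(c - '0');
    if (v > 9)      /* 'A' (0x41) is 7 codes past '9' + 1 (0x3A) */
        v -= 7;
    return v;
}

/* Two ASCII hex digits -> one byte, high nibble first. */
static uint8_t hex_pair_to_byte(char hi, char lo)
{
    return (uint8_t)((hex_digit_value(hi) << 4) | hex_digit_value(lo));
}
```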

z80man · « **Reply #21 on:** June 15, 2011, 01:48:51 am »

Just as a note, you might want to keep your routines C-compliant in their register usage, so that other C coders can embed them in their programs. For example, in the first routine make sure you push r8 and r9 on the stack beforehand, and in the second routine remember that arguments are passed in r4-r7 and then on the stack, and return values come back in r0 (r0 and r1 for 64-bit values). There are a few exceptions when it comes to structs, but for the most part it is pretty general. Leaving the routines the way they are now will force C coders to modify them, which they may not be experienced enough to do.

AngelFish · « **Reply #22 on:** June 15, 2011, 01:57:54 am »

Well, the first routine is for ASM coders to access the banked registers that GCC doesn't touch, so it wouldn't really be of much use to C coders (except to crash their code). The second routine is quite simple to fix: just add "mov r4,r0" to the end of the code, in the delay slot of rts.

EDIT: changed.

z80man · « **Reply #23 on:** June 15, 2011, 02:02:05 am »

So it appears the first routine switches the mode from privileged to user. I do have one question: how would you get back to privileged mode if the instructions that access SR are disabled?

AngelFish · « **Reply #24 on:** June 15, 2011, 02:06:19 am »

No, it switches the register bank that's currently swapped in. The processor has two register banks in privileged mode: regular and banked (which map to r0-r7). GCC never switches to the banked set, so it's free for the ASM programmer to mess with. I doubt the OS uses it much either, with the probable exception of error handling.

Needless to say, I love abusing the banked registers

z80man · « **Reply #25 on:** June 15, 2011, 02:11:05 pm »

Quote from: Qwerty.55 on June 15, 2011, 02:06:19 am

No, it switches the register bank that's currently swapped in. The processor has two register banks in privileged mode: regular and banked (which map to r0-r7). GCC never switches to the banked set, so it's free for the ASM programmer to mess with. I doubt the OS uses it much either, with the probable exception of error handling.

Needless to say, I love abusing the banked registers

That could be useful for the ex based instructions on my 83+ emulator. I currently store the z80 registers in the r8-r15 range so I would just have to preserve the non-shadowed registers.

Ashbad · « **Reply #26 on:** September 06, 2011, 08:31:28 pm »

Here are a few routines I'm using, thanks to Qwerty for giving me some input on the random ones (made because rand and srand didn't work for me due to linking errors):

Code:

int GetKeyNB() { 
    int key; 
    key = PRGM_GetKey(); 
    if (key == KEY_PRGM_MENU) 
		GetKey(&key);
    return key; 
}

void GetKeyHold(int key) {
	while(GetKeyNB() != key) { }
	while(GetKeyNB() != KEY_PRGM_NONE) { }

}

void GetKeyWaitNone() {
	while(GetKeyNB() != KEY_PRGM_NONE) { }	
}

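unsigned short RandomShort() {
	unsigned short retshort = 0;
	int *cur_stack = 0;
	int cur_stackv = 0;
	for(int i = 0; i < 16; i++) {                  /* one pass per output bit */
		retshort = retshort << 1;
		cur_stack = GetStackPtr();
		/* Read a stack word at a tick-dependent offset; the mask keeps the read within 256 bytes of the stack pointer. */
		cur_stackv = cur_stack[RTC_GetTicks() & 0x3F];
		retshort = retshort | (cur_stackv & 1);      /* keep only the lowest bit */
	}
	return retshort;
}

unsigned int RandomInt() {
	unsigned int retint = 0;
	int *cur_stack = 0;
	int cur_stackv = 0;
	for(int i = 0; i < 32; i++) {                  /* one pass per output bit */
		retint = retint << 1;
		cur_stack = GetStackPtr();
		/* Read a stack word at a tick-dependent offset; the mask keeps the read within 256 bytes of the stack pointer. */
		cur_stackv = cur_stack[RTC_GetTicks() & 0x3F];
		retint = retint | (cur_stackv & 1);          /* keep only the lowest bit */
	}
	return retint;
}

unsigned char RandomChar() {
	unsigned char retchar = 0;
	int *cur_stack = 0;
	int cur_stackv = 0;
	for(int i = 0; i < 8; i++) {                   /* one pass per output bit */
		retchar = retchar << 1;
		cur_stack = GetStackPtr();
		/* Read a stack word at a tick-dependent offset; the mask keeps the read within 256 bytes of the stack pointer. */
		cur_stackv = cur_stack[RTC_GetTicks() & 0x3F];
		retchar = retchar | (cur_stackv & 1);        /* keep only the lowest bit */
	}
	return retchar;
}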

z80man · « **Reply #27 on:** September 07, 2011, 12:54:43 am »

You might want to run a test on those random routines due to a possible issue with the RTC. The problem is that the RTC ticks only 64 times a second, so if you're calling the same routine multiple times in quick succession there may be some repetition. In Simon's random routine he used a static seed to contribute to the result, which increased randomness.
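A minimal sketch of the static-seed idea, assuming a plain linear congruential generator: the state persists between calls, so several calls within one RTC tick still return different values. The constants 0x41C64E6D (1103515245) and 0x3039 (12345) are the ones from the sample rand() in the C standard, and the same pair appears in Ashbad's routine later in the thread; this is not Simon's routine, and the function names are hypothetical.

```c
#include <stdint.h>

/* Generator state that survives between calls (the "static seed"). */
static uint32_t rng_state = 1;

/* Seed once, e.g. with RTC_GetTicks() at program start. */
static void rng_seed(uint32_t seed) { rng_state = seed; }

static uint16_t rng_next(void)
{
    /* LCG step; uint32_t arithmetic wraps modulo 2^32. */
    rng_state = rng_state * UINT32_C(0x41C64E6D) + UINT32_C(0x3039);
    return (uint16_t)(rng_state >> 16);  /* the high bits of an LCG are the best mixed */
}
```

With a seed of 1 the first value is 16838, the same first value the C standard's sample rand() produces.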

SimonLothar · « **Reply #28 on:** September 07, 2011, 12:42:13 pm »

Quote from: z80man on September 07, 2011, 12:54:43 am

In Simon's random routine he used a static seed to contribute to the result which increased randomness.

I did not invent the random routine you referred to. It uses the same algorithm as the one in the old CASIO SDK or the Hitachi compiler's libraries (I only translated it to C). Therefore I think you are right. The randomness needs to be well balanced.

Ashbad · « **Reply #29 on:** September 08, 2011, 06:55:22 pm »

So, here I finally made a new routine that seems *really* random. When I used it directly in my map-generating routine (taking the result of this routine seeded with 0, mod 6), the tile output showed no patterns whatsoever. However, I got there by letting the RTC timer run for a *long*, *random* time between calls. Anyway, here is the routine on its own; it can even seed itself or be seeded with RTC_GetTicks:

Code:

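unsigned short random(int extra_seed) {
	unsigned int seed = 0;
	unsigned int seed2 = 0;
	/* First seed: one bit from the tick counter per iteration. */
	for(int i = 0; i < 32; i++) {
		seed <<= 1;
		seed |= (RTC_GetTicks() % 2);
	}
	/* Second, "darker" seed: four low bits OR-ed in before each shift. */
	for(int i = 0; i < 32; i++) {
		seed2 <<= 1;
		seed2 |= (RTC_GetTicks() % 16);
	}
	seed ^= seed2;
	/* LCG-style scrambling; unsigned arithmetic wraps instead of overflowing. */
	seed = (0x41C64E6Du * seed) + 0x3039u;
	seed2 = (0x1045924Au * seed2) + 0x5023u;
	unsigned int extra = extra_seed ? (0xF201B35Cu * (unsigned int)extra_seed) + 0xD018u : 0;
	seed ^= seed2;
	seed ^= extra;                         /* XOR with 0 leaves seed unchanged when no extra seed is given */
	return ((seed >> 16) ^ seed) >> 16;    /* for an unsigned seed this equals seed >> 16 */
}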

It basically gets one bit from the timer at a time for the first seed; for the second, "darker" seed (higher chance of set bits), each bit gets 4 opportunities to be set to 1. It then goes through the standard wacky operations to fill in the bitfield better, XORs the available seeds together, and returns the result as a short. Very loosely based on Simon's routine.
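The two sampling loops can be tested off-calculator if the tick source is passed in. This is a sketch under that assumption: `fake_ticks` is a hypothetical stand-in for RTC_GetTicks() that just counts up, and `gather_bits` is a hypothetical name for the loop shape used in the routine.

```c
#include <stdint.h>

/* Hypothetical stand-in for RTC_GetTicks(): returns 0, 1, 2, ... */
static int fake_tick_counter = 0;
static int fake_ticks(void) { return fake_tick_counter++; }

/* Shift in `count` samples, OR-ing (sample % modulus) into the low bits each time.
   modulus 2:  one bit per sample (the first seed);
   modulus 16: four low bits OR-ed in before each shift, so a bit position can be
               set by any of up to four consecutive samples (the "darker" seed). */
static uint32_t gather_bits(int (*ticks)(void), int count, int modulus)
{
    uint32_t seed = 0;
    for (int i = 0; i < count; i++) {
        seed <<= 1;
        seed |= (uint32_t)(ticks() % modulus);
    }
    return seed;
}
```

With a real 64 Hz counter sampled in a tight loop, most samples are identical, which is why the routine depends so much on how often it is called.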

It's random *depending on how it's used*. If you call it only every now and then, it will *surely* be random, especially if you seed it with RTC_GetTicks or another varying number (even with a seed of 0 it will still be random, though). In loops, randomness is restored by slowing the loop down, trading speed for randomness:

Code:

	for(int i = 0; i < 3600; i++) {
		map[i].natural_cell = random(random(RTC_GetTicks())) % 6;
		OS_InnerWait_ms(random(0) % 64);
	}

As you see above, the actual random value is seeded by another random value (itself seeded with RTC_GetTicks), which worked really well. For OS_InnerWait_ms, I have it wait for an unseeded random time mod 64, so it won't wait very long, and since the RTC timer increments 64 times a second, that should usually be long enough for it to increment.
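The "usually long enough" part can be made concrete. Assuming the 64 ticks-per-second rate quoted in the thread (one tick every 15.625 ms), a wait of `ms` milliseconds always spans at least floor(ms * 64 / 1000) tick boundaries. A small sketch (the function name is hypothetical):

```c
/* Minimum number of RTC ticks guaranteed to pass during a wait of `ms` milliseconds,
   assuming a 64 Hz tick (15.625 ms per tick). Integer division rounds down. */
static int min_ticks_elapsed(int ms)
{
    return ms * 64 / 1000;
}
```

So with random(0) % 64, waits of 0 to 15 ms do not guarantee a new tick, while the longest wait (63 ms) guarantees at least 4.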

All because of a rand() and srand() linking error

fun night spent.
