Messages - Xeda112358

Pages: 1 ... 55 56 [57] 58 59 ... 317

841

ASM / Re: DivAHLby10 Routine Check

« on: July 06, 2013, 07:27:33 am »

Cool! If you take the input in EHL, then the output will be A as the remainder and EHL as the quotient:

Code: [Select]

DivAHLby10:
;Inputs:  EHL
;Outputs: EHL is the quotient, A is the remainder
 ld bc,$180a
 sub a
DAHLLoop1:
 add hl,hl
 rl e
 rla
 cp c
 jr c,DAHLLoop2
 sub c
 inc l
DAHLLoop2:
 djnz DAHLLoop1
 ret

That saves only 3 bytes and 12 cycles. If you want to squeeze a little more speed out of the routine without fully unrolling it, you can unroll the first 3 iterations, since a 3-bit remainder can never be >= 10:

Code: [Select]

DivAHLby10:
;Inputs:  EHL
;Outputs: EHL is the quotient, A is the remainder
 ld bc,$150a
 sub a
 add hl,hl \ rl e \ rla
 add hl,hl \ rl e \ rla
 add hl,hl \ rl e \ rla
DAHLLoop1:
 add hl,hl
 rl e
 rla
 cp c
 jr c,DAHLLoop2
 sub c
 inc l
DAHLLoop2:
 djnz DAHLLoop1
 ret

The cost is 12 bytes and you save only 87 cycles. I am trying to think of a better approach to get speed out of this.

EDIT: This routine gets a minimum of 966 t-states, an average of 984.5, and a max of 1002, making it almost 300 t-states faster at its slowest than the previous routine at its fastest. The downside is that it is 35 bytes, compared to the 15 it could be:

Code: [Select]

DivEHLby10:
;Inputs:
;     EHL
;Outputs:
;     EHL is the quotient
;     A is the remainder
;     D is not changed
;     BC is 10

 ld bc,$050a
 sub a

 sla e \ rla
 sla e \ rla
 sla e \ rla

 sla e \ rla
 cp c
 jr c,$+4
   sub c
   inc e
 djnz $-8

 ld b,16

 add hl,hl
 rla
 cp c
 jr c,$+4
 sub c
 inc l
 djnz $-7
 ret
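
For checking the routine off-calculator, here is a small Python model of the same shift-and-subtract loop. It is only a sketch: the function name is made up, EHL is treated as one 24-bit integer, and the three unrolled steps are folded into the main loop (they never trigger the subtraction, because A is at most 7 after three shifts).

```python
def div_ehl_by_10(ehl):
    """Model of the loop: returns (quotient, remainder) for a 24-bit EHL."""
    assert 0 <= ehl < 1 << 24
    a = 0                            # sub a
    for _ in range(24):              # 3 unrolled + 5 + 16 steps in the routine
        carry = ehl >> 23            # bit shifted out of the top of EHL
        ehl = (ehl << 1) & 0xFFFFFF  # sla e / add hl,hl
        a = (a << 1) | carry         # rla
        if a >= 10:                  # cp c / jr c
            a -= 10                  # sub c
            ehl |= 1                 # inc e / inc l sets the new quotient bit
    return ehl, a


print(div_ehl_by_10(123456))         # (12345, 6)
```

Comparing against Python's `divmod(n, 10)` over a range of inputs is a cheap way to catch a wrong loop count or a swapped register before testing on hardware.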

842

ASM / Re: DivAHLby10 Routine Check

« on: July 06, 2013, 07:10:21 am »

Because you are using add hl,hl, bit 0 of HL is always 0 by the time you get to that, so you should be fine (as verified by Jacobly). You might be able to get better speed by doing this, though:

Code: [Select]

DivAHLby10:
 ld d,a
 ld bc,$180a
 sub a
DAHLLoop1:
 add hl,hl
 rl d
 rla
 cp c
 jr c,DAHLLoop2
 sub c
 inc l
DAHLLoop2:
 djnz DAHLLoop1
 ld e,a
 ld a,d
 ret

E is the remainder, AHL is the quotient. It is 4 bytes smaller and 262 t-states faster.

843

TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines

« on: July 05, 2013, 10:17:16 pm »

That is awesome! Is that 6MHz? Also, what other kinds of math routines do you have in there? ^_^

844

TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines

« on: July 05, 2013, 12:00:10 pm »

I see, A basically works as a 16-bit integer where the upper 9 bits are all the same. I have this:

Code: [Select]

     ld hl,0
     or a
     jp p,$+5
       sbc hl,de
     ld b,8
mulloop:
     add hl,hl
     rla
     jr nc,$+5
       add hl,de
       adc a,0
     djnz mulloop
     ret

That treats A as a signed integer, HL as an unsigned integer. I hope that works!
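
A Python model of that loop can be used to sanity-check it. This is a sketch under a few assumptions: the 16-bit factor is in DE on entry (which is how the snippet reads it), the result is taken as the 24-bit value in AHL, and the function name is hypothetical.

```python
def signed_a_times_de(a, de):
    """Model of the loop: 24-bit AHL for signed 8-bit A times unsigned 16-bit DE."""
    hl = 0
    if a & 0x80:                         # 'jp p' not taken: A is negative
        hl = (hl - de) & 0xFFFF          # sbc hl,de (carry was cleared by 'or a')
    for _ in range(8):
        carry = hl >> 15
        hl = (hl << 1) & 0xFFFF          # add hl,hl
        top = a >> 7
        a = ((a << 1) | carry) & 0xFF    # rla
        if top:                          # 'jr nc' falls through when a 1 was shifted out
            hl += de                     # add hl,de
            a = (a + (hl >> 16)) & 0xFF  # adc a,0
            hl &= 0xFFFF
    return (a << 16) | hl


print(hex(signed_a_times_de(0xFF, 1)))   # 0xffffff, i.e. -1 as a 24-bit value
```

Starting HL at -DE for a negative A is what supplies the -256*DE correction once it has been shifted left eight times.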

845

TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines

« on: July 05, 2013, 11:31:39 am »

Well, multiplication is always signed, regardless. Division is the only one that you need to do a specific routine for the sign. What inputs/outputs are you expecting, though, when you use it? (just some numbers, so I can figure out what you are looking for)
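
One way to read that claim, sketched in Python rather than Z80: if you only keep as many bits as the inputs have, the product bits are the same whether the inputs are read as signed or unsigned two's-complement values. The helper names are made up for illustration.

```python
def to_signed16(x):
    """Reinterpret a 16-bit pattern as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def low16_product(x, y):
    """Keep only the low 16 bits of a product."""
    return (x * y) & 0xFFFF


a, b = 0xFFFE, 0x0003                     # -2 and 3 when read as signed
unsigned_low = low16_product(a, b)
signed_low = low16_product(to_signed16(a), to_signed16(b))
print(hex(unsigned_low), hex(signed_low))  # 0xfffa 0xfffa
```

The upper half of a full-width product does differ between the two readings, so this only applies when the result is truncated to the input width.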

846

TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines

« on: July 05, 2013, 08:10:51 am »

A lot of users would probably have pre-built images to use, so you should have a way so that users can pass a pointer to the image data (in the format of your stack) to have it rendered. Also, HL_Times_A:

Code: [Select]

HL_Times_A:
     ex de,hl
DE_Times_A:
;Inputs:
;     DE and A are factors
;Outputs:
;     AHL is the product
;     B is 0
;     C is not changed
;     DE is not changed
;Time:
;     342+13x
;
     ld b,8          ;7           7
     ld hl,0         ;10         10
aaa:
       add hl,hl     ;11*8       88
       adc a,a       ;4*8        32
       jr nc,rrr     ;(12|25)*8  96+13x
         add hl,de   ;--         --
         adc a,0
rrr:
       djnz aaa      ;13*7+8     99
     ret             ;10         10

I feel like there is a much better way to do this... Also, it returns a 24-bit result. If you only need the lower 16 bits, you can remove 'adc a,0' and change 'adc a,a' to 'rlca', which also preserves A.
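
The 16-bit variant can be modelled in a few lines of Python to see why A survives. This is only a sketch with a hypothetical function name: eight `rlca` steps rotate A all the way around, while each rotated-out bit still decides whether DE is added.

```python
def de_times_a_low16(de, a):
    """Model of the 16-bit variant: 'adc a,a' becomes 'rlca', 'adc a,0' is removed."""
    hl = 0
    for _ in range(8):                 # ld b,8 ... djnz
        hl = (hl << 1) & 0xFFFF        # add hl,hl
        carry = a >> 7
        a = ((a << 1) | carry) & 0xFF  # rlca: bit 7 goes to both bit 0 and carry
        if carry:                      # 'jr nc,rrr' not taken
            hl = (hl + de) & 0xFFFF    # add hl,de
    return hl, a


print(de_times_a_low16(1000, 50))      # (50000, 50)
```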

847

TI Z80 / Re: FileSyst

« on: July 05, 2013, 07:55:20 am »

You could use a .C3 extension since Celtic 3 supports all of those. An extension of .C3 will look for DCS7 first, since DCS7 has the most complete and bug-free [citation needed] version, else it will look for Celtic 3, and if that isn't available, then it won't execute.

EDIT:Also, thanks DrDnar!

848

TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines

« on: July 04, 2013, 07:23:30 pm »

If you only need 8-bit multiplication, I recently wrote my new personal best for speed and size:

Code: [Select]

H_Times_E:
;Inputs:
;     H,E
;Outputs:
;     HL is the product
;     D,B are 0
;     A,E,C are preserved
;Size:  12 bytes
;Speed: 311+6b, b is the number of bits set in the input H
;      average is 335 cycles
;      max required is 359 cycles
     ld d,0     ;1600    7      7
     ld l,d     ;6A      4      4
     ld b,8     ;0608    7      7
                ;            
     add hl,hl  ;29      11*8   88
     jr nc,$+3  ;3001 12*8-5b   96-5b
     add hl,de  ;19      11*b   11b
     djnz $-4   ;10FA  13*8-5   99
                ;            
     ret        ;C9      10     10

And the unrolled code isn't too large, either, so you can get away with a ridiculously fast routine:

Code: [Select]

H_Times_E:
;Inputs:
;     H,E
;Outputs:
;     HL is the product
;     D,B are 0
;     A,E,C are preserved
;Size:  36 bytes
;Speed: 191+6b+9p-7s, b is the number of bits set in the input H, p is bit 0 of H, s is bit 7 of H
;   average is 216 cycles (119 cycles saved)
;   max required is 242 cycles (117 cycles saved)
     ld d,0      ;1600   7   7
     ld l,d      ;6A     4   4
           ;      
     sla h      ;CB24    8
     jr nc,$+3   ;3001  12-1b
     ld l,e       ;6B    --

     add hl,hl   ;29    11
     jr nc,$+3   ;3001  12+6b
     add hl,de   ;19    --

     add hl,hl   ;29    11
     jr nc,$+3   ;3001  12+6b
     add hl,de   ;19    --

     add hl,hl   ;29    11
     jr nc,$+3   ;3001  12+6b
     add hl,de   ;19    --

     add hl,hl   ;29    11
     jr nc,$+3   ;3001  12+6b
     add hl,de   ;19    --

     add hl,hl   ;29    11
     jr nc,$+3   ;3001  12+6b
     add hl,de   ;19    --

     add hl,hl   ;29    11
     jr nc,$+3   ;3001  12+6b
     add hl,de   ;19    --

     add hl,hl   ;29   11
     ret nc      ;D0   11+15p
     add hl,de   ;19   --
     ret         ;C9   --

Also, it returns a 16-bit result that you can work with to do whatever.

EDIT: Simple optimisation in the unrolled loop >.>
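
The 311+6b figure for the looped version can be rechecked by adding up the per-instruction timings from its comments. The Python below is just that bookkeeping, with a hypothetical function name and the usual Z80 t-state counts; it does not emulate the routine.

```python
def h_times_e_cycles(h):
    """T-states for the looped H_Times_E, summed instruction by instruction."""
    cycles = 7 + 4 + 7              # ld d,0 / ld l,d / ld b,8
    for bit in range(7, -1, -1):    # add hl,hl shifts bit 7 of H out first
        cycles += 11                # add hl,hl
        if (h >> bit) & 1:
            cycles += 7 + 11        # jr nc not taken, then add hl,de
        else:
            cycles += 12            # jr nc taken
        cycles += 13 if bit else 8  # djnz: 13 when looping, 8 on the last pass
    return cycles + 10              # ret


counts = [h_times_e_cycles(h) for h in range(256)]
print(min(counts), sum(counts) / 256, max(counts))  # 311 335.0 359
```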

849

TI Z80 / Re: FileSyst

« on: July 04, 2013, 07:17:03 pm »

The problem is that I still don't understand the Flash and USB protocols. I am not sure, but I think SirCmpwn released a template for an operating system, so I could probably use that.

850

TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines

« on: July 04, 2013, 01:09:20 pm »

Yay! Yeah, for some values, it is almost 3 times faster than the routine I gave you originally. What are your typical numbers for HL?

851

TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines

« on: July 04, 2013, 12:38:15 pm »

Okay, because the only problem area that I could find was pointed out by Jacobly earlier (if HL=8000h, it will return a wrong result that is the negative of the real answer). The fix is simple:

Code: [Select]

;===============================================================
HL_Div_BC_Signed:
;===============================================================
;Performs HL/BC
;Speed: 1368-55a-2b
;         b is the number of set bits in the result
;         a is the number of leading zeroes in the absolute value of HL, minus 1
;         add 24 if HL is negative
;         add 19 if BC is negative
;         add 28 if the result is negative 
;Size:    68 bytes
;Inputs:
;     HL is the numerator
;     BC is the denominator
;Outputs:
;     DE is the quotient
;     HL is the remainder
;     BC is the absolute value of the denominator
;Destroys:
;     A
;===============================================================
     ld a,h
     xor b
     push af
;absHL
     xor b
     jp p,$+9
     xor a \ sub l \ ld l,a
     sbc a,a \ sub h \ ld h,a
;absBC:
     bit 7,b
     jr z,$+8
     xor a \ sub c \ ld c,a
     sbc a,a \ sub b \ ld b,a

     ld de,0
     ld a,h \ or l     ;zero test that does not read the carry flag
     jr z,EndSDiv      ;(the negations above can leave carry set, e.g. HL=FF00h)
     ld a,17           ;17 minus the shifts below = significant bits in HL

     add hl,hl
     dec a
     jp nc,$-2
     ex de,hl
     jp jumpin
Loop1:
     add hl,bc     ;--
Loop2:
     dec a         ;4
     jr z,EndSDiv  ;12|23
     sla e         ;--
     rl d          ;--
jumpin:            ;
     adc hl,hl     ;15
     sbc hl,bc     ;15
     jr c,Loop1    ;23-2b     ;b is the number of bits in the absolute value of the result.
     inc e         ;--
     jp Loop2      ;--
EndSDiv:
     pop af \ ret p
     xor a \ sub e \ ld e,a
     sbc a,a \ sub d \ ld d,a
     ret

Remember that HL and BC are the inputs, DE is the output (HL is the remainder).
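
To make the sign handling concrete, here is a Python model of the arithmetic the routine is meant to perform. It is a sketch, not an emulation: the function names are hypothetical, the leading-zero skipping is left out because it only affects speed, and the remainder is the one left over from dividing the absolute values.

```python
def to_signed16(x):
    """Reinterpret a 16-bit pattern as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def hl_div_bc_signed(hl, bc):
    """Return (DE, HL) = (quotient, remainder) as 16-bit patterns; bc must be nonzero."""
    negative = bool((hl ^ bc) & 0x8000)   # ld a,h / xor b / push af
    n = abs(to_signed16(hl))              # absHL (8000h becomes 32768)
    d = abs(to_signed16(bc))              # absBC
    q, r = 0, 0
    for bit in range(15, -1, -1):         # restoring division, one bit per step
        r = (r << 1) | ((n >> bit) & 1)   # adc hl,hl
        q <<= 1                           # sla e / rl d
        if r >= d:                        # sbc hl,bc with no borrow
            r -= d
            q |= 1                        # inc e
    if negative:
        q = -q                            # negate DE at EndSDiv
    return q & 0xFFFF, r & 0xFFFF


print([hex(v) for v in hl_div_bc_signed(0x8000, 3)])  # ['0xd556', '0x2']
```

The HL=8000h case is worth keeping in any test set, since its absolute value does not fit in a signed 16-bit register.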

852

TI Z80 / Re: CopyProg

« on: July 04, 2013, 11:54:45 am »

Okay, now let's try this. I figured out the problem and even a rookie wouldn't have made this mistake! I forgot to modify HL after crossing a page boundary.
Xeda112358 has shame.
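
For anyone hitting the same bug, here is a rough Python sketch of the pointer fix. It assumes the data is read through a 16 KB page window at 4000h to 7FFFh, which is the usual arrangement for paged flash on these calculators. The actual CopyProg code is not shown in this post, so the names and constants below are illustrative only.

```python
PAGE_START = 0x4000   # assumed start of the page window
PAGE_END = 0x8000     # assumed first address past the window


def next_byte_address(page, hl):
    """Advance a (page, HL) pointer by one byte, wrapping into the next page."""
    hl += 1
    if hl == PAGE_END:
        page += 1
        hl = PAGE_START   # the step that is easy to forget
    return page, hl


print(next_byte_address(5, 0x7FFF))   # (6, 16384), i.e. page 6 at 4000h
```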

853

TI Z80 / Re: FileSyst

« on: July 04, 2013, 11:15:30 am »

Hey! I was trying to come up with ideas for things that I could add today. I added in a command called OSNEW() that creates an OS variable with an optional size argument. I was thinking of adding in a complicated method for handling variables specific to FileSyst. Basically, here is the idea and I think it is too complicated and I won't add it:

-Create a special type of hidden folder with the name of the main program being run.
-Have a command to define a subroutine
-This creates a folder with the name of the subroutine, and the relative location in the variable for quick lookup
-This folder will contain the named variables used by the routine, such as floats, ints, and strings.

This means that if I add an ability to SUB(LBL), or whatever, subroutines can have local variables (and possibly access variables in other subroutines). This could also mean that such a language would be slower than TI-BASIC since variables can have custom names and they are located in folders (then again, this could potentially be faster than the OS's VAT lookup).

So yeah, just some food for thought
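
The folder layout described above could be pictured roughly like this. It is purely illustrative: FileSyst does not implement it, and every name in the sketch is invented.

```python
def make_program_folder(program_name):
    """Hidden folder named after the main program being run."""
    return {"name": program_name, "hidden": True, "subs": {}}


def define_sub(folder, sub_name, offset):
    """One folder per subroutine: its relative location plus its own variables."""
    folder["subs"][sub_name] = {"offset": offset, "vars": {}}


def set_local(folder, sub_name, var_name, value):
    folder["subs"][sub_name]["vars"][var_name] = value


def get_local(folder, sub_name, var_name):
    return folder["subs"][sub_name]["vars"][var_name]


prog = make_program_folder("MYPROG")
define_sub(prog, "DRAW", offset=42)
set_local(prog, "DRAW", "score", 3.5)
print(get_local(prog, "DRAW", "score"))   # 3.5
```

Reaching a variable in another subroutine is then just a lookup with a different subroutine name, which is where the speed question against a flat VAT search comes in.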

854

The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!

« on: July 04, 2013, 08:59:58 am »

EDIT: Jacobly pointed out the case HL = 8000h, so this doesn't work

Hopefully this file has the updated SDiv routine. I have this:

Original routine

Code: [Select]

p_SDiv:
	.db __SDivEnd-1-$
	ld	a,h
	xor	d
	push	af
	xor	d
	jp	p,__SDivSkip1-p_SDiv-1
	xor	a
	sub	l
	ld	l,a
	sbc	a,a
	sub	h
	ld	h,a
__SDivSkip1:
	bit	7,d
	jr	z,__SDivSkip2
	xor	a
	sub	e
	ld	e,a
	sbc	a,a
	sub	d
	ld	d,a
__SDivSkip2:
	call	$3F00+sub_Div
x_SDivEntry:
	pop	af
	ret	p
	xor	a
	sub	l
	ld	l,a
	sbc	a,a
	sub	h
	ld	h,a
	ret
__SDivEnd:

Modified routine: same size, 1|6 cycles saved

Code: [Select]

p_SDiv:
	.db __SDivEnd-1-$
	ld	a,h
	xor	d
	push	af
	xor	d
	jp	p,__SDivSkip1-p_SDiv-1
	xor	a
	sub	l
	ld	l,a
	sbc	a,a
	sub	h
	ld	h,a
__SDivSkip1:
	xor	d
	jp	p,__SDivSkip2-p_SDiv-1
	xor	a
	sub	e
	ld	e,a
	sbc	a,a
	sub	d
	ld	d,a
__SDivSkip2:
	call	$3F00+sub_Div
x_SDivEntry:
	pop	af
	ret	p
	xor	a
	sub	l
	ld	l,a
	sbc	a,a
	sub	h
	ld	h,a
	ret
__SDivEnd:

And my only change is the two lines after __SDivSkip1.
Same size, save at least 1 cycle (up to 6 cycles).

EDIT: The same modification can be made to the fixed point signed division routine.
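
The HL = 8000h failure is easy to see in a small Python model of the two sign tests. This is a sketch with hypothetical function names: it models only which branch is taken, on the assumption that A still holds the high byte of the absolute value of HL when the second test runs.

```python
def neg16(x):
    """16-bit two's-complement negation."""
    return (-x) & 0xFFFF


def sign_test_bit7(hl, de):
    """Original test: 'bit 7,d' looks at DE's sign directly."""
    return bool(de & 0x8000)


def sign_test_xor(hl, de):
    """Shortcut: 'xor d' then 'jp p', with A = high byte of abs(HL)."""
    h_abs = (neg16(hl) if hl & 0x8000 else hl) >> 8
    return bool((h_abs ^ (de >> 8)) & 0x80)


print(sign_test_bit7(0x8000, 5), sign_test_xor(0x8000, 5))   # False True
```

Negating 8000h gives 8000h again, so bit 7 of A is still set and the XOR reports the opposite of DE's real sign. Every other value of HL leaves bit 7 of A clear, and the two tests agree.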

855

ASM / Re: ASM Optimized routines

« on: July 04, 2013, 08:46:10 am »

EDIT: Fixed a problem to take care of the case where HL= 8000h (thanks Jacobly!)
This routine a few pages back can be optimised:

Quote from: Quigibo on May 01, 2010, 03:19:23 am

Code: [Select]
SignedDivision:
	ld	a,h
	xor	d
	push	af
	bit	7,h
	jr	z,$+8
	xor	a
	sub	l
	ld	l,a
	sbc	a,a
	sub	h
	ld	h,a
	bit	7,d
	jr	z,$+8
	xor	a
	sub	e
	ld	e,a
	sbc	a,a
	sub	d
	ld	d,a
	call	RegularDivision
	pop	af
	add	a,a
	ret	nc
	xor	a
	sub	l
	ld	l,a
	sbc	a,a
	sub	h
	ld	h,a
	ret

For the sign testing, I came up with this:

Code: [Select]

SignedDivision:
	ld	a,h
	xor	d
	push	af

	xor	d
	jp	p,$+9
	xor	a
	sub	l
	ld	l,a
	sbc	a,a
	sub	h
	ld	h,a

	bit	7,d
	jr	z,$+8
	xor	a
	sub	e
	ld	e,a
	sbc	a,a
	sub	d
	ld	d,a

	call	RegularDivision

	pop	af
	ret	p

	xor	a
	sub	l
	ld	l,a
	sbc	a,a
	sub	h
	ld	h,a
	ret

In all, it saves 1 byte and at least 5 t-states (it will be either 5 or 10).
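
The 5-or-10 figure can be rechecked with a little bookkeeping. The Python below is only an arithmetic sketch using the usual Z80 t-state counts; the names are made up.

```python
# T-states; tuples are (branch taken, branch not taken).
BIT_7_H = 8
JR_Z = (12, 7)
XOR_D = 4
JP_P = (10, 10)    # jp cc costs 10 either way
ADD_A_A = 4


def hl_sign_test_cost(old, hl_negative):
    """Cost of the first sign test; a negative HL means the skip is not taken."""
    branch = 1 if hl_negative else 0
    if old:
        return BIT_7_H + JR_Z[branch]
    return XOR_D + JP_P[branch]


def savings(hl_negative):
    # 'add a,a' disappears because 'ret p' reads the sign flag restored by 'pop af'
    old = hl_sign_test_cost(True, hl_negative)
    new = hl_sign_test_cost(False, hl_negative)
    return old - new + ADD_A_A


print(savings(False), savings(True))   # 10 5
```

The byte saved is the `add a,a`; the first sign test stays the same size, since `xor d` plus `jp p` takes 4 bytes just like `bit 7,h` plus `jr z`.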
