Messages - Xeda112358

841
ASM / Re: DivAHLby10 Routine Check
« on: July 06, 2013, 07:10:21 am »
Because you are using add hl,hl, bit 0 of HL is always 0 by the time you get to that, so you should be fine (as verified by jacobly :P). You might be able to get better speed by doing this, though:
Code:
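DivAHLby10:
;Inputs:  AHL is the 24-bit numerator
;Outputs: AHL is the quotient, E is the remainder
 ld d,a          ;DHL now holds the numerator
 ld bc,$180a     ;B=24 iterations, C=10 (the divisor)
 sub a           ;A is the running remainder
DAHLLoop1:
 add hl,hl       ;shift DHL left, top bit into A
 rl d
 rla
 cp c
 jr c,DAHLLoop2  ;remainder<10, so the quotient bit stays 0
 sub c
 inc l           ;set the quotient bit (bit 0 of L is 0 after add hl,hl)
DAHLLoop2:
 djnz DAHLLoop1
 ld e,a
 ld a,d
 ret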
E is the remainder, AHL is the quotient. It is 4 bytes smaller and 262 t-states faster :)
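
For anyone who wants to check the shift-and-subtract logic off-calculator, here is a rough Python model of the routine above. The function name and the way registers are passed in are my own assumptions for illustration; it mirrors the instructions one by one but does not count cycles.

```python
def div_ahl_by_10(a, hl):
    """Model of DivAHLby10: divide the 24-bit value A:HL by 10.

    Returns (quotient, remainder). Names are illustrative only.
    """
    d = a & 0xFF                 # ld d,a
    hl &= 0xFFFF
    c = 10                       # ld bc,$180a -> B = 24, C = 10
    acc = 0                      # sub a
    for _ in range(24):          # djnz DAHLLoop1
        carry = hl >> 15
        hl = (hl << 1) & 0xFFFF  # add hl,hl (bit 0 of L is now 0)
        next_carry = d >> 7
        d = ((d << 1) | carry) & 0xFF   # rl d
        acc = (acc << 1) | next_carry   # rla (acc was < 10, so it stays < 20)
        if acc >= c:             # cp c / jr c,DAHLLoop2
            acc -= c             # sub c
            hl += 1              # inc l sets the new quotient bit
    return (d << 16) | hl, acc   # ld e,a / ld a,d


if __name__ == "__main__":
    print(div_ahl_by_10(0x01, 0xE240))  # 123456 / 10
```

The quotient bits are shifted into the low end of DHL while the numerator bits leave from the top, which is why a single 24-bit register triple is enough.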

842
TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines
« on: July 05, 2013, 10:17:16 pm »
That is awesome! Is that 6MHz? Also, what other kinds of math routines do you have in there? ^_^

843
TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines
« on: July 05, 2013, 12:00:10 pm »
I see, A basically works as a 16-bit integer where the upper 9 bits are all the same. I have this:
Code:
     ld hl,0
     or a
     jp p,$+5
       sbc hl,de
     ld b,8
mulloop:
     add hl,hl
     rla
     jr nc,$+5
       add hl,de
       adc a,0
     djnz mulloop
     ret

That treats A as a signed integer and the 16-bit factor (DE, as the code is written) as an unsigned integer, with the 24-bit product returned in AHL. I hope that works!
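
As a sanity check, here is a small Python model of that loop, assuming the 16-bit factor is in DE on entry (that is what the listing uses). The function name is made up for the example. The trick is that starting HL at -DE for a negative A ends up subtracting 256*DE from the unsigned product.

```python
def signed_a_times_de(a, de):
    """Model of the signed-A times unsigned-DE loop; returns the 24-bit AHL."""
    a &= 0xFF
    de &= 0xFFFF
    hl = 0                                # ld hl,0
    if a & 0x80:                          # or a / jp p,$+5
        hl = (-de) & 0xFFFF               # sbc hl,de (carry is clear after or a)
    for _ in range(8):                    # ld b,8 ... djnz mulloop
        carry = hl >> 15
        hl = (hl << 1) & 0xFFFF           # add hl,hl
        bit_out = a >> 7
        a = ((a << 1) | carry) & 0xFF     # rla
        if bit_out:                       # jr nc,$+5
            hl += de                      # add hl,de
            a = (a + (hl >> 16)) & 0xFF   # adc a,0
            hl &= 0xFFFF
    return (a << 16) | hl


if __name__ == "__main__":
    print(hex(signed_a_times_de(0xFF, 1)))  # -1 * 1 as a 24-bit value
```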

844
TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines
« on: July 05, 2013, 11:31:39 am »
Well, the low bits of a product are the same whether the inputs are signed or unsigned, so a multiplication whose result is the same size as its inputs needs no separate signed version. Division is the only one that needs a sign-specific routine. What inputs/outputs are you expecting, though, when you use it? (just some numbers, so I can figure out what you are looking for)
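
The point about multiplication can be checked numerically. This is just a quick Python sketch (the helper names are mine): in two's complement, the low 16 bits of a 16x16 product do not depend on how the inputs are read, while the upper bits do.

```python
def to_signed16(x):
    """Reinterpret a 16-bit pattern as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def product(a, b, signed, bits):
    """Product of two 16-bit patterns, truncated to the given number of bits."""
    if signed:
        a, b = to_signed16(a), to_signed16(b)
    else:
        a, b = a & 0xFFFF, b & 0xFFFF
    return (a * b) & ((1 << bits) - 1)


if __name__ == "__main__":
    # 0xFFFF is -1 signed or 65535 unsigned
    print(hex(product(0xFFFF, 3, True, 16)), hex(product(0xFFFF, 3, False, 16)))
    print(hex(product(0xFFFF, 3, True, 32)), hex(product(0xFFFF, 3, False, 32)))
```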

845
TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines
« on: July 05, 2013, 08:10:51 am »
A lot of users would probably have pre-built images to use, so you should provide a way for users to pass a pointer to the image data (in the format of your stack) to have it rendered. Also, HL_Times_A:
Code:
HL_Times_A:
     ex de,hl
DE_Times_A:
;Inputs:
;     DE and A are factors
;Outputs:
;     AHL is the product
;     B is 0
;     C is not changed
;     DE is not changed
;Time:
;     342+13x
;
     ld b,8          ;7           7
     ld hl,0         ;10         10
aaa:
       add hl,hl     ;11*8       88
       adc a,a       ;4*8        32
       jr nc,rrr     ;(12|25)*8  96+13x
         add hl,de   ;--         --
         adc a,0
rrr:
       djnz aaa      ;13*7+8     99
     ret             ;10         10
I feel like there is a much better way to do this... Also, it returns a 24-bit result. If you only need the lower 16 bits, you can remove 'adc a,0' and change 'adc a,a' to 'rlca' to preserve A (eight rotations bring A back to its original value).
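
Here is a rough Python model of both variants, in case it helps to see the difference. The function names are invented for the example; the second one is the 16-bit variant with 'rlca', where A only supplies the multiplier bits and comes back unchanged.

```python
def de_times_a(de, a):
    """24-bit variant: returns the product as AHL."""
    de &= 0xFFFF
    a &= 0xFF
    hl = 0
    for _ in range(8):
        carry = hl >> 15
        hl = (hl << 1) & 0xFFFF           # add hl,hl
        bit_out = a >> 7
        a = ((a << 1) | carry) & 0xFF     # adc a,a
        if bit_out:                       # jr nc,rrr
            hl += de                      # add hl,de
            a = (a + (hl >> 16)) & 0xFF   # adc a,0
            hl &= 0xFFFF
    return (a << 16) | hl


def de_times_a_low16(de, a):
    """16-bit variant: returns (HL, A); A is restored by the eight rotations."""
    de &= 0xFFFF
    a &= 0xFF
    hl = 0
    for _ in range(8):
        hl = (hl << 1) & 0xFFFF           # add hl,hl
        bit_out = a >> 7
        a = ((a << 1) | bit_out) & 0xFF   # rlca
        if bit_out:
            hl = (hl + de) & 0xFFFF       # add hl,de
    return hl, a


if __name__ == "__main__":
    print(hex(de_times_a(0x1234, 0x56)), de_times_a_low16(0x1234, 0x56))
```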

846
TI Z80 / Re: FileSyst
« on: July 05, 2013, 07:55:20 am »
You could use a .C3 extension since Celtic 3 supports all of those. An extension of .C3 will look for DCS7 first, since DCS7 has the most complete and bug-free[citation needed] version, else it will look for Celtic 3, and if that isn't available, then it won't execute.

EDIT:Also, thanks DrDnar!

847
TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines
« on: July 04, 2013, 07:23:30 pm »
If you only need 8-bit multiplication, I recently wrote my new personal best for speed and size:
Code:
H_Times_E:
;Inputs:
;     H,E
;Outputs:
;     HL is the product
;     D,B are 0
;     A,E,C are preserved
;Size:  12 bytes
;Speed: 311+6b, b is the number of bits set in the input H
;      average is 335 cycles
;      max required is 359 cycles
     ld d,0     ;1600    7      7
     ld l,d     ;6A      4      4
     ld b,8     ;0608    7      7
                ;           
     add hl,hl  ;29      11*8   88
     jr nc,$+3  ;3001 12*8-5b   96-5b
     add hl,de  ;19      11*b   11b
     djnz $-4   ;10FA  13*8-5   99
                ;           
     ret        ;C9      10     10
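
To double-check the cycle formula, here is a Python model of the loop above that multiplies and tallies t-states at the same time. The function name is hypothetical, and the timings are the standard documented Z80 ones (jr 12 taken/7 not taken, djnz 13/8), which match the comments in the listing.

```python
def h_times_e(h, e):
    """Model of the looped H_Times_E; returns (product, t-states)."""
    de = e & 0xFF                  # ld d,0      7
    hl = (h & 0xFF) << 8           # ld l,d      4
    cycles = 7 + 4 + 7             # ld b,8      7
    for b in range(8, 0, -1):
        carry = hl >> 15
        hl = (hl << 1) & 0xFFFF    # add hl,hl   11
        cycles += 11
        if carry:
            cycles += 7            # jr nc,$+3 not taken
            hl = (hl + de) & 0xFFFF
            cycles += 11           # add hl,de
        else:
            cycles += 12           # jr nc,$+3 taken
        cycles += 13 if b > 1 else 8   # djnz: 13 while looping, 8 on exit
    cycles += 10                   # ret
    return hl, cycles


if __name__ == "__main__":
    results = [h_times_e(h, 0xFF)[1] for h in range(256)]
    print(sum(results) / 256, max(results))
```

Running it over every H gives the same average and maximum as the header comments in the listing.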
And the unrolled code isn't too large, either, so you can get away with a ridiculously fast routine:
Code:
H_Times_E:
;Inputs:
;     H,E
;Outputs:
;     HL is the product
;     D is 0
;     A,B,C,E are preserved
;Size:  36 bytes
;Speed: 191+6b+15p-s, b is the number of bits set in bits 1-6 of the input H, p is bit 0 of H, s is bit 7 of H
;   average is 216 cycles (119 cycles saved)
;   max required is 242 cycles (117 cycles saved)
     ld d,0      ;1600   7   7
     ld l,d      ;6A     4   4
           ;     
     sla h      ;CB24    8
     jr nc,$+3   ;3001  12-1b
     ld l,e       ;6B    --

     add hl,hl   ;29    11
     jr nc,$+3   ;3001  12+6b
     add hl,de   ;19    --

     add hl,hl   ;29    11
     jr nc,$+3   ;3001  12+6b
     add hl,de   ;19    --

     add hl,hl   ;29    11
     jr nc,$+3   ;3001  12+6b
     add hl,de   ;19    --

     add hl,hl   ;29    11
     jr nc,$+3   ;3001  12+6b
     add hl,de   ;19    --

     add hl,hl   ;29    11
     jr nc,$+3   ;3001  12+6b
     add hl,de   ;19    --

     add hl,hl   ;29    11
     jr nc,$+3   ;3001  12+6b
     add hl,de   ;19    --

     add hl,hl   ;29   11
     ret nc      ;D0   11+15p
     add hl,de   ;19   --
     ret         ;C9   --
Also, it returns a 16-bit result that you can work with to do whatever.

EDIT: Simple optimisation in the unrolled loop >.>

848
TI Z80 / Re: FileSyst
« on: July 04, 2013, 07:17:03 pm »
The problem is that I still don't understand the Flash and USB protocols. I am not sure, but I think SirCmpwn released a template for an operating system, so I could probably use that.

849
TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines
« on: July 04, 2013, 01:09:20 pm »
Yay! Yeah, for some values, it is almost 3 times faster than the routine I gave you originally. What are your typical numbers for HL?

850
TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines
« on: July 04, 2013, 12:38:15 pm »
Okay, because the only problem area that I could find was the one Jacobly pointed out earlier (if HL=8000h, it returns the negative of the correct answer). The fix is simple:
Code:
;===============================================================
HL_Div_BC_Signed:
;===============================================================
;Performs HL/BC
;Speed: 1354-55a-2b
;         b is the number of set bits in the result
;         a is the number of leading zeroes in the absolute value of HL, minus 1
;         add 24 if HL is negative
;         add 19 if BC is negative
;         add 28 if the result is negative
;Size:    69 bytes
;Inputs:
;     HL is the numerator
;     BC is the denominator
;Outputs:
;     DE is the quotient
;     HL is the remainder
;     BC is the absolute value of the denominator
;Destroys:
;     A
;===============================================================
     ld a,h
     xor b
     push af
;absHL
     xor b
     jp p,$+9
     xor a \ sub l \ ld l,a
     sbc a,a \ sub h \ ld h,a
;absBC:
     bit 7,b
     jr z,$+8
     xor a \ sub c \ ld c,a
     sbc a,a \ sub b \ ld b,a

     ld de,0
     or a          ;clear carry: the negations above can leave it set (e.g. HL=FF00h)
     adc hl,hl
     jr z,EndSDiv
     ld a,16

     add hl,hl
     dec a
     jp nc,$-2
     ex de,hl
     jp jumpin
Loop1:
     add hl,bc     ;--
Loop2:
     dec a         ;4
     jr z,EndSDiv  ;12|23
     sla e         ;--
     rl d          ;--
jumpin:            ;
     adc hl,hl     ;15
     sbc hl,bc     ;15
     jr c,Loop1    ;23-2b     ;b is the number of bits in the absolute value of the result.
     inc e         ;--
     jp Loop2      ;--
EndSDiv:
     pop af \ ret p
     xor a \ sub e \ ld e,a
     sbc a,a \ sub d \ ld d,a
     ret
Remember that HL and BC are the inputs, DE is the output (HL is the remainder).
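
For reference, this is the arithmetic I read the routine as performing, written as a small Python model. The function names are invented for the example, and it deliberately leaves out BC=0 and HL=8000h. It divides the absolute values, leaves the remainder unsigned, and only negates the quotient when the input signs differ.

```python
def to_signed16(x):
    """Reinterpret a 16-bit pattern as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def hl_div_bc_signed(hl, bc):
    """Intended result of HL_Div_BC_Signed: returns (DE, HL) as 16-bit patterns.

    Assumes BC != 0 and HL != 0x8000.
    """
    n = to_signed16(hl)
    d = to_signed16(bc)
    quotient = abs(n) // abs(d)      # the unsigned core division
    remainder = abs(n) % abs(d)      # left in HL, never sign-adjusted
    if (n < 0) != (d < 0):           # sign of (H xor B), saved with push af
        quotient = -quotient
    return quotient & 0xFFFF, remainder


if __name__ == "__main__":
    print([hex(v) for v in hl_div_bc_signed(0xFFF9, 2)])  # -7 / 2
```

Note that this truncates toward zero (-7/2 gives -3 remainder 1), which is not what Python's own `//` does with negative numbers.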

851
TI Z80 / Re: CopyProg
« on: July 04, 2013, 11:54:45 am »
Okay, now let's try this. I figured out the problem and even a rookie wouldn't have made this mistake! I forgot to modify HL after crossing a page boundary.
Xeda112358 has shame.
:P

852
TI Z80 / Re: FileSyst
« on: July 04, 2013, 11:15:30 am »
Hey! I was trying to come up with ideas for things that I could add today. I added in a command called OSNEW() that creates an OS variable with an optional size argument. I was thinking of adding in a complicated method for handling variables specific to FileSyst. Basically, here is the idea and I think it is too complicated and I won't add it:

-Create a special type of hidden folder with the name of the main program being run.
-Have a command to define a subroutine
-This creates a folder with the name of the subroutine, and the relative location in the variable for quick lookup
-Inside this folder would be the named variables used by the routine, such as floats, ints, and strings.

This means that if I add something like SUB(LBL), subroutines can have local variables (and possibly access variables in other subroutines). This could also mean that such a language would be slower than TI-BASIC, since variables can have custom names and are located in folders (then again, this could potentially be faster than the OS's VAT lookup).
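
To make the idea concrete, here is a toy Python sketch of the folder layout described in the list above. Everything in it (class name, fields, the example program and variable names) is hypothetical; it is not how FileSyst stores anything, just the shape of the lookup.

```python
class Folder:
    """A folder holding named variables and sub-folders (hypothetical layout)."""

    def __init__(self, name, hidden=False, offset=None):
        self.name = name
        self.hidden = hidden     # the per-program folder would be hidden
        self.offset = offset     # relative location of a subroutine, for quick lookup
        self.folders = {}
        self.variables = {}


def define_sub(program, sub_name, offset):
    """Create the folder for a subroutine inside the program's folder."""
    sub = Folder(sub_name, offset=offset)
    program.folders[sub_name] = sub
    return sub


def lookup(program, sub_name, var_name):
    """Fetch a variable local to one subroutine (or to another one)."""
    return program.folders[sub_name].variables[var_name]


# Example: a program MAIN with two subroutines that each have their own X
prog = Folder("MAIN", hidden=True)
draw = define_sub(prog, "DRAW", offset=0x0123)
move = define_sub(prog, "MOVE", offset=0x0200)
draw.variables["X"] = 12
move.variables["X"] = "left"

if __name__ == "__main__":
    print(lookup(prog, "DRAW", "X"), lookup(prog, "MOVE", "X"))
```

Each lookup walks two levels of names, which is where the speed concern comes from.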

So yeah, just some food for thought :)

853
EDIT: Jacobly pointed out the case HL = 8000h, so this doesn't work D:

Hopefully this file has the updated SDiv routine. I have this:
Original routine
Code:
p_SDiv:
.db __SDivEnd-1-$
ld a,h
xor d
push af
xor d
jp p,__SDivSkip1-p_SDiv-1
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
__SDivSkip1:
bit 7,d
jr z,__SDivSkip2
xor a
sub e
ld e,a
sbc a,a
sub d
ld d,a
__SDivSkip2:
call $3F00+sub_Div
x_SDivEntry:
pop af
ret p
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
ret
__SDivEnd:
Modified routine: same size, 1|6 cycles saved
Code:
p_SDiv:
.db __SDivEnd-1-$
ld a,h
xor d
push af
xor d
jp p,__SDivSkip1-p_SDiv-1
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
__SDivSkip1:
xor d
jp p,__SDivSkip2-p_SDiv-1
xor a
sub e
ld e,a
sbc a,a
sub d
ld d,a
__SDivSkip2:
call $3F00+sub_Div
x_SDivEntry:
pop af
ret p
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
ret
__SDivEnd:
And my only change is the two lines after __SDivSkip1.
Same size, save at least 1 cycle (up to 6 cycles).
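
The HL = 8000h case from the EDIT at the top of this post can be reproduced with a few lines of Python. This only models the sign tests of the modified routine, not the division, and the function name is made up. After the first negation A holds the new H, so 'xor d' gives the sign of D only if that new H is non-negative, and 8000h negates to itself.

```python
def second_test_negates_de(hl, de):
    """Return True if the 'xor d / jp p' version decides to negate DE."""
    hl &= 0xFFFF
    d = (de >> 8) & 0xFF
    a = (hl >> 8) ^ d          # ld a,h / xor d (this value is pushed)
    a ^= d                     # xor d -> A = H again
    if a & 0x80:               # jp p not taken: negate HL
        hl = (-hl) & 0xFFFF
        a = hl >> 8            # ld h,a leaves the new H in A
    a ^= d                     # xor d at __SDivSkip1
    return bool(a & 0x80)      # jp p skips the negation when this is False


if __name__ == "__main__":
    print(second_test_negates_de(0xFFFF, 0x0005))  # DE positive
    print(second_test_negates_de(0x8000, 0x0005))  # DE positive, but HL = 8000h
```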

EDIT: The same modification can be made to the fixed point signed division routine.

854
ASM / Re: ASM Optimized routines
« on: July 04, 2013, 08:46:10 am »
EDIT: Fixed a problem with the case where HL = 8000h (thanks Jacobly!)
This routine a few pages back can be optimised:
Code:
SignedDivision:
ld a,h
xor d
push af

bit 7,h
jr z,$+8
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a

bit 7,d
jr z,$+8
xor a
sub e
ld e,a
sbc a,a
sub d
ld d,a

call RegularDivision

pop af
add a,a
ret nc

xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
ret
For the sign testing, I came up with this:
Code:
SignedDivision:
ld a,h
xor d
push af

xor d
jp p,$+9
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a

bit 7,d
jr z,$+8
xor a
sub e
ld e,a
sbc a,a
sub d
ld d,a

call RegularDivision

pop af
ret p

xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
ret
In all, it saves 1 byte and at least 5 t-states (it will be either 5 or 10).

855
TI Z80 / Re: [z80 ASM] Unnamed set of 3D routines
« on: July 04, 2013, 06:44:59 am »
Hmm, what were you passing and what was it returning? It was definitely working for me.
