Messages - Xeda112358
841
« on: July 06, 2013, 07:10:21 am »
Because you are using add hl,hl, bit 0 of HL is always 0 by the time you get to that, so you should be fine (as verified by jacobly). You might be able to get better speed by doing this, though:
DivAHLby10:
     ld d,a
     ld bc,$180a
     sub a
DAHLLoop1:
     add hl,hl
     rl d
     rla
     cp c
     jr c,DAHLLoop2
     sub c
     inc l
DAHLLoop2:
     djnz DAHLLoop1
     ld e,a
     ld a,d
     ret
E is the remainder, AHL is the quotient. It is 4 bytes smaller and 262 t-states faster.
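For anyone who wants to check the logic off-calc, here is a rough Python model of the shift-and-subtract loop above. It is only a sketch: the function name `div_ahl_by_10` and the variable names are made up for illustration, and it models the arithmetic only, not the timing.

```python
def div_ahl_by_10(value):
    """Model of the 24-step restoring division: returns (quotient, remainder)."""
    if not 0 <= value < (1 << 24):
        raise ValueError("value must fit in 24 bits")
    shift_reg = value   # plays the role of D:HL (dividend shifts out, quotient shifts in)
    remainder = 0       # plays the role of A
    for _ in range(24):                                # B = $18
        top_bit = (shift_reg >> 23) & 1
        shift_reg = (shift_reg << 1) & 0xFFFFFF        # add hl,hl / rl d
        remainder = (remainder << 1) | top_bit         # rla
        if remainder >= 10:                            # cp c / jr c
            remainder -= 10                            # sub c
            shift_reg |= 1                             # inc l (bit 0 is 0 after the shift)
    return shift_reg, remainder


if __name__ == "__main__":
    print(div_ahl_by_10(1234567))  # (123456, 7)
```

The `inc l` step is safe for the same reason mentioned above: the shift always leaves bit 0 clear, so setting it can never carry into the next bit.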
842
« on: July 05, 2013, 10:17:16 pm »
That is awesome! Is that 6MHz? Also, what other kinds of math routines do you have in there? ^_^
843
« on: July 05, 2013, 12:00:10 pm »
I see, A basically works as a 16-bit integer where the upper 9 bits are all the same. I have this:
     ld hl,0
     or a
     jp p,$+5
     sbc hl,de
     ld b,8
mulloop:
     add hl,hl
     rla
     jr nc,$+5
     add hl,de
     adc a,0
     djnz mulloop
     ret
That treats A as a signed integer, HL as an unsigned integer. I hope that works!
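A quick way to convince yourself is to model the loop in Python and compare it against ordinary signed arithmetic. This is only a sketch under my own assumptions: `signed_a_times_de` is a made-up name, the 16-bit factor is taken from DE as in the listing, and the result is read as the 24-bit value in AHL.

```python
def signed_a_times_de(a, de):
    """Model of the loop: A is a signed 8-bit factor, DE an unsigned 16-bit one.
    Returns the 24-bit two's complement product held in A:HL."""
    hl = 0
    if a & 0x80:                      # or a / jp p,$+5
        hl = (0 - de) & 0xFFFF        # sbc hl,de with carry cleared by 'or a'
    for _ in range(8):                # ld b,8
        hl <<= 1                      # add hl,hl
        carry = hl >> 16
        hl &= 0xFFFF
        a = (a << 1) | carry          # rla
        carry = a >> 8
        a &= 0xFF
        if carry:                     # jr nc,$+5
            hl += de                  # add hl,de
            carry = hl >> 16
            hl &= 0xFFFF
            a = (a + carry) & 0xFF    # adc a,0
    return (a << 16) | hl


def to_signed8(a):
    """Interpret an 8-bit value as two's complement."""
    return a - 256 if a & 0x80 else a


if __name__ == "__main__":
    print(hex(signed_a_times_de(0xFF, 3)))  # -1 * 3 as a 24-bit value
```

Starting HL at -DE for a negative A is what subtracts the extra 256*DE that an unsigned reading of A would otherwise add.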
844
« on: July 05, 2013, 11:31:39 am »
Well, multiplication is always signed, regardless. Division is the only one that you need to do a specific routine for the sign. What inputs/outputs are you expecting, though, when you use it? (just some numbers, so I can figure out what you are looking for)
845
« on: July 05, 2013, 08:10:51 am »
A lot of users would probably have pre-built images to use, so you should provide a way for users to pass a pointer to the image data (in the format of your stack) to have it rendered. Also, HL_Times_A:
HL_Times_A:
     ex de,hl
DE_Times_A:
;Inputs:
;     DE and A are factors
;Outputs:
;     AHL is the product
;     B is 0
;     C is not changed
;     DE is not changed
;Time:
;     342+13x
;
     ld b,8          ;7           7
     ld hl,0         ;10          10
aaa:
     add hl,hl       ;11*8        88
     adc a,a         ;4*8         32
     jr nc,rrr       ;(12|25)*8   96+13x
     add hl,de       ;--          --
     adc a,0
rrr:
     djnz aaa        ;13*7+8      99
     ret             ;10          10
I feel like there is a much better way to do this... Also, it returns a 24-bit result. If you only need the lower 16 bits, you can remove 'adc a,0' and change 'adc a,a' to 'rlca' to preserve a.
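To make the difference between the two variants concrete, here is a hedged Python sketch of both: the 24-bit version as listed, and the low-16-bit version with 'adc a,0' removed and 'adc a,a' swapped for 'rlca'. The function names are invented for this example, and only the arithmetic is modelled.

```python
def de_times_a(de, a):
    """24-bit product in A:HL, following the listing above."""
    hl = 0
    for _ in range(8):
        hl <<= 1                      # add hl,hl
        carry = hl >> 16
        hl &= 0xFFFF
        a = (a << 1) | carry          # adc a,a
        carry = a >> 8
        a &= 0xFF
        if carry:                     # jr nc,rrr
            hl += de                  # add hl,de
            carry = hl >> 16
            hl &= 0xFFFF
            a = (a + carry) & 0xFF    # adc a,0
    return (a << 16) | hl


def de_times_a_low16(de, a):
    """Low 16 bits only; returns (HL, A) to show that A comes back unchanged."""
    hl = 0
    for _ in range(8):
        hl = (hl << 1) & 0xFFFF       # add hl,hl (carry out is dropped)
        carry = a >> 7                # rlca: bit 7 goes to carry and back into bit 0
        a = ((a << 1) | carry) & 0xFF
        if carry:
            hl = (hl + de) & 0xFFFF   # add hl,de
    return hl, a
```

After eight rotations 'rlca' has moved every bit of A back to where it started, which is why the second variant preserves A.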
846
« on: July 05, 2013, 07:55:20 am »
You could use a .C3 extension since Celtic 3 supports all of those. An extension of .C3 will look for DCS7 first, since DCS7 has the most complete and bug-free[citation needed] version, else it will look for Celtic 3, and if that isn't available, then it won't execute.
EDIT: Also, thanks DrDnar!
847
« on: July 04, 2013, 07:23:30 pm »
If you only need 8-bit multiplication, I recently wrote my new personal best for speed and size:
H_Times_E:
;Inputs:
;     H,E
;Outputs:
;     HL is the product
;     D,B are 0
;     A,E,C are preserved
;Size:  12 bytes
;Speed: 311+6b, b is the number of bits set in the input H
;       average is 335 cycles
;       max required is 359 cycles
     ld d,0        ;1600   7        7
     ld l,d        ;6A     4        4
     ld b,8        ;0608   7        7
;
     add hl,hl     ;29     11*8     88
     jr nc,$+3     ;3001   12*8-5b  96-5b
     add hl,de     ;19     11*b     11b
     djnz $-4      ;10FA   13*8-5   99
;
     ret           ;C9     10       10
And the unrolled code isn't too large, either, so you can get away with a ridiculously fast routine:
H_Times_E:
;Inputs:
;     H,E
;Outputs:
;     HL is the product
;     D,B are 0
;     A,E,C are preserved
;Size:  36 bytes
;Speed: 191+6b+9p, b is the number of bits set in the input H, p is if it is odd
;       average is 229.5 cycles (105.5 cycles saved)
;       max required is 258 cycles (101 cycles saved)
     ld d,0        ;1600   7   7
     ld l,d        ;6A     4   4
;
     sla h         ;CB24   8
     jr nc,$+3     ;3001   12-1b
     ld l,e        ;6B     --

     add hl,hl     ;29     11
     jr nc,$+3     ;3001   12+6b
     add hl,de     ;19     --

     add hl,hl     ;29     11
     jr nc,$+3     ;3001   12+6b
     add hl,de     ;19     --

     add hl,hl     ;29     11
     jr nc,$+3     ;3001   12+6b
     add hl,de     ;19     --

     add hl,hl     ;29     11
     jr nc,$+3     ;3001   12+6b
     add hl,de     ;19     --

     add hl,hl     ;29     11
     jr nc,$+3     ;3001   12+6b
     add hl,de     ;19     --

     add hl,hl     ;29     11
     jr nc,$+3     ;3001   12+6b
     add hl,de     ;19     --

     add hl,hl     ;29     11
     ret nc        ;D0     11+15p
     add hl,de     ;19     --
     ret           ;C9     --
Also, it returns a 16-bit result that you can work with to do whatever.
EDIT: Simple optimisation in the unrolled loop >.>
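If it helps, the looped version can be modelled in a few lines of Python to see why a single 16-bit register pair is enough: the multiplier sits in H and is shifted out the top while the product grows up from L. This is a sketch with made-up names (`h_times_e`, `looped_cycles`); the cycle function just evaluates the 311+6b formula quoted in the comments of the looped routine.

```python
def h_times_e(h, e):
    """Model of the looped H_Times_E: returns the 16-bit product left in HL."""
    hl = (h & 0xFF) << 8              # H holds the multiplier, L starts at 0 (ld l,d)
    for _ in range(8):                # ld b,8
        hl <<= 1                      # add hl,hl pushes the next multiplier bit into carry
        carry = hl >> 16
        hl &= 0xFFFF
        if carry:                     # jr nc,$+3
            hl = (hl + e) & 0xFFFF    # add hl,de with D = 0
    return hl


def looped_cycles(h):
    """Cycle estimate for the looped routine: 311 + 6b, b = set bits in H."""
    return 311 + 6 * bin(h & 0xFF).count("1")


if __name__ == "__main__":
    print(h_times_e(200, 123))                           # 24600
    print(sum(looped_cycles(h) for h in range(256)) / 256)  # average over all H
    print(max(looped_cycles(h) for h in range(256)))     # worst case, H = 255
```

The partial product never collides with the unused multiplier bits, because after k steps it needs at most 8+k bits and exactly k multiplier bits have been shifted out of H.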
848
« on: July 04, 2013, 07:17:03 pm »
The problem is that I still don't understand the Flash and USB protocols. I am not sure, but I think SirCmpwn released a template for an operating system, so I could probably use that.
849
« on: July 04, 2013, 01:09:20 pm »
Yay! Yeah, for some values, it is almost 3 times faster than the routine I gave you originally. What are your typical numbers for HL?
850
« on: July 04, 2013, 12:38:15 pm »
Okay, because the only problem area that I could find was pointed out by Jacobly earlier (if HL=8000h it will return a wrong result that is the negative of the real answer). The fix is simple:
;===============================================================
HL_Div_BC_Signed:
;===============================================================
;Performs HL/BC
;Speed:   1350-55a-2b
;         b is the number of set bits in the result
;         a is the number of leading zeroes in the absolute value of HL, minus 1
;         add 24 if HL is negative
;         add 19 if BC is negative
;         add 28 if the result is negative
;Size:    68 bytes
;Inputs:
;     HL is the numerator
;     BC is the denominator
;Outputs:
;     DE is the quotient
;     HL is the remainder
;     BC is not changed
;Destroys:
;     A
;===============================================================
     ld a,h
     xor b
     push af
;absHL
     xor b
     jp p,$+9
     xor a \ sub l \ ld l,a
     sbc a,a \ sub h \ ld h,a
;absBC:
     bit 7,b
     jr z,$+8
     xor a \ sub c \ ld c,a
     sbc a,a \ sub b \ ld b,a

     ld de,0
     adc hl,hl
     jr z,EndSDiv
     ld a,16

     add hl,hl
     dec a
     jp nc,$-2
     ex de,hl
     jp jumpin
Loop1:
     add hl,bc     ;--
Loop2:
     dec a         ;4
     jr z,EndSDiv  ;12|23
     sla e         ;--
     rl d          ;--
jumpin:
;
     adc hl,hl     ;15
     sbc hl,bc     ;15
     jr c,Loop1    ;23-2b      ;b is the number of bits in the absolute value of the result.
     inc e         ;--
     jp Loop2      ;--
EndSDiv:
     pop af \ ret p
     xor a \ sub e \ ld e,a
     sbc a,a \ sub d \ ld d,a
     ret
Remember that HL and BC are the inputs, DE is the output (HL is the remainder).
851
« on: July 04, 2013, 11:54:45 am »
Okay, now let's try this. I figured out the problem and even a rookie wouldn't have made this mistake! I forgot to modify HL after crossing a page boundary. Xeda112358 has shame.
852
« on: July 04, 2013, 11:15:30 am »
Hey! I was trying to come up with ideas for things that I could add today. I added in a command called OSNEW() that creates an OS variable with an optional size argument. I was thinking of adding in a complicated method for handling variables specific to FileSyst. Basically, here is the idea, though I think it is too complicated and I won't add it:
-Create a special type of hidden folder with the name of the main program being run.
-Have a command to define a subroutine.
-This creates a folder with the name of the subroutine, and the relative location in the variable for quick lookup.
-Inside this folder will be the named variables used by the routine, such as floats, ints, and strings.
This means that if I add an ability to SUB(LBL), or whatever, they can have local variables (and possibly access variables in other subroutines). This could also mean that such a language would be slower than TI-BASIC, since variables can have custom names and they are located in folders (then again, this could potentially be faster than the OS's VAT lookup). So yeah, just some food for thought.
853
« on: July 04, 2013, 08:59:58 am »
EDIT: Jacobly pointed out the case HL = 8000h, so this doesn't work.
Hopefully this file has the updated SDiv routine. I have this:
Original routine:
p_SDiv:
     .db __SDivEnd-1-$
     ld a,h
     xor d
     push af
     xor d
     jp p,__SDivSkip1-p_SDiv-1
     xor a
     sub l
     ld l,a
     sbc a,a
     sub h
     ld h,a
__SDivSkip1:
     bit 7,d
     jr z,__SDivSkip2
     xor a
     sub e
     ld e,a
     sbc a,a
     sub d
     ld d,a
__SDivSkip2:
     call $3F00+sub_Div
x_SDivEntry:
     pop af
     ret p
     xor a
     sub l
     ld l,a
     sbc a,a
     sub h
     ld h,a
     ret
__SDivEnd:
Smaller routine: 1 byte, 1|6 cycles saved
p_SDiv:
     .db __SDivEnd-1-$
     ld a,h
     xor d
     push af
     xor d
     jp p,__SDivSkip1-p_SDiv-1
     xor a
     sub l
     ld l,a
     sbc a,a
     sub h
     ld h,a
__SDivSkip1:
     xor d
     jp p,__SDivSkip2-p_SDiv-1
     xor a
     sub e
     ld e,a
     sbc a,a
     sub d
     ld d,a
__SDivSkip2:
     call $3F00+sub_Div
x_SDivEntry:
     pop af
     ret p
     xor a
     sub l
     ld l,a
     sbc a,a
     sub h
     ld h,a
     ret
__SDivEnd:
And my only change is the two lines after __SDivSkip1. Same size, save at least 1 cycle (up to 6 cycles). EDIT: The same modification can be made to the fixed point signed division routine.
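For the curious, here is my reading of why HL = 8000h trips up the 'xor d' version, written as a small Python model. It is a sketch, not the Axe source: `signed_div` and `neg16` are made-up names, Python's `//` stands in for the unsigned division call, and DE must be non-zero. With `reuse_a=True` the second sign test reuses whatever is left in A (the new H) instead of testing bit 7 of D directly.

```python
def neg16(x):
    """16-bit two's complement negation (xor a / sub l / ... / ld h,a)."""
    return (-x) & 0xFFFF


def signed_div(hl, de, reuse_a=False):
    """Return the 16-bit quotient of HL / DE with the sign handling modelled.
    reuse_a=False: 'bit 7,d / jr z' test.  reuse_a=True: 'xor d / jp p' test."""
    result_negative = ((hl ^ de) & 0x8000) != 0       # ld a,h / xor d / push af
    a = hl >> 8                                       # second 'xor d' restores A = H
    if a & 0x80:                                      # jp p skips this when H >= 0
        hl = neg16(hl)
        a = hl >> 8                                   # 'ld h,a' leaves the new H in A
    if reuse_a:
        de_negative = ((a ^ (de >> 8)) & 0x80) != 0   # xor d / jp p
    else:
        de_negative = (de & 0x8000) != 0              # bit 7,d / jr z
    if de_negative:
        de = neg16(de)
    quotient = hl // de                               # stand-in for the unsigned divide
    if result_negative:                               # pop af / ret p
        quotient = neg16(quotient)
    return quotient


if __name__ == "__main__":
    print(hex(signed_div(0x8000, 1)))                 # 0x8000, i.e. -32768
    print(hex(signed_div(0x8000, 1, reuse_a=True)))   # 0x0, the wrong answer
```

Negating 8000h gives 8000h again, so the new H still has bit 7 set and the reused A flips the sign test for DE. For every other HL the negated value is positive and both tests agree.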
854
« on: July 04, 2013, 08:46:10 am »
EDIT: Fixed a problem to take care of the case where HL = 8000h (thanks Jacobly!)
This routine a few pages back can be optimised:
SignedDivision:
     ld a,h
     xor d
     push af

     bit 7,h
     jr z,$+8
     xor a
     sub l
     ld l,a
     sbc a,a
     sub h
     ld h,a

     bit 7,d
     jr z,$+8
     xor a
     sub e
     ld e,a
     sbc a,a
     sub d
     ld d,a

     call RegularDivision

     pop af
     add a,a
     ret nc

     xor a
     sub l
     ld l,a
     sbc a,a
     sub h
     ld h,a
     ret
For the sign testing, I came up with this:
SignedDivision:
     ld a,h
     xor d
     push af

     xor d
     jp p,$+9
     xor a
     sub l
     ld l,a
     sbc a,a
     sub h
     ld h,a

     bit 7,d
     jr z,$+8
     xor a
     sub e
     ld e,a
     sbc a,a
     sub d
     ld d,a

     call RegularDivision

     pop af
     ret p

     xor a
     sub l
     ld l,a
     sbc a,a
     sub h
     ld h,a
     ret
In all, it saves 1 byte and at least 5 t-states (it will be either 5 or 10).
855
« on: July 04, 2013, 06:44:59 am »
Hmm, what were you passing and what was it returning? It was definitely working for me.