// Multiply a times b
temp = 0
repeat for each bit in a
    temp <<= 1
    if (high bit of a set)
        temp += b
    a <<= 1
return temp

If a and b are 2 bytes, temp is 4 bytes, and you loop 16 times.
Code (stolen from Axe):
p_MulFull:
; Input in hl, result in cahl
    ld c,h
    ld a,l
    ld hl,0 ;11
    ld b,16 ;7
__MulFullNext:
    add hl,hl ;11
    rla ;4
    rl c ;8
    jr nc,__MulFullSkip ;12/7
    add hl,de ;11
    adc a,0 ;7
    jr nc,__MulFullSkip
    inc c
__MulFullSkip:
    djnz __MulFullNext
    ret
__MulFullEnd:
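For anyone who wants to check the multiply pseudocode off-calculator, here is a minimal Python sketch of the same shift-and-add loop. The function name mul_full and the bits parameter are made up for this illustration; it models the pseudocode, not the register allocation of the Z80 routine.

```python
def mul_full(a, b, bits=16):
    """Shift-and-add multiply, scanning a from its high bit down."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    temp = 0                          # needs 2*bits bits to hold the product
    for _ in range(bits):             # one pass per bit of a
        temp <<= 1
        if a & (1 << (bits - 1)):     # high bit of a set
            temp += b
        a = (a << 1) & mask
    return temp
```

With bits=16 the loop runs 16 times and the result fits in 32 bits, matching the "2 bytes in, 4 bytes out" sizing above.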
// Sqrt a
temp = high byte of a
a <<= 8
b = 0
repeat for every 2 bits in a
    test = b << 8 + 0x40
    b <<= 1
    if (temp >= test)
        temp -= test
        set low bit of b
    temp += high 2 bits of a
    a <<= 2
return b

If a is 4 bytes, then b and temp are 2 bytes, and you loop 16 times.
Code (stole my own routine from Axe and modified it):
p_Sqrt88:
; input in hlde, result in de
    ld b,16
    ld a,h
    ld c,l
    push de ; ld ixh,d
    pop ix ; ld ixl,e
    ld de,0
    ld h,d
    ld l,e
__Sqrt88Loop:
    sub $40
    sbc hl,de
    jr nc,__Sqrt88Skip
    add a,$40
    adc hl,de
__Sqrt88Skip:
    ccf
    rl e
    rl d
    add ix,ix
    rl c
    rla
    adc hl,hl
    add ix,ix
    rl c
    rla
    adc hl,hl
    djnz __Sqrt88Loop
    ret
__Sqrt88End:
p_ArcTan:
    .db __ArcTanEnd-1-$
    ex de,hl ;de = y
    pop hl
    ex (sp),hl ;hl = x
    push hl
    ld a,h ;\
    xor d ;/ Get parity
    jp m,__ArcTanSS-p_ArcTan-1
    add hl,de ;\
    jr __ArcTanDS ; |
__ArcTanSS: ; |hl = x +- y
    sbc hl,de ; |
__ArcTanDS: ;/
    ex de,hl ;de = x +- y
    ld b,6 ;\
__ArcTan64: ; |
    add hl,hl ; |hl = 64y
    djnz __ArcTan64 ;/
    call $3F00+sub_SDiv ;hl = 64y/(x +- y)
    pop af ;\
    rla ; |Right side, fine
    ret nc ;/
    sbc a,a ;\
    sub h ; |Reverse sign extend
    ld h,a ;/
    ld a,l ;\
    add a,128 ; |Add or sub 128
    ld l,a ;/
    ret
__ArcTanEnd:
I'm curious as to why you multiplied by 64 before dividing. It would seem that if the times 64 was after the division, the result would generally be the same, but there would be less of a chance of overflow. It's possible though that it doesn't matter. Edit: Oh yeah... accuracy. Your way is more accurate, nvm.
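To make the accuracy point concrete, here is a small Python sketch comparing the two orderings with integer division. The function names are invented for this example, and it only uses non-negative values, so it sidesteps whatever rounding the signed division routine does for negative inputs.

```python
def scale_then_divide(y, d):
    """Multiply by 64 first, then divide: keeps the fractional bits."""
    return (64 * y) // d


def divide_then_scale(y, d):
    """Divide first, then multiply by 64: the fraction is already lost."""
    return 64 * (y // d)
```

For y = 3 and d = 10, the first ordering gives 19 (close to 64 * 0.3 = 19.2), while the second gives 0 because 3 // 10 has already been truncated. The price is that 64 * y must still fit in the register, which is the overflow risk mentioned above.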
p_DrawBmp: saves 3 to 4 bytes, and (8 ± 0 or 4) × (visible height) - 8 cycles
p_DrawBmp:
; ... c = bytes + 1 is required for the rest of the optimizations
__DrawBmpGoodSize:
    ld b,a ;B = plot_height
    inc c ;C = bytes+1
    push bc ;****** BEGIN BUFFER CALCULATIONS ******
; ... undo inc c above, affect z flag the same as before, c is still one more than before
__DrawBmpLeftLoop:
    dec c
    jr z,__DrawBmpSkipMain
; ... since c is one more than before, check e = c - 1 for 0, instead of c
__DrawBmpOnLeft: ;A = X + 8
    ld d,(hl)
    inc hl
    ld e,c
    dec e ;E = 0 and z (if bytes = 0)
    jr z,__DrawBmpSt
; ... this stores one more than before to e, but all code paths lead to
; either pop de, ld e,(hl), or ld e,c before e is ever used.
__DrawBmpStSkip:
    ld a,e
    pop de ;D = X
    ld e,c
    pop bc
    ld c,e ;C = bytes+1
; ... same as above
__DrawBmpColWall:
    dec c
    jr z,__DrawBmpSkipMain
    ld a,d
    jr nz,__DrawBmpColLeft
    cp 88
    ld d,(hl)
    inc hl
    jr nc,__DrawBmpSkipMain
; I do not understand the reason for ld e,c, however, c is one more than before,
; so dec e to have e be the same as before, but I don't know if this is necessary.
    ld e,c
    dec e
    jr __DrawBmpSt
; ...
Sorry for bumping some of these so soon, but I wanted to change them to work with the new version.
:Repeat getKeyʳ=64
:End

Could cause an infinite loop that may be impossible to get out of, if lowercase is enabled.
The reason for this is that if you type a lowercase letter, it is stored at $8446, but if you type any other key, it does not reset the value at $8446. (Edit: This causes all of the other keycodes to change to different values every time a lowercase letter is typed.) On the plus side, its value does seem to always be reset at the beginning of the program.
Some fixes for this are to only read $8446 if a >= $fc, reset $8446 after it is read, or to change checks for any key that is not a lowercase letter to getKeyʳ^256=key code. Edit: And in the last case, it should probably be documented somewhere, since it is not obvious just from playing around with getKeyʳ.
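Here is a toy Python model of the problem, in case it helps to see it in action. It assumes the returned 16-bit value is the sticky byte at $8446 in the high byte and the normal key code in the low byte; that layout is my guess from the ^256 fix, and the class name, the marker code 0xFC, and the lowercase value 1 are all hypothetical.

```python
class KeyModel:
    """Toy model: only lowercase keys write the sticky cell, nothing clears it."""

    def __init__(self):
        self.extend = 0  # stands in for the byte at $8446 (reset at program start)

    def press(self, code, lowercase=None):
        """Return the 16-bit value read for this key press (assumed layout)."""
        if lowercase is not None:
            self.extend = lowercase      # lowercase keys store their value here
        return (self.extend << 8) | code

    def press_fixed(self, code, lowercase=None):
        """Same read, but the cell is reset after it is read."""
        value = self.press(code, lowercase)
        self.extend = 0
        return value
```

In this model, press(64) returns 64 until a lowercase letter is typed; afterwards it returns 64 plus a stale high byte, so a loop waiting for exactly 64 never ends. Taking the result modulo 256, or using the resetting variant, brings back 64.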
Due to multiple requests, I wrote an Axe clock library, LIBCLOCK. See CLOCKTST for example code.
Axe Code: Function on calculators with a clock (function on calculators without a clock).

Main functions:
ClkOf(): Turns the clock off (does nothing).
ClkOn(): Turns the clock on (does nothing).
IsClk(): Returns 1 if the clock is on, 0 if off (returns 0).
°A:GetDT(): Gets the current date and time. Sets 6 consecutive variables, or 6 consecutive words, starting at the passed in address. In this example, A = year, B = month, C = day, D = hour, E = minute, F = second (returns midnight of January 1, 1997). Do not pass in °r₁.
SetDT(year,month,day,hour,minute,second): Sets the current date and time (does nothing relatively slowly).
DOfWk(year,month,day): Returns the day of the week of the specified date, 1 = Sunday, ..., 7 = Saturday.

Low level functions:
°A:GetRT(): Gets the current raw time. Sets 2 consecutive variables, or 2 consecutive words, starting at the passed in address. In this example, AB = seconds since January 1, 1997 (AB = 0).
°A:SetRT(): Sets the current raw time. Uses 2 consecutive variables, or 2 consecutive words, starting at the passed in address (does nothing).

Bonus functions:
Mul21(r₁,r₂,r₃): Multiplies r₁r₂ by r₃ and stores the result in r₁r₂.
Div21(r₁,r₂,r₃): Divides r₁r₂ by r₃ and stores the result in r₁r₂, and the remainder in r₄. Edit: I think r₃ must be < 256.
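For checking results on a computer, here is a rough Python sketch of what GetRT, GetDT, and DOfWk are described as returning. It is not the library's code: the function names are invented, and it assumes the first of the two words is the high word of the seconds count.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1997, 1, 1)  # raw time 0, per the description above


def raw_to_fields(high_word, low_word):
    """Combine two 16-bit words into seconds and split into the six fields."""
    seconds = (high_word << 16) | low_word   # assumed word order: high, then low
    t = EPOCH + timedelta(seconds=seconds)
    return [t.year, t.month, t.day, t.hour, t.minute, t.second]


def day_of_week(year, month, day):
    """1 = Sunday, ..., 7 = Saturday, the same numbering as DOfWk."""
    return datetime(year, month, day).isoweekday() % 7 + 1
```

For example, raw_to_fields(0, 0) gives midnight of January 1, 1997, which is also what a calculator without a clock reports, and day_of_week(1997, 1, 1) gives 4 (Wednesday).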
Quick question: do ...'s nest? They obviously can't normally (start and end are indistinguishable) but with preprocessor conditionals they theoretically could. Not that I need it yet, but it would be a good thing to know.
Instead of commenting out code like this:
...
.Code
...
You can do something like this:
...If condition
.Code
...
And it will only comment out the code if the condition is true (condition must be a constant).
Also, yet another question: what does "Now able to use Return in a single argument for loop." mean?
This used to be an error (because it wouldn't work):
For(10)
If A+1→A=B
Return
End
End
Now, Quigibo has found some amazing way to make it work, so it is allowed.
btw, most of this info is probably in the command list
Edit: Yet another yet another question is: What does select() do? I don't get what the commands list is saying
Select(EXP1,EXP2) finds the value of EXP1, stores it into some secret place, then it evaluates EXP2, and lastly, it looks in that secret place and returns the original value of EXP1. For example: Select(A,B→A)→B swaps A and B. A is stored to a secret place, B is stored to A, then the value of that secret place (the original value of A) is stored to B.
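If it helps, the same behaviour can be modelled in Python. This is only an illustration of the evaluation order described above: select, store, and swap_demo are made-up names, and the lambdas stand in for Axe expressions that are evaluated when called.

```python
def select(exp1, exp2):
    """Evaluate exp1, stash it, evaluate exp2, then return the stashed value."""
    saved = exp1()   # the "secret place"
    exp2()           # evaluated only for its side effects
    return saved


def swap_demo(a, b):
    """Model of Select(A,B→A)→B using a dict as the variable store."""
    v = {"A": a, "B": b}

    def store(name, value):
        v[name] = value
        return value

    v["B"] = select(lambda: v["A"], lambda: store("A", v["B"]))
    return v
```

Calling swap_demo(1, 2) leaves A = 2 and B = 1, because the old value of A was saved before B was stored into A.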
.db 3
sbc hl,hl
ld a,h
or l
.db 2
sbc hl,hl

I figured that you already checked that the value of a is not used.
The second one was an optimization for 32-bit subtraction (it was supposed to be p_LtLeXX followed by dec hl). However, the following should work because it has no differing side effects and should be more common. (btw, I think Runer may have made a similar suggestion)
.db 2
inc hl
dec hl
.db 0 ;<- don't know if this works
Hey, jacobly, I tested both of your multiplication routines and it doesn't seem like they're doing the same thing... I've tried both of them in my program but they have different results.
They have different inputs. The first is hla * cde, and the second is hlc * bde. You probably want to add some code to the beginning of each so that the input works better for what you are doing. For example:
ld a,c
ld c,b
at the beginning of the first routine causes them to have the same input (hlc and bde).
// Multiply a times b
temp = 0
repeat for each bit in a
    temp <<= 1
    if (high bit of a set)
        temp += b
    a <<= 1
return temp
// Divide a by b
temp = 0
repeat for each bit in a
    temp <<= 1
    temp += high bit of a
    a <<= 1
    if (temp >= b)
        temp -= b
        set low bit of a
return a
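The divide pseudocode translates to Python almost line for line. This is a sketch for checking the idea, with an invented function name and a bits parameter; it also returns what is left in temp, which is the remainder.

```python
def div_restoring(a, b, bits=16):
    """Bitwise long division: the quotient is shifted into a as a is shifted out."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    mask = (1 << bits) - 1
    a &= mask
    temp = 0
    for _ in range(bits):                       # one pass per bit of a
        temp = (temp << 1) | (a >> (bits - 1))  # temp <<= 1; temp += high bit of a
        a = (a << 1) & mask
        if temp >= b:
            temp -= b
            a |= 1                              # set low bit of a
    return a, temp                              # (quotient, remainder)
```

For example, div_restoring(1000, 7) returns (142, 6).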
// Sqrt a
temp = 0
b = 0
repeat for every 2 bits in a
    temp += high 2 bits of a
    a <<= 2
    test = b << 2 + 1
    b <<= 1
    if (temp >= test)
        temp -= test
        set low bit of b
return b

// Sqrt a, sometimes better with multiple-of-a-byte registers
temp = high byte of a
a <<= 8
b = 0
repeat for every 2 bits in a
    test = b << 8 + 0x40
    b <<= 1
    if (temp >= test)
        temp -= test
        set low bit of b
    temp += high 2 bits of a
    a <<= 2
return b

The tricky part is figuring out how many bits are in each variable and allocating the z80 registers accordingly.
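Here is a Python sketch of the first square root variant, for experimenting with different widths. The name isqrt_bits and the bits parameter are made up, and the sketch writes out the shift of temp by two bits that the line "temp += high 2 bits of a" leaves implicit.

```python
def isqrt_bits(a, bits=32):
    """Digit-by-digit integer square root, consuming a two bits at a time."""
    mask = (1 << bits) - 1
    a &= mask
    temp = 0
    b = 0
    for _ in range(bits // 2):                        # one pass per 2 bits of a
        temp = (temp << 2) | ((a >> (bits - 2)) & 3)  # bring down high 2 bits of a
        a = (a << 2) & mask
        test = (b << 2) + 1
        b <<= 1
        if temp >= test:
            temp -= test
            b |= 1                                    # set low bit of b
    return b
```

With bits=32 this loops 16 times and b needs 16 bits, the same sizing as the 4-byte case described earlier.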
For p_SortD, affecting de doesn't matter because if you follow the code path, the next occurrence of de is when it is loaded from hl, so its contents don't matter. As for p_Mod, ac is the division result, which is never needed. You can also notice that in the original routine, the new bits shifted into ac are never read.
Actually, my routine only uses about 3 virtual registers. The only reason it uses so many z80 registers is because the virtual registers are so big. In fact, the first routine uses the same number of bits as yours would (don't forget that A and B in your routine would each be 48 bits wide). However, your routine does have the advantage of using similarly sized registers (and fewer iterations), so it probably would be more useful on other processors.
A*^B seems to be compiling to
    ld hl,A
    ld de,B
    call sub_Mul
    ld h,c
    ld l,a
which always returns zero because it calls sub_Mul instead of sub_MulFull. This happens with and without peephole opts.
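For reference, here is what I would expect the result to be, as a small Python sketch. It assumes A*^B is meant to give the high 16 bits of the full 32-bit product, which is what copying c and a into hl after a full multiply (result in cahl) would produce; the function names are invented.

```python
def mul_low(a, b):
    """Low 16 bits of the product: all that a 16-bit multiply keeps."""
    return (a * b) & 0xFFFF


def mul_high(a, b):
    """High 16 bits of the 32-bit product (assumed meaning of A*^B)."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) >> 16
```

For example, mul_high(0x8000, 2) is 1, while mul_low(0x8000, 2) is 0.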
Almost. CPIR returns an address, whereas this routine returns an index. Also, this routine is much easier to extend to arrays of words or objects (but I'm not saying that access to CPIR through an axe command wouldn't be cool).
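To illustrate the index versus address distinction, here is a hypothetical Python sketch. It is not the routine being discussed: find_index, index_to_address, the -1 "not found" value, and the base address used below are all assumptions for the example.

```python
def find_index(data, value):
    """Linear search that returns the position of value, or -1 if absent."""
    for i, item in enumerate(data):
        if item == value:
            return i
    return -1


def index_to_address(base, index, item_size=1):
    """Convert an index into an address, given the array base and element size."""
    return base + index * item_size
```

The index stays the same whether the elements are bytes or words; only the conversion to an address changes, for example index_to_address(0x9000, 1, item_size=2) is 0x9002. A byte-wise search that hands back an address does not generalize to wider elements as easily.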