Author Topic: How Axe works for the Assembly adverse (Read 2988 times)

Freyaday · « **on:** April 23, 2011, 09:40:07 pm »

Axe works in mysterious ways, and some of them lead to optimizations that don't make sense unless it's known what's going on.
First off, whenever Axe encounters a variable, it puts that variable into HL, no matter what. Having to do ld HL varVAR every time is slow, so use constants whenever possible.
-> stores whatever is in HL into the variable that follows it.
Math operators put the preceding value, be it a var or a constant, into HL and perform the operation on HL using the following value.
This is why you put constants at the end of the maths when possible (you have been doing that, right?), so that 1 becomes 1 in the code instead of ld HL 1.

PS: Am I doing this right?

Munchor · « **Reply #1 on:** April 24, 2011, 07:53:57 am »

I think this post would be in a better place here or here.

Nevertheless, I wouldn't be surprised if Quigibo used register a for Math.

Freyaday · « **Reply #2 on:** April 24, 2011, 09:53:10 am »

The second place would not be a good place, seeing as how this is a non-ASM explanation.

Also, if anyone else can contribute, that'd be great, especially seeing as how what's in the first post is all I've got so far.

Deep Toaster · « **Reply #3 on:** April 24, 2011, 12:33:46 pm »

Sorry if this makes your post look redundant, but Runer had a nice explanation here:

Quote from: Runer112 on January 03, 2011, 11:07:14 pm

Science lesson! YAY (By the way I completely understand if you don't want to read this whole thing, it's a bit lengthy. I'll put the important parts in bold.)

All Axe operations rely on using the register pair hl as the "Ans" value, using it to hold the running value of calculations. For those who aren't fully familiar with z80 assembly, hl is a combination of h and l, two 8-bit (1-byte) "registers." Registers are like variables you might store in memory, but they're stored directly inside the processor, so they can be used quickly. The basic registers (a, b, c, d, e, h, and l) are all 8-bit values, so most commands were built to work with these 8-bit registers, hence the z80 being an 8-bit system. However, Zilog knew that 8 bits was a little restrictive, and especially for systems that would have more than 256 bytes of memory, being able to easily use and manipulate 16-bit values (like pointers) would be very helpful. Did you notice that the other 5 basic registers go in alphabetical order before randomly jumping to h and l? Well that's because h and l are two very special 8-bit registers, designed to easily be combined into the Higher and Lower halves of a 16-bit value. With this special designation come very useful 16-bit operations built-in.

*2, for instance, simply breaks down to one assembly instruction: add hl,hl. This simply adds the value of hl to itself, which in other words multiplies it by 2. Because Zilog knew a basic function like adding would be a core operation, they made sure to make it small and swift: 1 byte to call and 11 cycles to execute.

*256 is a multiplication by 2^8, so one could achieve this by adding hl to itself 8 times. But there's an easier way to think of this. Just like how multiplication by 10 in a decimal system shifts every digit left one place, multiplication by 2 in a binary system shifts every digit left one place as well. And because hl is a 16-bit value with the high 8 bits in h and the low 8 bits in l, shifting these bits left 8 places would just result in the value in l being shifted all the way out and into h, and 8 trailing zeros being shifted into l. So instead of using add hl,hl eight times, *256 uses the following instructions: ld h,l / ld l,0. The first costs 1 byte and 4 cycles and the second costs 2 bytes and 7 cycles, for a grand total of 3 bytes and 11 cycles. Just as fast as *2!
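
For anyone who would rather see the *2 and *256 tricks without the assembly, here is a minimal Python sketch that models hl as a 16-bit number. The function names are made up for illustration and are not anything Axe itself uses; the sketch only checks that moving l into h and zeroing l gives the same answer as doubling eight times.

```python
MASK16 = 0xFFFF  # hl is 16 bits wide, so anything above bit 15 is discarded


def times2(hl):
    """Model of `add hl,hl`: add hl to itself and keep the low 16 bits."""
    return (hl + hl) & MASK16


def times256_by_adding(hl):
    """The slow way: double hl eight times."""
    for _ in range(8):
        hl = times2(hl)
    return hl


def times256_by_moving(hl):
    """Model of `ld h,l` followed by `ld l,0`."""
    l = hl & 0xFF   # current low byte
    h = l           # ld h,l
    l = 0           # ld l,0
    return (h << 8) | l


# Both approaches agree, including when the old high byte is shifted out.
for value in (0, 1, 0x12, 0x1234, 0xFFFF):
    assert times256_by_adding(value) == times256_by_moving(value)

print(hex(times256_by_moving(0x1234)))  # 0x3400
```

Note that the old contents of h simply vanish in both versions, which is exactly what a 16-bit multiply by 256 is supposed to do.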

*128, however, isn't so easy. Again, the obvious approach is to add hl to itself 7 times. This would cost 7 bytes and 77 cycles. You may also think to use the previous technique to multiply hl by 256, and then divide it by 2. However, we have a problem (pun not intended). If we multiplied the value by 256 and then divided it by 2, we would have lost the highest bit from the multiplication by 256 before dividing it by 2 again! So that's a bit of a problem. Anyways, Axe is optimized for size, and we can do better: using a loop. And although the z80 is a pretty old system, they were nice enough to give us a built-in loop structure: ld b,7 / add hl,hl / djnz $-1. This loads 7 into the b register, adds hl to hl, and then repeats adding hl to hl 6 more (b-1) times. Although this is a good amount slower than either of the previous options, coming in at 170 cycles, it only takes 2 bytes to initialize the loop counter, 1 byte for the add instruction, and 2 bytes for the loop execution instruction, for a total of 5 bytes.
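
The *128 reasoning can be checked the same way. The Python sketch below is only a model under stated assumptions: the function names are hypothetical, and the per-instruction cycle counts for ld b,n and djnz (7, and 13 taken / 8 not taken) are the standard Z80 timings rather than figures given in this post. It shows the bit that goes missing with "multiply by 256, then halve", and tallies the loop's cost.

```python
MASK16 = 0xFFFF


def times128_loop(hl):
    """Model of `ld b,7 / add hl,hl / djnz`: seven 16-bit doublings."""
    b = 7                          # ld b,7
    while True:
        hl = (hl + hl) & MASK16    # add hl,hl
        b -= 1                     # djnz decrements b ...
        if b == 0:                 # ... and falls through once b hits 0
            break
    return hl


def times256_then_halve(hl):
    """The tempting shortcut: `ld h,l / ld l,0`, then divide by 2."""
    hl = (hl & 0xFF) << 8          # old h is thrown away here
    return hl >> 1


# Assumed standard Z80 timings, in cycles.
LD_B_N = 7
ADD_HL_HL = 11
DJNZ_TAKEN = 13
DJNZ_NOT_TAKEN = 8


def loop_cycles(n):
    """Cycles for `ld b,n` plus n passes through `add hl,hl / djnz`."""
    return LD_B_N + n * ADD_HL_HL + (n - 1) * DJNZ_TAKEN + DJNZ_NOT_TAKEN


print(hex(times128_loop(0x0101)))        # 0x8080
print(hex(times256_then_halve(0x0101)))  # 0x80
print(loop_cycles(7))                    # 170
```

With hl = 0x0101, the lowest bit of h should end up as the top bit of the result, but the shortcut has already discarded it by the time it halves.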

Sorry to bore you... But congratulations, you now all know at least a little bit about z80 assembly, the structure of compiled Axe code! And the more you know about Axe's internals, the more you can optimize it, whether it be for speed or size!

EDIT: Wow, it took me a whole hour to write this? Major ninja'd.

Freyaday · « **Reply #4 on:** April 24, 2011, 12:42:52 pm »

This is intended to be free of Asm, though, an explanation for those of us who, like me, find Asm scary.
