Messages - Xeda112358

451
KnightOS / Re: KnightOS
« on: April 04, 2015, 10:01:38 am »
I feel bad for not being more active in the project. I have been writing the floating point routines (significantly higher precision than the TI-OS, by the way) and while they are working, I've been stuck on converting floats to displayable strings.

452
TI Z80 / Fire (revisited)
« on: April 03, 2015, 12:40:21 pm »
A few years ago when Builderboy wrote his fire graphics tutorial, I remember thinking it was super cool and I went on to make my own fire engines designed around it. However, one thing always peeved me: the design was very clever and made for speed, but it seemed too artificial for my liking, particularly in the way it masked out pixels.

The idea was to kill a pixel 1/8 of the time, so a PRNG was used to select from one of the following masks:
Code: [Select]
%01111111
%10111111
%11011111
%11101111
%11110111
%11111011
%11111101
%11111110
Of course an AND mask will kill one bit in a byte, so exactly 1/8 of the pixels in a byte. A few weeks ago, this was bugging me and preventing my sleep. I set out to devise an algorithm that would generate masks that didn't necessarily have a 1/8 kill rate, but over time the probability converges to 1/8. For example, one mask might be %11111111 (0% killed) and the next could be %10110111 (25% killed).

The Idea
Suppose you take 3 random bits and OR them together. The only way this can result in a 0 is if every input was a 0. The probability of that happening is .5^3 = .125 = 1/8. So for our fire mask, we can generate three random bytes, OR them together, and each bit has a 1/8 chance of being reset.
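To sanity-check the 1/8 claim off-calc, here is a minimal Python sketch. The function names are mine, and Python's `random` module stands in for the calculator PRNG, so treat it as an illustration of the idea rather than the routine used in the fire engine.

```python
import random
from itertools import product

def kill_probability():
    """Exact chance that OR-ing three fair bits gives 0."""
    outcomes = [a | b | c for a, b, c in product((0, 1), repeat=3)]
    return outcomes.count(0) / len(outcomes)

def fire_mask(rng):
    """OR three random bytes; each bit of the result is 0 with probability 1/8."""
    return rng.getrandbits(8) | rng.getrandbits(8) | rng.getrandbits(8)

def measured_kill_rate(samples=10000, seed=1):
    """Fraction of reset bits over many masks (hypothetical helper)."""
    rng = random.Random(seed)
    zero_bits = sum(8 - bin(fire_mask(rng)).count("1") for _ in range(samples))
    return zero_bits / (8 * samples)
```

Any single mask can kill anywhere from 0 to 8 pixels, but the measured rate over many masks should sit close to 0.125.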
Speed
The disadvantage here is speed. Builderboy's method requires one pseudo-random number; this method requires three. However, if we consider that we are almost certainly going to use a fast pseudo-random number generator, and that we will want a long(ish) period, we have room to take advantage of the PRNG and achieve almost the same speed. For example, suppose you generate a 24-bit pseudo-random number. With this method, you can just OR the three generated bytes together (12cc) versus using an LUT (Builder's method):
Code: [Select]
ld hl,LUT
and 7          ;A = index 0..7 (A holds the random byte)
add a,l
ld l,a
jr nc,$+3
inc h          ;a carry out of L only needs to bump H
ld a,(hl)
;43cc
In the example code I will give, I use a 16-bit PRNG, so I generate three of these numbers (6 8-bit values) for 2 masks, making generation take 1.5 times longer than Builder's as opposed to 3 times as long.
Considerations
In order for this to work, you need an RNG or PRNG in which every bit has a 50% chance of being set or reset. I went with a Lehmer LCG that was fast to compute, had a period of 2^16, and had this property.
Example
The following code works at 6MHz or 15MHz and the LCD will provide almost no bottleneck. Sorry, no screenie:
Code: [Select]
smc = 1    ;1 for SMC, 0 for no SMC (use 1 if code is in RAM; it's faster)
plotSScreen = 9340h
saveSScreen = 86ECh


;==============================
#IF smc = 0
seed = saveSScreen
#ENDIF
;==============================



.db $BB,$6D
.org $9D95
fire:
;;first, set up. We will be writing bytes to the LCD left to right then down
    di
    ld a,7      ;LCD y-increment
    out (16),a
;;setup the keyboard port to read keys [Enter]...[Clear]
    ld a,%11111101
    out (1),a
;make the bottom row of pixels black to feed the flames
    ld hl,plotSScreen+756
    ld bc,$0CFF
    ld (hl),c \ inc hl \ djnz $-2
fireloopmain:
    ld ix,plotSScreen+12
    in a,(1)
    and %01000000
    ret z
    call lcg
    ld b,63
fireloop:
;wait for LCD delay
    in a,(16)
    rla
    jr c,$-3
    ld a,80h+63
    sub b
    out (16),a
    push bc
    call lcg
    ld a,20h
    out (16),a
    call fire_2bytes+3
    call fire_2bytes
    call fire_2bytes
    call fire_2bytes
    call fire_2bytes
    call fire_2bytes
    pop bc
    djnz fireloop
    jp fireloopmain   
fire_2bytes:
    call lcg
    push hl
    call lcg
    pop de
    ld a,e
    or d
    or l
    and (ix)
    out (17),a
    ld (ix-12),a
    inc ix
    push hl
    call lcg
    pop af
    or h
    or l
    and (ix)
    out (17),a
    ld (ix-12),a
    inc ix
    ret
lcg:
;240cc or 241cc, condition dependent (-6cc using SMC)
;;uses the Lehmer RNG used by the Sinclair ZX81
#IF smc = 1
seed = $+1
    ld hl,1
#ELSE
    ld hl,(seed)
#ENDIF
;multiply by 75
    ld c,l
    ld b,h
    xor a
    ld d,a
    adc hl,hl
    jr nz,$+9
    ld hl,-74
    ld (seed),hl
    ret
    rla
    add hl,hl \ rla
    add hl,hl \ rla \ add hl,bc \ adc a,d
    add hl,hl \ rla
    add hl,hl \ rla \ add hl,bc \ adc a,d
    add hl,hl \ rla \ add hl,bc \ adc a,d
;mod by 2^16 + 1 (a prime)
;current form is A*2^16+HL
;need:
;  (A*2^16+HL) mod (2^16+1)
;add 0 as +1-1
;  (A*(2^16+1-1)+HL) mod (2^16+1)
;distribute
;  (A*(2^16+1)-A+HL) mod (2^16+1)
;A*(2^16+1) mod 2^16+1 = 0, so remove
;  (-A+HL) mod (2^16+1)
;Oh hey, that's easy! :P
;I use this trick everywhere, you should, too.
    ld e,a
    sbc hl,de       ;No need to reset the c flag since it is already reset
    jr nc,$+3
    inc hl
    ld (seed),hl
    ret
EDIT: .gif attached. It looks slower in the screenshot than on my actual calc, though.
EDIT2: On my actual calc, it is roughly 17.2FPS
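For anyone who wants to check the LCG arithmetic without a calculator, here is a small Python model of it. It is a sketch of the math only (multiply by 75, reduce mod 2^16+1 using the "subtract the upper byte" trick from the comments, with a stored 0 standing for 65536); it is not a cycle-accurate emulation of the routine, and the names are mine.

```python
MODULUS = 65537      # 2^16 + 1, prime
MULTIPLIER = 75

def lehmer_next(seed16):
    """One step of the generator; a stored 0 stands for the value 65536."""
    x = seed16 or 65536
    product = x * MULTIPLIER
    a, hl = product >> 16, product & 0xFFFF   # product = a*2^16 + hl
    r = hl - a                                # because 2^16 = -1 (mod 2^16+1)
    if r < 0:
        r += MODULUS
    return r & 0xFFFF                         # 65536 is stored as 0

def period(seed16=1):
    """Count steps until the state returns to the starting seed."""
    x, steps = lehmer_next(seed16), 1
    while x != seed16:
        x = lehmer_next(x)
        steps += 1
    return steps
```

Under this model `lehmer_next(0)` gives 65462, which is the same 16-bit pattern as the -74 loaded in the zero case of the assembly, and the cycle length starting from 1 comes out as 2^16.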

453
TI Z80 / FastGK
« on: March 31, 2015, 02:46:38 pm »
I made this program to remove the delay in the BASIC getKey function! It's simple and ugly and I want to improve it at some point. For now, I just wanted a small code.

Basically, repeating keys like arrows and delete will repeat a lot faster, and not just on the homescreen like one of my older programs (Speedy Keys).

454
Math and Science / Computing Arcsine
« on: March 30, 2015, 01:48:30 pm »
I'm really peeved that I cannot seem to find an algorithm for computing arcsine that works like what I have. This algorithm is based on my work from years ago to compute sine and I have no idea why I never reversed it before now. Anyways, the algorithm:
ArcSine(x), ##x\in[-1,1]##
Code: [Select]
    x=2x
    s=z=sign(x)
iterate n times
    x=x*x-2
    if x<0
        s=-s
    z=2z+s
return pi*z*2^(-n-2)
This algorithm extracts one bit each iteration, so for 16 bits of accuracy, you would iterate 16 times. I think it is a pretty cute and compact algorithm, so where is it? @_@

As a note, at the endpoints {-1,1} it may not give the expected answer. In both cases, if allowed to run infinitely, it would return (in binary): ##\frac{\pi}{2}.1111111111111..._{2}## which of course is equivalent to ##\frac{\pi}{2}##.
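Here is a direct Python transcription of the pseudocode above, in case anyone wants to play with it. The function name and the use of floats are my assumptions; with doubles, rounding error roughly doubles each iteration, so this sketch is only meant for modest n (16 or so).

```python
import math

def arcsine_bits(x, n=16):
    """Approximate asin(x) for x in [-1, 1], one binary digit per iteration."""
    x = 2.0 * x
    s = z = (x > 0) - (x < 0)      # sign(x): -1, 0 or 1
    for _ in range(n):
        x = x * x - 2.0
        if x < 0:
            s = -s
        z = 2 * z + s              # append the next signed digit
    return math.pi * z * 2.0 ** (-n - 2)
```

For example, `arcsine_bits(0.5)` lands within about pi*2^-18 of pi/6, and `arcsine_bits(1.0)` comes out just under pi/2, matching the endpoint note.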

455
ASM / Re: eZ80 Optimized Routines
« on: March 18, 2015, 10:53:06 pm »
That "one in ten million" is when bc = 1? How does pipelining work for an instruction like ldir (or any of the other repeat instructions)?
Yup, it is when BC=1. Pipelining works for ldir by reading the first byte of the instruction (1cc) then the next (1cc), then I am assuming each iteration does the RAM copy (2cc to read byte, then write), increment/decrement stuff (1cc). I assume looping comes at no additional cost.

456
Axe / Re: VAT tutorial
« on: March 16, 2015, 11:09:57 am »
I made a program for changing a variable's name (in RAM, as @Runer112 pointed out). Essentially, I made a new variable of zero size with the name I wanted. Then I just swapped all the VAT info except the names, and deleted the original.

This is probably the easiest way to do it, though somewhat slow and it requires up to 17 bytes of free RAM available.

457
Miscellaneous / Google Code shutting down
« on: March 13, 2015, 01:00:16 pm »
I got this email from Google:
Quote
Hello,

Earlier today, Google announced we will be turning down Google Code Project Hosting. The service started in 2006 with the goal of providing a scalable and reliable way of hosting open source projects. Since that time, millions of people have contributed to open source projects hosted on the site.

But a lot has changed since 2006. In the past nine years, many other options for hosting open source projects have popped up, along with vibrant communities of developers. It’s time to recognize that Google Code’s mission to provide open source projects a home has been accomplished by others, such as GitHub and Bitbucket.

We will be shutting down Google Code over the coming months. Starting today, the site will no longer accept new projects, but will remain functionally unchanged until August 2015. After that, project data will be read-only. Early next year, the site will shut down, but project data will be available for download in an archive format.

As the owner of the following projects, you have several options for migrating your data.

scope-os
langz80
The simplest option would be to use the Google Code Exporter, a new tool that will allow you to export your projects directly to GitHub. Alternatively, we have documentation on how to migrate to other services — GitHub, Bitbucket, and SourceForge — manually.

For more information, please see the Google Open Source blog or contact [email protected].

-The Google Code team

Google Inc. 1600 Amphitheatre Parkway, Mountain View, CA 94043
You have received this mandatory email service announcement to update you about important changes to Google Code Project Hosting.
I found Google Code much easier to use, but I see their point about there being enough services already.

458
News / Re: New user mentions
« on: March 13, 2015, 12:53:47 pm »
So does that mean I can do something like @/me ? Or does it parse the @ first?

EDIT: Whelp, I found out :P @Xeda112358

459
ASM / Re: ASM Optimized routines
« on: March 02, 2015, 06:13:50 am »
Not sure if this is useful for any actual applications, but to get a mask for the lowest set bit in C:
Code: [Select]
    xor a
    sub c
    and c
So if c=%10110100, this would return a=%00000100 (it also always returns nc, and z only when c=0).
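The same trick modelled in Python, with 8-bit wraparound made explicit (the function name is mine):

```python
def lowest_set_bit(c):
    """Mask of the lowest set bit of an 8-bit value: (0 - c) AND c."""
    a = (0 - c) & 0xFF   # xor a / sub c : A = -C modulo 256
    return a & c         # and c
```

`lowest_set_bit(0b10110100)` gives `0b00000100`, and an input of 0 gives 0.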

460
ASM / Re: eZ80 Optimized Routines
« on: March 01, 2015, 10:27:22 pm »
For those who don't care to check timings and stuff, there was a trick on the Z80 for copying data in RAM faster than LDIR. The premise was to unroll the loop as LDI \ LDI \ ... \ LDI \ jp pe,ldi_loop
LDI is 16cc, the jump is 10cc, and LDIR is 21cc for each byte, minus 5 (the last copied byte is 16cc). So unrolling to 4 LDI instructions saves 10cc for every 4 bytes, and unrolling 8, 12, or 16 times saves 30, 50, or 70cc for each chunk.

For the eZ80, this trick does not work. LDIR takes 3cc for each byte copied, plus 2cc for the last one, whereas LDI takes 5cc. This means LDIR asymptotically takes 40% fewer cycles than any unrolled LDI loop, and in the very worst case (BC=1, fewer than 1 in ten million opportunities) it is exactly the same speed.
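If you would rather see the numbers than take my word for it, here is the cycle arithmetic as a small Python sketch. The function names are mine, the formulas just restate the timings quoted above, and the eZ80 LDI figure ignores the loop jump entirely (so it flatters the unrolled loop).

```python
import math

def z80_ldir(bc):
    """Z80 LDIR: 21cc per byte, minus 5 for the last one."""
    return 21 * bc - 5

def z80_unrolled_ldi(bc, n):
    """Z80 loop of n LDIs plus a 10cc jump per pass."""
    return 16 * bc + 10 * math.ceil(bc / n)

def z80_saving_per_chunk(n):
    """Cycles saved on each full chunk of n bytes: 21n - (16n + 10)."""
    return 5 * n - 10

def ez80_ldir(bc):
    """eZ80 LDIR: 3cc per byte, plus 2cc for the last one."""
    return 3 * bc + 2

def ez80_ldi_only(bc):
    """eZ80 LDIs alone at 5cc each, with no jump overhead counted."""
    return 5 * bc
```

For n = 4, 8, 12, 16 the Z80 saving per chunk is 10, 30, 50, 70cc, while on the eZ80 the two costs tie at BC=1 and the ratio heads toward 3/5 as BC grows.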




TL;DR: On the eZ80, always use LDIR to copy chunks of data.

461
ASM / Re: ASM Optimized routines
« on: March 01, 2015, 10:14:07 pm »
This routine is a fast way to copy a chunk of data. For BC>34, calling this proves to be faster than using LDIR:
Code: [Select]
fastLDIR:
;copy BC bytes from HL to DE
;Cost:
;    27cc for having to call
;    110cc for setting up the loop, worst case
;    10cc * ceiling(BC/n)        ;n=2^k for some k, see the line below "ldirloop:"
;    16cc * BC
;costs roughly 152-BC*(5-10/n) more than a simple LDIR (worst case)
;for n=4,  BC>=61 saves
;for n=8,  BC>=41 saves
;for n=16, BC>=35 saves   * default, see the "ldirloop" to change
;for n=32, BC>=33 saves
;for n=64, BC>=32 saves
    push hl
    push af
    xor a
    sub c
    and 15               ;change to n-1
    add a,a
    ld hl,ldirloop
    add a,l
    ld l,a
    jr nc,$+3  ;these aren't needed if the ldirloop doesn't cross a 256 byte boundary. Can save 12cc on the above timings and 3 bytes.
    inc h       ;
    pop af
    ex (sp),hl
    ret
ldirloop:
;n=16, (number of LDI instructions, use qty of 4,8,16,32,64)
    ldi
    ldi
    ldi
    ldi
   
    ldi
    ldi
    ldi
    ldi
   
    ldi
    ldi
    ldi
    ldi
   
    ldi
    ldi
    ldi
_ldirloop_end:
    ldi
    jp pe,ldirloop
    ret
This might be useful for things like copying code to RAM (from an App) in speed critical applications, or other data handling tasks.

EDIT 9 Oct 18: Saved 1 byte and 3cc. Originally, I had 'ld a,16 \ sub c \ and 15'. The 'ld a,16' could hold any multiple of n (16 in this example), including 0, so I just used 'xor a'. I also updated all the timing info.
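To double-check the break-even table in the comments, here is a short Python sketch that uses the same worst-case estimate (152-BC*(5-10/n)) and also models where the computed jump enters the LDI block. Function names are mine; the entry offset assumes each LDI is 2 bytes, as on the Z80.

```python
def extra_cost(bc, n=16):
    """Worst-case estimate of fastLDIR cost minus plain LDIR cost, in cc."""
    return 152 - bc * (5 - 10 / n)

def break_even(n=16):
    """Smallest BC for which the estimate says fastLDIR wins."""
    bc = 1
    while extra_cost(bc, n) >= 0:
        bc += 1
    return bc

def entry_offset(bc, n=16):
    """Byte offset into ldirloop from xor a / sub c / and n-1 / add a,a."""
    return ((-bc) & (n - 1)) * 2
```

With n=16, `entry_offset(5)` is 22, so the first pass skips 11 LDIs and runs the remaining 5; after that every pass copies a full 16 bytes.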

462
Reuben Quest / Re: Reuben 3 teasers
« on: February 19, 2015, 08:49:05 pm »
Sorunome :D

Someday you should use one of my engines since you would actually make good use of it XD


EDIT: I mention this because I use magic to get the smooth-scrolling background to work without breaking the player sprite.

463
How would a decimal computer be faster? Binary is definitely faster for most math algorithms. The projector idea seems more like you want a non-discrete display which would be cool. I've personally thought of how cool it would be to make a screen that instead of plotting pixels, it would plot basic geometric shapes.

Also, a side note: decimal is no better than binary. Humans started to count in decimal because it was natural to use a 1-1 map of each digit (finger digit, not numerical digit) to objects. It has been passed down generation to generation, but had you learned to count in binary instead, binary would be easier as there are a lot more tricks you can do with it. By the way, you can count to 1023 in binary using your digits, whereas the 10-state system our ancestors have passed down only lets us count to 10.

464
TI Z80 / Re: Vectors and Sprite Scaling
« on: February 11, 2015, 09:17:17 am »
For the sprite scaling routine, I am currently working on writing my own version inspired by Bajda's to suit your purposes (masked, grayscale, scalable sprites). I found a bunch of optimizations, and it was just easier to start from scratch, especially since I needed to make it draw 3 layers of sprites.

465
ASM / Re: eZ80 Optimized Routines
« on: February 09, 2015, 10:33:25 am »
So far today I have added
hl*a/255, which can be useful for things like dividing by any divisor of 255 (like 3, 5, 15, 17, 51, 85) as well as other values!
sqrt24, a 24-bit integer square root, which can be useful for an 8.8 fixed point square root (as 12 bits of precision are needed).
div16 is the 16-bit division. It is 145cc versus the 56cc worst case for mul16. The eZ80 will have a big advantage with multiplication over division.
