I haven't had much time to be active lately, but I thought I would post an update on my long-term programming goals. I am currently rewriting FileSyst with what I see as a more intelligent design. I decided it would be beneficial to store files and folders in alphabetical order: it is more aesthetically pleasing and no more difficult than the current method. Currently, I append new files to the end of the folder data; since the working directory is stored as an array of addresses, those values must then be updated to reflect the changed addresses, and each folder's size bytes in the path need to be updated as well. Storing alphabetically, this would still be the case, but I am also going to use an index.
Using the index, I can search for files more quickly. Instead of comparing the name of the files, reading through the header to get to the size bytes, and skipping the file to move to the next (which is already quick enough and similar to how TI searches the VAT), I can be more elegant and use a binary search. This means I start halfway into the index, compare names, and then jump another halved distance either up or down in the array until the halved distance becomes 0. To put this into perspective, if a folder had 128 files, instead of checking through an average of 64 files to locate one, I would check at most 7 files before locating the correct one.
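In Python-like terms, the lookup over the sorted index is just a standard binary search. This is an illustrative sketch, not FileSyst's actual z80 routine:

```python
# Binary search over a sorted index of file names (illustrative sketch).
# Returns the entry's position in the index, or None if absent.
def find_file(index, name):
    lo, hi = 0, len(index) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if index[mid] == name:
            return mid
        if index[mid] < name:
            lo = mid + 1   # search the upper half
        else:
            hi = mid - 1   # search the lower half
    return None
```

Each iteration halves the remaining range, which is where the "at most 7 checks for 128 files" figure comes from.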
I have already written this search routine and it is functional.
If the file doesn't exist, I create it by inserting a two-byte address into the index, then inserting the necessary RAM, then adjusting all of the appropriate index values in every folder in the path and adjusting the size bytes of the folders. (This is only marginally better than the old method, speed-wise.)
As well, folder paths can be stored by index number, so the addition or deletion of a file will only require 1 byte to be incremented or decremented in the working directory path.

I have been continuing my work on writing Z80 floating point routines to add to my collection of integer math routines. Currently, the 24-bit floats are the most complete, including addition, subtraction, multiplication, division, square roots, logarithms, and arctangent. The 80-bit floats, which are on par with the TI-OS floating point numbers (except with about 5 digits of extended accuracy), have only the basic four functions (+-*/).

I am also aware that Grammer 3 should be a powerful programming language and that Grammer 4 should provide a replacement operating system. FileSyst already contains a pseudo-language for command line functions such as CD(), OPEN(), et cetera. However, it currently does not allow nested functions or the like. I am in the process of rewriting this parser to allow complicated nesting of functions, and I redesigned the file system layout with the intent to add variable support. I want to take advantage of the file system to store local and global variables, but I want to make it fast, and that is why the binary search method sounded like the best option to me.

So now imagine that you have a program running called Test.fs and you create a variable. Test.fs will be treated like a folder, too, and the variable will be stored inside it. Now say you wanted to call a subroutine called Func(). Internally, Func will be treated as another program/folder inside Test.fs where it can have its own variables. If it tries to access a variable that doesn't exist in its folder, it will go up a level in the folder system to see if the var exists there, and so on.
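The lookup rule for nested program/folders can be sketched like this. This is purely illustrative Python; `Scope` and `lookup` are made-up names, not FileSyst internals:

```python
# Sketch of the variable-lookup rule described above: each running
# program is a "folder" that may contain variables; a miss walks up
# the folder chain toward the root.
class Scope:
    def __init__(self, parent=None):
        self.parent = parent
        self.vars = {}

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent   # not here; go up one folder level
        raise KeyError(name)

test_fs = Scope()              # Test.fs, treated as a folder
test_fs.vars["X"] = 42
func = Scope(parent=test_fs)   # Func(), a program/folder inside Test.fs
func.vars["Y"] = 7
```

Here `func.lookup("X")` misses in Func's own folder and finds X one level up in Test.fs, exactly the walk described above.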
My hope is that this proves to be at least on par with the speed of TI-BASIC, if not faster. I imagine this could turn into Grammer 3 or Grammer 4, too.
EDIT: To understand the method, see this document. For simple formulas:
arctan(x) ≈ (9x+2x^3)/(9+5x^2)  (at least 10 bits on [-1,1]; use arctan(x) = pi/2 - arctan(1/x), with the sign flipped for negative x, to extend to the whole real line)
ln(x+1) ≈ x(8+x)/(8+5x)  (at least 9 bits on [0,1])
A while ago, I mentioned a nice formula that I derived for estimating arctangent to within 10 bits of accuracy on [-1,1] (so within 1/1024 of the actual arctan function). I really like it because it is so simple and short, and requires few math operations for such accuracy: (9x+2x^3)/(9+5x^2)
You can get away with using the following operations:
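Whatever mix of operations you use, the accuracy claim itself is easy to check numerically. This is just an illustrative Python script, not part of the original routine:

```python
import math

# Numerical check of the claimed 10-bit accuracy of
# (9x + 2x^3)/(9 + 5x^2) against math.atan on [-1, 1].
def atan_approx(x):
    return (9 * x + 2 * x ** 3) / (9 + 5 * x ** 2)

# Sample the interval densely and record the worst absolute error.
worst = max(abs(atan_approx(k / 1000) - math.atan(k / 1000))
            for k in range(-1000, 1001))
```

The worst error lands around x ≈ 0.75 at roughly 0.00065, comfortably inside the 1/1024 ≈ 0.00098 bound.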
I was very excited to have a fairly fast 16-bit square root routine (300 cycles faster than my previous best), but after testing it, it appears to have problems for some large inputs. The first issue starts just past the 14-bit range, so I imagine it is overflow. Looking at my routine, I foresaw the overflow issue in the last iteration, so I moved that iteration outside of the loop (while optimising it). I will be trying to fix this, but I thought there might be a chance somebody else spies the problem first.
sqrtHL:
;input is HL
;output is A
;734 t-states worst case
;39 bytes
ld bc,700h
ld d,c
ld e,c
sqrt16loop:
add hl,hl \ rl e
add hl,hl \ rl e
rl c
ld a,c
rla
sub e
jr nc,$+5
inc c
cpl
ld e,a
djnz sqrt16loop
ld l,c
ld a,l
add hl,hl \ rl e \ rl d
add hl,hl \ rl e \ rl d
sbc hl,de
rla
ret
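For reference while hunting the bug, here is a Python model of the standard digit-by-digit square root that this kind of routine implements. It is my own illustrative sketch, not a translation of the z80 code, but comparing the assembly's output against it for inputs above the 14-bit range should pinpoint where the overflow creeps in:

```python
# Digit-by-digit (shift-and-subtract) integer square root for 16-bit
# inputs: processes the input two bits at a time, building the root
# one bit per iteration, exactly the structure of the z80 loop.
def isqrt16(n):
    assert 0 <= n < 1 << 16
    root, rem = 0, 0
    for shift in range(14, -2, -2):       # 8 iterations, top pair first
        rem = (rem << 2) | ((n >> shift) & 3)   # bring down two bits
        root <<= 1
        trial = (root << 1) + 1           # candidate subtrahend 2*root+1
        if trial <= rem:
            rem -= trial
            root += 1                     # this bit of the root is 1
    return root
```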
I once wrote an algorithm that I originally generated from a fractal. It was a lot easier for me to come up with the rules for the fractal and then use the image for the algorithm than it was to solve the algorithm problem directly. However, yesterday, after not touching this problem in almost a year, it randomly popped into my head with a solution, so I have the algorithm below in TI-BASIC (since I had my calculator nearby, I tested it on that). What I am wondering is how to optimise it. I tried to generate it with a binary expansion because ultimately, that is what I will be using it with, but it got too messy without the fractal I was using as a crutch:
"A must be odd
While not(remainder(A,2
.5A→A
B-1→B
End
2A/2^B→A
"constraints on A and B will always cause this to be on [0,1)
B-2→B
".→Str1
While B
B-1→B
2A→A
int(A→R
A-Ans→A
If not(R
1-A→A
Str1+sub("01",R+1,1→Str1
End
Except for the first character, Str1 will be a sequence of 0s and 1s. Any ideas? It essentially maps an odd binary number to another binary number, and the map is 1-1.
;given a,b such that a/2^(b+1) is on [0,1) and a is odd
a/2^(b+1)→a
b-2→b
""→string
While b>0
b-1→b
2*a→a
string & str(int(a))→string  ;int(a) will either be 1 or 0, so append this to the string
if a<1
1-a→a
a-int(a)→a
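Here is the same routine modelled in Python (illustrative), using exact rationals so the repeated doublings never lose precision. It follows the TI-BASIC version above: double, take the integer part as the output bit, and reflect the remainder when the bit is 0:

```python
from fractions import Fraction

# Exact-arithmetic model of the TI-BASIC routine above. Given odd a
# with a/2^(b+1) on [0,1), returns the mapped bit string (the first
# character is the "." from Str1; the rest are 0s and 1s).
def bit_map(a, b):
    while a % 2 == 0:            # normalise, as the While/End header does
        a //= 2
        b -= 1
    x = Fraction(2 * a, 2 ** b)  # on [0,1) by the constraints on a and b
    b -= 2
    s = "."
    while b > 0:
        b -= 1
        x *= 2
        r = int(x)               # the next output bit, 0 or 1
        x -= r
        if not r:
            x = 1 - x            # the fractal's reflection step
        s += "01"[r]
    return s
```

For b = 4 the valid odd inputs are 1, 3, 5, 7, and they map to four distinct two-bit strings, consistent with the claim that the map is 1-1.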
This was one of the first assembly programs I ever made, and it was inspired by the HextToBin command of Celtic 3. It basically converts a string of hex digits into the corresponding binary data (represented as tokens in BASIC). I realised it might be easier for some programmers to use a tiny program like this versus an app.
EDIT: I'm still not good with GitHub, so I'm going to try to get this to work, but I make no promises: z80float. Hopefully, this can evolve into a collection of floating point routines, with this first post edited to keep them all in one easy place. I am working on some floating point routines for use in things like Grammer, and I hope this will some day make its way into a community-made OS, so I hope that people find these useful enough to use (and feel free to do so). I am still working on these routines and trying to optimise them, too.
edit: I removed the code because it was too much. Instead, I have made a public folder available here to keep these organised.
Tari, from Cemetech, posted a link to some routines received from Scott Dial for the BCD format (the same as the TI-OS uses). They can be found here.
ld hl,FP1
ld de,FP2
call FloatDiv

FP1:
.db 0,2, $c9,$0f,$da,$a2,$21,$68,$c2,$34 ;pi, not rounded up
FP2:
.db 0,2, $ad,$f8,$54,$58,$a2,$bb,$4a,$9A ;e, not rounded up
The 80-bit division and multiplication routines do not handle overflow yet. On the to-do list:
From these, routines such as those dealing with probability, statistics, and the like should be easier to make. The advantage of using BCD is for displaying numbers or reading them during parsing. With the 80-bit binary format, most non-integers (like .1) don't have an exact representation, and in order to parse a base-10 string of digits, you would have to do some very time-consuming processing (probably a hundred times slower than with BCD). However, for compiled languages and cases where text output isn't necessary, 80-bit floats offer a much higher range of numbers and greater accuracy, all at a decent speed.
I have noticed that for such areas of the site as the Tutorials section, when making a new tutorial, there isn't a way (that I can see) to preview it for formatting and whatnot. In the past, I have just opened another topic and hit the reply button so that I could preview what it looks like, but then I have to worry about accidentally posting it. With this in mind, here are my ideas:
-Adding a post preview button when writing a new tutorial.
-If this is not possible, maybe a place on the site that we can use for scratch work while designing a post that doesn't have a 'post' button (just a preview button).
Pokémon Amber is neither a clone nor a port of a Pokémon game. It takes place 100 million years in the past, before Fossil Pokémon went extinct. After doing much research into ancient Pokémon, I realised that we don't know the habitats of most Fossil Pokémon and that there is some ambiguity in their types. What was the Pokémon world like 100 million years ago? What other Pokémon existed that have not been discovered? How have they adapted through time? These questions and more will play a key role in the development of Pokémon Amber.
Note: This game is not endorsed by Nintendo or any other Pokémon-affiliated organisation. This is just a game being developed by a fan purely for the enjoyment of others. There will be some creative liberties taken. I thought I would start posting progress here instead of keeping it to my signature. As of yesterday, the following screenshot was up-to-date on showing progress:
As you can see, I have a scrolling, animated tilemap with proper clipping, and working text commands. As well, I wrote routines for easily making menus and drawing rectangles. Internally, there is a token set, so words that are used frequently in the game can be represented by 1 byte, and menus can be somewhat interactive.
I ended up completely rewriting the graphics today, after analysing a different approach. This made the tilemap routine take about 60% of the time previously required (so it is a lot faster) and all of the current drawing got a speed boost (as well as LCD updating). My plan is to keep it running at 6MHz and it is currently energy friendly, I hope (it makes heavy use of the halt instruction).
Things I need to do: I need to place a player sprite on the screen as well as NPCs and other objects. These are all handled by the event handler, so once I have that finished (probably a good hour or two of focused coding), then trees (or Vines, in this game), boulders, non-player characters, doorways, warp tiles, and all of that will be in working order.
I need to make a smooth-scrolled transition as the player walks. This will definitely slow things down, but that should be fine.
I have not started on battles, Pokémon, or Items yet.
This game will probably use a lot of RAM. I am trying to keep it all in non-user RAM, but that will not be possible with the storage system. Save data will be about 1000 bytes, plus however many Pokémon I allow in the PC (it is about 50 bytes for each Pokémon). So to use the PC system, I either need to keep all the boxes in RAM or only a few at a time (like how the first two generations required you to save every time you changed boxes; it was basically archiving the data and bringing a new box into RAM).
Sprites are going to be monochrome, 24x24, and I plan for 64 Pokémon total. This means 4608 bytes of sprite data, but they will all have additional information, bloating it to around 6400 bytes of Pokémon data. Because of this, I will probably have map data and most other event data stored in archived appvars, unless somebody thinks it is a better idea to keep it all in a multi-page app. If I keep it separate, it might be easier to create new games.
I have had this idea since my first attempt at an OS, but I ran into a few problems. Basically, I wanted to store the graph buffer in columns because I thought it would be very useful for drawing tiles and updating the LCD. Then I started thinking about line drawing, circle drawing, and anything that would cross a byte boundary and I realised that some routines would take a major hit to speed.
I am working on a project and I am still trying to decide if it would be beneficial to organise the screen in this manner. Here is an example of a tile drawing routine using the current buffer setup:
drawtile:
;DE points to the sprite data
;BC = (y,x)
;draw an 8x8 tile where X is on [0,11] and Y is on [0,7]
ld a,b
add a,a
add a,b
add a,a
add a,a
add a,a
ld h,0
ld b,h
ld l,a
add hl,hl
add hl,hl
add hl,bc
ld bc,(DrawBufPtr)
add hl,bc
ld bc,12
ld a,8
ex de,hl   ; 32
ldi        ;128
ex de,hl   ; 32
add hl,bc  ; 88
inc c      ; 32
dec a      ; 32
jr nz,$-7  ; 91
ret
And here is how it looks the other way:
drawtile:
;DE points to the sprite
;BC = (y,x)
;X*64
or a
ld a,c
ld l,0
rra
rr l
rra
rr l
ld h,a
;y*8
ld a,b
add a,a
add a,a
add a,a
add a,l
ld l,a
ld bc,(DrawBufPtr)
add hl,bc
ex de,hl
ld bc,8
ldir
ret
The former is 565 t-states and 33 bytes; the latter is 281 t-states and 28 bytes. There are ways to optimise both routines for speed.
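To make the address arithmetic concrete, here is a small Python model (illustrative, not the assembly) of the two tile-offset computations. In the usual row-major gbuf an 8x8 tile starts at 96*Y + X and its rows are 12 bytes apart; in the column-major layout it starts at 64*X + 8*Y and its 8 bytes are consecutive. Both tile the 768-byte buffer perfectly:

```python
# Tile offsets for the 96x64 monochrome screen (12 byte-columns, 8 tile rows).
def row_major_tile_offset(tile_x, tile_y):
    # each tile row is 8 pixel rows * 12 bytes = 96 bytes
    return 96 * tile_y + tile_x

def col_major_tile_offset(tile_x, tile_y):
    # each strip of 8 pixel columns is 64 consecutive bytes
    return 64 * tile_x + 8 * tile_y
```

The column-major version needs only one `ldir` because the tile's 8 bytes are contiguous, which is exactly why the second routine is so much shorter.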
Optimised:
Taking the first routine and replacing the sprite drawing code:
push hl \ pop ix ;IX points to where it gets drawn
;DE is the sprite layer
ld a,(de)    ; 7
inc de       ; 6
ld (ix),a    ;19
ld a,(de)    ; 7
inc de       ; 6
ld (ix+12),a ;19
ld a,(de)    ; 7
inc de       ; 6
ld (ix+24),a ;19
ld a,(de)    ; 7
inc de       ; 6
ld (ix+36),a ;19
ld a,(de)    ; 7
inc de       ; 6
ld (ix+48),a ;19
ld a,(de)    ; 7
inc de       ; 6
ld (ix+60),a ;19
ld a,(de)    ; 7
inc de       ; 6
ld (ix+72),a ;19
ld a,(de)    ; 7
ld (ix+84),a ;19
ret
Now it is 388 t-states (saving 187 cycles), but the cost is 25 bytes. The latter has a similar optimisation (unrolling). It saves only 45 t-states at the cost of 11 bytes by replacing the LD BC,8 \ LDIR with 8 LDI instructions.
If you have ever written your own LCD updating routine, you probably already realised just how straightforward this layout would make the routine (and if we had no LCD delay, it would amount to basically 12 iterations of ld b,64 \ otir). We typically don't need to optimise for speed with such an LCD update because most of the time, the code is waiting for the LCD to respond before moving to the next byte. However, if you are doing something like grayscale, or interleaving another routine with the LCD update (like drawing a tilemap at the same time), this gives you even more time to do more complicated things with the LCD, putting your 'waste cycles' to more use.
Sprite Drawing
The reason some drawing will be so easy is that each 8 columns of pixels is 64 bytes, which is much nicer to work with than a row of pixels being 12 bytes. We also see a huge boost in performance when moving down or up a pixel, because that only requires an increment or decrement of a pointer instead of adding 12 each time. However, now we get the same problem when moving left or right across byte boundaries. This means that sprite routines could take a hit, but let's see how far we can remedy this.
This will be a very simple routine to XOR an 8x8 sprite to the gbuf:
PutSprite8x8:
;Note: No clipping.
;Inputs:
;  BC = (x,y)
;  IX points to the sprite
;1871 worst-case
ld a,b
and $F8
ld h,0
rla \ rl h
rla \ rl h
rla \ rl h
ld l,a
ld a,b
ld b,0
add hl,bc
ld bc,9340h
add hl,bc
;HL points to the first byte to draw at
and 7
jr nz,crossedbound
push ix \ pop de
ld b,8
ld a,(de)
xor (hl)
ld (hl),a
inc hl
inc de
djnz $-5
ret
crossedbound:
ld b,a
dec a
ld (smc_jump1),a
ld (smc_jump2),a
ld a,1
rrca
djnz $-1
dec a
ld e,a
ld c,8
;E is the mask
;IX points to the sprite
;HL points to where to draw
drawloop1:
ld a,(ix)
.db 18h ;start of jr *
smc_jump1:
.db 0
rlca
rlca
rlca
rlca
rlca
rlca
rlca
and e
xor (hl)
ld (hl),a
inc ix
inc hl
dec c
jr nz,drawloop1
ld c,56
add hl,bc
ld a,e
cpl
ld e,a
ld c,8
drawloop2:
ld a,(ix-8)
.db 18h ;start of jr *
smc_jump2:
.db 0
rlca
rlca
rlca
rlca
rlca
rlca
rlca
and e
xor (hl)
ld (hl),a
inc ix
inc hl
dec c
jr nz,drawloop2
ret
That actually turns out to be pretty fast, so if you need to draw sprites, this is still a viable buffer setup.
LCD Updating
As promised, the routine to update the LCD is fairly straightforward:
#define lcddelay() in a,(16) \ rlca \ jr c,$-3
ld a,5
out (16),a
lcddelay()
ld a,80h
out (16),a
ld hl,9340h
lcddelay()
ld a,20h
col:
out (16),a
push af
ld bc,4011h
row:
lcddelay()
outi
jr nz,row
lcddelay()
pop af
inc a
cp 2Ch
jr nz,col
ret
Note that if you are only ever doing fullscreen updates (or at least full columns) and you are always using the same increment mode, you can leave the first part of that code in a setup portion of your code:
.org 9D93h
.db $BB,$6D
Start:
ld a,5   ;set the increment mode, only needs to be done once
out (16),a
lcddelay()
ld a,80h ;set the row pointer, only needs to be done once, since the LCD update routine leaves it where it started
out (16),a
Main:

<code>

UpdateLCD:
ld hl,9340h
ld a,20h
col:
out (16),a
push af
ld bc,4011h
row:
lcddelay()
outi
jr nz,row
lcddelay()
pop af
inc a
cp 2Ch
jr nz,col
ret
;GetPixelLoc
;Inputs:
;  BC = (x,y)
;  DE is the buffer on which to draw
;Outputs:
;  Returns HL pointing to the byte where the pixel gets plotted
;  Returns A as a mask
;  NC returned if out of bounds, else C if in bounds
ld a,c \ cp 64 \ ret nc
ld a,b \ cp 96 \ ret nc
and $F8
ld h,0
rla \ rl h
rla \ rl h
rla \ rl h
ld l,a
ld a,b
ld b,0
add hl,bc
add hl,de
;HL points to the first byte to draw at
and 7
ld b,a
ld a,1
inc b
rrca \ djnz $-1
scf
ret
Now to set the pixel, use or (hl) \ ld (hl),a, or use xor to invert, and to erase, cpl \ and (hl) \ ld (hl),a.
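Here is a Python model (illustrative, not the assembly) of what GetPixelLoc computes in this column-major layout: the byte offset within the buffer and the bit mask for the pixel:

```python
# Pixel addressing for the column-major 96x64 buffer: each strip of 8
# pixel columns is 64 consecutive bytes, one byte per row; bit 7 of a
# byte is the leftmost pixel of its strip.
def get_pixel_loc(x, y):
    if not (0 <= x < 96 and 0 <= y < 64):
        return None                  # the NC (out of bounds) case
    offset = 64 * (x >> 3) + y       # strip base plus row within strip
    mask = 0x80 >> (x & 7)           # rotate a 1 down from the top bit
    return offset, mask
```

Setting, toggling, and erasing a pixel are then the usual mask operations (OR, XOR, and AND-with-complement) on the addressed byte.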
Final Analysis
It turns out that most drawing is faster and that my original fears were just based on me being too accustomed to one way of doing things. Line drawing, circle drawing, and rectangle drawing are all faster (lines and circles just because it is faster to locate a pixel, rectangles because it just works fantastically). Sprites, tiles, and LCD updating work out great. However, there is one area that does in fact take a hit, and that is scrolling the screen. Shifting up and down is still relatively easy, but shifting left and right will be slower and more complicated. Shifting up or down is just shifting the whole buffer by 1 byte instead of 12, which is the same speed. Here is shifting right:
ld hl,9340h ;gbuf
ld de,64
ld c,e
loop:
or a
ld b,12
rr (hl)
push af \ add hl,de \ pop af
djnz $-5
dec h \ dec h \ dec h
inc l
dec c
jr nz,loop
ret
That is now half the speed of what it is for the current gbuf setup. We can cut out 9828 t-states if interrupts are off, though, but that is still a huge hit to speed.
Aside from that, I like the idea of organising the buffer this way.
EDIT: Modified a few routines to be smaller, no speed change, though. EDIT2: Added a link to the rectangle routines below.
This is my first working Prizm C program and it is much faster and prettier than my BASIC version. As the title implies, this draws the Mandelbrot set and currently does not do much. You have to wait for it to draw the whole thing and it isn't very fast (it takes about 1 minute 15 seconds to render), but once it is done it looks cool:
Press [MENU] to exit.
I really don't suggest downloading it since all it amounts to is what you see in the still screenshot. My plan is to make an explorer and see if I can possibly make it faster, but I thought I would share with anybody that is curious. Also, the code:
Based on code that Matrefeytontias gave me, I put together a plasma program in assembly. I am pleased with the speed, but it can be made a bit faster. First, I am using the OS bcall _GrBufCpy, which is not that fast; second, if I include a custom routine, it will be interleaved with the program, so updating the LCD will cost almost no extra time. Using some mathy things and Wabbit's code counter, I estimate cutting out 167000 t-states every loop by ditching the OS routine and using an interleaved routine. Anyways, here is what it looks like at 6MHz and the code I threw together for this first version.
ld a,(smc_index)
add a,a
.db $C6
smc_var1:
.db 0
and 31
ld hl,LUT
add a,l
jr nc,$+3
inc h
ld l,a
ld c,(hl)
.db 16h ;ld d,*
smc_xskew:
.db 32
call C_div_D
ld a,c
; ld a,(hl)
; rlca \ rlca \ rlca \ and 7
ld (smc_xcomponent),a
ld ix,933Fh
ld hl,0
while_y_lt_32:
ld a,l
.db $C6
smc_ycomponent:
.db 0
and 31
ld de,LUT
add a,e
jr nc,$+3
inc d
ld e,a
ld a,(de)
ld (smc_m),a
while_x_lt_48:
ld a,h
.db $C6 ;add a,*
smc_xcomponent:
.db 0
and 31
ld de,LUT
add a,e
jr nc,$+3
inc d
ld e,a
ld a,(de)
.db $C6
smc_m:
.db 0
ld b,0
jr nc,$+3
inc b
ld c,a
push bc
ld de,(smc_ycomponent)
ld a,(smc_xcomponent)
add a,e
add a,h
add a,l
ld c,a
ld a,(smc_index)
add a,c
and 31
ld de,LUT
add a,e
jr nc,$+3
inc d
ld e,a
ld a,(de)
pop bc
add a,c
jr nc,$+3
inc b
rr b \ rra
rr b \ rra
ld b,a ;color
ld a,h \ and 3
jr nz,$+10
inc ix
ld (ix),a
ld (ix+12),a
rl (ix) \ rl (ix)
rl (ix+12) \ rl (ix+12)
ld a,b
cp 144 \ jr nc,$+5
inc (ix)
cp 96 \ jr nc,$+8
inc (ix+12)
inc (ix+12)
cp 48 \ jr nc,$+11
inc (ix+12)
inc (ix)
inc (ix)
inc h
ld a,h
cp 48
jr nz,while_x_lt_48
ld h,0
ld de,12
add ix,de
inc l
ld a,l
cp 32
jp nz,while_y_lt_32
bcall(486Ah)
bcall(4744h)
cp 15
ret z
ld hl,(smc_xskew)
ld de,(smc_yskew)
ld bc,(smc_var1)
cp 3
jr nz,$+3
inc l
cp 2
jr nz,$+3
dec l
cp 1
jr nz,$+3
inc e
cp 4
jr nz,$+3
dec e
cp 10
jr nz,$+3
inc c
cp 11
jr nz,$+3
dec c
ld a,c
and 31
ld (smc_var1),a
ld a,e
cp 64
jr nc,$+5
ld (smc_yskew),a
ld a,l
cp 64
jr nc,$+5
ld (smc_xskew),a
jp MainLoop
LUT:
.db 255,254,246,234,219,199,177,153,128,103,79,57,37,22,10,2,0,2,10,22,37,57,79,103,128,153,177,199,219,234,246,254
C_Div_D:
;Inputs:
;  C is the numerator
;  D is the denominator
;Outputs:
;  A is the remainder
;  B is 0
;  C is the result of C/D
;  D,E,H,L are not changed
xor a
sla c
rla
sla c
rla
cp d
jr c,$+4
inc c
sub d
sla c
rla
cp d
jr c,$+4
inc c
sub d
sla c
rla
cp d
jr c,$+4
inc c
sub d
sla c
rla
cp d
jr c,$+4
inc c
sub d
sla c
rla
cp d
jr c,$+4
inc c
sub d
sla c
rla
cp d
jr c,$+4
inc c
sub d
sla c
rla
cp d
ret c
inc c
sub d
ret
Controls: Arrows change the skew values. I actually have no idea what effect these have, except that small numbers produce more noticeable effects. The upper limit is 63; I think the lower limit is 0 (which will result in dividing by 0). + and - change a value too, and I also don't know precisely how that value changes things.
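For reference, the shift-and-subtract scheme behind C_Div_D can be modelled in Python like this. This is an illustrative model with a compare on every one of the 8 bits; note the assembly merges the first two shifts before its first compare as an optimisation, which assumes D > 1:

```python
# Restoring shift-and-subtract division, one quotient bit per step:
# shift the numerator's top bit into the remainder accumulator (the A
# register in the assembly); if the accumulator reaches the divisor,
# subtract it and set the quotient bit just vacated in C.
def c_div_d(c, d):
    a = 0
    for _ in range(8):
        a = (a << 1) | (c >> 7)   # next numerator bit into the remainder
        c = (c << 1) & 0xFF       # low bit of C is now free
        if a >= d:                # the "cp d / inc c / sub d" step
            a -= d
            c |= 1
    return c, a                   # quotient, remainder
```

For any d in 1..255 this matches Python's divmod, which is a handy cross-check when modifying the unrolled assembly.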
I have never before gotten a raycaster to work, but I may have finally gotten a version to work in BASIC. It is incredibly slow, but if the code is correct, then this means I will be able to use it as a basis for converting it to other languages (like Axe, Assembly, or Grammer). I know raycasters have been made many times, but I am glad to finally have made one myself.
How I did it: Basically, I read through a tutorial, tried to convert the C code to BASIC, failed (most likely a typo) then I tried it again a few days later and I made it properly (I think). I modified some of the code to optimise it for speed, I made some inputs as constants, and I made the viewing direction based on an angle measure instead of a vector. So essentially, there are three inputs-- the player (x,y) coordinate in the map and the angle at which they are turned. Also, the map data is stored in the last 31 columns of pixels on the graph screen.
I am pretty sure that there is still something wrong with the code, but I will try to work that out. As an example of what seems to be going wrong, if the viewing angle is 270 degrees (down) it seems that the rays only sweep from 270 degrees to about 315 instead of 225 to 315.
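For anyone following along, the core of the standard grid-DDA raycast (the method the tutorial describes) can be sketched in Python like this. This is an illustrative sketch, not my BASIC program, and it might help spot the angle-sweep bug by comparing ray distances against it:

```python
import math

# Step a ray from (px, py) at angle 'ang' through a grid of 0s (empty)
# and 1s (walls); returns the perpendicular distance to the wall hit.
# The grid must have a solid border so the loop always terminates.
def cast_ray(grid, px, py, ang):
    dx, dy = math.cos(ang), math.sin(ang)
    mx, my = int(px), int(py)
    # ray length between successive vertical / horizontal gridlines
    ddx = abs(1 / dx) if dx else float("inf")
    ddy = abs(1 / dy) if dy else float("inf")
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1
    side_x = ((mx + 1 - px) if dx > 0 else (px - mx)) * ddx
    side_y = ((my + 1 - py) if dy > 0 else (py - my)) * ddy
    while True:
        if side_x < side_y:           # next gridline crossed is vertical
            side_x += ddx
            mx += step_x
            side = 0
        else:                         # next gridline crossed is horizontal
            side_y += ddy
            my += step_y
            side = 1
        if grid[my][mx]:
            break
    if side == 0:
        return (mx - px + (1 - step_x) / 2) / dx
    return (my - py + (1 - step_y) / 2) / dy
```

Sweeping `ang` across the field of view and drawing a wall slice of height proportional to 1/distance gives the classic raycast view.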
It seems that some topics get easily derailed by religion, and religion is a topic that people seem to like to respond to. My idea is that we can use this topic as a way to respond to things dealing with religion when that response would otherwise be off topic. For example, in one such topic, pimathbrainiac has requested several times to get the discussion back on track. So I will instead post my response here, since it is completely about religion and not focused on that topic. Feel free to comment, respond, or bring in your own conversations from other topics.
As always, try to remain civil. Since this is about religion, there will likely be offensive things said, but let's try to keep as much of that out of here as possible. We have done it before, so I know it is possible.
With the argument about there being a first cause, the argument is not invalid if you look at it mathematically. In fact, it is another example of mathematical induction (which, I would like to point out, is completely logically sound, unlike philosophical induction). Take the natural numbers. Every number can be represented as the number before it added to 1. For example, 924576235238 is 1+924576235237. You can keep following this chain backwards until you hit 1. 1 is the smallest natural number, the one where all natural numbers start. It does not come from anything else. You can think of this as the first cause, where 2 comes from that, and 3 comes from that, and so on.
The actual flaw in the argument does not come from assuming an initial cause is uncaused, even though all causes cause causes. The flaw is more in assuming there even was a 'first' cause. I can believe that if there was a first cause, then you could attribute that to a God (but even then, I would not be convinced that this being is anything that would impose directly upon anything other than the most base compositions of existence). However, if there was no first cause, that is where things get more interesting. What if you could simply keep going back and there never was a first cause? The God that is presented by the first cause argument is really powerful, but not sentient, moral, or any number of things like that-- it just is and that is the extent of it. There is no pleasing or displeasing such a thing. The God presented by the no-first-causes argument is one that I might call a Being, but since I have not thought fully on this branch of the topic, I have not yet convinced myself of this.
I am more inclined to believe that there was no 'first cause.' I think this is where many people fall into a trap because most people don't have to face the concept of infinity and so they choose to interpret 'first' as something finite-- which it is. I think that the argument may have been made, keeping with the idea of numbers, as one where the events could keep extending back through 0, -1, -2,... and that the chain of events never ended. The whole chain was envisioned and employed all at once by a Being outside of the system. Even then, it is a simple matter for us to see that, if we try. Draw a line segment. There are infinitely many measures along that line segment, yet you can see the whole thing at once.
1. If you think, you have a mind, because that is what makes you capable of thinking.
2. The mind is the soul and the brain working together.
3. If you have a soul, your soul must have been created.
4. Some being made your soul, and it would have to exist in its own existence, which would make it eternal, changeless, timeless.
5. This being is God.
("I am who am")
Points 1 and 3 are valid, but points 2 and 5 make a definition, so if we accept that definition, it is not debatable. Point 4 is where your argument falls through. There is no support for "Some being made your soul," or that "it would have to exist in its own existence" and this does not imply that it must be "eternal, changeless, timeless."
Also, even if you don't think that argument makes sense, how would you feel if you died and found out you were wrong about yourself being in a dream? This philosophy could make facing God at your judgement rather uncomfortable...
Personally, if this happens to me, then it is as simple as that-- I would be wrong and that is the extent of it. If God is a being that requires my devotion and belief more than being a 'good' person, then I cannot respect such a god. If that god values goodness, then I can respect such a god and I would see God as a friend. Regardless of the existence of God, I try to be a good person because that matters more to me than the belief in a god. I would rather do something purely out of my own desire to be good than with the bribe of eternal contentedness or the fear of eternal damnation.
Hey everyone, I wrote a routine for computing fixed point 8.8 natural logs. The issue is that it is huge and slow, taking about 11000 t-states and being 188 bytes for the routine plus the 8.8 FP division routine.
I am sure there are optimisations to make-- I know a few that will increase the size, but speed it up a tiny bit. But I also wonder if there is an even faster algorithm? Here is a breakdown of what my code does:
First, I use part of a continued fraction. I tried three methods: a Maclaurin series (the most computationally expensive), a series I derived yesterday (slightly less computationally expensive), and continued fractions. The continued fraction converges faster than the other two and requires less than half the number of multiplication/division calls.
The very first step is testing if the input is zero, in which case it returns $FFFF.
Next, it puts the number in the range [1,2] by multiplying it by a power of 2 (that power may be negative, positive, or zero). It stores the power for later use.
Next, it performs the following computations, in BASIC-ish pseudocode (for the continued fraction):
X-1→X
push X
4*X→X    ;in code, X is in HL, so I just use add hl,hl \ add hl,hl. This doesn't really count as a multiplication routine, then.
Ans+4    ;in code, this is just inc h \ inc h \ inc h \ inc h
X/Ans
pop X
Ans+3
X/Ans
Ans+2
X/Ans
Ans+1
X/Ans
The next step is to take the power of two that the original value was multiplied by, multiply that power by ln(2), and add it to the result. If the power was negative (so the input was divided by a power of two), I store ln(2) in 8.16 FP format, add it a number of times to an accumulator, then add that to the result. If the power is positive, I simply subtract the 8.8 approximation of ln(2) from the result the number of times needed.
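Before dropping to fixed point, the whole scheme (range reduction plus the truncated continued fraction from the pseudocode) can be modelled in ordinary floating point. This is an illustrative Python model, not the 8.8 routine itself:

```python
import math

# ln(x) via: scale x into [1,2) by a power of two, evaluate the
# truncated continued fraction on z = x - 1, then add back p*ln(2).
def ln_cf(x):
    p = 0
    while x >= 2:
        x /= 2
        p += 1
    while x < 1:
        x *= 2
        p -= 1
    z = x - 1
    t = 4 * z / (4 + 4 * z)   # innermost level of the continued fraction
    t = z / (3 + t)
    t = z / (2 + t)
    t = z / (1 + t)
    return t + p * math.log(2)
```

The truncation error is worst as the reduced mantissa approaches 2, where it reaches roughly 0.0025, consistent with the accuracy observed for the fixed-point version.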
FPLog88:
;Input:
;  HL is the 8.8 Fixed Point input. H is the integer part, L is the fractional part.
;Output:
;  HL is the natural log of the input, in 8.8 Fixed Point format.
ld a,h
or l
dec hl
ret z
inc hl
push hl
ld b,15
add hl,hl
jr c,$+4
djnz $-3
ld a,b
sub 8
jr nc,$+4
neg
ld b,a
pop hl
push af
jr nz,lnx
jr nc,$+7
add hl,hl
djnz $-1
jr lnx
sra h
rr l
djnz $-4
lnx:
dec h            ;subtract 1 so that we are doing ln((x-1)+1) = ln(x)
ld d,h
ld e,l
inc h
call FPDE_Div_HL ;preserves DE, returns AHL as the 16.8 result
inc h            ;now we are doing x/(3+Ans)
inc h
inc h
call FPDE_Div_HL
inc h            ;now we are doing x/(2+Ans)
inc h
call FPDE_Div_HL
inc h            ;now we are doing x/(1+Ans)
call FPDE_Div_HL
;now it is computed to pretty decent accuracy
pop af           ;the power of 2 that we divided the initial input by
ret z            ;if it was 0, we don't need to add/subtract anything else
ld b,a
jr c,SubtLn2
push hl
xor a
ld de,$B172      ;this is approximately ln(2) in 0.16 FP format
ld h,a
ld l,a
add hl,de
jr nc,$+3
inc a
djnz $-4
pop de
rl l             ;returns c flag if we need to round up
ld l,h
ld h,a
jr nc,$+3
inc hl
add hl,de
ret
SubtLn2:
ld de,$00B1
or a
sbc hl,de
djnz $-3
ret
FPDE_Div_HL:
;Inputs:
;  DE,HL are 8.8 Fixed Point numbers
;Outputs:
;  DE is preserved
;  AHL is the 16.8 Fixed Point result (rounded to the least significant bit)
di
push de
ld b,h
ld c,l
ld a,16
ld hl,0
Loop1:
sla e
rl d
adc hl,hl
jr nc,$+8
or a
sbc hl,bc
jp incE
sbc hl,bc
jr c,$+5
incE:
inc e
jr $+3
add hl,bc
dec a
jr nz,Loop1
ex af,af'
ld a,8
Loop2:
ex af,af'
sla e
rl d
rla
ex af,af'
add hl,hl
jr nc,$+8
or a
sbc hl,bc
jp incE_2
sbc hl,bc
jr c,$+5
incE_2:
inc e
jr $+3
add hl,bc
dec a
jr nz,Loop2
;round
ex af,af'
add hl,hl
jr c,$+6
sbc hl,de
jr c,$+9
inc e
jr nz,$+6
inc d
jr nz,$+3
inc a
ex de,hl
pop de
ret
On average this is off from the actual value by about 1/400. Since this is smaller than the smallest 8.8 increment, it is pretty accurate. The worst case scenario that I have found so far was trying to find ln(1/256), which was off by about 7/512 from the actual value.
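For testing, what FPDE_Div_HL computes can be modelled in a couple of lines. This is an illustrative Python model (the handling of exact halves in the rounding may differ from the assembly):

```python
# Divide two 8.8 fixed-point values (integers holding value*256) and
# return a 16.8 result (an integer holding value*256), rounded to the
# least significant bit.
def fp_div_88(de, hl):
    assert hl != 0
    q = (de << 9) // hl   # compute one extra bit for rounding
    return (q + 1) >> 1   # round to nearest on that extra bit
```

For example, 1.5/2.0 is fp_div_88(0x0180, 0x0200), which gives 192, i.e. 0.75 in 16.8 format.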
Later I will hopefully be working on sin() and cos() among others, but I thought I would share this and see if anybody had better methods.
EDIT: Thought of a fairly simple optimisation. I am going to test another optimisation that may speed it up by as much as 40%.
EDIT2: Here is a much smaller and more efficient log2 routine. It is 71 bytes (so 117 bytes smaller) and takes <4000 t-states:
EDIT3: I found a handful of bugs (errors in translating from hex to assembly), so I went through the code and fixed it. I also optimised a few areas, saving a total of 1 byte and some clock cycles. Also, a legitimate use of 'rr a' instead of 'rra', since I only needed to check if A>1. In a later post, I have a size-optimised version that averages about 39 t-states slower, which is about 1% slower.
Log_2_88:
;Inputs:
;  HL is an unsigned 8.8 fixed point number.
;Outputs:
;  HL is the signed 8.8 fixed point value of log base 2 of the input.
;Example:
;  pass HL = 3.0, returns 1.58203125 (actual is ~1.584962501...)
;70 bytes
ex de,hl
ld hl,0
ld a,d
ld c,8
or a
jr z,DE_lessthan_1
srl d
jr z,logloop-1
inc l
rr e
jp $-7
DE_lessthan_1:
ld a,e
dec hl
or a
ret z
inc l
dec l
add a,a
jr nc,$-2
ld e,a
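The routine is built on the classic shift-and-square binary log: the integer part comes from the position of the leading bit, and each fractional bit comes from squaring the mantissa and testing whether it reached 2. A Python model (illustrative; it reproduces the 1.58203125 example above exactly, since both truncate to 8 fractional bits):

```python
# Shift-and-square base-2 logarithm with 8 fractional bits, mirroring
# an 8.8 fixed-point result: integer part from normalising into [1,2),
# then one fractional bit per squaring of the mantissa.
def log2_88(x):
    assert x > 0
    n = 0
    while x >= 2:     # normalise down, counting the exponent
        x /= 2
        n += 1
    while x < 1:      # normalise up (negative exponent)
        x *= 2
        n -= 1
    frac = 0
    for _ in range(8):
        x *= x        # squaring doubles the fractional log
        frac <<= 1
        if x >= 2:
            frac |= 1
            x /= 2
    return n + frac / 256
```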
I wanted to show you one result from my work over the past two days, since it led to some neat derivations. Essentially, it is an approximation that converges to the gamma function. I know, it isn't that awesome, but I had fun.
Attached are some computation thingies, too, if you want to look at them.