Author Topic: nGL - a fast (enough) 3D engine for the nspire (Read 286089 times)

DJ Omnimaga · « **Reply #375 on:** July 06, 2014, 07:27:35 pm »

I think it might be a better idea to use a different texture for the selected block since I tend to get confused with actual glass windows and stuff. Would a square with nothing except an outline that slowly flashes between black and white be better?

Quote from: Vogtinator on July 06, 2014, 01:58:34 pm

Edit: I didn't downvote you. I really like discussions both sides benefit from.

I think the downvote might have been more because his post could be interpreted as an unnecessary reprimand (although not comparable to the name calling and all-caps yelling on IRC within the last few weeks, which was similar to something I and some others previously got banned for) and starting a pissing contest about which lib is better or not and doing so in someone else's lib thread.

Matrefeytontias · « **Reply #376 on:** July 06, 2014, 08:06:38 pm »

First, I know who downvoted me. No need to pretend not knowing, he-knows-I'm-talking-to-him. Second, I find it legitimate to be angry when people keep bashing my work and saying without any stats to prove it "yes my lib is faster than his lib". Moreover, I keep proving my lib is not slow, but people keep ignoring me and seeing it as the slowest thing of the universe, at the point that they'll take any other lib instead of it. So it's getting *a little bit* annoying. Of course, if someone does test and proves by a + b that Vogtinator's lib is faster than mine at an identical task, then no problem. But just stating it with no numbers of any kind is insulting.

DJ Omnimaga · « **Reply #377 on:** July 06, 2014, 08:24:43 pm »

To be honest I didn't see anyone literally bashing your lib, other than implying it's either slower than the others (with no valid proof) or blaming it for nKaruga slowdowns on CX models (even though we all saw what happened with the transition from 84+SE to 84+CSE). I mean I didn't see anyone literally saying the lib is useless or that it sucks, just misinformed people who probably didn't mean it as an insult. Hence why I thought that the reaction was overboard. I'm not fine with the idea of downgrading someone else's work either.

Also, I was once told by a moderator that even if a n00b is being annoying or someone spams, it's against the rules to be aggressive, so on Omni, if you question people's reading skills, tell members they have no brain, yell at them using cussing and all-caps or try to make them look stupid/inferior, then you cannot use any excuse to get away with it (the only way to get away with it is if you don't get caught). This is more a heads-up, because in the past I got banned for being rude too (although not for very long) and you can see my rating ratio as an indicator.

Hayleia · « **Reply #378 on:** July 07, 2014, 01:29:12 am »

Quote from: Vogtinator on July 06, 2014, 02:35:56 pm

Quote from: Hayleia on July 06, 2014, 02:05:34 pm
Quote from: Vogtinator on July 06, 2014, 01:58:34 pm
Use a Texture struct rather than an unsigned short* for better readability.
Well, two of us come from Axe and care less about readability than about speed (even though it seems like we are not Nspire pros) or efficiency.
It should compile to the same code. If not, it could be faster due to alignment if you use a "flexible array member" (StackOverflow question)

Well that answers the "speed problem", but not the "efficiency" one because to convert the tab into a struct, well there is a need for a function to convert the tab into a struct -.-
Not hard to do, not speed wasting, but not necessary either. Axe coders (or at least I) like to know what they are doing and why it works instead of using functions that mean something in English but that don't mean anything to their eyes, like "initSprite" or I don't know what. It's like when you declare a Bitmap in Axe, you know what the first bytes represent and what the other ones represent.

Quote from: DJ Omnimaga on July 06, 2014, 08:24:43 pm

Also, I was once told by a moderator that even if a n00b is being annoying or someone spams, it's against the rules to be aggressive, so on Omni, if you question people's reading skills, tell members they have no brain, yell at them using cussing and all-caps or try to make them look stupid/inferior, then you cannot use any excuse to get away with it (the only way to get away with it is if you don't get caught). This is more a heads-up, because in the past I got banned for being rude too (although not for very long) and you can see my rating ratio as an indicator.

I would not even say "on Omni" only. It's not constructive to be angry, whether on Omni or IRL or anywhere. When there is someone angry in a discussion, it only makes the other people angry. And when you tell people they have no brain, especially when you are the one to blame and not the others, it only makes them defend themselves about this free insult before contributing to the real discussion, and I would even say "instead of" instead of "before" because once they defended themselves they don't really want to discuss with you anymore.

Vogtinator · « **Reply #379 on:** July 07, 2014, 06:23:32 am »

Oh god, what did I do

Quote

and starting a pissing contest about which lib is better or not and doing so in someone else's lib thread.

Yeah, I didn't expect that.

Quote

I find it legitimate to be angry when people keep bashing my work and saying without any stats to prove it "yes my lib is faster than his lib".

You asked me on IRC TWICE how you could optimize it further, I answered you and you ignored it, except for the "unsigned int" as x/y part. I even provided a nice list of suggestions but you ignored it again. If you don't want to implement it, don't say that your lib is faster. Not using "-Ofast" or at least "-O2" is just ignorant.

And I wouldn't call a simple "And btw, it's faster than n2dlib" bashing.

Quote

Moreover, I keep proving my lib is not slow, but people keep ignoring me

Which "people"?

Quote

and seeing it as the slowest thing of the universe, at the point that they'll take any other lib instead of it.

Your lib isn't slow in the meaning of "too slow". I never meant that. I just implied it could be much faster.

Quote

Of course, if someone does test and proves by a + b that Vogtinator's lib is faster than mine at an identical task, then no problem. But just stating it with no numbers of any kind is insulting.

Well, I don't need any benchmarks to compare multiple comparisons, branches and additions to a single increment.
You even have a "performance-bug" in your code I already pointed out multiple times. (BTW: The ARM926EJ-S has a hardware multiplier. "mla" should be faster than "x + (y<<8) + (y<<6)", but generally the compiler knows better, so "y*320" is always the best choice, combined with "-ffast-math")
But if you want some, I'll make a benchmark for both libs. I guess rendering the same 32x32 tile from the center of a 128x128 tilemap onto the screen (to fill it completely) would do? If you don't trust me, you can make your own benchmark if you want to.
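To make the arithmetic above concrete, here is a small sketch, assuming a 320-pixel-wide buffer. The names `SCREEN_WIDTH`, `offset_mul`, `offset_shift` and `check_offsets` are made up for illustration and do not come from either lib. Two details are worth keeping in mind: in C, `+` binds tighter than `<<`, so the shift form needs explicit parentheses to mean what is intended, and `-ffast-math` only relaxes floating-point rules, so it should not affect integer code like this.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_WIDTH 320u /* assumed buffer width in pixels */

/* Plain form: the compiler is free to pick mul, mla or shifts itself. */
uint32_t offset_mul(uint32_t x, uint32_t y)
{
    return y * SCREEN_WIDTH + x;
}

/* Hand-written form: 320 = 256 + 64, so y*320 == (y<<8) + (y<<6). */
uint32_t offset_shift(uint32_t x, uint32_t y)
{
    return x + (y << 8) + (y << 6);
}

/* Checks that both forms agree for every pixel of a 320x240 screen. */
void check_offsets(void)
{
    for (uint32_t y = 0; y < 240; y++)
        for (uint32_t x = 0; x < SCREEN_WIDTH; x++)
            assert(offset_mul(x, y) == offset_shift(x, y));
}
```

Since both forms compute the same value, writing `y * 320 + x` and leaving the instruction choice to the compiler loses nothing in correctness.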

Quote

Well that answers the "speed problem", but not the "efficiency" one because to convert the tab into a struct, well there is a need for a function to convert the tab into a struct -.-

No reason to do anything like that, it's directly binary-compatible

Code: [Select]

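#include <stdint.h>

/* Header fields followed directly by the pixel data. */
struct Texture {
    uint16_t width;
    uint16_t height;
    uint16_t transparent_color;
    uint16_t bitmap[]; /* flexible array member */
} __attribute__((packed));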

could work. Untested.
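A hedged sketch of what "binary-compatible" could mean in practice, assuming the existing array stores width, height and the transparent colour in its first three entries, followed by the pixels. That layout and the helper name `texture_from_array` are assumptions for the example, not taken from either lib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct Texture {
    uint16_t width;
    uint16_t height;
    uint16_t transparent_color;
    uint16_t bitmap[]; /* pixel data starts right after the header */
} __attribute__((packed));

/* The header is exactly three 16-bit values with no padding before bitmap. */
_Static_assert(offsetof(struct Texture, bitmap) == 3 * sizeof(uint16_t),
               "bitmap must start at the fourth array element");

/* Hypothetical helper: view the array in place, without copying anything. */
struct Texture *texture_from_array(uint16_t *raw)
{
    assert(raw != NULL);
    return (struct Texture *)raw;
}
```

With this layout, `texture_from_array(raw)->bitmap[0]` is the same memory as `raw[3]`, so no conversion pass over the data is needed.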

Quote

or blaming it for nKaruga slowdowns on CX models (even though we all saw what happened with the transition from 84+SE to 84+CSE).

That's not his fault, that's the hardware and it will be the exact same with nGL (may be a little bit faster as nGL doesn't invert each sprite, but rather the entire screen once before it's drawn).

Quote

I think it might be a better idea to use a different texture for the selected block since I tend to get confused with actual glass windows and stuff. Would a square with nothing except an outline that slowly flashes between black and white be better?

Yay, on-topic again! Seems like I should use my graphics tablet again.

A gif of a cute cat to calm down

Matrefeytontias · « **Reply #380 on:** July 09, 2014, 02:25:59 pm »

<off-topic again sorry, but got to clear that up>

Okay so I finally have a computer again, so I worked on n2DLib after several days when I couldn't. So I tried to implement your suggestions as good as I could.

First, sorry for being angry like this. I know it's ridiculous and shit, but I just had enough of seeing people implying the lib was supposedly slow. (and Vogtinator I never meant you, I know aeTIos kept saying it some time ago although he stopped now ; or maybe I'm just being too proud or paranoid, don't know which).

Anyway, well, it's a lib, so compile flags are up to the user.

DJ : I didn't want to start a war about what lib was better, I was just asking please stop assuming my lib is slow without even trying it or wondering if that's not your code's fault

I remember some days ago on IRC, Streetwalrus was saying something like "tetris is slow, it's n2DLib's fault" and was serious about it, and of course it turned out the problem was his own code. The fact itself isn't really important, it's just the way of thinking that's annoying. But let's forget that and pretend it never happened.

Vogtinator : for your optimizations, I do remember me asking you some, but I don't know, maybe I was working on something else at that moment so I forgot afterwards

I did my best at implementing them this time, it should be visible in github's history.

Although I find it a good idea, I can't afford inverting the buffer only when updateScreen-ing depending on the screen model, because maybe one day someone will not want to erase the buffer each frame and that will give weird results.

So I guess that's it, sorry for the off-topic, sorry for being upset, sorry for saying stupid shit, things like that.

Back on-topic, when testing the last version on an emulated TI-Nspire CX CAS with Ndless 3.1 r914, I noticed you couldn't swim in water nor lava, is that intended ?

Vogtinator · « **Reply #381 on:** July 09, 2014, 02:50:37 pm »

Quote from: Matrefeytontias on July 09, 2014, 02:25:59 pm

<off-topic again sorry, but got to clear that up>

Okay so I finally have a computer again, so I worked on n2DLib after several days when I couldn't. So I tried to implement your suggestions as good as I could.

Looks better, but still a lot to improve! (any branch per pixel, for instance if(has_color) definitely needs to go!)
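As an illustration of what removing a per-pixel branch can look like, the sketch below moves a flag that does not change during the draw call out of the inner loop. It assumes `has_color` is such a loop-invariant flag and that the non-colour path inverts each pixel with a bitwise NOT. Both are assumptions for the example, as are the function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Branch inside the loop: has_color is tested once per pixel. */
void copy_row_branchy(uint16_t *dest, const uint16_t *src, size_t count,
                      bool has_color)
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        if (has_color)
            dest[i] = src[i];
        else
            dest[i] = (uint16_t)~src[i];
    }
}

/* Branch hoisted: the flag is tested once, each loop body is branch-free. */
void copy_row_hoisted(uint16_t *dest, const uint16_t *src, size_t count,
                      bool has_color)
{
    assert(dest != NULL && src != NULL);
    if (has_color) {
        for (size_t i = 0; i < count; i++)
            dest[i] = src[i];
    } else {
        for (size_t i = 0; i < count; i++)
            dest[i] = (uint16_t)~src[i];
    }
}
```

Both functions write the same pixels. An optimizing compiler may perform this transformation itself (loop unswitching), but writing it out by hand does not depend on the optimization level.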

Quote

First, sorry for being angry like this. I know it's ridiculous and shit, but I just had enough of seeing people implying the lib was supposedly slow. (and Vogtinator I never meant you, I know aeTIos kept saying it some time ago although he stopped now ; or maybe I'm just being too proud or paranoid, don't know which).

Well, then I wonder why you posted here.. To tell everybody my lib was slow as well/as fast as yours?

Quote

Anyway, well, it's a lib, so compile flags are up to the user.

Yeah, but then you should use the correct flags for your example, it's an example after all..

Quote

DJ : I didn't want to start a war about what lib was better, I was just asking please stop assuming my lib is slow without even trying it or wondering if that's not your code's fault. I remember some days ago on IRC, Streetwalrus was saying something like "tetris is slow, it's n2DLib's fault" and was serious about it, and of course it turned out the problem was his own code. The fact itself isn't really important, it's just the way of thinking that's annoying. But let's forget that and pretend it never happened.

Vogtinator : for your optimizations, I do remember me asking you some, but I don't know, maybe I was working on something else at that moment so I forgot afterwards. I did my best at implementing them this time, it should be visible in github's history.

Although I find it a good idea, I can't afford inverting the buffer only when updateScreen-ing depending on the screen model, because maybe one day someone will not want to erase the buffer each frame and that will give weird results.

That's the reason nGL allocates a third buffer on monochrome calcs (yup, one on CX and three on non-CXs)!
320*240*2 = 153,600 bytes (150 KiB), that's almost nothing.
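The memory cost can be worked out directly. This is a minimal sketch, assuming a 320x240 screen with 16-bit pixels and the buffer counts given above (one on CX, three on non-CX). The function names are hypothetical and are not part of nGL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { SCREEN_WIDTH = 320, SCREEN_HEIGHT = 240 };

static_assert(sizeof(uint16_t) == 2, "16-bit pixels assumed");

/* Bytes needed for one full-screen buffer of 16-bit pixels. */
size_t buffer_bytes(void)
{
    return (size_t)SCREEN_WIDTH * SCREEN_HEIGHT * sizeof(uint16_t);
}

/* One buffer on CX, three on non-CX, as stated in the post. */
size_t total_buffer_bytes(bool is_cx)
{
    size_t count = is_cx ? 1 : 3;
    return count * buffer_bytes();
}
```

One buffer comes to 153,600 bytes (150 KiB), so the three-buffer case needs 460,800 bytes (450 KiB).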

Quote

So I guess that's it, sorry for the off-topic, sorry for being upset, sorry for saying stupid shit, things like that.

Back on-topic, when testing the last version on an emulated TI-Nspire CX CAS with Ndless 3.1 r914, I noticed you couldn't swim in water nor lava, is that intended ?

No, definitely not! You can't swim in lava, and water needs to be at least two blocks deep to swim in.
I just tested and it works for me :-/

Edit: Is it permitted to double-post to separate a release announcement from the rest?

Matrefeytontias · « **Reply #382 on:** July 09, 2014, 03:10:43 pm »

Quote from: Vogtinator on July 09, 2014, 02:50:37 pm

Quote from: Matrefeytontias on July 09, 2014, 02:25:59 pm
<off-topic again sorry, but got to clear that up>

Okay so I finally have a computer again, so I worked on n2DLib after several days when I couldn't. So I tried to implement your suggestions as good as I could.
Looks better, but still a lot to improve! (any branch per pixel, for instance if(has_color) definitely needs to go!)

Quote
Although I find it a good idea, I can't afford inverting the buffer only when updateScreen-ing depending on the screen model, because maybe one day someone will not want to erase the buffer each frame and that will give weird results.
That's the reason nGL allocates a third buffer on monochrome calcs (yup, one on CX and three on non-CXs)!
320*240*2 = 153,600 bytes (150 KiB), that's almost nothing.

Well, you answered the problem yourself.

I'll do that too; coming from the z80 scene, I'm always very afraid of allocating lots of memory.

Quote from: Vogtinator on July 09, 2014, 02:50:37 pm

Quote
First, sorry for being angry like this. I know it's ridiculous and shit, but I just had enough of seeing people implying the lib was supposedly slow. (and Vogtinator I never meant you, I know aeTIos kept saying it some time ago although he stopped now ; or maybe I'm just being too proud or paranoid, don't know which).
Well, then I wonder why you posted here.. To tell everybody my lib was slow as well/as fast as yours?

That "by the way, it's faster than n2DLib" just sitting here without any further development really made me upset.

Quote from: Vogtinator on July 09, 2014, 02:50:37 pm

Quote
Anyway, well, it's a lib, so compile flags are up to the user.
Yeah, but then you should use the correct flags for your example, it's an example after all..

Woops, I actually forgot that.

Quote from: Vogtinator on July 09, 2014, 02:50:37 pm

Quote
So I guess that's it, sorry for the off-topic, sorry for being upset, sorry for saying stupid shit, things like that.

Back on-topic, when testing the last version on an emulated TI-Nspire CX CAS with Ndless 3.1 r914, I noticed you couldn't swim in water nor lava, is that intended ?
No, definitely not! You can't swim in lava, and water needs to be at least two blocks deep to swim in.
I just tested and it works for me :-/

Ah, it wasn't 2 blocks deep, but I thought that in the original Minecraft, you could swim on top of a water cube even if it was all alone ; here it just jumps.

Quote from: Vogtinator on July 09, 2014, 02:50:37 pm

Edit: Is it permitted to double-post to separate a release announcement from the rest?

Yep.

rwill · « **Reply #383 on:** July 09, 2014, 03:13:18 pm »

Quote from: Vogtinator on July 06, 2014, 01:58:34 pm

And there's still more to optimize. Partial loop unrolling as 2x16bit access is slower than 1x32bit for example.

To give you an example of an almost fully optimized function, I optimized drawTexture some more: https://github.com/Vogtinator/crafti/blob/master/texturetools.cpp#L151
GCC does some partial unrolling, but doesn't transform 2 16bit accesses (ldrh/strh) to 32bit access (ldr/str), although -Ofast should do something, should I report a bug?
Code: [Select]
8e28: e15392b4 ldrh r9, [r3, #-36] ; 0xffffffdc
8e2c: e14292b4 strh r9, [r2, #-36] ; 0xffffffdc
8e30: e15392b2 ldrh r9, [r3, #-34] ; 0xffffffde
8e34: e14292b2 strh r9, [r2, #-34] ; 0xffffffde

There are actually two reasons why a compiler cannot use 32bit memory move instructions here.

1:

Code: [Select]

COLOR *dest_ptr = dest.bitmap + dest_x + dest_y * dest.width, *src_ptr = src.bitmap + src_x + src_y * src.width;

dest_ptr might be src_ptr + 1 (*src_ptr is not const), so moving two texels at a time in this case will produce different results.

2:
It is not guaranteed that src_ptr and dest_ptr are each aligned to 4 bytes, which is required if you want to move 4 bytes at a time on ARM.
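The first reason can be demonstrated with a small sketch. The two functions below are hypothetical stand-ins for the copy loop, with `uint16_t` standing in for `COLOR`. Neither takes `restrict` pointers, so overlapping buffers are allowed. When the destination starts one texel after the source, copying one texel at a time and copying two at a time leave different data behind, which is why the compiler may not merge the accesses on its own. As far as the C language is concerned, it is `restrict` that promises no overlap. A `const` on the source pointer alone does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One 16-bit texel per iteration, like the ldrh/strh pairs in the listing. */
void copy_texels_16(uint16_t *dest, const uint16_t *src, size_t count)
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < count; i++)
        dest[i] = src[i];
}

/* Two texels per iteration: both are loaded before either is stored.
 * memcpy into a local avoids any alignment requirement on src and dest. */
void copy_texels_32(uint16_t *dest, const uint16_t *src, size_t count)
{
    assert(dest != NULL && src != NULL);
    size_t i = 0;
    for (; i + 1 < count; i += 2) {
        uint32_t pair;
        memcpy(&pair, src + i, sizeof pair);
        memcpy(dest + i, &pair, sizeof pair);
    }
    if (i < count) /* odd count: copy the last texel on its own */
        dest[i] = src[i];
}
```

Starting from `{1, 2, 3, 4, 5}` and copying four texels from index 0 to index 1, the 16-bit loop smears the first value across the buffer, giving `{1, 1, 1, 1, 1}`, while the paired loop gives `{1, 1, 2, 2, 4}`.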

Vogtinator · « **Reply #384 on:** July 09, 2014, 03:50:52 pm »

I'm not creative enough to write some kind of intro, so here you go: Crafti v1.1

New features:

Redstone!
New executable file format Zehn
Take a screenshot ("/documents/ndless/screenshot_NR.ppm.tns") with Ctrl+. in-game
Change your FOV! The closer your near plane, the bigger your FOV (but also less fps)
Messages appear if the world has been loaded, saved, a screenshot taken or settings applied

Bugs fixed:

Bounding box of wheat
Embarrassing rounding error fixed: The weird line at the right screen edge is now gone!
Apparently confusing indicator texture changed to something animated
Better performance due to cache preloading, more GCC optimizations and Zehn
No crashes due to exhausted RAM: Exceptions are caught and terminate the app instead of rebooting!

Known issues:

You can fall through pressure plates
You can get stuck in a closed door

Redstone tutorial:
Redstone in crafti works a bit differently than in Minecraft: redstone wire doesn't need to be on a block. It's a block by itself and can be placed on everything, including air and redstone torches!
Each block has two states: Powering or not powering. Examples for powering blocks are turned-on redstone torches, turned-on switches, pressure plates you're standing on and powered redstone wire. Redstone torches are on by default and turn off if any block surrounding the block the torch has been placed on is powering (ignoring the torch itself, of course). Redstone lamps glow if any block surrounding them is powering. Doors open if any adjacent block is powering or being powered.
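The torch and lamp rules can be written down as a rough sketch. The function names are made up, and how crafti actually collects the surrounding blocks is not shown here: the caller is assumed to pass one flag per surrounding block, already leaving out the torch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if at least one of the surrounding blocks is powering. */
bool any_powering(const bool *neighbours, size_t count)
{
    assert(neighbours != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (neighbours[i])
            return true;
    return false;
}

/* Torch: on by default, off while anything around its support block powers.
 * The caller must not include the torch itself in around_support. */
bool torch_is_on(const bool *around_support, size_t count)
{
    return !any_powering(around_support, count);
}

/* Lamp: glows while anything around it powers. */
bool lamp_glows(const bool *around_lamp, size_t count)
{
    return any_powering(around_lamp, count);
}
```

A torch whose support block has one powering neighbour turns off, which is the behaviour the inverter relies on.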

Some screenshots to give you a better idea of how it works and what it looks like:

Switch turning redstone torch off and on:

Pressure plate turning on a redstone lamp:

It doesn't turn off immediately after you step off the plate: there's a delay of 5 ticks, so doors stay open a bit longer and don't close right in front of you.
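One way such a release delay could be modelled is with a countdown that is reloaded while the plate is stood on. This is only a sketch under assumptions: the struct, the names and the exact tick at which the plate switches off are guesses for illustration, not crafti's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { PLATE_RELEASE_TICKS = 5 }; /* delay mentioned in the post */

struct PressurePlate {
    int ticks_left; /* remaining powered ticks after the player left */
};

/* Advances the plate by one tick and returns whether it powers this tick. */
bool plate_tick(struct PressurePlate *plate, bool stood_on)
{
    assert(plate != NULL);
    bool powering = stood_on || plate->ticks_left > 0;
    if (stood_on)
        plate->ticks_left = PLATE_RELEASE_TICKS; /* reload the countdown */
    else if (plate->ticks_left > 0)
        plate->ticks_left--;
    return powering;
}
```

With this model the plate keeps powering for five ticks after the player steps off and switches off on the sixth.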

Wire controlling a floating lamp:

And the most basic redstone logic gate: The inverter!

Due to redstone wire being a block, a clock is very simple:

The redstone torch controls the wire underneath, which is connected to the wire which controls the torch. At first I thought it was a bug, but it's actually rather useful!
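The feedback loop can be simulated with two booleans. This sketch assumes that the torch and the wire are both updated from the previous tick's state. The real update order in crafti is not stated above, so the exact period is an artefact of that assumption, and the struct and function names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct Clock {
    bool torch_on;
    bool wire_powered;
};

/* One tick: both parts react to what the other did on the previous tick. */
void clock_step(struct Clock *c)
{
    assert(c != NULL);
    bool next_wire = c->torch_on;       /* a lit torch powers the wire */
    bool next_torch = !c->wire_powered; /* a powered wire turns the torch off */
    c->wire_powered = next_wire;
    c->torch_on = next_torch;
}
```

Starting with the torch on and the wire unpowered, the torch goes on, on, off, off and then repeats, so the circuit never settles, which is what makes it usable as a clock.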

Of course doors work as expected:

If you want to see how pressure plates and doors play together, you'll have to try that yourself.

Have fun!

Vogtinator · « **Reply #385 on:** July 09, 2014, 03:58:04 pm »

Quote from: rwill on July 09, 2014, 03:13:18 pm

Quote from: Vogtinator on July 06, 2014, 01:58:34 pm
And there's still more to optimize. Partial loop unrolling as 2x16bit access is slower than 1x32bit for example.

To give you an example of an almost fully optimized function, I optimized drawTexture some more: https://github.com/Vogtinator/crafti/blob/master/texturetools.cpp#L151
GCC does some partial unrolling, but doesn't transform 2 16bit accesses (ldrh/strh) to 32bit access (ldr/str), although -Ofast should do something, should I report a bug?
Code: [Select]
8e28: e15392b4 ldrh r9, [r3, #-36] ; 0xffffffdc
8e2c: e14292b4 strh r9, [r2, #-36] ; 0xffffffdc
8e30: e15392b2 ldrh r9, [r3, #-34] ; 0xffffffde
8e34: e14292b2 strh r9, [r2, #-34] ; 0xffffffde

There are actually two reasons why a compiler cannot use 32bit memory move instructions here.

1:
Code: [Select]
COLOR *dest_ptr = dest.bitmap + dest_x + dest_y * dest.width, *src_ptr = src.bitmap + src_x + src_y * src.width;
dest_ptr might be src_ptr + 1 (*src_ptr is not const), so moving two texels at a time in this case will produce different results.

2:
It is not guaranteed that src_ptr and dest_ptr are each aligned to 4 bytes, which is required if you want to move 4 bytes at a time on ARM.

To 1: I tried various GCC options to force such an optimization (but actually overlooked the missing const, thanks!)
2: Various GCC options + __builtin_assume_aligned
and it just doesn't want to optimize it -.-
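For reference, this is roughly how both promises can be expressed to GCC: `restrict` for "the buffers do not overlap" and `__builtin_assume_aligned` for "both pointers are 4-byte aligned". The function name is hypothetical and `uint16_t` stands in for `COLOR`. Whether the compiler then merges the halfword accesses is not guaranteed, and in the case described above it did not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies count texels. The caller promises that dest and src do not overlap
 * and that both are 4-byte aligned; breaking either promise is undefined
 * behaviour once assertions are compiled out. */
void copy_texels_hinted(uint16_t *restrict dest, const uint16_t *restrict src,
                        size_t count)
{
    assert(((uintptr_t)dest & 3u) == 0 && ((uintptr_t)src & 3u) == 0);

    uint16_t *d = __builtin_assume_aligned(dest, 4);
    const uint16_t *s = __builtin_assume_aligned(src, 4);

    for (size_t i = 0; i < count; i++)
        d[i] = s[i];
}
```

The hints only give the optimizer permission. They do not force a particular instruction sequence, so the generated assembly still needs to be checked.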

Quote from: Matrefeytontias on July 09, 2014, 03:10:43 pm

Quote from: Vogtinator on July 09, 2014, 02:50:37 pm
Quote
First, sorry for being angry like this. I know it's ridiculous and shit, but I just had enough of seeing people implying the lib was supposedly slow. (and Vogtinator I never meant you, I know aeTIos kept saying it some time ago although he stopped now ; or maybe I'm just being too proud or paranoid, don't know which).
Well, then I wonder why you posted here.. To tell everybody my lib was slow as well/as fast as yours?
That "by the way, it's faster than n2DLib" just sitting here without any further development really made me upset.

From my POV you ignored my hints and I just had to write it, sorry.

Streetwalrus · « **Reply #386 on:** July 09, 2014, 03:59:00 pm »

Quote from: Matrefeytontias on July 09, 2014, 02:25:59 pm

DJ : I didn't want to start a war about what lib was better, I was just asking please stop assuming my lib is slow without even trying it or wondering if that's not your code's fault. I remember some days ago on IRC, Streetwalrus was saying something like "tetris is slow, it's n2DLib's fault" and was serious about it, and of course it turned out the problem was his own code. The fact itself isn't really important, it's just the way of thinking that's annoying. But let's forget that and pretend it never happened.

When did I say that ?

I was just jokingly saying that n2DLib's 4K was big, which it obviously isn't.

Vogtinator · « **Reply #387 on:** July 09, 2014, 04:06:56 pm »

If you use most standard C/POSIX features like sprintf, fopen and opendir the MINIMUM executable size is 52K on my fork, which is more than the current ndless_resources_3.6.tns

But still less than a fullscreen image

Could we keep this slowly getting ridiculous and annoying discussion about n2dlib vs nGL and Matrefeytontias vs everyone out of here and/or just FORGET it?

Matrefeytontias · « **Reply #388 on:** July 09, 2014, 05:08:24 pm »

@Streetwalrus : I remember saying "Y U never think it could be your own fault" or something like that, and you said "yeah sorry" or kinda.

ANYWAY what about you don't call that a versus because it's not one, and someone deletes the posts that have to be deleted ? I got enough drama for a while.

Also that looks great. Does it still work on GS calcs, which use the 3.1 version of Ndless, since you now use another executable format?

GinDiamond · « **Reply #389 on:** July 09, 2014, 05:15:27 pm »

Will pistons, comparators, and repeaters and the like be implemented?
