Author Topic: nGL - a fast (enough) 3D engine for the nspire  (Read 265376 times)


Offline gameblabla

  • LV3 Member (Next: 100)
  • ***
  • Posts: 86
  • Rating: +17/-1
    • View Profile
    • Gameblabla's website
Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #465 on: May 16, 2015, 04:36:00 pm »
Quote
I finally got around to writing a tutorial on how to use nGL: http://github.com/Vogtinator/nGL
That would be great if you made a tutorial about texture mapping.
Also, does anyone know if nGL is faster for 2D stuff than n2DLib?
From what I've seen, it seems so.

Offline Vogtinator

  • LV9 Veteran (Next: 1337)
  • *********
  • Posts: 1193
  • Rating: +108/-5
  • Instruction counter
    • View Profile
Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #466 on: May 16, 2015, 04:59:43 pm »
Quote
That would be great if you made a tutorial about texture mapping.
Will do, that's the next step :)

Quote
Also, does anyone know if nGL is faster for 2D stuff than n2DLib?
The 3D parts are definitely not. On desktop machines it's faster to use an orthographic projection for 2D rendering, as that's hardware accelerated, but that's not the case here.
There is a small 2D part in texturetools.cpp, for working with TEXTURE objects, like (GL_LINEAR scaled) blitting, block blitting with 50% opacity, resizing and converting to greyscale for classic calcs, but not much more. Those parts are optimized, thus probably faster than n2DLib (never tested, but it might show if you blit an excessive amount of pixels) and support blitting from TEXTURE to TEXTURE instead of blitting to screen only.
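For readers wondering what blitting with 50% opacity involves, here is a minimal sketch in C. It assumes RGB565 pixels stored as uint16_t, and the function names are made up for illustration; it is not taken from texturetools.cpp.

```c
#include <stddef.h>
#include <stdint.h>

/* Average two RGB565 pixels. Clearing the lowest bit of each channel
 * (mask 0xF7DE) before shifting stops a bit of one channel from
 * leaking into its neighbour. Hypothetical helper, not nGL's API. */
uint16_t blend_half(uint16_t a, uint16_t b)
{
    return (uint16_t)(((a & 0xF7DEu) >> 1) + ((b & 0xF7DEu) >> 1));
}

/* Blend one row of src over dst at 50% opacity. */
void blend_row_half(const uint16_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = blend_half(src[i], dst[i]);
}
```

The mask trick costs one bit of precision per channel, but avoids unpacking the three channels separately.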
If you read some older posts in this thread and in the n2DLib one, you might notice that there was already quite a discussion about speed...

Edit: Lesson 2 - Texture mapping, is up.
« Last Edit: May 16, 2015, 06:15:27 pm by Vogtinator »

Offline Matrefeytontias

  • Axe roxxor (kinda)
  • LV10 31337 u53r (Next: 2000)
  • **********
  • Posts: 1982
  • Rating: +310/-12
  • Axe roxxor
    • View Profile
    • RMV Pixel Engineers
Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #467 on: May 16, 2015, 06:41:58 pm »
Yeah well people wouldn't stop talking shit about how n2DLib was slower than nGL even though not a single test was ever made. I'm very glad to notice the exact same sentence again, "it's probably faster (though never tested)". You optimized it? Oh yeah cool, so did pierrotdu18, Hayleia and I. So could someone please run the damned speed test so that we finally know?

For the thousandth time, I won't be angry if nGL is faster than n2DLib. But reading sentences like "it's optimized, thus probably faster than n2DLib", which clearly assume that n2DLib is not optimized and thus totally ignore all the work that several people put into it, really upsets me.

Offline pimathbrainiac

  • Occasionally I make projects
  • Members
  • LV10 31337 u53r (Next: 2000)
  • **********
  • Posts: 1731
  • Rating: +136/-23
  • dagaem
    • View Profile
Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #468 on: May 17, 2015, 01:39:43 am »
Quote
I finally got around to writing a tutorial on how to use nGL: http://github.com/Vogtinator/nGL
Maybe we'll see some more 3D games on the nspire now!
Yay tutorials! Now to make some cool stuff with nGL!
I am Bach.

Offline Vogtinator

Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #469 on: May 17, 2015, 08:07:19 am »
I don't know why I even bother answering this a second time; I already wrote this some time ago.

Quote
Yeah well people wouldn't stop talking shit about how n2DLib was slower than nGL even though not a single test was ever made.
I would make a test, but n2DLib doesn't support TEXTURE-to-TEXTURE blitting, which nGL does. Although that shouldn't make a huge difference, it would be unfair.

Quote
You optimized it? Oh yeah cool, so did pierrotdu18, Hayleia and I.
Well, I'm sorry to say, but it just doesn't look like it.
The drawSprite routine, as the simplest example, makes a call to setPixel per pixel.
This is bad for four reasons:
- Function calls are slow
- Two comparisons
- A multiplication
- A variable loaded from RAM, indirectly
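A rough C sketch of the pattern being criticised, reconstructed from the setPixel disassembly rather than copied from n2DLib's source. The names set_pixel, draw_row_slow and g_buffer are placeholders, and the 320x240 bounds come from the constants in the listing.

```c
#include <stdint.h>

#define SCREEN_W 320
#define SCREEN_H 240

/* Stand-in for the library's global buffer pointer (assumed name). */
uint16_t *g_buffer;

/* Per-pixel plot: bounds check, address computation, store. */
void set_pixel(unsigned x, unsigned y, uint16_t c)
{
    if (x < SCREEN_W && y < SCREEN_H)       /* two comparisons          */
        g_buffer[y * SCREEN_W + x] = c;     /* multiply + global load   */
}

/* One sprite row drawn through set_pixel. */
void draw_row_slow(const uint16_t *src, unsigned x, unsigned y,
                   unsigned w, uint16_t transparent)
{
    for (unsigned i = 0; i < w; i++)
        if (src[i] != transparent)
            set_pixel(x + i, y, src[i]);    /* function call per pixel  */
}
```

The per-pixel bounds check does buy clipping for free; the usual alternative is to clip the sprite rectangle once and then run a plain pointer loop over what remains.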

For easier comparison, here are the two inner loops of drawSprite and the nGL equivalent, compiled with the same flags as your example:
Code: [Select]
00000a80 <setPixel>:
     a80:       e35100ef        cmp     r1, #239        ; 0xef
     a84:       93500d05        cmpls   r0, #320        ; 0x140
     a88:       33a03d05        movcc   r3, #320        ; 0x140
     a8c:       30210193        mlacc   r1, r3, r1, r0
     a90:       359f300c        ldrcc   r3, [pc, #12]   ; aa4 <setPixel+0x24>
     a94:       31a01081        lslcc   r1, r1, #1
     a98:       35933000        ldrcc   r3, [r3]
     a9c:       318320b1        strhcc  r2, [r3, r1]
     aa0:       e12fff1e        bx      lr
     aa4:       00011078        .word   0x00011078

In drawSprite:
     c94:       e1550008        cmp     r5, r8
     c98:       e08b3005        add     r3, fp, r5
     c9c:       aa000008        bge     cc4 <drawSprite+0x68>
     ca0:       e0da20b2        ldrh    r2, [sl], #2
     ca4:       e1d630b4        ldrh    r3, [r6, #4]
     ca8:       e1530002        cmp     r3, r2
     cac:       0a000002        beq     cbc <drawSprite+0x60>
     cb0:       e1a01004        mov     r1, r4
     cb4:       e1a00005        mov     r0, r5
     cb8:       ebffff70        bl      a80 <setPixel>
     cbc:       e2855001        add     r5, r5, #1
Code: [Select]
 5a0:   e25cc001        subs    ip, ip, #1
 5a4:   3a000005        bcc     5c0 <drawTexture(...)+0x158>
 5a8:   e0d560b2        ldrh    r6, [r5], #2
 5ac:   e1d080b6        ldrh    r8, [r0, #6]
 5b0:   e2811002        add     r1, r1, #2
 5b4:   e1580006        cmp     r8, r6
 5b8:   114160b2        strhne  r6, [r1, #-2]
As that code is run per pixel, I guess that that is definitely a noticeable difference.
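In C terms, the second listing corresponds roughly to a loop like the one below. This is a sketch inferred from the disassembly, not nGL's actual source: draw_row_fast is a made-up name, and the row is assumed to have been clipped by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy one row, skipping pixels equal to the transparent colour.
 * No call, no bounds check and no multiply inside the loop. */
void draw_row_fast(const uint16_t *src, uint16_t *dst, size_t w,
                   uint16_t transparent)
{
    for (size_t i = 0; i < w; i++) {
        uint16_t c = src[i];
        if (c != transparent)
            dst[i] = c;
    }
}
```

Passing the transparent colour by value lets the compiler keep it in a register for the whole row instead of reloading it for every pixel.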

Offline Adriweb

  • Editor
  • LV10 31337 u53r (Next: 2000)
  • **********
  • Posts: 1708
  • Rating: +229/-17
    • View Profile
    • TI-Planet.org
Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #470 on: May 17, 2015, 09:13:38 am »
I guess one is optimized C, the other is optimized ASM... (or, at least, C code such that the ASM gets optimized much better in the end). So clearly on this level the assembly-optimized version can only be better.
But knowing how to optimize at this level is definitely something that far fewer people know how to do.

And I'm sure that since both things are open-source, one could help the other when needed ;)
My calculator programs
TI-Planet.org co-admin.
TI-Nspire Lua programming : Tutorials  |  API Documentation

Offline pimathbrainiac

Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #471 on: May 17, 2015, 12:16:42 pm »
That, plus what Adriweb said. I'm sure if you all worked on the same project, we'd get an even better library than either of the two is now :P

Offline Hayleia

  • Programming Absol
  • Coder Of Tomorrow
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 3367
  • Rating: +393/-7
    • View Profile
Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #472 on: May 17, 2015, 05:21:15 pm »
Quote
You optimized it? Oh yeah cool, so did pierrotdu18, Hayleia and I.
Wut? I don't know about pierrot and you, but I didn't do anything. I pretty much dropped the project when I saw one of you was not putting braces around single-statement blocks, even for a for nested in a for (which leads to very ugly code in my opinion).
I own: 83+ ; 84+SE ; 76.fr ; CX CAS ; Prizm ; 84+CSE
Sorry if I answer with something that seems unrelated, English is not my primary language and I might not have understood well. Sorry if I make English mistakes too.


Offline rwill

  • LV2 Member (Next: 40)
  • **
  • Posts: 29
  • Rating: +3/-0
    • View Profile
Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #473 on: May 18, 2015, 09:35:12 am »

Quote
For easier comparison, here are the two inner loops of drawSprite and the nGL equivalent, compiled with the same flags as your example:
Code: [Select]
...
As that code is run per pixel, I guess that that is definitely a noticeable difference.

Sadly both routines appear to be quite unoptimized.

Offline Adriweb

Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #474 on: May 18, 2015, 09:47:55 am »
Then let's just code the lib in hand-written ASM directly :P

Offline Vogtinator

Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #475 on: May 18, 2015, 01:44:43 pm »
Quote
Sadly both routines appear to be quite unoptimized.
I guess you could make it faster by doing 32-bit transfers, but the shortest asm version with halfword (16-bit) transfers is the nGL version minus
Code: [Select]
ldrh    r8, [r0, #6]
because that load should happen outside of the loop, and r1 could be used as the counter instead of r12.
Basically (r0 is source, r1 is dest, r2 is end of source):
Code: [Select]
loop:
ldrh r3, [r0], #2
strh r3, [r1], #2
cmp r0, r2
bne loop
Also, 32-bit transfers would be impossible if source or dest isn't 32-bit aligned, which is what happens whenever X is odd.
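The alignment condition can be sketched in C as follows. The helper names are hypothetical and this is neither nGL nor n2DLib code; it only assumes 16-bit pixels that are themselves 2-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit copies are only an option when both pointers sit at the
 * same offset within a 4-byte word. */
int can_copy_32bit(const uint16_t *src, const uint16_t *dst)
{
    return (((uintptr_t)src ^ (uintptr_t)dst) & 3u) == 0;
}

/* Copy n pixels, two at a time where alignment permits. */
void copy_row(const uint16_t *src, uint16_t *dst, size_t n)
{
    if (!can_copy_32bit(src, dst)) {        /* fall back to 16-bit copies */
        while (n--)
            *dst++ = *src++;
        return;
    }
    if (((uintptr_t)src & 3u) != 0 && n > 0) {
        *dst++ = *src++;                    /* one pixel to reach alignment */
        n--;
    }
    while (n >= 2) {
        uint32_t two;                       /* two pixels per transfer */
        memcpy(&two, src, sizeof two);
        memcpy(dst, &two, sizeof two);
        src += 2;
        dst += 2;
        n -= 2;
    }
    if (n)
        *dst = *src;                        /* trailing odd pixel */
}
```

memcpy keeps the sketch free of aliasing problems; whether a given compiler turns it into a single 32-bit load and store on this target is worth checking in the disassembly.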
« Last Edit: May 18, 2015, 01:53:16 pm by Vogtinator »

Offline rwill

Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #476 on: May 18, 2015, 03:40:58 pm »
Shorter is not always faster, especially given the ARM9EJ-S core.

*edit*

Given your example, which seems to copy without the transparency check, we end up with something like this ( cycle estimate at end of line ):

Code: [Select]
loop:
ldrh r3, [r0], #2   | 1
strh r3, [r1], #2   | 2-4 ( 2 cycles of 2 cycle interlock on r3 )
cmp r0, r2          | 5
bne loop            | 6-8

So, for example, unrolling the loop would help quite a bit.
8 Cycles for copying a single pixel without any transformation is way too much.
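The arithmetic behind that estimate, and why unrolling helps, can be written down as a small model in C. The per-instruction costs are the ones from the listing above; the function name is made up, and the model ignores memory wait states, so it is an illustration, not a measurement.

```c
#include <stdint.h>

/* Cycles for one loop iteration that copies n pixels.
 * Assumed costs, from the estimate above: 1 cycle per ldrh, 1 per
 * strh, 1 for the compare and 3 for the taken branch. stall is the
 * number of interlock cycles that are not hidden. */
uint32_t iteration_cycles(uint32_t n, uint32_t stall)
{
    const uint32_t load = 1, store = 1, compare = 1, branch = 3;
    return n * (load + store) + stall + compare + branch;
}
```

iteration_cycles(1, 2) gives the 8 cycles above. If four loads are issued before the four stores so that no interlock remains (an assumption), iteration_cycles(4, 0) gives 12 cycles for four pixels, or 3 per pixel.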
« Last Edit: May 19, 2015, 02:23:02 am by rwill »

Offline Vogtinator

Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #477 on: May 20, 2015, 07:44:41 am »
Quote
So, for example, unrolling the loop would help quite a bit.
8 Cycles for copying a single pixel without any transformation is way too much.
It would, but as I said, only possible if Xsrc % 2 == Xdest % 2, so not widely applicable.

Quote
8 Cycles for copying a single pixel without any transformation is way too much.
Yeah, but I guess you can't do much about it without using asm (and I aim not to, except where there are very obvious improvements).
The generated assembly doesn't look much different with more aggressive gcc optimizations.

Quote
Given your example, which seems to copy without the transparency check, we end up with something like this ( cycle estimate at end of line ):
I guess it could be improved by one cycle if the cmp is moved between the ldrh and the strh?

Offline rwill

Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #478 on: May 20, 2015, 10:08:15 am »
Ah .. might as well be more specific and direct ..
Code: [Select]
#include <stdint.h>

/* Copy an i_width x i_height block of 16-bit pixels.
 * Strides are counted in pixels (int16_t elements), not bytes. */
void blit_pels( const int16_t *pp_srcB, int32_t i_src_stride, int16_t *pp_dstB, int32_t i_dst_stride, int32_t i_width, int32_t i_height )
{
    int32_t i_y;
    for( i_y = 0; i_y < i_height; i_y++ )
    {
        int16_t pp_a0, pp_a1, pp_a2, pp_a3;
        const int16_t *pp_src;
        int16_t *pp_dst;
        int32_t i_remain4, i_remain1;

        pp_src = pp_srcB;
        pp_dst = pp_dstB;

        /* Main loop: four pixels per iteration, all loads before the
         * stores so no store waits on the load right before it. */
        i_remain4 = i_width >> 2;
        while( i_remain4 > 0 )
        {
            pp_a0 = *( pp_src++ );
            pp_a1 = *( pp_src++ );
            pp_a2 = *( pp_src++ );
            pp_a3 = *( pp_src++ );
            *( pp_dst++ ) = pp_a0;
            *( pp_dst++ ) = pp_a1;
            *( pp_dst++ ) = pp_a2;
            *( pp_dst++ ) = pp_a3;
            i_remain4--;
        }

        /* Tail: the remaining 0 to 3 pixels of the row. */
        i_remain1 = i_width & 3;
        while( i_remain1 > 0 )
        {
            pp_a0 = *( pp_src++ );
            *( pp_dst++ ) = pp_a0;
            i_remain1--;
        }

        /* Advance to the next row. */
        pp_srcB += i_src_stride;
        pp_dstB += i_dst_stride;
    }
}


There might be some gain from increasing the unroll block from 4 to 8 pixels, but I did not bother to test even this version.
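For comparison, an 8-pixel variant of the inner row loop could look like the sketch below. It is untested for speed, the function name is made up, and it only covers a single row.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy one row of 16-bit pixels, eight per iteration. */
void copy_row_unroll8(const int16_t *src, int16_t *dst, size_t width)
{
    size_t blocks = width >> 3;   /* full groups of eight */
    size_t rest   = width & 7;    /* leftover pixels      */

    while (blocks--) {
        /* All loads first, then all stores. */
        int16_t a0 = src[0], a1 = src[1], a2 = src[2], a3 = src[3];
        int16_t a4 = src[4], a5 = src[5], a6 = src[6], a7 = src[7];
        dst[0] = a0; dst[1] = a1; dst[2] = a2; dst[3] = a3;
        dst[4] = a4; dst[5] = a5; dst[6] = a6; dst[7] = a7;
        src += 8;
        dst += 8;
    }
    while (rest--)
        *dst++ = *src++;
}
```

Eight temporaries plus two pointers and a counter come close to the number of registers ARM has available, so the compiler may spill some of them; only a measurement or a look at the disassembly would tell whether this beats the 4-pixel version.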

Offline Vogtinator

Re: nGL - a fast (enough) 3D engine for the nspire
« Reply #479 on: May 21, 2015, 11:02:19 am »
Quote
Quote
So, for example, unrolling the loop would help quite a bit.
8 Cycles for copying a single pixel without any transformation is way too much.
It would, but as I said, only possible if Xsrc % 2 == Xdest % 2, so not widely applicable.
Somehow I was thinking about 32-bit access here, I don't know why.