I finally got around to writing a tutorial on how to use nGL: http://github.com/Vogtinator/nGL
That would be great if you made a tutorial about texture mapping.
Also, does anyone know if nGL is faster for 2D stuff than n2DLib ?
Quote: I finally got around to writing a tutorial on how to use nGL: http://github.com/Vogtinator/nGL
Maybe we'll see some more 3D games on the nspire now!
Yeah well people wouldn't stop talking shit about how n2DLib was slower than nGL even though not a single test was ever made.
You optimized it ? Oh yeah cool, so did pierrotdu18, Hayleia and I.
00000a80 <setPixel>:
 a80: e35100ef   cmp     r1, #239        ; 0xef
 a84: 93500d05   cmpls   r0, #320        ; 0x140
 a88: 33a03d05   movcc   r3, #320        ; 0x140
 a8c: 30210193   mlacc   r1, r3, r1, r0
 a90: 359f300c   ldrcc   r3, [pc, #12]   ; aa4 <setPixel+0x24>
 a94: 31a01081   lslcc   r1, r1, #1
 a98: 35933000   ldrcc   r3, [r3]
 a9c: 318320b1   strhcc  r2, [r3, r1]
 aa0: e12fff1e   bx      lr
 aa4: 00011078   .word   0x00011078

In drawSprite:
 c94: e1550008   cmp     r5, r8
 c98: e08b3005   add     r3, fp, r5
 c9c: aa000008   bge     cc4 <drawSprite+0x68>
 ca0: e0da20b2   ldrh    r2, [sl], #2
 ca4: e1d630b4   ldrh    r3, [r6, #4]
 ca8: e1530002   cmp     r3, r2
 cac: 0a000002   beq     cbc <drawSprite+0x60>
 cb0: e1a01004   mov     r1, r4
 cb4: e1a00005   mov     r0, r5
 cb8: ebffff70   bl      a80 <setPixel>
 cbc: e2855001   add     r5, r5, #1
 5a0: e25cc001   subs    ip, ip, #1
 5a4: 3a000005   bcc     5c0 <drawTexture(...)+0x158>
 5a8: e0d560b2   ldrh    r6, [r5], #2
 5ac: e1d080b6   ldrh    r8, [r0, #6]
 5b0: e2811002   add     r1, r1, #2
 5b4: e1580006   cmp     r8, r6
 5b8: 114160b2   strhne  r6, [r1, #-2]
For easier comparison, here are the two inner loops of drawSprite and the nGL equivalent, compiled with the same flags as your example:
...
As that code is run per pixel, I guess that is definitely a noticeable difference.
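To make the difference between the two disassemblies easier to see, here is a rough C sketch of the two approaches. It is a reconstruction for illustration only, not the actual n2DLib or nGL source: the names `framebuffer`, `set_pixel`, `row_via_set_pixel` and `row_direct` are hypothetical, and the 320x240 screen size is taken from the constants visible in the setPixel listing.

```c
#include <stdint.h>

#define SCREEN_W 320
#define SCREEN_H 240

/* Hypothetical framebuffer standing in for the buffer the library draws into. */
uint16_t framebuffer[SCREEN_W * SCREEN_H];

/* Bounds-checked store, roughly what the setPixel listing appears to do. */
void set_pixel(unsigned x, unsigned y, uint16_t color)
{
    if (x < SCREEN_W && y < SCREEN_H)
        framebuffer[y * SCREEN_W + x] = color;
}

/* Variant A: one call per pixel, so the bounds checks and the
 * y * 320 + x multiply are repeated for every pixel. */
void row_via_set_pixel(const uint16_t *src, unsigned x0, unsigned y,
                       unsigned width, uint16_t transparent)
{
    for (unsigned i = 0; i < width; i++) {
        uint16_t c = src[i];
        if (c != transparent)
            set_pixel(x0 + i, y, c);
    }
}

/* Variant B: clip once per row, then store through a pointer that was
 * computed once, similar in spirit to the conditional strhne store. */
void row_direct(const uint16_t *src, unsigned x0, unsigned y,
                unsigned width, uint16_t transparent)
{
    if (y >= SCREEN_H || x0 >= SCREEN_W)
        return;
    if (width > SCREEN_W - x0)
        width = SCREEN_W - x0;

    uint16_t *dst = &framebuffer[y * SCREEN_W + x0];
    for (unsigned i = 0; i < width; i++) {
        uint16_t c = src[i];
        if (c != transparent)
            dst[i] = c;
    }
}
```

Both variants write the same pixels; the second one just moves the clipping and address calculation out of the per-pixel path.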
...
Sadly both routines appear to be quite unoptimized.
ldrh r8, [r0, #6]
loop:
    ldrh r3, [r0], #2
    strh r3, [r1], #2
    cmp  r0, r2
    bne  loop
loop:
    ldrh r3, [r0], #2   | 1
    strh r3, [r1], #2   | 2-4 ( 2 cycles of 2 cycle interlock on r3 )
    cmp  r0, r2         | 5
    bne  loop           | 6-8
So, for example, unrolling the loop would help quite a bit.
8 cycles for copying a single pixel without any transformation is way too much.
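As a back-of-the-envelope check on why unrolling helps, the estimate can be written as a small function. This is only a sketch: the per-instruction costs (1 cycle per load, store and compare, 3 for the taken branch, 2 cycles of load-use interlock) are read off the cycle annotation in the post, and the assumption that grouping the loads ahead of the stores hides the interlock is mine, not something measured. `loop_cycles` is a hypothetical helper name.

```c
/* Estimated cycles for one pass through a copy loop that moves `unroll`
 * pixels per iteration, with all loads issued before the first store.
 * Every cost below is an assumption taken from the annotated loop. */
int loop_cycles(int unroll)
{
    const int load = 1, store = 1, compare = 1, taken_branch = 3;

    /* Assumed 2-cycle load-use penalty: each extra load placed between
     * a load and the store that uses its result hides one stall cycle. */
    int interlock = 3 - unroll;
    if (interlock < 0)
        interlock = 0;

    return unroll * (load + store) + interlock + compare + taken_branch;
}
```

Under these assumptions `loop_cycles(1)` gives the 8 cycles from the annotation, while `loop_cycles(4)` gives 12 cycles for four pixels, i.e. about 3 cycles per pixel.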
Given your example, which seems to copy without the transparency check, we end up with something like this (cycle estimate at end of line):
#include <stdint.h>

/* Copy a rectangle of 16-bit pixels; strides are in pixels, not bytes. */
void blit_pels( const int16_t *pp_srcB, int32_t i_src_stride,
                int16_t *pp_dstB, int32_t i_dst_stride,
                int32_t i_width, int32_t i_height )
{
    int32_t i_y;

    for( i_y = 0; i_y < i_height; i_y++ )
    {
        int16_t pp_a0, pp_a1, pp_a2, pp_a3;
        const int16_t *pp_src;
        int16_t *pp_dst;
        int32_t i_remain4, i_remain1;

        pp_src = pp_srcB;
        pp_dst = pp_dstB;

        /* Unrolled part: four loads first, then four stores. */
        i_remain4 = i_width >> 2;
        while( i_remain4 > 0 )
        {
            pp_a0 = *( pp_src++ );
            pp_a1 = *( pp_src++ );
            pp_a2 = *( pp_src++ );
            pp_a3 = *( pp_src++ );
            *( pp_dst++ ) = pp_a0;
            *( pp_dst++ ) = pp_a1;
            *( pp_dst++ ) = pp_a2;
            *( pp_dst++ ) = pp_a3;
            i_remain4--;
        }

        /* Remaining 0 to 3 pixels of the row. */
        i_remain1 = i_width & 3;
        while( i_remain1 > 0 )
        {
            pp_a0 = *( pp_src++ );
            *( pp_dst++ ) = pp_a0;
            i_remain1--;
        }

        pp_srcB += i_src_stride;
        pp_dstB += i_dst_stride;
    }
}
Quote: So, for example, unrolling the loop would help quite a bit. 8 cycles for copying a single pixel without any transformation is way too much.
It would, but as I said, only possible if Xsrc % 2 == Xdest % 2, so not widely applicable.
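One way to read the Xsrc % 2 == Xdest % 2 condition is that two 16-bit pixels can only be moved as one 32-bit word when source and destination sit at the same offset within a word. The sketch below shows that idea with a fallback for the mismatched case. It is an illustration under assumptions, not code from either library: `copy_row` is a hypothetical name, and the check is done on the pointer addresses, which matches the X coordinate parity only if each row starts on a 4-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n 16-bit pixels, using 32-bit transfers when src and dst share the
 * same position within a 4-byte word, and halfword copies otherwise. */
void copy_row(uint16_t *dst, const uint16_t *src, size_t n)
{
    /* Different parity: word transfers would be misaligned on one side. */
    if ((((uintptr_t)dst ^ (uintptr_t)src) & 2) != 0) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
        return;
    }

    /* Copy one leading pixel if needed to reach a 4-byte boundary. */
    if (n > 0 && ((uintptr_t)dst & 2) != 0) {
        *dst++ = *src++;
        n--;
    }

    /* Two pixels per transfer; memcpy of a fixed 4 bytes avoids aliasing
     * problems and is typically lowered to plain loads and stores. */
    while (n >= 2) {
        uint32_t word;
        memcpy(&word, src, sizeof word);
        memcpy(dst, &word, sizeof word);
        src += 2;
        dst += 2;
        n -= 2;
    }

    /* Trailing odd pixel. */
    if (n > 0)
        *dst = *src;
}
```

Whether the compiler really emits single word loads and stores for the middle loop depends on the compiler and flags, so the generated code is worth checking before relying on it.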