Author Topic: CalcRAR (Read 6397 times)

shmibs · « **Reply #15 on:** February 24, 2012, 01:12:27 pm »

alright, well that's a start

what his compressor does is check for the same string of data occuring twice. the compressor looks at the compressed data it has already written and searches for a byte that equals the byte it's currently at in the uncompressed data. if it finds that byte, it then looks at the next byte in the uncompressed data and checks if that one matches up with the byte after the one it found in the compressed data as well. it keeps doing this until two of the bytes don't match. if the compressor wants to be absolutely certain that it has found the best possible match for the current point in the uncompressed data, it can store the position and size of the data that matched there and continue searching through the rest of the compressed data up to its current position in the same way, storing the new position and size if it finds a better match(one that is longer). it then writes the position and size of the best match in the compressed data and moves to the point in the uncompressed data that follows the section just matched. conversely, if it never found a match, it writes a 0, stores a 1 into the next 7 bits, and then adds the byte to the compressed data. it then moves to the next byte in the uncompressed data, and, if it can't find a match for that one either, adds it to the compressed data and updates the 7 bit number to hold a 2 instead.

since this compression method depends on matches in previous sections of the compressed data itself, when you first start compressing, everything will initially be stored as uncompressed data, as there is nothing to search through to find matches yet. also, you can change the number of bytes in a row that must match for the match to be considered "good enough" to be added as compressed data. depending on the file, this could either increase or decrease the resulting size, so it's a good idea to include an option to define this size yourself prior to compression and then test different sizes to see which comes out best.

in review, the compressing program looks through the data that has already been compressed and tries to find a string of bytes that matches the one it's currently looking at in the data. in the example, once the compressing program reaches this point in the data:

00 01 02 03 04 05 06 07 08 09 00 00 | 06 07 08 09 00 00 06 02 03 04 05 06 08 09 04 03 06 06 06 06 06 06 06 06 06 06 04 04 05 08 09 09 01 01 01 02 03 04 05

Key:
(1 bit)
8 bits
[11 bits]
{7 bits}

and if it has a minimum match length of 3 bytes, it will currently have this much compressed code written:

(0) {12} 00 01 02 03 04 05 06 07 08 09 00 00

and the next section written in the compressed data will be this:

(1) [7] {6}

where the (1) indicates that the next section is compressed,

[7] indicates that the data to be compressed begins 7 bytes from the beginning of the compressed data, and

{6} indicates that 6 bytes are to be referenced from that point, those 6 bytes being 06 07 08 09 00 00

the compressor will then have this much written to the compressed program:

(0) {12} 00 01 02 03 04 05 06 07 08 09 00 00 (1) [7] {6}

and will be at this point in the data:

00 01 02 03 04 05 06 07 08 09 00 00 06 07 08 09 00 00 | 06 02 03 04 05 06 08 09 04 03 06 06 06 06 06 06 06 06 06 06 04 04 05 08 09 09 01 01 01 02 03 04 05

EDIT: i wrote this without having ever used this compression method myself, and just noticed that LZ77 looks through the UNcompressed data when searching for matches rather than the compressed (which makes a whole lot more sense, now that i think about lt

) otherwise, though, this should still hold true.

ACagliano · « **Reply #16 on:** February 24, 2012, 02:03:12 pm »

I really do give up. lol. Sorry. Compression was never my strong point.

Edit: If someone want to write a compression/decompression routine, I will be eternally grateful. But, for the time being, I'm afraid, it is outside my skill level.

DJ Omnimaga · « **Reply #17 on:** February 24, 2012, 02:38:17 pm »

An decompressor/compressor that is on and off calc would be nice, because it would allow some calc games to take much less space when some files are not needed. It would kinda be like some Celtic III features allowing you to package multiple programs together, except that they would be compressed.

shmibs · « **Reply #18 on:** February 24, 2012, 02:59:39 pm »

iambian already made both of those.
athena packages and compresses a bunch of files together, so that all take up less space, and his pucrunch axiom and asm routines make it so asm and axe programs can decompress files during runtime, so they take up less space in the archive when not in use.

ACagliano · « **Reply #19 on:** February 24, 2012, 03:04:28 pm »

Oh. Can you link me to the pucrunch axiom? I'll just use that. Will it be sufficient?

shmibs · « **Reply #20 on:** February 24, 2012, 03:08:48 pm »

here is it: link

it's a bit confusing to use, but a lot simpler than writing your own routines =)

DJ Omnimaga · « **Reply #21 on:** February 24, 2012, 03:19:08 pm »

Quote from: shmibs on February 24, 2012, 02:59:39 pm

iambian already made both of those.
athena packages and compresses a bunch of files together, so that all take up less space, and his pucrunch axiom and asm routines make it so asm and axe programs can decompress files during runtime, so they take up less space in the archive when not in use.

Oh wait I thought it didn't do any compression. But does it let you unzip/rar files directly from TI-BASIC/Axe code like Celtic III with groups?

shmibs · « **Reply #22 on:** February 24, 2012, 03:41:39 pm »

athena can't be run directly, as in it has a gui and can't be passed names for what to decompress. that doesn't seem like it would be too difficult to add, but, as it is, you could just tell people what to decompress in the program, have them do it, and then check afterwards if they did it right, or even use it as a method to choose what level to play, assuming levels are packaged separately.

DJ Omnimaga · « **Reply #23 on:** February 24, 2012, 03:50:29 pm »

Yeah I was asking because it would be nice for BASIC coders, so users don't need to constantly ungroup chapters like in Illusiat 10, 11 and 12 or FFTOM.

Iambian · « **Reply #24 on:** February 25, 2012, 01:11:18 am »

Just to be clear, the pucrunch Axiom does not do any compression. All it does is decompress stuff. It is up to the user to use pucrunch to compress their data from the PC side of things.

The Athena Packager/Installer project does all compression PC-side, and includes a utility on-calc that does the decompression.

However, the idea that games should be able to do their own package management is a good idea, and should be included in the next version of Athena. I'll have to think about it.

ACagliano · « **Reply #25 on:** February 25, 2012, 06:31:45 pm »

Ok, let me see if I understand this....

to compress using LZ77.

1. Set a pointer to the start of data to compress.
2. Search the data for the longest string of data that does not repeat. Leave that uncompressed.
3. Locate other sequences that emulate parts of the data string in 2 and write them as references to it.

Is that right?

shmibs · « **Reply #26 on:** February 25, 2012, 06:47:28 pm »

i don't think so.
from what i've gathered, reading what thepenguin and wikipedia had to say, if string 1 is the uncompressed data and string 2 is the compressed data.

1. have a pointer set to the beginning of string 1 and look back through previous parts of string 1 to find sections that match the current position.

2. if there isn't any match (as will be the case when you start, because there isn't any previous data to look back at) write the byte directly to string 2 and move the pointer in string 1 forward one byte; if there is a match, write the pointer value and size of the section matched in string 1 to string 2, and move the pointer in string 1 forward to the end of the section that was matched.

and then, when decompressing

1. check if the current section in string two is indicated as compressed or uncompressed
2. if it's uncompressed, write directly to string 1 and move the pointer in string 2 forward to the next section; if it's compressed, copy the data indicated in string 1 (which you have already uncompressed and written, because the decompression is linear) to the end of string 1, and move the pointer in string 2 to the next section.

i was wrong before when i said the pointer is to a section in the compressed data. it really points to a section in the UNcompressed data, which makes a lot more sense because there's more data available in which to find matches.

ACagliano · « **Reply #27 on:** February 25, 2012, 06:51:32 pm »

Ok, I understand that now. But, my question is...How many bytes should your compressor check for at a time. What stops the compressor from taking the whole file as String1?

ACagliano · « **Reply #28 on:** February 28, 2012, 09:09:45 am »

thepenguin77 · « **Reply #29 on:** March 02, 2012, 11:09:19 pm »

Good bump, I missed this.

As far as your question, honestly, the entire program should be String1, that will get you higher compression ratios.

I used to have a smaller version of this example, but honestly, I think my long drawn out example is easier to understand. The confusion really is in the details.

More realistic example:
1. Determine what the maximum distance your compressor will have to search in reverse in order to find a match. (For simplicity, put your entire program into the search window.) Take this distance and round up to the nearest power of 2. (2^11, 2^12, 2^13, etc)
2. Write the beginning of the program to _old, write the location of a new (large) buffer to _new. _new will actually increase by bits, not bytes. This will take special handling.
3. Start a counter called _count, this will represent how many uncompressable bytes we've seen since the last uncompressed chunk was written. This starts at 0.

Loop1:
4. Grab the byte at _old. Starting at the beginning of your data to compress, search for this byte stopping only at _old-1.

Loop2: (for each match)
5. See if the next byte matches the next byte after _old.
6. If it matches, check the next one as well. Keep repeating loop2 until you either have a mismatch or you reach your string length (128 - 7 bits is probably a good choice)

(For this stage, all that is required is that the beginning of the match starts before _old, the actual matching string is allowed to go past _old. This ability allows for this method to perform RLE. (Think it through how that would work, _old would point to the second byte of the repeating byte sequence) It is also important that you do not allow your program to search for matches starting at _old. This would result in finding one super long match all the way to the end of your program.)

7. Determine which match you found is the longest.
8. If the longest match is longer than 2 bytes GOTO Compressed.
If the longest match is shorter than 2 bytes GOTO Uncompressed. (The reason for 2 is this: the data for compressed data is a 1 bit flag, a 7 bit ptr, and then a 10-14 bit size word. This means compressed data actually takes up about 2.3 bytes. If we only found a 2 byte match, the compressed data would actually be bigger.)

Uncompressed:
9. Increase _old
10. Increase _count
11. If _count is at 127, follow the procedure in Compressed for writing uncompressed data (the 0 bit is needed still)
12. GOTO Finish

Compressed:
13. Write a zero bit to _new
14. Write _count (7 bits) to _new
15. Write _count bytes starting at _old-_count to _new
16. Write 0 to _count
17. Write a 1 bit to _new
18. Write the offset to the match we found to _new (variable bits)
19. Write the size of the match we found to _new (7 bits)

Finish:
20. GOTO loop1 until _old is at the end of the uncompressed buffer.
21. If _count is >0, write those bytes

This will write the compressed data, however, you're also going to want a header. In this header, you'll want to write things like the uncompressed size as well as the number of bits used for your offset.

Now hopefully you understand

Author Topic: CalcRAR (Read 6397 times)

shmibs

Re: CalcRAR

ACagliano

Re: CalcRAR

DJ Omnimaga

Re: CalcRAR

shmibs

Re: CalcRAR

ACagliano

Re: CalcRAR

shmibs

Re: CalcRAR

DJ Omnimaga

Re: CalcRAR

shmibs

Re: CalcRAR

DJ Omnimaga

Re: CalcRAR

Iambian

Re: CalcRAR

ACagliano

Re: CalcRAR

shmibs

Re: CalcRAR

ACagliano

Re: CalcRAR

ACagliano

Re: CalcRAR

thepenguin77

Re: CalcRAR