Good bump, I missed this.
As far as your question, honestly, the entire program should be String1, that will get you higher compression ratios.
I used to have a smaller version of this example, but honestly, I think my long drawn out example is easier to understand. The confusion really is in the details.
More realistic example:
1. Determine what the maximum distance your compressor will have to search in reverse in order to find a match. (For simplicity, put your entire program into the search window.) Take this distance and round up to the nearest power of 2. (2^11, 2^12, 2^13, etc)
2. Write the beginning of the program to _old, write the location of a new (large) buffer to _new. _new will actually increase by bits, not bytes. This will take special handling.
3. Start a counter called _count, this will represent how many uncompressable bytes we've seen since the last uncompressed chunk was written. This starts at 0.
Loop1:
4. Grab the byte at _old. Starting at the beginning of your data to compress, search for this byte stopping only at _old-1.
Loop2: (for each match)
5. See if the next byte matches the next byte after _old.
6. If it matches, check the next one as well. Keep repeating loop2 until you either have a mismatch or you reach your string length (128 - 7 bits is probably a good choice)
(For this stage, all that is required is that the beginning of the match starts before _old, the actual matching string is allowed to go past _old. This ability allows for this method to perform RLE. (Think it through how that would work, _old would point to the second byte of the repeating byte sequence) It is also important that you do not allow your program to search for matches starting at _old. This would result in finding one super long match all the way to the end of your program.)
7. Determine which match you found is the longest.
8. If the longest match is longer than 2 bytes GOTO Compressed.
If the longest match is shorter than 2 bytes GOTO Uncompressed. (The reason for 2 is this: the data for compressed data is a 1 bit flag, a 7 bit ptr, and then a 10-14 bit size word. This means compressed data actually takes up about 2.3 bytes. If we only found a 2 byte match, the compressed data would actually be bigger.)
Uncompressed:
9. Increase _old
10. Increase _count
11. If _count is at 127, follow the procedure in Compressed for writing uncompressed data (the 0 bit is needed still)
12. GOTO Finish
Compressed:
13. Write a zero bit to _new
14. Write _count (7 bits) to _new
15. Write _count bytes starting at _old-_count to _new
16. Write 0 to _count
17. Write a 1 bit to _new
18. Write the offset to the match we found to _new (variable bits)
19. Write the size of the match we found to _new (7 bits)
Finish:
20. GOTO loop1 until _old is at the end of the uncompressed buffer.
21. If _count is >0, write those bytes
This will write the compressed data, however, you're also going to want a header. In this header, you'll want to write things like the uncompressed size as well as the number of bits used for your offset.
Now hopefully you understand