Omnimaga » Forum » Calculator Community » TI Calculators » Axe (Moderator: Runer112) » Huffman compression
Topic: Huffman compression (Read 5796 times)
Michael Pang · LV1 Newcomer (Next: 20) · Posts: 19 · Rating: +0/-0
Huffman compression
« on: March 07, 2015, 11:29:36 pm »
I'm trying to make file compression programs for my TI-84. So far I've successfully implemented LZSS, which reduced the size of my ASM programs by 10-20%, and now I'm trying to make a Huffman compressor, but I've run into a few problems. I can generate the optimal tree and convert it to canonical representation, but how do I serialize the code lengths without a recursive DFS algorithm? BFS would work, but it would take a lot of the free RAM that I need to store the tree structure. There must be something incredibly simple that I'm missing here...
Runer112 · Project Author · LV11 Super Veteran (Next: 3000) · Posts: 2289 · Rating: +639/-31
Re: Huffman compression
« Reply #1 on: March 08, 2015, 12:57:23 pm »
Disclaimer: I've never actually implemented Huffman compression, but I've read enough about it that I think I would be able to do so.
I was under the impression that, when compressing, the simplest way to store the Huffman encoding information is in a table or map. This structure would have an entry for each source symbol, mapping the symbol to a bit length and pattern. To make storing the pattern easier, you usually enforce that patterns can't be longer than a certain length, so your pattern entries can all fit in bitfields of the same size, but that's not strictly necessary.
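Sketched in Python rather than Axe/z80 (the function name and representation are illustrative, not from this thread), building such a symbol-to-(length, pattern) table from canonical code lengths might look like:

```python
def canonical_codes(lengths):
    """lengths: dict mapping symbol -> code length in bits (0 = unused).
    Returns dict mapping symbol -> (length, pattern), the canonical code."""
    # Canonical order: shorter codes first, ties broken by symbol value.
    ordered = sorted((l, s) for s, l in lengths.items() if l > 0)
    codes, code, prev_len = {}, 0, ordered[0][0]
    for length, sym in ordered:
        code <<= length - prev_len   # append zero bits when the length grows
        codes[sym] = (length, code)
        code += 1
        prev_len = length
    return codes
```

Because the patterns are fully determined by the lengths, only the lengths need to be stored in the compressed output.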
Decompressing, though, probably does need a tree that you can navigate binary search style.
In either case, I'm unaware of any efficient encoding/decoding method that only references the canonical representation. So you definitely need some amount of RAM to allocate for an expanded structure, but it shouldn't be a colossal amount; it should be constant with respect to the number of source symbols.
« Last Edit: March 08, 2015, 01:06:19 pm by Runer112 »
Michael Pang
Re: Huffman compression
« Reply #2 on: March 08, 2015, 01:06:12 pm »
What you said is true for a static pre-generated table, but I need to generate the tree dynamically based on frequency data from an input file, where the leaf symbols are bytes 0-255. And even then, I'd still have to extract the bit strings from the tree with a graph search algorithm, which is a pain to implement on a calculator.
I think I've solved everything by making all the pointers in the tree lead toward the root.
harold · LV5 Advanced (Next: 300) · Posts: 226 · Rating: +41/-3
Re: Huffman compression
« Reply #3 on: March 08, 2015, 01:11:28 pm »
The best decompression techniques all rely on building a table that you index with your current bit buffer, giving you the first symbol in the buffer and its length. That may be awkward on z80, though; the table can get pretty big (2 to the power of however long you allow your longest symbol to be). There are more advanced techniques that use smaller tables, but they tend to rely on other operations that are awkward on z80 (such as bitscan). Or nested tables, but those are intrinsically a pain.
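A rough Python sketch of that single-level table (names and layout are my own, not from the post): the decoder peeks the next `max_len` bits, indexes the table once, and learns both the symbol and how many bits to consume.

```python
def build_decode_table(codes, max_len):
    """codes: dict mapping symbol -> (length, pattern). Returns a list of
    2**max_len entries; indexing with the next max_len input bits yields
    (symbol, length), so one table lookup decodes one symbol."""
    table = [None] * (1 << max_len)
    for sym, (length, pattern) in codes.items():
        pad = max_len - length
        # Every index whose top bits equal this pattern decodes to sym.
        for fill in range(1 << pad):
            table[(pattern << pad) | fill] = (sym, length)
    return table
```

This is why the longest-code limit matters: the table has 2^max_len entries regardless of the alphabet size.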
I don't even know where this tree search for encoding is coming from; doing it the normal way is much simpler. That table is only as long as your alphabet is big.
« Last Edit: March 08, 2015, 01:14:49 pm by harold »
Blog about bitmath: bitmath.blogspot.nl
Check the haroldbot thread for the supported commands and syntax. You can use haroldbot from this website.
Michael Pang
Re: Huffman compression
« Reply #4 on: March 08, 2015, 01:19:09 pm »
I'm using an algorithm similar to this one (https://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html) to determine an optimal encoding. What's the normal way?
Runer112
Re: Huffman compression
« Reply #5 on: March 08, 2015, 01:23:02 pm »
I believe that's *the* way to build the encoding. But once you've built the tree and saved it in some format to embed in the output, you can scrap it for a much more efficient table/map format, as I described earlier, for the encoding process.
harold
Re: Huffman compression
« Reply #6 on: March 08, 2015, 01:24:03 pm »
That's fine, but be careful: it can give very long longest symbols, and that's bad for decoding. You can iterate this algorithm with all the frequencies divided by 2 (rounding up), which will reduce the skew. You can throw the tree away after extracting the symbol lengths (the patterns are useless; you're using canonical encoding, right? So you'd end up reassigning them anyway). You can just traverse the nodes in any order and, at every leaf, write the depth into lengths[leaf.symbol].
Another way is to use the package-merge algorithm to directly generate optimal lengths given some maximum length. It doesn't use trees.
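A compact, unoptimized Python sketch of package-merge as I understand it (the function name and coin representation are mine; a real on-calc version would need far tighter memory handling):

```python
def package_merge(freqs, max_len):
    """freqs: dict mapping symbol -> frequency (> 0). Returns dict mapping
    symbol -> code length, optimal subject to every length <= max_len."""
    n = len(freqs)
    assert n <= (1 << max_len), "max_len too small for this many symbols"
    # A "coin" is (weight, symbols it covers); start with one per symbol.
    orig = sorted((f, (s,)) for s, f in freqs.items())
    level = orig                           # denomination 2**-max_len
    for _ in range(max_len - 1):
        # Package adjacent pairs, then merge with fresh copies of the coins.
        packages = [(level[i][0] + level[i + 1][0],
                     level[i][1] + level[i + 1][1])
                    for i in range(0, len(level) - 1, 2)]
        level = sorted(orig + packages)
    # The 2n-2 cheapest items at the top level give the solution: each
    # appearance of a symbol adds one bit to its code length.
    lengths = {s: 0 for s in freqs}
    for _, syms in level[:2 * n - 2]:
        for s in syms:
            lengths[s] += 1
    return lengths
```

With a loose enough max_len this reproduces plain Huffman lengths; with a tight one it trades a little compression for a bounded table size.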
Michael Pang
Re: Huffman compression
« Reply #7 on: March 08, 2015, 01:26:21 pm »
And how do I convert from tree to table/map/symbol lengths without searching? I'll check up on package-merge later.
harold
Re: Huffman compression
« Reply #8 on: March 08, 2015, 01:28:06 pm »
Just a regular old pre-order traversal or whatever. Any order works.
Michael Pang
Re: Huffman compression
« Reply #9 on: March 08, 2015, 01:51:19 pm »
Isn't that DFS? I thought Axe didn't support recursion...
harold
Re: Huffman compression
« Reply #10 on: March 08, 2015, 02:34:26 pm »
It is, but it doesn't need real recursion; recursion just abuses the call stack to push nodes so it can go back up to a parent. So it can be replaced by an explicit stack of nodes (pop a node, "visit" it, push its two children), or by parent pointers (which take more space and also require knowing whether a node has already been processed).
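That explicit-stack idea, sketched in Python with a hypothetical tuple-based tree (a leaf is ('leaf', symbol), an internal node is ('node', left, right); the real thing would use whatever node layout lives in the calculator's RAM):

```python
def code_lengths(root):
    """Walk a Huffman tree without recursion and return a dict mapping
    each leaf symbol to its depth, i.e. its code length."""
    lengths = {}
    stack = [(root, 0)]                  # explicit stack replaces the call stack
    while stack:
        node, depth = stack.pop()        # pop a node, "visit" it...
        if node[0] == 'leaf':
            lengths[node[1]] = depth
        else:
            stack.append((node[1], depth + 1))   # ...and push its two children
            stack.append((node[2], depth + 1))
    return lengths
```

The stack never holds more than one node per tree level plus pending siblings, so its worst case is proportional to the tree height, not the alphabet size.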
« Last Edit: March 08, 2015, 02:37:52 pm by harold »
Michael Pang
Re: Huffman compression
« Reply #11 on: March 08, 2015, 03:15:06 pm »
I agree, but I still think it'd be a lot cleaner and faster to store the tree backwards (with edges going from children to parent) and then just parse the contiguous block of leaf nodes sequentially.
EDIT: Just read about package-merge on Wikipedia, but unfortunately it looks like it would take up too much memory: 256 symbols × 16 bits max length = 4096 coins. In addition, I'd have to do manual memory management for all the merge operations.
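The parent-pointer layout above, sketched in Python (the array representation is illustrative, not the actual L1 layout): each leaf's code length is just the number of parent hops to the root, so the contiguous leaf block can be scanned sequentially with no search at all.

```python
def leaf_depths(parent, leaves):
    """parent: array where parent[i] is the index of node i's parent and
    the root satisfies parent[root] == root. leaves: indices of the leaf
    nodes, stored as one contiguous block. Returns one depth per leaf."""
    depths = []
    for i in leaves:
        d = 0
        while parent[i] != i:        # walk child -> parent until the root
            i = parent[i]
            d += 1
        depths.append(d)
    return depths
```

This trades the stack for repeated upward walks: worst case it revisits each path once per leaf, which is cheap for a 256-symbol alphabet.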
« Last Edit: March 08, 2015, 03:33:12 pm by Michael Pang »
harold
Re: Huffman compression
« Reply #12 on: March 08, 2015, 04:38:06 pm »
You can use whatever order is convenient; it really doesn't matter, as long as in the end every length is saved.
Are you using this, by the way? In-Place Calculation of Minimum-Redundancy Codes
Also, Deflate uses 15 bits max, so you could probably dial the length down a bit and make some tables smaller. Maybe 14? 13? Using the simplest single-layer decoding would still take a "slightly bigger than would be nice" table, though.
Michael Pang
Re: Huffman compression
« Reply #13 on: March 09, 2015, 07:58:07 pm »
I'm currently storing all the subtrees in L1, keeping frequency data in L2 and pointer data in L6 during compression, and then using the normal Huffman tree-merge algorithm to pair them up until one is left. Keeping code lengths down to 15 bits would make things simpler and reduce storage overhead while having little to no effect on the compression ratio, but in a 24 KB file it's possible (although somewhat unlikely) to end up with optimal code lengths of more than 15 bits. My idea is to use a bit of package-merge to reduce leaf variance and then the Huffman algorithm for the rest. What do you think?
For decompression, I'll just build a tree or use sequential search through a 256-entry table.