Since I'm using the z80float library, addition and subtraction are slightly slower on average (about 350cc, which translates to 20%), but multiplication and division are about three times faster. The real bottleneck is converting to and from strings, which is significantly slower than in the TI-OS, since the OS stores its floats in a BCD format.
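To illustrate why BCD makes string conversion cheap, here is a minimal Python sketch. It only handles an unsigned run of digits (no sign, exponent, or decimal point), and the function names `bcd_to_digits` and `binary_to_digits` are made up for this example; neither is the TI-OS routine nor anything from z80float.

```python
def bcd_to_digits(packed: bytes) -> str:
    """Packed BCD: each byte holds two decimal digits, one per nibble.

    Producing text is just a shift/mask and an add per digit.
    """
    out = []
    for byte in packed:
        out.append(chr(ord("0") + (byte >> 4)))    # high nibble
        out.append(chr(ord("0") + (byte & 0x0F)))  # low nibble
    return "".join(out)


def binary_to_digits(value: int) -> str:
    """Binary integer: every digit costs a full divide-by-ten.

    On a Z80, which has no divide instruction, that division is the slow part.
    """
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, digit = divmod(value, 10)
        out.append(chr(ord("0") + digit))
    return "".join(reversed(out))  # digits come out least significant first


print(bcd_to_digits(bytes([0x31, 0x41, 0x59])))
print(binary_to_digits(314159))
```

Both calls produce the same text, but the BCD path never divides. A binary float has the same problem in both directions, plus scaling by powers of ten for the exponent.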
Eventually, I want to make a fairly simple programming language, and my plan is to tokenize/precompile so that the only bottleneck is displaying strings (converting from a string to an integer or a float will be done once). Since I had this idea in mind as I worked on this, evaluating plaintext first compiles it to bytecode and then runs that through a parser. If I wanted it to be a non-programmable calculator, I could combine both into one step that uses less memory (and slightly less CPU time).
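The two-step idea can be sketched roughly like this in Python. This is not my actual bytecode format: the opcodes, the `compile_expr` and `run` names, and the choice of a postfix layout are all assumptions made for the example, and it skips unary minus and error handling for unbalanced parentheses.

```python
import operator

# Hypothetical opcodes for this sketch only.
PUSH, ADD, SUB, MUL, DIV = range(5)
BINARY_OPS = {"+": (ADD, 1), "-": (SUB, 1), "*": (MUL, 2), "/": (DIV, 2)}
APPLY = {ADD: operator.add, SUB: operator.sub,
         MUL: operator.mul, DIV: operator.truediv}


def tokenize(text):
    """Split plaintext into float literals and operator characters."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit() or ch == ".":
            j = i
            while j < len(text) and (text[j].isdigit() or text[j] == "."):
                j += 1
            tokens.append(float(text[i:j]))  # string -> float happens once, here
            i = j
        elif ch in "+-*/()":
            tokens.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


def compile_expr(text):
    """Compile infix text to postfix bytecode (shunting-yard)."""
    code, pending = [], []
    for tok in tokenize(text):
        if isinstance(tok, float):
            code.append((PUSH, tok))
        elif tok == "(":
            pending.append(tok)
        elif tok == ")":
            while pending[-1] != "(":
                code.append((BINARY_OPS[pending.pop()][0], None))
            pending.pop()  # discard the "("
        else:
            prec = BINARY_OPS[tok][1]
            # Emit pending operators that bind at least as tightly.
            while (pending and pending[-1] != "("
                   and BINARY_OPS[pending[-1]][1] >= prec):
                code.append((BINARY_OPS[pending.pop()][0], None))
            pending.append(tok)
    while pending:
        code.append((BINARY_OPS[pending.pop()][0], None))
    return code


def run(code):
    """Execute bytecode on a value stack; no string handling at all."""
    stack = []
    for opcode, arg in code:
        if opcode == PUSH:
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[opcode](left, right))
    return stack[0]


program = compile_expr("(1 + 2) * 3")  # slow part, done once
print(run(program))                    # can be repeated without re-parsing
```

The point of the split is that `run` only ever touches already-converted floats, so a stored program pays the string-to-float cost at compile time rather than on every evaluation. A non-programmable calculator could instead apply each operator as soon as it is popped, which avoids keeping the bytecode list in memory.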