The problem with that is that it is quite a lot slower for larger values of K. The algorithm I posted takes ceiling(log2K) number of iterations compared to K iterations. For example, K=999 takes 10 iterations versus 999. At K=999999999, my code takes about 2 seconds, yours would take one or two years, if my estimations are correct o.o
In fact, that is very similar to the code I was trying to speed up (somebody posted an encryption program on TI|BD, and I knew the more efficient algorithm).
EDIT: Never mind, I thought you were quoting my code, I see now. The issue with the first optimisation is a rather bothersome one. I used to use the fPart( method, but that will cause rounding errors that mess up the mod() formula, especially with numbers with certain prime factors.