I'm not convinced that using a custom display routine like that would really save you much speed. The only case where it might pay off is on a calculator running at 15MHz, and even then the gain is small.
At 6MHz, even in the best case where nothing changed between one frame and the next, my best attempt at prototyping a fast diffing display algorithm that also clears the last frame buffer still runs in about 2/3 of the time of Axe's standard DispGraphClrDraw. That may sound like a big speed boost, but once you take into account the other processing that has to be done each frame, it really isn't. Assuming your current framerate is relatively low (<=30), which is presumably why you're looking for this boost, the framerate would only improve by about framerate/3 percent: 30fps would become 33fps, 20fps would become 21.3fps, and 10fps would become 10.3fps. The percentage gains only start becoming substantial above 30fps, but if the framerate is already above 30fps, you don't really need the gains anyway.
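To illustrate the idea (this is a Python sketch of the concept, not my actual z80 routine): a byte-level diffing display keeps a shadow copy of what the LCD currently shows, sends only the bytes that differ, and clears the draw buffer in the same pass. Here `write_byte` is a hypothetical stand-in for sending one byte to the LCD driver:

```python
BUF_SIZE = 96 * 64 // 8  # 768 bytes: one bit per pixel on a 96x64 mono screen

def diff_disp_clr_draw(buf, shadow, write_byte):
    """Diffing equivalent of DispGraphClrDraw (illustrative only).

    buf:    the draw buffer for this frame (cleared on exit, like ClrDraw)
    shadow: a copy of what the LCD currently shows (updated in place)
    write_byte(i, b): stand-in for sending byte b at position i to the LCD
    """
    for i in range(BUF_SIZE):
        if buf[i] != shadow[i]:
            write_byte(i, buf[i])  # only changed bytes cost an LCD write
            shadow[i] = buf[i]
        buf[i] = 0  # clear the last frame's buffer in the same pass
```

When nothing changed, the loop never touches the (slow) LCD at all, which is where the best-case savings discussed here come from.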
At 15MHz, it might be worth it. With my prototype algorithm, if nothing changed between one frame and the next, it would run in about 1/4 of the time of Axe's standard DispGraphClrDraw. With a relatively low framerate (<=30), the framerate would improve by about framerate*4/5 percent: 30fps would become 37.6fps, 20fps would become 23.1fps, and 10fps would become 10.7fps. But keep in mind that these numbers are for the best-case scenario, in which nothing at all changed between frames. If even a quarter of the buffer bytes changed, I suspect those gains would be roughly halved; if about half of the buffer bytes changed, the gains would probably drop to about a quarter.
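For reference, the fps figures quoted here follow from subtracting the saved display time from the frame period. The ~3ms and ~6.75ms per-frame savings in this sketch are back-calculated assumptions that reproduce the quoted numbers, not measurements:

```python
def boosted_fps(fps, saved_seconds):
    """New framerate after shaving saved_seconds off each frame."""
    return 1 / (1 / fps - saved_seconds)

# Assumed per-frame savings (back-calculated from the figures above):
SAVED_6MHZ = 0.00303   # ~3ms: the 1/3 of DispGraphClrDraw saved at 6MHz
SAVED_15MHZ = 0.00675  # ~6.75ms: the 3/4 of DispGraphClrDraw saved at 15MHz

for fps in (30, 20, 10):
    print(fps, "->",
          round(boosted_fps(fps, SAVED_6MHZ), 1), "at 6MHz,",
          round(boosted_fps(fps, SAVED_15MHZ), 1), "at 15MHz")
```

Note how the absolute saving per frame is fixed, so the percentage gain scales with the framerate; that's why the boost is negligible at 10fps but noticeable at 30fps.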
In conclusion: you'd probably be better off spending your time looking for optimization potential elsewhere, namely in the graphics rendering code itself.
If you want to fill us in on the kind of computations your graphics code does on an average frame, I or others here might be able to give some ideas.