I'm fairly new to Java, but this is how I would start thinking about it. Read the .html file in as text and move through it until you reach <table>. From there, run table-parsing code that ignores everything inside <> unless it's a tr or td tag: output the text found inside each td on the current line, output a comma when you reach </td>, start a new line when you reach </tr>, and keep looping until you reach </table>.
I guess the annoying part would be writing a good tag-recognition algorithm, but if you were feeling lazy/inelegant you could just scan one character at a time and run through a bunch of if statements whenever you hit a "<" character.
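Here is a rough sketch of that character-at-a-time scan in Java, using nothing beyond `String.indexOf` and `charAt`. The class name `TableToCsv` and the helpers `convert` and `tagName` are made up for illustration. It assumes simple, well-behaved HTML: only the first table is handled, a `>` inside an attribute value or comment would confuse it, and commas inside cell text are not escaped. One small departure from the description above: the trailing comma left by the last </td> is trimmed before the newline.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class TableToCsv {

    // Hypothetical helper: reduces the text between '<' and '>' to a
    // lower-case tag name, dropping attributes ("TR class=x" -> "tr").
    static String tagName(String inside) {
        return inside.trim().split("\\s+")[0].toLowerCase(Locale.ROOT);
    }

    // Scans the HTML one character at a time and returns the cells of the
    // first <table> as comma-separated lines.
    public static String convert(String html) {
        StringBuilder out = new StringBuilder();
        boolean inTable = false;
        boolean inCell = false;
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (c != '<') {
                // Plain text: keep it only when we are inside a <td>.
                if (inCell) {
                    out.append(c);
                }
                i++;
                continue;
            }
            int close = html.indexOf('>', i);
            if (close < 0) {
                break; // unterminated tag, give up
            }
            String tag = tagName(html.substring(i + 1, close));
            if (tag.equals("table")) {
                inTable = true;
            } else if (tag.equals("/table") && inTable) {
                break; // end of the first table, we are done
            } else if (inTable && tag.equals("td")) {
                inCell = true;
            } else if (inTable && tag.equals("/td")) {
                inCell = false;
                out.append(',');
            } else if (inTable && tag.equals("/tr")) {
                // Trim the comma left by the last </td>, then end the row.
                int last = out.length() - 1;
                if (last >= 0 && out.charAt(last) == ',') {
                    out.setLength(last);
                }
                out.append('\n');
            }
            // Every other tag is skipped without any output.
            i = close + 1;
        }
        return out.toString();
    }

    // Reads HTML from standard input and prints the converted table.
    public static void main(String[] args) throws IOException {
        String html = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        System.out.print(convert(html));
    }
}
```

Something like `java TableToCsv.java < page.html > table.csv` would run it on a file (the file names are placeholders). Tags nested inside a cell, such as <b>, are dropped and only their text is kept.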
This kind of character scanning is where I think a lower-level language would be more appropriate than Java, mainly because looking for certain character patterns would be super easy in something like Axe or C (I think, anyway), whereas in Java I have no idea what the proper library/API is or how to use it for that purpose.