Omnimaga

Calculator Community => Other Calc-Related Projects and Ideas => TI Z80 => Topic started by: shmibs on April 06, 2013, 10:48:08 am

Title: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: shmibs on April 06, 2013, 10:48:08 am
things like kerm's Source Coder 2 or merth's TokenIDE are all well and good for converting an 8x program between plaintext and link formats, but both have a few limitations: namely, they require a graphical environment and cannot be scripted. to get around both of those, i've started writing a tokeniser/detokeniser of my own (github) (https://github.com/shmibs/tok8x). it's already functional; the only remaining tasks are adding full token sets (right now it only contains a very small subset of the axe tokens, which i was using for testing purposes), implementing the option to skip comment lines, optionally writing to stdout, and writing the detokeniser (which is much simpler than its counterpart). have a screenshot!
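The tokenising direction is the harder one because several tokens can match at the same position (for example "E" and "End", or "D" and "DiagnosticOff"). A common way to resolve this is greedy longest match. The sketch below assumes that approach; it is not taken from tok8x. The `token` layout, the `tokenise` function and the tiny table are hypothetical, while the byte values are the usual TI-83+/84+ ones for these tokens.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token entry: one or two bytes plus the plaintext spelling. */
typedef struct {
    unsigned char bytes[2];
    size_t nbytes;      /* 1 or 2 */
    const char *text;
} token;

/* A tiny subset of the TI-83+/84+ BASIC token set, for illustration only. */
static const token tokens[] = {
    { {0x41}, 1, "A" },   { {0x44}, 1, "D" },   { {0x45}, 1, "E" },
    { {0x30}, 1, "0" },   { {0x31}, 1, "1" },   { {0x2B}, 1, "," },
    { {0x11}, 1, ")" },   { {0x3F}, 1, "\n" },  { {0xD3}, 1, "For(" },
    { {0xD4}, 1, "End" }, { {0xBB, 0x67}, 2, "DiagnosticOff" },
};

/* Converts plaintext to token bytes, always taking the longest token that
   matches at the current position. Returns false on unknown text or if
   `out` (capacity `cap`) is too small. */
static bool tokenise(const char *src, unsigned char *out, size_t cap,
                     size_t *out_len) {
    size_t used = 0;
    while (*src != '\0') {
        const token *best = NULL;
        size_t best_len = 0;
        for (size_t i = 0; i < sizeof tokens / sizeof tokens[0]; i++) {
            size_t len = strlen(tokens[i].text);
            if (len > best_len && strncmp(src, tokens[i].text, len) == 0) {
                best = &tokens[i];
                best_len = len;
            }
        }
        if (best == NULL || used + best->nbytes > cap)
            return false;
        memcpy(out + used, best->bytes, best->nbytes);
        used += best->nbytes;
        src += best_len;
    }
    *out_len = used;
    return true;
}
```

With a full token set (several hundred entries) a real implementation would probably index the table, for instance by sorting it or building a trie, rather than scanning every entry at each character.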
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: Lionel Debroux on April 06, 2013, 11:28:55 am
I've just relayed the information about your project over at TI-Planet: http://tiplanet.org/forum/viewtopic.php?f=10&t=11529

Nitpick about the code shown in the screenshot: you should declare your token lists "const" (and probably the character strings in the token struct as well), so that they're placed in .rodata, which can help catch bugs in some circumstances :)
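For illustration, here is roughly what that suggestion looks like. The struct and field names are hypothetical (the actual tok8x source isn't reproduced in this thread); the byte values are the usual TI-83+/84+ ones.

```c
#include <stddef.h>

/* Hypothetical token entry. `text` points to const char, so the string
   literals can't be modified through the table. */
typedef struct {
    unsigned char first_byte;
    unsigned char second_byte;   /* only meaningful for two-byte tokens */
    const char *text;
} token;

/* Declaring the table itself const lets the toolchain put it in a read-only
   section (.rodata, or .data.rel.ro for a table of pointers in a
   position-independent build). A direct write is then a compile error, and
   a write through a cast is undefined behaviour that typically faults
   instead of silently corrupting the table. */
static const token basic_tokens[] = {
    { 0xD3, 0x00, "For(" },
    { 0xD4, 0x00, "End" },
    { 0xBB, 0x67, "DiagnosticOff" },
};

static const size_t basic_token_count =
    sizeof basic_tokens / sizeof basic_tokens[0];

/* basic_tokens[0].text = "oops";   <- rejected by the compiler */
```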
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: mdr1 on April 06, 2013, 11:31:47 am
Great! Will you add some directives like #include, #define, etc.?
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: Sorunome on April 06, 2013, 11:38:30 am
Nice project!
And yeah, a preprocessor would be fun :D
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: shmibs on April 06, 2013, 02:01:02 pm
what sorts of things would you want from a preprocessor (besides define, which might be doable)?

and thanks, lionel =)
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: Lionel Debroux on April 06, 2013, 02:47:54 pm
You're welcome :)

Together, #defines (even fairly simple ones) and #includes can have the same positive effect on maintainability as they have in C/C++ or, say, LaTeX: you can split a program / document across multiple files, and give a value that is repeated many times in a program a single, simpler name, so that it can be changed in one place.
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: Sorunome on April 06, 2013, 02:49:57 pm
#if calculator=84+ then (89? nspire?)
<code>
#else
<code>
#end

just a thought :)
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: shmibs on April 06, 2013, 03:21:33 pm
Quote
You're welcome :)

Together, #defines (even fairly simple ones) and #includes can have the same positive effect on maintainability as they have in C/C++ or, say, LaTeX. Splitting a program / document across multiple files, having parametrized values repeated (and simplified) multiple times in a program so that they can be changed at a single place.

hokay, those both seem like they'll be easy enough to manage. i'll add them on to the end of the todo list.

Quote
#if calculator=84+ then (89? nspire?)
<code>
#else
<code>
#end

just a thought :)

this is only for the 8x series, though. EDIT: hmm, #if (compile option = blah) would be really useful for debugging purposes, though...
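A conditional restricted to build options could be handled as a line filter that runs before tokenising. The sketch below is only one way to do it: the `##if NAME` / `##else` / `##end` spelling (Sorunome's suggestion with a doubled `#`, so it is less likely to collide with ordinary program text), the `cond_line` function and the idea of passing option names in from the command line are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 16

/* active[d] says whether lines at nesting depth d are emitted. */
typedef struct {
    bool active[MAX_DEPTH + 1];
    int depth;
} cond_state;

static void cond_init(cond_state *s) {
    s->depth = 0;
    s->active[0] = true;
}

/* Is `name` one of the build options that were switched on? */
static bool is_defined(const char *name, const char *const opts[], size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, opts[i]) == 0)
            return true;
    return false;
}

/* Feeds one line (without its newline) through the filter.
   Returns 1 to emit the line, 0 to drop it, -1 on a malformed block.
   A repeated ##else in the same block is not diagnosed. */
static int cond_line(cond_state *s, const char *line,
                     const char *const opts[], size_t nopts) {
    if (strncmp(line, "##if ", 5) == 0) {
        if (s->depth == MAX_DEPTH)
            return -1;                       /* nested too deeply */
        bool parent = s->active[s->depth];
        s->depth++;
        s->active[s->depth] = parent && is_defined(line + 5, opts, nopts);
        return 0;
    }
    if (strcmp(line, "##else") == 0) {
        if (s->depth == 0)
            return -1;                       /* ##else without ##if */
        s->active[s->depth] = s->active[s->depth - 1] && !s->active[s->depth];
        return 0;
    }
    if (strcmp(line, "##end") == 0) {
        if (s->depth == 0)
            return -1;                       /* ##end without ##if */
        s->depth--;
        return 0;
    }
    return s->active[s->depth] ? 1 : 0;
}
```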

speaking of which, what changes have there been to the token set for the 84+CSE? are there any new 2-byte regions?

EDIT2: xeda and runer? i don't really trust myself to make sure all the grammer and axe tokens are defined correctly, so, once i write those, do you think you could take a look at them and make sure i'm not missing anything important?
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: Lionel Debroux on April 06, 2013, 03:37:03 pm
Quote
this is only for the 8x series
If your tokenizer is modular enough, it could also be used for TI-68k/AMS programs (assuming someone spends time on it, even though few people program for that platform anymore...). The token lists for AMS are well known, and haven't evolved since 2005 (89T / V200 AMS 3.10).
As far as the Nspire is concerned... I performed enough reverse-engineering on the OS in 2011 to show that Nspire BASIC programs are still based on tokens similar to the AMS BASIC ones (and in general, the Nspire's CAS still has its foundation in the 92 CAS from 1996), but there are lots of undocumented things on the Nspire, and third parties are not making Nspire BASIC programs on the computer side.

There are, indeed, some new tokens for the 84+CSE, but I'm not the most knowledgeable about them - ask Kerm, BrandonW, Benjamin Moody or several others :)
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: Adriweb on April 06, 2013, 07:12:35 pm
Quote
third parties are not making Nspire BASIC programs on the computer side.
Hmm, you mean other than using TINCS?

Because using the software is by far the most efficient way to create BASIC programs.
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: DJ Omnimaga on April 07, 2013, 01:04:18 am
Will this support XML files like Tokens (and even the same format)?
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: shmibs on April 07, 2013, 01:12:44 am
there isn't really any need for xml files. anyone can just poke me with a new set to add to the source (or do it themselves) and compile it in. it's not that much of an overhead, size-wise. i just finished adding in all the Axe tokens, for example, and it only increased the executable's size from 14.8 KB to 19.9 KB
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: DJ Omnimaga on April 08, 2013, 01:25:13 am
Ah ok I was mostly wondering since new commands could be added in apps like Grammer/Axe/etc, and cross-compatibility with TokenIDE XML files would prevent your app from falling behind if you ever got too busy to maintain it.
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: shmibs on April 08, 2013, 04:40:38 am
i just changed the storage format for other libraries so that they only need to list tokens that are changed from the main token set (like >Char instead of >Frac), so that should make maintenance a much simpler matter.
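A minimal sketch of how such a layered lookup could work when detokenising: search the library's short override table first, then fall back to the main set. The tables and function names are hypothetical; 0x03 is the ►Frac token and 0x04 the store arrow on the TI-83+/84+, written here in the ASCII forms used in this thread.

```c
#include <stddef.h>

typedef struct {
    unsigned char first_byte;
    unsigned char second_byte;   /* 0x00 for one-byte tokens in this sketch */
    const char *text;
} token;

/* Main (TI-BASIC) set: a tiny excerpt. */
static const token base_set[] = {
    { 0x03, 0x00, ">Frac" },
    { 0x04, 0x00, "->" },
    { 0xD4, 0x00, "End" },
};

/* The Axe set only lists the tokens it renames. */
static const token axe_overrides[] = {
    { 0x03, 0x00, ">Char" },
};

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

static const char *find(const token *set, size_t n,
                        unsigned char b1, unsigned char b2) {
    for (size_t i = 0; i < n; i++)
        if (set[i].first_byte == b1 && set[i].second_byte == b2)
            return set[i].text;
    return NULL;
}

/* Overrides win; anything not overridden comes from the main set. */
static const char *axe_text(unsigned char b1, unsigned char b2) {
    const char *t = find(axe_overrides, COUNT(axe_overrides), b1, b2);
    return t != NULL ? t : find(base_set, COUNT(base_set), b1, b2);
}
```

The tokenising direction would need the same layering in reverse, so that ">Frac" is no longer accepted once ">Char" replaces it.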

also, here's a little thing i thought people might find useful:
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: merthsoft on April 08, 2013, 04:41:06 pm
This is pretty neat. It's similar to what elfprince is doing, it seems. I've thought about just having a simple executable that doesn't require the editor and everything (it would be fairly simple), but no one expressed interest in it so I haven't done it--yours is probably better for that anyway since it's in C and therefore requires fewer dependencies.

One suggestion I would have is to make it so it can take TokenIDE-style XML files and use those for tokenization/detokenization. I see that you've mentioned that, but your solution of "anyone can just poke me with a new set to add to the source (or do it themselves) and compile it in" isn't really ideal. Why make users recompile when they can just drop in an XML file? It also has the added bonus of making it so if someone makes a new token set for TokenIDE, it'll automatically work with yours and vice-versa. Standardization is, I think, a good thing.

Quote
There are, indeed, some new tokens for the 84+CSE, but I'm not the most knowledgeable about them - ask Kerm, BrandonW, Benjamin Moody or several others :)
Or Merth ;). The latest release of TokenIDE has the 84+CSE XML file with all the new/renamed tokens:
http://merthsoft.com/Tokens.zip (http://merthsoft.com/Tokens.zip)
(Hopefully you don't think I'm trying to advertise, that's just where the xml file is, and you can use that for the new tokens.)
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: shmibs on April 08, 2013, 05:01:39 pm
coolio, i'll take a look at that, then =). the rest of your xml files have already been a big help, by the way.

as for adding in support for xml files, it sounds nice, but i'm still not convinced that it's worth the trouble. at present, all it takes for someone to add in a new token is to write a single line in the form {first_byte, second_byte, "string" }, and then run make. compilation takes less than a second, and the result is a binary that can be dragged and dropped anywhere without the need of additional files, whereas adding in xml support would increase the size considerably and either require me to include some non-standard library or write a lot more code.
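Given entries in that {first_byte, second_byte, "string"} form, detokenising is one table lookup per token. A minimal sketch, assuming the 8xp header and checksum have already been stripped so that `in` holds only the program's token bytes; the two-byte prefix list is the usual TI-83+/84+ one, and `detokenise` is a hypothetical name, not tok8x's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Entries in the {first_byte, second_byte, "string"} form. */
typedef struct {
    unsigned char first_byte;
    unsigned char second_byte;  /* ignored for one-byte tokens */
    const char *text;
} token;

static const token tokens[] = {
    { 0x3F, 0x00, "\n" },
    { 0xD3, 0x00, "For(" },
    { 0xD4, 0x00, "End" },
    { 0xBB, 0x67, "DiagnosticOff" },
};

/* First bytes that introduce a two-byte token on the TI-83+/84+. */
static bool is_two_byte_prefix(unsigned char b) {
    static const unsigned char prefixes[] = {
        0x5C, 0x5D, 0x5E, 0x60, 0x61, 0x62, 0x63, 0x7E, 0xAA, 0xBB, 0xEF
    };
    return memchr(prefixes, b, sizeof prefixes) != NULL;
}

static const char *lookup(unsigned char b1, unsigned char b2, bool two) {
    for (size_t i = 0; i < sizeof tokens / sizeof tokens[0]; i++)
        if (tokens[i].first_byte == b1 && (!two || tokens[i].second_byte == b2))
            return tokens[i].text;
    return NULL;
}

/* Writes the plaintext for `n` token bytes into `out` (capacity `cap`).
   Returns false on an unknown or truncated token, or if `out` is too small. */
static bool detokenise(const unsigned char *in, size_t n, char *out, size_t cap) {
    size_t used = 0;
    if (cap == 0)
        return false;
    for (size_t i = 0; i < n; ) {
        bool two = is_two_byte_prefix(in[i]);
        if (two && i + 1 >= n)
            return false;               /* truncated two-byte token */
        const char *text = lookup(in[i], two ? in[i + 1] : 0, two);
        if (text == NULL)
            return false;               /* not in the table */
        size_t len = strlen(text);
        if (used + len + 1 > cap)
            return false;               /* no room for text plus terminator */
        memcpy(out + used, text, len);
        used += len;
        i += two ? 2 : 1;
    }
    out[used] = '\0';
    return true;
}
```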

EDIT: both the default BASIC and Axe token sets are now complete =D
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: Lionel Debroux on April 09, 2013, 02:06:30 am
I'm not really convinced of either the necessity or the urgency of adding XML support, even though:
* it isn't hard to have users clone a simple existing external StAX parser and use it in your program. Handling JSON would be easier and more efficient than handling XML, but TokenIDE uses XML;
* with a bit more (easy) code, it would enable detecting mismatches between the internal, compiled-in data and the data extracted from XML files.
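The mismatch check mentioned above could be as simple as comparing the compiled-in table with one loaded from a file. In this sketch the XML parsing is left out entirely (the loaded entries are assumed to already be in the same struct), and `count_mismatches` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned char first_byte;
    unsigned char second_byte;
    const char *text;
} token;

/* Every compiled-in token should appear in the loaded set with the same
   spelling. Reports each problem to `report` and returns how many there
   were. Extra entries that exist only in `loaded` are not reported. */
static size_t count_mismatches(const token *built_in, size_t n,
                               const token *loaded, size_t m, FILE *report) {
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        const token *match = NULL;
        for (size_t j = 0; j < m && match == NULL; j++)
            if (loaded[j].first_byte == built_in[i].first_byte &&
                loaded[j].second_byte == built_in[i].second_byte)
                match = &loaded[j];
        if (match == NULL || strcmp(match->text, built_in[i].text) != 0) {
            fprintf(report, "%02X%02X: built-in \"%s\", loaded \"%s\"\n",
                    (unsigned)built_in[i].first_byte,
                    (unsigned)built_in[i].second_byte, built_in[i].text,
                    match != NULL ? match->text : "(missing)");
            bad++;
        }
    }
    return bad;
}
```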
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: shmibs on April 10, 2013, 01:25:21 pm
apart from the preprocessor and another token set option for pretty printing (in case you want to display "λ" rather than "lambda", etc.), i'm fairly certain this is done!

new stuff:
* full pipeline support, so things like
Code:
cat <some.8xp> | tok8x | grep -n <search term>
are perfectly valid =D

* an option to strip excess whitespace (outside of strings) and comments (the format of which is specific to the language) from a generated program, so a file containing:
Code:
.ABCD
"hey there! i'm a string"->Str1

.one line comment

...
multiple
line
comment
...

For(A,0,23)
Text(0,A*6,{Str1+A})
End
run through:
Code:
tok8x -s -t axe -i <infile> | tok8x -t axe
would generate:
Code:
.ABCD
"hey there! i'm a string"->Str1
For(A,0,23)
Text(0,A*6,{Str1+A})
End

* searching for individual tokens matching strings on the command line, so something like
Code:
tok8x Get azvd DiagnosticOff
would return:
Code:
"Get":Axe:E8
"DiagnosticOff":BASIC:BB67

* expanding leading spaces on a line to tabs for easier reading when converting from an 8xp
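A simplified sketch of the stripping option for Axe, based only on the example above: the first line (the ".NAME" header) is kept, a line starting with "." is a one-line comment, and lines consisting of "..." open and close a block comment. Real Axe syntax has more cases (such as conditional comments) that this ignores, and only leading/trailing whitespace is trimmed here, so it does not reproduce the "outside of strings" handling of interior whitespace. All names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    bool first_line;        /* the ".NAME" header line is always kept */
    bool in_block_comment;  /* between two "..." lines */
} strip_state;

/* Trims leading and trailing whitespace (including the newline) in place. */
static char *trim(char *s) {
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Returns the trimmed line if it should be kept, or NULL if it is dropped. */
static const char *strip_axe_line(strip_state *st, char *line) {
    char *t = trim(line);
    if (st->first_line) {
        st->first_line = false;
        return t;
    }
    if (strcmp(t, "...") == 0) {
        st->in_block_comment = !st->in_block_comment;
        return NULL;
    }
    if (st->in_block_comment || *t == '\0' || *t == '.')
        return NULL;
    return t;
}

/* Filters a whole stream, e.g. stdin to stdout in a pipeline.
   Lines longer than the buffer would be split; fine for a sketch. */
static void strip_stream(FILE *in, FILE *out) {
    char buf[512];
    strip_state st = { true, false };
    while (fgets(buf, sizeof buf, in) != NULL) {
        const char *kept = strip_axe_line(&st, buf);
        if (kept != NULL) {
            fputs(kept, out);
            fputc('\n', out);
        }
    }
}
```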
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: DJ Omnimaga on April 10, 2013, 01:34:19 pm
Will this editor support image/picture importing/exporting by the way? Even though the calc has been out for months, no existing software other than the Mac version of TI-Connect supports them and it will be annoying for BASIC programmers who use pics to store data and title screens or whatever...
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: shmibs on April 11, 2013, 11:06:59 pm
this isn't an editor; it's just a tool for converting between file formats. someone could build an editor that uses it as a backend, though.

i added an include preprocessor directive! the format is:
##include path/to/file
and the contents of that file will be inserted verbatim at that point.
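A minimal sketch of that behaviour, assuming the path is taken literally (relative paths are resolved against the current directory) and the included file is copied byte for byte, so any ##include lines inside it are not expanded. `expand_includes` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Copies `in` to `out`, replacing each line of the form "##include path"
   with the raw contents of that file. Lines are read in chunks of up to
   1023 characters, which is plenty for this sketch. Returns 0 on success,
   or -1 if an included file can't be opened. */
static int expand_includes(FILE *in, FILE *out) {
    static const char directive[] = "##include ";
    const size_t dlen = sizeof directive - 1;
    char line[1024];
    while (fgets(line, sizeof line, in) != NULL) {
        if (strncmp(line, directive, dlen) != 0) {
            fputs(line, out);
            continue;
        }
        line[strcspn(line, "\r\n")] = '\0';   /* the path ends at the newline */
        FILE *inc = fopen(line + dlen, "r");
        if (inc == NULL)
            return -1;
        int c;
        while ((c = fgetc(inc)) != EOF)      /* verbatim: no nested expansion */
            fputc(c, out);
        fclose(inc);
    }
    return 0;
}
```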
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: Streetwalrus on July 25, 2013, 08:40:38 am
This looks pretty awesome Shmibs, just the tool I've been looking for.
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: shmibs on July 25, 2013, 02:18:34 pm
well, apparently i exploded something or other at some point, because it isn't working now. this version was more of a proof of concept than anything else, really, though, so i guess i'll try to redo it properly over this weekend. the long, tedious bit of writing out all the tokens is done, though, so there's nothing to worry about there.
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: shmibs on July 28, 2013, 02:28:53 pm
/\ ignore that entirely. it's a problem of me having no memory. right now, programs generated without a specified name are always named "A", but i had completely forgotten that and thus was looking for something that wasn't there.
/me goes to change it so that it defaults to the first 8 letters of the filename instead.
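The exact naming rules aren't spelled out here, so the ones in this sketch are assumptions based on the usual TI-83+/84+ limits for program names (at most 8 characters, uppercase letters and digits, starting with a letter): take the file's base name, stop at the first ".", keep up to 8 usable characters, and fall back to the old default "A". `default_name` is a hypothetical helper.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Builds a program name from a path. `name` must have room for 9 bytes
   (8 characters plus the terminator). */
static void default_name(const char *path, char name[9]) {
    const char *slash = strrchr(path, '/');
    const char *base = slash != NULL ? slash + 1 : path;
    size_t n = 0;
    for (const char *p = base; *p != '\0' && *p != '.' && n < 8; p++) {
        unsigned char c = (unsigned char)*p;
        if (!isalnum(c))
            continue;                /* skip characters a name can't hold */
        if (n == 0 && !isalpha(c))
            continue;                /* names must start with a letter */
        name[n++] = (char)toupper(c);
    }
    if (n == 0)
        name[n++] = 'A';             /* the old default */
    name[n] = '\0';
}
```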
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: Streetwalrus on August 04, 2013, 02:56:47 am
I made a little pull request for you. Not much, since I just fixed a missing character, but... :P
This is the first time I've made a pull request, so I hope I did it right.
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: shmibs on August 04, 2013, 04:50:08 pm
yup, that's the way it works =)
thanks!
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: Streetwalrus on August 05, 2013, 09:27:17 am
You're welcome, I'm glad I could help. :)
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: Hayleia on July 25, 2014, 08:36:24 am
I am having problems with this (sorry for necroposting btw).

I can't compile what I get from GitHub:
─┐  ╔[ asuka @ Luna : ~/Bureau/tok8x-master ]
 ╘══╩═[ make
ghc --make tok8x
target `tok8x' is not a module name or a source file
make: *** [all] Erreur 1


And when I download the old version, it is a bit better since I can compile it and it works in one way, but not the other way (unfortunately, it's the less interesting way that works):
─┐  ╔[ asuka @ Luna : ~/CALC/SBO ]
 ╘══╩═[ vim TEST.8xp.txt && ./tok8x -t axe -i TEST.8xp.txt -o OUTPUT.8xp -n SBOS -f && tilem2 --help
tok8x: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.
Abandon (core dumped)


Am I doing something wrong? Maybe the old version needs more than just the "tok8x" executable in the folder to convert? (EDIT: indeed, I can get this one to work when I don't move the executable on its own.) But what about the "new" version?
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: Streetwalrus on July 25, 2014, 09:31:49 am
Hmmm, ghc, eh? Seems like Shmibs moved to Haskell. I don't like Haskell that much due to its dependencies, so I can't really help. :P
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: shmibs on December 29, 2015, 10:51:36 pm
heys, sorry about that. the previous version is in a branch labelled "old". was messing around a while ago seeing if i could make a haskell version work, but gave up and am rewriting the whole thing from scratch in C, because this was sort of the first C project thing i ever made and the old code is really gross and crufty and bug-filled. new version should be done soon, if anybody actually cares about this
Title: Re: tok8x: a very simple on-computer tokeniser/detokeniser
Post by: TIfanx1999 on December 30, 2015, 10:21:23 pm
Oh hey, activity! Pretty nice! ^^