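// A test of the strcspn function, and how it works
#include <tigcclib.h>  // TIGCC library header (clrscr, printf, ngetchx, strcspn)

void _main(void)
{
    const char string[10] = "Abc\\Defg";  // the backslash must be escaped in a C literal
    unsigned long length = strcspn(string, "\\");  // number of chars before the first backslash

    clrscr();
    printf("%d", (int)length);  // prints 3
    ngetchx();  // wait for a keypress
}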
typedef struct {
    size_t chars;       /* How many chars are in the string making up the token */
    BOOL twoByteToken;  /* Whether or not the token is two bytes */
    byte hex[2];        /* The two bytes making up the token, or only hex[0] if it's a one-byte token */
} tokenInfo;

/* Given a string, this should find the longest token and return its length and hex */
tokenInfo findToken(char *string);

/* Working from these two... */
void tokenize(char *string, byte *buffer)
{
    /* Takes a string and a buffer to store generated hex.
       Make the buffer dynamic on your own time. (Or ask me ;D) */
    tokenInfo currentToken;

    while (*string != '\0') {
        currentToken = findToken(string);
        *(buffer++) = currentToken.hex[0];
        if (currentToken.twoByteToken)
            *(buffer++) = currentToken.hex[1];
        string += currentToken.chars;
    }
}
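For what it's worth, here is one way findToken could do the "longest token" part. This is only a sketch: the tokenEntry type, the tokenTable contents and their hex values are placeholders I made up for illustration, the byte and BOOL typedefs are assumptions, and the fallback for unknown characters is my own choice so that tokenize can never stall on a zero-length match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char byte;  /* assumption: 8-bit byte type */
typedef int BOOL;            /* assumption: plain int used as a boolean */

typedef struct {
    size_t chars;
    BOOL twoByteToken;
    byte hex[2];
} tokenInfo;

/* Hypothetical table entry: token text plus the bytes it maps to. */
typedef struct {
    const char *text;
    BOOL twoByte;
    byte hex[2];
} tokenEntry;

/* Placeholder table; the hex values are NOT real token codes. */
static const tokenEntry tokenTable[] = {
    { "Disp",      0, { 0x01, 0x00 } },
    { "DispGraph", 1, { 0x02, 0x03 } },
};

/* Linear scan that keeps the longest entry matching the start of string. */
tokenInfo findToken(const char *string)
{
    tokenInfo best = { 0, 0, { 0, 0 } };
    size_t i;

    assert(string != NULL && *string != '\0');

    for (i = 0; i < sizeof tokenTable / sizeof tokenTable[0]; i++) {
        size_t len = strlen(tokenTable[i].text);
        if (len > best.chars && strncmp(string, tokenTable[i].text, len) == 0) {
            best.chars = len;
            best.twoByteToken = tokenTable[i].twoByte;
            best.hex[0] = tokenTable[i].hex[0];
            best.hex[1] = tokenTable[i].hex[1];
        }
    }

    /* Assumed fallback: pass an unknown character through as one byte,
       so the caller always advances by at least one char. */
    if (best.chars == 0) {
        best.chars = 1;
        best.hex[0] = (byte)string[0];
    }
    return best;
}
```

With this table, "DispGraph" matches the nine-character entry rather than stopping at "Disp", which is the whole point of preferring the longest match.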
Disp *45
Output(*84
GarbageCollectBB68
sin(*97
OpenLib(82d2
The trouble is that if the "next character" in the string is the same as one of the characters that are part of the token, it won't work. So if the line is Disp "HELLO", the program will hang once it has truncated "HELLO" down to LLO", because two Ls are next to each other, and so strpbrk("LLO"", "L") will return "LLO"" again: the same position it started from, so the loop never advances.
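To make the hang concrete: strpbrk returns a pointer to the first character of the string that is in the set, and if the string already starts with such a character, that pointer is the string itself. A possible way around it, sketched below, is to start each search one character past the current position. The helper name next_after is made up for this example and is not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the next character from set strictly after
   pos, so a loop built on it always moves forward.
   Returns NULL when nothing further matches. */
const char *next_after(const char *pos, const char *set)
{
    assert(pos != NULL && set != NULL);
    if (*pos == '\0')
        return NULL;  /* nothing left to search */
    return strpbrk(pos + 1, set);
}
```

For the string LLO", plain strpbrk with "L" hands back the start of the string every time, while next_after moves on to the second L and then returns NULL once there are no more.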
"unsigned char" is 8 bits, "unsigned short" is 16 bits. "enum Bool", "BYTE" and "BOOL" are seldom used.
Quote:
The trouble is if the "next character" in the string is the same as one of the characters that's part of the token, it won't work. So if the line is: Disp "HELLO", then the program will hang when it's truncated "HELLO" down to LLO", because two Ls are next to each other, and so strpbrk("LLO"", "L") will return "LLO"".

In many parsers, characters are divided into classes of lexical elements (string, comma, etc.), and a lexical analyzer is built to recognize them (e.g. a string starts with a '"' and lasts until the next '"' or maybe the end of the line). On top of the lexical analyzer, there's a syntactic analyzer, which tries to parse a stream of lexical elements (get an element, then act upon its type; maybe other elements of various types are expected after it). And on top of the syntactic analyzer, if the language is complicated enough, there's a semantic analyzer (e.g. {, number 234, comma, number 567, comma, number 8 is a list).

But maybe I'm replying out of scope because I'm dense?
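To illustrate the lexical analyzer layer described above, here is a minimal sketch. The lexType classes, the lexeme struct and the nextLexeme function are all names invented for this example, and the set of classes is deliberately tiny (string, comma, number, everything else). A string is treated as one element running from the opening '"' to the closing '"' or the end of the line, so repeated letters inside it never matter.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical lexical classes for this sketch. */
typedef enum { LEX_END, LEX_STRING, LEX_COMMA, LEX_NUMBER, LEX_OTHER } lexType;

typedef struct {
    lexType type;
    const char *start;  /* first character of the element */
    size_t length;      /* number of characters it spans */
} lexeme;

/* Reads one lexical element at *cursor and advances *cursor past it. */
lexeme nextLexeme(const char **cursor)
{
    const char *p;
    lexeme lex;

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;
    lex.start = p;

    if (*p == '\0') {
        lex.type = LEX_END;
    } else if (*p == '"') {
        /* String: runs to the closing quote or the end of the line. */
        lex.type = LEX_STRING;
        p++;
        while (*p != '\0' && *p != '\n' && *p != '"')
            p++;
        if (*p == '"')
            p++;  /* include the closing quote */
    } else if (*p == ',') {
        lex.type = LEX_COMMA;
        p++;
    } else if (isdigit((unsigned char)*p)) {
        lex.type = LEX_NUMBER;
        while (isdigit((unsigned char)*p))
            p++;
    } else {
        lex.type = LEX_OTHER;  /* any single unclassified character */
        p++;
    }

    lex.length = (size_t)(p - lex.start);
    *cursor = p;
    return lex;
}
```

Calling nextLexeme repeatedly on "HELLO",234 yields a string element, a comma, a number, and then LEX_END. A syntactic layer would sit on top of this and decide what each sequence of elements means.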