Author Topic: Issues with \0 character  (Read 6170 times)

Offline TC01

  • LV6 Super Member (Next: 500)
  • ******
  • Posts: 344
  • Rating: +9/-0
    • View Profile
Issues with \0 character
« on: August 16, 2010, 01:03:11 pm »
This morning, I checked if Solar89 can properly handle two-byte tokens (by adding the Matrix tokens). Most of the time it works... but sometimes it doesn't, and I'm not sure how I can fix it.

Here's a situation that would cause a problem: trying to run a text file containing "[A]", or the matrix A token, through Solar89. Why? The hex code for this token is 5C00h.

The way Solar89 is programmed, each line of the text file is looked at individually, and the hex code for each token is added to an unsigned character array (since unsigned char = 8 bits). So if the line contains the token ClrHome, the program will add the character E1h to the array, then add 3Fh to the array to finish the line off. A two-byte token is handled by splitting the hex code into two characters and adding them both to the array. So for the token [C] (5C02h), the program adds the character 5Ch and then the character 02h.
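
A minimal sketch of the byte-splitting step described above. The helper name `append_token`, its parameters, and the rule "anything above FFh is a two-byte token" are assumptions made for illustration, not Solar89's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: appends a token's bytes to buf at index *len
   and advances *len. cap is the total size of buf. */
static void append_token(unsigned char *buf, size_t *len, size_t cap,
                         unsigned int token)
{
    if (token > 0xFF) {
        /* Two-byte token: high byte first, e.g. 5C02h -> 5Ch, 02h. */
        assert(*len + 2 <= cap);
        buf[(*len)++] = (unsigned char)(token >> 8);
        buf[(*len)++] = (unsigned char)(token & 0xFF);
    } else {
        /* One-byte token, e.g. E1h (ClrHome) or 3Fh (newline). */
        assert(*len + 1 <= cap);
        buf[(*len)++] = (unsigned char)token;
    }
}
```

Because the position is tracked in `*len` rather than found by searching for a terminator, a low byte of 00h (as in 5C00h) is stored like any other value.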

The problem? For any token that has a byte of 00 (there aren't too many, but they include [A]), the \0 character is what will be added to that character array. And that seems to prevent me from adding anything else to the array. So if I'm trying to tokenize [A], no end-of-line character will be added, and anything after [A] on that line won't be added either.

But that's not the main problem. The main problem is that the code that saves the token array to a file does it by writing each individual character to the file, and it stops before it reaches \0. So even for a file only containing [A], I won't get 5Ch 00h 3Fh (3Fh being the hard return, the newline character), I'll just get 5Ch.
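
To illustrate why only 5Ch reaches the file: any routine that finds the end of the data by looking for 00h sees a one-byte string here. The sketch below uses `strlen` as a stand-in for such a routine; the function name `length_as_string` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts bytes the way a C string routine does: it stops at the first 00h. */
static size_t length_as_string(const unsigned char *bytes)
{
    assert(bytes != NULL);
    return strlen((const char *)bytes);
}
```

For the three bytes 5Ch 00h 3Fh this returns 1, even though the array really holds 3 bytes.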

Fortunately, only a few tokens have a byte of 00h, and probably a lot of programs can be written without using them. It would be nice to support them, though, and I'm not really sure how. Would I need to use something other than an unsigned character array? Or can I implement some workaround?



The userbars in my sig are embedded links.

And in addition to calculator (and Python!) stuff, I mod Civilization 4 (frequently with Python).

Offline TravisE

  • LV4 Regular (Next: 200)
  • ****
  • Posts: 182
  • Rating: +33/-0
    • View Profile
    • ticalc.org
Re: Issues with \0 character
« Reply #1 on: August 16, 2010, 03:50:52 pm »
I don't know what code you're using to write the arrays, but in C many of the string-manipulation library routines treat byte 00 as indicating the end of a string. If you're using any of those, you'll probably want to switch to the functions that write a given number of bytes instead of using \0 as a terminator.
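
A minimal sketch of the "write x bytes" approach for the file output, using the standard `fwrite`. The wrapper name `write_bytes` is hypothetical, and it assumes the caller already knows how many bytes are valid.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes exactly len bytes, 00h included.
   Returns 0 on success, -1 if fewer than len bytes were written. */
static int write_bytes(FILE *fp, const unsigned char *data, size_t len)
{
    assert(fp != NULL && data != NULL);
    return fwrite(data, 1, len, fp) == len ? 0 : -1;
}
```

The file should be opened in binary mode (for example `fopen(path, "wb")`) so that no byte values are translated on platforms that distinguish text and binary streams.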
« Last Edit: August 16, 2010, 03:51:05 pm by TravisE »
ticalc.org staff member—http://www.ticalc.org/

Offline Netham45

  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2103
  • Rating: +213/-4
  • *explodes*
    • View Profile
Re: Issues with \0 character
« Reply #2 on: August 16, 2010, 03:59:49 pm »
00 is the standard terminator for a C string (not for arrays in general). You could either check for it and replace it with something like FF, or, as previously suggested, use a byte-counted array instead of a null-terminated one.
Omnimaga Admin

Offline TC01

Re: Issues with \0 character
« Reply #3 on: August 16, 2010, 04:06:13 pm »
I do use strncat for part of this... I have a function, tokenizeString, that is given the string, an unsigned character array, and the array of tokens. It returns the number of bytes that need to be appended to the output array, and then I use strncat to do just that:

Code:
numswaps = tokenizeString(line, tokline, tokens);
output = strncat(output, tokline, numswaps);

So, I should be using memmove (or memcpy?) instead here?
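
A small sketch of what happens to the strncat call above when tokline holds 5Ch 00h 3Fh. The helper `appended_by_strncat` is invented for this demonstration; it only measures how many bytes strncat actually added.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends src to the string in dst with strncat and returns how many
   bytes were really added, measured with strlen before and after. */
static size_t appended_by_strncat(char *dst, const char *src, size_t n)
{
    size_t before;

    assert(dst != NULL && src != NULL);
    before = strlen(dst);
    strncat(dst, src, n);   /* stops at the first 00h in src */
    return strlen(dst) - before;
}
```

With an empty destination and n = 3, only one byte (5Ch) is appended; the 00h and the 3Fh after it are dropped.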




Offline calc84maniac

  • eZ80 Guru
  • Coder Of Tomorrow
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2912
  • Rating: +471/-17
    • View Profile
    • TI-Boy CE
Re: Issues with \0 character
« Reply #4 on: August 16, 2010, 04:14:21 pm »
Yeah, I think using an array of bytes instead of a string would work best here. Remember that you'll need to keep track of the size manually though.
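
One way to keep track of the size manually is to store the count next to the bytes. This is only a sketch: the type `struct bytebuf`, the fixed capacity, and the function name are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BYTEBUF_CAP 256  /* arbitrary capacity chosen for this sketch */

/* Hypothetical type: raw bytes plus an explicit count of valid bytes. */
struct bytebuf {
    unsigned char data[BYTEBUF_CAP];
    size_t len;
};

/* Appends n bytes from src; 00h is copied like any other value. */
static void bytebuf_append(struct bytebuf *b, const unsigned char *src, size_t n)
{
    assert(b != NULL && src != NULL);
    assert(b->len + n <= BYTEBUF_CAP);
    memcpy(b->data + b->len, src, n);
    b->len += n;
}
```

Note that `assert` is compiled out when NDEBUG is defined, so real code would want an explicit capacity check and an error return.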
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

Offline TravisE

Re: Issues with \0 character
« Reply #5 on: August 16, 2010, 04:21:27 pm »
Yes, strncat assumes 00h is a terminator. It does take a length as a parameter, but that only caps how many characters are taken from the source (a longer string is just truncated); copying still stops at the first 00h, and the length is not a check against the size of the destination buffer. memcpy should do what you want: it copies exactly the number of bytes you pass to it without caring what they are. As mentioned, you'll need an extra variable of some sort to keep track of the actual size of the data in this case.
« Last Edit: August 16, 2010, 04:27:40 pm by TravisE »

Offline TC01

Re: Issues with \0 character
« Reply #6 on: August 16, 2010, 04:26:46 pm »
Is there a reason to use memcpy over memmove in this case? memmove is working at the moment. I used it because the documentation says it can cope when the source and destination overlap, and memcpy can't.

But thanks for the help, everyone, it's working now.




Offline TravisE

Re: Issues with \0 character
« Reply #7 on: August 16, 2010, 04:31:03 pm »
My guess is that memcpy may be slightly smaller or faster, but it is only safe if you're absolutely sure that the source and destination regions will never overlap in memory (such as when you have two separate buffers for source and destination). If they can overlap (like when you insert/delete bytes in the same buffer and want to shift everything after up or down), then memmove should be used instead.
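
A sketch of the overlapping case mentioned above: inserting one byte into the middle of a buffer by shifting the tail up. The function name and parameters are made up for the example; the point is that source and destination of the shift share memory, so memmove is the right call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Inserts value at index pos in a buffer holding *len valid bytes.
   The shifted region overlaps itself, so memmove (not memcpy) is used. */
static void insert_byte(unsigned char *buf, size_t *len, size_t cap,
                        size_t pos, unsigned char value)
{
    assert(buf != NULL && len != NULL);
    assert(pos <= *len && *len < cap);
    memmove(buf + pos + 1, buf + pos, *len - pos);
    buf[pos] = value;
    (*len)++;
}
```

Copying a tokenized line from its own buffer into a separate output buffer has no overlap, so memcpy is fine there.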
« Last Edit: August 16, 2010, 04:32:49 pm by TravisE »

Offline TC01

Re: Issues with \0 character
« Reply #8 on: August 20, 2010, 12:03:02 pm »
Well, memcpy isn't exactly working, because it doesn't append the bytes to the array like strncat does; it writes them at the start of the destination every time. This means that only the last line of a program will be written to a file.

Is there a way to do this, or would I have to do it manually using a for loop of some sort?

EDIT: Well, I got it working using a for loop that increments the pointer, so I guess I've answered my own question for once.

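Code:
numswaps = tokenizeString(line, tokline, tokens);
/* Copy every byte, including 00h, and advance the write pointer. */
for (i = 0; i < numswaps; ++i)
    *(output++) = tokline[i];
/* output now points past the last byte written, so keep the start pointer elsewhere. */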
« Last Edit: August 20, 2010, 01:11:41 pm by TC01 »
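
For comparison, memcpy can do the same appending as the loop if it is given the current write position rather than the start of the buffer. This is a sketch under that assumption; `append_bytes` is a hypothetical helper, not part of Solar89.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies n bytes to the write position and returns the new write
   position, which is what *(output++) does one byte at a time. */
static unsigned char *append_bytes(unsigned char *pos,
                                   const unsigned char *src, size_t n)
{
    assert(pos != NULL && src != NULL);
    memcpy(pos, src, n);
    return pos + n;
}
```

Used as `pos = append_bytes(pos, tokline, numswaps);` once per line, with a separate pointer kept at the start of the buffer, the total number of bytes to write to the file is the difference between `pos` and that start pointer.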


