Author Topic: Searching Operator Precedence in C (Read 4541 times)

Halifax · « **on:** June 12, 2007, 02:20:00 pm »

Alright how would you search for operator precedence when programming in C. And how would you do it effciently. I have been raking my brain looking for and idea but I cannot find a speed effcient way to do it. Does anyone have any ideas I am checking for these operators.

Level 1:
(
Level 2:
!, ~
Level 3:
*, /, %
Level 4:
+, -
Level 5:
<<, >>
Level 6:
<, <=, >, >=
Level 7:
==, !=
Level 8:
&
Level 9:
^
Level 10:
|
Level 11:
&&
Level 12:
||

JincS · « **Reply #1 on:** June 12, 2007, 10:40:00 pm »

http://www.difranco.net/cop2220/op-prec.htm

Good luck!

**EDIT**

Ok, I just re-read what you said: do you mean you want to check a C file and figure out which operators come first in that? If so, you would have to read the C file into a buffer, tokenize it, and then run the tokens through an algorithm to find out which comes first. No really fast way to do it though, sadly.

Example:

c1-->

CODE

ec1(i>=0)&&(i<10)c2

ec2

Equals (token list with numbers assigned to each token and token type):
0: ( (operator)
1: i (variable)
2: >= (operator)
3: 0 (number)
4: ) (operator)
5: && (operator)
6: ( (operator)
7: i (variable)
8: < (operator)
9: 10 (number)
10: ) (operator)

Then, if you leave everything except the operators in the order they appear, this code snippet would be executed in the following order (just using token numbers here):

0,1,2,3,4,6,7,8,9,10,5

Like this:
Do this: i>=0 (relational logic)
Then this i<10 (relational logic)
Then this: && (logical AND)

It gets pretty complicated, but it's doable. This doesn't happen to be for z8-gcc, does it?

Halifax · « **Reply #2 on:** June 13, 2007, 05:26:00 am »

Lol funny you say that but no GCC handles the parsing in that thank god. I am just making a simple script compiler for a language of my own for two reasons. First to just get experience with the whole compiler process. And two because I need practice for this AP course I am taking next year. I have really hit a stand still in z8-GCC with the object file parsing. I am still working on 'ld'. Thanks though JincS that is a good way to parse it.

Halifax · « **Reply #3 on:** June 14, 2007, 02:53:00 pm »

I actually have seemed to have found a way that is more efficient than tokenizing that I would have never thought of. After reading up on the internals of how GCC parses I have found that it uses abstract syntax trees to parse data efficiently. Of course it is hard to implement, but once implemented I think it would be much easier to handle a lot of the parsing.

JincS · « **Reply #4 on:** June 14, 2007, 06:03:00 pm »

Sounds fun. Where can I find this info? It sounds a lot easier than using a million slow "if(token==operator)" statements...

**EDIT**

nvm, found it (and it was a stupid question->everyone knows the answer is Google!).

Halifax · « **Reply #5 on:** June 14, 2007, 06:51:00 pm »

Hmmm my bad it seems that Abstract Syntax Trees are only used as medians for optimization in GCC. Sorry for the misinformation. Parse trees are used for syntax computating. ASTs are parse trees but not in order of precedence or any other rules etc.

Halifax · « **Reply #6 on:** June 23, 2007, 03:41:00 am »

If I get code written down I will post it later. Anyways here are notes on parse trees that I wrote.

===== Parse tree Notes =====

Background:

Basically with parse trees to setup the data you go through the equation one time and you tokenize it. After tokenizing it you are going to put it in the parse tree.

Note: I do parse tree in tree minor fashion instead of tree major. That is just saying basically that I am looking for the lowest element not the highest(you will see).

Ok now let's start off with a simple table of precedence.

+, - == 0
*, / == 1

So that is our simple table of precedence made for tokenizing. Now let's start with a simple expression.

5+4*5

Now let's tokenize this. In token form.

5+4*5 = 5(0)4(1)5 or simply 0,1. That was easy. Now based off of the table of precedence you should be able to figure out why that is so

.

So now how do we get that stuff into a parse tree. The logical thing and what people are taught in life is to start with multiplication in an expression like that, but in a parse tree you find the lowest element.

So in our token list 0,1 is the lowest so we put that at the top of our tree. Then we look to the left to find what argument it takes. There is a number there and no other math operators to the left of it. Now to the right of it there is a number also, but there is an operator to that side because we haven't reached the end of our token list. So since there is an operator we search for the lowest token in the rest of the list which is 1 of course. So this would start us off with

+
/ \
5 *

Now what are the two operands to * or 1. Well look to the left. There is a number and no other math operators that we haven't touched so use the number. Now look to the right and there is a number and no math operators so then we settle for the numbers. This gives us a parse tree of.

+
/ \
5 *
/ \
4 5

Great we have a parse tree but how do I solve it??? You solve by levels. Start at the lowest level and farthest to the left and start solving. So then it would be like this.

+
/ \
5 *
/ \
4 5 becomes ...

+
/ \
5 20

ok now there are no more expression to solve on the last level we did so move a level up and solve that.

+
/ \
5 20 becomes...

25

so the answer to 5+4*5 = 25

Alright now it is time to get into a little bit of history I think? Either way look at this tree again.

+
/ \
5 *
/ \
4 5

Now let's imagine a real tree and break it down into two parts. There are branches and then leaves that come off of the branches. Now the leaves are always on the outside never the inside. Now applying that to parse trees the + and * are the branch nodes because they are not on the very tip of the outside. The 5,4, and 5 are the leaf nodes because they are on the very outside.

Ok so do you notice anything? The numbers always HAVE to be in the leaf nodes and operators always HAVE to be in the branch nodes. There are no exceptions.

Now let's get into some more complicated parse trees.

(5+4)*5

uhhhh wtf that f**** up our whole tokenize thing. We are going to tokenize that the same way and then solve it wrong. D***.

But wait! There is such a thing called weighting your expression. Now let's look at this expression as we would normally tokenize it.

ToP:
+,- = 0
*,/ = 1

(5+4)*5 would be 0,1 in token form. But wait isn't that plus inside parentheses. Well exactly how many is it inside? The + is inside 1 of them. So that means we weight the expression by 1. That would give us a token list of

1,1

Now look at this expression.

((5+4))*5

the + is inside 2 of them now so we weight it by 2 which would give us

2,1 for our token list. So you should understand by now that weighting will only add the number of OPEN parentheses to the operators.

So then what is this expression.

(((5+4))*5)

Tokenized it would be: 3,2

There are 3 open when we encounter the + and only 1 open when we encounter the *. Alright after going off into that spew let's get back to the first one.

(5+4)*5

Now the token list 1,1 means we have two of the same expressions does it not? Yes technically it does but that is why I mentioned before we are doing this tree minor which means we search for the lowest token in the list that is also the farthest to the right of it. That means we would take the 1 token on the right which is the *

Now let's solve this parse tree. Look to the left and you see that * is not operating on a number but a ) so take the next lowest token in the list to the left which is 1(+). Then we look to the right of the * and find a number 5 with no expressions to the right so we use the number. Now moving on to the + we look to the left and find a number and no more expressions to the left, then we look to the right and find a number with no more expressions that we HAVE NOT used already. So that would give us a tree setup of.

*
/ \
+ 5
/ \
5 4

now solve from the lowest level on the left up.

5+4 = 9, 9 * 5 = 45!!!!

Now we will do a pretty complicated expression that I will not explain next. I will just show the expression and the respective parse tree and you should be able to figure it out through the process we went through and weighting.

Expression:
   5+((4*5)+(2*5))-4
Token List:
   0,3,1,3,0
Parse Tree:

-
/ \
+ 4
/ \
5 +
/ \
* *
/ \ / \
4 5 2 5
Answer:
   17

Conclusion and Overview:
   This maybe complicated, but aren't the good things in life complicated?? Anyways I will give you the overview of the algorithm for searching used.

Steps

1: Tokenize the list including weighting
2: Find the lowest token that is also the farthest to the right
(so in a list 1,1,1,1,1,1,1,0,1,0,2 the 0 all the way at the right before the 2 would be the first token.)
3: Find the tokens left operand
4: If there are any other UNUSED tokens to the left then
   5: Find the lowest token closet to the spot you are at
5: Else use the number as your operand
6: Repeat 4-5 on the right operand respectively
7: Insert the operator into the tree and its operand as branch/leaf nodes of it
8: Repeat the steps until you are done parsing the expression