locked
Is C# context-free?

    Question

  • Hello,

     

    In a work I also have to write a few theoretical things about it grammar-wise (formal grammar). I'm wondering if C# is considered to be a _context free_ language? (That is, theoretically in 2nd Chomsky class etc.) I read such statements about Java, but haven't seen it anywhere explicitly about C#. I strongly assume it is, but I need some 100% confirmation.

     

    Thanks in advance,

    Rubylyn

    Tuesday, March 04, 2008 2:03 PM

Answers

  • All computer language grammers are context free. It would require some serious AI to build a contextual computer language grammer. Natural languages are contextual, for instance:

     

    "Old McDonald has an ***." (edit: I can't believe that word is censored)

     

    Because we all know that Old McDonald is a farmer, we know that we are talking about his donkey, not his backside. A computer language parser cannot allow such ambiguity. Maybe it's possible to have some kind of limited context in a computer language, however, because the context must be clearly stated in some way prior to its use, it's not really context-free.

    Tuesday, March 04, 2008 3:57 PM

All replies

  • All computer language grammers are context free. It would require some serious AI to build a contextual computer language grammer. Natural languages are contextual, for instance:

     

    "Old McDonald has an ***." (edit: I can't believe that word is censored)

     

    Because we all know that Old McDonald is a farmer, we know that we are talking about his donkey, not his backside. A computer language parser cannot allow such ambiguity. Maybe it's possible to have some kind of limited context in a computer language, however, because the context must be clearly stated in some way prior to its use, it's not really context-free.

    Tuesday, March 04, 2008 3:57 PM
  • If, however, we were to say "Old McDonald has an arse", that would be unabiguous.

    Funny how "***" is censored, but "arse" isn't. Smile
    Tuesday, March 04, 2008 4:04 PM
  • Hello Rubylyn

     

    I found a web page that seems to be an excerpt from a German book about C#. The author expresses C# as a Context-Free Grammer using terminal and non-terminal symbols. Looks like C# is Context free to me. Check it out, it may help you.

     

    http://www.tfh-berlin.de/~oo-plug/Cis/txt/Syntax.html

     

    • Proposed as answer by arctidog Sunday, June 24, 2012 11:47 PM
    Tuesday, March 04, 2008 4:12 PM
  • Thanks guys for the replies (I don't know which to mark as an answer, so I'll choose one, but thx to all).

     

    Of course, the subject of my interest is the "real", full C# compiler (or its implementation in Mono), but e.g. in this research compiler, Microsoft states that the parser generator accepts ambiguous grammars:

    ftp://ftp.research.microsoft.com/pub/tr/tr-2003-32.pdf

    Search for the paragraph that begins with "Ambiguities are resolved during AST".

     

    Of course, this is just a local solution (and as the text says, it's simplified), but it may imply that some ambiguitis may arise in the "full" C# compiler as well, and these are resolved via some good heuristics. However, strictly as a formal language, I think it can be considered as context-free. Especially because seeing the Mono msc C# parser's jay file (it's like BNF, so describes the grammar), the right side of entries is always exclusively one non-terminal (and there are no terminals in its context, so left or right to it).

     

     

    Tuesday, March 04, 2008 7:49 PM
  • Compiler technology has really come a long way in the past 40 years. You might say that modern languages make the line between contextual and context-free a little blurry, but I would argue that they are still context-free. The only ambiguities allowed are those that are easily detected. One example is type ambiguity:

     

    Code Snippet

    namespace One

    {

       class MyType {}

    }

     

    namespace Two

    {

       class MyType {}

    }

     

    using One;

    using Two;

    MyType myType; // ambiguous

     

     

    This kind of ambiguity is easily detected by the compiler, but the compiler makes no effort to try and resolve the ambiguity. It simply generates an error and leaves it for the human to do it. If the compiler were made intelligent enough to try to resolve this (maybe by looking at what members are accessed later in the code) it would further blur the line, but at least as of today, this is not happening.

     

    Wednesday, March 05, 2008 3:43 PM
  • It would be a general compiler error because on the last line you are trying to declare a field on namespace level, but C# only allows fields in classes, structs and methods. In terms of resolving such simple errors, csc is able to identify and suggest missing statement terminators (Wink, so it is aware of the syntactic context.
    C# does have context-sensitive keywords though (such as yield, get, set and LINQ statements), to minimize interference with programmer productivity.
    Thursday, March 06, 2008 5:13 PM
  • Any grammer that can be expressed in Backus-Naur Form is context free. See Aho, Lam, Sethi, and Ullman, Compilers Principles, Techniquest, and Tools. C# is such a grammer, there are a ton of books out there that give it.

     

    Thursday, March 06, 2008 5:44 PM
  • I think it depends on what you mean by the "C# Language" when you ask if it is context-free.  The syntax of C# can and has been expressed in Backus-Naur form.  However, that grammar does not reflect all the constraints that are needed to verify an arbitrary string represents a valid C# program.  In other words, source code that validates against such a grammar will not necessarily compile.  For example, the ambiguous type reference presented by a previous post doesn't really affect whether C# is context-free, because C#'s grammar doesn't say anything about resolving identifiers such as "myType".  The grammar would simply specify that "myType" is an identifier token and would validate that it is valid for an identifier token to exist at that part of the program.  If one modified the grammar in order to enforce additional constraints that are needed to validate a program, then it might no longer be context free.
    Thursday, March 06, 2008 7:13 PM
  • You're right Nimrand, there is a difference there. I had not thought of it in that way before. Point taken.

    Thursday, March 06, 2008 7:48 PM
  • Yeah that is what caused me doubts, strictly the grammar. The fact the source code even doesn't conform the grammar (not an element of Sigma*) is just the very first step, the compiler is unhappy for lots of other problems as well.

     

    (For example, "A bread ate two men" is a syntactically correct English sentence I believe)

     

    So strictly, the grammar is CF if it's true that all "substitution rules" have the form:

    A --> alpha

    where A is a nonterminal and alpha is a sequence of terminals and/or nonterminals.

     

    While:

    beta A delta --> beta alpha delta

    would mean that it's context sensitive (CS), that is, how we substitute A with alpha depends on its context (i.e. on beta and delta). Beta and delta are sequences of terminals and/or nonterminals.

     

    So this is exclusively formal grammar context interpretation, not something that is related to semantics (unlike in case of the áss of Old' McDonald).

     

    But I doubt programming languages have this type of rules, especially because it would be hard (and maybe algorythmically undecidable in general case) to parse such source code, or impossible. I am not experienced in programming languages and in this topic generally, so please correct if I've written anything incorrectly above...

     

    Though this is interesting, written by Kea:

    "C# does have context-sensitive keywords though (such as yield, get, set and LINQ statements), to minimize interference with programmer productivity."

    Does this mean it's CS in the aspect of formal language (i.e. what I described above with those "subtitution" rules)? And if yes, does it apply to the "older" C# too, i.e. the one that you can use for .NET framework 2.0 (so the 3.5 stuff, such as LINQ is missing)


     

    Rubylyn

     

    Thursday, March 06, 2008 11:54 PM
  • The context-sensitive keywords for yields and properties are available in C# 2.0, I think the yield keyword was added as of the second version of the lanugage. I doubt it really makes the language contextual/contextful, but simply implies that the compiler knows what to expect after certain tokens, and decides whether it is correct depending on the context. In that sense, every programming language is context-sensitive to a certain degree, as there are rules for how elements such as types and methodsmay be laid out (pointed out in an earlier post).
    So to validate myself, a quote from Wikipedia says: "There exist context-sensitive languages which are not context-free." C# seems to emphasis a rigid while simple grammar, which serves well to reduce human misinterpretation. If context was to get advanced and require extensive analysis, I doubt it would necessarily reduce prone bugs, which is always one of the goals of designing new, high-level languages.
    Friday, March 07, 2008 1:28 PM
  • "If context was to get advanced and require extensive analysis, I doubt it would necessarily reduce prone bugs, which is always one of the goals of designing new, high-level languages."

     

    Indeed. I think it would in fact promote bugs, or at least increase the potential for misinterpretation, leading to code that is difficult to manage in a team environment. Unambiguity is desireable not only from the compiler's point of view.

    Friday, March 07, 2008 2:09 PM
  • "I doubt it really makes the language contextual/contextful, but simply implies that the compiler knows what to expect after certain tokens, and decides whether it is correct depending on the context."

     

    I don't think this has anything to do whether a grammar is CS or CF, because what you are saying is about semantics (despite the fact the compiler show a syntax error or something like that, it's still semantics in a certain respect because in its formal grammar aspect, the code is correct). For example, "Integer str1 = new MemoryStream();" is allowed by the grammar (provided it's in a position where it's allowed by the grammar, e.g. inside a nonstatic class), but the type mismatch causes a compile-time error. (It's another matter, that in common parlance, we say this isn't syntactically correct, of course...).

     

    "In that sense, every programming language is context-sensitive to a certain degree, as there are rules for how elements such as types and methodsmay be laid out (pointed out in an earlier post)."

     

    I guess you're referring to this:

     

    "It would be a general compiler error because on the last line you are trying to declare a field on namespace level, but C# only allows fields in classes, structs and methods."

     

    I think this is exactly something defined by the grammar (unlike semantical stuff that is checked by compile-time heuristics), so they don't make the language context-sensitive. For example, the specification (list of terminals and nonterminals) of where types and methods can be laid out is on the right side of rules (while the left side is one single nonterminal), this still meets the criteria of "context-free".

     

    So maybe we were talking of different things, I'm trying to use the formal language definition (and not the intuitive) of "context-free", so it doesn't deal with semantics (i.e. that type mismatch etc. isn't allowed), but only strictly with the rules of the grammar. In the light of this, I guess (but I'm still open to corrections as always) that C# as a formal language is fully context-free, I guess the rules that apply on yield or LINQ stuff have also one nonterminal on the left side. Maybe this is over-discussed I know, but I want to be 100% sure and I'm interested in the details, so thx for your replies

     


     

     

     

     

     

     

    Friday, March 07, 2008 2:19 PM
  • Nimrand is absolutely right.  The only sensible definition of syntax is the set of strings that a compiler accepts.  

    For example, no compiler will accept the declaration

    int A, A, A;

    or allow

    F + G + H + 3

    if F is a function and G is a string and H is the name of an exception.

    A context-free grammar for C# defines a set of strings that include the acceptable ones, but does not exclude context-sensitive requirements on those strings, as above.

     

    Monday, April 26, 2010 5:29 PM
  • Tergiver's answer does not answer the question asked.

    (That is, theoretically in 2nd Chomsky class etc.)

    The question was asked with reference to the formal language concept of a Context Free Language, so

    All computer language grammers are context free.

    is completely wrong. F# and python are some examples of languages with a context-sensitive grammars. Any language that has whitespace sensitive syntax is likely context-sensitive and therefore not context-free.

    "Mountains poop lemonade." is a grammatically (syntactically) valid sentence, but it's meaningless (semantically). I think you'll find that "Context" doesn't have a widespread general definition in the science of semantics like it does in the science of grammar.

    I think AnthonyM8 has the most relevant answer.

    Sunday, June 24, 2012 11:46 PM