Regular Language

  • The weakest formal languages widely used
  • Many applications are simply not regular languages
    • ${(^{i})^{i}} i >= 0$
    • Regular language can never parse balance brackets
    • Nested if ... fi structures
      if ...
        if ...


  • Inputs: list of tokens
  • Outputs: parse tree of the program (AST )

Context-Free Grammars

  • Not all strings of tokens are programs
  • Parser must distinguish between valid and invalid strings of tokens
  • We need
    • A language for describing valid strings of tokens
    • A method for distinguishing valid from invalid strings of tokens
  • Programming languages have recursive structure
  • An EXPR is
if EXPR then EXPR else EXPR fi
whie EXPR loop EXPR pool
  • Context-free grammars are a natural notation for this recursive structure
  • A CFG consists of
    • A set of terminals
      • should be tokens of the language
    • A set of non-terminals
    • A start symbol
    • A set of productions (产生式)
      • X -> Y1...Yn


  • A derivation is a sequence