Jake's CS Notes - Automata and Complexity

# Pumping Lemma for CFGs

## March 10th, 2020

- THE PLAN (for the day's lesson...a "lesson plan", if you will)
    - Pushdown automata + equivalence to CFGs
    - Pumping lemma for CFGs

- Our dear Professor Peng(uin) was 2 minutes late. TWO! THE INDIGNITY!
    - "Well, with the coronavirus, I guess this is what reduced human interaction looks like"
    - "Also, I thought the rain would help my allergies, but no..."
--------------------------------------------------------------------------------

- Today, we're going to look at pumping lemma for CFGs - "it's very similar to our old pumping lemma, and the basic idea is that at some point part of the parse tree has to repeat"

- The CFG pumping lemma looks like this:
    - If $L$ is context free
    - Then there exists some $p$ such that
    - For ANY $w \in L$ with $|w| > p$,
    - There's a split of $w = uvxyz$ with:
        - $|vxy| \leq p$
        - $|vy| > 0$
    - Such that for ALL $i >= 0$,
    - $uv^ixy^iz \in L$

- As we talked about last week, a context-free grammar has an alphabet, variables it can use, a starting state of those variables, and rules for transforming those variables
    - We start with S, keep recursively applying rules to it, and we say a string is in L if we can generate it by taking some path of rules
        - Using this, we were able to decide $0^n1^n$ by saying:

            $$
            S -> \epsilon | 0S1
            $$

    - The parse tree then represents the sequence of rules we took to get some output, with each node representing a single variable/character
        - CFGs have a fixed size, so the rough idea for the pumping lemma proof is that if we have a parse tree somewhere and the height of the tree is larger than the number of variables |V|, there must be some path from a leaf to the root that repeats a variable - cool!
        - Once we find that repeated variable, we can take that intermediate loop where it repeats and duplicate it a bunch of times - the "v" in the pumping lemma then corresponds with the parts generated by the left branches of this loop, "y" represents the part on the right side, and then "x" represents all the stuff in the middle (i.e. the leaves generated AFTER this loop)
            - "u" is all the stuff above and to the left of that repeated subtree, and "z" is the same but above and to the right
            - From this, we know that |vxy| <= p, since the height of the repeated section is at most |V| and the total size of the subtree <= (max rule size)^|V| (?)

- So, to show a language is NOT context-free, we need to find a bad string that breaks this, meaning that for ALL splits we can find a |vxy| such that if we pump it, something bad happens

- Let's take the example of $L = \{ a^nb^nc^n | n \geq 0\}$
    - *At this point Professor Peng left the room, removed his sweater, and then came back sweater-less*
    - So, to show that this isn't context-free, we can no longer pump just the beginning of the string; instead, we need to take advantage of having 3 characters now by showing that we can change one side of the string without changing the other
    - For instance:
        - Let p be *any* value (not necessarily the pumping length), and consider the string $w = a^{2p}b^{2p}c^{2p}$ (the 2s are just to be safe)
        - Then, for ANY split of $w$, we have several cases:
            - Since $|vxy| <= p$, we know vxy is either all $a$s, all $b$s, all $c$s, $a$s and $b$s, or $b$s and $c$s
            - In ALL of these cases, observe that i=2 $\implies$ $uv^2xy^2z$ will have a different number of $a$s, $b$s, and $c$s, since there'll always be at least 1 character we're not generating more of
        - Therefore, this string breaks the language!
    - Thus, this language isn't context-free!

- What does this have to do with regularity? Well, if a language is NOT context free, then that also means it can't be regular - and similarly, if a language IS regular, we can represent it as a CFG!
    - Why? Well, if a language is regular, we can turn the CFG split $w = uvxyz$ into the regular pumping lemma split $w = xyz$ by just saying $u$ and $v$ are the empty string!

- Alright, let's look at another example of this: show $L = \{ww | w \in \{0, 1\}\}$ isn't context-free
    - To show this wasn't regular, we used the string $w = 0^p1^p0^p1^p$
    - For CFGs, we can do something similar: we know $|vxy| \leq p$, so there are 3 cases we need to consider:
        - If $vxy$ is entirely within a block of characters in one of the strings, pumping it will result in an unmatched number of characters with the same block in the duplicated string
        - If $vxy$ crosses between the 1st and 2nd block in one of the strings (i.e. some 0s and some 1s), we have to be careful to prove that pumping it can't possibly result in a squared string at ALL; it can't just break the form of the $w$ string we picked, but it has to break the rules for ALL strings in the language
            - To prove this, let's instead change our string to $w = 0^{2p}1^{2p}0^{2p}1^{2p}$
            - If we now consider all splits where $x$ starts in the first block of 0s and $y$ ends in the 1s, then when i=2, then the string $uvxy$ will have a length of at most $3p$ and $uv^2xy^2$ will be at most $4p$; therefore the next $p$ 1s will get pushed to the right, so the 2nd half of the string must start with a 1 - which contradicts our language definition!

- "Okay, I'm going to go contribute towards the rising prices of disinfectants right now and buy hand sanitizer - see you on Thursday!"