The syntax of English is undecidable
The question of parsing English and other natural languages has come up in the course of my work on the Marpa parser. As in the case of Perl, I first posed the question of whether any algorithm running on a Turing machine can parse the target language. This post contains what I hope the reader will find to be a rigorous demonstration that the syntax of the English language is undecidable.
When I say “undecidable”, I mean that term in the strict sense. Undecidability is not vagueness or uncertainty -- undecidability is the certainty that a “decision” of the matter is not possible. I will give a specific example of an English-language sentence which is unsyntactic if and only if it is syntactic.
The sentence hinges on the distinction between sentences in the passive voice, and sentences which contain a copular verb (in this case, “to be”) and an adjectival complement. For example,
(A) “The door was opened”is a passive voice sentence. In the sentence
(B) “The door was open”,on the other hand, “open” is an adjectival complement, and “was” is a copular verb. (While the passive voice of English is the subject of considerable discussion, the terminology and analysis of this post follows Quirk et al., A comprehensive grammar of the English language, pp. 159-171. and Language Log, and sticks to terrain which should not be controversial, even for non-specialists.)
Now consider the following sentence.
(C) “The window was closed.”Is this a sentence with a predicate in the passive voice, or is “was” a copular verb with an adjectival complement? In standard dictionaries, you will find “closed” is listed both as an adjective and as the past participle of the verb “close”, so that either could be the case. Further information would be needed to decide the syntax of (C).
Next I make two observations. The first observation is that a sentence can be syntactically correct, but not meaningful. The traditional example is
(G) “Green dreams sleep furiously”.Another example is
(H) “The first even prime greater than two crossed the street symmetrically”.Both these sentences allow easy syntactic analysis: each has a subject, and a predicate. Nouns, adjectives, adverbs and verbs can be readily identified and given fixed locations in a formal syntactic structure. But neither sentence has any meaning, except in a poetic or highly figurative sense.
The second observation is that, while we can have correct syntax without a semantics, we cannot have meaning without syntax. That is, sentences like
(I) “Airplane from to the and for of or pilot up”,while they might contain words which could be selected and rearranged to be meaningful, do not mean anything.
We now return to the passive-versus-adjective decision, where we observed that
(P) “The window was closed”could be an example of either a verb in the passive voice or of a copular verb with an adjectival complement. In the sentence
(Q) “The window was closed, but the door was open”,the question is resolved by coordination. Since in (Q) “open” can only be an adjectival complement, the expectation is that “closed” will also be. Similarly, in the sentence
(R) “The window was closed, but the door was opened”,coordination with “opened” decides the matter in favor of the passive voice. In a sentence where the coordination is closer, and includes a by-phrase to show agency for both verbs,
(S) “The window was closed and the door opened by the same person”,the presumption that both verbs are in the passive voice becomes a certainty.
We are now in a position to consider our undecidable sentence,
(U) “The window was closed and the door opened by the burglar after he discovered that the window was in fact a beautifully executed trompe d'oeil mural.”The window is fake. It therefore could not have been closed by the burglar, which makes its syntax wrong. The verb “was closed”, before semantic feedback is taken in account, clearly is in the passive voice. After the semantic feedback to the syntax is considered, however, it is clear that “was closed” cannot be in the passive voice, and that therefore (U) is unsyntactic.
But if a sentence does not have correct syntax, it cannot have a semantics. Since the only problem with the syntax of (U) was a result of the semantic feedback, if semantics is not considered (U) has correct syntax. So (U) is incorrect syntactically if and only if it has correct syntax.
We cannot decide (U) to have incorrect syntax, because in that case it becomes meaningless and the syntax becomes unobjectionable. But neither can we decide (U) to have correct syntax, because that makes (U) meaningful. When (U) is meaningful, it has incorrect syntax, because the passive voice cannot be correct given the meaning.
Our only choice is to say that the correctness of the syntax of (U) cannot be decided. It is not true that (U) has correct syntax, but neither is it true that (U) has incorrect syntax. This concludes the demonstration.
A Thought Problem
To a reader unconvinced by the preceding demonstration, I would urge that, as a thought problem, she consider a computer program attempting to parse English. Such a program would have to know at least some semantics, and would have to feed this knowledge back to the parsing process. But the preceding demonstration shows that such a feedback loop, if completely effective, will encounter sentences like (U), where it cannot decide whether the syntax of the sentence is correct or incorrect. I believe that, by reflecting on this thought problem, the candid reader will be able to convince herself that a human being parsing English sentences like (U) will have the same problems, and for similar reasons.
About Syntax and Semantics
It needs to be emphasized that the diagonalization in this post's demonstration is not about semantic truth and has nothing to do with semantic paradox. This post is about the feedback from semantics into syntax, and the relationship between them. By way of contrast, consider these sentences:
(X) “Two plus two is five.”(X) is false; (Y) attempts a contradiction but fails; and (Z) achieves a semantic contradiction. But in none of them is the semantics an issue for the syntax, and all three are correct syntactically. In each case, even when there are semantic issues, the feedback that the semantics provide the syntax is unproblematic.
(Y) “This sentence has incorrect syntax.”
(Z) “This sentence is false.”