The Chomsky Normal Form (CNF), proposed by the linguist Noam Chomsky, is a normalized version of the CFG with a standard set of rules defining how production rule must be written.

A grammar where every production is either of the form A → BC or A → c (where A, B, C are arbitrary variables and can arbitrary symbol).

Example:

S → AS | a

A → SA | b

(If a language contains ε, then we allow S → ε where S is a start symbol, and forbid S on RHS.)

Why Chomsky Normal Form?

The key advantage is that in Chomsky Normal Form, every derivation of a string of n letters has exactly 2n − 1 steps.

Thus: one can determine if a string is in the language by exhaustive search of all derivations.

Conversion to CNF.

The conversion to Chomsky Normal Form has four main steps:

- Get rid of all ε productions.
- Get rid of all productions where RHS is one variable.
- Replace every production that is too long by shorter productions.
- Move all terminals to productions where RHS is one terminal.

Eliminate ε Productions.

Determine the nullable variables (those that generate ε) (algorithm given earlier).

Go through all productions, and for each, omit every possible subset of nullable variables.

For example, if P → AxB with both A and B nullable, add productions P → xB | Ax | x.

After this, delete all productions with empty RHS.

Eliminate Variable Unit Productions

Unit production is where RHS has only one symbol.

Consider production A → B. Then for every production B → α, add the production A → α. Repeat until done (but don’t re-create a unit production already deleted).

Replace Long Productions by Shorter Ones.

For example, if have production A → BCD, then replace it with

(In theory, this introduces many new variables, but one can re-use variables if careful.)

Move Terminals to Unit Productions.

For every terminal on the right of a non-unit production, add a substitute variable.

For example, replace production A → bC with productions A → BC and

Example.

Consider the CFG:

S → aXbX

X → aY | bY | ε

Y → X | c

The variable X is nullable, and so, therefore, is Y.

After elimination of ε, we obtain:

S → aXbX | abX | aXb | ab

X → aY | bY | a | b

Y → X | c

Example: Step 2.

After elimination of the unit production Y → X,

we obtain:

S → aXbX | abX | aXb | ab

X → aY | bY | a | b

Y → aY | bY | a | b | c

Example: Steps 3 & 4

Now, break up the RHSs of S; and replace a by A,

b by B and c by C wherever not units:

S → EF | AF | EB | AB

X → AY | BY | a | b

E → AX

F → BX

Summary

There are special forms for CFGs such as Chomsky Normal Form, where every production has the form A → BC or A → c. The algorithm to convert to this form involves (1) determining all nullable variables and getting rid of all ε-productions, (2) getting rid of all variable unit productions, (3) breaking up long productions, and (4) moving terminals to unit productions.