Interpreter pattern in PHP
Some recurring problems can be expressed as sentences in simple language which is usually called Domain Specific Language (DSL). Interpreter pattern allows you:
- To express the grammar rules of a DSL as classes,
- To represent sentences in DSL as a tree of objects. Each object is an instance of some class from p.1. This tree is usually called AST (Abstract Syntax Tree).
- To define methods to interpret sentences in DSL that is the methods to evaluate AST.
I will develop this succint introduction in a moment.
BNF of DSL
A grammar of a DSL can be defined using [Backus–Naur form] (BNF). It is a notation technique for the exact description of languages syntax.
BNF is a finite set of rules (sometimes called “productions”). Each rule is written as:
<symbol> ::= __expression__
<symbol> is an artificial name (variable) for a language construct. Tt is
__expression__. In other words,
<symbol> is derived from
__expression__; that is why BNF rules are called derivation rules. This means
<symbol> can be replaced by
__expression__ can consist of:
- individual characters (
- individual words (
- any string (
to be or not to be),
- alternation (choice) of characters, words, strings : (
come | leave | stay). The
|character separates the alternatives,
- other symbols or even recursively include the symbol that is being currently defined.
There are two kind of symbols: terminal and non-terminal ones.
A symbol is called terminal if its
__expression__ does not include another
symbol. Terminal symbols are evaluated immediately:
<terminal symbol> ::= good dog # corresponds exactly to the text "good dog" in a sentece
<terminal symbol with alternation> ::= one | two | three # corresponds to one of these values: "one" or "two" or "three"
Non-terminal symbols are defined through other symbols that in turn can be both terminal ones and non-terminal ones:
<description> ::= a <features> <animal> <features> ::= <single feature> | <single feature> <features> <single feature> ::= red | kind | young <animal> ::= dog | cat
<description> corresponds to many velues: “a red dog”, “a young red dog”,
“a red young dog”, “a kind kind king dog”, etc.
<description>is a non-terminal symbol defined through another symbol
<features>symbol is defined through two symbols :
and itself. It’s a recursive definition.
<single feature>is a terminal symbol with three alternative vales.
<animal>is a simple terminal symbol with two alternative vales.
Interpreter pattern suggests crating a class for each derivation rule of BNF.
For the example above it would be
It’s clear that the
Description class aggregates one instance of
Features and one instance of
Features class in turn aggregates an
SingleFeature and optionally an instance of
Features are non-terminals. The classes
Animal do not aggregate other objects, that is why they
While the classes represent grammar rules of a DSL, the objects of these classes represent sentences of the DSL. More precisely a three of objects represent a sentence. This kind of tree is usually called [Abstract Syntax Tree] (AST). E.g. the sentence “a big kind dog” is represented by this AST:
AST clearly depicts the difference between terminal and non-terminal symbols: terminal symbols are always the leafs of AST while non-terminal symbols always have children.
The process of constructing AST from the sentence in DSL is called parsing. It worth nothing that parsing is out of the scope of the Interpreter pattern. In fact the pattern defines a method to evaluate an AST that was constructed beforehand. It is literally some public method of the classes representing grammar rules and available in every object of AST.
Implementation: BNF of DSL
Imagine you want to provide the users of your library/API/application with a possibility to write sentences in some basic language. These sentences are assertions or predicates that are evaluated and produce a result. Let us say that these sentences act as security expressions to permit or reject client’s requests to some resources. Every request contains one of these actions: “view”, “change”, “delete”. Every resource is in one of these states: “processing”, “treated”, “rejected”.
For example, this expression results to “true” or “false” depending on the current request’s action:
This depends on the current state of a resource:
inState('treated') OR inState('rejected')
A change request to a resource in the processing state:
doAction('change') AND inState('processing')
A delete request to a resource in the processing or in the treated state:
doAction('delete') AND (inState('processing') OR inState('treated'))
The BNF of this DSL is:
<expression> ::= <doAction> | <inState> | <alternation> | <sequence> | (<expression>) <alternation> ::= <expression> OR <expression> <sequence> ::= <expression> AND <expression> <doAction> ::= doAction(<permission>) <inState> ::= inState(<state>) <action> ::= view | change | delete <state> ::= processing | treated | rejected
Implementation: the pattern
The corresponding classes are
It is a common interface for all grammar rules. Also it is a common interface for all nodes in AST.
It is the expression that evaluates boolean
Is is the expression that evaluates boolean
doAction function. It retrieves the current request action from
$context and compares it to its argument in a sentence.
An action (
delete) in a sentence.
inState function. It retrieves the current request action from
$context and compares it to its argument in a sentence.
A state (
rejected) in a sentence.
For example, AST for the sentence
doAction('change') AND (inState('processing')
OR inState('treated')) is constructed as follows:
AST looks like:
InState classes require a
$context that holds current
request’s action and current resource state:
Other dependencies (parameters, services, couple of variables) could replace
$context if appropriate.
AST is evaluated recursively. Terminal nodes are evaluated immediately while non-terminal nodes are evaluated using the values of theirs children. But sometimes a node needs to know the evaluated value of its sibling node or subtree. In that case Interpreter pattern suggests storing already evaluated values in the evaluation context in the form of array or hash or object. In this way a node can obtain a evaluated value of its sibling node.
Check out the full code as well as its tests.