
The meaning of “life” and other NLP stories


A Pythonic introduction to compositional semantics for language enthusiasts.

The meaning of “meaning”

“We are stuck with technology when what we really want is just stuff that works.” — D. Adams

Even with all the promises by tech giants in this A.I. era and all the brand new gadgets rapidly filling up our homes, it is still fairly easy to stumble upon frustrating examples of complete misunderstanding of human language by machines. After dozens of millions of dollars, we got this:

Some A.I. epic fails — I’m sure you have your own top 10.

while we were expecting this (where’s my jetpack, by the way?):

A.I. writing a piece of music for you (awwww).

Why?

The short answer is that understanding language is very hard, as solving the language riddle means navigating a network of equally hard questions about the limits of our cognitive abilities, the power of logical representations and the quirks and biases of human societies: more than two thousand years after Plato’s thoughts on the matter, we are still really far from having a good theory of it.

Central to “natural language processing” (or “natural language understanding”, NLU, as the cool kids say these days) is obviously the concept of meaning: it could be argued that one of the reasons we are still behind in NLU is that we don’t have a good, unified perspective on what meaning is. For technical and historical reasons, the literature is somewhat split between two perspectives:

  • there’s a “statistical” view, as exemplified by word2vec and related works, carried out mostly within the machine learning community: the focus here is mostly on lexical items and semantic relations like synonymy; typically, these analyses become the backbone for downstream systems addressing challenges such as sentiment analysis and text classification. In a slogan, meaning is a vector in a multi-dimensional semantic space;
  • there’s a “functional” view on meaning, as exemplified by this and related works, carried out mostly by linguists, philosophers of language and logicians: the focus here is mostly on systematic rules of inference and semantic relations like entailment; typical tasks are automated reasoning and knowledge representation. In a slogan, meaning is a function from pieces of language (e.g. nouns, connectives, etc.) to set-theoretic objects (e.g. elements of a set, functions, etc.).

While the distinction is obviously a bit simplistic (although, for example, very close to the background for Baroni et al.), it is a good first approximation: the first approach is more pragmatic and good at explaining relationships between lexical concepts (“man is to king, what woman is to…?”); the second is more abstract and good at explaining how words combine together to produce complex concepts.
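As a quick reminder of what the first approach looks like in practice, the classic analogy can be reproduced in a few lines of Python with gensim and any pre-trained word vectors (the file name below is just a placeholder, not a file shipped with this post):

from gensim.models import KeyedVectors

# load a pre-trained, word2vec-style vector file (placeholder name)
vectors = KeyedVectors.load_word2vec_format('pretrained_vectors.bin', binary=True)

# "man is to king, what woman is to...?"
print(vectors.most_similar(positive=['king', 'woman'], negative=['man'], topn=1))
# a good model is expected to rank 'queen' (or a close variant) first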

We believe both traditions enabled great progress in our understanding of language, but of course neither alone will get us to HAL 9000. Since there’s a ton of recent material on the first tradition already available, this small post will serve as our (opinionated) introduction to basic concepts in the second.

DISCLAIMER: this post is mainly written for data scientists and educated readers who are familiar with the statistical NLP toolkit (from back-off language models to word2vec) but have never been exposed to the thrills and perils of formal semantics. Following the old saying “What I cannot create, I do not understand”, we reinforce the concepts by sharing a small Python project (full details at the very end). On the other hand, readers already familiar with model theory but not much with modern programming languages may use the repo to see how “semantic computations” can be expressed through code (please note we will slightly abuse notation and terminology here and there to focus on building intuitions more than developing a formal system).

The code for this post is freely available on GitHub.

The language of thoughts (or: how to express complex concepts with simpler ones)

“The proposition is the expression of its truth-conditions.” — L. Wittgenstein, Tractatus Logico-Philosophicus (4.431)

To understand formal semantics as a discipline (and how it differs from other approaches), we need to go back to a crazy Austrian dude at the start of the 20th century. What does it mean to understand the meaning of a sentence?

To understand the meaning of a sentence means understanding its truth conditions, that is, understanding what the world would look like if the sentence were true.

So, to make “Lions are bigger than domestic cats” true, the world should be such that a given type of feline is bigger than another (true); to make “Lions are bigger than blue whales” true, the world should be such that a given type of feline is bigger than a given type of aquatic mammal (false) (please note: the fact that we can establish whether the sentence is true/false has nothing to do with understanding it; everybody understands “The total number of cats in Venice on January 1st, 1517, was odd.”, but nobody knows if it’s true).

So if we buy that meaning = truth conditions, isn’t the problem solved? Actually nope, since the number of possible sentences is infinite: there is no list, however big, that will give us “all the truth conditions of English”. It should never cease to amaze the reader that the following sentence — most likely written here for the first time in history — can be understood without effort:

Forty-six penguins were lost in the Sahara desert after a fortuitous escape from a genetic lab in Chad.

How does that happen? How can limited minds with limited resources understand infinitely many things?

Formal semantics is like playing infinite LEGO: complex LEGOs are built using simpler ones, and simpler LEGOs are built with basic LEGO bricks; if you know how bricks can be combined and have some bricks to start with, there are countless things you can create. Pretty much in a similar fashion, the (to be defined) meaning of a sentence is predictably built out of the (to be defined) meaning of its constituents: so if you know the meaning of penguins and Sahara, you can understand what it means for a penguin to be lost in the desert.

Formal semantics is the discipline studying the set of instructions by which the bricks of our language can be put together.

If all this seems pretty straightforward to humans, it will be good to examine compositionality in some well-known NLP architectures. Take for example what happens with the two sentences below and DeepMoji, a neural network that suggests emojis (the example comes from our A.I. opinion piece):

  • My flight is delayed.. amazing.
  • My flight is not delayed.. amazing.
The same emojis are suggested for the sarcastic vs. the normal sentence (original video here).

The two sentences differ by just one word (NOT), but we know that word is “special”. The way in which not contributes to (yes!) the truth conditions of the sentences above is completely ignored by DeepMoji, which does not possess even a very elementary notion of compositionality; in other words, adding negation to a sentence does not typically “move the meaning” (however construed) by a few points on an imaginary “meaning line” (like adding “very” to “This cake is (very) good”), but completely “reverses” it.

Whatever “language understanding” is embedded in DeepMoji and similar systems, we need a completely different way to represent meaning if we are to capture the not behavior above. The story of formal semantics is the story of how we can use math to make the idea of “language LEGO” more precise and tractable.

Mind you, it’s not a story with a happy ending.

Semantics 101

“There is in my opinion no important theoretical difference between natural languages and the artificial languages of logicians.” — R. Montague

A crucial thing about meaning is that there are two elements to it — recall the weird Austrian dude’s definition above:

…understanding what the world would look like, if the sentence were true.

So there is a sentence, sure, but there is also the world: meaning, in its essence, is some kind of relation between our language and our world (technically, a plurality of worlds, but things get complicated then). Since the world is a fairly big and impractical thing to work with, we use objects from set theory as our model of the world. Before formulas and code, we’ll use this section to build our intuition first.

Our first toy language L is made of the following basic elements:

names = ['Jacopo', 'Mattia', 'Ryan', 'Ciro']
predicates = ['IsItalian', 'IsAmerican', 'IsCanadian']
connectives = ['and']
negation = ['not']

The basic elements can be combined according to the following syntactic rules:

a "name + predicate" is a formula
if A is a formula and B is a formula, "A connective B" is a formula
if A is a formula, "negation A" is a formula

which means that the following sentences are all part of L:

Jacopo IsItalian
Mattia IsAmerican and Jacopo IsItalian
not Jacopo IsItalian
...
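To see how the three rules above recursively generate all (and only) the sentences of L, here is a minimal sketch of a recognizer in plain Python (not part of the companion repo, which uses a real parser):

names = ['Jacopo', 'Mattia', 'Ryan', 'Ciro']
predicates = ['IsItalian', 'IsAmerican', 'IsCanadian']

def is_formula(tokens):
    # "name + predicate" is a formula
    if len(tokens) == 2:
        return tokens[0] in names and tokens[1] in predicates
    # "negation A" is a formula
    if tokens and tokens[0] == 'not':
        return is_formula(tokens[1:])
    # "A connective B" is a formula (splitting on the first 'and')
    if 'and' in tokens:
        i = tokens.index('and')
        return is_formula(tokens[:i]) and is_formula(tokens[i + 1:])
    return False

print(is_formula('not Jacopo IsItalian'.split()))                    # True
print(is_formula('Mattia IsAmerican and Jacopo IsItalian'.split()))  # True
print(is_formula('Jacopo and'.split()))                              # False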

It is now time to introduce semantics: while we may be tempted to interpret L using some background knowledge (e.g. my first name is “Jacopo”), it is absolutely crucial to remember that sentences in L have no meaning at all yet. Since we expect the meaning of complex things to be built up from simpler ones, we will start with the meaning of names and predicates, since “name + predicate” is the simplest sentence we need to explain. We start with a domain of discourse D, which is a set with some elements and some subsets, and we then say that:

  • the meaning of a name (its “denotation”) is an element of D;
  • the meaning of a predicate (its “extension”) is a subset of D.

D is a generic “container” for our model: it’s just a “box” with all the pieces that are needed to represent meaning in L. If you visualize a sample D (below), it is easy to understand how we define truth-conditions for “name + predicate” sentences:

  • if A is a “name + predicate” sentence, A is true if and only if the denotation of name is in the extension of predicate .
A sample domain for our toy language L.

So, for example:

  • “Jacopo IsItalian” is true if and only if the element in D representing Jacopo is a member of the set representing IsItalian ;
  • “Jacopo IsCanadian” is true if and only if the element in D representing Jacopo is a member of the set representing IsCanadian .

As we learned, truth conditions don’t tell you what is true/false, but tell you how the world (better, your model of the world) should look for things to be true/false. Armed with our definition, we can look again at our D and we can see that, in our case, “Jacopo IsItalian” is true and “Jacopo IsCanadian” is false.

The extension of “isItalian” contains the denotation of “Jacopo” (in purple).
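In code, a model like D and the satisfaction rule for “name + predicate” sentences can be sketched as follows (the exact extensions are our assumption for the pictures above; the text only fixes the facts about Jacopo and Mattia):

# one possible version of the sample domain D: membership beyond the
# facts discussed in the text is an assumption for illustration
model = {
    'domain': {'j', 'm', 'r', 'c'},
    'denotations': {'Jacopo': 'j', 'Mattia': 'm', 'Ryan': 'r', 'Ciro': 'c'},
    'extensions': {
        'IsItalian': {'j', 'm', 'c'},
        'IsAmerican': set(),
        'IsCanadian': {'r'}
    }
}

def is_true_atomic(name, predicate, model):
    # a "name + predicate" sentence is true iff the denotation of the name
    # is a member of the extension of the predicate
    return model['denotations'][name] in model['extensions'][predicate]

print(is_true_atomic('Jacopo', 'IsItalian', model))   # True
print(is_true_atomic('Jacopo', 'IsCanadian', model))  # False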

When a sentence in L is true in our set-theoretic, small world, we also say that the sentence is satisfied in the model (technically, being true for sentences is a special case of being satisfied for generic formulas). Now that we have defined truth conditions for the basic sentences, we can define truth conditions for complex sentences through basic ones:

  • if A is a formula and B is a formula, “A and B” is true if and only if A is true and B is true.

So, for example:

  • “Jacopo IsItalian and Mattia IsAmerican” is true if and only if “Jacopo IsItalian” is true and “Mattia IsAmerican” is true.

Since “Jacopo IsItalian” and “Mattia IsAmerican” are “name + predicate” sentences, we can now fully spell out the meaning:

  • “Jacopo IsItalian and Mattia IsAmerican” is true if and only if the element in D representing Jacopo is a member of the set representing IsItalian, and the element in D representing Mattia is a member of the set representing IsAmerican.

Armed with our definition, we can look at D and see that “Jacopo IsItalian and Mattia IsAmerican” is false, since “Mattia IsAmerican” is false:

The extension of “isAmerican” does not contain the denotation of “Mattia” (in blue).

Finally, we can see in our semantics how negation is indeed a “reversing” operation:

  • if A is a formula, “not A” is true if and only if A is false.
  • “not Jacopo IsItalian” is true if and only if “Jacopo IsItalian” is false.
  • “not Jacopo IsItalian” is true if and only if the element in D representing Jacopo is not a member of the set representing IsItalian.
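Reusing the model and the is_true_atomic function from the sketch above, the compositional rules for and and not become a short recursive evaluator (the nested-tuple representation of sentences is our own shortcut; the companion repo parses strings instead):

def is_true(sentence, model):
    op = sentence[0]
    if op == 'atom':   # ('atom', name, predicate)
        return is_true_atomic(sentence[1], sentence[2], model)
    if op == 'and':    # ('and', A, B): true iff A is true and B is true
        return is_true(sentence[1], model) and is_true(sentence[2], model)
    if op == 'not':    # ('not', A): true iff A is false
        return not is_true(sentence[1], model)
    raise ValueError('unknown operator: {}'.format(op))

# "Jacopo IsItalian and Mattia IsAmerican" -> False (Mattia IsAmerican is false)
print(is_true(('and', ('atom', 'Jacopo', 'IsItalian'),
                      ('atom', 'Mattia', 'IsAmerican')), model))
# "not Jacopo IsItalian" -> False ("Jacopo IsItalian" is true)
print(is_true(('not', ('atom', 'Jacopo', 'IsItalian')), model))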

Obviously, specifying truth-conditions for our toy language L is not terribly useful to build HAL 9000. But even with this simple case, two things should be noted:

  1. our semantics is fully compositional and allows us, in a finite way, to assign truth conditions to an infinite number of sentences: there is no possible sentence in L left out by our definition of meaning. More expressive languages will have (much) more complex compositional rules, but the general gist is the same: a finite set of instructions automatically generalizing to an infinite number of target sentences;
  2. our choice of D was just one world among many possibilities: we could have chosen a world where “Mattia IsAmerican” is true, and our semantics would have been the same — remember, semantics assigns truth conditions, but it’s silent on how these conditions are actually satisfied. In real-world applications we are often interested in truth as well, so we will need to couple semantics with a “knowledge base”, i.e. specific facts about the world we care about: when modeling a real-world phenomenon, D should be construed to be “isomorphic” to it, so that “true in D” means the same as “true in the domain of interest”.

The expert reader may well have guessed already how we can build an application of immediate value by leveraging (1) and (2) above: (1) guarantees that the knowledge encoded by the semantics generalizes well; (2) guarantees that, insofar as we choose our target domain carefully, the satisfaction algorithm will evaluate as true all and only the sentences whose truth we care about.

In particular, even the simplest program in computational semantics (such as code that checks satisfaction for arbitrary formulas) can be seen as an instance of querying as inference (as championed here):

given a state of the world as modeled in some useful way (e.g. a database), can a machine automatically answer our questions about the domain of interest?

In the ensuing section we are going to explore a slightly more complex language in such a setting.

[Bonus technical point: if semantics does not constrain truth in any way — i.e. as far as semantics goes, a world where Jacopo IsItalian is true is just as good as one in which Jacopo IsCanadian is true — is it helpful at all by itself? Yes, very much so, but to know why we need to understand that the core concept of semantics is indeed entailment, i.e. studying under which conditions a sentence X is logically implied by a set of sentences Y1, Y2, …, Yn. In particular, the real question semantics is set out to answer is:

  • given a domain D, a sentence X and a sentence Y, if X is true in D, is Y necessarily true as well?

Entailment is also the key concept of proof theory: in fact, we have an amazing proof of the relation between deductive systems and semantics, but this note is too small to contain it.]

“Querying as inference” using computational semantics

“In order to understand recursion, you must first understand recursion.” — My t-shirt

Let’s say the following table is a snippet from our CRM:

A sample customer table recording customers and payments.

There are a lot of interesting questions we may want to ask when looking even at a simple table like this:

  • Did all customers pay?
  • Did Bob specifically pay?
  • Did Bob pay five dollars?
  • … and so on

We can put our framework to good use, formulate a semantics for this domain and then query the system to get all the answers we need (a Python notebook sketching this use case is also included in the repo). The first step is therefore to create a language to represent our target domain, for example:

names = ['bob', 'dana', 'ada', 'colin'] + digits [0-9]
unary predicates = ['IsCustomer', 'IsPayment']
binary predicates = ['MadePayment', 'HasTotal']
quantifiers = ['all', 'some']
connectives = ['and']
negation = ['not']

Our language allows us to represent concepts like:

there is a thing in the domain of discourse which is a customer named bob
there is a thing ... X which is a customer, a thing Y which is a payment, and X made Y
there is a thing ... which is a payment and has a total of X

The second step is building a model which faithfully represents our table of interest. In other words, we need to build a domain of objects, a mapping between names and objects, and properly construe predicate extensions so that the properties specified in the table are consistently represented in the model:

domain: [1, 2, 3, 4, 5, 6],
constants: {'bob': 1, 'dana': 2, 'ada': 3, 'colin': 4},
extensions: {
  'IsCustomer': [[1], [2], [3], [4]],
  'IsPayment': [[5], [6]],
  'MadePayment': [[1, 5], [2, 6]]
  ...
}

Once that is done, we can query the system and let the machine compute the answers automagically:

  • Did all customers pay? becomes the query For each thing x, if x IsCustomer, there is a y such that y IsPayment and x MadePayment y, which is evaluated to False (a plain-Python sketch of this query follows this list) [Bonus technical point: for the sake of brevity, we have been skipping over the exact details of the semantics of all, whose meaning is far more complex than that of simple names; the interested reader can explore our repo to learn all the technical steps needed to compute the meaning of all and some].
  • Did Bob pay? becomes the query There is an x such that x IsPayment and bob MadePayment x, which is evaluated to True.
  • Did Bob pay 5 dollars? becomes the query There is an x such that x IsPayment and bob MadePayment x and x HasTotal 5, which is evaluated to True [Bonus technical point: to quickly extend the semantics to handle number comparisons, we had to i) introduce digits in the grammar specification and ii) modify the definition of satisfaction for atomic formulas to make sure that digits are mapped to themselves. Obviously, including numbers in full generality would require some more tricks: the very non-lazy reader is encouraged to think about how that could be done starting from the existing framework!].
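As promised, here is a rough, self-contained sketch of how the first query can be computed over the model above in plain Python (the function below hard-codes one query for readability; the companion repo evaluates arbitrary formulas instead):

crm_model = {
    'domain': [1, 2, 3, 4, 5, 6],
    'constants': {'bob': 1, 'dana': 2, 'ada': 3, 'colin': 4},
    'extensions': {
        'IsCustomer': {(1,), (2,), (3,), (4,)},
        'IsPayment': {(5,), (6,)},
        'MadePayment': {(1, 5), (2, 6)}
    }
}

def all_customers_paid(model):
    # for each thing x, if x IsCustomer, there is a y such that
    # y IsPayment and x MadePayment y
    ext, domain = model['extensions'], model['domain']
    return all(
        any((y,) in ext['IsPayment'] and (x, y) in ext['MadePayment'] for y in domain)
        for x in domain if (x,) in ext['IsCustomer']
    )

print(all_customers_paid(crm_model))  # False: ada and colin made no payment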

Isn’t this awesome? If our model mirrors the underlying customer table, we can ask a virtually infinite number of questions and be sure we can precisely compute the answers — all with a few lines of Python code.

From toy models to reality

“In theory there is no difference between theory and practice. In practice, there is.” — Y. Berra

The “querying as inference” paradigm has all the elegance and beauty of formal logic: a small and well-understood Python script can be used to answer potentially infinite questions over a target domain. Unfortunately, it also has all the drawbacks of formal logic, which make its immediate use outside the lab less straightforward than you would hope:

  • semantics as we defined it is limited to expressing somewhat basic concepts and relations, but we would love to do much more (for example, we would love to sum over numbers in our customer table above). While it’s possible to extend the framework to cover increasingly complex structures, that comes with some cost in complexity and manual effort;
  • model building in real use cases requires lots of hard decisions: in our toy customer table example, we were still required to make non-trivial choices on how to map table rows to a domain that can be formally queried. The more complex the use case, the harder it is for data scientists to produce a compact, complete and extensible formal domain;
  • querying is done in a formal language which is not exactly human friendly: the user would have to know how to translate English into some logical dialect to get the desired answers. Of course, a much better UX would be to provide users with an English search bar and add an intermediate layer translating from natural to formal languages — some of the work we have been doing at Tooso exploits a version of this idea to make querying as human friendly as possible [note for the historically inclined readers: defining semantics for a formal language F and then providing an English-to-F translation goes back to the seminal PTQ paper].

These scalability concerns and other technical reasons (such as limitations with fully general inference in first-order logic) have historically prevented computational semantics from becoming as pervasive in industry as other NLP tools. In recent times, some research programs have focused on bridging the gap between the vector-based and the set-theory-based views of meaning, in an effort to take the best of both worlds: scalability and flexibility from statistics, compositionality and structure from logic. Moreover, researchers from the probabilistic programming community are working within that framework to combine probability and compositionality and systematically account for pragmatic phenomena (see our own piece on the topic here).

At Tooso, our vision has always been to bridge the gap between humans and data. While we believe no single idea will solve the mystery of meaning, and that many pieces of the puzzle are still missing, we do think that there is no better time in the history of humanity to tackle this challenge with fresh theoretical eyes and the incredible engineering tools available today.

Before we solve the language riddle in its entirety, there are a lot of use cases requiring *some* language understanding which can unlock immense tech and business value.

As a final bonus consideration, going from science to “the bigger picture”, let’s not forget that after this post we should finally be ready to know what the meaning of “life” is (an anecdote apparently due to the famous semanticist Barbara Partee): we would have to translate it into a constant symbol life, and use an operator such as | to indicate that we are talking about its extension in our model. So, in the end, the meaning of “life” is |life. Maybe this is what the crazy Austrian dude meant when he said:

Even when all possible scientific questions have been answered, the problems of life remain completely untouched.

But this is obviously a completely different story.

See you, space cowboys


That’s all, folks: the mathematically inclined reader interested in a more formal treatment of semantics and, generally speaking, of topics in formal logic, is invited to start with Language, Proof and Logic, continue with Computability and Logic and finally explore the fascinating notion of possible worlds with First-Order Modal Logic.

If you have requests, questions, feedback, please get in touch with [email protected] .

Don’t forget to get the latest from Tooso on Medium, LinkedIn, Twitter and Instagram.

Acknowledgments

Thanks to all members of the Tooso team for suggestions and feedback on a previous draft of this article.

Tarski in the age of Python 3

“Truth can only be found in one place: the code.” — R.C. Martin

The companion GitHub repo contains a working “model checker” in Python 3.6, i.e. a Python project that, given a formula and a domain, automatically evaluates whether the formula is satisfied.

While the code is heavily commented and pretty easy to follow, we provide here a very high-level description of its main parts; although we are not aware of other Python checkers built in a similar spirit, the code was written as an educational tool for this blog and related projects, not as high-performance software (interestingly, Bos and Blackburn also lament in their book that implementations of “vanilla” first-order checkers are very hard to come by). The project structure is as follows:

project_folder
    notebooks
        tarski-2-pandas.ipynb
    fol_main.py
    fol_grammar.py
    fol_semantics.py
    fol_models.py
    test_fol_semantics.py
    README.md
    requirements.txt

The core files are the following:

  • fol_main.py shows how to load a model from the static collection, instantiate the classes for grammar and semantics, and evaluate an expression.
  • fol_grammar.py is the class handling the syntactic parts of the checker — it uses lark internally to parse an FOL-like expression and it has a built-in recursive function to retrieve free variables in a formula. If you want to extend/change the vocabulary or the syntactic conventions, this is where you should start.
  • fol_semantics.py is the class handling the semantics — the class exposes the check_formula_satisfaction_in_model function, which, given an expression and a model, will evaluate the formula as True/False in the model. The class defines satisfaction in a model with partial assignments, following this more than Tarski’s classical work. If you want to add semantic rules or adapt satisfaction to cover additional data structures (say, a database instead of a model specified in Python), this is where you should look.
  • fol_models.py contains some basic models that should get you started to explore the behavior of the checker (some models are also very useful for testing purposes). If you have your own target domain to model, you can add here a Python object following the examples provided and then use fol_main.py to invoke the checker on that model.
  • test_fol_semantics.py contains a series of tests (we typically use pytest for our Python testing) to make sure the checker behaves as expected under different conditions.

Happy coding!

