This is a quick start on language for cognitive science students.

A simple formal system

Douglas Hofstadter1 has this tiny formal system called the pq-System. In defining formal systems, we start with a set of symbols, called the alphabet. Any expression of the system must be made by bringing together, in some order, one or more of these symbols. The next, and last, step is to define which expressions formed out of these symbols are well-formed in the system.

This well-formedness step comes in two sub-steps.

  1. First we define the axioms, those expressions that we take to be well-formed without any further justification.
  2. Second is the rules of production that allow us to generate new well-formed expressions from already existing well-formed expressions.

In pq-System, the alphabet consists of the following three symbols: p, q, and -.

We have infinitely many axioms in pq-System, so we cannot list them all. Instead, we provide a rule that generates all of the axioms, usually called an axiom schema.

\(x\text{p-q}x\text{-}\) is an axiom, whenever \(x\) is composed of only ‘\(\text{-}\)’s.

Given this, you should be able to verify that \(\text{--p-q---}\) is an axiom of the system (= a well-formed expression).

There is only one rule of production:

If \(x\text{p}y\text{q}z\) is a well-formed expression, with \(x, y, z\) all composed of zero or more ‘\(\text{-}\)’s, then so is \(x\text{p}y\text{-q}z\text{-}\).
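The axiom schema and the production rule are simple string operations, and can be sketched in a few lines of Python (the function names are mine, not part of the system):

```python
def axiom(n: int) -> str:
    """Axiom schema: x p - q x -, where x is a string of n hyphens."""
    x = "-" * n
    return x + "p-q" + x + "-"

def apply_rule(expr: str) -> str:
    """Production rule: from x p y q z, derive x p y- q z-.

    Assumes expr is already well-formed (exactly one 'p' and one 'q').
    """
    x, rest = expr.split("p")
    y, z = rest.split("q")
    return x + "p" + y + "-q" + z + "-"

print(axiom(2))              # --p-q---
print(apply_rule(axiom(2)))  # --p--q----
```

Starting from any axiom and applying the rule repeatedly generates well-formed expressions of the system, one at a time.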

Now, think of a decision procedure that takes an expression and decides whether it is well-formed or not.
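(Spoiler ahead, if you want to work this out yourself first.) One possible procedure, sketched in Python, assuming (per the production rule above) that x may be empty: check the shape of the string with a pattern, then compare hyphen counts.

```python
import re

def is_well_formed(expr: str) -> bool:
    """Decide whether expr has the shape x p y q z, with x, y, z hyphen
    strings and hyphens(x) + hyphens(y) == hyphens(z)."""
    m = re.fullmatch(r"(-*)p(-+)q(-+)", expr)
    return m is not None and len(m.group(1)) + len(m.group(2)) == len(m.group(3))

print(is_well_formed("--p-q---"))    # True (an axiom)
print(is_well_formed("--p--q----"))  # True (derived by the rule)
print(is_well_formed("--p--q---"))   # False
```

The counting condition is exactly what the axioms guarantee and the production rule preserves, so the check accepts all and only the well-formed expressions.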


Inspecting the well-formed expressions of pq-System, can you see what they mean? Or, can you say what they are about? Or, can you say what they are for?

They are theorems of two-argument addition over non-negative integers: an expression \(x\text{p}y\text{q}z\) is well-formed exactly when the number of ‘\(\text{-}\)’s in \(x\) plus the number in \(y\) equals the number in \(z\). For instance, \(\text{--p---q-----}\) means \(2 + 3 = 5\). That is their meaning.

Levels of analysis

Language is studied in levels of analysis. The study of the atomic building blocks of language and how they combine into more complex structures is called phonology. The alphabet analogy, though it is the best we can do, is a little misleading here: the building blocks of language are not the orthographic symbols, but the phonemes, the smallest units of sound that can distinguish meaning in a language.

The study of the properties of well-formed expressions is called morphosyntax. As you observed above, this can be studied without any reference to meaning or significance in any form.

The study of the meaning of expressions is called semantics. And again as you saw above, what we mean by “meaning” is not obvious at all. What sort of a relation is it? Correspondence, aboutness, use (procedural)? If it’s a correspondence, what is the nature of the entities that are related? Are they mental, neural, abstract? All these, and many more, are part of the game in the study of natural language meaning.

The last level of interest to cognitive science is pragmatics, the study of how language is used in context. The boundary between semantics and pragmatics is not clear, and there are many different views on how to draw it. As we go along, we will see that there might not be much value in drawing such a boundary. Perhaps some boundaries are more useful when they are left fuzzy.

In studying language from a cognitive science perspective, studying morphosyntax on its own does not make much sense.2 The relevant field of inquiry is rather the study of the relation between morphosyntax on one hand, and semantics and pragmatics on the other. Let’s call this domain semantax for the sake of having a name for it.3

Grammar

Some important aspects of human language are:

  1. Compositionality: “The meaning of a compound expression is a function of the meanings of its parts and of the way they are syntactically combined.”4 This is usually cited as Frege’s principle. Fodor and Lepore have a nice way to put it: You cannot understand John loves Mary without also understanding Mary loves John.
  2. Recursion: In principle, there is no end to the expressions that can be generated in a language. This is due to the fact that expressions of a given type can be (directly or indirectly) inserted in expressions of the same type.
  3. Structure dependence: Due to physical constraints, language is produced in a linear fashion (you can speak only one word at a time). But the meaning is “read off” from the hierarchical structure. A necessary consequence of this is that items that are related at the meaning level may end up far apart in the actual acoustic/visual signal. Take:

    I saw the man with the telescope.

    The sentence can mean two different things: (1) the speaker has the telescope, or (2) the man has the telescope. The two meanings are determined by the hierarchical structure of the expression: in (2) the telescope is attached to the man, in (1), to saw.

In a very broad sense, a grammar regulates the correspondence between form and meaning in a way that manifests the three properties above.

When we talk about syntactic structure, we mean a particular organisation of the components of an expression. For instance the two meanings of the telescope sentence are related to the following (fairly simplified) structures:

  1. [I [[saw [the man]] [with the telescope]]]
  2. [I [saw [the man [with the telescope]]]]
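Both bracketings linearize to the very same word string, which is the point of structure dependence: one signal, two hierarchies. This can be checked mechanically; a small Python sketch, with the (simplified, and mine) nested-list encoding of the two structures:

```python
def words(tree):
    """Flatten a bracketed structure back to its linear word string."""
    if isinstance(tree, str):
        return tree
    return " ".join(words(t) for t in tree)

# PP attached to the verb phrase: the seeing was with the telescope.
vp_attach = ["I", [["saw", ["the", "man"]], ["with", ["the", "telescope"]]]]
# PP attached inside the object noun phrase: the man has the telescope.
np_attach = ["I", ["saw", [["the", "man"], ["with", ["the", "telescope"]]]]]

print(words(vp_attach))  # I saw the man with the telescope
print(words(np_attach))  # I saw the man with the telescope
```

The two distinct trees yield one identical string, so the hearer must recover the hierarchy; the signal alone does not decide it.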

Another way of thinking about structure, which is more transparent to semantics, is to think of it as a particular organization of operators and arguments, like we do in arithmetic.

\[((3 + 4) \times 5)\]

When we go from the linear string of symbols \(((3 + 4) \times 5)\) to its meaning, which is the value of the expression (= 35), we work out the structure of the expression in such a way that the operator \(+\) is applied to its arguments 3 and 4, and then the operator \(\times\) is applied to its arguments, which are 7, the value of the first operation (recursion!), and 5.
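The recursive evaluation just described can be made concrete. A minimal Python sketch, with the expression encoded as nested (operator, argument, argument) tuples:

```python
def evaluate(expr):
    """Recursively evaluate a nested (operator, left, right) tuple."""
    if isinstance(expr, int):
        return expr  # a bare number is its own value
    op, left, right = expr
    l, r = evaluate(left), evaluate(right)  # recursion: arguments first
    return l + r if op == "+" else l * r

tree = ("*", ("+", 3, 4), 5)  # the structure of ((3 + 4) * 5)
print(evaluate(tree))  # 35
```

Note that the meaning (the value 35) is computed off the tree, not off the linear string: the parenthesization has already been turned into structure before evaluation starts.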

Similarly, for the telescope example: in one reading the operator saw takes as arguments the subject I and the object the man with the telescope. In the other reading, saw takes as its object argument just the noun phrase the man, rather than the entire phrase the man with the telescope, and this time the operator with takes as arguments the noun phrase the telescope and the verb phrase saw the man.5
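The two readings can likewise be written as operator-argument structures and rendered in function notation. A Python sketch, with my own (wildly simplified, as in the text) encoding:

```python
def notate(expr):
    """Render an operator-argument tuple in function notation."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    return op + "(" + ", ".join(notate(a) for a in args) + ")"

# Reading where "with" modifies the seeing (the speaker has the telescope):
vp_reading = ("with", ("saw", "I", "the man"), "the telescope")
# Reading where "with" modifies the man (the man has the telescope):
np_reading = ("saw", "I", ("with", "the man", "the telescope"))

print(notate(vp_reading))  # with(saw(I, the man), the telescope)
print(notate(np_reading))  # saw(I, with(the man, the telescope))
```

As with the arithmetic example, the ambiguity lives entirely in which operator takes which arguments, not in the word string itself.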

Main questions

The major research questions on grammar6 are:

  1. How do we understand and produce language?
  2. How do we acquire the capacity to do so?
  3. How is our language capacity related to other cognitive capacities, such as perception, planning, reasoning, etc.?
  4. How do these relate to our neural substrate?

Cognitive science of language

The above questions are studied in several fields. What distinguishes cognitive science from fields like psychology and linguistics is its emphasis on computational modeling of the phenomena. Cognitive science, at least as we take it in our institute, takes the mind to be an information processing system; therefore the best way to understand it is by building computational models of the phenomena we are interested in, be it a processing question or a learning question.

Warning: Some, or perhaps all, of the readings suggested below might be difficult at this stage. There is no cure for this. There are no textbooks that can be read in a linear way.7 You need to make a couple of passes with low reception and come back to them later.

There are two closely related axes of tension in the field, which I think it is better to meet open-mindedly from the very start, before you find yourself socialized8 into one camp or another.

The first of these is the debate on the relevance of probability (statistics and information theory) to the study of language, particularly syntax. Chomsky is widely considered, at times too crudely I think, to be firmly on the “not relevant” side of the debate. It is nevertheless true that probability is not very popular in mainstream linguistics. But in cognitive science, the debate is almost never about whether but about how these fields are relevant to the scientific study of language. For convenience, I will call this the probability debate.9

Another source of tension in the field concerns the relevance of data-driven approaches (corpus linguistics, deep neural nets, machine learning, statistical language models, etc.) to the scientific study of language, a debate that seems to have gone nuclear since the appearance of LLMs. This debate naturally comes intertwined with discussions on the goals of science, especially the distinction between prediction and explanation. For convenience, I will call this the data debate.

  • I suggest starting with Chomsky (1957).10 This sets the frame of discussion for almost everything that follows, and remarkably – considering its impact – requires almost no background.
  • On how probability fits into the study of language: Manning (2003) and Jurafsky (2003).
  • Readings relevant for both debates are Pereira (2000) and Lappin and Shieber (2007).
  • Norvig (2011) addresses the latest views of Chomsky on the relevance of (statistical) language models to the study of language. Although Norvig’s reading of Chomsky’s “claims” is rather superficial,11 I think it is still a good read.
  • People sometimes use the term computational linguistics to differentiate the scientific study of language through computational modeling from the engineering field of NLP. But be warned that this is not a well-established terminology. See Schubert (2020) for a balanced view.
  • For a taste of the data debate focused on deep neural networks: Pater (2019) and Rawski and Heins (2023).
  • Lake et al. (2017) and the commentaries on it are a good place to get introduced to the data debate within the broader cognitive science.12
  • Whenever you hit an unfamiliar linguistic concept or term, consult Bender (2022). This will work fine as a practical reference book until you develop a more serious interest in linguistics and/or go on with language research.
  1. Douglas Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid (Basic Books, 1979), Ch. 2. 

  2. The same does not apply to phonology, which is a very interesting field of study from a cognitive science perspective on its own. 

  3. John R. Ross offered the term “semantax”, but it seems it did not catch on. Mark Steedman has a project so named, though. 

  4. Barbara Partee, “Compositionality”, in Varieties of Formal Semantics, 1984. 

  5. I am wildly simplifying for the sake of discussion. 

  6. I use the term grammar in a fairly broad sense as the totality of the system that mediates between form and meaning. The term has a well-defined technical sense in formal language theory and the theory (and applications) of parsing. 

  7. There actually are, but they are too opinionated. Also be warned about “handbook” entries that focus on the personal contributions of the author. 

  8. “Just a collection of buzz words whose appropriate use you are socialized into”, says Gerald Gazdar describing his not so bright opinion about a particular version of Chomskyan generative grammar. 

  9. Closely related to this is the issue of categorical vs. gradient phenomena in language. 

  10. Chapters 1-4, 6, and 8. Chapter 9 is also of interest for Chomsky’s take on the syntax-semantics relation. 

  11. Objectors to Chomsky on this count largely concentrate on his observation that the grammatical English sentence Colorless green ideas sleep furiously has zero probability on crude frequency-based models without smoothing, which were the state of the art at the time, and the corollary he draws from this fact: that statistical regularity has no place in defining the notion of “grammatical sentence in language \(L\)”. Chomsky’s attitude toward the place of probability and statistics in language science is not as crude as his objectors appear to think. For instance, Norvig seems to have missed footnote 4 on p. 17 of Syntactic Structures, and his remarks on syntax, semantics, and language use in Chapter 9. See Chomsky’s preface to the 1975 edition of Logical Structure of Linguistic Theory, p. 3. Also see the other LSLT quotes in Yang (2008) for a more accurate view of Chomsky’s stance on the subject. 

  12. Useful terminology for the data debate includes Lake et al. (2017)’s pattern recognition vs. model building, and Frank and Degen (2023)’s input-output models vs. internally meaningful models.