Date: September 23, 2017
Location: room 32-155, building 32 (Stata Center), MIT
Address: 32 Vassar St., Cambridge, MA.
Note: Dinner will take place on the 3rd floor in MIT building 46 (Brain and Cognitive Sciences). The address is 43 Vassar St., Cambridge, MA.
|9:00-10:00||The Pre-history of Simplicity of Grammar (Noam Chomsky)|
|10:00-10:45||A generative model of phonotactics (Richard Futrell & Tim O'Donnell)|
|11:15-12:00||The right notion of simplicity and its use in choosing between theories of UG (Ezer Rasin & Roni Katzir)|
|12:00-1:30||Lunch (provided) + poster session|
|1:30-2:15||Inducing phonological rules: Perspectives from Bayesian program learning (Kevin Ellis & Tim O'Donnell)|
|2:15-3:00||Productive and item-specific knowledge in language processing (Emily Morgan & Roger Levy)|
|3:30-4:15||Beyond simplicity: modeling generalization and substantive biases in phonology (Adam Albright)|
|4:15-5:00||Learning to filter non-basic clauses for argument structure acquisition (Laurel Perkins & Naomi Feldman)|
|5:00-5:45||One model for the learning of language (Josh Tenenbaum & Steven Piantadosi)|
|6:00-7:00||Dinner @ MIT building 46|
Abstracts of the talks
The Pre-history of Simplicity of Grammar, Noam Chomsky
The first relevant discussion of simplicity of grammar, to my knowledge, was in my “Morphophonemics of Modern Hebrew” (1949), with a sketch of syntax and a detailed system of ordered rules to generate phonological representations. The main goal was to show that the particular rule-ordering was simpler than alternatives, an effort motivated in part by Nelson Goodman’s work on constructional systems and his emphasis on simplicity as explanatory power. The Israeli logician Yehoshua Bar-Hillel suggested that adoption of forms closer to proto-Semitic origins might yield a simpler system, which turned out to be correct (Chomsky 1951, published in 1979), an insight that has had significant resonance since. The simplicity measure adopted was the number of symbols, under notational transformations designed to capture alleged general linguistic principles, mostly those still familiar (parentheses, etc.). The general framework was the format-evaluation measure system. Crystallization of the Principles & Parameters framework shifted simplicity concerns in different directions, already explored from the earliest attempts to abstract general principles from descriptive rule systems, from cyclicity in determining stress patterns (Chomsky, Halle, Lukoff, 1955) to conditions on transformational rules (Chomsky, 1962; Ross, 1967) and on to the present.
A generative model of phonotactics, Richard Futrell & Tim O'Donnell
We present a probabilistic generative model of phonotactics, the set of well-formed phoneme sequences in a language. Unlike most computational models of phonotactics, which focus on designing scoring functions to rule out ill-formed strings, we take a fully generative approach, modeling a process where well-formed strings are built up out of subparts by phonologically-informed structure building operations. The modeling approach induces a phonotactic grammar for a language along with structural descriptions of wordforms; the model instantiates a pressure for parsimony in both the grammar and the descriptions of wordforms. The induced grammar builds words out of subparts that are derived from a generative process for phonemes structured as an and-or graph, based on concepts of feature hierarchy from generative phonology. Subparts are combined in a way that allows tier-based feature interactions. We evaluate our models’ ability to capture phonotactic distributions in the lexicons of 14 languages. Our full model robustly assigns higher probabilities to held-out forms than a sophisticated general sequence model for all languages. We also present novel information-theoretic analyses that probe model behavior in more detail.
The right notion of simplicity and its use in choosing between theories of UG, Ezer Rasin & Roni Katzir
We review three notions of simplicity that have been used in work on grammar learning: simplicity of grammar (as in early Generative Grammar), simplicity of accounting for the data (as in the Subset Principle), and compression-based simplicity, balancing between generality and the need to fit the data (as in MDL and Bayesian approaches). We discuss the significance of the choice and some arguments in favor of the third, compression-based notion of simplicity. We then show how the compression-based notion can help us choose between competing grammatical architectures in some cases where adult judgments alone are insufficiently informative. We illustrate this potential with the question of constraints on underlying representations (also known as morpheme-structure constraints), which were central to early generative phonology but rejected in Optimality Theory. Evidence bearing directly on the question of whether the grammar uses constraints on URs has been scarce. We show, however, that if the child is a compression-based learner, then they will succeed in learning patterns such as English aspiration if they can use constraints on URs but run into difficulties otherwise.
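As a rough illustration of the compression-based notion, the learner's objective can be written in standard MDL notation. This is a sketch under common textbook assumptions, not necessarily the formulation used in the talk; the symbols $G$, $D$, $|G|$ and $|D : G|$ are introduced here for illustration only.

```latex
% Requires amsmath for \operatorname*.
% |G|     : length in bits of the encoding of grammar G (assumed notation)
% |D : G| : length in bits of the data D when encoded with the help of G
\[
  G^{*} \;=\; \operatorname*{arg\,min}_{G} \Bigl( |G| + |D : G| \Bigr)
\]
% Bayesian reading, taking code lengths as negative log probabilities:
\[
  |G| = -\log_2 P(G), \qquad |D : G| = -\log_2 P(D \mid G)
\]
```

On this reading, minimizing $|G|$ alone roughly corresponds to simplicity of grammar, minimizing $|D : G|$ alone roughly corresponds to the tightest fit to the data, and the sum trades the two off against each other.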
Inducing phonological rules: Perspectives from Bayesian program learning, Kevin Ellis & Tim O'Donnell
How do linguists come up with phonological rules, how do kids learn artificial grammars, and how does one acquire Pig Latin? The solutions to these problems share a common representation, which we show can be modeled as a program, and the corresponding learning problems modeled as program induction. This framing lets us induce grammars by applying ideas from Bayesian Program Learning, which combines program synthesis techniques with a compression-based inductive bias. This lets the models capture phonological phenomena like vowel harmony or stress patterns and learn synthetic grammars used in prior studies of artificial grammar learning. Going beyond individual grammar learning problems, we consider the problem of jointly inferring many related rule systems. By solving many textbook phonology problems, we can ask the model what kind of inductive bias best explains the attested phenomena.
Productive and item-specific knowledge in language processing, Emily Morgan & Roger Levy
The ability to generate novel utterances compositionally using productive knowledge is a hallmark property of human language. At the same time, languages contain non-compositional or idiosyncratic items, such as irregular verbs, idioms, etc. In this talk, we ask how language processing achieves a balance between these two systems (productive and item-specific). Specifically, we focus on the case of binomial expressions of the form “X and Y”, whose word order preferences (e.g. bread and butter/#butter and bread) are potentially determined by both productive and item-specific knowledge. We show that language processing makes rational use of both the ability to generalize and the ability to learn idiosyncrasies, demonstrating that ordering preferences for these expressions indeed arise in part from productive violable constraints on the phonological, semantic, and lexical properties of the constituent words, but that expressions also have their own idiosyncratic preferences. We further demonstrate that processing of these expressions relies gradiently upon both productive and item-specific knowledge as a function of expression frequency, with lower frequency items primarily recruiting productive knowledge and higher frequency items relying more upon item-specific knowledge. We provide evidence for this gradient, frequency-dependent trade-off of productivity and item-specificity using behavioral experiments, corpus data, and computational modeling.
Beyond simplicity: modeling generalization and substantive biases in phonology, Adam Albright (joint work with Youngah Do, Hong Kong University)
Within generative phonology, an evaluation metric that favors simpler rules has two important consequences for the grammars that learners favor, which in turn lead to two important empirical predictions. The first consequence is that learners should generally favor the broadest rule that is consistent with the data, since broader rules are usually simpler (fewer restrictions), especially when paired with a theory of phonological features and natural classes. This predicts that speakers should generalize phonological patterns to unseen segments. The second consequence is that learners should favor certain processes over others, based on the number of symbols needed to encode them. This has the potential to predict substantive phonological biases: some processes are more prevalent than others because learners acquire them more quickly or with less data. In this talk, we argue that a naive simplicity metric is neither necessary nor sufficient to model how human learners generalize phonological patterns. We present the results of two artificial grammar learning experiments, testing whether learners automatically generalize phonological patterns to unseen segments, and whether featurally equivalent patterns are learned equally well. The results show partial generalization of alternations to unseen segments, at rates that are not consistent with a single fully general rule, nor with the most specific possible rules. We show that partial generalization can be modeled with a simple MaxEnt grammar, even without general markedness constraints. However, the model predicts too little generalization. By including general markedness constraints and a bias against assigning high weights to very specific constraints, the model is able to achieve a better fit to the experimental results. 
The results also show that participants systematically prefer certain alternations, such as final devoicing and intervocalic voicing, over others, such as final nasalization and intervocalic spirantization. These substantive biases do not follow in any straightforward way from a simplicity bias, confirming a point anticipated by Chomsky and Halle (1968). We show that these biases are easily modeled in a MaxEnt grammar, by penalizing certain weightings of faithfulness constraints more than others, in accordance with the P-Map hypothesis (Steriade 2001, Wilson 2006).
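For readers unfamiliar with MaxEnt grammars, the general shape of such a model can be sketched as follows. This is the standard formulation from the MaxEnt phonology literature, given here as an assumption rather than as the exact model presented in the talk; the symbols $w_i$, $C_i$, $\mu_i$ and $\sigma_i$ are illustrative.

```latex
% Probability of output candidate y for input x, given constraint
% violation counts C_i(x, y) and non-negative weights w_i:
\[
  P(y \mid x) \;=\;
  \frac{\exp\bigl(-\sum_i w_i\, C_i(x, y)\bigr)}
       {\sum_{y'} \exp\bigl(-\sum_i w_i\, C_i(x, y')\bigr)}
\]
% Learning objective: log-likelihood of the observed pairs (x_j, y_j)
% minus a Gaussian penalty on each weight (mu_i, sigma_i are assumed
% per-constraint prior parameters):
\[
  \hat{w} \;=\; \operatorname*{arg\,max}_{w}
  \Bigl[\, \sum_j \log P(y_j \mid x_j)
  \;-\; \sum_i \frac{(w_i - \mu_i)^2}{2\sigma_i^2} \Bigr]
\]
```

Under this sketch, one way to penalize certain weightings more than others is to give different constraints different values of $\mu_i$ or $\sigma_i$: a smaller $\sigma_i$ makes it costlier for $w_i$ to move away from $\mu_i$.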
Learning to filter non-basic clauses for argument structure acquisition, Laurel Perkins & Naomi Feldman
“Non-basic” clauses are problematic for argument structure acquisition. For example, a child hearing What did Amy fix? might not recognize that what stands for the direct object of fix, and might think that fix is occurring without a direct object. Previous literature has proposed that children might filter non-basic clauses out of the data used for verb learning (e.g. Pinker, 1984; Gleitman, 1990; Lidz & Gleitman, 2004). However, this assumes that children can identify which data to filter. We demonstrate that it is possible for learners to filter out non-basic clauses in order to infer verb transitivity, without knowing in advance which clauses are non-basic. Our model instantiates a learner that considers the possibility that it mis-parses some of the sentences it hears. By doing so, the model learns to filter out those parsing errors and correctly infers transitivity for the majority of 50 frequent verbs in child-directed speech.
One model for the learning of language, Josh Tenenbaum & Steven Piantadosi (joint work with Yuan Yang)
A major target of linguistics and cognitive science is to understand what class of learning systems can acquire the key structures of natural language. Until recently, the computational requirements of language have been used to argue that learning is impossible without a highly constrained hypothesis space. Here, we describe an implemented learning system that is maximally unconstrained, operating over the space of all computations, and is able to acquire several of the key structures present in natural language from positive evidence alone. We demonstrate this by providing the same learning model with data from 26 distinct formal languages which have been argued to capture key features of language, or which have been studied in experimental work on artificial language learning. The model is able to successfully construct the latent system generating the observed strings in all cases, including regular, context-free, and context-sensitive formal languages. This approach allows us to delineate several aspects of acquisition that are due to problems inherent in the structure of the input, and several which must result from the peculiarities of human cognitive systems. Overall, this approach develops the concept of factorized programs in Bayesian program induction in order to help manage the complexity of learned representations and provides a workable theoretical framework from which to interpret empirical findings.
Posters
Reconciling Minimum Description Length with Grammar-Independent Complexity Measures, Jon Rawski, Aniello De Santo & Jeffrey Heinz
Economy in Grammar Learning, Jeffrey Watumull, Noam Chomsky & Ian Roberts
Simplicity and complexity in the production of manner/result meanings: a preliminary investigation, Anne C Mills & Sherry Yong Chen
An evolutionary effect of simplicity bias on the typology of logical operators, Aron Hirsch & Ezer Rasin