the computation and language lab
at the University of Rochester


overview of our research and approach


information on current and prospective members


our publications are publicly available


the lab wiki and available code, data sets

• about •

We study the basic computational processes involved in human language.

overview: colala's research spans computational and experimental approaches to language acquisition, language processing, and design features of language. Work draws on corpus methods, behavioral experiments with adults and children, computational and mathematical modeling, as well as state-of-the-art techniques in machine learning, information theory, statistics, computer science, and linguistics. We also work with an indigenous South American population, the Tsimane' of Bolivia, to study universal aspects of human language, numerical cognition, and cognitive development.

philosophy: Most research in colala centers on creating fully-formalized (implemented) rational theories that aim to address deep problems about human language. We use empirical data from adults and children to test these theories, and work to develop open tools, data, and results.

topics: (with example papers)

facilities: Our experimental facilities include rooms for testing adults and children with eye-tracking and touchscreens. Our computational facilities include 96 dedicated cores at the University Bluehive cluster for running models and analysis, as well as several in-house web, corpus, GPU, and HPC servers.

• positions •

graduate students: Prospective graduate students should apply to the Department of Brain and Cognitive Sciences at the University of Rochester and contact Steve. Applications are due in early January each year. Graduate students should be hard-working, full of ideas, and capable of self-directed research. Students interested in the lab should bring strong quantitative skills including knowledge of statistics and programming, as well as serious background in some of the following areas: machine learning, mathematics, theoretical computer science, mathematical logic, cognitive development, language acquisition, language processing, or linguistics.

undergraduates: Undergraduates must be independent, creative, and self-motivated. Our undergraduate research opportunities include work on current large-scale projects, as well as opportunities for self-directed research. Undergraduates will typically have a background in BCS, math, computer science, or linguistics.

• graduate students •

Frank Mollica

• undergraduates •

Hassler Thurston works on LOTlib.

Samay Kapadia works on LOTlib.

Matt McGovern works on kelpy.

• lab manager •

Eric Bigelow

• collaborators •

Amanda Yung collaborates on kelpy

Kyle Mahowald collaborates on information theory and the lexicon

Julian Jara-Ettinger collaborates on studies with the Tsimane'

Richard Futrell collaborates on information theory and language, ngrampy, and Pirahã

Laura Stearns collaborates on the Pirahã corpus

Holly Palmeri collaborates at the Rochester Baby Lab

• collaborating labs and PIs •

Researchers in the lab work in close collaboration with a number of labs at Rochester and around the country:

• principal investigator •

The lab's PI is Steve Piantadosi. He is interested in understanding the computational mechanisms supporting human language learning and use. He received his Ph.D. from MIT in 2011 in Brain and Cognitive Sciences, and spent three years as a postdoc at Rochester with Dick Aslin before becoming a faculty member in 2014.

• papers •

under review
[35]S. T. Piantadosi, R. Jacobs, "Four problems solved by the inductive Language of Thought", under review. [email]
[34]S. T. Piantadosi, C. Kidd, "Extraordinary intelligence and the care of newborns", under review. [email]
[33]S. T. Piantadosi, J. Tenenbaum and N. Goodman, "The logical primitives of thought: Empirical foundations for compositional cognitive models", under review. [email]
[32]S. T. Piantadosi et al., "Lexical prescriptivism reduces communicative efficiency", under review. [email]
[31]S. T. Piantadosi, R. Aslin, "Compositional reasoning in early childhood", under review. [pdf]
[30]S. T. Piantadosi, "A rational analysis of the approximate number system", under review. [pdf]
[29]S. T. Piantadosi, H. Palmeri and R. Aslin, "Limits on composition of conceptual operations in 9-month-olds", under review. [pdf]
[28]S. T. Piantadosi, N. Goodman and J. Tenenbaum, "Modeling the acquisition of quantifier semantics: a case study in function word learnability", under review. [pdf]
[27]K. Mahowald et al., "Lexical clustering in efficient language design", under review. [email]
[26]J. Jara-Ettinger et al., "Native Amazonian Children Forego Egalitarianism When They Learn to Count", under review. [email]
[25]J. Jara-Ettinger et al., "Mastery of the logic of cardinality is not the result of mastery of counting: Evidence from the Tsimane' of Bolivia", under review. [email]
[24]F. Mollica, S. T. Piantadosi, "Towards semantically rich and recursive word learning models", Proceedings of the Cognitive Science Society, submitted. [pdf]
[23]F. Mollica, S. T. Piantadosi and M. K. Tanenhaus, "The perceptual foundation of linguistic context", Proceedings of the Cognitive Science Society, submitted. [pdf]
[22]P. Hemmer et al., "Inferring the Tsimane's use of color categories from recognition memory", Proceedings of the Cognitive Science Society, submitted. [pdf]
[21]S. Alonso-Diaz, J. Cantlon and S. T. Piantadosi, "Cognition in reach: continuous statistical inference in optimal motor planning", Proceedings of the Cognitive Science Society, submitted. [pdf]
in press
[20]S. T. Piantadosi, "Efficient estimation of Weber's W", Behavior Research Methods, in press. [pdf]
[19]S. T. Piantadosi, B. Hayden, "Utility-free models of binomial choice can replicate predictions of utility models in many conditions", Frontiers in Neuroscience, in press. [email]
[18]J. Cantlon et al., "The origins of counting algorithms", Psychological Science, in press. [email]
[17]S. T. Piantadosi, C. Kidd and R. Aslin, "Rich analysis and rational models: Inferring individual behavior from infant looking data", Developmental Science, 2014, pp. 1-16. [pdf]
[16]S. T. Piantadosi, E. Gibson, "Quantitative Standards for Absolute Linguistic Universals", Cognitive Science, vol. 38, no. 4, 2014, pp. 736-756. [pdf] [doi]
[15]S. T. Piantadosi, J. Jara-Ettinger and E. Gibson, "Children's learning of number words in an indigenous farming-foraging group", Developmental Science, vol. 17, no. 4, 2014, pp. 553-563. [pdf] [doi]
[14]S. T. Piantadosi, "Zipf’s word frequency law in natural language: A critical review and future directions", Psychonomic Bulletin & Review, vol. 21, 2014, pp. 1112-1130. [pdf] [doi]
[13]C. Kidd, S. T. Piantadosi and R. N. Aslin, "The Goldilocks Effect in Infant Auditory Attention", Child Development, vol. 85, 2014, pp. 1795-1804. [pdf] [doi]
[12]S. T. Piantadosi, H. Tily and E. Gibson, "Information content versus word length in natural language: A reply to Ferrer-i-Cancho and Moscoso del Prado Martin [arXiv:1209.1751]", ArXiv e-prints, Jul. 2013. [pdf]
[11]S. T. Piantadosi et al., "A corpus analysis of Pirahã grammar: An investigation of recursion", Talk presented at the LSA (by E. Gibson)., 2012. [pdf]
[10]S. T. Piantadosi, J. Tenenbaum and N. Goodman, "Bootstrapping in a language of thought: a formal model of numerical concept learning", Cognition, vol. 123, 2012, pp. 199-217. [pdf]
[9]C. Kidd, S. T. Piantadosi and R. Aslin, "The Goldilocks Effect: Human Infants Allocate Attention to Visual Sequences That Are Neither Too Simple Nor Too Complex", PLoS ONE, 2012. [pdf]
[8]S. T. Piantadosi, H. Tily and E. Gibson, "Word lengths are optimized for efficient communication", Proceedings of the National Academy of Sciences, vol. 108, no. 9, 2011, pp. 3526. [pdf]
[7]S. T. Piantadosi, H. Tily and E. Gibson, "Reply to Reilly and Kean: Clarifications on word length and information content", Proceedings of the National Academy of Sciences, vol. 108, no. 20, 2011, pp. E109. [pdf]
[6]S. T. Piantadosi, "Learning and the language of thought.", Ph.D. dissertation, MIT, 2011. [pdf]
[5]S. T. Piantadosi, H. Tily and E. Gibson, "The communicative function of ambiguity in language", Cognition, vol. 122, 2011, pp. 280-291. [pdf]
[4]S. T. Piantadosi, J. Tenenbaum and N. Goodman, "Beyond Boolean logic: exploring representation languages for learning complex concepts", in Proceedings of the 32nd Annual Conference of the Cognitive Science Society, 2010. [pdf]
[3]C. Kidd, S. T. Piantadosi and R. Aslin, "The Goldilocks Effect: Infants' preference for visual stimuli that are neither too predictable nor too surprising", in Proceedings of the 32nd Annual Conference of the Cognitive Science Society, 2010. [pdf]
[2]S. T. Piantadosi, H. Tily and E. Gibson, "The communicative lexicon hypothesis", in Proceedings of the 31st Annual Conference of the Cognitive Science Society, 2009, pp. 2582-2587. [pdf]
[1]S. T. Piantadosi et al., "A Bayesian model of the acquisition of compositional semantics", in Proceedings of the 30th Annual Conference of the Cognitive Science Society, 2008. [pdf]

• libraries and software •

A number of research software packages are actively developed by colala and available under the GNU General Public License:

  • LOTlib is a library for modeling learning complex concepts as compositions of primitives in a language of thought. Please note that this is still under heavy development. GPL3

  • kelpy (kid experimental library in python) is a library for running simple psychology experiments in Python. It is intended primarily for making simple animated displays with simple responses for baby and child experiments. It is built on top of pygame and is under heavy development at present. Support for Tobii eye-tracking, as well as a large library of public-domain characters, has recently been added. Email me for updates / questions. GPL3

  • ngrampy is a Python library for manipulating large Google ngram data sets, and computing measures such as average surprisal in context from Piantadosi, Tily, & Gibson (2011). Code is included to replicate and extend that finding. GPL3

  • SimpleMPI provides a quick and easy wrapper for mpi4py that allows parallel mapping with mpi (mpich2/openmpi). GPL3

  • WeberMCMC provides Bayesian data analysis for estimating Weber ratios. In practice, incorporating the reliability of an estimate of W into statistics allows for more power and correctness. GPL3

  • GPUropolis (DEFUNCT) performs Bayesian inference via Metropolis-Hastings on symbolic expressions, currently only for symbolic regression but under heavy development. Code is highly parallelized using CUDA to run on graphics (GPGPU) hardware. This enables use of tens of thousands of chains in parallel. Includes several classic scientific data sets to test with. GPL3
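
As a rough illustration of what Bayesian estimation of a Weber ratio W involves, here is a minimal sketch. It is not WeberMCMC's code or API. It assumes a commonly used Gaussian model of number discrimination, in which the probability of correctly comparing n1 and n2 is Phi(|n1 - n2| / (W * sqrt(n1^2 + n2^2))), and it uses a grid approximation with a uniform prior rather than MCMC. The function names, the grid, and the trial data are all hypothetical.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_correct(n1, n2, w):
    """Assumed model: probability of a correct comparison of n1 vs. n2."""
    return phi(abs(n1 - n2) / (w * math.sqrt(n1 ** 2 + n2 ** 2)))


def grid_posterior(trials, grid):
    """Posterior over W on a grid, assuming a uniform prior over grid points.

    trials: list of (n1, n2, correct) tuples, with correct a bool.
    grid:   list of candidate W values (all > 0).
    """
    log_post = []
    for w in grid:
        ll = 0.0
        for n1, n2, correct in trials:
            # Clamp to avoid log(0) for extreme probabilities.
            p = min(max(p_correct(n1, n2, w), 1e-12), 1.0 - 1e-12)
            ll += math.log(p if correct else 1.0 - p)
        log_post.append(ll)
    # Normalize in a numerically stable way.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return [x / z for x in weights]


# Hypothetical data: (n1, n2, answered correctly?)
trials = [(10, 20, True), (10, 12, False), (8, 16, True), (9, 10, True), (12, 14, False)]
grid = [0.05 * k for k in range(1, 41)]  # candidate W values from 0.05 to 2.0
posterior = grid_posterior(trials, grid)
posterior_mean = sum(w * p for w, p in zip(grid, posterior))
```

Keeping the whole posterior, rather than a single point estimate, is what makes the reliability of an estimate of W available to downstream statistics.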

• data and code •

Data from all projects completed and in progress is available upon request.

  • English Surprisal estimates -- data from Piantadosi, Tily, & Gibson (2011), giving the average in-context surprisal of each word. Note that we now recommend use of ngrampy to compute this, using the free publicly available data from Google Books. Full data from the paper is available upon request.

  • PyMC model and subject data -- from Piantadosi, Kidd, and Aslin (2014), implementing Bayesian, partially-pooled by-subject looking curves relating the surprisal of observed events to infants' probability of terminating attention.
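
For readers unfamiliar with the measure in the first data set above, the following minimal sketch shows one way to compute the average in-context surprisal of each word, using the preceding word as the context. It illustrates the quantity only and is not ngrampy's API or the paper's pipeline. The function name and the toy token sequence are hypothetical, and probabilities are plain maximum-likelihood estimates from bigram counts with no smoothing.

```python
import math
from collections import Counter, defaultdict


def average_surprisal(tokens):
    """Average of -log2 P(word | previous word) over each word's occurrences."""
    bigrams = Counter(zip(tokens, tokens[1:]))

    context_totals = Counter()  # how often each context word precedes anything
    word_totals = Counter()     # how often each word appears with a context
    for (context, word), n in bigrams.items():
        context_totals[context] += n
        word_totals[word] += n

    # Sum surprisal over every occurrence of each word, then average.
    summed = defaultdict(float)
    for (context, word), n in bigrams.items():
        summed[word] += -n * math.log2(n / context_totals[context])
    return {word: summed[word] / word_totals[word] for word in summed}


# Toy example: "a" is fully predictable after "b", "c" is rare after "a".
result = average_surprisal("a b a b a c".split())
```

In this toy sequence "a" always follows "b", so its average surprisal is 0 bits, while "c" follows "a" only one time in three and so carries log2(3) bits.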

Meliora Hall
University of Rochester, River Campus
Rochester, NY