the computation and language lab
at the University of Rochester


• about •

We study the basic computational processes involved in human language.

overview: colala's research includes computational and experimental approaches to language acquisition, language processing, and understanding the core features of language. Our work draws on corpus methods, behavioral experiments with adults and children, computational and mathematical modeling, as well as state-of-the-art techniques in machine learning, information theory, statistics, computer science, and linguistics. We also work with an indigenous South American population, the Tsimane' of Bolivia, to study universal aspects of human language, numerical cognition, and cognitive development.

philosophy: Most research in colala centers on creating fully-formalized (implemented) theories that aim to address basic questions about human language and cognition with convergent evidence across methods. We work to develop open tools, data, and results.

topics and selected papers:

facilities: Our experimental facilities include rooms for testing adults and children. Our computational facilities include 96 dedicated cores on the University's BlueHive cluster for running models and analyses, as well as several in-house web, corpus, GPU, and HPC servers.

• open positions •

graduate students: Prospective graduate students should apply to the Department of Brain and Cognitive Sciences at the University of Rochester and contact Steve. Applications are due in early January each year. Graduate students should be hard-working, full of ideas, and capable of self-directed research. Students interested in the lab should bring strong quantitative skills including knowledge of statistics and programming, as well as serious background in some of the following areas: machine learning, mathematics, theoretical computer science, mathematical logic, cognitive development, language acquisition, language processing, or linguistics.

undergraduates: Undergraduates must be independent, creative, and self-motivated. Our undergraduate research opportunities include work on current large-scale projects, as well as opportunities for self-directed research. Undergraduates will typically have a background in BCS, math, computer science, or linguistics.

• graduate students •

Frank Mollica

Sam Cheyette

• lab manager •

Jenna Register

• principal investigator •

The lab's PI is Steve Piantadosi. He is interested in understanding the computational mechanisms supporting human language learning and use.

• collaborating students & staff •

Amanda Yung collaborates on kelpy

Kyle Mahowald collaborates on information theory and the lexicon

Julian Jara-Ettinger collaborates on studies with the Tsimane'

Richard Futrell collaborates on information theory and language, ngrampy, and Pirahã

Laura Stearns collaborates on the Pirahã corpus

Holly Palmeri collaborates at the Rochester Baby Lab

• collaborating labs & PIs •

Researchers in the lab work in close collaboration with a number of labs at Rochester and around the country:

Celeste Kidd
The Rochester Baby lab

Dick Aslin
The Rochester Baby lab

Jessica Cantlon
The Concepts, Actions, and Objects lab

Ben Hayden
The Hayden lab at Rochester

Michael Tanenhaus
The Tanenhaus Lab

Noah Goodman

Ted Gibson
Tedlab, the language lab at MIT

Ev Fedorenko

Dan Everett

• papers •

under review
[48]S. T. Piantadosi, "The computational origin of representation and conceptual change", under review. [pdf]
[47]S. T. Piantadosi, J. Cantlon, "True Numerical Cognition in the Wild", under review. [email]
[46]S. T. Piantadosi, E. Fedorenko, "Infinitely productive language can arise from chance under communicative pressure", under review. [email]
[45]S. T. Piantadosi et al., "Lexical prescriptivism reduces communicative efficiency", under review. [email]
[44]S. T. Piantadosi, H. Palmeri and R. Aslin, "Limits on composition of conceptual operations in 9-month-olds", under review. [pdf]
[43]K. Mahowald et al., "Lexical clustering in efficient language design", under review. [email]
[42]I. Dautriche et al., "Wordform similarity increases with semantic similarity: an analysis of 101 languages", under review. [email]
[41]S. T. Piantadosi, "A rational analysis of the approximate number system", Psychonomic Bulletin and Review, 2016, pp. 1-10. [pdf] [doi]
[40]S. T. Piantadosi, J. Tenenbaum and N. Goodman, "The logical primitives of thought: Empirical foundations for compositional cognitive models", Psychological Review, 2016. [pdf]
[39]S. T. Piantadosi, R. Jacobs, "Four problems solved by the probabilistic Language of Thought", Current Directions in Psychological Science, vol. 25, 2016, pp. 54-59. [pdf]
[38]S. T. Piantadosi, C. Kidd, "Extraordinary intelligence and the care of infants", Proceedings of the National Academy of Sciences, vol. 113, no. 25, 2016. [pdf]
[37]S. T. Piantadosi, C. Kidd, "Endogenous or exogenous? The data don’t say (Commentary on Han, Musolino, & Lidz 2016)", Proceedings of the National Academy of Sciences, vol. 113, no. 20, 2016. [pdf]
[36]S. T. Piantadosi, "Efficient estimation of Weber's W", Behavior Research Methods, vol. 48, 2016, pp. 42-52. [pdf]
[35]S. T. Piantadosi, R. Aslin, "Compositional reasoning in early childhood", PLOS ONE, 2016. [pdf]
[34]M. C. Overlan, R. A. Jacobs and S. T. Piantadosi, "A Hierarchical Probabilistic Language-of-Thought Model of Human Visual Concept Learning", in Proceedings of the Cognitive Science Society, 2016. [pdf]
[33]L. Martí et al., "What determines human certainty?", in Proceedings of the Cognitive Science Society, 2016. [pdf]
[32]J. Jara-Ettinger et al., "Mastery of the logic of natural numbers is not the result of mastery of counting: Evidence from late counters", Developmental Science, 2016. [pdf]
[31]R. Futrell et al., "A Corpus Investigation of Syntactic Embedding in Pirahã", PLOS ONE, 2016. [pdf]
[30]E. J. Bigelow, S. T. Piantadosi, "A large dataset of generalization patterns in the number game", Journal of Open Psychology Data, vol. 4, 2016. [pdf] [doi]
[29]E. J. Bigelow, S. T. Piantadosi, "Inferring priors in compositional cognitive models", in Proceedings of the Cognitive Science Society, 2016. [pdf]
[28]S. T. Piantadosi, B. Hayden, "Response: 'Commentary: Utility-free heuristic models of two-option choice can mimic predictions of utility-stage models under many conditions'", Frontiers in Neuroscience, vol. 9, no. 299, 2015. [pdf] [doi]
[27]S. T. Piantadosi, "Problems in the philosophy of mathematics: A view from cognitive science", in Mathematics, Substance and Surmise: Views on the Meaning and Ontology of Mathematics, E. Davis, P. J. Davis, Eds., Springer, 2015. [pdf]
[26]S. T. Piantadosi, B. Hayden, "Utility-free models of binomial choice can replicate predictions of utility models in many conditions", Frontiers in Neuroscience, 2015. [pdf]
[25]M. Pelz, S. T. Piantadosi and C. Kidd, "The dynamics of idealized attention in complex learning environments", in The 5th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, 2015. [pdf]
[24]F. Mollica, S. T. Piantadosi, "Towards semantically rich and recursive word learning models", in Proceedings of the Cognitive Science Society, 2015. [pdf]
[23]F. Mollica, S. T. Piantadosi and M. K. Tanenhaus, "The perceptual foundation of linguistic context", in Proceedings of the Cognitive Science Society, 2015. [pdf]
[22]J. Jara-Ettinger et al., "Native Amazonian Children Forego Egalitarianism When They Learn to Count", Developmental Science, 2015. [pdf] [doi]
[21]P. Hemmer et al., "Inferring the Tsimane's use of color categories from recognition memory", in Proceedings of the Cognitive Science Society, 2015. [pdf]
[20]J. Cantlon et al., "The origins of counting algorithms", Psychological Science, 2015. [pdf]
[19]S. Alonso-Diaz, J. Cantlon and S. T. Piantadosi, "Cognition in reach: continuous statistical inference in optimal motor planning", in Proceedings of the Cognitive Science Society, 2015. [pdf]
[18]S. T. Piantadosi, C. Kidd and R. Aslin, "Rich analysis and rational models: Inferring individual behavior from infant looking data", Developmental Science, 2014, pp. 1-16. [pdf]
[17]S. T. Piantadosi, E. Gibson, "Quantitative Standards for Absolute Linguistic Universals", Cognitive Science, vol. 38, no. 4, 2014, pp. 736-756. [pdf] [doi]
[16]S. T. Piantadosi, J. Jara-Ettinger and E. Gibson, "Children's learning of number words in an indigenous farming-foraging group", Developmental Science, vol. 17, no. 4, 2014, pp. 553-563. [pdf] [doi]
[15]S. T. Piantadosi, "Zipf’s word frequency law in natural language: A critical review and future directions", Psychonomic Bulletin & Review, vol. 21, 2014, pp. 1112-1130. [pdf] [doi]
[14]C. Kidd, S. T. Piantadosi and R. N. Aslin, "The Goldilocks Effect in Infant Auditory Attention", Child Development, vol. 85, 2014, pp. 1795-1804. [pdf] [doi]
[13]S. T. Piantadosi, H. Tily and E. Gibson, "Information content versus word length in natural language: A reply to Ferrer-i-Cancho and Moscoso del Prado Martin [arXiv:1209.1751]", ArXiv e-prints, 2013. [pdf]
[12]S. T. Piantadosi et al., "A corpus analysis of Pirahã grammar: An investigation of recursion", Talk presented at the LSA (by E. Gibson), 2012. [pdf]
[11]S. T. Piantadosi, J. Tenenbaum and N. Goodman, "Bootstrapping in a language of thought: a formal model of numerical concept learning", Cognition, vol. 123, 2012, pp. 199-217. [pdf]
[10]C. Kidd, S. T. Piantadosi and R. Aslin, "The Goldilocks Effect: Human Infants Allocate Attention to Visual Sequences That Are Neither Too Simple Nor Too Complex", PLoS ONE, 2012. [pdf]
[9]S. T. Piantadosi, H. Tily and E. Gibson, "Word lengths are optimized for efficient communication", Proceedings of the National Academy of Sciences, vol. 108, no. 9, 2011, p. 3526. [pdf]
[8]S. T. Piantadosi, H. Tily and E. Gibson, "Reply to Reilly and Kean: Clarifications on word length and information content", Proceedings of the National Academy of Sciences, vol. 108, no. 20, 2011, p. E109. [pdf]
[7]S. T. Piantadosi, "Learning and the language of thought", Ph.D. dissertation, MIT, 2011. [pdf]
[6]S. T. Piantadosi, H. Tily and E. Gibson, "The communicative function of ambiguity in language", Cognition, vol. 122, 2011, pp. 280-291. [pdf]
[5]S. T. Piantadosi, J. Tenenbaum and N. Goodman, "Beyond Boolean logic: exploring representation languages for learning complex concepts", in Proceedings of the Cognitive Science Society, 2010. [pdf]
[4]C. Kidd, S. T. Piantadosi and R. Aslin, "The Goldilocks Effect: Infants' preference for visual stimuli that are neither too predictable nor too surprising", in Proceedings of the Cognitive Science Society, 2010. [pdf]
[3]S. T. Piantadosi, H. Tily and E. Gibson, "The communicative lexicon hypothesis", in Proceedings of the Cognitive Science Society, 2009, pp. 2582-2587. [pdf]
[2]S. T. Piantadosi et al., "A Bayesian model of the acquisition of compositional semantics", in Proceedings of the Cognitive Science Society, 2008. [pdf]

• libraries & software •

A number of research software packages are actively developed by colala and available under the GNU General Public License:

  • LOTlib is a library for modeling the learning of complex concepts as compositions of primitives in a language of thought. GPL3

  • kelpy (kid experimental library in python) is a library for running simple psychology experiments in Python. It is intended primarily for making simple animated displays with simple responses for baby and child experiments. It is built on top of pygame and supports Tobii eyetracking. GPL3

  • ngrampy is a Python library for manipulating large Google ngram data sets and computing measures such as the average surprisal in context from Piantadosi, Tily, & Gibson (2011). Code is included to replicate and extend that finding. GPL3

  • ChurIso is a Scheme library for recovering Church encodings from constraints (email for upcoming publication). GPL3

  • pychuriso is a Python implementation of ChurIso inference. GPL3

  • WeberMCMC provides Bayesian data analysis for estimating Weber ratios. In practice, incorporating the reliability of an estimate of W into statistics allows for more power and correctness. GPL3

  • GPUropolis performs Bayesian inference via Metropolis-Hastings on symbolic expressions and is currently under heavy development. Code is highly parallelized using CUDA to run on graphics hardware. Includes several classic scientific data sets to test with. GPL3
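
As a rough illustration of what estimating a Weber ratio W involves, here is a minimal Python sketch. It is not WeberMCMC's code or interface. It assumes the commonly used Gaussian model of approximate number discrimination, in which the probability of a correct judgment between n1 and n2 is the normal CDF of |n1 - n2| / (W * sqrt(n1^2 + n2^2)), and it swaps MCMC for a simple grid approximation with a uniform prior. The function names and the trial format are hypothetical.

```python
import math


def p_correct(n1, n2, w):
    """Probability of a correct discrimination of n1 vs. n2 under the
    assumed Gaussian model with Weber ratio w."""
    z = abs(n1 - n2) / (w * math.sqrt(n1 ** 2 + n2 ** 2))
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def grid_posterior(trials, grid):
    """Normalized posterior over the candidate w values in `grid`,
    assuming a uniform prior over the grid points.

    trials: list of (n1, n2, correct) tuples, with `correct` a bool.
    """
    log_post = []
    for w in grid:
        log_lik = 0.0
        for n1, n2, correct in trials:
            p = p_correct(n1, n2, w)
            # Clamp to avoid log(0) when p rounds to exactly 0 or 1.
            p = min(max(p, 1e-12), 1.0 - 1e-12)
            log_lik += math.log(p if correct else 1.0 - p)
        log_post.append(log_lik)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [x / total for x in weights]


if __name__ == "__main__":
    # Hypothetical responses, for illustration only.
    trials = [(8, 16, True), (10, 12, False), (5, 10, True)]
    grid = [0.05 * k for k in range(1, 21)]
    posterior = grid_posterior(trials, grid)
    mean_w = sum(w * p for w, p in zip(grid, posterior))
    print("posterior mean of W:", round(mean_w, 3))
```

Keeping the whole posterior rather than a single point estimate is what lets the reliability of the estimate of W carry through to later statistics.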

• data & code •

Data from all projects completed and in progress is available upon request.

  • English Surprisal estimates -- data from Piantadosi, Tily, Gibson (2011), giving the average in-context surprisal of each word. Note that we now recommend using ngrampy to compute this from the free, publicly available Google Books data. Full data from the paper is available upon request.

  • PyMC model and subject data -- from Piantadosi, Kidd, and Aslin (2014), implementing Bayesian, partially-pooled by-subject looking curves relating the surprisal of observed events to infants' probability of terminating attention.

  • Number game generalization data -- from Bigelow and Piantadosi (2016), 272,700 2-AFC ratings collected across 606 Mechanical Turk workers in the number game, with at least 9 ratings for each target 1-100, for each concept stimulus.
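
For readers unfamiliar with the "average in-context surprisal" measure mentioned in the first item above, the following Python sketch shows the basic quantity: the mean of -log2 P(word | context) over a word's occurrences. It is a toy illustration, not ngrampy's interface and not the code from the paper. It assumes a one-word (bigram) context and unsmoothed maximum-likelihood probabilities, and the function name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def average_surprisal(tokens):
    """Map each word to its mean -log2 P(word | previous word), in bits.

    Probabilities are unsmoothed maximum-likelihood estimates from the
    bigram counts of `tokens`. The first token has no preceding context
    and therefore contributes no occurrence.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))

    # Number of times each word appears as a context (previous word).
    context_totals = Counter()
    for (prev, _), n in bigrams.items():
        context_totals[prev] += n

    total_bits = defaultdict(float)
    occurrences = Counter()
    for (prev, word), n in bigrams.items():
        p = n / context_totals[prev]
        total_bits[word] += n * -math.log2(p)
        occurrences[word] += n

    return {w: total_bits[w] / occurrences[w] for w in occurrences}


if __name__ == "__main__":
    # Toy corpus, for illustration only.
    print(average_surprisal("a b a c".split()))
```

On real n-gram data the counts would come from a large corpus rather than a token list, and longer contexts would be used; the averaging step stays the same.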

Meliora Hall
University of Rochester, River Campus
Rochester, NY