the computation and language lab
at the University of Rochester


overview of our research and approach


information on current and prospective members


our publications are publicly available


the lab wiki and available code and data sets

• about •

We study the basic computational processes involved in human language.

overview: colala's research spans computational and experimental approaches to language acquisition, language processing, and design features of language. Work draws on corpus methods, behavioral experiments with adults and children, computational and mathematical modeling, as well as state-of-the-art techniques in machine learning, information theory, statistics, computer science, and linguistics. We also work with an indigenous South American population, the Tsimane' of Bolivia, to study universal aspects of human language, numerical cognition, and cognitive development.

philosophy: Most research in colala centers on creating fully-formalized (implemented) rational theories that aim to address deep problems about human language. We use empirical data from adults and children to test these theories, and work to develop open tools, data, and results.


facilities: Our experimental facilities include rooms for testing adults and children with eye-tracking and touchscreens. Our computational facilities include 96 dedicated cores at the University Bluehive cluster for running models and analysis, as well as several in-house web, corpus, GPU, and HPC servers.

• positions •

The lab has several open positions for fall 2014:

  • graduate students: Prospective graduate students should apply to the Department of Brain and Cognitive Sciences at the University of Rochester and contact Steve. Applications are due in early January each year. Graduate students should be hard-working, full of ideas, and capable of self-directed research. Students interested in the lab should bring strong quantitative skills including knowledge of statistics and programming, as well as serious background in some of the following areas: machine learning, mathematics, theoretical computer science, mathematical logic, cognitive development, language acquisition, language processing, or linguistics.

  • undergraduates: Undergraduates must be independent, creative, and self-motivated. Our undergraduate research opportunities include work on current large-scale projects, as well as opportunities for self-directed research. Undergraduates will typically have a background in BCS, math, computer science, or linguistics.

  • postdocs: Interested postdocs should be awesome. Contact Steve.

  • programmer: The lab is looking for a Python programmer to start in fall 2014. The job will include development of a library for running experiments (eye-tracking and touchscreen in the lab and simple tasks over Mechanical Turk) and development of a library for tree-based MCMC, including GPU programming. Candidates will ideally be familiar with experimental design and have experience building complex and robust libraries for others to use, including an appreciation of thorough documentation and examples. Knowledge of MPI, Pygame, and PyMC is a plus. Programmer duties may include other administrative lab tasks. All programming output will be distributed under the GPL.

• undergraduates •

Maritza Gomez works on coding related to the design of counting systems.

Josh Rose works on LOTlib.

Matt McGovern works on kelpy.

• student & staff collaborators •

Amanda Yung collaborates on kelpy

Kyle Mahowald collaborates on information theory and the lexicon

Julian Jara-Ettinger collaborates on studies with the Tsimane'

Richard Futrell collaborates on information theory and language, including improvements to ngrampy

Laura Stearns collaborates on the Pirahã corpus

Holly Palmeri collaborates at the Rochester Baby Lab

• collaborating labs and PIs •

Researchers in the lab work in close collaboration with a number of labs at Rochester and around the country:

• principal investigator •

The lab's principal investigator is Steve Piantadosi. He received his Ph.D. in Brain and Cognitive Sciences from MIT in 2011, and spent three years as a postdoc at Rochester with Dick Aslin before becoming a faculty member in 2014.

• papers •

The lab is not yet in full operation, so the papers below were written by the PI and collaborators.

S.T. Piantadosi and R. Aslin. Rational rule inference in 3-4 year-olds. in prep. [ bib | email ]
Julian Jara-Ettinger, Steven T. Piantadosi, Elizabeth Spelke, Roger Levy, and Edward Gibson. Knowledge of set operations precedes number knowledge: Evidence from the Tsimane' of Bolivia. in prep. [ bib | email ]
S.T. Piantadosi and R. Aslin. Compositional reasoning in early childhood. under review. [ bib | .pdf ]
S.T. Piantadosi, Holly Palmeri, and R. Aslin. Limits on composition of conceptual operations in 9-month-olds. in prep. [ bib | .pdf ]
S.T. Piantadosi. Zipf's law in natural language: a critical review and future directions. Psychonomic Bulletin and Review, in press. [ bib | .pdf ]
S.T. Piantadosi. Approximate number from first principles. Under review. [ bib | .pdf ]
S.T. Piantadosi, L. Stearns, D. Everett, and E. Gibson. A corpus analysis of Pirahã grammar: An investigation of recursion. Talk presented at the LSA (by E. Gibson). Paper in progress. [ bib | .pdf ]
C. Rieth, S.T. Piantadosi, K. Smith, and E. Vul. Put your money where your mouth is: Incentivizing the truth by making nonreplicability costly. European Journal of Personality, 2013. [ bib | .pdf ]
S.T. Piantadosi, J. Jara-Ettinger, and E. Gibson. Children's development of number in an indigenous farming-foraging group. Developmental Science, in press. [ bib | .pdf ]
S.T. Piantadosi, N. Goodman, and J.B. Tenenbaum. Modeling the acquisition of quantifier semantics: a case study in function word learnability. under revision. [ bib | .pdf ]
S.T. Piantadosi, C. Kidd, and R.N. Aslin. Rich analysis and rational models: Inferring individual behavior from infant looking data. Developmental Science, in press. [ bib | .pdf ]
C. Kidd, S.T. Piantadosi, and R.N. Aslin. The Goldilocks effect in infant auditory cognition. Child Development, in press. [ bib | email ]
S. T. Piantadosi, H. Tily, and E. Gibson. Information content versus word length in natural language: A reply to Ferrer-i-Cancho and Moscoso del Prado Martin [arXiv:1209.1751]. ArXiv e-prints, July 2013. [ bib | arXiv ]
S.T. Piantadosi and E. Gibson. Quantitative standards for absolute linguistic universals. Cognitive Science, in press. [ bib | .pdf ]
E. Gibson, P. Jacobson, P. Graff, K. Mahowald, E. Fedorenko, and S.T. Piantadosi. Presuppositional differences in quantified Antecedent-Contained-Deletion relative clauses: A reply to Hackl, Koster-Hale & Varvoutis (2011). Under review. [ bib | email ]
E. Gibson, K. Mahowald, and S.T. Piantadosi. Erroneous analyses of the self-paced reading data in Hackl, Koster-Hale & Varvoutis (2012). Under review. [ bib | email ]
E. Gibson, L. Bergen, and S.T. Piantadosi. The rational integration of noise and prior semantic expectation: Evidence for a noisy-channel model of sentence interpretation. Proceedings of the National Academy of Sciences, 110:8051-8056, 2013. [ bib | .pdf ]
E. Gibson, S.T. Piantadosi, K. Brink, L. Bergen, E. Lim, and R. Saxe. A noisy-channel account of crosslinguistic word order variation. Psychological Science, 24:1079-1088, 2012. [ bib | .pdf ]
S.T. Piantadosi, J.B. Tenenbaum, and N.D. Goodman. Bootstrapping in a language of thought: a formal model of numerical concept learning. Cognition, 123:199-217, 2012. [ bib | .pdf ]
E. Gibson, S.T. Piantadosi, and E. Fedorenko. Quantitative methods in syntax / semantics research: A response to Sprouse & Almeida. Language and Cognitive Processes, 2012. [ bib | .pdf ]
C. Kidd, S.T. Piantadosi, and R.N. Aslin. The Goldilocks effect: Human infants allocate attention to visual sequences that are neither too simple nor too complex. PLoS ONE, 2012. [ bib | http ]
K. Mahowald, E. Fedorenko, S.T. Piantadosi, and E. Gibson. Info/information theory: speakers actively choose shorter words in predictable contexts. Cognition, 126:313-318, 2012. [ bib ]
S.T. Piantadosi, H. Tily, and E. Gibson. The communicative function of ambiguity in language. Cognition, 122:280-291, 2011. [ bib | .pdf ]
S.T. Piantadosi, H. Tily, and E. Gibson. Word lengths are optimized for efficient communication. Proceedings of the National Academy of Sciences, 108(9):3526, 2011. [ bib | .pdf ]
S.T. Piantadosi, H. Tily, and E. Gibson. Reply to Reilly and Kean: Clarifications on word length and information content. Proceedings of the National Academy of Sciences, 108(20):E109, 2011. [ bib | .pdf ]
E. Gibson, S.T. Piantadosi, and K. Fedorenko. Using Mechanical Turk to Obtain and Analyze English Acceptability Judgments. Language and Linguistics Compass, 5(8):509-524, 2011. [ bib | .pdf ]
S.T. Piantadosi and J.P. Crutchfield. How the dimension of space affects the products of pre-biotic evolution: The spatial population dynamics of structural complexity and the emergence of membranes. Santa Fe Institute Working Paper arXiv:1010.5019, 2010. [ bib | .pdf ]
S.T. Piantadosi, J.B. Tenenbaum, and N.D. Goodman. Beyond boolean logic: exploring representation languages for learning complex concepts. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society, 2010. [ bib | .pdf ]
C. Kidd, S.T. Piantadosi, and R.N. Aslin. The Goldilocks effect: Infants' preference for visual stimuli that are neither too predictable nor too surprising. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society, 2010. [ bib | .pdf ]
H. Tily and S.T. Piantadosi. Refer efficiently: Use less informative expressions for more predictable meanings. In Proceedings of the workshop on the production of referring expressions: Bridging the gap between computational and empirical approaches to reference, 2009. [ bib | .pdf ]
S.T. Piantadosi, H.J. Tily, and E. Gibson. The communicative lexicon hypothesis. In Proceedings of the 31st Annual Conference of the Cognitive Science Society, pages 2582-2587, 2009. [ bib | .pdf ]
S.T. Piantadosi, N.D. Goodman, B.A. Ellis, and J.B. Tenenbaum. A Bayesian model of the acquisition of compositional semantics. In Proceedings of the 30th Annual Conference of the Cognitive Science Society, 2008. [ bib | .pdf ]

• software •

A number of research software packages are actively developed by colala and available under the GNU General Public License:

  • LOTlib is a library for modeling the learning of complex concepts as compositions of primitives in a language of thought. Please note that this is still under heavy development. GPL3

  • kelpy (kid experimental library in python) is a library for running simple psychology experiments in Python. It is intended primarily for making simple animated displays with simple responses for baby and child experiments. It is built on top of pygame and is under heavy development at present. Support for Tobii eye-tracking and a large library of public domain characters have recently been added. Email me for updates / questions. GPL3

  • ngrampy is a Python library for manipulating large Google n-gram data sets and computing measures such as the average surprisal in context from Piantadosi, Tily, & Gibson (2011). Code is included to replicate and extend that finding. GPL3

  • GPUropolis performs Bayesian inference via Metropolis-Hastings on symbolic expressions. It currently supports only symbolic regression but is under heavy development. Code is highly parallelized using CUDA to run on graphics (GPGPU) hardware, which enables the use of tens of thousands of chains in parallel. Includes several classic scientific data sets to test with. GPL3

  • SimpleMPI provides a quick and easy wrapper for mpi4py that allows parallel mapping with MPI (MPICH2/Open MPI). GPL3

  • WeberMCMC performs Bayesian data analysis for the estimation of Weber ratios. In practice, incorporating the reliability of an estimate of W into statistics allows for more power and correctness. GPL3
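
To give a flavor of what Bayesian estimation of a Weber ratio via Metropolis-Hastings can look like, here is a minimal sketch in plain Python. It is not the WeberMCMC or GPUropolis code and does not use their APIs. It assumes the common psychophysical model in which the probability of correctly comparing n1 and n2 is the normal CDF of |n1 - n2| / (W * sqrt(n1^2 + n2^2)). The Exponential(1) prior, the Gaussian random-walk proposal, and all function names are illustrative assumptions.

```python
import math
import random


def p_correct(n1, n2, w):
    """Probability of a correct comparison of n1 vs n2 given Weber ratio w."""
    z = abs(n1 - n2) / (w * math.sqrt(n1 ** 2 + n2 ** 2))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def log_posterior(w, trials):
    """Unnormalized log posterior of w; trials are (n1, n2, correct) tuples."""
    if w <= 0.0:
        return float("-inf")  # Weber ratios must be positive
    total = -w  # log density of an Exponential(1) prior (illustrative choice)
    for n1, n2, correct in trials:
        # Clamp away from 0 and 1 so the logarithm is always defined.
        p = min(max(p_correct(n1, n2, w), 1e-12), 1.0 - 1e-12)
        total += math.log(p if correct else 1.0 - p)
    return total


def sample_weber(trials, n_samples=5000, step=0.05, seed=0):
    """Random-walk Metropolis-Hastings samples from the posterior on w."""
    rng = random.Random(seed)
    w = 0.5  # arbitrary starting point
    lp = log_posterior(w, trials)
    samples = []
    for _ in range(n_samples):
        proposal = w + rng.gauss(0.0, step)  # symmetric proposal
        lp_new = log_posterior(proposal, trials)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            w, lp = proposal, lp_new
        samples.append(w)
    return samples
```

The spread of the returned samples (after discarding an initial burn-in) reflects how reliably W is estimated for a given participant, which is the quantity that can then be carried forward into later statistics.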

• data •

Data from all projects completed and in progress is available upon request.

  • English Surprisal estimates -- data from Piantadosi, Tily, & Gibson (2011), giving the average in-context surprisal of each word. Note that we now recommend using ngrampy to compute this from the free, publicly available Google Books data. Full data from the paper is available upon request.
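
For readers unfamiliar with the measure, the average in-context surprisal of a word is the mean of -log P(word | context) over all of that word's occurrences. The sketch below illustrates the idea on a toy token list, using only the preceding word as context. It is not ngrampy's interface, and the function name and the bigram simplification are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def average_surprisal(tokens):
    """Average bigram surprisal (in bits) of each word over its occurrences."""
    context_counts = Counter(tokens[:-1])             # times each word serves as context
    bigram_counts = Counter(zip(tokens, tokens[1:]))  # (previous word, word) counts
    totals = defaultdict(float)
    occurrences = Counter()
    for (prev, word), count in bigram_counts.items():
        p = count / context_counts[prev]              # maximum-likelihood P(word | prev)
        totals[word] += count * -math.log2(p)
        occurrences[word] += count
    # The first token has no preceding context, so it is not counted as an occurrence.
    return {word: totals[word] / occurrences[word] for word in totals}


tokens = "the dog saw the cat and the dog ran".split()
surprisal = average_surprisal(tokens)
```

Words that are highly predictable from their contexts receive low values, and words that tend to appear in unpredictable positions receive high values. A real analysis would use longer contexts and much larger corpora than this toy example.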

Meliora Hall
University of Rochester, River Campus
Rochester, NY