the computation and language lab
at the University of Rochester



• about •

We study the basic computational processes involved in human language.



overview: colala's research spans computational and experimental approaches to language acquisition, language processing, and design features of language. Work draws on corpus methods, behavioral experiments with adults and children, computational and mathematical modeling, as well as state-of-the-art techniques in machine learning, information theory, statistics, computer science, and linguistics. We also work with an indigenous South American population, the Tsimane' of Bolivia, to study universal aspects of human language, numerical cognition, and cognitive development.








philosophy: Most research in colala centers on creating fully-formalized (implemented) rational theories that aim to address deep problems about human language. We use empirical data from adults and children to test these theories, and work to develop open tools, data, and results.



topics: (with example papers)



facilities: Our experimental facilities include rooms for testing adults and children with eye-tracking and touchscreens. Our computational facilities include 96 dedicated cores on the University's BlueHive cluster for running models and analyses, as well as several in-house web, corpus, GPU, and HPC servers.

• positions •


graduate students: Prospective graduate students should apply to the Department of Brain and Cognitive Sciences at the University of Rochester and contact Steve. Applications are due in early January each year. Graduate students should be hard-working, full of ideas, and capable of self-directed research. Students interested in the lab should bring strong quantitative skills, including knowledge of statistics and programming, as well as a serious background in some of the following areas: machine learning, mathematics, theoretical computer science, mathematical logic, cognitive development, language acquisition, language processing, or linguistics.

undergraduates: Undergraduates must be independent, creative, and self-motivated. Our undergraduate research opportunities include work on current large-scale projects, as well as opportunities for self-directed research. Undergraduates will typically have a background in BCS, math, computer science, or linguistics.

• graduate students •


Frank Mollica

• undergraduates •


Hassler Thurston works on LOTlib.

Samay Kapadia works on LOTlib.

Matt McGovern works on kelpy.

• lab manager •


Eric Bigelow

• collaborators •


Amanda Yung collaborates on kelpy

Kyle Mahowald collaborates on information theory and the lexicon

Julian Jara-Ettinger collaborates on studies with the Tsimane'

Richard Futrell collaborates on information theory and language, ngrampy, and Pirahã

Laura Stearns collaborates on the Pirahã corpus

Holly Palmeri collaborates at the Rochester Baby Lab

• collaborating labs and PIs •

Researchers in the lab work in close collaboration with a number of labs at Rochester and around the country:

• principal investigator •

The lab's PI is Steve Piantadosi. He is interested in understanding the computational mechanisms supporting human language learning and use. He received his Ph.D. in Brain and Cognitive Sciences from MIT in 2011, and spent three years as a postdoc at Rochester with Dick Aslin before becoming a faculty member in 2014.


• papers •



under review
[35]S. Piantadosi et al., "Lexical prescriptivism reduces communicative efficiency", under review. [email]
[34]S. Piantadosi, R. Aslin, "Compositional reasoning in early childhood", under review. [pdf]
[33]S. Piantadosi, H. Palmeri and R. Aslin, "Limits on composition of conceptual operations in 9-month-olds", under review. [pdf]
[32]S. Piantadosi, N. Goodman and J. Tenenbaum, "Modeling the acquisition of quantifier semantics: a case study in function word learnability", under review. [pdf]
[31]S. Piantadosi, "Approximate number from first principles", under review. [pdf]
[30]K. Mahowald et al., "Lexical clustering in efficient language design", under review. [email]
[29]J. Jara-Ettinger et al., "Mastery of the logic of cardinality is not the result of mastery of counting: Evidence from the Tsimane' of Bolivia", under review. [email]
[28]J. Cantlon et al., "The origins of counting algorithms", under review. [email]
2014
[27]S. Piantadosi, C. Kidd and R. Aslin, "Rich analysis and rational models: Inferring individual behavior from infant looking data", Developmental Science, 2014, pp. 1-16. [pdf]
[26]S. Piantadosi, J. Jara-Ettinger and E. Gibson, "Children's learning of number words in an indigenous farming-foraging group", Developmental Science, vol. 17, no. 4, 2014, pp. 553-563. [doi]
[25]S. T. Piantadosi, "Zipf’s word frequency law in natural language: A critical review and future directions", Psychonomic Bulletin & Review, vol. 21, 2014, pp. 1112-1130. [doi]
[24]C. Kidd, S. T. Piantadosi and R. N. Aslin, "The Goldilocks Effect in Infant Auditory Attention", Child Development, 2014. [pdf] [doi]
2013
[23]S. Piantadosi, H. Tily and E. Gibson, "Information content versus word length in natural language: A reply to Ferrer-i-Cancho and Moscoso del Prado Martin [arXiv:1209.1751]", ArXiv e-prints, Jul. 2013.
[22]C. Rieth et al., "Put your money where your mouth is: Incentivizing the Truth by Making Nonreplicability Costly", European Journal of Personality, 2013. [pdf]
[21]E. Gibson, L. Bergen and S. Piantadosi, "The rational integration of noise and prior semantic expectation: Evidence for a noisy-channel model of sentence interpretation", Proceedings of the National Academy of Sciences, vol. 110, 2013, pp. 8051-8056. [pdf]
2012
[20]S. Piantadosi et al., "A corpus analysis of Pirahã grammar: An investigation of recursion", Talk presented at the LSA (by E. Gibson), 2012. [pdf]
[19]S. Piantadosi, J. Tenenbaum and N. Goodman, "Bootstrapping in a language of thought: a formal model of numerical concept learning", Cognition, vol. 123, 2012, pp. 199-217. [pdf]
[18]K. Mahowald et al., "Info/information theory: speakers actively choose shorter words in predictable contexts", Cognition, vol. 126, 2012, pp. 313-318.
[17]C. Kidd, S. Piantadosi and R. Aslin, "The Goldilocks Effect: Human Infants Allocate Attention to Visual Sequences That Are Neither Too Simple Nor Too Complex", PLoS ONE, 2012. [pdf]
[16]E. Gibson, S. Piantadosi and E. Fedorenko, "Quantitative methods in syntax / semantics research: A response to Sprouse & Almeida", Language and Cognitive Processes, 2012. [pdf]
[15]E. Gibson et al., "A noisy-channel account of crosslinguistic word order variation", Psychological Science, vol. 24, 2012, pp. 1079-1088. [pdf]
[14]E. Fedorenko, S. Piantadosi and E. Gibson, "Processing Relative Clauses in Supportive Contexts", Cognitive Science, vol. 36, 2012, pp. 1-27. [pdf]
[13]E. Fedorenko, S. Piantadosi and E. Gibson, "The interaction of syntactic and lexical information sources in language processing: The case of the noun-verb ambiguity", Journal of Cognitive Science, 2012. [pdf]
2011
[12]S. Piantadosi, H. Tily and E. Gibson, "Word lengths are optimized for efficient communication", Proceedings of the National Academy of Sciences, vol. 108, no. 9, 2011, pp. 3526. [pdf]
[11]S. Piantadosi, H. Tily and E. Gibson, "Reply to Reilly and Kean: Clarifications on word length and information content", Proceedings of the National Academy of Sciences, vol. 108, no. 20, 2011, pp. E109. [pdf]
[10]S. Piantadosi, "Learning and the language of thought", Ph.D. dissertation, MIT, 2011. [pdf]
[9]S. Piantadosi, H. Tily and E. Gibson, "The communicative function of ambiguity in language", Cognition, vol. 122, 2011, pp. 280-291. [pdf]
[8]E. Gibson, S. Piantadosi and K. Fedorenko, "Using Mechanical Turk to Obtain and Analyze English Acceptability Judgments", Language and Linguistics Compass, vol. 5, no. 8, 2011, pp. 509-524. [pdf]
2010
[7]S. Piantadosi, J. Crutchfield, "How the Dimension of Space Affects the Products of Pre-Biotic Evolution: The Spatial Population Dynamics of Structural Complexity and The Emergence of Membranes", Santa Fe Institute Working Paper arXiv:1010.5019, 2010. [pdf]
[6]S. Piantadosi, J. Tenenbaum and N. Goodman, "Beyond Boolean logic: exploring representation languages for learning complex concepts", in Proceedings of the 32nd Annual Conference of the Cognitive Science Society, 2010. [pdf]
[5]C. Kidd, S. Piantadosi and R. Aslin, "The Goldilocks Effect: Infants' preference for visual stimuli that are neither too predictable nor too surprising", in Proceedings of the 32nd Annual Conference of the Cognitive Science Society, 2010. [pdf]
2009
[4]H. Tily, S. Piantadosi, "Refer efficiently: Use less informative expressions for more predictable meanings", in Proceedings of the workshop on the production of referring expressions: Bridging the gap between computational and empirical approaches to reference, 2009. [pdf]
[3]S. Piantadosi, H. Tily and E. Gibson, "The communicative lexicon hypothesis", in Proceedings of the 31st Annual Conference of the Cognitive Science Society, 2009, pp. 2582-2587. [pdf]
2008
[2]S. Piantadosi, "Symbolic dynamics on free groups", Discrete and Continuous Dynamical Systems, vol. 20, no. 3, 2008, pp. 725-738. [pdf]
[1]S. Piantadosi et al., "A Bayesian model of the acquisition of compositional semantics", in Proceedings of the 30th Annual Conference of the Cognitive Science Society, 2008. [pdf]

• libraries and software •

A number of research software packages are actively developed by colala and available under the GNU General Public License:

  • LOTlib is a library for modeling the learning of complex concepts as compositions of primitives in a language of thought. Please note that it is still under heavy development. GPL3

  • kelpy (kid experimental library in python) is a library for running simple psychology experiments in Python. It is intended primarily for making simple animated displays with simple responses for baby and child experiments. It is built on top of pygame and is under heavy development at present. Support for Tobii eyetracking, as well as a large library of public-domain characters, has recently been added. Email me for updates / questions. GPL3

  • ngrampy is a Python library for manipulating large Google n-gram data sets and computing measures such as the average surprisal in context from Piantadosi, Tily, & Gibson (2011). Code is included to replicate and extend that finding. GPL3

  • SimpleMPI provides a quick and easy wrapper for mpi4py that allows parallel mapping with mpi (mpich2/openmpi). GPL3

  • WeberMCMC provides Bayesian data analysis for estimating Weber ratios. In practice, incorporating the reliability of an estimate of W into statistics allows for more power and correctness. GPL3

  • GPUropolis (DEFUNCT) performs Bayesian inference via Metropolis-Hastings on symbolic expressions, currently only for symbolic regression. Code is highly parallelized using CUDA to run on graphics (GPGPU) hardware, which enables the use of tens of thousands of chains in parallel. Includes several classic scientific data sets to test with. GPL3
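
As a rough illustration of the "average surprisal in context" measure mentioned in the ngrampy entry, here is a minimal sketch in plain Python. It does not use ngrampy or its API; the function name, the input layout (a dict from (context, word) pairs to counts), and the toy counts are assumptions made for this example only. Each word's surprisal, -log2 P(word | context), is averaged over the contexts the word occurs in, weighted by how often it occurs there.

```python
import math
from collections import defaultdict


def average_surprisal(ngram_counts):
    """Average in-context surprisal (in bits) for each word.

    ngram_counts maps (context, word) -> count, where context is a tuple
    of the preceding words. This layout is hypothetical, chosen for the
    sketch; it is not ngrampy's data format.
    """
    context_totals = defaultdict(int)  # how often each context occurs
    word_totals = defaultdict(int)     # how often each word occurs
    for (context, word), n in ngram_counts.items():
        context_totals[context] += n
        word_totals[word] += n

    # Sum -log2 P(word | context) over every occurrence of the word.
    summed = defaultdict(float)
    for (context, word), n in ngram_counts.items():
        p = n / context_totals[context]
        summed[word] += n * -math.log2(p)

    return {w: summed[w] / word_totals[w] for w in word_totals}


# Toy bigram counts, invented for illustration.
counts = {
    (("the",), "dog"): 3,
    (("the",), "cat"): 1,
    (("a",), "dog"): 1,
    (("a",), "cat"): 1,
}
result = average_surprisal(counts)
print(result)
```

In this toy data "dog" is more predictable from its contexts than "cat", so it receives the lower average surprisal.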

• data and code •

Data from all projects, completed and in progress, is available upon request.

  • English Surprisal estimates -- data from Piantadosi, Tily, & Gibson (2011), giving the average in-context surprisal of each word. Note that we now recommend using ngrampy to compute this from the free, publicly available Google Books data. Full data from the paper is available upon request.

  • PyMC model and subject data -- from Piantadosi, Kidd, and Aslin (2014), implementing Bayesian, partially-pooled by-subject looking curves relating the surprisal of observed events to infants' probability of terminating attention.







Meliora Hall
University of Rochester, River Campus
Rochester, NY