On the Math to Mouth project, we explored ways of using machine learning to model what babies must learn in order to speak their native languages as fluently as adults do. We adults speak so often that we sometimes forget how complex speaking really is. To speak, we must have a goal in mind for how we would like to sound. We must also know how to configure our vocal tracts (our voice boxes, tongues, lips, and the other parts of our mouths that we move to make different sounds) to meet that goal.
Before learning their first words, children must acquire both kinds of knowledge: how they should sound, which they learn by listening to and watching their parents and caregivers, and how to configure their vocal tracts to sound that way. This is no simple task! Even children's best attempts at sounding like their parents are likely to differ from what they have heard, since children's tiny vocal tracts cannot produce a booming adult voice. To complicate things further, children's vocal tracts grow considerably throughout childhood, and each time the vocal tract changes, the child must learn a new way of moving it to keep sounding like an adult.
In this project, we were trying to understand this complex learning process by building mathematical models called manifolds. A manifold describes what our brains might know about something that is very complex and multi-dimensional by building a much lower-dimensional "map" of it. For example, a map of the world is a two-dimensional manifold built to describe what we need to know to navigate the three-dimensional surface of our planet.
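The idea of a low-dimensional "map" can be sketched in code. The toy example below is not the project's actual model; it uses a simple linear reduction (principal component analysis) on made-up data, whereas manifold-learning methods can also handle curved surfaces. It shows how 200 points described with three numbers each can be rediscovered as really needing only two.

```python
# A minimal sketch (not the project's models): points that live on a
# flat 2-D sheet embedded in 3-D space, and a recovered 2-D "map" of
# them via principal component analysis (PCA).
import numpy as np

rng = np.random.default_rng(0)

# Sample 200 points on a tilted plane inside 3-D space.
coords_2d = rng.normal(size=(200, 2))          # intrinsic 2-D coordinates
basis = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.5]])            # the plane's orientation in 3-D
points_3d = coords_2d @ basis                  # the data as the model sees it

# PCA: find the directions of greatest variance via an SVD.
centered = points_3d - points_3d.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# The third singular value is essentially zero: two dimensions
# are enough to describe this three-dimensional data set.
print(singular_values)

# The low-dimensional "map" coordinates of each point:
map_coords = centered @ vt[:2].T
print(map_coords.shape)
```

In the same spirit, a child's brain might need only a handful of map-like dimensions to organize the many acoustic and articulatory measurements involved in speech.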
Our models combined real-world data on children's productions, on the speech spoken to children, and on adults' perception of the accuracy of children's productions. We have already shown that manifolds are a useful way of understanding some key aspects of early language acquisition, and we are continuing to use them to explore others.
The Math 2 Mouth project was supported from 2008 to 2011 by a collaborative grant from the National Science Foundation (BCS-0729306 [Beckman & Fosler-Lussier], BCS-0729140 [Edwards], and BCS-0729277 [Munson]), "Using machine learning to model the interplay of production dynamics and perception dynamics in phonological acquisition".