Voice & Speech Production

Overview
Voice = sounds that are produced as air is pushed through the vocal folds. Speech = modified voice sounds that form words; modification occurs in the vocal tract.
Verbal output requires coordination between four systems:
The respiratory system, particularly the lungs, provides the power source for voice sounds in the form of expired air. We show the diaphragm here, too, since it is a key muscle of respiration.
The phonatory system comprises the larynx; this is where voice sounds are produced.
The articulation/resonance system comprises the vocal tract that modifies voice sound to create speech sounds. Structures included in this system are the nasal, oral, and pharyngeal cavities.
The nervous system, including the PNS and CNS, acts as the control center for voice and speech production. Key neural pathways to the other three systems:
The phrenic nerve (C3-C5) innervates the diaphragm, which contracts to pull air into the lungs and relaxes to push air out of the lungs.
Cranial nerve X innervates the intrinsic muscles of the larynx, primarily via the recurrent laryngeal nerve (the superior laryngeal nerve innervates the cricothyroid muscle).
As we discuss elsewhere, the extrinsic muscles of the larynx comprise the supra- and infrahyoid muscles; they are innervated by other cranial nerves and by cervical spinal nerves.
Cranial nerves V, VII, and IX–XII innervate structures of the articulation and resonance system.
Phonation
Medial compression and longitudinal tension close the vocal folds to produce sounds; repeated closing and opening produces vibrations that alternately trap and release air.
Select structures that act during phonation
Cartilages: thyroid, cricoid, and arytenoid cartilages. We show the arytenoid cartilages with their vocal processes pointing anteriorly.
Vocal ligaments with the rima glottidis, which is the space between the vocal folds.
We show two muscle groups that adduct the vocal folds: the interarytenoid muscles (which include the transverse and oblique arytenoids) and the lateral cricoarytenoids.
And two muscles increase longitudinal tension on the vocal folds: the thyroarytenoids, with their medial vocalis portions, and the cricothyroid muscles.
Phonatory Cycle
We show the changes in the vocal folds as they vibrate during a phonatory cycle in laryngoscopic (superior) view.
Open, at rest: We start with the larynx open, as it would be at rest during quiet breathing: the vocal folds are open to allow free air flow through the rima glottidis.
Closed: Then, upon activation of the intrinsic muscles we've drawn, the vocal folds close; in this closed state, subglottal pressure builds as air is trapped under the folds.
Pushed Open: When subglottal pressure surpasses vocal fold resistance, the folds are pushed open and air bursts into the vocal tract, producing voice sounds.
Rapid Closure: Then, because of elastic recoil and myoelastic aerodynamic effects, the vocal folds rapidly re-close; their overlying mucosa shifts in wave-like patterns.
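The cycle described above can be caricatured as a pressure-threshold loop. The Python sketch below is purely illustrative: the function name, the arbitrary pressure units, and the instant reset after each opening are assumptions made for this example, not physiological values.

```python
def simulate_phonation(threshold, pressure_step, n_steps):
    """Toy model: folds stay closed until subglottal pressure exceeds threshold.

    All quantities are in arbitrary, hypothetical units.
    """
    pressure = 0.0
    states = []
    for _ in range(n_steps):
        pressure += pressure_step   # air is trapped under the closed folds
        if pressure > threshold:
            states.append("open")   # folds are pushed open; air bursts through
            pressure = 0.0          # rapid re-closure, modeled as an instant reset
        else:
            states.append("closed")
    return states


if __name__ == "__main__":
    print(simulate_phonation(threshold=3.0, pressure_step=1.0, n_steps=8))
```

The only point of the sketch is the repeating pattern: pressure builds while the folds are closed, a brief opening releases it, and the cycle restarts without any muscle re-triggering each closure.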
Myoelastic aerodynamic effects & the Bernoulli effect
The myoelastic aerodynamic effects that facilitate rapid vocal fold closure in the larynx are explained by the Bernoulli effect: air speeds up as it rushes through the narrow gap between the folds, which creates an area of reduced air pressure there; this in turn produces suction between the vocal folds that brings them back together.
Note that while intrinsic laryngeal muscles position and tense the vocal folds, the vibrations of the phonatory cycle are driven by aerodynamic and elastic forces. Our laryngeal muscles would quickly tire if they were responsible for the rapid, repeated closure needed for phonation.
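As a rough sketch of why faster air means lower pressure, the relation below assumes idealized steady, incompressible, frictionless flow, which real glottal airflow only approximates. The symbols are introduced here for illustration: P is air pressure, v is air speed, A is cross-sectional area, and ρ is air density; subscript 1 refers to the wider airway below the folds and subscript 2 to the narrow gap between them.

```latex
% Bernoulli's principle along the airflow, and conservation of flow volume
P_1 + \tfrac{1}{2}\rho v_1^2 = P_2 + \tfrac{1}{2}\rho v_2^2,
\qquad A_1 v_1 = A_2 v_2

% Combining the two: pressure drop in the narrow gap
P_1 - P_2 = \tfrac{1}{2}\rho v_1^2 \left[ \left( \frac{A_1}{A_2} \right)^2 - 1 \right]
```

Because the gap is narrower than the airway below it (A_2 < A_1), the right-hand side is positive, so the pressure between the folds (P_2) is lower than the pressure beneath them (P_1).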
Articulation & Resonance
Ways we modify voice sounds to produce meaningful speech.
Pronunciation key
The production of consonants in the English language requires the use of speech articulators. To produce vowels, we change the configuration of the vocal tract.
The nasal, oral, and pharyngeal cavities participate in resonance, which means they provide amplification and shape to our voice sounds.
Specific anatomical sites where voice sounds are articulated:
The pharynx produces pharyngeal sounds, such as the /ħ/ that begins challah in some Hebrew pronunciations.
The uvula produces uvular sounds, which are more common in languages such as French and Arabic than in English.
The soft palate, aka velum, produces velar sounds like the /k/ in kiss.
The hard palate produces palatal sounds, such as the /j/ in yes.
The alveolar ridge, just behind the upper front teeth, produces several sounds, including the /d/ in dig and the /n/ in not.
The teeth produce dental sounds, including the /θ/ in thigh.
The lips produce labial sounds, including the /p/ in pit, the /f/ in finger (which also involves the upper teeth), and the /m/ in monkey.
The tongue creates different sounds depending on which part of the tongue is used:
Dorsal sounds are created with the back of the tongue and include /ŋ/ as in sing.
Laminal sounds are created with the blade of the tongue and include /z/ as in zip.
Apical sounds are created with the tip of the tongue and include /t/ as in top.
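For review, the English examples above can be collected into a small lookup table. This Python sketch is only a study aid: the dictionary name, the labels, and the helper function are made up for the example, and the table covers only the sample phonemes named in these notes, not a full phoneme inventory.

```python
# Place of articulation -> example English phonemes from the notes above.
# PLACES and place_of are hypothetical names used only for this sketch.
PLACES = {
    "velar (soft palate)": ["k"],
    "palatal (hard palate)": ["j"],
    "alveolar (alveolar ridge)": ["d", "n"],
    "dental (teeth)": ["θ"],
    "labial (lips)": ["p", "f", "m"],
}


def place_of(phoneme):
    """Return the place of articulation for a listed phoneme, else None."""
    for place, phonemes in PLACES.items():
        if phoneme in phonemes:
            return place
    return None


if __name__ == "__main__":
    for symbol in ["k", "n", "θ", "m"]:
        print(f"/{symbol}/ -> {place_of(symbol)}")
```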
Velopharyngeal Mechanism
Another distinction we make is between nasal and oral phonemes (phonemes are the smallest units of sound that distinguish meaning in a specific language).
When we produce nasal phonemes, the mouth is blocked at the lips or tongue and air escapes through the nasal cavity; when we produce oral phonemes, air escapes only through the oral cavity.
We direct the air to or away from the nasal cavity via the velopharyngeal port (aka mechanism), which moves our soft palate (velum) and pharynx closer together or farther apart.
Open Port
When the velopharyngeal port is open, the soft palate is lowered and separated from the pharynx, so air can freely move through both the nasal and oral cavities.
We achieve this opening by using our palatopharyngeus and palatoglossus muscles (and to a lesser extent, our tensor veli palatini muscles).
There are three nasal phonemes in General American English: /m/, /n/, and /ŋ/ ("ng").
Closed Port
To close the velopharyngeal port and prevent air from escaping via the nasal cavity, we pull the soft palate upward and backward and constrict the walls of the pharynx.
We achieve this via contraction of the levator veli palatini, musculus uvulae, and superior pharyngeal constrictors.
We also close off the velopharyngeal port during swallowing to prevent foods and liquids from entering the nasal cavity.
If the velopharyngeal port doesn't completely close, a person will produce hypernasal sounds (think of patients with cleft palates); conversely, if the nasal tract can't be opened, as when allergies cause the nasopharynx to swell, a person will sound hyponasal.
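The open/closed logic of this section can be summarized in a few lines of Python. This is a simplified sketch, not a clinical tool: the names are hypothetical, and it treats the port as strictly open or closed and resonance as one of three labels, whereas real speech varies by degree.

```python
# The three nasal phonemes of General American English listed above.
NASAL_PHONEMES = {"m", "n", "ŋ"}


def port_state(phoneme):
    """Velopharyngeal port position needed for a phoneme (simplified)."""
    return "open" if phoneme in NASAL_PHONEMES else "closed"


def resonance(port_can_close, nasal_tract_can_open):
    """Label the resonance outcome described in the notes (simplified)."""
    if not port_can_close:
        return "hypernasal"   # air leaks into the nasal cavity on oral sounds
    if not nasal_tract_can_open:
        return "hyponasal"    # nasal sounds lose their nasal airflow
    return "typical"


if __name__ == "__main__":
    print(port_state("m"), port_state("k"))
    print(resonance(port_can_close=False, nasal_tract_can_open=True))
```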