Capturing language, one conversation at a time

Y’all.

Or is it “all y’all?”

Can you guess where I’m from based on my use of that handy little contraction? Or whether I’m talking to my boss or my friend?

Conversational American English is a constantly shifting collection of billions of words, and the words we choose, the order we use them in and how we pronounce them communicate as much as what we are actually saying. To better understand it, a team of linguists in the College of Arts and Letters is leading the effort to create the largest recorded collection of conversational American English ever made. The Lancaster-Northern Arizona Corpus of Spoken American English (LANA-CASE) aims to include conversations from the full range of geographic regions, ages, genders and educational backgrounds. It also aims to cover the full range of communicative purposes for conversation, such as storytelling, problem-solving, joking around and giving advice.

The database, or corpus, of conversational American English will include recordings of everyday conversations from people of different ethnic groups, ages, professions and genders from throughout the United States. The focus on conversational language, instead of the more formal registers of written communication or the spoken language of prepared speeches, makes this particular corpus unique.

“Putting together this corpus has been challenging, but in the process, we have been lucky to experience people’s generosity firsthand as they have recorded and submitted their conversations,” said Lizzy Hanks, a doctoral student in applied linguistics in the Department of English. “We have had the privilege of listening to diverse voices as they tell their stories: sisters reuniting after years of separation, new parents planning their child’s first birthday party, a son grieving his mother and, tragically, retellings of sexual abuse.”

Hanks, associate professor Jesse Egbert, assistant professor Tove Larsson, emeritus professor Doug Biber, emerita professor Randi Reppen and researchers at Lancaster University in England are leading the project. Collaborators from throughout the U.S. and England are also involved, including several NAU students and alumni: Michael Edens, Marianna Gracheva, Kevin Hirschi, A.J. Holmberg, Kelly Kendro, Elizaveta Kuznetsova, Michelle Richter, Anne Stoughton, Iia Vlasova, Yağmur Demir, Daniel Dixon, Tülay Dixon, Larissa Goulart, Brett Hashimoto, Daniel Keller, Emma Winn and Katherine Yaw. LANA-CASE is the American English counterpart to the Spoken British National Corpus 2014.

Members of the LANA-CASE team present their work at a conference.

Researchers will use the corpus, and it will also support tasks such as teaching and learning English, interpreting legal language based on its ordinary meaning and improving language technologies like speech-to-text and chatbots. It’s also an important resource for the American public, because conversation is the most basic and important register in any language. (Everyone knows their child’s first spoken words, but do you remember the first words your child wrote?)

“When analyzing language use, it is crucial to consider register, which is the culturally recognized variety in which language is produced. We use language in dramatically different ways depending on register,” Egbert said. “By recording and documenting conversations between Americans from all different backgrounds, we are documenting what it means to be American and what it means to be human.”

What is American English?

There are some obvious hallmarks of American English, Larsson said. “Y’all” is one. Spelling differences like “humour” and “humor” separate British from American English. But it’s a lot more complicated than that, because different dialects are spoken in different parts of the country.

Individuals also differ in how they speak. Each person has a distinct style of communication, called an idiolect, that is shaped by a variety of factors, including age, gender, likes and dislikes, and life experiences. There are different ways to pronounce words, and none of them is wrong.

What types of conversations will be in the corpus?

All kinds, the researchers say, and the more the better. They’re looking for everyday conversations between two or three people: conversations while you’re cooking dinner, chatting with coworkers in the breakroom, discussing the news, making weekend plans and more. They’re looking for natural language, so participants should speak as they normally would, using the language they normally would. (Translation: Swearing is OK. So are difficult subjects.)

I want to share my voice!

People who want to participate can visit the LANA-CASE website to find out how to be a recorder or sign up to receive email updates on the project. Participants must speak English as one of their primary languages, have lived in the United States since before they were in elementary school and be at least 18 years old. Conversations can be recorded on cell phones.

Funding for this project comes from Lancaster University’s Global Advancement Fund, NAU’s Faculty Course-based Undergraduate Research Experience Development Grant, NAU’s Corpus Research Lab and NAU’s SGS Award.

Follow @lanalinguistics on TikTok, Instagram, Facebook, YouTube and X (formerly Twitter).

Heidi Toth | NAU Communications
(928) 523-8737 | heidi.toth@nau.edu