Guest post: Working as a language data specialist

Olivia Hirschey Marrese is a linguist based in Boulder, CO. She is currently pursuing her PhD in linguistics at the University of Colorado, where she researches conversations in English and Spanish. Olivia also works in the field of computational linguistics as a data annotator and language data specialist.


Olivia Hirschey Marrese

This summer, I interned at SoundHound Inc. The company develops voice-recognition, natural language understanding, and sound-recognition technologies, and works with partners such as Honda, Pandora, Motorola, and Mercedes-Benz to integrate voice and conversation intelligence into products and services. As a language data specialist on the Spanish and English teams, I validated, curated, transcribed, and QAed speech training data. Though I’ve worked in data annotation and have ample experience in data management, this was a new application for my linguistic knowledge and an exciting challenge.

In machine learning, no detail is too small to matter. Something as simple as the diacritic mark on the Spanish word cómo (“how”) distinguishes it from como (“like” or “I eat”), and although humans can rely on contextual cues and world knowledge to understand a phrase, machines don’t have that luxury. Every bit of data has to be accounted for, and in addition to the linguistic challenge of handling that volume of data, it also takes a great deal of coordination to make a team and project run smoothly. I was based in the new Boulder office, while the rest of the Spanish team was at headquarters in Santa Clara. Even at a tech company, sometimes things go wrong with the Wi-Fi, and all of us had to be adaptable and self-sufficient to get the work done.

After working a full day at SoundHound, I would return home to work on my second qualifying paper for my PhD program. As an academic, I’m a sociocultural linguist and conversation analyst. Essentially, I study how humans interact through conversation, and how people create, uphold, and challenge societal norms in everyday talk. 

These two worlds may seem a bit disparate, and indeed in many traditional academic circles, exploring careers beyond the tenure track is often called ‘alternative academic’ or even ‘non-academic’. I’d like to challenge this binary a bit. As a language data specialist, all of my work has been highly academic. It takes very specialized linguistic knowledge and training to understand how to work with language data and how to approach challenges in artificial intelligence. On the flip side, academic roles are often much more involved with the “real world” than many people assume. From our funding sources to our students, work inside universities is by no means separate from the cities and societies we live in, even if individual research topics can often appear removed from industry applications.

Linguists especially occupy a niche position between academia and industry. As the field of AI and voice recognition continues to grow, and as we encounter new challenges in product, policy, and performance, we need linguists leading the way forward. As linguists, we understand the technical aspects of language as well as how language functions as a broader system, and industry needs both of these perspectives. SoundHound clearly understands this need and has demonstrated the value they place on their interns. As a linguist, I’m glad I was one of them.


Thank you, Olivia, for sharing your experience with the Career Linguist community!

You can reach Olivia at olivia.marrese@colorado.edu