Career Profile: Linguistics in AI Enablement

The Career Profiles in Linguistics section regularly highlights career paths taken by linguists. If you would like to recommend someone (including yourself) for a future profile, please contact Career Linguist.


The time and travel involved in writing this piece were generously supported by Appen. Ideas and reflections on the experience are my own.


Earlier this month, I had the delight and great privilege of flying up to Seattle to hear Cala Zubair talk to students in the University of Washington’s Linguistics Department as part of its Treehouse series of talks in Computational Linguistics. In her talk, “Working as a Linguist in AI Enablement,” Cala shared her perspective on working as a sociolinguist in the tech sector. She offered insights from her professional journey broadly, as well as particulars of her current work at Appen.

Cala began by observing that starting her post-PhD career as a linguistics faculty member in an English department may be what best prepared her for her current work as a Program Manager for Client Services. As she put it, “success in this position is 50% about understanding the workplace and how to work with people.” Specifically, Cala needs to engage clients and partners, many of whom have no background in language or linguistics, and help them appreciate the richness and complexity of language in a way that is accessible and inviting. In the business world, this is called “stakeholder buy-in.”

But the rare and delightful thing about working at Appen is that she is far from the only linguist on her team. Appen employs hundreds of linguists to work on English alone, and English is just one of the 180 languages in which the company has deep expertise. There is a real community of linguists who collaborate and know how to ask for and give support to one another.

In fact, one of Cala’s colleagues, Appen Lead Linguist Alyssa Johnson, was at the talk. Cala asked her to share with the group what she had been working on just that morning (the talk was at noon). This was my introduction to NLG technology: Natural Language Generation. If I understood correctly, NLG is about generating language for a digital assistant so that it can respond to user requests in a way that sounds natural and conversational.

It is no accident that Appen has such a great reputation among linguists: the company was founded by a linguist in Sydney, Australia, in 1996. Dr. Julie Vonwiller was a linguist at the University of Sydney when she left to start her own automatic speech recognition start-up with her husband, Chris Vonwiller, who at the time worked as an engineer at Telstra. According to an article in the Sydney Morning Herald, they began in true start-up fashion, using a spare room in their home to get the business off the ground.

Much of Appen’s success today is owed to the growth of artificial intelligence, or, as the Herald put it, “the voice-ification of the internet,” which depends on high-quality language data to train systems. The quality of that data, in turn, depends on employing highly skilled linguists to take massive amounts of unstructured data and normalize it, annotate it, package it, and run quality assurance measures on it.

For me, some of the most useful parts of Cala’s talk were the contextualizing slides designed to show how all aspects of the organization’s work interrelate. For example, there are three kinds of data:

  • Relevance Data
  • Speech and Natural Language Data
  • Image and Video Data

These map onto three overarching domains of work:

  • Text-to-speech
  • Data Collection
  • Natural Language Processing

These, in turn, break down into tasks for the linguists employed there, such as:

  • Video transcription tasks
  • Data Collection tasks
  • Annotation & Phonetic Dictionary Development
  • Content Analytics & NLP
  • Localization and Machine Translation

To give students a real sense of the day-to-day work at Appen, we spent most of our time working through an example: what we would need to think about if we were designing a chatbot that could answer questions about the weather. How would we need to train it? Cala guided us through reflections starting at the pragmatic level: What is the user intending to do? What historical facts do we need to know about the weather? Next came ontologies for generating guidelines: what counts as a location (what if the user asks about the weather “here” or in the “Midwest”?), and what counts as a time phrase (“now,” “tomorrow,” etc.)? We would need to think about how to categorize the parts of semantic meaning. We came away with a better appreciation and understanding of what it means to say that we linguists can see the patterns in the messiness, because we know what different parts of language do.
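To make the ontology questions from the weather example a little more concrete, here is a minimal Python sketch of how guideline decisions might be written down and applied to user queries. The intent label, slot types, subtypes, and phrase lists are all hypothetical illustrations, not Appen’s actual guidelines; a production system would rely on trained models and far richer guidelines rather than a small lookup table.

```python
import re
from dataclasses import dataclass, field

# Hypothetical guideline lexicons, for illustration only (not Appen's schema).
# Each phrase maps to a subtype that tells annotators how it must be resolved.
LOCATION_PHRASES = {
    "here": "deictic",    # depends on where the user is
    "midwest": "region",  # vague region, no single forecast point
    "seattle": "city",
}
TIME_PHRASES = {
    "now": "deictic",     # depends on when the user is speaking
    "today": "deictic",
    "tomorrow": "deictic",
    "this weekend": "deictic",
}
WEATHER_WORDS = {"weather", "forecast", "rain", "snow", "sunny"}


@dataclass
class Annotation:
    text: str
    intent: str
    slots: list = field(default_factory=list)  # (slot_type, phrase, subtype) tuples


def annotate(utterance: str) -> Annotation:
    """Label the intent and tag location/time slots by whole-word phrase lookup."""
    lowered = utterance.lower()
    words = set(re.findall(r"[a-z']+", lowered))
    intent = "weather_query" if words & WEATHER_WORDS else "other"
    ann = Annotation(text=utterance, intent=intent)
    for slot_type, lexicon in (("location", LOCATION_PHRASES), ("time", TIME_PHRASES)):
        for phrase, subtype in lexicon.items():
            # \b boundaries stop "here" matching inside "where" or "now" inside "snow"
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                ann.slots.append((slot_type, phrase, subtype))
    return ann


print(annotate("What's the weather like here tomorrow?"))
```

The point of recording a subtype for each phrase is that it captures why a phrase is tricky: a deictic “here” can only be resolved from the user’s location, while “Midwest” names a region with no single forecast point. Those are exactly the kinds of distinctions a guideline has to spell out before annotators can label data consistently.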

Finally, Cala shared the results of some informal interviews she conducted with her colleagues, asking a number of people who work for the organization about their experience working there. They described a sense of validation: they had wondered whether their training would be relevant and applicable, and were gratified to find that it indeed was!

When the floor opened to student questions, one student asked, “What’s the delight for you?” (Cala had shared that in her informal survey, many of her colleagues used the word “delight” to describe their work.) Cala’s response: “My delight? Data is exciting, and I get to work with it all day!”

If you or someone you know is currently looking for work, Appen is hiring! You can check out their careers page or the recent post on Career Linguist featuring some selected jobs.

Thank you, Cala! For those of you heading to the LSA Careers Expo later this week, you can stop by and speak to her there.