Alexa for Animals: AI Is Teaching Us How Creatures Communicate

Artificial intelligence has already enabled humans to chat with robots like Alexa and Siri that were inspired by science fiction. Some of its newest creations take a page from a hero of children’s literature: Doctor Dolittle.

Researchers are using AI to parse the “speech” of animals, enabling scientists to create systems that, for example, detect and monitor whale songs to alert nearby ships so they can avoid collisions. AI may not yet be able to talk to the animals the way the century-old children’s-book character could, but this application of what is known as “deep learning” is helping conservationists protect animals, and it could help bridge the gap between human and nonhuman intelligences.

Scientists following this line of inquiry are asking a fundamental question: Is the best way to probe one alien intelligence to use another?

Even asking this question raises all kinds of issues for those who build artificial intelligence, many of whom are eager to point out that what we now call AI isn’t intelligent by any definition recognizable to a layperson. It also raises issues for the scientists studying animals and their habitats, who are by trade and tradition wary of making claims for animal intelligence that liken it to our own.

That said, both groups are enthusiastic about the enormous potential of applying AI to animal communication, both as a way to learn about our finned, furry and flying friends, and as a way to sharpen the tools of artificial intelligence itself. Honing cutting-edge AI on a problem as rich and challenging as what animals are thinking, and whether or not they “talk,” challenges researchers to pursue goals with such systems that go beyond simply using them to understand languages humans can already speak.

“It is fascinating that the tools of artificial intelligence, specifically deep learning, which is the hot new thing, do seem to be the natural tools to study this other kind of ‘AI’—animal intelligence,” says Oren Etzioni, a longtime AI researcher and head of the Allen Institute for AI, a nonprofit set up by the late Microsoft co-founder Paul Allen.

Researchers on animal communication are using a branch of AI that in recent years has proven effective for handling human language. Called “self-supervised learning,” it shows promise as a way to process the immense quantities of recordings of animal communication, captured in laboratories as well as natural environments, now flowing into the computers of scientists all over the world, says Aran Mooney, an associate scientist in the sensory and bioacoustics lab at Woods Hole Oceanographic Institution in Cape Cod, Mass.

To understand self-supervised learning, it helps to understand how most of the AI we interact with every day, be it in virtual assistants or the face-unlocking systems of our phones, came about. For most of the past two decades of development of AI, teaching a computer how to recognize patterns in information, whether it’s transcribing spoken language or recognizing images, required a training phase that involved feeding a powerful array of computers large quantities of sounds, images or other kinds of data that have been labeled by humans.

How, after all, can a computer learn to recognize a cat if it doesn’t have a database of images tagged, by humans, “cat”?

Self-supervised learning is different. Software that employs it can chew through vast quantities of data that no human has ever touched. Self-supervised systems “learn” from patterns inherent in data. This is something humans are also capable of, though we seem to accomplish it with a variety of strategies unique to us, so it’s important not to ascribe humanlike abilities to such systems.
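
To make the idea concrete, here is a minimal sketch, not any lab's actual system. It assumes a made-up stream of call types (`calls`) and two hypothetical helper functions. No human labels anything: the "answer" the program learns from for each call is simply whichever call came next in the recording.

```python
from collections import Counter, defaultdict


def train_next_token_model(sequence):
    """Count which token follows which. The training target for each
    token is just the next token in the data, so no human labels are needed."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return counts


def predict_next(model, token):
    """Return the most frequently observed follower of `token`,
    or None if the token never appeared with a follower."""
    followers = model.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical, unlabeled stream of call types (invented for illustration).
calls = ["chirp", "trill", "chirp", "trill", "chirp", "buzz", "chirp", "trill"]
model = train_next_token_model(calls)
print(predict_next(model, "chirp"))
```

Systems like GPT-3 rest on a similar trick, predicting what comes next in raw data, but with neural networks and vastly more data rather than a simple table of counts.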

One example of the power of this kind of algorithm is the bigger-than-ever system for language processing and generation created by OpenAI, a not-for-profit research lab. OpenAI trained this system, known as GPT-3, on 45 terabytes of text scraped from all over the internet—everything from books to Wikipedia. From this, OpenAI was able to create software that can generate long blocks of text that are almost indistinguishable from prose written by humans, and which is capable of imitating human language abilities in other ways, like answering trivia questions and concocting recipes.

The same kinds of technologies used to build GPT-3, which was unveiled in 2020, are useful for parsing animal communications, for two reasons. First, self-supervised learning systems don’t require data that is labeled by humans, which is both costly and time-consuming to generate. Second, researchers often simply don’t know what animals are “saying,” so creating human-labeled data of animal “speech” may be impossible.

Kevin Coffey, a neuroscientist at the University of Washington, studies the vocalizations of lab rats and mice. These animals are capable of a surprising amount of complexity in their “songs,” and the order of the different vocalizations in these songs appears to carry information. But beyond the most basic information—whether an animal is feeling distressed, territorial or amorous—scientists just don’t know what rodents are saying.


That’s one reason Dr. Coffey created “DeepSqueak”—the name is a play on “deep learning”—software that makes it easier for researchers to automatically label recordings of animals’ calls. DeepSqueak is versatile enough that the software has also been used on the vocalizations of primates. In research that has yet to be published, it is also being used to help scientists understand the complicated language of dolphins. (Scientists have found that dolphins are smart enough that they have signature sounds that function like “names” for referring to themselves and others.)

Other AI-based systems for parsing animal communications are focused more on how creatures communicate in the wild, and applications more immediate than understanding what’s on their minds.

Whale Safe, a project of the nonprofit Benioff Ocean Initiative, is deploying buoys—each the size of a small car—off the US West Coast to detect whales and alert ships they are in the area. Ships are then asked to slow down, since collisions between ships and whales are often fatal. Since deployment of the system in 2020, whale strikes declined significantly in the Santa Barbara Channel, which is both one of the busiest shipping lanes in US waters and one of the most active feeding grounds for humpback and endangered blue whales on the West Coast, says Callie Steffen, who heads the project.

Scientists have been studying whale songs for decades, but didn’t previously have the ability to automatically and remotely detect those sounds in the wild. Onboard processing of the whales’ calls on AI-powered computers on the buoys is critical to how the system works, since the buoys can’t upload all the data they collect in real time. This kind of “edge computing” is also how systems like Amazon’s Alexa, Apple’s Siri and Google Assistant work.


In this case, instead of the system listening for a “wake word” like “Alexa,” it’s listening for “Brrrrrrr, gmmmm, awwwwwrrrghgh,” which is this columnist’s attempt to transliterate the unexpectedly strange calls of a blue whale. This listening is no small task. Similar to how a smart assistant must handle the many different accents of the world’s English speakers, Whale Safe’s buoys must recognize a wide variety of calls, determine which species is making them, and filter out all the background noise.
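
The general shape of on-device listening can be sketched in a few lines. This is not Whale Safe's method, which relies on AI models; the sketch swaps in a far simpler technique, the Goertzel algorithm, which measures how much energy a slice of audio contains at one chosen frequency. The function names, the 20 Hz target, the sample rate and the threshold are all assumptions made up for illustration. The point is only that the device examines every frame locally and reports the handful that matter.

```python
import math


def goertzel_power(samples, sample_rate, target_hz):
    """Return the signal power near target_hz using the Goertzel algorithm."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest frequency bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev**2 + s_prev2**2 - coeff * s_prev * s_prev2


def detect_frames(samples, sample_rate, target_hz, frame_len, threshold):
    """Return indices of frames whose power at target_hz exceeds threshold.
    Only these indices, not the raw audio, would need to be transmitted."""
    hits = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if goertzel_power(frame, sample_rate, target_hz) > threshold:
            hits.append(start // frame_len)
    return hits


# Synthetic test signal (all values hypothetical): three one-second frames of
# faint 7 Hz background hum, with a loud 20 Hz tone added in the middle frame.
RATE, FRAME = 200, 200
audio = []
for t in range(3 * FRAME):
    value = 0.1 * math.sin(2 * math.pi * 7 * t / RATE)
    if FRAME <= t < 2 * FRAME:
        value += math.sin(2 * math.pi * 20 * t / RATE)
    audio.append(value)

hits = detect_frames(audio, RATE, target_hz=20, frame_len=FRAME, threshold=1000.0)
print(hits)
```

A real detector has to cope with calls that vary in pitch and duration, which is where trained models take over from a fixed-frequency check like this one.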

Another AI-based system, called BirdNET, made by researchers at Cornell University and Chemnitz University of Technology in Germany, can now recognize the calls of more than 3,000 different species of birds. A version of the system has been turned into a smartphone app, which functions as something like a birders’ version of the song-recognition app Shazam. This allows a new kind of “citizen science” in which recordings made by BirdNET’s nearly two million active users can be geolocated and used to study phenomena like the migratory routes of birds.

Aside from newer techniques like deep learning, what’s driving the use of AI for processing animal communication is the same thing that is driving an explosion in the use of AI across all industries and commercial applications: It’s easier and cheaper than ever to gather vast quantities of data, and to store and process it, says Dr. Etzioni. These technology megatrends also include Moore’s law for AI (an idea I call Huang’s Law) and the geographic dispersal of tech talent and know-how.

Sensors are also becoming cheaper and easier to deploy. For example, a company called Wildlife Acoustics sells small, battery-powered outdoor audio recorders that can pick up sounds made by a wide variety of animals. The recorders are used by conservationists, scientists and educators all over the world, on projects that range from monitoring frogs to protecting bats from being killed by wind turbines. When Wildlife Acoustics started more than a decade ago, it sold only a few hundred such recorders a year. It now sells more than 20,000, and that number continues to grow quickly, says Sherwood Snyder, director of product management at the company. Part of what’s driving adoption is the falling cost of such sensors; the company’s newest and cheapest one is $250.

Photo: An audio recording system on a coral reef off St. John, in the US Virgin Islands. (Mooney Lab/Woods Hole Oceanographic Institution)

These same trends affect ocean researchers like Dr. Mooney of Woods Hole. With the cost of underwater microphones dropping over the past two decades, and the growing accessibility of AI-based systems to process what they record, an interdisciplinary, multi-institution group has launched a project called the Global Library of Underwater Biological Sounds. The group will attempt to apply techniques like self-supervised learning to the cacophonous sounds of, for example, coral reefs.

At a basic level, self-supervised learning may help scientists upend the way science is done. Rather than coming to a study with a hypothesis in mind, they simply grab all the data they can, drinking the proverbial ocean, and then let the algorithms search for patterns. The outcome can be findings that no human would have anticipated, says Dr. Mooney.

In the past, humans trained AI by first giving it examples of data categorized by humans. In the future, these newer kinds of AI may be teaching us what categories exist in the first place.

“With self-supervised learning,” says Dr. Etzioni, “you can say to the computer: ‘You tell me in the data what structure you see.’ ”


Write to Christopher Mims at

Copyright ©2022 Dow Jones & Company, Inc. All Rights Reserved.

