Mapping ChatGPT’s knowledge universe

According to ChatGPT, the most important person in the world is the marine biologist, writer, and conservationist Rachel Carson—or so it seems based on “Artificial Worldviews,” a research project released today by Kim Albrecht of Harvard’s metaLAB, a project of the Berkman Klein Center for Internet & Society that works at the intersection of the humanities, technology, and design.

For the project, Albrecht asked the application programming interface (API) of the widely publicized artificial intelligence (AI) system, ChatGPT, about its knowledge of the world. (Albrecht used the latest free version of the underlying model, GPT-3.5.) The first prompt asked ChatGPT to create a dataset of all the different categories of knowledge it possessed; ChatGPT provided 31 responses, such as geography, film, and sports. From there, a recursive algorithm requested data about subfields within each category—for geography, for instance, ChatGPT provided subfields such as cartography, mountains, and oceans. Finally, the algorithm asked the API to list the key humans, objects, and places within each category and subfield. Albrecht then mapped the results in a data visualization, in which fields connect to subfields, and objects and humans connect by co-mentions in similar fields.
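The recursive loop described above—ask for categories, then ask each category for its subfields—can be sketched roughly as follows. This is not Albrecht's actual code; the `query` function here is a stand-in that returns canned responses (using the examples from the article), where a real implementation would call the ChatGPT API and parse a list out of the model's reply.

```python
def query(prompt):
    """Stand-in for a ChatGPT API call (hypothetical).

    Returns canned lists so the recursive structure is runnable;
    a real version would send the prompt to the API and parse the
    list items out of the model's text response.
    """
    canned = {
        "List the categories of knowledge you possess.":
            ["geography", "film", "sports"],
        "List subfields of geography.":
            ["cartography", "mountains", "oceans"],
    }
    return canned.get(prompt, [])


def map_knowledge(prompt="List the categories of knowledge you possess."):
    """Recursively request categories, then subfields, building a tree."""
    tree = {}
    for field in query(prompt):
        tree[field] = map_knowledge(f"List subfields of {field}.")
    return tree


print(map_knowledge())
```

In Albrecht's project, a further request at each node asked for the key humans, objects, and places—those leaf responses are what the visualization connects by co-mention.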

Albrecht said he became interested in mapping ChatGPT’s knowledge base because of what is commonly referred to as the “black box problem”—humans’ inability to understand how complex AI models make decisions. Although many researchers are concerned with examining the internal workings of ChatGPT to address this issue, Albrecht thought he could also understand ChatGPT’s worldview by looking at its outputs. “We don’t understand other humans by looking into the structures of their brains, but by talking to them,” he says. “We know other humans pretty well by observing how they interact, behave, think about things.” Researchers might understand ChatGPT in a similar way, he thought, by systematically prompting it about the scope of its knowledge. “When you request something from ChatGPT, after a while, you get some kind of intuition about how that system behaves, how it acts, what it’s capable and incapable of doing, how it’s structuring the world,” he says.

In total, Albrecht’s algorithm prompted ChatGPT 1,764 times. Among the people ChatGPT named in response to specific prompts, men appeared three times as often as women. But in other ways, the returned data were more diverse than he expected: Rachel Carson, named 73 times, was the most common response ChatGPT provided. She was followed by the primatologist Jane Goodall (named 60 times), the ancient Greek philosopher Aristotle (52 times), the Kenyan political activist Wangari Maathai (44 times), and the physicist Isaac Newton (41 times).

The responses don’t represent an unbiased look at ChatGPT’s knowledge base, Albrecht cautions. Several factors shaped the responses obtained, including how ChatGPT handles requests, how it understands the information it was trained on, and the political influences woven into its artificial neural network. The way that Albrecht structured his prompts could also have influenced the responses. “I thought it was strange that the most common responses were all humans,” he says. “But that could be because of the way I asked the question. I first requested humans, and in the second request, I asked for ideas, places, objects—so there are more things that it has to fulfill than just ‘humans.’”

He hopes that the results can help researchers and laypeople better understand ChatGPT’s knowledge base—and how the system moderates and filters that knowledge before presenting it to users. That knowledge base is built from hundreds of billions of tokens—the units of text, often fragments of words, into which training text is broken. About 410 billion of these tokens come from a dataset provided by Common Crawl, which combs through the Internet and archives web pages such as Wikipedia.
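To illustrate what "tokens" means here: a tokenizer splits text into small units before the model processes it. GPT models actually use byte-pair encoding, which splits on subword fragments rather than whitespace; the toy version below splits on whitespace only, as a rough analogy.

```python
def toy_tokenize(text):
    """Split text into whitespace-delimited units.

    Only an analogy: real GPT tokenizers (byte-pair encoding)
    produce subword fragments, so a single word may become
    several tokens and counts differ from word counts.
    """
    return text.split()


tokens = toy_tokenize("Rachel Carson was a marine biologist")
print(len(tokens), tokens)
# → 6 ['Rachel', 'Carson', 'was', 'a', 'marine', 'biologist']
```

At this granularity, 410 billion tokens corresponds to roughly hundreds of billions of words' worth of web text.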

“As anybody who has spent time on the Internet knows, not all sources are reliable,” Albrecht says. In response, OpenAI, the company behind ChatGPT, has put “a lot of human labor” into filtering these sources. Time reported in January that OpenAI paid workers in Kenya $2 per hour to sort through tens of thousands of lines of text and filter out sometimes graphic content related to topics such as murder, torture, and self-harm. Human labor also goes into attempts to eliminate biases in ChatGPT’s responses. This filtering likely shaped the outputs Albrecht received—perhaps making figures like Rachel Carson more prominent than expected.

Albrecht hopes to continue working on the project. He’s creating a similar map, for which he prompted ChatGPT about the different categories of power instead of knowledge—inspired by Michel Foucault’s work on the connections between the two. (“The most powerful person, I can already tell you, is Gandhi,” Albrecht says. “Then, Nelson Mandela.”) He is also looking into the computation, data, labor, and other material qualities involved in ChatGPT’s decision-making processes. Eventually, he hopes to synthesize these projects, exploring how the material aspects of ChatGPT shape its understanding of knowledge and power—and making the black box of AI a little less inscrutable in the process.

Read more articles by Nina Pasquini
