This GTA 5 story mod shows the wild potential - and problems - of AI-powered NPC conversations
Creator still wary of AI in general.
What if you could patrol the streets of Los Santos in Grand Theft Auto 5 and freely speak to the inhabitants? And what if they could actually talk back to you with authentic-feeling custom dialogue?
That's the aim of Sentient Streets, a new story-based GTA 5 mod by veteran mod creator Bloc, available now on NexusMods.
In it, players take on the role of a rookie cop investigating a conspiracy in Los Santos concerning the rise of AI. Back in real life, meanwhile, it's AI that powers the mod's conversations with NPCs throughout the world.
"I always had the idea of creating a virtual world like GTA, but a version where you can also interact and talk with NPCs like real people," Bloc told Eurogamer. He's previously worked on mods for Bannerlord and Skyrim that utilised AI-powered Large Language Models (LLMs), before seeing complaints about the technology's future.
"LLMs in games [have been called] a 'gimmick' since all you could do was talk to the characters and that was it," Bloc continued. "I disagreed with that idea, but just disagreeing wasn't enough. So I wanted to prove it."
Bloc plays on a fear of AI in the mod's story, as the player takes on a deadly cult worshipping an unseen AI. In the process, they'll have open-ended conversations in real-time with around 30 AI NPCs.
"Hopefully, with this mod, I was able to demonstrate that using AI in video games doesn't necessarily mean complete randomness and unpredictable gameplay and games who want to tell a story can also use LLMs to enrich their storytelling with unique roleplaying experiences for the players."
NPC conversations found in Sentient Streets are powered by Inworld's Character Engine, with Bloc able to use several features in early access. Inworld is a tool that powers AI NPCs, and promises characters "capable of multimodal human expression", according to its website. Loosely, the tool allows developers to create characters by filling out parameters; speech then flows freely from there, with voices supplied by technology from speech synthesis and text-to-speech company ElevenLabs.
"We are thrilled to incorporate ElevenLabs' real-time speech technology, which strengthens our already comprehensive off-the-shelf system for generative AI NPC creation," said Kylan Gibbs, chief product officer of Inworld, in a press release for Sentient Streets. "By responding to community demand for enhanced voice capabilities, we get one step closer to making characters more believable and lifelike. We're equipping developers with the tools to go beyond dialogue trees and scripted interactions."
Said Mati Staniszewski, CEO of ElevenLabs: "By combining our leading AI speech software with Inworld's platform, we are pushing the boundaries of immersive gaming experiences and adding an extra layer of possibility to gaming worlds.
"Our multi-purpose tool brings top-quality spoken audio to AI characters, incorporating human-like intonation and inflection while adapting to contextual cues. We are very excited about this development and can't wait to see how it is used by the wider developer community."
Bloc explains integration with Inworld was "pretty easy" as the software provides a number of features expected by players but not always provided in tools, such as voice recognition, character voices, and emotions.
The modder has posted a video of Sentient Streets on YouTube alongside the release of the mod, and the results are certainly startling. Players walk up to specific NPCs and, after getting their attention, hold a button to begin talking into a microphone. The AI then freely replies. Early on, Bloc chooses a partner officer by chatting and asking for their name and backstory; later he speaks to a suspect at a crime scene and is able to freely roleplay to gain information.
It's not perfect, of course. The AI takes some time to process conversations, has occasional errors, and sometimes repeats. But with refinement, the potential of this sort of tool is eye-opening.
Bloc's mod launched with over 3000 downloads in one week. So, has the AI spat out some weird or funny responses when being used at scale?
"Even while I was testing the mod, the AI cracked me up many times with their 'cunning' answers or unexpected reactions," Bloc said. "I have seen a few streams from YouTubers playing the mod. In one funny conversation, a YouTuber was blaming his partner for a crime he did, but the AI captain caught the lie and accused him of being a liar. In another conversation, a YouTuber was talking with a crazy cult member and driving him mad by asking totally unrelated questions to get under his skin.
"Sometimes I really find it surprising to hear the AI giving very smart answers to my obvious questions, and it's always fun to see how they turn the tables."
Of course, the use of AI remains a delicate subject. Ubisoft unveiled an AI tool to aid scriptwriting earlier this year, specifically for use with background NPCs, prompting concern it would take work away from junior writing staff. Many actors, meanwhile, are sceptical about AI - especially the rise of deepfake AI-driven mods where voices are used without permission.
Inworld uses a voice library from ElevenLabs and doesn't hire voice actors itself. But ElevenLabs is an AI voice-cloning tool previously called out by concerned actors. Its terms of service specify that users must either be the creator and owner of the files used to generate AI speech, or have the written consent of every identifiable individual in those files. But it's still difficult to know where ElevenLabs' voice data originates.
"Standard voices available by default on the platform are either generated by AI algorithms that sample voice characteristics at random (i.e. they don't imitate or replicate any specific individual's voice) or are developed through legally contracted, time-limited partnerships with voice actors, with new custom AI voices created as a result," an ElevenLabs spokesperson said in a statement to Eurogamer. "ElevenLabs does not offer any AI voices on the platform that are based on a real person's voice without explicit permission of that individual.
"ElevenLabs also allows users to create new, randomly-generated AI voices and share them as part of the community-led Voice Library. Separately, users have the ability to create cloned voices for their own work, if they have the rights and permissions to those voices. These voices cannot be shared to the Voice Library. Users who contravene the Terms of Service are banned from the platform - everyone is encouraged to report content they believe has violated these terms."
Bloc says he previously confirmed with Inworld that it selected voices for its tool from the ElevenLabs voice library.
Still, general concerns around the use of AI in video game development remain. Studios should "definitely be careful about how they use AI", Bloc continues - specifically in the use of safety features and with privacy concerns.
"Inworld relaxed the safety features of the language models, because it wouldn't make sense for an armed cult member to be super nice and helpful while talking to you," Bloc explained. "You would expect that person to be aggressive, swearing at you, and have a character where you find it difficult to find a common ground. However, this relaxation [of the rules] can't work out great all the time."
Developers will need to strike a balance between authentic characterisation and giving toxicity an AI voice, he continues.
"Having a super strict LLM isn't fun, but having an awfully toxic LLM in a video game isn't fun or safe either," said Bloc. "This balance needs to be adjusted carefully based on the needs of that game."
As for privacy issues, Bloc said he had seen people anthropomorphise LLMs due to their human-like conversation features. "This can lead to some privacy concerns since people can share their personal details and information with chat AIs," he said. "Some of these personal details can be very problematic for people in certain countries. I think having measures to avoid any privacy violations should be one of the utmost priorities of developers who are working with language models."
As for Sentient Streets, Bloc has received a wave of positive feedback so far - and says players become invested in this type of AI because it amplifies their enjoyment of the game. He believes this type of content will find a place within the games industry in future, but not necessarily from Rockstar.
"The Grand Theft Auto brand might be the biggest brand in the gaming industry at the moment, but it's unlikely that Rockstar will try to adapt something so new in their next title," said Bloc. "However we are probably going to see many games similar to GTA with this technology in the future, or perhaps big mods for GTA 6 as well."
With GTA 6 likely releasing in the next year, it won't be long until we find out.