In recent years, the integration of Large Language Models (LLMs) into Non-Player Character (NPC) dialogue authoring has garnered significant interest due to their potential to enhance interactive storytelling in video games and simulations. The applicability of LLMs in this domain nevertheless faces multiple challenges, with derailment (i.e., generating responses that deviate from the given context) being particularly critical, since it can disrupt the immersive experience. This study investigates the derailment level of LLMs when tasked with roleplaying specific NPCs. Specifically, we explore how model size and the length of the provided context affect the derailment level in conversational settings. Our analysis shows that larger models exhibit lower derailment levels, thanks to their enhanced understanding and generative capabilities. Conversely, we find that providing models with more extensive context increases derailment rates, owing to the increased difficulty of integrating and reconciling larger amounts of information. The results of our analysis are made publicly available in our novel dataset, comprising 540 conversations with 3 LLMs roleplaying as 3 unique NPCs, to foster further research and enable additional user studies. Finally, we cluster the observed types of derailment into 8 distinct classes that identify open issues in the integration of LLMs and NPCs. These results highlight the difficulty state-of-the-art LLMs have in following output formatting instructions, while showcasing their strength in roleplaying.
Keywords:
Large Language Models; Non-Player Character; Derailment; Natural Language Processing; Conversational Agent