Carlo Merola • Jaspinder Singh • Othmane Dardouri
Abstract
Large language models (LLMs) have demonstrated impressive emergent abilities across diverse natural language tasks, yet their capacity for structured planning remains under-explored. This project investigates whether the LLaMA model can reliably solve planning problems and introduces a meta-network to evaluate and manage its performance. First,
we construct a benchmark suite of planning prompts and record LLaMA’s responses. Next, we train a lightweight meta-network atop these examples to predict, for each prompt, (a) the likelihood that LLaMA’s answer
is correct, and (b) a confidence score. When the meta-network predicts low confidence or high error risk, the system automatically delegates the task to alternative solvers or human experts. Our framework provides
a generalizable mechanism for risk-aware deployment of LLMs, balancing autonomy with reliable fallback strategies.
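
The sketch below illustrates the kind of lightweight meta-network and routing rule described above; it is not the project's actual implementation, and the feature dimension, layer sizes, thresholds, and function names are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' implementation): a lightweight
# meta-network that scores a (prompt, response) feature vector and routes
# high-risk cases to a fallback solver or human expert.
import torch
import torch.nn as nn


class MetaNetwork(nn.Module):
    def __init__(self, feature_dim: int = 768, hidden_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
        )
        # Two heads: (a) probability that LLaMA's answer is correct,
        # (b) a confidence score for that prediction.
        self.correctness_head = nn.Linear(hidden_dim, 1)
        self.confidence_head = nn.Linear(hidden_dim, 1)

    def forward(self, features: torch.Tensor):
        h = self.backbone(features)
        p_correct = torch.sigmoid(self.correctness_head(h))
        confidence = torch.sigmoid(self.confidence_head(h))
        return p_correct, confidence


def route(features: torch.Tensor, meta: MetaNetwork,
          p_threshold: float = 0.7, conf_threshold: float = 0.6) -> str:
    """Delegate when predicted error risk is high or confidence is low."""
    with torch.no_grad():
        p_correct, confidence = meta(features)
    if p_correct.item() < p_threshold or confidence.item() < conf_threshold:
        return "fallback"   # alternative solver or human expert
    return "llm"            # accept LLaMA's plan
```

In this sketch, the meta-network would be trained on the benchmark of planning prompts and recorded LLaMA responses, with correctness labels supervising the first head; the thresholds trade off autonomy against fallback frequency.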
Deliverables