💡 The title references ARC (a well-known AI reasoning benchmark) and hill-climbing (an algorithmic approach), indicating a research focus on AI reasoning methods.
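This Show HN introduces Republic of Agents, an experimental benchmark that evaluates the social reasoning abilities of LLMs (such as cooperation, deception, and coalition building) through 7-player Mafia games. It includes two evaluation batches (with and without between-game learning) and a leaderboard that ranks models such as GPT-5.2 and Gemini 3.1 Pro Preview by outcome score. Session highlights showcase the interactions and performance of specific models during games.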