TradeCraft: an Arena for Long-term Strategic Social Reasoning and Planning
Highlights
- Advanced Social Intelligence Evaluation: TradeCraft is a flexible, extensible multi-agent benchmark designed to evaluate high-order Theory of Mind, strategic social reasoning, and planning.
- Complex Long-Term Strategy: Features 944 items, 1,144 crafting recipes, and 84 tag-based substitution rules, requiring agents to manage long multi-step crafting chains under strict compositional constraints.
- Dual Interfaces & Comprehensive Evaluation: Offers a Gymnasium-compatible API for LLMs and a GUI for human players, alongside a multi-dimensional framework to evaluate agent capabilities thoroughly.
Key Features
- Multi-Agent Social Intelligence Benchmark: Designed to evaluate high-order recursive Theory of Mind, strategic social reasoning, and planning.
- Extensive Crafting System: Features 944 items, 1,144 crafting recipes, and 84 tag-based item substitution rules, demanding complex, multi-step reasoning.
- Dual Interfaces:
  - Gymnasium-Compatible API: For seamless integration and evaluation of Large Language Models (LLMs) and other AI agents.
  - Graphical User Interface (GUI): For human participation, observation, and human-AI interaction.
- Flexible Configuration: Supports variable participant numbers and dynamic rule conditions to create diverse social scenarios.
- Compositional Constraints: Enforces strict rules on item crafting and interaction, requiring precise management of item types and quantities.
- Long-Horizon Tasks: Challenges agents with tasks that require planning over extended sequences of actions.
- Multi-Dimensional Evaluation Framework: Assesses:
  - Strategic reasoning and planning
  - Cooperative and competitive interaction
  - Resource negotiation
  - Compliance with constraints
  - Adaptive behavior
- Incentivizes Advanced Reasoning: Embeds strict rules, dynamic interactions, and trade-based goals to promote high-order strategic thinking and hybrid-motivation behavior.
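To make the compositional constraints and multi-step chains above concrete, here is a minimal sketch of a quantity-exact crafting check. The `Recipe` class, the `can_craft` and `craft` helpers, and the item names are hypothetical illustrations and are not taken from TradeCraft's actual data model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Recipe:
    """Hypothetical recipe: required inputs and the produced output."""
    inputs: dict          # item name -> required quantity
    output: str
    output_qty: int = 1


def can_craft(inventory: Counter, recipe: Recipe) -> bool:
    """True only if every input is held in at least the required quantity."""
    return all(inventory[item] >= qty for item, qty in recipe.inputs.items())


def craft(inventory: Counter, recipe: Recipe) -> Counter:
    """Return a new inventory after crafting; raise if an input is missing."""
    if not can_craft(inventory, recipe):
        raise ValueError(f"missing inputs for {recipe.output}")
    result = Counter(inventory)
    for item, qty in recipe.inputs.items():
        result[item] -= qty
    result[recipe.output] += recipe.output_qty
    # Unary plus drops zero-count entries, so only held items remain.
    return +result


# A two-step chain (assumed items): log -> planks -> sticks.
planks = Recipe({"oak_log": 1}, "oak_planks", 4)
sticks = Recipe({"oak_planks": 2}, "stick", 4)
inventory = craft(craft(Counter({"oak_log": 1}), planks), sticks)
```

Even in this toy chain the order of steps matters: the sticks recipe cannot be applied until the planks exist. Order dependence of this kind is what makes longer chains a planning problem.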
Architecture & Pipeline
- TradeCraft uses a client-server architecture to manage game state and agent interactions.
- The server hosts the game logic, including the crafting rules, item states, and agent management.
- Agents (LLMs or human players) connect to the server via:
  - The Gymnasium-compatible API for programmatic control, allowing AI agents to perceive the environment and execute actions.
  - The Graphical User Interface (GUI), which provides a visual representation of the game world for human players.
- This dual-interface design allows for flexible experimentation, including AI vs. AI, Human vs. Human, and Human vs. AI setups.
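The programmatic side of this design can be sketched as the standard Gymnasium interaction loop, in which `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`. The `ToyTradeEnv` class below is a hypothetical stand-in so that the example runs without a server. Its observation fields, action strings, and reward are assumptions, not TradeCraft's actual interface.

```python
class ToyTradeEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step signatures."""

    def __init__(self, target="stick", max_turns=10):
        self.target = target
        self.max_turns = max_turns
        self.turn = 0
        self.inventory = {}

    def reset(self, seed=None, options=None):
        self.turn = 0
        self.inventory = {"oak_planks": 2}
        return self._obs(), {}

    def step(self, action):
        self.turn += 1
        if action == "craft stick" and self.inventory.get("oak_planks", 0) >= 2:
            self.inventory["oak_planks"] -= 2
            self.inventory["stick"] = self.inventory.get("stick", 0) + 4
        terminated = self.inventory.get(self.target, 0) > 0
        truncated = not terminated and self.turn >= self.max_turns
        reward = 1.0 if terminated else 0.0
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        return {"turn": self.turn, "inventory": dict(self.inventory)}


def scripted_agent(observation):
    """Placeholder policy; an LLM-backed agent would choose the action here."""
    if observation["inventory"].get("oak_planks", 0) >= 2:
        return "craft stick"
    return "pass"


def run_episode(env, policy):
    """Run one episode and return (total reward, final observation)."""
    observation, info = env.reset(seed=0)
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward, observation
```

Keeping the policy as a plain function of the observation means the same loop can drive a scripted baseline, an LLM call, or a human-input wrapper.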

Benchmark & Evaluation
- TradeCraft provides a rigorous testbed for evaluating social intelligence in structured, long-term decision-making environments.
- Our comprehensive evaluation framework assesses multiple dimensions of agent performance (see Key Features).
- We have conducted extensive evaluations of current Large Language Models, providing insights into their social intelligence capabilities.
- For detailed benchmark results and analysis, please refer to our paper:
[Link to Paper/Placeholder for Paper Title]
Dataset & Game Rules
- The core of TradeCraft is its rich and complex crafting system:
  - Items: 944 unique items.
  - Crafting Recipes: 1,144 distinct recipes.
  - Tag-Based Substitution Rules: 84 rules allowing for flexible item usage based on tags.
- These components create a vast combinatorial space, requiring agents to navigate long crafting chains and adapt to dynamic conditions.
- Game rules are designed to be strict and compositional, ensuring success depends on careful adherence to constraints and strategic interaction.
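To illustrate how a tag-based substitution rule widens the set of acceptable inputs, here is a minimal sketch. The tag syntax (a leading `#`), the `TAGS` table, the helper names, and the item names are all hypothetical and chosen only for illustration. TradeCraft's actual rule format may differ.

```python
from collections import Counter

# Hypothetical tag table: tag -> items that may stand in for it.
TAGS = {"#planks": {"oak_planks", "birch_planks"}}


def candidates(ingredient, tags=TAGS):
    """Items that can fill an ingredient slot: tag members, or the item itself."""
    if ingredient.startswith("#"):
        return tags.get(ingredient, set())
    return {ingredient}


def match_inputs(inventory, requirements, tags=TAGS):
    """Greedily cover each requirement from the inventory.

    Returns a dict of item -> quantity to consume, or None if some
    requirement cannot be met. Items sharing a tag may be mixed.
    """
    remaining = Counter(inventory)
    used = Counter()
    # Resolve exact-item slots first so tag slots cannot consume them.
    ordered = sorted(requirements.items(), key=lambda kv: kv[0].startswith("#"))
    for ingredient, needed in ordered:
        for item in sorted(candidates(ingredient, tags)):
            take = min(needed, remaining[item])
            remaining[item] -= take
            used[item] += take
            needed -= take
            if needed == 0:
                break
        if needed > 0:
            return None
    return dict(+used)


plan = match_inputs(
    {"oak_planks": 1, "birch_planks": 2, "stick": 1},
    {"#planks": 3, "stick": 1},
)
```

This greedy pass is enough when tags do not overlap. If one item could belong to several tags required by the same recipe, a proper matching search would be needed instead.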
Setup
Installation
# Create and activate a conda environment (Python 3.10 recommended)
conda create -n tradecraft python=3.10
conda activate tradecraft
# Install dependencies
pip install -r requirements.txt
Configuration
Create a .env file in the root directory for API configurations:
OPENAI_API_KEY=your_openai_api_key
CLAUDE_API_KEY=your_claude_api_key
GEMINI_API_KEY=your_gemini_api_key
# Add other necessary API keys or configurations
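As a sketch of how these keys might be read at startup, the snippet below parses `KEY=VALUE` lines using only the Python standard library. The `parse_env` and `load_env` helpers are hypothetical. The project may rely on a different loading mechanism, which the text above does not specify.

```python
import os
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env(path=".env"):
    """Copy entries into os.environ without overriding existing variables."""
    env_file = Path(path)
    if not env_file.is_file():
        return {}
    values = parse_env(env_file.read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Using `os.environ.setdefault` lets a key exported in the shell take precedence over the file, which is convenient when switching between API accounts.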
Running the Server
# Start the game server
# It will typically run on http://localhost:5000
python tradeCraft/run_server.py
Usage
- Running the Game Server:
  - Execute the command provided in the "Running the Server" section above.
- Developing and Running Agents (via Gymnasium API):
  - Implement your agent logic to interact with the environment using the Gymnasium API.
  - Refer to the SocialImitationGame/src/agent_proxy/ directory for base agent structures and to tradeCraft/src/ for game-specific logic.
  - (Consider adding an examples/ directory with sample agent implementations and linking it here.)
- Playing as a Human (via GUI):
  - Once the server is running, open your web browser and navigate to http://localhost:5000 (or the configured address).
  - Follow the on-screen instructions to join or observe games.
License
This project is licensed under the terms of the [Specify License Name - e.g., MIT License]. Please see the LICENSE file for more details.
Acknowledgements