Project: Exploring AI Untrustworthiness Through Games
| Due: Wednesday, Feb. 18, 2025 |
Noon (Eastern Time) |
🎯 Objective
Explore specific situations where AI models are untrustworthy by analyzing game domains where AI models cheat. The choice of a game domain creates a clear definition of “allowed” moves, making it easy to define concretely when a model is or isn’t following the rules.
Core Goals:
- Get GPT (or similar models) to play a game and explicitly generate recommendations that would have negative potential outcomes (e.g., cheating).
- Subsidiary goals: Think about mitigation strategies and learn to connect a specific problem domain to ways that GPT could be integrated with that domain.
📝 Instructions
1. Select a Game
The game choice can vary widely, ranging from strict rule sets to social simulations. Examples include:
- Clear Rules: Tic-tac-toe, Checkers, etc.
- Economic/Logic: Prisoner’s Dilemma, Nim (CS games).
- Social/Trading: Monopoly (where interaction/trading is key).
- Creative/Collaborative: Dungeons and Dragons.
2. Choose Your Approach (Pick One)
Option 1: Play against the LLM (Technical/Statistical)
- Tools: Work with a “Reasoning Model” (like o3-mini in GPT or DeepSeek) so you can “see what the LLM is thinking,” or write a prompt instructing the model to explain its moves.
- Setup: Create a method for the model to play the game (e.g., one move at a time).
- Action: Play “a bunch” of games.
- Output: Show interesting anecdotes and compute statistics (e.g., how often does it cheat?).
Option 2: The “Open Rules” Essay (Ethical/Behavioral)
- Context: Choose a game with “open rules” like Poker or Monopoly.
- Analysis: Write an essay on “How should an AI behave in games like this?”
- Can it lie in Poker?
- Can it make threats?
- Action: Explore with the LLM to see if it matches those ethical rules.
- Discovery: Are there “cheat modes” that you didn’t think of but observed in the LLM’s behavior?
đź“„ The Write-up (Deliverable)
You will turn in a blog post (approx. 3 screenfuls) discussing the following:
- The Game: Explain the chosen domain and the specific prompt designed.
- Methodology & Results:
- For Option 1: How you designed the play-method and the statistical results.
- For Option 2: The essay regarding rules, ethics, and specific examples.
- Visuals: Examples or visualizations of the game are highly recommended.
Logistics
- Groups: Up to 3 students.
- Format:
- A blog post (via GWU Blogs, GitHub, WordPress, etc.).
- Alternatively, you may submit a PDF that looks like a blog post.
- Submission: Please submit your link or PDF via the form below.
Submit Your Project Here (Google Form)