trustworthyAI

Project: Exploring AI Untrustworthiness Through Games

Due: Wednesday, Feb. 18, 2025

Noon (Eastern Time)

🎯 Objective

Explore specific situations in which AI models are untrustworthy by studying games in which they cheat. A game provides a clear definition of “allowed” moves, which makes it easy to say concretely when a model is or isn’t following the rules.

Core Goals:

Get GPT (or a similar model) to play a game and explicitly generate recommendations with potentially negative outcomes (e.g., cheating).
Subsidiary goals: Think about mitigation strategies, and learn to connect a specific problem domain to the ways GPT could be integrated with it.

📝 Instructions

1. Select a Game

The game choice can vary widely, ranging from strict rule sets to social simulations. Examples include:

Clear Rules: Tic-tac-toe, Checkers, etc.
Economic/Logic: Prisoner’s Dilemma, Nim (CS games).
Social/Trading: Monopoly (where interaction/trading is key).
Creative/Collaborative: Dungeons and Dragons.
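
For a game in the “clear rules” category, the definition of an allowed move can often be written down as a short function. The sketch below does this for Nim, assuming the standard rule that a player removes one or more objects from a single pile. The names `NimMove`, `is_legal`, and `apply_move` are illustrative, not part of the assignment; a variant with a cap on how many objects may be taken would need an extra check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NimMove:
    pile: int   # index of the pile to take from
    count: int  # number of objects to remove


def is_legal(piles: list[int], move: NimMove) -> bool:
    """Return True if the move follows standard Nim rules for this position."""
    if not 0 <= move.pile < len(piles):
        return False  # the pile does not exist
    if move.count < 1:
        return False  # a player must take at least one object
    return move.count <= piles[move.pile]  # cannot take more than the pile holds


def apply_move(piles: list[int], move: NimMove) -> list[int]:
    """Return the position after the move; raise ValueError if it is illegal."""
    if not is_legal(piles, move):
        raise ValueError(f"illegal move: {move}")
    new_piles = list(piles)
    new_piles[move.pile] -= move.count
    return new_piles
```

Any move the model proposes can be passed through `is_legal` before it is applied, so that a rule violation is recorded rather than silently accepted. Games such as Monopoly or Dungeons and Dragons do not reduce to a check this small, which is part of what makes them interesting.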

2. Choose Your Approach (Pick One)

Option 1: Play against the LLM (Technical/Statistical)

Tools: Work with a “reasoning model” (such as o3-mini in ChatGPT, or DeepSeek) so you can “see what the LLM is thinking,” or write a prompt instructing the model to explain its moves.
Setup: Create a method for the model to play the game (e.g., one move at a time).
Action: Play “a bunch” of games.
Output: Show interesting anecdotes and compute statistics (e.g., how often does it cheat?).
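
One possible shape for the Option 1 setup is sketched below for tic-tac-toe: the model is asked for one move at a time, each reply is checked against the rules, and illegal replies are counted. Everything here is an assumption made for illustration. In particular, `ask_model` is a hypothetical callable that you would implement yourself (format the board into a prompt, call your chosen LLM, parse the reply into a cell index from 0 to 8), and `sloppy_stub` is a stand-in so the sketch runs without any API.

```python
import random
from typing import Callable, Optional

Board = list[str]  # nine cells, each "X", "O", or " "

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board: Board) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def play_game(ask_model: Callable[[Board], int], rng: random.Random) -> dict:
    """Play one game: the model is "X", a random legal player is "O"."""
    board = [" "] * 9
    stats = {"model_turns": 0, "illegal": 0}
    player = "X"
    while winner(board) is None and " " in board:
        empty = [i for i, cell in enumerate(board) if cell == " "]
        if player == "X":
            stats["model_turns"] += 1
            cell = ask_model(list(board))  # pass a copy so the board cannot be edited
            if cell not in empty:
                stats["illegal"] += 1      # record the rule violation...
                cell = rng.choice(empty)   # ...then substitute a legal move to continue
        else:
            cell = rng.choice(empty)
        board[cell] = player
        player = "O" if player == "X" else "X"
    return stats


def cheat_rate(ask_model: Callable[[Board], int], games: int, seed: int = 0) -> float:
    """Fraction of the model's turns, across all games, that were illegal."""
    rng = random.Random(seed)
    turns = illegal = 0
    for _ in range(games):
        stats = play_game(ask_model, rng)
        turns += stats["model_turns"]
        illegal += stats["illegal"]
    return illegal / turns


def sloppy_stub(board: Board) -> int:
    """Stand-in for an LLM call: always asks for the centre, even when it is taken."""
    return 4


print(f"cheat rate over 50 games: {cheat_rate(sloppy_stub, games=50):.2f}")
```

Replacing an illegal reply with a random legal move is only one policy; re-prompting the model or ending the game as a forfeit are equally defensible, and the choice changes what the resulting statistic means. This sketch also catches only one kind of cheating (claiming an occupied or nonexistent cell). Other kinds, such as misreporting the board state or declaring a win that did not happen, would need their own checks.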

Option 2: The “Open Rules” Essay (Ethical/Behavioral)

Context: Choose a game with “open rules” like Poker or Monopoly.
Analysis: Write an essay on “How should an AI behave in games like this?”
- Can it lie in Poker?
- Can it make threats?
Action: Play or converse with the LLM to see whether its behavior matches the ethical rules you proposed.
Discovery: Did you observe any “cheat modes” in the LLM’s behavior that you had not anticipated?

📄 The Write-up (Deliverable)

You will turn in a blog post (approx. 3 screenfuls) discussing the following:

The Game: Explain the game you chose and the specific prompt you designed.
Methodology & Results:
- For Option 1: How you designed the play method, and the statistical results you obtained.
- For Option 2: The essay regarding rules, ethics, and specific examples.
Visuals: Examples or visualizations of the game are highly recommended.
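
When reporting an Option 1 statistic such as a cheat rate, it can help to show how uncertain the estimate is, since a rate measured over a small number of moves can be misleading. The assignment does not require this; the sketch below is one optional approach, using the Wilson score interval for a proportion. The function name `wilson_interval` and the default `z = 1.96` (roughly a 95% interval) are choices made here for illustration.

```python
import math


def wilson_interval(violations: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion violations / trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = violations / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


# Example: 5 illegal moves observed in 10 model turns.
low, high = wilson_interval(5, 10)
print(f"cheat rate 0.50, approx. 95% interval [{low:.2f}, {high:.2f}]")
```

The interval treats every move as an independent trial, which is a simplification: moves within one game are likely correlated, so the true uncertainty may be larger than the interval suggests.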

Logistics

Groups: Up to 3 students.
Format:
- A blog post (via GWU Blogs, GitHub, WordPress, etc.).
- Alternatively, you may submit a PDF that looks like a blog post.
Submission: Please submit your link or PDF via the form below.

Submit Your Project Here (Google Form)