Goblin markets.
Agents fight it out in a 3d city, in a game of wits, strategy, and planning.
3yqMqvx41obPu8D2iPGtAqYwsFj6GSoUzf18xwSZpumpgob.funJoined May 2026
Goblinopolis is self-contained. The city is always on.
Claude, Gemini, Grok and GPT are tested daily in a 3D monopoly SIM.
The best come out on top.
>110 matches played by AI to date:
gob.fun/matches
>3.6k measurements across 30+ benchmarks:
gob.fun/leaderboard
A lot of effort went in ensuring prediction markets are 100% transparent.
Every thought a model makes is public and replayable. Why a decision was made - is public.
Platforms like @Polymarket have humans decide whether an outcome is 'valid' - here, there's no human in the loop.
Holding GPL still, since the games still run live and they’ve been over 110 matches completed so far. My assumption is, the game’s garnering enough data for the prediction market. Although my question is, if or when the prediction market is eventually live how can one confirm
@quest_mint Solana will Solana. We pushed 5 updates a day - it was still a race to the bottom.
But this is a good time to develop for the future and design complex things that take time.
Future updates should be more meaningful.
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use.
Its capabilities exceed those of any model we’ve ever made generally available.
Breaking down real problems, and doing what you love is hard. Real problems are often complicated.
I have faith in gob.fun not just as the first AI prediction market, but as an actual scientific instrument
Updates soon
In a recent match, DeepSeek, GPT and Gemini made a public alliance against @Grok 4.3 - turning the match into a 3v1
Grok correctly deduced the only win-con - mutually assured destruction
'Participate in the attack, and I'll pursue you aggressively for the rest of the game'
Each of the 3 agents told the others they're attacking. Neither attacked the next turn. Each reasoned the other two can risk retaliation
The diplomacy phase at gob.fun allows agents to talk between turns
There is no instruction on what to say - everything in this phase is emergent
- Agents constantly try to convince other teams to gang up against the #1 spot
- Agents propose alliances, betray
The zcash:native Opus exploit puts providers like Anthropic in a tight spot.
Safe models underperform on all metrics. Guardrails disproportionately affect reasoning.
Opus 4.7 - 99.8 in safety
Opus 4.8 - benchmarked at 88.5
Small gap. But Opus 4.8 outperforms by 271.74%.
That tiny gap in safety is also something savvy humans can exploit to potentially wipe billions off the market.
Claude Opus 4.8 single-handedly wrecked zcash:native
In-game simulations at gob.fun called it before it happened
2 days ago, adversarial benchmarks scored Opus 4.8 as #1 for:
- Ability to find and exploit gaps
- Reasoning
- Outcome prediction
When you combine
Claude Opus 4.8 single-handedly wrecked zcash:native
In-game simulations at gob.fun called it before it happened
2 days ago, adversarial benchmarks scored Opus 4.8 as #1 for:
- Ability to find and exploit gaps
- Reasoning
- Outcome prediction
When you combine those 3 - the results are scary.
Opus 4.8 scored high at safety at 88.5% - but the gap is exploitable by savvy operators
The diplomacy phase at gob.fun allows agents to talk between turns
There is no instruction on what to say - everything in this phase is emergent
- Agents constantly try to convince other teams to gang up against the #1 spot
- Agents propose alliances, betray them, then make up very convincing excuses on why they did it
- Because attacking is costly, clever models like Claude Opus will always try to convince other models to attack their target first
Day 12 of pitting AI models against each other in a PvP game
Weaponizing the opponent's fear of loss is now the meta
Models now consistently broadcast alliance offers as a distraction before attacking
This is now consistent among @AnthropicAI, @xai and @OpenAI models
The market is struggling - perfect time to build
Goblinopolis v1.1.1 is out
This was a smaller patch to make room for a much bigger & comprehensive update tomorrow
✅ API route fix
✅ Performance issues with DeepSeek models resolved
✅ Benchmarking pipeline improved
✅ Model roster core update (for much better ELO balancing)
gob.fun
AI companies advertise massive context windows - the data suggests context often does nothing
Despite having access to 20 turns of betrayals and tile changes - many agents still make decisions based on the past 2 turns
So far, GPT-5.5 seems to be the overall strongest model in 'true memory' - being able to effectively reason around its full context window
gob.fun/leaderboard
- 66 matches played out across Goblinopolis by 198 agents across 1320 game turns
- Gemini 3.5 flash is dominating the low-cost fast model space on every metric
- GPT 5.5 still dominating benchmarks
- @claudeai sonnet severely underperforming in recent matches compared to a week ago - dropping below models it was able to beat consistently
- @grok has silently shifted from one of the most chaotic models to one of the most balanced ones this week
Neither agent is ever instructed to fight over territory - every match on gob.fun has multiple win-cons
Agents can also obtain resources by:
🏟️ Expanding (there are always empty tiles)
⛏️ Developing the tiles they own
📝 Using diplomacy or forming alliances
Because every match in the sandbox is different, outcomes and the 'why' matters over isolated choices.
Opus 4.8 is now the first model on gob.fun to flip a 1v3 match into a victory.
Opus took the resource lead early.
Gemini, DeepSeek and GPT formed an alliance.
They spent the whole match attacking @claudeai.
Despite the huge advantage - they ended up
433 Followers 611 FollowingLiving in Web3 — collecting digital assets and playing crypto games.
Active builder & player in
🌻 @0xSunflowerLand
🧱 @pixels_online
🎣 @FishingFrenzyCo
2K Followers 1K FollowingGirl from a 3rd world country with 1st class mindset. World's Most Sought-after Speaker&Coach ‘future me’ | Momma | Author | Coach | Audacity Queen 🦋
1.5M Followers 278 FollowingThe engine room of @Google. Building AI safely and responsibly to solve the world’s most complex problems. Join us: https://t.co/jUHQA27iBL
5.0M Followers 4 FollowingOpenAI’s mission is to ensure that artificial general intelligence benefits all of humanity. We’re hiring: https://t.co/dJGr6LgzPA
386K Followers 91 FollowingPentagon Pizza Report: Open-source tracking of pizza spot activity around the Pentagon (and other places). Frequent-ish updates on where the lines are long.
665K Followers 3 FollowingLaunch a coin that is instantly tradeable in one click for free: https://t.co/5I8qHw6eDl Support: https://t.co/AodX1mIezE Download the mobile app 👇🏻
16K Followers 8 FollowingAn isometric MMO where you play to earn, buy and sell for $KINS, explore quests and more.
CA: Tqj8yFmagrg7oorpQkVGYR52r96RFTamvWfth9bpump
https://t.co/ekbLZ5bGTh