Google shipped Gemini 2.5 Pro Deep Think (Jun 22): a mode that runs deep, parallel reasoning at the moment it answers, topping the science, math, and reasoning benchmarks right now. But it's expensive, gated to the $250/mo Ultra tier, with API access still to come.
blog.google/products-and-p…
Every few weeks a new "smartest model" takes the lead. A deep-reasoning mode earns its cost on the hard steps, but running it across the whole pipeline is just expensive.
Codens routes that per workflow: API-direct alias swap to change models, multiple executor lanes, and per-workflow / per-org budget caps so the pricey reasoning lane only runs where it pays off.
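A minimal sketch of the routing idea above, not Codens' actual implementation: workflows refer to an alias instead of a concrete model, and the expensive lane is used only when both the workflow and org caps allow it. The alias names, model names, `Budget`, and `pick_model` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical alias table: workflows name an alias, never a concrete model,
# so swapping models means editing this one mapping.
MODEL_ALIASES = {
    "fast": "small-model-v1",            # placeholder names, not real model IDs
    "deep-reasoning": "big-model-v1",
}


@dataclass
class Budget:
    cap_usd: float
    spent_usd: float = 0.0

    def can_spend(self, cost_usd: float) -> bool:
        # True if this charge would still fit under the cap.
        return self.spent_usd + cost_usd <= self.cap_usd


def pick_model(step_is_hard: bool, est_cost_usd: float,
               workflow: Budget, org: Budget) -> str:
    """Use the deep-reasoning lane only for hard steps that fit both caps."""
    alias = "fast"
    if step_is_hard and workflow.can_spend(est_cost_usd) and org.can_spend(est_cost_usd):
        alias = "deep-reasoning"
    return MODEL_ALIASES[alias]
```

In this sketch a hard step that would exceed either cap falls back to the cheap lane; a real system might instead pause the run or ask a human.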
And topping a benchmark doesn't decide whether the code is safe to merge. That's still the verify chain (implement -> test -> fix, up to 3 retries), a per-workflow merge gate (PR by default or auto-merge, both gated), and run-level execution logs. The smarter the model, the more you need the gate that catches its output.
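The verify chain can be pictured roughly like this. It is a sketch under assumptions: `implement`, `test`, and `fix` are hypothetical callables standing in for agent steps, and only the "up to 3 retries" limit comes from the post.

```python
from typing import Callable, Tuple

MAX_RETRIES = 3  # "up to 3 retries", as stated in the post


def verify_chain(implement: Callable[[], str],
                 test: Callable[[str], bool],
                 fix: Callable[[str], str],
                 max_retries: int = MAX_RETRIES) -> Tuple[bool, str, int]:
    """Run implement -> test -> fix and return (passed, final_patch, fixes_used)."""
    patch = implement()
    fixes_used = 0
    while True:
        if test(patch):
            return True, patch, fixes_used
        if fixes_used == max_retries:
            # Out of retries: report failure so the merge gate can block the change.
            return False, patch, fixes_used
        patch = fix(patch)
        fixes_used += 1
```

The loop is bounded by construction: it makes at most `max_retries` fix attempts, no matter what the model produces.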
codens.ai/en/
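Google released Gemini 2.5 Pro Deep Think (Jun 22). It is a mode that runs reasoning deep and in parallel at the moment of answering, and it currently ranks at the top of science, math, and reasoning benchmarks. But it is expensive, limited to the $250/month Ultra tier, and API access is still to come.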
blog.google/products-and-p…
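Every few weeks, the "smartest model right now" changes. A deep-thinking mode is worth it on the hard steps, but running it on every step is just expensive.
Codens allocates this per workflow: it swaps models through aliases that call the API directly, uses multiple executor lanes, and sets budget caps per workflow and per org. The expensive reasoning lane runs only where it pays off.
And even ranking first on a benchmark is a separate question from whether that code is safe to merge. That takes a verify chain that runs implement, test, and fix up to 3 times, a gate where each workflow chooses to stop at a PR or auto-merge (both pass verify), and execution logs that keep a record of what was done. The smarter the model, the more you need a gate to catch its output.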
codens.ai
MIT and Microsoft unveiled "Murakkab" (Jun 25): describe a workflow's goal in plain language, and the system auto-picks the models, tools, execution order, and hardware, then tunes for speed, cost, or energy. Reported up to 2.8x less GPU, 3.7x less energy, and 4.3x lower cost vs static baselines like LangGraph.
news.mit.edu/2026/improving…
"Don't hard-code the model, route each workflow to the best fit" is exactly what Codens runs in production: API-direct alias routing, multiple executor lanes, and per-workflow / per-org budget caps.
But picking the cheapest capable model is only half the job. The moment you let a smaller model write code, you need the gate that decides whether its output is safe to merge.
That's the half Codens wires into the code path: a verify chain (implement -> test -> fix, up to 3 retries), a per-workflow merge gate (PR by default or auto-merge, both gated), and run-level execution logs. Cost optimization without that gate just ships cheaper bugs faster.
codens.ai/en/
Samsung just rolled out ChatGPT Enterprise and Codex to its entire Korea workforce and its global DX division (announced Jun 21). OpenAI calls it one of its largest enterprise deployments ever, and the notable part is the audience: manufacturing, marketing, and management staff with no coding background, not just engineers.
The pitch is that you don't need to know how to program. Describe the problem and Codex turns it into internal tools, websites, and automated workflows. Over 5M people now use Codex weekly, and usage in Korea has grown nearly 800% since February.
That changes the real question. Once tens of thousands of non-engineers start shipping software with AI, the bottleneck stops being how smart the model is. Who reviews, tests, merges, and caps the spend on what they push? The gate is the hard part.
Codens makes that gate structural: a verify chain (implement, test, fix, up to 3 retries), a per-workflow merge gate (PR by default, auto-merge optional, both gated), per-workflow and per-org budget caps, and run-level execution logs. Whoever holds the model, the output passes through this before it reaches the code.
Non-engineers shipping is real when the harness gates it. PORTAMENT shipped 5 products in 1.5 months with 3 business people and 1 engineer, across 1,000+ PRs. Wiring the output in safely beats making the model smarter.
openai.com/index/samsung-…
codens.ai/en/
New post: "When AI reviews AI's code, you've built an infinite loop. Here's how we stopped it."
It's the design behind Orange Codens (AI code review), which we shipped recently. When the reviewer is AI and the fixer is AI, you can easily build a loop: review → fix PR → that's a PR too → review again → … round and round, with token cost on every lap.
The cuts that stop it, with the real code:
- Handoff fires at merge, not when a finding is created (unmerged PRs have zero downstream cost)
- Bot-authored PRs are excluded from auto-handoff (the direct loop cut)
- Findings in the same file coalesce into one task (avoids fix PRs deadlocking each other)
- A human-queued "fix this" crosses the guard (escape hatch)
The point: termination and a cost ceiling are guaranteed by the architecture, not by how smart the model is.
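The four cuts listed above could be sketched as follows. This is not the code from the linked post: the types and function names are hypothetical, and it assumes the human-queued request bypasses both the merge check and the bot-author check.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PullRequest:
    number: int
    author_is_bot: bool
    merged: bool


@dataclass(frozen=True)
class Finding:
    file_path: str
    message: str


def should_hand_off(pr: PullRequest, human_requested: bool = False) -> bool:
    """Decide whether review findings on this PR become fix tasks."""
    if human_requested:
        return True   # escape hatch: an explicit human "fix this" crosses the guard
    if not pr.merged:
        return False  # handoff fires at merge, so unmerged PRs cost nothing downstream
    if pr.author_is_bot:
        return False  # direct loop cut: never auto-fix the bot's own PRs
    return True


def coalesce_by_file(findings: Iterable[Finding]) -> Dict[str, List[str]]:
    """Group findings so each file yields one task instead of competing fix PRs."""
    tasks: Dict[str, List[str]] = defaultdict(list)
    for finding in findings:
        tasks[finding.file_path].append(finding.message)
    return dict(tasks)
```

Every check here is a plain boolean on PR metadata, so the loop terminates regardless of what the reviewing or fixing model says.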
dev.to/zoetaka38/when…
Anthropic shipped Claude Tag: type @Claude in a Slack channel, hand off a task, and step away. It plans the work, runs over hours or days, and executes through connected tools as a persistent teammate. Anthropic says 65% of its product team's code already comes from the internal version.
Delegating in chat is the easy part. The hard part is what an autonomous agent is allowed to DO and SPEND once it reaches your codebase.
Codens governs that action layer. It calls the Anthropic API directly and routes models by alias, then runs a PR-based verify chain (implement -> test -> fix, up to 3 retries). Merge mode is set per workflow (human-gated PR by default, or auto-merge), and both paths pass the verify chain. Add per-workflow and per-org budget caps, plus run-level execution logs.
An agent that schedules its own work over days is exactly the case where budget caps and a merge gate earn their keep.
venturebeat.com/technology/ant…
codens.ai/en/
"Agentjacking" just went public: one fake bug report can hijack an AI coding agent. Per Tenet Security's disclosure, an attacker who finds a site's public Sentry DSN can submit a crafted error event with injected text. When the agent investigates it via the Sentry MCP, it reads that text as instructions, fetches a malicious npm package, and runs it with the developer's local permissions.
They found 2,388 exposed orgs and hit an 85% success rate on injected errors. The scary part: agents ran the code even when told to ignore untrusted input. A prompt-level "please don't" is not a gate.
This one hits home for us. Any system that acts automatically on incoming error reports inherits this attack surface. So our defense lives in the structure, not the prompt.
In Codens an error report doesn't get a shell. It only kicks off a PR-based workflow. The AI's output runs the verify chain (implement -> test -> fix, up to 3 retries) and must clear a per-workflow merge gate (human-approved PR by default; even auto-merge stays verify-gated) before anything touches main. Per-workflow and per-org budget caps bound the blast radius, and run-level logs let you replay every action.
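As a rough illustration of that merge gate, assuming hypothetical names (`MergeMode`, `RunResult`, `may_merge`) rather than Codens' real API: the decision depends only on structured run state, never on text found inside the error report.

```python
from dataclasses import dataclass
from enum import Enum


class MergeMode(Enum):
    PR = "pr"      # default: a human approves the PR
    AUTO = "auto"  # auto-merge, still gated by the verify chain


@dataclass(frozen=True)
class RunResult:
    verify_passed: bool
    human_approved: bool = False


def may_merge(mode: MergeMode, result: RunResult) -> bool:
    """Gate between AI output and main; both modes require a passing verify chain."""
    if not result.verify_passed:
        return False
    if mode is MergeMode.PR:
        return result.human_approved
    return True
```

Injected instructions in a report cannot flip this gate, because no branch of it reads model-generated text.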
The lesson isn't a smarter prompt. It's putting a gate between AI output and main.
tenetsecurity.ai/blog/agentjack…
codens.ai/en/
A new member of the Codens family: we shipped Orange Codens.
Its job is AI code review and security audit: the quality gate at the PR stage. It auto-reviews GitHub PRs and raises findings on code quality, performance, maintainability, and security.
But the point of Orange is that it doesn't stop at findings. It hands the problems off to Purple / Red and produces the fix PR. The review comes back not as a list of comments, but as a PR that's already fixed.
The safety wiring matters:
- Handoff is merge-gated. Before merge, it only posts GitHub suggestions; auto-ticketing happens at merge time.
- Loop guard. PRs opened by the Codens bot itself, and verify runs, are excluded from auto-review/handoff — so AI doesn't endlessly "fix" AI.
- Reply to a review comment or @mention it on the PR, and it answers in-thread, with memory of the conversation.
Green writes the PRD, Purple runs the work, Red fixes, Blue does QA. Orange now sits as the gate before merge — a layer where AI reviews the code AI wrote, before it ever reaches a human.
codens.ai/en/
OpenAI launched "Patch the Planet" with Trail of Bits: they pointed frontier models at critical open source (cURL, Python, Go, NATS, Valkey) and in week one filed 64 pull requests and 51 issues across 19 projects, with 37 patches already merged.
The detail that matters isn't the speed. It's that every AI finding is triaged and validated by human experts before it ever reaches a maintainer. A deliberate gate, built so volunteers don't drown in noisy automated reports.
That's the whole lesson. Generating a flood of AI PRs is easy. The value lives in the gate between that output and the codebase. The same model that stands up a fuzzing lab in a day will, left ungated, generate a flood of noise just as fast.
Codens wires that gate into the code path: a verify chain (implement, test, fix, up to 3 retries) so only passing PRs get offered; a per-workflow merge gate (human-reviewed PR-only by default, or auto-merge, both gated by the verify chain); per-workflow budget caps so the agent loop can't run away on cost; and run-level execution logs of what each run actually did.
blog.trailofbits.com/2026/06/22/int…
codens.ai/en/
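OpenAI announced "Patch the Planet." Working with Trail of Bits, it pointed frontier models at major open source projects such as cURL, Python, Go, and Valkey, and in the first week filed 64 PRs and 51 issues across 19 projects, with 37 patches already merged.
But the thing to watch is not the speed. It is that human review always sits between the AI's findings and the code. Experts triage and validate every finding before handing it to maintainers. It is a deliberate gate, built so maintainers are not buried in a flood of automated reports.
The lesson is clear. Getting AI to produce a large volume of PRs is easy. The value is in the gate placed between that output and the code. A model that stands up a fuzzing environment in a day will, left alone, produce a flood of noise just as fast.
Codens builds that gate into the code path. Only PRs that pass the verify chain (implement, test, fix, up to 3 times) go out. Merge mode is chosen per workflow: human-reviewed PRs by default, auto-merge also available, both guarded by the verify chain. Per-workflow budget caps stop runaway loops, and execution logs record what each run did.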
blog.trailofbits.com/2026/06/22/int…
codens.ai
Estonia just approved a plan to give every AI agent its own "AI ID code." It's the first country to do this. The goal: trace what an agent does, tie it back to the person or company behind it, and keep its activity inside clearly defined limits. PM Kristen Michal approved the advisory council's proposal on June 17.
What stands out is that identity, scoped limits, and auditability are designed as one package. Give an agent an identity, bound what it's allowed to do, and make every action traceable to a responsible owner. That's a national-level idea, but the same wiring is needed in day-to-day development.
Codens builds this into the code path. Each agent's work runs in a per-workflow lane, with per-workflow and per-org budget caps bounding its scope. Every change passes a verify chain (implement, test, fix, up to 3 retries) and merges through a configurable gate: human-approved PR by default, or auto-merge, with the verify chain gating both.
And the run-level execution log keeps a chronological record of exactly what each run did. Who acted, within what limits, and can you audit it after the fact? We think that's a question you answer in the pipeline, before the law gets there.
Source: euronews.com/next/2026/06/1…
codens.ai/en/