AutoMQ | Low Latency Diskless Kafka® on S3 @AutoMQ_Lab
AutoMQ runs Kafka on S3. 100% Kafka-compatible, with sub-10ms P99 latency, infinite storage, and elastic scaling.
automq.com · Singapore · Joined November 2023
AutoMQ is a next-generation Kafka built for the cloud, solving legacy Kafka pain points around local disks, data movement, scaling, and cost. Learn how it grew into one of the most widely followed open-source Diskless Kafka projects.
AutoMQ has reached 10K GitHub Stars, making it one of the most widely followed open-source Diskless Kafka projects.
This is a milestone we are truly grateful for.
Thank you to every developer who starred the repo, tried a deployment, opened an issue, challenged the design, shared feedback, or brought AutoMQ into a real Kafka evaluation.
AutoMQ started with a concrete belief: Kafka's protocol, semantics, and ecosystem are worth preserving, but durable stream storage should no longer have to stay tied to broker-local disks in the cloud.
From the first public open-source release, to developer questions, community discussions, and production deployments, AutoMQ has been shaped step by step by people who cared enough to look closely.
We wrote this blog to share that journey: why we chose Diskless Kafka, how the project earned developer attention, and why more Kafka teams are now asking how the storage model should evolve.
👉 Read the full blog: automq.com/blog/automq-10…
If you believe open-source Diskless Kafka is worth more attention, please help share this post so more developers can discover AutoMQ, read the code, and join the discussion.
For more Kafka and AutoMQ engineering insights, follow AutoMQ on X.com or join the AutoMQ Slack community:
go.automq.com/slack
#Kafka #ApacheKafka #DisklessKafka #OpenSource #CloudNative #StreamingData
Excited to share that AutoMQ was featured on the AWS Storage Blog.
For teams running Kafka on AWS, the classic trade-off has been difficult: keep low latency with local disks and pay for replication and cross-AZ traffic, or move toward object storage and accept higher write latency.
This new AWS blog shows a better path: Diskless Kafka with AutoMQ, Amazon S3, and Amazon FSx for NetApp ONTAP as the low-latency WAL layer.
The results are compelling:
- Average write latency under 10 ms
- Multi-AZ resilience
- Near-zero cross-AZ data plane traffic
- S3-level storage economics
- 94% cloud infrastructure cost savings in the blog's cost analysis
What I like most about this architecture is that it keeps Kafka protocol compatibility while changing the storage foundation: brokers become stateless compute, hot writes land on FSx for ONTAP, and historical data is flushed to S3.
This makes Diskless Kafka practical not only for log ingestion, but also for latency-sensitive workloads such as microservices, risk control, and trading systems.
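The write path described above can be modeled in a few lines: a write is acknowledged once it is appended to a low-latency WAL tier, and accumulated records are later uploaded to object storage as larger objects. This is a toy sketch under stated assumptions: `ToyWal`, `ToyObjectStore`, and `flush_threshold` are hypothetical in-memory stand-ins, not AutoMQ, FSx for ONTAP, or S3 APIs; the upload runs inline rather than in the background, and durability, replication, and failure handling are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store such as S3."""

    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


@dataclass
class ToyWal:
    """Hypothetical WAL: acks each record after appending it, uploads records in batches."""

    store: ToyObjectStore
    flush_threshold: int = 4  # records per uploaded object (illustrative value)
    pending: list[bytes] = field(default_factory=list)
    next_offset: int = 0
    flushed_objects: int = 0

    def append(self, record: bytes) -> int:
        # A real system would make this append durable in the WAL tier before
        # acknowledging; this toy only appends to an in-memory list.
        self.pending.append(record)
        offset = self.next_offset
        self.next_offset += 1
        if len(self.pending) >= self.flush_threshold:
            self.flush()
        return offset  # returning the offset plays the role of the producer ack

    def flush(self) -> None:
        # Upload all pending records as one object keyed by its first offset,
        # so many small writes become a single object-store request.
        if not self.pending:
            return
        first = self.next_offset - len(self.pending)
        self.store.put(f"segment-{first:020d}", b"\n".join(self.pending))
        self.pending.clear()
        self.flushed_objects += 1


store = ToyObjectStore()
wal = ToyWal(store)
offsets = [wal.append(f"msg-{i}".encode()) for i in range(10)]
print(offsets, wal.flushed_objects, len(wal.pending))
```

After ten appends with a threshold of four, two objects have been uploaded and two records are still waiting in the WAL. Batching is the point of the design: the WAL absorbs latency-sensitive small writes, while object storage receives fewer, larger writes.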
Read the AWS blog here: aws.amazon.com/blogs/storage/…
#Kafka #AutoMQ #AWS #AmazonS3 #FSx #StreamingData #CloudNative #EventStreaming
𝟏𝟎𝐊 𝐆𝐢𝐭𝐇𝐮𝐛 𝐬𝐭𝐚𝐫𝐬 𝐟𝐨𝐫 𝐨𝐩𝐞𝐧-𝐬𝐨𝐮𝐫𝐜𝐞 𝐃𝐢𝐬𝐤𝐥𝐞𝐬𝐬 𝐊𝐚𝐟𝐤𝐚 𝐨𝐧 𝐒𝟑.
When we launched AutoMQ v0.6.6 as an open-source project in November 2023, we started with one belief:
Kafka should become truly cloud-native.
Traditional Kafka was built around broker-local disks. At cloud scale, that often means expensive storage, cross-AZ replication cost, heavy partition reassignment, and complex operations around scaling.
AutoMQ takes a different path.
It is a Kafka-compatible streaming storage engine that moves Kafka storage from local disks to S3-compatible object storage, while keeping the clients, APIs, workloads, and ecosystem tools Kafka users already know.
Today, AutoMQ is one of the most mature open-source Diskless Kafka implementations, validated in production by industry-leading teams including Grab, JD.com, Tencent, Honda, and more.
Over the past two and a half years, AutoMQ has continued to contribute to the Kafka ecosystem:
1️⃣ Diskless Kafka on S3, with shared storage instead of broker-bound local disks
2️⃣ Stateless brokers for lighter scaling, reassignment, and recovery
3️⃣ Table Topic, bringing Kafka streams directly into Apache Iceberg tables without a separate ETL pipeline
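To make "heavy partition reassignment" concrete, a back-of-envelope sketch: with broker-local disks, moving a partition means copying its retained bytes to another broker, bounded by a replication throttle. The partition size and throttle below are hypothetical inputs, not measurements, and `copy_time_seconds` is an illustrative helper.

```python
GB = 1000**3
MB = 1000**2


def copy_time_seconds(partition_bytes: int, throttle_bytes_per_s: int) -> float:
    """Lower bound on the time to copy a partition's retained data to another broker.

    Ignores new writes that arrive during the move and any catch-up afterwards.
    """
    if throttle_bytes_per_s <= 0:
        raise ValueError("throttle must be positive")
    return partition_bytes / throttle_bytes_per_s


# Hypothetical inputs: 500 GB retained in the partition, 50 MB/s replication throttle.
local_disk = copy_time_seconds(500 * GB, 50 * MB)
# With shared storage the bytes already sit in object storage, so the copy term is zero;
# the remaining ownership/metadata handoff is not modeled here.
shared_storage = copy_time_seconds(0, 50 * MB)
print(f"local disk: {local_disk / 60:.0f} min, shared storage: {shared_storage:.0f} s")
```

The copy time scales linearly with retained data, which is why stateless brokers make scaling and reassignment lighter: the data term drops out entirely.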
10K stars is more than a milestone. It is a signal that the Kafka community is actively exploring a more cloud-native future.
Thank you to everyone who starred the repo, tried AutoMQ, opened issues, shared feedback, contributed code, or challenged us to make the project better.
This is just the beginning.
GitHub: lnkd.in/gt7NVRbh
#AutoMQ #ApacheKafka #Kafka #OpenSource #CloudNative #S3 #ApacheIceberg #StreamingData
Mimir 3.0 changes the Kafka question for observability teams.
With Grafana Mimir's ingest storage architecture, Kafka is no longer just a side dependency. It becomes the write-path commit boundary: distributors write samples to Kafka, Mimir acknowledges writes after Kafka durability, and ingesters consume from Kafka partitions.
That means self-hosted Mimir teams need to evaluate more than Kafka API connectivity.
1️⃣ If Kafka sits on the ingest path, broker-local disks can turn scaling into partition data movement.
2️⃣ In high-throughput metrics ingest, inter-broker replication and cross-AZ reads/writes can become a steady cost driver.
3️⃣ If Mimir already uses object storage for long-term blocks, keeping Kafka durability tied to broker disks creates another stateful storage lifecycle to operate.
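Point 2 can be quantified with a rough model of a leader-based, local-disk Kafka cluster. The assumptions are stated in the code: one replica per AZ, clients spread evenly across AZs, and a hypothetical 100 MB/s ingest rate. The result is a traffic volume, not a bill; multiply by your provider's inter-AZ rate to estimate cost.

```python
from fractions import Fraction

SECONDS_PER_MONTH = 30 * 24 * 3600


def cross_az_ratio(replication_factor: int = 3, consumer_groups: int = 1,
                   follower_fetching: bool = False) -> Fraction:
    """Expected cross-AZ bytes per produced byte in leader-based, local-disk Kafka.

    Assumptions: one replica per AZ (so #AZs == replication_factor), and producers
    and consumers spread evenly across AZs.
    """
    remote = Fraction(replication_factor - 1, replication_factor)  # P(leader in another AZ)
    produce = remote                              # producer -> leader
    replicate = Fraction(replication_factor - 1)  # leader -> each follower, all in other AZs
    # With follower fetching (KIP-392) every consumer has an in-AZ replica to read from.
    consume = Fraction(0) if follower_fetching else remote * consumer_groups
    return produce + replicate + consume


write_mb_per_s = 100                                              # hypothetical ingest rate
produced_gb = Fraction(write_mb_per_s * SECONDS_PER_MONTH, 1000)  # decimal GB per month
ratio = cross_az_ratio()
cross_az_gb = produced_gb * ratio
print(f"{float(ratio):.2f} cross-AZ bytes per produced byte")   # 3.33
print(f"{int(cross_az_gb):,} GB/month crossing AZs")           # 864,000
```

Inter-broker replication (the `replicate` term) dominates the total, and follower fetching removes only the consumer term. Moving durability into shared regional object storage is aimed squarely at that replication term.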
AutoMQ fits this architecture by keeping the Kafka interface Mimir expects while moving persistent Kafka data into shared object storage. Distributors and ingesters still use Kafka semantics; the storage and scaling model becomes Diskless.
👉 Read the full blog: automq.com/blog/automq-mi…
For more Kafka and AutoMQ engineering insights, please join the AutoMQ Slack community:
📷 go.automq.com/slack
#GrafanaMimir #Kafka #AutoMQ #DisklessKafka #Observability #Kubernetes #SRE
Diskless Kafka only becomes production-ready if teams can keep the Kafka ecosystem they already trust.
The architecture is attractive for a clear reason: moving durability away from broker-local disks can reduce replica-heavy storage overhead, make scaling lighter, and avoid large data movement during recovery or reassignment. But those gains only matter if existing clients, tools, operators, and Kafka semantics keep working.
That is why compatibility is the real production gate for Diskless Kafka:
1️⃣ Moving durable data to shared object storage can make brokers more stateless, but that value disappears if applications, SDKs, operators, or tooling need to change.
2️⃣ Rewriting the Kafka API means chasing the fastest-changing layer of Kafka: protocols, coordinators, transactions, consumer groups, Admin APIs, KRaft behavior, and edge-case fixes.
3️⃣ AutoMQ narrows the change to Kafka's Log/Segment storage boundary: S3Stream replaces local log storage, while the Kafka compute layer continues to run APIs, coordinators, transactions, consumer groups, and KRaft behavior.
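A minimal sketch of the idea in point 3: the compute layer talks to a narrow log-storage interface, so the storage backend can change without touching request handling. `LogStorage`, `LocalSegmentLog`, `SharedStreamLog`, and `handle_produce` are hypothetical names for illustration; this is not S3Stream's actual (Java) API, and both backends are in-memory stand-ins.

```python
from __future__ import annotations

from typing import Protocol


class LogStorage(Protocol):
    """Hypothetical narrow boundary between Kafka-style compute and storage."""

    def append(self, records: list[bytes]) -> int: ...  # returns the base offset
    def read(self, offset: int, max_records: int) -> list[bytes]: ...


class LocalSegmentLog:
    """Stand-in for broker-local log segments (a flat in-memory list here)."""

    def __init__(self) -> None:
        self._records: list[bytes] = []

    def append(self, records: list[bytes]) -> int:
        base = len(self._records)
        self._records.extend(records)
        return base

    def read(self, offset: int, max_records: int) -> list[bytes]:
        return self._records[offset : offset + max_records]


class SharedStreamLog:
    """Stand-in for a stream in shared object storage (batches keyed by base offset)."""

    def __init__(self) -> None:
        self._batches: dict[int, list[bytes]] = {}
        self._next = 0

    def append(self, records: list[bytes]) -> int:
        base = self._next
        self._batches[base] = list(records)
        self._next += len(records)
        return base

    def read(self, offset: int, max_records: int) -> list[bytes]:
        out: list[bytes] = []
        for base in sorted(self._batches):
            for i, rec in enumerate(self._batches[base]):
                if base + i >= offset and len(out) < max_records:
                    out.append(rec)
        return out


def handle_produce(storage: LogStorage, records: list[bytes]) -> int:
    # Protocol parsing, validation, transactions, etc. would live here and stay
    # identical no matter which backend is plugged in.
    return storage.append(records)


results = {}
for backend in (LocalSegmentLog(), SharedStreamLog()):
    handle_produce(backend, [b"a", b"b"])
    base = handle_produce(backend, [b"c"])
    results[type(backend).__name__] = (base, backend.read(1, 10))
print(results)
```

Both backends return the same offsets and records to the caller, which is the property that lets clients and tooling stay unchanged when the storage layer is swapped.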
This is why #AutoMQ treats Kafka compatibility as an architecture principle, not a feature checkbox: 𝐀𝐮𝐭𝐨𝐌𝐐 𝐤𝐞𝐞𝐩𝐬 𝐊𝐚𝐟𝐤𝐚 𝐩𝐫𝐨𝐭𝐨𝐜𝐨𝐥𝐬 𝐚𝐧𝐝 𝐬𝐞𝐦𝐚𝐧𝐭𝐢𝐜𝐬 𝐢𝐧 𝐭𝐡𝐞 𝐜𝐨𝐦𝐩𝐮𝐭𝐞 𝐥𝐚𝐲𝐞𝐫 𝐰𝐡𝐢𝐥𝐞 𝐫𝐞𝐩𝐥𝐚𝐜𝐢𝐧𝐠 𝐥𝐨𝐜𝐚𝐥 𝐥𝐨𝐠 𝐬𝐭𝐨𝐫𝐚𝐠𝐞 𝐰𝐢𝐭𝐡 𝐒𝟑𝐒𝐭𝐫𝐞𝐚𝐦.
👉 Read the full blog: automq.com/blog/diskless-…
For more Kafka and AutoMQ engineering insights, follow AutoMQ on LinkedIn or join the AutoMQ Slack community: go.automq.com/slack
#Kafka #ApacheKafka #DisklessKafka #DataInfrastructure #CloudNative #AutoMQ
@tobiaslins @alex_holovach It depends on how many VMs you use. On AWS, a minimum three-node cluster of 2C8G instances (2 vCPUs, 8 GB memory) can deliver over 120 MB/s of throughput.
Kafka availability is not just a replica-count question.
Coinbase’s May 2026 MSK outage is a useful reminder for Kafka platform teams: even a multi-AZ, multi-replica deployment can fail to recover cleanly if the full recovery path does not hold.
The deeper lesson is that Kafka availability has to be designed as an end-to-end recovery path, not a parameter checklist:
1️⃣ Inside one cluster, recovery depends on leader takeover, client reconnects, ISR health, and whether the remaining brokers can absorb shifted traffic.
2️⃣ Because traditional Kafka ties compute to local persistence, post-failure replica recovery, scaling, and partition reassignment can still involve heavy data movement.
3️⃣ Once recovery crosses regions, data replication is not enough unless offsets, downstream state, and failover routing line up.
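Point 1 can be checked with simple arithmetic before an incident: if brokers fail and their load spreads evenly over the survivors, each survivor's utilization grows by N/(N - failed). A minimal sketch; the even-spread assumption, the 80% limit, and the example utilization figures are all hypothetical.

```python
def utilization_after_failures(brokers: int, utilization: float, failed: int = 1) -> float:
    """Per-broker utilization after `failed` brokers drop out, assuming total load
    stays constant and is redistributed evenly across the survivors."""
    survivors = brokers - failed
    if survivors <= 0:
        raise ValueError("no surviving brokers")
    return utilization * brokers / survivors


def survives(brokers: int, utilization: float, limit: float = 0.8, failed: int = 1) -> bool:
    """True if the survivors stay at or below the utilization limit."""
    return utilization_after_failures(brokers, utilization, failed) <= limit


print(f"{utilization_after_failures(3, 0.6):.2f}")  # 0.90, above an 80% limit
print(f"{utilization_after_failures(6, 0.6):.2f}")  # 0.72
```

With local disks, restoring the lost capacity also means re-replicating the failed broker's data; with stateless brokers over shared storage, it is mainly a matter of adding compute.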
#AutoMQ approaches this on three fronts: Shared Storage reduces the binding between brokers and data, stateless brokers with fast scale-out restore capacity, and Async Kafka Linking DR with a Metadata-only Proxy combines offset-aligned recovery with seconds-level RTO.
👉 Read the full blog: automq.com/blog/coinbase-…
➡️ Follow AutoMQ on X.com, or join our Slack community for the latest Kafka and AutoMQ engineering insights.
📚 Join the AutoMQ Slack community: lnkd.in/gk8txSCU
#Kafka #ApacheKafka #DataInfrastructure #SRE #DisasterRecovery
𝐊𝐚𝐟𝐤𝐚 𝐦𝐢𝐠𝐫𝐚𝐭𝐢𝐨𝐧 𝐢𝐬 𝐧𝐨𝐭 𝐣𝐮𝐬𝐭 𝐫𝐞𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧.
#MirrorMaker2 can copy data across clusters. But when a Kafka cluster becomes part of production-critical infrastructure, the hardest part is usually not moving bytes. It is moving traffic.
The cutover is where risk concentrates:
1️⃣ Producers may move in batches, creating split-write risk if the migration path is not coordinated.
2️⃣ Consumers need a safe resume point, and offset translation becomes especially fragile for Flink, Spark Streaming, or Kafka Streams.
3️⃣ Rollback is no longer simple once new writes land on the target cluster.
That is why treating migration as “replicate first, stop writes later, then switch clients” often turns into a maintenance-window plan.
This article compares MirrorMaker2 and #𝐀𝐮𝐭𝐨𝐌𝐐 Linking from a migration perspective: data path, producer cutover, consumer coordination, offset continuity, stateful workloads, and rollback boundaries.
AutoMQ Linking is built around that cutover plane. It keeps offsets aligned, supports rolling producer migration, coordinates consumer groups, and helps teams move Kafka workloads without turning cutover into the riskiest part of the project.
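Why offset continuity matters for consumers can be shown with a simplified model of checkpoint-based offset translation: the replicator periodically records (source offset, target offset) pairs, and a migrated consumer resumes from the newest pair at or below its committed source offset, so it re-reads some records. This is a hypothetical simplification loosely inspired by MirrorMaker2's offset-sync idea, not its actual algorithm; `OffsetSync` and `translate` are illustrative names.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetSync:
    source: int  # offset in the source cluster's partition
    target: int  # offset of the same record in the target cluster's partition


def translate(syncs: list[OffsetSync], committed_source: int) -> tuple[int, int]:
    """Return (target resume offset, number of records that will be re-read).

    Uses the newest sync at or below the committed offset, so records are
    re-delivered rather than skipped. `syncs` must be sorted by source offset.
    """
    sources = [s.source for s in syncs]
    i = bisect.bisect_right(sources, committed_source) - 1
    if i < 0:
        raise LookupError("no offset sync at or below the committed offset")
    sync = syncs[i]
    return sync.target, committed_source - sync.source


# Hypothetical case: target offsets differ from source ones (the copy started
# mid-stream) and a sync is recorded only every 100 records.
syncs = [OffsetSync(1000, 0), OffsetSync(1100, 100), OffsetSync(1200, 200)]
committed = 1175  # Kafka semantics: the next source offset this group would read
resume, replayed = translate(syncs, committed)
print(resume, replayed)  # 100 75
```

For a stateless consumer those 75 records are duplicates; for Flink, Spark Streaming, or Kafka Streams they can also disturb state that was checkpointed against source offsets. When the target preserves source offsets one-to-one, as the post says AutoMQ Linking does, the mapping is the identity and the replay window disappears.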
👉 Read the full analysis: automq.com/blog/why-not-m…
#Kafka #ApacheKafka #KafkaMigration #MirrorMaker #StreamingData #DataEngineering #AutoMQ
OpenAI’s Kafka journey is a signal for the AI era.
As ChatGPT traffic grew, OpenAI scaled Kafka throughput 20x in one year, but also had to build Prism, Photon, UForwarder, and HA Cluster Groups around Kafka.
The article traces what that architecture reveals:
1️⃣ Proxy layers can make Kafka usable at massive scale, but they move complexity above the engine.
2️⃣ The trade-off was real: many workloads moved away from ordering, transactions, and partition-based processing.
3️⃣ The deeper direction is Diskless Kafka / storage-compute separation: less data movement, lower storage overhead, and lighter recovery.
Building heavy proxies is how teams survive scaling traditional Kafka. Redesigning the engine around storage-compute separation is how they avoid that architectural complexity in the first place.
That is where AutoMQ’s direction fits: not another layer around Kafka, but a Kafka-compatible engine built for the cloud-native Kafka path.
👉 Read the full analysis: automq.com/blog/openai-ka…
#Kafka #DataEngineering #CloudNative #AIInfrastructure #SRE #AutoMQ
🪣 AWS S3 Files brings sub-millisecond reads to S3. Can Kafka finally go diskless with S3 Files? Not so fast.
This blog tells the real story. S3 Files is a poor fit for Diskless Kafka:
📉 P99 latency spikes to 700 ms+ — a 140x jump from P95
💸 ~$106,000/month for just 100 MB/s write throughput
⚠️ Durability gap: with a replication factor of 1, a broker crash loses unflushed data
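A P99 roughly 140x the P95 means the slow writes form a small tail rather than a general slowdown. A hypothetical distribution reproduces that shape: if 98% of writes take 5 ms and 2% take 700 ms, the nearest-rank P95 stays at 5 ms while the P99 lands at 700 ms. The numbers are illustrative only, not the blog's measurements.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical write latencies in ms: 980 fast writes and 20 stalled ones.
latencies = [5.0] * 980 + [700.0] * 20
p95 = percentile(latencies, 95)
p99 = percentile(latencies, 99)
print(p95, p99, p99 / p95)  # 5.0 700.0 140.0
```

For a streaming system this tail matters more than the median: producers with bounded in-flight requests stall behind the slowest writes, so occasional 700 ms stalls cap sustained throughput.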
S3 Files optimizes for read-heavy, small-file workloads. Kafka is the opposite: sustained, high-throughput writes.
AutoMQ has already solved this. Its WAL (Write-Ahead Log) abstraction layer decouples storage from compute, letting users plug in different cloud storage backends — from EBS to S3 — and balance cost against performance for their workload.
The direction is right. The shortcut isn't.
👉 Read the full story: go.automq.com/kafka-on-s3-fi…
🌐 More about AutoMQ: go.automq.com/official?utm_s…
#ApacheKafka #Kafka #S3 #CloudNative #DataStreaming #CostOptimization #AutoMQ