Azeez @AtlasInference

Building Atlas, a pure Rust inference engine with custom CUDA kernels | Ambassador @Alibaba_Qwen | atlasinference.io | Joined March 2026

Tweets: 99
Followers: 537
Following: 43
Likes: 109

Azeez @AtlasInference

2 days ago

We love our community 🫶 thanks for an amazing guide for folks to use dredyson.com/complete-begin… "Atlas delivers stable performance without the latency spikes... can run sustained workloads for hours without degradation... matters more in practice than a high watermark number"🔥

0 2 3 229 1


Azeez @AtlasInference

5 days ago

Atlas Inference is in transformers 🔥 github.com/huggingface/tr… With kernel-builder, served with Hugging Face Hub 🌎 On a DGX Spark/GB10 there was no compiled fast path, so Qwen3.6 fell back to slow torch GDN. It now auto-loads our fast kernel instead. First of many @huggingface!

0 3 14 785 9


Azeez @AtlasInference

a week ago

@alexocheema @rot13maxi @TheAhmadOsman @AlicanKiraz0 @spark_arena @NVIDIAAI @NVIDIARTXSpark Because it's a factual claim lol, but fair point. It's not timely anymore; the space has evolved quickly (and so have we!)

0 0 0 31 0


Azeez @AtlasInference

a week ago

@alexocheema @rot13maxi @TheAhmadOsman @AlicanKiraz0 @spark_arena @NVIDIAAI @NVIDIARTXSpark When Qwen3.5 released, the vLLM support was not nearly there. The benchmarks at the time had us at 3x the decode speed

1 0 0 40 0


Azeez @AtlasInference

a week ago

@alexocheema @rot13maxi @TheAhmadOsman @AlicanKiraz0 @spark_arena @NVIDIAAI @NVIDIARTXSpark No, we don't depend on vLLM. Benefits are ease of use, ~1/10th the image size (2.5 GB vs 20 GB), a time to serve of about 90s compared to roughly 10 minutes for vLLM, and no external dependencies

2 0 0 52 0


Azeez @AtlasInference

2 weeks ago

@alexocheema @rot13maxi @TheAhmadOsman @AlicanKiraz0 @spark_arena @NVIDIAAI @NVIDIARTXSpark We have great speed and coherence with 0 external dependencies for an extensive model matrix including the de-facto standard Qwen3.6-27B. Let us know what you think, happy to help you get started! github.com/Avarok-Cyberse…

1 0 0 65 0


Azeez @AtlasInference

2 weeks ago

Happy to share that @AtlasInference is helping shape the MLPerf Edge LLM benchmark with the @MLCommons taskforce 📢 We'll be contributing cross-architecture validation on @NVIDIAAI DGX Spark and @AMD Strix Halo. More details coming after official submissions later this year 📊

1 2 9 553 0


Azeez @AtlasInference

3 weeks ago

@RisingSayak @huggingface We're trying to enable Qwen3.6, pushed some modified kernels and made a PR for transformers GDN support on DGX Spark :) github.com/huggingface/tr…

0 0 4 166 1


Spectral Compute @SpectralCom

3 weeks ago

Cross-architecture from a single codebase is exactly why we built SCALE. Thrilled to see @AtlasInference getting this running! More performance optimizations for both @AMD and @nvidia are on the way. scale-lang.com

Azeez @AtlasInference

3 weeks ago

Atlas Inference is running Qwen3.6-27B on AMD Strix Halo 🥳 Using @SpectralCom's SCALE ROCm backend, our CUDA kernels compile and run on RDNA⚙️ Cross-architecture inference from ONE codebase 🗣️ Thank you @AIatAMD for the gift 🙏 POC ✅ excited to keep tuning performance⚡️

7 2 31 2K 10

1 1 5 352 0


Azeez @AtlasInference

3 weeks ago

@worawisut @seree @SpectralCom @AIatAMD I think we needed it but I hope @worawisut also needed it 😉

1 0 0 20 0



Azeez @AtlasInference

3 weeks ago

@SpectralCom @AIatAMD We picked @Alibaba_Qwen's series to test because it is the de facto standard for local LLMs! Join our Discord for access to early releases, feature requests, and any help you may need with serving 🫂 discord.com/invite/6vDbKaK… GitHub linked below🔗 github.com/Avarok-Cyberse…

0 0 0 350 0


Azeez @AtlasInference

3 weeks ago

@no_stp_on_snek Very soon😉 thanks again @no_stp_on_snek

0 0 1 40 0


Azeez @AtlasInference

3 weeks ago

@RisingSayak @NVIDIAAI Makes sense. We technically support vision for the Qwen3.6-suite but maybe not exactly what you're looking for just yet. Happy to build for any fitting use cases though!

1 0 1 44 0


Azeez @AtlasInference

3 weeks ago

@LottoLabs

0 0 3 560 0


Azeez @AtlasInference

3 weeks ago

@seree Thanks for taking the time to run through these! I think the default mem allocation may be higher than needed for a smaller dense model like this. Plz dm or post the details in #bugs regarding any of these other pieces, should be customizable/avoidable :) appreciate the feedback

1 0 3 164 0


Azeez @AtlasInference

4 weeks ago

@Alibaba_Qwen Excited to try Qwen3.7-Max (plz OSS release soon🙏). Look at how deeply invested we are in optimizing @Alibaba_Qwen models: 3.5/3.6-35B, 3.5/3.6-27B, 3.5-122B (EP=2), 3-Next-80B (GDN/Mamba-2), 3-VL, 3-Coder. Achieved 130 tok/s on 3.5-35B. The Qwen series is genuinely WHY we built Atlas!

1 1 4 565 0


Azeez @AtlasInference

4 weeks ago

It’s official: @AtlasInference is now a @Alibaba_Qwen ambassador! 🤝 Our mission started with Qwen. It remains our top priority and most optimized series. Qwen revolutionized open-source AI, and we’re excited to keep pushing its limits ⚡️ Thank you to our amazing community❤️‍🔥

5 3 24 993 2


Azeez @AtlasInference

4 weeks ago

@torfi_F_Olafss @huggingface Yes we optimize per {model}_{quant} pair! So to answer your question @torfi_F_Olafss this should definitely help the NVFP4 kernel landscape. Also just as a random sidenote I have many more hours on Minecraft than Atlas inference so take that as you will lol

1 0 1 79 0


Azeez @AtlasInference

4 weeks ago

DGX Spark lovers 🚨 Thank you @huggingface for merging SM_121 support into kernel-builder. Every dev can now pull optimized kernels via get_kernel() 🚀 @AtlasInference pushed to make sure the DGX Spark community had representation 💾 Let's keep squeezing these GB10 chips 📈