All HF Hub posts
AxionLab-official
posted an update 2 days ago
matteospanio
posted an update 1 day ago
Post
6157
🎶 Released mule-torch — an unofficial PyTorch port of MULE (SF-NFNet-F0), SiriusXM/Pandora's music-audio embedding model (McCallum et al., ISMIR 2022).
No retraining: I re-implemented the architecture in pure PyTorch and transferred the original TensorFlow weights, then checked it layer by layer against the genuine TF pipeline.
✅ End-to-end clip-embedding cosine 0.9999999 vs the original
✅ ONNX backbone parity < 1e-6
✅ 62.35M params (paper: ~62.4M)
✅ Batched, GPU-native, ONNX-exportable: none of which the original Analysis pipeline does
pip install mule-torch
from mule_torch import MuleModel
emb = MuleModel.from_pretrained()(waveform)  # (B, T)@16kHz -> (B, 1728)
🤗 Weights: matteospanio/mule
💻 Code: https://github.com/matteospanio/mule-torch
📦 PyPI: https://pypi.org/project/mule-torch/
The fun bug: parity was perfect through every conv but the block output was anti-correlated (cos = −1). Cause: the learnable skip-init gains couldn't be mapped by layer name (Keras scrambles the order) — they had to be recovered from the graph.
⚠️ Unofficial, community port — not affiliated with or endorsed by the original authors. All credit to them; please cite the paper. Weights inherit CC-BY-NC-4.0.
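The layer-by-layer parity check and the cos = −1 symptom described in this post can be sketched in a few lines. This is a minimal illustration in plain Python, not code from mule-torch: the function names, the toy activations, and the tolerance are all assumptions. It only shows how a sign flip after a matching conv output surfaces as a cosine of −1 at the block output.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def first_mismatch(reference, candidate, tol=1e-6):
    """Return the name of the first layer whose cosine falls below 1 - tol."""
    for name in reference:
        if cosine(reference[name], candidate[name]) < 1.0 - tol:
            return name
    return None

# Hypothetical activations: the conv output matches, the block output is negated
ref = {"conv": [0.5, -1.0, 2.0], "block_out": [1.0, -2.0, 4.0]}
port = {"conv": [0.5, -1.0, 2.0], "block_out": [-1.0, 2.0, -4.0]}

print(first_mismatch(ref, port))
print(round(cosine(ref["block_out"], port["block_out"]), 6))
```

Walking the layers in order narrows the search to the first stage that disagrees, which here is the block output rather than any conv.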
AxionLab-official
posted an update 5 days ago
Post
6486
We're happy to announce that we have released a reasoning-tuned version of Supra-50M!
SupraLabs/Supra-50M-Reasoning
Jiaqi-hkust
posted an update about 2 hours ago
Post
46
Happy to introduce Response-G1 #ACL2026 — a proactive agent for streaming video understanding.
📄 Paper: http://arxiv.org/abs/2605.07575
📷 Code: http://github.com/kadmkbl/Response-G1
We would be happy to discuss it further!
#ACL2026 #AI #Multimodal #VideoUnderstanding #OpenSource #LLM
danielhanchen
posted an update about 3 hours ago
Post
57
Google releases Gemma 4 QAT. ✨
You can now run Gemma 4 with 3x less memory and near-original performance.
QAT makes it possible to run Gemma 4 26B-A4B on 16GB RAM.
GGUFs: https://huggingface.co/collections/unsloth/gemma-4-qat
QAT Guide: https://unsloth.ai/docs/models/gemma-4/qat
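A rough weight-only estimate shows why lower precision matters for the 16GB claim. This is back-of-the-envelope arithmetic, not a figure from Google or Unsloth: it assumes 26 billion total parameters (read from the "26B" in the model name) and ignores the KV cache, activations, and quantization metadata, so real usage will be higher. The post does not state the bit width of the QAT checkpoints, so several widths are shown.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 26e9  # assumption: taken from the "26B" in the model name

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
```

Under these assumptions, only the 4-bit row leaves headroom on a 16GB machine.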
Post
202
Human brains don't recreate every pixel to understand the world!
Most current models in genomics, proteomics, and single-cell transcriptomics rely on generative objectives like masked language modeling or next token prediction. While effective, these architectures waste significant capacity reconstructing raw, noisy sequence details that may not carry functional biological meaning.
But a promising, more efficient alternative is emerging: Joint-Embedding Predictive Architecture (JEPA).
Originally introduced by Yann LeCun for computer vision, JEPA is a non-generative, self-supervised learning (SSL) framework. Instead of predicting raw inputs, it operates as a world model that predicts abstract semantic embeddings in latent space.
Recently, the JEPA framework (and its more efficient LeJEPA variant) has been adapted to the biological sciences, both to develop high-performing foundation models and to improve existing ones.
It's interesting how each adaptation modified and tailored JEPA to suit its specific biological domain, whether by experimenting with different backbones or complementing the objective with other loss terms.
For example, JEPA-DNA and ProteinJEPA used JEPA as a continual pre-training framework to enhance existing foundation models without training from scratch, while Cell-JEPA and JEPA-DNA employed a hybrid objective that combines the JEPA loss with a traditional language modeling loss.
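The hybrid objective mentioned above can be sketched as a weighted sum of a latent-space prediction loss and a token-level language modeling loss. This is a toy illustration in plain Python, not the loss from any of the named papers: mean squared error stands in for whatever embedding distance each model uses, and `jepa_weight`, the embeddings, and the logits are hypothetical.

```python
import math

def jepa_loss(predicted, target):
    """Mean squared error between predicted and target embeddings (latent space)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def lm_loss(logits, target_index):
    """Cross-entropy for one token: negative log-softmax at the target index."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target_index]

def hybrid_loss(predicted, target, logits, target_index, jepa_weight=1.0):
    """Weighted sum of the two objectives; jepa_weight is a hypothetical knob."""
    return lm_loss(logits, target_index) + jepa_weight * jepa_loss(predicted, target)

# Toy values: a 3-d embedding for a masked span and logits over A, C, G, T
pred_emb = [0.2, -0.1, 0.4]
tgt_emb = [0.0, 0.0, 0.5]   # in a typical JEPA setup this comes from a target encoder
logits = [2.0, 0.5, 0.1, -1.0]

print(hybrid_loss(pred_emb, tgt_emb, logits, target_index=0, jepa_weight=0.5))
```

The JEPA term never touches the raw tokens: it only compares embeddings, which is the difference from a purely generative objective.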
The article below provides an overview of these implementations, along with others that came out this year. As always, your thoughts and feedback are welcome and highly appreciated!
Link to the article is in the first comment 👇
dippatel1994
posted an update 1 day ago
Post
155
To make revising LLM architectures and training methods faster, I created a deck of 180 visual flashcards. It started as a personal hobby but gradually became a cheat code for reviewing LLM concepts before technical interviews. People love it!
Swipe through these samples, and if you want to grab the full set or follow the project, the repo is here: https://github.com/llmsresearch/llm-flashcards.
pankajpandey-dev
posted an update 2 days ago
Post
190
🇮🇳 Gemma-3-1B Hindi Instruct — a Hindi LLM that runs fully offline, anywhere.
Last week I shipped Qwen3-4B Hindi. This week I went the other direction: how tiny can a useful Hindi model get? So I fine-tuned Gemma-3-1B on quality-filtered Hindi instruction data and shipped the full GGUF ladder.
✅ Fine-tune (16-bit): pankajpandey-dev/gemma-3-1b-hindi-instruct
✅ GGUF (Q4/Q5/Q8): pankajpandey-dev/gemma-3-1b-hindi-instruct-GGUF
Runs in Ollama, llama.cpp, and LM Studio. The Q4_K_M is just 806 MB — runs on CPU, a cheap laptop, even a Raspberry Pi.
What I tried this round: chrF-filtered the training data to drop weak translations, and used response-only loss so the model learns how to answer, not how to repeat prompts.
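Response-only loss is usually implemented by masking the prompt positions in the label sequence. The sketch below shows that idea in plain Python; it is not the training code behind this model, and the token ids are made up. It relies on one known convention: -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so positions labeled -100 contribute nothing to the gradient.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy skips by default

def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; mask prompt positions out of the loss."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Hypothetical token ids, for illustration only
prompt = [101, 7, 8, 9]
response = [42, 43, 2]

input_ids, labels = build_labels(prompt, response)
print(input_ids)
print(labels)
```

The model still sees the full prompt as context, but only the response tokens are scored.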
Honest note: at 1B, Hindi fluency is strong but coherence is bounded by size — it's a lightweight/edge experiment, not a 4B replacement. Gemma-3-4B Hindi is next.
Part of my Hindi LLM Series — openly-licensed Indic models for local & edge use. Feedback welcome 🙏
#Hindi #IndicNLP #GGUF #LocalLLM #Gemma #EdgeAI
kanaria007
posted an update 2 days ago
Post
124
✅ Article highlight: *Interop Schemas for Learning-World Governance Artifacts* (art-60-175, v0.1)
TL;DR:
This article argues that governance without interop is vendor-local theater.
It is not enough for one system to say *“we have receipts.”* If another vendor cannot parse the artifact, reproduce the digest, replay the bundle, and reach the same admissibility outcome, the claim is not really portable. So 175 defines a common interop layer: shared envelopes, pinned canonicalization, minimal portable schemas, and deterministic bundle formats.
Read:
kanaria007/agi-structural-intelligence-protocols
Why it matters:
• turns governance artifacts into cross-vendor verifiable objects rather than local implementation details
• fixes the classic failure modes of digest drift, schema drift, and bundle drift
• makes “same artifact / same verdict” a testable claim instead of a handshake promise
• gives courts, forgetting flows, and unlearning claims portable bundle formats
What’s inside:
• a common *interop envelope* for contracts, manifests, receipts, and bundles
• a pinned *canonicalization profile* plus conformance receipts to stop digest disagreements
• minimal portable schemas for core learning-world governance artifacts
• deterministic bundle formats like *Court ZIP*, *Forgetting ZIP*, and *Unlearning ZIP*
• replay/conformance receipts so another vendor can verify the same bundle and reach the same admissibility result
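The digest-drift problem in the list above can be illustrated with a minimal canonicalization step. This is a sketch under stated assumptions, not the profile the article defines: the field names are hypothetical, and sorted-key JSON with fixed separators is only a stand-in for a real pinned profile, which would also need rules for numbers and Unicode.

```python
import hashlib
import json

def canonical_bytes(artifact):
    """Serialize with sorted keys and fixed separators so key order cannot change the bytes."""
    text = json.dumps(artifact, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")

def digest(artifact):
    """SHA-256 hex digest over the canonical serialization."""
    return hashlib.sha256(canonical_bytes(artifact)).hexdigest()

# Two hypothetical receipts with the same content but different key order
receipt_a = {"schema": "receipt/v0", "verdict": "admissible", "bundle": "b-001"}
receipt_b = {"bundle": "b-001", "verdict": "admissible", "schema": "receipt/v0"}

print(digest(receipt_a) == digest(receipt_b))
print(json.dumps(receipt_a) == json.dumps(receipt_b))
```

The first comparison is true and the second is false: without a pinned serialization, two vendors holding the same artifact would compute different digests.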
Key idea:
Do not say:
*“our system can export the evidence.”*
Say:
*“this artifact uses this schema registry, this canonicalization profile, this interop-safe digest model, and this bundle index—so another vendor can verify the same object and reach the same result.”*
That is how governance stops being local theater and becomes portable infrastructure.
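A deterministic bundle can be sketched with Python's standard `zipfile` module. This is an illustration of the general technique, not the Court ZIP, Forgetting ZIP, or Unlearning ZIP layout from the article: the file names and contents are hypothetical, and the choices of fixed timestamp, stored (uncompressed) entries, and pinned metadata are assumptions made so that the same inputs always yield the same bytes.

```python
import hashlib
import io
import zipfile

FIXED_TIME = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the ZIP format can store

def build_bundle(files):
    """Write files in sorted order with fixed metadata, then return the archive bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_STORED  # no compressor, so no zlib-version drift
            info.create_system = 3                   # pin the platform field
            info.external_attr = 0o644 << 16         # pin file permissions
            archive.writestr(info, files[name])
    return buffer.getvalue()

# Hypothetical bundle contents
files = {"receipt.json": b'{"verdict":"admissible"}', "index.json": b'{"entries":1}'}
reordered = dict(reversed(list(files.items())))

digest_a = hashlib.sha256(build_bundle(files)).hexdigest()
digest_b = hashlib.sha256(build_bundle(reordered)).hexdigest()
print(digest_a == digest_b)
```

Because entry order, timestamps, and metadata are all pinned, a second party who rebuilds the bundle from the same files can compare digests directly.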
danielhanchen
posted an update 4 days ago
Post
8691
Gemma 4 12B can now run locally on just 8GB RAM via Dynamic GGUFs.
Google's new model, Gemma 4 12B Unified, supports image, audio, and a 256K context.
You can run and train the model via Unsloth Studio.
GGUF: unsloth/gemma-4-12b-it-GGUF
Guide: https://unsloth.ai/docs/models/gemma-4