This repository contains notes and code examples related to AI/ML, with a focus on understanding the fundamentals of large language models, inference engines, and hardware acceleration.
- Hailo-10H AI accelerator (NPU), Raspberry Pi AI HAT+
- Parakeet: Supporting Parakeet in whisper.cpp
- Kimi-Linear
- CUDA FA exploration
- Model Architectures
- Model Formats & Quantization
- Attention & Embeddings
- Inference & Decoding
- Training & Fine-tuning
- Hardware & Acceleration
- Audio & Speech
- Vision & Multimodal
- Agents & Applications
- Miscellaneous Topics
- GGML
- Llama.cpp
- llama.cpp Main
- llama.cpp Buildx
- llama.cpp Tasks
- llama.cpp Logging
- llama.cpp Memory
- llama.cpp Python Notes
- llama.cpp CUDA
- llama.cpp KV Cache
- llama.cpp Quantization
- llama.cpp GPU Sampling
- llama.cpp Tensor Parallelism
- llama.cpp Server
- llama.cpp WebUI
- llama.cpp MTMD
- llama.cpp TTS
- llama.cpp Embedding Gemma
- llama.cpp LLaMA 3.2 Vision
- llama.cpp GPT-OSS
- llama.cpp Convert
- llama.cpp Convert Dequantize
- llama.cpp Debugging
- llama.cpp Packaging
- llama.cpp Tests
- llama.cpp macOS
- llama.cpp HTTPS
- llama.cpp Backend Sampling
- llama.cpp KV Cache Notes
- Model Formats
- Quantization
- GGUF / GPTQ / AWQ
- LoRA / QLoRA
- iMatrix
- Ollama
- Hugging Face
- Attention
- Attention Sink
- Flash Attention
- Sage Attention
- Ring Attention
- MLA
- Position Embeddings
- Tokenization
- Word Embeddings
- Normalization
- Softmax / Logits
- Residual Connections
- Activation Functions
- Loss Functions
- Exp
- One-Hot Encoding
- Control Vectors
- GRITLM
- Sampling
- Speculative Decoding
- Continuous Batching
- Tensor Parallelism
- Pipeline Parallelism
- LLaMA Self-Extend
- LLaMA Batch Embedding
- Perplexity
- Likelihood
- Infill
- Grammars
- llguidance
- ChatML / Chat Templates
- Prompt Engineering
- Fine-tuning
- DPO
- Reinforcement Learning
- Optimization Algorithms
- LBFGS
- Linear Regression
- Markov Chains
- XOR Problem
- Flow Matching
- Generative Deep Learning
- CLIP
- ViT
- LLaVA
- LLaMA 3.2 Vision
- Granite Vision
- JEPA
- Image Preprocessing
- CLIP Search
- BLIP-2
- Mobile VLM
- LLaVA+
- LLM Overview
- Diffusion / Stable Diffusion
- Apache Arrow
- ONNX
- PyTorch
- vLLM
- TRT-LLM
- Mistral
- Bloom
- Granite Model
- Mod
- Minja
- Trie
- Symbols
- Variables
- Count-based
- Background
- Security
- Memory
- Android
- Colab
- Groq
- ROC
- zDNN
- Spark
- Copilot
Exploration code for core AI/ML concepts, libraries, and frameworks.
| Project | Description |
|---|---|
| GGML | GGML C++ library exploration |
| Llama.cpp | Llama.cpp library exploration (inference, finetuning) |
| Python | Python ML examples |
| Rust | Rust ML examples (llm-chains, tch-rs, etc.) |
| vLLM | vLLM exploration |
| OpenVINO | OpenVINO Python examples |
| OpenVINO C++ | OpenVINO C++ examples |
| PyTorch | PyTorch & pybind examples |
| SIMD | SIMD instruction exploration |
| SIMD Assembly | Low-level SIMD assembly |
| SVE | ARM SVE exploration |
| NEON | ARM NEON examples |
| AMX | Intel AMX exploration |
| VNNI | VNNI instruction exploration |
| BLAS | OpenBLAS exploration |
| ROCm | AMD ROCm examples |
| SYCL | SYCL examples |
| KleidiAI | KleidiAI examples |
| Grammars | llguidance grammar exploration |
| Tokenization | Tokenization examples |
| Data Structures | ML-relevant data structures |
| Image Processing | Image processing examples |
| JavaScript | TensorFlow.js examples |
| WASM | WebAssembly NN examples |
| Whisper | Whisper.cpp exploration |
| Templates | Minja template engine |
GPU compute exploration across multiple APIs.
| Project | Description |
|---|---|
| CUDA | CUDA examples in C++ |
| OpenCL | OpenCL examples |
| Vulkan | Vulkan examples |
| Kompute | Kompute (Vulkan compute) examples |
| Metal | Metal examples |
| ROCm | AMD ROCm/HIP examples |
| WebGPU | WebGPU examples |
| XRT | XRT examples |
Neural Processing Unit exploration (Hailo).
| Project | Description |
|---|---|
| Hailo | Hailo-10H AI accelerator, Raspberry Pi AI HAT+ |
Vector database examples and exploration.
| Project | Description |
|---|---|
| Qdrant | Qdrant examples (Python, Rust) |
| LanceDB | LanceDB examples (Python, Rust) |
Word and sentence embedding examples.
| Project | Description |
|---|---|
| Rust | Embeddings examples in Rust |
Audio processing and speech-to-text.
| Project | Description |
|---|---|
| Silero VAD | Silero Voice Activity Detection |
| Whisper.cpp | Whisper.cpp submodule |
AI agent frameworks and examples.
| Project | Description |
|---|---|
| llama-cpp-agent | AI agent using llama.cpp |
| Language | Description |
|---|---|
| Python | Hugging Face API example |
| Rust | Candle example |
For the complete list of notes, see the notes directory.