A split Vulkan loader. Lets a Vulkan client run on a GPU whose user-space
driver can't be loaded into that client's process — for example because the
driver is a Bionic-only blob expecting the AOSP hwvulkan HAL ABI while the
client is a glibc process, or because the driver lives on a different machine
entirely.

vkproxy splits the Vulkan stack into two halves connected by a small
RPC:

         Vulkan client process                       Server process
    ┌──────────────────────────────┐         ┌─────────────────────────────┐
    │  app / engine                │         │                             │
    │       │                      │         │   husky_vkserver            │
    │       ▼                      │         │   (dlopen's the real        │
    │  libvulkan loader            │   UDS   │    Vulkan driver and        │
    │       │                      │◄───────►│    forwards every call)     │
    │       ▼                      │   (or   │                             │
    │  mali_proxy_icd.so           │   TCP)  │                             │
    │  (ICD shim; marshals every   │         │                             │
    │   call onto the wire)        │         │                             │
    └──────────────────────────────┘         └──────────────┬──────────────┘
                                                            │
                                                   ┌────────▼────────┐
                                                   │ real GPU driver │
                                                   │  (e.g. Mali on  │
                                                   │   /dev/mali0)   │
                                                   └─────────────────┘

Every Vulkan call takes the same path:
- The Vulkan loader on the client side picks `mali_proxy_icd.so` based on
  the ICD JSON manifest (`mali_proxy_icd.json` lives under
  `/usr/local/share/vulkan/icd.d/`).
- The ICD serialises the call's arguments into a wire format and writes
  them to `/tmp/husky-vk.sock` (UDS) or a TCP socket if `VKPROXY_REMOTE`
  is set.
- `husky_vkserver` reads the request, calls the real driver, and writes
  the reply back.
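
The transport choice in the second step can be sketched in C. This is a minimal sketch, not the code in `client/transport.c`: `vkp_pick_transport`, `vkp_uds_addr`, and the enum are hypothetical names, and treating an empty `VKPROXY_REMOTE` as unset is an assumption. Only the default socket path and the variable name come from the text above.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical names for illustration only. */
enum vkp_transport { VKP_TRANSPORT_UDS, VKP_TRANSPORT_TCP };

#define VKP_DEFAULT_SOCK "/tmp/husky-vk.sock"

/* TCP when VKPROXY_REMOTE is set (assumption: an empty value counts as
 * unset), otherwise the Unix domain socket. */
enum vkp_transport vkp_pick_transport(void)
{
    const char *remote = getenv("VKPROXY_REMOTE");
    return (remote && remote[0]) ? VKP_TRANSPORT_TCP : VKP_TRANSPORT_UDS;
}

/* Fill a sockaddr_un for the UDS path.
 * Returns 0 on success, -1 if the path does not fit in sun_path. */
int vkp_uds_addr(struct sockaddr_un *sa, const char *path)
{
    size_t n = strlen(path);
    if (n >= sizeof sa->sun_path)
        return -1;
    memset(sa, 0, sizeof *sa);
    sa->sun_family = AF_UNIX;
    memcpy(sa->sun_path, path, n + 1);   /* copy includes the NUL */
    return 0;
}
```

A caller would pass the filled `sockaddr_un` to `connect()` on an `AF_UNIX` socket when `vkp_pick_transport()` returns `VKP_TRANSPORT_UDS`.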
There are two server binaries from the same dispatch core:
- `husky_vkserver`: Bionic / aarch64. Built with the Android NDK, runs in
  a Halium-style chroot where the Mali blob's expected libc and binder are
  available, dlopen's `libGLES_mali.so`.
- `husky_vkserver_pc`: host glibc / x86_64. Talks TCP and dlopen's a stock
  `libvulkan.so.1`. Useful for development without a Mali host; the wire
  protocol is the same.
vkproxy/
├── Makefile # builds server (NDK), ICD (aarch64-gcc), smoke client, PC server
├── codegen/
│ ├── gen.py # parses vk.xml; emits opcode enum, client stubs, server dispatch, fn table
│ └── vk.xml # the Vulkan registry
├── include/ # headers shared between client & server
│ ├── vkproxy_proto.h # wire framing: vkp_cmd_hdr, vkp_reply_hdr, SYNTH opcode enum
│ ├── vkproxy_proto_gen.h # codegen'd: every Vulkan command -> opcode id
│ ├── transport.h # client-side socket API (vkp_send_cmd / vkp_call / fd-passing)
│ ├── udp_frame.h # optional UDP frame-blob channel (for remote presents)
│ └── hwvulkan.h # subset of AOSP <hardware/hwvulkan.h> the server needs
├── protocol/ # Wayland protocol bindings (xdg-shell, linux-dmabuf), generated
├── client/ # mali_proxy_icd.so — the ICD the Vulkan loader loads
│ ├── mali_proxy_icd.json # ICD manifest installed to /usr/local/share/vulkan/icd.d/
│ ├── icd_entry.c # vk_icd* entry points the loader calls first
│ ├── stubs_gen.c # codegen'd: every Vulkan command -> marshal -> vkp_send_cmd/vkp_call
│ ├── manual_stubs.c # hand-written overrides for everything codegen can't do
│ │ (vkCreateInstance, vkCreateDevice, vkGetPhys*Properties,
│ │ the actually-complex pipeline/descriptor commands, swapchain,
│ │ etc.) plus _vkp_manual_override which vkGetDeviceProcAddr
│ │ must consult before _vkproxy_lookup
│ ├── dispatch.h # vkp_dispatchable / handle-translation helpers
│ ├── transport.c # AF_UNIX (or TCP via VKPROXY_REMOTE) socket plumbing
│ ├── wl_present.c/.h # Wayland present path (wl_display, wl_buffer construction)
│ ├── x_wl_surface.c # xdg-shell shim used when running under Xwayland
│ ├── udp_frame.c # UDP frame-blob receive plumbing (PC build only)
│ ├── udp_shm_present.c/.h # SHM presentation when frames arrive as raw bytes
│ ├── decode_h264.c/.h # H.264 decode path for PC mode (server side encodes)
│ └── udp_frame.h # local copy of the wire header
├── server/ # husky_vkserver (Bionic) + husky_vkserver_pc (glibc)
│ ├── main.c # vksrv main: AF_UNIX listener, per-client thread, signal handler
│ ├── main_pc.c # PC variant: TCP listener, no Bionic/AHB paths
│ ├── vk_init.c / vk_init.h # dlopen libGLES_mali.so, get hwvulkan_device_t, populate g_pfn
│ ├── vk_init_pc.c # PC variant: dlopen libvulkan.so.1, vkGetInstance/DeviceProcAddr
│ ├── vk_funcs_gen.c/.h # codegen'd: g_pfn struct of every Vulkan function pointer + populator
│ ├── dispatch_gen.c # codegen'd: case OP_vkXxx -> unmarshal args -> call g_pfn.vkXxx -> reply
│ ├── dispatch_manual.c # hand-written dispatch for OPs that need server-side state
│ │ (handle table, BC-texture transcode staging, dmabuf alloc via
│ │ dma_heap, MMU/registry quirks, etc.) — by far the biggest file
│ ├── handle_table.c/.h # 64-bit cookie ↔ dispatchable-handle map (locked, growable)
│ ├── encode_h264.c/.h # PC build only: encode rendered frames for the remote viewer
│ ├── udp_frame.c/.h # UDP send side of frame chunks
│ ├── logsink_inline.c # one-line LOG() macro that writes to stderr (journald in chroot)
│ └── pc_stubs/ue_husky.h # tiny stub of ue-husky for the PC build (no BC transcoder there)
├── smoke/ # smoke client used to validate the link layer
│ ├── vksmoke.c # connect, OP_vkp_hello, print server's "name/api/ext_count"
│ ├── logsink.c # same LOG() macro
│ ├── hwvulkan.h # local copy
│ └── Makefile
└── tests/ # standalone Vulkan triangle/cube tests
├── mincube.c / mincube2.c # raw Vulkan + Wayland present (the latter does textured cube)
├── mincube2.{vert,frag} # GLSL source
├── mincube2_{vert,frag}_spv.h # the same as SPIR-V byte arrays
├── vkp_hello_test.c # bare vkp_hello round-trip
└── xdg-shell-{client-protocol.h,protocol.c} # generated Wayland boilerplate (local copy)
Both directions use the same simple framing:

             client → server                        server → client
    ┌────────────────────────────────┐    ┌──────────────────────────────────┐
    │ struct vkp_cmd_hdr             │    │ struct vkp_cmd_hdr (echo opcode) │
    │   uint16_t opcode              │    │   flags |= VKP_FLAG_REPLY        │
    │   uint16_t flags               │    ├──────────────────────────────────┤
    │   uint32_t len                 │    │ struct vkp_reply_hdr             │
    ├────────────────────────────────┤    │   int32_t  status (VkResult)     │
    │ payload (len bytes)            │    │   uint32_t len                   │
    └────────────────────────────────┘    ├──────────────────────────────────┤
                                          │ payload (len bytes)              │
                                          └──────────────────────────────────┘

`flags`:

| bit | name | meaning |
|---|---|---|
| 0 | `VKP_FLAG_EXPECTS_REPLY` | client wants a synchronous reply (used by `vkp_call`) |
| 1 | `VKP_FLAG_REPLY` | the frame is a reply (server -> client direction) |
| 2 | `VKP_FLAG_HAS_FD` | one fd is being passed via `SCM_RIGHTS` on this datagram |

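
As a rough illustration of the request framing, the header and flag bits can be written down in C. The field order and widths follow the framing diagram and the bit numbers follow the flags table; the authoritative definitions are in `include/vkproxy_proto.h` and may differ. `vkp_frame_encode` and `vkp_frame_peek` are hypothetical helpers, and host byte order on the wire is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag bits as listed in the flags table. */
enum {
    VKP_FLAG_EXPECTS_REPLY = 1u << 0,
    VKP_FLAG_REPLY         = 1u << 1,
    VKP_FLAG_HAS_FD        = 1u << 2,
};

/* Request header, laid out as in the framing diagram. */
struct vkp_cmd_hdr {
    uint16_t opcode;
    uint16_t flags;
    uint32_t len;    /* number of payload bytes after the header */
};
_Static_assert(sizeof(struct vkp_cmd_hdr) == 8, "header packs to 8 bytes");

/* Hypothetical helper: write header + payload into buf.
 * Returns the total frame size, or 0 if buf is too small. */
size_t vkp_frame_encode(uint8_t *buf, size_t cap, uint16_t opcode,
                        uint16_t flags, const void *payload, uint32_t len)
{
    struct vkp_cmd_hdr h = { opcode, flags, len };
    size_t total = sizeof h + (size_t)len;

    assert(buf != NULL);
    if (total > cap)
        return 0;
    memcpy(buf, &h, sizeof h);           /* assumption: host byte order */
    if (len)
        memcpy(buf + sizeof h, payload, len);
    return total;
}

/* Hypothetical helper: read the header back out of a received frame. */
struct vkp_cmd_hdr vkp_frame_peek(const uint8_t *buf)
{
    struct vkp_cmd_hdr h;
    memcpy(&h, buf, sizeof h);
    return h;
}
```

Copying through `memcpy` rather than casting the buffer keeps the sketch free of alignment and aliasing problems when the frame sits at an arbitrary offset in a receive buffer.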
Opcode space:

| Range | Meaning |
|---|---|
| 0x0000 – 0x7FFF | One-to-one mirror of a real Vulkan command, generated by `gen.py` |
| 0x8000 – 0xFFFF | Synthesised / non-Vulkan ops (`OP_vkp_*`), handled by `dispatch_manual.c` |

The synthesised range covers things that need server-side state or custom marshalling:
- `OP_vkp_hello`: protocol handshake; returns the GPU's device name,
  Vulkan API version, and instance extension count.
- `OP_vkp_create_*` / `OP_vkp_destroy_*` for instance, device, queue,
  command pool/buffer, image, buffer, image view, sampler, etc. The manual
  server side maintains the handle table and BC-transcode staging registry
  alongside.
- `OP_vkp_map_memory` / `OP_vkp_unmap_memory`: coordinate dmabuf-backed
  `vkAllocateMemory`, including sending the heap fd back via `SCM_RIGHTS`
  so the client can `mmap` the same memory.
- `OP_vkp_cmd_pipeline_barrier`, `OP_vkp_cmd_copy_buffer_to_image`,
  `OP_vkp_cmd_copy_image_to_buffer`, etc.: command recording that needs
  `ht_get` on the dispatchable cmdbuf handle (codegen doesn't know which
  arg is dispatchable).
- `OP_vkp_get_buffer_device_addr`, `OP_vkp_get_phys_mem_props`, and other
  "spoofable" property queries.
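
The handle table that these ops rely on can be pictured with a small sketch. It is not the code in `handle_table.c`: the struct layout, `ht_init`, `ht_put`, and the signature of `ht_get` are assumptions, and the cookie scheme (slot index plus one, so that 0 is never valid) is chosen purely for illustration. Removal on destroy is left out to keep it short.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch of a locked, growable cookie -> driver-handle map. */
struct ht {
    pthread_mutex_t lock;
    void  **slots;
    size_t  used, cap;
};

void ht_init(struct ht *t)
{
    pthread_mutex_init(&t->lock, NULL);
    t->slots = NULL;
    t->used = t->cap = 0;
}

/* Store a driver handle; returns the cookie handed to the client,
 * or 0 if the table could not grow. */
uint64_t ht_put(struct ht *t, void *handle)
{
    uint64_t cookie = 0;

    pthread_mutex_lock(&t->lock);
    if (t->used == t->cap) {                 /* grow by doubling */
        size_t ncap = t->cap ? t->cap * 2 : 16;
        void **n = realloc(t->slots, ncap * sizeof *n);
        if (!n)
            goto out;
        t->slots = n;
        t->cap = ncap;
    }
    t->slots[t->used] = handle;
    cookie = (uint64_t)++t->used;            /* cookie = index + 1 */
out:
    pthread_mutex_unlock(&t->lock);
    return cookie;
}

/* Translate a cookie back to the driver handle; NULL if unknown. */
void *ht_get(struct ht *t, uint64_t cookie)
{
    void *h = NULL;

    pthread_mutex_lock(&t->lock);
    if (cookie >= 1 && cookie <= t->used)
        h = t->slots[cookie - 1];
    pthread_mutex_unlock(&t->lock);
    return h;
}
```

Handing out opaque cookies instead of raw pointers means a malformed or stale value from the client fails the bounds check rather than being dereferenced by the driver.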
Take `vkCmdBindIndexBuffer` as the worked example:
- The client process loads the Vulkan loader (`libvulkan.so.1`), which
  scans `/usr/local/share/vulkan/icd.d/`, finds `mali_proxy_icd.json`,
  `dlopen`s `mali_proxy_icd.so`, then calls `vk_icdGetInstanceProcAddr`
  (in `icd_entry.c`) for every entry point. `icd_entry.c` calls
  `_vkproxy_lookup`, which walks the codegen'd table in `stubs_gen.c`. For
  overrides that need hand work, `manual_stubs.c`'s `_vkp_manual_override`
  is consulted first inside the ICD's own `vkGetDeviceProcAddr` /
  `vkGetInstanceProcAddr`.
- The client records its command-buffer call. The function pointer it
  got is one of these:
  - the codegen stub in `stubs_gen.c`: marshals scalar args, sends the
    opcode via `vkp_send_cmd` (fire-and-forget), no reply.
  - a manual implementation in `manual_stubs.c`: same protocol, but
    handles things `gen.py` doesn't (variable-length arrays, pNext chains,
    dispatchable-handle translation, ABI mismatches between spec versions).
- `vkp_send_cmd` (in `client/transport.c`) takes the global socket lock,
  writes `vkp_cmd_hdr` + payload to the socket, and returns. For queries
  that need a reply, `vkp_call` blocks until the server replies.
- `husky_vkserver`'s per-client thread reads the header, grows the payload
  buffer if needed, then dispatches:
  - opcode < `VKP_OP_BASE_SYNTH` (0x8000): try `vkp_dispatch_manual` first
    (it owns a fixed list of opcodes that need handle translation), then
    fall back to the codegen dispatcher `vkp_dispatch` in `dispatch_gen.c`.
  - opcode ≥ 0x8000: `vkp_dispatch_manual` only (handles all `OP_vkp_*`).
- The chosen handler unmarshals into stack-allocated Vulkan structs and
  calls `g_pfn.vkCmdBindIndexBuffer`, a real pointer obtained at bring-up
  by walking the driver's `GetInstanceProcAddr`. The handle table
  (`handle_table.c`) translates the client-side cookie for
  `VkCommandBuffer` into the driver's actual dispatchable handle.
- For commands that produce a reply (`vkAllocateMemory`, `vkCreateBuffer`,
  the `vkGet*Properties` family), the handler calls `vkp_send_reply` to
  ship the returned `VkResult` + result struct back to the client.
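
The server-side routing rule from the dispatch step can be sketched as a small function. Only `VKP_OP_BASE_SYNTH` and the manual-first ordering come from the text; the handler signature, `vkp_route_frame`, the `vkp_route` enum, and the two `demo_*` handlers are hypothetical stand-ins for `vkp_dispatch_manual` and `vkp_dispatch`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VKP_OP_BASE_SYNTH 0x8000u

enum vkp_route { VKP_ROUTE_MANUAL, VKP_ROUTE_CODEGEN, VKP_ROUTE_UNHANDLED };

/* Assumed handler shape: returns true if it recognised the opcode. */
typedef bool (*vkp_handler)(uint16_t opcode, const void *payload, uint32_t len);

enum vkp_route vkp_route_frame(uint16_t opcode, const void *payload,
                               uint32_t len, vkp_handler manual,
                               vkp_handler codegen)
{
    /* The manual dispatcher always gets first refusal. */
    if (manual(opcode, payload, len))
        return VKP_ROUTE_MANUAL;
    /* Synthesised ops have no codegen fallback. */
    if (opcode >= VKP_OP_BASE_SYNTH)
        return VKP_ROUTE_UNHANDLED;
    return codegen(opcode, payload, len) ? VKP_ROUTE_CODEGEN
                                         : VKP_ROUTE_UNHANDLED;
}

/* Stand-in for the manual dispatcher: owns opcode 7 and synth op 0x8001. */
bool demo_manual(uint16_t opcode, const void *payload, uint32_t len)
{
    (void)payload; (void)len;
    return opcode == 7 || opcode == 0x8001;
}

/* Stand-in for the codegen dispatcher: accepts any mirrored opcode. */
bool demo_codegen(uint16_t opcode, const void *payload, uint32_t len)
{
    (void)payload; (void)len;
    return opcode < VKP_OP_BASE_SYNTH;
}
```

With these stand-ins, opcode 7 is taken by the manual path, opcode 8 falls through to codegen, and an unknown synthesised opcode is reported as unhandled instead of reaching the codegen switch.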
Driven from upstream vk.xml. Walks every <command> element and
classifies its args:
- Scalars / by-value handles → emit a 1:1 marshal into a packed struct.
- Pointers to small fixed-size structs → emit a struct-copy.
- Anything else (variable-length arrays, pNext chains, ambiguous unions,
  dispatchable handles, sType-based polymorphism) → emit a
  `vkp_not_implemented` stub. Those commands are picked up by
  `manual_stubs.c` / `dispatch_manual.c` and registered in
  `_vkp_manual_override` so the lookup chain finds the hand-written
  version first.
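
To make "a 1:1 marshal into a packed struct" concrete, here is a minimal sketch of what such a marshal could look like for the four arguments of `vkCmdBindIndexBuffer`. The struct name, field widths, and both helper functions are assumptions for illustration, not the output of `gen.py`, and the text does not say whether this particular command is handled by codegen or by hand. Plain integer types stand in for the Vulkan typedefs so the sketch compiles without `<vulkan/vulkan.h>`.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: handles travel on the wire as 64-bit values. */
typedef uint64_t vkp_handle;

/* Hypothetical packed wire layout, one field per argument. */
struct __attribute__((packed)) vkp_args_CmdBindIndexBuffer {
    vkp_handle commandBuffer;
    vkp_handle buffer;
    uint64_t   offset;      /* VkDeviceSize */
    uint32_t   indexType;   /* VkIndexType  */
};

/* Client side: copy the arguments into the payload buffer.
 * Returns the number of payload bytes written. */
size_t vkp_marshal_CmdBindIndexBuffer(uint8_t *out, vkp_handle cmdbuf,
                                      vkp_handle buffer, uint64_t offset,
                                      uint32_t indexType)
{
    struct vkp_args_CmdBindIndexBuffer a = { cmdbuf, buffer, offset, indexType };
    memcpy(out, &a, sizeof a);
    return sizeof a;
}

/* Server side: recover the arguments from the payload. */
struct vkp_args_CmdBindIndexBuffer
vkp_unmarshal_CmdBindIndexBuffer(const uint8_t *in)
{
    struct vkp_args_CmdBindIndexBuffer a;
    memcpy(&a, in, sizeof a);
    return a;
}
```

Packing removes the trailing padding the compiler would otherwise add after `indexType`, so both ends agree on a 28-byte payload regardless of their default struct alignment.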
Outputs:
- `include/vkproxy_proto_gen.h`: `enum vkp_opcodes { OP_vkXxx = N, ... }`
- `client/stubs_gen.c`: every `VKAPI_ATTR ... VKAPI_CALL vkXxx(...)` plus a
  name → fn-pointer table `_vkp_stubs[]` consulted by `_vkproxy_lookup`.
- `server/dispatch_gen.c`: `vkp_dispatch()` is one big switch over opcodes
  that unmarshals and calls `g_pfn.vkXxx`.
- `server/vk_funcs_gen.c/.h`: `struct vk_funcs g_pfn { PFN_vk...; }` plus a
  populator that calls `GetInstanceProcAddr` for every member.
Regenerate: python3 codegen/gen.py.

    make        # builds husky_vkserver (Android Bionic), mali_proxy_icd.so (glibc aarch64),
                # and vkp_hello_test (smoke client). Output in build/.
    make pc     # builds husky_vkserver_pc (host x86_64 glibc) against a runtime-dlopen'd
                # libvulkan.so.1. No ue-husky link; texture substitution is Mali-only.

The Makefile splits compilers:
- `CC_AND=aarch64-linux-android29-clang` from `$NDK/toolchains/...` for the
  Bionic server.
- `CC_ARM64=aarch64-linux-gnu-gcc` for the glibc ICD and smoke client.
- `CC_HOST=gcc` for the host-side PC build.

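The ICD links with `-Wl,-Bsymbolic` so internal references to `vk*`
symbols bind to the ICD's own copies. Without it, the `vk*` symbols
exported by the Vulkan loader (which is loaded into the process first)
would interpose on those references: calls made inside the ICD would
re-enter the loader and recurse without bound, and the loader would
reject the ICD.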

| Target | Path on the GPU-side machine |
|---|---|
| `husky_vkserver` | `/var/lib/machines/halium/data/local/tmp/husky_vkserver` (runs in chroot) |
| `libue_husky.so` (sibling) | same dir, beside `husky_vkserver` (rpath `$ORIGIN`) |
| `mali_proxy_icd.so` | `/usr/local/lib/vkproxy/mali_proxy_icd.so` |
| `mali_proxy_icd.json` | `/usr/local/share/vulkan/icd.d/mali_proxy_icd.json` |

The server is run as a systemd unit (husky-vkserver.service). The
unit's ExecStart chroot-execs it and pipes stderr into
/tmp/vksrv.log inside the chroot.
A Vulkan client process picks up the ICD via:

    export VK_ICD_FILENAMES="/usr/local/share/vulkan/icd.d/mali_proxy_icd.json"
    export VK_LOADER_DRIVERS_SELECT="mali_proxy_icd.json"

| Variable | Side | Effect |
|---|---|---|
| `VKPROXY_SOCKET` | server | UDS path (default `/tmp/husky-vk.sock`) |
| `VKPROXY_LISTEN` | PC | `host:port` for the TCP variant |
| `VKPROXY_REMOTE` | client | If set, dial TCP instead of UDS |
| `UEHUSKY_VERBOSE` | server | Turn on verbose op-by-op logging |
| `VKPROXY_SPOOF_SM6` | client | Spoof shaderInt64/Float64/Int16 (for SM6 / D3D12). Off by default. |
| `VK_LOADER_DEBUG=all` | loader | Khronos loader trace; useful for "why didn't this ICD load" |

`husky_vkserver`'s SIGSEGV/SIGABRT/SIGBUS/SIGILL handler (`main.c`):
- Writes `*** FATAL SIGNAL ***` and `sig=N tid=T code=K addr=A pc=P`.
- Walks the aarch64 frame-pointer chain via the `ucontext_t` and prints up
  to 20 caller PCs.
- Dumps `/proc/self/maps`.
- Re-raises with `SIG_DFL` so systemd still sees a non-zero exit.
To resolve a PC to a library:offset, find the maps line whose VA range
covers the PC, then RVA_in_file = pc - va_start + file_offset. Open
the .so in IDA at its ImageBase and jump to that RVA.
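
The arithmetic above can be automated against the dumped maps. The following is a minimal sketch, assuming the usual `/proc/<pid>/maps` line layout (`start-end perms offset dev inode path`); `vkp_pc_to_file_offset` is a hypothetical helper, not part of the repository, and it truncates paths at the first space.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Parse one maps line. If pc falls inside the mapping, store
 * pc - va_start + file_offset in *off_out, copy the mapped path
 * (empty for anonymous mappings) into path, and return true. */
bool vkp_pc_to_file_offset(const char *line, uint64_t pc,
                           uint64_t *off_out, char path[256])
{
    uint64_t start, end, file_off;
    char perms[5];
    int n;

    path[0] = '\0';
    n = sscanf(line,
               "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %*s %*s %255s",
               &start, &end, perms, &file_off, path);
    if (n < 4)                      /* not a well-formed maps line */
        return false;
    if (pc < start || pc >= end)    /* the end address is exclusive */
        return false;
    *off_out = pc - start + file_off;
    return true;
}
```

Feeding each line of the dumped maps to this function until it returns true yields the library path and the offset to jump to in the disassembler.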
These apply only when the back-end driver is Mali's Valhall blob. The
proxy itself is driver-agnostic; everything below is upstream-driver
behaviour worked around in dispatch_manual.c or the kernel module.
- Mali user-space race: `libGLES_mali.so:sub_1A65300` walks a registry
  inside a `pthread_mutex_t` it doesn't itself create. Two concurrent
  server threads can observe a partially-NULL'd registry pointer and
  dereference `NULL + 0xC8`. Mitigation: serialise all dispatch through a
  single mutex in `main.c`. Lower throughput, no crash.
- Mali kbase `CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND=y` (kernel side): imports
  do not eager-map dma_bufs into the GPU page table; the fault handler
  refuses to demand-map and kills the context. Patch the kernel module to
  `#undef CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND` in `mali_kbase_mem.c` and
  `mali_kbase_mem_linux.c`.
- `-Wl,-Bsymbolic` is mandatory on the ICD or the loader rejects it with
  an infinite-recursion error.
- GS/tess on Mali-G715: advertised but draw-time hangs the MCU. Strip in
  `vkCreateDevice` feature filtering on the server.