Tenser-Linux/vkproxy

vkproxy

A split Vulkan loader. Lets a Vulkan client run on a GPU whose user-space driver can't be loaded into that client's process — for example because the driver is a Bionic-only blob expecting the AOSP hwvulkan HAL ABI while the client is a glibc process, or because the driver lives on a different machine entirely.

vkproxy splits the Vulkan stack into two halves connected by a small RPC:

            Vulkan client process                     Server process
   ┌──────────────────────────────┐         ┌─────────────────────────────┐
   │  app / engine                │         │                             │
   │     │                        │         │   husky_vkserver            │
   │     ▼                        │         │   (dlopen's the real        │
   │  libvulkan loader            │   UDS   │    Vulkan driver and        │
   │     │                        │◄───────►│    forwards every call)     │
   │     ▼                        │  (or    │                             │
   │  mali_proxy_icd.so           │   TCP)  │                             │
   │  (ICD shim; marshals every   │         │                             │
   │   call onto the wire)        │         │                             │
   └──────────────────────────────┘         └──────────────┬──────────────┘
                                                           │
                                                  ┌────────▼────────┐
                                                  │ real GPU driver │
                                                  │ (e.g. Mali on   │
                                                  │  /dev/mali0)    │
                                                  └─────────────────┘

Every Vulkan call takes the following path:

  1. The Vulkan loader on the client side picks mali_proxy_icd.so based on the ICD JSON manifest (mali_proxy_icd.json lives under /usr/local/share/vulkan/icd.d/).
  2. The ICD serialises the call's arguments into a wire format and writes them to /tmp/husky-vk.sock (UDS) or a TCP socket if VKPROXY_REMOTE is set.
  3. husky_vkserver reads the request, calls the real driver, writes the reply back.

Two server binaries are built from the same dispatch core:

  • husky_vkserver — Bionic / aarch64. Built with the Android NDK, runs in a Halium-style chroot where the Mali blob's expected libc and binder are available, dlopen's libGLES_mali.so.
  • husky_vkserver_pc — host glibc / x86_64. Talks TCP and dlopen's a stock libvulkan.so.1. Useful for development without a Mali host — the wire protocol is the same.

Source tree

vkproxy/
├── Makefile                       # builds server (NDK), ICD (aarch64-gcc), smoke client, PC server
├── codegen/
│   ├── gen.py                     # parses vk.xml; emits opcode enum, client stubs, server dispatch, fn table
│   └── vk.xml                     # the Vulkan registry
├── include/                       # headers shared between client & server
│   ├── vkproxy_proto.h            # wire framing: vkp_cmd_hdr, vkp_reply_hdr, SYNTH opcode enum
│   ├── vkproxy_proto_gen.h        # codegen'd: every Vulkan command -> opcode id
│   ├── transport.h                # client-side socket API (vkp_send_cmd / vkp_call / fd-passing)
│   ├── udp_frame.h                # optional UDP frame-blob channel (for remote presents)
│   └── hwvulkan.h                 # subset of AOSP <hardware/hwvulkan.h> the server needs
├── protocol/                      # Wayland protocol bindings (xdg-shell, linux-dmabuf), generated
├── client/                        # mali_proxy_icd.so — the ICD the Vulkan loader loads
│   ├── mali_proxy_icd.json        # ICD manifest installed to /usr/local/share/vulkan/icd.d/
│   ├── icd_entry.c                # vk_icd* entry points the loader calls first
│   ├── stubs_gen.c                # codegen'd: every Vulkan command -> marshal -> vkp_send_cmd/vkp_call
│   ├── manual_stubs.c             # hand-written overrides for everything codegen can't do
│   │                                (vkCreateInstance, vkCreateDevice, vkGetPhys*Properties,
│   │                                 the actually-complex pipeline/descriptor commands, swapchain,
│   │                                 etc.) plus _vkp_manual_override which vkGetDeviceProcAddr
│   │                                must consult before _vkproxy_lookup
│   ├── dispatch.h                 # vkp_dispatchable / handle-translation helpers
│   ├── transport.c                # AF_UNIX (or TCP via VKPROXY_REMOTE) socket plumbing
│   ├── wl_present.c/.h            # Wayland present path (wl_display, wl_buffer construction)
│   ├── x_wl_surface.c             # xdg-shell shim used when running under Xwayland
│   ├── udp_frame.c                # UDP frame-blob receive plumbing (PC build only)
│   ├── udp_shm_present.c/.h       # SHM presentation when frames arrive as raw bytes
│   ├── decode_h264.c/.h           # H.264 decode path for PC mode (server side encodes)
│   └── udp_frame.h                # local copy of the wire header
├── server/                        # husky_vkserver (Bionic) + husky_vkserver_pc (glibc)
│   ├── main.c                     # vksrv main: AF_UNIX listener, per-client thread, signal handler
│   ├── main_pc.c                  # PC variant: TCP listener, no Bionic/AHB paths
│   ├── vk_init.c / vk_init.h      # dlopen libGLES_mali.so, get hwvulkan_device_t, populate g_pfn
│   ├── vk_init_pc.c               # PC variant: dlopen libvulkan.so.1, vkGetInstance/DeviceProcAddr
│   ├── vk_funcs_gen.c/.h          # codegen'd: g_pfn struct of every Vulkan function pointer + populator
│   ├── dispatch_gen.c             # codegen'd: case OP_vkXxx -> unmarshal args -> call g_pfn.vkXxx -> reply
│   ├── dispatch_manual.c          # hand-written dispatch for OPs that need server-side state
│   │                                (handle table, BC-texture transcode staging, dmabuf alloc via
│   │                                 dma_heap, MMU/registry quirks, etc.) — by far the biggest file
│   ├── handle_table.c/.h          # 64-bit cookie ↔ dispatchable-handle map (locked, growable)
│   ├── encode_h264.c/.h           # PC build only: encode rendered frames for the remote viewer
│   ├── udp_frame.c/.h             # UDP send side of frame chunks
│   ├── logsink_inline.c           # one-line LOG() macro that writes to stderr (journald in chroot)
│   └── pc_stubs/ue_husky.h        # tiny stub of ue-husky for the PC build (no BC transcoder there)
├── smoke/                         # smoke client used to validate the link layer
│   ├── vksmoke.c                  # connect, OP_vkp_hello, print server's "name/api/ext_count"
│   ├── logsink.c                  # same LOG() macro
│   ├── hwvulkan.h                 # local copy
│   └── Makefile
└── tests/                         # standalone Vulkan triangle/cube tests
    ├── mincube.c / mincube2.c     # raw Vulkan + Wayland present (the latter does textured cube)
    ├── mincube2.{vert,frag}       # GLSL source
    ├── mincube2_{vert,frag}_spv.h # the same as SPIR-V byte arrays
    ├── vkp_hello_test.c           # bare vkp_hello round-trip
    └── xdg-shell-{client-protocol.h,protocol.c}   # generated Wayland boilerplate (local copy)

Wire protocol

Both directions use the same simple framing:

client → server                            server → client
┌────────────────────────────────┐        ┌──────────────────────────────────┐
│ struct vkp_cmd_hdr             │        │ struct vkp_cmd_hdr (echo opcode) │
│   uint16_t opcode              │        │   flags |= VKP_FLAG_REPLY        │
│   uint16_t flags               │        ├──────────────────────────────────┤
│   uint32_t len                 │        │ struct vkp_reply_hdr             │
├────────────────────────────────┤        │   int32_t  status (VkResult)     │
│ payload (len bytes)            │        │   uint32_t len                   │
└────────────────────────────────┘        ├──────────────────────────────────┤
                                          │ payload (len bytes)              │
                                          └──────────────────────────────────┘

flags:

bit  name                    meaning
0    VKP_FLAG_EXPECTS_REPLY  client wants a synchronous reply (used by vkp_call)
1    VKP_FLAG_REPLY          the frame is a reply (server -> client direction)
2    VKP_FLAG_HAS_FD         one fd is being passed via SCM_RIGHTS on this datagram
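
As a rough illustration of the framing above, the sketch below lays out a command frame in a byte buffer and reads the header back. The struct fields and flag bits follow the diagram and the flags list; the helper names vkp_encode_cmd and vkp_decode_cmd, the natural (unpacked) struct layout, and host byte order on the wire are assumptions, not taken from vkproxy_proto.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag bits, as listed in the flags table. */
enum {
    VKP_FLAG_EXPECTS_REPLY = 1u << 0,
    VKP_FLAG_REPLY         = 1u << 1,
    VKP_FLAG_HAS_FD        = 1u << 2,
};

/* Field order and widths follow the diagram; padding and byte order are
 * assumptions (natural alignment happens to give 8 bytes with no padding). */
struct vkp_cmd_hdr {
    uint16_t opcode;
    uint16_t flags;
    uint32_t len;      /* number of payload bytes after the header */
};

struct vkp_reply_hdr {
    int32_t  status;   /* VkResult */
    uint32_t len;      /* number of reply payload bytes */
};

/* Hypothetical helper: write header + payload into `out`.
 * Returns the number of bytes written, or 0 if `out` is too small. */
static size_t vkp_encode_cmd(uint8_t *out, size_t cap, uint16_t opcode,
                             uint16_t flags, const void *payload, uint32_t len)
{
    struct vkp_cmd_hdr hdr = { opcode, flags, len };

    assert(payload != NULL || len == 0);
    if (cap < sizeof hdr + (size_t)len)
        return 0;
    memcpy(out, &hdr, sizeof hdr);
    if (len != 0)
        memcpy(out + sizeof hdr, payload, len);
    return sizeof hdr + (size_t)len;
}

/* Hypothetical helper: copy the header out of `in` and check that the
 * buffer really holds `len` payload bytes. Returns 0 on success, -1 if not. */
static int vkp_decode_cmd(const uint8_t *in, size_t n, struct vkp_cmd_hdr *hdr)
{
    if (n < sizeof *hdr)
        return -1;
    memcpy(hdr, in, sizeof *hdr);
    return (n - sizeof *hdr >= hdr->len) ? 0 : -1;
}
```

A reply would be built the same way: the echoed vkp_cmd_hdr with VKP_FLAG_REPLY set, then a vkp_reply_hdr, then the reply payload.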

Opcode space:

Range            Meaning
0x0000 – 0x7FFF  One-to-one mirror of a real Vulkan command; generated by gen.py
0x8000 – 0xFFFF  Synthesised / non-Vulkan ops (OP_vkp_*); handled by dispatch_manual.c

The synthesised range covers things that need server-side state or custom marshalling:

  • OP_vkp_hello — protocol handshake; returns the GPU's device name, Vulkan API version, and instance extension count.
  • OP_vkp_create_* / OP_vkp_destroy_* for instance, device, queue, command pool/buffer, image, buffer, image view, sampler, etc. The manual server side maintains the handle table and BC-transcode staging registry alongside.
  • OP_vkp_map_memory / OP_vkp_unmap_memory — coordinate dmabuf-backed vkAllocateMemory, including sending the heap fd back via SCM_RIGHTS so the client can mmap the same memory.
  • OP_vkp_cmd_pipeline_barrier, OP_vkp_cmd_copy_buffer_to_image, OP_vkp_cmd_copy_image_to_buffer, etc. — command recording that needs ht_get on the dispatchable cmdbuf handle (codegen doesn't know which arg is dispatchable).
  • OP_vkp_get_buffer_device_addr, OP_vkp_get_phys_mem_props, and other "spoofable" property queries.
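
The fd hand-off mentioned for the memory-mapping ops relies on the standard SCM_RIGHTS ancillary-data mechanism of AF_UNIX sockets. The sketch below shows only that mechanism, using the POSIX sendmsg/recvmsg calls; the helper names demo_send_fd and demo_recv_fd are hypothetical, and carrying a single placeholder byte alongside the fd is an assumption (the real server presumably attaches the fd to a reply frame).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one int, aligned for struct cmsghdr. */
union demo_ctrl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send one placeholder byte with `fd` attached. Returns 0 on success. */
static int demo_send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union demo_ctrl ctrl;
    struct msghdr msg;

    assert(fd >= 0);
    memset(&msg, 0, sizeof msg);
    memset(&ctrl, 0, sizeof ctrl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive the byte and the attached fd. Returns the new fd, or -1. */
static int demo_recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union demo_ctrl ctrl;
    struct msghdr msg;
    int fd = -1;

    memset(&msg, 0, sizeof msg);
    memset(&ctrl, 0, sizeof ctrl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file description, which is what lets a client mmap memory the server allocated. This only works over the UDS transport; a TCP connection cannot carry descriptors.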

How a call flows

Take vkCmdBindIndexBuffer as the worked example:

  1. The client process loads the Vulkan loader (libvulkan.so.1), which scans /usr/local/share/vulkan/icd.d/, finds mali_proxy_icd.json, dlopens mali_proxy_icd.so, then calls vk_icdGetInstanceProcAddr (in icd_entry.c) for every entry point. icd_entry.c calls _vkproxy_lookup, which walks the codegen'd table in stubs_gen.c. For overrides that need hand work, manual_stubs.c's _vkp_manual_override is consulted first inside the ICD's own vkGetDeviceProcAddr / vkGetInstanceProcAddr.
  2. The client records its command-buffer call. The function pointer it got is one of these:
    • the codegen stub in stubs_gen.c — marshals scalar args, sends the opcode via vkp_send_cmd (fire-and-forget), no reply.
    • a manual implementation in manual_stubs.c — same protocol, but handles things gen.py doesn't (variable-length arrays, pNext chains, dispatchable-handle translation, ABI mismatches between spec versions).
  3. vkp_send_cmd (in client/transport.c) takes the global socket lock, writes vkp_cmd_hdr + payload to the socket, and returns. For queries that need a reply, vkp_call blocks until the server replies.
  4. husky_vkserver's per-client thread reads the header, grows the payload buffer if needed, then dispatches:
    • opcode < VKP_OP_BASE_SYNTH (0x8000): try vkp_dispatch_manual first (it owns a fixed list of opcodes that need handle translation), then fall back to the codegen dispatcher vkp_dispatch in dispatch_gen.c.
    • opcode ≥ 0x8000: vkp_dispatch_manual only (handles all OP_vkp_*).
  5. The chosen handler unmarshals into stack-allocated Vulkan structs and calls g_pfn.vkCmdBindIndexBuffer — a real pointer obtained at bring-up by walking the driver's GetInstanceProcAddr. The handle table (handle_table.c) translates the client-side cookie for VkCommandBuffer into the driver's actual dispatchable handle.
  6. For commands that produce a reply (vkAllocateMemory, vkCreateBuffer, the vkGet*Properties family), the handler calls vkp_send_reply to ship the returned VkResult + result struct back to the client.

Codegen (codegen/gen.py)

Driven from upstream vk.xml. Walks every <command> element and classifies its args:

  • Scalars / by-value handles → emit a 1:1 marshal into a packed struct.
  • Pointers to small fixed-size structs → emit a struct-copy.
  • Anything else (variable-length arrays, pNext chains, ambiguous unions, dispatchable handles, sType-based polymorphism) → emit a vkp_not_implemented stub. Those commands are picked up by manual_stubs.c / dispatch_manual.c and registered in _vkp_manual_override so the lookup chain finds the hand-written version first.
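
For the scalar / by-value case, a generated marshal might look roughly like the sketch below, which uses the arguments of vkCmdBindIndexBuffer as the example. Everything here is illustrative: the struct name, the field layout, the use of #pragma pack, and the helper names are assumptions about what gen.py could emit, not its actual output, and plain integer typedefs stand in for the Vulkan types so the block compiles without the Vulkan headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for Vulkan types (assumption: handles travel as 64-bit cookies). */
typedef uint64_t demo_handle;
typedef uint64_t demo_device_size;   /* VkDeviceSize is a 64-bit integer */

/* Hypothetical wire layout: one field per argument, in declaration order. */
#pragma pack(push, 1)
struct demo_bind_index_buffer_args {
    demo_handle      command_buffer;
    demo_handle      buffer;
    demo_device_size offset;
    uint32_t         index_type;     /* VkIndexType enum value */
};
#pragma pack(pop)

/* Client side: copy the arguments into the payload buffer.
 * Returns the payload size, or 0 if `out` is too small. */
static size_t demo_marshal_bind_index_buffer(uint8_t *out, size_t cap,
                                             demo_handle cmd, demo_handle buf,
                                             demo_device_size offset,
                                             uint32_t index_type)
{
    struct demo_bind_index_buffer_args a = { cmd, buf, offset, index_type };

    assert(out != NULL);
    if (cap < sizeof a)
        return 0;
    memcpy(out, &a, sizeof a);
    return sizeof a;
}

/* Server side: recover the arguments from the payload bytes. */
static struct demo_bind_index_buffer_args
demo_unmarshal_bind_index_buffer(const uint8_t *in)
{
    struct demo_bind_index_buffer_args a;

    memcpy(&a, in, sizeof a);
    return a;
}
```

Packing the struct keeps the payload size independent of compiler padding, which matters when the two ends are built with different toolchains.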

Outputs:

  • include/vkproxy_proto_gen.h: enum vkp_opcodes { OP_vkXxx = N, ... }
  • client/stubs_gen.c: every VKAPI_ATTR ... VKAPI_CALL vkXxx(...) plus a name → fn-pointer table _vkp_stubs[] consulted by _vkproxy_lookup.
  • server/dispatch_gen.c: vkp_dispatch() is one big switch over opcodes that unmarshals and calls g_pfn.vkXxx.
  • server/vk_funcs_gen.c/.h: struct vk_funcs g_pfn { PFN_vk...; } plus a populator that calls GetInstanceProcAddr for every member.
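
The name → fn-pointer lookup, with manual overrides taking priority over generated stubs, could be sketched as follows. The table contents, the demo_ names, and the linear strcmp search are assumptions made for illustration; the real _vkp_stubs[], _vkproxy_lookup and _vkp_manual_override may be organised differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*demo_fn)(void);

struct demo_stub {
    const char *name;
    demo_fn     fn;
};

/* Placeholder entry points standing in for real stubs. */
static void demo_gen_vkCmdDraw(void) {}
static void demo_gen_vkCreateDevice(void) {}     /* would be a not-implemented stub */
static void demo_manual_vkCreateDevice(void) {}

/* Generated table: one row per Vulkan command. */
static const struct demo_stub demo_generated[] = {
    { "vkCmdDraw",      demo_gen_vkCmdDraw },
    { "vkCreateDevice", demo_gen_vkCreateDevice },
};

/* Manual table: only the commands that need hand-written code. */
static const struct demo_stub demo_manual[] = {
    { "vkCreateDevice", demo_manual_vkCreateDevice },
};

static demo_fn demo_find(const struct demo_stub *table, size_t count,
                         const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return table[i].fn;
    }
    return NULL;
}

/* Consult the manual overrides first, then fall back to the generated table. */
static demo_fn demo_get_proc_addr(const char *name)
{
    assert(name != NULL);

    demo_fn fn = demo_find(demo_manual,
                           sizeof demo_manual / sizeof demo_manual[0], name);
    if (fn != NULL)
        return fn;
    return demo_find(demo_generated,
                     sizeof demo_generated / sizeof demo_generated[0], name);
}
```

Because the manual table is searched first, a command can stay in the generated table as a placeholder and still resolve to the hand-written version.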

Regenerate: python3 codegen/gen.py.

Build

make                # builds husky_vkserver (Android Bionic), mali_proxy_icd.so (glibc aarch64),
                    # and vkp_hello_test (smoke client). Output in build/.

make pc             # builds husky_vkserver_pc (host x86_64 glibc) against a runtime-dlopen'd
                    # libvulkan.so.1. No ue-husky link — texture substitution is Mali-only.

The Makefile splits compilers:

  • CC_AND = aarch64-linux-android29-clang from $NDK/toolchains/... for the Bionic server.
  • CC_ARM64 = aarch64-linux-gnu-gcc for the glibc ICD and smoke client.
  • CC_HOST = gcc for the host-side PC build.

The ICD links with -Wl,-Bsymbolic so that internal references to vk* symbols bind to the ICD's own definitions. Without it, the Vulkan loader's exports (the loader is loaded into the process first) interpose on those references, calls recurse between the loader and the ICD without end, and the loader rejects the ICD.

Deploy

Target                    Path on the GPU-side machine
husky_vkserver            /var/lib/machines/halium/data/local/tmp/husky_vkserver (runs in chroot)
libue_husky.so (sibling)  same dir, beside husky_vkserver (rpath $ORIGIN)
mali_proxy_icd.so         /usr/local/lib/vkproxy/mali_proxy_icd.so
mali_proxy_icd.json       /usr/local/share/vulkan/icd.d/mali_proxy_icd.json

The server is run as a systemd unit (husky-vkserver.service). The unit's ExecStart chroot-execs it and pipes stderr into /tmp/vksrv.log inside the chroot.

A Vulkan client process picks up the ICD via:

export VK_ICD_FILENAMES="/usr/local/share/vulkan/icd.d/mali_proxy_icd.json"
export VK_LOADER_DRIVERS_SELECT="mali_proxy_icd.json"

Runtime knobs

Variable             Side    Effect
VKPROXY_SOCKET       server  UDS path (default /tmp/husky-vk.sock)
VKPROXY_LISTEN       PC      host:port for the TCP variant
VKPROXY_REMOTE       client  If set, dial TCP instead of UDS
UEHUSKY_VERBOSE      server  Turn on verbose op-by-op logging
VKPROXY_SPOOF_SM6    client  Spoof shaderInt64/Float64/Int16 (for SM6 / D3D12). Off by default.
VK_LOADER_DEBUG=all  loader  Khronos loader trace; useful for "why didn't this ICD load"
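
Reading two of these knobs could look like the sketch below. The variable names and the default socket path come from the table; the helper names are hypothetical, and treating an empty VKPROXY_SOCKET as unset is an assumption. The value format of VKPROXY_REMOTE is not specified here, so the sketch only checks whether it is set.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_DEFAULT_SOCKET "/tmp/husky-vk.sock"

/* UDS path: VKPROXY_SOCKET if set and non-empty, else the documented default. */
static const char *demo_socket_path(void)
{
    const char *path = getenv("VKPROXY_SOCKET");

    return (path != NULL && path[0] != '\0') ? path : DEMO_DEFAULT_SOCKET;
}

/* Transport choice: TCP whenever VKPROXY_REMOTE is present in the environment. */
static bool demo_use_tcp(void)
{
    return getenv("VKPROXY_REMOTE") != NULL;
}
```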

Debugging

husky_vkserver's SIGSEGV/SIGABRT/SIGBUS/SIGILL handler (main.c):

  1. Writes *** FATAL SIGNAL *** and sig=N tid=T code=K addr=A pc=P.
  2. Walks the aarch64 frame-pointer chain via the ucontext_t and prints up to 20 caller PCs.
  3. Dumps /proc/self/maps.
  4. Re-raises with SIG_DFL so systemd still sees a non-zero exit.
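
A portable skeleton of steps 1 and 4 is sketched below: install one SA_SIGINFO handler for the four signals, write the banner with an async-signal-safe call, restore SIG_DFL and re-raise. The function names are hypothetical, and the aarch64 frame-pointer walk and the /proc/self/maps dump are left out because they depend on details of main.c not shown here. The last function forks a child that crashes on purpose, so the effect can be observed from the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Only async-signal-safe calls are used here: write, signal, raise. */
static void demo_fatal_handler(int sig, siginfo_t *info, void *uctx)
{
    static const char banner[] = "*** FATAL SIGNAL ***\n";
    ssize_t ignored;

    (void)info;   /* a fuller handler would print si_code / si_addr */
    (void)uctx;   /* ...and read the PC and frame pointer from here  */
    ignored = write(STDERR_FILENO, banner, sizeof banner - 1);
    (void)ignored;

    signal(sig, SIG_DFL);   /* restore the default action...              */
    raise(sig);             /* ...so the process still dies by the signal */
}

/* Install the handler for the four fatal signals. Returns 0 on success. */
static int demo_install_fatal_handlers(void)
{
    static const int sigs[] = { SIGSEGV, SIGABRT, SIGBUS, SIGILL };
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = demo_fatal_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++) {
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    }
    return 0;
}

/* Fork a child that installs the handlers and raises `sig`.
 * Returns the signal that killed the child, 0 if it exited, -1 on error. */
static int demo_crash_in_child(int sig)
{
    int status = 0;
    pid_t pid = fork();

    assert(pid >= 0);
    if (pid == 0) {
        if (demo_install_fatal_handlers() == 0)
            raise(sig);
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Because the default action is restored before the re-raise, the parent (or systemd) still sees a death by signal rather than a clean exit.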

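To resolve a PC to a library:offset, find the maps line whose VA range covers the PC, then RVA_in_file = pc - va_start + file_offset, where va_start and file_offset are the start-address and offset columns of that line. Open the .so in IDA at its ImageBase and jump to that RVA. Strictly, the result is an offset into the file; it equals the RVA only if the containing segment's virtual address matches its file offset, so check the program headers if the disassembly looks wrong.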
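
The arithmetic can be automated with a few lines of C. The sketch below parses one line in the /proc/<pid>/maps format and applies pc - va_start + file_offset; the function and struct names are hypothetical, and paths containing spaces are not handled.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct demo_map_hit {
    uint64_t file_off;     /* pc - va_start + file_offset */
    char     path[256];    /* mapped file, empty for anonymous mappings */
};

/* Parse one maps line of the form
 *   start-end perms offset dev inode [path]
 * Returns 1 and fills `hit` if `pc` lies in [start, end),
 * 0 if it does not, and -1 if the line cannot be parsed. */
static int demo_resolve_pc(const char *maps_line, uint64_t pc,
                           struct demo_map_hit *hit)
{
    uint64_t start, end, off;
    char perms[8];
    char path[256] = "";

    assert(hit != NULL);
    int n = sscanf(maps_line,
                   "%" SCNx64 "-%" SCNx64 " %7s %" SCNx64 " %*s %*s %255s",
                   &start, &end, perms, &off, path);
    if (n < 4)
        return -1;
    if (pc < start || pc >= end)
        return 0;

    hit->file_off = pc - start + off;
    snprintf(hit->path, sizeof hit->path, "%s", path);
    return 1;
}
```

Feeding it each line of the dumped maps together with a PC from the backtrace gives the library path and the offset to look up.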

Known sharp edges (Mali backend)

These apply only when the back-end driver is Mali's Valhall blob. The proxy itself is driver-agnostic; everything below is upstream-driver behaviour worked around in dispatch_manual.c or the kernel module.

  • Mali user-space race: libGLES_mali.so:sub_1A65300 walks a registry inside a pthread_mutex_t it doesn't itself create. Two concurrent server threads can observe a partially-NULL'd registry pointer and dereference NULL + 0xC8. Mitigation: serialise all dispatch through a single mutex in main.c. Lower throughput, no crash.
  • Mali kbase CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND=y (kernel side): imports do not eagerly map dma_bufs into the GPU page table, and the fault handler then refuses to demand-map them and kills the context. Patch the kernel module to #undef CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND in mali_kbase_mem.c and mali_kbase_mem_linux.c.
  • -Wl,-Bsymbolic is mandatory on the ICD or the loader rejects it with an infinite-recursion error.
  • Geometry and tessellation shaders on Mali-G715: advertised, but using them hangs the MCU at draw time. Strip both features in the server's vkCreateDevice feature filtering.

About

vulkan proxy to bridge the GPU connection from glibc to the bionic Mali library.
