ggmlR includes a built-in zero-dependency ONNX loader (hand-written protobuf parser in C). Load any compatible ONNX model and run inference on CPU or Vulkan GPU — no Python, no TensorFlow, no ONNX Runtime required.
```r
model <- ggml_onnx_load("path/to/model.onnx")

# Input / output info
cat("Inputs:\n"); print(ggml_onnx_inputs(model))
cat("Outputs:\n"); print(ggml_onnx_outputs(model))
```

`ggml_onnx_inputs()` returns a list with `name`, `shape`, and `dtype` for each input tensor.
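To eyeball that metadata, one option (assuming each list element exposes `name`, `shape`, and `dtype` exactly as described above) is:

```r
# One line per input: name, shape, dtype
# (field names assumed from the description above)
for (inp in ggml_onnx_inputs(model)) {
  cat(sprintf("%s  [%s]  %s\n",
              inp$name, paste(inp$shape, collapse = " x "), inp$dtype))
}
```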
Inputs are passed as a named list of R arrays in NCHW order (matching the ONNX model’s expected layout).
```r
# Random image batch — replace with real data
input <- array(runif(1 * 3 * 224 * 224), dim = c(1L, 3L, 224L, 224L))
result <- ggml_onnx_run(model, list(input_name = input))
cat("Output shape:", paste(dim(result[[1]]), collapse = " x "), "\n")
```

For models with multiple inputs, pass a named list:
```r
result <- ggml_onnx_run(model, list(
  input_ids      = array(as.integer(tokens), dim = c(1L, length(tokens))),
  attention_mask = array(1L, dim = c(1L, length(tokens)))
))
```

By default ggmlR tries Vulkan first and falls back to CPU automatically. To force a specific backend:
```r
# Check what's available
if (ggml_vulkan_available()) {
  cat("Vulkan GPU ready\n")
  ggml_vulkan_status()
}

# Load with explicit backend hint
model_gpu <- ggml_onnx_load("path/to/model.onnx", backend = "vulkan")
model_cpu <- ggml_onnx_load("path/to/model.onnx", backend = "cpu")
```

Weights are transferred to the GPU once at load time; repeated calls to `ggml_onnx_run()` do not re-transfer them.
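A minimal timing sketch to confirm this, reusing the placeholder `input_name` from the earlier example; per-run cost should be flat from the first call onward because the upload happens inside `ggml_onnx_load()`:

```r
# Weights were uploaded at load time, so no run below pays a transfer cost
# ("input_name" is a placeholder for the model's real input name)
input <- array(runif(1 * 3 * 224 * 224), dim = c(1L, 3L, 224L, 224L))
for (i in 1:3) {
  print(system.time(ggml_onnx_run(model_gpu, list(input_name = input))))
}
```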
ggmlR supports 50+ ONNX operators, including custom fused ops such as RelPosBias2D (BoTNet).
For full working examples with real ONNX Zoo models, see:

- `inst/examples/benchmark_onnx.R`: GPU vs CPU benchmark across multiple models
- `inst/examples/benchmark_onnx_fp16.R`: FP16 inference benchmark
- `inst/examples/test_all_onnx.R`: runs all supported ONNX Zoo models
- `inst/examples/bert_similarity.R`: BERT sentence similarity

If a model fails to load or produces wrong results:
Check operator support — print the model’s op list with Python’s `onnx` package and compare it against the operators listed above.

Verify protobuf field numbers — the built-in parser is hand-written, and an unexpected field can cause silent mis-parsing. Dump the raw field tags:
```python
# Python: dump all field numbers seen in a TensorProto
import onnx, sys
from google.protobuf.internal import decoder  # internal, but a stable varint reader

m = onnx.load(sys.argv[1])
for init in m.graph.initializer:
    raw, pos = init.SerializeToString(), 0
    while pos < len(raw):                      # each key is (field << 3) | wire_type
        tag, pos = decoder._DecodeVarint(raw, pos)
        field, wire = tag >> 3, tag & 7
        print(f"{init.name}: field {field}, wire type {wire}")
        if wire == 0:   _, pos = decoder._DecodeVarint(raw, pos)            # varint
        elif wire == 2: n, pos = decoder._DecodeVarint(raw, pos); pos += n  # bytes
        else:           pos += 8 if wire == 1 else 4                        # fixed64/32
```

NaN tracing — use the eval callback for per-node inspection rather than a post-compute scan (which aliases buffers and gives false readings); see the sketch below.
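What a per-node trace could look like; the callback registration below (`ggml_set_eval_callback()` and its signature) is hypothetical, so check the ggmlR reference for the actual API:

```r
# HYPOTHETICAL API: the callback name and arguments are illustrative only.
# Goal: inspect each node's output as it is computed and report the first
# node that produces a NaN.
ggml_set_eval_callback(model, function(node_name, values) {
  if (any(is.nan(values))) {
    cat("First NaN at node:", node_name, "\n")
    return(FALSE)  # stop evaluation at the offending node
  }
  TRUE  # continue to the next node
})
result <- ggml_onnx_run(model, list(input_name = input))
```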
Repeated-run aliasing — `ggml_backend_sched` aliases intermediate buffers over weight buffers, so ggmlR calls `sched_alloc_and_load()` before each compute to reset the allocation. If you see correct results on the first run but garbage on subsequent runs, this is the cause; a quick check is sketched below.
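A quick symptom check from R, reusing `input` from above; two identical back-to-back runs should match exactly:

```r
# Divergence on the second run points to intermediate buffers having been
# aliased over weight buffers in the scheduler
r1 <- ggml_onnx_run(model, list(input_name = input))
r2 <- ggml_onnx_run(model, list(input_name = input))
all.equal(r1[[1]], r2[[1]])  # should be TRUE
```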
See also the ONNX debugging section in CLAUDE.md for
field number tables and the Python dump script.