Cross-platform GPU image processing

Run CUDA-based image processing on macOS via Metal. Supported operations include upscaling, filtering, morphology, edge detection, and blending. Split images into tiles locally, process the tiles on a cloud GPU, and stitch the results back together.


What it does

CUDA-to-Metal translation

Write CUDA kernels that run on Apple Silicon Macs through automatic translation to Metal.

Cross-platform builds

The same codebase compiles on macOS, Linux, and Windows, with native GPU support on each.

Local preprocessing

CPU-based tiling and stitching with OpenCV or the header-only stb_image library. The C version requires no external dependencies.

REST API

Upload images, create tiles, trigger upscaling, and download results via HTTP endpoints.

How it works

Split

Tile images locally with the preprocess tool

Transfer

Send tiles to a cloud GPU instance

Process

Upscale or filter tiles on the GPU

Stitch

Combine the processed tiles into the final image
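
The split, process, and stitch steps above can be sketched in a few lines of plain Python. This is only an illustration, not the project's preprocess tool: the function names (split_tiles, upscale_2x, stitch) are hypothetical, the image is a small 2D list of grayscale values, and a nearest-neighbour 2× upscale on the CPU stands in for the GPU step.

```python
# Sketch of the split -> process -> stitch flow on a tiny grayscale image.
# All function names are hypothetical; the upscale is a CPU stand-in for the GPU step.

def split_tiles(image, tile):
    """Cut a 2D list into (row, col, pixels) tiles of at most tile x tile pixels."""
    tiles = []
    for y in range(0, len(image), tile):
        for x in range(0, len(image[0]), tile):
            block = [row[x:x + tile] for row in image[y:y + tile]]
            tiles.append((y, x, block))
    return tiles

def upscale_2x(block):
    """Nearest-neighbour 2x upscale: every pixel becomes a 2x2 square."""
    out = []
    for row in block:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def stitch(tiles, height, width, scale):
    """Paste processed tiles back onto a canvas at their scaled offsets."""
    canvas = [[0] * (width * scale) for _ in range(height * scale)]
    for y, x, block in tiles:
        for dy, row in enumerate(block):
            start = x * scale
            canvas[y * scale + dy][start:start + len(row)] = row
    return canvas

# A 5x5 image with 2x2 tiles, so the edge tiles are smaller than the rest.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
tiles = split_tiles(image, tile=2)
processed = [(y, x, upscale_2x(block)) for y, x, block in tiles]
result = stitch(processed, height=5, width=5, scale=2)
```

Because nearest-neighbour upscaling only reads the pixel it is copying, processing tile by tile gives the same result as processing the whole image. Filters that read neighbouring pixels, such as a Gaussian blur, would generally need overlapping tiles to avoid visible seams, which this sketch does not cover.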

Performance

Processing times for 1024×1024 images.

Operation	macOS Metal	Linux CUDA	CPU
2× Upscale	12 ms	15 ms	450 ms
Gaussian Blur	8 ms	10 ms	180 ms
Edge Detection	5 ms	7 ms	120 ms
Color Conversion	3 ms	4 ms	45 ms
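
To read the table as speedups rather than raw times, divide the CPU time by the GPU time for each operation. The short sketch below does that arithmetic with the figures copied from the table; the dictionary layout and key names are just for illustration.

```python
# Timings copied from the table above, in milliseconds.
timings_ms = {
    "2x Upscale": {"metal": 12, "cuda": 15, "cpu": 450},
    "Gaussian Blur": {"metal": 8, "cuda": 10, "cpu": 180},
    "Edge Detection": {"metal": 5, "cuda": 7, "cpu": 120},
    "Color Conversion": {"metal": 3, "cuda": 4, "cpu": 45},
}

# Speedup over the CPU = CPU time / GPU time.
speedups = {
    op: {"metal": t["cpu"] / t["metal"], "cuda": t["cpu"] / t["cuda"]}
    for op, t in timings_ms.items()
}

for op, s in speedups.items():
    print(f"{op}: Metal {s['metal']:.1f}x, CUDA {s['cuda']:.1f}x faster than CPU")
```

For the 2× upscale, for example, this gives 450 / 12 = 37.5× on Metal and 450 / 15 = 30× on CUDA.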

API

Method	Endpoint	Description
POST	/v1/images	Upload image
GET	/v1/images	List images
POST	/v1/images/:id/tiles	Create tiles
POST	/v1/tiles/:id/upscale	Upscale tile
POST	/v1/stitch	Combine tiles
GET	/health	Status check
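
As a rough sketch of how a client might address the endpoints listed above, the Python below builds the HTTP requests with the standard library, without sending them. Only the methods and paths come from the list; the base URL, the raw-bytes upload body, the tile_size parameter, and the helper names are assumptions made for illustration, since request and response formats are not specified here.

```python
import json
import urllib.request

# Assumption: host and port are not given in the source.
BASE_URL = "http://localhost:8080"

def build_request(method, path, body=None, content_type=None):
    """Build (but do not send) a request for one of the documented paths."""
    headers = {"Content-Type": content_type} if content_type else {}
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=headers, method=method
    )

def upload_image(image_bytes):
    # Assumption: the image is sent as a raw binary body.
    return build_request("POST", "/v1/images", image_bytes, "application/octet-stream")

def create_tiles(image_id, tile_size):
    # Assumption: "tile_size" is a hypothetical JSON parameter.
    body = json.dumps({"tile_size": tile_size}).encode("utf-8")
    return build_request("POST", f"/v1/images/{image_id}/tiles", body, "application/json")

def upscale_tile(tile_id):
    return build_request("POST", f"/v1/tiles/{tile_id}/upscale")

def health():
    return build_request("GET", "/health")
```

Against a running server, a request built this way could be sent with urllib.request.urlopen(health()); the shape of the response would need to be checked against the project's own API documentation.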

Install

Quick start

./scripts/setup.sh
./scripts/run.sh

Docker

docker build -t hybrid-compute .
docker run --rm hybrid-compute

macOS

brew install --cask miniforge
mamba install opencv cmake
mkdir build && cd build
cmake .. && make

Ubuntu

sudo apt install cmake libopencv-dev
mkdir build && cd build
cmake .. && make

Documentation

Who is this for?
API Design Guide
Testing
Onboarding
Compatibility
Troubleshooting
CI/CD
Test Results
Pipeline Status

Acknowledgments

This project would not have been possible without OpenCode Zen and Kilo Gateway.

This project is being acquired by libnudget, a startup building the future of hybrid compute infrastructure.
