Cross-platform GPU image processing

Run CUDA-based image processing on macOS through Metal. Supported operations include upscaling, filtering, morphology, edge detection, and blending. Split images into tiles locally, process the tiles on a cloud GPU, and stitch the results back together.

What it does

CUDA-to-Metal translation

Write CUDA kernels that run on Apple Silicon Macs through automatic translation to Metal.
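CUDA and Metal share a closely related execution model. A kernel body runs once per thread, and each thread derives its pixel coordinates from its position in a grid of thread blocks. The C sketch below emulates that model serially so it runs on any CPU. `dim2`, `rgb_to_gray_thread`, and `launch_rgb_to_gray` are hypothetical names for illustration. They are not this project's API and not its translation output. In real CUDA the indices come from the built-in `blockIdx`, `blockDim`, and `threadIdx`. Metal Shading Language exposes the combined index through the `[[thread_position_in_grid]]` attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CUDA's built-in dim3 index variables (x and y only). */
typedef struct { unsigned x, y; } dim2;

/* Per-thread body of a grayscale "kernel": one call handles one pixel.
 * rgb is interleaved 8-bit RGB; gray receives one byte per pixel. */
void rgb_to_gray_thread(dim2 block_idx, dim2 block_dim, dim2 thread_idx,
                        const uint8_t *rgb, uint8_t *gray,
                        unsigned width, unsigned height)
{
    /* Same index arithmetic a CUDA kernel would use. */
    unsigned x = block_idx.x * block_dim.x + thread_idx.x;
    unsigned y = block_idx.y * block_dim.y + thread_idx.y;
    if (x >= width || y >= height)   /* threads past the image edge do nothing */
        return;
    size_t i = (size_t)y * width + x;
    const uint8_t *p = rgb + 3 * i;
    /* Integer approximation of BT.601 luma weights: 77 + 150 + 29 = 256. */
    gray[i] = (uint8_t)((77u * p[0] + 150u * p[1] + 29u * p[2]) >> 8);
}

/* Serial "launch": visit every thread of every block, which a GPU
 * would instead run in parallel. */
void launch_rgb_to_gray(const uint8_t *rgb, uint8_t *gray,
                        unsigned width, unsigned height)
{
    const dim2 block = {16, 16};
    /* Round the grid up so partial blocks cover the right and bottom edges. */
    const dim2 grid = {(width + block.x - 1) / block.x,
                       (height + block.y - 1) / block.y};
    assert(rgb != NULL && gray != NULL);
    for (unsigned by = 0; by < grid.y; ++by)
        for (unsigned bx = 0; bx < grid.x; ++bx)
            for (unsigned ty = 0; ty < block.y; ++ty)
                for (unsigned tx = 0; tx < block.x; ++tx)
                    rgb_to_gray_thread((dim2){bx, by}, block, (dim2){tx, ty},
                                       rgb, gray, width, height);
}
```

Threads past the right or bottom edge return early, so the grid can be rounded up to whole 16×16 blocks for any image size.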

Cross-platform builds

Same codebase compiles on macOS, Linux, and Windows with native GPU support on each.

Local preprocessing

CPU-based tiling and stitching with OpenCV or the header-only stb_image library. The C version requires no external dependencies.
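At its core, CPU tiling copies rectangular regions between row-major pixel buffers. The following is a minimal sketch in C. It assumes an interleaved 8-bit buffer of the kind stb_image's `stbi_load` returns (width × height × channels bytes). `extract_tile`, `place_tile`, and `tile_count` are hypothetical helpers, not the preprocess tool's actual interface. Edge tiles are clipped rather than padded, so stitching the tiles back yields the source image byte for byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of tiles along one axis: ceil(extent / tile). */
int tile_count(int extent, int tile) { return (extent + tile - 1) / tile; }

/* Copy the tile whose top-left corner is (x0, y0) out of an interleaved
 * 8-bit image. Right/bottom edge tiles may be smaller than `tile`;
 * the actual size is written to *tw and *th. */
void extract_tile(const uint8_t *img, int width, int height, int channels,
                  int x0, int y0, int tile, uint8_t *out, int *tw, int *th)
{
    assert(x0 >= 0 && y0 >= 0 && x0 < width && y0 < height);
    int w = (x0 + tile <= width) ? tile : width - x0;
    int h = (y0 + tile <= height) ? tile : height - y0;
    for (int y = 0; y < h; ++y)   /* one contiguous row at a time */
        memcpy(out + (size_t)y * w * channels,
               img + ((size_t)(y0 + y) * width + x0) * channels,
               (size_t)w * channels);
    *tw = w;
    *th = h;
}

/* Inverse of extract_tile: write a w x h tile back at (x0, y0). */
void place_tile(uint8_t *img, int width, int channels,
                int x0, int y0, const uint8_t *tile_px, int w, int h)
{
    for (int y = 0; y < h; ++y)
        memcpy(img + ((size_t)(y0 + y) * width + x0) * channels,
               tile_px + (size_t)y * w * channels,
               (size_t)w * channels);
}
```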

REST API

Upload images, create tiles, trigger upscaling, and download results via HTTP endpoints.

How it works

1. Split: tile images locally with the preprocess tool.
2. Transfer: send the tiles to a cloud GPU instance.
3. Process: upscale or filter the tiles on the GPU.
4. Stitch: combine the processed tiles into the final image.
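The four steps can be sketched end to end in C for the simplest case: a 2× nearest-neighbor upscale of a single-channel 8-bit image. This is an illustrative assumption, not the project's upscaler. The transfer step is replaced by a local function call, and `upscale2x_nearest` and `tiled_upscale2x` are hypothetical names. The sketch shows that after a 2× upscale, each tile is stitched at twice its original offset.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Nearest-neighbor 2x upscale of a w x h single-channel image into dst,
 * which must hold (2*w) x (2*h) bytes. */
void upscale2x_nearest(const uint8_t *src, int w, int h, uint8_t *dst)
{
    for (int y = 0; y < 2 * h; ++y)
        for (int x = 0; x < 2 * w; ++x)
            dst[(size_t)y * 2 * w + x] = src[(size_t)(y / 2) * w + x / 2];
}

/* Split -> process -> stitch in one process, for illustration only.
 * out must hold (2*width) x (2*height) bytes. */
void tiled_upscale2x(const uint8_t *img, int width, int height, int tile,
                     uint8_t *out)
{
    uint8_t *in_tile = malloc((size_t)tile * tile);
    uint8_t *up_tile = malloc((size_t)4 * tile * tile);
    assert(in_tile != NULL && up_tile != NULL);
    for (int y0 = 0; y0 < height; y0 += tile)
        for (int x0 = 0; x0 < width; x0 += tile) {
            int w = width - x0 < tile ? width - x0 : tile;
            int h = height - y0 < tile ? height - y0 : tile;
            /* 1. Split: copy this tile out of the source image. */
            for (int y = 0; y < h; ++y)
                memcpy(in_tile + (size_t)y * w,
                       img + (size_t)(y0 + y) * width + x0, (size_t)w);
            /* 2-3. Transfer and process: a local call stands in for the GPU. */
            upscale2x_nearest(in_tile, w, h, up_tile);
            /* 4. Stitch: offsets double because the output is twice as large. */
            for (int y = 0; y < 2 * h; ++y)
                memcpy(out + (size_t)(2 * y0 + y) * 2 * width + 2 * x0,
                       up_tile + (size_t)y * 2 * w, (size_t)2 * w);
        }
    free(in_tile);
    free(up_tile);
}
```

Nearest-neighbor upscaling tiles without seams because each output pixel depends on exactly one input pixel. Bilinear upscaling, Gaussian blur, and edge detection read a neighborhood, so tiles for those operations typically need an overlapping border that is cropped before stitching. Without one, pixels near tile edges lack neighbors and seams can appear.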

Performance

Processing times for 1024×1024 images (lower is better).

Operation          macOS (Metal)   Linux (CUDA)   CPU
2× Upscale         12 ms           15 ms          450 ms
Gaussian Blur      8 ms            10 ms          180 ms
Edge Detection     5 ms            7 ms           120 ms
Color Conversion   3 ms            4 ms           45 ms
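The figures below are plain arithmetic on the table's own numbers. They give the Metal column's speedup over the CPU column, S = t_CPU / t_Metal, and the pixel throughput implied by the 2× upscale time. They inherit whatever hardware and settings produced each column.

```latex
\begin{aligned}
N &= 1024 \times 1024 = 1\,048\,576\ \text{pixels} \\
S_{\text{upscale}} &= \frac{450\ \text{ms}}{12\ \text{ms}} = 37.5 \\
S_{\text{blur}} &= \frac{180\ \text{ms}}{8\ \text{ms}} = 22.5 \\
S_{\text{edge}} &= \frac{120\ \text{ms}}{5\ \text{ms}} = 24 \\
S_{\text{color}} &= \frac{45\ \text{ms}}{3\ \text{ms}} = 15 \\
\text{throughput}_{\text{upscale}} &= \frac{N}{12\ \text{ms}} = \frac{1\,048\,576}{0.012\ \text{s}} \approx 87.4 \times 10^{6}\ \text{pixels/s}
\end{aligned}
```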

API

POST /v1/images              Upload an image
GET  /v1/images              List images
POST /v1/images/:id/tiles    Create tiles from an image
POST /v1/tiles/:id/upscale   Upscale a tile
POST /v1/stitch              Combine tiles into an image
GET  /health                 Status check

Install

Quick start

./scripts/setup.sh
./scripts/run.sh

Docker

docker build -t hybrid-compute .
docker run --rm hybrid-compute

macOS

brew install --cask miniforge
mamba install opencv cmake
mkdir build && cd build
cmake .. && make

Ubuntu

sudo apt install build-essential cmake libopencv-dev
mkdir build && cd build
cmake .. && make


Acknowledgments

This project would not have been possible without OpenCode Zen and Kilo Gateway.

This project is being acquired by libnudget, a startup building the future of hybrid compute infrastructure.
