Stop Shipping Folders:
The Architecture of Single-Binary Deployment in C++
If you are deploying Computer Vision apps to edge devices—whether it's a Raspberry Pi 5, a Jetson Orin, or a ruggedized industrial controller—you know the pain of file paths. You ship a zip file, the customer unzips it, moves the folder, and suddenly your relative path `./models/yolo.onnx` resolves to nothing. The application crashes. The customer complains.
The solution is an architectural shift: Don't ship files. Ship a single binary.
By embedding your assets (ONNX models, config files, icons, and WAV audio) directly into the compiled C++ executable's read-only data section (.rdata on Windows, .rodata on Linux), you gain three massive advantages for B2B deployments:
- Zero Filesystem Dependencies: No need to worry about read permissions, missing folders, or working directory context.
- Version Integrity: The model matches the code, guaranteed. You cannot run v2.0 code with a v1.0 model by accident.
- Atomic Updates: Updating the software means replacing one file. No `rsync` of asset directories required.
Here is the complete engineering workflow for embedding assets, using the Santa Watch project as a case study.
Step 1: The Asset Pipeline (Python)
The first step is converting binary files (like .onnx models or .png images) into C-compatible byte arrays. While you could use a tool like xxd, a custom Python script gives us control over variable naming and memory alignment.
In generate_onnx_to_header.py, we use a robust implementation that doesn't just dump hex—it sanitizes variable names and enforces alignment.
```python
import argparse
import os
import re


def sanitize_variable_name(name):
    """Ensures the variable name is valid for C++."""
    # Replace non-alphanumeric characters with underscores
    clean = re.sub(r'[^0-9a-zA-Z_]', '_', name)
    # Ensure it doesn't start with a number
    if clean[0].isdigit():
        clean = '_' + clean
    return clean


def convert_onnx_to_cpp_header(input_path, output_path, var_name=None):
    if not os.path.exists(input_path):
        print(f"Error: File '{input_path}' not found.")
        return

    # Determine variable name from filename if not provided
    if not var_name:
        base_name = os.path.splitext(os.path.basename(input_path))[0]
        var_name = sanitize_variable_name(base_name)

    file_size = os.path.getsize(input_path)
    print(f"Converting '{input_path}' ({file_size/1024/1024:.2f} MB) to '{output_path}'...")

    with open(input_path, 'rb') as f_in, open(output_path, 'w') as f_out:
        # Write header guard
        header_guard = f"{var_name.upper()}_H"
        f_out.write(f"#ifndef {header_guard}\n")
        f_out.write(f"#define {header_guard}\n\n")

        # Write includes
        f_out.write("#include <cstddef>\n\n")  # For size_t

        # Write length constant
        f_out.write(f"const size_t {var_name}_len = {file_size};\n\n")

        # Write array definition
        # Using alignas to ensure compatibility with SIMD operations if needed by inference engines
        f_out.write(f"alignas(16) const unsigned char {var_name}[] = {{\n")

        byte_count = 0
        while True:
            chunk = f_in.read(1024)  # Read in chunks for efficiency
            if not chunk:
                break

            line_buffer = []
            for byte in chunk:
                line_buffer.append(f"0x{byte:02x}")
                byte_count += 1
                # Format: 16 bytes per line
                if len(line_buffer) >= 16:
                    f_out.write("    " + ", ".join(line_buffer) + ",\n")
                    line_buffer = []

            # Flush remaining bytes in the chunk
            if line_buffer:
                f_out.write("    " + ", ".join(line_buffer) + ",\n")

        f_out.write("};\n\n")
        f_out.write(f"#endif // {header_guard}\n")

    print(f"Success! Generated header file at: {output_path}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert ONNX model to C++ header file.")
    parser.add_argument("input", help="Path to input .onnx file")
    parser.add_argument("output", help="Path to output .h file")
    parser.add_argument("--name", help="Custom variable name for the C++ array", default=None)
    args = parser.parse_args()
    convert_onnx_to_cpp_header(args.input, args.output, args.name)
```
Why alignas(16)?
When loading neural network weights, many inference engines (like ONNX Runtime or TFLite) utilize SIMD instructions (AVX2/NEON) that perform faster when data is aligned to 16-byte or 32-byte boundaries. If you simply dump a char array, the compiler might align it to 1 byte, forcing the inference engine to perform a costly memory copy before it can run the model.
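If you want to confirm that the alignment actually made it into your binary, a quick runtime check is enough. Here is a minimal sketch, assuming the generated header and variable names used in Step 2 below (`santa_model_data.h`, `santa_hat_model`):

```cpp
#include <cassert>
#include <cstdint>

#include "santa_model_data.h"  // generated header; defines santa_hat_model / santa_hat_model_len

// Confirm the embedded weights really sit on a 16-byte boundary.
// If this ever fires, the generator script or a compiler flag dropped the alignas(16).
void check_model_alignment() {
    const auto addr = reinterpret_cast<std::uintptr_t>(&santa_hat_model[0]);
    assert(addr % 16 == 0 && "Embedded model is not 16-byte aligned");
}
```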
Step 2: Load from Memory (OpenCV DNN)
Most OpenCV tutorials teach you to use readNetFromONNX("path/to/file"). However, for a single-binary architecture, we must use the overloaded function that accepts a buffer.
In src/detector.cpp, the engine demonstrates how to bridge the gap between the auto-generated header and the OpenCV API.
#include "santa_model_data.h" // The auto-generated header
Detector::Detector() {
// 1. Access the raw byte array and length from the header
const char* model_data = reinterpret_cast<const char*>(santa_hat_model);
size_t model_len = santa_hat_model_len;
if (model_data && model_len > 0) {
try {
// 2. Load directly from RAM
// OpenCV can parse the ONNX binary stream from memory
net_ = cv::dnn::readNetFromONNX(model_data, model_len);
// 3. Configure Backend
net_.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
net_.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
} catch (...) {
// Fallback logic if memory is corrupted
use_legacy_mode_ = true;
}
}
}Step 3: Handling Multimedia (Audio & UI)
This technique isn't limited to AI models. You can embed UI assets and sound effects to create a truly standalone kiosk application.
Embedding Audio
For audio, the AudioService class handles playback directly from these memory pointers. On Windows, `PlaySound` supports the `SND_MEMORY` flag. On Linux, we write the embedded bytes to a temporary file or pipe them to a player.
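Here is a minimal sketch of that playback path. It is an illustration rather than the project's actual AudioService; the `alert_wav` / `alert_wav_len` names and the `alert_sound_data.h` header are assumed to be generated the same way as the model header in Step 1:

```cpp
#include <cstdio>
#include <cstdlib>

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>          // PlaySound; link with winmm.lib
#endif

#include "alert_sound_data.h"  // hypothetical generated header: alert_wav[], alert_wav_len

void play_alert() {
#ifdef _WIN32
    // SND_MEMORY tells PlaySound that the first argument points at an
    // in-memory WAV image rather than a filename.
    PlaySoundA(reinterpret_cast<LPCSTR>(alert_wav), nullptr, SND_MEMORY | SND_ASYNC);
#else
    // Simple fallback: write the embedded bytes to a temp file and hand it to a player.
    const char* tmp_path = "/tmp/alert.wav";
    if (FILE* f = std::fopen(tmp_path, "wb")) {
        std::fwrite(alert_wav, 1, alert_wav_len, f);
        std::fclose(f);
        std::system("aplay /tmp/alert.wav &");  // assumes ALSA's aplay is available
    }
#endif
}
```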
Embedding Icons
For the UI, src/settings_page.cpp decodes PNGs from memory using cv::imdecode. This allows the app to render complex UI buttons without shipping an assets/ folder.
```cpp
// Decoding an embedded PNG icon into an OpenCV Matrix
std::vector<uchar> buf(icon_data, icon_data + icon_len);
cv::Mat icon = cv::imdecode(buf, cv::IMREAD_UNCHANGED);
```
Step 4: Automating the Build (CMake)
Manually running Python scripts before every compile is a recipe for disaster. A professional setup integrates this directly into the build system so that if an asset changes, the header regenerates automatically.
In CMakeLists.txt, we use add_custom_command to bridge the Python generation step with the C++ compilation DAG.
```cmake
# 1. Define the input model and output header
set(MODEL_FILE "${CMAKE_CURRENT_SOURCE_DIR}/models/santa_hat_final.onnx")
set(HEADER_FILE "${CMAKE_CURRENT_BINARY_DIR}/generated/santa_model_data.h")

# 2. Add custom command to run the script
add_custom_command(
    OUTPUT ${HEADER_FILE}
    COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_CURRENT_BINARY_DIR}/generated
    COMMAND python3 ${CMAKE_CURRENT_SOURCE_DIR}/generate_onnx_to_header.py
            ${MODEL_FILE} ${HEADER_FILE} --name santa_hat_model
    DEPENDS ${MODEL_FILE} ${CMAKE_CURRENT_SOURCE_DIR}/generate_onnx_to_header.py
    COMMENT "Embedding ONNX model into C++ header..."
)

# 3. Add to target sources (triggers generation)
add_executable(SantaWatch src/main.cpp)
target_sources(SantaWatch PRIVATE ${HEADER_FILE})
target_include_directories(SantaWatch PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/generated)
```
Conclusion
By treating assets as code, you shift the complexity from deployment time (where things go wrong on the customer's machine) to compile time (where you have control).
The Santa Watch project proves that even complex applications—featuring YOLOv11 inference, custom audio alerts, and a graphical dashboard—can be packaged into a highly portable, resilient artifact.
Next Steps for Engineers
- Audit your current `assets/` folder.
- Write a simple hex-dump script (or adapt `generate_onnx_to_header.py`).
- Refactor your `cv::imread` calls to `cv::imdecode` and `readNetFromONNX` (see the sketch below).
- Enjoy the silence of zero "File Not Found" support tickets.
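For the refactoring step, here is a minimal before/after sketch; the `logo_png` / `logo_png_len` symbols and the `logo_icon_data.h` header are hypothetical names for an embedded icon generated as in Step 1:

```cpp
#include <opencv2/imgcodecs.hpp>
#include <vector>

#include "logo_icon_data.h"  // hypothetical generated header: logo_png[], logo_png_len

// Before: fragile, depends on the working directory at runtime.
// cv::Mat logo = cv::imread("assets/logo.png", cv::IMREAD_UNCHANGED);

// After: decode straight from the bytes baked into the binary.
cv::Mat load_logo() {
    std::vector<uchar> buf(logo_png, logo_png + logo_png_len);
    return cv::imdecode(buf, cv::IMREAD_UNCHANGED);
}
```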
Need help architecting your Edge Pipeline?
We specialize in porting fragile Python prototypes into robust C++17 applications ready for factory floors and remote deployments.
Book a Code Review