Intel Data Center GPU Flex Series Receives New TensorFlow Acceleration

TensorFlow is a platform for preparing, building, deploying, and implementing AI machine-learning models across a range of software and hardware options. Recently, Google and Intel created a mechanism that lets hardware manufacturers release support for their data center devices without altering TensorFlow's original code. This mechanism is called PluggableDevice. Today, Intel announced that it has added support for its Intel Data Center GPU Flex Series through a PluggableDevice plug-in called Intel Extension for TensorFlow.

Intel and Google team members join forces on new TensorFlow acceleration by way of the Intel Extension for TensorFlow

This new implementation allows TensorFlow to run on Intel Data Center GPU Flex Series hardware and the company’s Intel Arc graphics. It is compatible with Linux and the Windows Subsystem for Linux, and it connects to the hardware through oneAPI. oneAPI is an open, standards-based programming model that allows developers to target various accelerated architectures.
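Because a PluggableDevice plug-in is loaded by TensorFlow itself, user code should be able to discover the new hardware through the ordinary device-listing call. The snippet below is a minimal sketch of that check, not an official Intel example. The helper name `find_pluggable_devices` is hypothetical, and the device type a given plug-in registers is an assumption (Intel's documentation refers to an "XPU" device type); the sketch simply reports anything that is not one of TensorFlow's built-in types, and it returns an empty list when TensorFlow is not installed.

```python
def find_pluggable_devices(builtin_types=("CPU", "GPU", "TPU")):
    """Return (name, device_type) pairs for devices that are not built-in types.

    Hypothetical helper: a plug-in such as Intel Extension for TensorFlow is
    assumed to register its own device type (for example "XPU").
    """
    try:
        import tensorflow as tf  # optional dependency in this sketch
    except ImportError:
        return []  # TensorFlow is not installed, so nothing to report

    return [
        (device.name, device.device_type)
        for device in tf.config.list_physical_devices()
        if device.device_type not in builtin_types
    ]


devices = find_pluggable_devices()
print(devices if devices else "No plug-in devices found")
```

On a machine without the plug-in installed, the list is simply empty, so the same script can run unchanged on systems with or without the supported hardware.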

Intel Extension for TensorFlow PluggableDevice implementation. Image source: TensorFlow.

The plug-in, albeit simple to set up, implements several of TensorFlow’s PluggableDevice C APIs using C++ with the SYCL programming model:

Device management: The Intel and Google developers implemented TensorFlow’s StreamExecutor C API using C++ with SYCL and some special support provided by the oneAPI SYCL runtime (DPC++ LLVM SYCL project). The StreamExecutor C API defines stream, device, context, and memory structures along with related functions, all of which map directly onto corresponding implementations in the SYCL runtime.
Op and kernel registration: TensorFlow’s kernel and op registration C API allows device-specific kernel implementations and custom operations. To ensure sufficient model coverage, the development team matched the op coverage of TensorFlow’s native GPU device, implementing most performance-critical ops by calling highly optimized deep learning primitives from the oneAPI Deep Neural Network Library (oneDNN). Other ops are implemented with SYCL kernels or with the Eigen math library, which the plug-in ports to C++ with SYCL so that it can generate programs that implement device ops.
Graph optimization: The Flex Series GPU plug-in optimizes TensorFlow graphs in Grappler through the Graph C API and offloads performance-critical graph partitions to the oneDNN library through the oneDNN Graph API. It receives a protobuf-serialized graph from TensorFlow, deserializes the graph, identifies and replaces appropriate subgraphs with a custom op, and sends the graph back to TensorFlow. When TensorFlow executes the processed graph, the custom ops are mapped to oneDNN’s optimized implementation for their associated oneDNN Graph partitions.
Profiler: The Profiler C API lets PluggableDevices communicate profiling data in TensorFlow’s native profiling format. The Flex Series GPU plug-in takes a serialized XSpace object from TensorFlow, fills the object with runtime data obtained through the oneAPI Level Zero low-level device interface, and returns the object to TensorFlow. Users can display the execution profile of specific ops on the Flex Series GPU with TensorFlow’s profiling tools, such as TensorBoard.
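The graph optimization step described above (find a subgraph, replace it with a single custom op, hand the graph back) can be illustrated with a toy model. The following is a simplified sketch in plain Python and does not use the Grappler Graph C API or the oneDNN Graph API. The `Node` class, the `MatMul -> BiasAdd -> Relu` pattern, and the `CustomFusedOp` name are all assumptions chosen for illustration, and protobuf serialization is left out entirely.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical chain of ops that the toy optimizer treats as one partition.
PATTERN = ["MatMul", "BiasAdd", "Relu"]


@dataclass
class Node:
    name: str
    op: str
    inputs: List[str] = field(default_factory=list)


def fuse_chains(nodes: List[Node]) -> List[Node]:
    """Replace every MatMul -> BiasAdd -> Relu chain with one custom op node."""
    replacements = {}  # name of the last chain node -> fused node
    removed = set()    # names of all nodes absorbed into a fused node

    for node in nodes:
        if node.op != PATTERN[0]:
            continue
        chain = [node]
        while len(chain) < len(PATTERN):
            # Only fuse when the intermediate result has exactly one consumer.
            users = [n for n in nodes if chain[-1].name in n.inputs]
            if len(users) != 1 or users[0].op != PATTERN[len(chain)]:
                break
            chain.append(users[0])
        if len(chain) != len(PATTERN):
            continue

        chain_names = {n.name for n in chain}
        # Inputs that come from outside the chain become the fused op's inputs.
        external = [i for n in chain for i in n.inputs if i not in chain_names]
        # Reuse the last node's name so downstream consumers stay valid.
        replacements[chain[-1].name] = Node(chain[-1].name, "CustomFusedOp", external)
        removed |= chain_names

    result = []
    for node in nodes:
        if node.name in replacements:
            result.append(replacements[node.name])
        elif node.name not in removed:
            result.append(node)
    return result


example = [
    Node("x", "Placeholder"),
    Node("w", "Const"),
    Node("b", "Const"),
    Node("mm", "MatMul", ["x", "w"]),
    Node("ba", "BiasAdd", ["mm", "b"]),
    Node("r", "Relu", ["ba"]),
    Node("out", "Softmax", ["r"]),
]

for n in fuse_chains(example):
    print(n.name, n.op, n.inputs)
```

In this toy version the three chained nodes collapse into one node that keeps the name of the final op, so the downstream `Softmax` node needs no changes. In the real plug-in, a custom op of this kind is what gets mapped to oneDNN's optimized implementation at execution time.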

Readers interested in learning more about the new integration can check out the Intel Extension for TensorFlow blog page for further details.

The post Intel Data Center GPU Flex Series Receives New TensorFlow Acceleration by Jason R. Wilson appeared first on Wccftech.