Contents
1. NeuroPilot Introduction
1.1 NeuroPilot Software Ecosystem
1.1.1 Neural Network Framework Support
1.1.2 NeuroPilot Software Tools
1.1.2.1 Neuron SDK
1.1.2.2 Android Run-Time Libraries
1.2 MediaTek Device Capabilities
1.2.1 Hardware Support
1.2.1.1 Device Parametric Table
1.2.2 Devices
1.2.2.1 CPU
1.2.2.2 GPU
1.2.2.3 MVPU
1.2.2.4 MDLA
2. Hardware Support Specification
2.1 Hardware Specifications
2.1.1 Dimensity 9000
2.1.2 APU
2.1.2.1 MVPU 2.0
2.1.2.2 MDLA 3.0
2.2 Supported Operations
2.2.1 TFLite Operations
2.2.1.1 Supported Data Types
2.2.1.2 Supported NeuroPilot Operations
2.2.1.3 Supported Hardware Operations
2.2.2 MDLA 3.0 Guidelines
2.2.2.1 General Restrictions
2.2.2.2 Supported OPs Specification
2.2.2.3 Limitations of Broadcasting
2.2.3 MVPU 2.0 Guidelines
2.2.3.1 General Restrictions
2.2.3.2 TensorFlow and TFLite Operations
2.3 NeuroPilot SDK
1. NeuroPilot Introduction

1.1 NeuroPilot Software Ecosystem
NeuroPilot is a collection of software tools and APIs which are at the center of MediaTek’s AI ecosystem. These tools are designed to fulfill the goal of “Edge AI”, which means that AI processing is performed locally on the device rather than remotely on a server. With NeuroPilot, users can develop and deploy AI applications on edge devices with extremely high efficiency. This makes a wide variety of AI applications run faster, while also keeping data private.
MediaTek’s hardware platforms, such as mobile System-on-Chip (SoC) and ultra-low power embedded devices, span different levels of compute density. MediaTek is deeply invested in creating an AI ecosystem with efficient yet powerful AI-processors in its devices, ranging from smartphones to smart homes, wearables, Internet-of-Things (IoT), and connected cars.
Open frameworks such as TensorFlow offer out-of-the-box usability, but typically lack optimized support for advanced hardware. NeuroPilot allows users to use all the available hardware resources of a MediaTek AI platform, beyond that offered by an open framework. NeuroPilot provides programming support for specialized device capabilities, which allows for better performance, power, memory consumption, and end-user experience.
1.1.1 Neural Network Framework Support
NeuroPilot’s software tools support common AI frameworks such as TensorFlow, PyTorch, and TensorFlow Lite (TFLite). NeuroPilot provides support for inspecting, loading, and converting models, either to MediaTek-optimized model formats, or open framework standard model formats.
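As a point of reference, the open-framework baseline for this conversion is TensorFlow's standard TFLite converter. The minimal sketch below converts a stand-in Keras model to a .tflite flatbuffer; this is the kind of step that NeuroPilot's Converter tool automates from the command line.

```python
# Minimal sketch: convert a trained Keras model to a .tflite flatbuffer
# with TensorFlow's standard converter. The model here is a trivial
# stand-in; in practice this is your trained network.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu",
                           input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```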
For Android devices, NeuroPilot provides extensions for the Android Neural Network API (NNAPI). This enables developers and device makers to bring their code closer to the hardware, for better performance and power-efficiency on MediaTek devices. NeuroPilot also allows developers to use a ‘write once, apply everywhere’ flow for existing and future MediaTek devices, including smartphones, automotive, smart home, IoT, and more. This streamlines the creation process, saving cost and time to market.
1.1.2 NeuroPilot Software Tools
| Tool | Type | Description |
|---|---|---|
| AISimulator | Web tool | Simulates a neural network workload on MediaTek's AI Processing Unit (APU). |
| Android Run-Time Libraries | Library | Libraries which provide NNAPI delegates for special-purpose hardware cores (GPU, VPU, MDLA), and support for dynamic scheduling. |
| Converter | Command line tool | Converts a pre-trained and optimized PyTorch or TensorFlow model into a TensorFlow Lite model, and performs post-training quantization. |
| Neuron SDK | Command line tools, API, Library | A TFLite model compiler which produces ready-to-run compiled binary models (.dla). |
| Quantization | Command line tool | Optimizes a model for efficient inference on MediaTek devices using quantization-aware training. |
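The post-training quantization mentioned in the Converter row follows the standard TFLite recipe. The sketch below shows full-integer quantization under assumed inputs: the saved-model path and the random representative dataset are placeholders to adapt to your own model and calibration data.

```python
# Sketch of full-integer post-training quantization with the standard
# TFLite converter. The representative dataset below yields random
# placeholders; real calibration data should come from your own inputs.
import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # assumed path
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_dataset():
    # ~100 calibration samples shaped like the model's real input.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter.representative_dataset = representative_dataset
# Force integer-only kernels, which is what INT8 accelerator targets expect.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

quantized_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(quantized_model)
```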
1.1.2.1 Neuron SDK
Neuron SDK allows users to convert their custom models to MediaTek-proprietary binaries for deployment on MediaTek platforms. The resulting models are highly efficient, with reduced latency and a smaller memory footprint. Users can also create a runtime environment, parse compiled model files, and perform inference on the edge. Neuron SDK is aimed at users who are performing bare metal C/C++ programming for AI applications, and offers an alternative to the Android Neural Networks API (NNAPI) for deploying Neural Network models on MediaTek-enabled Android devices.
Neuron SDK consists of the following components:
- Neuron Compiler (ncc-tflite): An offline neural network model compiler which produces statically compiled deep learning archive (.dla) files.
- Neuron Runtime (neuronrt): A command line tool which executes a specified .dla file and reports the results.
- Neuron Runtime API: A user-invoked API which supports loading and running compiled .dla files within a user's C++ application.
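Putting these together, a typical flow compiles a .tflite model once offline and then executes the resulting .dla. The sketch below drives both command line tools from Python; the flag spellings are assumptions for illustration and should be verified against each tool's --help output.

```python
# Hypothetical sketch: compile a TFLite model to a .dla archive with
# ncc-tflite, then execute it with neuronrt. Flag names are assumptions
# for illustration; consult the Neuron SDK documentation for the
# authoritative options.
import subprocess

MODEL = "model.tflite"
DLA = "model.dla"

# Offline compilation: TFLite flatbuffer -> statically compiled .dla.
# "--arch mdla3.0" (assumed flag) selects the MDLA 3.0 target.
subprocess.run(["ncc-tflite", "--arch", "mdla3.0", "-o", DLA, MODEL],
               check=True)

# On-device execution: feed a raw input buffer, collect the output.
# "-m hw" (assumed flag) requests real hardware rather than simulation.
subprocess.run(
    ["neuronrt", "-m", "hw", "-a", DLA, "-i", "input.bin", "-o", "output.bin"],
    check=True,
)
```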
1.1.2.2 Android Run-Time Libraries
MediaTek provides several run-time libraries for Android devices. These libraries allow for greater control and utilization of MediaTek special-purpose cores. The main library which implements most of this capability is an optimized Android Neural Network API (NNAPI) library, which is part of the Android NDK. The NNAPI library provides NNAPI hardware delegates, which enable the use of the GPU, VPU, and MDLA cores when running neural networks. This means that any NNAPI application can use MediaTek acceleration cores without any special changes to the application code. This accelerator support also covers .tflite models running in the Android TFLite run-time layer.
| Note: |
| MediaTek provides ready-to-run Android libraries for all Android-compatible devices. The developer does not need to interact with these libraries directly, and no special settings are required to use them. |
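Because acceleration happens underneath NNAPI, application code keeps the standard TFLite shape. The sketch below is a plain host-side Python run for illustration; on a MediaTek Android device, the same .tflite model is routed through the NNAPI delegates with no code changes.

```python
# Standard TFLite inference; nothing here is MediaTek-specific. On a
# MediaTek Android device the NNAPI layer routes supported ops to the
# GPU/VPU/MDLA transparently.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Random input shaped to the model's expectation, for illustration.
x = np.random.rand(*inp["shape"]).astype(inp["dtype"])
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]))
```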
1.2 MediaTek Device Capabilities
MediaTek devices deliver outstanding performance for AI applications while consuming very little power.
1.2.1 Hardware Support
NeuroPilot tools can use the following target compute devices to run neural network models.
- CPU
- GPU
- VPU (Vision Processing Unit)
- MDLA (MediaTek Deep Learning Accelerator)
Successful use of these cores depends on the following factors, which interact with a user’s model.
- Neural network framework format of the trained model.
- Hardware platform (e.g. part number and device capability).
- Required model accuracy. Models with high accuracy requirements might limit the type and significance of the optimizations that can be applied to the model. This might also limit the target devices that are able to run the model with the required performance and accuracy.
- Neural network model structure. Certain operation (OP) types are not supported on certain target devices. For details, refer to the Supported Operations section of the platform's documentation.
1.2.1.1 Device Parametric Table
| Device | Operator Flexibility | Performance | Power Consumption | Data Types |
|---|---|---|---|---|
| CPU | Very High | Low | High | FP32, FP16, INT16, INT8 |
| GPU | Medium | Medium | Medium | FP32, FP16 |
| VPU | Medium | High | Low | FP32, FP16, INT16, INT8 |
| MDLA | Low | Very High | Low | FP16, INT16, INT8 |
As a general rule, target the most power-efficient device that your neural network and development constraints allow. On MediaTek platforms, the lowest-power devices (VPU and MDLA) are also the highest-performing, as the table above shows.
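As a concrete reading of the table, the hypothetical helper below picks the most power-efficient device whose supported data types cover a model's needs. It simply encodes the table's values; it is a reading aid, not a NeuroPilot API.

```python
# Hypothetical reading aid for the parametric table above: choose the
# most power-efficient device whose supported data types cover the
# model. Not a NeuroPilot API; values are transcribed from the table.
DEVICES = [
    # (name, power rank: lower is more efficient, supported dtypes)
    # MDLA is listed before VPU so the performance edge breaks the tie.
    ("MDLA", 0, {"FP16", "INT16", "INT8"}),
    ("VPU",  0, {"FP32", "FP16", "INT16", "INT8"}),
    ("GPU",  1, {"FP32", "FP16"}),
    ("CPU",  2, {"FP32", "FP16", "INT16", "INT8"}),
]

def pick_device(model_dtypes):
    candidates = [d for d in DEVICES if set(model_dtypes) <= d[2]]
    return min(candidates, key=lambda d: d[1])[0] if candidates else None

print(pick_device({"INT8"}))          # MDLA
print(pick_device({"FP32", "FP16"}))  # VPU (MDLA lacks FP32)
```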
1.2.2 Devices
1.2.2.1 CPU
The CPU is capable of running any neural network, and is guaranteed to support all existing and future NN operations. On Android devices, support is provided by Google's TFLite subsystem. For native development, developers can use the TFLite C++ API. The CPU is the most flexible target device, but it is also the least optimized for power and performance.
1.2.2.2 GPU
The GPU provides neural network acceleration for floating point models.
- ARM-based MediaTek platforms support GPU neural network acceleration via Arm NN and the Arm Compute Library.
- Non-ARM MediaTek platforms support GPU neural network acceleration via Google's TensorFlow Lite GPU delegate. This GPU delegate is able to accelerate a wide selection of TFLite operations, as sketched below.
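For illustration, on a desktop host the GPU delegate can be loaded explicitly via TFLite's load_delegate API. The shared-library name below is an assumption that depends on how the delegate was built; Android apps get the GPU delegate through the TFLite Java/C++ APIs instead.

```python
# Sketch: run a .tflite model through the TensorFlow Lite GPU delegate.
# tf.lite.experimental.load_delegate is the standard API; the .so name
# is an assumption that depends on your build of the GPU delegate.
import tensorflow as tf

gpu_delegate = tf.lite.experimental.load_delegate(
    "libtensorflowlite_gpu_delegate.so")  # assumed library name

interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[gpu_delegate])
interpreter.allocate_tensors()
# Supported float ops now execute on the GPU; the rest fall back to CPU.
```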
1.2.2.3 MVPU
The MediaTek Vision Processing Unit (MVPU) offers general-purpose Digital Signal Processing (DSP) capabilities, with special hardware for accelerating complex imaging and computer vision algorithms. The MVPU also offers outstanding performance while running AI models.
1.2.2.4 MDLA
The MediaTek Deep Learning Accelerator (MDLA) is a powerful and efficient Convolutional Neural Network (CNN) accelerator. The MDLA is capable of achieving high AI benchmark results with high Multiply-Accumulate (MAC) utilization rates. The design integrates MAC units with dedicated function blocks, which handle activation functions, element-wise operations, and pooling layers.
The MDLA uses a technique called tile-based layer fusion to help achieve high compute efficiency and bandwidth reduction. Tile-based layer fusion identifies and then fuses dependent inter-layer operations, in order to reduce the amount of data the MDLA brings on-chip.
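The scheduling pattern can be illustrated with a toy sketch. This is not MDLA code: it just shows each tile flowing through two fused stages while held in a small local buffer, so the full intermediate tensor is never written out to memory.

```python
# Illustrative sketch (not MDLA code): tile-based layer fusion on a
# toy two-stage pipeline. Instead of materializing the full stage-1
# intermediate before running stage 2, each tile is pushed through
# both stages while "on-chip" (here: a small slice in local scope),
# which is the bandwidth-saving idea described above.
import numpy as np

def stage1(x):          # stand-in for a conv/activation layer
    return np.maximum(x, 0.0)

def stage2(x):          # stand-in for an element-wise layer
    return x * 0.5

def fused_tiled(x, tile=64):
    out = np.empty_like(x)
    for start in range(0, x.size, tile):
        local = x[start:start + tile]                    # bring one tile "on-chip"
        out[start:start + tile] = stage2(stage1(local))  # fuse both stages
    return out

x = np.random.randn(1024).astype(np.float32)
assert np.allclose(fused_tiled(x), stage2(stage1(x)))
```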
2. Hardware Support Specification
The following MediaTek platforms support NeuroPilot 5:
2.1 Hardware Specifications
2.1.1 Dimensity 9000
| Feature | D9000 |
|---|---|
| Process | TSMC 4nm (N4) |
| CPU | 1x Arm Cortex-X2 at 3.05GHz with 1MB L2, 3x Arm Cortex-A710 at 2.85GHz, 4x Arm Cortex-A510 at 1.8GHz |
| GPU | Arm Mali-G710 MC10 |
| Memory | 4x LPDDR5X at 7500Mbps |
| Camera | 4K30 3-exp Video HDR x 3CAM |
| AI | MediaTek APU 590 |
| Video Decoder | 8K 30fps |
| Video Encoder | 8K 24fps |
| Display | 2480x2200 120Hz |
| Connectivity | Wi-Fi 6E 2x2, 160MHz bandwidth |
| Modem | 5G NR 3CC 300MHz with ET 60MHz |
2.1.2 APU
The MediaTek AI Processing Unit (APU) is a high-performance hardware engine for deep learning, optimized for bandwidth and power efficiency. The APU architecture consists of big, small, and tiny cores. This highly heterogeneous design is suited for a wide variety of modern smartphone tasks, such as AI-camera, AI-assistant, and OS or in-app enhancements.
2.1.2.1 MVPU 2.0
The MediaTek Vision Processing Unit (MVPU) offers general-purpose Digital Signal Processing (DSP) capabilities, with special hardware for accelerating complex imaging and computer vision algorithms. The MVPU also offers outstanding performance while running AI models.
2.1.2.2 MDLA 3.0
The MediaTek Deep Learning Accelerator (MDLA) is a powerful and efficient Convolutional Neural Network (CNN) accelerator. The MDLA is capable of achieving high AI benchmark results with high Multiply-Accumulate (MAC) utilization rates. The design integrates MAC units with dedicated function blocks, which handle activation functions, element-wise operations, and pooling layers.
The MDLA uses a technique called tile-based layer fusion to help achieve high compute efficiency and bandwidth reduction. Tile-based layer fusion identifies and then fuses dependent inter-layer operations, in order to reduce the amount of data the MDLA brings on-chip.
