# Inference High-level APIs

This document describes the high-level inference APIs; one can use them to quickly deploy a Paddle model for an application.

The APIs are declared in `paddle_inference_api.h`, just one header file, and two libraries, `libpaddle_fluid.so` and `libpaddle_fluid_api.so`, are needed for deployment.

## PaddleTensor

We provide the `PaddleTensor` data structure to give a general tensor interface.

The definition is:

```c++
struct PaddleTensor {
  std::string name;  // variable name.
  std::vector<int> shape;
  PaddleBuf data;  // blob of data.
  PaddleDType dtype;
};
```

The data is stored in the contiguous memory of a `PaddleBuf`, and a `PaddleDType` specifies the tensor's data type. The `name` field specifies the name of an input variable, which is important when there are multiple inputs and one needs to distinguish which variable to set.

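For illustration, here is a minimal sketch of wrapping an existing float buffer in a `PaddleTensor`. It assumes `PaddleBuf` is a plain buffer descriptor with a raw `data` pointer and a byte `length`, and that `PaddleDType::FLOAT32` names the float type; check `paddle_inference_api.h` for the exact definitions.

```c++
#include <vector>
#include "paddle_inference_api.h"

// Sketch: wrap a caller-owned float vector in a PaddleTensor.
// Assumes PaddleBuf exposes a raw `data` pointer plus a byte `length`,
// and that `input` outlives the tensor (no copy is made here).
PaddleTensor MakeInputTensor(std::vector<float>* input) {
  PaddleTensor tensor;
  tensor.name = "x";  // hypothetical; must match the model's input variable.
  tensor.shape = {1, static_cast<int>(input->size())};
  tensor.data.data = input->data();
  tensor.data.length = input->size() * sizeof(float);
  tensor.dtype = PaddleDType::FLOAT32;
  return tensor;
}
```
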
## Engine

The inference APIs have two different underlying engines:

- the native engine, which consists of the native operators and framework;
- the Anakin engine, which has an Anakin library embedded.

The native engine takes a native Paddle model as input and supports any model trained by Paddle. The Anakin engine is faster for some models, but it can only take an Anakin model as input (the user needs to transform the format manually first), and currently not all Paddle models are supported.

```c++
enum class PaddleEngineKind {
  kNative = 0,  // Use the native Fluid facility.
  kAnakin,      // Use Anakin for inference.
};
```

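The engine kind is selected through the template parameters of the factory function introduced in the next section. The sketch below assumes an `AnakinConfig` struct with a `model_file` field; treat these names as assumptions and verify them against `paddle_inference_api.h`.

```c++
#include <memory>
#include "paddle_inference_api.h"

// Hypothetical sketch: target the Anakin engine via the factory's template
// parameters (CreatePaddlePredictor itself is shown in the next section).
std::unique_ptr<PaddlePredictor> MakeAnakinPredictor() {
  AnakinConfig config;                           // assumed config type.
  config.model_file = "./mobilenet.anakin.bin";  // assumed field name.
  return CreatePaddlePredictor<AnakinConfig, PaddleEngineKind::kAnakin>(config);
}
```
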
## PaddlePredictor and how to create one

The main interface is `PaddlePredictor`, which has the following methods:

- `bool Run(const std::vector<PaddleTensor>& inputs, std::vector<PaddleTensor>* output_data)`
  - takes the inputs and fills `output_data` with the inference results.
- `Clone`, which clones a predictor from an existing one, with the model parameters shared.

There is a factory method to help create a predictor; the user takes ownership of the returned object.

```c++
template <typename ConfigT, PaddleEngineKind engine = PaddleEngineKind::kNative>
std::unique_ptr<PaddlePredictor> CreatePaddlePredictor(const ConfigT& config);
```

By specifying the engine kind and config, one can get a specific implementation.

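Putting the pieces together, the following end-to-end sketch builds a native-engine predictor and runs it on one input. The `NativeConfig` fields (`model_dir`, `use_gpu`) and the `PaddleBuf` layout are assumptions; verify them against `paddle_inference_api.h`.

```c++
#include <vector>
#include "paddle_inference_api.h"

int main() {
  // Configure the native engine. NativeConfig and its fields are
  // assumptions; check paddle_inference_api.h for the exact names.
  NativeConfig config;
  config.model_dir = "./mobilenet";  // hypothetical model directory.
  config.use_gpu = false;

  // The engine kind defaults to PaddleEngineKind::kNative.
  auto predictor = CreatePaddlePredictor<NativeConfig>(config);

  // Prepare one float input; the vector owns the underlying buffer.
  std::vector<float> input(1 * 3 * 224 * 224, 0.f);
  PaddleTensor tensor;
  tensor.name = "image";  // hypothetical input variable name.
  tensor.shape = {1, 3, 224, 224};
  tensor.data.data = input.data();
  tensor.data.length = input.size() * sizeof(float);
  tensor.dtype = PaddleDType::FLOAT32;

  std::vector<PaddleTensor> outputs;
  if (!predictor->Run({tensor}, &outputs)) return 1;
  // `outputs` now holds one PaddleTensor per fetched variable.

  // A second predictor sharing the model parameters:
  auto cloned = predictor->Clone();
  return 0;
}
```
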
## Reference

- [paddle_inference_api.h](./paddle_inference_api.h)
- [some demos](./demo_ci)