Serving Trained Model

You can use FloydHub to deploy your trained models as REST APIs with a single command: floyd run --mode serve.


Note that this feature is in preview mode and is not production ready yet. Contact us at support@floydhub.com for production ready serving support (in closed beta).

The floyd run command has a serve mode. Executing floyd run --mode serve from your terminal starts a Flask web server on FloydHub's servers and returns a REST endpoint that you can query.

In order to serve your model, you need to provide an app.py file (Flask application) which handles the incoming request and executes all the steps necessary to evaluate the model. Below is the scaffold for the app.py file. You can also view a complete example here:

from flask import Flask

Import all the dependencies you need to load the model, 
preprocess your request and postprocess your result
app = Flask(__name__)
MODEL_PATH = ... # TODO: Insert path to the model to load

def load_model(MODEL_PATH):
    """Load the model"""

def data_preprocessing(data):
    """Preprocess the request data to transform it to a format that the model understands"""

# Every incoming POST request will run the `evaluate` method
# The request method is POST (this method enables your to send arbitrary data to the endpoint in the request body, including images, JSON, encoded-data, etc.)
@app.route('/<path:path>', methods=["POST"])
def evaluate(path):
    """"Preprocessing the data and evaluate the model""""
    # TODO: Sanity check

    if flask.request.method == "POST":


# Load the model and run the server
if __name__ == "__main__":
    print(("* Loading model and starting Flask server..."
        "please wait until server has fully started"))


Before serving your model, you need to create a floyd_requirements.txt and add flask as a requirement in it.

Example: Deploying a Neural Style Transfer Model

We will use a pre-trained Neural Style Transfer model to demonstrate how model serving works on FloydHub. In the end you will be able to send any image to this API as a HTTPs request and it will return the style transfered image.


For this guide we will be using Fast Style Transfer project.

$ git clone https://github.com/floydhub/fast-style-transfer
$ cd fast-style-transfer
$ floyd init fast-style-transfer
Project "fast-style-transfer" initialized in the current directory

Serve Mode

As mentioned before, serving requires an app.py file. You can see the app.py file in the sample repository. This file handles the incoming request (take your image), executes the code in evaluate.py (run the model evaluation) and returns the style-transferred output image.

$ floyd run --env tensorflow-0.12:py2 --data narenst/datasets/neural-style-transfer-pre-trained-models/1:input --mode serve
Syncing code ...


URL to job: https://www.floydhub.com/narenst/projects/fast-style-transfer/5
URL to service endpoint: https://www.floydlabs.com/expose/mkxjJa46aJBdwP4AEdKxfU

Sending requests to the REST API

Now you can send any image file as request to this API and it will return the style transferred image.

curl -o taipei_output.jpg -F "[email protected]/images/taipei101.jpg" https://www.floydlabs.com/expose/mkxjJa46aJBdwP4AEdKxfU


You will see the default style (la_muse) is applied to the input image.

Serving FAQS

Here a list of Frequetly Asked Questions about FloydHub serving.

Where can I find others serving examples?

You can also check the projects of the PyTorch collection.

Do I have to pay for the entire duration that my serving endpoint is active?

When serving models, you will be charged for the full duration that your serving endpoint is active. So, remember to stop your job when you are no longer using the endpoint.

Serving is currently a beta feature. In the near future we will be able to charge users per API call.

What is the maximum uptime for serving?

In preview mode the API has a maximum uptime of 7 days. The 7 day timeout is common across all jobs you run on FloydHub, whether it be Jupyter notebooks, command jobs or serve jobs.

Is there any API rate limit?

There is no limit, but the Flask web server can handle only 1 request at a time. This mean that it will block all other requests during that time.