Deploying to Azure

Exploring the research into which Azure services best suit the pipeline.

Requirements of the Pipeline

One of the main goals of the project was for it to be cloud based. The research that went into this is documented here - mainly how the services in Azure can be used to implement the architecture in a cloud-first way.

Why Azure?

Before heading into the nitty-gritty, it is best to elaborate on why Azure was chosen over competitors such as Google Cloud and AWS.

As I see it, there are three main reasons:

  1. Familiarity: both I and Fontys in general are most familiar with Azure. Because of this, there are experts on site who could help me if necessary.

  2. Ease-of-use: Azure includes a lot of services that are easy to set up and work out-of-the-box. This is true of some other platforms too, but it contrasts with the approach AWS takes in my experience, which is far more low-level: it offers a lot of options but requires more knowledge to set up.

  3. Student offers: this is a minor point, but the student offers available for Azure made it quick and easy to start prototyping.

Implementing the Pipeline

Each section here will explore a facet of the Azure implementation of the validation pipeline and describe the different approaches that were considered.

Hosting Containers

The first and most straightforward aspect of deployment is the ability to host containers. This is how the core services are deployed. There are two options for this: the basic Container Instances service or the more advanced App Service. Billing is one of the main differences between the two, but the differences relevant to this project concern advanced features, such as autoscaling and authentication, that container instances do not include.

However, the goal is to get the pipeline running as quickly and easily as possible. Because of this, all services are hosted using container instances. This makes it possible to quickly deploy images by linking the Docker Hub repository and to generate a public URL.
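
In practice this was done through the portal, but the same deployment can also be sketched programmatically. Below is a rough, hypothetical sketch using the azure-mgmt-containerinstance SDK (a recent, track-2 version of the SDK is assumed, and the subscription, resource group, image and region names are placeholders, not the actual configuration):

```python
# Hypothetical sketch: deploying one of the pipeline's Docker Hub images as a
# container instance with a public IP. All names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.containerinstance import ContainerInstanceManagementClient
from azure.mgmt.containerinstance.models import (
    Container, ContainerGroup, ContainerPort, IpAddress, Port,
    ResourceRequests, ResourceRequirements,
)

client = ContainerInstanceManagementClient(DefaultAzureCredential(), "<subscription-id>")

container = Container(
    name="validation-service",
    image="docker.io/<user>/validation-service:latest",  # public Docker Hub image
    resources=ResourceRequirements(requests=ResourceRequests(cpu=1.0, memory_in_gb=1.5)),
    ports=[ContainerPort(port=80)],
)

group = ContainerGroup(
    location="westeurope",
    os_type="Linux",
    containers=[container],
    ip_address=IpAddress(ports=[Port(port=80)], type="Public"),
)

# Long-running operation; result() blocks until the instance is provisioned.
poller = client.container_groups.begin_create_or_update(
    "<resource-group>", "validation-service", group
)
print(poller.result().ip_address.ip)
```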

The Issue of OpenFace

The OpenFace service presents an interesting problem: how best to run terminal commands remotely in Azure. As this was a rather niche issue, multiple options were explored to resolve it. These have been mentioned previously in the related articles.
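
Without committing to any one of those options here, the general shape most of them share is a thin web endpoint inside the OpenFace container that runs the command line on request. A hypothetical Django sketch of that idea (the endpoint, paths and exact OpenFace flags are assumptions for illustration, not the actual implementation):

```python
# views.py: hypothetical endpoint that runs OpenFace's FeatureExtraction
# binary inside the container and returns its output.
import subprocess
from pathlib import Path

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST


@csrf_exempt
@require_POST
def analyse(request):
    # Write the uploaded image to a temporary location first.
    image = request.FILES["image"]
    input_path = Path("/tmp") / image.name
    input_path.write_bytes(image.read())

    # Run the OpenFace CLI exactly as one would in a terminal
    # (binary name and flags assumed to be on the container's PATH).
    result = subprocess.run(
        ["FeatureExtraction", "-f", str(input_path), "-out_dir", "/tmp/openface"],
        capture_output=True,
        text=True,
    )

    if result.returncode != 0:
        return JsonResponse({"error": result.stderr}, status=500)
    return JsonResponse({"stdout": result.stdout})
```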

Tracking Job Status

Core to the pipeline is the ability to track what is happening during a job (as described in a previous blog post). This leaves two options open: creating a custom service or using Azure services to achieve this goal.

Creating a custom service that receives live updates of what each service is doing would require an extra service and storage location, as well as a real-time connection between every pipeline service and this logging service. It would be far more efficient to leverage an existing Azure service.

Using Logs

Azure does indeed have logging services: Azure Monitor, which contains the Application Insights platform. This came recommended by Mark Klerkx. Not only can it collect logs, but it also offers the opportunity to query, export or access them remotely using a REST API. The functionality present here is sufficient for the current proof-of-concept, while leaving the option open for a future dedicated service built on top of these logs. More information on the implementation of this service can be seen here.

(Figure: overview of the query page.)
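
As a concrete illustration, pushing a job-status trace from a pipeline service could look roughly like this with the applicationinsights Python package (the package choice, instrumentation key and property names are placeholders, not the confirmed implementation):

```python
# Hypothetical sketch: sending job-status traces to Application Insights
# from one of the pipeline services.
from applicationinsights import TelemetryClient

tc = TelemetryClient("<instrumentation-key>")

# Each service reports what it is doing for a given job; these traces can
# later be inspected on the query page or fetched via the REST API.
tc.track_trace("OpenFace analysis started", {"job_id": "1234", "service": "openface"})
tc.track_trace("OpenFace analysis finished", {"job_id": "1234", "service": "openface"})

# Telemetry is batched, so flush before the process exits.
tc.flush()
```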

Data Storage

There are two types of data storage needed for the pipeline: one to host images and raw OpenFace analysis data, and a database that stores reports and parameter sets.

BLOB vs File Share

Azure offers two possible services suited to media and file storage: BLOB Storage and File Share. BLOB Storage is a simple bucket where any file can be stored in an unstructured manner. File Share, on the other hand, is an online file system (similar to Finder on Mac or File Explorer on Windows, but in the cloud). This added structure makes it easy to mount onto a VM, for example (an option I explored before).

File Share conflicted with one other service I wished to use, however: function apps. Based on my research, it is not possible as of yet to trigger function apps based on changes in a File Share. As my services are tied together using function apps, this forced me to use BLOB storage.
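
For illustration, uploading a result file to the BLOB with the azure-storage-blob package could look roughly like this (the connection string, container and blob names are placeholders):

```python
# Hypothetical sketch: uploading a file to the pipeline's BLOB storage.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("openface-output")

# Uploading a finished analysis file; this is the kind of event the
# function app described below reacts to.
with open("analysis.csv", "rb") as data:
    container.upload_blob(name="jobs/1234/analysis.csv", data=data, overwrite=True)
```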

Database

There are two main types of databases: relational and non-relational. As I use Django, it is best combined with a relational database. The two industry standards in this regard are MySQL and PostgreSQL, the latter of which has a better reputation among my colleagues. It is the more modern and extensible of the two, and offers functionality MySQL does not (such as table inheritance). Azure has services for both, and in all fairness it does not matter much which of the two is chosen. I therefore opted for the PostgreSQL service.
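
Wiring Django up to the Azure Database for PostgreSQL service then comes down to a standard database configuration. A sketch of the relevant settings (host, credentials and database name are placeholders; Azure enforces SSL, hence the sslmode option):

```python
# settings.py excerpt: pointing Django at an Azure Database for PostgreSQL
# instance. All values are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "pipeline",
        "USER": "pipelineadmin",
        "PASSWORD": "<password>",
        "HOST": "<server-name>.postgres.database.azure.com",
        "PORT": "5432",
        "OPTIONS": {"sslmode": "require"},
    }
}
```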

Function Apps

Instead of constantly monitoring the BLOB with a service, it would be ideal to have the next service in the chain trigger automatically when something is uploaded. The most straightforward and easy option for this is the Function App service, similar to Google's Cloud Functions. This is a function hosted in the cloud that is triggered automatically based on parameters the developer defines. These triggers could be a timer, a REST call or, as is the case here, data being uploaded to the BLOB. More on the implementation here.
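
A minimal sketch of what such a blob-triggered function could look like with the Python worker (the container path, binding name and downstream service URL are placeholders; the function.json shown in the comment is the assumed binding configuration, not the actual one):

```python
# __init__.py of a hypothetical blob-triggered function. The binding comes
# from the accompanying function.json, e.g.:
#   { "bindings": [ { "name": "uploaded_blob", "type": "blobTrigger",
#       "direction": "in", "path": "openface-output/{name}",
#       "connection": "AzureWebJobsStorage" } ] }
import logging

import azure.functions as func
import requests


def main(uploaded_blob: func.InputStream):
    logging.info("New blob uploaded: %s (%s bytes)", uploaded_blob.name, uploaded_blob.length)

    # Notify the next service in the chain that new data is available.
    requests.post(
        "https://<next-service>/api/process",
        json={"blob_name": uploaded_blob.name},
    )
```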
