Data Storage
An overview of the BLOB storage and PostgreSQL database.
Overview
Data is stored in two ways: a BLOB storage for media and files, and a database for parameters and reports.
BLOB Storage
BLOB storage holds images, videos and other files. These items are stored in containers. Media is the first item stored for a job, so the front-end creates the container when media is submitted.
Uploading
import os

from azure.storage.blob import BlobServiceClient

def save_to_blob(self, file_name, uploaded_file_url):
    # Connect to the storage account
    connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')
    blob_service_client = BlobServiceClient.from_connection_string(connect_str)
    # Generate a name and create the container (container names must be lowercase)
    container_name = self.generate_name(10).lower()
    blob_service_client.create_container(container_name)
    # Upload the media file into the new container
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=file_name)
    with open(uploaded_file_url, "rb") as data:
        blob_client.upload_blob(data)
The first step is to connect to the Azure storage account using the connection string provided in Azure. Here it is read from the AZURE_STORAGE_CONNECTION_STRING environment variable.
To create a container, a name has to be provided for it. Each job (an instance of media upload) in the pipeline has an associated UUID, produced in this case by the generate_name() method. The result is lowercased because Azure container names may not contain uppercase letters. The same name can be used throughout the pipeline to tell from the logs which container is meant.
Once a container has been created, the uploaded media can be stored inside it. Completion of the upload triggers the Azure Function, which starts the next service in the pipeline.
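The implementation of generate_name() is not shown on this page. As a minimal sketch, assuming it returns a random string of letters and digits, the naming step could look like the following. Both generate_name() as written here and the is_valid_container_name() helper are hypothetical stand-ins; the helper only encodes Azure's documented naming rules for containers.

```python
import re
import secrets
import string

# Azure container names: 3-63 characters, lowercase letters, digits and
# hyphens, starting and ending with a letter or digit, no consecutive hyphens.
_CONTAINER_NAME = re.compile(r"^[a-z0-9](?!.*--)[a-z0-9-]{1,61}[a-z0-9]$")


def generate_name(length: int = 10) -> str:
    """Hypothetical stand-in for self.generate_name(): random letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def is_valid_container_name(name: str) -> bool:
    """Check a candidate name against the container naming rules above."""
    return _CONTAINER_NAME.fullmatch(name) is not None


# Lowercasing mirrors the .lower() call in save_to_blob().
container_name = generate_name(10).lower()
print(container_name, is_valid_container_name(container_name))
```

Validating the name before calling create_container() is optional, but it turns a rejected request into an error that is easier to trace in the logs.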
Downloading
def get_blob(self, name):
    # Connect to storage
    connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')
    blob_service_client = BlobServiceClient.from_connection_string(connect_str)
    # Determine which file is an image
    container_client = blob_service_client.get_container_client(name)
    media_name = self.media_name_from_blob(container_client)
    # Get and download blob
    blob_client = blob_service_client.get_blob_client(container=name, blob=media_name)
    with open(media_name, "wb") as download_file:
        download_file.write(blob_client.download_blob().readall())
    return media_name
This function (logging omitted for brevity) downloads the image from the container and stores it locally. It then returns the name of the image for the analysis.
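The media_name_from_blob() helper is not shown here. One plausible approach, sketched below under the assumption that media files are recognised by their file extension, is to scan the blob names in the container and return the first match. The function name pick_media_name() and the extension list are illustrative and not taken from the pipeline.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional

# Assumed set of media extensions; the real pipeline may accept others.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".mp4", ".avi"}


def pick_media_name(blob_names: Iterable[str]) -> Optional[str]:
    """Return the first blob name with a media extension, or None if there is none."""
    for blob_name in blob_names:
        if PurePosixPath(blob_name).suffix.lower() in MEDIA_EXTENSIONS:
            return blob_name
    return None


# With a real ContainerClient, the names could come from list_blobs():
#   pick_media_name(blob.name for blob in container_client.list_blobs())
print(pick_media_name(["report.csv", "face.JPG", "clip.mp4"]))  # face.JPG
```

Keeping the selection logic separate from the Azure client makes it possible to test it without a storage account.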
Database
The database is a simple PostgreSQL database used to store parameter sets and the data from the analysis. An overview of the structure can be seen here.
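For services that reach this database through Django, the connection is typically declared in each service's settings module. The fragment below is a sketch only: the environment variable names and default values are assumptions, and the actual settings used by the project are not shown on this page.

```python
import os

# Hypothetical environment variable names; every service that shares the
# database would read the same values so that they all point at one instance.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.getenv("DB_NAME", "reports"),
        "USER": os.getenv("DB_USER", "postgres"),
        "PASSWORD": os.getenv("DB_PASSWORD", ""),
        "HOST": os.getenv("DB_HOST", "localhost"),
        "PORT": os.getenv("DB_PORT", "5432"),
    }
}
```

Reading the credentials from the environment follows the same pattern as the storage connection string and keeps secrets out of the repository.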
Items are stored in the database using the Django ORM. It works by creating models that represent the tables in the database. These tables are automatically generated based on the models. Multiple services can access the same database (in this case the front-end and the reporting module) as long as they use the same set of models. An example of a model can be seen here:
import datetime

from django.db import models
from django.utils.translation import gettext_lazy as _

class ActionUnitReport(models.Model):
    class ReportType(models.TextChoices):
        IMAGE_POSED = 'IP', _('Image - Posed')
        IMAGE_WILD = 'IW', _('Image - Wild')
        VIDEO_SINGLE = 'VS', _('Video - Single')
        VIDEO_MULTIPLE = 'VM', _('Video - Multiple')

    name = models.CharField(max_length=100, default='default', null=False)
    url = models.CharField(max_length=300, null=False)
    date = models.DateField('date added', default=datetime.date.today, null=False)
    avatar = models.CharField(max_length=50, null=False)
    type = models.CharField(
        max_length=2,
        choices=ReportType.choices,
        default=ReportType.IMAGE_POSED,
        null=False
    )

    def __str__(self):
        return "%s %s" % (self.url, self.date)

    class Meta:
        db_table = "action_unit_report"
This model represents the collection of OpenFace data. The only noteworthy field is type, which acts as an enum whose allowed values are defined by the ReportType class. Defining the Meta class also makes it possible to set the name of the database table.
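To make the enum behaviour of the type field concrete, the sketch below reproduces the idea with the standard library only. It is an analogue, not Django's implementation: TextChoices builds on Python's enum module, and each member pairs a short code stored in the database with a human-readable label. The class here is defined outside any model purely for illustration.

```python
from enum import Enum


class ReportType(str, Enum):
    """Stdlib analogue of a TextChoices class: a stored code plus a label."""

    def __new__(cls, code: str, label: str):
        member = str.__new__(cls, code)
        member._value_ = code   # the two-letter code kept in the column
        member.label = label    # the text shown to users
        return member

    IMAGE_POSED = ("IP", "Image - Posed")
    IMAGE_WILD = ("IW", "Image - Wild")
    VIDEO_SINGLE = ("VS", "Video - Single")
    VIDEO_MULTIPLE = ("VM", "Video - Multiple")


# Roughly what the choices attribute provides: (code, label) pairs.
choices = [(member.value, member.label) for member in ReportType]
print(choices[0])

# A stored code can be turned back into a member.
print(ReportType("VS").label)
```

Because only the two-letter code is stored, max_length=2 on the field is sufficient, and labels can change without touching existing rows.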