Challenge

Each year, over 22 million bags (0.56%) are mishandled in the aviation industry for various reasons, including transfer mishandling, ticketing errors and failures during loading. While this number has decreased by over 50% since 2007, it still amounted to an estimated cost of US$2.3 billion in 2017. Part of the process of handling mishandled luggage is to register the items in a database so that customers are able to inquire about their status. To do so, each item is assigned a code based on the IATA luggage chart. This code contains information about the baggage type, its colour and material, and whether it has zippers, locks, wheels, etc. These properties are currently identified manually by airport personnel, so this data study was conducted to assess whether the process could be (partly) automated.

After investigating the available data, we identified four luggage properties to focus on during the data study:

  •    What is the bag type? (hard, soft, rucksack, sports bag, etc.)
  •    How many wheels does the bag have?
  •    What are the dominant colours?
  •    Are patterns printed onto the bag?

The data consisted of 20,000 unprocessed images without any annotations in terms of type, colour, etc.

Solution

The main techniques that were used to identify the individual features were:

  •    Object detection with convolutional neural networks
  •    Clustering
  •    Image filtering

By applying a human-in-the-loop approach in combination with transfer learning, starting from a RetinaNet pre-trained on the COCO dataset, we were able to quickly assemble a large dataset of annotated images as well as a model that classified the correct bag type with an accuracy of about 90%.

The accuracy for detecting wheels was about 95%.

By using object detection instead of image classification, we could detect not only the type of the bag but also its location in the image. This location could then be used when identifying the bag’s colour and detecting whether a pattern is printed on the bag.
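As a rough illustration of how a detected location can feed the later steps, the sketch below crops an image to a bounding box so that only bag pixels are analysed. The function name `crop_to_box`, the `(x_min, y_min, x_max, y_max)` box layout and the nested-list image representation are assumptions made for this example, not details from the study.

```python
def crop_to_box(image, box):
    """Return the pixels inside box = (x_min, y_min, x_max, y_max).

    `image` is a list of rows, each row a list of pixels.
    The box is clamped to the image bounds; x_max and y_max are exclusive.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    x_min, y_min, x_max, y_max = box
    x_min, x_max = max(0, int(x_min)), min(width, int(x_max))
    y_min, y_max = max(0, int(y_min)), min(height, int(y_max))
    return [row[x_min:x_max] for row in image[y_min:y_max]]


# Hypothetical 6x4 image whose pixels encode their own (x, y) position.
image = [[(x, y, 0) for x in range(6)] for y in range(4)]
bag_pixels = crop_to_box(image, (1, 1, 4, 3))
```

Restricting the colour and pattern analysis to the cropped region keeps background pixels from influencing the result.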

The colour identification was performed using the DBSCAN clustering algorithm. It proved to be a good match for the problem since it does not require the number of clusters as a parameter and it supports uneven cluster sizes and non-flat geometries. The resulting clusters were matched to corresponding colours through a rule-based approach. To detect patterns, the images were convolved with a Gabor filter. This filter highlights texture-rich areas, while uniform areas come out black.
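The colour clustering step described above can be sketched in a few lines. The following is a minimal, pure-Python DBSCAN over RGB pixels, followed by a simple naming rule. The `eps` and `min_samples` values, the `REFERENCE_COLOURS` table and the nearest-reference rule are all assumptions chosen for illustration; the study's actual parameters and rules are not given here. In practice a library implementation such as scikit-learn's `DBSCAN` would normally be used instead of a hand-written one.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Hypothetical reference table; a real rule set would be far richer.
REFERENCE_COLOURS = {
    "black": (0, 0, 0), "white": (255, 255, 255),
    "red": (255, 0, 0), "blue": (0, 0, 255),
}


def name_colour(rgb):
    """Assumed rule: pick the reference colour closest in RGB space."""
    return min(REFERENCE_COLOURS, key=lambda n: math.dist(rgb, REFERENCE_COLOURS[n]))


def dominant_colours(pixels, eps=30.0, min_samples=3):
    """Cluster pixels, then name clusters from largest to smallest."""
    clusters = {}
    for pixel, label in zip(pixels, dbscan(pixels, eps, min_samples)):
        if label != -1:
            clusters.setdefault(label, []).append(pixel)
    names = []
    for members in sorted(clusters.values(), key=len, reverse=True):
        mean = tuple(sum(channel) / len(members) for channel in zip(*members))
        if name_colour(mean) not in names:
            names.append(name_colour(mean))
    return names


pixels = [(250, 5, 5), (245, 10, 0), (255, 0, 10), (240, 8, 8), (250, 0, 0),
          (248, 4, 6), (5, 5, 250), (0, 10, 245), (10, 0, 255), (4, 6, 248),
          (0, 255, 0)]  # the lone green pixel ends up as noise
colours = dominant_colours(pixels)
```

Note that the number of colours is never passed in: the two clusters emerge from the density of the pixel data alone, which is the property that made DBSCAN attractive for this task.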

The filter output was then summed and divided by the area to calculate a “texture density”. After calculating the texture densities for many pattern and non-pattern images, a threshold was established as the intersection of two kernel density estimates, one for each group.
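The pattern-detection pipeline described above (Gabor filtering, texture density, threshold from two kernel density estimates) might look roughly like the sketch below. It is written in pure Python for self-containment. The kernel parameters, the mean subtraction (so that uniform areas give a zero response), the use of absolute responses, the Gaussian kernel with a fixed bandwidth, and the sample densities are all assumptions for illustration, not values from the study.

```python
import math


def gabor_kernel(size=7, sigma=2.0, theta=0.0, wavelength=4.0, gamma=0.5):
    """Real part of a Gabor kernel, shifted to zero mean."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    mean = sum(map(sum, kernel)) / size ** 2
    return [[v - mean for v in row] for row in kernel]


def texture_density(gray, kernel):
    """Sum of absolute filter responses divided by the number of positions."""
    k = len(kernel)
    rows, cols = len(gray) - k + 1, len(gray[0]) - k + 1
    total = 0.0
    for top in range(rows):
        for left in range(cols):
            response = sum(kernel[i][j] * gray[top + i][left + j]
                           for i in range(k) for j in range(k))
            total += abs(response)
    return total / (rows * cols)


def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    return lambda x: sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                         for s in samples) / norm


def kde_threshold(plain, patterned, bandwidth, steps=1000):
    """First point between the group means where the pattern KDE overtakes.

    Assumes plain images have the lower mean texture density.
    """
    kde_plain = gaussian_kde(plain, bandwidth)
    kde_pattern = gaussian_kde(patterned, bandwidth)
    low, high = sum(plain) / len(plain), sum(patterned) / len(patterned)
    for n in range(steps + 1):
        x = low + (high - low) * n / steps
        if kde_pattern(x) >= kde_plain(x):
            return x
    return high


kernel = gabor_kernel()
plain_image = [[0.5] * 12 for _ in range(12)]                      # uniform grey
striped_image = [[1.0 if (x // 2) % 2 == 0 else 0.0 for x in range(12)]
                 for _ in range(12)]                                # vertical stripes
plain_density = texture_density(plain_image, kernel)
striped_density = texture_density(striped_image, kernel)

# Hypothetical densities measured on labelled example images.
threshold = kde_threshold([0.1, 0.2, 0.15, 0.3], [2.0, 2.5, 3.0, 2.2], bandwidth=0.3)
```

A new image would then be classified as patterned when its texture density exceeds `threshold`. A single orientation `theta` only responds to texture in one direction, so a fuller version would likely combine several orientations.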

Outcome

The outcome of the project was a detailed report containing descriptions of the individual steps of the data study, the problems we encountered, visualisations of performance metrics, and suggestions for further improvements.

The data study was completed in about 5 weeks.