With the advent of social media and the democratisation of technology, companies offering services to consumers now deal with millions of enquiries and complaints every year. We were approached by a consumer-facing organisation, a housing association, that wanted help with processing, triaging and understanding customer enquiries raised via its customer relationship management platform. The enquiries were all gathered in the same interface and had to be processed manually, one by one, by an employee, resulting in long processing and support times for tenants.


We initially suggested grouping the enquiries into categories based on their subject, which would provide additional information and enable the housing association to route each enquiry straight to the right person. However, the enquiries were not labelled, and manually reviewing and labelling them was not feasible within the project timeframe.

We therefore decided to extract information and structure the data using topic modelling. Topic modelling is an unsupervised machine learning technique; that is, it can automatically extract information from text data without human annotation. It can be used to:

  • see which topics are trending on social media
  • get a better understanding of users’ comments on a product
  • understand customer complaints
  • organise your documents by relevant topics for easier search in the future

Various techniques exist to extract topics from text; among the most popular are Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorisation (NMF). We used LDA, which models each document as a mixture of topics, together with pyLDAvis, an interactive visualisation tool. We extracted 4 relevant topics, such as ‘need for repair in the house’ and ‘outdoor concerns’. Note that making sense of the topics is quite a manual task that requires you to explore the words and enquiries associated with each topic.
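
To make the idea of ‘each document as a mixture of topics’ concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling, using only the Python standard library. It is not the code used on the project: the enquiries are invented, and the function names (`fit_lda`, `top_words`), the hyperparameters (`alpha`, `beta`, `n_iter`) and the choice of 2 topics are assumptions for illustration. In practice you would more likely rely on a library implementation, such as those in scikit-learn or gensim.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs is a list of token lists. Returns (doc_topic, topic_word, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic_counts = [[0] * n_topics for _ in docs]
    topic_word_counts = [[0] * n_words for _ in range(n_topics)]
    topic_totals = [0] * n_topics

    # Start by assigning a random topic to every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic_counts[d][z] += 1
            topic_word_counts[z][word_id[w]] += 1
            topic_totals[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic_counts[d][z] -= 1
                topic_word_counts[z][wid] -= 1
                topic_totals[z] -= 1
                # Unnormalised probability of each topic for this token.
                weights = [
                    (doc_topic_counts[d][k] + alpha)
                    * (topic_word_counts[k][wid] + beta)
                    / (topic_totals[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic_counts[d][z] += 1
                topic_word_counts[z][wid] += 1
                topic_totals[z] += 1

    # Smoothed estimates: topic mixture per document, word distribution per topic.
    doc_topic = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    topic_word = [
        [(c + beta) / (topic_totals[k] + n_words * beta) for c in row]
        for k, row in enumerate(topic_word_counts)
    ]
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n highest-probability words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Invented example enquiries, tokenised by whitespace (no real data).
enquiries = [
    "boiler broken heating repair needed",
    "leaking tap kitchen repair",
    "garden fence damaged outdoor",
    "outdoor lighting car park broken",
]
docs = [e.split() for e in enquiries]
doc_topic, topic_word, vocab = fit_lda(docs, n_topics=2)

for words in top_words(topic_word, vocab):
    print(words)
for enquiry, mixture in zip(enquiries, doc_topic):
    print(enquiry, [round(p, 2) for p in mixture])
```

Each row of `doc_topic` is one enquiry's topic mixture, and `top_words` lists the most probable words per topic. Those word lists are the starting point for the manual interpretation step: the model only returns numbered topics, and a person still has to name them.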

Our solution was used to suggest a categorisation scheme for future enquiries in order to redirect them to appropriate customer support agents within the business and reduce delays between enquiries being opened and a reply being sent.
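
As an illustration of how such a scheme could drive redirection, the sketch below sends an enquiry to the team mapped to its dominant topic and falls back to manual triage when no topic clearly dominates. The function name `route_enquiry`, the team names, the topic-to-team mapping and the 0.5 threshold are all hypothetical; they are not taken from the project.

```python
def route_enquiry(topic_mixture, topic_to_team, min_weight=0.5, fallback="manual triage"):
    """Return the team for the dominant topic, or the fallback if no topic dominates."""
    dominant = max(range(len(topic_mixture)), key=lambda k: topic_mixture[k])
    if topic_mixture[dominant] < min_weight:
        return fallback
    return topic_to_team.get(dominant, fallback)


# Hypothetical mapping from topic index to support team.
topic_to_team = {0: "repairs team", 1: "grounds team"}

print(route_enquiry([0.8, 0.2], topic_to_team))
print(route_enquiry([0.4, 0.3, 0.3], topic_to_team))
```

Keeping a fallback means that ambiguous enquiries still reach a person instead of being misrouted.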

This project was carried out over 4 weeks.