Filament Google Hackathon

On Saturday June 30th, a team of Filament software engineers and geeks spent their day off in the office aiming to build one or more Google Home Actions, with the chance to win some Google Home Minis. This is what happened…

Why would you work on your day off?

Many Filament staff (or Filamentalists, as we like to call ourselves) spend a fair amount of time outside of work hacking, coding and building. I guess being inquisitive and curious comes with the territory when you’re a software engineer. The company paid for food and beers all day, and it was a chance to hang out and socialise with the team outside of work hours and let our hair down a little. If you’re going to spend the weekend coding and building cool things anyway, then coming into the office and having your food and drink provided makes this a win-win! The chance to win a Google Home Mini probably had something to do with it too.

We also had a huge amount of support from family and friends providing moral support and sugary snacks like these Filament logo caramel shortbreads.

These cakes were as delicious as they were pretty. Thanks to Jo for bringing them in!

All this said, there’s absolutely no pressure on people to turn up. Many of our staff have families and young children and it’s not always practical to come in on a day off. Equally, “It’s the weekend, I don’t want to go to work” is a perfectly valid reason for not attending too.

Finding a problem to tackle

We spent a fair amount of time trying to figure out what problem we wanted to tackle. We actually set up a spreadsheet of ideas during the week and encouraged our colleagues to vote on ideas. The only rules were:

* It should be Google Home/Google Assistant related
* It should be fun but also useful in some way
* We need to be able to prove the concept in 1 day

We spent a fair amount of time iterating around ideas and eventually settled on something that everyone agreed fit the criteria and was interesting for all attendees.

What we settled on…

The problem we focused on was an “I spy”-style game that also facilitates image annotation for machine learning.

The idea is simple: show a picture of something to a human and ask them to identify the subject of the image. We make this task challenging and interesting by showing the user just a segment of the image.

We show the user a segment of each image in the Google Home app and they have to identify the subject, e.g. “cow”. Image “mother cow” by barockschloss under the terms of the Creative Commons CC BY 2.0 License

If a segment of an image has nothing interesting about it (e.g. it’s just a picture of some grass) then we expect users to never guess it correctly and over time we’re able to build up a model of which segments do and don’t contain interesting information (perfect for building an object detection machine learning model).
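A minimal sketch of how that model could be built up (all names here are illustrative, not from the actual hack code): tally guess outcomes per segment, and treat segments that nobody ever identifies correctly as likely background.

```javascript
// Tally guess outcomes per image segment and estimate how "informative"
// each segment is: segments nobody ever guesses correctly are probably
// background (grass, sky) rather than the subject of the photo.

function recordGuess(stats, segmentId, wasCorrect) {
  const s = stats[segmentId] || { correct: 0, total: 0 };
  s.total += 1;
  if (wasCorrect) s.correct += 1;
  stats[segmentId] = s;
  return stats;
}

// A segment's informativeness: fraction of correct guesses,
// only trusted once we have enough samples.
function informativeness(stats, segmentId, minSamples = 5) {
  const s = stats[segmentId];
  if (!s || s.total < minSamples) return null; // not enough data yet
  return s.correct / s.total;
}
```

Over many rounds, segments with near-zero scores could then be labelled “no subject present”, which is exactly the kind of negative example an object detection model needs.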

There are lots of other cool things we could start to learn from this exercise, like answering the question “if there are no cows in that segment, then what *is* in it?”. Unfortunately, we didn’t have time on the day.

Integrating with Google Home (and how Jovo helps)

The key to building a Google Action is understanding the way that the infrastructure works. Google Home devices communicate directly with the Google Actions service, which can either route the user’s request directly to your app or (the easiest option for a hack) pass it through Google’s Dialogflow, which handles your app’s NLU and only posts requests to your app when they need fulfilling.

When a user talks to a Google Home device, their request goes to the Google Actions server, gets routed to Dialogflow for NLU/NLP and is finally serviced by your app when ‘fulfilment’ is necessary.

What does fulfilment mean in this context? Basically, it’s when your app needs to interact with some kind of external service or do some complex logic. For example, booking a ticket for a movie theatre or ordering you a taxi.

In our picture-matching game, Dialogflow can handle most of the simple stuff, like starting a new game and quitting. We use fulfilment to request a segment of the image to show to the user and to store information about their score and those all-important guesses that we translate into tagged image data.
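To make that concrete, here is a hedged sketch of what the fulfilment side of a round could look like. The state shape, URL scheme and function names are our own illustrations, not the actual hack code:

```javascript
// Hypothetical sketch of the fulfilment logic: each round serves the
// next unseen segment of the current image and records every guess,
// both for scoring and as a label for the tagged-image dataset.

function newRound(image) {
  return { image, nextSegment: 0, score: 0, guesses: [] };
}

function serveSegment(round) {
  // The URL scheme is illustrative; the real app stores segments elsewhere.
  return `https://example.com/segments/${round.image.id}/${round.nextSegment}.png`;
}

function handleGuess(round, guess) {
  const correct = guess.toLowerCase() === round.image.label;
  // Every guess, right or wrong, is a data point for the annotation model.
  round.guesses.push({ segment: round.nextSegment, guess, correct });
  if (correct) {
    round.score += 1;
  } else {
    round.nextSegment += 1; // reveal another segment as a hint
  }
  return correct;
}
```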

We use a handy framework called Jovo to handle the integration between Dialogflow and our app. This takes a lot of the guesswork out of whether the app will be able to communicate properly with Dialogflow and makes testing on a laptop easier by providing a REST endpoint that can be pasted right into Dialogflow’s fulfilment settings.
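For a sense of what Jovo abstracts away: without a framework, a Dialogflow (v2) fulfilment webhook is just an HTTP endpoint that maps the matched intent in the request body to a JSON response. A rough, hand-rolled sketch (the intent and parameter names are our own, not part of Dialogflow):

```javascript
// Map a Dialogflow v2 webhook request body to a fulfilment response.
// Dialogflow sends the matched intent in queryResult.intent.displayName
// and expects a { fulfillmentText: ... } object back.
function fulfil(requestBody) {
  const intent = requestBody.queryResult.intent.displayName;
  const params = requestBody.queryResult.parameters || {};

  switch (intent) {
    case 'StartGameIntent': // hypothetical intent name
      return { fulfillmentText: 'I spy with my little eye... what is in this picture?' };
    case 'GuessIntent': // hypothetical intent with a `guess` parameter
      return { fulfillmentText: `You guessed ${params.guess}. Let me check...` };
    default:
      return { fulfillmentText: "Sorry, I didn't catch that." };
  }
}
```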

Sourcing Images

In the prototype application we made use of ImageNet, which provides a database of images sorted into hierarchies based on the words that describe them. The team spent a long time looking into which image categories were most appropriate and downloading the images for the prototype. We also used the classic Caltech 256 dataset, filtering for images with good resolution and an aspect ratio suitable for display on a phone without much distortion.

They also spent a great deal of time segmenting images into small squares that could be shown in the Google Home app.
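The grid maths behind that slicing is simple enough to sketch. Assuming square segments of a fixed size (the function and field names are illustrative), the crop boxes for an image are:

```javascript
// Split an image of the given dimensions into a grid of square crop
// boxes of side `size`. Leftover edge strips narrower than `size`
// are simply dropped.
function gridSegments(width, height, size) {
  const boxes = [];
  for (let y = 0; y + size <= height; y += size) {
    for (let x = 0; x + size <= width; x += size) {
      boxes.push({ x, y, w: size, h: size });
    }
  }
  return boxes;
}
```

A 640×480 image with 160-pixel segments, for instance, yields a 4×3 grid of twelve squares.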

Making the Conversation Flow

Just as important as the technology was getting the conversational UX right. We wanted our game to be fun and entertaining rather than frustrating and boring, so we had to work out the optimal number of ‘tries’ for an image, and also allow the user to give up if they want to and move on to another challenge.

If a user gets a guess wrong or asks for a hint, we show them another segment of the image which may mean that they see part of an animal instead of a bit of background.
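That flow can be sketched as a small state machine. The cap of three tries below is an illustrative choice, not the measured optimum, and the event and field names are our own:

```javascript
// UX sketch: cap the number of tries per image, reveal a new segment on
// each wrong guess or hint request, and let the user give up and move on.
const MAX_TRIES = 3; // illustrative cap, not a measured optimum

function nextPrompt(state, event) {
  if (event === 'correct') {
    return { done: true, speech: `Correct, it was a ${state.answer}!` };
  }
  if (event === 'giveUp' || state.tries >= MAX_TRIES) {
    return { done: true, speech: `It was a ${state.answer}. Next picture!` };
  }
  // Wrong guess or hint request: spend a try and reveal another segment,
  // which may show part of the animal instead of a bit of background.
  state.tries += 1;
  state.revealed += 1;
  return { done: false, speech: 'Not quite. Here is another piece of the picture.' };
}
```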

We were lucky to have chatbot UX expert Rory on hand for part of this activity to pass on some of his lessons learned.

Putting it all together and seeing the Big Picture

After a good 7 or 8 hours of solid work, our hack was complete. The app, named “Big Picture”, runs on your smartphone using the Google Assistant app, and there’s a video of how it works here.

What next? Stuff we didn’t have time for…

We were pretty pleased with how Filament Big Picture turned out but there were a few bits and bobs that we didn’t have time to build on the day.

Firstly, we found that the app was pretty pedantic about the label the user gives back. For example, if the correct label is “cormorant” but you say “bird”, the system will tell you that you’re wrong, even though a cormorant is absolutely a bird. Using some clever term expansion, or even a hierarchy of words, we could make the user experience a lot more pleasant. Even if we don’t want to accept “bird” as an answer, we could have the app say “close, but be more specific” rather than “no, you’re wrong”.
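A hypernym-aware answer check could look something like the sketch below. The tiny hand-written hierarchy is purely illustrative; a real version could be derived from ImageNet/WordNet’s own hypernym tree:

```javascript
// Check a guess against the true label, treating a hypernym ("bird" for
// "cormorant") as close-but-too-general rather than flat-out wrong.
const HYPERNYMS = {
  cormorant: ['bird', 'animal'], // illustrative entries only
  cow: ['cattle', 'animal'],
};

function checkAnswer(label, guess) {
  const g = guess.toLowerCase();
  if (g === label) return 'correct';
  if ((HYPERNYMS[label] || []).includes(g)) return 'close'; // "be more specific"
  return 'wrong';
}
```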

We were also keen to look at how we could integrate existing ML models into the app to make it more fun. We found that a high proportion of image segments don’t contain any information about the subject of the photo at all. This makes sense, since photographers will often use their subject’s background to frame it and make it more photogenic. We considered using pre-trained models like AlexNet to pre-filter some of the more boring segments and make the game more entertaining.

Conclusion

We had a lot of fun at the Filament Hack day and we learned a lot about Google Actions, ImageNet and cormorants. We’ll definitely be running another one soon so keep your eyes peeled.