Case Study

From Natural Language to Data

How we developed a natural language search tool for Deutsche Bahn

With Spotti, we established a new way of interacting with complex data that empowered Deutsche Bahn’s teams to work and collaborate across the whole corporation, all by offering a new kind of search.

Challenge

Making a complex dataset actionable for a wide variety of use cases through natural language search

Multiple filter values on Kayak

A typical challenge with standard search is knowing what to search for. If you do not know the exact terminology, you end up with imprecise results or none at all. You know that the information you’re looking for is out there, but no matter which search terms you enter, you can’t find a relevant result. This problem is often addressed by piling up interface elements, especially filters, drop-downs, and arrays of radio buttons.

Popular search engines and speech recognition interfaces (Siri, Google Assistant) try to bring searching in line with everyday conversation through natural language search. Users phrase their queries in everyday language, much like the questions they would ask in an actual conversation. This makes a matching result more likely and makes the experience feel more natural, less like talking to an algorithm.

CollegeHumor’s ‘If Google Was a Guy’ series satirises keyword-based search queries by placing them in a life-like context.

Project Background

Deutsche Bahn, Germany’s major train operator, does not require passengers to book in advance or reserve a seat. In relatively few cases, this results in overcrowded trains, which severely impact the customer experience. To address this, Deutsche Bahn forecasts passenger counts for nearly ten thousand trains per day, 100 days into the future.

Peak Spotting is a holistic application for yield management and is vital for experts who know their way around complex datasets.

The Peak Spotting project, which we have been working on since 2018, builds innovative tools that enable Deutsche Bahn employees across the organization to work effectively with this critical dataset. Traditionally, Peak Spotting’s user interfaces rely on powerful filters that let users quickly drill down into the data, all the way to individual trains.

Through our ongoing collaboration with Deutsche Bahn, we were aware of the complexity of the data and of the challenge of speaking the right language to get predictable results. Before we introduced natural language search with Spotti, the database was queried through a complex set of filters.

Brief

Making Deutsche Bahn’s complex prognosis data searchable and actionable for non-experts across the corporation.

Summary of our Work

We developed an enhanced search engine to make forecast data actionable. Because it works with plain text, the interface is simple and accessible to everyone, without user manuals or onboarding. Where standard search engines return exact text matches, our solution understands the context and assists in making informed decisions to balance Germany’s train network.

Approach

To enable natural language search over Deutsche Bahn’s data, we established an interpretation layer that automatically distinguishes between space, time, metrics, and unique preferences. These are combined and applied as one filter. Imagine searching for a day in your calendar: the day should be in the next week and free of appointments. You would phrase a question such as “all days next week without any appointments”.

“All days”, “next week”, and “without any appointments” are handled as key phrases and interpreted by the search. The query would start from today’s date, look at the next seven days, and return all days without any entry.
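The interpretation of the calendar query can be sketched in a few lines of Python. This is a hypothetical illustration, not Spotti’s code: it assumes appointments are stored as a plain set of dates, and it reads “next week” as the seven days following today, which is one possible interpretation.

```python
from datetime import date, timedelta


def free_days_next_week(today: date, appointments: set[date]) -> list[date]:
    """Return the days among the seven days after `today` that have no appointment."""
    # "next week" is read here as the seven days following today (an assumption).
    candidates = [today + timedelta(days=offset) for offset in range(1, 8)]
    # "without any appointments": keep only days with no entry in the calendar.
    return [day for day in candidates if day not in appointments]


# Example: two busy days in the coming week.
today = date(2020, 3, 2)
busy = {date(2020, 3, 3), date(2020, 3, 5)}
print([d.isoformat() for d in free_days_next_week(today, busy)])
# ['2020-03-04', '2020-03-06', '2020-03-07', '2020-03-08', '2020-03-09']
```

Each key phrase maps to one step: the time phrase defines the candidate range, and the preference phrase filters it.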

A search for trains is phrased similarly. Imagine you are interested in all trains on the following day within a specific time window. Your question to the database would be “show me all trains tomorrow between 10 and 11 o’clock” or, simplified, “all trains tomorrow between 10 and 11 o’clock”.

“All trains”, “tomorrow”, and “10 and 11 o’clock” are handled as keywords by the system. The search can be as precise as needed. For example, a search for trains with a predicted load of less than 50% at a specific time to a particular destination would be phrased as “all trains with a prognosis of less than 50% between 8-9 o’clock from Berlin to Munich”.
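To make the idea of an interpretation layer concrete, here is a minimal Python sketch that maps key phrases for time, metrics, and space onto the fields of one structured filter. It is not Spotti’s implementation: the `TrainQuery` structure, the regular expressions, and the `interpret` function are hypothetical, and a real system would draw on a much larger corpus of phrasings.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class TrainQuery:
    """Structured filter produced by a hypothetical interpretation layer."""
    day: Optional[date] = None         # time: travel day
    hour_from: Optional[int] = None    # time: start of the departure window
    hour_to: Optional[int] = None      # time: end of the departure window
    max_load: Optional[float] = None   # metric: load prognosis upper bound (fraction)
    origin: Optional[str] = None       # space: departure station
    destination: Optional[str] = None  # space: arrival station


# Hypothetical key-phrase patterns; a real corpus would be far larger.
# o['’]clock accepts both straight and typographic apostrophes.
HOURS = re.compile(r"between (\d{1,2})\s*(?:-|and)\s*(\d{1,2}) o['’]clock")
LOAD = re.compile(r"prognosis of less than (\d{1,3})\s*%")
ROUTE = re.compile(r"from ([A-Z][\w-]*) to ([A-Z][\w-]*)")


def interpret(text: str, today: date) -> TrainQuery:
    """Map recognized key phrases in `text` onto fields of one TrainQuery."""
    query = TrainQuery()
    # Relative dates are resolved against the caller-supplied `today`.
    if "tomorrow" in text:
        query.day = today + timedelta(days=1)
    elif "today" in text:
        query.day = today
    if m := HOURS.search(text):
        query.hour_from, query.hour_to = int(m.group(1)), int(m.group(2))
    if m := LOAD.search(text):
        query.max_load = int(m.group(1)) / 100
    if m := ROUTE.search(text):
        query.origin, query.destination = m.group(1), m.group(2)
    return query
```

For the Berlin to Munich query, `interpret` fills `hour_from=8`, `hour_to=9`, `max_load=0.5`, `origin="Berlin"`, and `destination="Munich"`, and leaves `day` empty because no date phrase was found.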

This example, and similar searches, are used repeatedly by traffic controllers in Deutsche Bahn’s long-distance rail service. We built a robust and effective system for interacting with the data by collecting and analyzing thousands of questions that Deutsche Bahn employees entered into a very rough prototype.
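One simple way to analyze such a collection of questions is to count which words the current vocabulary does not yet cover; the most frequent gaps are candidates for new key phrases. The sketch below assumes the questions are available as a plain list of strings; the function name `uncovered_terms` and its naive tokenizer are hypothetical and not taken from Spotti.

```python
import re
from collections import Counter
from typing import Iterable


def uncovered_terms(
    queries: Iterable[str], vocabulary: Iterable[str], top_n: int = 5
) -> list[tuple[str, int]]:
    """Return the most frequent words in `queries` that `vocabulary` does not cover."""
    known = {word.lower() for word in vocabulary}
    counts = Counter()
    for query in queries:
        # Naive tokenizer: lowercase letters incl. German umlauts and ß (an assumption).
        for word in re.findall(r"[a-zäöüß]+", query.lower()):
            if word not in known:
                counts[word] += 1
    return counts.most_common(top_n)


log = ["all trains tomorrow", "all trains to Munich", "trains to Munich tonight"]
print(uncovered_terms(log, ["all", "trains", "tomorrow", "to"]))
# [('munich', 2), ('tonight', 1)]
```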

Following Studio NAND’s “release early and often” mantra, Deutsche Bahn’s teams were able to start testing and giving feedback on the experience and the search results just a couple of weeks into the project.

Conclusion

Natural language search can be implemented in a variety of applications, provided that the search process is transparent and gives feedback on how it works.

We established a modular database layer that allows many search criteria to be combined flexibly through natural language. Although this technique is not entirely new and cannot compete with Siri or Google Assistant, we integrated an elegant solution that simplifies interaction with a complex database through natural language.
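A modular layer of this kind can be pictured as small, independent criteria that are AND-combined into one filter, so that any subset of interpreted phrases yields a valid query. The Python sketch below is a hypothetical illustration: the `Train` record, its fields, and the criterion functions are assumptions, and in-memory filtering stands in for whatever database queries Spotti actually issues.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable


@dataclass(frozen=True)
class Train:
    """Hypothetical forecast record; field names are illustrative only."""
    number: str
    day: date
    departure_hour: int
    origin: str
    destination: str
    load_prognosis: float  # forecast occupancy as a fraction of capacity


# A criterion is any predicate over a single train.
Criterion = Callable[[Train], bool]


def on_day(day: date) -> Criterion:
    return lambda t: t.day == day


def departs_between(start_hour: int, end_hour: int) -> Criterion:
    # Half-open window: (8, 9) keeps trains whose departure hour is 8.
    return lambda t: start_hour <= t.departure_hour < end_hour


def load_below(threshold: float) -> Criterion:
    return lambda t: t.load_prognosis < threshold


def route(origin: str, destination: str) -> Criterion:
    return lambda t: t.origin == origin and t.destination == destination


def combine(criteria: Iterable[Criterion]) -> Criterion:
    """AND-combine any number of criteria into a single filter."""
    checks = list(criteria)
    return lambda t: all(check(t) for check in checks)


def search(trains: Iterable[Train], criteria: Iterable[Criterion]) -> list[Train]:
    matches = combine(criteria)
    return [t for t in trains if matches(t)]


# Illustrative data, not real forecasts.
trains = [
    Train("A", date(2020, 3, 3), 8, "Berlin", "Munich", 0.42),
    Train("B", date(2020, 3, 3), 8, "Berlin", "Munich", 0.87),
    Train("C", date(2020, 3, 3), 10, "Berlin", "Munich", 0.30),
]
# Criteria for "less than 50% between 8-9 o'clock from Berlin to Munich".
hits = search(trains, [departs_between(8, 9), load_below(0.5), route("Berlin", "Munich")])
print([t.number for t in hits])  # ['A']
```

Because every criterion is independent, adding a new kind of phrase only means adding one more small function; an empty list of criteria simply returns all trains.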

Highlighting key terms (trains, load prognosis, …) gave users direct feedback from the system, which increased usability and their understanding of the interpretation layer. This mutual understanding improved communication across departments. By continuously tracking the text entered in the search bar, we can extend Spotti’s corpus and improve its interpretation.
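Highlighting recognized terms can be as simple as wrapping every known key phrase in a marker before the query is shown back to the user. This Python sketch is an assumption about how such feedback might work, not Spotti’s actual interface: the `highlight` function and the bracket markers are hypothetical, and a web front end would emit styled markup instead.

```python
import re


def highlight(query: str, key_phrases: list[str]) -> str:
    """Wrap every recognized key phrase in [brackets] so users see what was understood.

    Longer phrases are tried first so that e.g. "all trains" wins over "trains".
    Assumes phrases start and end with word characters (needed for the \\b anchors).
    """
    if not key_phrases:
        return query
    ordered = sorted(key_phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    # Keep the user's original casing inside the markers.
    return pattern.sub(lambda m: f"[{m.group(0)}]", query)


print(highlight("all trains tomorrow between 10 and 11 o’clock", ["trains", "all trains", "tomorrow"]))
# [all trains] [tomorrow] between 10 and 11 o’clock
```

Anything left unmarked is visibly ignored by the system, which is exactly the kind of transparent feedback that helps users learn how the interpretation works.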

Our key takeaway from this project was to rethink search, feedback, and the path from speech to data. We can see more natural search being implemented in various applications, making it much easier to find what you are looking for. Just a few examples:

Calendar → all days this week with no appointments

Cooking → recipes with corn under 25 minutes

Weather → all days next week above 14 °C

Finance → all stocks with > 10% increase yesterday

Got more ideas for how natural language could improve search in your company?

© 2011 - 2020
Studio NAND GmbH