Chatbots in Financial Services: A Usability Study


Posted January 21, 2019 in Business Technology & Digital Transformation Strategies, Data Analytics & Digital Technologies Cutter Business Technology Journal


In this article, Diarmuid Lane looks at text- versus voice-based question answering (QA) systems in financial services. He explores a larger question: how efficient are chatbots, really? While the age of NLP-based QA systems is well underway, there are still hurdles in usability, security, and privacy to address. Lane offers a reality check on some of the supporting technologies that enable transaction processing of all kinds. While we’re sometimes a little too eager to adopt new technologies, he reminds us that testing — in this case, usability testing — is a necessary step.

With recent advances in natural language processing (NLP), chatbots have become commonplace. The term “chatbot” is now commonly applied to question answering (QA) systems, which focus on locating and retrieving specific information in response to a user’s question posed in natural language (i.e., text or voice). This approach differs from traditional search engines, which display a list of possible results for a user’s query rather than a specific answer. The recent surge in QA tools such as Apple’s Siri, Amazon Alexa, IBM Watson Assistant, and Google Assistant exemplifies the proliferation of such systems. Organizations are rapidly adopting chatbots and developing QA systems to carry out a range of functions, from administrative tasks to complex HR roles.

As we explore in this article, the market penetration of both text- and voice-based QA systems is on the rise. This increased interest in, and development of, chatbots raises a question: do managers within organizations prefer, or work more effectively with, text or voice? To answer it, I drew on insights from interviews with finance professionals to design a study of perceived usability that quantifies which system managers in the financial services industry prefer.

Chatbots Show Increasing Commercial Potential

The International Data Corporation (IDC) estimates that the digital universe will double in size every two years from 2010 until 2020, reaching 40 trillion gigabytes, or 5,200 gigabytes for every man, woman, and child. Given this growth, systems that can accurately and effectively locate specific data points within domain-specific information bases will become integral to understanding and managing vast amounts of information. IBM’s Watson is one example of a QA system that uses artificial intelligence to harness this data growth. Watson, with its advanced NLP capabilities, has been applied to many use cases, from medical diagnosis to serving as an interface for Web-based query systems built on frequently asked questions. It has even competed on, and won, Jeopardy!, a popular TV game show.

Given the ever-expanding volume of generated data, NLP consultant and columnist Robert Dale predicts that narrowly focused industry- or organization-specific chatbots will generate global revenue of US $623 billion by 2020. This startling prediction is consistent with the fact that 80% of businesses plan to implement, or will already have implemented, chatbots by 2020. Chatbots being released to today’s market increasingly allow customers to manage relationships with a business without interacting with a human. These applications range from entirely digital banks, where chatbots handle all customer interactions, to Facebook Messenger–enabled chatbots, which have a proven track record of increasing sales and customer interaction.

Fashion/lifestyle company Tommy Hilfiger, through use of a Messenger chatbot, recorded an 87% return rate to the chatbot after its first use, with consumers exchanging 60,000 messages and spending three and a half times more time conversing with the chatbot than on any other digital channel. Voice-based chatbots have penetrated the world of personal computing, yet to date commercial chatbots have been almost entirely confined to text-based systems. As voice-based systems gain traction in personal computing, however, they will inevitably enter the workplace; today, the adoption of such systems there is in its infancy.

Dynamic QA systems have the potential to create great benefits for organizations, but the preferred medium of interaction (text or voice) is not yet known. Given how little current research compares the two technologies, I ran a usability study of two competing QA proofs of concept (POCs), a text-based system developed using IBM Watson Assistant and a voice-based system developed using Amazon Alexa, to determine which medium of interaction professionals in the financial services industry prefer. Each of these automated QA systems was developed specifically to let financial services managers query their data in natural language.

Concept of Usability: Why Test?

The International Organization for Standardization (ISO) defines usability as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use.” That definition breaks down into three main dimensions of usability:

  1. Efficiency — the amount of effort required to complete a task

  2. Effectiveness — the ability to complete a task

  3. Satisfaction — the degree to which users are happy with their experience while using a system

System usability is a key determinant of system usage and is pivotal for management to consider when designing and developing a system. Understanding how a technology will be adopted, and what role it will play, is essential to both system and business success. Thus, knowing which interaction medium users prefer for QA systems is an integral first step for firms seeking to embrace this emerging technology effectively.

Voice-based systems to date are aimed primarily at personal users rather than commercial organizations. This raises the question of which interaction medium, text or voice, is best for organizations to implement. Usability studies, such as the one presented in this article, can help answer this question.

Key Insights Drawn from the Financial Services Industry

To begin my study, I drew on earlier interviews with four professionals from the financial services industry, each working in a different organization, to understand how an organization would typically use a QA system. The interviewees held the following positions: chief operating officer/head of innovation EMEA (Europe, the Middle East, and Africa), financial modeling analyst, equity trader, and business intelligence analyst. In their roles, these individuals carried out tasks similar to those a QA system could complete and therefore represent typical users of such a system. The proposed QA systems would supplement rather than supplant these individuals: mundane data-retrieval tasks would be handed off to the QA system, freeing the individuals for more valuable tasks. The interviews showed what effect an automated QA system would have on the daily operations of those working in financial services and provided insight to guide the design of such a system. They also highlighted which aspects of QA system usability matter most within the context of financial services.

To establish the preferred interaction type (text or voice), I then developed the two POCs, which reflected the usability priorities identified in those interviews. Using IBM Watson Assistant, I developed the text-based user interface (TUI) and presented it to the user as an HTML website (see Figure 1). Using Amazon Alexa, I developed the voice-based user interface (VUI) and presented it to the user via an Amazon Echo device.

Figure 1 — IBM Watson Assistant with sample question and answer.

I generated a list of typical queries, and both POCs were designed to answer them. The sample in Figure 1 is one such example. Question: “Where are the top three investor domiciles?” Answer: “The top three investor domiciles are Luxembourg, UK, and France.” The systems were built to identical specifications to minimize any form of system bias.

To determine the better system, I devised a usability study. The study, conducted within a financial services organization, assessed the three ISO dimensions of usability mentioned earlier (efficiency, effectiveness, and satisfaction), using the system usability scale (SUS; see Figure 2) to quantify and assess the usability of each POC. SUS is widely regarded as a leading industry usability survey and has been successfully extended to assess a wide range of both software and hardware products.

Figure 2 — System usability scale.

For this study, I selected seven financial services professionals from the target financial services organization to participate. The participants were representative experts in their field who were comfortable with, and highly experienced with, the types of data queried by the system. Each participant was asked to make 10 typical queries on both the TUI and the VUI. The 10 queries consisted of locating and querying specific data points from a financial document. The tasks let participants use both systems in a workplace environment, which helped ensure that the usability scores reflected realistic use.

Study Findings

Upon concluding the usability study, I calculated the metrics required to rank and rate the POCs. Overall, the TUI (IBM Watson Assistant) outperformed the VUI (Amazon Alexa).

SUS scores were calculated for every participant on both the VUI and the TUI, along with a mean score for each system. The scores were then expressed on a grade scale ranging from A to F, based on the SUS score received. Grading the individual and mean SUS scores in this way allows for a more meaningful interpretation. (For example, a system with a SUS score of 70 would fall at the 50th percentile [the industry average SUS score] and would receive a grade of C.)
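The article does not spell out how a SUS score is computed, so the following is a minimal sketch of the standard SUS scoring procedure, not the author's own tooling. It assumes the usual 10-item questionnaire with responses on a 1-to-5 scale, where odd-numbered items are positively worded and even-numbered items are negatively worded. The function name `sus_score` and the sample responses are hypothetical.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Return one participant's SUS score (0-100) from ten 1-5 responses.

    Standard SUS scoring: odd-numbered items contribute (response - 1),
    even-numbered items contribute (5 - response), and the sum of the
    ten contributions is multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 (positively worded)
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 (negatively worded)
            total += 5 - response
    return total * 2.5


# Hypothetical responses for one participant (not data from the study).
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))  # prints 85.0
```

Because each item contributes between 0 and 4 points, the raw sum ranges from 0 to 40, and the 2.5 multiplier rescales it to the familiar 0 to 100 range.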

The TUI outperformed the VUI in all but one participant session, in which both systems received an A grade. On average, the TUI received an A, placing it in the upper band of the acceptability range, while the VUI received an average grade of C, placing it in the lower band of the acceptability range. The acceptability range covers scores between 70 and 100, with anything falling below the lower bound of 70, or grade C, deemed unacceptable from a usability perspective.
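As an illustration of the comparison described above, here is a small sketch that takes per-participant SUS scores for each system, computes the means, checks them against the lower bound of 70 quoted in the article, and counts the sessions in which the TUI scored higher. The function name `compare_systems` and the scores passed to it are hypothetical; the study's individual scores are not reported here. Letter grades are deliberately left out because the article gives only one grade boundary.

```python
from statistics import mean
from typing import Dict, Sequence

# Lower bound of the acceptability range as stated in the article.
ACCEPTABLE_MIN = 70.0


def compare_systems(tui: Sequence[float], vui: Sequence[float]) -> Dict[str, object]:
    """Summarize paired SUS scores (one TUI and one VUI score per participant)."""
    if not tui or len(tui) != len(vui):
        raise ValueError("need one TUI and one VUI score per participant")

    tui_mean = mean(tui)
    vui_mean = mean(vui)
    return {
        "tui_mean": tui_mean,
        "vui_mean": vui_mean,
        "tui_acceptable": tui_mean >= ACCEPTABLE_MIN,
        "vui_acceptable": vui_mean >= ACCEPTABLE_MIN,
        # Number of participant sessions in which the TUI scored strictly higher
        "sessions_tui_higher": sum(t > v for t, v in zip(tui, vui)),
    }


# Hypothetical scores for three participants (NOT the study's data).
example = compare_systems([90.0, 85.0, 92.5], [72.5, 85.0, 70.0])
print(example)
```

Keeping the scores paired by participant matters here, since the finding is stated per session as well as on average.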

The Reality of Voice-Based System Adoption Within Financial Services

The study revealed that text is the preferred medium of interaction with a QA system for financial services professionals. When interviewed, all seven usability testers felt that TUI QA systems were better suited to wide-scale adoption within the financial services industry. Most participants spoke of their familiarity with text-based systems and of how a TUI would integrate more effectively into current organizational workflows.

Numerous usability and adoption issues appeared throughout the testing process. One participant noted the need to “trust” that they had heard the answer to a question correctly when using the voice-based system. Another claimed to be more “confident” using a TUI, with the answer shown on screen. Since the roles fulfilled by the study participants are information-critical, this general lack of trust in a voice-based system resulted in lower usability scores and would thus hinder wide-scale adoption in the financial services industry.

Moreover, several study participants cited a glaring security flaw associated with a voice-based system: sound. Participants agreed that fellow employees would be reluctant to use such a system because it relays answers aloud, so sensitive information could be overheard. In an industry like financial services, where protecting sensitive client and organization information is of paramount importance, such a flaw could hinder the adoption of voice-based systems.


Before adopting and implementing NLP-based QA systems, or chatbots, management must decide on the preferred modality of interaction within their organization. Upon establishing a preferred medium of interaction, management can form an implementation plan based on context-specific variables, such as focused testing on the preferred medium of interaction, specific context of use, and technology acceptance testing.

The concept of system preference and user experience (UX) is not solely linked to system usability and may be derived from industry, organizational, or personal norms and practices. The UX concept comprises all aspects of user interaction with a company’s products and services. The goal of usability evaluation is to quantify how well users can learn to use a product to achieve specific goals. This initial usability study gives management a base UX understanding in relation to text- and voice-based QA systems.

Once they have defined a preferred mode of interaction, firms can begin to understand how to incorporate chatbots. By explicitly studying and defining the preferred interaction medium, organizations can foster adoption, especially in relation to deployment and engagement. In my study of professionals operating within a contemporary financial management firm, the usability of a text-based QA system ranked significantly higher than that of a voice-based equivalent. Furthermore, all usability study participants agreed during a post-study interview that a text-based system was better suited to current organizational and industry practices. To improve voice-based systems, major flaws must be addressed, primarily in the area of data security and privacy. Although voice-based QA systems are becoming increasingly prevalent in the personal computing world, this study shows that for professionals within the financial services industry the preferred medium of interaction with a chatbot is text.


The research highlighted in this article was funded and supported by State Street, through the State Street Advanced Technology Centre at University College Cork (UCC), Ireland.

About The Author

Diarmuid Lane is a data scientist and former researcher with the State Street Advanced Technology Centre, University College Cork (UCC), Ireland. His research revolves around the concept of usability and user experience in relation to natural language processing–based question answering systems, or chatbots.