Since the early 2010s, the internet has promised us a near future in which we would be surrounded by self-driving cars and governed by all-knowing artificial intelligence.

Right now, however, few machine learning-centric startups have become as ubiquitous as Zoom, Notion or Slack. If the AI and machine learning industry is to be worth the $15 trillion it’s predicted to be by 2030, we only have another decade left to hit that goal! Using a hypothetical AI product targeted at recruitment agencies, let’s explore some challenges faced by AI/ML startups on their road to becoming household names.

Our imaginary startup uses speech recognition to help with candidate notetaking and natural language processing to create more efficient screening interviews. On the surface, it’s a rather exciting product: it tackles a huge problem (the laborious screening of candidates) with cutting-edge automatic speech recognition (ASR) and natural language processing (NLP). Such a company would usually run into three challenges: how the machine learning and the product create value for the customer; the inescapable fact that machine learning systems sometimes make errors; and the relatively low returns left once the cost of running the ML models themselves is paid.

Defining value

We know established fields such as ASR and NLP are capable of astounding feats, but it’s challenging to package these capabilities so that they meet the end user’s needs. In our example, to get a recruiter or a recruitment agency to pay for the product, our startup needs to understand how the product will generate money, or value in the form of the increased efficiency mentioned above. For the product to be successful, the ASR and NLP systems need to suggest improvements to interview calls, and individual recruiters need to implement the suggested changes, which in turn have to result in:

  1. More efficient and higher quality interviews
  2. A reduction in the time recruiters spend screening and an increase in candidates flowing through the pipeline
  3. The recruitment agency placing more candidates and improving its success rate, resulting in better client retention and a better track record to show prospective clients

A lot has to happen before our product can generate value for the recruitment agency. This is not to say that such a product wouldn’t work or isn’t useful, but with so many steps between the model’s output and the agency’s revenue, the efficiency savings created directly by the product cannot be easily measured. In other words, for a product to be successful, the ML system needs to be tightly coupled to the value the product generates for its users.

ML systems cost money

Another issue is the marginal cost. For our recruitment product, practicalities get in the way: a single recruiter would probably be on the phone with candidates around 4 hours a day. At 260 working days a year, this equates to roughly 1,000 hours of data per user per year to be processed. That’s a lot of input for a machine learning model to churn through.

The tech giants’ heavily optimised ASR APIs cost at least $0.73 per hour of speech data. If we (quite reasonably) assume that this is the cheapest any ASR system could possibly cost, this equates to roughly $730 per recruiter per year at a minimum. This marginal cost will need to be absorbed by the customer recruitment agency. And that’s just the speech recognition.
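The arithmetic behind that figure can be sketched in a few lines of Python. The three constants come from the text above; the function name and the 50-recruiter agency in the usage line are hypothetical and only there for illustration. Note that 4 hours over 260 days is exactly 1,040 hours, so the unrounded cost is slightly higher than the $730 quoted.

```python
# Back-of-the-envelope ASR cost, using the figures quoted in the article.
HOURS_ON_PHONE_PER_DAY = 4      # per recruiter (article's estimate)
WORKING_DAYS_PER_YEAR = 260     # article's figure
ASR_COST_PER_HOUR_USD = 0.73    # article's lower bound for big-tech ASR APIs


def annual_asr_cost(recruiters: int = 1) -> float:
    """Yearly speech-recognition bill in USD for a given number of recruiters."""
    hours_per_recruiter = HOURS_ON_PHONE_PER_DAY * WORKING_DAYS_PER_YEAR
    return recruiters * hours_per_recruiter * ASR_COST_PER_HOUR_USD


if __name__ == "__main__":
    print(f"One recruiter: ${annual_asr_cost():,.2f} per year")
    # Hypothetical agency size, chosen only to show how the cost scales.
    print(f"Agency of 50:  ${annual_asr_cost(50):,.2f} per year")
```

Because the cost scales linearly with headcount and call volume, it never amortises away as the customer grows, which is what makes it a marginal cost rather than a fixed one.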

Most machine learning systems require GPUs for fast inference times, and these can be expensive to run, especially for smaller startups that don’t have the resources to make the hardware optimisations needed for cheaper inference. A more viable ML product is one in which the ML system focuses on “low volume, high impact” data, where few inferences are needed over the course of a month or a year, but each run and output of the system is worth more.

Inevitable ML errors

It’s no secret that statistical machine learning systems can sometimes be wrong. Claims that state-of-the-art ML systems achieve accuracies of 95-99% often neglect to address the remaining 1-5% error rate, which amounts to anywhere from a 1-in-100 to a 1-in-20 chance of error every time you run the model. Taking our recruitment product: four hours of phone conversations is a lot of words; even if your ASR system is 99% accurate, that’s still a lot of erroneous predictions every day.
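To put a rough number on “a lot”, here is a minimal sketch. It rests on two assumptions that are not in the article: a conversational speaking rate of about 150 words per minute, and that the quoted accuracy applies per word. The function and parameter names are hypothetical.

```python
def expected_word_errors_per_day(
    hours_per_day: float = 4.0,       # article's estimate of daily call time
    words_per_minute: float = 150.0,  # ASSUMPTION: rough conversational rate
    accuracy: float = 0.99,           # ASSUMPTION: treated as per-word accuracy
) -> float:
    """Expected number of misrecognised words per recruiter per day."""
    total_words = hours_per_day * 60 * words_per_minute
    return total_words * (1.0 - accuracy)


if __name__ == "__main__":
    for acc in (0.99, 0.95):
        errors = expected_word_errors_per_day(accuracy=acc)
        print(f"{acc:.0%} accurate: about {errors:.0f} wrong words per day")
```

Under these assumptions, a 99% accurate system still gets around 360 words wrong per recruiter per day, and a 95% accurate one around 1,800.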

There’s an established solution for this, especially in the field of translation, where a human-in-the-loop process is used to spot and correct any errors that have arisen from the ML system. This process can be managed in two ways — by bringing the human-in-the-loop process in house, or by making the end users the humans-in-the-loop. However a company chooses to manage the process, it’s both an additional cost and another step in the chain to creating value.

ML-based products are possible

There are companies that avoid these three pitfalls and create successful commercial products. Unbabel, for instance, uses machine translation systems to empower a human-in-the-loop process for translating customer support messages and emails. Meanwhile, Calipsa uses computer vision systems to reduce false alarms from security camera alerts by up to 90%, so that the important alarms can be passed through to human operators.

Like these companies, Papercup has grappled with these challenges. We work to ensure that the quality of our translation system has a direct impact for our customers and that we specifically target customers for whom we can create the most value, while our human-in-the-loop process ensures a high-quality output that is free from machine learning errors.