Building machine learning products: lessons from Papercup's product team | Papercup Blog
January 31, 2024
7 min read

There is certainly no shortage of product-building advice out there, but most of it doesn’t hold for products with machine learning (ML) at their core. With the boom in such products showing no sign of abating, this article aims to share practical lessons from the Papercup product team’s experience harnessing a vast array of new ML capabilities. We hope these lessons help others build valuable products on top of machine learning.

Papercup is building a cross-lingual, speech-to-speech engine, initially focused on dubbing video for content owners and making the world’s content watchable in any language. To date, videos we've localized have reached over 750 million non-English speakers. Our first product consists of a multi-stage pipeline, the output of which is validated and improved by skilled linguists. We use off-the-shelf machine learning APIs, and we also research and develop our own proprietary speech models, focused on expressivity. We are now a team of around 60 people, and roughly a third of the company is machine learning engineers and researchers.

We are going to take a look at:

  • What do we mean by ML products?
  • What type of markets are now within reach?
  • Key concepts
  • Teams: Who can build ML products?
  • Implications for product & startup economics
  • Reading list & references

💡 Disclosure: I am not an expert on the ethics, politics or in-depth development of machine learning models. I can only speak to this from a product and startup lens.

 

What do we mean by ML products?

To put it simply: products with machine learning models at their core that would not have been viable without them. ML-enabled features, including risk analysis, text prediction and recommendation systems, have been deployed at scale for several decades in industries such as finance and telecommunications. Machine learning has long been the driver of social media algorithms and autocomplete, but the swift adoption of ChatGPT marked a significant step change in product capabilities.

Former GitHub CEO Nat Friedman spoke to Ben Thompson at Stratechery about the paucity of real-world AI applications beyond GitHub Copilot:

“I left GitHub thinking, “Well, the AI revolution’s here and there’s now going to be an immediate wave of other people tinkering with these models and developing products”, and then there kind of wasn’t and I thought that was really surprising. So the situation that we’re in now is the researchers have just raced ahead and they’ve delivered this bounty of new capabilities to the world in an accelerated way, they’re doing it every day.
So we now have this capability overhang that’s just hanging out over the world and, bizarrely, entrepreneurs and product people have only just begun to digest these new capabilities and to ask the question, ‘What’s the product you can now build that you couldn’t build before that people really want to use?’ I think we actually have a shortage.” Source ($)

When we refer to ML products right now, we are really referring to a new set of capabilities with the potential to become new products. For these products to become a reality, entrepreneurs and product people have to figure out how best to harness them for real-world uses.

What type of markets are now within reach?

One of the most on-point descriptions I have found of the potential impact of ML products on markets was written by Matt Bornstein & Martin Casado (a16z):

“AI has enormous potential to disrupt markets that have traditionally been out of reach for software. These markets – which have relied on humans to navigate natural language, images, and physical space – represent a huge opportunity, potentially worth trillions of dollars globally.

Most AI applications look and feel like normal software. They rely on conventional code to perform tasks like interfacing with users, managing data, or integrating with other systems. The heart of the application, though, is a set of trained data models. These models interpret images, transcribe speech, generate natural language, and perform other complex tasks.” Source

While ML products look like traditional software, they do in fact feel different. The key concepts below go some way to explaining why. These are ideas I’ve found helpful for building products and giving users affordances that were previously impossible.

Key ML concepts for product managers

Failure modes

ML models fail frequently. It’s the product manager’s job to make sure they fail gracefully. Nat Friedman described the challenge succinctly, reflecting on building GitHub Copilot:

“How do you take a model which is actually pretty frequently wrong and still make it useful?” Source ($)

Good examples of handling model failure gracefully:

Midjourney

In-painting allows users to regenerate unsatisfactory or incomplete segments of a text-to-image result. Midjourney does not set the expectation of a pixel-perfect image; instead, it lets the user pick and choose which areas to regenerate.

Github Copilot

After initially prototyping a question-and-answer chatbot, GitHub built a code-synthesis autocomplete UI, which attempts to provide complete code segments to the developer. This creates a frequent touchpoint between the model and the user, allowing the developer to cultivate an intuition for when they may or may not benefit from the suggestions.

Not so good examples of handling model failure:

Tesla Full Self-Driving

Tesla’s FSD feature allows the car to drive itself, with the user instructed to pay attention and be prepared to use the wheel and brakes at a moment’s notice. With full self-driving still beyond the horizon, the implementation of this feature encourages the user to relax and trust the underlying model in most circumstances. It just so happens that the moment the model fails, the user’s response may be one of the more consequential events of their life.

Additional valid strategies to mitigate failure risk include model ensembles, model monitoring, redundancy, human oversight and explainability (ask ChatGPT for further reading).
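Two of these strategies can be sketched in a few lines. The function names, labels and the 0.8 threshold below are illustrative assumptions, not recommendations:

```python
from collections import Counter

def classify_with_fallback(scores, threshold=0.8):
    """Pick the top label, but route low-confidence cases to a human.

    `scores` maps label -> model confidence (0..1). The threshold is
    an illustrative stand-in for whatever your product can tolerate.
    """
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return {"label": label, "source": "model"}
    # Surface the prediction, but flag it for human oversight
    # instead of silently trusting it.
    return {"label": label, "source": "human_review"}

def ensemble_vote(predictions):
    """Majority vote across several models' predictions (a simple ensemble)."""
    return Counter(predictions).most_common(1)[0][0]
```

The point is that a low-confidence prediction is still shown, just never silently trusted; the model fails visibly rather than invisibly.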

Who can build ML products?

As an ML product manager, the shape of the teams you have exposure to will differ from those of PMs with strictly hardware or software backgrounds. These teams may have different constraints, timelines and hobbies to other engineering teams you have worked with. Here’s an illustration of the responsibilities of an ML product team based on my experience.

Core Research

Core research teams look for new knowledge. They research and develop new modeling solutions, and often read and write academic papers. This team can conduct fundamental or targeted research depending on their organization’s size, structure and timelines for research and development. Their cycles are usually slower than other groups within research and development, but their discoveries can set the foundation for long-term strategic defensibility.

Applied Research

Applied research teams look for solutions. They collaborate with core research and engineering teams to ensure the successful deployment and utilization of both proprietary models as well as off-the-shelf and open source projects where required. Their product cycles are closer to traditional software projects.

ML Platform

ML platform teams, akin to DevOps, build tools and services to simplify, optimize and accelerate the ML lifecycle.

Data

Data may be an individual team, or it may be a shared responsibility across the research and development function. Data may be created in a whole host of ways depending on the use case (commissioned, synthetically generated, purchased, annotated, etc.).

Data cleaning and preparation is a primary role of the data function. Depending on the product use case, this may necessitate a full-stack team, e.g. for quality control or human feedback. Data is essential to the functioning of an ML-enabled product team, but plenty has been written about this elsewhere.

Research ≠ Production

ML research teams will look to narrow the scope of the problem they are working on in order to build something that has not been possible before. Papers and sample results can be closely controlled and will not always replicate or translate to real-world impact.

On the other hand, models are sometimes released with their true power under-appreciated at the time. Nearly two years passed between the release of the comparatively unheralded GPT-3 and the industry-shifting ChatGPT, which was powered by a GPT-3 variant. OpenAI devoted significant effort to the productization of GPT-3.5, specifically reinforcement learning from human feedback and the chat form factor, to fine-tune the model’s performance, with great success.

Understanding these caveats is key to managing expectations while still moving quickly and iterating on ML products.

Timelines and certainty

ML research cycles are longer than engineering cycles. But, paradoxically, industry-wide progress in ML is broadly much faster than in engineering. Skilled ML engineers can move extremely quickly, although model training time can still create bottlenecks.

Advances in one sub-field of ML are usually a leading indicator of progress in another, given the generalizability of tooling, architectures and modeling techniques. An ongoing example is the application of transformer architectures from large language models (LLMs) to large speech models (LSMs). Google’s BERT family of models, first introduced in 2018, is a prime example of a paradigm-shifting technology arising from research, which impacted first its own sub-field (Natural Language Processing, or NLP) before generalizing across machine learning and technology as a whole.

Models: Internal, off-the-shelf or API

ML products require performant ML models. These can be built internally, taken from the open source community, or queried via API. Especially for startups with limited resources (compute, capital, time), teams usually will not build and train their own base models from scratch. However, the right answer for your team will vary depending on what you are building and your product strategy.

Papercup’s ML dubbing system requires models capable of generating expressive speech, beyond the levels capable of most text-to-speech (TTS) systems. It makes sense for us to focus our model building capabilities here – on what is a more defensible intellectual property. Other parts of our video production pipeline and analytics stack require less bespoke models.

Our engineering and ML teams will almost always build on open source projects or call OpenAI’s API when needed to avoid reinventing the wheel.

Deterministic vs Probabilistic

One of the things we mean when we say machine learning products feel different from software is the deterministic (software) versus probabilistic (machine learning) dichotomy.

When you input ‘5+7’ into a calculator, you will always receive the number ‘12’. There's no uncertainty, no variation in output for the same input, and no learning from past computations. It's entirely deterministic.
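The contrast can be sketched in a few lines of Python. The toy two-word distribution below is an illustrative stand-in for the distribution a real language model produces over its vocabulary:

```python
import random

def add(a, b):
    """Deterministic: the same inputs always produce the same output."""
    return a + b

def sample_next_word(distribution, seed=None):
    """Probabilistic: sample a continuation from a toy word distribution.

    `distribution` maps word -> probability. Repeated calls (with
    different seeds) can return different words for the same input.
    """
    rng = random.Random(seed)
    words = list(distribution)
    weights = [distribution[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]
```

`add(5, 7)` is `12` every time; `sample_next_word({"cat": 0.7, "mat": 0.3})` is usually "cat" but sometimes "mat", and that variability is a property of the product, not a bug in it.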

Mach