Building Defensibility in Vertical SaaS with Proprietary Data

Julia Maltby
Jan 24, 2024

Establishing defensibility is becoming an increasingly complex challenge, especially in AI-native vertical SaaS, where the playing field is leveled by the widespread use of similar foundation models. When everyone draws from the same well of basic AI tools and capabilities, the key to defensibility lies not just in the technology itself, but in how companies access and utilize proprietary data.

Historically, much of the data generated by software companies was unstructured, rendering it underutilized or inaccessible due to its complexity and the need for nuanced, contextual understanding that earlier technologies couldn’t provide. However, with the recent advancements in LLMs, companies can now unlock and process this wealth of unstructured data effectively.

As we share here, the best AI-native vertical SaaS companies will be those that can access, grow, and train LLMs against unique datasets over time, leveraging the outputs to create straightforward and unequivocally valuable products for customers, and ultimately competitive moats for themselves.

Not All Proprietary Data Is Created Equal:

There are two primary types of private data that AI-native vertical SaaS companies can leverage to generate defensibility.

  • First Party Data. Data that companies capture or generate directly. H2OK, a Flybridge portfolio company building an IoT platform for industrial liquid optimization, is an excellent example of this company archetype. By providing comprehensive sensor systems, H2OK produces a proprietary dataset that compounds in value and can be leveraged to drive quality improvements for customers. We believe that many winning AI-enabled SaaS companies will embed new data collection mechanisms into their workflows in this way (e.g. via cameras, sensors, etc.). Industries ripe for this type of disruption range from supply chain automation to urban planning. We also believe new data sets will be unlocked by incentivizing consumers and enterprises to share data they’ve previously kept private. Deco portfolio company Hansa, for example, enables SMBs to share financial and operating data to find and access new fintech products and services tailored to their needs.
  • Private Data. Data that is generated elsewhere, but uploaded or synced to a software platform to become “actionable”, often in conjunction with public datasets. While these platforms aren’t embedded in the data creation and collection process, fine-tuning and training models on proprietary data requires domain and technical expertise that shouldn’t be minimized and can certainly create defensibility over time. Deco portfolio company Blumen Systems fits this company archetype. Their platform enables renewable energy developers to upload private project scope and planning documentation, which is mapped against a grid of standardized land, water, geography, environment, power and geology data to unlock proprietary site optimization insights.
Blumen Systems — platform screenshot
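
To make the "private data mapped against a standardized grid" idea concrete, here is a minimal Python sketch. It is not Blumen Systems' actual implementation; the `Project` record, the `PUBLIC_GRID` layer, its attribute names, and the `enrich_projects` helper are all hypothetical, chosen only to show how uploaded private records might be joined to a public dataset keyed by grid cell.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

GridCell = Tuple[int, int]  # hypothetical (row, column) index into a standardized grid


@dataclass
class Project:
    """A private record a customer uploads (hypothetical fields)."""
    name: str
    cell: GridCell
    capacity_mw: float


# Hypothetical public layer: standardized attributes per grid cell.
PUBLIC_GRID: Dict[GridCell, Dict[str, float]] = {
    (0, 0): {"slope_pct": 2.0, "km_to_substation": 1.5},
    (0, 1): {"slope_pct": 9.0, "km_to_substation": 12.0},
}


def enrich_projects(
    projects: List[Project], grid: Dict[GridCell, Dict[str, float]]
) -> List[Dict[str, object]]:
    """Join each private project to the public attributes of its grid cell.

    Projects whose cell is missing from the grid are skipped rather than guessed.
    """
    rows: List[Dict[str, object]] = []
    for project in projects:
        attrs = grid.get(project.cell)
        if attrs is None:
            continue
        rows.append({"name": project.name, "capacity_mw": project.capacity_mw, **attrs})
    return rows
```

The combined rows are what neither the customer nor a public-data-only competitor has on their own, which is the sense in which the output becomes proprietary.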

It is worth noting that while incorporating proprietary data is the gold standard, in many cases public data sets can be difficult to acquire and/or structure and utilize, making the outputs of these data aggregations somewhat proprietary. Flybridge portfolio company Finkargo, for example, analyzes and enhances public import and export data to identify, score, and underwrite customers for their fintech and supply chain SaaS offering. My partner Jeff Bussgang shares more about the company’s approach here. With these company archetypes, data defensibility may arise after a software company onboards early customers who can provide feedback to continually improve the core offering.

Making Sense of Private, Unstructured Data:

The evolving landscape of AI in vertical SaaS is not only about harnessing proprietary data but also about redefining the value of previously underutilized unstructured data. The ability to structure and utilize this type of data, abundant yet often overlooked due to its complexity, creates immense potential for SaaS companies to drive customer value and create a sustainable competitive advantage that is difficult for competitors to replicate or overcome.

To provide a bit more context, in text analysis, LLMs can extract meaningful insights from raw text data, such as customer feedback, by identifying key themes, sentiments, and patterns. This capability is not just limited to understanding the explicit meaning of words, but also extends to grasping the contextual and implied meanings, which is crucial for accurate interpretation.
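
As a rough illustration of turning raw feedback into structured records, the Python sketch below asks a model for themes and sentiment as JSON and validates the reply. Everything here is an assumption for illustration: `FeedbackInsight`, `PROMPT_TEMPLATE`, and `extract_insight` are hypothetical names, and `complete` stands in for whatever LLM client a company actually uses (it simply takes a prompt string and returns the model's text).

```python
import json
from dataclasses import dataclass
from typing import Callable, List

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}

# Hypothetical prompt; a real product would tune this for its domain.
PROMPT_TEMPLATE = (
    "Read the customer feedback below and reply with JSON only, using the keys "
    '"themes" (a list of short strings) and "sentiment" '
    '("positive", "neutral", or "negative").\n\nFeedback:\n{feedback}'
)


@dataclass
class FeedbackInsight:
    """Structured view of one piece of unstructured feedback."""
    themes: List[str]
    sentiment: str


def extract_insight(feedback: str, complete: Callable[[str], str]) -> FeedbackInsight:
    """Send the feedback to an LLM via `complete` and validate the JSON reply."""
    raw = complete(PROMPT_TEMPLATE.format(feedback=feedback))
    data = json.loads(raw)  # raises ValueError if the model did not return valid JSON
    sentiment = str(data.get("sentiment", "")).lower()
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")
    # Normalize themes to non-empty, stripped strings.
    themes = [str(t).strip() for t in data.get("themes", []) if str(t).strip()]
    return FeedbackInsight(themes=themes, sentiment=sentiment)
```

Passing the model call in as a function keeps the sketch independent of any particular vendor, and the validation step matters because model output is not guaranteed to follow the requested format.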

Beyond text analysis, LLMs’ ability to integrate with other AI technologies like computer vision and speech recognition further amplifies their utility in handling unstructured data. In combination with these technologies, LLMs can provide a comprehensive analysis that encompasses both textual and non-textual elements. For example, in a customer support scenario, an LLM can analyze a customer query, interpret the sentiment behind the words, and also process any accompanying images or audio to provide a holistic understanding of the customer’s issue. This integration enables LLMs to offer nuanced insights that were previously unattainable with traditional data processing methods.

Moreover, LLMs can be fine-tuned on the data a company accumulates, improving their understanding and accuracy over time. This capacity for ongoing improvement makes them invaluable for vertical SaaS companies, which can leverage these models to uncover trends, predict customer behavior, and make informed decisions, thereby gaining a competitive edge in their respective industries.

Looking Ahead:

From a venture perspective, understanding how vertical SaaS companies will build unique datasets and train models over time is paramount, and arguably of newfound importance to our investment criteria. However, these models must still be accompanied by all the “standard” vertical SaaS company attributes investors have prioritized for years: exceptional founder-market fit, earned industry secrets, solving a “hair on fire” problem, unique distribution motions, high switching costs, economies of scale, etc. Even the best data and models must be packaged into applications that create clear customer value.

We’ll cover this in a separate post, but just as attributes like proprietary data access and the domain expertise needed to make use of unstructured data have gained newfound importance in generating defensibility, it’s likely that other criteria will become less critical, or shift in some capacity. As we note here, vertical SaaS products have historically served primarily as enablers of specific functions (e.g. payments, advertising, workflows, etc.). Moving forward, AI-enabled vertical SaaS products will generate the outputs that humans previously used these systems to produce. This shift will likely bring about broad changes in UX and workflow design.

As always, if you’re building an AI-enabled vertical software product, we’d love to connect — julia@flybridge.com.

Julia Maltby

Early Stage Investor @ Flybridge & X-Factor Ventures | GP @ The MBA Fund | Previously @ Underscore VC, WeWork, and Plum Alley Investments | Wharton MBA