What About Business Models?
In the first part of this article series, “Search Tools in a GPT World,” we looked at search tools in a product environment where AI GPTs are clearly on a tear. Now we’ll look at how shifts in technology and consumer sentiment are reshaping economics and business models.
The rise of GPTs introduces significant shifts in the business models underpinning search and information retrieval. Search engines operate primarily on ad-driven models, built on traffic, clicks, and ranking; this drives income for both the search engines and the publishers they send traffic to. We’re going to focus on the search engines themselves. In contrast to traditional search, GPTs seem mostly out of the starting gate with pay-per-use, subscription, or freemium models. This may reflect the resource-intensive nature of generating real-time responses. It’s also a simpler business model than search, which depends on a complex ad services ecosystem. As well, ads alone might not be sensible from a P&L perspective. Let’s review some of the business models.
- Cost Per Query (CPQ) Dynamics: GPTs have higher computational costs per interaction than search. This will change over time as efficiencies are realized. Regardless, the cost of inferencing differs from index-based retrieval, putting GPT providers in a different realm of operational expense. Pay-per-use models, tiered access, or limited free queries are emerging solutions, fostering a competitive landscape where efficiency and cost management are crucial. While providers will battle on price/value, they’ll at least have some pricing power. With ad-supported services, pricing power is dictated by the surrounding ad marketplaces, and price elasticity for advertisers varies widely by category. Such differentials may not align with GPT costs.
Sectors like e-commerce or finance may tolerate higher ad prices and align ROI with costs, while others like entertainment or blogs may be more price-sensitive. With a GPT, this could mean cost misalignment: the computational expense of handling complex queries or generating rich outputs for a given category may not correlate with advertisers’ willingness to pay higher rates there. Hence mismatches between operational costs and monetization potential. This variability in elasticity could create opportunities for niche AI providers to target verticals where pricing elasticity is favorable, while general-purpose services may struggle to optimize across all categories. The departure from ad-supported information retrieval reflects the real-world difficulty of monetizing AI models at scale.
- Subscription and Enterprise Integration: GPT services appear to gravitate towards subscription models, bundling tools with productivity features. For enterprises, GPTs offer custom solutions, enabling integration into internal knowledge systems, customer service, and workflows, creating direct monetization channels. Variable costs will always come back to the query, though. So subscriptions will have to account for forecast usage.
- Value Shift from Ads to Utility: As GPTs deliver direct answers rather than drive users to websites, ad-based search models face disruption. They might just not make sense in GPT contexts. This could force companies to explore new ways of embedding ads or sponsored content within GPT responses, potentially sparking ethical debates about transparency and trust. Challenges include the legality and ethics of advertising against synthetic content derived from others’ work, how to target, and whether providers can price ads profitably.
Let’s consider some examples. Categories like Healthcare, Insurance, Travel, and such have higher Cost per Thousand (CPM) display rates, Cost per Acquisition (CPA) rates, and so on. Others like Automotive, Retail, or general information have lower rates. This might be due to value, margins, competition, or some combination, and rates change with market dynamics, consumer behavior, and ad trends. So advertising models might not make sense here. We’ve already seen similar industry challenges in cable and streaming television when it comes to bundling just dozens of channels. How could anyone possibly sort out P&L contribution margins across billions of queries and thousands of general taxonomic categories with varying depths of inferencing per session? (See the back-of-the-envelope sketch after this list.) Not to mention the likely lower ad attention of users focused on tasks more engaging than search.
Advertising may have some place in GPT result sets. But there’s complexity in doing so compared to simple usage pricing. It’s likely this area will evolve.
- Niche GPTs and Vertical Specialization: GPT versatility encourages the emergence of specialized AI tools (legal, medical, financial), where users pay premiums for expert-level precision and domain-specific knowledge. This creates space for smaller, specialized players to carve out markets that traditional search engines may overlook. Interestingly, these solutions might be more cost-effective. We’ll discuss cost structures shortly, but to preview, techniques like prompt caching using vector databases can reduce expenses (a minimal sketch follows this list). This approach could be particularly effective in vertical applications, where user queries are more likely to overlap. Also note that vertical markets might use their own (likely smaller) foundational models, and therefore get better results at lower cost than the larger, well-known general models.
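To make the monetization mismatch concrete, here’s a back-of-the-envelope sketch in Python. Every number in it (the category CPMs, the ads shown per results page, and the flat per-query inference cost) is an illustrative assumption, not market data.

```python
# Back-of-the-envelope: ad revenue per query vs. inference cost per query.
# All figures below are illustrative assumptions, not market data.

def ad_revenue_per_query(cpm_usd: float, ads_per_results_page: int = 3) -> float:
    """One results page earns (CPM / 1,000 impressions) per ad shown."""
    return (cpm_usd / 1000.0) * ads_per_results_page

ASSUMED_INFERENCE_COST = 0.05  # flat per-query inference cost (assumption)

# Hypothetical display CPMs (USD per 1,000 impressions) by category.
CATEGORY_CPMS = {"insurance": 40.0, "travel": 15.0, "general_info": 2.0}

for category, cpm in CATEGORY_CPMS.items():
    revenue = ad_revenue_per_query(cpm)
    margin = revenue - ASSUMED_INFERENCE_COST
    print(f"{category:12s}  revenue/query ${revenue:.3f}  margin ${margin:+.3f}")
# insurance earns ~$0.120/query (+$0.070); general_info earns ~$0.006 (-$0.044):
# the same inference bill, very different monetization potential.
```

Even with these made-up numbers, the shape of the problem is visible: identical compute, wildly different ad yield by category.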
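And here’s a minimal sketch of the prompt-caching idea mentioned above: if a new query is semantically close enough to one already answered, reuse the stored answer and skip the inference cost entirely. The `embed()` function is a toy bag-of-words stand-in for a real embedding model, `call_llm()` is a placeholder for the expensive inference call, and the similarity threshold is an assumption to be tuned per application.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words embedding; a real system would call an embedding model."""
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def call_llm(query: str) -> str:
    """Stand-in for the expensive inference call."""
    return f"(generated answer for: {query})"

CACHE: list[tuple[np.ndarray, str]] = []  # (query embedding, cached answer)
SIMILARITY_THRESHOLD = 0.90               # assumption; tune per application

def answer(query: str) -> str:
    q = embed(query)
    for vec, cached in CACHE:             # a real system would query a vector DB
        if float(q @ vec) >= SIMILARITY_THRESHOLD:  # cosine; vectors normalized
            return cached                 # cache hit: zero inference cost
    result = call_llm(query)              # cache miss: pay for inference
    CACHE.append((q, result))
    return result

answer("What is my policy deductible?")   # miss: pays for inference
answer("What is my policy deductible?")   # repeat: cache hit, no second LLM call
```

With a real embedding model, paraphrases (not just repeats) would hit the cache, which is exactly why overlapping vertical-market queries make the economics attractive.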
In this evolving landscape, the fundamental shift is from driving users to content to delivering answers directly, or even insights beyond existing content, forcing businesses to rethink engagement, monetization, and customer retention strategies for products and services using AI GPTs.
Cost Structures in the GPT Search Landscape
The cost structures of GPTs diverge significantly from traditional search engines, reshaping how services are priced and delivered. This shift introduces both opportunities and challenges for providers and users alike.
- Compute-Intensive Operations
GPTs require substantial computational power for each query (referred to, in the GPT context, as an inference), making their per-use cost much higher than that of traditional search engines, which rely on pre-indexed databases. This creates a scalable but expensive model, where costs increase directly with usage. The following is repeated from an earlier article: Content Publishers Must Navigate the Rise of GenAI.
“It’s challenging to estimate, but the variable cost of a search result (Google, Microsoft) is likely fractions of a cent, similar for static web pages. Inferencing costs, however, could range from $0.01 to $0.10 as of late 2024, depending on token count and query size. With billions of queries, costs add up quickly. Inferencing costs also depend on how many “tokens” are being processed; that is, the size of the query. (AND the output.) So the variable costs may be challenging to estimate. (See Decoding the True Cost of Generative AI for Your Enterprise and Cost of AI: Inference, and this most excellent article: The Inference Cost Of Search Disruption – Large Language Model Cost Analysis.) This being said, AI companies up and down the stack are focusing on functional economics and costs are expected to drop over time.”
- Traditional Search: As discussed, low cost per query.
- GPT Search: As discussed, higher cost per query (cents to dollars, depending on complexity).
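To illustrate how per-query figures combine, here’s a small sketch in the spirit of the ranges quoted above. The token rates are assumptions for illustration, not any vendor’s actual price list.

```python
# Rough per-query cost comparison. Rates below are illustrative assumptions.

SEARCH_COST_PER_QUERY = 0.002   # fractions of a cent, per the quote above

IN_RATE_PER_1K_TOKENS = 0.003   # assumed cost to process input tokens
OUT_RATE_PER_1K_TOKENS = 0.015  # output tokens are typically pricier

def inference_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost scales with both the size of the query AND the output."""
    return ((input_tokens / 1000) * IN_RATE_PER_1K_TOKENS
            + (output_tokens / 1000) * OUT_RATE_PER_1K_TOKENS)

short = inference_cost(200, 300)    # brief Q&A exchange
long = inference_cost(4000, 1500)   # query with a large context and rich output
print(f"short ${short:.4f}, long ${long:.4f}, search ${SEARCH_COST_PER_QUERY:.4f}")
# short ≈ $0.0051, long ≈ $0.0345 — one to two orders of magnitude above search
```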
- Pay-Per-Use Models
GPT services may adopt a pay-per-query structure, charging based on complexity or length of responses. This aligns with cloud services, reflecting resource consumption rather than flat fees. This model works well for infrequent or specialized use but may deter heavy, ongoing usage due to unknown variable costs. Example: Charging based on tokens processed, with tiers for more extensive queries.
- Subscription-Based Access
To balance cost predictability for users and ensure consistent revenue streams, many GPT providers introduce subscription models with usage caps or unlimited access. This model mimics SaaS pricing, appealing to businesses and power users. Tiers may offer varying degrees of complexity, response speed, or customization, enabling scalability without per-use cost anxiety.
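As a sketch, subscription tiers with usage caps might look something like the following. The tier names, fees, and caps are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    monthly_fee: float
    included_queries: int | None   # None means "unlimited"
    overage_per_query: float = 0.0

TIERS = [
    Tier("free", 0.00, 50),        # free tier throttles at its cap (not billed)
    Tier("pro", 20.00, 2_000, overage_per_query=0.02),
    Tier("team", 200.00, None),    # "unlimited": priced against forecast usage
]

def monthly_bill(tier: Tier, queries_used: int) -> float:
    """Flat fee plus per-query overage beyond the included allotment."""
    if tier.included_queries is None or queries_used <= tier.included_queries:
        return tier.monthly_fee
    overage = queries_used - tier.included_queries
    return tier.monthly_fee + overage * tier.overage_per_query

print(monthly_bill(TIERS[1], 2_500))  # pro with 500 overage queries: 30.0
```

Note how the “unlimited” tier pushes the variable-cost risk back onto the provider, which is why, as noted above, subscriptions have to account for forecast usage.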
- Freemium and Limited Access
Freemium models provide basic, limited access for free, while monetizing premium features. This is akin to how search engines offer free results but monetize through ads and enhanced services. Free queries may come with throttled performance or simplified responses, nudging users toward paid plans.
- Enterprise and Custom Licensing
Larger enterprises often prefer contracts or licensing arrangements that bundle GPT services into broader ecosystems, offering predictable, flat-rate pricing at scale. This mirrors bulk API deals and enhances integration into internal workflows, where GPTs serve as embedded tools rather than standalone services.
- Token-Based Pricing
A granular form of pay-per-use, token-based models charge based on the number of words or characters processed. This offers transparency but may complicate cost prediction. E.g., 1,000 tokens might equate to about 750 words, with pricing tiers scaling accordingly.
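A worked example, using the roughly 750-words-per-1,000-tokens rule of thumb above. The rates and tier breakpoint are assumptions for illustration.

```python
# Illustrative token-based pricing with a simple volume tier.
WORDS_PER_1K_TOKENS = 750   # rule of thumb from the text

def words_to_tokens(word_count: int) -> int:
    return round(word_count * 1000 / WORDS_PER_1K_TOKENS)

def price(tokens: int) -> float:
    """Tiered rate: the first 100k tokens in a billing period cost more
    per 1k tokens than volume above that breakpoint."""
    base_rate, volume_rate, breakpoint = 0.010, 0.006, 100_000
    base = min(tokens, breakpoint) / 1000 * base_rate
    volume = max(tokens - breakpoint, 0) / 1000 * volume_rate
    return base + volume

tokens = words_to_tokens(1_500)         # a ~1,500-word exchange ≈ 2,000 tokens
print(tokens, f"${price(tokens):.4f}")  # 2000 $0.0200
```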
- Hybrid Models (Ads + Paid Access)
Providers may experiment with hybrid models, with free or low-cost access supplemented by ads or sponsored content. This balances cost while retaining elements of traditional search monetization.
Sidebar: Content Costs
There are several popular sources of existing Large Language Models to integrate. LLM foundational models are large-scale neural networks trained on diverse datasets to generate human-like text and perform complex language tasks. But they don’t have everything, and they’re static. They’re starting places for building LLM solutions, and include whatever the current versions are of offerings from OpenAI, Anthropic, Google, Meta, Mistral, and others. (And let’s not forget about IBM and Watson. It’s an older name, but with some serious present-day superpowers.) In the prior list of cost structures, I ignored the cost of content itself. Not to mention extraction and transformations along the machine learning operations (MLOps) path. For a variety of applications that need more data, enhanced content will be its own large cost: to acquire, clean up, and maintain. This is its own large and potentially controversial topic, so it’ll have to be considered elsewhere. Suffice it to say, at a high level there are two content sources: owned and captured. (I.e., maybe I have my own corpus of info (medical data, customer service data, etc.), and then there’s that which might be scraped from elsewhere, whether paid or not.) Let’s also remember there are a few general purposes for such content: basic training to allow for natural language processing at all, possibly fine-tuning for a category, and then additional retrieval of content for currency. Each of these is its own sizable topic, and each will also need to be part of the cost equation.
Sidebar: Retrieval Augmented Generation (RAG) Costs
A foundational large language model is typically trained on some massive dataset. In so doing, we can use it for all manner of queries. But for some specific data, we’re really using this corpus primarily to allow for synthesis of text generation. Custom datasets added to the model through something called fine-tuning can give us more precision and accuracy, yet are expensive up front, and in any case don’t include more current info. Retrieval Augmented Generation (RAG) can add additional information to a query, providing better context. However, when done well, this requires an entirely new data pipeline and corpus, typically in the form of a vector database. (And perhaps multiple sources.) Such queries need to themselves get a good search result from this data source prior to going through the LLM foundational model. In other words, what’s being fed into the LLM will still not be all available information. Even large “token context windows” might not include all pertinent information in a custom database used to enhance queries. However, the larger the amount of RAG data poured into such queries, the more expensive they will be as well. The bottom line is that RAG can be a truly useful method, but setup, maintenance… the whole RAG ops process… is likely a non-trivial expense. A minimal sketch of the retrieval step follows.
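This sketch uses an in-memory list as a stand-in for the vector database and a toy `embed()` function; a production pipeline would use a real embedding model and vector store, which is where much of the RAG-ops expense lives.

```python
import numpy as np

# Minimal RAG loop: embed a corpus into an in-memory "vector database,"
# retrieve the top-k passages for a query, and prepend them to the prompt.

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words embedding; real pipelines use an embedding model."""
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

CORPUS = [
    "Policy doc: claims must be filed within 30 days of the incident.",
    "FAQ: premium payments are due on the first of each month.",
    "Release notes: the portal now supports two-factor authentication.",
]
INDEX = [(embed(doc), doc) for doc in CORPUS]   # the build step has its own cost

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank corpus passages by cosine similarity (vectors are normalized)."""
    q = embed(query)
    scored = sorted(INDEX, key=lambda item: float(q @ item[0]), reverse=True)
    return [doc for _, doc in scored[:k]]

def rag_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    # Every retrieved passage adds input tokens, and therefore inference cost.
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

print(rag_prompt("When are premium payments due?"))
```

Note the two cost centers the prose describes: building and maintaining the index, and the extra input tokens each retrieved passage pours into every query.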
Conclusion
The transformation from traditional search to GPT-driven models marks a significant shift in the economics and strategies of online information retrieval. As businesses navigate the complexities of higher per-query costs and evolving monetization approaches, the focus will increasingly center on balancing operational efficiency with user value. This landscape not only challenges incumbents but also opens doors for new players to redefine how knowledge is accessed and monetized. The new cost structures favor flexibility and scalability but introduce economic friction, prompting innovation in how GPT services are packaged, marketed, and consumed.
And next is the final article in this series: Strategic Responses to GenAI from Search
(Header image courtesy of freepik.com)