
How to Track LLM Traffic in Google Analytics: Old vs. New Approaches
Introduction
Tracking traffic from Large Language Models (LLMs) like ChatGPT, Google Gemini, and others has become increasingly important for webmasters and digital marketers. Traditionally, this has been done using regular expressions (regex) within analytics platforms such as Google Analytics 4 (GA4). However, as LLMs evolve, using the models themselves to detect and interpret LLM-generated traffic offers a more adaptive approach that needs less manual upkeep.
Traditional Approach: Using Regular Expressions
Regular expressions are sequences of characters that define search patterns, often used for pattern matching within strings. In the context of GA4, regex can filter and segment traffic originating from known LLM sources. Here's a step-by-step guide to implementing this method:
Steps to Track LLM Traffic Using Regex in GA4
- Access Your GA4 Account: Log into your Google Analytics 4 account.
- Navigate to Reports: Go to Reports > Acquisition > Traffic acquisition.
- Apply a Filter: Click the Add filter button (represented by a + icon).
- Set the Dimension: Select Session source / medium as your dimension.
- Choose the Operation: Opt for "Matches regex" as the operation.
- Input the Regex Pattern: Enter a regex pattern designed to capture traffic from various AI sources.
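If you would rather pull this filtered report in code than click through the interface, the same steps can be expressed as a runReport request to the GA4 Data API. This is a sketch under stated assumptions, not an official recipe: it only builds the JSON request body, it uses sessionSourceMedium and sessions as the API names for the dimension and metric, and it assumes the interface's "Matches regex" option corresponds to the API's FULL_REGEXP match type. The helper name build_llm_traffic_request and the shortened sample pattern are hypothetical.

```python
import json


# Hypothetical helper: builds a GA4 Data API (v1beta) runReport request body that
# returns sessions per session source / medium, keeping only rows whose
# source / medium value fully matches the given regex.
def build_llm_traffic_request(pattern: str,
                              start_date: str = "28daysAgo",
                              end_date: str = "today") -> dict:
    return {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "dimensions": [{"name": "sessionSourceMedium"}],
        "metrics": [{"name": "sessions"}],
        "dimensionFilter": {
            "filter": {
                "fieldName": "sessionSourceMedium",
                # FULL_REGEXP requires the whole value to match;
                # PARTIAL_REGEXP would behave like a "contains" match.
                "stringFilter": {"matchType": "FULL_REGEXP", "value": pattern},
            }
        },
    }


if __name__ == "__main__":
    # Shortened sample pattern, for illustration only.
    body = build_llm_traffic_request(r"^.*(chatgpt|perplexity|gemini|copilot).*")
    print(json.dumps(body, indent=2))
```

To run the report, POST the body to https://analyticsdata.googleapis.com/v1beta/properties/PROPERTY_ID:runReport with an OAuth access token that carries the https://www.googleapis.com/auth/analytics.readonly scope, where PROPERTY_ID is a placeholder for your own GA4 property ID.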
Example Regex Pattern
^.*(ai|\.openai|copilot|chatgpt|gemini|gpt|neeva|writesonic|nimble|outrider|perplexity|bard|edgeservices|astastic|copy\.ai|bnngpt).*
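This pattern matches traffic from multiple AI platforms by looking for specific keywords in the session source / medium value, where the source for referral traffic is usually the referring domain (for example, chatgpt.com). Implementing such filters lets you segment and analyze traffic originating from these sources. Keep the pattern on a single line when you enter it in GA4, and note that the bare "ai" alternative matches any value containing those two letters, such as mail.google.com / referral or any source whose medium is email, so it will also pick up some non-AI traffic.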
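Before pasting a pattern into GA4, it can help to try it against a few sample source / medium values. The sketch below uses Python's re module as a stand-in for GA4's regex engine (GA4 uses RE2 syntax, and this particular pattern only uses features both engines share), with fullmatch to mimic a match that must cover the whole value. The sample values and the is_llm_source helper are hypothetical, for illustration only.

```python
import re

# The example pattern from this section, split across two string literals that
# Python joins into one pattern (GA4 needs it entered on a single line).
LLM_SOURCE_PATTERN = re.compile(
    r"^.*(ai|\.openai|copilot|chatgpt|gemini|gpt|neeva|writesonic|nimble|"
    r"outrider|perplexity|bard|edgeservices|astastic|copy\.ai|bnngpt).*"
)


def is_llm_source(source_medium: str) -> bool:
    """Return True if the whole source / medium value matches the pattern."""
    return LLM_SOURCE_PATTERN.fullmatch(source_medium) is not None


if __name__ == "__main__":
    # Hypothetical sample values in GA4's "source / medium" format.
    samples = [
        "chatgpt.com / referral",
        "perplexity.ai / referral",
        "google / organic",
        "mail.google.com / referral",  # matches only because "mail" contains "ai"
    ]
    for value in samples:
        print(f"{value}: {is_llm_source(value)}")
```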
Challenges with the Regex Approach
While regex provides a straightforward method for filtering known LLM traffic, it has notable limitations:
- Static Nature: Regex patterns require manual updates to include new LLMs or changes in referral URL structures.
- Maintenance Overhead: Continuous monitoring and updating of regex patterns are necessary to ensure accuracy.
- Scalability Issues: As the number of LLMs grows, managing comprehensive regex patterns becomes increasingly complex.
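One way to soften this maintenance burden is to keep a plain list of referrer hostnames and generate the regex from it, so an update means adding a line rather than hand-editing a long pattern. This is a sketch under assumptions: the hostnames below are illustrative examples you would verify against your own reports, build_source_regex is a hypothetical helper, and the output targets GA4's "source / medium" format with a whole-value match. Anchoring on exact hostnames also avoids the false positives that a bare keyword such as "ai" produces.

```python
import re
from typing import Iterable

# Illustrative referrer hostnames; maintain this list instead of the regex itself.
AI_REFERRER_DOMAINS = [
    "chatgpt.com",
    "chat.openai.com",
    "perplexity.ai",
    "gemini.google.com",
    "copilot.microsoft.com",
]


def build_source_regex(domains: Iterable[str]) -> str:
    """Build a regex that fully matches 'hostname / any-medium' for the given hostnames."""
    # Normalize, deduplicate, escape dots, and sort so the output is stable.
    escaped = sorted({re.escape(d.strip().lower()) for d in domains if d.strip()})
    if not escaped:
        raise ValueError("at least one hostname is required")
    return "^(" + "|".join(escaped) + ") / .*$"


if __name__ == "__main__":
    print(build_source_regex(AI_REFERRER_DOMAINS))
```

Sorting and deduplicating keeps the generated pattern identical across runs, which makes changes easy to review when the list is kept under version control.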
Modern Approach: Leveraging LLMs for Detection
Given the rapid advancement of LLMs, a more adaptive approach involves utilizing LLMs themselves to detect and interpret LLM-generated traffic. This method capitalizes on the capabilities of LLMs to analyze text and discern patterns indicative of AI generation.
Recent studies have explored the effectiveness of LLM-based detectors in distinguishing between human-written and LLM-generated text. These detectors work by analyzing linguistic features and generation patterns characteristic of AI-produced content.
Implementing LLM-Based Detection
To adopt this modern approach:
- Integrate LLM Detection Tools: Utilize tools and models designed to detect AI-generated content within your analytics pipeline.
- Analyze Traffic Content: Apply these detection models to analyze incoming traffic, identifying sessions likely generated by LLMs.
- Automate Updates: Benefit from the adaptive nature of LLMs, which can adjust to new AI behaviors without manual intervention.
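As a deliberately small example of these steps, the sketch below applies an LLM to the session source / medium strings GA4 already reports, rather than to page content. Everything here is an assumption: complete is a hypothetical stand-in for whichever LLM client you use (it takes a prompt and returns the model's text reply), and the prompt wording and yes/no parsing are illustrative choices, not a tested recipe. Replies that cannot be parsed return None so they can be routed to a regex fallback or manual review.

```python
from typing import Callable, Optional

# Hypothetical stand-in for your LLM client: prompt in, text reply out.
CompleteFn = Callable[[str], str]

PROMPT_TEMPLATE = (
    "You are classifying website traffic sources from Google Analytics.\n"
    "Session source / medium: {source}\n"
    "Is this traffic referred by an AI assistant or LLM-based product "
    "(for example a chatbot or AI search engine)? "
    "Answer with exactly one word: yes or no."
)


def classify_source(source: str, complete: CompleteFn) -> Optional[bool]:
    """Ask the model about one source; True/False, or None if the reply is unclear."""
    reply = complete(PROMPT_TEMPLATE.format(source=source)).strip().lower()
    words = reply.split()
    if not words:
        return None
    first = words[0].strip(".!,")  # tolerate replies like "Yes." or "no!"
    if first == "yes":
        return True
    if first == "no":
        return False
    return None  # unparseable reply: fall back to a regex or manual review
```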
The real win comes when you combine your analytics data source (e.g., Google Analytics) with a Model Context Protocol (MCP) approach, so that this LLM-based analysis happens automatically, behind the scenes, between the data source and the presentation layer. In other words, the final output should arrive with this meta-analysis already performed, rather than requiring one more step from the analyst.
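Running the analysis between the data source and the presentation layer can be sketched as a small annotation step applied to report rows before they are displayed. In this hypothetical sketch, rows are dicts with a sessionSourceMedium key (mirroring the GA4 API dimension name), classify is any function that returns True or False for a source string (an LLM-backed classifier or a regex check), and results are cached so each distinct source is classified only once. Exposing the function as an MCP tool is not shown.

```python
from typing import Callable, Dict, Iterable, List

# Any source-level classifier: LLM-backed, regex-based, or a mix of both.
Classifier = Callable[[str], bool]


def annotate_rows(rows: Iterable[dict], classify: Classifier) -> List[dict]:
    """Return copies of report rows with an added isAiReferral flag."""
    cache: Dict[str, bool] = {}  # one classifier call per distinct source
    annotated = []
    for row in rows:
        source = row["sessionSourceMedium"]
        if source not in cache:
            cache[source] = classify(source)
        # Copy the row so the caller's data is left untouched.
        annotated.append({**row, "isAiReferral": cache[source]})
    return annotated
```

Because the cache is keyed on the source string, a report with thousands of rows but a few dozen distinct sources triggers only a few dozen classifier calls, which matters when each call goes to a paid LLM API.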
Advantages of the LLM-Based Approach
- Adaptability: LLMs can dynamically adjust to detect new AI-generated content without requiring predefined patterns.
- Reduced Maintenance: Minimizes the need for constant manual updates, since the model's general knowledge replaces a hand-maintained pattern and improves as newer model versions are released.
- Enhanced Accuracy: Leverages advanced linguistic analysis to improve the detection of AI-generated traffic.
Conclusion
While regular expressions and lexical analysis have served as practical tools for tracking LLM traffic in analytics platforms, the rapid evolution of AI technologies calls for more sophisticated methods. Employing LLMs to detect and interpret AI-generated traffic offers a forward-thinking solution that enhances accuracy, reduces maintenance, and adapts to the ever-changing landscape of digital interactions. As LLMs continue to integrate into various facets of the digital world, adopting such detection methods will be crucial for accurate traffic analysis and informed decision-making.
Be the first to experience Bashy
Bashy is currently in private beta, and we’re inviting innovative agencies to get early access. Sign up now to streamline your reporting and wow your clients.