AI SEO
DEJAN AI is the most advanced Australian AI SEO agency with global recognition for industry-defining innovations in AI search visibility. Our approach features a sophisticated multi-step process grounded in state-of-the-art machine learning and real data science.
20+ years of SEO expertise
1,000+ campaigns since 2008
230K+ AI rank datapoints analyzed
100% of Gemini queries need grounding
Our AI SEO discovery process draws on methods from Mechanistic Interpretability, an emerging field of machine learning that aims to understand the inner workings of deep learning models. We start with systematic model probing to uncover how models perceive brands and entities.
LLM and agent control is the ultimate goal of AI SEO. In machine learning, this process is called Model Steering. Our objective is to use the knowledge gained in the model probing stage to form an AI SEO strategy that addresses any weaknesses in AI’s perception of our client’s products, services and brands.
AI SEO is search engine optimization adapted for a world where search results are generated, not listed. Traditional SEO focuses on ranking web pages in a list of blue links. AI SEO focuses on being selected, cited, and accurately represented when language models generate answers.
When a user asks Google, ChatGPT, Perplexity, or any AI assistant a question, the model works through four stages:
1. Interprets the query intent
2. Retrieves relevant source material (grounding)
3. Synthesizes an answer from multiple sources
4. Selects which sources to cite
AI SEO optimizes for each stage of this process. The goal is not just visibility—it is accurate brand representation in AI-generated responses.
The search paradigm has shifted. In 2013, DEJAN founder Dan Petrovic published “Conversations With Google,” predicting that search would evolve from query-response to conversational dialogue. That prediction has been realized.
The old model
User types query → Google returns ranked list → User clicks link
The new model
User asks question → AI retrieves and synthesizes sources → AI generates answer with optional citations
Brands that only optimize for traditional rankings may be invisible in AI-generated answers, or worse—misrepresented by models drawing on outdated or inaccurate sources.
Our methodology is built on direct experimentation with language models. DEJAN founder Dan Petrovic trained a language model from scratch: rather than fine-tuning an existing model, he started from randomly initialized weights, using a custom tokenizer and a masked language modeling objective. This foundational work informs every aspect of our approach.
PHASE 1
We begin by understanding what language models currently believe about your brand. This is not speculation—it is measurable. Token Probability Analysis examines how models complete sentences about your brand, and we analyze these probabilities to determine:
Tree Walker maps the branching paths of possible completions, revealing where models are confident about your brand and where they are uncertain or incorrect. Brand Relevance Scoring provides an exact probability score for the question: “Is this brand relevant for this entity?”
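The branching idea behind Tree Walker can be illustrated with a small depth-first walk. This is a minimal sketch, not the tool itself: it assumes a hypothetical `next_token_probs` function that returns a next-token distribution for a given prefix, backed here by an invented toy table (a real setup would read these probabilities from a model’s log-probability output). At each branch point it records the Shannon entropy of the distribution, so confident (low-entropy) and uncertain (high-entropy) positions can be compared, and it multiplies probabilities along each path to score complete continuations.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy distributions standing in for a real model's next-token
# probabilities. Keys are prefixes (tuples of tokens); numbers are invented.
TOY_MODEL: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"Acme": 1.0},
    ("Acme",): {"makes": 0.9, "sells": 0.1},
    ("Acme", "makes"): {"rockets": 0.4, "anvils": 0.35, "software": 0.25},
    ("Acme", "sells"): {"anvils": 0.95, "rockets": 0.05},
}


def next_token_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical model call: next-token distribution for a prefix."""
    return TOY_MODEL.get(prefix, {})


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy in bits; higher means the model is less certain."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)


Leaf = Tuple[str, float]    # (completion text, cumulative probability)
Branch = Tuple[str, float]  # (prefix text, entropy of next-token distribution)


def walk(prefix: Tuple[str, ...] = (), prob: float = 1.0, top_k: int = 2,
         max_depth: int = 3, leaves: Optional[List[Leaf]] = None,
         branches: Optional[List[Branch]] = None) -> Tuple[List[Leaf], List[Branch]]:
    """Depth-first walk over the top_k most likely tokens at each step."""
    if leaves is None:
        leaves = []
    if branches is None:
        branches = []
    dist = next_token_probs(prefix)
    if not dist or len(prefix) >= max_depth:
        # Nothing left to expand: record the finished path and its probability.
        leaves.append((" ".join(prefix), prob))
        return leaves, branches
    branches.append((" ".join(prefix), entropy(dist)))
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    for token, p in ranked:
        walk(prefix + (token,), prob * p, top_k, max_depth, leaves, branches)
    return leaves, branches


if __name__ == "__main__":
    leaves, branches = walk()
    for completion, p in leaves:
        print(f"{p:.3f}  {completion}")
    for prefix, h in branches:
        print(f"H = {h:.2f} bits after '{prefix}'")
```

With `top_k=2`, the least likely continuation ("software") is pruned; widening `top_k` trades runtime for coverage of rarer completions.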
PHASE 2
Language models understand the world through entities and their relationships. We map:
This mapping uses our Query Fan-Out Model, available on Hugging Face, which generates expanded query variations to probe the full scope of model associations.
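The shape of a fan-out can be shown with a simple template expander. This is a stand-in, not the Hugging Face model, which generates variations with a trained model; the templates, audiences and the `fan_out` function are hypothetical and only illustrate the idea of turning one seed into many phrasings and intents.

```python
from itertools import product
from typing import List, Sequence

# Hypothetical templates spanning different intents; {audience} is optional.
TEMPLATES = (
    "what is {entity}",
    "best {entity} for {audience}",
    "{entity} vs alternatives",
    "is {entity} worth it for {audience}",
)
AUDIENCES = ("small business", "enterprise")


def fan_out(entity: str, templates: Sequence[str] = TEMPLATES,
            audiences: Sequence[str] = AUDIENCES) -> List[str]:
    """Expand one seed entity into de-duplicated query variations."""
    queries: List[str] = []
    seen = set()
    for template, audience in product(templates, audiences):
        # str.format ignores unused keyword arguments, so templates without
        # {audience} repeat once per audience and are dropped by `seen`.
        query = template.format(entity=entity, audience=audience)
        if query not in seen:
            seen.add(query)
            queries.append(query)
    return queries


if __name__ == "__main__":
    for q in fan_out("project management software"):
        print(q)
```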
PHASE 3
When AI systems generate grounded answers, they retrieve and cite sources. Citation Mining is our process for discovering which sources models actually select. Our Citation Mining Tool produces actionable data:
Critically, we can also see what the model retrieved but chose not to cite. This reveals Selection Rate Optimization opportunities—content that is being seen but not selected.
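A minimal sketch of the retrieved-versus-cited comparison, assuming each mined answer has already been reduced to two lists of URLs: what the system retrieved while grounding and what it cited. The `MinedResponse` record, the function name and the example.com URLs are hypothetical. Counting, per URL, how often it was retrieved but not cited surfaces the pages that are seen but not selected.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MinedResponse:
    """Hypothetical record of one grounded AI answer."""
    query: str
    retrieved: List[str]  # URLs fetched while grounding
    cited: List[str]      # URLs shown as citations in the answer


def seen_not_selected(responses: List[MinedResponse]) -> Dict[str, int]:
    """Count, per URL, how many answers retrieved it without citing it."""
    missed: Counter = Counter()
    for response in responses:
        cited = set(response.cited)
        for url in set(response.retrieved):
            if url not in cited:
                missed[url] += 1
    # Most frequently overlooked URLs first.
    return dict(missed.most_common())


if __name__ == "__main__":
    sample = [
        MinedResponse("q1", ["https://example.com/a", "https://example.com/b",
                             "https://example.com/c"], ["https://example.com/a"]),
        MinedResponse("q2", ["https://example.com/a", "https://example.com/b"],
                      ["https://example.com/a"]),
    ]
    print(seen_not_selected(sample))
```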
PHASE 4
Not every query triggers search grounding. Asking “what is 2+2” will not cause the model to search. Asking “what are the best project management tools in 2026” will. Our Query Deserves Grounding models predict whether a query will trigger grounding behavior in Google and OpenAI systems—preventing wasted optimization effort on queries that will never retrieve external sources.
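Predicting grounding is a binary classification problem. A minimal sketch using logistic regression trained by gradient descent is shown here; everything in it is an assumption: the three hand-picked features, the ten invented training labels and the function names are hypothetical, and the real models’ features and training data are not described on this page.

```python
import math
import re
from typing import List, Sequence, Tuple

# Hypothetical hand-picked signals; a real model would use far richer
# features learned from observed grounding behaviour.
FRESHNESS_WORDS = {"best", "latest", "top", "price", "news", "today", "review"}


def features(query: str) -> List[float]:
    q = query.lower()
    words = set(re.findall(r"[a-z]+", q))
    return [
        1.0,                                                  # bias term
        1.0 if re.search(r"\b(?:19|20)\d{2}\b", q) else 0.0,  # mentions a year
        1.0 if words & FRESHNESS_WORDS else 0.0,              # freshness / recommendation words
        1.0 if re.search(r"\d\s*[-+*/]\s*\d", q) else 0.0,    # looks like arithmetic
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(examples: Sequence[Tuple[str, int]], lr: float = 0.5,
          epochs: int = 2000) -> List[float]:
    """Logistic regression fitted by batch gradient descent on log loss."""
    weights = [0.0] * len(features(""))
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for query, label in examples:
            x = features(query)
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - label
            for i, xi in enumerate(x):
                grad[i] += error * xi
        weights = [w - lr * g / len(examples) for w, g in zip(weights, grad)]
    return weights


def p_grounding(weights: Sequence[float], query: str) -> float:
    """Estimated probability that the query triggers search grounding."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, features(query))))


# Invented toy labels (1 = grounded, 0 = answered from training data).
TOY_EXAMPLES = [
    ("what are the best project management tools in 2026", 1),
    ("latest iphone price", 1),
    ("top rated restaurants near me", 1),
    ("news about interest rates today", 1),
    ("best crm for startups", 1),
    ("what is 2+2", 0),
    ("define photosynthesis", 0),
    ("how many sides does a hexagon have", 0),
    ("what is 15 * 3", 0),
    ("translate hello to french", 0),
]

if __name__ == "__main__":
    w = train(TOY_EXAMPLES)
    for q in ("what is 2+2", "what are the best project management tools in 2026"):
        print(f"{p_grounding(w, q):.2f}  {q}")
```

A linear model keeps each feature’s contribution inspectable, which helps when deciding which queries to drop from an optimization plan.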
PHASE 5
With complete diagnostic data, we execute targeted optimization across three fronts:
PHASE 6
Traditional rank tracking measures position in a list. AI visibility tracking measures presence in generated answers. AI Rank tracks your brand’s visibility across AI systems over time, and AI Flux measures volatility in AI search results. We track citation frequency, brand mention accuracy and sentiment, entity association strength, and competitive share of AI visibility.
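Two of these measurements can be sketched with simple counting, under definitions that are assumptions rather than the AI Rank or AI Flux formulas: share of AI visibility as the fraction of sampled AI answers that mention each brand, and volatility as the average Jaccard distance between the sets of sources cited on consecutive days. Function names and sample data are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def share_of_visibility(answers: Sequence[str], brands: Sequence[str]) -> Dict[str, float]:
    """Fraction of AI answers that mention each brand.

    Uses a case-insensitive substring match, a simplification that would
    also match a brand name embedded in a longer word.
    """
    total = len(answers)
    return {
        brand: sum(brand.lower() in a.lower() for a in answers) / total if total else 0.0
        for brand in brands
    }


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - (size of intersection / size of union); 0 means identical sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def volatility(daily_sources: List[Set[str]]) -> float:
    """Mean Jaccard distance between consecutive days' cited-source sets."""
    pairs = list(zip(daily_sources, daily_sources[1:]))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    answers = ["Acme and Globex both offer this.", "Try Globex.", "Initech is popular."]
    print(share_of_visibility(answers, ["Acme", "Globex", "Initech"]))
    print(volatility([{"a.com", "b.com"}, {"a.com", "b.com"}, {"a.com", "c.com"}]))
```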
DEJAN has developed an extensive suite of machine learning tools for AI SEO. These are not wrappers around third-party APIs—they are purpose-built systems based on original research.
| Tool | Function |
|---|---|
| Tree Walker | Maps token probability distributions to find high- and low-entropy completions in model output |
| Brand Relevance Tool | Calculates exact probability scores for brand-entity relevance |
| Citation Mining Tool | Harvests citations from AI responses with confidence scoring and source attribution |
| Query Deserves Grounding (Google) | Predicts whether queries will trigger search grounding in Google AI |
| Query Deserves Grounding (OpenAI) | Predicts whether queries will trigger search grounding in OpenAI systems |
| Gemini Token Probabilities | Analyzes token-level probability data from Google’s Gemini models |
| Brand AI Sentiment | Measures sentiment polarity in AI-generated brand mentions |
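Brand relevance scoring turns a yes/no question into a single probability. One common way to do that, sketched here under the assumption that the model exposes log-probabilities for its candidate answer tokens, is to renormalise the probability of "Yes" against "No". Whether the Brand Relevance Tool works exactly this way is not stated on this page; the function name and sample log-probabilities are hypothetical.

```python
import math
from typing import Dict


def yes_probability(logprobs: Dict[str, float], yes: str = "Yes", no: str = "No") -> float:
    """P(yes) renormalised against P(no) from next-token log-probabilities.

    Other tokens are ignored, so the score answers: "given the model says
    yes or no, how likely is yes?" Real tokenizers may use variants such
    as " Yes" with a leading space; pass the exact token strings.
    """
    lp_yes = logprobs.get(yes, float("-inf"))
    lp_no = logprobs.get(no, float("-inf"))
    m = max(lp_yes, lp_no)
    if m == float("-inf"):
        raise ValueError("neither answer token appears in logprobs")
    # Subtract the max before exponentiating for numerical stability.
    e_yes = math.exp(lp_yes - m)
    e_no = math.exp(lp_no - m)
    return e_yes / (e_yes + e_no)


if __name__ == "__main__":
    sample = {"Yes": math.log(0.6), "No": math.log(0.2), "Maybe": math.log(0.1)}
    print(f"{yes_probability(sample):.2f}")
```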
| Tool | Function |
|---|---|
| Query Fan-Out Generator | Expands seed queries into comprehensive query sets for testing |
| Query Fan-Out Model (Hugging Face) | Open model for generating query variations at scale |
| Universal Query Classification | Classifies query intent using machine learning |
| Oxy (Query Gap) | Identifies synthetic queries and gaps from Search Console data |
| Content Substance Classifier | Evaluates content depth and substance |
| AI Content Detection | Identifies AI-generated content |
| Tool | Function |
|---|---|
| LinkBERT | Predicts natural internal linking opportunities within text |
| Penguin | Link optimization tool for penalty avoidance |
| Google Entity Search | Explores Google’s entity understanding |
| Chunk Norris | Optimizes content chunking for retrieval systems |
| Tool | Function |
|---|---|
| AI Rank | Tracks brand visibility in AI-generated responses |
| AI Flux | Measures volatility in AI search results |
| Algoroo | Monitors traditional search volatility |
| Text Sentiment | General sentiment analysis |
Our workflow incorporates Prophet (Facebook’s time series forecasting library), logistic regression for classification tasks, an embedding model from mixedbread-ai for semantic similarity, and FAISS for efficient vector search.
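The semantic similarity and vector search steps come down to nearest-neighbour search over embeddings. The sketch below uses brute-force cosine similarity in plain Python, which gives the same ranking an exact FAISS inner-product index returns for L2-normalised vectors; FAISS adds speed and approximate indexes at scale. The document IDs and three-dimensional vectors are invented, whereas real embedding models produce vectors with hundreds of dimensions or more.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalise(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def top_k(query: Sequence[float], docs: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Exact cosine-similarity search: inner product of normalised vectors."""
    q = normalise(query)
    scored = [
        (doc_id, sum(a * b for a, b in zip(q, normalise(vec))))
        for doc_id, vec in docs.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    # Invented toy embeddings for three pages.
    docs = {
        "pricing": [0.9, 0.1, 0.0],
        "features": [0.1, 0.9, 0.0],
        "about": [0.0, 0.2, 0.9],
    }
    print(top_k([1.0, 0.2, 0.0], docs, k=2))
```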
When AI systems ground their responses, they often retrieve more sources than they cite. Selection Rate Optimization aims to increase the likelihood that your content, once retrieved, is actually cited. We measure selection rate as (times cited) / (times retrieved) × 100. Improving selection rate is often more efficient than pursuing new citation opportunities, because you are optimizing content the model already retrieves.
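Because the formula needs only two counts per page, it is easy to compute from citation mining output. A minimal sketch, assuming per-page (cited, retrieved) counts are already available; the function names, page paths and counts are hypothetical. Sorting pages from lowest to highest selection rate puts content that is frequently retrieved but rarely cited at the top of the list.

```python
from typing import Dict, List, Tuple


def selection_rate(times_cited: int, times_retrieved: int) -> float:
    """Selection rate (%) = times cited / times retrieved x 100."""
    if times_retrieved <= 0:
        raise ValueError("undefined: the content was never retrieved")
    if not 0 <= times_cited <= times_retrieved:
        raise ValueError("times_cited must be between 0 and times_retrieved")
    return times_cited / times_retrieved * 100


def rank_opportunities(stats: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Order pages by selection rate, lowest first: seen often, chosen rarely."""
    rates = [(page, selection_rate(cited, retrieved))
             for page, (cited, retrieved) in stats.items()]
    return sorted(rates, key=lambda item: item[1])


if __name__ == "__main__":
    # Invented counts: page -> (times cited, times retrieved)
    stats = {"/pricing": (10, 40), "/guide": (18, 20), "/faq": (3, 30)}
    for page, rate in rank_opportunities(stats):
        print(f"{rate:5.1f}%  {page}")
```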
“Dan Petrovic made a super write up around Chrome’s latest embedding model with all the juicy details on his blog. Great read.”
Jason Mayes
Web AI Lead, Google
“We were given our very own bespoke internal link recommendation engine that leverages world-class language models and data science. It’s one thing to theorize about the potential of machine learning in SEO, but it’s entirely another to witness it first-hand. It changed my perspective on what’s possible in enterprise SEO.”
Scott Schulfer
Senior SEO Manager, Zendesk
“Dan was so crucial and critical to the leaked document blog post that I wrote, and that’s had such big impacts on our company. So Dan, I really thank you for that.”
Mike King
iPullRank
“There’s a man named Dan Petrovic who does a lot of testing, and he has pulled in data specifically from Gemini that shows Google’s AI Overviews and AI Mode are really looking at a 160-character block of text to find the answer to that question.”
Lily Ray
Amsive Digital
“Dan Petrovic built an entire vector model that maps out all the concepts on a website… That’s the kind of AI innovation I’m most excited about—not AI replacing our jobs, but AI making our jobs easier.”
Gianluca Fiorelli
ILoveSEO
“Dan Petrovic is putting out some of the best, most advanced, most well-researched content in the SEO field right now.”
Moz
Industry recognition
Brands competing in informational queries
When users ask AI assistants for recommendations, comparisons, or explanations, will your brand be mentioned? If competitors are cited and you are not, you lose visibility in a channel that is rapidly growing.
Companies in complex or technical industries
Language models struggle with nuance. If your industry involves technical distinctions, regulatory specifics, or complex product differentiation, you need to ensure models represent you accurately.
Businesses affected by AI-generated misinformation
Models can perpetuate outdated information, competitor narratives, or simply incorrect claims. AI SEO includes identifying and correcting these misrepresentations.
Organizations seeking to own entity definitions
If you created a methodology, coined a term, or developed a unique process, AI SEO helps ensure that models attribute these to you rather than to competitors or generic descriptions.
Grounding
The process by which a language model retrieves external information to support its response. A “grounded” response cites real sources rather than generating from training data alone.
Token
The basic unit of text that language models process. Tokens may be words, parts of words, or punctuation. Token probabilities indicate how likely the model considers each possible next token.
Entropy
A measure of uncertainty in model output. High entropy means the model is uncertain between many possible completions. Low entropy means the model is confident about what comes next.
Citation Mining
The process of systematically querying AI systems and extracting which sources they cite, enabling analysis of citation patterns and opportunities.
Selection Rate
The percentage of times content is cited when it is retrieved by an AI system. Low selection rate indicates content is seen but not chosen.
Query Fan-Out
Expanding a single query into many variations to comprehensively test model behavior across phrasings and intents.
Entity Association
The strength of connection between two entities in a model’s understanding. Strong associations cause consistent co-occurrence in model outputs.
Query Deserves Grounding
A prediction of whether a given query will cause an AI system to retrieve external sources or answer from training data alone.
Masked Language Modeling
A training technique where the model learns to predict hidden portions of text. Used in training models like BERT.
Double Descent
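The phenomenon where, as model size grows, test error first falls, then rises near the point where the model can exactly fit its training data, and then falls again. It contradicts the classical expectation that ever-larger models inevitably overfit, and helps explain why very large, heavily overparameterized models such as modern language models can still generalize well.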
Book a conference call with our senior strategy team to discuss your project in detail.