ChatGPT FAILED? 💥 Which is the BEST AI to Pass Detection?

Caleb Ulku 8:18
Transcript
0:00
0:00 Do you think your AI written content is undetectable?
0:03 ZeroGPT claims an accuracy rate of 98% for detecting AI written content.
0:08 Other AI detectors have similar claims.
0:10 Now, if you're using AI to create content for SEO or other purposes,
0:14 and let's face it, you really should be if you're not,
0:17 you're walking a tightrope.
0:19 On the one side, there's the efficiency and scalability that AI generated content offers.
0:24 On the other, there's the increasing risk of your content being flagged and penalized
0:28 by platforms and search engines, especially as those detectors spend more and more resources
0:33 to improve their AI content detection. So the real challenge now isn't just about creating
0:39 content with AI. It's about creating content that's generally helpful to users, valuable for SEO,
0:45 and capable of passing increasingly stringent AI detection tests. In fact, just a few months ago,
0:51 Google rolled out a helpful content update. But here's the thing. Not all AI writing tools are
0:57 created equal. Some are much better at producing content that not only evades detection, but also
1:03 truly engages human readers and satisfies search engine algorithms. And that is exactly what we're
1:08 going to explore today. I'm going to do a deep dive into the three leading AI writing assistants,
1:14 Claude 3.5, ChatGPT, and Perplexity. But we're not just going to look at their ability to generate
1:21 content. We know all three of them can crank out content when asked. We're going to also examine
1:26 how well they can create material that's valuable to your audience, optimized for search engines,
1:31 and capable of passing even the most stringent AI detection tests. Remember, the goal here isn't to
1:37 deceive. Our goal is to use AI as a tool to create generally helpful, high-quality content more
1:43 efficiently. And to do that, we need to understand which tools are up to the task. So in the next few
1:48 minutes, I'm going to compare these three AI tools across four different factors. Number one, quality
1:54 and helpfulness of the content.
1:56 This will be judged by professional copywriters.
1:59 Number two, ability to pass AI detection. Number three, how SEO-optimized the content is for Google search. And number four, how effectively they follow specific character or word limits
2:11 that we give them in the prompt. By the end of this video, you'll know exactly which AI tool to
2:14 use to keep your SEO strategies effective and undetectable. So to get started, we gave all three
2:20 models the exact same single prompt, with no edits requested after the content generation finished.
2:26 Let me show you right here. This is the exact prompt. I'm going to give you a link in the
2:31 description so you can go and download this prompt there. You don't need to worry about
2:34 pausing the video or copying it or anything like that. At my agency, our typical content
2:39 generation is a three-step process with three individual prompts. But for here, to keep
2:44 everything equal as possible, I did one prompt for all three of them. So let's start with how
2:50 they did on content quality and helpfulness. So the way we evaluated content quality is we asked
2:56 my professional copywriters to read the content written by each of the three tools and then
3:01 evaluate it on a scale of one to five for how helpful, engaging, and informative that content
3:07 was. Now, honestly, none of the three articles were particularly in-depth or informative. They're
3:14 all targeting the keyword Plumber Houston, but the copywriters closely agreed on the scores, with
3:20 Claude scoring four to five, Perplexity at three and a half, and ChatGPT coming in the worst at a
3:25 three out of five. The ChatGPT content was very generic, very vague, and had a lot of fluff in it,
3:32 even though in the prompt, we very specifically told the tool to generate concise and minimize
3:38 fluff. The second area that we looked at was AI detector tests. And in this case, for all three of
3:44 them, we used ZeroGPT, which is a free tool for AI content detection. So this score here, 97%,
3:52 this was ChatGPT.
3:54 Perplexity scored 55% and Claude came in at 15.4%.
4:00 Here's a summary of those three scores.
4:02 Again, Claude was significantly better at evading AI detection. I ran this same test across several other AI detection tools with very similar results. In general, Claude was not seen as AI-
4:15 written by most of the tools, and ChatGPT very clearly had a lot of
4:20 flags for being AI written. The actual articles, you can see them here. This is
4:25 the Perplexity article, if you'd like to read it yourself.
4:29 This here is the Claude article that it wrote.
4:36 And there's clearly some formatting issues that we would want to improve.
4:51 So the next metric that we evaluated these three articles for was how well optimized
4:56 they are for SEO.
4:58 To decide this, I used a tool called Page Optimizer Pro, which is a tool that we use
5:02 all the time at my agency.
5:04 What Page Optimizer Pro does is it analyzes the content that's ranking well on Google
and looks at LSI keywords, keyword usage, and patterns to evaluate what Google's algorithm
5:16 is looking for when it is reading content.
5:20 And we compared the current top-ranking results for Plumber Houston to the three articles
5:25 that we wrote.
5:26 Now, one of the key features of Page Optimizer Pro is it recommends a target length, but
5:31 we did 1,000 words, which is well below the recommended 3,000 words that Page Optimizer
5:36 Pro had.
5:37 So all three of the scores ended up being fairly low because the content just wasn't as long
5:42 as what Page Optimizer Pro wanted.
5:45 But let me compare them anyway.
5:47 So the overall optimization score is on a scale of 100.
Claude scored a 67.
5:52 Perplexity, very, very similar, scored a 66.
5:55 and ChatGPT was significantly lower at 59.
5:58 So we're seeing a consistent theme here.
6:01 ChatGPT had the lowest in optimization score,
6:04 it had the lowest in the content quality score,
6:06 and it had the worst AI detection score. The last test was a check on the word count. The prompt requested 1,000 words of content, so we decided to see how close they actually come to writing a thousand words. I also asked all three tools to
6:22 generate title tags for that article. I asked them to generate 15 different
6:27 title tag suggestions and gave them some guidelines for what typically performs
6:31 well in terms of improving click-through rate for title tags. Historically,
6:36 ChatGPT has really struggled with obeying character limits, so I wanted to see how it did when given a 60-character limit and how Claude and Perplexity did with the same.
6:45 So starting with the word count, we see Claude wrote 1,065 words, Perplexity wrote 1,010, very close to being the same figure, and ChatGPT wrote an extra 400 words.
6:57 Not awful, but certainly did not obey the requested length that I gave it.
7:02 So taking a look at the title tags, for all of them, I gave it a limit of 60 characters.
7:08 I told it what I'm trying to optimize for, Plumber Houston, and asked for 15 different ideas.
7:14 So for perplexity, it gave 15 ideas, and these are all within 60 characters.
7:20 Claude gave 15 ideas, all within 60 characters.
7:23 And ChatGPT gave 15 ideas, all within 60 characters.
7:27 So in terms of the title tag challenge of writing a very specific number of characters,
7:33 all three models did a good job.
7:35 The overall result is that Claude significantly outperformed ChatGPT.
7:40 Perplexity was very close, but I still have to give Claude the edge.
7:43 Now, of course, once your content is generated, the next step is making sure that Google indexes it.
7:49 Check out this video where I show you exactly how to get AI-generated content indexed in two minutes or less.
7:54 And I tell you, after going through this comparison for all these three major tools, we are changing the content generation AI that we use.
8:04 Right now, historically, I've been using ChatGPT at my agency.
8:09 After seeing what Claude 3.5 Sonnet is capable of, going forward, we're going to be using Claude for all of our content generation.

Caleb Ulku compares three AI writing tools — Claude 3.5, ChatGPT, and Perplexity — across four criteria: content quality, AI detection evasion, SEO optimization, and adherence to word/character limits. Using a single standardized prompt targeting the keyword 'Plumber Houston,' he finds Claude 3.5 consistently outperforms the others: it scored highest on content quality (4-5/5), was nearly undetectable by ZeroGPT (15.4% AI score vs. ChatGPT's 97%), achieved the best SEO optimization score (67/100), and most accurately followed the 1,000-word limit. ChatGPT ranked last in every category, producing generic, fluffy content that was easily flagged as AI-written. As a result, Ulku announces his agency is switching from ChatGPT to Claude 3.5 for all content generation.

Topics: AI Writing Tool Comparison · AI Content Detection & Evasion · SEO Content Optimization · Prompt Instruction Compliance · AI-Assisted Content Strategy for Agencies
  • Claude 3.5 Sonnet is the best AI tool for creating content that evades AI detection (only 15.4% flagged by ZeroGPT), compared to ChatGPT which scored 97% — making Claude far safer for SEO content workflows.
  • ChatGPT consistently underperforms across content quality, SEO optimization, and instruction-following (word count), even when explicitly told to be concise and avoid fluff — don't rely on it for production SEO content.
  • For SEO-optimized content, use a tool like Page Optimizer Pro to benchmark your AI-generated articles against top-ranking competitors and identify keyword and length gaps before publishing.
  • All three tools (Claude, Perplexity, ChatGPT) successfully adhered to a strict 60-character title tag limit when generating 15 suggestions — character-constrained tasks are reliable across all three models.
  • A single prompt was used here to keep the comparison equal across tools; for production content, the presenter's agency typically uses a multi-step (3-prompt) process.
Q&A (15 questions)
Which AI writing tool is best at evading AI detection according to this comparison?

Claude 3.5 significantly outperformed the other tools at evading AI detection. When tested with ZeroGPT, Claude scored only 15.4% (meaning it was largely not detected as AI-written), Perplexity scored 55%, and ChatGPT scored 97% — meaning ChatGPT was almost entirely flagged as AI-generated content. The same pattern held across several other AI detection tools tested.

How did Claude, ChatGPT, and Perplexity compare in content quality and helpfulness?

Professional copywriters evaluated each tool's content on a scale of 1 to 5 for helpfulness, engagement, and informativeness. Claude scored the highest at 4 to 5 out of 5, Perplexity came in at 3.5, and ChatGPT scored the lowest at 3 out of 5. The ChatGPT content was described as very generic, vague, and full of fluff — even though the prompt specifically instructed it to be concise and minimize fluff.

What were the SEO optimization scores for Claude, Perplexity, and ChatGPT?

Using Page Optimizer Pro, which analyzes top-ranking Google content for LSI keywords and keyword usage patterns, the three tools scored as follows out of 100: Claude scored 67, Perplexity scored 66 (very close to Claude), and ChatGPT scored significantly lower at 59. All scores were relatively low because the articles were written at 1,000 words, well below the 3,000-word target recommended by Page Optimizer Pro.

How accurately did Claude, Perplexity, and ChatGPT follow the requested 1,000-word count limit?

Claude wrote 1,065 words, Perplexity wrote 1,010 words — both very close to the requested 1,000-word target. ChatGPT wrote approximately 1,400 words, exceeding the limit by about 400 words. While not drastically off, ChatGPT clearly did not obey the requested length.

How well did Claude, ChatGPT, and Perplexity follow a 60-character title tag limit?

All three tools performed equally well on the title tag character limit test. Each was asked to generate 15 title tag suggestions optimized for 'Plumber Houston' within a 60-character limit. Perplexity, Claude, and ChatGPT all produced 15 ideas that stayed within the 60-character limit. This was the one area where all three models performed comparably.
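If you want to verify this kind of constraint yourself, the check is straightforward. A minimal sketch, assuming a hypothetical list of title-tag suggestions (the titles below are illustrative, not the ones generated in the video):

```python
# Verify that each title-tag suggestion fits within a character limit,
# as in the 60-character test described above.

# Hypothetical suggestions for the keyword "Plumber Houston"
suggestions = [
    "Plumber Houston | 24/7 Emergency Repairs",
    "Top-Rated Plumber in Houston - Free Quotes",
    "Houston Plumbing Services You Can Trust",
]

LIMIT = 60  # Google typically truncates title links around this length


def check_title_tags(titles, limit=LIMIT):
    """Return (title, length, within_limit) for each suggestion."""
    return [(t, len(t), len(t) <= limit) for t in titles]


for title, length, ok in check_title_tags(suggestions):
    print(f"{length:>2} chars {'OK  ' if ok else 'OVER'} {title}")
```

Counting raw characters with `len()` is a rough proxy; in practice Google truncates title links by rendered pixel width, so a character limit like 60 is a heuristic rather than a hard rule.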

What four factors were used to compare Claude, ChatGPT, and Perplexity in this test?

The four factors used to compare the three AI writing tools were: (1) Quality and helpfulness of the content, judged by professional copywriters on a scale of 1 to 5; (2) Ability to pass AI detection tests, using ZeroGPT and other tools; (3) How well SEO-optimized the content is for Google search, evaluated using Page Optimizer Pro; and (4) How effectively each tool follows specific character or word limits given in the prompt.

What is ZeroGPT and what accuracy rate does it claim for detecting AI content?

ZeroGPT is a free AI content detection tool that claims an accuracy rate of 98% for detecting AI-written content. In this comparison, it was used to test all three AI writing tools — Claude 3.5, ChatGPT, and Perplexity — to see how detectable their generated content was.

What is Page Optimizer Pro and how was it used in this AI writing tool comparison?

Page Optimizer Pro is an SEO tool that analyzes content currently ranking well on Google, examining LSI keywords, keyword usage, and patterns to evaluate what Google's algorithm looks for when reading content. In this comparison, it was used to score the SEO optimization of articles generated by Claude, Perplexity, and ChatGPT (all targeting 'Plumber Houston') against the top-ranking results for that keyword. Scores are given on a scale of 100, and the tool also recommends a target content length — in this case, 3,000 words.

Which AI tool did the agency switch to for content generation after this comparison, and why?

After the comparison, the agency decided to switch from ChatGPT to Claude 3.5 Sonnet for all content generation. Claude outperformed ChatGPT across every metric tested: it produced higher-quality, more engaging content (rated 4–5 vs. 3 out of 5), scored far better on AI detection (15.4% vs. 97% detected), achieved a higher SEO optimization score (67 vs. 59), and more accurately followed the requested word count. The presenter stated that after seeing Claude 3.5 Sonnet's capabilities, the agency would use Claude going forward.

Why is it important for AI-generated content to pass AI detection tests?

As AI detectors become more sophisticated and platforms and search engines invest more resources into AI content detection, there is an increasing risk that AI-generated content will be flagged and penalized. This is especially relevant for SEO, where being penalized by Google can significantly hurt a website's rankings and visibility. The challenge is not just creating content with AI, but creating content that is genuinely helpful to users, valuable for SEO, and capable of passing stringent AI detection tests.

What was the overall winner of the Claude vs. ChatGPT vs. Perplexity comparison for SEO content generation?

Claude 3.5 was the overall winner of the comparison. It outperformed ChatGPT across all four metrics — content quality (4–5/5), AI detection evasion (only 15.4% detected), SEO optimization score (67/100), and word count accuracy (1,065 words vs. the requested 1,000). Perplexity was a close second, performing similarly to Claude on most metrics, but Claude still had the edge. ChatGPT consistently performed the worst across all categories.

What was the testing methodology used to compare the three AI writing tools?

All three AI tools — Claude 3.5, ChatGPT, and Perplexity — were given the exact same single prompt with no edits requested. The prompt asked for 1,000 words of content targeting the keyword 'Plumber Houston,' along with 15 title tag suggestions within a 60-character limit. The presenter noted that at his agency, content generation typically involves a three-step process with three individual prompts, but a single prompt was used here to keep the comparison as equal as possible. The resulting content was then evaluated on four metrics: content quality, AI detection, SEO optimization, and adherence to word/character limits.

What were ChatGPT's main weaknesses in this AI writing tool comparison?

ChatGPT performed the worst across nearly every metric tested. Its content was rated 3 out of 5 by professional copywriters — described as very generic, vague, and full of fluff, despite the prompt explicitly requesting concise content with minimal fluff. It scored 97% on ZeroGPT's AI detection test, meaning it was almost entirely flagged as AI-written. Its SEO optimization score was 59 out of 100, the lowest of the three tools. It also failed to follow the requested 1,000-word limit, producing about 1,400 words instead. The only area where it matched the others was in generating title tags within the 60-character limit.

Should you use AI to create SEO content, and what is the real challenge in doing so?

According to the video, you should absolutely be using AI to create SEO content if you aren't already, due to the efficiency and scalability it offers. However, the real challenge is not simply generating content with AI — it's creating content that is genuinely helpful to users, valuable for SEO, and capable of passing increasingly stringent AI detection tests. The goal should not be to deceive, but to use AI as a tool to create high-quality content more efficiently. Not all AI writing tools are equally capable of meeting these combined requirements.

How did Perplexity perform overall compared to Claude and ChatGPT?

Perplexity performed solidly in the middle — better than ChatGPT but slightly behind Claude in most metrics. It scored 3.5 out of 5 for content quality, 55% on ZeroGPT's AI detection test (meaning it was partially flagged as AI-written), 66 out of 100 for SEO optimization (very close to Claude's 67), and wrote 1,010 words — very close to the requested 1,000. It also successfully generated 15 title tags within the 60-character limit. While Perplexity was competitive with Claude, Claude still held the overall edge.