How to Fix the “Crawled – Currently Not Indexed” Issue

A technical SEO has published a case study on how he solved an unusual “Crawled – Currently Not Indexed” problem on his website.

The solution he found may not apply to everyone experiencing this problem, but the way he identified and solved it is a good model for troubleshooting technical SEO problems.

What happened to the indexing of his site was really strange, but the solution was simple and makes sense.

What is Crawled – Currently Not Indexed?

There are many reports of pages showing as “Crawled – Currently Not Indexed” on Facebook, Twitter, and even in John Mueller’s office-hours hangouts.

In a recent office-hours hangout, someone asked why Google Search Console (GSC) reports a page as “Crawled – Currently Not Indexed” when, on inspection, the page appears to be indexed. John Mueller replied that it was just a delay between reports.

In another office-hours hangout, John Mueller pointed out that it’s perfectly normal for a website to have pages that are not indexed.

“…if you have a smaller site and you’re seeing a significant part of your pages are not being indexed, then I would take a step back and try to reconsider the overall quality of the website and not focus so much on technical issues for those pages.

The other thing to keep in mind with regards to indexing, is it’s completely normal that we don’t index everything off of the website.

And over time, when you get to like 200 pages on your website and we index 180 of them, then that percentage gets a little bit smaller.”

Both are good explanations for why some people see the crawled-but-not-indexed issue, but neither is what Adam Gent discovered.

Adam Gent has discovered a completely different issue.

This appeared to be a problem with Google’s own algorithm.

There was nothing wrong with the site itself.

The problem was with Google indexing.

Why Does Crawled – Currently Not Indexed Occur?

Adam reviewed the GSC Index Coverage Report and found that Google was crawling the feed and indexing it as if it were an HTML page.

He took random phrases from those pages and ran site: searches for them, and found that the content of the feed pages was indexed.
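
A phrase check like this can be sketched in Python. This is a minimal sketch, assuming the check is an exact-phrase search restricted to the site with Google's site: operator. The function names and the example.com URL are hypothetical and do not come from the case study. The code only builds the search URL to open in a browser; it does not query Google.

```python
import random
from typing import Optional
from urllib.parse import quote_plus, urlparse


def pick_phrase(text: str, length: int = 8, rng: Optional[random.Random] = None) -> str:
    """Return `length` consecutive words from `text`, starting at a random offset."""
    rng = rng or random.Random()
    words = text.split()
    if len(words) <= length:
        # Text is shorter than the requested phrase, so use all of it.
        return " ".join(words)
    start = rng.randrange(len(words) - length + 1)
    return " ".join(words[start:start + length])


def site_search_url(page_url: str, phrase: str) -> str:
    """Build a Google search URL for an exact phrase, restricted to the page's host."""
    host = urlparse(page_url).netloc
    query = f'site:{host} "{phrase}"'
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    # Hypothetical page text and URL, for illustration only.
    body = "Paste the visible text of the page that is reported as not indexed here"
    print(site_search_url("https://example.com/some-post/", pick_phrase(body, length=6)))
```

If the top result for the phrase is a feed URL rather than the page itself, that suggests the feed copy is the one in the index.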

To make matters worse, Google seems to have canonicalized the RSS feed over the actual web page, which would explain why the actual web page was crawled but not indexed.

WordPress-generated feed page

The strange thing is that when you look at the feed page, it appears as a web page rather than as the XML file you would normally expect.

Screenshot of a cached RSS page

I may be wrong, but it doesn’t look like a regular RSS feed. It looks like an HTML page.

The code behind it is actually XML, but this isn’t what most feeds usually look like.
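
Because appearance can mislead, one way to tell what a URL actually serves is to look at the response's Content-Type header and the root element of the body. The following is a minimal sketch under that assumption. The function name and the three labels it returns are hypothetical, and it is not the method used in the case study.

```python
import xml.etree.ElementTree as ET

# Media types that specifically announce a feed.
FEED_MEDIA_TYPES = {"application/rss+xml", "application/atom+xml"}
# Root elements used by RSS 2.0, Atom, and RSS 1.0 (rdf:RDF) respectively.
FEED_ROOT_TAGS = {"rss", "feed", "rdf"}


def classify_document(content_type: str, body: str) -> str:
    """Return 'feed', 'html', or 'unknown' for a fetched document."""
    media_type = content_type.split(";", 1)[0].strip().lower()

    # Prefer the markup itself: a well-formed XML body reveals its root element.
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        root = None  # Typical HTML is not well-formed XML.

    if root is not None:
        tag = root.tag.rsplit("}", 1)[-1].lower()  # Drop any "{namespace}" prefix.
        if tag in FEED_ROOT_TAGS:
            return "feed"
        if tag == "html":
            return "html"

    # Fall back to what the server declares.
    if media_type in FEED_MEDIA_TYPES:
        return "feed"
    if media_type == "text/html":
        return "html"
    return "unknown"
```

Checking the root element first means a feed is still reported as a feed even when the server labels it text/html.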

Could that have influenced why Google chose to canonicalize the feed?

It’s hard to understand how this happens.

That’s because under normal circumstances, there are so many signals, such as internal links, that cause Google to prefer legitimate HTML pages.

How Adam Fixed the Issue

Once he knew what had happened, Adam removed the WordPress-generated feed pages and submitted the feed URLs so that Google would recrawl them and find the 404 responses.

Once those pages had dropped out of the index, submitting the correct URLs to Google fixed the issue within a few days.
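
Before resubmitting anything, it can help to confirm that the removed feed URLs really do return 404. The sketch below assumes the feeds live at the post URL plus /feed/, which is WordPress's default pattern when pretty permalinks are enabled; a given site may differ. It also treats 410 as removed, which is an added assumption since the case study mentions only 404. All function names and example.com URLs are hypothetical.

```python
import urllib.error
import urllib.request


def feed_url_for(post_url: str) -> str:
    """Assumed WordPress pattern: the per-post feed is the permalink plus 'feed/'."""
    return post_url.rstrip("/") + "/feed/"


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for a HEAD request (redirects are followed)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the code is on the exception.
        return error.code


def feeds_still_live(post_urls, status_of=http_status):
    """Return the feed URLs that do not answer 404 or 410 (assumed 'removed' codes)."""
    live = []
    for post_url in post_urls:
        url = feed_url_for(post_url)
        if status_of(url) not in (404, 410):
            live.append(url)
    return live


if __name__ == "__main__":
    # Hypothetical post URLs, for illustration only.
    for url in feeds_still_live(["https://example.com/some-post/"]):
        print("still reachable:", url)
```

The status_of parameter exists so the check can be exercised with a stub instead of real network requests.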

What Caused the Issue?

Adam wrote that the problem seems to be on the Google side.

When I asked, someone said that Google seemed to have started indexing feeds a few years ago, but they thought the issue had been fixed.

I’m not an XML expert, but it’s unusual for a feed to look like an HTML page instead of the usual XML layout that is displayed without HTML styles.

The feed doesn’t look like a typical feed, so its appearance may be the root cause.

In any case, if you’re troubleshooting pages that are crawled but not currently indexed, this is one more thing to check.

Source: A Curious Case of Canonicalization
