In earlier posts (starting here) I explained how I changed my domain structure and used Google Search Console to try getting my photos indexed. And in a recent post I said I thought I was seeing progress. Well, no surprise here – but things aren’t that simple.
After changing the domains, submitting sitemaps and requesting indexing, I waited, and after a couple of weeks my number of indexed pages started to increase. That number changed every few days, peaked at 225, and then… started declining.
In the details of GSC’s Coverage Report, I see a crazy mashup of old and new URLs, in both the Valid and Excluded categories. The old URLs are from my old domain structure and no longer work, but Google hasn’t purged them yet. The new, current URLs have suffered various fates.
Among the “Valid” pages, the 62 listed as “Submitted and indexed” are in fact current gallery pages and some of my individual photos – but not many, out of over 600. Under “Indexed, not submitted in sitemap” are 122 pages – some current blog posts, but many obsolete photo URLs which should ideally go away. And that’s the good news.
Under “Excluded”, there are 474 pages listed as “Discovered, not currently indexed”: Google knows about them, but hasn’t decided if they’re worthy of indexing. They all seem to be current pages for photos I’d like to have indexed, but they’re in limbo for unknown reasons.
Also under “Excluded” are 208 “Crawled, currently not indexed” pages which are a mashup of old and new URLs; seeing current image pages here is discouraging because it means Google looked at them and said “no thanks”.
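To put those numbers side by side, here is a small Python sketch that adds up the four Coverage Report categories quoted above. The counts are copied from this post; the dictionary layout and the `summarize` helper are just my own illustration, not anything GSC provides, and the total only covers the categories I mentioned, so treat the percentages as rough.

```python
# Counts copied from the Coverage Report figures quoted in this post.
# Only the four categories discussed above are included (an assumption:
# the real report may list other categories as well).
coverage = {
    "Valid": {
        "Submitted and indexed": 62,
        "Indexed, not submitted in sitemap": 122,
    },
    "Excluded": {
        "Discovered, not currently indexed": 474,
        "Crawled, currently not indexed": 208,
    },
}


def summarize(report):
    """Return per-status totals, the overall URL count, and percentage shares."""
    totals = {status: sum(reasons.values()) for status, reasons in report.items()}
    known = sum(totals.values())
    shares = {status: round(100 * n / known, 1) for status, n in totals.items()}
    return totals, known, shares


totals, known, shares = summarize(coverage)
for status, n in totals.items():
    print(f"{status}: {n} pages ({shares[status]}% of {known} known URLs)")
```

By this tally, 184 URLs are in the Valid group and 682 are Excluded, so only about a fifth of the URLs Google knows about are actually indexed, and some of those are obsolete.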
So what’s going on? Obviously Google doesn’t index a whole site, or purge an obsolete version, all at once. Most likely, several independent processes are doing different things over time: one looks for new pages, another evaluates their content and decides if they should be indexed, a third checks for dead pages and de-indexes them, and so on. As this happens, pages move from one category to another, maybe spending some time in two at once. In theory this should all evolve steadily towards correct indexing of valid content. In practice, things grind on for an unknown amount of time, and it’s not possible to predict how your site will end up.
I recently found this page which has quite a bit of interesting information on Google’s indexing, along with some typically vague and evasive answers from a Google spokesman (e.g. “just focus on making your site awesome”). The bottom line seems to be that Google doesn’t index an entire site, but won’t say exactly why, although it’s clear that they’re looking for a lot of text content on a page, and that repetitious stuff like auto-generated catalog pages is skipped. It may be that many individual photo pages on SmugMug look too ‘thin’ to interest Google. If so, adding to the descriptions might help – or might be a big waste of time.
And do these GSC reports reflect reality? That’s a very good question, and the answer seems to be “no, at least not yet”. If I search Google for “site:jimhphoto.com” or “site:gallery.jimhphoto.com” I see only a handful of pages, some new, some old and invalid. So who knows.
As of right now, my count of indexed pages has bounced up from a low of 180 back to 186. I’ll keep watching and experimenting, and hope to learn more over time.
UPDATE: My count of indexed pages eventually climbed to a high of about 220 – then crashed. I explain what happened in this post.