Sitemaps to Nowhere

I’ve related (starting here) my long struggle to get my photo work indexed by Google – an uphill battle. But recently, I tangled with sitemaps and eventually made a breakthrough.

I lost a big chunk of indexing in April, when Google switched to “mobile first” page evaluation. Eventually the bleeding stopped; some of my blog was indexed, but hardly anything in my gallery subdomain. A few more pages showed up in the following weeks, then progress stalled.

I’d submitted sitemaps for the blog site and the gallery subdomain, but they apparently never worked. Google Search Console showed weird and contradictory results: “success” in reading the sitemap index file, but “unable to fetch” for one of the 4 referenced maps, nothing for the other 3, and a big fat 0 for discovered links. The “last read” date was stuck 6 months in the past. In reality, the sitemap files were accessible and correct, but the only coverage reported was under “indexed, not submitted in sitemap”. It didn’t make any sense.
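
For what it’s worth, verifying that yourself is easy. Here’s a minimal Python sketch of that kind of check (the index URL is a placeholder, not my real one): fetch the index, fetch each child map it lists, and count their <loc> entries. If this runs cleanly against your own files, the problem isn’t on your end.

    # Verify that a sitemap index and its child sitemaps are reachable and
    # parseable, and count the URLs each child map declares.
    import urllib.request
    import xml.etree.ElementTree as ET

    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    def fetch_xml(url):
        with urllib.request.urlopen(url, timeout=30) as resp:
            return ET.fromstring(resp.read())

    index_url = "https://example.com/sitemap_index.xml"  # placeholder URL
    index = fetch_xml(index_url)

    # A sitemap index lists its child maps under <sitemap><loc>.
    for loc in index.findall("sm:sitemap/sm:loc", NS):
        child_url = loc.text.strip()
        child = fetch_xml(child_url)
        urls = child.findall("sm:url/sm:loc", NS)
        print(f"{child_url}: {len(urls)} URLs")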

I tried re-submitting the sitemaps – that bumped up the “submitted” date, but “last read” never changed. Nothing more got indexed.

I posted all this in Google’s “Search Console Community” forum and got some replies, basically offering these answers:

  • Don’t worry about what Search Console says; things are probably fine. Just keep waiting.
  • You don’t need sitemaps anyway, Google will find your pages without them.
  • Google never indexes an entire site no matter what. Quit complaining.

It was actually sort of weird: Gold- and Platinum-level “Product Experts” wouldn’t even acknowledge the obvious errors and contradictions in what Search Console was reporting. After a few days of being blown off by these Google toadies, I decided to figure it out myself.

I tried completely removing my sitemaps from GSC and resubmitting them. Immediately, all the bad data reappeared, including the “last read” date 6 months in the past. So obviously, these results were cached. I then cleared the map entries again and waited several days before resubmitting. No luck; the same old junk was restored. Whatever had gone wrong the first time Google tried to read those maps, Google was never going to forget it.

My blog’s sitemap was automatically created by an SEO plugin; I couldn’t affect its content or change its file name. So I installed a different SEO plugin, and submitted the new and different sitemaps. Bingo.

The bad results were gone; within a couple of days, GSC reported success in reading the new map, and showed hundreds of “Discovered” URLs from the sitemaps. “Discovered, not currently indexed” is good; URLs in that category have a chance. “Crawled, not currently indexed” is the discard pile; Google looked at those pages and didn’t like them.

And even though I’d only replaced the sitemap for the blog site, things improved for the SmugMug gallery subdomain too, with 682 new URLs now in the queue for possible indexing, and a handful already indexed. Apparently Google was now looking at the site fresh, using the new sitemaps.

My indexed count initially dropped again, by a few pages, but soon started moving up – see that bump near the end of the Coverage graph, at the start of this post? I am now – as they say – cautiously optimistic.

UPDATE: Google now seems a bit confused.

9 Replies to “Sitemaps to Nowhere”

  1. Howdy! I found your site in a Google search about SmugMug SEO because I seem to be having the exact opposite problem. So see? Your SEO is working! Because I found you!

    But my problem is, rather than Google preferring my primary https://hamor.com/ domain (Squarespace website and blog), it’s decided that my https://photos.hamor.com/ subdomain (photo archive) is where all the cool kids should go.

    Google for “Sean Sosik-Hamor” and https://photos.hamor.com/ is there on the first or second page depending on the weather. But https://hamor.com/ is nowhere to be found. Anywhere.

    Google for “Hamor Photography” and my Google Business listing shows up but half of the preferred helper sitelinks are for, you guessed it, https://photos.hamor.com/ instead of https://hamor.com/.

    So this confirms your assumption that textual content is the culprit. I’ve had https://hamor.com/ since 1995 but it’s only been a splash page since 2003. Only this month have I started populating it with content.

    But https://photos.hamor.com/ has been active since 2008. And I include IPTC metadata, headlines, titles, descriptions/captions, and keywords on almost every photo. And most of my galleries have a verbose description as well.

    So, as far as Google is concerned, https://photos.hamor.com/ has been more active since 2008 even though it’s not my preferred landing page.

    But some good news about the mobile first indexing. It takes about a year for Google to finally catch up. My wife’s Web site at https://www.hamorhollow.com/ was averaging ~1,000 visits per month in April, 2019. That month we switched from an ancient non-mobile WordPress template to mobile-friendly Squarespace.

    Google immediately flipped the switch to mobile first and our traffic slowly increased. Peaked at ~2,750 visits in July, 2019 and stayed steady for a while. Then bam, ~3,925 in October, 2019. Stayed steady for a while. Then bam, ~8,500 visits per month from December, 2019 through today.

    So, after all that, thanks for the great series of articles. And be patient. Google will eventually catch up with your changes.

    Kind regards, Sean.

    1. Hi Sean,
      Recently, on a POD forum, it was claimed that Google needs to see a minimum of 400 words of text before it will index your page. Others chimed in to confirm this, claiming Google makes this known to ‘insiders’. Naturally I could find no confirmation of this on the web. But, 400 words is about 10 times longer than any of my photo or gallery descriptions. If true, it would explain a lot.

      According to Search Console’s report on sitemap coverage, Google now seems to be indexing my main (blog) domain but not paying attention to the URLs in the subdomain for the photo gallery. It’s finding those pages in the sitemap, and it’s not rejecting them; it seems to be just not looking at them for some reason. But the numbers don’t add up and aren’t making sense. I’ll be posting more on this in the future.

      1. Aha! I’ll definitely keep the 400-word minimum in mind. The more you know and all that! The…more you type?

        The good news is that I manage all of my SmugMug galleries on both my and my wife’s SmugMug accounts using the SmugMug Lightroom plug-in. I was lucky enough to help alpha test the plug-in for David Parry from SmugMug and he implemented many of my feature requests.

        So, if 400 words is one of Google’s baselines, then it’d be trivial to use Lightroom to add (or remove) long descriptions to the gallery titles (tracked as Published Collections in Lightroom) and/or IPTC metadata for each photo. Then just smack Republish and all the photo and gallery metadata is updated without having to re-upload each photo (or edit each photo manually through the web site).

        Thanks again for the great resource, Sean.

        1. If in fact Google demands 400 words, I have a problem. I think a nice punchy description sells a photo, but a padded-out and boring one un-sells it. 400 words is a lot. I’m going to keep searching for some confirmation of this claim.

          1. First, hopefully I’ve boosted your SEO a little! Ha!

            https://hamor.com/blog/trouble-in-photography-seo-paradise-with-smugmug-and-squarespace

            Next, apologies, I’m not saying add a 400-word description/caption to every photo. By long I meant add a one- or two-sentence AP-style description/caption to each photo. That way, when SmugMug displays your gallery, the gallery page will be full of content (all of your short and unique descriptions/captions combined on the same page should be more than 400 words).

            I’ve passively confirmed this. I used to dutifully add AP-style descriptions/captions to every photo. And my oldest SmugMug galleries with this AP-style format have the best ranking on Google.

            For example, one of the first Google hits for fire haverhill roofing company (without quotes) is:

            https://photos.hamor.com/Journalism/Two-alarm-fire-at-27-5th-Ave/

            But my newer galleries don’t perform nearly as well.

            Googling “Ubuntu at KubeCon 2019 San Diego” (with quotes) results in zero hits from Hamor Photography Archive even though the title of my gallery is exactly that:

            https://photos.hamor.com/Event/Ubuntu-at-KubeCon-2019-San-Diego/

            Ironically my stuffie hedgehog, which travels around the world with me on assignment, does show up because I photographed it at KubeCon. I’m guessing that’s because it’s a collection gallery that automatically imports all SmugMug photos with the keyword “Chippoke” from both my and my wife’s accounts. So that Chippoke gallery has thousands of unique sentences. Go figure.

            While that isn’t definitive proof, it definitely supports my assumption.

            So, using Lightroom, Photo Mechanic, or your preferred IPTC editor, populate your IPTC fields. Using the fire photos as an example:

            Title: Two-alarm fire at 27 5th Ave. in Haverhill, MA (January 04, 2006)

            Headline: Two-alarm fire at 27 5th Ave. in Haverhill, MA (January 04, 2006)

            Description/Caption: Red Cross volunteers hand out supplies at a two-alarm blaze at 27 5th Ave. in Haverhill, MA.

            For some reason Lightroom refers to the verbose description field as either Description or Caption depending on where you are. The same field is referred to as Description in the IPTC editor but Caption in the Caption editor.

            And I use the same text for both headline and title to keep them consistent. Some CMSes import the title while others import the headline.
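
            For anyone who’d rather script this than click through Lightroom, here’s a rough sketch that drives exiftool from Python (assuming exiftool is installed; the filename is made up, and it writes the XMP fields that Lightroom maps Title, Headline, and Caption to, rather than the legacy IPTC blocks):

                # Write title, headline, and caption via exiftool (assumed installed).
                # XMP-dc:Title / XMP-photoshop:Headline / XMP-dc:Description are the
                # fields Lightroom uses for Title, Headline, and Caption.
                import subprocess

                photo = "fire-27-5th-ave-001.jpg"  # illustrative filename
                title = "Two-alarm fire at 27 5th Ave. in Haverhill, MA (January 04, 2006)"
                caption = ("Red Cross volunteers hand out supplies at a two-alarm blaze "
                           "at 27 5th Ave. in Haverhill, MA.")

                subprocess.run([
                    "exiftool",
                    f"-XMP-dc:Title={title}",
                    f"-XMP-photoshop:Headline={title}",   # same text as the title, on purpose
                    f"-XMP-dc:Description={caption}",
                    "-overwrite_original",
                    photo,
                ], check=True)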

          2. Maybe we need to use browser tools to inspect the code for these ‘gallery’ pages and see what text is really visible to the crawler. Depending on the gallery code, a lot of stuff is only pulled in on demand in response to the viewer hovering over an image, etc. But today, with ‘mobile first’ crawling, I don’t know if the desktop site even matters. I have to find out how to inspect the mobile version of the site.
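
            If it helps, here’s a small Python sketch of that idea: fetch a gallery page the way a phone crawler would and count the words actually present in the static HTML (scripts and styles stripped). The user-agent string only approximates Googlebot’s smartphone crawler, and anything the page builds with JavaScript won’t be counted.

                # Count visible words in the raw HTML a mobile crawler would receive.
                import urllib.request
                from html.parser import HTMLParser

                class VisibleText(HTMLParser):
                    """Collect text nodes, skipping script/style/noscript content."""
                    def __init__(self):
                        super().__init__()
                        self.skip = 0
                        self.words = []
                    def handle_starttag(self, tag, attrs):
                        if tag in ("script", "style", "noscript"):
                            self.skip += 1
                    def handle_endtag(self, tag):
                        if tag in ("script", "style", "noscript") and self.skip:
                            self.skip -= 1
                    def handle_data(self, data):
                        if not self.skip:
                            self.words.extend(data.split())

                url = "https://photos.hamor.com/Event/Ubuntu-at-KubeCon-2019-San-Diego/"
                ua = ("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0 Mobile Safari/537.36 "
                      "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)")  # approximate

                req = urllib.request.Request(url, headers={"User-Agent": ua})
                with urllib.request.urlopen(req, timeout=30) as resp:
                    parser = VisibleText()
                    parser.feed(resp.read().decode("utf-8", errors="replace"))

                print(len(parser.words), "words visible in the static HTML")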

  2. Nice info. Have you looked at the robots.txt? I have only a small understanding of it, but it looks like they are blocking e.g. Googlebot-Image from some of the folders: /keyword/, /Gallery/, /popular/. Should those be blocked?
    Cheers
    thanks again for the above info

    1. As with all things Google, I find conflicting opinions. Some say Google stopped paying attention to robots.txt years ago. Others say it speeds up crawling by excluding stuff that wouldn’t interest Google anyway. From what I’m now seeing in Search Console, Google is finding the pages I want it to find, but isn’t in any hurry to index them. I’ll be writing a new post shortly that updates my Google story.
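
      If you want to check what robots.txt actually blocks rather than guess, Python’s standard-library parser can do a rough job of it. A sketch, with a placeholder host standing in for the gallery subdomain (and note the stdlib’s user-agent matching is simplistic compared to Google’s):

          # Ask robots.txt whether specific paths are blocked for specific crawlers.
          from urllib.robotparser import RobotFileParser

          rp = RobotFileParser("https://photos.example.com/robots.txt")  # placeholder host
          rp.read()

          for path in ("/keyword/", "/Gallery/", "/popular/", "/"):
              url = "https://photos.example.com" + path
              for agent in ("Googlebot", "Googlebot-Image"):
                  verdict = "allowed" if rp.can_fetch(agent, url) else "blocked"
                  print(f"{agent:16} {path:12} {verdict}")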

  3. I also use Yoast Pro and it works great, but I had a problem: my robots.txt was messed up and not being read. Somehow my host broke it.

    Great blog, stay well, I’ll be back
    Lou
