In an earlier post, I expressed my own annoyance when certain types of sites choose not to display publish dates on their web pages, and how this is especially annoying when it’s an article that presumes to talk about current statistics or other topics where knowing the date context is useful. My goal in that post was to convince any publishers who might happen across the article of the value of including the date. (Here’s the link: Should You Put Dates on Blog Posts and Articles?)
This blog is mostly geared towards the Product/Business side of things. But today, it’s more for end users, who of course may be business users with the need or just the desire to find publish dates of content. I’d like to offer web users, surfers, researchers, whatever your self-identifying characteristic may be, some techniques for finding content publish dates when they’re not clearly provided. These methods are not necessarily accurate, precise, or at all reliable. But they may be all you’ve got.
Why does this matter? Well, it might not. The assumption here is that there are certain types of content (business research, for example) where knowing the date is important enough to warrant the effort of looking into the publish date, and possibly the last modified date.
Any customer-centric service should show the date clearly in its content, and that should result in the date showing up in Search Engine Result Pages (SERPs) as well.
Unfortunately, a variety of publishers don’t provide this information. This is most often an intentional choice. Sometimes it’s done because the publisher honestly believes their content isn’t at all date related and is what they might call “evergreen.” But usually it’s due to a perception that their content may be more favorably treated by search engines or perceived as fresher by users when there’s no date. Some may even put the date in, but remove it after the content is more than a few months old.
For those of you who actually find the date of content useful, or actually need it for some reason (to properly reference something, to understand the context of a claim, or whatever your purpose), this is not merely an inconvenience where you’ll just click away. It may be worth it to you to work at least a little to seek out the information you’ve been rudely denied for what is usually no good reason.
For Users: Finding the Publish Date Anyway
If you’re a user who really wants to find an original publish date when it isn’t listed, you’ll often be mostly out of luck. It’s likely not possible to find out, at least for sure, what the original publish date of an online page is unless it’s displayed. (And even if displayed, it could be false, but for now let’s just leave that aside.) Still, there are some things you can try to get a clue as to the publish date if it’s important to you.
Find the Date Basics
- Contact the Site. If it’s really important, you could just try to contact the author or site and ask!
- Change Date Scope to Check for Existence: Do a search and change the date scope, maybe starting at a year and then moving further back or more recent (depending on what you care about), to get a sense of when the page was first found. One problem, though, is that it’s unclear what date Google uses for this!
- Check Comments: See if there are any comments at the bottom of the page or in a “Comments” link. Chances are the first comments were somewhat close to when the article was originally published.
- Look in the URL. A lot of times, Content Management Systems (CMS) will insert the date as part of the web page address, though certainly not always. The predisposition of most sites is to put in categories and article keywords rather than dates. But it doesn’t hurt to look if this is the information you’re seeking.
- It could look something like this:
- This would at least get you to the month; in this case, October of 2016.
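If you have a pile of URLs to check, the look-in-the-URL trick can be roughly sketched in a few lines of Python. This is only an illustration, assuming the common /YYYY/MM/ or /YYYY/MM/DD/ layout that some CMSs use. The function name `extract_url_date` and the example.com addresses are made up for the example, and plenty of sites won’t follow this pattern at all.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: /YYYY/MM/ optionally followed by /DD/ somewhere in the path.
# Years are limited to 19xx and 20xx to avoid matching arbitrary numbers.
_DATE_IN_PATH = re.compile(
    r"/(?P<year>(?:19|20)\d{2})"
    r"/(?P<month>0[1-9]|1[0-2])"
    r"(?:/(?P<day>0[1-9]|[12]\d|3[01]))?"
    r"(?=/|$)"
)


def extract_url_date(url):
    """Return (year, month, day) found in the URL path, or None.

    day is None when the URL only goes down to the month.
    """
    match = _DATE_IN_PATH.search(urlparse(url).path)
    if match is None:
        return None
    day = match.group("day")
    return (
        int(match.group("year")),
        int(match.group("month")),
        int(day) if day is not None else None,
    )
```

For a hypothetical address like `https://example.com/2016/10/some-article-title/`, `extract_url_date` returns `(2016, 10, None)`, which gets you to the month. A URL with only categories and keywords returns `None`. Keep in mind that a date in the URL is still only a clue, since it reflects whatever the CMS recorded.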
Finding the Date: More Advanced Techniques
- The Internet Archive may have the page. The archive certainly doesn’t have everything, and it likely won’t have things to the day. But you might be able to see how far back in time the page existed.
- HTTP Headers: Using web browser developer tools, you can try to look at something called the HTTP headers. Within these there may be something called a Last-Modified date. I’ll repeat again, this just gives you a clue. It could easily be wrong. The server may not be set correctly, someone could have intentionally changed it (though that’s a bit of a sophisticated lie and maybe unlikely), or maybe someone just modified the page and that updated the date. So it wouldn’t be the original publish date.
- Google Index Date: When a page is created on the Internet, usually a search engine will index it. So a page has its actual publish date, and then there’s the date the search engine finds it, which is the page’s index date. And there’s maybe a cache date, which is the last date the search engine’s crawlers/bots looked at the page. In some cases, search engines like Google will display the date in the results. If they don’t, you can try this at Google:
Note: No guarantee this will always work. Google does change things from time to time!
Do a search for the URL you’re curious about. What that means is, put the whole URL into the search engine’s search bar and then click on search.
In results, click on the “Tools” link and then under “Any time” choose “Custom Range…”
Now, make a crazy range, such as from 1/1/1980 (before the consumer internet was even really available) up through the end of the current year.
You should see something like this, which has the original index date! Note how a typical search won’t have this date.
- Check the Cache: Google maintains a cache. This is basically a kind of backup copy of the page. They use it for whatever reasons they do, but our concern is date info. Once again, do a search for your target page of interest.
Look for a small down arrow next to the URL. Click it and then click on Cache.
And you’ll get the last time Google reviewed the page. This date might not match any displayed date on a page or any other date you may have found. It’s just the last time the Google crawlers brought the page into the index. If you do have other date info, and this date is earlier, it’s possible a site owner updated their page. Or the site owner is playing games and updating their page to a more recent date to act more current.
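For anyone comfortable with a little scripting, two of the techniques above, the Internet Archive lookup and the Last-Modified header, can be tried without clicking around. The following is a minimal Python sketch, not a definitive tool. All of the function names are made up for this example. It assumes the Internet Archive’s CDX endpoint at web.archive.org returns captures oldest-first with a header row in its JSON output, which is my understanding of its documentation. As with everything else here, the results are only clues.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_last_modified(value):
    """Parse a Last-Modified header value; return None if missing or malformed."""
    if not value:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None


def fetch_last_modified(page_url, timeout=10):
    """Send a HEAD request and return the server's Last-Modified date, if any."""
    request = urllib.request.Request(page_url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_last_modified(response.headers.get("Last-Modified"))


def wayback_cdx_url(page_url):
    """Build a CDX query asking only for the first capture's timestamp."""
    query = urllib.parse.urlencode(
        {"url": page_url, "output": "json", "fl": "timestamp", "limit": "1"}
    )
    return "https://web.archive.org/cdx/search/cdx?" + query


def first_capture_from_cdx(rows):
    """Read the capture time from CDX JSON rows (row 0 holds the field names)."""
    if len(rows) < 2:
        return None
    stamp = rows[1][0]  # 14 digits: YYYYMMDDhhmmss, treated as UTC
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def fetch_first_capture(page_url, timeout=10):
    """Ask the Internet Archive for the earliest capture it has of page_url."""
    with urllib.request.urlopen(wayback_cdx_url(page_url), timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return first_capture_from_cdx(json.loads(body)) if body.strip() else None
```

Calling `fetch_first_capture` and `fetch_last_modified` with the page address gives you two dates to compare. The first archive capture tells you the page existed by that date, not when it was published. The Last-Modified value, when a server sends one at all, may simply reflect the most recent edit or a misconfigured server.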
Yes, there’s actually more. But this is already too much info on one page. So we’re going to continue this in a Part 2 post. Please see Finding Web Page Publish Dates When Not Displayed-Part-2.