Tech
Bury the Lede
22 November 2020
The most fundamental rule of the internet is that content is cheap. If you want to fill a website with content, you don’t even have to create it. Sure, there are bots that will churn out low-grade content for you if you like, but it’s easier just to scrape the content of other websites. So you set up a site, fill it with scraped content, slap some Google ads on it, profit!
For years my content has been scraped by hundreds of sites. Astronomy articles have high appeal, and a scraper site can typically earn a few pennies off one of my articles. The same is true for anyone who writes popular science posts. It’s annoying, but not a big deal. Google prefers quality original content over churn, so over time, the scraper sites get demoted to the point where they are basically delisted. Scrapers have to keep creating new sites to trick Google, while your site’s ranking grows over time.
But over the past six months, I’ve noticed a change, and it’s a bit disturbing.
Let me give you an example. Three days ago I wrote an article about old stars in the Milky Way. Within hours of it going live, the copy sites started picking it up. Within a day, sites further down the food chain had copied those sites, often using a bot to substitute words and phrases so it looks somewhat “original.” And on and on…
This is perfectly normal, but what isn’t normal is that the original post on my site isn’t really getting views. There’s a spike immediately as the bots grab the post, and then very little. In the past, I’d get a little hill in view numbers just for being a new post.
So I did a general Google search. Sure enough, Google lists nearly thirty copy pages. The first has the entire article, lists me as the author as if I write for them, and is smothered with ads and trackers. It has seven AdSense spots that flip to new ads every 15 seconds or so, an infinite scroll of sponsored links at the bottom of the page, and nine tracker scripts. As you can imagine, it looks a horrid mess. That site was then scraped by other sites. They don’t list me as the author but do link back to the first scraper site. By page 3, Google is offering pages that have rewritten copies of rewritten copies of my original, so that you get sentences such as:
This is smart since stars are born throughout the dense fuel and mud throughout the galactic airplane.
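For anyone curious how far a word-swapped copy drifts from its source, one rough measure is the overlap of short word runs (“shingles”) between the two texts. The Python sketch below is only an illustration under my own assumptions: the function names `shingles` and `overlap` are made up for this example, the three-word window is an arbitrary choice, and this is not how Google or any scraper is known to compare pages.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word runs (shingles) found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(original: str, copy: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    a, b = shingles(original, k), shingles(copy, k)
    union = a | b
    if not union:
        # Both texts are shorter than k words, so there is nothing to compare.
        return 0.0
    return len(a & b) / len(union)


# Illustrative strings only, not quotes from any real page.
plain = "stars are born in the dense gas and dust of the galactic plane"
swapped = "stars are born in the dense fuel and mud of the galactic airplane"
score = overlap(plain, swapped)
```

An exact copy scores 1.0, and each round of word substitution pushes the score lower, which is roughly what happens as a post moves down the scraper food chain.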
All of these sites are covered with ads, and Google is happy to offer them when you search for my post. But the original article on my website is nowhere to be seen. Even if you search on my site for the specific post, Google says the page doesn’t exist. The pages do show up on Google after about a week, but even then they’re often listed below a site that scrapes my content.
But let’s give Google the benefit of the doubt. It’s possible that it just takes time for Google to update my site. Many scraper sites work very hard to get Google to add their pages almost immediately. I don’t spend that kind of SEO effort, so perhaps it takes several days for Google to index my site and update their list. But that isn’t the case. While the science posts don’t appear for days, the fiction posts appear within hours. They sometimes show up within minutes. So Google is rapidly indexing my site, listing fiction posts almost immediately, but hiding the science posts for days.
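The comparison here is simple enough to sketch in Python: note when each post went live and when it first turned up in search results, then compare the typical delay per category. This is a minimal sketch under stated assumptions. The function name `median_index_delay_hours` and the record layout are hypothetical, and the timestamps would have to be noted by hand, since I’m not assuming any Google API that reports them.

```python
from collections import defaultdict
from statistics import median


def median_index_delay_hours(posts):
    """Median hours from publication to first search listing, per category.

    posts: iterable of (category, published, first_listed) tuples, where
    published and first_listed are datetime objects recorded by hand.
    """
    delays = defaultdict(list)
    for category, published, first_listed in posts:
        hours = (first_listed - published).total_seconds() / 3600
        delays[category].append(hours)
    return {category: median(hours) for category, hours in delays.items()}
```

If the site as a whole were slow to be indexed, the medians for science and fiction posts would come out similar. A gap of days versus hours between the two is what points away from that explanation.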
Again, let’s assume I’m doing something wrong. There are several reasons why Google might de-list a website: you have bot content, scrape content from other sites, play misleading SEO games, or have a really slow site. I don’t do any of the first three, so perhaps it’s that my site isn’t optimized. Google doesn’t want to offer a link that just hangs waiting to load, and the scraper sites often work very hard to have fast page views. Optimization is so important to Google that it even has a tool to rate your site’s speed and usability. Perhaps mine is just rated much lower than the scraper sites, and that’s why Google favors them.
A quick series of tests shows that isn’t the case. Google ranks my site as 98 on mobile and 100 on desktops. My site typically loads within a second, which is Google’s gold standard. The scraper sites are all given failing grades. The scraper sites are horribly designed, not mobile-friendly, and aren’t living up to the standards Google says it prefers.
So what gives?
There is one area where my site is lacking. I don’t use Google AdSense. I also don’t use Google Analytics. So my website doesn’t give Google any user information, and pageviews on my site don’t earn Google any money. The scraper sites do. They let Google track users on their site a dozen different ways, and they generate income for Google. When Google refers you to those sites to read my articles, Google earns money. If Google were to refer you to my site, Google would earn nothing. By delaying my page listings for several days, Google gives the scraper sites breathing room to earn it money before my content loses its freshness.
All of this is circumstantial. I can’t prove Google is doing this intentionally. But it is clear that Google’s behavior, intentional or not, is costing me pageviews and audience. If the situation doesn’t change it will become increasingly difficult for me to grow my audience.
Google is a private company. It’s not obligated to list my website at all, much less prioritize it over sites that use my content to earn them money. But it does put me in an unfortunate position. One solution would be to start putting ads on the site, use Google Analytics, and hand over user information so that Google has a financial incentive to start prioritizing my site over the scrapers. The other option is to stick to my principles and watch my pageviews continue to fall.