Why Your Site Needs a Cleanup to Avoid Ranking Problems
It may seem harmless, but that cruft might just be harming your entire site's ranking potential. Low quality, thin, and duplicate content pages can cause issues even if they don't seem to be causing a problem today.
1. What is low quality?
If you were to, for example, launch a large number of low quality pages, pages that Google thought were of poor quality and that users didn't interact with, you could find yourself in a seriously bad situation, and that's for a number of reasons. So yes, Google is certainly going to look at content on a page-by-page basis, but it's also considering things domain-wide.
There are also other, probably not directly Panda-related, things, like site-wide algorithmic looks at engagement and quality. For example, there was a recent analysis of the Phantom II update that Google rolled out, an update that hasn't really been formalized and that Google hasn't said anything about. One of the things that analysis looked at was the engagement of pages on the sites that got hurt versus the engagement of pages on the sites that benefited, and there was a clear pattern: engagement on sites that benefited tended to be higher, while on those that were hurt it tended to be lower. So again, it could be not just Panda but other things that will hurt you here.
2. How do I identify what's low quality on my site(s)?
So let's talk about some ways to proactively identify low quality pages, and then some tips for what to do afterwards.
- Filter that low quality away!
The approach here is basically to say, "Hey, here's the average time on site, here's the median time on site, here's the average bounce rate, the median bounce rate, the average pages per visit, the median, great. Now show me everything that is 50% below that, or one standard deviation below that, and filter those pages out."
This process is going to capture thin and low quality pages. It's not going to catch duplicate content pages, because duplicates are likely to perform very similarly to the pages they copy. So this process is helpful for one of those problems, but not so helpful for the other.
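As a rough illustration of that filtering step, here's a minimal sketch in Python, assuming you've exported your analytics data to a CSV; the file name, column names, and thresholds are assumptions for the example, not part of any particular tool:

```python
import pandas as pd

# Assumed analytics export with one row per page.
# Hypothetical columns: page, sessions, avg_time_on_page, bounce_rate
df = pd.read_csv("analytics_export.csv")

# "One standard deviation worse than the mean" thresholds for two engagement metrics.
time_floor = df["avg_time_on_page"].mean() - df["avg_time_on_page"].std()
bounce_ceiling = df["bounce_rate"].mean() + df["bounce_rate"].std()

# Flag pages that look weak on both metrics.
suspects = df[(df["avg_time_on_page"] < time_floor) & (df["bounce_rate"] > bounce_ceiling)]

# Review the candidates by hand before deciding what to improve, consolidate, or noindex.
print(suspects.sort_values("sessions", ascending=False)[
    ["page", "sessions", "avg_time_on_page", "bounce_rate"]
])
```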
- Sort that low quality!
For that process, you might want to use something like Screaming Frog or OnPage.org, which is a great tool, or Moz Analytics. Basically, in this case, you've got a cruft sorter that is essentially doing filtration: it looks for items you can identify in things like the URL string, in title elements that match, or in content that matches, and so you might use a duplicate content filter. Most of these pieces of software already have a default setting for this, and in some of them you can change it; I think OnPage.org and Screaming Frog both let you change the duplicate content filter.
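If you want to approximate what those duplicate content filters do, here's a minimal sketch that groups pages by a normalized title and a crude content fingerprint; the crawl data is a placeholder, and real tools use more sophisticated similarity thresholds:

```python
import hashlib
from collections import defaultdict

# Assumed input: (url, title, body_text) tuples produced by your own crawl.
pages = [
    ("https://example.com/widgets?sort=price", "Blue Widgets | Example", "Blue widgets are great..."),
    ("https://example.com/widgets", "Blue Widgets | Example", "Blue widgets are great..."),
]

def fingerprint(text: str) -> str:
    """Crude content fingerprint: lowercase, collapse whitespace, then hash."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

groups = defaultdict(list)
for url, title, body in pages:
    key = (title.strip().lower(), fingerprint(body))
    groups[key].append(url)

# Any group with more than one URL is a duplicate-content candidate to review.
for key, urls in groups.items():
    if len(urls) > 1:
        print("Possible duplicates:", urls)
```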
3. Functional Overview
GA or Omniture or Webtrends can totally mislead you, especially for pages with very few visits, where you just don't have a big enough sample to know how they're performing, or for pages the engines haven't indexed yet. So if something hasn't been indexed or just isn't getting search traffic, it might show you misleading metrics about how users are engaging with it, and that could bias you in ways you don't want to be biased. So be aware of that. You can generally control for it by looking at other stats or by using these other methods.
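One simple way to control for the small-sample problem, sketched here with an assumed minimum-sessions cutoff, is to set those pages aside rather than judge them on engagement at all:

```python
import pandas as pd

# Same assumed analytics export as in the earlier sketch.
df = pd.read_csv("analytics_export.csv")

MIN_SESSIONS = 20  # arbitrary cutoff; pick a number that makes sense for your traffic

# Pages below the cutoff don't have enough data to judge, so set them aside.
too_little_data = df[df["sessions"] < MIN_SESSIONS]
judgeable = df[df["sessions"] >= MIN_SESSIONS]

print(f"{len(too_little_data)} pages set aside for lack of data")
print(f"{len(judgeable)} pages with enough visits to evaluate")
```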
Is it useful to some visitors, but not to search engines? Maybe you don't want searchers to find it in the engines, but if somebody is paging through a bunch of pages and reaches it that way, okay, great, then I can use "noindex, follow" for that in the meta robots tag of the page.
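If you want to audit which pages already carry that directive, here's a minimal sketch using requests and BeautifulSoup; the URL list is a placeholder, and the script only reads the tag rather than adding it:

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URLs you're considering for "noindex, follow".
urls = ["https://example.com/blog?page=81", "https://example.com/blog?page=82"]

for url in urls:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", attrs={"name": "robots"})
    directive = tag["content"] if tag and tag.has_attr("content") else "(no meta robots tag)"
    # The tag you'd add in the page's <head> looks like:
    #   <meta name="robots" content="noindex, follow">
    print(f"{url}: {directive}")
```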
If there's no reason bots should access it at all, like you don't even care about them following the links on it, this is a very rare use case, but there can be certain types of internal content that maybe you don't want bots even trying to access, like a huge internal file system that particular kinds of your visitors might want to reach but nobody else. For that, you can use the robots.txt file to block crawlers from visiting it. Just be aware that a page can still get into the engines if it's only blocked in robots.txt; it just won't show any description. They'll say, "We are not showing a site description for this page because it's blocked by robots.txt."
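To double-check that a robots.txt rule behaves the way you expect, here's a minimal sketch using Python's standard library parser; the domain, path, and rule are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Placeholder site; the rule being tested might look like:
#   User-agent: *
#   Disallow: /internal-files/
parser = RobotFileParser("https://example.com/robots.txt")
parser.read()

blocked_path = "https://example.com/internal-files/"
for agent in ("Googlebot", "*"):
    print(f"{agent} may fetch {blocked_path}: {parser.can_fetch(agent, blocked_path)}")
```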
Courtesy & Copyright
https://creativesaints.com/
http://graphicwebdesign.in/
https://www.papeel.com.br/
https://moz.com/blog/clean-site-cruft-before-it-causes-ranking-problems-whiteboard-friday