By David Sottimano on April 17th, 2012
A while back I wrote a post about finding your site’s biggest technical (SEO) flaws in 60 minutes, and Evan contacted me about doing a similar type of audit for this website. Evan, you’re a brave man, and I promise to be as ruthless as possible ;)
The point of this post (audit) is to show you what a professional SEO can do for you in less than an hour to help you improve your site. Since I’ve only given myself 60 minutes to complete this, I’ll likely miss out on some things but I’m going to concentrate on big issues such as:
- Algorithm readiness – specifically Panda related issues (duplicate content, freshness, page layout)
- Crawlability / Architecture & On page issues (markup, keyword targeting, JS etc..)
- Identifying opportunities for Evan (new markets, keywords to target)
Here are the tools I’m going to use to rip through this:
- Microsoft IIS SEO Crawler – Specific for types of redirection, markup errors, meta data
- Screaming Frog SEO Spider – All meta data, response codes, and more
- SEOmoz toolbar – Nofollow links
- Searchmetrics Essentials - Check for traffic / ranking anomalies in a flash
- Spyonweb – Anything owned that might have dupe content?
- Builtwith.com – Understand what server / technology is powering the site
- Pagerank toolbar for Chrome – Mainly for spotting architecture problems
- Chrome (regular), Firefox & Opera (JS disabled) – The browser line-up
- HTTPfox – Firefox plug-in to listen to HTTP requests
- Compare text files online and cloaking checker - Check for bot / user cloaking
- Lots of Google search queries and looking through source code
The 1 hour SEO audit starts now…
Without analytics data, I need Searchmetrics to give me a picture of the site’s status (search visibility, problems, keyword rankings). Right away I can see a steep drop around late February 2011, which correlates very closely with the initial Google Panda update. (http://www.seomoz.org/google-algorithm-change#2011)
I could be wrong, and it could be something completely different (but I’m probably not wrong ;). This drop has my attention, so I’ll be paying close attention to thin content, low-value pages and duplicate content.
Potential Panda threats
This is an entire subdomain devoted to duplicating content based on article tagging. I’m about as happy about tag-related pages as I am about sitting next to someone who smells on the bus.
Why is this a problem: This is a breeding ground for low-quality pages that are essentially duplicate content. Duplicate content is a big no-no for SEO because it confuses ranking ability between pages and, in worst-case scenarios, can harm the ability of every page with similar content to rank well, if at all. Also, Google wanted to weed out low-value, duplicate content pages in its series of Panda updates – this subdomain could be one of the victims.
How did I find it: By using this Google operator search query
How much of a problem is it: We’ll call it ~50k pages worth of duplicate content on a site of ~150k pages – so, pretty bad.
How does it happen: This page exists because it’s been tagged in a category called “growth” (maybe automatically by the CMS, or by an author). If you copy and paste a snippet of unique text from a page, you should see that page ranking 1st for it – except this page isn’t ranking at all for its own snippet of text!
What to do: Cull them all. With poor page titles such as “growth” and reeking of duplicate content, the best thing to do here is to 301 redirect all of these pages back to the root domain (brownie points if you can 301 each tag page to a related deep page on the main domain).
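For what it’s worth, a blanket version of that redirect might look like this in an Apache .htaccess file – a minimal sketch, and the subdomain name here is just a placeholder for the actual tag subdomain:

```apache
RewriteEngine On
# Send every URL on the (hypothetical) tag subdomain back to the root domain with a 301
RewriteCond %{HTTP_HOST} ^tags\.evancarmichael\.com$ [NC]
RewriteRule ^ http://www.evancarmichael.com/ [R=301,L]
```

The brownie-point version would need a RewriteRule per tag, mapping each tag page to its best related deep page on the main domain.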
Potential Page Layout Algorithm Update
The forum content on this site is actually pretty darn good. Evan and co. post some great topics, but you might not think it’s that valuable based on first impressions. Google released a page layout algorithm update this year that targets pages with excessive ad content above the fold. Every forum page has a banner ad and a Google AdSense block before the actual content (example: http://forums.evancarmichael.com/viewtopic.php?f=6&t=4023&start=15), which makes it a perfect victim for the update or future roll-outs. Also, notice the &start=15 in that example URL – remember it, I’ll explain below.
How much of a problem is it: From what I can see, it doesn’t seem like the forum has been affected (please note, I don’t have access to Evan’s analytics and I could be dead wrong). However, I do know that the &start= parameter (for pagination) is causing around 3k duplicate pages.
How does it happen: (only explaining the start= parameter) This is basic pagination: when a site has too many items / products / posts to consume on one page, you paginate. How you structure the URLs for pages 2, 3 and 4 doesn’t really matter – the effect on the site is exactly the same.
What to do:
For the page layout: Move the AdSense block to the sidebar, or make it less prominent on the page – I would keep the banner ad at the top.
For the pagination parameter: In this case, we want Google to index all of the forum posts and pass the forum post links PageRank – but not index the paginated URLs. This is a perfect case for the NOINDEX,FOLLOW robots directive, which means “Hey Google, I want you to see what’s on pages 2, 3 and 4, but I don’t need you wasting your time adding those pages to your index; the first page is the only result I want users to see.” OK, so I’ve oversimplified it – hopefully you get the picture.
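In practice that directive is just a meta tag in the head of each paginated URL – a minimal sketch (page 1 itself should not carry it):

```html
<!-- On paginated URLs only, e.g. viewtopic.php?f=6&t=4023&start=15 -->
<meta name="robots" content="noindex,follow" />
```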
Alternatively, you could use rel=canonical to point every paginated version back to the first page. Technically it works – but it really shouldn’t, since the content on the paginated versions isn’t entirely duplicate. The rel=canonical tag transfers PageRank (page/link authority) where NOINDEX doesn’t. This is a bit devious, and that’s why it’s my second choice in this case – use it at your own peril to game the engines!
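If you did go the rel=canonical route, it would look something like this on each paginated URL, pointing back at the first page of the example thread:

```html
<!-- On http://forums.evancarmichael.com/viewtopic.php?f=6&t=4023&start=15 -->
<link rel="canonical" href="http://forums.evancarmichael.com/viewtopic.php?f=6&t=4023" />
```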
Crawlability / Architecture & On page Issues
This is the bulk of this post, and I’ll keep it as brief as possible to fly through it.
Problem 1: A full copy of the website (www and non www version exist)
One of the basic checks of a seasoned SEO is whether both the www and non-www versions of the site resolve. Basically, they are different sites that host the same content – that’s just how the internet was made :) Although Google does a good job of canonicalizing to the preferred version you set in Webmaster Tools, it still splits the domain’s link authority. People don’t always link to your preferred version, and since the two hostnames are separate sites, you won’t get all of the “link juice”.
How did I find it: By using this Google operator search query
How much of a problem is it: It’s a biggie, I think this site is missing out on around 65 linking root domains and around 600 links.
How does it happen: When you buy a shiny new domain, you get the www and non www version.
What to do: Write a redirect rule (in this case the HTACCESS file for Apache) to redirect all non www pages to their www counterparts.
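Assuming Apache with mod_rewrite (the usual setup behind an HTACCESS file), the rule is only a few lines:

```apache
RewriteEngine On
# 301 any non-www request to its www counterpart, preserving the path
RewriteCond %{HTTP_HOST} ^evancarmichael\.com$ [NC]
RewriteRule ^(.*)$ http://www.evancarmichael.com/$1 [R=301,L]
```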
Problem 2: Unnecessary Redirection
Our principle example is: http://www.evancarmichael.com/blog/famous-quotes/inspirational-business-quotes/ and the redirection link is “#2: Oprah Winfrey, Harpo”
But here’s some more:
How did I find it: Screaming Frog, Internal links report & IIS Crawler. Confirmed it by using HTTPfox.
How much of a problem is it: A straight link to a page passes 100% link juice, but a 301 redirect won’t pass the full value. It’s hard to control on external sites because we can’t always control how someone links to us, however it shouldn’t happen on our own sites.
How does it happen: You create a new page and 301 redirect the old page to it to conserve link authority – but you forget to update internal links that still point to the old pages.
What to do: Change the links to the right targets. For example, the link to Oprah’s profile (http://www.evancarmichael.com/Famous-Entrepreneurs/514/summary.php) on this page (http://www.evancarmichael.com/blog/famous-quotes/inspirational-business-quotes/) should go directly to http://oprahwinfrey.evancarmichael.com/.
Problem 3: More duplication, this time on the WordPress-powered blog.
This is the same type of problem I discussed above as a Panda threat, except this time it’s a stock problem with the WordPress CMS. Category, author and tag pages are just duplicate-content pages that can serve a user well for navigation – but they become a problem when they compete with the actual blog posts.
How did I find it: Screaming Frog
How much of a problem is it: Right now there are ~2k pages with duplicate content – and the bigger this blog gets, the worse the problem will become.
How does it happen: It’s just a stock WordPress feature.
What to do: Easiest way to fix it is to include the NOINDEX, FOLLOW directive again on all author, category and tag pages. To do this super easily I recommend Yoast’s SEO plugin – it’s so good it could put me out of a job, trust me on this one.
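If you’d rather not install a plugin, a hand-rolled sketch of the same idea uses WordPress’s conditional tags – this assumes a standard theme, and goes inside the head section of header.php:

```php
<?php
// Emit noindex,follow only on archive-type pages; normal posts stay indexable.
if ( is_category() || is_tag() || is_author() ) {
    echo '<meta name="robots" content="noindex,follow" />' . "\n";
}
?>
```

Yoast’s plugin does the same thing (plus a lot more) with a checkbox, which is why it’s the easier route.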
Alternatively, you can use the rel=canonical tag to point back to a main article. It transfers page/link authority, but it doesn’t reduce wasted Googlebot crawling, and it’s not the correct use of the tag (because the pages aren’t entire duplicates, only sections). Use at your own peril to game the engines!
Problem 4: Duplicate & inefficient page titles
- http://www.evancarmichael.com/Retail/3844/What-is-CRM.html title: What is CRM?
- http://www.evancarmichael.com/Technology/6763/What-is-CRM.html title: What is CRM?
- http://www.evancarmichael.com/Business-Coach/2830/Managing-Change.html title: Managing Change
- http://www.evancarmichael.com/Leadership/2076/Managing-Change.html title: Managing Change
Poor page title keyword targeting
- http://www.evancarmichael.com/Going-Green/2340/Cars.html title: Cars
- http://www.evancarmichael.com/Business-Coach/2554/Be.html title: Be
Page titles are crucial for SEO and need to be as descriptive as possible, as well as completely unique. There is no compromise here – we just have to do it to help Google match content to queries.
How did I find it: Screaming Frog
How much of a problem is it: These pages aren’t being seen, either because they have duplicate page titles and Google is confused about which page to return, or because the titles aren’t relevant matches to the queries – example: “Be”.
How does it happen: I *think* this is due to user-generated content.
What to do: Evaluate each article, do some keyword research and title each page according to its content. As a general rule, each page title on your site needs to be absolutely unique. You can read more about cannibalization here.
Bear in mind this was a quick check – I looked at the top keywords around “entrepreneur” and, sadly, didn’t see the site ranking on the first page for the top-volume terms. I’d like to see this site take a piece of the ~50k exact-match searches per month (US) around these terms:
Also, I’d like to point out that there’s a new kid on the block in this field that’s gaining momentum – how about we create some pages targeting “startups”?
Things I should tell you
Overall, it’s a pretty good site – but there is opportunity. I’ve only included a few points in this post, but rest assured I performed well over 50 different checks in the hour. In the interest of length, and to make sure you’re still awake, I kept it brief.
A special thank you to Evan for volunteering his site for the audit – it’s been fun!
So what did you think, was it useful? Do you have any questions?
Feel free to leave them in the comments below, and feel free to send me a tweet every now and again ;)
Baby Panda http://icanhascheezburger.com/