
SEO Finds In Your Server Log

Posted by timresnik

I am a huge Portland Trail Blazers fan, and in the early 2000s my favorite player was Rasheed Wallace. He was a lightning rod of a player, and fans either loved or hated him. He led the league in technical fouls nearly every year he was a Blazer, mostly because he never thought he had committed any sort of foul. Many of those technicals came when an opposing player missed a free-throw attempt and ‘Sheed passionately screamed his mantra: “BALL DON’T LIE.”

‘Sheed’ asserts that a basketball has metaphysical powers that act as a system of checks and balances for the integrity of the game. While this is debatable (OK, probably not true), there is a parallel to technical SEO: marketers and developers often commit SEO fouls when architecting a site or creating content, but implicitly deny that anything is wrong.
 
As SEOs, we use all sorts of tools to glean insight into technical issues that may be hurting us: web analytics, crawl diagnostics, and Google and Bing Webmaster Tools. All of these tools are useful, but there are undoubtedly holes in the data. There is only one true record of how search engines, such as Googlebot, process your website: your web server logs. As I am sure Rasheed Wallace would agree, logs are a powerful, oft-underutilized source of data that helps keep the integrity of your site’s crawl by search engines in check.
 
 
A server log is a detailed record of every action performed by a particular server. In the case of a web server, you can get a lot of useful information from it. In fact, back in the day before free analytics packages like Google Analytics existed, it was common to simply parse and review your web logs with software like AWStats.
 
I initially planned on writing a single post on this subject, but as I got going I realized that there was a lot of ground to cover. Instead, I will break it into two parts, each highlighting different problems that can be found in your web server logs:
 
  1. This post: how to retrieve and parse a log file, and identifying problems based on your server’s response code (404, 302, 500, etc.).
  2. The next post: identifying duplicate content, encouraging efficient crawling, reviewing trends, looking for patterns, and a few bonus non-SEO tips.

Step #1: Fetching a log file

Web server logs come in many different formats, and the retrieval method depends on the type of server your site runs on. Apache and Microsoft IIS are two of the most common. The examples in this post are based on an Apache log file from SEOmoz.
 
If you work in a company with a Sys Admin, be really nice and ask him or her for a log file with a day’s worth of data and the fields listed below. I’d recommend keeping the size of the file below 1 GB, as the log file parser you’re using might choke on anything larger. If you have to generate the file on your own, the method for doing so depends on how your site is hosted. Some hosting services store logs in your home directory in a folder called /logs and drop a compressed log file into that folder on a daily basis. You’ll want to make sure it includes the following columns:
 
  • Host: you will use this to filter out internal traffic. In SEOmoz’s case, RogerBot spends a lot of time crawling the site and needed to be removed for our analysis. 
  • Date: if you are analyzing multiple days this will allow you to analyze search engine crawl rate trends by day. 
  • Page/File: this will tell you which directory and file is being crawled and can help pinpoint endemic issues in certain sections or with types of content.
  • Response code: knowing the response of the server — the page loaded fine (200), was not found (404), the server was down (503) — provides invaluable insight into inefficiencies that the crawlers may be running into.
  • Referrers: while this isn’t necessarily useful for analyzing search bots, it is very valuable for other traffic analysis.
  • User Agent: this field tells you which search engine made the request; without it, a crawl analysis cannot be performed.
By default, Apache log files are written without the User Agent or Referrer; this is known as a “common” log file. You will need to request a “combined” log file. Make your Sys Admin’s job a little easier (and maybe even impress them) by requesting the following format:
 
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
 
For Apache 1.3, you just need: CustomLog log/access_log combined
 
For those who need to pull the logs manually, you will need to create a directive in the httpd.conf file with one of the above. The Apache documentation covers this subject in a lot more detail.
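Putting the pieces together, a minimal httpd.conf sketch might look like the following; the log path and the sample entry are illustrative, not taken from SEOmoz’s actual configuration:

# Define the combined format: host, identity, user, time, request, status, bytes, referrer, user agent
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined

# Write all requests to a file using that format (path is an example)
CustomLog /var/log/apache2/access_log combined

# An entry written in this format looks something like:
# 66.249.66.1 - - [10/Feb/2013:06:25:14 -0800] "GET /blog HTTP/1.1" 200 5043 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"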
 

Step #2: Parsing a log file

You probably now have a compressed log file like ‘mylogfile.gz’, and it’s time to start digging in. There are myriad software products, free and paid, for analyzing and/or parsing log files. My main criteria for picking one include the ability to view the raw data, the ability to filter prior to parsing, and the ability to export to CSV. I landed on Web Log Explorer (http://www.exacttrend.com/WebLogExplorer/), and it has worked for me for several years. I will use it along with Excel for this demonstration. I’ve used AWStats for basic analysis, but found that it does not offer the level of control and flexibility that I need. I’m sure there are several other tools out there that will get the job done.
 
The first step is to import your file into your parsing software. Most web log parsers will accept various formats and have a simple wizard to guide you through the import. For the first pass of the analysis, I like to see all the data and do not apply any filters. At this point, you can do one of two things: prep the data in the parser and export it for analysis in Excel, or do the majority of the analysis in the parser itself. I like doing the analysis in Excel in order to create a model for trending (I’ll get into this in the follow-up post). If you want a quick analysis of your logs, using the parser software is a good option.
 
Import Wizard: make sure to include the parameters in the URL string. As I will demonstrate in the next post, this will help us find problematic crawl paths and potential sources of duplicate content.
 
 
You can choose to filter the data using some basic regex before it is parsed. For example, if you only wanted to analyze traffic to a particular section of your site you could do something like: 
 
 
Once you have your data loaded into the log parser, export all spider requests and include all response codes:
 
 
Once you have exported the file to CSV and opened it in Excel, here are some steps and examples to get the data ready for pivoting into analysis and action:
 
1. Page/File: in our analysis we will try to expose directories that could be problematic, so we want to isolate the directory from the file. The formula I use to do this in Excel looks something like this:
 
Formula:
=IF(ISNUMBER(SEARCH("/",C29,2)),MID(C29,(SEARCH("/",C29)),(SEARCH("/",C29,(SEARCH("/",C29)+1)))-(SEARCH("/",C29))),"no directory")
 
2. User Agent: in order to limit our analysis to the search engines we care about, we need to search this field for specific bots. In this example, I’m including Googlebot, Googlebot-Image, BingBot, Yahoo, Yandex, and Baidu.
 
Formula (yeah, it’s U-G-L-Y)
 
=IF(ISNUMBER(SEARCH("googlebot-image",H29)),"GoogleBot-Image",IF(ISNUMBER(SEARCH("googlebot",H29)),"GoogleBot",IF(ISNUMBER(SEARCH("bing",H29)),"BingBot",IF(ISNUMBER(SEARCH("Yahoo",H29)),"Yahoo",IF(ISNUMBER(SEARCH("yandex",H29)),"yandex",IF(ISNUMBER(SEARCH("baidu",H29)),"Baidu","other"))))))
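If you would rather do this prep work in code than in Excel, here is a rough Python sketch of the same two steps; the input file name and the column headers ('page' and 'user_agent') are assumptions about your CSV export, not a fixed format:

import csv

# Order matters: check "googlebot-image" before "googlebot", as in the Excel formula above
BOTS = ["googlebot-image", "googlebot", "bing", "yahoo", "yandex", "baidu"]

def directory(path):
    # "/blog/some-post?p=2" -> "/blog/"; top-level files get "no directory"
    parts = path.split("?")[0].split("/")
    return "/" + parts[1] + "/" if len(parts) > 2 else "no directory"

def bot(user_agent):
    ua = user_agent.lower()
    for name in BOTS:
        if name in ua:
            return name
    return "other"

rows = []
with open("mylogfile.csv") as f:
    for row in csv.DictReader(f):
        row["directory"] = directory(row["page"])
        row["bot"] = bot(row["user_agent"])
        rows.append(row)  # ready for pivoting, e.g. with pandas or a CSV re-export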
 
Your log file is now ready for some analysis and should look something like this:
 
 
Let’s take a breather, shall we?
 

Step #3: Uncover server and response code errors

The quickest way to suss out issues that search engines are having with the crawl of your site is to look at the server response codes being served. Too many 404s (page not found) can mean that precious crawl resources are being wasted. Masses of 302 redirects can point to link-equity dead-ends in your site architecture. While Google Webmaster Tools provides some information on such errors, it does not provide a complete picture: LOGS DON’T LIE.
 
The first step of the analysis is to generate a pivot table from your log data. Our goal here is to isolate the spiders along with the response codes they are being served. Select all of your data and go to ‘Data > Pivot Table.’
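If you did the prep work in Python instead, a pandas pivot along the same lines might look like this; the file name and column names follow the sketch above and are assumptions:

import pandas as pd

log = pd.read_csv("mylogfile_prepped.csv")  # assumed columns: bot, response_code, page, directory

# Requests per bot, broken out by response code (the same breakdown as the Excel pivot)
pivot = pd.pivot_table(log, index="bot", columns="response_code",
                       values="page", aggfunc="count", fill_value=0)
print(pivot)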
 
On the most basic level, let’s see who is crawling SEOmoz on this particular day:
 
 
There are no definitive conclusions that we can draw from this data, but there are a few things that should be noted for further analysis. First, BingBot is crawling the site at roughly an 80% higher rate. Why? Second, ‘other’ bots account for nearly half of the crawls. Did we miss something in our search of the User Agent field? As for the latter, we can see from a quick glance that most of what falls under ‘other’ is RogerBot, so we’ll exclude it.
 
Next, let’s have a look at server codes for the engines that we care most about.
 
 
I’ve highlighted the areas that we will want to take a closer look at. Overall, the ratio of good to bad looks healthy, but since we live by the mantra that “every little bit helps,” let’s try to figure out what’s going on.
 
1. Why is Bing crawling the site at twice the rate of Google? We should investigate whether Bing is crawling inefficiently and there is anything we can do to help it along, or whether Google is not crawling as deeply as Bing and there is anything we can do to encourage a deeper crawl.
 
By isolating the pages that were successfully served (200s) to BingBot, the potential culprit is immediately apparent: nearly 60,000 of the 100,000 pages that BingBot crawled successfully were user-login redirects from comment links.
 
 
The problem: SEOmoz is architected in such a way that if a comment link is requested and JavaScript is not enabled, it serves a redirect (returned as a 200 by the server) to an error page. With nearly 60% of Bing’s crawl being wasted on such dead-ends, it is important that SEOmoz block the engines from crawling these URLs.
 
The solution: add rel=’nofollow’ to all comment and reply-to-comment links. Typically, the ideal method for telling an engine not to crawl something is a directive in the robots.txt file. Unfortunately, that won’t work in this scenario because the URL is being served via JavaScript after the click.
GoogleBot is dealing with the comment links better than Bing and avoiding them altogether. However, Google is successfully crawling a handful of links that are login redirects. Take a quick look at the robots.txt and you will see that this directory should probably be blocked.
 
2. The number of 302s being served to Google and Bing is acceptable, but it doesn’t hurt to review them in case there are better ways of dealing with some of the edge cases. For the most part, SEOmoz uses 302s for defunct blog category architecture that redirects the user to the main blog page. They are also being used for the private message pages under /message, and a robots.txt directive (sketched below) should exclude these pages from being crawled at all.
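For the /message case, the exclusion described above would be a robots.txt entry along these lines; the exact path depends on the site’s URL structure:

User-agent: *
Disallow: /message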
 
3. Some of the most valuable data you can get from your server logs are links that are being crawled but resolve to a 404. SEOmoz has done a good job managing these errors and does not have an alarming number of 404s. A quick way to identify potential problems is to isolate 404s by directory. This can be done by running a pivot table with “Directory” as your row label and a count of “Directory” in your value field. You’ll get something like:
 
 
The problem: the main issue here is that 90% of the 404s are in one directory, /comments. Given the issues with BingBot and the JavaScript-driven redirect mentioned above, this doesn’t really come as a surprise.
 
The solution: the good news is that since we are already using rel=’nofollow’ on the comment links, these 404s should also be taken care of.
 

Conclusion

Google and Bing Webmaster Tools provide you with information on crawl errors, but in many cases they limit the data. As SEOs, we should use every source of data that is available; after all, there is only one source of data that you can truly rely on: your own.
 
LOGS DON’T LIE!
 
And for your viewing pleasure, here’s a bonus clip for reading the whole post.
 


Back to the Future: Forecasting Your Organic Traffic

Posted by Dan Peskin

This post was originally in YouMoz, and was promoted to the main blog because it provides great value and interest to our community. The author’s views are entirely his or her own and may not reflect the views of SEOmoz, Inc.

Great Scott! I am finally back again for another spectacularly lengthy post, rich with wonderful titles and, this time, statistical goodness. It just so happens that, in my past short-lived career, I was a Forecast Analyst (not this kind). So today, class, we will be learning about the importance of forecasting organic traffic and how you can get started. Let’s begin our journey.

I just put this here because it looks really cool.

Forecasting is Your Density. I Mean, Your Destiny

Why should I forecast? Besides the obvious answer (it’s f-ing cool to predict the future), there are a number of benefits for both you and your company.

Forecasting adds value in both an agency and in-house setting. It provides a more accurate way to set goals and plan for the future, which can be applied to client projects, internal projects, or overall team/dept. strategy.

Forecasting creates accountability for your team. It allows you to continually set goals based on projections and monitor performance through forecast accuracy (Keep in mind that exceeding goals is not necessarily a good thing, which is why forecast accuracy is important. We will discuss this more later).

Forecasting teaches you about inefficiencies in your team, process, and strategy. The more you segment your forecast, the deeper you can dive into finding the root of the inaccuracies in your projections. And the more granular you get, the more accurate your forecast, so you will see that segmentation is a function of accuracy (assuming you continually work to improve it).

Forecasting is money. This is the most important concept of forecasting, and probably the point where you decided that you will read the rest of this article.

The fact that you can improve inefficiencies in your process and strategy through forecasting means you can effectively increase ROI. Every hour and resource allocated to a strategy that doesn’t deliver results can be reallocated to something that proves to be a more stable source of increased organic traffic. So finding out which strategies consistently deliver the results you expect means you’re investing money into resources that have a higher probability of delivering a larger ROI.

Furthermore, providing accurate projections, whether it’s to a CFO, manager, or client, gives the reviewer a more compelling reason to invest in the work that backs the forecast. Basically, if you want a bigger budget to work with, forecast the potential outcome of that bigger budget and sell it. Sell it well.

Okay. Flux Capacitor, Fluxing. Forecast, Forecasting?

Contraption that I have no clue what it does

I am going to make the assumption that everyone’s DeLorean is in the shop, so how do we forecast our organic traffic?

There are four main factors to account for in an organic traffic forecast: historical trends, growth, seasonality, and events. Historical data is always the best place to start when creating your forecast. You will want to have as many historical data points as possible, but the accuracy of the data should come first.

Determining the Accuracy of the Data

Once you have your historical data set, start analyzing it for outliers. An outlier to a forecast is what Biff is to George McFly: something you need to punch in the face and then make it wash your car 20 years in the future. Well, something like that.

The quick way to find outliers is to simply graph your data and look for spikes in the graph. Each spike is associated with a data point, which is your outlier, whether it spikes up or down. This way does leave room for error, as the determination of outliers is based on your judgement and not statistical significance.

The long way is much more fun and requires a bit of math. I’ll provide some formula refreshers along the way.

Calculating the mean and the standard deviation of your historical data is the first step.

Mean

Formula for finding the mean

Standard Deviation
 

Standard Deviation Formula

Looking at the standard deviation can immediately tell you whether you have outliers or not. The standard deviation tells you how close your data falls near the average or mean, so the lower the standard deviation, the closer the data points are to each other.

You can go a step further and set a rule by calculating the coefficient of variation (COV). As a general rule, if your COV is less than 1, the variance in your data is low and there is a good probability that you don’t need to adjust any data points.

Coefficient of Variation (COV)

Coefficient of Variation Formula
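The formula images don’t reproduce here, so for reference, these are the standard definitions, with x_i as your monthly data points, n the number of months, and s the sample standard deviation:

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
\qquad
\mathrm{COV} = \frac{s}{\bar{x}}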

If all the signs point to you having significant outliers, you will now need to determine which data points those are. A simple way to do this is to calculate how many standard deviations away from the mean each data point is.

Unfortunately, there is no clear-cut rule for qualifying an outlier by its deviation from the mean, because every data set is distributed differently. However, I would suggest starting with any data point that is more than one standard deviation from the mean.

Making your decision about whether outliers exist takes time and practice. These general rules of thumb can help you figure it out, but it really relies on your ability to interpret the data and understand how each data point affects your forecast. You have inside knowledge about your website; your equations and graphs don’t. So put that to use and start making adjustments to your data accordingly.
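As a minimal sketch of the "long way" described above, assuming your adjusted monthly visits are in a plain Python list (the numbers here are made up):

from statistics import mean, stdev

visits = [98000, 102000, 97500, 150000, 101000, 99000]  # example monthly organic visits

avg = mean(visits)
sd = stdev(visits)   # sample standard deviation
cov = sd / avg       # coefficient of variation

# Flag anything more than one standard deviation from the mean as a candidate outlier
candidates = [v for v in visits if abs(v - avg) > sd]
print(round(avg), round(sd), round(cov, 2), candidates)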

Adjusting Outliers

Ask yourself one question: should we account for this spike? Having spikes or outliers is normal; whether you need to do anything about them is what you should be asking yourself now. You want to use that inside knowledge of yours to determine why the spike occurred, whether it will happen again, and ultimately whether it should be accounted for in your future forecast.

Organic Search Traffic Graph

In the case that you don’t want to account for an outlier, you will need to accurately adjust it down or up to the number it would have been without the event that caused the anomaly.

For example, let’s say you launched a super-original infographic about the Olympics in July last year that brought your site an additional 2,000 visits that month. You may not want to account for this, as it will not be a recurring event, or maybe it fails to bring qualified organic traffic to the site (if the infographic traffic doesn’t convert, then your revenue forecast will be inaccurate). So the resulting action would be to adjust the July data point down by 2,000 visits.

On the flip side, what if your retail electronics website has a huge positive spike in November due to Black Friday? You should expect that rise in traffic to continue this November and account for it in your forecast. The resulting action here is to simply leave the outlier alone and let the forecast do its business. (This is also an example of seasonality, which I will talk about more later.)

Base Forecast

When creating your forecast, you want to create a base for it before you start incorporating additional factors. The base forecast is usually a flat forecast or a line straight down the middle of your charted data. In terms of numbers, this can be as simple as using the mean for every data point. The line down the middle of the data follows the trend of the graph, so it is the equivalent of the average but accounting for slope too. Excel provides a formula that actually does this for you:

=FORECAST(x, known_y's, known_x's)

Given the historical data, Excel will output a forecast based on that data and the slope from the starting point to the end point. Depending on your data, your base forecast could be where you stop, or where you begin developing a more accurate forecast.
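As a concrete (hypothetical) layout: with month numbers 1 through 24 in A2:A25 and your adjusted monthly traffic in B2:B25, the base projection for month 25 would be:

=FORECAST(25, B2:B25, A2:A25)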

Now how do you improve your forecast? It’s a simple idea: account for anything and everything the data might not be able to account for. You don’t need to go overboard here; I would draw the line well before you start forecasting the decrease in productivity on Fridays due to beer o’clock. I suggest accounting for three key factors, and accounting for them well: growth, seasonality, and events.

Growth

You have to have growth. If you aren’t planning to grow anytime soon, then this is going to be a really depressing forecast. Including growth can be as simple as adding 5% month over month based on a higher-level estimate from management, or as detailed as estimating incremental search traffic by keyword from significant ranking increases. Either way, the important part is being able to back your estimates with good data and knowing where to look for it. With organic traffic, growth can come from a number of sources, but here are a couple of key components to consider:

Are you launching new products?

New product being built by Doc Brown

New products mean new pages, and depending on your domain’s authority and your internal linking structure, you can see an influx of organic traffic. If you have analyzed the performance of newly launched pages, you should be able to estimate, on average, what percentage of search traffic from relevant and target keywords they can bring over time.

Using Google Webmaster Tools CTR data and the AdWords tool for search volume is your best bet for acquiring the data you need to estimate this. You can then apply this estimate to search volumes for the keywords that are relevant to each new product page and determine the additional growth in organic traffic that new product lines will bring.

Tip: Make sure to consider your link building strategies when analyzing past product page data. If you built links to these pages over the analyzed time period, then you should plan on doing the same for the new product pages.

What ongoing SEO efforts are increasing?

Did you get a link building budget increase? Are you retargeting several key pages on your website? These things can easily be factored in, as long as you have consistent data to back them up. Consistency in strategy is truly an asset, especially in the SEO world. With the frequency of algorithm updates, people tend to shift strategies fairly quickly. However, if you are consistent, you can quantify the results of your strategy, use that to improve it, and understand its effects on the applied domain.

The general idea here is that if you know historically the effect of certain actions on a domain, then you can predict how relative changes to the domain will affect the future (given there are no drastic algorithm updates).

Let’s take a simple example. Say you build 10 links per month to a domain, and the average Page Authority is 30 and Domain Authority is 50 for the targeted pages and domain when you start. Over time, you see organic traffic increase by 20% for the pages you targeted in this campaign. So if your budget increases and allows you to apply the same campaign to other pages on the website, you can estimate an increase in organic traffic of 20% for those pages.

This example assumes the new target pages have:

  • Target keywords with similar search volumes
  • Similar authority prior to the campaign start
  • Similar existing traffic and ranking metrics
  • Similar competition

While this may be a lot to assume, it serves the purpose of the example. These are the things that need to be considered, and these are the types of campaigns that should be invested in from an SEO standpoint. When you find a strategy that works, repeat it and control the factors as much as possible. This will provide an outcome that is the least likely to diverge from expected results.

Seasonality

To incorporate seasonality into an organic traffic forecast, you will need to create seasonal indices for each month of the year. A seasonal index describes how that month’s expected value relates to the average expected value. So in this case, it would be how each month’s organic traffic compares with the average (mean) monthly organic traffic.

So let’s say your average organic traffic is 100,000 visitors per month and your adjusted traffic for last November was 150,000 visitors; your index for November is then 1.5. In your forecast, you simply multiply the corresponding month by this weight.

To calculate these seasonal indices, you need data of course. Using adjusted historical data is the best solution, if you know that it reflects the seasonality of the website’s traffic well.
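Here is a rough Python sketch of the index calculation, assuming two years of adjusted monthly visits in calendar order (the numbers are placeholders):

# 24 months of adjusted organic visits, January of year one through December of year two
monthly = [100000, 95000, 105000, 110000, 98000, 97000, 120000, 115000, 102000, 108000, 150000, 140000,
           104000, 99000, 108000, 115000, 101000, 100000, 125000, 118000, 106000, 112000, 155000, 146000]

overall_mean = sum(monthly) / len(monthly)

# Seasonal index for each calendar month = that month's average across years / overall mean
indices = []
for month in range(12):
    month_values = monthly[month::12]   # e.g. both Januaries
    indices.append(sum(month_values) / len(month_values) / overall_mean)

# Apply the indices to a flat base forecast of, say, 115,000 visits per month
seasonal_forecast = [115000 * idx for idx in indices]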

Remember all that seasonal search volume data the AdWords tool provides? It can actually be put to practical use! So if you haven’t already, you should probably get with the times and download the AdWords API Excel plugin from SEOgadget (if you have API access). This can make gathering seasonal data for a large set of keywords quick and easy.

What you can do here is gather data for all the keywords that drive your organic traffic, aggregate it, and see if the trends in search align with the seasonality you are observing in your adjusted historical data. If there is a major discrepancy between the two, you may need to dig deeper into why, or shy away from accounting for it in your forecast.

Events

This one should be straightforward. If you have big events coming up, find a way to estimate their impact on your organic traffic. Events can be anything from a yearly sale, to a big piece of content being pushed out, or a planned feature on a big media site.

All you have to do here is determine the expected increase in traffic from each event you have planned. This all goes back to digging into your historical data. What typically happens when you have a sale? What’s the change in traffic when you launch a huge content piece? If you can get an estimate of this, just add it to the corresponding month when the event will take place.

Once you have this covered, you should have the last piece to a good looking forecast. Now it’s time to put it to the test.

Forecast Accuracy

So you have looked into your crystal ball and finally made your predictions, but what do you do now? Well the process of forecasting is a cycle and you now need to measure the accuracy of your predictions. Once you have the actuals to compare to your forecast, you can measure your forecast accuracy and use this to determine whether your current forecasting model is working.

There is a basic formula you can use to compare your forecast to your actual results, which is the mean absolute percent error (MAPE):

MAPE formula

This formula requires you to calculate the mean of the absolute percent error for each time period, giving you your forecast accuracy for the total given forecast period.
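Reconstructed from that description, the standard form is, with A_t the actual and F_t the forecast for period t over n periods:

\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right|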

Additionally, you will want to analyze your forecast accuracy for a single period if your overall forecast accuracy is low. Looking at the percent error month to month will allow you to pinpoint where the largest error in your forecast is and help you determine the root of the problem.

Keep in mind that accuracy is crucial if organic traffic is a major source of product revenue for your business. This is where exceeding expectations can be a bad thing: if you exceed the forecast, this can result in stock-outs on products and a loss of potential revenue.

Consider the typical online consumer: do you think they will wait to purchase your product on your site if they can find it somewhere else? Online shoppers want immediate results, so making sure you can fulfil their order makes for better customer service and fewer bounces on product pages (which can affect rank, as we know).

Google Results for Vizio 19in

Walmart Vizio TV
 

Top result for this query is out of stock, which will not help maintain that position in the long term.

Now, this doesn’t mean you should over-forecast. There is a price to pay on both ends of the spectrum. Inflating your forecast means you could be bringing in excess inventory as it ties to product expectations. This can add unnecessary inventory expenses, such as increased storage costs, and tie up cash flow until the excess product is shipped. And depending on product life cycles, continuing this practice can lead to an abundance of obsolete product and huge financial problems.

So once you have measured your forecast against actuals and considered the above, you can repeat the process more accurately and refine your forecast! Well, this concludes our crash course in forecasting and how to apply it to organic traffic. So what are you waiting for? Start forecasting!

Oh and here is a little treat to get you started.

Are you telling me you built a time machine…in Excel?

Well no, Excel can’t help you time travel, but it can help you forecast. The way I see it, if you’re gonna build a forecast in Excel, why not do it in style?

I decided that your brain has probably gone to mush by now, so I am going to help you on your way to forecasting until the end of days. I am providing a stylish little Excel template that has several features, but I warn you, it doesn’t do all the work.

It’s nothing too spectacular, but this template will put you on your way to analyzing your historical data and building your forecast. Forecasting isn’t an exact science, so naturally you need to do some work and make the call on what needs to be added to or subtracted from the data.

What this Excel template provides:

  • The ability to plug in the last two years of monthly organic traffic data and see a number of statistical calculations that will allow you to quickly analyze your historical data.
  • The frequency distribution of your data.
  • Highlighting of the data points that are more than a standard deviation from the mean.
  • Some of the metrics we discussed (mean, growth rate, standard deviation, etc.).

Oh wait there’s more?

The expression on your face right now.

Yes. Yes. Yes. This simple tool will graph your historical and forecast data, provide you with a base forecast, and give you a place to easily add anything you need to account for in the forecast. Lastly, for those who don’t have revenue data tied to Analytics, it provides a place to add your AOV and average conversion rate to estimate future organic revenue as well. Now go have some fun with it.

________________________________________________________________________________________

Obviously, we can’t cover everything you need to know about forecasting in a single blog post, from either a strategic or a mathematical standpoint. So let me know what you think, what I missed, or whether there are any points or tools that you think are applicable for the typical marketer to add to their skillset and spend some time learning.




Personalization and SEO – Whiteboard Friday

Posted by randfish

Personalization usage data and user data give marketers deep insights into their users’ interests and actions. But how can you make the most out of these complex data sets to better serve your SEO campaigns?

In this week’s Whiteboard Friday, Rand takes us through the intricate world of personalization and how it affects SEO. We’d love to hear your thoughts and tips in the comments below! 



Video Transcription

“Howdy, SEOmoz fans. Welcome to another edition of Whiteboard Friday. This week I’m wearing a hoodie and a T-shirt, so it must be informal. I want to take you in a casual fashion into the topic of personalization user data and usage data, and these are complex topics. This Whiteboard Friday will not be able to cover all of the different areas that user and usage data and personalization touch on. But what I do hope to do is expose you to some of these ideas, give you some actionable insights, and then allow you guys to take some of those things away, and we can point to some other references. There are lots of folks who have done a good job in the search world of digging in deep on some of these other topics.
Let’s start by talking about some of the direct impacts that personalization usage data have. Of course, by personalization usage data I mean the areas where Google is showing you or other users specific things based on your usage activities, where they are leveraging usage data, broad usage data, for many users to come up with different changes to these types of search results, and where they’re leveraging user personalization on a macro level, taking the aggregate of those things and creating new types of results, re-ranking things and adding snippets. I’ll talk about each of those.
In these direct impacts, one of the most important ones to think about is location awareness. This is particularly important obviously if you’re serving a local area, but you should be aware that location biases a lot of searches that may not have intended to be local simply by virtue of their geography. If you’re at a point, if I’m here in downtown Seattle, there is location awareness that affects the results ordering. I can perform searches, for example for Coffee Works, and I will get these Seattle Coffee Works results.
Perhaps if I was in Portland, Oregon and they had a Coffee Works in Portland, I would be getting those Coffee Works results. Usage history also gives Google hints about your location, meaning that even if you’re searching on your smartphone or searching on your laptop, and you said, “Don’t share my location,” Google and Bing will still try to figure this out, and they’ll try to figure it out by looking at your search history. They’ll say to themselves, “Hey, it looks like this user has previously done searches for Madison Markets, Seattle Trader Joe’s, used our maps to get directions from Capitol Hill to Queen Anne. I can guess, based on that usage data, that you are in Seattle, and I will try and give you personalized results that essentially are tied to the location where I think you’re at.”
A fascinating example of this is I was searching on my desktop computer last night, which I have not made it location aware specifically, but I did a search for a particular arena in Dublin, which is where the DMX Conference, that I’m going to in a couple days and speaking at, is going to be held. Then I started typing in the name of the hotel I was at, and it’s a brand name hotel. What do you know? That location came up, the Dublin location of the brand hotel, even though that hotel has locations all over the world. How do they know? They know because I just performed a search that was related to Dublin, Ireland, and therefore they’re thinking, oh yeah, that’s probably where he’s looking for this hotel information as well. Very, very smart usage history based personalization.
Do be aware search suggest is also affected directly by personalization types of results. If you are doing a search that is going to be biased by some element of personalization, either your search history or your location, those kinds of things, auto-suggest will come up with those same biases as the rankings might.
Next, I want to talk about the semantics of how you perform queries and what you’re seeking can affect your search as well. Search history is an important bias here, right? Basically, if I’ve been doing searches for jewelry, gemstones, wedding rings, those kinds of things, and I do a search for ruby, Google and Bing are pretty smart. They can realize, based on that history, that I probably mean ruby the stone, not Ruby the programming language. Likewise, if I’ve just done searches for Python, Pearl and Java, they might interpret that to mean, “Aha, this person is most likely, when they’re searching for Ruby, looking for the programming language.” This makes it very hard if you’re a software engineer who’s trying to look for gemstones, by the way. As you know, the ruby gem is not just a gem. It’s also part of the programming protocol.
This gets very interesting. Even seemingly unrelated searches and behavior can modify the results, and I think this is Google showing their strength in pattern matching and machine learning. They essentially have interpreted, for example, as disparate things as me performing searches around the SEO world and them interpreting that to mean that I’m a technical person, and therefore as I do searches related to Ruby or Python, they don’t think the snake or the gemstone. They think the programming language Python or the programming language Ruby, which is pretty interesting, connecting up what is essentially a marketing discipline, SEO a technical marketing discipline, and connecting up those programming languages. Very, very interesting. That can modify your results as well.
Your social connections. So social connections was a page that existed on Google until last year. In my opinion, it was a very important page and a frustrating page that they’ve now removed. The social connections page would show, based on the account you were inside of, all your contacts and how Google connected you to them and how they might influence your search results.
For example, it would say randfish@gmail.com, which is my Gmail account that I don't actually use, is connected to Danny Sullivan because Rand has emailed Danny Sullivan on that account, and therefore we have these accounts that Danny Sullivan has connected to Google in one way or another. In fact, his Facebook account and several other accounts were connected through his Quora account because Quora OAuths into those, and Google has an agreement, an auth system, with Quora. You could see, wow, Google is exposing things that Danny Sullivan has shared on Facebook to me, not directly through Facebook, but through this protocol that they've got with Quora. That's fascinating. Those social connections can influence the content you're seeing and the rankings where you see those things. So you may have never seen them before, they may have changed the rankings themselves, and they can also influence the snippets that you're seeing.
For example, when I see something that Danny Sullivan has Plus One'd or shared on Google+, or I see something that Dharmesh Shah, for example, has shared on Twitter, it will actually say, "Your friend, Dharmesh, shared this," or "Your friend, Danny Sullivan, shared this," or "Danny Sullivan shared this." Then you can hover on that person and see some contact information about them. So those are fascinating ways that social connections are being used.
Big takeaways here: if you are a business and you're thinking about doing marketing and SEO, you have to be aware that these changes are taking place. It's not productive or valuable to get frustrated that not everyone is seeing the same auto-suggest results, or the same results in the same order. You just have to be aware that, hey, if we're going to be in a location, that location could be biasing things for us or against us, especially if you're not there or if something else is taking your place.
If people are performing searches related to topics that might have more than one meaning, you have to make sure that you're well tapped into your audience: that they're aware of your products, that you're getting more content out there that they might be searching for, and that you're building a bigger brand. Those things will certainly help. A lot of the offline branding kinds of things actually help considerably with this type of stuff.
Of course, there are social connections: making sure that your audience is sharing so that the audience connected to them sees your content, even if they're not your direct customers. This is why social media strategy is so much about reaching not just the people who might buy from you, but all the people who might influence them. Remember that social connections will influence results in this way. Right now, Google+ is the most powerful and direct way to do this, but there are certainly others as well, as the now-removed social connections page helped show us.
What about some indirect impacts? There are actually a few of these that are worth mentioning as well. One indirect impact that I think is very important is that you can see re-ranking of results, not just based on your own usage; this may happen, not for certain, but may happen based on patterns that the engines detect. If they're seeing that a large number of people are suddenly switching away from searching for ruby the gemstone to Ruby the language, they might bias this by saying, "You know what, by default, we're going to show more results, or more results higher up, about Ruby the programming language."
If they’re seeing, boy a lot of people in a lot of geographies, not just Seattle, when they perform a Coffee Works search, are actually looking for Seattle Coffee Works, because that brand has built itself up so strongly, you know what, we’re going to start showing the Seattle Coffee Works location over the other ones because of the pattern matching that we’re seeing. That pattern matching can be a very powerful thing, which is another great reason to build a great brand, have a lot of users, and get a lot of people around your product, your services, and your company.
Social shares: this is particularly what we've heard from the search engines, and Bing's been a little more transparent about this than Google has. What Bing has basically said is that with social shares, the trustworthiness, the quality, and the quantity of those shares may impact the rankings, too. This is not just on an individual basis. So they're not just saying, "Oh well, Danny Sullivan shared this thing with Rand, and so now we're going to show it to Rand." They're saying, "Boy, lots of people shared this particular result around this topic. Maybe we should be ranking that higher even though it doesn't have the classic signals." Those might be things like keywords, links, anchor text, and the other signals they're using in the ranking algorithm. They might say, "Hey, the social shares are such a powerful element here, and we're seeing so much of a pattern around this, that we're going to start re-ranking results based on that." Another great reason to get involved in social, even if you're just doing SEO.
Auto-suggest can be your friend. It can also be your enemy. When you do a search today, and Elijah and I just tried this, for "whiteboard" followed by a space, they will fill in some suggestions for you: paint, online, information. Then I did the same search on my phone, and what do you think? Whiteboard Friday was the second or third suggestion there, meaning they've seen that I've done searches around SEOmoz before and around SEO in general. So they're thinking, "Aha. You, Rand, you're a person who probably is interested in Whiteboard Friday, even though you haven't done that search before on this particular phone." I got a new phone recently.
That usage data and personalization is affecting how auto-suggest, or search suggest, is working. Auto-suggest, by the way, is also location aware and location biased. For example, if you were to perform this search, "whiteboard" plus a space, in Seattle, you probably would have a higher likelihood of getting Friday than in, let's say, Hong Kong, where Whiteboard Friday is not as popular generally. I know we have Hong Kong fans, and I appreciate you guys, of course. But those types of search suggestions are based on the searches that are performed in a local region, and to the degree that Google or Bing can do it, they will bias those based on that, so you should be aware.
For example, lots and lots of people in a particular location can shift the suggestions. I have done this at conferences; it's actually really fun to ask the audience, "Hey, would everyone please perform this particular search," and then you look the next day, and that's the suggested search even though it hadn't been performed previously. They're looking at, "Oh, this is trending in this particular region." This was at a blogging conference in Portland, Oregon, where I tried this, and it was really fun to see the next day that those suggestions were popping up in that fashion.
Search queries. The search queries that you perform, and not just the ones that you perform but search queries as a whole, in an indirect, amalgamated, pattern-matching way, may also be used to form those topic models and co-occurrences or brand associations that we've discussed before, which can have an impact on how search results work and how SEO works. Meaning that if lots of people start connecting up the phrase SEOmoz with SEO, or SEOmoz with inbound marketing, or those kinds of things, you might well see that Google actually ranks pages on that domain, on SEOmoz's domain, higher for those keywords because they've built an association.
Search queries, along with content, are one of the big ways that they put those topics together and try to figure out, “Oh yeah, look, it seems like people have a strong association with GE and washer/dryers, or with Leica and cameras or with the Gap and clothing.” Therefore, when people perform those types of searches, we might want to surface those brands more frequently. You can see this in particular when you perform a lot of ecommerce-related searches and particular brands come up. If you do a search for outdoor clothing and things like Columbia Sportswear and REI and those types of brands are popping up as a suggestion, you get a strong sense of the types of connections that Google might build based on these things.
All right, everyone. I hope you've enjoyed this edition of Whiteboard Friday. I hope you have lots of great comments, and I would love to jump in there with you with suggestions on how you can dig deeper. We will see you again next week.

Video transcription by Speechpad.com




Announcing the Just-Discovered Links Report

Posted by The_Tela

Hey everyone, I’m Tela. I head up data planning at SEOmoz, working on our indexes, our Mozscape API, and other really fun technical and data-focused products. This is actually my first post on the blog, and I get to announce a brand new feature – fun!

One of the challenges inbound marketers face is knowing when a new link has surfaced. Today, we’re thrilled to announce a new feature in Open Site Explorer that helps you discover new links within an hour of them going up on the web: the Just-Discovered Links report.

This report helps you capitalize on links while they're still fresh, see how your content is resonating through social channels, gauge the overall sentiment of the links being shared, get a head start on instant outreach campaigns, and scope out which links your competitors are getting. Just-Discovered Links is in beta, and you can find it in Open Site Explorer as a new tab on the right. Ready to learn more? Let's go!

What is the Just-Discovered Links report?

This report is driven by a new SEOmoz index that is independent from the Mozscape index, and is populated with URLs that are shared on Twitter. This means that if you would like to have a URL included in the index, just tweet it through any Twitter account.

One note: the crawlers respect robots.txt and politeness rules, which may prevent some URLs from being indexed. Also, we won't index URLs that return a 500 status code.

[Image: search results]

Who is it for?

Our toolsets and data sources are expanding to support a wider set of inbound marketing activities, but we designed Just-Discovered Links with link builders in mind.

Getting started

You can search Just-Discovered Links through the main search box on Open Site Explorer. Enter a domain, subdomain, or specific URL just as you would when using the Inbound Links report. Then select the Just-Discovered Links beta tab. The report gives PRO members up to 10,000 links with anchor text and the destination URL, as well as Domain Authority and Page Authority metrics.

One important note on Page Authority: we will generally not have a Page Authority score available for new URLs, and will show [No data] in this case. So, when you see [No data], it generally indicates a link on a new page.

You can also filter the results using many of the same filter drop-downs you're used to in other Open Site Explorer reports, including followed and no-followed links, 301s, internal or external links, and links to specific pages or subdomains. Note: we recommend you start searches with the default "pages on this root domain" query and refine your search from there.

How does it work?

When a link is tweeted, we crawl that URL within minutes. We also crawl all of the links on the tweeted page. These URLs, their anchor text, and their metadata (such as nofollow, redirect, and more) are stored and indexed. It may take up to an hour for links to be retrieved, crawled, and indexed.

We were able to build this feature rapidly by reusing much of the technology stack from Fresh Web Explorer. The indexes and implementation are a little different, but the underlying technology is the same. Dan Lecocq, the lead engineer on both projects, recently wrote an excellent post explaining the crawling and indexing infrastructure we use for Fresh Web Explorer.

There are a few notable differences: we don’t use a crawl scheduler because we just index tweeted URLs as they come in. That’s how we are able to include URLs quickly. Also, unlike Fresh Web Explorer, the Just-Discovered Links report is focused exclusively on anchor text and URLs, so we don’t do any de-chroming as that would mean excluding some links that could be valuable.
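
To make that flow a little more concrete, here is a minimal sketch of a scheduler-less, tweet-driven crawl loop. This is purely illustrative and not Moz's actual pipeline; the function names, the "rogerbot" user agent in the robots.txt check, the flat list standing in for the index, and the requests/BeautifulSoup dependencies are all assumptions.

```python
# Illustrative sketch only -- not Moz's production pipeline.
# Assumes tweeted URLs arrive one at a time and are indexed immediately,
# with no crawl scheduler in between.
import urllib.robotparser
from urllib.parse import urljoin
from datetime import datetime, timezone

import requests
from bs4 import BeautifulSoup


def allowed_by_robots(url, user_agent="rogerbot"):
    """Check robots.txt before fetching (politeness delays omitted for brevity)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(urljoin(url, "/robots.txt"))
    try:
        parser.read()
    except OSError:
        return True  # simplification: treat an unreachable robots.txt as allow
    return parser.can_fetch(user_agent, url)


def crawl_tweeted_url(url, index):
    """Fetch a tweeted URL, extract its outlinks and anchor text, and index them."""
    if not allowed_by_robots(url):
        return
    response = requests.get(url, timeout=10)
    if response.status_code >= 500:
        return  # server errors (e.g., a 500 status code) are not indexed
    soup = BeautifulSoup(response.text, "html.parser")
    for anchor in soup.find_all("a", href=True):
        index.append({
            "source_url": url,
            "target_url": urljoin(url, anchor["href"]),
            "anchor_text": anchor.get_text(strip=True),
            "nofollow": "nofollow" in (anchor.get("rel") or []),
            "date_crawled_utc": datetime.now(timezone.utc).isoformat(),
        })


# Usage: tweeted URLs are fed straight into the crawler as they come in.
index = []
crawl_tweeted_url("https://example.com/some-tweeted-page", index)
```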

How is it different?

Freshness

Freshness of data continues to be a top priority when we design new products. We have traditionally released indexes on the timeframe of weeks. With this report, we have a new link index that is updated in about an hour. From weeks to an hour – wow! We’ll be providing additional details in the future on what this means.

URL coverage

This index includes valuable links that may be high-quality and topically relevant to your site or specific URL but are new, and thus have a low Page Authority score. This means they may not be included in the Mozscape index until they have been established and earned their own links. With this new index, we expect to uncover high-quality links significantly faster than they would appear in Mozscape.

I want to clarify that we are not injecting URLs from the Just-Discovered Links report into our Mozscape index. We will be able to do this in the future, but we want to gather customer feedback and understand usage before connecting these two indexes. So for now, the indexes are completely separate.

How big is the index?

We have seeded the index and are adding new URLs as they are shared, but we don't yet have a full 30 days' worth of data in the index. We project that the index will include between 250 million and 300 million URLs when full. We keep adding data, and we'll be at full capacity within the next week.

How long will URLs stay in the index?

We are keeping URLs in the index for 30 days. After that, URLs will fall out of the index and not appear in the Just-Discovered Links report. However, you can tweet the URL and it will be included again.
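
As a rough illustration of that retention rule, the sketch below captures the behavior: a URL stays eligible for the report for 30 days after it was crawled, and tweeting it again produces a fresh crawl that effectively resets the clock. The field name is an assumption, not the actual index logic.

```python
# Sketch of the 30-day retention rule; field names are illustrative only.
from datetime import datetime, timedelta, timezone

RETENTION_WINDOW = timedelta(days=30)

def still_in_report(date_crawled_utc: datetime) -> bool:
    """A crawled URL stays in the Just-Discovered Links report for 30 days."""
    return datetime.now(timezone.utc) - date_crawled_utc < RETENTION_WINDOW
```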

How long does it take to index a URL?

We are able to crawl and include URLs in the live index within an hour of being shared on Twitter. You may see URLs appear in the report more quickly, but generally you can expect it to take about an hour.

Why did you choose Twitter as a data source?

About 10% of tweets include URLs, and many Twitter users share links as a primary activity. However, we would like to include other data sources that are of value. I’d love to hear from folks in the comments below on data sources they would like to see us consider for inclusion in this report.

How much data can I get?

The Just-Discovered Links report has the same usage limits as the Inbound Links report in Open Site Explorer. PRO customers can retrieve 10,000 results per day, community members can get 20 results, and guests can see the first five results.

What is “UTC” in the Date Crawled column?

We report time in UTC, or Coordinated Universal Time. This format will be familiar to our European customers, but might be less familiar to customers in the States. UTC is ahead of US time zones such as Eastern Standard Time, so US customers will see links where the timestamp appears to be in the future, but this is really just a time zone difference. We can discover links quickly, but we can't predict links before they happen. Yet, anyways 🙂
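
If you'd rather read those timestamps in your own time zone, a quick conversion like the sketch below does the trick; the example timestamp string and its format are assumptions, so adjust them to match what you actually see in the report or CSV.

```python
# Convert a UTC "Date Crawled" value to the local time zone.
# The example timestamp and its format are assumptions about the export.
from datetime import datetime, timezone

crawled_utc = datetime.strptime("2013-05-08 18:45:00", "%Y-%m-%d %H:%M:%S")
crawled_utc = crawled_utc.replace(tzinfo=timezone.utc)

crawled_local = crawled_utc.astimezone()  # converts to the system's local time zone
print(crawled_local.strftime("%Y-%m-%d %H:%M %Z"))
```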

CSV export

You can export a CSV with the results of your Just-Discovered Links report search. The CSV export is limited to 5,000 links for now. We plan to increase this to 10,000 rows of data in the near future, but we need to re-tool some of Open Site Explorer's data storage infrastructure before we can offer a larger export, and we don't have an exact ETA for this addition quite yet.

[Image: export search results]

This is a beta release

We wanted to roll this out quickly so we can gather feedback from our customers on how they use this data and on the overall feature set. We have a survey where you can make suggestions for improving the feature and leave feedback. However, please keep in mind that this is a beta when deciding how to use this data as part of your workflow; we may make changes to the report based on the feedback we get.

Top four ways to use Just-Discovered Links

Quick outreach is critical for link building. The Just-Discovered Links report helps you find link opportunities within a short time of their being shared, increasing the likelihood that you'll be able to earn short-term link-building wins and build relationships with long-term value. Here are four ways to use the recency of these links to help your SEO efforts:

  1. Link building: Download the CSV and sort by anchor text to focus on the keywords you're interested in (see the CSV sketch after this list). Are there any no-followed links you could get switched to followed? Sort new links by Domain Authority to prioritize your efforts.
  2. Competitor research: See links to your competitors as they stream in. Filter out internal links to understand their link building strategy. See where they are getting followed and no-followed links. You can also identify low-quality link sources that you may want to avoid. Filter by internal links for your competitors to identify issues with their information architecture. Are lots of their shared links 301s? Are they no-following internal links on a regular basis?
  3. Your broken links: The CSV export shows the HTTP status code for links. Use this to find 404 links to your site and reach out to get the links changed to a working URL.
  4. Competitor broken links: Find broken links going to your competitors’ sites. Reach out and have them link to your site instead.
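
Here is a rough sketch of what that CSV workflow might look like in Python with pandas. The column names used below ("Anchor Text", "Domain Authority", "Followed", "HTTP Status Code") and the example keyword are assumptions about the export, so check them against your actual file before running anything.

```python
# Rough sketch of working the Just-Discovered Links CSV export with pandas.
# Column names are assumptions -- verify them against your actual export.
import pandas as pd

links = pd.read_csv("just_discovered_links.csv")

# 1. Link building: focus on a target keyword, highest-authority sources first.
keyword_links = links[links["Anchor Text"].str.contains("seo tools", case=False, na=False)]
keyword_links = keyword_links.sort_values("Domain Authority", ascending=False)

# No-followed links that might be worth asking to have switched to followed.
nofollowed = keyword_links[keyword_links["Followed"] == False]

# 3. Broken links: 404s pointing at your site that are worth outreach.
broken = links[links["HTTP Status Code"] == 404]

print(keyword_links.head(10))
print(nofollowed.head(10))
print(broken.head(10))
```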

[Image: what you can do with Just-Discovered Links]

Ready to find some links?

We’ve been releasing new versions of our Mozscape index about every two weeks. An index that is continuously updated within an hour is new for us, too, and we’re still learning how this can make a positive impact on your workflow. Just as with the release of Fresh Web Explorer, we would love to get feedback from you on how you use this report, as well as any issues that you uncover so we can address them quickly.

The report is live and ready to use now. Head on over to Open Site Explorer’s new Just-Discovered Links tab and get started!


