Q & A thread: March 27, 2006

by Matt Cutts on March 28, 2006

in Google/SEO

Okay, let’s try tackling a few questions from the Grab bag thread. Just a hint for next time: if your question takes three paragraphs to ask, your odds of getting an answer go down.

Q: “Is Bigdaddy fully deployed?”
A: Yes, I believe every data center now has the Bigdaddy upgrade in software infrastructure, as of this weekend.

Q: “What’s the story on the Mozilla Googlebot? Is that what Bigdaddy sends out?”
A: Yes, I believe so. You will probably see less crawling by the older Googlebot, which has a User-Agent of “Googlebot/2.1 (+www.google.com/bot.html)”. I believe crawling from the Bigdaddy infrastructure uses a new User-Agent: “Mozilla/5.0 (compatible; Googlebot/2.1; +www.google.com/bot.html)”.
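For anyone filtering server logs, both user-agent strings share the `Googlebot/2.1` token, so a single pattern can match either. A minimal sketch (this only checks what the string claims; it does not verify the request really came from Google):

```python
import re

# Both the classic and the Mozilla-compatible Bigdaddy user-agent
# strings contain the token "Googlebot/2.1".
GOOGLEBOT_RE = re.compile(r"Googlebot/2\.1")

def claims_to_be_googlebot(user_agent: str) -> bool:
    """Return True if a user-agent string claims to be Googlebot 2.1."""
    return bool(GOOGLEBOT_RE.search(user_agent))

old_ua = "Googlebot/2.1 (+www.google.com/bot.html)"
new_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +www.google.com/bot.html)"
print(claims_to_be_googlebot(old_ua), claims_to_be_googlebot(new_ua))  # True True
```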

Q: “Do you take Emmy with you to San Francisco?”
A: Nope, Emmy is a true indoors cat; she doesn’t like to travel.

Q: “Any new word on sites that were showing more supplemental results?”
A: An additional crawling change to show more pages from those sites was checked in late last week, but it may still take a little bit of time (another few days) for that to show up in the index. I’ll keep an eye on sites that people have given as examples to see how those sites are showing up.

Q: “Is the RK parameter turned off, or should we expect to see it again?”
A: I wouldn’t expect to see the RK parameter have a non-zero value again.

Q: “What’s an RK parameter?”
A: It’s a parameter that you could see in a Google toolbar query. Some people outside of Google had speculated that it was live PageRank, that PageRank differed between Bigdaddy and the older infrastructure, etc.

Q: “Now that Bigdaddy is out, will there be a new export of PageRank anytime soon?” and “Will the deployment of BigDaddy stabilise the rolling PR issues we are experiencing at present?”
A: I’ll ask around about that. If there aren’t any logistical obstacles, I’ll ask if we could make a new set of PageRanks visible within the next couple weeks. I’d expect that as Bigdaddy stabilizes everywhere, the variation in toolbar PR for individual urls is more likely to settle down too.

Q: “This datacentre 64.233.185.104/ works differently to all of the others. Noticed just a few hours ago. . . . . Where does that DC fit into the scheme of things? Is it mainly made from newly spidered data?”
A: Sharp eyes, g1smd. That wouldn’t surprise me. As Bigdaddy cools down, that frees us up to do new/other things.

Q: “Not so much a question… GET A PSP!”
A: I got one today, TallTroll. I picked up Me and My Katamari (MAMK) and a PSP that turned out to have firmware v1.52 on it. So I could upgrade to 2.0, then downgrade to 1.5 so I could run homebrew programs. But I think MAMK requires firmware 2.5 or 2.6 to play, which means a one-way upgrade or maybe using RunUMD or a similar program. Suffice it to say I’m having fun just geeking around.

Q: “Can you give us a general way of getting a good idea in front of Google?”
A: If it’s bizdev, there’s a bizdev dept. at Google you could contact. If it’s not a business/patent/proprietary idea, I’d mention it here or blog about it somewhere. Writing a snail mail letter could work well too.

Q: “Did you check out the guys all painted in silver doing the robot on milk crates in San Fran?”
A: Nope, that’s down by Fisherman’s Wharf. We’re hanging near Union Square.

Q: “Why do you focus your attention so much on SEOs and not at webmasters who make actual quality websites?”
A: I think that’s an issue I have personally, because I spend so much of my time looking at spam. Lots of other people focus on helping general webmasters, like the Sitemaps team, for example. I have started to do “SEO Advice” posts instead of just “SEO Mistakes” posts, but you’re right: I personally could use a reminder to keep focusing on the sites that make quality content and how to pull those sites up, not just how to counter sites that cheat. Thanks for bringing that up.

Q: “My sitemap has about 1350 urls in it. . . . . its been around for 2+ years, but I cannot seem to get all the pages indexed. Am I missing something here?”
A: One of the classic crawling strategies that Google has used is the amount of PageRank on your pages. So just because your site has been around for a couple years (or that you submit a sitemap), that doesn’t mean that we’ll automatically crawl every page on your site. In general, getting good quality links would probably help us know to crawl your site more deeply. You might also want to look at the remaining unindexed urls; do they have a ton of parameters (we typically prefer urls with 1-2 parameters)? Is there a robots.txt? Is it possible to reach the unindexed urls easily by following static text links (no Flash, JavaScript, AJAX, cookies, frames, etc. in the way)? That’s what I would recommend looking at.
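One quick way to audit the “ton of parameters” point is to count the query-string parameters on each unindexed URL. A small sketch (the example.com URLs are made up for illustration):

```python
from urllib.parse import urlparse, parse_qsl

def parameter_count(url: str) -> int:
    """Count the query-string parameters in a URL."""
    return len(parse_qsl(urlparse(url).query))

# Per the advice above, 1-2 parameters is the comfortable zone;
# flag anything beyond that for a closer look.
for url in ["http://example.com/page?id=3",
            "http://example.com/page?id=3&sort=asc&sess=ab12&ref=home"]:
    print(url, "->", parameter_count(url), "parameters")
```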

Q: “When I change a robots.txt to exclude more existing files from being crawled, how long does it take for them to be removed from the index? Perhaps the answer is a function of how often the site is crawled and it’s PR?”
A: It is a function of how often the site is crawled. I believe in the past that every several hundred page fetches or several days, the bot would re-check the robots.txt. Note that for supplemental results, you need recrawling to happen by the supplemental Googlebot in order for the robots.txt file to take effect on those pages. If you’re really sure you never want those pages to be seen, you can use our url removal tool to remove urls for six months at a time. But I’d be very careful with the url removal tool unless you’re an expert. If you make a mistake and (for example) remove your entire site, that’s your responsibility. Google can sometimes clear out self-removals, but we don’t guarantee it.
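Before waiting out a recrawl, it’s worth confirming locally that your new robots.txt rules actually block what you intend. A sketch using Python’s standard-library robots.txt parser (the domain and paths here are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt body directly instead of fetching it over the network.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Check what a crawler identifying as Googlebot would be allowed to fetch.
print(rp.can_fetch("Googlebot", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/public/page.html"))   # True
```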

Q: “I would love to be able to search for html code and see how that ranks.”
A: I would like that too. Indexing non-visible things like punctuation, JavaScript, and HTML would be great, but it would also bulk up the size of the index. Any time you’re considering a new feature (e.g. our numrange search), you have to trade off how much the index would get bigger versus the utility of the feature. My guess is that we wouldn’t offer this any time soon.

Q: “Seriously, How do you plan on picking which of these questions to answer?”
A: I’m tackling the ones that looked interesting, short, and general enough that more than one person would be interested.

Q: “I am seeing a lot of sites with “%09” (tab) and “%20” (space) in front of the URL in Google’s index.”
A: I’ll ask someone about that.

Q: (paraphrasing) The sitemaps validation fetch seems to happen with a User-Agent of “-”? My auto-reject rules reject that user agent.
A: I’ll ask someone about that. You could whitelist the IP range that Googlebot comes from in the meantime.

Q: “If one were to offer to sell space on their site (or consider purchasing it on another), would it be a good idea to offer to add a NOFOLLOW tag so to generate the traffic from the advertisement, but not have the appearence of artificial PR manipulation through purchasing of links?”
A: Yes, if you sell links, you should mark them with the nofollow tag. Not doing so can affect your reputation in Google.
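The markup change itself is just a rel="nofollow" attribute on the sold link’s anchor tag. A naive sketch that tags anchors lacking any rel attribute (the markup is illustrative, and a real implementation should use an HTML parser rather than a regex):

```python
import re

def add_nofollow(html: str) -> str:
    """Naively add rel="nofollow" to <a> tags that have no rel attribute."""
    return re.sub(r'<a (?![^>]*\brel=)', '<a rel="nofollow" ', html)

print(add_nofollow('<a href="http://example.com/">sponsor</a>'))
# <a rel="nofollow" href="http://example.com/">sponsor</a>
```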

Q: “On sites directed to international audiences with the same (high quality) content in several languages is it better to do several TLDs like mydomain.com, mydomain.de, mydomain.fr, mydomain.eu and so on or do subdomains like en.mydomain.eu, de.mydomain.eu, fr.mydomain.eu or something else like mydomain.com/en, mydomain.com/de, mydomain.com/fr?”
A: Good question. If you’ve only got a small number of pages, I might start out with subdomains, e.g. de.mydomain.eu or de.mydomain.com. Once you develop a substantial presence or number of pages in each language, that’s where it often makes sense to start developing separate domains.

Q: “Any results on why IDN Domains don’t show pagerank?”
A: I’ve seen a couple that do, but I’ll check into why most don’t. My guess is that there’s a normalization issue somewhere in the toolbar PageRank pathway.

Q: “Would it be possible to add a date range to queries? I might get 91,000,000 results, but the first 200 are 2-3 years old. I would like to limit results to items no more than 6-12 months old.”
A: Check out our advanced search page for this option. Tara Calashain also did some really interesting digging into this too, e.g. this info she uncovered. Google Hacks is a pretty solid book if you’d like to read more fun Google hacks.

Q: “What about the problem of directories and shopping comparison spam overriding real pages?”
A: Fair feedback. I heard that recently from a Googler, too. Sometimes we think of spam as strictly things like hidden text, cloaking, etc. But users think of spam as noise: things that they don’t want. If they’re trying to get information, fix a problem, read reviews, etc., then sites like that aren’t as helpful.

Q: “Are you planning to visit/speak in the UK at all in the near future?”
A: Sadly not. I’m hitting the Boston Pubcon and SES San Jose, but I can only do 4-5 conferences a year.

Q: “The one thing that seems to be getting to people generally, is what are the post Big Daddy intentions? Fixes, spam issues, regeneration of ‘pure’ indices, supp. issues, PR and BL update, etc.”
A: I can’t give a timeline (e.g. “scaling up communication in April, more work on canonicalization in May”) because priorities can change, esp. depending on machine issues, deployments of new binaries, webspam developments, etc. Short-term, I wouldn’t be surprised to see some refreshing in supplemental results relatively soon, and potentially different PageRanks visible in the next couple weeks.

Q: “Even Matt is afraid to use a redirect from www.mattcutts.com/ to www.mattcutts.com/blog/ because Google might penalize his website and put it into supplemental hell.”
A: Heh. No, that’s not it. I’m deliberately leaving them separate as a test case to see how we do now and down the road.

Q: “Just like you told me a couple of months ago, the Supplemental Googlebot (SG) got around to my site and things got sorted out. Thanks. . . . . If you are in San Fran and want to check out the Monterey Aquarium, could you please write a short review? I’ve been thinking of visiting and wondering if it is worth the trip.”
A: I would definitely recommend the Monterey Bay Aquarium, especially if you can find a coupon or other good deal. I highly recommend the otters, the kelp forest, and the jellyfish area.

{ 110 comments… read them below or add one }

Stephen March 29, 2006 at 12:35 am

Hey Matt,

That is some pretty impressive posting

I have noted that a couple of sites that I believe had canonical probs have come back – but only sites that have been sent to your engineers.

Not sure if this is a coincidence or if a correction is starting to roll out. If it is a correction then cool – will it hit some sites before others, depending on crawl cycle etc.? If it is an engineer’s intervention, then when would you want reports of these?

Cheers.

Stephen March 29, 2006 at 12:51 am

Oops – just to clarify what I would call a correction for these sites.

EG: Site:domain.com – domain.com is first.

domain.com as a phrase – domain.com is first

etc – e.g. the homepage returns to its true value – the rest of the site seems to follow

OWG March 29, 2006 at 1:03 am

Matt, some great answers there, thanks.

This will help put to bed some of the crap that floats around about the Google mystique LOL.

I know that the Supplemental hell and the Lack of deep crawling are especially important to some people

TallTroll March 29, 2006 at 1:15 am

I’m pretty sure that MAMK only requires firmware 2.0 to run, so you should be able to go back and forth as required. You need 2.0 for the browser though – depends how much surfing you want to do. AFAIK, the only game that requires 2.5 is EXIT, so you should be able to wait until a downgrade from the 2+ firmwares is available before going there.

I find that Soulseek, a USB cable and a PSP is a memory-hungry combo though…. need to get a 2Gb card soon

McMohan March 29, 2006 at 1:22 am

Matt, that was a fair amount of time spent on writing answers this night. Thanks.
Apart from addressing supps, canonicals, pagerank re-calculation etc, will there be an imminent change in ranks as a result of these corrections?

jake March 29, 2006 at 1:34 am

Hi Matt,

As part of your review of the supplemental problem, are you also monitoring any sites whose pages have simply vanished (rather than gone supplemental)? I think the BD bug is responsible for both types of errant behaviour – sometimes it just refuses to index tens of thousands of pages, despite crawling them over and over again. That’s what we see anyway. None of the supplemental tweaks have yet made any difference to the missing pages problem.

Henry Elliss March 29, 2006 at 1:43 am

Well well, you can answer questions about Google and SEO very well, but you didn’t answer my “why are there no blue foods in nature” question?! I shan’t be picking you as my phone-a-friend on Millionaire any time soon, Mr Cutts… well, unless they start asking SERP questions in the next few shows!

P.S. Saw a mobile dog-grooming van drive past our office the other day, called “Mutt Cutts” – I had a little chuckle.

Paul Reilly March 29, 2006 at 1:55 am

Cheers for all these answers..

I do have one question though: with so many different sources of PageRank – live PageRank, future PageRank, etc. – what would you suggest we use to see an accurate measurement?

Asle Ommundsen March 29, 2006 at 2:00 am

Please answer this:

From: www.mattcutts.com/blog/miscellaneous-monday-march-27-2006/#comment-19408

«For accessibility purposes, my site has ‘skip navigation’ etc… to allow screen readers to get straight to the content. [..] so I have ‘hidden’ these accessibility links using display:none in the stylesheet. [..] Will Google regard this as hidden text and penalise my site?»

Jeremy March 29, 2006 at 2:19 am

On TLDs and international audiences: When a site is in one language how should it be expressed to Google that it is for a global audience?

For example restaurant reviews and shopping could be seen as local and localised respectively; but product reviews (where the product is available globally), encyclopaedia entries and reference material are more for a global audience.

There are suggestions the site be duplicated at the various TLDs e.g. .com, .co.uk, .ca, .au, etc. But this wastes bandwidth for the site and the google bots, encourages link splitting and can confuse the users.

The geo of the IP doesn’t always work as for example 1and1.co.uk gives out German IP addresses, and many other websites use US hosting for cheaper costs.

Just wondering for a clarification on how this issue should be tackled, as the various Google SERPs are becoming more and more local even if the user is not requesting pages only from their country (google.com vs. google.co.uk, or even it seems google.com used from a US IP vs. google.com used from a UK IP).

P.S. Keep up the good work!

Wayne March 29, 2006 at 2:53 am

Matt thank you for taking the time to answer all these questions. What you are doing here says a lot about your character and commitment to the webmaster community.

I didn’t get to ask a question but let me try now. If I agree to buy you Starbucks every morning could you place my website at the top of the results? Since my new site isn’t ranked yet, one cup per morning is all I can afford.

Harith March 29, 2006 at 3:21 am

Thanks for your time, Matt.

Very generous of you. Much appreciated.

HaHa March 29, 2006 at 3:57 am

Very disappointed no comments on expired domains.
Looks like we will continue to see domains such as
macalstr.edu/
astronomy-national-public-observatory.org/
rarestonemuseum.com
iasicongress2005.org
papyrusinternational.org/
and many others in the adult serps.

Seems like it’s all too hard for the webspam team, and this reflects badly both on Google and the adult internet industry.

Maria March 29, 2006 at 4:03 am

So how long does it take for 301s to take effect across all the DCs? Even Y*hoo and M*N don’t seem to have a problem with it.

Eternal Optimist March 29, 2006 at 4:36 am

Matt, Firstly many thanks for both your time and efforts. I appreciate that you cannot be specific on certain points, due to the nature of privacy at Google.

Is it within your power to explain exactly what the following GoogleBots do? [You already answered 5.0 above] – thanks

crawl-66-249-65-225.googlebot.com
Mozilla/4.0 compatible ZyBorg/1.0
Mozilla/4.0 compatible ZyBorg/1.0 Dead Link Checker
Mozilla/5.0
Googlebot/2.1

Chris Bartow March 29, 2006 at 5:11 am

Thanks for answering these questions! Great information.

The URL Removal Tool has been broken for weeks. For example I’ve tried to remove directory.sysice.com from the index cause I took it down a few months ago, but I just get a Page Not Found when I try to submit it.

301 Redirect Problem March 29, 2006 at 5:12 am

The biggest problem that I’ve seen many worry about here, and that Google is way behind in addressing, is 301 redirects with domain moves from domain1 to domain2, and Matt seems to forever be ignoring this question. Even though it was asked more than 3 or 4 times in the list of questions here and in many other comment posts by viewers, Matt and Google continue to ignore it or give vague answers about how or when Google plans to address this.

Matt can you please once and for all address the question and webmasters concerns of how and when we can expect to see googles / bigdaddy properly handle domain name moves using 301 redirects?

Andrew March 29, 2006 at 5:23 am

One comment that you may not publish but I hope will read… WHAT is going on at blogger? It is google’s worst product by a country mile. Regularly unreliable and I can’t recall a single new feature that has been added since you brought it on board. It is dreadful and if I hadn’t been unfortunate to *start* using it I wouldn’t still be using it. I try and warn everyone away and it makes me sad

Olney March 29, 2006 at 5:43 am

Thank you Matt for taking the time to answer questions or even to look into the IDN Domain issue with the pagerank. These domains will truly advance the international internet experience.

ClickyB March 29, 2006 at 5:50 am

Hi Matt,

Great effort answering so many questions, thank you.

One thing I’m still curious about (so are many others):
[blockquote]A: Yes, if you sell links, you should mark them with the nofollow tag. Not doing so can affect your reputation in Google.[/blockquote]
Does this include linked images?

ClickyB March 29, 2006 at 5:53 am

Damn…. if there are 2 choices I always make the wrong one – lol – sorry about the

blown tags


Kestrel March 29, 2006 at 5:59 am

Hi,

If BD is out now, then how come SERPs are showing pages that haven’t existed for 9 months plus and return 404s?

Cheers,

K

Ronald R March 29, 2006 at 6:04 am

Good job in answering so many questions, and I know you can’t answer every single one. But, it’s a shame you didn’t answer one of the most popular questions, about the loss of pages. Did you not want to answer it, or did you just miss it?

Thanks

Ulysee March 29, 2006 at 6:09 am

No answer……………
It has been three months since spam has taken over the majority of adult search results in Google.

It’s strange to see “somewhat” relevant results one day Dec 26th then Dec 27th just about the whole adult white hat community was wiped out, filter maybe?.

I believe that the adult serp problem is bigger than the supplementals – I just hope that it’s not being ignored.

What I am saying here applies to the entire adult industry in Google not just my little ole site.

Mike (Germany) March 29, 2006 at 6:18 am

========
Q: “Now that Bigdaddy is out, will there be a new export of PageRank anytime soon?” and “Will the deployment of BigDaddy stabilise the rolling PR issues we are experiencing at present?”
A: I’ll ask around about that. If there aren’t any logistical obstacles, I’ll ask if we could make a new set of PageRanks visible within the next couple weeks. I’d expect that as Bigdaddy stabilizes everywhere, the variation in toolbar PR for individual urls is more like to settle down too.
========

Hi Matt,

I think it would be better if PageRank were not visible in the toolbar.

Your fan March 29, 2006 at 6:36 am

Hi, can you post some photos of Emmy? We are cat lovers.

SEO Swede March 29, 2006 at 6:37 am

I have reported several sites that use different spamming techniques, but nothing happens. For example, look at this site www.kickoff-konferens.se/rw/ and go to the bottom of the site. They mention Mirror1, mirror2, mirror3 and mirror4. Why doesn’t Google exclude them? It feels like it’s OK to spam in Sweden and get top positions..

// Not so fun being a white hat SEO in Sweden.

Ryan March 29, 2006 at 6:41 am

and this reflects badly both on google and the adult internet industry.

Ahh.. that’s why so many people say bad things about porn… expired domains. Here I was thinking it was some sort of morals issue.

Mike, I agree with that.. Take visible PageRank out of everything. People put way more faith and dependence in it than they should, and it’s still easy to fake.

Give some site a PR higher than 4 and they instantly think they’re worth millions and have hit the big time.

JohnMu March 29, 2006 at 7:25 am

>You could whitelist the IP range that Googlebot comes from in the mean time.
Do you have a listing of all the Googlebot IP addresses?
Thanks.

Victor March 29, 2006 at 7:36 am

Q: “What about the problem of directories and shopping comparison spam overriding real pages?”

A: Fair feedback. I heard that recently from a Googler, too. Sometimes we think of spam as strictly things like hidden text, cloaking, etc. But users think of spam as noise: things that they don’t want. If they’re trying to get information, fix a problem, read reviews, etc., then sites like that aren’t as helpful.

To balance that feedback: We maintain a niche B2B directory and customer feedback and high listing CTR seems to indicate that a large number of visitors are indeed looking to “buy” products when they type in a product keyword and the directory is indeed relevant.

Google has to make an educated/algorithmic guess about the searcher’s intent (Information or Purchase). If an action keyword complementing the product keyword is not specified in a search, the type of product itself can be used to yield a decent intent relevancy.

SERPs should not be flooded with directories, but there is always bound to be more -ve feedback on directories, since there are a lot more individual site webmasters than there are directories!

Rob L March 29, 2006 at 7:47 am

I have a question about one of your answers.

Yo
