GoFuckYourself.com - Adult Webmaster Forum

09-17-2002, 11:10 PM   #1
DrewKole
Confirmed User
 
Join Date: Aug 2001
Posts: 5,193
Having Google NOT spider you? How?

Basically, I've got a ton of mirrors of 1) TGP galleries and 2) AVS sites (the same site mirrored to different AVSs)...

Does anyone know a definite way to keep Google from spidering or indexing a page?

I'd like to have all but one of my mirrors not spidered, since Google apparently doesn't like mirrors on the same domain (it considers them spam), and who the hell knows where Googlebot might find one of my mirrors.

Any hints on how to get away with mirroring pages without using multiple domains for the same fuckin' site? =)
09-17-2002, 11:15 PM   #2
BJ
Confirmed User
 
Join Date: Mar 2002
Location: asia
Posts: 5,590
How should I request that Google not crawl part or all of my site?

The standard for robot exclusion given at http://www.robotstxt.org/wc/norobots.html provides for a file called robots.txt that you can put on your server to exclude Googlebot and other web crawlers. (Googlebot has a user-agent of "Googlebot".)

Googlebot also understands some extensions to the robots.txt standard. Disallow patterns may include * to match any sequence of characters, and patterns may end in $ to indicate the end of a name. For example, to prevent Googlebot from crawling files that end in .gif, you may use the following robots.txt entry:

User-Agent: Googlebot
Disallow: /*.gif$
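
To make the `*` and `$` extensions concrete, here is a small Python sketch that turns such a pattern into a regular expression and tests a few paths against `/*.gif$`. It is an approximation written for illustration, not Google's own matcher: it ignores percent-encoding and says nothing about which rule wins when several match. The helper names `pattern_to_regex` and `matches` are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern using Googlebot's extensions
    ('*' = any run of characters, trailing '$' = end of the path) into a regex."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    # Escape every literal piece, then rejoin the pieces with '.*' where '*' stood.
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    # Without '$' a rule is a prefix match, so anything may follow it.
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern, path):
    # re.match anchors at the start of the path, like a robots.txt prefix.
    return pattern_to_regex(pattern).match(path) is not None


rule = "/*.gif$"
for path in ["/images/logo.gif", "/logo.gif?size=2", "/gallery/index.html"]:
    print(path, "blocked" if matches(rule, path) else "not blocked by this rule")
```

Here `/images/logo.gif` comes out blocked, while `/logo.gif?size=2` does not, because `$` requires the path to end right after `.gif`.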

Please note that Googlebot does not interpret a 401/403 response ("Unauthorized"/"Forbidden") to a robots.txt fetch as a request not to crawl any pages on the site. To prevent Googlebot and other web crawlers from crawling any page on your site, you may use the following robots.txt entry:

User-Agent: *
Disallow: /
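
One way to sanity-check a "block everything" file before uploading it is Python's standard `urllib.robotparser` module, sketched below with `example.com` and the bot name `SomeOtherBot` as placeholders. Keep in mind that this parser is not Googlebot: it does plain prefix matching without the `*`/`$` extensions, and its `read()` method treats a 401/403 on robots.txt as "disallow all", the opposite of the Googlebot behavior described above.

```python
from urllib.robotparser import RobotFileParser

# The "block everything" file from the post above.
rules = """\
User-Agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse in-memory lines instead of fetching over HTTP

for agent in ["Googlebot", "SomeOtherBot"]:
    for url in ["http://example.com/", "http://example.com/tgp/gallery1.html"]:
        print(agent, url, parser.can_fetch(agent, url))
```

Every combination prints `False`, since `Disallow: /` is a prefix of every path.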

Please note also that each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, to allow all file types to be served via http but only .html pages via https, the robots.txt file for the http protocol (http://yourserver.com/robots.txt) would be:

User-Agent: *
Allow: /

The robots.txt file for the https protocol (https://yourserver.com/robots.txt) would be:

User-Agent: *
Disallow: /
Allow: /*.html$
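
The https file only works if the more specific `Allow: /*.html$` overrides the blanket `Disallow: /`. The post doesn't spell out that precedence; the Python sketch below assumes the rule Google documents today (the longest matching pattern wins, and Allow wins a tie). Python's standard `urllib.robotparser` applies the first matching line and has no wildcard support, so it would block every https page here, which is why the matcher is hand-rolled. `ROBOTS_BY_SCHEME`, `to_regex` and `allowed` are hypothetical names, and the rule lists skip `User-Agent` grouping for brevity.

```python
import re

# Hypothetical per-protocol robots.txt rules, mirroring the post above:
# http serves everything, https only .html pages.
ROBOTS_BY_SCHEME = {
    "http": ["Allow: /"],
    "https": ["Disallow: /", "Allow: /*.html$"],
}


def to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' pins the end of the path.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def allowed(rules, path):
    """Longest matching pattern wins; on a tie, Allow beats Disallow.
    If no rule matches, the path is allowed."""
    best = None  # (pattern length, is_allow) of the winning rule so far
    for line in rules:
        field, _, value = line.partition(":")
        value = value.strip()
        is_allow = field.strip().lower() == "allow"
        if value and to_regex(value).match(path):
            candidate = (len(value), is_allow)  # True > False, so Allow wins ties
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


for scheme in ("http", "https"):
    for path in ("/index.html", "/movies/clip.wmv"):
        print(f"{scheme}://yourserver.com{path}", allowed(ROBOTS_BY_SCHEME[scheme], path))
```

Under these assumptions both paths are allowed over http, while over https only `/index.html` is.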

Another standard which is more convenient for page-by-page use involves adding a <META> tag to an HTML page to tell robots not to index the page or not to follow the links it contains. This standard is described at http://www.robotstxt.org/wc/exclusion.html. You may also want to read what the HTML standard has to say about these tags. Remember that changing your server's robots.txt file or changing the <META> tags on its pages will not cause an immediate change in the results that Google returns, since your changes must propagate to Google's next index of the web before being reflected in Google search results.
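
To see what a robot actually picks up from such a tag, here is a minimal Python sketch built on the standard `html.parser` module. It collects the directives from any `<meta name="robots">` tag, case-insensitively; it is an illustration, not how Googlebot is implemented, and it ignores bot-specific meta names. `RobotsMetaReader` and `robots_directives` are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <META NAME=...> works too.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html):
    reader = RobotsMetaReader()
    reader.feed(html)
    reader.close()
    return reader.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW"></head><body>...</body></html>'
print(sorted(robots_directives(page)))  # ['follow', 'noindex']
```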
09-17-2002, 11:31 PM   #3
XM
Confirmed User
 
Join Date: Jan 2001
Location: SVK
Posts: 406
Or use meta tags directly in a particular page:
Don't index, but follow links:
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">

Don't index, and don't follow links either:
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">

Don't forget the angle brackets around each tag; this f**king VB code kept stripping them out of my post...

XM

Last edited by XM; 09-17-2002 at 11:36 PM..
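
Applied to the original question (one indexable copy, the rest hidden), a page generator could emit the tag per mirror along these lines. This is a hypothetical Python sketch: `robots_meta_for` and the `example.com` URLs are made up, and whether NOINDEX alone keeps Google from treating same-domain mirrors as spam is not something this thread establishes. The output belongs inside each page's `<head>`.

```python
def robots_meta_for(page_url, primary_url):
    """Hypothetical helper: leave the primary copy indexable and mark every
    other mirror NOINDEX,FOLLOW, so robots skip the duplicate but still
    follow its links."""
    content = "INDEX,FOLLOW" if page_url == primary_url else "NOINDEX,FOLLOW"
    return f'<META NAME="ROBOTS" CONTENT="{content}">'


# Hypothetical mirror URLs on one domain; the first is the copy to keep indexed.
mirrors = [
    "http://example.com/tgp/gallery1/",
    "http://example.com/tgp/gallery1-b/",
    "http://example.com/tgp/gallery1-c/",
]
primary = mirrors[0]
for url in mirrors:
    print(url, robots_meta_for(url, primary))
```

Using FOLLOW rather than NOFOLLOW on the mirrors is a judgment call: it lets robots keep crawling whatever the mirrors link to.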
03-21-2003, 04:33 AM   #4
Pornwolf
Drunk and Unruly
 
Join Date: Jan 2002
Location: Hollywood
Posts: 22,712
Meta Tags 101. Take notes guys.
03-21-2003, 05:18 AM   #5
Jer
God is Brazilian
 
Join Date: Feb 2001
Location: Brazil
Posts: 10,601