#1 |
Confirmed User
Join Date: Aug 2001
Posts: 5,193
|
Having google NOT spider you? How?
Basically, I've got a ton of mirrors of 1) TGP galleries and 2) AVS sites, the same site mirrored to different AVSs...
Does anyone know a definite way of having Google not spider or index a page? I'd like to have all but one of my mirrors not spidered, since Google apparently doesn't like mirrors on the same domain (it considers them spam), and who the hell knows where the Googlebot might find one of my mirrors. Any hints on how to get away with mirroring pages without using multiple domains for the same fuckin site? =)
#2 |
Confirmed User
Join Date: Mar 2002
Location: asia
Posts: 5,590
|
How should I request that Google not crawl part or all of my site?
The standard for robot exclusion given at http://www.robotstxt.org/wc/norobots.html provides for a file called robots.txt that you can put on your server to exclude Googlebot and other web crawlers. (Googlebot has a user-agent of "Googlebot".)

Googlebot also understands some extensions to the robots.txt standard. Disallow patterns may include * to match any sequence of characters, and patterns may end in $ to indicate the end of a name. For example, to prevent Googlebot from crawling files that end in .gif, you may use the following robots.txt entry:

    User-Agent: Googlebot
    Disallow: /*.gif$

Please note that Googlebot does not interpret a 401/403 response ("Unauthorized"/"Forbidden") to a robots.txt fetch as a request not to crawl any pages on the site.

To prevent Googlebot and other web crawlers from crawling any page on your site, you may use the following robots.txt entry:

    User-Agent: *
    Disallow: /

Please note also that each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, if you wanted to allow all filetypes to be served via http but only .html pages to be served via https, the robots.txt file for the http protocol (http://yourserver.com/robots.txt) would be:

    User-Agent: *
    Allow: /

The robots.txt file for the https protocol (https://yourserver.com/robots.txt) would be:

    User-Agent: *
    Disallow: /
    Allow: /*.html$

Another standard, which is more convenient for page-by-page use, involves adding a <META> tag to an HTML page to tell robots not to index the page or not to follow the links it contains. This standard is described at http://www.robotstxt.org/wc/exclusion.html. You may also want to read what the HTML standard has to say about these tags.

Remember that changing your server's robots.txt file or changing the <META> tags on its pages will not cause an immediate change in the results that Google returns, since your changes must propagate to Google's next index of the web before being reflected in Google search results.
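For the mirror situation from the first post, one way to sanity-check a robots.txt before uploading it is to run it through Python's standard `urllib.robotparser`. This is only a rough sketch: the `/mirror1/`, `/mirror2/` and `/mirror3/` directories and the helper names are made up for illustration, and the rules use plain path prefixes because, as far as I know, the standard-library parser does simple prefix matching and does not implement Googlebot's `*` and `$` extensions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: /mirror1/ is the copy that should be indexed,
# /mirror2/ and /mirror3/ are duplicates to hide from Googlebot only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /mirror2/
Disallow: /mirror3/

User-agent: *
Disallow:
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser, user_agent, path):
    """Return True if the rules let `user_agent` fetch `path`."""
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for page in ("/mirror1/index.html", "/mirror2/index.html"):
        print(page, may_crawl(rules, "Googlebot", page))
```

This only tells you how a well-behaved parser reads the file; it says nothing about when Google will actually pick up the change.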
#3 |
Confirmed User
Join Date: Jan 2001
Location: SVK
Posts: 406
|
Or use meta tags directly in the particular page:

Don't index, but follow links:
    META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW"

Don't index, don't follow links either:
    META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"

Don't forget to enclose each tag in angle brackets; this f**king VB code doesn't allow me to post them...
XM
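If you have a pile of mirrors, it may help to check which pages actually carry the tag. Below is a minimal sketch using Python's standard `html.parser`; the class name, function name and the sample page are hypothetical, and it only looks at `META NAME="ROBOTS"`, not at crawler-specific variants.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any META NAME="ROBOTS" tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values keep their case.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical mirror page carrying the first tag from the post above.
MIRROR_PAGE = (
    '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">'
    "<title>Mirror</title></head><body></body></html>"
)

if __name__ == "__main__":
    print(sorted(robots_directives(MIRROR_PAGE)))
```

A mirror you want kept out of the index should report `noindex`; the one copy you want listed should report nothing or `index`.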
#4 |
Drunk and Unruly
Join Date: Jan 2002
Location: Hollywood
Posts: 22,712
|
Meta Tags 101. Take notes, guys.
__________________
I've trusted my sites to them for over a decade... Webair, bitches. |
#5 |
God is Brazilian
Join Date: Feb 2001
Location: Brazil
Posts: 10,601
|