Old 02-01-2004, 06:31 AM  
DVTimes
Join Date: Jun 2003
Location: UK
Posts: 31,547
Google introduced the language limit in April 2000 with eleven languages and expanded it to 24 in Aug. 2000. Russian was added in July 2001, Arabic and Turkish in Nov. 2001, and Catalan, Croatian, Indonesian, Serbian, Slovak, and Slovenian in early 2002, giving the following 34 language limit options. These are available on the Advanced Search page and the Language Tools page.

Arabic
Bulgarian
Catalan
Chinese (Simplified & Traditional)
Croatian
Czech
Danish
Dutch
English
Estonian
Finnish
French
German
Greek
Hebrew
Hungarian
Icelandic
Indonesian
Italian
Japanese
Korean
Latvian
Lithuanian
Norwegian
Polish
Portuguese
Romanian
Russian
Serbian
Slovak
Slovenian
Spanish
Swedish
Turkish
To choose more than one language at a time, use the preferences page, which also lets you pick which of 14 languages the surrounding text will be displayed in.

In May 2000, a family filter was added which tries to exclude adult Web pages. Turn it on from the preferences page.

The Advanced Search offers a domain limit, which can be used either to restrict results to a specified domain or to exclude results from a specified domain.

Stop Words:
Google does ignore frequent words. Its documentation mentions terms such as 'the', 'of', 'and', and 'or'. However, it also notes that these can be searched by putting + in front of them. As of March 2000, 'the' was a stop word that could not be searched even with the + sign. But by 2002, 'the' could be searched with the plus. Be sure to only place the + in front of stop words. If a + is placed in front of a non-stop word in the same query, all + signs will be ignored. As of Nov. 2001, stop words within a phrase no longer require a + sign and will automatically be searched. Also, if only stop words are entered even without phrase markings, they will be searched.
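
The + rules above are easy to get wrong, so here is a minimal sketch of them in Python. It is only a model of the behavior described in this post, not Google's actual code. The stop list holds just the four words named above (the real list is not given), the function name effective_terms is made up for illustration, and phrase handling is left out.

```python
# Hypothetical stop list: only the four words named in the post.
STOP_WORDS = {"the", "of", "and", "or"}

def effective_terms(query):
    """Return the terms that would be searched under the rules described above."""
    tokens = query.lower().split()
    bare = [t.lstrip("+") for t in tokens]

    # A + placed in front of a non-stop word causes every + to be ignored.
    plus_valid = all(
        t.lstrip("+") in STOP_WORDS for t in tokens if t.startswith("+")
    )

    # A query made up only of stop words is searched as typed.
    if all(word in STOP_WORDS for word in bare):
        return bare

    kept = []
    for token, word in zip(tokens, bare):
        if word not in STOP_WORDS:
            kept.append(word)          # ordinary words are always searched
        elif token.startswith("+") and plus_valid:
            kept.append(word)          # stop word forced in with a valid +
    return kept
```

For example, effective_terms("+the who") keeps both words, while effective_terms("+the +who") drops 'the' because the + in front of the non-stop word 'who' cancels all the + signs.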

Sorting:
Results are sorted by relevance, which is determined by Google's PageRank analysis. PageRank is based on links from other pages, with greater weight given to links from authoritative sites. Pages are also clustered by site. Only two pages per site will be displayed, with the second indented. Others are available via the [ More results from . . . ] link. If the search finds fewer than 1,000 results when clustered with two pages per site and you page forward to the last page, the following message will appear after the last record:


In order to show you the most relevant results, we have omitted some entries very similar to the 63 already displayed. If you like, you can repeat the search with the omitted results included.
Clicking the "repeat the search" option will bring up more pages, some of which are near or exact duplicates of pages already found, while others are pages that were clustered under a site listing. However, clicking on that link will not necessarily retrieve all results that have been clustered under a site. You can also just add &filter=0 to the end of a search results URL. To see all results available on Google, you need to check under each site cluster as well as use the "repeat the search" option.
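
If you would rather script the &filter=0 trick than edit URLs by hand, something like this should work. It is a rough sketch: only the filter=0 parameter comes from the text above, while the function name with_omitted_results and the example URL with its q parameter are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_omitted_results(results_url):
    """Return the results URL with filter=0 set, replacing any existing filter value."""
    parts = urlsplit(results_url)
    # Keep every existing parameter except a previous filter setting.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "filter"
    ]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))

# Example URL is an assumption, used only to show the shape of the result.
print(with_omitted_results("http://www.google.com/search?q=search+engines"))
```

Dropping any old filter value first avoids ending up with two conflicting filter parameters in the same URL.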

Display:
The display includes the title, URL, a brief extract showing text near the search terms, the file size, and for many hits, a link to a cached copy of the page. This cached copy is from Google's index and may be older than the version currently available on the Web. The cached copy will display highlighted search terms. If more than one search term is used, each has a different color highlighting. The default output is 10 hits per screen, but the searcher can also choose 20, 30, 50, or 100 hits at a time on the preferences page. In June 1999, numeric relevance scores and "phrase match" or "partial phrase match" indicators were removed. In Sept. 1999, the graphic relevancy bar with its link to a link: search was removed. At the same time, a GoogleScout link was added. GoogleScout is now just labeled as "Similar pages" and finds other pages similar in linkage patterns to the displayed hit. In April 2000, Google started clustering results by site. Formerly, all hits from the same site would be listed indented under the first. As of April 2000, only the first two hits are displayed (with the second one indented) and the rest are available under a
[ More results from hostname ]
link.
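
To make the site clustering concrete, here is a small Python sketch of the display rule described above: two hits shown per site, the second indented, the rest counted under a "More results" link. This is a guess at the logic, not Google's implementation. In particular, placing each cluster at the position of the site's top-ranked hit is an assumption, and the function name cluster_by_site and the example URLs are invented.

```python
from collections import OrderedDict
from urllib.parse import urlsplit

def cluster_by_site(ranked_urls, shown_per_site=2):
    """Group a ranked list of URLs by hostname, showing at most two per site."""
    clusters = OrderedDict()
    for url in ranked_urls:
        host = urlsplit(url).hostname
        # Assumption: a cluster sits where the site's best-ranked hit appeared.
        clusters.setdefault(host, []).append(url)

    display = []
    for host, urls in clusters.items():
        shown = urls[:shown_per_site]
        display.append({
            "host": host,
            "first": shown[0],                # listed normally
            "indented": shown[1:],            # second hit, shown indented
            "more": len(urls) - len(shown),   # hidden behind "More results from host"
        })
    return display

hits = [
    "http://a.example/1",
    "http://b.example/x",
    "http://a.example/2",
    "http://a.example/3",
]
for entry in cluster_by_site(hits):
    print(entry)
```

With the four example hits, a.example shows two pages with one more hidden behind the link, and b.example shows a single page.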

With the addition of non-HTML files in 2001, Google added two notes to the display to identify those files. Before the title in the first line of the display, [PDF] or [PS] or [XLS] is used to denote the different file format. On some, a second line of the display lists
File Format: PDF/Adobe Acrobat - View as Text.

Around Aug. 2001, Google started refreshing the indexing of certain pages (those with daily updates) more frequently than the rest of the database. These were marked with "Fresh!" after the URL and size. In Dec. 2001, this tag was changed to list the indexing date. As of Feb. 2002, 3 million pages were being refreshed on an almost daily basis.

Unique: Google was the first general search engine to provide access to pages as they were at the time they were indexed, designated as "cached" pages. For alternative sources of cached pages, see the archives page. Google is also the only search engine that searches for some punctuation characters. As of Sept. 2003, it would search for the ampersand & and the underscore _ characters by themselves or as part of a character string. In other words, a search on adv_search gets different results than "adv search" and &tc differs from tc. While it would not search # or + in most cases, it does differentiate c#, c++, c+, and c. It does not, however, differentiate c*, c+@, or c+-, interpreting c* as c and both c+- and c+@ as c+. (These c+ type strings are all various programming languages.) Other punctuation marks may change the sorting of results.