06-10-2009, 06:51 PM
FightThisPatent
Confirmed User
 
Join Date: Aug 2003
Location: Austin, TX
Posts: 4,090
Quote:
Originally Posted by Pleasurepays

even more interesting is how it was determined that google does not/cannot detect simple word-replacement schemes from dictionary databases across what could potentially be thousands of domains?
google does have a team of PhDs who look at how pieces of data relate to each other: fuzzy logic, relationships, etc.

so you can take the words from a sentence and compute vectors to represent those phrases. when you then search for a phrasing (allowing for some variation in the sentence structure), the result can be found quickly, rather than doing the slow, old-tech method of keyword/phrase searching through multiple terabytes of data.
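here's a rough python sketch of the general idea -- not google's actual system, just an illustration of how a phrase vector plus a similarity score catches simple word swaps that exact keyword matching would miss:

```python
# minimal sketch: represent each sentence as a bag-of-words vector and
# compare with cosine similarity. reordered sentences score 1.0, and a
# few dictionary-swapped words still leave a high score -- which is what
# makes simple word-replacement "spinning" detectable.
import math
from collections import Counter

def vectorize(text):
    """lowercased word counts -- a crude stand-in for a real phrase vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

original = "the quick brown fox jumps over the lazy dog"
spun     = "the fast brown fox jumps over the lazy canine"  # dictionary swaps

print(cosine(vectorize(original), vectorize(spun)))  # ~0.82, clearly related
```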

so i do believe google does this... and given that they are scanning/OCRing literary texts at libraries, that gives them even more data to work with.

by looking at the pagerank weighting of websites (which includes the linking relationships) and then overlaying the content to find duplicates, they can decide which website is the authority, the originator, or the more credible source for the text, and give that site the higher ranking. to do so, they need to demote/penalize the duplicates, forcing them down in the rankings so the true originals bubble up.
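here's a hedged sketch of that two-step idea (toy pagerank, then demote the duplicate copies) -- the sites are made up, this is just to show the mechanics:

```python
# toy pagerank by power iteration, then: among pages known to carry the
# same text, the highest-ranked copy is kept as the "authority" and the
# rest get demoted so the original bubbles up.
def pagerank(links, damping=0.85, iters=50):
    """links: {page: [pages it links to]} -> {page: rank}."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

links = {"a.com": ["b.com", "c.com"], "b.com": ["a.com"], "c.com": ["a.com"]}
rank = pagerank(links)

duplicates = ["a.com", "c.com"]            # pages carrying the same text
authority = max(duplicates, key=rank.get)  # highest-ranked copy wins
demoted = [d for d in duplicates if d != authority]
print(authority, demoted)                  # a.com ['c.com']
```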

google should be using this brain trust to wipe out the SEO garbage websites, so that only relevant results are left.

bing seems to be doing this.

i know that M$ has people reviewing websites to categorize them as porn, mainstream, etc.... the idea is to take that human data, combine it with computing, and filter out the garbage, making search results more relevant.
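here's a rough sketch of what "human data combined with computing" could look like -- the reviewed pages and words are made up, and any real system is obviously far more sophisticated:

```python
# learn which words show up in pages human reviewers flagged as garbage,
# then score unreviewed pages automatically.
from collections import Counter

def word_scores(labeled):
    """labeled: [(text, is_garbage)] -> smoothed per-word garbage ratios."""
    garbage, clean = Counter(), Counter()
    for text, is_garbage in labeled:
        (garbage if is_garbage else clean).update(text.lower().split())
    return {w: (garbage[w] + 1) / (garbage[w] + clean[w] + 2)
            for w in set(garbage) | set(clean)}

def garbage_score(text, scores):
    words = text.lower().split()
    return sum(scores.get(w, 0.5) for w in words) / max(len(words), 1)

human_reviewed = [
    ("buy cheap viagra casino pills", True),             # flagged garbage
    ("independent review of hosting providers", False),  # flagged clean
]
scores = word_scores(human_reviewed)
print(garbage_score("cheap casino pills here", scores))  # ~0.63 > 0.5 neutral
```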

that's where the wikipedia founder is trying to create a human-verified search engine (kinda like dmoz but without the ego trips).

i see the days of free traffic from google diminishing and the SEO people getting weeded out, all for the sake of improving quality.

also, google is a domain registrar, so they have access to domain registration data. if you have loaded up a lot of websites on the same IP, or under the same account, they can use that information as part of the cross-referencing... which is why SEO gamers keep multiple domain accounts at multiple registrars and host on multiple webhosts to try to outsmart google.
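a sketch of that cross-referencing, with made-up registration records -- group domains by hosting IP and by registrant account, and big clusters start to look like one operator running a network:

```python
from collections import defaultdict

# hypothetical (domain, registrant_account, hosting_ip) records
records = [
    ("site1.com", "acct-42", "10.0.0.5"),
    ("site2.com", "acct-42", "10.0.0.5"),
    ("site3.com", "acct-42", "10.0.0.5"),
    ("other.com", "acct-77", "10.0.0.9"),
]

by_key = defaultdict(set)
for domain, account, ip in records:
    by_key[("account", account)].add(domain)
    by_key[("ip", ip)].add(domain)

THRESHOLD = 3  # arbitrary cutoff for this sketch
for key, domains in by_key.items():
    if len(domains) >= THRESHOLD:
        print(key, "looks like one network:", sorted(domains))
```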


Fight the hypothetical theorizing!
__________________

http://www.t3report.com (where's the traffic?) v5.0 is out! | http://www.FightThePatent.com | ICQ 52741957