Welcome to the GoFuckYourself.com - Adult Webmaster Forum forums.

Old 07-19-2016, 07:54 AM   #1
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Node.XXX - Porn Search Engine

A little project I have been working on is https://node.xxx

It's a search engine that only indexes adult websites and aggressively deals with spam sites, preventing them from being indexed.

It will also only index canonical sites, so no white labels get into the search index.

At the moment it's a little slow to respond to queries but that will improve as new caching servers are deployed.

It's still got a way to go in development but it is live and the current index is around 100 million pages. It supports complex queries which are documented here.

The search engine is infinitely scalable and while it's currently crawling html, pdf, json, xml, rss, video and images it's only returning text based results right now.

Later this year once I've perfected image search that will be rolled out, with video search to follow around March 2017.

Have a look and let me know what you think at https://node.xxx
Old 07-19-2016, 08:47 AM   #2
Jigster715
So Fucking Banned
 
Industry Role:
Join Date: Jul 2015
Location: elmer blackwood mansion
Posts: 1,459
How do we submit sites for indexing?
Old 07-19-2016, 08:50 AM   #3
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by Jigster715 View Post
How do we submit sites for indexing?
To add a site you need to register for an account and then use the Add Site feature in the user dashboard.
Old 07-19-2016, 08:52 AM   #4
Barry-xlovecam
It's 42
 
Industry Role:
Join Date: Jun 2010
Location: Global
Posts: 18,083
Looks good -- when you get decent traffic hit us up about buying some ads
Old 07-19-2016, 08:52 AM   #5
Jigster715
So Fucking Banned
 
Industry Role:
Join Date: Jul 2015
Location: elmer blackwood mansion
Posts: 1,459
Quote:
Originally Posted by AdultKing View Post
To add a site you need to register for an account and then use the Add Site feature in the user dashboard.
Ah, ok. I will do that. So far, it is loading sites that scrape our sites and not the real sites.
Old 07-19-2016, 08:53 AM   #6
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by Jigster715 View Post
Ah, ok. I will do that. So far, it is loading sites that scrape our sites and not the real sites.
PM me the domains, that shouldn't be happening.
Old 07-19-2016, 08:54 AM   #7
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by Barry-xlovecam View Post
Looks good -- when you get decent traffic hit us up about buying some ads
Right now the main focus is on engineering. The goal is to get a search down to 1.2 seconds or less. At the moment it's a bit slow - but that will improve as things are refined.
Old 07-19-2016, 09:05 AM   #8
Klen
 
Klen's Avatar
 
Industry Role:
Join Date: Aug 2006
Location: Little Vienna
Posts: 32,234
So it shows only sites which are manually submitted to it ?
Old 07-19-2016, 09:10 AM   #9
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by KlenTelaris View Post
So it shows only sites which are manually submitted to it ?
No. It discovers sites automatically. However crawling the web isn't trivial, so the number of domains currently indexed is relatively small. Lots of sites discovered won't end up in the index. Examples of sites that the search engine will exclude are white label sites, mass embed tube sites (such as sites that just embed videos from the main tubes). Spammy sites are excluded and if a site has too many popups or any kind of sneaky redirects then they won't get indexed either.

There are a lot of crap sites on the adult web and the focus of this search engine is to only index sites of a certain quality. It's not perfect yet, but it's getting better all the time.
Old 07-19-2016, 09:31 AM   #10
Klen
 
Klen's Avatar
 
Industry Role:
Join Date: Aug 2006
Location: Little Vienna
Posts: 32,234
Quote:
Originally Posted by AdultKing View Post
No. It discovers sites automatically. However crawling the web isn't trivial, so the number of domains currently indexed is relatively small. Lots of sites discovered won't end up in the index. Examples of sites that the search engine will exclude are white label sites, mass embed tube sites (such as sites that just embed videos from the main tubes). Spammy sites are excluded and if a site has too many popups or any kind of sneaky redirects then they won't get indexed either.

There are a lot of crap sites on the adult web and the focus of this search engine is to only index sites of a certain quality. It's not perfect yet, but it's getting better all the time.
How exactly will you determine which one is a real tube and which one is an embed tube?
Old 07-19-2016, 09:31 AM   #11
redwhiteandblue
Bollocks
 
redwhiteandblue's Avatar
 
Industry Role:
Join Date: Jun 2007
Location: Bollocks
Posts: 2,792
What's the UA of the crawler so I can whitelist it?
Old 07-19-2016, 09:33 AM   #12
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by redwhiteandblue View Post
What's the UA of the crawler so I can whitelist it?
NodeBot 1.0/G
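
A minimal sketch of matching that user agent on the server side, assuming the crawler always sends the token "NodeBot" followed by a version number as in the string above. The function name and the pattern are illustrative only. A User-Agent header is trivially spoofed, so treat a match as a hint rather than proof of identity.

```python
import re

# Assumption: the crawler identifies itself with the token "NodeBot" followed
# by a version number, as in the string quoted above ("NodeBot 1.0/G").
NODEBOT_PATTERN = re.compile(r"\bNodeBot[ /]\d+(\.\d+)*", re.IGNORECASE)


def is_nodebot(user_agent: str) -> bool:
    """Return True if the User-Agent header looks like the NodeBot crawler."""
    return bool(NODEBOT_PATTERN.search(user_agent or ""))


if __name__ == "__main__":
    print(is_nodebot("NodeBot 1.0/G"))
    print(is_nodebot("Mozilla/5.0 (X11; Linux x86_64)"))
```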
Old 07-19-2016, 10:06 AM   #13
teg0
Confirmed User
 
teg0's Avatar
 
Join Date: Jan 2006
Location: Gringo in Puerto Rico
Posts: 4,197
Nice work. I'm working on something similar, but different. My own twists.
Old 07-19-2016, 10:08 AM   #14
Nicky
Judge Jury and Executioner
 
Nicky's Avatar
 
Industry Role:
Join Date: Mar 2003
Location: Sweden
Posts: 30,052
I regged and added some sites
Old 07-19-2016, 10:13 AM   #15
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by teg0 View Post
Nice work. I'm working on something similar, but different. My own twists.
It's an expensive exercise rolling out a search engine.

If anyone is interested in learning more about how it all works, I have a dedicated Node support channel in my slack team. Just visit Join the Adult Industry community on Slack! to get an auto invite.

I'm happy to answer questions and get into technical detail about how it all works and how I've built out the architecture.
Old 07-19-2016, 10:20 AM   #16
Bladewire
StraightBro
 
Bladewire's Avatar
 
Industry Role:
Join Date: Aug 2003
Location: Monarch Beach, CA USA
Posts: 56,232
I think it's great that you're always working on something new. Let's hope this one sticks and does well.
Old 07-19-2016, 10:33 AM   #17
Serge Litehead
Confirmed User
 
Serge Litehead's Avatar
 
Industry Role:
Join Date: Dec 2002
Location: Behind the scenes
Posts: 5,190
We did something like that back in '04-'06.

A search engine is a huge and expensive engineering project.

AdultKing, even if you get your query time down to half, it is still slow. At some point while working on our SE project we decided to dump MySQL and wrote our own DB engine, which was far more efficient than MySQL. For instance, a 30 MB database in MySQL weighed only 1.4 MB in our engine. Querying was ridiculously fast no matter how huge the database was, thanks to our own indexing tech, and we could show all results rather than capping at 1,000 like every other SE did and does.

Good memories and definitely great experience. Our development took ~1.5 years between myself and another programmer working 12-16 hrs a day, no weekends.

The project was wrapped up due to lack of financing. We had the engine out of beta and were developing a webmaster area for buying ads and getting ready for marketing when it stalled.
It was written in Delphi/PHP.
Old 07-19-2016, 10:36 AM   #18
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by Bladewire View Post
I think it's great that you're always working on something new. Let's hope this one sticks and does well.
Time will tell.

It's not going to stick if query times take as long as they do now.

Current average time for results to be returned is 5 seconds. I need to get it down to 1.2 seconds max. Otherwise people just won't use it.

There's also the challenge of ensuring that the index remains as spam free as possible.

I've been working on this project for quite a long time and even launched a search engine years ago which didn't stick - the problems with that were the limitations of processing power and storage - now things are better with better infrastructure options available.
Old 07-19-2016, 10:42 AM   #19
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by holograph View Post
We did something like that back in '04-'06.

A search engine is a huge and expensive engineering project.

AdultKing, even if you get your query time down to half, it is still slow. At some point while working on our SE project we decided to dump MySQL and wrote our own DB engine, which was far more efficient than MySQL.
It's terribly expensive. Node is a cluster of 16 nodes at the moment and I'm adding another 16 this week.

The architecture is all NoSQL, the crawler and search engine are written in C and borrow some of the concepts, but not the code, of Lucene. The ranking algorithm is adaptive and reprocesses the index twice a day.

I have a development group of servers running where I am tuning the search portion, and results currently come back within 1.8 seconds at most, but I think 1.2 seconds is the sweet spot to make the thing usable.
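
For readers unfamiliar with the Lucene concepts mentioned above: Lucene is built around an inverted index, which maps each term to the documents that contain it. The sketch below is a toy Python illustration of that idea only. It is not Node's code (which is described as written in C), and the class and function names are made up for the example.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace; a real analyzer would do far more."""
    return text.lower().split()


class InvertedIndex:
    """Toy inverted index: term -> set of ids of documents containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        """Record every term of the document in the postings lists."""
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[int]:
        """Return ids of documents containing every query term (AND query)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Copy the first postings set so intersections never mutate the index.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

Answering a query then means intersecting a few postings sets instead of scanning every page, which is why this structure sits at the core of most text search engines.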
Old 07-19-2016, 10:47 AM   #20
bns666
Confirmed Fetishist
 
bns666's Avatar
 
Industry Role:
Join Date: Mar 2005
Location: Fetishland
Posts: 11,488
nice, good luck
Old 07-19-2016, 10:53 AM   #21
Serge Litehead
Confirmed User
 
Serge Litehead's Avatar
 
Industry Role:
Join Date: Dec 2002
Location: Behind the scenes
Posts: 5,190
We had a dynamically updated index cache for our search results.
Alongside the crawler bots we had bots doing indexing (results caching, to be precise), which updated all relevant indexes for existing search terms whenever a new page came in. This way the indexes were always up to date and results displayed very quickly.
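
A rough sketch of the approach described above, where cached result lists are updated at indexing time rather than rebuilt at query time. This is a simplified Python illustration under stated assumptions (single-word terms, in-memory storage, no ranking); the original engine was written in Delphi/PHP and its internals are not shown in this thread, so every name here is hypothetical.

```python
class ResultCache:
    """Keeps per-term result lists current as new pages are indexed."""

    def __init__(self) -> None:
        # term -> list of page URLs already known to match that term
        self.results: dict[str, list[str]] = {}

    def warm(self, term: str) -> None:
        """Register a search term so its results are maintained from now on."""
        self.results.setdefault(term.lower(), [])

    def on_page_indexed(self, url: str, text: str) -> None:
        """Called by the indexing bot: update every cached term the page matches."""
        words = set(text.lower().split())
        for term, urls in self.results.items():
            if term in words and url not in urls:
                urls.append(url)

    def lookup(self, term: str) -> list[str]:
        """Serve a query straight from the cache; no index scan at query time."""
        return list(self.results.get(term.lower(), []))
```

The trade-off is more work per indexed page in exchange for near-constant query cost on terms that have been seen before.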
Old 07-19-2016, 10:59 AM   #22
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by holograph View Post
We had a dynamically updated index cache for our search results.
Alongside the crawler bots we had bots doing indexing (results caching, to be precise), which updated all relevant indexes for existing search terms whenever a new page came in. This way the indexes were always up to date and results displayed very quickly.
I've got search caching built in but it's off at the moment while I work out some infrastructure issues.

The main reason I announced it on GFY tonight was in the hope that people could break it
Old 07-19-2016, 11:24 AM   #23
Adnium_Ivana
Confirmed User
 
Adnium_Ivana's Avatar
 
Industry Role:
Join Date: Jun 2016
Location: Toronto
Posts: 1,094
Quote:
Originally Posted by AdultKing View Post
A little project I have been working on is https://node.xxx

It's a search engine that only indexes adult websites and aggressively deals with spam sites, preventing them from being indexed.

It will also only index canonical sites, so no white labels get into the search index.

At the moment it's a little slow to respond to queries but that will improve as new caching servers are deployed.

It's still got a way to go in development but it is live and the current index is around 100 million pages. It supports complex queries which are documented here.

The search engine is infinitely scalable and while it's currently crawling html, pdf, json, xml, rss, video and images it's only returning text based results right now.

Later this year once I've perfected image search that will be rolled out, with video search to follow around March 2017.

Have a look and let me know what you think at https://node.xxx
Not an adult site, but I just tried searching for some and a) the speed at which the search came up is pretty impressive and b) I even found our ad network (Adnium & GSM) indexed on xbiz.com. Pretty impressive stuff you've got going on here.
Old 07-19-2016, 11:55 AM   #24
teg0
Confirmed User
 
teg0's Avatar
 
Join Date: Jan 2006
Location: Gringo in Puerto Rico
Posts: 4,197
Quote:
Originally Posted by AdultKing View Post
It's an expensive exercise rolling out a search engine.

If anyone is interested in learning more about how it all works, I have a dedicated Node support channel in my slack team. Just visit Join the Adult Industry community on Slack! to get an auto invite.

I'm happy to answer questions and get into technical detail about how it all works and how I've built out the architecture.
cool, joined
Old 07-19-2016, 11:57 AM   #25
CaptainHowdy
Too lazy to set a custom title
 
CaptainHowdy's Avatar
 
Industry Role:
Join Date: Dec 2004
Location: Happy in the dark.
Posts: 92,073
Very nice, AK !
Old 07-19-2016, 07:40 PM   #26
johnnyloadproductions
Account Shutdown
 
Industry Role:
Join Date: Oct 2008
Location: Gone
Posts: 3,611
Open source has become very powerful. As long as you know how to plug libraries together, you can do a lot as a single developer.
Old 07-19-2016, 08:02 PM   #27
sandman!
Icq: 14420613
 
sandman!'s Avatar
 
Industry Role:
Join Date: Mar 2001
Location: chicago
Posts: 15,419
looks good
Old 07-20-2016, 07:53 AM   #28
TheMaster
Confirmed User
 
Join Date: Nov 2003
Location: Prague
Posts: 2,732
looking good, when image and video gets added, I think that's when people start using it
Old 07-20-2016, 08:00 AM   #29
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by TheMaster View Post
looking good, when image and video gets added, I think that's when people start using it
Yep. But baby steps first.

Current priority is to speed up search results.
Old 07-20-2016, 08:57 AM   #30
rabbit
Confirmed User
 
Join Date: Jul 2003
Location: Montreal
Posts: 2,124
how do you rank the results? seems arbitrary... resource pages show up before homepage, etc.
Old 07-20-2016, 09:02 AM   #31
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by rabbit View Post
how do you rank the results? seems arbitrary... resource pages show up before homepage, etc.
Obviously I won't be providing the precise method of ranking results however the reason you see what you're seeing is that weighting on brand home pages is turned off at the moment. When I turn it on you'll see root domains of brands appear at the top of results.
Old 07-20-2016, 09:35 AM   #32
Hazlewood
Confirmed User
 
Hazlewood's Avatar
 
Join Date: Sep 2006
Location: Toronto
Posts: 1,555
Quote:
Originally Posted by AdultKing View Post
A little project I have been working on is https://node.xxx

It's a search engine that only indexes adult websites and aggressively deals with spam sites, preventing them from being indexed.

It will also only index canonical sites, so no white labels get into the search index.

At the moment it's a little slow to respond to queries but that will improve as new caching servers are deployed.

It's still got a way to go in development but it is live and the current index is around 100 million pages. It supports complex queries which are documented here.

The search engine is infinitely scalable and while it's currently crawling html, pdf, json, xml, rss, video and images it's only returning text based results right now.

Later this year once I've perfected image search that will be rolled out, with video search to follow around March 2017.

Have a look and let me know what you think at https://node.xxx

can you please email me to discuss something higher level. haze at grandslammedia.com
Old 07-20-2016, 09:45 AM   #33
freecartoonporn
Confirmed User
 
freecartoonporn's Avatar
 
Industry Role:
Join Date: Jan 2012
Location: NC
Posts: 7,683
what search engine you guys using ?

i used sphinx before but it uses cpu a lot. and i had only ~7 mil records
Old 07-20-2016, 09:52 AM   #34
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by Hazlewood View Post
can you please email me to discuss something higher level. haze at grandslammedia.com
There's a channel for live discussion of Node.XXX at Adult Industry Slack Team
Old 07-20-2016, 09:53 AM   #35
Hazlewood
Confirmed User
 
Hazlewood's Avatar
 
Join Date: Sep 2006
Location: Toronto
Posts: 1,555
Quote:
Originally Posted by AdultKing View Post
There's a channel for live discussion of Node.XXX at Adult Industry Slack Team
I don't want to join with my email. I want to discuss business with you in a private setting. Give me your details then. Thank you.
Old 07-20-2016, 09:55 AM   #36
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by freecartoonporn View Post
what search engine you guys using ?

i used sphinx before but it uses cpu a lot. and i had only ~7 mil records
Sphinx won't do the job.

This is clustered nodes of individual components. Crawling is separate from indexing. Ranking is separate from indexing. Crawls are performed through caching proxies on their own servers.

You could do this with Nutch and ElasticSearch but the overhead would be much greater than this system has.
Old 07-20-2016, 11:41 AM   #37
freecartoonporn
Confirmed User
 
freecartoonporn's Avatar
 
Industry Role:
Join Date: Jan 2012
Location: NC
Posts: 7,683
Quote:
Originally Posted by AdultKing View Post
Sphinx won't do the job.

This is clustered nodes of individual components. Crawling is separate from indexing. Ranking is separate from indexing. Crawls are performed through caching proxies on their own servers.

You could do this with Nutch and ElasticSearch but the overhead would be much greater than this system has.
so you made custom search engine ? i mean not using sphinx / lucene ?

just curious. as search engine site is in my todo list .
Old 07-20-2016, 12:07 PM   #38
Struggle4Bucks
Sieg Hi!
 
Struggle4Bucks's Avatar
 
Industry Role:
Join Date: May 2011
Location: Lissabon
Posts: 3,613
I was searching for "quick fuck" but the load was too slow...
Old 07-20-2016, 09:01 PM   #39
JJE
Confirmed User
 
JJE's Avatar
 
Industry Role:
Join Date: Sep 2014
Posts: 46
A while ago I was involved in developing a search interface that had a massive index. From memory it was nearly 300m documents including web pages, social media posts, etc. Was mainly text.

Given that we had limited hardware (and hardware was less powerful than it is now) we primarily focused our attention on the crawling/indexing method. We tried to do as much processing there as we could that was adaptable and could be 'redone' on tweaks. From there we were able to shard data accordingly and with significant focus on parsing search input we were able to 'avoid' querying data that wasn't relevant to the search input. There was of course a fail-over that could be triggered and we offered supplementary results, where our algorithm wasn't sure it could query the whole index.

Our benchmark for maintaining quality was this: get the first 1,000 results for the sharded/highly processed method within 90% similarity as if the full index was being queried, and we did. So essentially, we were returning nearly identical results whilst only hitting in some cases 1-2% of the index. For reference a 1m sized query result was well under 0.5s. Most searches were basically instant. Not sure if this is of any help to you, you're probably already doing or considering these methods. Good luck.
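
The benchmark above can be sketched as a simple overlap check. The post does not say how "similarity" was measured, so this assumes plain set overlap of the top-k result ids, ignoring order; the function name and signature are hypothetical.

```python
def top_k_overlap(full: list[str], sharded: list[str], k: int = 1000) -> float:
    """Fraction of the full index's top-k results also found in the sharded top-k.

    Assumption: similarity means set overlap of result ids, ignoring rank order.
    """
    reference = set(full[:k])
    if not reference:
        # Nothing to compare against, so treat the two result sets as identical.
        return 1.0
    candidate = set(sharded[:k])
    return len(reference & candidate) / len(reference)


if __name__ == "__main__":
    full_results = [f"doc{i}" for i in range(10)]
    sharded_results = full_results[:9] + ["other"]
    # The benchmark passes when this value is at least 0.9.
    print(top_k_overlap(full_results, sharded_results, k=10))
```

A rank-aware measure would be stricter, but set overlap is enough to catch a shard-routing rule that silently drops relevant documents.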
Old 07-20-2016, 09:08 PM   #40
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by JJE View Post
A while ago I was involved in developing a search interface that had a massive index. From memory it was nearly 300m documents including web pages, social media posts, etc. Was mainly text.

Given that we had limited hardware (and hardware was less powerful than it is now) we primarily focused our attention on the crawling/indexing method. We tried to do as much processing there as we could that was adaptable and could be 'redone' on tweaks. From there we were able to shard data accordingly and with significant focus on parsing search input we were able to 'avoid' querying data that wasn't relevant to the search input. There was of course a fail-over that could be triggered and we offered supplementary results, where our algorithm wasn't sure it could query the whole index.

Our benchmark for maintaining quality was this: get the first 1,000 results for the sharded/highly processed method within 90% similarity as if the full index was being queried, and we did. So essentially, we were returning nearly identical results whilst only hitting in some cases 1-2% of the index. For reference a 1m sized query result was well under 0.5s. Most searches were basically instant. Not sure if this is of any help to you, you're probably already doing or considering these methods. Good luck.

Thanks for the great post.

I had a bit of a hackathon with a couple of the other people helping me last night. We have managed to get query time down to an average of 0.9 seconds, from an average of 5.

We're using SERP caching methods as well as a refined index, so now we've started a full recrawl.

I'll be rolling out the new version of the search engine in the next day or two.
Old 07-23-2016, 08:37 AM   #41
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
As promised a couple of days ago a new version has been rolled out.

Tonight I activated the new Search servers and caching.

The index had to be rebuilt to accommodate the changes we made to indexing.

Average search time is now down from 5 seconds to 0.9 seconds.

Submitting Sites

A number of webmasters have been submitting sites that just won't be indexed. In most of these cases the crawler has already been over your site before submission and discarded it for the reasons below.

Node.XXX only indexes canonical content.

Node.XXX will NOT index embed tube sites, white label dating or white label cam sites.

If your site has more than one pop-up, malware, more than 4 ads above the fold or a poor user experience, it will not be indexed.

Node.XXX is all about quality sites. The search engine is designed to filter out sites that are bad for the surfer. So if you have a lot of ads or have more than one popup or a sneaky redirect then your site just won't be indexed.
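
As an illustration only, the rules above could be written as a simple filter like the one below. This is not the actual Node.XXX code: the `PageSignals` type and its field names are hypothetical, and "poor user experience" is left out because no measurable definition of it is given here.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    """Hypothetical per-site measurements a crawler might collect."""
    popups: int
    ads_above_fold: int
    has_malware: bool
    has_sneaky_redirect: bool
    is_white_label: bool
    is_embed_tube: bool


def rejection_reasons(site: PageSignals) -> list[str]:
    """Return every stated rule the site breaks; an empty list means indexable."""
    reasons = []
    if site.popups > 1:
        reasons.append("more than one pop-up")
    if site.ads_above_fold > 4:
        reasons.append("more than 4 ads above the fold")
    if site.has_malware:
        reasons.append("malware")
    if site.has_sneaky_redirect:
        reasons.append("sneaky redirect")
    if site.is_white_label:
        reasons.append("white label site")
    if site.is_embed_tube:
        reasons.append("embed tube site")
    return reasons


def is_indexable(site: PageSignals) -> bool:
    """A site is indexable only if it breaks none of the rules."""
    return not rejection_reasons(site)
```

Returning the list of reasons rather than a bare yes/no would also make it possible to tell a webmaster why a submitted site was discarded.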
Old 07-24-2016, 09:51 AM   #42
Klen
 
Klen's Avatar
 
Industry Role:
Join Date: Aug 2006
Location: Little Vienna
Posts: 32,234
What about the other side of the equation? Meaning, how much money do you plan to spend on marketing, or will you just do guerrilla marketing?
Old 07-24-2016, 10:01 AM   #43
AdultKing
Raise Your Weapon
 
AdultKing's Avatar
 
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by KlenTelaris View Post
What about the other side of the equation? Meaning, how much money do you plan to spend on marketing, or will you just do guerrilla marketing?
There's a plan to market it, but the tech needs to be refined first. It's going to be a long, expensive process to get there. Suffice to say I'm not throwing money at this idea for the fun of it.
Old 07-25-2016, 08:27 AM   #44
Smut-Talk
I talk smut
 
Industry Role:
Join Date: Jul 2016
Location: Somewhere on the webz
Posts: 176
Nice project!

I tried it.
My Comment:
you need to filter www/http and subdomains...
got the exact same search results from different subdomains from tube8,
jp. and .de and www.

pretty nice though.

Reminds me a bit of Free Porn Search Engine :: pornharmony.com
that site does some awesome matching content search.
The longer you search, the better the results are.

Good luck!
Old 07-25-2016, 08:47 AM   #45
pornguy
Too lazy to set a custom title
 
Join Date: Mar 2003
Location: Homeless
Posts: 62,910
Sad to see another SE loaded with tubes.
Old 07-25-2016, 09:56 AM   #46
AdultKing
Raise Your Weapon
 
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by Smut-Talk
Nice project!

I tried it.
My comment:
You need to filter www/http and subdomains...
I got the exact same search results from different tube8 subdomains:
jp., .de and www.

pretty nice though.
Actually that filtering does happen, but it takes a while for the searchable index to catch up. You'll find that those results will disappear in a few days and you'll only get results relative to your location, e.g. if you're in Japan you'll see jp.whatever and in the US or global you'll see domain.com.
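
To illustrate the idea, here is a minimal sketch of how duplicate results from www and regional subdomains could be collapsed to one entry per page, preferring the visitor's regional variant. This is not Node.XXX's actual code. The names `REGIONAL_LABELS`, `canonical_key`, `region_of` and `dedupe` are made up for the example, and the list of regional labels is an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical set of subdomain labels treated as regional mirrors.
REGIONAL_LABELS = {"jp", "de", "fr", "es", "it"}


def _host_labels(url: str) -> tuple[list[str], str]:
    """Split a URL into lowercase host labels and its path."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    return host.split("."), parts.path


def canonical_key(url: str) -> tuple[str, str]:
    """Return (base_host, path) with scheme, www and regional label ignored."""
    labels, path = _host_labels(url)
    # Drop one leading "www" or regional label, but keep at least domain.tld.
    if len(labels) > 2 and (labels[0] == "www" or labels[0] in REGIONAL_LABELS):
        labels = labels[1:]
    return ".".join(labels), path.rstrip("/") or "/"


def region_of(url: str) -> str:
    """Return the regional label of a URL, or "global" if it has none."""
    labels, _ = _host_labels(url)
    return labels[0] if labels[0] in REGIONAL_LABELS else "global"


def dedupe(urls: list[str], visitor_region: str = "global") -> list[str]:
    """Keep one URL per canonical key, preferring the visitor's region."""
    best: dict[tuple[str, str], str] = {}
    for url in urls:
        key = canonical_key(url)
        current = best.get(key)
        if current is None:
            best[key] = url
        elif region_of(url) == visitor_region and region_of(current) != visitor_region:
            best[key] = url
    return list(best.values())
```

With this sketch, a visitor whose region is "jp" would get the jp. variant of a page and everyone else would get the first non-regional variant seen. A real implementation would also need something like the Public Suffix List to handle hosts such as example.co.uk correctly.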



Quote:
Reminds me a bit of Free Porn Search Engine :: pornharmony.com
that site does some awesome matching content search.
The longer you search, the better the results are.

Good luck!
Thanks.
Old 07-25-2016, 10:03 AM   #47
AdultKing
Raise Your Weapon
 
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
Quote:
Originally Posted by pornguy
Sad to see another SE loaded with tubes.
We don't index embedding tubes, or the spammy crap tubes that don't host their own content.

What should we do with the canonical tubes? Ban them from the index? They exist, people use them. We do proactively ban torrent sites and file lockers from the index, but where do we draw the line? Should we remove the tubes from the index too?
Old 07-25-2016, 11:46 AM   #48
HowlingWulf
Confirmed User
 
 
Join Date: Nov 2001
Posts: 1,662
I submitted a few. We'll see what happens.
Old 07-25-2016, 12:04 PM   #49
AdultKing
Raise Your Weapon
 
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
URL submissions to the site are checked algorithmically; only a few cases are checked manually.

However, I do see reports on submission failure rates, and a lot of submitted sites have been rejected by the search engine because of too many ads or too many popups or popunders.

If a surfer visits your site, clicks once and then has a popup take over their screen then your site just won't be included in the index. Likewise if you have more than 4 ads above the fold then your site also won't be included in the index.

The key questions that decide whether a site is included on Node.XXX are:

1. Does the site provide a good user experience? Good
2. Does the site have too many ads? Bad
3. Does the site have takeover popups on click? Bad
4. Is the site spammy? Bad
5. Does the site embed large amounts of content from other sources? Bad
6. Is the site a white label? Exclusion
7. Is the site an embedding tube? Exclusion
8. Does the site have too many spammy links pointing to it? Bad
9. Does the site have malware or unwanted redirects? Exclusion
10. Does the site show different content to mobile and desktop users? Exclusion
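
As a rough illustration, the checklist above could be expressed as a small rule evaluator. This is only a sketch, not the real Node.XXX logic. `SiteSignals`, its field names and `evaluate` are hypothetical, and the thread does not say how the "Bad" items are weighed against each other, so the sketch assumes any single "Bad" is enough to keep a site out. The only threshold taken from the thread is "more than 4 ads above the fold".

```python
from dataclasses import dataclass

MAX_ADS_ABOVE_FOLD = 4  # "more than 4 ads above the fold" is stated in the thread


@dataclass
class SiteSignals:
    """Hypothetical crawl-time measurements; field names are illustrative."""
    ads_above_fold: int = 0
    takeover_popup_on_click: bool = False
    is_spammy: bool = False
    mostly_embedded_content: bool = False
    is_white_label: bool = False
    is_embedding_tube: bool = False
    spammy_backlinks: bool = False
    has_malware_or_redirects: bool = False
    cloaks_mobile_vs_desktop: bool = False


def evaluate(site: SiteSignals) -> tuple[str, list[str]]:
    """Return ("excluded" | "rejected" | "indexed", reasons)."""
    # Items marked "Exclusion" in the list are checked first.
    exclusion_checks = [
        ("white label", site.is_white_label),
        ("embedding tube", site.is_embedding_tube),
        ("malware or unwanted redirects", site.has_malware_or_redirects),
        ("different content for mobile and desktop", site.cloaks_mobile_vs_desktop),
    ]
    exclusions = [name for name, hit in exclusion_checks if hit]
    if exclusions:
        return "excluded", exclusions

    # Items marked "Bad"; assumed here that one hit is enough to reject.
    bad_checks = [
        ("too many ads above the fold", site.ads_above_fold > MAX_ADS_ABOVE_FOLD),
        ("takeover popup on click", site.takeover_popup_on_click),
        ("spammy site", site.is_spammy),
        ("mostly embedded content", site.mostly_embedded_content),
        ("spammy backlinks", site.spammy_backlinks),
    ]
    bad = [name for name, hit in bad_checks if hit]
    if bad:
        return "rejected", bad
    return "indexed", []
```

For example, `evaluate(SiteSignals(ads_above_fold=5))` rejects the site for too many ads, while a site with exactly 4 ads above the fold and no other problems is indexed.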

Even if a site is included in the index, these considerations apply every time it is re-crawled. We also check the site from proxy nodes using various User Agents to be sure that sites aren't trying to fool Node.XXX.
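
A check like that could look roughly like the sketch below: fetch the same URL with several User-Agent strings and flag the site if the responses differ too much. Everything here is an assumption made for illustration. The User-Agent strings, the 0.8 similarity cutoff and the `fetch` callback (which stands in for a request made through a proxy node) are not from Node.XXX.

```python
from difflib import SequenceMatcher
from typing import Callable

# Illustrative User-Agent strings; a real crawler would use a maintained list.
USER_AGENTS = {
    "desktop": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "mobile": "Mozilla/5.0 (iPhone; CPU iPhone OS 9_3 like Mac OS X)",
    "crawler": "ExampleBot/1.0",
}

SIMILARITY_THRESHOLD = 0.8  # assumed cutoff, not a value from the thread


def looks_cloaked(
    url: str,
    fetch: Callable[[str, str], str],
    threshold: float = SIMILARITY_THRESHOLD,
) -> bool:
    """Return True if any User-Agent gets content unlike the desktop version.

    `fetch(url, user_agent)` is a caller-supplied function returning the page
    body as text, e.g. a wrapper that sends the request through a proxy node.
    """
    pages = {name: fetch(url, ua) for name, ua in USER_AGENTS.items()}
    baseline = pages["desktop"]
    for name, body in pages.items():
        if name == "desktop":
            continue
        similarity = SequenceMatcher(None, baseline, body, autojunk=False).ratio()
        if similarity < threshold:
            return True
    return False
```

Comparing raw page text is crude, since responsive sites can legitimately serve slightly different markup to mobile browsers, so a real check would probably compare extracted main content or outgoing links instead.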

So to sum up: good user experience and original content will see a site included in the index. Bad user experience, spam, etc. will see the site excluded.
Old 07-25-2016, 10:51 PM   #50
Paul Markham
Too old to care
 
Join Date: Jun 2001
Location: On the sofa, watching TV or doing my jigsaws.
Posts: 52,943
Will surfers prefer it to Google?