GoFuckYourself.com - Adult Webmaster Forum |
|
Discuss what's fucking going on, and which programs are best and worst. One-time "program" announcements from "established" webmasters are allowed. |
|
07-19-2016, 07:54 AM | #1 |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
Node.XXX - Porn Search Engine
A little project I have been working on is https://node.xxx
It's a search engine that only indexes adult websites and aggressively deals with spam sites, preventing them from being indexed. It will also only index canonical sites, so no white labels get into the search index.
At the moment it's a little slow to respond to queries, but that will improve as new caching servers are deployed. It's still got a way to go in development, but it is live and the current index is around 100 million pages. It supports complex queries, which are documented here.
The search engine is infinitely scalable, and while it's currently crawling HTML, PDF, JSON, XML, RSS, video and images, it's only returning text-based results right now. Later this year, once I've perfected image search, that will be rolled out, with video search to follow around March 2017.
Have a look and let me know what you think at https://node.xxx |
07-19-2016, 08:47 AM | #2 |
So Fucking Banned
Industry Role:
Join Date: Jul 2015
Location: elmer blackwood mansion
Posts: 1,459
|
How do we submit sites for indexing?
|
07-19-2016, 08:50 AM | #3 |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
|
07-19-2016, 08:52 AM | #4 |
It's 42
Industry Role:
Join Date: Jun 2010
Location: Global
Posts: 18,083
|
Looks good -- when you get decent traffic hit us up about buying some ads
|
07-19-2016, 08:52 AM | #5 |
So Fucking Banned
Industry Role:
Join Date: Jul 2015
Location: elmer blackwood mansion
Posts: 1,459
|
|
07-19-2016, 08:53 AM | #6 |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
|
07-19-2016, 08:54 AM | #7 |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
|
07-19-2016, 09:05 AM | #8 |
Industry Role:
Join Date: Aug 2006
Location: Little Vienna
Posts: 32,234
|
So it shows only sites which are manually submitted to it ?
|
07-19-2016, 09:10 AM | #9 |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
No. It discovers sites automatically. However, crawling the web isn't trivial, so the number of domains currently indexed is relatively small. Lots of discovered sites won't end up in the index. Examples of sites that the search engine will exclude are white label sites and mass embed tube sites (sites that just embed videos from the main tubes). Spammy sites are excluded, and if a site has too many popups or any kind of sneaky redirect then it won't get indexed either.
There are a lot of crap sites on the adult web and the focus of this search engine is to only index sites of a certain quality. It's not perfect yet, but it's getting better all the time. |
07-19-2016, 09:31 AM | #10 | |
Industry Role:
Join Date: Aug 2006
Location: Little Vienna
Posts: 32,234
|
Quote:
|
|
07-19-2016, 09:31 AM | #11 |
Bollocks
Industry Role:
Join Date: Jun 2007
Location: Bollocks
Posts: 2,792
|
What's the UA of the crawler so I can whitelist it?
__________________
Interserver unmanaged AMD Ryzen servers from $73.00 |
07-19-2016, 09:33 AM | #12 |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
|
07-19-2016, 10:06 AM | #13 |
Confirmed User
Join Date: Jan 2006
Location: Gringo in Puerto Rico
Posts: 4,197
|
Nice work. I'm working on something similar, but different. My own twists.
__________________
OV Tube - Tube Script Software |
07-19-2016, 10:08 AM | #14 |
Judge Jury and Executioner
Industry Role:
Join Date: Mar 2003
Location: Sweden
Posts: 30,052
|
I regged and added some sites
__________________
Hardlinks and blog posts available on a popular blog with DR 43 and over 3000 referring domains. gfynicky @ gmail.com |
07-19-2016, 10:13 AM | #15 | |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
Quote:
If anyone is interested in learning more about how it all works, I have a dedicated Node support channel in my slack team. Just visit Join the Adult Industry community on Slack! to get an auto invite. I'm happy to answer questions and get into technical detail about how it all works and how I've built out the architecture. |
|
07-19-2016, 10:20 AM | #16 |
StraightBro
Industry Role:
Join Date: Aug 2003
Location: Monarch Beach, CA USA
Posts: 56,232
|
I think it's great that you're always working on something new. Let's hope this one sticks and does well.
|
07-19-2016, 10:33 AM | #17 |
Confirmed User
Industry Role:
Join Date: Dec 2002
Location: Behind the scenes
Posts: 5,190
|
we did something like that back in 04-06.
A search engine is a huge and expensive engineering project. AdultKing, even if you get your query time down to half, it is still slow. At some point while working on our SE project we decided to dump MySQL and wrote our own db engine, which was far more efficient than MySQL. For instance, a 30 MB db in MySQL weighed only 1.4 MB in our engine, and querying was ridiculously fast no matter how huge the database was, thanks to our own indexing tech. We could show all results, not just up to 1000 like every other SE did and does. Good memories and definitely great experience. Our development took ~1.5 years between myself and another programmer working 12-16 hrs a day, no weekends. The project was wrapped up due to lack of financing. We had the engine ready and out of beta and were developing a webmaster area for buying ads, getting ready for marketing, when it stalled. It was written in Delphi/PHP.
__________________
|
07-19-2016, 10:36 AM | #18 | |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
Quote:
It's not going to stick if query times take as long as they do now. Current average time for results to be returned is 5 seconds. I need to get it down to 1.2 seconds max. Otherwise people just won't use it. There's also the challenge of ensuring that the index remains as spam free as possible. I've been working on this project for quite a long time and even launched a search engine years ago which didn't stick - the problems with that were the limitations of processing power and storage - now things are better with better infrastructure options available. |
|
07-19-2016, 10:42 AM | #19 | |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
Quote:
The architecture is all NoSQL. The crawler and search engine are written in C and borrow some of the concepts, but not the code, of Lucene. The ranking algorithm is adaptive and reprocesses the index twice a day. I have a development group of servers running where I am tuning the search portion, and I currently have results within 1.8 seconds max, but I think 1.2 seconds is the sweet spot to make the thing usable. |
|
07-19-2016, 10:47 AM | #20 |
Confirmed Fetishist
Industry Role:
Join Date: Mar 2005
Location: Fetishland
Posts: 11,488
|
nice, good luck
|
07-19-2016, 10:53 AM | #21 |
Confirmed User
Industry Role:
Join Date: Dec 2002
Location: Behind the scenes
Posts: 5,190
|
we had a dynamically updated index cache for our search results
Alongside the crawler bots we had bots doing indexing (results caching, to be precise), which updated all relevant indexes for existing search terms whenever a new page came in. This way the indexes were always up to date and results displayed very quickly.
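For anyone curious what that kind of results caching might look like, here is a minimal Python sketch. It is not the engine described in this post; the class name `TermResultCache`, its methods and the whitespace tokenizer are all assumptions made for illustration. The idea shown is only that cached result lists for already-known search terms get updated at index time, so a lookup never has to scan the index.

```python
from typing import Dict, List


class TermResultCache:
    """Hypothetical cache: one precomputed result list per known search term."""

    def __init__(self) -> None:
        self._results: Dict[str, List[str]] = {}

    def register_term(self, term: str) -> None:
        # Start tracking a search term that users have actually queried.
        self._results.setdefault(term.lower(), [])

    def on_page_indexed(self, url: str, text: str) -> None:
        # Called by the indexing bot for every new page: push the page into
        # the cached results of each known term that appears in its text.
        words = set(text.lower().split())
        for term, urls in self._results.items():
            if term in words and url not in urls:
                urls.append(url)

    def lookup(self, term: str) -> List[str]:
        # Serving a query is a dictionary read, independent of index size.
        return list(self._results.get(term.lower(), []))
```

The trade-off is extra work at index time in exchange for near-constant lookup time, which only pays off for terms that are searched repeatedly.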
__________________
|
07-19-2016, 10:59 AM | #22 | |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
Quote:
The main reason I announced it on GFY tonight was in the hope that people could break it |
|
07-19-2016, 11:24 AM | #23 | |
Confirmed User
Industry Role:
Join Date: Jun 2016
Location: Toronto
Posts: 1,094
|
Quote:
|
|
07-19-2016, 11:55 AM | #24 | |
Confirmed User
Join Date: Jan 2006
Location: Gringo in Puerto Rico
Posts: 4,197
|
Quote:
__________________
OV Tube - Tube Script Software |
|
07-19-2016, 11:57 AM | #25 |
Too lazy to set a custom title
Industry Role:
Join Date: Dec 2004
Location: Happy in the dark.
Posts: 92,073
|
Very nice, AK !
__________________
Enroll in the SWAG Affiliate Asian Live Cam Program and get 9 free quality link-backs! Get those links up ASAP! --> TJEEZERS.Cam. Setup in 48 Hours max. |
07-19-2016, 07:40 PM | #26 |
Account Shutdown
Industry Role:
Join Date: Oct 2008
Location: Gone
Posts: 3,611
|
Open source has become very powerful, as long as you know how to plug and play libraries together you can do a lot as just a single developer.
|
07-19-2016, 08:02 PM | #27 |
Icq: 14420613
Industry Role:
Join Date: Mar 2001
Location: chicago
Posts: 15,419
|
looks good
__________________
Need WebHosting ? Email me for some great deals [email protected] |
07-20-2016, 07:53 AM | #28 |
Confirmed User
Join Date: Nov 2003
Location: Prague
Posts: 2,732
|
looking good, when image and video gets added, I think that's when people start using it
__________________
|
07-20-2016, 08:00 AM | #29 |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
|
07-20-2016, 08:57 AM | #30 |
Confirmed User
Join Date: Jul 2003
Location: Montreal
Posts: 2,124
|
how do you rank the results? seems arbitrary... resource pages show up before homepage, etc.
|
07-20-2016, 09:02 AM | #31 |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
Obviously I won't be providing the precise method of ranking results however the reason you see what you're seeing is that weighting on brand home pages is turned off at the moment. When I turn it on you'll see root domains of brands appear at the top of results.
|
07-20-2016, 09:35 AM | #32 | |
Confirmed User
Join Date: Sep 2006
Location: Toronto
Posts: 1,555
|
Quote:
can you please email me to discuss something higher level. haze at grandslammedia.com |
|
07-20-2016, 09:45 AM | #33 |
Confirmed User
Industry Role:
Join Date: Jan 2012
Location: NC
Posts: 7,683
|
what search engine you guys using ?
i used sphinx before but it uses cpu a lot. and i had only ~7 mil records
__________________
SSD Cloud Server, VPS Server, Simple Cloud Hosting | DigitalOcean
|
07-20-2016, 09:52 AM | #34 | |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
Quote:
|
|
07-20-2016, 09:53 AM | #35 | |
Confirmed User
Join Date: Sep 2006
Location: Toronto
Posts: 1,555
|
Quote:
|
|
07-20-2016, 09:55 AM | #36 | |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
Quote:
This is clustered nodes of individual components. Crawling is separate from Indexing. Ranking is separate from Indexing. Crawls are performed through caching proxies on their own servers. You could do this with Nutch and ElasticSearch but the overhead would be much greater than this system has. |
|
07-20-2016, 11:41 AM | #37 | |
Confirmed User
Industry Role:
Join Date: Jan 2012
Location: NC
Posts: 7,683
|
Quote:
just curious. as search engine site is in my todo list .
__________________
SSD Cloud Server, VPS Server, Simple Cloud Hosting | DigitalOcean
|
|
07-20-2016, 12:07 PM | #38 |
Sieg Hi!
Industry Role:
Join Date: May 2011
Location: Lissabon
Posts: 3,613
|
I was searching for "quick fuck" but the load was too slow...
__________________
Half troll half amazing! |
07-20-2016, 09:01 PM | #39 |
Confirmed User
Industry Role:
Join Date: Sep 2014
Posts: 46
|
A while ago I was involved in developing a search interface that had a massive index. From memory it was nearly 300m documents including web pages, social media posts, etc. Was mainly text.
Given that we had limited hardware (and hardware was less powerful than it is now) we primarily focused our attention on the crawling/indexing method. We tried to do as much processing there as we could that was adaptable and could be 'redone' on tweaks. From there we were able to shard data accordingly and with significant focus on parsing search input we were able to 'avoid' querying data that wasn't relevant to the search input. There was of course a fail-over that could be triggered and we offered supplementary results, where our algorithm wasn't sure it could query the whole index. Our benchmark for maintaining quality was this: get the first 1,000 results for the sharded/highly processed method within 90% similarity as if the full index was being queried, and we did. So essentially, we were returning nearly identical results whilst only hitting in some cases 1-2% of the index. For reference a 1m sized query result was well under 0.5s. Most searches were basically instant. Not sure if this is of any help to you, you're probably already doing or considering these methods. Good luck. |
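A rough Python sketch of the two ideas in this post, under stated assumptions: `select_shards` and `top_k_overlap` are hypothetical names, shards are modelled as plain sets of terms, and "similarity" is taken to mean overlap between the two top-k result sets. The original system's actual routing and similarity measure are not described here, so treat this as one possible reading.

```python
from typing import Dict, List, Sequence, Set


def select_shards(query: str, shard_terms: Dict[str, Set[str]]) -> List[str]:
    """Return only the shards whose vocabulary contains a query term.

    Assumption: each shard is summarised by the set of terms it holds.
    """
    terms = set(query.lower().split())
    return sorted(name for name, vocab in shard_terms.items() if terms & vocab)


def top_k_overlap(full: Sequence, sharded: Sequence, k: int = 1000) -> float:
    """Fraction of the full-index top-k that also appears in the sharded top-k."""
    full_top = set(full[:k])
    if not full_top:
        return 1.0  # nothing to miss, so treat as a perfect match
    return len(full_top & set(sharded[:k])) / len(full_top)
```

With a benchmark like the one described, a sharded run would pass when `top_k_overlap(full, sharded, k=1000) >= 0.9`.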
07-20-2016, 09:08 PM | #40 | |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
Quote:
Thanks for the great post. I had a bit of a hackathon with a couple of the other people helping me last night. We have managed to get query time down to an average of .9 seconds down from an average of 5. We're using SERP caching methods as well as a refined index, so now we've started a full recrawl. I'll be rolling out the new version of the search engine in the next day or two. |
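For readers wondering what SERP caching means in practice, here is a minimal Python sketch. It is an illustration only and not the Node.XXX code: `SerpCache`, the five-minute TTL and the query normalisation are all assumptions. It shows repeated queries being answered from memory until the cached page of results expires.

```python
import time
from typing import Callable, Dict, List, Tuple


class SerpCache:
    """Hypothetical in-memory cache of result pages with a time-to-live."""

    def __init__(self, search_fn: Callable[[str], List[str]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._search_fn = search_fn  # the slow backend search
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Normalise case and whitespace so equivalent queries share an entry.
        return " ".join(query.lower().split())

    def search(self, query: str) -> List[str]:
        key = self._key(query)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh cached page, backend is not touched
        results = self._search_fn(key)  # backend receives the normalised query
        self._entries[key] = (now, results)
        return results
```

A production setup would more likely put this in a shared cache server rather than process memory, and would need to evict entries when the index is rebuilt.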
|
07-23-2016, 08:37 AM | #41 |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
As promised a couple of days ago a new version has been rolled out.
Tonight I activated the new search servers and caching. The index had to be rebuilt to accommodate the changes we made to indexing. Average search time is now down from 5 seconds to 0.9 seconds.
Submitting Sites
A number of webmasters have been submitting sites that just won't be indexed. In most of these cases the crawler had already been over the site before submission and discarded it for the reasons below.
Node.XXX only indexes canonical content. Node.XXX will NOT index embed tube sites, white label dating or white label cam sites. If your site has more than one popup, malware, more than 4 ads above the fold or a poor user experience, it will not be indexed.
Node.XXX is all about quality sites. The search engine is designed to filter out sites that are bad for the surfer. So if you have a lot of ads, more than one popup or a sneaky redirect, your site just won't be indexed. |
07-24-2016, 09:51 AM | #42 |
Industry Role:
Join Date: Aug 2006
Location: Little Vienna
Posts: 32,234
|
What about the other part of the equation? Meaning, how much money do you plan to spend on marketing, or will you just do guerrilla marketing?
|
07-24-2016, 10:01 AM | #43 |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
There's a plan to market it, but the tech needs to be refined first. It's going to be a long, expensive process to get there. Suffice to say I'm not throwing money at this idea for the fun of it.
|
07-25-2016, 08:27 AM | #44 |
I talk smut
Industry Role:
Join Date: Jul 2016
Location: Somewhere on the webz
Posts: 176
|
Nice project!
I tried it. My comment: you need to filter www/http and subdomains... I got the exact same search results from different subdomains of tube8 (jp., de. and www.). Pretty nice though. Reminds me a bit of Free Porn Search Engine :: pornharmony.com; that site does some awesome matching content search. The longer you search, the better the results are. Good luck! |
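One way the duplicate-subdomain issue could be handled is to collapse results onto a canonical key before display. The Python sketch below is only an assumption about how that might be done: `MIRROR_PREFIXES` is a hand-picked, hypothetical list, and whether language subdomains should be treated as duplicates at all is a policy decision for the engine. Query strings are ignored for simplicity.

```python
from typing import Iterable, List
from urllib.parse import urlsplit

# Assumption: host prefixes treated as mirrors of the main site.
MIRROR_PREFIXES = ("www.", "jp.", "de.")


def canonical_key(url: str) -> str:
    """Reduce a URL to host + path, ignoring scheme and mirror prefixes."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    for prefix in MIRROR_PREFIXES:
        rest = host[len(prefix):]
        # Only strip when a real domain (containing a dot) remains.
        if host.startswith(prefix) and "." in rest:
            host = rest
            break
    path = parts.path.rstrip("/") or "/"
    return host + path


def dedupe(urls: Iterable[str]) -> List[str]:
    """Keep the first URL seen for each canonical key, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_key(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

A more robust version would use the Public Suffix List to find the registrable domain instead of a fixed prefix list.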
07-25-2016, 08:47 AM | #45 |
Too lazy to set a custom title
Industry Role:
Join Date: Mar 2003
Location: Homeless
Posts: 62,910
|
Sad to see another SE loaded with tubes.
__________________
PornGuy skype me pornguy_epic AmateurDough The Hottes Shemales online! TChicks.com | Angeles Cid | Mariana Cordoba | MAILERS WELCOME! |
07-25-2016, 09:56 AM | #46 | ||
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
Quote:
Quote:
|
||
07-25-2016, 10:03 AM | #47 |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
We don't index embedding tubes, or the spammy crap tubes that don't host their own content.
What should we do with the canonical tubes ? Ban them from the index ? They exist, people use them. We do proactively ban torrent sites and file lockers from the index, but where do we draw the line ? Should we remove the tubes from the index too ? |
07-25-2016, 12:04 PM | #49 |
Raise Your Weapon
Industry Role:
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,605
|
URL submissions to the site are algorithmically checked and not manually checked except for a few cases.
However I do see reports on submission failure rates and there have been a lot of sites submitted that are rejected by the search engine because of too many ads or too many popups or popunders. If a surfer visits your site, clicks once and then has a popup take over their screen then your site just won't be included in the index. Likewise if you have more than 4 ads above the fold then your site also won't be included in the index.
The key decisions of whether a site is included on Node.XXX are:
1. Does the site provide a good user experience ? Good
2. Does the site have too many ads ? Bad
3. Does the site have takeover popups on click ? Bad
4. Is the site spammy ? Bad
5. Does the site embed large amounts of content from other sources ? Bad
6. Is the site a white label ? Exclusion
7. Is the site an embedding tube ? Exclusion
8. Does the site have too many spammy links pointing to it ? Bad
9. Does the site have malware or unwanted redirects ? Exclusion
10. Does the site show different content to mobile and desktop users ? Exclusion
Even if a site is included in the index, every time it is re-crawled these considerations apply. We also check the site from proxy nodes using various User Agents to be sure that sites aren't trying to fool Node.XXX.
So to sum up: good user experience and original content will see a site included in the index. Bad user experience or spam etc. will see the site excluded. |
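The checklist above can be sketched as a simple rule filter. This Python example is a simplification and not the actual Node.XXX algorithm: the `SiteSignals` fields are hypothetical measurements, only the thresholds stated in this thread (more than 4 ads above the fold, a takeover popup on click) are encoded, and the softer "Bad" signals such as spammy links are left out because no thresholds were given for them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteSignals:
    """Hypothetical per-site measurements gathered during a crawl."""
    ads_above_fold: int = 0
    takeover_popup_on_click: bool = False
    is_white_label: bool = False
    is_embedding_tube: bool = False
    has_malware_or_redirects: bool = False
    cloaks_by_device: bool = False  # different content for mobile vs desktop


def exclusion_reasons(site: SiteSignals) -> List[str]:
    """Return every rule the site fails; an empty list means it may be indexed."""
    reasons = []
    if site.ads_above_fold > 4:
        reasons.append("more than 4 ads above the fold")
    if site.takeover_popup_on_click:
        reasons.append("takeover popup on click")
    if site.is_white_label:
        reasons.append("white label")
    if site.is_embedding_tube:
        reasons.append("embedding tube")
    if site.has_malware_or_redirects:
        reasons.append("malware or unwanted redirects")
    if site.cloaks_by_device:
        reasons.append("different content for mobile and desktop")
    return reasons


def is_indexable(site: SiteSignals) -> bool:
    return not exclusion_reasons(site)
```

Returning the list of reasons rather than a bare yes/no would make it possible to tell a webmaster why a submission was rejected, and the same check can be rerun on every re-crawl.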
07-25-2016, 10:51 PM | #50 |
Too old to care
Industry Role:
Join Date: Jun 2001
Location: On the sofa, watching TV or doing my jigsaws.
Posts: 52,943
|
Will surfers prefer it to Google?
|