GoFuckYourself.com - Adult Webmaster Forum (https://gfy.com/index.php)
-   Fucking Around & Business Discussion (https://gfy.com/forumdisplay.php?f=26)
-   -   Tech Is Anyone Using The Full Data Dump From HubTraffic?.. (https://gfy.com/showthread.php?t=1307568)

EddyTheDog 01-05-2019 10:53 AM

Is Anyone Using The Full Data Dump From HubTraffic?..
 
What tools are you using? - It's massive and I am struggling...

Thanks.....

Klen 01-05-2019 12:17 PM

You should be careful with your word choice, otherwise you can attract CurrentlySober into the thread :D
And yes, I use the full dump as well - I made my own script to parse it, plus a script to parse the weekly updates.

CjTheFish 01-05-2019 12:27 PM

Good luck, I have never been able to successfully pull from those dumps. Hope someone can shed more light on it as well.

EddyTheDog 01-05-2019 12:59 PM

Quote:

Originally Posted by KlenTelaris (Post 22391156)
You should be careful with your word choice, otherwise you can attract CurrentlySober into the thread :D
And yes, I use the full dump as well - I made my own script to parse it, plus a script to parse the weekly updates.


lol - True...

wankawonk 01-05-2019 02:12 PM

yes I have the full pornhub/redtube/tube8 dumps on my sites

like klentelaris I use my own script

you can fit the whole thing on a single elasticsearch node but you need a decently beefy machine

it's definitely too big for most of the cookie cutter scripts people use, smart-cj, wp-tube, tube-ace

I bet mechbunny could handle it though, with a big enough server
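
For a sense of the moving parts, a minimal bulk-load sketch using the elasticsearch Python client - the "videos" index name and the pipe-delimited column handling are assumptions, not the dump's actual schema:

Code:

# bulk-index the dump into a single Elasticsearch node (pip install elasticsearch)
import csv
from elasticsearch import Elasticsearch, helpers

csv.field_size_limit(1 << 24)  # embed-code fields can be very long

es = Elasticsearch("http://localhost:9200")

def actions(path):
    with open(path, encoding="utf-8", errors="replace") as f:
        for row in csv.reader(f, delimiter="|"):
            # map the dump's columns to real field names here
            yield {"_index": "videos", "_source": {"raw": row}}

helpers.bulk(es, actions("hubtraffic_dump.csv"), chunk_size=5000)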

EddyTheDog 01-05-2019 02:37 PM

Quote:

Originally Posted by wankawonk (Post 22391218)
yes I have the full pornhub/redtube/tube8 dumps on my sites

like klentelaris I use my own script

you can fit the whole thing on a single elasticsearch node but you need a decently beefy machine

it's definitely too big for most of the cookie cutter scripts people use, smart-cj, wp-tube, tube-ace

I bet mechbunny could handle it though, with a big enough server

I haven't looked at Elasticsearch yet - Is it easy enough to import it?..

Klen 01-05-2019 04:11 PM

Quote:

Originally Posted by wankawonk (Post 22391218)
yes I have the full pornhub/redtube/tube8 dumps on my sites

like klentelaris I use my own script

you can fit the whole thing on a single elasticsearch node but you need a decently beefy machine

it's definitely too big for most of the cookie cutter scripts people use, smart-cj, wp-tube, tube-ace

I bet mechbunny could handle it though, with a big enough server

I do know those scripts are extremely resource hungry for some reason, though when I was testing with my script it worked fast on a 512 MB VPS.

babeterminal 01-05-2019 04:23 PM

https://www.wpxtube.com/

babeterminal 01-05-2019 04:25 PM

Those scripts are extremely resource hungry - they all are. We need some way to break it down; the one above cuts off at set amounts then carries on, and it's very slow.

brassmonkey 01-05-2019 04:37 PM

Quote:

Originally Posted by babeterminal (Post 22391301)

:1orglaugh:1orglaugh:1orglaugh:1orglaugh nope!

wankawonk 01-05-2019 04:40 PM

Quote:

Originally Posted by babeterminal (Post 22391301)

LOL I checked out some of their demo tubes. Barely functional, slow as shit, some of them are already shut down

good fuckin luck with that

babeterminal 01-05-2019 04:44 PM

Quote:

Originally Posted by wankawonk (Post 22391317)
LOL I checked out some of their demo tubes. Barely functional, slow as shit, some of them are already shut down

good fuckin luck with that

Tell me about it, mine's stuck on PHP 5 lol - if I upgrade anything I think it will break

next.......... lol

babeterminal 01-05-2019 04:46 PM

I think I will have to start from scratch and go with the guy who is jetting around the world while airport staff smash his equipment - what is his tube script?

AdultKing 01-06-2019 12:59 AM

Quote:

Originally Posted by EddyTheDog (Post 22391118)
What tools are you using? - It's massive and I am struggling...

Thanks.....

One way to deal with it is to import it into a NoSQL database and use something like Elasticsearch and an API to feed a front-end site and provide search functionality. This is all free, you just need to know how to put the pieces together.

The advantage of doing it this way is that you can run just one box to feed an unlimited number of websites with data through the API.
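
As a rough illustration of that one-box pattern (Flask is just an example choice here, and the "videos" index and "title" field are made-up names):

Code:

# tiny search API in front of Elasticsearch (pip install flask elasticsearch)
from flask import Flask, jsonify, request
from elasticsearch import Elasticsearch

app = Flask(__name__)
es = Elasticsearch("http://localhost:9200")

@app.route("/search")
def search():
    q = request.args.get("q", "")
    # full-text match against an assumed "title" field in a "videos" index
    res = es.search(index="videos", body={"query": {"match": {"title": q}}, "size": 20})
    return jsonify([hit["_source"] for hit in res["hits"]["hits"]])

if __name__ == "__main__":
    app.run()  # every front-end site queries this one box over HTTP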

EddyTheDog 01-06-2019 02:07 AM

Quote:

Originally Posted by AdultKing (Post 22391500)
One way to deal with it is to import it into a NoSQL database and use something like Elasticsearch and an API to feed a front-end site and provide search functionality. This is all free, you just need to know how to put the pieces together.

The advantage of doing it this way is that you can run just one box to feed an unlimited number of websites with data through the API.

That's pretty much what I have been doing most of the day - There has been a lot of head scratching and swearing but I think I am getting there...

Thanks.....

AdultKing 01-06-2019 02:18 AM

Quote:

Originally Posted by EddyTheDog (Post 22391506)
That's pretty much what I have been doing most of the day - There has been a lot of head scratching and swearing but I think I am getting there...

Thanks.....

Just a tip - SSD drives make a big difference in performance.

Klen 01-06-2019 03:04 AM

Quote:

Originally Posted by AdultKing (Post 22391500)
One way to deal with it is to import it into a NoSQL database and use something like Elasticsearch and an API to feed a front-end site and provide search functionality. This is all free, you just need to know how to put the pieces together.

The advantage of doing it this way is that you can run just one box to feed an unlimited number of websites with data through the API.

You are so comical lately. "Only way to deal with it" :1orglaugh
And speaking of databases, I tested MariaDB while I was doing the import of this for the first time, but for some reason, while MariaDB loads content faster than MySQL, it is terribly slow when it comes to inserting - it took 12 hours to import the HubTraffic dump, while with MySQL it took only 3 hours. So I figured out how to optimize MySQL to load content fast and there was no need for MariaDB anymore.
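
The standard InnoDB bulk-load tuning looks roughly like this - a sketch only, with made-up credentials, table, and columns, not necessarily the exact settings used here:

Code:

# speed up a huge MySQL import by relaxing per-row work (pip install mysql-connector-python)
import mysql.connector

cnx = mysql.connector.connect(user="root", password="secret", database="tube")
cur = cnx.cursor()

# skip index/constraint checks while loading, and batch everything in one transaction
cur.execute("SET unique_checks = 0")
cur.execute("SET foreign_key_checks = 0")
cur.execute("SET autocommit = 0")

rows = [("Example title", "https://example.com/embed/1")]  # parsed dump rows go here
cur.executemany("INSERT INTO videos (title, embed_url) VALUES (%s, %s)", rows)
cnx.commit()  # one commit instead of one per row

cur.execute("SET unique_checks = 1")
cur.execute("SET foreign_key_checks = 1")
cnx.close()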

babeterminal 01-06-2019 07:03 AM

One thing this all reminds me of: back in the days of tube scripts and XML/Atom/RSS feeds, like some of you I was trying different ways to get this to work,

and I fucked up the Score feeds somehow and started showing hundreds of thousands of hits in the program admin. He was asking what the hell was going on - I will see if I can take some screenshots. He didn't close the account as I genuinely had no idea what was going on. Still don't.

AdultKing 01-06-2019 07:10 AM

Quote:

Originally Posted by KlenTelaris (Post 22391525)
You are so comical lately. "Only way to deal with it" :1orglaugh
And speaking of databases, I tested MariaDB while I was doing the import of this for the first time, but for some reason, while MariaDB loads content faster than MySQL, it is terribly slow when it comes to inserting - it took 12 hours to import the HubTraffic dump, while with MySQL it took only 3 hours. So I figured out how to optimize MySQL to load content fast and there was no need for MariaDB anymore.

You need to learn to read, I said "ONE WAY TO DEAL WITH IT" :helpme

babeterminal 01-06-2019 07:27 AM

No joy - the stats only go back to 2009, and this was 06/08.

Klen 01-06-2019 09:05 AM

Quote:

Originally Posted by AdultKing (Post 22391611)
You need to learn to read, I said "ONE WAY TO DEAL WITH IT" :helpme

Damn brain auto-correct :1orglaugh

CurrentlySober 01-06-2019 09:54 AM

https://i.imgur.com/rItRjGt.jpg

:thumbsup:thumbsup:thumbsup

brassmonkey 01-06-2019 10:46 AM

maybe find a tool that can split it into smaller files

shake 01-06-2019 11:55 AM

I wrote a python script to deal with it, keeps everything in memory (I have an old server with tons of ram) and it finishes in a few minutes.
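
Something in that spirit can be surprisingly small - a sketch that assumes the pipe-delimited format discussed in this thread and enough free RAM to hold the dump:

Code:

# parse the whole dump in memory (assumes the ~9GB file fits in RAM with headroom)
import csv

csv.field_size_limit(1 << 24)  # embed-code fields can be very long

with open("hubtraffic_dump.csv", encoding="utf-8", errors="replace") as f:
    rows = list(csv.reader(f, delimiter="|"))  # whole file in RAM

print(f"parsed {len(rows)} videos")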

babeterminal 01-06-2019 04:56 PM

Quote:

Originally Posted by shake (Post 22391748)
I wrote a python script to deal with it, keeps everything in memory (I have an old server with tons of ram) and it finishes in a few minutes.

what did you name it?

brassmonkey 01-06-2019 06:39 PM

Quote:

Originally Posted by babeterminal (Post 22391867)
what did you name it?

white snake???? :1orglaugh:1orglaugh:1orglaugh:1orglaugh:1orglaugh :1orglaugh

brassmonkey 01-06-2019 07:00 PM

split axe may do it. i have a copy if u need it

freecartoonporn 01-06-2019 07:31 PM

For huge files I have used LOAD DATA INFILE - works best if you don't want any manipulation done on the data.
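
Scripted, that looks something like this - the path, table, and delimiters are assumptions (the thread converts pipes to tabs, so tabs are used here), and the server must also allow local_infile:

Code:

# LOAD DATA INFILE sketch (pip install mysql-connector-python)
import mysql.connector

cnx = mysql.connector.connect(user="root", password="secret", database="tube",
                              allow_local_infile=True)
cur = cnx.cursor()
cur.execute("""
    LOAD DATA LOCAL INFILE '/data/hubtraffic.tsv'
    INTO TABLE videos
    FIELDS TERMINATED BY '\\t'
    LINES TERMINATED BY '\\n'
""")
cnx.commit()
cnx.close()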

EddyTheDog 01-06-2019 07:43 PM

Quote:

Originally Posted by brassmonkey (Post 22391908)
split axe may do it. i have a copy if u need it

I just found/used a prog from filesplitter.org - It's Freeware and worked like a charm - Not bad, it's a 9GB file and the prog looks like it was made for Windows 3.11...

brassmonkey 01-06-2019 07:46 PM

Quote:

Originally Posted by EddyTheDog (Post 22391920)
I just found/used a prog from filesplitter.org - It's Freeware and worked like a charm - Not bad, it's a 9GB file and the prog looks like it was made for Windows 3.11...

:thumbsup:thumbsup glad you found one! cheers!

AdultKing 01-06-2019 09:20 PM

Quote:

Originally Posted by EddyTheDog (Post 22391920)
I just found/used a prog from filesplitter.org - It's Freeware and worked like a charm - Not bad, it's a 9GB file and the prog looks like it was made for Windows 3.11...

On any unix/linux/mac system you can use the command split to split files.

example splits a file of any size into smaller files, each with 5000 lines

Code:

split -l 5000 anyfile.ext newfile
If split isn't on your system, it ships as part of GNU coreutils:

on Ubuntu/Debian type

Code:

sudo apt-get install coreutils
on Centos/RHEL

Code:

sudo yum install coreutils
on MacOS

Code:

brew install coreutils

EddyTheDog 01-06-2019 09:33 PM

Ooops - I split it into 500mb chunks and I am doing a find and replace of | to ^t - I probably should have gone a bit smaller - lol...
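
For what it's worth, the same find and replace can be streamed line by line, so the editor never has to hold a 500mb chunk in memory (file names here are made up):

Code:

# stream-convert pipe delimiters to tabs, one line at a time
with open("chunk_01.csv", encoding="utf-8") as src, \
     open("chunk_01.tsv", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line.replace("|", "\t"))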

AdultKing 01-07-2019 03:06 PM

Quote:

Originally Posted by EddyTheDog (Post 22391941)
Ooops - I split it into 500mb chunks and I am doing a find and replace of | to ^t - I probably should have gone a bit smaller - lol...

How did you go, did you get there?

babeterminal 01-07-2019 03:36 PM

How did that go for you - 9GB into 500mb chunks, then what? Did it work?

HowlingWulf 01-07-2019 05:27 PM

No I just use their API to pull in the good videos I want and bypass all the junk shitty vids.

AdultKing 01-07-2019 05:28 PM

Quote:

Originally Posted by HowlingWulf (Post 22392590)
No I just use their API to pull in the good videos I want and bypass all the junk shitty vids.

The API is good, but if you're building out a massive embed site the dump is better.

EddyTheDog 01-07-2019 05:37 PM

Quote:

Originally Posted by HowlingWulf (Post 22392590)
No I just use their API to pull in the good videos I want and bypass all the junk shitty vids.

Too easy. Although I am using the API for some stuff like tags etc...

To be honest it's a chance to learn about 'big data' and related tech. I can then use that knowledge for something that might actually make me some money!....

AdultKing 01-07-2019 05:44 PM

Quote:

Originally Posted by EddyTheDog (Post 22392596)
Too easy. Although I am using the API for some stuff like tags etc...

To be honest it's a chance to learn about 'big data' and related tech. I can then use that knowledge for something that might actually make me some money!....

Good idea, learning stuff like distributed databases, Elasticsearch etc is all useful.

If you're keen to learn the most popular PHP framework around, which includes support for big-data technologies such as Algolia, then learn Laravel.

https://laravel.com

https://laracasts.com

EddyTheDog 01-07-2019 06:03 PM

Quote:

Originally Posted by AdultKing (Post 22392600)
Good idea, learning stuff like distributed databases, Elasticsearch etc is all useful.

If you're keen to learn the most popular PHP framework around, which includes support for big-data technologies such as Algolia, then learn Laravel.

https://laravel.com

https://laracasts.com

I have been thinking about Laravel - I looked at it years ago - Maybe it's time to look again...

EddyTheDog 01-08-2019 01:01 PM

OK - So this is the flow I used:

Download/unzip the CSV dump -> use FileSplitter to split the CSV into 500mb files -> use Notepad++ to find and replace pipes with tabs (makes the DB import easier) -> use Navicat to import into MongoDB...

Now I just have to learn how to use the data lol.....

Notes: This was all done on a local Windows machine - Navicat will import into an external MongoDB instance if needed - It's a lot of data and you need to be patient - Either that or spin up a large Google Cloud instance and do it there, I wish I had done that lol...
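
The final import step can also be scripted with pymongo instead of Navicat - a sketch where the database/collection names and the raw-row mapping are assumptions:

Code:

# load the tab-separated chunks into MongoDB in batches (pip install pymongo)
import csv
import glob
from pymongo import MongoClient

csv.field_size_limit(1 << 24)

videos = MongoClient("mongodb://localhost:27017")["tube"]["videos"]

for path in sorted(glob.glob("chunk_*.tsv")):
    batch = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for row in csv.reader(f, delimiter="\t"):
            batch.append({"raw": row})  # map columns to named fields here
            if len(batch) == 10000:
                videos.insert_many(batch)
                batch = []
    if batch:
        videos.insert_many(batch)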

freecartoonporn 01-08-2019 06:47 PM

For the people using the API:

the limit is about 40 requests per 10 seconds,

so it's not feasible to use the API for search queries/tag queries on a live site - you need to implement your own search function.
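
If you do call the API from live code anyway, a small sliding-window limiter keeps you under that ceiling - a sketch built around the ~40 requests per 10 seconds figure above:

Code:

# stay under ~40 requests per 10 seconds
import time

class RateLimiter:
    def __init__(self, max_calls=40, period=10.0):
        self.max_calls, self.period = max_calls, period
        self.calls = []  # timestamps of recent requests

    def wait(self):
        now = time.time()
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) >= self.max_calls:
            time.sleep(self.period - (now - self.calls[0]))  # until the oldest call expires
        self.calls.append(time.time())

limiter = RateLimiter()
# call limiter.wait() before every API request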

AdultKing 01-08-2019 10:18 PM

Quote:

Originally Posted by freecartoonporn (Post 22393510)
For the people using the API:

the limit is about 40 requests per 10 seconds,

so it's not feasible to use the API for search queries/tag queries on a live site - you need to implement your own search function.

It's fine if you cache requests.
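
A bare-bones version of that caching idea (the TTL and the in-process dict are arbitrary choices for the sketch):

Code:

# cache API responses so repeat queries never count against the limit (pip install requests)
import time
import requests

_cache = {}
TTL = 3600  # keep responses for an hour

def cached_get(url):
    hit = _cache.get(url)
    if hit and time.time() - hit[0] < TTL:
        return hit[1]  # served from cache, no API call made
    data = requests.get(url, timeout=30).json()
    _cache[url] = (time.time(), data)
    return data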

