Is Anyone Using The Full Data Dump From HubTraffic?..
What tools are you using? - It's massive and I am struggling...
Thanks..... |
You should be careful with word choice, otherwise you can attract currently sober into the thread :D
And yes, I use the full dump as well; I made my own script to parse it, plus a script to parse the weekly updates. |
Good luck, I have never been able to successfully pull from those dumps. Hope someone can shed more light on it as well.
|
Quote:
lol - True... |
Yes, I have the full pornhub/redtube/tube8 dumps on my sites.
Like klentelaris I use my own script. You can fit the whole thing on a single Elasticsearch node, but you need a decently beefy machine. It's definitely too big for most of the cookie-cutter scripts people use (smart-cj, wp-tube, tube-ace); I bet mechbunny could handle it though, with a big enough server. |
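For anyone curious, here is a minimal sketch of what bulk-loading the dump into a single Elasticsearch node can look like with the official Python client. The index name and field list are assumptions, not the poster's setup, so check the column order of the dump you actually download.
Code:
# Stream the pipe-delimited dump into a single Elasticsearch node in bulk.
# Index name and FIELDS are assumptions - adjust to the real dump layout.
import csv
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

FIELDS = ["embed", "thumb", "title", "tags", "categories", "duration", "views"]  # assumed layout

def actions(path):
    csv.field_size_limit(10_000_000)  # embed codes can be long
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        for row in csv.reader(f, delimiter="|"):
            if row:
                yield {"_index": "hubtraffic", "_source": dict(zip(FIELDS, row))}

es = Elasticsearch("http://localhost:9200")
indexed = 0
for ok, _ in streaming_bulk(es, actions("pornhub-full.csv"), chunk_size=5000, raise_on_error=False):
    indexed += ok
print("indexed", indexed, "docs")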
Those scripts are all extremely resource hungry. We need some way to break the dump down; the one above cuts off at set amounts then carries on, and it's very slow.
|
Quote:
good fuckin luck with that |
Quote:
next.......... lol |
I think I will have to start from scratch and go with the guy who is jetting around the world while airport staff smash his equipment. What is his tube script?
|
Quote:
The advantage of doing it this way is you can run just one box to feed an unlimited number of websites with data through the API. |
Quote:
Thanks..... |
Quote:
And speaking about databases: I tested MariaDB while I was doing the import of this for the first time, but for some reason, while MariaDB loads content faster than MySQL, it is terribly slow when it comes to inserting - it took 12 hours to import the hubtraffic dump, while with MySQL it took only 3 hours. So I figured out how to optimize MySQL to load content fast and there was no need for MariaDB anymore. |
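For reference, this is roughly the kind of session-level tuning that makes a plain MySQL bulk insert fast. The connection details, table and column layout below are placeholders for illustration, not the poster's actual schema.
Code:
# Session tweaks that usually speed up a one-off MySQL bulk import.
# Connection details, table and column names are placeholders.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="tube",
                                password="secret", database="hubtraffic")
cur = conn.cursor()
cur.execute("SET unique_checks = 0")        # skip per-row checks during the load
cur.execute("SET foreign_key_checks = 0")
cur.execute("SET autocommit = 0")
batch = []
with open("pornhub-full.csv", encoding="utf-8", errors="replace") as f:
    for line in f:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 7:
            continue
        views = int(fields[6]) if fields[6].isdigit() else 0
        batch.append((fields[0], fields[3], views))   # embed, title, views (assumed positions)
        if len(batch) >= 10000:
            cur.executemany("INSERT INTO videos (embed, title, views) VALUES (%s, %s, %s)", batch)
            batch = []
if batch:
    cur.executemany("INSERT INTO videos (embed, title, views) VALUES (%s, %s, %s)", batch)
conn.commit()
cur.execute("SET unique_checks = 1")
cur.execute("SET foreign_key_checks = 1")
conn.close()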
One thing this all reminds me of: back in the days of tube scripts and XML/Atom/CSS feeds, like some of you, I was trying different ways to get this to work,
and I fucked up score feeds somehow and started showing hundreds of thousands of hits in the program admin. He was asking what the hell was going on. I will see if I can take some screenshots. He didn't close the account as I generally had no idea what was going on, and still don't. |
No joy, the stats only go back to 2009, and this was 06/08.
|
Maybe find a tool that can split it into smaller files.
|
I wrote a Python script to deal with it; it keeps everything in memory (I have an old server with tons of RAM) and finishes in a few minutes.
|
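Something like the sketch below, assuming the box has enough RAM for the whole dump and that the first column can serve as an ID; both are assumptions, not details from the post above. It also writes the result back out tab-separated, which makes the later DB imports in this thread easier.
Code:
# Rough sketch of the in-memory approach: read the whole pipe-delimited dump into RAM,
# dedupe on an (assumed) ID column, and write it back out tab-separated.
import csv

def convert(in_path, out_path, id_col=0):
    csv.field_size_limit(10_000_000)
    rows = {}
    with open(in_path, newline="", encoding="utf-8", errors="replace") as f:
        for row in csv.reader(f, delimiter="|"):
            if row:
                rows[row[id_col]] = row     # last occurrence wins
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerows(rows.values())

convert("pornhub-full.csv", "pornhub-full.tsv")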
Split Axe may do it. I have a copy if you need it.
|
For huge files I have used LOAD DATA INFILE; it works best if you don't want any manipulation done on the data.
|
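For anyone who hasn't used it, here is a minimal sketch of LOAD DATA INFILE against the pipe-delimited dump, issued through Python. The table name, file name and credentials are placeholders, and the server needs local_infile enabled.
Code:
# LOAD DATA LOCAL INFILE for the same pipe-delimited dump.
# Table, file and credentials are placeholders; requires local_infile on the server.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="tube", password="secret",
                                database="hubtraffic", allow_local_infile=True)
cur = conn.cursor()
cur.execute("""
    LOAD DATA LOCAL INFILE 'pornhub-full.csv'
    INTO TABLE videos
    FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '\\n'
""")
conn.commit()
conn.close()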
Quote:
This example splits a file of any size into smaller files, each with 5000 lines:
Code:
split -l 5000 anyfile.ext newfile
split is part of GNU coreutils, so it is already installed on practically every Linux box (and macOS ships its own split). If it is somehow missing, on Ubuntu/Debian type
Code:
sudo apt-get install coreutils
on CentOS/RHEL
Code:
sudo yum install coreutils
and with Homebrew
Code:
brew install coreutils |
Ooops - I split it into 500MB chunks and I am doing a find and replace from | to ^t - I probably should have gone a bit smaller - lol...
|
How did that go for you? 9GB down to 500MB chunks, then what? Did it work?
|
No, I just use their API to pull in the good videos I want and bypass all the junk shitty vids.
|
Quote:
To be honest, it's a chance to learn about 'big data' and related tech. I can then use that knowledge for something that might actually make me some money!.... |
Quote:
If you're keen to learn the most popular PHP framework around, which includes support for big-data-related technologies such as Algolia, then learn Laravel. https://laravel.com https://laracasts.com |
OK - So this is the flow I used:
Download/unzip the CSV dump -> Use FileSplitter to split the CSV into 500MB files -> Use Notepad++ to find and replace pipes with tabs (makes DB import easier) -> Use Navicat to import into MongoDB... Now I just have to learn how to use the data lol.....
Notes: This was all done on a local Windows machine - Navicat will import into an external MongoDB instance if needed - It's a lot of data and you need to be patient - Either that or start a large Google Cloud instance and do it on there, I wish I had done that lol... |
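As a possible shortcut, the split and find-and-replace steps can be skipped by streaming the dump straight into MongoDB. A rough sketch, with placeholder database, collection and field names rather than the poster's actual setup:
Code:
# Same pipeline without the manual split / pipe-to-tab steps:
# stream the pipe-delimited dump straight into MongoDB in batches.
import csv
from pymongo import MongoClient

FIELDS = ["embed", "thumb", "title", "tags", "categories", "duration", "views"]  # assumed layout
client = MongoClient("mongodb://localhost:27017")
col = client["hubtraffic"]["videos"]
batch = []
with open("pornhub-full.csv", newline="", encoding="utf-8", errors="replace") as f:
    for row in csv.reader(f, delimiter="|"):
        if row:
            batch.append(dict(zip(FIELDS, row)))
        if len(batch) >= 10000:
            col.insert_many(batch)
            batch = []
if batch:
    col.insert_many(batch)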
For people using the API:
the limit is about 40 requests per 10 seconds, so it's not feasible to use the API for search/tag queries on a live site; you need to implement your own search function. |
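If you do hit the API from your own backend, a small client-side throttle keeps you under that limit. A minimal sketch; the URL and parameters below are placeholders, not the real HubTraffic endpoint.
Code:
# Simple client-side throttle to stay under roughly 40 requests per 10 seconds.
# The URL and parameters are placeholders, not the real HubTraffic endpoint.
import time
import requests

WINDOW, LIMIT = 10.0, 40
stamps = []

def throttled_get(url, **kwargs):
    now = time.monotonic()
    while stamps and now - stamps[0] > WINDOW:
        stamps.pop(0)                       # forget requests older than the window
    if len(stamps) >= LIMIT:
        time.sleep(WINDOW - (now - stamps[0]))
    stamps.append(time.monotonic())
    return requests.get(url, timeout=30, **kwargs)

resp = throttled_get("https://api.example.com/search", params={"search": "example", "page": 1})
print(resp.status_code)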