Re: How many of you are working right now?

Posted: Tue Apr 19, 2011 2:13 pm
by ashley
slothrop wrote:Trying to figure out a non sucky way of making a scalable web-crawler that works with grim dynamically generated sites. Which basically means flipping between two designs (neither of which would work) for five minutes and then going back to the internet.
Eh?

wget has a spider tool; you can use this to scrape links on websites and then pump those into another process that indexes the content in a database, for example? Store the URLs in a database table with a unique index on the URL itself, filtering out any duplicates?
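
A rough sketch of the "unique index on the URL" idea, assuming SQLite as the database (the thread names no particular one). The table name `urls` and the helpers `open_url_store` and `add_urls` are made up for illustration. Duplicates are dropped by the UNIQUE constraint rather than by checking in application code:

```python
import sqlite3


def open_url_store(path=":memory:"):
    """Open (or create) a store with one row per distinct URL."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        " id INTEGER PRIMARY KEY,"
        " url TEXT NOT NULL UNIQUE)"  # UNIQUE creates the index that rejects repeats
    )
    return conn


def count_urls(conn):
    """Return how many distinct URLs are stored."""
    return conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0]


def add_urls(conn, urls):
    """Insert URLs, skipping any already stored; return how many were new."""
    before = count_urls(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO urls (url) VALUES (?)",
        [(u,) for u in urls],
    )
    conn.commit()
    return count_urls(conn) - before
```

This only filters exact string matches, so the same page reached via two differently spelled URLs would still be stored twice.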

Re: How many of you are working right now?

Posted: Tue Apr 19, 2011 2:27 pm
by slothrop
ashley wrote:
slothrop wrote:Trying to figure out a non sucky way of making a scalable web-crawler that works with grim dynamically generated sites. Which basically means flipping between two designs (neither of which would work) for five minutes and then going back to the internet.
Eh?

wget has a spider tool; you can use this to scrape links on websites and then pump those into another process that indexes the content in a database, for example? Store the URLs in a database table with a unique index on the URL itself, filtering out any duplicates?
Nah, it has to deal in a reasonably sensible way with sites that generate content dynamically, have silly numbers of internal links, and might (for instance) keep giving you the same (or similar) content at an arbitrary number of different URLs, but it also needs to start losing stuff that no longer exists again in a reasonably timely fashion, preferably without having to completely re-crawl the site. AIUI, Google Analytics believes that one of the sites we're talking about has about six billion individual links...

Edit - I mean, the core of using wget (or Python urllib) and a parser (Beautiful Soup) is fine, but getting something that performs well while handling sites with phenomenal amounts of duplication and a fairly high turnover is what's interesting. Think it's sorted, now, though.
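
For illustration only, one way the two requirements above could be sketched. This is not the design slothrop settled on, which the thread never describes. It assumes exact-duplicate detection by hashing whitespace-normalised page text, and a per-URL "last seen" crawl-pass number for dropping stale entries. The `PageIndex` class and its method names are hypothetical. "Similar" (as opposed to identical) content would need a near-duplicate technique that this does not attempt:

```python
import hashlib


class PageIndex:
    """Toy in-memory index: dedupe by content hash, expire by crawl pass."""

    def __init__(self):
        self.by_hash = {}    # content hash -> first URL seen with that content
        self.last_seen = {}  # URL -> number of the crawl pass that last fetched it

    @staticmethod
    def fingerprint(body):
        # Collapse whitespace and case so trivially different copies hash alike.
        normalised = " ".join(body.split()).lower()
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def record(self, url, body, crawl_pass):
        """Note that url was fetched; return the canonical URL for its content."""
        self.last_seen[url] = crawl_pass
        return self.by_hash.setdefault(self.fingerprint(body), url)

    def prune(self, oldest_pass_to_keep):
        """Forget URLs not seen since oldest_pass_to_keep; return them."""
        stale = [u for u, p in self.last_seen.items() if p < oldest_pass_to_keep]
        for u in stale:
            del self.last_seen[u]
        # Drop hashes whose canonical URL has gone; a later fetch re-registers them.
        self.by_hash = {h: u for h, u in self.by_hash.items() if u in self.last_seen}
        return stale
```

Pruning by pass number means a page disappears from the index once a couple of passes go by without it being reached, without forcing a full re-crawl first.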

Re: How many of you are working right now?

Posted: Tue Apr 19, 2011 3:42 pm
by Zöo Pop
wub wrote:
ashley wrote: windows+d to minimise everything, or windows+L for emergency lock the computer

Well I never knew that :D



[Still finding Alt + Tab is a more natural position for my hand to rest at though]
Good to know. But Alt + Tab seems more natural for me too.

Just got a job at an office. I used to work at a garbage dump, so it's a huge change.

Re: How many of you are working right now?

Posted: Tue Apr 19, 2011 4:00 pm
by ashley
slothrop wrote:
ashley wrote:
slothrop wrote:Trying to figure out a non sucky way of making a scalable web-crawler that works with grim dynamically generated sites. Which basically means flipping between two designs (neither of which would work) for five minutes and then going back to the internet.
Eh?

wget has a spider tool; you can use this to scrape links on websites and then pump those into another process that indexes the content in a database, for example? Store the URLs in a database table with a unique index on the URL itself, filtering out any duplicates?
Nah, it has to deal in a reasonably sensible way with sites that generate content dynamically, have silly numbers of internal links, and might (for instance) keep giving you the same (or similar) content at an arbitrary number of different URLs, but it also needs to start losing stuff that no longer exists again in a reasonably timely fashion, preferably without having to completely re-crawl the site. AIUI, Google Analytics believes that one of the sites we're talking about has about six billion individual links...

Edit - I mean, the core of using wget (or Python urllib) and a parser (Beautiful Soup) is fine, but getting something that performs well while handling sites with phenomenal amounts of duplication and a fairly high turnover is what's interesting. Think it's sorted, now, though.
Everyone knows the solution is more hardware to support inefficient code :lol:

Especially as now Google are opening up cores to boffins...

http://www.theregister.co.uk/2011/04/15 ... _donation/

Re: How many of you are working right now?

Posted: Tue Apr 19, 2011 9:03 pm
by Shum
So tempted to pick a cave, but no I usually work at home even though I'm not home at the moment.

Re: How many of you are working right now?

Posted: Tue Apr 19, 2011 9:24 pm
by pkay
At work... mad surfin

Re: How many of you are working right now?

Posted: Wed Apr 20, 2011 12:37 am
by jameshk
Brett get me a job working with you ;)

Re: How many of you are working right now?

Posted: Wed Apr 20, 2011 1:14 am
by Molzie
I work from home and often browse on the shitter so...

<HOME/WORK/TOILET>

Re: How many of you are working right now?

Posted: Thu Apr 21, 2011 3:17 pm
by murky21
completely given up on everything at work now, it's too sunny and too close to 4 days off...

Boredom has reached next levels, as in the last hour I have done things such as looking at all of the joke comments on Skrillex's FB fan page, starting a Skream photoshop job and running a Google image search on the word 'tits'

Re: How many of you are working right now?

Posted: Thu Apr 21, 2011 3:34 pm
by ashley
When I get bored at work I go and twitter on the toilet

Re: How many of you are working right now?

Posted: Thu Apr 21, 2011 4:15 pm
by dubmatters
In my second year at uni, but work full time in a pub. I can't actually believe I would prefer to work in an office again.

Re: How many of you are working right now?

Posted: Thu Apr 21, 2011 4:21 pm
by clifford_-
I'd rather spend the day in an office, nice and clean, in front of a computer, than spend all day grafting and getting horribly filthy on a building site!


*the grass is always greener...

Re: How many of you are working right now?

Posted: Fri Apr 22, 2011 8:20 am
by Atac
I work at 3 different self-serve frozen yogurt shops.
This is LA and ice cream isn't cool anymore.

But basically since it's self serve I just hang out on my laptop until the customer's ready.
Pretty simple job considering I'm still in school. Gets boring as fuck after a while though. I sneak in my MPK Mini some days to work on tunes. It's scary though because I think my boss would shit himself if he saw that.


*EDIT*
I work alone and my tnuc boss pops in and out whenever he feels like ruining my day.

Re: How many of you are working right now?

Posted: Fri Dec 23, 2011 10:57 am
by murky21
This one's out to all the fam inside the ride at work right now, and anyone who isn't going to do a single work-related process, not even a single email :Q:

Re: How many of you are working right now?

Posted: Fri Dec 23, 2011 10:59 am
by wub
I am at the office, however working is a loose term. TBH the only reason I came in today is that I'm going to my parents this evening and it's on the way. I'm letting my team go at half 12 if it doesn't pick up.

Re: How many of you are working right now?

Posted: Fri Dec 23, 2011 11:02 am
by nousd
I'd be working if I had a job.

Re: How many of you are working right now?

Posted: Fri Dec 23, 2011 11:03 am
by gwa
3 finish IF the payments team get shit done in time :(

Re: How many of you are working right now?

Posted: Fri Dec 23, 2011 11:03 am
by Riddles
I'm at work, but no one's doing much really. Office has got a half day, so I'm here for another hour and then home :D

Re: How many of you are working right now?

Posted: Fri Dec 23, 2011 11:14 am
by autobot
Eating cake, Internal meeting with a bottle of wine soon

Re: How many of you are working right now?

Posted: Fri Dec 23, 2011 11:14 am
by murky21
autobot wrote:Internal meating with a bottle of wine soon
Fixed