Website Logfile Analysis: The LJ Image Leeches

Grruwlf!  I'm Snoop-dog,¹ your friendly neighborhood troll.  This post is the first in what may become an occasional series on the topic of website logfile analysis.  No, wait!  It isn't *that* boring!  In fact, it's far more interesting than watching grass grow, waiting for a pot of water to boil, etc.
 
¹Yes, it seems I've acquired a new nickname.  Begone, PieSplatz!  Thou art *so* 2004!

My logfile contains 6,393 entries for the period 14-Sep-04 through 31-Dec-04.  Eliminating duplicative entries reduces the total to about 4,206 accesses.  I got hits from 1,237 different IP addresses, which I've grouped into perhaps 738 distinct people visiting my website.  I've managed to find names for only 49 of these people, but those named visitors were responsible for 44.6% of all the accesses.  Interesting search-queries that have brought people to my site include "pheromone cockroach starve male", "eating deer organ meats", "dog humping cookie monster", "children having sex pedo crimes frequency......  *snore zzzzzzzzzzzzzzz*
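For the curious, here is a minimal sketch of how counts like these might be produced. It is not the actual analyser. It assumes the log is in Apache "combined" format (the post never says which format is in use), it treats only exact repeats of the same IP, timestamp and path as duplicates, and every address and URL in the sample is made up.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. The real logfile may differ.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse(lines):
    """Yield one dict per line that matches the assumed format."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield m.groupdict()

def summarize(records):
    """Drop exact repeats of (ip, time, path), then count accesses per IP."""
    seen = set()
    hits_per_ip = Counter()
    for r in records:
        key = (r["ip"], r["time"], r["path"])
        if key in seen:
            continue  # hypothetical definition of a "duplicative entry"
        seen.add(key)
        hits_per_ip[r["ip"]] += 1
    return hits_per_ip

# Made-up sample lines (documentation addresses, placeholder URLs).
sample = [
    '192.0.2.1 - - [06/Oct/2004:10:00:00 -0400] "GET /a.jpg HTTP/1.1" 200 512 "http://example.com/lj.php" "Mozilla/4.0"',
    '192.0.2.1 - - [06/Oct/2004:10:00:00 -0400] "GET /a.jpg HTTP/1.1" 200 512 "http://example.com/lj.php" "Mozilla/4.0"',
    '192.0.2.2 - - [06/Oct/2004:10:00:05 -0400] "GET /a.jpg HTTP/1.1" 200 512 "-" "webcollage/1.0"',
]

hits = summarize(parse(sample))
total = sum(hits.values())               # accesses left after de-duplication
share = 100 * hits["192.0.2.1"] / total  # one visitor's share, as a percentage
```

Grouping several IP addresses into one "person" is the hard part and is left out here, since the post does not say how that grouping was done.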

Huh?  Oh, sorry.  Must have dozed off!  Today's topic is the "LJ image leeches".  These are induhviduals who like to look at just the pictures embedded in public LiveJournal posts, while ignoring the surrounding text.  During Q4 of 2004 I made four public-posts-with-pictures and captured website hits from 112 of these, um, "people" I suppose you could call them.  Or they could be called "bandwidth-sucking sex-starved pimply-faced adult-wannabe waste products", but that would be unfair to the ones without acne.  So here is what I have learned about them:

LJ image leeches

LiveJournal's homepage offers a link called Latest Posts, which is a live feed of what seems to be the 200 public journal entries that were posted within the last minute.  Apparently this is supposed to entice you to become a member (and perhaps someday join [info]paidmembers) by showing you that ordinary people like yourself are posting the same kind of boring drivel that you would write, so why not join the club?  I did, and look what LJ has done for me!  Or something.

There were 12 people who happened to look upon my entries via this "Latest Posts" link.  They aren't really leeches, my logfile-analyser program just draws them that way.  8 of these people were shown an image that looked like
Pyesetz thinks that you are:
     An LJ image leech!
 of:
Your town name
If that was you and you were offended, I'm sorry.  But in all likelihood, you didn't read more than 10 or 20 of the 200 posts that LJ showed you, so you probably never even *saw* this image buried in post #162, even though Furtopia's CPU had worked feverishly to produce it for you, while spending [info]whiteshephard's electricity that I don't even have to pay for!  So, um, creating this image for you was sort of a waste of money, in a way.

LJ offers RSS feeds for journals.  They demonstrate this feature by offering an RSS feed for "Latest Posts" (see the button on LJ's homepage).  Of course, this feed generates 200 posts a minute, so you can't practically add it to your RSS aggregator.  What can you do with it?  Why, lots of stuff!  Like, um, ...  Oh!  Oh!  I know!  You can write a program to filter out the text and just yoink the images from the XML stream.²  Kewl!  That would be, like, *so* original!!!

If you don't want images in your journal to be included in this feed, just enter the undocumented command set latest_optout yes at the admin console.

Websites that cater to leeches

Based on the referer-URLs I see, it seems that 10 or more induhviduals have written programs to yoink images from the "Latest Posts" RSS feed.  More than three dozen websites offer copies of these programs for public use.  Here are the sites that at least three people used when leeching my pictures:
43  http://www.soosed.com/lj/
12  http://cubed.nu/lj.php
10  http://www.fuzzysquid.com/LJ.php
7  http://www.csr-networks.com/lj.php
5  http://www.hotelhell.com/lj.php
5  http://gutterslide.com/lj/
4  http://object.qpalzm.com/fun/livejournal-images/livejournal-images.php
4  http://krues8dr.com/images/livejournal/
3  http://www.portalofevil.com/lj.php
3  http://ga2so.com/lj.php
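A tally like the one above could be built roughly as follows. This is only a sketch: it assumes each hit has already been boiled down to a (visitor, referer) pair, and the visitor labels and example.com URLs are placeholders, not anything from the real log.

```python
from collections import defaultdict

def visitors_per_referer(hits):
    """Map each referer URL to the set of distinct visitors that sent it."""
    seen = defaultdict(set)
    for visitor, referer in hits:
        seen[referer].add(visitor)
    return seen

def popular(hits, minimum=3):
    """Return (visitor_count, referer) pairs with at least `minimum` visitors, biggest first."""
    counts = [(len(v), ref) for ref, v in visitors_per_referer(hits).items()]
    return sorted((c for c in counts if c[0] >= minimum), reverse=True)

# Hypothetical input: v1 shows up twice but is counted once.
hits = [
    ("v1", "http://example.com/lj.php"),
    ("v2", "http://example.com/lj.php"),
    ("v3", "http://example.com/lj.php"),
    ("v1", "http://example.com/lj.php"),
    ("v4", "http://example.org/lj/"),
]
table = popular(hits)
```

Counting distinct visitors rather than raw hits keeps one greedy leech from inflating a site's rank.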

As you can tell from some of the site names, LJ image-leeching appeals to a rather low class of people.  When I click on one of the above links and scan through the photos, I tend to look for things that aren't G-rated.  I tried one of these programs just now and (I kid you not) one of the images it threw at me was a photo of two women going at it using what seemed to be a banana.  Unfortunately that photo was yanked so I can't link to it, but here's some bare breasts (NSFW).

So naturally I assume that everyone approaches the image-leech feeds the same way I do.  Once a picture of a nekkid woman shows up, anything less smutty gets skimmed over as uninteresting, which means the $$$ bandwidth used for sending you that picture was wasted.  Bandwidth is not free, although it might seem free if you have a prix fixe Internet access plan.  Your ISP is paying by the kilobyte for those transfers, as is the website that's serving the images you didn't bother looking at.  Wasted kilobytes mean higher costs that get passed on as higher prices for everyone.³

Another set of hits that I have classified under leeches comes from webcollage.  At first I thought the collage was a slightly higher art form than the general leech, but still there's plenty of nudity and that tends to outshine the rest of the images in the collage.  For example, more bare breasts (NSFW unless you're a doctor).  Webcollage's author, Jamie Zawinski, is vague in his website text about the sources for his images, but obviously the LJ "Latest Posts" feed is one of his sources because the hits I see from webcollage come in during those same few minutes after posting when the other leeches show up.  The webcollage program lies about its referer—it claims it's actually reading the target journal posts—but at least it identifies itself in the browser-ID field.

Bastard leeches

Five leeches used bastardized web browsers that refused to send referer-URLs.  I inferred that they were leeches because they showed up immediately after I posted a journal entry and they looked only at the images in the new entry.  Geez, what a bunch of schnorrers!  Maybe you can't trust me 'cause I'm over 30, but I think sending the referer-URL is the least you can do, considering that the hosting site is paying for the computer to store these images on, and also for that precious bandwidth you're wasting when you don't even *look* at the picture you requested, and all they ask is that you tell them how you heard that they were offering an image for free download.
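The inference described above can be sketched as a simple rule of thumb. Everything specific here is an assumption: the ten-minute window, the list of image suffixes, and the shape of the input (one visitor's hits as timestamp, path, referer) are all invented for illustration.

```python
from datetime import datetime, timedelta

# Assumption: these suffixes are what counts as "an image".
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".gif", ".png")

def looks_like_silent_leech(visitor_hits, post_time, window=timedelta(minutes=10)):
    """Guess whether one visitor is a referer-less leech.

    visitor_hits: list of (timestamp, path, referer) tuples for a single visitor.
    Returns True only if every hit lacks a referer, asks for an image,
    and lands within `window` after `post_time`.
    """
    if not visitor_hits:
        return False
    for when, path, referer in visitor_hits:
        if referer not in ("", "-"):
            return False  # a referer was sent, so a different rule applies
        if not path.lower().endswith(IMAGE_SUFFIXES):
            return False  # this visitor looked at something besides pictures
        if not (post_time <= when <= post_time + window):
            return False  # not in the rush right after posting
    return True

# Hypothetical example: a post at 10:00 and one image fetched at 10:02.
posted = datetime(2004, 10, 6, 10, 0)
suspect = [(datetime(2004, 10, 6, 10, 2), "/pics/sunny.jpg", "-")]
reader = suspect + [(datetime(2004, 10, 6, 10, 3), "/journal.html", "-")]
```

A rule like this will misfire on a privacy-minded friend who happens to drop by right after a post, so it is a hint rather than proof.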

One guy(?) got called "a leech" to his face by two of my programs, but that wasn't immediately after a new posting and he was trying to look at all the images on my main LJ page. He used a browser that had been hacked to send the same URL for both destination and referer.  That's cheating!  I can understand refusing to send a referer because you're paranoid about online privacy.  Many websites will let you have free downloads even if you don't agree to tell them who sent you, although some don't.  But sending a string that seems to be a referer, in order to fool the stupid computer program into letting you have something its owner would not want you to have if he saw what you were doing, is just plain theft-by-deception.  I wonder if that juvenile bastard is still stealing candy from dollar stores.  Perhaps someday he'll fail to pull off a major art heist and land his sorry ass in jail.
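Spotting that particular trick is easy enough in principle. The following is a hedged sketch, not the check actually used here; the function name, the http-only URL scheme and the example host are assumptions.

```python
from urllib.parse import urljoin

def referer_echoes_target(host, path, referer):
    """True when the referer is just the URL being requested.

    Assumes plain http and that `host` is the site's own hostname.
    """
    target = urljoin(f"http://{host}/", path)
    # Ignore a trailing slash so "/lj/" and "/lj" compare equal.
    return referer.rstrip("/") == target.rstrip("/")

# Hypothetical requests against a placeholder host.
faked = referer_echoes_target("example.com", "/pics/a.jpg", "http://example.com/pics/a.jpg")
honest = referer_echoes_target("example.com", "/pics/a.jpg", "http://example.com/journal.html")
```

An image that claims to have been referred by itself is suspicious, but a match is better treated as a flag for a closer look than as a verdict.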

Interesting leeches

One of the things I like to do with my logs is to extract "vignettes", little stories about people who seem to have feelings, as expressed solely in the sequence of my files that they chose to look at.  There are very few stories to tell among the leeches.  For the vast majority of them it's the same story over and over: wham, bam, (silence).  Because of the snapshot nature of leeching, most of them don't remember me in the morning and they almost never come back for another helping of my doggy goodness.  Here are two that did come back, but they probably still don't remember me:
In this extract from one of my analysis programs, each entry begins with ¢ to indicate that the person is a leech (because they're little "resour¢e-wasters").  Then their geographic locus is shown, since it's the closest thing to a name for them.  The next part shows the IP address they used and the ISP to which that address is assigned.  Then the dates and times and files that they accessed (,20041006-sunny.jpg is one of my sukkah pictures).  The arrow is followed by an interpretation for the referer-URL ("¢LJ-leech" is my code for all the websites mentioned above).  This is really terribly boring, isn't it?

Okay, finally we get to the good guys!  These people actually cared about my random images enough to click on them:
What the first guy saw, besides the nekkid girls, was an image that told him I thought he was a leech living in The Hague, so he looked at the cookie contest announcement, then went back to his nekkid girls.  The second guy was told he's a leech of Cleveland, so he looked at the "Now Five Winners!" post.  This still didn't satisfy his curiosity, so then he looked at my main journal page.

*Zzz zzz zzz zzz zzz*  All right, I'm done talking about leeches.  There's no way any other website-analysis topic could possibly be as boring as this one.

-- Pyesetz the Dog

Comments on this essay at LiveJournal
Next installment ("The Search-engine Users")
Main page for Logfile Analysis '04

Addenda

[info]krues8dr (the owner of the like-named website mentioned above) writes as follows:
²I did want to mention to you that the script that's used by myself and many of the others you noted actually are employing the Latest Image feed, not the Latest Post feed, so there's no spidering of the content for images, LJ's already done that.  And since that feed doesn't have any other content, I can only assume that it exists only for the purpose of showing those images.  Something to consider, at the least.

³As an individual who manages my own hosting, and is employed in the hosting business, I totally understand your concerns about bandwidth stealing.  Of course, as you've noted, there are plenty of ways to configure your server to prevent many of the basic types of stealing, but […] it's become almost impossible to stop people from getting images directly.  Luckily, bandwidth costs are dropping rapidly, so it's more a question of server load than expense these days.