In the Ghetto

I was recently searching for an MP3 downloadable version of Elvis Presley’s “In the Ghetto,” having heard an excerpt in this episode of This American Life. I came across this version, which is quite likely the strangest cover of an Elvis song I’ve ever heard. I hope it’s not a sign that I’m out of tune with popular culture because I’ve never heard of dictionaraoke before.

Bad MSN

The MSNBot recently attempted to request a page on a website I run that is not publicly linked. In fact, no page from the domain name in question is publicly linked. The robots.txt file excludes all robots. My .htaccess file also blocks HTTP requests from known search engine robot IP address ranges (including MSNBot’s). Moreover, the page requested wasn’t the top level page (i.e., http://domain.com/), but some page buried therein (i.e., http://domain.com/some_dir/some_page.html).
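
For anyone who wants to confirm what a blanket robots.txt exclusion means for a given crawler, here is a minimal sketch in Python. It assumes the file holds the standard two-line "exclude everything" rule (the actual file isn't reproduced here), and the `is_allowed` helper and the "msnbot" user-agent token are illustrative choices, not anything MSN documents.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents: the standard rule that excludes every robot from every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL taken from the example above.
    print(is_allowed("msnbot", "http://domain.com/some_dir/some_page.html"))
```

Of course, robots.txt is only advisory: a well-behaved robot honors it, but nothing in the file itself prevents a request from being made.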

The only request I have in my server logs from MSNBot is for this buried page. MSNBot (at least, anything identified as MSNBot) never requested any of the pages that would be necessary to find this buried page. The universe of people who have access to the website at this domain is very small; I can in fact identify every single IP address in the server log as someone I know.

I can only conceive of two hypotheses for this, both of which would be a bad sign for MSNBot:

  • The URL did appear in some emails to hotmail.com addresses; is it conceivable that MSN actually pulls out URLs from emails for spidering? Seems quite unlikely.
  • MSNBot visited the domain disguised, both by IP address and user agent, as someone else, to find the URL in question. I would hope MSNBot wouldn’t engage in such a poor practice, but maybe they do it to detect cloaking or similar manipulative practices.

I don’t mean to be a conspiracy theorist, but can anyone conceive of any other way the MSNBot could have even found out about the URL in question?

Ads in RSS

Google is beta-testing AdSense for RSS feeds. I hope this catches on, and encourages more content providers to put the full text of their entries in their RSS feeds, rather than just initial snippets, which makes the feed nearly worthless for offline reading. I’ve complained about snippety RSS before, as have many others. If the only obstacle to including full text in feeds is fear of lack of revenue, this should fix that. (Presumably the fear is not bandwidth-related — 1 or 2 kilobytes versus a fraction of a kilobyte per entry shouldn’t be an issue for anyone anymore, if it ever was).

Why Is Verizon Not Able To Deal?

I keep intending to write my omnibus Verizon gripe entry, but small Verizon gripes keep getting in the way.

The latest: I’ve had crippled DSL service for over a week now. At best, I’m getting 80 KBps down and half of that up, while I’m supposed to be getting 300-400 KBps down and 80-90 KBps up (still nothing to write home about). When I called over a week ago, they said they were in the process of fixing it and usually these outages were just several hours but this one might be a day or two.

The problem is actually at Verizon’s “trunk.” When I do a speed test on the Verizon site, it’s fine—in other words, my Internet connection to Verizon is full speed. Somewhere after the packets reach Verizon’s routing area it slows to a crawl. It would seem like this would be much easier to fix than a problem closer to the edge, but so far no luck.

The Verizon support person I just spoke with said they only just became aware of it because of the phone calls coming in. They had one team working on it, but now that they realize it’s a “big” problem, they have three teams working on it.

Do they really rely on subscribers to call in to find out their network is down? Don’t they monitor this kind of thing? Why is Verizon not able to deal?

Okay, gripe out.

Patry Copyright Blog

Noted IP practitioner and scholar William Patry started a blog. One thing I love about the blog so far is that he doesn’t write with the careful restraint of many active practitioners. He’s quite willing to talk about how badly courts bungled the law—which I suppose is justified, since he drafted some of the legislation at issue when he was the copyright counsel to the House of Representatives.

My favorite recent bit is from this commentary on the Bootleg Statute, which many—including some courts—have criticized as being unconstitutional under the Copyright Clause:

In 1994, I had been practicing copyright law for 13 years. I was well aware of the limited Times restriction. Everyone involved was aware of it. Do critics think that in making the bootleg right perpetual we meant to legislate under the Copyright Clause but just had a memory lapse, or that we said, “Hell, let’s draft an unconsitutional provision; why not, its bound to be fun?” The answer is, no, we didn’t draft a copyright or copyright-like provision at all. We drafted a sui generis right under the Commerce Clause. (For those who are wondering, Congress is not in the habit of saying in a statute, “hey this is the power we are legislating under.” See also Woods v. Taylor, 333 U.S. 138 (1948)).

Craigslist into Outer Space

Maybe I missed this the first time around, but I just noticed that craigslist is providing an opportunity to have free postings sent into outer space.

From the FAQ:

Q: Is this a hoax.
A: No.

I also noticed that craigslist is supporting the Spread Firefox campaign by posting links to the campaign on just about every page.

Go craigslist!

Clever Referer Spam

Update (2/26/06): Someone associated with the ‘nipple huggers’ site has written to complain about my accusations here. She also has left a couple of comments below. Just to be clear, there is no evidence that the site sends email spam, uses obtrusive popups, or installs spyware/adware, etc., on your computer. It appears simply that someone has attempted to optimize their position in search results by generating HTTP requests to other popular sites with their domain name in the referer field.


I used to have a big problem with “referer spam.” What is referer spam? My weblog lists “inbound links” on the right column so visitors can see who else has linked here. Since many weblogs provide a similar list, spammers began to create “spurious” inbound links so their URLs would appear in the right column of many weblogs, thus boosting their Google PageRank. Usually, if you went back to the site that ostensibly linked to my weblog, it would be a porn or gambling site with no true links to my weblog.

This was easy enough to fix: I wrote a handmade filter that regularly checks all the putative inbound links and verifies that they do, in fact, link to my site.
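
The core of such a check can be sketched in a few lines of Python. This is not my actual script, just an illustration of the idea: given the HTML of the page named in the referer field, see whether any anchor on it really points at my host. The `links_to_site` name and the example hosts are made up, and fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def _host_of(url: str):
    """Return the lowercased hostname of url, or None if it has none."""
    try:
        return urlparse(url).hostname
    except ValueError:
        return None


def links_to_site(html: str, my_host: str) -> bool:
    """True if any anchor in html points at my_host."""
    collector = _LinkCollector()
    collector.feed(html)
    return any(_host_of(href) == my_host.lower() for href in collector.hrefs)
```

A referer whose page fails this test gets dropped from the inbound links list. Note that it only looks at anchors, so a page that merely mentions my domain in its text doesn't count as linking here.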

Just today, I found my first instance of a spammer adaptation: the inbound link came from a site selling “nipple huggers” — some sort of jewelry that I don’t quite understand. I was curious how the site escaped my “referer check” script, so I checked it out. It turns out the “nipple hugger” site does link to my blog, with the link text “PopUp Scam – Click X to Close.” The linked page on my site has nothing to do with popup scams, but it is an interesting workaround to my filter. Rather than generating fake/spurious links, apparently real visitors to the “nipple hugger” site click on the link to my blog, and generate “real” referer links. Just today, I received inbound links from ten different hosts from the “nipple hugger” page.

I can’t think of any clever way to automatically filter these sorts of inbound links, because they really don’t look any different from genuine inbound links. At this point, I’m just inserting a keyword filter for known bad referers (just the “nipple hugger” at this point). Suggestions for more clever ways to escalate this arms race are welcome.
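
For the curious, the keyword filter amounts to something like the following Python sketch. The keyword list and the `is_blocked_referer` name are illustrative assumptions; the real list is whatever I have had to add by hand.

```python
# Hypothetical blocklist; in practice this grows as new offenders appear.
BAD_KEYWORDS = ("nipple hugger", "nipplehugger")


def is_blocked_referer(referer_url: str, page_text: str = "",
                       keywords=BAD_KEYWORDS) -> bool:
    """True if the referer URL or the referring page's text contains a bad keyword."""
    haystack = f"{referer_url} {page_text}".lower()
    return any(keyword in haystack for keyword in keywords)
```

It is a blunt instrument: it only catches sites I already know about, which is exactly why I'm asking for better ideas.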

(I really hope my site doesn’t become a top search result for “nipple hugger” now. If it does, please, look elsewhere, I don’t even know what they are!)

Future of Legal Blogging and Snippety RSS

Interesting evolving blog/article on The Future of Legal Blogging on Between Lawyers, a legal weblog I strongly recommend to lawyers and law students but for one problem: they only include a snippet of each blog entry in their RSS feed. In fact, all of the Corante-hosted weblogs seem to do this. I understand that they have commercial sponsors and would like you to visit the website so your eyeballs can be exposed to those sponsors’ names, but it seems like such a backwards way to do it. I usually read blogs offline through my aggregator (for example, on the train), and not being able to read the full entry means I often won’t see it at all.

Slashdot seems to have recently solved this problem by including an occasional advertisement in the RSS feed itself. Why hasn’t anyone else figured this out?

Update: I stand corrected. Strangely, http://www.corante.com/betweenlawyers/index.xml gives the full text in the feed; http://www.corante.com/betweenlawyers/index.rdf does not. I was automatically subscribed to the .rdf version by my newsreader. I wonder if this is intentional.

Scary Cat

Update 6/25/06: This page has strangely become quite popular. It wasn’t supposed to be a serious attempt at anything. It was an initial experiment to see if I could do a video workflow, from camcorder to Linux laptop to the web.

Update: Oops! Apparently the syndicated version of this post had the wrong movie URLs. Fixed now—please try again.

I’ve been learning Linux Video in anticipation of a digital camcorder that I’ve ordered—a Panasonic NV-GS400. My understanding is every new parent needs a digital camcorder or will never be able to remember their child’s childhood. So here I go.

Anyway, here’s my first feature-length (well, one minute) film, entitled Scary Cat. It has a fair amount of dramatic tension, so you may want to get a cup of chamomile tea first. It is available in three versions:

The only license restriction is attribution—see the “by” Creative Commons license.

Created using Kino, which deserves to be boosted at least two positions in its Google placement.

I’d welcome feedback—especially let me know if you can’t play the video. Be gentle, though, it’s my first attempt.