Lazyweb Request: Profiling Timer Expired?

Dear Lazyweb:

I have a bash script with a while loop that takes a long time to process. It restores file modification times for complicated reasons not worth discussing here. With some nonessential stuff removed, the code looks like this (I know it could be rewritten to be elegant, or at least collapsed into a single line):

```
cat ./preserve_file_mod_times | while read x
do
  filename=`echo $x|sed 's/|.*//g'`
  lastmod=`echo $x|sed 's/^.*|//g'`
  touch -t "$lastmod" "$filename"
done
```
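I can't say for certain that this cures the SIGPROF error, but each input line spawns two command substitutions, each running an echo|sed pipeline, before touch even runs. Here is a minimal sketch of the same loop using only bash's read builtin. It assumes each line is FILENAME|STAMP with no "|" inside the filename; restore_mod_times is a hypothetical name.

```shell
#!/bin/bash
# Sketch: restore modification times without forking sed for every line.
# Assumption: each input line looks like "FILENAME|STAMP", where STAMP is a
# touch -t timestamp (e.g. 200901311200.00) and FILENAME contains no "|".
restore_mod_times() {
  local list=${1:-./preserve_file_mod_times}
  local filename lastmod
  # IFS='|' splits at the first "|"; -r keeps backslashes in names literal.
  # The "|| [ -n ... ]" also handles a last line with no trailing newline.
  while IFS='|' read -r filename lastmod || [ -n "$filename" ]
  do
    [ -n "$filename" ] || continue   # skip blank lines
    touch -t "$lastmod" -- "$filename"
  done < "$list"
}
```

Usage: `restore_mod_times ./preserve_file_mod_times`. Because IFS contains only "|", leading and trailing spaces in filenames survive, which the unquoted `echo $x` in the original does not guarantee.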

As the input file has gotten longer, the loop now frequently fails:

376 Profiling timer expired touch -t "$lastmod" "$filename"

I’ve googled this error and understand what it is (i.e., SIGPROF) but not how to fix or work around it. Any hints?

Thanks!

Windows curly quotes, accented characters on Linux Samba Shares and Cygwin XTerm: How to get Windows-1252 (AKA CP1252) from Linux

Before I forget: I have a bunch of files I mirror between Windows/NTFS and Linux/ext4 filesystems that include not only accented characters but curly quotes in the filenames. (I know: the easiest solution would be to just get rid of the extended characters). The curly quotes were created in Windows, so they don’t render properly in standard Linux character sets (UTF-8, iso8859-1, iso8859-15, etc.).

This all came up because iTunes under Windows couldn’t find curly-quote files when it was reading from the exported Samba share filesystem rather than an attached NTFS drive. The files showed up as missing because they had different filenames.

The solution was not easily google-able, so for the record, in brief, add this to the [Global] section of /etc/samba/smb.conf:

unix charset = cp1252
display charset = cp1252

And reload Samba.

Also, to make the characters render properly in a terminal on the Linux box, first generate a CP1252 locale:

sudo localedef -f CP1252 -i en_US en_US.CP1252

Now you can use this charset on your Linux box, and, like magic, the curly characters will be back:

export LC_ALL='en_US.cp1252'
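To see what is going on with those filenames, and to confirm the locale is installed, here is a small sketch. In Windows-1252 the curly double quotes are the single bytes 0x93 and 0x94, which are not valid on their own in UTF-8; converting them with iconv yields the proper Unicode characters. It assumes glibc's iconv and locale tools; the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: show CP1252 curly quotes converted to UTF-8, and check for the
# en_US.CP1252 locale. Assumption: glibc iconv and locale are available.

# Octal 223 and 224 are bytes 0x93 and 0x94: left/right double quotes in CP1252.
cp1252_quotes_as_utf8() {
  printf '\223quoted\224' | iconv -f CP1252 -t UTF-8
}

has_cp1252_locale() {
  # glibc may list the locale as "en_US.cp1252"; match case-insensitively.
  locale -a | grep -qi '^en_US\.cp1252$'
}
```

For example, `has_cp1252_locale && echo installed` should print "installed" once the localedef step above has been run.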

Free Tip: How to resize scanned PDFs with ghostscript for Adobe Acrobat OCR

I’m unaware of any free tool to perform OCR on a PDF and embed the resulting data in the PDF itself so it is text-searchable. If anyone knows of one, let me know. In the meantime, I use Acrobat Professional for this essential functionality.

High resolution PDFs produced by my scanner (HP Officejet Pro L7700) usually give the following error when I try to perform Acrobat OCR:

This page is larger than the maximum page size of 45 inches by 45 inches.

Surprisingly, there doesn’t seem to be any way to resize the page size of a PDF within Acrobat. It’s possible to print to a new PDF of the correct size, but this operation cannot easily be batched. If I apply the “crop” tool to resize the page in Acrobat, I get this error:

Page size may not be reduced.

Many report these issues in Adobe’s forums. The most common responses suggest reconfiguring the scanner or buying a new one.

After some googling, I found no quick and easy ghostscript recipe for the batch pre-processing Acrobat needs before it will do the OCR. It’s not hard to do, just a trial-and-error pain to get the switches right.

For posterity, then, here is a simple command line to make this happen (here under Windows, but it could easily be adapted for any other platform). First, download the latest ghostscript for your platform (at this time, 8.64 for Windows). Then:

gswin32c -dQUIET -dNOPAUSE -dBATCH -sPAPERSIZE=letter -sDEVICE=pdfwrite -sOutputFile=OUTPUT.pdf -dPDFFitPage INPUT.pdf

And here is a simple, inelegant script to batch process files (again under Windows/Cygwin, but easily adaptable). Feel free to make it more elegant:

#!/bin/bash
# Shrink each PDF named on the command line to letter size, in place.
for x in "$@"
do
  echo -n "Processing $x ... "
  if [ ! -e "$x" ]
  then
    echo "File $x missing. Exiting."
    exit 1
  fi
  if [ -e gs_shrink_to_letter.pdf ]
  then
    echo "Tempfile gs_shrink_to_letter.pdf exists. Exiting."
    exit 1
  fi
  if gswin32c -dQUIET -dNOPAUSE -dBATCH -sPAPERSIZE=letter -sDEVICE=pdfwrite -sOutputFile=gs_shrink_to_letter.pdf -dPDFFitPage "$x"
  then
    echo "Success."
    mv gs_shrink_to_letter.pdf "$x"
  else
    # Save Ghostscript's exit status before echo overwrites $?.
    status=$?
    echo "Error occurred, exiting."
    exit "$status"
  fi
done
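On Linux or other Unix systems, where Ghostscript is normally installed as gs, the same batch job might look like the sketch below. It assumes mktemp is available, writes each temporary file next to its PDF, and copies the result back over the original so the file keeps its permissions. shrink_to_letter and the GS_BIN override are hypothetical names of my own, and unlike the script above it skips missing files instead of exiting.

```shell
#!/bin/bash
# Sketch: Unix adaptation of the batch script above. Assumptions: Ghostscript
# is on the PATH as "gs" (override with GS_BIN) and mktemp is available.
shrink_to_letter() {
  local gs_bin=${GS_BIN:-gs}
  local pdf tmp
  for pdf in "$@"; do
    if [ ! -f "$pdf" ]; then
      echo "File $pdf missing, skipping." >&2
      continue
    fi
    # Unique temp file beside the PDF, so concurrent runs don't collide.
    tmp=$(mktemp "$(dirname -- "$pdf")/.shrink.XXXXXX") || return 1
    if "$gs_bin" -dQUIET -dNOPAUSE -dBATCH -sPAPERSIZE=letter \
         -sDEVICE=pdfwrite -sOutputFile="$tmp" -dPDFFitPage "$pdf"; then
      # Copy over the original (rather than mv) so its permissions are kept.
      cat -- "$tmp" > "$pdf" && rm -f -- "$tmp"
      echo "Shrunk $pdf"
    else
      rm -f -- "$tmp"
      echo "Ghostscript failed on $pdf" >&2
      return 1
    fi
  done
}
```

Usage: `shrink_to_letter *.pdf` from the directory holding the scans.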

After converting your PDFs as above, you can then apply Acrobat batch OCR without a hitch.

iptables router failure after Debian Lenny upgrade solved by setting MTU

I recently upgraded my home router box to Debian Lenny. Everything went fairly smoothly, with a few exceptions. My NFS mounts no longer worked because, apparently, wildcards are no longer allowed in IP addresses in /etc/exports; the export addresses needed to be translated to subnet format (e.g., 192.168.98.* becomes 192.168.98.0/24).
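If you have many exports to convert, sed can do the rewrite. This sketch assumes every wildcard to convert covers only the last octet, so it maps to a /24; wildcard_to_cidr is a hypothetical helper name, and hostname wildcards such as *.example.com are deliberately left untouched.

```shell
#!/bin/bash
# Sketch: rewrite last-octet IP wildcards (e.g. 192.168.98.*) as /24 subnets.
# Assumption: only a.b.c.* patterns need converting; anything else passes through.
wildcard_to_cidr() {
  sed 's/\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.\*/\1.0\/24/g'
}
```

For example, `wildcard_to_cidr < /etc/exports > /tmp/exports.new`, then review the result, copy it into place, and re-export with `exportfs -ra`.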

But after a power failure last night, the router box rebooted and I was no longer able to access the Internet from any clients on my LAN. Strangely, I could ping or traceroute external hosts and perform DNS lookups, but web browsing and ssh timed out after the initial handshake. When I telnetted to port 80 of an external host, an invalid HTTP request (e.g. “oeunthioues”) got an error back, but a standard valid request (GET /index.html HTTP/1.0) just hung with no response.

I won’t recount all the false leads I followed in diagnosing this problem. It turned out that the Internet-facing NIC on my router box had been reset to a low MTU. Either lowering the MTU on the LAN clients to that value or raising the MTU on the Internet-facing NIC back to 1500 solved the problem:

# ifconfig eth2 mtu 1500
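Two quick checks would have shortened my diagnosis. The sketch below reads an interface's MTU from sysfs and sends pings with the Don't Fragment bit set. The 28-byte offset is the 20-byte IPv4 header (no options) plus the 8-byte ICMP header, so a 1472-byte payload makes exactly a 1500-byte packet. It assumes Linux with iputils ping (for -M do); the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: check an interface's MTU and probe with Don't-Fragment pings.
# Assumptions: Linux sysfs and iputils ping; interface and host are placeholders.

# Largest ICMP echo payload fitting in one IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

probe_mtu() {
  local iface=$1 host=$2 mtu
  mtu=$(cat "/sys/class/net/$iface/mtu")
  echo "$iface MTU is $mtu"
  # -M do forbids fragmentation, so an oversized probe fails instead of being split.
  ping -c 3 -M do -s "$(icmp_payload_for_mtu 1500)" "$host"
}
```

Run on the router, `probe_mtu eth2 www.debian.org` with eth2 stuck at 576 should, I'd expect, fail with a local "message too long" error rather than silently timing out, which points straight at the interface MTU.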

After networking was restarted on the router box, the MTU dropped back down to 576, which is apparently the default MTU for an X.25 network. I have no idea why the interface now gets that value by default when it didn’t before, so I just added a hack to /etc/network/interfaces to fix it:

iface eth2 inet dhcp
  post-up /sbin/ifconfig eth2 mtu 1500

Interestingly, pre-up didn’t work.
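One guess I have not verified on this box: if the DHCP server hands out an interface-mtu option, Debian's dhclient-script applies it while bringing the interface up, which would explain why pre-up ran too early while post-up, running afterward, stuck. Here is a sketch for checking the lease files. lease_mtu is a hypothetical helper, and the paths are assumptions (/var/lib/dhcp3/ for Lenny's dhclient3, /var/lib/dhcp/ on later releases).

```shell
#!/bin/bash
# Sketch: see whether the DHCP server supplied an MTU.
# Assumption: dhclient lease files contain lines like "option interface-mtu 576;".
lease_mtu() {
  # Print the last interface-mtu value found in the given lease files.
  sed -n 's/.*option interface-mtu \([0-9]*\);.*/\1/p' "$@" 2>/dev/null | tail -n 1
}
```

For example, `lease_mtu /var/lib/dhcp3/dhclient*.leases`. If it prints 576, an untested alternative to the post-up hack would be a `supersede interface-mtu 1500;` line in dhclient.conf.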

Hopefully I’ve included enough relevant terms in this entry that others with this problem will find it. It was hard to diagnose because no errors appeared in any log file, and I had partial but not complete connectivity from internal clients to the Internet. My first guess was that it was due to the iptables upgrade, but in fact it was entirely unrelated.


A Much Simpler Fix for the r8169 “Link-Down” Problem

There is a widespread problem with the Linux driver for the Realtek 8168/8169 cards where the modules load properly and the card is visible but no link is detected. E.g.:

Jun 21 18:28:41 localhost kernel: r8169: eth0: link down
Jun 21 18:28:41 localhost kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready

There are lots of details and suggested solutions from the Ubuntu people. None of the suggestions worked for me, however. Several of them involve configuring the card under Windows, but the box containing this device is a single-boot Linux fileserver. The “wake-on-LAN” functionality seems to be implicated, but I couldn’t see how to fix it.

After much head-banging (in the bad sense), I found a simple solution:

(1) Install ethtool
(2) Modify /etc/network/interfaces as follows (substitute your r8169 interface for ‘eth1’ and other settings accordingly):

iface eth1 inet static
  pre-up /usr/sbin/ethtool -s eth1 autoneg off
  address 192.168.98.1
  netmask 255.255.255.0
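To try the workaround by hand before committing it to /etc/network/interfaces, something like the following should do. It is a sketch that assumes root privileges and ethtool installed; try_autoneg_off and link_state are hypothetical helper names, and the three-second pause is an arbitrary guess at how long the link takes to come up.

```shell
#!/bin/bash
# Sketch: apply the autoneg-off workaround by hand and report the link state.
# Assumptions: run as root with ethtool installed; pass your r8169 interface.

# Reads ethtool's report on stdin and prints the "Link detected:" value.
link_state() {
  sed -n 's/^[[:space:]]*Link detected: \([a-z]*\).*/\1/p'
}

try_autoneg_off() {
  local iface=$1
  ethtool -s "$iface" autoneg off || return 1
  sleep 3   # give the PHY a moment to bring the link up
  echo "$iface link detected: $(ethtool "$iface" | link_state)"
}
```

`try_autoneg_off eth1` should then report "yes" if the workaround applies to your card.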

This did the trick for me where no other solution would work. Of course, speed and duplex are no longer autonegotiated, but that’s a small price to pay for connectivity.

This is a Debian etch installation using a slightly more recent kernel (2.6.25-2-686).

As an interesting side note, on this new box the interface appears as eth0 in the kernel logs but is actually mapped as eth1. Similarly, a second Ethernet interface appears in the log under a different device number from the one it is referenced by. Any ideas why?
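One common cause worth checking (a guess, not something I’ve confirmed here): udev can rename network interfaces after the kernel has already probed them and logged under the original ethN name, using a persistent-net rules file that pins each MAC address to a name. Below is a sketch that lists those pinnings, assuming the rules use the ATTR{address} or ATTRS{address} and NAME= keys. persistent_net_names is a hypothetical helper, and the rules filename varies by release (70-persistent-net.rules on many systems).

```shell
#!/bin/bash
# Sketch: list which MAC address udev pins to which interface name.
# Assumption: rules look like ...ATTR{address}=="MAC", ... NAME="ethN".
persistent_net_names() {
  # Prints "MAC NAME" for each matching rule; comment lines are ignored.
  sed -n 's/.*ATTRS\{0,1\}[{]address[}]=="\([^"]*\)".*NAME="\([^"]*\)".*/\1 \2/p' "$@"
}
```

For example, `persistent_net_names /etc/udev/rules.d/*persistent-net*.rules`.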

Update 6/22/08: Still not getting 1000BaseT (Gigabit), however. If I force 1000BaseT with ethtool -s eth1 speed 1000, the link goes down again (even with autoneg off). The same card in another box, however, detects the link and goes to 1000BaseT automatically. So I’m stuck at 100BaseT.

Update 6/24/08: Linux 2.6.26-rc5 fixes the problem 100% for me.


Windows Tool of the Century

Okay, maybe “century” is overblown, but the True X-Mouse Gizmo for Windows is the best thing since sliced bread (if you use Windows):

Have you ever paid attention to striking difference in the thickness of forefingers in X11/Unix and MS Windows users, respectively? The latter have much more muscular forefingers that often suffer from chronic aches in their joints. They also much more often develop mouse arm, pain in the neck and shoulders, and other troubles known as Repetitive Stress Syndrome and associated with excessive usage of a pointing device. Why?

The Gizmo even solves the classic X copy/paste buffer problem: you select some text in order to paste your copy buffer over it, only to have the selection replace what was in the buffer. Linux could use such a solution, as well!

Linux Installation Video

Speaking of funny embeddable videos, here’s another one that will appeal to at least some segment of my readers. I don’t know if this one has already made the rounds and I’m late for the memetrain. If so, I blame the fact that I’m getting on in years.

And here, via Tikirobot, is a not-funny video about Dasher, an amazing information-efficient typing system. Again, I suspect I’m late to the party on this one:

Finding Hardware That Doesn’t Suck

I’m sure I’m not the first one to have this complaint, but I really wish there was a list somewhere of the Best (x) that Just Works under GNU/Linux, where (x) is a device or card. Right now, I just want a good analog video capture card (for editing and converting VHS home videos to DVD). I keep coming across webpages that say, “I’ve gotten this to work under kernel 2.2…”

I’d just like every hardware buying experience to not be a four- or five-hour expedition. Just tell me what I want, and I’ll choose the lowest price from pricewatch and call it a day.

And, hey, while I’m complaining: does anyone have any idea why tcextract locks up vim for several seconds at a time even when run at nice 19?

Update: In case anyone actually wants to answer my video capture question, I should clarify that it is for a laptop, so it needs to be a USB 2.0, FireWire, or PCMCIA device.

Blinkflash

Free software hack discovery of the day: Blinkflash, the unofficial winkflash commandline client.

Competition in the web-based photo printing business is heating up, and Winkflash is the best priced I’ve found so far. With an introductory coupon code, 4×6 prints are only 6 cents each, and normally they are 12 cents each with $0.99 flat-rate shipping. We just made our first order, so we’ll see how the quality is, but these days most of these services seem to provide comparable results.

The main problem is that the two bulk upload systems winkflash provides—a Java applet and an Internet Explorer “drag and drop” control—don’t work under GNU/Linux. So you’re stuck uploading photos one by one with a web form.

Enter Blinkflash—now you can upload your photos right from the command line, with Unix-ish efficiency. Blinkflash just submits the photos to the web form upload system, but it saves an awful lot of time.

Hopefully Winkflash doesn’t mind this program—it can only generate more revenue for them. I suppose they might have trademark concerns, but I don’t think that is fatal.

I think I’ll package it for Debian and make a few tweaks. For one thing, it only works with the UK version of Winkflash, but that can be fixed with an extra command line switch. Also, you have to enter your username and password on the command line—there should be a way to store that in a .rc file. But it’s in fairly legible python (isn’t all python code legible?), so I think I should be able to take care of these things quickly.

Novell Public Service Announcement

Novell Public Service Announcement. Cute, but requires Flash. Why not just make it a downloadable movie file?