Lazyweb Request: Profiling Timer Expired?

Dear Lazyweb:

I have a bash script with a while loop that takes a long time to process. It restores file modification times for complicated reasons not worth discussing here. Removing some nonessential stuff, I have the following code (I know it could be rewritten to be elegant, or at least collapsed into a single line):

cat ./preserve_file_mod_times | while read x
  filename=`echo $x|sed 's/|.*//g'`
  lastmod=`echo $x|sed 's/^.*|//g'`
  touch -t "$lastmod" "$filename"

As the input file has gotten longer, the loop now frequently fails:

376 Profiling timer expired touch -t "$lastmod" "$filename"

I’ve googled this error and understand what it is (i.e. SIGPROF) but not how to fix or workaround it. Any hints?


Lazyweb Search Request: Easy Content Management

Dear Lazyweb:

Can you suggest open source content management software that meets these criteria:

  • Cross-platform
  • Very easy to install and configure (from the admin side) and use (from the user side) — I’m thinking as easy from both sides as vqwiki
  • Drag-and-drop to upload content — ideally, the user could drag a DOC file from desktop into a widget in the browser to upload
  • Quick searching and indexing, at least of common file-types (including DOC and PDF)
  • Ability to set up arbitrary metadata elements and values that can assist as filters to searching

Generally, content will be located by search, rather than via any particular folder hierarchy.

Does it exist? There seem to be a lot of open source CMS options, but at least at first glance they may be overkill with a significant learning curve at least on the admin side.

Windows curly quotes, accented characters on Linux Samba Shares and Cygwin XTerm: How to get Windows-1252 (AKA CP1252) from Linux

Before I forget: I have a bunch of files I mirror between Windows/NTFS and Linux/ext4 filesystems that include not only accented characters but curly quotes in the filenames. (I know: the easiest solution would be to just get rid of the extended characters). The curly quotes were created in Windows, so don’t render properly in standard Linux character sets (UTF-8, iso8859-1, iso8859-15, etc.).

This all came up because iTunes under Windows couldn’t find curly-quote files when it was reading from the exported Samba share filesystem rather than an attached NTFS drive. The files showed up as missing because they had different filenames.

The solution was not easily google-able, so for the record, in brief, add this to the [Global] section of /etc/samba/smb.conf:

unix charset = cp1252
display charset = cp1252

And reload Samba.

Also, to make the characters render properly from a terminal on the Linux box, first create the relevant character set:

sudo localedef -f CP1252 -i en_US en_US.CP1252

Now you can use this charset on your Linux box, and, like magic, the curly characters will be back:

export LC_ALL='en_US.cp1252'

Free Tip: How to resize scanned PDFs with ghostscript for Adobe Acrobat OCR

I’m unaware of any free tool to perform OCR on a PDF and embed the resulting data in the PDF itself so it is text-searchable. If anyone knows of one, let me know. In the meantime, I use Acrobat Professional for this essential functionality.

High resolution PDFs produced by my scanner (HP Officejet Pro L7700) usually give the following error when I try to perform Acrobat OCR:

This page is larger than the maximum page size of 45 inches by 45 inches.

Surprisingly, there doesn’t seem to be any way to resize the page size of a PDF within Acrobat. It’s possible to print to a new PDF of the correct size, but this operation cannot easily be batched. If I apply the “crop” tool to resize the page in Acrobat, I get this error:

Page size may not be reduced.

Many report these issues in Adobe’s forums. The most common responses suggest reconfiguring the scanner or buying a new one.

I found nothing quick and easy after some googling for a simple ghostscript recipe to perform the batch pre-processing necessary to allow Acrobat to do the OCR. It’s not hard to do, just a bit of a trial-and-error pain to get the right switches.

For posterity, then, here is a simple command-line to make this happen (here under Windows, but could obviously easily be adapted for any other platform). First, download the latest ghostscript for your platform (at this time, 8.64 for Windows). Then:

gswin32c -dQUIET -dNOPAUSE -dBATCH -sPAPERSIZE=letter -sDEVICE=pdfwrite -sOutputFile=OUTPUT.pdf -dPDFFitPage INPUT.pdf

And a simple inelegant script to batch process (again, under Windows/cygwin, but easily adaptable). Feel free to make more elegant:

for x in "$@"
echo -n Processing $x ...
if [ ! -e "$x" ]
echo File $x missing. Exiting.
exit 1
if [ -e gs_shrink_to_letter.pdf ]
echo Tempfile gs_shrink_to_letter.pdf exists. Exiting.
exit 1
if ( gswin32c -dQUIET -dNOPAUSE -dBATCH -sPAPERSIZE=letter -sDEVICE=pdfwrite -sOutputFile=gs_shrink_to_letter.pdf -dPDFFitPage "$x" )
echo Success.
mv gs_shrink_to_letter.pdf "$x"
echo Error occurred, exiting.
exit $?

After converting your PDFs as above, you can then apply Acrobat batch OCR without a hitch.

iptables router failure after Debian Lenny upgrade solved by setting MTU

I recently upgraded my home router box to Debian Lenny. Everything went fairly smoothly, with a few exceptions. My NFS mounts no longer worked because apparently wildcards are no longer allowed in IP addresses in /etc/exports; the export addresses needed to be translated to subnet format (e.g., 192.168.98.* becomes

But after a power failure last night, the router box rebooted and I was no longer able to access the Internet from any clients on my LAN. Strangely, I could ping or traceroute external hosts and perform DNS lookups, but web surfing and ssh timed out after an initial handshake. I noticed by telnetting to port 80 of an external host, I got an error back from an invalid HTTP request (e.g. “oeunthioues”), but if I sent a standard valid request (GET /index.html HTTP/1.0), the connection just hung with no response.

I won’t recount all the false leads I had in diagnosing this problem. It turned out that the Internet-facing NIC on my router box had been reset to a low MTU. By setting the MTU on the LAN clients to that low number, or raising the MTU on the Internet-facing NIC back to 1500, the problem was solved:

# ifconfig eth2 mtu 1500

After restarting networking on the router box, the MTU was again set back down to 576, which is apparently the default MTU for an X.25 network. I have no idea why the interface is getting that value by default (where it wasn’t before), so I just added a hack to /etc/network/interfaces to fix it:

iface eth2 inet dhcp
  post-up /sbin/ifconfig eth2 mtu 1500

Interestingly, pre-up didn’t work.

Hopefully I’ve included enough relevant terms in this entry that others with this problem will find it. It was hard to diagnose because no errors appeared in any log file, and I had partial but not complete connectivity from internal clients to the Internet. My first guess was that it was due to the iptables upgrade, but in fact it was entirely unrelated.

[tags]Debian, Lenny, iptables, firewall, router, MTU[/tags]

WordPress “Pages” No Longer Work

All the “pages” linked from my weblog — for example, my “about” page and my PGP key — are broken. I’ve posted in the WordPress Support Forums with no luck. I’m not sure when or why they stopped working, but if any readers have any suggestions of how to troubleshoot, I’d love to hear about it. Nothing relevant appears in server logs.

In the meantime, apologies if you came here trying to find out about me. I’m temporarily out of service.

A Much Simpler Fix for the r8169 “Link-Down” Problem

There is a widespread problem with the Linux driver for the Realtek 8168/8169 cards where the modules load properly and the card is visible but no link is detected. E.g.:

Jun 21 18:28:41 localhost kernel: r8169: eth0: link down
Jun 21 18:28:41 localhost kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready

There are lots of details and suggested solutions from the Ubuntu people. None of the suggestions worked for me, however. Several of them suggest configuring the card under Windows, but the box containing this device is a single-boot linux fileserver. The “wake-on-LAN” functionality seems to be implicated, but not in a way I can see how to fix.

After much head-banging (in the bad sense), I found a simple solution:

(1) Install ethtool
(2) Modify /etc/network/interfaces as follows (substitute your r8169 interface for ‘eth1’ and other settings accordingly):

iface eth1 inet static
pre-up /usr/sbin/ethtool -s eth1 autoneg off

This did the trick for me where no other solution would work. Of course, link autodetection no longer occurs, but that’s a small price to pay for connectivity.

This is a Debian etch installation using a slightly more recent kernel (2.6.25-2-686).

As an interesting side note, on this new box, the interface appears as eth0 in the kernel logs, but is actually mapped as eth1. Similarly, a second Ethernet interface appears in the log as a different device number than that by which it is referenced. Any ideas why?

Update 6/22/08: Still not getting 1000BaseT (Gigabit), however. If I force 1000BaseT with ethtool -s eth1 speed 1000, the link goes down again (even with autoneg off). The same card in another box, however, detects the link and goes to 1000BaseT automatically. So I’m stuck at 100BaseT.

Update 6/24/08: Linux 2.6.26-rc5 fixes the problem 100% for me.

[Tags]realtek, r8169, linux, drivers, hacking[/Tags]

No time for blogging, but…

I’ve been on the road a lot lately with no time to blog (or sleep).  Following are a couple items to keep this space from going completely dead.

A wild turkey stopped by our backyard (in Boston):

a turkey

I called city animal control, and they said these guys are everywhere. Apparently there is no more wild left for the wildlife.

In other news, Gears is finally available for Firefox 3! For me, at least, this is a big deal, since I frequently depend on offline mode for Google Reader and have otherwise fully migrated to FF 3 for the speed benefits. Since RC2, it’s mostly stopped crashing as well.

[Tags]Wild Turkey, Wildlife, Gears, Boston, Firefox[/Tags]

Microsoft Outlook 2003 Tip: VBA Macro to Remove Stationery from Email Message

Quick link to ClearStationery.bas

I don’t have much (any?) history of posting tips for the Windows platform, but I’m currently stuck with it for daily work use, so I figured I might as well share some tips that my readers who happen to be in the same predicament will find useful. (Planet Debian readers please have mercy.)

One of the worst things you that Microsoft Outlook allows a user to do is select a “stationery” for email. Stationery goes beyond regular old HTML mail (e.g., fonts, colors, and bullet lists) to add a patterned background, invariably rendering the content much less readable than it would be with a white (or even any other color) background. What’s worse is every reply to an email with stationery also adopts the original sender’s stationery!

I searched quite a bit for a solution that does not involve sending a nastygram to the original sender. Of course you can convert the email to plain text (or set Outlook to only display the plain text version) and then convert back to HTML or Rich Text, but you’ll also lose other formatting that you might want to retain. You could cut and paste the text into a new email, but what is really needed is a simple VBA macro that will strip the stationery but not other formatting.

Strangely, I don’t think that macro already exists. So I wrote one, to some extent cribbing from related code snippets (mostly from here). I now present to the world ClearStationery.bas, my best contribution to date to the Outlook ecosystem. Simply paste it into your Outlook Visual Basic Editor (ALT-F11) and then map the macro ClearStationeryFormatting() onto a toolbar with a hotkey, and you can instantly remove stationery from any email, whether it is in the “preview” pane or the full message view.

Comments, bug reports, and improvements are welcome:

Sub ClearStationeryFormatting()
On Error GoTo ClearStationeryFormatting_Error
    Dim strEmbeddedImageTag As String
    Dim strStyle As String
    Dim strReplaceThis As String
    Dim intX As Integer, intY As Integer
    Dim myMessage As Outlook.MailItem

    ' First, check to see if we are in preview-pane mode or message-view mode
    ' If neither, quit out
    Select Case TypeName(Outlook.Application.ActiveWindow)
        Case "Explorer"
            Set myMessage = ActiveExplorer.Selection.Item(1)
        Case "Inspector"
            Set myMessage = ActiveInspector.CurrentItem
        Case Else
            MsgBox ("No message selected.")
            Exit Sub
    End Select

    ' Sanity check to make sure selected message is actually a mail item
    If TypeName(myMessage) <> "MailItem" Then
       MsgBox ("No message selected.")
       Exit Sub
    End If

    ' Remove attributes from <BODY> tag
    intX = InStr(1, myMessage.HTMLBody, "<BODY", vbTextCompare)
    If intX > 0 Then
        intY = InStr(intX, myMessage.HTMLBody, ">", vbTextCompare)
        strReplaceThis = Mid(myMessage.HTMLBody, intX, intY - intX + 1)
    End If

    If strReplaceThis <> "" Then
        myMessage.HTMLBody = Replace(myMessage.HTMLBody, strReplaceThis, "<BODY>")
        strReplaceThis = ""
        Err.Raise vbObjectError + 7, , "An unexpected error occurred searching for the BODY tag in the e-mail message."
        Exit Sub
    End If

    ' Find and replace <STYLE> tag
    intX = InStr(1, myMessage.HTMLBody, "<STYLE>", vbTextCompare)
    If intX > 0 Then
        intY = InStr(8, myMessage.HTMLBody, "</STYLE>", vbTextCompare)
        strReplaceThis = Mid(myMessage.HTMLBody, intX, ((intY + 8) - intX))
    End If

    If strReplaceThis <> "" Then
        myMessage.HTMLBody = Replace(myMessage.HTMLBody, strReplaceThis, "")
    End If

    If InStr(1, myMessage.HTMLBody, "<center><img id=", vbTextCompare) > 0 Then
        strEmbeddedImageTag = "<center><img id="
        '"<center><img id=""ridImg"" src="citbannA.gif align=bottom></center>"
        intX = InStr(1, myMessage.HTMLBody, strEmbeddedImageTag, vbTextCompare)
        If intX = 0 Then
            Err.Raise vbObjectError + 8, , "An unexpected error occurred searching for the embedded image file name start tag in the e-mail message."
            Exit Sub
        End If
        intY = InStr(intX + Len(strEmbeddedImageTag), myMessage.HTMLBody, " align=bottom></center>", vbTextCompare)
        If intY = 0 Then
            Err.Raise vbObjectError + 9, , "An unexpected error occurred searching for the embedded image file name end tag in the e-mail message."
            Exit Sub
        End If
        strEmbeddedImageTag = Mid(myMessage.HTMLBody, intX, intY - intX)
        intX = InStr(1, myMessage.HTMLBody, "<CENTER>", vbTextCompare)
        intY = InStr(intX, myMessage.HTMLBody, "</CENTER>", vbTextCompare)
        strReplaceThis = Mid(myMessage.HTMLBody, intX, intY - intX) & "</CENTER>"
        myMessage.HTMLBody = Replace(myMessage.HTMLBody, strReplaceThis, "", , , vbTextCompare)
    End If

    ' Finally, saved modified message

    On Error GoTo 0
    Exit Sub


    MsgBox "Error " & Err.Number & " (" & Err.Description & ")"
    Resume Next
End Sub


Proof of Fall 2007

I’ve been playing around with Gallery and wpg2. I’m still a bit puzzled attempting to integrate Gallery and WordPress. I’ve resolved most issues; the main remaining issue is to display images in the Ajaxian theme without running over the borders in the Ajax/slideshow views. Also, the embedded image apparently doesn’t render in the RSS feed. Update: I’ve given up on the G2 tinymce plugin and the WPG2 tag for now and just hardcoded the image and album URL. Update 2: now the embedded image is working again for no good reason. Suggestions on the entire configuration are welcome.

In any case, I took some pretty photos today in our back yard (use left and right arrow keys to scroll through images after clicking on the one below — I still can’t get the navigation icons to appear):


Before: Proof of Spring 2007.

[Tags]Autumn, Foliage, Trees, WordPress, WPG2, Gallery[/Tags]