Archive for September, 2002

25 Septembris 2002

It works!

I finally got my Puerto Rico name-cataloguing script to work. This doesn’t sound huge, but it was a test of my limited skill. (Warning: heavy geek talk coming.)

First, I had to extract the data from the pipe-delimited text files. No sweat, but why bother figuring out where all the fields are if not to build query-able objects out of each record? So I did that. Yeah, it added major overhead—but it paid off this week when my boss asked if I could build a list of occupations and industries from the data-so-far. Sure I could. Easily. Not only that, but I could build a list of correlated occupations and industries (so that, for example, sugarcane farmers are listed apart from coffee farmers).

Plus, I can segregate names by sex, eliminate all names of foreign-born people… it’s easier to get a cleaner namelist.
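For the curious, here is a minimal sketch of the record-to-object idea in Python. It is not the actual script: the post never lists the real columns, so the field names, their order, and the three sample records are all invented for illustration.

```python
import csv
import io
from dataclasses import dataclass


# Hypothetical field layout; the real files' columns are not described here.
@dataclass
class Person:
    given_name: str
    sex: str
    birthplace: str
    occupation: str
    industry: str


def read_people(handle):
    """Yield one Person per pipe-delimited line of the open file handle."""
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield Person(*(field.strip() for field in row))


# Invented sample records standing in for the real (non-public) data.
sample = io.StringIO(
    "Basilisa|F|Puerto Rico|farmer|sugarcane\n"
    "Juan|M|Puerto Rico|farmer|coffee\n"
    "Pedro|M|Spain|merchant|retail\n"
)
people = list(read_people(sample))

# Correlated occupation/industry pairs, so sugarcane farmers stay apart
# from coffee farmers.
pairs = sorted({(p.occupation, p.industry) for p in people})

# A cleaner namelist: native-born women only.
native_women = [
    p.given_name
    for p in people
    if p.sex == "F" and p.birthplace == "Puerto Rico"
]
```

Once each record is an object, every new question from the boss becomes a one-line filter instead of another round of field counting.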

And then I had to figure out how to match up not-quite-letter-for-letter names, to make the counts more accurate and point out correspondences I might not otherwise see. This cost me a lot of brain-bending thought because I am an idiot. I decided finally on two matching algorithms: one-letter-off for names of equal length (e.g. “Bacilisa” and “Basilisa” are the same name), and one-letter-missing (“Cresencia” and “Crescencia” are the same name).
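The two tests are short enough to sketch in Python. The function names are mine, and this is one plausible reading of the rules as described above, not necessarily how the real script does it.

```python
def one_letter_off(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def one_letter_missing(a, b):
    """True if deleting one letter from the longer name gives the shorter."""
    short, long_ = sorted((a, b), key=len)
    if len(long_) - len(short) != 1:
        return False
    # Try dropping each position of the longer name in turn.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


same_length_match = one_letter_off("Bacilisa", "Basilisa")
missing_letter_match = one_letter_missing("Cresencia", "Crescencia")
```

Both functions only use indexing, slicing, and equality, so they would work just as well on lists or tuples of letters as on plain strings.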

Then it suddenly occurred to me that both these algorithms will fail now and then if “rr,” “ll,” and “ch” are not considered single characters. Okay, fixed that.
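One way to treat the digraphs as single characters is to split each name into "letters" before comparing. A hedged Python sketch (the function name and the decision to lowercase everything are my assumptions):

```python
import re

# The digraphs are listed before the catch-all "." so they win over
# single letters wherever they occur.
_LETTER = re.compile(r"ch|ll|rr|.", re.DOTALL)


def spanish_letters(name):
    """Split a name into letters, counting ch, ll, and rr as one letter each."""
    return _LETTER.findall(name.lower())


letters = spanish_letters("Guillermo")
# letters is ['g', 'u', 'i', 'll', 'e', 'r', 'm', 'o']
```

Any matching test that compares names element by element can then be handed these lists in place of the raw strings.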

Now, build a dictionary of unique names keyed to the number of times they appear. That part’s easy.

Now comes the fun part—check each name against all names previous to it alphabetically to see if either of the matching algorithms applies; if a match is found, pop the name into a dictionary of “header” names keyed to a list of matching subnames. This. Is. Bloody. Slow. But it works. There’s probably an obvious way to optimize it that hasn’t occurred to me because I am an idiot, and a self-taught idiot at that.
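Here is roughly what the counting and grouping steps look like as a Python sketch. Some assumptions to flag: the two matching rules are folded into a single "at most one edit apart" test, a name that matches a subname is filed under that subname's header (the post does not say what the real script does in that case), and the names in the sample list are just the examples from this post.

```python
from collections import Counter


def within_one_edit(a, b):
    """True if a and b differ by one changed letter or one missing letter."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    short, long_ = sorted((a, b), key=len)
    # Skip the common prefix to find the first point of difference.
    i = 0
    while i < len(short) and short[i] == long_[i]:
        i += 1
    if len(short) == len(long_):
        return short[i + 1:] == long_[i + 1:]  # one letter off
    return short[i:] == long_[i + 1:]  # one letter missing


def group_names(names):
    """Return (counts, headers): name -> count, header name -> subnames."""
    counts = Counter(names)
    headers = {}
    header_of = {}  # every name seen so far -> the header it is filed under
    for name in sorted(counts):
        # The slow part: compare against every alphabetically earlier name.
        match = next(
            (prev for prev in header_of if within_one_edit(name, prev)), None
        )
        if match is None:
            headers[name] = []
            header_of[name] = name
        else:
            header = header_of[match]
            headers[header].append(name)
            header_of[name] = header
    return counts, headers


counts, headers = group_names(
    ["Basilisa", "Juan", "Bacilisa", "Juan", "Crescencia", "Cresencia"]
)
```

The double loop makes this quadratic in the number of unique names, which is consistent with the slowness complained about above.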

From there it’s a simple matter of building a prettyprinted report.

I haven’t run this on the whole dataset yet—dataset’s pretty big, it’ll take a lot of time—but I already know the next step: eliminating false matches. Fewer of those than I had expected, but they are there. No big; just means finding them and putting them in the check-header-name dictionary that the program currently builds from scratch.

The results (even the truncated ones I have now) are fascinating. I can’t publish them because the data themselves are raw and not public, but I can talk about them and I expect I will.

Anybody know offhand how to find a saints’ calendar that might have been in use in early 20th-century Puerto Rico?

That’s just freaky

Apparently I should be careful what I do with showing you guys markup in CavLec.

Look what Radio’s aggregator did to my previous post. Yikes.

Thanks to Will Cox at The Peanut Gallery for the screenshot. He says that Radio’s double-decoding of character entities (such as the &lt; I have to use to get a < on the screen) is causing the problem. I thought that had been fixed. Guess not.

Oops. Just did it again. Sorry, Will; this post will look bizarre too.

<PRE> element considered harmful

I never use the <pre> element on CavLec.

No, it’s not because I’m a markup snob and consider <pre> too presentational. It’s because I don’t know how wide your screen or your browser window are.

Lots of techbloggers use <pre> to put code in. I wish they wouldn’t. Use a div with font-family: monospace and perhaps some nice hanging indentation, please. (Even if you have to achieve indentation with non-breaking spaces at the beginnings of lines. Even so.)

Because what happens in small browser windows is this (picking on Mark because I know he’ll take it in good part). I can’t read the flippin’ code without viewing source; the lines can’t break.

So don’t use <pre> if it can be avoided. Thanks.

Update: Phil correctly notes that putting non-breaking spaces in code to indent it will cause that code not to execute when copied and pasted. This is bad. Neither he nor I has a spiffy solution at the moment that provides both working and pretty-printed code.

It does occur to me, though, that XHTML 2’s line element is a start. Using that instead of br will let you play with the left margin on line, line+line, line+line+line, and so on until you run out of lines of code.

Next project

Well, AKMA is humming along with his Movable Type installation, so I’m at loose ends for something to do.

I happened upon Tish in the comments to another site, asking for help with permalinks. I explained a bit about fragment identifiers to her; she got them working, and explained to me that she’d tried a Movable Type install but hadn’t had any luck.

So that’s my next markover project. Rubbing my hands together in anticipation…

Oh, fragment identifiers? This is an HTML thing. Remember how I explained about id and a name? In purely functional terms they accomplish the same goal, marking something to be pointed at from elsewhere. I never talked about how to do the pointing.

To point to a web page as a whole, you simply put its URL in the href attribute on the a element. To point to CavLec, for example, you would use the tag <a href="http://yarinareth.net/caveatlector/index.html">. (Note that in XHTML 2.0, the href attribute will be allowed on practically all elements, not just a, letting you turn anything into a link. Cool, huh?)

To point to part of a page, that page-part has to have either an id attribute or an a element with a name attribute. How do you find out? You have to view source, I’m afraid. A really cool browser stunt would be to have a mode that searches a page for identifiers and then exposes them visually somehow for easy linking. (Sort of the way modern journals put bibliographic information for articles at the bottom of the first page of the article.)

I don’t know of any browser that does this, but perhaps one could write a Javascript gizmo that would do it. This would be a cool thing for ebooks, too—point, click, wham! there’s an XPath/XPointer expression that will reliably take you to that spot in the book. Pretty doable, I think, and it solves the citation problem that ebooks-in-academia users complain about.
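The gizmo imagined above would live in the browser. As a stand-in, here is a small Python sketch of just the searching half: it lists every link target in a chunk of HTML using the standard library's parser. The class and function names are made up, and the sample markup is invented.

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Collect every value that could serve as a link target on a page."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            # Any element with an id can be pointed at.
            self.targets.append(attrs["id"])
        elif tag == "a" and attrs.get("name"):
            # Old-style named anchors.
            self.targets.append(attrs["name"])


def link_targets(html):
    """Return the id and a-name values found in html, in document order."""
    finder = AnchorFinder()
    finder.feed(html)
    return finder.targets


found = link_targets('<div id="archives"><a name="top"></a><p>hi</p></div>')
```

Exposing the targets visually on the page is the part that would still need the browser.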

Off-track, sorry. Once you have the id or a name value for the page-part you want, you attach it to the page’s URL with a hash mark (pound sign, whatever, a #, shift-3 on most keyboards). CavLec’s archives listing in the sidebar has the id archives, so to link directly there (for some weird reason), you would use the tag <a href="http://yarinareth.net/caveatlector/index.html#archives">.

The #archives part of the URL is often called a “fragment identifier” for reasonably intuitive reasons. And there you are.
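If you ever need to take such a link apart again in a script, Python's standard library already knows about fragment identifiers. A tiny sketch using the CavLec URL from above:

```python
from urllib.parse import urldefrag

page = "http://yarinareth.net/caveatlector/index.html"

# Building the link: page URL, hash mark, then the id (or a name) value.
link = page + "#" + "archives"

# Taking it apart again: urldefrag splits off everything after the hash.
base, fragment = urldefrag(link)
```

Here base comes back as the plain page URL and fragment as the bare identifier, without the hash mark.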

23 Septembris 2002

What I can’t not do

Today Jonathon quoted a bit from Mark Pilgrim that I’d seen before. Enlightenment struck (in the scattershot way it strikes where I’m concerned) and I knew I had to comment.

“Do what you can’t not do,” says Mark, the assumption being that there is some transcendent occupation that fits you like a glove, that will suffice to keep you in the style to which you are accustomed (and preferably a bit better), and that will lead you ever upwards and outwards to new glories, successes, and joys.

All right, I may be overstating things a wee bit. Still. That’s the general idea, isn’t it?

Which makes things a little tough for us non-transcendent types. Just pointless drudges, that’s us, no direction and no grand schemes. No flow.

What got me riding this train of thought was an email from a friend about David’s unclaimed property, expressing a certain amount of wry astonishment that anyone could lose track of a significant amount of money. Well, look, that’s how David is; I’ve known that for twelve years. He has other things to think about, and if he doesn’t, he’ll find them. Transcendent things. Lifework things. Things he can’t not do.

Mundane details of everyday life are just not his thing. He’s the original innocent; once he gets his Ph.D. (knock wood) he will be the perfect absentminded professor.

So the thing I can’t not do is take care of him, keep him fed and sheltered and his tuition paid. That’s my—well, not my job exactly, more like my permanent task, my lifelong occupation.

Not terribly transcendent, is it? Nor is what I do in service of that goal. A low-paid data-entry drudge job. Taxes and other financial-management work, including planning, oversight, and worry-warting. Shopping (all types) and cooking. Nagging. Computer maintenance. The odd bit of mending, the occasional gasp of horror and whisking away of a no-longer-viable article of clothing into the rag bag. (He doesn’t notice holes large enough for a cat paw, I swear.) Guiding him through mundane encounters he’s never experienced before and doesn’t have the Goffman scripts to handle.

Which is not to say I resent this work; much of it carries intrinsic reward. (Aside from tax returns, which are horrible, personal finance is intriguing. Yes, even now.) There is also a great deal to be said for possessing the complete trust of one’s spouse. And I am just stupid proud of what David has accomplished, and what’s coming down the pike for him (no, sorry, can’t tell just yet).

It does irk me now and then, though, patience not being an especial virtue of mine. I’ve lost my temper with David a few times, usually because a particular decision feels too big or too worrisome for me but he won’t or can’t participate. Or because he pokes his head up and makes a fuss over a decision I’ve made. Or because he doesn’t seem to understand or value what I do for him (usually a misapprehension on my part, but there it is).

It’s also not to say that I couldn’t give up particular facets of my life that on reflection exist only to serve this thing I can’t not do. Job—I’ve thrown away two in the last three years, and the one I’m on will end sometime next year (because the project will be done, not because I’m going to stomp out in a huff or anything like that).

Finances—I talk a lot about paying off the mortgage, probably more than I should, but that’s because it is a weight, a big one. I want it gone. With the extra principal payments, it has sucked up a good one-third to one-half our post-tax income the last three years, windfalls aside. Burning the mortgage will buy me freedom I can hardly imagine right now.

Not just financial freedom, since that is usually defined as “freedom to spend lots of money on useless stuff;” I don’t have lifestyle ambitions much beyond the way I live now. What I’m after is mental freedom, a return of the brainspace currently occupied with “how do I find $x to put into the house? do I put $x I currently have into the house, save it for David’s tuition (or, heaven forfend, my own) or what? oh, crud, have I even fed the IRAs this year?” In other words, I’m trying to put myself out of (part of) this job.

Even then, though, I doubt I’ll do much that Mark or Jonathon would recognize as life-work. It’s just not the way I function. I am part worker bee, (large) part drone. I genuinely enjoy music performance, but I’ve lived without it for several years now. Theater—even longer, just about long enough to grow into the mother-in-law roles that fit me best. Writing—Burningbird is a writer, not me. I do not have whatever drive it is that impels some people into relentless pursuit of a particular metier.

It’s lovely when I have work to do that I can genuinely get into, and I’m pushing myself toward library school (and accepting the additional mental burden of figuring out how to pay for it—the words “home equity line of credit” have actually crossed my mind, much to my dismay) in hopes of finding such work; but I don’t need that in the same way I need to watch out for my husband.

Every time I’ve sought the One True Lifework, in fact, I’ve fallen flat on my face, tripped over my own inflated expectations. Graduate school. Ebooks and markup. Better I should do what I can’t not do, and let the One True Lifework go.

I read Richard Florida’s The Rise of the Creative Class recently. It didn’t especially convince me—I can accept his data but not all his conclusions—but I had a true fire-book-across-room moment while perusing his final chapter. (No, I didn’t fire book across room. I don’t do that to library books. I just snarled and wanted to.)

See, Florida points out that a Creative Class–based economy leaves non-Creative-Classers in the dust, in service jobs that pay a wretched benefits-less pittance. He’s right. His solution? Not revaluing service jobs as vital parts of tolerable living. Not valuing human time and energy enough to pay a decent minimum wage, to ensure a minimum standard of health care. Oh, no. The answer is to move everyone into the Creative Class!

Moron. Clueless, elitist moron—I really wanted to fire that book into a wall, hard. Maybe I am a drudge; maybe I do deserve no more than the pittance I’m paid. Move me out of my drudgery into the lighthearted, self-absorbed Creative Class, though, and David’s in immediate trouble. I may be a drudge, but I am a necessary drudge.

And we necessary drudges deserve better than second-class status. Better than to be told that whatever we’re doing is less than what we should be because it is a means to an end and not an end in itself.

22 Septembris 2002

Email hassles

I didn’t know this, but my email is apparently spotty at the moment. I’m not sure what’s up. I know some things have been getting out; just apparently not everything.

Jonathon, I got your message and tried to reply. I also sent you a reply to your design message last night, which I will resend later on.

Sorry for the fuss, all.

I owe the IRS

No, no, not money. Thanks.

Yes, I do mean it. I have reason to thank the IRS. Well, I certainly hope miracles will never cease.

Seems my husband somehow managed to abandon a bank account in the state he used to live in. (Now you begin to understand why I handle our money. I am utterly incapable of this. Which may not be a virtue, admitted.) So eventually the unclaimed-property office in that state, not finding my husband, turned to the IRS.

Yeah, said the IRS, we know a David Salo, what’s it to you? And when the unclaimed-property office explained, the IRS sent us a letter indicating that we should get in touch with someone at the unclaimed-property office.

It came in an official IRS envelope. I nearly expired on the spot when I saw it; my taxes are in order as far as I can tell, but I do them myself and I am no tax law expert. And who wants an audit anyway, no matter how clean their records are?

But the final upshot of the encounter is waiting to go to the credit union, a nice little windfall that whacks another five months off our mortgage. So thank you, IRS, and thank you, unclaimed-property office.

21 Septembris 2002

Thank you, Tish

I’ve spent a couple of weeks now feeling lousy about the dustup I started over appropriation of body image. Seems clear I made an impression—such an enormous one, in fact, that it may well turn out to define, permanently, the image that the blogsphere has of me. I didn’t intend it to do that (and I can’t help but appreciate the irony involved!), but I know as well as anyone that intentions often don’t matter.

Worse than that, though, was the lingering feeling that I hadn’t actually said what I wanted to say in understandable fashion. I got pushback (lots of it), I got some (legitimate) questioning of my position, I got nudges testing my boundaries, I even got an attempt to sidetrack me so that I wouldn’t talk about it any more. I didn’t get a whole lot that I felt right calling understanding.

Since I’d already crossed the line twice and hurt people thereby, I didn’t try for understanding; I wrote the whole thing off as a mistake. I did want to try, actually. One of the things I was (and am) entirely bloody sure I’d failed to get across was the sheer amount of pain this kind of thing caused, and still causes when I can’t quite manage to deaden my receptors enough.

So I want to thank Tish, publicly, for getting it (scroll down to the entries for September 17 and 21) and having the courage to get it out loud. I needed that more than I cared to admit, and I’m grateful. Thank you, Tish.

I just added Tish to my blogroll. I’d run into her site before, but I resisted blogrolling her, for no other reason than that her issues (for lack of a better word) parallel mine enough to cause me discomfort, pull scabs off still-tender areas of my psyche. Lousy reason. Better late than never, I hope.

Evangelization

Even just the blowback from the RSS mess is poisonous, so much so that I quite understand why Bill Kearney wrote:

Meanwhile, as a developer, ask yourself what would you rather follow, a developer that dictates to you or a community backed by some amazing briliant people? As a user, you probably won’t care about it. But perhaps you should for as your needs grow who’s going to address your needs in a thoughtful and well-planned manner? Would you rather take advantage of a team of industry experts? Or just one vendor with a product to shill?

Even so, I wish he hadn’t. A bit of friendly advice from someone who’s been there on another spec: Don’t go there. In the long run, it will not help you.

Not just because it’s adversarial, though it is and we’ve had rather more than enough of that lately. Because it lays you wide open to the devastating question, “yeah, so if it was any darn good why would you bother to plug the people who wrote it? Anonymously, yet?”

The best way to evangelize just now, especially to developers, is to talk about the virtues of the spec, not the virtues of its authors or the demerits of the author(s) of That Other Spec. Prove that your spec kicks butt. Prove it.

And as a side benefit, you’ll have taken the high road into the bargain. High road looks pretty untrodden just now.

Aggregator stats

It seems that NetNewsWire is kicking serious butt over at Mark’s place. He professes surprise at this, given that NNW is Mac-only, and OSX-only to boot.

Well, Mark, given the Mac goodies you dole out on a regular basis, I can’t manage quite the same surprise.

So I checked my own logs, via the handy-dandy Analog report that my hosting service provides. Yep, lots of aggregators in the past week. Numbers appear to come out to 56% for Radio (I assume that radio.userland.com/newsAggregator and frontier.userland.com/xmlAggregator are both Radio-related), 23% for NetNewsWire, 14% for Aggie, and 7% for Amphetadesk.

Given that I don’t dole out any Mac goodies at all (I’m still running OS 9, for Pete’s sake!), 23% for NNW is pretty darn respectable.