‘Blogging tools’ Archive

8 Iunii 2008

Archiving blogs

Meredith asks if anybody’s thought about archiving blogs.

Well, I have, and I can prove it. Dan Chudnov had a blog-preservation infrastructure he was kicking around, but I don’t know what happened to it.

Here are the chief barriers I see:

  1. Rights barriers. If getting a license from the blog owner weren’t hassle enough, consider the problem of third-party-owned designs.
  2. Respect barriers. If I had a buck for every time I’ve heard this: “Libraries exist to preserve the filtered, reviewed, authoritative scholarly literature. When we step outside those boundaries, we damage our reputation for purveying credible knowledge.” I’ve heard it about IRs. I’ve heard it about data curation. I’ve heard it three times over about blogs. Even those of us who see value in blog preservation can’t move forward while our libraries still think like this.
  3. Technological barriers. DSpace is very poorly suited to acquiring serials of any description. It doesn’t have any kind of harvest or cron-job mechanism. This could be hacked, but nobody’s hacked it. Until someone does, don’t talk to me about blogs; I don’t have time to do manual grabs once a month or whatever.
  4. Priority barriers. I am one person responsible to 26 campuses. Where am I going to put my energy? Capturing peer-reviewed literature? Data curation? Open-access journals? Grey lit? I’m sorry, blogs are pretty far down the list.

That said, if I had it in mind to bootstrap a blog-preservation program, I tell you what I’d do: write a grant proposal, probably to IMLS. Focus on law blawgs, because there’s already scholarship indicating that they’re being cited in law reviews and the rest of the legal literature, so like it or not, they’re part of the scholarly record. Promise an ongoing collection project and a survey of the rights landscape as well as an open-source collection tool (that plays nice with SWORD and OAI-ORE, natch) to help other libraries archive blogs.

I think that might be a winner.

17 Martii 2006

My first podcast

I finished the podcast for HigherEdBlogCon today. It clocks in at twenty-one minutes and change; it started at twenty-three, but I went through and killed a lot of dead air.

Podcasting is sorta fun, if you can stand the sound of your own voice. I can, but only just barely; where did those horrendous Porky Pig sibilants come from? Maybe I should see an orthodontist. I’ll give myself props for speaking with a decent flow, though (and no, I did not just read my paper, because who wants to hear that?). The original recording was done in fits and starts, but I got all twenty-odd minutes of the talk in well under an hour. I only had to clean up a couple of fluffs (left a couple in; I’m only human), and I only cut one sentence out of the entire ’cast as being pointless.

It took me a bit of time to get the hang of GarageBand, but once I (with the able help of a colleague) sussed out what the controls were and what the cursor-changes meant, it wasn’t too bad.

While mileage does vary, I won’t be giving up blogging for podcasting any time soon. It took me hideous amounts of time to edit twenty decent minutes of talk, and that was with a paper prepared already. Plus, the paper is only eight pages long—at a minute or so of reading-time per page, you do the math. Auditory learners will just have to pass me right on by, I fear.

Yeah, so it’ll be at HEBC on April 13, which seems somehow fitting even though it’s not Friday. When I see it go up, I’ll self-archive it too, of course.

15 Ianuarii 2005

Wow, conspiracies

So Six Apart buys LiveJournal and it promptly goes down dead as a doornail.

And suddenly my husband notices that Six Apart backwards is (dramatic chord)… TRAP AXIS.

Don’t it just figure?

11 Ianuarii 2005

Killing referrer spam

I’ve been watching Technorati on the subject of referrer spam the last few days. The blogger Ann Elisabeth has been doing excellent work ferreting out where all this is coming from, and I do recommend you go see—but that won’t so much help you stop it.

If you don’t know what referrer spam is? Ignore this post entirely or pass it on to a knowledgeable friend. But what the hey, for the rest of you, here’s what I do. Not a silver bullet—takes work—but definitely a bandwidth-reducer.

You will need:

  • Your webhost to be running the Apache webserver (not IIS).
  • FTP access to your server, an FTP client, and the skill to use it.
  • A text editor. Notepad actually will do this time. Microsoft Word won’t.
  • Some patience.

If you have access to your server logs, “recent visitor” logs, or a log-analyzer (like Analog or AWStats), that will help a lot. I will also be discussing some WordPress-specific tricks; I’ll mark them as such.

What we’re going to be doing is messing with .htaccess files. These files tell Apache (among other things) who is allowed to see a particular part of your website and how to rewrite and redirect URLs when that’s necessary. If you use WordPress and you have pretty permalinks, you’ve already messed with .htaccess, because that’s what’s making the pretty permalinks work.

BE AWARE: YOU CAN BORK YOUR WEBSITE WITH THIS. I’ve done it. (In fact, I did it two minutes ago. Go me.) How will you know your .htaccess file is borking your site? Well, usually, when you browse to your weblog’s URL you’ll get a “500 Internal Server Error” page of some sort instead of your beloved weblog.

Always, always, always keep a last-known-good version of your .htaccess file! If you’re using FTP to place your .htaccess file and you bork your site, you just upload the last-known-good file, and you’re golden.

If, on the other hand, you’re using WordPress’s Templates menu to mess with your .htaccess file and you bork your site, you probably can’t use WordPress to fix it! So what you do (well, what I do) is fire up the FTP program, grab the malfunctioning file, fix it, and re-upload it. WordPress will then behave normally. DANGER WILL ROBINSON! Don’t use WordPress’s Templates menu for this unless you’re reasonably confident you know what you’re doing, or at least can fix whatever you mess up!

Right. That said (and to it I add: don’t sue me if you bork your site; you take my advice at your own risk), onwards.

Your first decision is where to put your .htaccess file. Typically, it should go as high up in your web-folder hierarchy as possible, because it should then protect all the subfolders underneath. However, if you’re a WordPresser and your WordPress install is in a subfolder, go ahead and use the existing .htaccess file, or if you don’t have one, put one in along with your index.php file.

If you use a subdomain for your blog (as I do; the difference between http://cavlec.yarinareth.net/ and http://www.yarinareth.net/caveatlector/), you have to put your .htaccess file in a directory belonging to the subdomain. (At least on my webhost you do.) This is annoying, because if you maintain more than one WordPress blog on more than one subdomain, you have to edit .htaccess separately for every single subdomain. I haven’t found a workaround for this. Yet. If anyone has one, please let me know!

Okay, now that you know where your towel (er, .htaccess file) is, what do you put in it? At the top, you need the following two lines:

RewriteEngine On
RewriteBase /

If they’re already there, great; leave them be. This tells Apache that you’ll be making some rules about URLs on your site.

I now recommend that you pick up one of the tricks from Mr. Costello:

# Act only when the requested host is not this site's own domain.
RewriteCond %{HTTP_HOST} !^yarinareth\.net$ [NC]
# Capture the entire Referer header as %1.
RewriteCond %{HTTP_REFERER} ^(.*)$ [NC]
# Send the request back to that referrer with a 301 redirect.
RewriteRule ^(.*)$ %1 [R=301,L]

Replace my website domain name in the first RewriteCond line above with yours, keeping a backslash in front of each dot. (No. Really. Do it. If you don’t, YOU WILL BORK YOUR SITE.) This won’t kill all referrer spam by a long shot, but it’ll kill the really stupid ones. What it does is, if the stupid referrer spammer asks for a page that isn’t even part of your site (as a few of ’em actually do!), Apache silently tells them to go to the page they’re trying to referrer-spam you with! Cute, no?

Next we’re going to deal with referrer spammers smart enough not to screw up in this fashion. And to make our lives easier when new referrer spammers come along, we’re going to use a bit of indirection. First, we’ll make a list of words or word-fragments that show up in fake referrers and tell Apache that they’re bad. Then, we’ll tell Apache not to give anything to anybody who shows up with a referrer containing one of the words or word-fragments we’ve defined as bad. Make sense? Good.

Here’s my current list, which you are welcome to copy and paste — just get rid of all hard returns so that the entire thing is one line. Apologies for the bad language below; not precisely my fault!

SetEnvIfNoCase Referer
".*(credit|canadianlabels|8gold|texas-hold|hold-em|holdem|
fidelityfunding|condo|sportsparent|mortgage|spoodles|money|
cash|hotel|houseofseven|stmaryonline|newtruths|popwow|oiline|
flafeber|thatwhichis|tmsathai|pisoc|crepesuzette|mediavisor|
commerce|easymoney|911|\.vi|gb\.com|4free|macsurfer|teen|
pussy|discount|blogincome|lillystar|aizzo|webdevsquare|laser-eye|
escal8|xopy|vixen1|linkerdome|youradulthosting|fick|inkjet-toner|
fuck|ime.nu|perfume-cologne|italiancharmsbracelets|shoesdiscount|
psnarones|hasfun|casino|gambling|poker|porn|sex|paris|gabriola|nude|
xxx|hilton|pics|video|adminshop|devaddict|iaea|empathica|
insuranceinfo|atelebanon|handy-sms|peng|just-deals|pisx|rimpim).*"
BadReferrer

(Wow. I’ve got quite a few of ’em, haven’t I? Well, I’ve been doing this a while.)
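If stripping the hard returns by hand sounds error-prone, a few lines of script can do it. What follows is a minimal Python sketch, not something Apache or WordPress provides: join_directive is a made-up helper name, and it assumes the directive is wrapped the way it is above, with the quoted pattern as the only part that spans several lines.

```python
def join_directive(wrapped: str) -> str:
    """Collapse a hard-wrapped SetEnvIfNoCase directive into one line.

    Pieces outside the quoted pattern are joined with a single space;
    pieces inside the quotes are joined with nothing, so the regex
    stays intact. (Hypothetical helper, for illustration only.)
    """
    out = ""
    in_quotes = False
    for line in wrapped.splitlines():
        piece = line.strip()
        if not piece:
            continue  # skip blank lines
        if out and not in_quotes:
            out += " "
        out += piece
        # An odd number of double quotes on a line opens or closes the pattern.
        if piece.count('"') % 2 == 1:
            in_quotes = not in_quotes
    return out
```

Paste the wrapped text in as one string, then put the returned line into .htaccess.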

Next, add the following lines:

order deny,allow
deny from env=BadReferrer

Be careful to stifle your automatic tendency to put a space after a comma in the first line above. THAT WILL BORK YOUR SITE. Seriously. Apache is unforgiving.

And that’s that. Any page request whose referring URL contains any of the pipe-separated (|) words above is going to get smacked down.

What happens when the new batch of referrer spammers hits? Let’s say somebody’s desperately trying to insert example.com into your server logs. All you do is add example| right after the opening parenthesis of your string o’ bad words, and you’re golden.

What, you say, I don’t have to enter the whole URL? No, no you don’t. And in fact you probably don’t want to, because entering “mortgage” just once blocks every single stupid mortgage-shark referrer-spammer coming down the pike.
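To see the fragment matching in action before uploading anything, here is a small Python sketch. It is an approximation under stated assumptions: Python's re module stands in for Apache's regex engine (the two agree on simple alternations like these), make_matcher is a hypothetical helper, and the referrer URLs are invented.

```python
import re


def make_matcher(fragments):
    """Return a function that says whether a referrer contains a bad fragment.

    The fragments are treated as regular expressions and matched without
    regard to case, roughly as SetEnvIfNoCase would treat them.
    """
    pattern = re.compile("|".join(fragments), re.IGNORECASE)
    return lambda referer: pattern.search(referer) is not None


is_bad = make_matcher(["mortgage", "poker", "paris"])
print(is_bad("http://best-MORTGAGE-deals.example.net/"))  # True
print(is_bad("http://travel.example.org/paris-guide/"))   # True: a false positive
print(is_bad("http://www.example.org/blogroll.html"))     # False
```

The second result is the price of short fragments: they block whole families of spammers at once, but they can also catch a legitimate referrer that happens to contain the word.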

How do you tell it’s working? Check your server logs (however you do that) for HTTP code 403, which means “forbidden” and is what Apache should be serving these yobbos. But to use a broader brush, you should also see your daily bandwidth drop significantly if these guys have been hitting you.
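If you have raw logs, tallying the 403s takes only a few lines. This is a minimal Python sketch, assuming your host writes Apache's "combined" log format (many do, but check); forbidden_referrers is a name made up for illustration.

```python
import re
from collections import Counter

# Assumption: Apache "combined" format, i.e.
#   host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(r'"[^"]*" (\d{3}) \S+ "([^"]*)" "[^"]*"')


def forbidden_referrers(lines):
    """Count how often each referrer was answered with HTTP 403."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group(1) == "403":
            counts[match.group(2)] += 1
    return counts
```

Feed it any iterable of log lines, such as an open log file, and call most_common(10) on the result to see which referrers are being turned away most often.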

I’ve created a new category “Spam” which I intend to use to pass on additions to my personal list of bad words, as well as any other good tips I see and decide to employ. I don’t want to become a blacklist clearinghouse, so if anybody else wants to take on this job, please please be my guest.

I’ve got a few other techniques to pass on, but they’re strictly for the serious anti-blog-spammer, so I’ll close this post for now, hoping it does some good.

15 Decembri 2004

RSS and science publishing

Haven’t gotten all the way through it yet, but for my non-librarian readers who are syndication enthusiasts, D-Lib has an article on RSS that (judging from what I have gotten through) is well worth your perusal.

21 Augusti 2004

PHP problems fixed

I think I’ve finally fixed the PHP errors that were getting thrown all over the place. I had draconian error-reporting turned on someplace buried deep in the WordPress internals. Happened while Adrian and I were working on getting my Latin dates back; he asked for the full slate of errors, I gave it to him, and for some idiotic reason I never set the switch back to normal.

Let me know if you’re still getting weirdnesses in trackback or whatever.

28 Iulii 2004

WordPress gotcha

A mild gotcha that the WordPress folks (users and coders alike) might want to take notice of:

My RSS and Atom feeds are currently both invalid and ill-formed because I inserted HTML character entities (specifically &ocirc; and &ntilde;) into my posts, rather than using the numerical Unicode equivalents or inserting the characters directly.

The fix, if you choose to code it, is to do a quick search-and-replace sweep for those entities when outputting any sort of XML other than XHTML. The fix, if you choose to author it, is not to be lazy like I am—look up the numbers and use ’em.
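The search-and-replace sweep could look something like this. It is a Python sketch of the idea only, not WordPress code (WordPress is PHP) and not a claim about what the WordPress developers did; numeric_entities is a hypothetical name. It leans on the standard library's html.entities.name2codepoint table and leaves alone the five entities that XML itself predefines.

```python
import re
from html.entities import name2codepoint

# Entities every XML parser already understands; these can stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def numeric_entities(text):
    """Replace named HTML entities with numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML-safe or unknown names untouched
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


print(numeric_entities("r&ocirc;le &amp; ma&ntilde;ana"))
# r&#244;le &amp; ma&#241;ana
```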

24 Maii 2004

Easy facets in MT?

Help, MT gurus! I want to build a faceted classification system so easy a child could use it, but I’m not finding it an easy task.

I know about Pixelcharmer’s solution as well as tima’s plugin. They are not what I want. Unless I’m reading completely wrong (and please tell me if I am!), they force me to create a huge number of categories, even with just two facets.

(For example, if I have one WhatIsThisThingAnyway facet containing Link, Form, and LocalInfo — which is what I’m looking at doing — multiplied by a Subject facet containing, say, eight entries, that’s TWENTY-FOUR CATEGORIES, and it only gets worse from there. I cannot expect my target user to use a category list that long effectively. I simply can’t.)

Multiple categories? Well, yes, but since the MT interface can’t separate out the categories in one facet from the categories in another, I can’t guarantee that my user is going to Do The Right Thing. This needs to be EASY.

Subcategories? I may have to go this route, but I don’t like it, because it limits me to two facets. I really don’t like that. I at least want an Audience facet too.

And then I noticed that MT has a Keywords box. Bonanza! I thought. I can have my user type in the items as keywords. It’s kludgy (I’d rather have checkboxes for some of these facets, à la WordPress, which does category interface vastly better than MT anyway), but it’ll work; all I need is an MT tag that grabs entries that have a particular keyword or combination of keywords.

Guess what. No such tag. All you can do with keywords is list them for a particular entry. That’s real useful. Not.

The best I can find is a RelatedEntriesByKeyword plugin which isn’t what I want. I don’t want to grab entries based on their relatedness to another entry. I want to grab them based on a keyword I supply. I want entry listings by keyword the same way I can get entry listings by category.

Except I can’t have it.

If this were my personal weblog, I’d throw up my hands and move immediately to WordPress. But it isn’t; it’s for my practicum client, who’s heartily enamored of MT (and shouldn’t move to WordPress until it’s got native multi-blog support anyway).

I’m stuck. I can’t figure out how to make this work. If you have a bright idea, post it to your blog and ping this entry, or email me with it. Thanks.

Addendum: Okay, I misunderstood how the plugins work. I wouldn’t have 24 categories, but 11; all I have to do is prefix the category with its facet name, not cross-ref everything. That’s better, but it’s still not good enough; since all the categories are mushed in together, it’s still too hard to get the client to Do The Right Thing.
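The arithmetic, spelled out in a quick Python sketch. The three WhatIsThisThingAnyway values come from the post; the eight subject names are placeholders, and the "Facet: Value" naming with a colon separator is an assumption about how the prefixing might look, not the plugins' actual convention.

```python
from collections import defaultdict
from itertools import product

what = ["Link", "Form", "LocalInfo"]
subjects = ["S%d" % n for n in range(1, 9)]  # eight placeholder subjects

# Cross-referencing every value with every other: one category per pair.
crossed = ["%s/%s" % pair for pair in product(what, subjects)]

# Prefixing each value with its facet name: one category per value.
prefixed = (["WhatIsThisThingAnyway: " + w for w in what]
            + ["Subject: " + s for s in subjects])


def group_by_facet(categories, sep=": "):
    """Split prefixed category names back into facet -> list of values."""
    facets = defaultdict(list)
    for category in categories:
        facet, _, value = category.partition(sep)
        facets[facet].append(value)
    return dict(facets)


print(len(crossed), len(prefixed))  # 24 11
```

Grouping by prefix is exactly the step the entry screen does not do, which is why the flat list of eleven is still hard on the user.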

Oh, well. I think it’s back to the drawing board for me.

22 Iulii 2003

Extended entries

A while back I helped Invisible Adjunct suss out Movable Type’s extended-entry capability. I’ll do the same for everyone else now.

If you’re a short-form or link-and-comment blogger, you can safely skip this post, as you don’t need extended entries. (You might be able to remove some extraneous crud from your templates and your entry screen, though, so by all means stay tuned.)

Movable Type allows you to split a single weblog entry into three parts. One part, called the “excerpt,” is intended to be a quick summary of the entry. One part is the main body, and the last part is the extended body. The typical setup is to have only the main body and a link to the extended body show up on your index page, while your archive pages display first the main body and then the extended body.

Some prominent webloggers argue that you should almost always write an excerpt, crafting it very carefully so that your readers know whether they’ll be interested in the whole entry. Part of the reason for this is that your RSS feed (what people read in a news aggregator) probably sends out only the excerpt, not the entire post. If you don’t write an excerpt, Movable Type sends out the first 20 words of your post, which may or may not be a good indicator of its content.

Your call. I stick with the first 20 words, myself, acknowledging that many aggregator users curse me for it. I suspect that in CavLec’s case most people find post category rather than excerpt the best guide to whether they’ll be interested.
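For the curious, the fallback behaves roughly like the Python sketch below. This is an illustration, not Movable Type's actual code (MT is written in Perl): auto_excerpt is a made-up name, and the tag stripping and trailing ellipsis are assumptions about the details.

```python
import re


def auto_excerpt(body, words=20):
    """Return the first `words` words of an entry body as a stand-in excerpt."""
    # Assumption: markup is dropped before counting words (crude tag stripping).
    text = re.sub(r"<[^>]+>", " ", body)
    tokens = text.split()
    excerpt = " ".join(tokens[:words])
    # Assumption: an ellipsis marks a body that was cut short.
    if len(tokens) > words:
        excerpt += "..."
    return excerpt
```

Run it over a few of your own posts to see how well their opening words stand in for a hand-written summary.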

The Movable Type tag that represents an entry excerpt is <$MTEntryExcerpt$>. It has one attribute, convert_breaks, which adds p and br tags if set to "1" and does not add them if set to "0".

Every MT blog in existence, pretty much, uses the <MTEntryBody> placeholder. The convert_breaks attribute is available here also, but there’s a catch: if you set it, either way, it overrides whatever value you set in the “Text Formatting” box on the entry screen. I recommend staying away from that attribute on this placeholder.

<MTEntryBody> represents the main body of the entry, the part of the entry that should appear both on the front page and as the first part of the archived post.

Only those blogs that use extended entries need to use the <$MTEntryMore$> placeholder, which holds the extended body. If you never use extended entries, you may still have this chunk of code (or one similar) in your template:

<MTEntryIfExtended>
<span class="extended"><a href="<$MTEntryPermalink$>#more">Continue reading “<$MTEntryTitle$>”</a></span><br />
</MTEntryIfExtended>

You don’t need this code. All it does is give a “Continue reading (your post title)” with a link to the extended entry. Since you never use extended entries, you don’t need this code. Get rid of it.

The rest of you, now that you understand the parts of an entry, can probably figure out what the above code does for yourselves. Hope so, anyway.

If you don’t use extended entries, you obviously don’t need the extended entry box on the page you type your posts into. Lucky you: you can get rid of it. Click the link at the bottom that says “Customize the display of this page.” Choose the “Custom” option, and uncheck the box by “Extended Entry.” It’s quite safe to play around with the checkboxes, by the way; to bring back any part of the page you got rid of and now want back, just click the link again and re-check its checkbox.

20 Iunii 2003

Redone page up

Before I restart the Movable Type thread… I’ve redone the Movable Type default page markup, and the result is here. Would you folks kindly look it over, View Source, and let me know if you see any markup bugs, or spots where I’ve gotten the design wrong?

(I know about the silly-looking Latin dates. I’m debating whether to change them, because doing so will require a template change that five-nines of the world won’t need to make.)

The changes weren’t major, by the way. Ben and Mena do good work, by and large.