Homebrewing Knowledge-Base from HBD Archives?

Uh-oh!

Started thinking again. This time about a way to repurpose messages on the HomeBrew Digest into a kind of database of brewing knowledge. I can just see it. It’d be ah-some!

Does anybody know how to transform email messages from well-structured digests into database entries? It seems to me that it should be a trivial task, especially for someone well-versed in Perl and/or PHP. But what do I know?
That venerable HBD mailing-list contains a wealth of information about pretty much every single dimension of beer homebrewing. For a large number of reasons, content from the HBD.org site turns up quite often in Web searches for brewing terms.

One issue with the HBD, though, is that it’s a bit hard to search. There used to be a custom-built search feature on the site but we now need to rely on Google and AltaVista. This wouldn’t be too much of an issue if not for the fact that those engines search complete digests instead of individual messages. So the co-occurrence of two terms in the same digest can be due to two messages on completely different subjects.
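
Here is a minimal Python sketch that makes the digest-versus-message problem concrete. The two-message digest is made up, and the tokenizer (lower-cased runs of letters and digits) is a simplification, not how Google or AltaVista actually index pages. A digest-level search reports a hit for two terms that never appear in the same message. A message-level search does not.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased set of alphanumeric words, for simple term matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def digest_matches(messages: list[str], terms: list[str]) -> bool:
    """Digest-level search: every term appears somewhere in the whole digest."""
    words = tokens("\n".join(messages))
    return all(t.lower() in words for t in terms)


def message_matches(messages: list[str], terms: list[str]) -> list[int]:
    """Message-level search: indices of messages containing every term."""
    wanted = {t.lower() for t in terms}
    return [i for i, m in enumerate(messages) if wanted <= tokens(m)]


# Made-up digest: one post about decoction, one about carbonation.
digest = [
    "Decoction mashing brings out melanoidins in a bock.",
    "My kegged ale is overcarbonated; how do I fix the CO2 level?",
]
print(digest_matches(digest, ["decoction", "kegged"]))
print(message_matches(digest, ["decoction", "kegged"]))
```

Once each message is its own record, the co-occurrence of two terms actually says something about a single post.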

Another issue with the HBD (as with many other mailing-lists) is the relatively high redundancy in message content. Some topics came up cyclically on the mailing-list and, though some kind souls were gracious enough to respond to the same queries over and over again, the mailing-list often looks like an outlet for FAQs. Among HBD “perennials” (or cyclical topics) are discussions of the effects of HSA (hot-side aeration), decoction mashing, and batch sparging, to name but a few technical issues.

Unfortunately, it looks like the HBD might need to be retired at some point in the not-so-distant future, if only for lack of sponsorship. Also, Pat Babcock, the digest’s “janitor,” recently asked for mirror space and announced the retrieval of some of the older digests (from the late 1980s).

Of course, there are lots of other brewing resources out there. So many, in fact, that it can be overwhelming to the newbie brewer. One impact of having so much information so easily available about homebrewing (and commercial brewing, for that matter) is a “democratization of beer knowledge.” Unlike the brewing guilds of medieval times, brew groups are open and free. Yet a side-effect of this is that there is no centralized authority to prevent disinformation. Also, because the accumulated knowledge is difficult to peruse, people tend to “reinvent the wheel.”

In Internet terms, the HBD is the closest equivalent to a historical source. Few other mailing-lists have been running continuously since 1986.

Luckily, all the digests since October 1988 are available as HTML files. And the digest format has remained almost unchanged since that time. All of the content is in plain ASCII. Messages never exceed a certain length. IIRC, line length is also controlled. And HTML was officially not admitted. Apparently, some messages did contain a bit of HTML code, but that shouldn’t be an issue.

Here’s what I imagine could be done:

  1. “Burst” out digests into individual messages (with each message containing digest information)
  2. Put all the individual messages (350MB worth) into a Content Management System
  3. Host the archived messages in the form of a knowledge-base
  4. Process those entries for things like absolute links and line breaks
  5. Collect messages in threads
  6. Add relevant del.icio.us-like tags and slashdot- or digg-like ratings
  7. Use this knowledge-base for wiki-like collaborative editing
  8. Assess some key issues to be taken up by brewing communities
  9. Add to the brewing knowledge-base
  10. Build profiles for major contributors and major groups
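
Steps 1, 2 and 5 are the most mechanical ones, so here is a rough Python sketch of them. It rests on an assumption I haven't checked against the archived files: that messages in a digest are separated by a line of 20 or more hyphens and open with “Subject:” and “From:” header lines. The regular expressions would need adjusting to the real layout. An in-memory SQLite table stands in for the Content Management System, and threads are grouped naively by subject with “Re:”/“Fw:”/“Fwd:” prefixes stripped. The sample digest, its number and its date are invented.

```python
import re
import sqlite3
from collections import defaultdict
from dataclasses import astuple, dataclass

# ASSUMPTION: messages are separated by a line of 20+ hyphens and each one
# opens with "Subject:" and "From:" header lines. Check against real digests.
SEPARATOR = re.compile(r"^-{20,}[ \t]*$", re.MULTILINE)
HEADER = re.compile(r"^(Subject|From):\s*(.*)$")


@dataclass
class Message:
    digest_number: int  # digest info kept with every burst message (step 1)
    digest_date: str
    subject: str
    sender: str
    body: str


def burst(digest_text: str, number: int, date: str) -> list[Message]:
    """Split one digest into individual messages (step 1)."""
    messages = []
    for chunk in SEPARATOR.split(digest_text):
        lines = chunk.strip().splitlines()
        headers, i = {}, 0
        # Read consecutive header lines at the top of the chunk.
        while i < len(lines) and (m := HEADER.match(lines[i])):
            headers[m.group(1)] = m.group(2).strip()
            i += 1
        if "Subject" not in headers:
            continue  # digest masthead, table of contents, footer, etc.
        body = "\n".join(lines[i:]).strip()
        messages.append(Message(number, date, headers["Subject"],
                                headers.get("From", ""), body))
    return messages


def store(messages: list[Message], conn: sqlite3.Connection) -> None:
    """Load burst messages into a table (a stand-in for step 2's CMS)."""
    conn.execute("""CREATE TABLE IF NOT EXISTS message (
        digest_number INTEGER, digest_date TEXT,
        subject TEXT, sender TEXT, body TEXT)""")
    conn.executemany("INSERT INTO message VALUES (?, ?, ?, ?, ?)",
                     [astuple(m) for m in messages])
    conn.commit()


def thread_key(subject: str) -> str:
    """Normalize a subject so replies group with the original post."""
    key = re.sub(r"^\s*((re|fw|fwd)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return " ".join(key.lower().split())


def threads(messages: list[Message]) -> dict[str, list[Message]]:
    """Collect messages into threads by normalized subject (step 5)."""
    grouped = defaultdict(list)
    for m in messages:
        grouped[thread_key(m.subject)].append(m)
    return dict(grouped)


# Invented sample digest in the assumed layout.
sample = """HOMEBREW Digest #1234
Contents: HSA myths
------------------------------
Subject: HSA myths
From: brewer@example.com

Is hot-side aeration really a problem?
------------------------------
Subject: Re: HSA myths
From: other@example.com

Depends on whom you ask.
"""
msgs = burst(sample, 1234, "1999-01-01")
conn = sqlite3.connect(":memory:")
store(msgs, conn)
print([m.subject for m in msgs])
print({k: len(v) for k, v in threads(msgs).items()})
```

Threading by subject is crude: people change subjects mid-thread, and unrelated posts can share a subject like “Help!” A more serious version would probably also compare quoted text.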

Because I couldn’t help it, I started writing down some potential tags I might use to label messages on the HBD. It could be part “folksonomy,” part taxonomy. For one thing, it’d be useful to distinguish messages based on “type” (general queries about a brewing technique vs. recipe posted after a competition) since many of the same terms and tags would be found in radically different messages.
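
A “type” facet could be combined with topic tags roughly as follows. This Python sketch is purely hypothetical: the topic vocabulary, the recipe hints (OG, FG, IBU, lbs, oz) and the thresholds are my own guesses, not an existing HBD taxonomy. A folksonomy would eventually come from users rather than from a hard-coded list.

```python
import re

# HYPOTHETICAL topic vocabulary built from a few HBD "perennials".
TOPIC_TAGS = {
    "hsa": ["hot-side aeration", "hot side aeration", "hsa"],
    "decoction": ["decoction"],
    "batch-sparging": ["batch sparge", "batch sparging"],
}
# HYPOTHETICAL recipe hints: gravity readings, bitterness, weights.
RECIPE_HINTS = re.compile(r"\b(o\.?g\.?|f\.?g\.?|ibus?|lbs?|oz)\b", re.IGNORECASE)


def message_type(subject: str, body: str) -> str:
    """Rough 'type' facet, so the same topic tag can sit on different kinds of post."""
    if len(RECIPE_HINTS.findall(body)) >= 3:
        return "recipe"
    if subject.rstrip().endswith("?") or body.count("?") >= 2:
        return "query"
    return "discussion"


def topic_tags(text: str) -> set[str]:
    """Topic tags whose phrases occur as whole words in the text."""
    lowered = text.lower()
    return {
        tag
        for tag, phrases in TOPIC_TAGS.items()
        if any(re.search(r"\b" + re.escape(p) + r"\b", lowered) for p in phrases)
    }


recipe = ("Bock recipe", "9 lbs pale malt\n1 oz Saaz at 60 min\nOG 1.050, FG 1.012")
query = ("Is HSA real?", "Does hot-side aeration matter at home?")
for subject, body in (recipe, query):
    print(message_type(subject, body), topic_tags(subject + "\n" + body))
```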


To a Newbie Blogger


Lisamm, who just commented on two of my own blog entries, is asking about blogging: How to Increase Your Blog Hits « Books on the Brain

Blogging is new to me. I haven’t learned the lingo. I don’t know the etiquette. I don’t know what a meme is (Do I want one? Do I need one? Is it fattening?) What is the deal with bloggers giving other bloggers awards? No one has challenged me, or tagged me, or whatever it is people do. I’m totally winging it.

Someone told me recently that I could increase my blog hits with an intriguing title on my entries. Hmmmm. This one might get noticed. I guess we’ll see how it works.

Speaking of blog hits, I seem to be getting a lot (I guess). What is a lot? How many do other people get?

What is up with my obsessive desire to check my stats? How I love to see the blog stat graph go up, up, up. Is this normal? Why do I care? Do other bloggers do that? Will the obsession wear off soon????

Experienced bloggers, I would love to hear from you. I’m hoping my insanity is only temporary.

My answers:

Simply put, a meme is an idea which propagates itself. Think “viral marketing.” Among bloggers, it often refers to a kind of tag-like game by which one blogger asks other bloggers to post about something (say, eight random things about themselves) and to do the same with other people. It’s a fun (and non-fattening) way to connect with fellow bloggers.

Awards are a bit similar. Bloggers tend to enjoy kudos, praise, marks of recognition, etc. Some awards (the “thinking blog” one is an example) are given as a way to connect bloggers who perceive each other to be of the same calibre, in one dimension or another.

Intriguing titles do help increase traffic and bloggers are often (semi-secretly) proud of their clever titles. In this sense, we’re no different from journalists! An issue with titles, though, is that the type of traffic they increase might be the type of headline-reading which does relatively little good to a blog. My best example is my Facebook Celebs and Fakes post, which is getting good traffic, apparently for the wrong reasons… ;-)

As anyone can guess, “a lot” of blog hits is a really relative measure. Some bloggers get thousands of hits every single day, others get a few hundred a month. From November, 2006 to February, 2007, I was getting an average of about 180 hits a day (with a peak at 307 hits in a single day). Since then, I’ve been down to about 100 to 130 hits a day. I still consider this to be a lot of hits, especially when I compare it to the number of comments I get. I also notice (by looking at the WordPress.com statistics page) that many of the hits I get come from Web searches about terms for which my entries aren’t that relevant (cf. “celebs and fakes” above).

Many bloggers are obsessed with stats even if they know that stats don’t tell much of a story. Bloggers often discuss measurement tools, especially if their blogging has a financial impact. Personally, I do check my blog stats regularly but I don’t really care about the numbers. It’s more of a way to observe tendencies, to learn more about the effects of blogging, and to assess differences between blog entries. Besides, the way WordPress.com works, the stats page is where incoming links are displayed. Now, having said all this, it’s probably true that I get a pleasant feeling when I see my numbers going up and I probably was slightly disappointed when they dropped. But those feelings are really transient.

Speaking of graphs going up: it seems to be a common effect among bloggers that a site’s traffic will increase pretty regularly, regardless of what the blogger does. At least, that’s what I figured until my March, 2007 drop. I’m still a bit puzzled about this, actually.

As for insanity, I think it comes with the territory.

The main point of blogging is: blog the way you want to blog. Have fun, experiment with things, don’t take yourself too seriously. Blogging is just a system for making content available publicly. There aren’t set rules about blogging. In other words, don’t listen to any piece of advice.

Now, a few words of advice. ;-)

It’s probably a good idea not to make too much of stats. They’re fun to look at but they don’t say much about blogs. A blog with a small but dynamic reader-base is often better than a blog getting a lot of hits. Technorati and other measures of influence are similarly misleading, as blogging isn’t “about that” for most people. Yes, there are “A-list bloggers” out there (blogging celebrities, very influential bloggers). But starting a blog to become an A-list blogger is like learning a new language to become a best-selling author in that language.

Use the bookmarklet in your blogging system. I can’t paste the WordPress.com one because WordPress.com doesn’t accept JavaScript in blog entries (for security reasons, allegedly), but it’s the one at the bottom of the blog writing page. I personally find those bookmarklets to be among the best features available anywhere. When you see a web page you want to blog about, select a piece of text and click on the bookmarklet from your bookmark bar. You then have a new blog entry with the title of the page, a link to that page, and the portion of text you selected. This is so ingrained in my blogging habits that I often look for a page to start an entry from instead of creating a blank entry. That may sound silly but it makes sense in my workflow.

Speaking of workflow, it’s probably a good idea to take up tabbed browsing if you haven’t done so already. One blogging use of browser tabs is as placeholders for would-be blog entries. Kind of like a “to do” list for blogging. Notice something potentially bloggable? Keep that tab open so you can come back to it when you have time. I know other bloggers are doing this too because some talk about the number of tabs remaining in their browsers.

Which leads me to one of the main hazards of blogging: you end up thinking about all the things you could say and you never find time to do much of it. As a general concept, “Information Overload” refers to something similar. Hence the need to adopt a blogging strategy. Personally, I haven’t found the best way to do it yet but I am decreasing my “blogload,” somehow. In fact, blogging itself does make me more efficient as it provides a central place for putting things I would otherwise repeat. (Though I end up with something like seven blogs…) So, my advice here would be something like: think about ways to control the number of things you want to blog about.

One way to think about it is that, with “big issues,” other people have certainly blogged about them. Though there’s something intimidating about this, it also means that you may not need to blog about something if it’s likely to become common knowledge soon.

Many bloggers seem to crave the latest thing. They want to “scoop” a story, be the first to blog it. Though it pains me to do so, I must say that I’m probably as guilty of this as the next blogger. The problem with this is that it requires a lot of effort to keep up with everything which is happening. And while being the first to blog about something might be the best way to get incredible traffic, the outcome may not be worth the effort.

I try to take a longer view on things. If I can, I like to bring multiple items together in the same blog entry. Kind of like a “roundup.” It’s also a lot of effort, but it’s less likely to make you crazy than the quest for the first post.

This all reminds me of a blog post I read about types of blog posts. IIRC, it was a presentation file and it had some things to say about the effectiveness of those posts. Though this kind of thinking makes a lot of sense for media-oriented bloggers, there’s a lot more to blogging than trying to build readership.

Which leads me to the more social aspects of blogging. In the past several months, my blogging activities have probably decreased as my Facebook (Fb) activities increased. While Fb and blogging are quite different from one another, the connections between them are quite clear. Posting notes or other items on Fb is almost exactly like a simplified form of blogging.

There are disadvantages to posting things on Facebook.com, by comparison with blogging. There aren’t (AFAICT) RSS feeds for Fb Notes and Posted Items. Only your Fb friends can see (and comment on) things you post on Facebook. There isn’t a WYSIWYG editor for Fb notes (though you can use basic HTML). Fb notes don’t have categories or tags (though you can tag Fb friends). And you don’t get neat stats.

But there are nice things about Fb notes and posted items. Since those items are seen by people who already know you, it’s often easier to get feedback through Facebook posted items than through a (public) blog. And because posted items are put on your Facebook profile, there’s a special connection between your items and your Facebook persona. Not to mention that blog entries can be posted directly on Facebook, which kills two birds with one stone.

To get back to the social dimensions of blogging… No matter how much bloggers like to talk about blogging as a social form of writing, it tends to be one-to-many, not many-to-many. In fact, most people who leave comments on blog entries are bloggers themselves. Though blogging is very “democratic,” it’s not the most efficient community-building tool available online.

Anyhoo…

I do tend to ramble a lot. There’s a lesson about blogging, somewhere… ;-)