Tagging Old Entries

As part of the current upgrade effort, I decided it was time to go through and tag the older blog entries, the ones written before the introduction of tags. Those entries were organized by category, with the filesystem managing the categorization: each directory represented a category, and each subdirectory a subcategory. The directories were given descriptive names suitable for use with the pycategories plugin for pyblosxom.

The obvious way to automate tagging these old entries was to let the category hierarchy itself supply the tags. I wrote a Python program that walks the directory tree and adds each entry's category and subcategory names to it as tags.
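The program itself isn't reproduced here, but the idea fits in a few lines. This is a minimal sketch, not the original code, and it rests on assumptions: entries are `.txt` files whose first line is the title, tags are stored as a `#tags tag1,tag2` metadata line directly after the title (check the convention your own tags plugin expects), and `IGNORED`, `tags_for_entry`, `add_tags_line` and `retag_tree` are hypothetical names. It also skips the general category, for the reason given in the next paragraph.

```python
from pathlib import Path

# Hypothetical: category names that make poor tags and are skipped.
IGNORED = {"general"}


def tags_for_entry(root: Path, entry: Path) -> list:
    """Turn an entry's category/subcategory directories into tags."""
    parts = entry.relative_to(root).parent.parts
    return [part for part in parts if part not in IGNORED]


def add_tags_line(text: str, tags: list) -> str:
    """Insert a '#tags' line after the title line.

    Assumed entry format: first line is the title, followed by optional
    '#key value' metadata lines, then the body.
    """
    lines = text.splitlines(keepends=True)
    if not lines or not tags:
        return text
    # Leave entries alone if they already carry a #tags line.
    if any(line.startswith("#tags ") for line in lines[1:]):
        return text
    title = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
    return title + "#tags " + ",".join(tags) + "\n" + "".join(lines[1:])


def retag_tree(root: Path, dry_run: bool = True) -> dict:
    """Walk root, tagging every .txt entry from its directory path.

    Returns a mapping of changed files to the tags they received;
    with dry_run=True nothing is written to disk.
    """
    changes = {}
    for entry in sorted(root.rglob("*.txt")):
        tags = tags_for_entry(root, entry)
        if not tags:
            continue
        original = entry.read_text(encoding="utf-8")
        updated = add_tags_line(original, tags)
        if updated != original:
            changes[entry] = tags
            if not dry_run:
                entry.write_text(updated, encoding="utf-8")
    return changes
```

Calling `retag_tree(Path("entries"))` with the default `dry_run=True` only reports which files would change and with which tags, a cheap way to check the derived tags before anything is rewritten.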

I chose to ignore the general category, though, because as a tag, "general" doesn't convey much information. For those entries I might try a more sophisticated approach: analyzing each entry's content to choose its tags. I haven't worked out all the details yet, but I'm considering building a map from each tag to the frequencies of words or phrases that appear in the entries carrying it, then comparing that map against a histogram of each entry currently categorized as general.
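Since those details aren't settled, what follows is only a sketch of one way the idea could work, and the matching step is my own assumption: each tag gets a word-frequency profile summed over the entries that carry it, and an untagged entry's word histogram is compared against every profile using cosine similarity. The function names, the tiny stopword list, `k` and `threshold` are all hypothetical choices, and the sketch counts single words only, not phrases.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"[a-z]+")
# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "and", "for", "with", "that", "this",
             "are", "was", "but", "not", "you", "have"}


def histogram(text: str) -> Counter:
    """Count the words in an entry, ignoring short words and stopwords."""
    words = WORD.findall(text.lower())
    return Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)


def build_tag_profiles(tagged_entries) -> dict:
    """Sum word histograms per tag; tagged_entries yields (text, tags) pairs."""
    profiles = {}
    for text, tags in tagged_entries:
        counts = histogram(text)
        for tag in tags:
            profiles.setdefault(tag, Counter()).update(counts)
    return profiles


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def suggest_tags(text: str, profiles: dict, k: int = 3,
                 threshold: float = 0.1) -> list:
    """Return up to k tags whose profiles best match the entry's words."""
    counts = histogram(text)
    scored = sorted(
        ((cosine(counts, profile), tag) for tag, profile in profiles.items()),
        key=lambda pair: (-pair[0], pair[1]),  # best score first, then by name
    )
    return [tag for score, tag in scored[:k] if score >= threshold]
```

Cosine similarity compares the shape of two histograms rather than their size, so a tag whose entries happen to be long, or an unusually long general entry, isn't favoured simply for having more words.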

Tue, 26 Feb 2008 09:21
