Tagging Old Entries
As part of the current upgrade effort, I decided it was time to go through and tag my older blog entries, the ones written before I introduced tags. Those entries were organized by category, and the filesystem managed the categorization: each directory represented a category, and each subdirectory a subcategory. The directories had descriptive names, suitable for use with the pycategories plugin for pyblosxom.
An obvious way to automate retagging these old entries was to let the category hierarchy itself supply the tags. I wrote a Python program to walk the directory tree and add tags to the old entries.
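The core idea, turning each entry's directory path into a list of tags, might look something like the minimal sketch below. It is not the program I actually ran. It assumes entries are `.txt` files (pyblosxom's usual extension), and the names `tags_for_entry`, `collect_tags` and `IGNORED_CATEGORIES` are hypothetical. It only computes the tags; writing them back into the entries depends on how your tagging plugin stores them, so that step is left out.

```python
import os

# Assumptions for illustration: entries end in ".txt", and the "general"
# category is skipped because it makes a poor tag.
IGNORED_CATEGORIES = {"general"}
ENTRY_EXTENSION = ".txt"


def tags_for_entry(root, entry_path):
    """Turn the directory components between root and the entry into tags."""
    relative_dir = os.path.relpath(os.path.dirname(entry_path), root)
    if relative_dir == os.curdir:
        return []  # entry sits directly in the root, so it has no category
    parts = relative_dir.split(os.sep)
    # Each directory level (category, subcategory, ...) becomes one tag.
    return [p for p in parts if p.lower() not in IGNORED_CATEGORIES]


def collect_tags(root):
    """Walk the category tree and map each entry path to its derived tags."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(ENTRY_EXTENSION):
                path = os.path.join(dirpath, name)
                result[path] = tags_for_entry(root, path)
    return result
```

For example, an entry at `programming/python/walk.txt` under the blog root would get the tags `programming` and `python`, while anything under `general/` gets none and is left for a different approach.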
I chose to ignore the "general" category, though, because "general" doesn't provide much information as a tag. For those entries I might try a more sophisticated approach, analyzing each entry's content to choose tags. I haven't worked out all the details yet, but I'm considering building a map of tagged entries based on the frequency of certain words or phrases that appear in them, then applying that map to a histogram of each entry currently categorized as "general".
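Since the details aren't worked out, here is only one possible shape for that idea, as a rough sketch. It builds a word-frequency profile per tag from already-tagged entries, then scores an untagged entry's word histogram against each profile using cosine similarity. The tokenizer, the stop-word list, the scoring method, and the `limit` and `threshold` values are all my assumptions, and every function name is hypothetical.

```python
import math
import re
from collections import Counter

# Assumed stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "i"}


def histogram(text):
    """Count the words in an entry, ignoring case and common stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def build_tag_profiles(tagged_entries):
    """Merge the word histograms of every entry carrying each tag.

    tagged_entries is an iterable of (text, tags) pairs.
    """
    profiles = {}
    for text, tags in tagged_entries:
        counts = histogram(text)
        for tag in tags:
            profiles.setdefault(tag, Counter()).update(counts)
    return profiles


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def suggest_tags(text, profiles, limit=3, threshold=0.1):
    """Rank tags by how closely their profile matches this entry's words."""
    counts = histogram(text)
    scores = [(cosine(counts, profile), tag) for tag, profile in profiles.items()]
    scores.sort(reverse=True)
    return [tag for score, tag in scores[:limit] if score >= threshold]
```

Cosine similarity is used here because it compares the proportions of words rather than raw counts, so a long tag profile doesn't swamp a short entry. Phrase matching, as opposed to single words, would need a different tokenizer.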