Remove Experts Exchange from Google Search Results

About once a week I do a Google search, usually about the vagaries of C++ destructors or something, that produces a raft of hits from experts-exchange.com at the top of the results. Since experts-exchange.com hooks you with questions, then hits you up for cash before it actually gives you the information you're looking for, these bogus results are a distraction from the task at hand.

Usually I just hold my nose and scroll by these results, but then a tweet reminded me of the Three Virtues. Why the hell isn't the computer doing this work for me?

So after teaching myself a little JavaScript and a little XPath (again), I rolled a Greasemonkey user script to remove these pesky obstacles to real information.

The script can be found here. You will need to install Greasemonkey (and Firefox) before you can use it.
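The real script is JavaScript running inside Greasemonkey, but the filtering idea is easy to sketch. Here is a rough Python illustration, not the script itself: it parses a simplified, made-up results page (the `<li class="g">` layout is an assumption, not Google's actual HTML), uses ElementTree's limited XPath support to find each result, and drops any whose link points at a blocked domain. The hypothetical `BLOCKED` set is where a domain-list UI would plug in. ElementTree needs well-formed markup, so real pages would call for a proper HTML parser.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical block list; a UI for editing it is still on the Todo list.
BLOCKED = {"experts-exchange.com"}

def is_blocked(url, blocked=BLOCKED):
    """True if url's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)

def strip_results(markup, blocked=BLOCKED):
    """Remove result items whose first link points at a blocked domain.

    Assumes a made-up, well-formed layout where each result is an
    <li class="g"> containing an <a href="...">.
    """
    root = ET.fromstring(markup)
    # ElementTree has no parent pointers, so treat every element as a
    # potential parent; list() snapshots the tree before we mutate it.
    for parent in list(root.iter()):
        for li in parent.findall("li[@class='g']"):
            link = li.find(".//a[@href]")
            if link is not None and is_blocked(link.get("href"), blocked):
                parent.remove(li)
    return root
```

Matching on the host (and its subdomains) rather than on a substring keeps `www.experts-exchange.com` blocked without catching unrelated sites whose names merely contain the blocked string.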

Todo:

  • UI for adding domains to remove from the searches.
  • Support other search engines.

Sun, 30 Aug 2009 10:32


Quick and Dirty Text-to-Speech in Python

I looked around for some Python bindings for Festival, but neither pyfestival nor pyfest seems to be maintained. I was too tired to use SWIG or something similar to wrap the C++ library. Instead, I wrote a thin shim between Python and Festival's Scheme-based command interpreter. (You must have Festival installed.)


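import os

# Path to the Festival binary; adjust for your system.
BIN = "/usr/bin/festival"

class Festival(object):
    """Thin shim that pipes Scheme commands to Festival's interpreter."""

    def __init__(self):
        # --pipe: read commands from stdin without an interactive prompt.
        self.p = os.popen("%s --pipe" % BIN, "w")

    def eval(self, scm):
        """Send one Scheme expression to Festival."""
        self.p.write(scm + "\n")
        self.p.flush()

    def say(self, text):
        """Speak text. Double quotes and backslashes would break the
        Scheme string literal, so drop them."""
        text = str(text).replace("\\", "").replace('"', "")
        self.eval('(SayText "%s")' % text)

    def close(self):
        """Close the pipe, ending Festival's input."""
        self.p.close()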

Save the code as festival.py, then import the module, instantiate a Festival object, and call its say method to create utterances.


>>> import festival
>>> tts = festival.Festival()
>>> tts.say("Hello, world.")

It's quick. It's dirty. It works.
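Because eval accepts any Scheme expression, building those strings by hand gets tedious. A small helper can turn Python values into a Scheme call; `sexpr` below is hypothetical, not part of Festival or the module above. Strings become Scheme string literals, with quotes and backslashes dropped in the same quick-and-dirty spirit as say; integers and floats pass through; booleans map to the `t` and `nil` that Festival's Scheme uses.

```python
def sexpr(head, *args):
    """Build a Scheme call such as (SayText "hi") from Python values.

    Hypothetical helper: strings become Scheme string literals (double
    quotes and backslashes are dropped, not escaped), ints and floats are
    written as-is, and booleans become t / nil.
    """
    parts = [head]
    for arg in args:
        if isinstance(arg, bool):  # check bool first: it's a subclass of int
            parts.append("t" if arg else "nil")
        elif isinstance(arg, (int, float)):
            parts.append(repr(arg))
        else:
            text = str(arg).replace("\\", "").replace('"', "")
            parts.append('"%s"' % text)
    return "(%s)" % " ".join(parts)
```

With the module above, `tts.eval(sexpr("SayText", "Hello, world."))` does what say does, and `tts.eval(sexpr("voice_kal_diphone"))` would switch to the kal_diphone voice, if it is installed.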

Sat, 20 Jun 2009 23:25


Building cURLpp 0.7.2

I want to use cURLpp for a project I've been working on, but there's no package for Ubuntu.

I downloaded the tarball, but discovered that example 18 would not compile. It failed with the following error:


example18.cpp: In function ‘int main(int, char**)’:
example18.cpp:85: error: ‘BoostWriteFunction’ is not a member of ‘cURLpp::Options’
example18.cpp:85: error: ‘test’ was not declared in this scope
example18.cpp:85: error: expected type-specifier
example18.cpp:85: error: expected `;'
make[1]: *** [example18.o] Error 1

I discovered that Boost support was optional, and example 18 didn't test for it. Since boost::bind is one of the tools I use most often in C++, Boost support seems less than optional to me, so I cleaned the build and ran configure --with-boost. I still got the error.

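It became clear that the configure-generated config.h was not being included: global.h pulled in config.win32.h when HAVE_CONFIG_H was undefined, but included nothing when it was defined. This patch adds the missing include and fixes the compile: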


--- ../../curlpp-0.7.2-orig/curlpp/global.h     2007-09-22 09:31:10.000000000 -0500
+++ global.h    2009-04-05 10:57:58.000000000 -0500
@@ -26,6 +26,8 @@
 
 #ifndef HAVE_CONFIG_H
 #include "config.win32.h"
+#else
+#include "config.h"
 #endif
 
 #endif
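If you rebuild often, the steps can be scripted. This is a hypothetical Python driver, not part of cURLpp: it assumes a freshly unpacked curlpp-0.7.2 tree (so there is nothing to clean), with the patch saved inside it as `global.h.patch` (a made-up name). Passing the target file to patch explicitly avoids relying on the `../../curlpp-0.7.2-orig` paths in the diff header. By default it only prints the commands.

```python
import subprocess

def build_commands(patch_file="global.h.patch", with_boost=True):
    """The build steps, in order, as argument lists (no shell involved)."""
    configure = ["./configure"]
    if with_boost:
        configure.append("--with-boost")
    return [
        # Name the target file explicitly so patch does not have to guess
        # it from the ../../curlpp-0.7.2-orig paths in the diff header.
        ["patch", "curlpp/global.h", patch_file],
        configure,
        ["make"],
    ]

def build(srcdir, dry_run=True, **options):
    """Run (or, by default, just print) each step inside srcdir."""
    for cmd in build_commands(**options):
        if dry_run:
            print("+", " ".join(cmd))
        else:
            # check=True stops at the first failing step.
            subprocess.run(cmd, cwd=srcdir, check=True)
```

Call `build("curlpp-0.7.2")` to review the steps, then `build("curlpp-0.7.2", dry_run=False)` to run them.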

Sun, 05 Apr 2009 11:02


Email vs. Feeds

A little plug here for my friend Jeff. He's been working on an email utility called mailpie, a set of command-line programs providing full-text indexing and searching of large email archives. I'm not using it yet, because I've been too lazy to set it up, but it looks pretty cool.

Recently, Jeff and I were discussing the limitations of our approaches to reading feeds. During a fit of outsourcing, I settled on Google Reader. Jeff still prefers to make it tough for advertisers to mine his data, so he uses Planet. Like any software, both have minor issues.

I like Google Reader's keyboard-driven interface. That makes old Unix-heads like me happy. I don't like that old, unread entries don't quietly slip away. The line in the navigation pane that reads "Liberal Rubbish (1000+)" puts a lot of pressure on me. It makes me feel bad for not keeping up, and makes reading my feeds seem like work. This could probably be solved with a "mark as read after some period of time" setting, but no such setting currently exists. I'd like entries to expire based on the time-to-live specified by the feed itself. Feeds are ephemeral by nature, and keeping them around and all up in my fries misses the point. When I look at my feeds, I want to see what's new, not the history of the website back to the beginning of time.
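Expiring entries by the feed's own time-to-live is simple to sketch. This hypothetical Python example reads an RSS 2.0 channel's `<ttl>` element (in minutes) and keeps only items whose `<pubDate>` falls inside that window. Note that RSS 2.0 defines ttl as how long a channel may be cached before it is refreshed, so using it as an expiry age is a reinterpretation, and the 24-hour fallback is an arbitrary choice.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def fresh_titles(rss_text, now=None, default_ttl_minutes=24 * 60):
    """Return titles of items newer than the channel's <ttl> (in minutes)."""
    channel = ET.fromstring(rss_text).find("channel")
    ttl_text = channel.findtext("ttl")
    ttl = timedelta(minutes=int(ttl_text) if ttl_text else default_ttl_minutes)
    now = datetime.now(timezone.utc) if now is None else now
    titles = []
    for item in channel.findall("item"):
        pub = item.findtext("pubDate")
        if pub is None:
            # Undated items can't expire; keep them.
            titles.append(item.findtext("title"))
            continue
        published = parsedate_to_datetime(pub)
        if published.tzinfo is None:
            # RFC 2822 "-0000" means "zone unknown"; treat it as UTC.
            published = published.replace(tzinfo=timezone.utc)
        if now - published <= ttl:
            titles.append(item.findtext("title"))
    return titles
```

A reader built this way would simply never show the stale items, so the unread count reflects only what is new.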

I also feel a little dirty using Google Reader because I know that all my clicks, choice of subscriptions, and entry keywords are being inspected and dissected by Google's army of virtual demographers and adaptive hindbrain delivery scripts, so my brain can be picked apart and sold to advertisers. Never mind that my browser provides anti-marketing countermeasures; knowing this is going on is irksome.

Yes, blah blah business model. Yes, blah don't be evil blah. Yes, I know my data is not linked back to me individually. I know, I know. I love the power Google gives me, and I take advantage of it. It just squicks me out sometimes.

Bonus points for Google's brilliant, intuitive, usable AJAX interface, though.

My primary reason for not choosing Planet is essentially a bug: the feeds are cached, the caches are never cleaned up, and they grow to fill whatever disk space is available. I had no desire to worry about solving that problem, and I was going through an offloading phase, during which I started using Flickr seriously, along with Google Calendar and Reader. Together, those steered me away from Planet.
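For what it's worth, that kind of cache growth can be held in check with a periodic sweep. This is a hypothetical sketch, not something Planet provides: it deletes regular files in a single cache directory (it does not recurse) whose modification time is older than a cutoff. The 30-day default is arbitrary, and the cache location depends on your Planet configuration.

```python
import os
import time

def prune_cache(cache_dir, max_age_days=30, now=None):
    """Delete regular files in cache_dir older than max_age_days.

    Returns the sorted names of the files removed. Does not recurse.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    # Snapshot the directory listing before deleting anything.
    with os.scandir(cache_dir) as it:
        entries = list(it)
    removed = []
    for entry in entries:
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Run from cron, something like this keeps the cache bounded by age rather than by available disk.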

...So Jeff and I were talking about feeds and readers the other day. Because we both use mutt as our email client, and appreciate its power and flexibility, Jeff said, "I should write something like mailpie for feeds. It would put together mbox mailboxes, readable from mutt." As I've been working on the next generation of my feed sidebar plugins for pyblosxom, I've been thinking about that idea. Here are a few of the things that have crossed my mind.
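The core of the idea, turning feed items into messages mutt can open, is small. Here is a rough sketch using Python's standard mailbox and email modules. The item tuple layout and the `feeds@localhost` sender address are assumptions made for illustration, and fetching and parsing the feeds is left out.

```python
import mailbox
from email.message import EmailMessage
from email.utils import formataddr, format_datetime

def items_to_mbox(path, feed_title, items):
    """Append feed items to an mbox file that mutt can open.

    items: iterable of (title, link, summary, published) tuples, where
    published is a timezone-aware datetime (an assumed layout).
    """
    box = mailbox.mbox(path)  # created if it doesn't exist
    box.lock()
    try:
        for title, link, summary, published in items:
            msg = EmailMessage()
            # Hypothetical sender address; the display name is the feed title.
            msg["From"] = formataddr((feed_title, "feeds@localhost"))
            msg["Subject"] = title
            msg["Date"] = format_datetime(published)
            # Keep the link in the plain-text body where URL grabbers see it.
            msg.set_content("%s\n\n%s\n" % (summary, link))
            box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()
```

Locking the mailbox while appending keeps a running mutt from reading a half-written file.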

Pros:

  • Mutt has a well-known and convenient user interface.
  • Like email messages, feed items are discrete chunks.
  • There are no distracting bells and whistles in mutt. Just cool, comfortable text.

Cons:

  • Feed items often have embedded links the reader would like to follow. Mutt does not support links directly (or any rendering of embedded HTML, for that matter). The user would have to contrive to run a text-based browser to display each feed entry, or strip the formatting entirely and devise a way to display links so that terminal-based utilities could be used to follow them.
  • Feed items, while not dependent on formatting, may be multimedia. YouTube videos do not fit well in mutt. You'd have to fire up a browser to see them. (Or embedded pictures or podcasts or...)
  • Unlike email messages, feed items are typically not threaded, so one of mutt's major benefits is irrelevant.
  • Barring some kind of intelligent downloading scheme, feeds that are not full text are inconvenient to read if you're pretending they're something like email messages. Of course, they're inconvenient to read in any reader: "It’s not exactly a pot of gold at the end of a rainbow, but Lincoln Mayor Chris Beutler is considering dipping into ... (5 comments)" Thanks, journalstar.com.
  • Mailpie's strength lies in its full-text search of an archive of email messages. I don't want an archive of feed items. The web is essentially that, with Google as the index.
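The "strip the formatting and display the links" option can be sketched with the standard library's html.parser. This is a hypothetical, minimal converter, not an existing tool: it drops tags, marks each link's text with a bracketed number, and lists the URLs at the end, roughly the way text-mode browser dumps do. It ignores everything else about layout.

```python
from html.parser import HTMLParser

class LinkFootnoter(HTMLParser):
    """Collect text, numbering each <a href> as it is closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = []   # output fragments
        self.links = []  # URLs in order of appearance
        self._open = []  # link numbers (or None) for currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
                self._open.append(len(self.links))
            else:
                self._open.append(None)
        elif tag in ("br", "p"):
            self.text.append("\n")

    def handle_endtag(self, tag):
        if tag == "a" and self._open:
            number = self._open.pop()
            if number is not None:
                self.text.append("[%d]" % number)

    def handle_data(self, data):
        self.text.append(data)

def html_to_text(html):
    """Plain text with [n] link markers, followed by a numbered URL list."""
    parser = LinkFootnoter()
    parser.feed(html)
    parser.close()
    body = "".join(parser.text).strip()
    refs = "\n".join("[%d] %s" % (n, url)
                     for n, url in enumerate(parser.links, 1))
    return body + ("\n\n" + refs if refs else "")
```

The numbered list at the bottom is what a terminal URL grabber would pick up.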

The jury's still out. The idea of a text-mode, minimally formatted feed reader is attractive. (Any of you whippersnappers remember Gopher?) The question is how to present the potentially rich set of data, links included, usefully. Maybe it's just a matter of providing a properly configured mutt...

Sun, 15 Jun 2008 11:51


Tagging Old Entries

As part of the current upgrade effort, I decided it was time to go through and tag older blog entries, the ones written before the introduction of tags. These entries were categorized, and the filesystem was used to manage the categorization. Each directory represented a category, and each subdirectory a subcategory. The directories were given descriptive names, suitable for use with the pycategories plugin for pyblosxom.

An obvious approach to automating the retagging of these old entries was to use the category hierarchy itself to provide the tags. I wrote a Python program to walk the directory tree and add tags to the old entries.
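The program itself isn't shown here, but the approach is simple enough to sketch. In this hypothetical Python version, each entry's tags are the names of the directories between the blog root and the file, minus any category named general. It assumes entries end in `.txt` and that the tagging plugin reads a `#tags a,b` metadata line placed right after the title line; check what your tags plugin actually expects before rewriting files.

```python
import os

def tags_for_entries(root, ext=".txt", skip=("general",)):
    """Map each entry file under root to tags from its directory path."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        tags = [p for p in parts if p not in skip]
        for name in filenames:
            if name.endswith(ext):
                result[os.path.join(dirpath, name)] = tags
    return result

def add_tags_line(path, tags):
    """Insert '#tags a,b' after the title line (assumed metadata format).

    Skips files that are empty, already tagged, or have no tags to add.
    """
    with open(path) as f:
        lines = f.readlines()
    if not tags or not lines or any(l.startswith("#tags ") for l in lines):
        return False
    lines.insert(1, "#tags %s\n" % ",".join(tags))
    with open(path, "w") as f:
        f.writelines(lines)
    return True
```

Because add_tags_line refuses to touch an already-tagged file, the script can be rerun safely.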

I chose to ignore the general category, though, because as a tag, general doesn't provide much information. I might try a more sophisticated approach for those entries, analyzing the content of each entry to choose tags. I haven't worked out all the details yet, but I'm considering building a map of tagged entries based on the frequency of certain words or phrases that appear in them, then applying that map to a histogram of the entries currently categorized as general.
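A crude version of that word-frequency idea might look like this hypothetical sketch: count words (minus a few stopwords) across the entries carrying each tag, score an untagged entry's word histogram against each tag's counts, normalized by that tag's total, and suggest the best-scoring tags. The regular expression, stopword list, and scoring rule are arbitrary choices, not a worked-out method.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z']+")
# A token stopword list; a real one would be much longer.
STOP = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "i"}

def words(text):
    """Lowercased words from text, minus stopwords."""
    return [w for w in WORD.findall(text.lower()) if w not in STOP]

def build_tag_profiles(tagged_entries):
    """Map each tag to a Counter of words from the entries carrying it.

    tagged_entries: iterable of (text, tags) pairs.
    """
    profiles = {}
    for text, tags in tagged_entries:
        counts = Counter(words(text))
        for tag in tags:
            profiles.setdefault(tag, Counter()).update(counts)
    return profiles

def suggest_tags(text, profiles, top=3):
    """Score tags against the entry's word histogram; return the best ones.

    A tag's score is the sum, over the entry's words, of (count in entry)
    times (that word's share of the tag's profile): a crude similarity.
    """
    histogram = Counter(words(text))
    scores = {}
    for tag, profile in profiles.items():
        total = sum(profile.values())
        if total:
            scores[tag] = sum(n * profile[w] for w, n in histogram.items()) / total
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [t for t in ranked[:top] if scores[t] > 0]
```

Normalizing by each tag's total keeps heavily used tags from winning simply because they have more words behind them.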

Tue, 26 Feb 2008 09:21

