[Esip-discovery] Twitter as an aggregation pipeline

Thu Jul 19 12:14:42 EDT 2012

Hi Ruth,

The idea I proposed at the ESIP summer meeting Metadata Casting session was to use Twitter as one way for "long tail" scientists to make their datasets known.

The basic concept would be to choose a hash tag (e.g., #esipcasting), and set up a pipeline that checks the Twitter stream every so often (say, every 15 minutes) for all new Tweets using that hash tag.  Presumably, each Tweet will contain a URL in addition to the hash tag, and that URL will link to the new *cast.  The Tweet might also contain short text about the cast for additional metadata (although that would just be icing on the cake).
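
To make the filtering step concrete, here is a rough Python sketch.  It is not tied to any particular Twitter client: it assumes each polled Tweet arrives as a dict with hypothetical `id` and `text` keys, and the function names are made up for illustration.  Fetching the Tweets every 15 minutes is left to whatever client or scheduler you prefer.

```python
import re

HASHTAG = "#esipcasting"
# Simple URL pattern that stops at whitespace; good enough for a sketch.
URL_RE = re.compile(r"https?://\S+")

def has_hashtag(text, hashtag=HASHTAG):
    """True if the hash tag appears as a whole token (case-insensitive)."""
    pattern = r"(?<!\w)" + re.escape(hashtag) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def extract_cast_urls(tweets, seen_ids):
    """Return (tweet_id, url, leftover_text) for new, tagged Tweets.

    `tweets` is an iterable of dicts with hypothetical keys `id` and `text`.
    `seen_ids` is a set updated in place so repeated polls skip old Tweets.
    """
    results = []
    for tweet in tweets:
        if tweet["id"] in seen_ids or not has_hashtag(tweet["text"]):
            continue
        seen_ids.add(tweet["id"])
        # Whatever remains after removing URLs and the hash tag is the
        # optional short description of the cast.
        note = URL_RE.sub("", tweet["text"])
        note = re.sub(re.escape(HASHTAG), "", note, flags=re.IGNORECASE)
        note = " ".join(note.split())
        for url in URL_RE.findall(tweet["text"]):
            # Drop punctuation that often trails a URL at the end of a sentence.
            results.append((tweet["id"], url.rstrip(".,;:!?)"), note))
    return results
```

Keeping `seen_ids` between polls means a Tweet that shows up in two consecutive checks is only handed on once; the returned URLs are what you would pass to the crawler.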

You can then use your existing Nutch infrastructure to crawl that URL and determine what type of document it points to.

If you're interested, there is a really cool Web workflow engine called IFTTT (if this then that) at http://ifttt.com/.  You can set up IFTTT to send you an email, post a file to Dropbox, or do various other tasks (check them out!).  Basically, you can set a "trigger" for IFTTT to detect Tweets using the #esipcasting hash tag and email you the body of the Tweet.  This would save you the trouble of interacting with the Twitter API; you would just need to set up an automated mail client.
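
On the mail side, the automated client mostly needs to pull URLs out of each notification.  Here is a minimal Python sketch using the standard library's `email` package.  It assumes the notification carries the Tweet body as plain text, which is a guess about how the IFTTT email would be laid out, and the function name is hypothetical.  Retrieving the messages from the mailbox is not shown.

```python
import email
import re
from email import policy

# Simple URL pattern that stops at whitespace; good enough for a sketch.
URL_RE = re.compile(r"https?://\S+")

def urls_from_notification(raw_message):
    """Pull URLs out of the plain-text body of one notification email.

    Assumes the Tweet text appears in a text/plain part; the real
    layout of the notification may differ.
    """
    msg = email.message_from_string(raw_message, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    if body is None:
        return []  # no plain-text part to inspect
    text = body.get_content()
    # Drop punctuation that often trails a URL at the end of a sentence.
    return [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)]
```

Each returned URL could then be queued for crawling in the same way as one taken directly from the Twitter stream.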

Cheers,
-Eric
