Here's some of the publicly available code I've worked on over the years, as well as some useful resources (data) I've collected.
This is the human evaluation data used in the paper Unsupervised joke generation from big data, presented at ACL 2013. The data can be found here (there is a README file that should explain everything). This could be useful for anyone who wants to train a supervised system on human-labeled jokes.
The Twitter FSD corpus is a collection of 50 million tweets along with event annotations. It is a useful resource for measuring the performance of an event detection system.