I'm a software engineer at Google working on ads. My general interests include large-scale NLP (randomization and streaming algorithms in particular), machine learning, information retrieval, and social media.
In December 2012 I defended my PhD thesis, titled "Real-time event detection in massive streams". Since January 2013 I have been working at Google Zurich.
Previously, I was involved in the CROSS project, which aims to detect events from multiple streams (e.g., news, Twitter, Wikipedia).
If you are interested in detecting events in Twitter, you might find our Twitter FSD corpus a useful resource for measuring the performance of your system.
Unfortunately, due to a request by Twitter, we can no longer distribute the Twitter dataset described in the paper "The Edinburgh Twitter corpus". Please do not contact me asking for the dataset, as I cannot give it to you. Consider downloading the new Twitter FSD corpus instead.