Search any Twitter Account

A lot of services offer the same thing, some even with archiving, but they often require registration.

Now with jetwick it is easy to search any account you like. E.g. try my account. To do the same for your account, go to Jetwick, click ‘login’, allow jetwick access to your account (you can revoke it at any time, and we won’t misuse it or post tweets on your behalf) and then click “grab tweets”. Then you will see something like

After this procedure you can search the whole history of the tweets easily.

Why would you want to grab tweets from other users? Because you can then easily see, on the right side, which topics a user tweets about. Again, see “Words related to your query” for my account:

PS: Jetwick is now free software … you can host your own and play around!

Algorithm against Twitter Spam

In jetwick we only want to show relevant tweets for a search. No noise, no spam.

The first problem, noise, is solvable when the user sorts by retweets, filters by a specific criterion of their choice, or refines the search by adding more specific terms.

But how can we get rid of spam on twitter? First, what is spam on twitter?

Several years ago Paul Graham gave a nice definition of email spam: ‘unsolicited and automated’. With this definition we can identify several situations for twitter spam:

  1. unsolicited tweets which appear in your timeline (e.g. the new ads or even retweets of your followers could be spam too ;-))
  2. unsolicited tweets in your searches not relevant to the search (e.g. spammers simply add hashtags from the trending topics to increase popularity of their tweets)
  3. unsolicited automated tweets which mention you (and not only you …)
  4. unsolicited direct messages
  5. even fast following and unfollowing can be spam, because most users have enabled a notification
  6. a very cool spamming technique: some spammers add a mini advertisement to a nice statement about you or your product. If you don’t read the tweet carefully or follow the links, this could (mis)lead you to retweet it and indirectly advertise for them.

With the following algorithm we can try to solve points 2 and 3.

  1. For a new tweet T get the user U
  2. For U get (some) additional tweets and store to a list L
  3. Go through L and compare the content with T.
  4. Use the Jaccard index for comparison (additionally compare URL and title of the linked webpage)
  5. If the Jaccard index is too high or the URLs are identical, decrease the quality. Repeat from step 3 if there are still tweets in L; otherwise go to step 6.
  6. Mark T as spam if the quality is below a certain quality limit
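
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code jetwick runs: the function names, the tokenizer, the starting quality of 100, the penalty of 20, the similarity threshold of 0.7 and the quality limit of 50 are all assumptions chosen for the example. It also treats any shared URL as "identical URLs" and skips the comparison of linked page titles, because that would need a network request.

```python
import re

URL_PATTERN = r"https?://\S+"

def tokens(text):
    """Lowercased word set of a tweet; URLs are stripped and compared separately."""
    without_urls = re.sub(URL_PATTERN, " ", text.lower())
    return set(re.findall(r"[a-z0-9#@_']+", without_urls))

def urls(text):
    """Set of URLs contained in a tweet."""
    return set(re.findall(URL_PATTERN, text))

def jaccard(a, b):
    """Jaccard index of two sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def quality(tweet, other_tweets, max_similarity=0.7, penalty=20, start=100):
    """Steps 3-5: lower the quality for every near-duplicate in the user's list L.

    All numeric defaults are hypothetical values for this sketch.
    """
    q = start
    tweet_tokens, tweet_urls = tokens(tweet), urls(tweet)
    for other in other_tweets:
        too_similar = jaccard(tweet_tokens, tokens(other)) > max_similarity
        same_url = bool(tweet_urls & urls(other))
        if too_similar or same_url:
            q -= penalty
    return q

def is_spam(tweet, other_tweets, quality_limit=50):
    """Step 6: mark T as spam if its quality is below the (assumed) limit."""
    return quality(tweet, other_tweets) < quality_limit
```

Here `other_tweets` plays the role of the list L from step 2. A user who posts the same text and link again and again, changing only the hashtag, loses quality for every earlier copy, while a user with varied tweets keeps the full starting quality.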

I applied this algorithm to the data of jetwick and collected the twitter users with the most spammy tweets over the last week (the number in brackets is the count):

careerfan (968) -> bot
teamnapalm (587) -> spam
endy_pink (481) -> spam
manypro (312) -> bot
i_want_napalm (294) -> spam
gutlazaro (216) -> spam (canceled from twitter)
appstoreadam (210) -> bot+spam
livralivro (207) -> bot
sigaajesus (195) -> spammy
thakiddunncase (195) -> oops, not spam
lauberte_ (167) -> spam
2dvdlsnorjeuvou (158) -> spam
josialemrossi (152) -> spam (canceled from twitter)
malucomunic (139) -> spam (Was this account hacked!? None of its followers is a spammer!)

The idea is simple, but the results look promising. There could be a lot of use cases. E.g. twitter clients like hootsuite could add tweet quality to their available filters … the user-specific klout score is not useful here, because even less popular tweeters can create great tweets :-)

Let me know what you think!

Fun and some important Dev-Tweets of the last week, 11th October

Let us start with the fun tweets. Ok, this week there are a lot of Java-bashing tweets, but I like them!

  • maven 3 is out. It now lets you download the internet even faster than before.

  • The world needs to stop hyping “html5” as though it’s markup alone that builds rich web apps. It makes JavaScript angry.

  • “JavaScript is the only language that people feel they dont need to learn before they start using it.” – Crockford

  • Little known fact: JavaScript also has an isNaaN() function for when you aren’t sure if you’re working with Indian food

  • I have seen an app with SQL code in the *views*, looked like a java coder was given a php book and told to make a rails app.

  • Matz on #ruby speed: Build your website in Ruby until you have more traffic than Twitter, then use your riches to hire Java programmers.

  • OH: “Java is just a DSL for turning XML into core dumps.”

  • judging Clojure/Lisp by its parens is like judging Java by its classpath



And last but not least some interesting info:



Of course this list isn’t complete! So watch out for more fun and info on twitter, and contact me or comment if you want to add something here or for next week.

Important Java Tweets of the last week, 20th September

My last summary calls for another week: here you have the latest news and fun tweets.

Fun

News

Important Java Tweets of the last Week

First, a funny piece of news from twitter.com/geekgay: Feliz Dia Do Programador! (“Happy Programmer’s Day!”) http://pt.wikipedia.org/wiki/Dia_do_Programador. Look at this wikipedia link to understand the tweet.

News

Fun

  • twitter.com/rhauch
    @al3x Maybe the JDK 7 release date keeps changing because java.util.Date is mutable?
  • twitter.com/al3x
    Maybe it’s hard to predict when JDK 7 will ship because they’re using java.util.Date in their automated prediction system?
  • twitter.com/jamesiry
    To sum up Java 7’s plan: it won’t have lambdas unless it will in which case it might not.
  • twitter.com/gilad_bracha
    If you need to use Java, you should be using Scala. If you don’t need to use Java – then you have options.

Now some funny tweets from today:

  • twitter.com/kumpera
    I chatted with a friend about Java other day. Oracle charged me $10 for that.
  • twitter.com/psnively
    Struts2 + Velocity = the type safety of Rails + the verbosity of Java.
  • twitter.com/angie_design
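    va de nuevo para los programadores:¿Quien fue el primer programador de la historia? Pedro Picapiedra por que dominaba el Java Daba Duu (in English: “Once more for the programmers: who was the first programmer in history? Fred Flintstone, because he mastered Java Daba Duu”)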

Subjectively selected via ‘many retweets’ at jetwick/?q=java. There is also an option to find the origin of any query, which I tried hard to use for every tweet. So I hope I always found the original author of each tweet!