I have a love-hate relationship with Twitter. As a user I see the benefits of Twitter when I look at it without the spam, duplicates and senseless tweets, e.g. through jetwick. But as a developer I find the Twitter API very ‘heuristic’ and hand-waving in a lot of areas, which makes it complicated to use. I would have been lost without the nice twitter4j project, so thanks to the author!
Now let me give you some examples of
Strange things about the Twitter API
- The since_id parameter is not supported when paginating in the search API:
“The since_id parameter will be removed from the next_page element as it is not supported for pagination. If since_id is removed a warning will be added to alert you.”
So you need to implement your own pagination if you do not want to get already visited tweets back from the search API.
- Search API returns matches in URLs. This is not useful in nearly all cases, especially for terms like ‘twitter’ or ‘google’, where the search API returns confusing tweets containing URLs such as search.twitter.com or google.com. Marketing companies do need to search URLs, and the tweet button also relies on that ‘feature’, but why not disable that and enable ‘link:http://any-link.here’ instead? It would also be more useful to match against the title of the linked website, like jetwick does, but that’s another topic.
- Search API does NOT return complete results compared to the streaming API. I.e. the results from the streaming API contain all tweets with the specified keywords (apart from the tweets matched via the URL bug I mentioned in the previous point). The search API, in contrast, can leave out ‘spam’ tweets. I’m unsure whether those tweets really have to be of low quality or not. I guess this is more of a technical issue with the search API, which leaves out some tweets the streaming API has.
- REST API allows one to get only ~3200 old tweets from one user and 800 tweets from your friends (i.e. your home timeline).
- A huge number of different API limits:
  - 350 requests per hour and user for the REST API
  - Searches are limited per IP address (the exact number is unknown, but it is much higher than the 350 requests per hour)
  - Only 2 filter streams are allowed per IP address, and only 200 keywords are possible per stream! On top of that, a filter stream delivers only approx. 50 tweets/s, even if only a few keywords are used (which matters when those keywords are very frequent).
- Search API allows searches into history, but how far back depends on the frequency of the term. I know this is logical for every real-time inverted index of this size, but it should be better documented.
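To make the first point more concrete, here is a minimal Python sketch of such hand-rolled pagination. It is only an illustration under assumptions: `fetch_page` is a hypothetical callable that you would have to write yourself (it is neither a Twitter API nor a twitter4j function), it is assumed to return one page of tweets as dicts with an `id` key, newest first, and the `max_pages` cap of 15 is an arbitrary choice.

```python
from typing import Any, Callable, Dict, List, Tuple

Tweet = Dict[str, Any]
# Hypothetical fetcher: (query, page_number) -> tweets of that page, newest first.
PageFetcher = Callable[[str, int], List[Tweet]]


def fetch_new_tweets(fetch_page: PageFetcher, query: str, since_id: int,
                     max_pages: int = 15) -> Tuple[List[Tweet], int]:
    """Collect tweets with id > since_id by paging manually.

    Returns the new tweets and the since_id to remember for the next run.
    """
    new_tweets: List[Tweet] = []
    seen_ids = set()
    for page in range(1, max_pages + 1):
        tweets = fetch_page(query, page)
        if not tweets:
            break  # no further results
        # Filter on the client side, because since_id is not honoured
        # for the follow-up pages.
        fresh = [t for t in tweets if t["id"] > since_id]
        for tweet in fresh:
            # Pages can shift while new tweets arrive, so skip duplicates.
            if tweet["id"] not in seen_ids:
                seen_ids.add(tweet["id"])
                new_tweets.append(tweet)
        if len(fresh) < len(tweets):
            break  # this page already contained visited tweets
    newest_id = max(seen_ids, default=since_id)
    return new_tweets, newest_id
```

You would store the returned id and pass it as `since_id` on the next run, so that every tweet is processed only once.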
Regarding API Terms
Of course Twitter has API terms. This is necessary and nice to protect users from spam sites etc.
But there is also a display style guideline, which I had ‘fun’ with last weekend: I was asked, for example, to make the hashtag links of jetwick conform to the display guideline. This is annoying. Now I need to pop up a dialog instead of directly triggering a search on jetwick. Hey, it is a search engine! But Twitter has to make money. That is OK. But I would like to have an exception for free or open source projects. No chance 😦 … here is my email conversation regarding the minor API term violation:
The answer is crystal clear: Twitter does not grant API term exceptions to open source projects as other companies do. It also indicates that the API guys have a bit too much to do, as the support does not really answer my question and understands neither what github is nor what jetwick means:
Hey Peter, Thanks for following up. The API Terms of Service, as an overriding document, do require you to adhere to these display guidelines -- in the same "Don't Surprise Users" section you referenced. I recommend adding links of your own, such as "#github on Jetwick" that surface these results. Again, I'm sorry for the inconvenience this has caused, and let me know if you have any other questions. Regards, XY
A second important thing
you’ll otherwise miss is that you are not allowed to offer an API of your own to other people, even if your project is open source! Here is the email:
“Returning Twitter data, like tweets, through an API of your own is not allowed, neither for commercial services nor independent or open-source services. We are not looking for partners to formally extend new APIs as you request.”
So, keep all this in mind when you start to build a system using or even relying on the Twitter API. I hope this post clears up some of the mysteries of the Twitter API! If you have encountered similar issues, feel free to comment 🙂!