I have a love-hate relationship with Twitter. As a user I see the benefits of Twitter when looking at it without the spam, duplicates and senseless tweets, e.g. through jetwick. But as a developer I find the Twitter API very ‘heuristic’ and hand-waving in a lot of areas, which makes it complicated to use. I would have been lost without the nice twitter4j project, so thanks to the author!
Now let me give you some examples of
Strange things about the Twitter API
- The since_id parameter is not supported when paginating in the search API:
“The since_id parameter will be removed from the next_page element as it is not supported for pagination. If since_id is removed a warning will be added to alert you.”
So you need to create your own pagination if you do not want to get already visited tweets via the search API.
- The search API returns matches in URLs. In nearly all cases this is not useful, especially for terms like ‘twitter’ or ‘google’, where the search API returns confusing tweets containing the URLs search.twitter.com or google.com. But marketing companies need to search URLs, and the tweet button also relies on that ‘feature’. Why not disable it and enable ‘link:http://any-link.here’ instead? It would also be more useful to match against the title of the website, as jetwick does, but that’s another topic.
- The search API does NOT return complete results compared to the streaming API. I.e. results from the streaming API contain all tweets with the specified keywords (without the tweets matched via the URL bug I mentioned in the previous point), whereas the search API can leave out ‘spam’ tweets. I’m unsure whether those tweets really have to be of low quality or not. I guess this is more a technical issue of the search API, which leaves out some tweets the streaming API has.
- The REST API allows one to get only ~3200 old tweets from one user and 800 tweets from your friends (i.e. your home timeline).
- A huge number of different API limits:
  - 350 requests per hour and user for the REST API
  - Searches are restricted per IP (the exact number is unknown, but it is much higher than the 350 requests per hour)
  - Only 2 filter streams are allowed per IP, and only 200 keywords are possible per stream! On top of that, a filter stream delivers only approx. 50 tweets/s, even if only a few keywords are used (which then have to be high-frequency keywords).
- The search API allows searches into history, but how far back depends on the frequency of the term. I know this is logical for every real-time inverted index of this size, but it should be better documented.
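To make the first point more concrete, here is a minimal sketch in Python of how such hand-rolled pagination could look. It is not how twitter4j or jetwick does it. It assumes that the search endpoint accepts an inclusive `max_id` parameter, that tweet ids grow over time and that each page comes back newest first. The names `search_since`, `fetch_page` and `fake_fetch` are hypothetical; `fetch_page` stands in for whatever client call you really use.

```python
from typing import Callable, Dict, List, Optional

Tweet = Dict[str, int]  # hypothetical minimal shape: {"id": <tweet id>}
FetchPage = Callable[[str, Optional[int]], List[Tweet]]


def search_since(fetch_page: FetchPage, query: str, since_id: int,
                 max_pages: int = 15) -> List[Tweet]:
    """Collect tweets newer than since_id by walking backwards via max_id."""
    collected: List[Tweet] = []
    max_id: Optional[int] = None  # None means "start at the newest tweet"
    for _ in range(max_pages):
        page = fetch_page(query, max_id)
        if not page:
            break  # no older results available
        for tweet in page:
            if tweet["id"] <= since_id:
                return collected  # reached tweets we have already visited
            collected.append(tweet)
        # Assumption: max_id is inclusive, so step just below the oldest id seen.
        max_id = page[-1]["id"] - 1
    return collected


# In-memory stand-in for the real search call (ids 10 down to 1, 3 per page).
_FAKE_TWEETS: List[Tweet] = [{"id": i} for i in range(10, 0, -1)]


def fake_fetch(query: str, max_id: Optional[int]) -> List[Tweet]:
    matching = [t for t in _FAKE_TWEETS if max_id is None or t["id"] <= max_id]
    return matching[:3]


new_ids = [t["id"] for t in search_since(fake_fetch, "jetwick", since_id=4)]
print(new_ids)
```

The `max_pages` cap is there because every page costs a request, and the limits listed above are easy to hit.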
Regarding API Terms
Of course Twitter has API terms. This is necessary and nice to protect users from spam sites etc.
But there is also a display style guideline, with which I had ‘fun’ last weekend, when I was asked e.g. to make the hashtag links of jetwick conform to the display guideline. This is annoying. Now I need to pop up a dialog instead of directly triggering a search on jetwick. Hey, it is a search engine! But Twitter has to make money. That is ok. But I would like to have an exception for free or open source projects. No chance 😦 … here is my email conversation regarding the minor API term violation:
Dear XY, ok, I won't provide an API to others. Thanks for the clarification. I've got a further question. Are the display guidelines a requirement to be aligned with the API terms of use and to continue running Jetwick? (I shutted it down to not being evil) In the terms I can read as the first principle: "Don't surprise users" which is very important for me and it would disturb the user experience if a hashtag click (or a click on '@user') in a tweet would result in a pop up to twitter search or something and not simply trigger a search on jetwick. Please do not understand me wrong, I have already several links back to twitter: the date links to the tweet on twitter, the retweet and reply links to twitter and finally the user links back to twitter. Jetwick is a complete read only service (see my API access), so I would be stupid if I hadn't links back to twitter, which actually allows my users to share noisefree information via twitter. Finally: If the layout guides are a requirement, would you make an exception for Jetwick regarding the hashtag and @user links within a tweet? Many companies make exceptions when it comes to open source projects such as Jetbrains (IDEA), Yourkit (Profiler), Attlassian (Confluence), ... what about Twitter? Kind Regards, Peter.
The answer from Twitter makes it crystal clear that Twitter does not grant API term exceptions to open source projects like other companies do. It also indicates that the API guys have a bit too much to do, as the support does not really answer my question and understands neither what github is nor what jetwick means:
Hey Peter, Thanks for following up. The API Terms of Service, as an overriding document, do require you to adhere to these display guidelines -- in the same "Don't Surprise Users" section you referenced. I recommend adding links of your own, such as "#github on Jetwick" that surface these results. Again, I'm sorry for the inconvenience this has caused, and let me know if you have any other questions. Regards, XY
A second important thing
you’ll otherwise miss is that you are not allowed to offer an API to other people. Even if your project is open source! Here is the email:
“Returning Twitter data, like tweets, through an API of your own is not allowed, neither for commercial services nor independent or open-source services. We are not looking for partners to formally extend new APIs as you request.”
Conclusion
So, keep all this in mind when you start to build a system using or even relying on the Twitter API. I hope this post clarifies the mysteries of the Twitter API a bit! If you have encountered similar issues, feel free to comment 🙂 !