Jun 26, 2016

Apache Zeppelin provides a web UI where you can iteratively build Spark scripts in Scala, Python, etc. (with autocomplete support), run Spark SQL queries against Hive or other stores, and visualize the results of queries or Spark DataFrames. This is somewhat akin to what IPython notebooks do for Python. Spark developers know that...

Posted on Sunday, June 26, 2016 by Unknown

Apache Spark has an advanced DAG execution engine and supports in-memory computation. In-memory computation combined with DAG execution leads to far better performance than running MapReduce jobs. In this post, I will show an example of using linear regression with Apache Spark. The dataset is the NYC Yellow Taxi dataset for...

Posted on Sunday, June 26, 2016 by Unknown

Google recently deprecated Google+ Sign-In and the process of obtaining OAuth access tokens via the GoogleAuthUtil.getToken API. They now recommend a single entry point via the new Google Sign-In API. The major reasons for doing so are: 1. it enhances the user experience, and 2. it improves security (more here). Also starting with Android...

Posted on Sunday, June 26, 2016 by Unknown

Hive or Impala? Hive and Impala both support SQL operations, but the performance of Impala is far superior to that of Hive. Although the Spark SQL engine and HiveContext have made Hive queries significantly faster, Impala still performs better. The reason is that Impala already has daemons running on the worker nodes, so it avoids the overhead incurred when creating map and reduce jobs. The query that I will mention later ran almost 10X...

Posted on Sunday, June 26, 2016 by Unknown

The Idea: Java 8 introduced functional programming support, a powerful feature that was missing from earlier versions. One benefit of functional programming is that it makes the decorator pattern easy to implement. A common requirement is some kind of rate limiting for web services. Ideally, you want separation of concerns between the actual business logic and the rate-limiting logic. With Java 8, we can use function references to achieve this separation of concerns and implement the decorator...
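As a rough illustration of the idea in the excerpt above (not the post's own code), a rate limiter can be written as a decorator over `java.util.function.Function`. The class name `RateLimit`, the method `decorate`, the fixed-window policy, and the injected clock are all assumptions made for this sketch.

```java
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical helper: wraps any Function with a fixed-window rate limit. */
final class RateLimit {
    private RateLimit() {}

    /**
     * Returns a decorated function that lets at most maxCalls invocations
     * through per window of windowMillis; further calls in the same window
     * throw IllegalStateException. The business logic in target stays
     * unaware of the limit.
     */
    static <T, R> Function<T, R> decorate(Function<T, R> target, int maxCalls,
                                          long windowMillis, LongSupplier clock) {
        return new Function<T, R>() {
            private long windowStart = clock.getAsLong();
            private int calls = 0;

            // Only the bookkeeping is synchronized, so calls into the
            // wrapped function are not serialized.
            private synchronized void acquire() {
                long now = clock.getAsLong();
                if (now - windowStart >= windowMillis) {
                    windowStart = now; // start a new window
                    calls = 0;
                }
                if (calls >= maxCalls) {
                    throw new IllegalStateException("rate limit exceeded");
                }
                calls++;
            }

            @Override
            public R apply(T input) {
                acquire();
                return target.apply(input);
            }
        };
    }
}
```

A caller could wrap a method reference such as `service::lookup` (a hypothetical name) with `RateLimit.decorate(service::lookup, 10, 1000L, System::currentTimeMillis)`. Passing the clock in as a parameter is a design choice that keeps the window logic easy to test.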

Posted on Sunday, June 26, 2016 by Unknown

You might run into a scenario that requires conditional authentication with Retrofit 2.0. This post provides an example of integrating with the Lyft API. With the Lyft API, we first need to authenticate against the oauth/token endpoint to obtain the OAuth token, and then use this accessToken in other service calls. Such access tokens also have an expiry time (1 hour), so ideally there should be a mechanism to handle this scenario. One lazy solution (which turns out to be perfect) is to use interceptors...
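The expiry handling described above can be sketched in plain Java, independent of Retrofit. This is not the post's code: `TokenCache`, its `fetcher` supplier (standing in for the call to the oauth/token endpoint), and the injected clock are hypothetical names for illustration. An interceptor would presumably call `get()` before attaching the token to each outgoing request.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical holder that caches an access token until it expires. */
final class TokenCache {
    private final Supplier<String> fetcher; // assumed to call the oauth/token endpoint
    private final long ttlMillis;           // e.g. 3_600_000 for a 1-hour token
    private final LongSupplier clock;
    private String token;
    private long fetchedAt;

    TokenCache(Supplier<String> fetcher, long ttlMillis, LongSupplier clock) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached token, fetching a new one if none exists or it has expired. */
    synchronized String get() {
        long now = clock.getAsLong();
        if (token == null || now - fetchedAt >= ttlMillis) {
            token = fetcher.get();
            fetchedAt = now;
        }
        return token;
    }

    /** Drops the cached token, e.g. after the server rejects it. */
    synchronized void invalidate() {
        token = null;
    }
}
```

Because the token is fetched lazily on first use, callers that never reach an authenticated endpoint never trigger the authentication request.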

Posted on Sunday, June 26, 2016 by Unknown

Mar 11, 2016

Well, Blogger does not support LaTeX, and Windows Live Writer is being redeveloped. So, in the meantime, I have written a few posts on the Pelican blog and thought I might as well link to them here. Why you should prefer to use the square root of Gini Index: This post examines the advantages of using the Gini Index as the criterion for building decision trees. http://orastack.com/why-you-should-use-square-root-of-gini-index.html Do tweets have predictive power? This post examines whether tweets have an effect on opening weekend revenue...

Posted on Friday, March 11, 2016 by Unknown