Technical Essentials: Android Implementing Google Sign In (published 2018-12-29, updated 2019-01-18)<div dir="ltr" style="text-align: left;" trbidi="on">
<div ct-id="1" data-meaningful="true" data-skip="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
As you may know, <tt class="docutils literal" style="box-sizing: border-box;">Google Plus</tt> is shutting down in <tt class="docutils literal" style="box-sizing: border-box;">March 2019</tt>, along with all of its services. I had a legacy Android app on the Play Store that used <tt class="docutils literal" style="box-sizing: border-box;">GoogleApiClient</tt> for authentication with <tt class="docutils literal" style="box-sizing: border-box;">Google Plus</tt> services, so I had to upgrade it to use the new <tt class="docutils literal" style="box-sizing: border-box;">GoogleSignInClient</tt>. I am glad I did, for the following reasons:</div>
<ul class="simple" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li style="box-sizing: border-box;">The <tt class="docutils literal" style="box-sizing: border-box;">GoogleSignInClient</tt> API is based on the <a class="reference external" href="https://developer.android.com/reference/com/google/android/play/core/tasks/Task" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">Task</a> API.</li>
<li ct-id="2" data-meaningful="true" data-skip="true" style="box-sizing: border-box;">It does not require managing a connection the way the <tt class="docutils literal" style="box-sizing: border-box;">GoogleApiClient</tt> APIs do, so there is no callback hell and no <a class="reference external" href="http://orastack.com/migrating-to-google-sign-in-with-android.html" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">boilerplate code</a> for tracking connection state.</li>
<li ct-id="3" data-meaningful="true" style="box-sizing: border-box;">You can get other information, such as the user's first name, last name, and email, directly from the result.</li>
</ul>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
So how does it look?</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
The <tt class="docutils literal" style="box-sizing: border-box;">initialization</tt> is much the same as in the <a class="reference external" href="http://orastack.com/migrating-to-google-sign-in-with-android.html" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">earlier post</a>.</div>
<ul style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li style="box-sizing: border-box;"><div class="first" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Initialize the <tt class="docutils literal" style="box-sizing: border-box;">GoogleSignInClient</tt> in the <tt class="docutils literal" style="box-sizing: border-box;">onCreate</tt> method.</div>
</li>
<li style="box-sizing: border-box;"><div class="first" ct-id="4" data-meaningful="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Initialize the <tt class="docutils literal" style="box-sizing: border-box;">GoogleSignInOptions</tt> with the <tt class="docutils literal" style="box-sizing: border-box;">Profile</tt> scope, which gives you basic profile information as before. You can request the user's email address by calling <tt class="docutils literal" style="box-sizing: border-box;">requestEmail</tt> on the builder.</div>
</li>
<li style="box-sizing: border-box;"><div class="first" ct-id="5" data-meaningful="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
As before, you can request an authorization code that your backend server can use to make requests on behalf of the user. It is returned by <tt class="docutils literal" style="box-sizing: border-box;">getServerAuthCode</tt> on the result and is stored below under the key <tt class="docutils literal" style="box-sizing: border-box;">server_auth_code</tt>.</div>
</li>
<li style="box-sizing: border-box;"><div class="first" ct-id="6" data-meaningful="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
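The client verification token (the ID token) is returned by <tt class="docutils literal" style="box-sizing: border-box;">getIdToken</tt> on the result and is stored below under the key <tt class="docutils literal" style="box-sizing: border-box;">client_token</tt>.</div>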
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="nd" style="box-sizing: border-box; color: #66cccc;">@Override</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">protected</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">onCreate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Bundle</span> <span class="n" style="box-sizing: border-box;">savedInstanceState</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
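<span class="k" style="box-sizing: border-box; color: #cc99cc;">super</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">onCreate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">savedInstanceState</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="n" style="box-sizing: border-box;">mActivity</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">this</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>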
<span class="n" style="box-sizing: border-box;">GoogleSignInOptions</span> <span class="n" style="box-sizing: border-box;">gso</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="n" style="box-sizing: border-box;">GoogleSignInOptions</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">Builder</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">GoogleSignInOptions</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">DEFAULT_SIGN_IN</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">requestIdToken</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">serverToken</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="na" style="box-sizing: border-box; color: #6699cc;">requestServerAuthCode</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">serverToken</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span>
<span class="n" style="box-sizing: border-box;">requestEmail</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span>
<span class="n" style="box-sizing: border-box;">requestScopes</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="n" style="box-sizing: border-box;">Scope</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Scopes</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">PROFILE</span><span class="o" style="box-sizing: border-box; color: #66cccc;">))</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">build</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">mGoogleSignInClient</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">GoogleSignIn</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getClient</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">this</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">gso</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
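<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
The ID token requested with <tt class="docutils literal" style="box-sizing: border-box;">requestIdToken</tt> is a JSON Web Token, so while debugging it can help to look at the claims it carries. The following is a minimal plain-Java sketch, not part of the app above: the class name <tt class="docutils literal" style="box-sizing: border-box;">IdTokenInspector</tt> and its method are hypothetical, and it only decodes the payload without verifying the signature, so it must not be used to authenticate anyone. A backend should verify the token properly before trusting it.</div>

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for debugging only: it does NOT verify the token.
final class IdTokenInspector {

    // Returns the JSON payload (the claims) of a JWT of the form
    // header.payload.signature, without checking the signature.
    static String payloadJson(String idToken) {
        String[] parts = idToken.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected header.payload.signature");
        }
        // JWT segments use the URL-safe Base64 alphabet, usually without padding;
        // java.util.Base64's URL decoder accepts unpadded input.
        byte[] decoded = Base64.getUrlDecoder().decode(parts[1]);
        return new String(decoded, StandardCharsets.UTF_8);
    }
}
```

<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Calling <tt class="docutils literal" style="box-sizing: border-box;">IdTokenInspector.payloadJson(result.getIdToken())</tt> in a debug build would print the claims as a JSON string.</div>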
</li>
<li style="box-sizing: border-box;"><div class="first" ct-id="7" data-meaningful="true" data-skip="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
The method for a fresh sign in is shown below. Note that you can attempt <tt class="docutils literal" style="box-sizing: border-box;">silentSignIn</tt> if the user has already signed in to your app before; that is shown later.</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="kd" style="box-sizing: border-box; color: #cc99cc;">private</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">signIn</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">Intent</span> <span class="n" style="box-sizing: border-box;">intent</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">mGoogleSignInClient</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getSignInIntent</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(!</span><span class="n" style="box-sizing: border-box;">mIntentInProgress</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">mIntentInProgress</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">true</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>
<span class="n" style="box-sizing: border-box;">showProgressDialog</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">startActivityForResult</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">intent</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">RC_SIGN_IN</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
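<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
The <tt class="docutils literal" style="box-sizing: border-box;">mIntentInProgress</tt> flag above keeps the sign in screen from being launched twice, for example when a failed silent sign in and a button tap both call <tt class="docutils literal" style="box-sizing: border-box;">signIn</tt>. The same idea can be sketched in plain Java as a small guard object. The class name <tt class="docutils literal" style="box-sizing: border-box;">SignInGuard</tt> is hypothetical and is only meant to illustrate the pattern, not to replace the field used in the activity.</div>

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration of the "intent in progress" guard.
final class SignInGuard {
    private final AtomicBoolean inProgress = new AtomicBoolean(false);

    // Returns true only for the first caller; later callers get false
    // until finish() is called (for example from onActivityResult).
    boolean tryStart() {
        return inProgress.compareAndSet(false, true);
    }

    // Marks the sign in attempt as finished so a new one may start.
    void finish() {
        inProgress.set(false);
    }

    boolean isInProgress() {
        return inProgress.get();
    }
}
```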
</li>
<li style="box-sizing: border-box;"><div class="first" ct-id="8" data-meaningful="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Handling the result: we check the request code and call the <tt class="docutils literal" style="box-sizing: border-box;">Task&lt;GoogleSignInAccount&gt; getSignedInAccountFromIntent</tt> method with the intent data.</div>
<div ct-id="9" data-meaningful="true" data-skip="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Once we have the task, we call the <tt class="docutils literal" style="box-sizing: border-box;">addOnCompleteListener</tt> method of the <a class="reference external" href="https://developer.android.com/reference/com/google/android/play/core/tasks/Task" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">Task</a> API to check for success or failure, and possibly retry on failure.</div>
<ul style="box-sizing: border-box; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li style="box-sizing: border-box;"><div class="first" ct-id="10" data-meaningful="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
On success, we refresh the UI to show that the user has logged in, and store the <tt class="docutils literal" style="box-sizing: border-box;">client_token</tt> and <tt class="docutils literal" style="box-sizing: border-box;">server_auth_code</tt>. The <tt class="docutils literal" style="box-sizing: border-box;">getProfileInfo</tt> method just extracts the relevant profile information from the signed-in account.</div>
</li>
<li style="box-sizing: border-box;"><div class="first" ct-id="11" data-meaningful="true" data-skip="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
A failure can occur if we attempted <tt class="docutils literal" style="box-sizing: border-box;">silentSignIn</tt> and it failed with the <tt class="docutils literal" style="box-sizing: border-box;">SIGN_IN_REQUIRED</tt> error code; in that case we retry with a fresh sign in. If the request fails with <tt class="docutils literal" style="box-sizing: border-box;">SIGN_IN_FAILED</tt>, we cannot use that account for sign in, and we inform the user.</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span> <span class="nd" style="box-sizing: border-box; color: #66cccc;">@Override</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">protected</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">onActivityResult</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">int</span> <span class="n" style="box-sizing: border-box;">requestCode</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">int</span> <span class="n" style="box-sizing: border-box;">responseCode</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span>
<span class="n" style="box-sizing: border-box;">Intent</span> <span class="n" style="box-sizing: border-box;">data</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
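<span class="k" style="box-sizing: border-box; color: #cc99cc;">super</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">onActivityResult</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">requestCode</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">responseCode</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">data</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="n" style="box-sizing: border-box;">mIntentInProgress</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">false</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>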
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">requestCode</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">==</span> <span class="n" style="box-sizing: border-box;">RC_SIGN_IN</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">hideProgressDialog</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">Task</span><span class="o" style="box-sizing: border-box; color: #66cccc;">&lt;</span><span class="n" style="box-sizing: border-box;">GoogleSignInAccount</span><span class="o" style="box-sizing: border-box; color: #66cccc;">&gt;</span> <span class="n" style="box-sizing: border-box;">task</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span>
<span class="n" style="box-sizing: border-box;">GoogleSignIn</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getSignedInAccountFromIntent</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">data</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="n" style="box-sizing: border-box;">handleSignInResult</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">task</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">false</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">private</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">handleSignInResult</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Task</span><span class="o" style="box-sizing: border-box; color: #66cccc;">&lt;</span><span class="n" style="box-sizing: border-box;">GoogleSignInAccount</span><span class="o" style="box-sizing: border-box; color: #66cccc;">&gt;</span> <span class="n" style="box-sizing: border-box;">task</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">final</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">boolean</span> <span class="n" style="box-sizing: border-box;">silent</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">task</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">addOnCompleteListener</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="n" style="box-sizing: border-box;">OnCompleteListener</span><span class="o" style="box-sizing: border-box; color: #66cccc;">&lt;</span><span class="n" style="box-sizing: border-box;">GoogleSignInAccount</span><span class="o" style="box-sizing: border-box; color: #66cccc;">&gt;()</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="nd" style="box-sizing: border-box; color: #66cccc;">@Override</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">public</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">onComplete</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="nd" style="box-sizing: border-box; color: #66cccc;">@NonNull</span> <span class="n" style="box-sizing: border-box;">Task</span><span class="o" style="box-sizing: border-box; color: #66cccc;">&lt;</span><span class="n" style="box-sizing: border-box;">GoogleSignInAccount</span><span class="o" style="box-sizing: border-box; color: #66cccc;">&gt;</span> <span class="n" style="box-sizing: border-box;">task</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">task</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">isSuccessful</span><span class="o" style="box-sizing: border-box; color: #66cccc;">())</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">GoogleSignInAccount</span> <span class="n" style="box-sizing: border-box;">result</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">task</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getResult</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">isSignedIn</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">true</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>
<span class="n" style="box-sizing: border-box;">invalidateOptionsMenu</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">SharedPreferences</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">Editor</span> <span class="n" style="box-sizing: border-box;">editor</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">preferences</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">edit</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">editor</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">putString</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"client_token"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">result</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getIdToken</span><span class="o" style="box-sizing: border-box; color: #66cccc;">());</span>
<span class="n" style="box-sizing: border-box;">editor</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">putString</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"server_auth_code"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">result</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getServerAuthCode</span><span class="o" style="box-sizing: border-box; color: #66cccc;">());</span>
<span class="n" style="box-sizing: border-box;">editor</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">apply</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">getProfileInfo</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">result</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">else</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">Exception</span> <span class="n" style="box-sizing: border-box;">e</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">task</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getException</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">e</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">instanceof</span> <span class="n" style="box-sizing: border-box;">ApiException</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">ApiException</span> <span class="n" style="box-sizing: border-box;">apiEx</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">ApiException</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="n" style="box-sizing: border-box;">e</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">silent</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">&amp;&amp;</span> <span class="n" style="box-sizing: border-box;">apiEx</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getStatusCode</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">==</span> <span class="n" style="box-sizing: border-box;">GoogleSignInStatusCodes</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">SIGN_IN_REQUIRED</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">signIn</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">apiEx</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getStatusCode</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">==</span> <span class="n" style="box-sizing: border-box;">GoogleSignInStatusCodes</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">SIGN_IN_FAILED</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">FeedReaderApplication</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">showSnackOrToast</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">findViewById</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">R</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">id</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">main_parent_view</span><span class="o" style="box-sizing: border-box; color: #66cccc;">),</span> <span class="n" style="box-sizing: border-box;">R</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">string</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">sign_in_failed</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span>
<span class="kc" style="box-sizing: border-box; color: #cc99cc;">true</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">});</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
</li>
</ul>
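<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
The failure branches described above boil down to a small decision table, which can be sketched and unit tested without any Android classes. This is only an illustration: the class <tt class="docutils literal" style="box-sizing: border-box;">SignInFailurePolicy</tt> and its <tt class="docutils literal" style="box-sizing: border-box;">Action</tt> enum are hypothetical, and the numeric constants are my understanding of the values behind the library's status codes, so in an app you should reference <tt class="docutils literal" style="box-sizing: border-box;">GoogleSignInStatusCodes</tt> rather than these literals.</div>

```java
// Hypothetical, Android-free sketch of the failure handling in handleSignInResult.
final class SignInFailurePolicy {

    // Assumed to mirror the library's status codes; use the library
    // constants (GoogleSignInStatusCodes.*) in real code.
    static final int SIGN_IN_REQUIRED = 4;
    static final int SIGN_IN_FAILED = 12500;

    enum Action { START_INTERACTIVE_SIGN_IN, SHOW_FAILURE_MESSAGE, NONE }

    // silent: whether the failed attempt came from silentSignIn.
    // statusCode: the value returned by ApiException.getStatusCode().
    static Action decide(boolean silent, int statusCode) {
        if (silent && statusCode == SIGN_IN_REQUIRED) {
            // No usable session: fall back to the interactive flow.
            return Action.START_INTERACTIVE_SIGN_IN;
        }
        if (statusCode == SIGN_IN_FAILED) {
            // The account cannot be used; tell the user.
            return Action.SHOW_FAILURE_MESSAGE;
        }
        // Any other failure is ignored, as in the listener above.
        return Action.NONE;
    }
}
```

<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Keeping the decision separate from the listener makes it easy to add further cases later, such as a cancelled sign in, without touching the UI code.</div>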
</li>
<li style="box-sizing: border-box;"><div class="first" ct-id="12" data-meaningful="true" data-skip="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
<tt class="docutils literal" style="box-sizing: border-box;">Silent Sign In</tt>: If the user has signed in earlier and we have already obtained the tokens, we can attempt <tt class="docutils literal" style="box-sizing: border-box;">silentSignIn</tt> instead of starting the <tt class="docutils literal" style="box-sizing: border-box;">sign in</tt> flow. Failures are already handled in the <tt class="docutils literal" style="box-sizing: border-box;">handleSignInResult</tt> method. In addition, <tt class="docutils literal" style="box-sizing: border-box;">silentSignIn</tt> should be our default and should be called in the activity's <tt class="docutils literal" style="box-sizing: border-box;">onResume</tt> method. Before calling <tt class="docutils literal" style="box-sizing: border-box;">silentSignIn</tt>, we can check whether the user is connected to the internet.</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">private</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">attemptSilentSignIn</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">Task</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">GoogleSignInAccount</span><span class="o" style="box-sizing: border-box; color: #66cccc;">></span> <span class="n" style="box-sizing: border-box;">task</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">mGoogleSignInClient</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">silentSignIn</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">handleSignInResult</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">task</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">true</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">private</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">signInUsingNewAPI</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">ConnectionChecker</span> <span class="n" style="box-sizing: border-box;">checker</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="n" style="box-sizing: border-box;">ConnectionChecker</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">FeedReaderApplication</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getAppContext</span><span class="o" style="box-sizing: border-box; color: #66cccc;">());</span>
<span class="kt" style="box-sizing: border-box; color: #ffcc66;">boolean</span> <span class="n" style="box-sizing: border-box;">isConnected</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">checker</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">isConnectedToInternet</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(!</span><span class="n" style="box-sizing: border-box;">isSignedIn</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">&&</span> <span class="n" style="box-sizing: border-box;">isConnected</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">attemptSilentSignIn</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
</li>
<li style="box-sizing: border-box;"><div class="first" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
<tt class="docutils literal" style="box-sizing: border-box;">Signing Out</tt>:</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">public</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">signOutFromGoogle</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">mGoogleSignInClient</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">signOut</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">isSignedIn</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">false</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
</li>
<li style="box-sizing: border-box;"><div class="first" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
<tt class="docutils literal" style="box-sizing: border-box;">Revoking Access</tt>:</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">public</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">revokeGoogleAccess</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">mGoogleSignInClient</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">revokeAccess</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">isSignedIn</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">false</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
</li>
<li style="box-sizing: border-box;"><div class="first" ct-id="13" data-meaningful="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
<tt class="docutils literal" style="box-sizing: border-box;">Finally</tt>, a helper method for extracting user profile information from <tt class="docutils literal" style="box-sizing: border-box;">GoogleSignInAccount</tt> is shown below.</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="kd" style="box-sizing: border-box; color: #cc99cc;">public</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">getProfileInfo</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">GoogleSignInAccount</span> <span class="n" style="box-sizing: border-box;">account</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">String</span> <span class="n" style="box-sizing: border-box;">personName</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">account</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getDisplayName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">String</span> <span class="n" style="box-sizing: border-box;">firstName</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">account</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getGivenName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">String</span> <span class="n" style="box-sizing: border-box;">lastName</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">account</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getFamilyName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">Uri</span> <span class="n" style="box-sizing: border-box;">photoUrl</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">account</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getPhotoUrl</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">String</span> <span class="n" style="box-sizing: border-box;">email</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">account</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getEmail</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
</li>
</ul>
<div class="section" id="conclusions" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Conclusions</h2>
<div ct-id="14" data-meaningful="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
The <a class="reference external" href="https://developer.android.com/reference/com/google/android/play/core/tasks/Task" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">Task</a> API and <tt class="docutils literal" style="box-sizing: border-box;">GoogleSignInClient</tt> make it a lot easier to manage the sign-in process and flow. You don't have to take my word for it; just look at the <a class="reference external" href="http://orastack.com/migrating-to-google-sign-in-with-android.html" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">earlier post</a>.</div>
</div>
</div>
Java WatchService (published 2018-12-21, updated 2019-01-18)
<div dir="ltr" style="text-align: left;" trbidi="on">
<div ct-id="1" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
In this post, I will walk through a tutorial with several moving pieces. It covers the following:</div>
<ul class="simple" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li style="box-sizing: border-box;"><span style="box-sizing: border-box; font-weight: 700;">Java</span> <tt class="docutils literal" style="box-sizing: border-box;">WatchService</tt></li>
<li style="box-sizing: border-box;"><tt class="docutils literal" style="box-sizing: border-box;">Spring Boot</tt></li>
<li style="box-sizing: border-box;"><span style="box-sizing: border-box; font-weight: 700;">Initialization-on-demand holder idiom</span></li>
<li style="box-sizing: border-box;">Managing concurrency</li>
<li style="box-sizing: border-box;"><tt class="docutils literal" style="box-sizing: border-box;">RxJava</tt></li>
<li ct-id="2" data-meaningful="true" style="box-sizing: border-box;">Lombok (because why type more?)</li>
</ul>
<div ct-id="3" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
The example exposes a Spring Boot REST service that serves the records of CSV files in a directory. In addition, a <tt class="docutils literal" style="box-sizing: border-box;">WatchService</tt> monitors the directory for changes, specifically only the creation and removal of <tt class="docutils literal" style="box-sizing: border-box;">CSV</tt> files.</div>
<div class="section" id="let-s-start-with-the-pieces" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;">
<h2 data-skip="true" style="box-sizing: border-box; margin: 0.83em 0px;">
Let's start with the pieces</h2>
<ul class="simple" style="box-sizing: border-box; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li ct-id="4" data-meaningful="true" style="box-sizing: border-box;">We want to access the records of a CSV file. So, the first thing we need to do is either search the directory for the CSV file or maintain an in-memory lookup containing the path of each file. If we choose an in-memory lookup, lookups need to be fast, so a <tt class="docutils literal" style="box-sizing: border-box;">HashMap</tt>-like structure with <code class="java" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;"><span class="n" style="box-sizing: border-box;">Map</span><span class="o" style="box-sizing: border-box;"><</span><span class="n" style="box-sizing: border-box;">String</span><span class="o" style="box-sizing: border-box;">,</span><span class="n" style="box-sizing: border-box;">Path</span><span class="o" style="box-sizing: border-box;">></span></code> should suffice.</li>
<li ct-id="5" data-meaningful="true" style="box-sizing: border-box;">But we also need to update the entries in that map as files are added or deleted, and this happens concurrently from the watch service, so a better data structure is a <code class="java" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;"><span class="n" style="box-sizing: border-box;">ConcurrentHashMap</span><span class="o" style="box-sizing: border-box;"><</span><span class="n" style="box-sizing: border-box;">String</span><span class="o" style="box-sizing: border-box;">,</span><span class="n" style="box-sizing: border-box;">Path</span><span class="o" style="box-sizing: border-box;">></span></code>.</li>
<li style="box-sizing: border-box;"><tt class="docutils literal" style="box-sizing: border-box;">WatchService</tt> needs to be polled periodically in the background. An <tt class="docutils literal" style="box-sizing: border-box;">RxJava</tt> <tt class="docutils literal" style="box-sizing: border-box;">interval</tt> stream with <tt class="docutils literal" style="box-sizing: border-box;">Schedulers.io()</tt> should suffice.</li>
</ul>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span> <span class="n" style="box-sizing: border-box;">Flowable</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">interval</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="mi" style="box-sizing: border-box; color: #f99157;">5</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">TimeUnit</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">SECONDS</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="na" style="box-sizing: border-box; color: #6699cc;">subscribeOn</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Schedulers</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">io</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()).</span><span class="na" style="box-sizing: border-box; color: #6699cc;">forEach</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">tick</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">-></span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span> <span class="cm" style="box-sizing: border-box; color: #999999;">/* do something */</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">});</span>
</pre>
</div>
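<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
To make the three pieces above concrete, here is a minimal sketch that uses only the JDK. It is not the code from this service: the class name <tt class="docutils literal" style="box-sizing: border-box;">CsvDirectoryIndex</tt> and its methods are hypothetical, a <tt class="docutils literal" style="box-sizing: border-box;">ScheduledExecutorService</tt> stands in for the RxJava interval stream, and a file counts as CSV purely by its <tt class="docutils literal" style="box-sizing: border-box;">.csv</tt> extension. Files that already exist when the watcher starts are not indexed in this sketch.</div>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the original service.
class CsvDirectoryIndex {
    // Written by the polling thread, read by request threads.
    private final Map<String, Path> csvMap = new ConcurrentHashMap<>();
    private final Path dir;
    private final WatchService watcher;

    private CsvDirectoryIndex(Path dir, WatchService watcher) {
        this.dir = dir;
        this.watcher = watcher;
    }

    static CsvDirectoryIndex open(Path dir) {
        try {
            WatchService watcher = dir.getFileSystem().newWatchService();
            // Only creation and removal are of interest here.
            dir.register(watcher,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_DELETE);
            return new CsvDirectoryIndex(dir, watcher);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Applies one change to the map; non-CSV files are ignored.
    void apply(WatchEvent.Kind<?> kind, Path fileName) {
        String name = fileName.toString();
        if (!name.toLowerCase(Locale.ROOT).endsWith(".csv")) {
            return;
        }
        if (kind == StandardWatchEventKinds.ENTRY_CREATE) {
            csvMap.put(name, dir.resolve(fileName));
        } else if (kind == StandardWatchEventKinds.ENTRY_DELETE) {
            csvMap.remove(name);
        }
    }

    // Drains all pending events without blocking; intended for a periodic task.
    void pollOnce() {
        WatchKey key;
        while ((key = watcher.poll()) != null) {
            for (WatchEvent<?> event : key.pollEvents()) {
                if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                    continue; // events were lost; a full rescan would be needed
                }
                apply(event.kind(), (Path) event.context());
            }
            key.reset(); // re-arm the key so later events are delivered
        }
    }

    boolean contains(String name) {
        return csvMap.containsKey(name);
    }

    Path get(String name) {
        return csvMap.get(name);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("csv-watch");
        CsvDirectoryIndex index = CsvDirectoryIndex.open(dir);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Stand-in for the 5 second interval stream, shortened for the demo.
        scheduler.scheduleWithFixedDelay(index::pollOnce, 0, 200, TimeUnit.MILLISECONDS);
        Files.createFile(dir.resolve("orders.csv"));
        Thread.sleep(1000);
        System.out.println("orders.csv indexed: " + index.contains("orders.csv"));
        scheduler.shutdownNow();
    }
}
```

<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
How quickly an event shows up depends on the platform's <tt class="docutils literal" style="box-sizing: border-box;">WatchService</tt> implementation, so the demo in <tt class="docutils literal" style="box-sizing: border-box;">main</tt> may print either value on a slow or polling-based file system.</div>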
<ul class="simple" style="box-sizing: border-box; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li ct-id="6" data-meaningful="true" style="box-sizing: border-box;"><span style="box-sizing: border-box; font-weight: 700;">Initialization-on-demand holder idiom</span>: This is used to safely create a singleton instance of our map. It works because the JVM initializes a class lazily, on first use, and guarantees that the initialization runs exactly once even when several threads race to trigger it.</li>
</ul>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="c1" data-skip="true" style="box-sizing: border-box; color: #999999;">//Lazy holder idiom for initializing the singleton map</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">private</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">static</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">class</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">CSVMapping</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">private</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">static</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">final</span> <span class="n" style="box-sizing: border-box;">ResourceBundle</span> <span class="n" style="box-sizing: border-box;">rb</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">ResourceBundle</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getBundle</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"app"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//safe creation</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">private</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">static</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">final</span> <span class="n" style="box-sizing: border-box;">Map</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">Path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">></span> <span class="n" style="box-sizing: border-box;">CSVMap</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">getCSVMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
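<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
The laziness of the holder idiom is easy to observe in isolation. The sketch below is a stand-alone illustration, not the class from this service: <tt class="docutils literal" style="box-sizing: border-box;">CsvRegistry</tt>, <tt class="docutils literal" style="box-sizing: border-box;">Holder</tt> and <tt class="docutils literal" style="box-sizing: border-box;">BUILD_COUNT</tt> are hypothetical names, and the counter exists only to show when the map gets built.</div>

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names; only the shape of the idiom mirrors the post.
class CsvRegistry {
    // Counts how often the map is built, purely to make the laziness visible.
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    private CsvRegistry() {
    }

    // The nested class is not initialized until getMap() first reads Holder.MAP.
    private static class Holder {
        static final Map<String, Path> MAP = build();
    }

    private static Map<String, Path> build() {
        BUILD_COUNT.incrementAndGet();
        return new ConcurrentHashMap<>();
    }

    static Map<String, Path> getMap() {
        return Holder.MAP;
    }

    public static void main(String[] args) {
        System.out.println(BUILD_COUNT.get()); // 0: Holder has not been touched yet
        getMap();
        getMap();
        System.out.println(BUILD_COUNT.get()); // 1: the map was built exactly once
    }
}
```

<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
No <tt class="docutils literal" style="box-sizing: border-box;">synchronized</tt> block or <tt class="docutils literal" style="box-sizing: border-box;">volatile</tt> field is needed, because the class initialization rules already provide the once-only guarantee.</div>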
<ul class="simple" style="box-sizing: border-box; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li ct-id="7" data-meaningful="true" style="box-sizing: border-box;">Reading CSV using RxJava: I covered this in an earlier post.</li>
<li ct-id="8" data-meaningful="true" data-skip="true" style="box-sizing: border-box;">Creating a <tt class="docutils literal" style="box-sizing: border-box;">Spring Boot</tt> <tt class="docutils literal" style="box-sizing: border-box;">REST</tt> Controller: We create a REST controller that checks whether the file name in the <tt class="docutils literal" style="box-sizing: border-box;">GET</tt> request exists in the map. If it does, we collect the CSV records and return them in the response; otherwise, we return a 404 error.</li>
</ul>
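<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
The decision the controller makes can be sketched without Spring or RxJava. The example below is an assumption-laden illustration, not the controller from this service: <tt class="docutils literal" style="box-sizing: border-box;">CsvLookup</tt>, <tt class="docutils literal" style="box-sizing: border-box;">Result</tt> and <tt class="docutils literal" style="box-sizing: border-box;">parseLine</tt> are hypothetical names, the status codes are plain integers, and lines are split naively on commas, so quoted fields are not supported.</div>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical plain-Java stand-in for the REST controller's lookup logic.
class CsvLookup {
    static final int OK = 200;
    static final int NOT_FOUND = 404;

    static final class Result {
        final int status;
        final List<List<String>> records;

        Result(int status, List<List<String>> records) {
            this.status = status;
            this.records = records;
        }
    }

    // Naive split on commas; the -1 limit keeps trailing empty fields.
    static List<String> parseLine(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    static Result lookup(Map<String, Path> index, String fileName) {
        // A single get() avoids a gap between a containsKey() check and the read.
        Path path = index.get(fileName);
        if (path == null) {
            return new Result(NOT_FOUND, Collections.emptyList());
        }
        try (Stream<String> lines = Files.lines(path)) {
            List<List<String>> records = lines
                    .map(CsvLookup::parseLine)
                    .collect(Collectors.toList());
            return new Result(OK, records);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("people", ".csv");
        Files.write(file, Arrays.asList("id,name", "1,Ada"));
        Map<String, Path> index = new ConcurrentHashMap<>();
        index.put("people.csv", file);
        System.out.println(lookup(index, "people.csv").status);  // 200
        System.out.println(lookup(index, "missing.csv").status); // 404
    }
}
```

<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Reading the path with one <tt class="docutils literal" style="box-sizing: border-box;">get</tt> call matters when a watcher thread can remove the entry at any moment: the file may still vanish from disk afterwards, but the map lookup itself cannot return <tt class="docutils literal" style="box-sizing: border-box;">null</tt> after a successful existence check.</div>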
</div>
<div class="section" id="putting-it-all-together" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Putting it all together:</h2>
<div ct-id="9" data-meaningful="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
The code below shows the entire service.</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="nd" style="box-sizing: border-box; color: #66cccc;">@Slf4j</span>
<span class="nd" style="box-sizing: border-box; color: #66cccc;">@RestController</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">public</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">class</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">CSVFileWatcher</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="nd" style="box-sizing: border-box; color: #66cccc;">@GetMapping</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"/getcsv/{fileName}"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">public</span> <span class="n" style="box-sizing: border-box;">ResponseEntity</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">List</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">Iterator</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">>>></span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">readCSVFile</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span>
<span class="nd" style="box-sizing: border-box; color: #66cccc;">@PathVariable</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"fileName"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="n" style="box-sizing: border-box;">String</span> <span class="n" style="box-sizing: border-box;">fileName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">val</span> <span class="n" style="box-sizing: border-box;">dirMap</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">CSVMapping</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">CSVMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">dirMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">containsKey</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fileName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">))</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">return</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="n" style="box-sizing: border-box;">ResponseEntity</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><>(</span>
<span class="n" style="box-sizing: border-box;">CSVUtil</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">readRecordsFromFile</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">dirMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">get</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fileName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">),</span> <span class="n" style="box-sizing: border-box;">CSVFormat</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">DEFAULT</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">r</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">-></span> <span class="n" style="box-sizing: border-box;">r</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">iterator</span><span class="o" style="box-sizing: border-box; color: #66cccc;">())</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">subscribeOn</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Schedulers</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">io</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()).</span><span class="na" style="box-sizing: border-box; color: #6699cc;">toList</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">blockingGet</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(),</span>
<span class="n" style="box-sizing: border-box;">HttpStatus</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">OK</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">else</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">return</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="n" style="box-sizing: border-box;">ResponseEntity</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><>(</span><span class="n" style="box-sizing: border-box;">HttpStatus</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">NOT_FOUND</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="c1" data-skip="true" style="box-sizing: border-box; color: #999999;">//Lazy holder idiom for initializing the singleton map</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">private</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">static</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">class</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">CSVMapping</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">private</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">static</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">final</span> <span class="n" style="box-sizing: border-box;">ResourceBundle</span> <span class="n" style="box-sizing: border-box;">rb</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">ResourceBundle</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getBundle</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"app"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">private</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">static</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">final</span> <span class="n" style="box-sizing: border-box;">Map</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">Path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">></span> <span class="n" style="box-sizing: border-box;">CSVMap</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">getCSVMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">private</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">static</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">boolean</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">isCSV</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Path</span> <span class="n" style="box-sizing: border-box;">path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">try</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">val</span> <span class="n" style="box-sizing: border-box;">contentType</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">Files</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">probeContentType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">contentType</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">!=</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">null</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">&&</span> <span class="n" style="box-sizing: border-box;">contentType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">equals</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"text/csv"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">))</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">return</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">true</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">catch</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">IOException</span> <span class="n" style="box-sizing: border-box;">e</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">log</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">error</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" data-skip="true" style="box-sizing: border-box; color: #99cc99;">"Unable to probe content type"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">e</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">return</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">false</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">private</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">static</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">registerFileWatcher</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Path</span> <span class="n" style="box-sizing: border-box;">path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">Map</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">Path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">></span> <span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">try</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">val</span> <span class="n" style="box-sizing: border-box;">watchService</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">FileSystems</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getDefault</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span><span class="na" style="box-sizing: border-box; color: #6699cc;">newWatchService</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">register</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">watchService</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">ENTRY_CREATE</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">ENTRY_DELETE</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="n" style="box-sizing: border-box;">Flowable</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">interval</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="mi" style="box-sizing: border-box; color: #f99157;">5</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">TimeUnit</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">SECONDS</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">Schedulers</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">io</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()).</span><span class="na" style="box-sizing: border-box; color: #6699cc;">forEach</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">t</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">-></span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">val</span> <span class="n" style="box-sizing: border-box;">key</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">watchService</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">poll</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">key</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">!=</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">null</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">for</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">WatchEvent</span> <span class="n" style="box-sizing: border-box;">event</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">:</span> <span class="n" style="box-sizing: border-box;">key</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">pollEvents</span><span class="o" style="box-sizing: border-box; color: #66cccc;">())</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">val</span> <span class="n" style="box-sizing: border-box;">kind</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">event</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">kind</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">kind</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">==</span> <span class="n" style="box-sizing: border-box;">OVERFLOW</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">continue</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="n" style="box-sizing: border-box;">val</span> <span class="n" style="box-sizing: border-box;">fileEvent</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">WatchEvent</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">Path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">>)</span> <span class="n" style="box-sizing: border-box;">event</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>
<span class="n" style="box-sizing: border-box;">Path</span> <span class="n" style="box-sizing: border-box;">dir</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="n" style="box-sizing: border-box;">key</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">watchable</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="n" style="box-sizing: border-box;">Path</span> <span class="n" style="box-sizing: border-box;">fullPath</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">dir</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">resolve</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fileEvent</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">context</span><span class="o" style="box-sizing: border-box; color: #66cccc;">());</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">kind</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">==</span> <span class="n" style="box-sizing: border-box;">ENTRY_CREATE</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">&&</span> <span class="n" style="box-sizing: border-box;">isCSV</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fullPath</span><span class="o" style="box-sizing: border-box; color: #66cccc;">))</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">log</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">debug</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" data-skip="true" style="box-sizing: border-box; color: #99cc99;">"New CSV file detected {}"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">fullPath</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">toString</span><span class="o" style="box-sizing: border-box; color: #66cccc;">());</span>
<span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">put</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fullPath</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getFileName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span><span class="na" style="box-sizing: border-box; color: #6699cc;">toString</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(),</span>
<span class="n" style="box-sizing: border-box;">fullPath</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">else</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">kind</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">==</span> <span class="n" style="box-sizing: border-box;">ENTRY_DELETE</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">&&</span> <span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">remove</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fullPath</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getFileName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span><span class="na" style="box-sizing: border-box; color: #6699cc;">toString</span><span class="o" style="box-sizing: border-box; color: #66cccc;">())</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">!=</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">null</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//A deleted file can no longer be probed reliably, so rely on the map entry instead</span>
<span class="n" style="box-sizing: border-box;">log</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">debug</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"CSV file {} deleted"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">fullPath</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">toString</span><span class="o" style="box-sizing: border-box; color: #66cccc;">());</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="n" style="box-sizing: border-box;">key</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">reset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">();</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">});</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">catch</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">IOException</span> <span class="n" style="box-sizing: border-box;">e</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">log</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">error</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"Unable to register the file watcher"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">e</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//ignore or throw</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">private</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">static</span> <span class="n" style="box-sizing: border-box;">Map</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">Path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">></span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">getCSVMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">throws</span> <span class="n" style="box-sizing: border-box;">RuntimeException</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">final</span> <span class="n" style="box-sizing: border-box;">Map</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">Path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">></span> <span class="n" style="box-sizing: border-box;">CSVMapping</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="n" style="box-sizing: border-box;">ConcurrentHashMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">Path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">>();</span>
<span class="n" style="box-sizing: border-box;">val</span> <span class="n" style="box-sizing: border-box;">path</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">Paths</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">get</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">rb</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getString</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"directory_loc"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">));</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">try</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Stream</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">Path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">></span> <span class="n" style="box-sizing: border-box;">files</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">Files</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">list</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="na" style="box-sizing: border-box; color: #6699cc;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">::</span><span class="n" style="box-sizing: border-box;">toAbsolutePath</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">filter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">CSVFileWatcher</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">CSVMapping</span><span class="o" style="box-sizing: border-box; color: #66cccc;">::</span><span class="n" style="box-sizing: border-box;">isCSV</span><span class="o" style="box-sizing: border-box; color: #66cccc;">))</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">files</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">parallel</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">forEach</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">p</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">-></span> <span class="n" style="box-sizing: border-box;">CSVMapping</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">put</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">p</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">getFileName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span><span class="na" style="box-sizing: border-box; color: #6699cc;">toString</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(),</span> <span class="n" style="box-sizing: border-box;">p</span><span class="o" style="box-sizing: border-box; color: #66cccc;">));</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">catch</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">IOException</span> <span class="n" style="box-sizing: border-box;">e</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">log</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">error</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"Unable to list the CSV directory"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">e</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">throw</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="n" style="box-sizing: border-box;">RuntimeException</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">e</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="n" style="box-sizing: border-box;">registerFileWatcher</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">path</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">CSVMapping</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">return</span> <span class="n" style="box-sizing: border-box;">CSVMapping</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span></pre>
</div>
</div>
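<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
The <tt class="docutils literal" style="box-sizing: border-box;">CSVMapping</tt> class above relies on the lazy holder idiom: the JVM initializes a nested class only on first use and guarantees that its static initializer runs exactly once, so the map is built lazily and thread-safely without explicit locking. Here is a minimal, self-contained sketch of just that idiom. <tt class="docutils literal" style="box-sizing: border-box;">LazyHolderDemo</tt>, <tt class="docutils literal" style="box-sizing: border-box;">Holder</tt>, <tt class="docutils literal" style="box-sizing: border-box;">INIT_COUNT</tt> and the sample entry are hypothetical names used only for illustration, and the map holds plain strings instead of <tt class="docutils literal" style="box-sizing: border-box;">Path</tt> values.</div>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of the original application.
class LazyHolderDemo {
    // Counts how often the map is built; exists only to make the behaviour visible.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    // Nested holder: initialized by the JVM on first access to REGISTRY, exactly once.
    private static class Holder {
        private static final Map<String, String> REGISTRY = build();

        private static Map<String, String> build() {
            INIT_COUNT.incrementAndGet();
            Map<String, String> map = new ConcurrentHashMap<>();
            map.put("sample.csv", "/tmp/sample.csv"); // placeholder entry
            return map;
        }
    }

    // The first call triggers Holder initialization; later calls reuse the same map.
    static Map<String, String> registry() {
        return Holder.REGISTRY;
    }

    public static void main(String[] args) {
        System.out.println(INIT_COUNT.get()); // prints 0: the holder has not been touched yet
        registry();
        registry();
        System.out.println(INIT_COUNT.get()); // prints 1: the map was built once
    }
}
```

<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Because class initialization is synchronized by the JVM, concurrent first callers of <tt class="docutils literal" style="box-sizing: border-box;">registry()</tt> all see the same fully built map.</div>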
</div>
Reactively Streaming CSV using RxJava (published 2018-12-15, updated 2019-01-18)
<div dir="ltr" style="text-align: left;" trbidi="on">
<div ct-id="1" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
<a class="reference external" href="https://github.com/ReactiveX/RxJava" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">RxJava</a> is an extremely useful streaming framework; for example, the <a class="reference external" href="https://github.com/ramannanda9/RT-UBER-NYC-TAXI/tree/master/QueryAPIAndStoreCSV" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">RT_UBER_NYC_TAXI</a> application uses it to process RESTful calls to both Uber and Lyft in parallel. In this post, I will cover how you can reactively stream and process a CSV file.</div>
<div ct-id="2" data-meaningful="true" data-skip="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
First, you can create a <tt class="docutils literal" style="box-sizing: border-box;">Flowable</tt> of <tt class="docutils literal" style="box-sizing: border-box;">CSVRecord</tt> (commons-csv) by converting an <tt class="docutils literal" style="box-sizing: border-box;">Iterable</tt> of records to a <tt class="docutils literal" style="box-sizing: border-box;">Flowable</tt> with <tt class="docutils literal" style="box-sizing: border-box;">Flowable.fromIterable()</tt>. Next, we want resource usage to be safe, i.e. we don't want to leave file handles open, so we use the resource-safe <tt class="docutils literal" style="box-sizing: border-box;">Flowable.using(resourceSupplier, sourceSupplier, resourceDisposer)</tt> method, whose last argument disposes of the resource once the stream terminates or is cancelled.</div>
<table class="highlighttable" style="border-collapse: collapse; border-spacing: 0px; box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;"><tbody style="box-sizing: border-box;">
<tr style="box-sizing: border-box;"><td class="code" style="box-sizing: border-box;"><div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="kd" style="box-sizing: border-box; color: #cc99cc;">public</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">static</span> <span class="n" style="box-sizing: border-box;">Flowable</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">CSVRecord</span><span class="o" style="box-sizing: border-box; color: #66cccc;">></span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">readRecordsFromFile</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Path</span> <span class="n" style="box-sizing: border-box;">inputFile</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span><span class="n" style="box-sizing: border-box;">CSVFormat</span> <span class="n" style="box-sizing: border-box;">csvFormat</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">return</span> <span class="n" style="box-sizing: border-box;">Flowable</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">using</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(()</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">-></span> <span class="n" style="box-sizing: border-box;">Files</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">newBufferedReader</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">inputFile</span><span class="o" style="box-sizing: border-box; color: #66cccc;">),</span>
<span class="n" style="box-sizing: border-box;">bufferedReader</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">-></span> <span class="n" style="box-sizing: border-box;">csvRecordFlowable</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">bufferedReader</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">csvFormat</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">withHeader</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()),</span>
<span class="n" style="box-sizing: border-box;">BufferedReader</span><span class="o" style="box-sizing: border-box; color: #66cccc;">::</span><span class="n" style="box-sizing: border-box;">close</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">private</span> <span class="kd" style="box-sizing: border-box; color: #cc99cc;">static</span> <span class="n" style="box-sizing: border-box;">Flowable</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">CSVRecord</span><span class="o" style="box-sizing: border-box; color: #66cccc;">></span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">csvRecordFlowable</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">BufferedReader</span> <span class="n" style="box-sizing: border-box;">br</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">CSVFormat</span> <span class="n" style="box-sizing: border-box;">csvFormat</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">try</span><span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">final</span> <span class="n" style="box-sizing: border-box;">CSVParser</span> <span class="n" style="box-sizing: border-box;">csvParser</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="n" style="box-sizing: border-box;">CSVParser</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">br</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span><span class="n" style="box-sizing: border-box;">csvFormat</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">return</span> <span class="n" style="box-sizing: border-box;">Flowable</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">fromIterable</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(()</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">-></span> <span class="n" style="box-sizing: border-box;">csvParser</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">iterator</span><span class="o" style="box-sizing: border-box; color: #66cccc;">());</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">catch</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">IOException</span> <span class="n" style="box-sizing: border-box;">e</span><span class="o" style="box-sizing: border-box; color: #66cccc;">){</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">throw</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="n" style="box-sizing: border-box;">RuntimeException</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">e</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
</td></tr>
</tbody></table>
<div ct-id="3" data-meaningful="true" data-skip="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
This nicely sets up a <tt class="docutils literal" style="box-sizing: border-box;">Flowable&lt;CSVRecord&gt;</tt> that can then be processed in different ways, and you get all the <tt class="docutils literal" style="box-sizing: border-box;">Flowable</tt> features such as backpressure. Example usage is shown below.</div>
<table class="highlighttable" style="border-collapse: collapse; border-spacing: 0px; box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;"><tbody style="box-sizing: border-box;">
<tr style="box-sizing: border-box;"><td class="code" style="box-sizing: border-box;"><div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="n" style="box-sizing: border-box;">readRecordsFromFile</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Paths</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">get</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"sample.csv"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">),</span> <span class="n" style="box-sizing: border-box;">CSVFormat</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">DEFAULT</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="na" style="box-sizing: border-box; color: #6699cc;">parallel</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span>
<span class="n" style="box-sizing: border-box;">runOn</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Schedulers</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">io</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()).</span><span class="na" style="box-sizing: border-box; color: #6699cc;">filter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="cm" style="box-sizing: border-box; color: #999999;">/*filter here*/</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="na" style="box-sizing: border-box; color: #6699cc;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="cm" style="box-sizing: border-box; color: #999999;">/*map here*/</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="na" style="box-sizing: border-box; color: #6699cc;">sequential</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span><span class="na" style="box-sizing: border-box; color: #6699cc;">subscribe</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="n" style="box-sizing: border-box;">Subscriber</span><span class="o" style="box-sizing: border-box; color: #66cccc;"><</span><span class="n" style="box-sizing: border-box;">CSVRecord</span><span class="o" style="box-sizing: border-box; color: #66cccc;">>(){</span>
<span class="n" style="box-sizing: border-box;">Subscription</span> <span class="n" style="box-sizing: border-box;">sub</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">null</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>
<span class="nd" style="box-sizing: border-box; color: #66cccc;">@Override</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">public</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">onSubscribe</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Subscription</span> <span class="n" style="box-sizing: border-box;">s</span><span class="o" style="box-sizing: border-box; color: #66cccc;">){</span>
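<span class="n" style="box-sizing: border-box;">sub</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="n" style="box-sizing: border-box;">s</span><span class="o" style="box-sizing: border-box; color: #66cccc;">;</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//request the first item; nothing is delivered until demand is signalled</span>
<span class="n" style="box-sizing: border-box;">sub</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">request</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="mi" style="box-sizing: border-box; color: #f99157;">1</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>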
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="nd" style="box-sizing: border-box; color: #66cccc;">@Override</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">public</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">onNext</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">CSVRecord</span> <span class="n" style="box-sizing: border-box;">record</span><span class="o" style="box-sizing: border-box; color: #66cccc;">){</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//do something here</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//request next item</span>
<span class="n" style="box-sizing: border-box;">sub</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="na" style="box-sizing: border-box; color: #6699cc;">request</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="mi" style="box-sizing: border-box; color: #f99157;">1</span><span class="o" style="box-sizing: border-box; color: #66cccc;">);</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="nd" style="box-sizing: border-box; color: #66cccc;">@Override</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">public</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">onError</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">Throwable</span> <span class="n" style="box-sizing: border-box;">t</span><span class="o" style="box-sizing: border-box; color: #66cccc;">){</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//handle error</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="nd" style="box-sizing: border-box; color: #66cccc;">@Override</span>
<span class="kd" style="box-sizing: border-box; color: #cc99cc;">public</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">void</span> <span class="nf" style="box-sizing: border-box; color: #6699cc;">onComplete</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(){</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">});</span></pre>
</div>
</td></tr>
</tbody></table>
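<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
The demand-driven handshake between a subscriber and its stream can be tried out without any extra dependency. The following is only a minimal sketch: it swaps the <tt class="docutils literal" style="box-sizing: border-box;">Flowable</tt> for the JDK's own <tt class="docutils literal" style="box-sizing: border-box;">SubmissionPublisher</tt> and the <tt class="docutils literal" style="box-sizing: border-box;">java.util.concurrent.Flow</tt> interfaces (Java 9+), which mirror the Reactive Streams contract. The class and method names (<tt class="docutils literal" style="box-sizing: border-box;">OneAtATimeDemo</tt>, <tt class="docutils literal" style="box-sizing: border-box;">run</tt>) and the sample lines are made up for illustration.</div>

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it uses the JDK Flow API instead of RxJava.
final class OneAtATimeDemo {

    /** Collects items while pulling exactly one item at a time from upstream. */
    static final class OneAtATimeSubscriber implements Flow.Subscriber<String> {
        final List<String> received = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription sub;

        @Override
        public void onSubscribe(Flow.Subscription s) {
            sub = s;
            s.request(1); // signal initial demand, otherwise no item is delivered
        }

        @Override
        public void onNext(String line) {
            received.add(line); // do something with the item
            sub.request(1);     // then ask for the next one
        }

        @Override
        public void onError(Throwable t) {
            done.countDown();
        }

        @Override
        public void onComplete() {
            done.countDown();
        }
    }

    /** Publishes the given lines and returns what the subscriber received. */
    static List<String> run(List<String> lines) throws InterruptedException {
        OneAtATimeSubscriber subscriber = new OneAtATimeSubscriber();
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            lines.forEach(publisher::submit);
        } // close() signals onComplete once the buffered items are consumed
        subscriber.done.await(5, TimeUnit.SECONDS);
        return List.copyOf(subscriber.received);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of("id,name", "1,alice", "2,bob")));
    }
}
```

<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Requesting one item per <tt class="docutils literal" style="box-sizing: border-box;">onNext</tt> call keeps the subscriber in control of the pace; a larger request size trades that control for fewer round trips.</div>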
</div>
Spark Scaling to large datasets (published 2018-12-15, updated 2019-01-18)<div dir="ltr" style="text-align: left;" trbidi="on">
<div ct-id="1" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
In this post, I will share a few quick tips about scaling your <tt class="docutils literal" style="box-sizing: border-box;">Spark</tt> applications to larger datasets without having large executor memory.</div>
<ul class="simple" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li ct-id="2" data-meaningful="true" data-skip="true" style="box-sizing: border-box;"><span style="box-sizing: border-box; font-weight: 700;">Increase shuffle partitions</span>: The default number of shuffle partitions is 200; for larger datasets, you are better off with a larger number. This helps in two ways. Firstly, it avoids OOM errors on executors because it reduces the size of each shuffle partition. Secondly, it speeds up operations such as <tt class="docutils literal" style="box-sizing: border-box;">hashpartition</tt> because there are more buckets. The config option for spark submit is <code class="shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">--conf spark.sql.shuffle.partitions<span class="o" style="box-sizing: border-box;">=</span>&lt;value&gt;</code></li>
<li ct-id="3" data-meaningful="true" data-skip="true" style="box-sizing: border-box;"><span data-skip="true" style="box-sizing: border-box; font-weight: 700;">Specify the key for</span> <tt class="docutils literal" style="box-sizing: border-box;">repartition</tt>: If you do not pass any partitioning columns, <tt class="docutils literal" style="box-sizing: border-box;">repartition</tt> spreads the rows across partitions round-robin, without regard to their content, but with your queries, you know better. For wide datasets, hash partitioning on a key identifier column keeps rows with the same key in the same partition, which helps later joins and aggregations on that key. <code class="scala" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;"><span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box;">.</span><span class="n" style="box-sizing: border-box;">repartition</span><span class="o" style="box-sizing: border-box;">(</span><span class="n" style="box-sizing: border-box;">num_partitions</span><span class="o" style="box-sizing: border-box;">,</span> <span class="n" style="box-sizing: border-box;">partitionByCols</span><span class="k" style="box-sizing: border-box;">:_</span><span class="kt" style="box-sizing: border-box;">*</span><span class="o" style="box-sizing: border-box;">)</span></code></li>
<li ct-id="4" data-meaningful="true" style="box-sizing: border-box;"><span style="box-sizing: border-box; font-weight: 700;">Increase</span> <tt class="docutils literal" style="box-sizing: border-box;">Overhead</tt> <span style="box-sizing: border-box; font-weight: 700;">memory</span>: Spark uses overhead memory for interned strings, VM overheads, and other native overheads. Increase it to a slightly larger value if you see tasks failing with explicit errors telling you to increase this value. A value of around 12 percent of executor memory usually suffices. <code class="shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">--conf spark.executor.memoryOverhead<span class="o" style="box-sizing: border-box;">=</span>&lt;value in MBs&gt;</code>.</li>
<li ct-id="5" data-meaningful="true" style="box-sizing: border-box;"><span data-skip="true" style="box-sizing: border-box; font-weight: 700;">Avoid caching of datasets, unless necessary</span>: The rule is simple: a cached dataset takes up memory and storage space (since Spark 2.0, cache is an alias for persist), which leaves less memory for other operations such as shuffle.</li>
<li ct-id="6" data-meaningful="true" style="box-sizing: border-box;"><span data-skip="true" style="box-sizing: border-box; font-weight: 700;">Have smaller and more executors, rather than larger and few executors</span>: The importance of this cannot be overstated. Always keep executor cores and memory small and run more executors. A good rule of thumb that I often use is <tt class="docutils literal" style="box-sizing: border-box;">25G</tt> executor memory and <tt class="docutils literal" style="box-sizing: border-box;">5 cores</tt> per executor, plus <tt class="docutils literal" style="box-sizing: border-box;">2700mb</tt> of overhead memory per executor.</li>
<li ct-id="7" data-meaningful="true" style="box-sizing: border-box;">While saving the data, repartition it by a key that will be used for accessing it later.</li>
<li ct-id="8" data-meaningful="true" style="box-sizing: border-box;"><span data-skip="true" style="box-sizing: border-box; font-weight: 700;">Use efficient file storage formats</span>: The format that gives the best performance is <a class="reference external" href="https://orc.apache.org/" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">ORC</a>.</li>
<li style="box-sizing: border-box;"><span style="box-sizing: border-box; font-weight: 700;">If using</span> <tt class="docutils literal" style="box-sizing: border-box;">ORC</tt>, <span style="box-sizing: border-box; font-weight: 700;">use native orc with</span> <tt class="docutils literal" style="box-sizing: border-box;">spark 2.3.0+</tt>. <code class="shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">--conf spark.sql.orc.impl<span class="o" style="box-sizing: border-box;">=</span>native</code></li>
<li ct-id="9" data-meaningful="true" data-skip="true" style="box-sizing: border-box;"><span style="box-sizing: border-box; font-weight: 700;">Increase</span> <tt class="docutils literal" style="box-sizing: border-box;">ReservedCodeCacheSize</tt>: As you may know, Spark uses <tt class="docutils literal" style="box-sizing: border-box;">wholestagecodegen</tt> to generate code for dataframe operations. For wider transformations, the generated code is large, so when the HotSpot compiler tries to compile it into native code, it may cause warnings or errors. You would certainly benefit from a larger ReservedCodeCacheSize. Pass this in <tt class="docutils literal" style="box-sizing: border-box;">extraJavaOptions</tt> to both executor and driver: <code class="shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">-XX:ReservedCodeCacheSize<span class="o" style="box-sizing: border-box;">=</span>&lt;value&gt;</code>. <tt class="docutils literal" style="box-sizing: border-box;">250 MB</tt> should be fine for most of the generated code.</li>
<li ct-id="10" data-meaningful="true" data-skip="true" style="box-sizing: border-box;"><span style="box-sizing: border-box; font-weight: 700;">GC tuning</span>: You can tune garbage collection to avoid full GCs and reduce GC time. Use the low-pause <a class="reference external" href="https://www.oracle.com/technetwork/articles/java/g1gc-1984535.html" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">G1GC</a> with more concurrent and parallel GC threads and a lower initiating heap occupancy percent. Pass these in <tt class="docutils literal" style="box-sizing: border-box;">extraJavaOptions</tt>. <code class="shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">-XX:+UseG1GC -XX:ConcGCThreads=&lt;value&gt; -XX:ParallelGCThreads=&lt;value&gt; -XX:InitiatingHeapOccupancyPercent<span class="o" style="box-sizing: border-box;">=</span><span class="m" style="box-sizing: border-box;">30</span></code>. Note that the number of concurrent GC threads should be less than the number of parallel GC threads.</li>
</ul>
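<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
To see how the sizing tips fit together, here is a small sketch that derives the overhead memory from the executor memory and prints the matching <tt class="docutils literal" style="box-sizing: border-box;">spark-submit</tt> flags. The class <tt class="docutils literal" style="box-sizing: border-box;">SparkSubmitFlags</tt>, its methods and the shuffle partition count of 2000 are placeholders invented for illustration; the 12 percent overhead, 25G, 5 cores, 250 MB code cache and occupancy percent of 30 come from the list above. The GC thread counts are left out because they depend on the cores available to each executor.</div>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it only assembles command-line flags and does not need Spark.
final class SparkSubmitFlags {

    /** Overhead memory in MB, as a percentage of the executor memory given in GB. */
    static long overheadMb(int executorMemoryGb, int percent) {
        return executorMemoryGb * 1024L * percent / 100;
    }

    /** Builds the flags discussed in the list; every size is an input, not a recommendation. */
    static List<String> flags(int executorMemoryGb, int executorCores, int shufflePartitions) {
        String jvmOptions = "-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=30"
                + " -XX:ReservedCodeCacheSize=250m";
        List<String> flags = new ArrayList<>();
        flags.add("--executor-memory " + executorMemoryGb + "G");
        flags.add("--executor-cores " + executorCores);
        // A value without a unit suffix is interpreted as MiB.
        flags.add("--conf spark.executor.memoryOverhead=" + overheadMb(executorMemoryGb, 12));
        flags.add("--conf spark.sql.shuffle.partitions=" + shufflePartitions);
        flags.add("--conf spark.sql.orc.impl=native");
        // Quoted because the JVM options contain spaces.
        flags.add("--conf \"spark.executor.extraJavaOptions=" + jvmOptions + "\"");
        flags.add("--conf \"spark.driver.extraJavaOptions=" + jvmOptions + "\"");
        return flags;
    }

    public static void main(String[] args) {
        // 2000 shuffle partitions is an arbitrary placeholder value.
        System.out.println("spark-submit " + String.join(" \\\n  ", flags(25, 5, 2000)));
    }
}
```

<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
For a 25G executor, 12 percent works out to 3072 MB, slightly above the 2700mb I usually start with, so treat the percentage as an upper starting point and adjust it based on the errors you actually see.</div>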
</div>
Removing Projection Column Ambiguity in Spark (published 2018-04-12, updated 2019-01-18)<div dir="ltr" style="text-align: left;" trbidi="on">
<div ct-id="1" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
<span style="box-sizing: border-box; font-weight: 700;">Column ambiguity</span> is quite common when you <tt class="docutils literal" style="box-sizing: border-box;">join</tt> two tables. It becomes an unnecessary hassle when you want to select all the columns from both tables while discarding the duplicate columns. The problem is especially difficult to handle with wide tables, where you want to avoid typing out the column names.</div>
<div ct-id="2" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
There are a couple of programmatic solutions to the problem, both essentially do the same thing, but achieve the results differently.</div>
<ul class="simple" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li ct-id="3" data-meaningful="true" data-skip="true" style="box-sizing: border-box;">Either execute sql traditionally using <code class="inlinecode scala" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;"><span class="n" style="box-sizing: border-box;">ss</span><span class="o" style="box-sizing: border-box;">.</span><span class="n" style="box-sizing: border-box;">sql</span><span class="o" style="box-sizing: border-box;">(</span><span class="n" style="box-sizing: border-box;">query</span><span class="o" style="box-sizing: border-box;">)</span></code>, and then manually transform the Dataset by converting to RDD, dropping duplicate column names, or</li>
<li ct-id="4" data-meaningful="true" style="box-sizing: border-box;">Implement the query execution in a similar way as Spark does, drop the duplicate column names and then create the <tt class="docutils literal" style="box-sizing: border-box;">Dataset</tt>. This avoids the unnecessary conversion and defers creation of the <tt class="docutils literal" style="box-sizing: border-box;">Dataset</tt> until the duplicates are dropped.</li>
</ul>
<div ct-id="5" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
<span style="box-sizing: border-box; font-weight: 700;">Note:</span> There is a big trade-off between these approaches: the first one works for all scenarios, whereas the second one uses <tt class="docutils literal" style="box-sizing: border-box;">executeCollectPublic</tt>, which collects every row on the driver and therefore has a high probability of causing heap errors.</div>
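<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Both approaches share the same core step: keep the first occurrence of every column name and project each row onto those positions. A plain Java sketch of that step, independent of Spark, could look as follows. The class <tt class="docutils literal" style="box-sizing: border-box;">DuplicateColumns</tt> and its methods are hypothetical, and the sketch assumes that column names are compared exactly (case-sensitively), which may differ from how your Spark session resolves names.</div>

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of dropping duplicate columns; not part of the Spark API.
final class DuplicateColumns {

    /** Returns the positions of the first occurrence of each column name, in order. */
    static List<Integer> firstOccurrences(List<String> columnNames) {
        Set<String> seen = new HashSet<>();
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < columnNames.size(); i++) {
            if (seen.add(columnNames.get(i))) { // add() is false for a repeated name
                keep.add(i);
            }
        }
        return keep;
    }

    /** Keeps only the values at the given positions of a row. */
    static List<Object> project(List<?> row, List<Integer> keep) {
        List<Object> out = new ArrayList<>(keep.size());
        for (int index : keep) {
            out.add(row.get(index));
        }
        return out;
    }

    public static void main(String[] args) {
        // Joined schema in which "id" appears once per side of the join.
        List<String> columns = List.of("id", "name", "id", "city");
        List<Integer> keep = firstOccurrences(columns);
        System.out.println(project(columns, keep));
        System.out.println(project(List.of(7, "alice", 7, "Pune"), keep));
    }
}
```

<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Computing the positions once from the schema and reusing them for every row avoids a name lookup per value.</div>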
<div class="section" id="solution-2" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Solution 2:</h2>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="k" style="box-sizing: border-box; color: #cc99cc;">def</span> <span class="n" style="box-sizing: border-box;">sqlDropDuplicateColumns</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">query</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">ss</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">SparkSession</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Dataset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">logicalPlan</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">ss</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">sessionState</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">sqlParser</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">parsePlan</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">sqlText</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">query</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">qe</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">ss</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">sessionState</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">executePlan</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">logicalPlan</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//Assert plan is valid</span>
<span class="n" style="box-sizing: border-box;">qe</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">assertAnalyzed</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">ep</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">qe</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">executedPlan</span>
<span class="c1" data-skip="true" style="box-sizing: border-box; color: #999999;">//Drop duplicate column names</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">schema</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">schemaToSet</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">ep</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">schema</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">rows</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">ep</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">executeCollectPublic</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">r</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">lb</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">ListBuffer</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Any</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span>
<span class="n" style="box-sizing: border-box;">schema</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">foreach</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">sf</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="n" style="box-sizing: border-box;">lb</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">+=</span> <span class="n" style="box-sizing: border-box;">r</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">getAs</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">sf</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">name</span><span class="o" style="box-sizing: border-box; color: #66cccc;">))</span>
<span class="nc" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">lb</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">*</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">})</span>
<span class="n" style="box-sizing: border-box;">ss</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">createDataFrame</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">ss</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">sparkContext</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">parallelize</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">rows</span><span class="o" style="box-sizing: border-box; color: #66cccc;">),</span> <span class="n" style="box-sizing: border-box;">schema</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
</div>
<div class="section" id="solution-1" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Solution 1:</h2>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="k" style="box-sizing: border-box; color: #cc99cc;">def</span> <span class="n" style="box-sizing: border-box;">sqlDropDuplicateColumnsRDD</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">query</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">ss</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">SparkSession</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Dataset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">df</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span><span class="n" style="box-sizing: border-box;">ss</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">sql</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">query</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="c1" data-skip="true" style="box-sizing: border-box; color: #999999;">//Drop duplicate column names</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">schema</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">schemaToSet</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">schema</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">rdd</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">rdd</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">r</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">lb</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">ListBuffer</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Any</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span>
<span class="n" style="box-sizing: border-box;">schema</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">foreach</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">sf</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="n" style="box-sizing: border-box;">lb</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">+=</span> <span class="n" style="box-sizing: border-box;">r</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">getAs</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">sf</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">name</span><span class="o" style="box-sizing: border-box; color: #66cccc;">))</span>
<span class="nc" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">lb</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">*</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">})</span>
<span class="n" style="box-sizing: border-box;">ss</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">createDataFrame</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">rdd</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">schema</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
</div>
<div class="section" id="helper-method" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Helper method:</h2>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="k" style="box-sizing: border-box; color: #cc99cc;">def</span> <span class="n" style="box-sizing: border-box;">schemaToSet</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">schema</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">StructType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">StructType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">={</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">schemaMap</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">=new</span> <span class="n" style="box-sizing: border-box;">mutable</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">LinkedHashMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span>,<span class="kt" style="box-sizing: border-box; color: #ffcc66;">StructField</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]()</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">for</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">sf</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">&lt;-</span> <span class="n" style="box-sizing: border-box;">schema</span><span class="o" style="box-sizing: border-box; color: #66cccc;">){</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(!</span><span class="n" style="box-sizing: border-box;">schemaMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">contains</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">sf</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">name</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">toLowerCase</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)){</span>
<span class="n" style="box-sizing: border-box;">schemaMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">put</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">sf</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">name</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">toLowerCase</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span><span class="n" style="box-sizing: border-box;">sf</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="nc" style="box-sizing: border-box; color: #ffcc66;">StructType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">schemaMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">values</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">toArray</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span></pre>
</div>
</div>
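<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
To see the de-duplication idea in isolation, here is a minimal sketch that needs only the Spark SQL type classes on the classpath (for example, pasted into <tt class="docutils literal" style="box-sizing: border-box;">spark-shell</tt>). The name <tt class="docutils literal" style="box-sizing: border-box;">dedupeSchema</tt> and the sample fields are assumptions made up for illustration; it is a variant written with <tt class="docutils literal" style="box-sizing: border-box;">foldLeft</tt> that keeps the first field for each case-insensitive name and preserves the original column order.</div>

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical helper (not from the post): keeps the first field seen for each
// case-insensitive name and preserves the original column order.
def dedupeSchema(schema: StructType): StructType = {
  val start = (Set.empty[String], Vector.empty[StructField])
  val (_, kept) = schema.fields.foldLeft(start) {
    case ((seen, acc), sf) =>
      val key = sf.name.toLowerCase
      if (seen.contains(key)) (seen, acc) else (seen + key, acc :+ sf)
  }
  StructType(kept)
}

// Sample schema, as a join of two tables sharing an "id" column might produce.
val joined = StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType),
  StructField("ID", IntegerType) // duplicate of "id" when compared case-insensitively
))

val deduped = dedupeSchema(joined)
println(deduped.fieldNames.mkString(", ")) // prints "id, name"
```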
</div>
Anonymoushttp://www.blogger.com/profile/03385350305653681268noreply@blogger.comtag:blogger.com,1999:blog-7351084055463323761.post-77968964959530047382018-03-18T05:46:00.000+05:302019-01-18T05:47:35.244+05:30Efficient Spark Dataframe Transforms<div dir="ltr" style="text-align: left;" trbidi="on">
<div ct-id="1" data-meaningful="true" data-skip="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
If you work with Spark, you will most likely have to write transforms on dataframes. The DataFrame API exposes the obvious method <tt class="docutils literal" style="box-sizing: border-box;">df.withColumn(col_name,col_expression)</tt> for adding a column with a specified expression. Because dataframes are immutable, every call returns a newly created dataframe with the added column (if you look at the source code for <tt class="docutils literal" style="box-sizing: border-box;">withColumn</tt>, you will also see additional checks being performed, such as whether the column already exists; that check is unnecessary in most cases). This is very inefficient, especially when we have to add many columns, for example in a wide transform of a dataframe such as a <tt class="docutils literal" style="box-sizing: border-box;">pivot</tt> transform (<span style="box-sizing: border-box; font-weight: 700;">Note</span>: there is also a <tt class="docutils literal" style="box-sizing: border-box;">bug</tt> <a class="footnote-reference" href="http://orastack.com/efficient-spark-dataframe-transforms.html#id2" id="id1" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">[1]</a> that limits how wide a transformation can be, which is fixed in <span style="box-sizing: border-box; font-weight: 700;">Spark 2.3.0</span>).</div>
<div ct-id="2" data-meaningful="true" data-skip="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Here is an optimized version of a pivot method. Note that rather than calling <tt class="docutils literal" style="box-sizing: border-box;">df.withColumn</tt> repeatedly, we collect all column expressions in a mutable <tt class="docutils literal" style="box-sizing: border-box;">ListBuffer</tt> and then apply them all at once via <tt class="docutils literal" style="box-sizing: border-box;">df.select(colExprs: _*)</tt>, which is phenomenally fast. In comparison, <tt class="docutils literal" style="box-sizing: border-box;">df.withColumn</tt> hangs the driver process even for a transform on a few hundred columns (it causes hung threads and locks, which you can see using JVisualVM), whereas the optimized version can operate on thousands of columns easily.</div>
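<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Before the full pivot method, the core pattern can be sketched on a toy dataframe. This is only an illustration under stated assumptions: the local <tt class="docutils literal" style="box-sizing: border-box;">SparkSession</tt>, the sample data and the <tt class="docutils literal" style="box-sizing: border-box;">amount_x1</tt>, <tt class="docutils literal" style="box-sizing: border-box;">amount_x2</tt>, ... column names are made up. With three columns the two approaches are indistinguishable in speed; the difference described above should only show up when the number of added columns grows large.</div>

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

// Local session for illustration only (an assumption, not part of the post).
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("select-vs-withColumn")
  .getOrCreate()
import spark.implicits._

val df: DataFrame = Seq((1, 10.0), (2, 20.0)).toDF("id", "amount")

// Hypothetical derived columns: amount_x1, amount_x2, amount_x3.
val newCols: Seq[(String, Column)] =
  (1 to 3).map(i => s"amount_x$i" -> (col("amount") * lit(i)))

// Column by column: every withColumn call returns a new dataframe.
val viaWithColumn = newCols.foldLeft(df) {
  case (acc, (name, expr)) => acc.withColumn(name, expr)
}

// All at once: keep the existing columns and append every expression in one select.
val allExprs: Seq[Column] = col("*") +: newCols.map { case (name, expr) => expr.as(name) }
val viaSelect = df.select(allExprs: _*)

viaSelect.show()
```

<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Both dataframes end up with the same five columns; only the number of intermediate dataframes differs.</div>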
<div class="section" id="optimized-version" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Optimized Version:</h2>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="cm" style="box-sizing: border-box; color: #999999;">/**</span>
<span class="cm" data-skip="true" style="box-sizing: border-box; color: #999999;"> * Pivots the DataFrame by the pivot column. It is better to specify the distinct values, as otherwise distinct values need to be calculated</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> *</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> * @param groupBy The columns to groupBy</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> * @param pivot The pivot column</span>
<span class="cm" data-skip="true" style="box-sizing: border-box; color: #999999;"> * @param distinct An Optional Array of distinct values</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> * @param agg the aggregate function to apply. Default="sum"</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> * @param df the df to transpose and return</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> * @param ev the implicit encoder to use</span>
<span class="cm" data-skip="true" style="box-sizing: border-box; color: #999999;"> * @tparam A The type of pivot column</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> * @return the transposed dataframe</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> */</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">def</span> <span class="n" style="box-sizing: border-box;">doPivotTF</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">A</span><span class="o" style="box-sizing: border-box; color: #66cccc;">](</span><span class="n" style="box-sizing: border-box;">groupBy</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Seq</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">],</span> <span class="n" style="box-sizing: border-box;">pivot</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">distinct</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Option</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Array</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">A</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]],</span> <span class="n" style="box-sizing: border-box;">agg</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="s" style="box-sizing: border-box; color: 
#99cc99;">"sum"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)(</span><span class="n" style="box-sizing: border-box;">df</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Dataset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">])(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">implicit</span> <span class="n" style="box-sizing: border-box;">ev</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Encoder</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">A</span><span class="o" style="box-sizing: border-box; color: #66cccc;">])</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Dataset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">colsToFilter</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">Seq</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">pivot</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">++</span> <span class="n" style="box-sizing: border-box;">groupBy</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">++</span> <span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">schema</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">filter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">dataType</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">match</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">NumericType</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=></span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">false</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Numeric</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">false</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">true</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}).</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">name</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">distinct</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">colsToTranspose</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">columns</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">filter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(!</span><span class="n" style="box-sizing: border-box;">colsToFilter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">contains</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">toSeq</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">logger</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">isDebugEnabled</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">logger</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">debug</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">s"colsToFilter </span><span class="si" style="box-sizing: border-box; color: #f99157;">$colsToFilter</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="n" style="box-sizing: border-box;">logger</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">debug</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">s"colsToTranspose </span><span class="si" style="box-sizing: border-box; color: #f99157;">$colsToTranspose</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">distinctValues</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">distinct</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">match</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">Some</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">v</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="n" style="box-sizing: border-box;">v</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">None</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">select</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">col</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">pivot</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">row</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="n" style="box-sizing: border-box;">row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">getAs</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">A</span><span class="o" style="box-sizing: border-box; color: #66cccc;">](</span><span class="n" style="box-sizing: border-box;">pivot</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">distinct</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span><span class="n" style="box-sizing: border-box;">collect</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">colExprs</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">ListBuffer</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Column</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]()</span>
<span class="n" style="box-sizing: border-box;">colExprs</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">+=</span> <span class="n" style="box-sizing: border-box;">col</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"*"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">for</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">colName</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">&lt;- <span class="n" style="box-sizing: border-box;">colsToTranspose</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">for</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">index</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">&lt;- <span class="n" style="box-sizing: border-box;">distinctValues</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">colExpr</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">when</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">col</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">pivot</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">===</span> <span class="n" style="box-sizing: border-box;">index</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">col</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">colName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">otherwise</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="mf" style="box-sizing: border-box; color: #f99157;">0.0</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">colNameToUse</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">s"</span><span class="si" style="box-sizing: border-box; color: #f99157;">${</span><span class="n" style="box-sizing: border-box;">colName</span><span class="si" style="box-sizing: border-box; color: #f99157;">}</span><span class="s" style="box-sizing: border-box; color: #99cc99;">_TN</span><span class="si" style="box-sizing: border-box; color: #f99157;">$index</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"</span>
<span class="n" style="box-sizing: border-box;">colExprs</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">+=</span> <span class="n" style="box-sizing: border-box;">colExpr</span> <span class="n" style="box-sizing: border-box;">as</span> <span class="n" style="box-sizing: border-box;">colNameToUse</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">transposedDF</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">select</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">colExprs</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">*</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="c1" data-skip="true" style="box-sizing: border-box; color: #999999;">//Drop all original columns except columns in groupBy</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">colsToDrop</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">colsToFilter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">filter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(!</span><span class="n" style="box-sizing: border-box;">groupBy</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">contains</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">))</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">++</span> <span class="n" style="box-sizing: border-box;">colsToTranspose</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">dfBeforeGroupBy</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">transposedDF</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">drop</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">colsToDrop</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">*</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">finalDF</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">dfBeforeGroupBy</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">groupBy</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">groupBy</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">col</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">*</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">agg</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">dfBeforeGroupBy</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">columns</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">filter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(!</span><span class="n" style="box-sizing: border-box;">groupBy</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">contains</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" 
style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">-></span> <span class="n" style="box-sizing: border-box;">agg</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">toMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="c1" data-skip="true" style="box-sizing: border-box; color: #999999;">//Remove the Spark-generated agg(...) wrapper from the column names</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">finalColNames</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">finalDF</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">columns</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">stripPrefix</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">s"</span><span class="si" style="box-sizing: border-box; color: #f99157;">$agg</span><span class="s" style="box-sizing: border-box; color: #99cc99;">("</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">stripSuffix</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">")"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">))</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">logger</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">isDebugEnabled</span><span class="o" style="box-sizing: border-box; color: #66cccc;">())</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">logger</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">debug</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" data-skip="true" style="box-sizing: border-box; color: #99cc99;">s"Final set of column names </span><span class="si" style="box-sizing: border-box; color: #f99157;">${finalColNames.mkString(", ")}</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="n" style="box-sizing: border-box;">finalDF</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">toDF</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">finalColNames</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">*</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</-></span></-></span></pre>
</div>
</div>
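<p>To make the logic of the function above easier to follow, here is a minimal sketch of the same idea on plain Scala collections, with no Spark involved. It is only a model under stated assumptions: the <tt>Record</tt> case class, the <tt>pivotSum</tt> helper and the sample rows are hypothetical names invented for this illustration, and the aggregate is fixed to a sum.</p>

```scala
// Plain-Scala model of the pivot technique; no Spark required.
// PivotSketch, Record, pivotSum and sampleRows are hypothetical names for illustration.
object PivotSketch {
  final case class Record(group: String, pivot: Int, amount: Double)

  /** For each group, sums `amount` into one cell per distinct pivot value. */
  def pivotSum(rows: Seq[Record], distinctValues: Seq[Int]): Map[String, Map[String, Double]] =
    rows.groupBy(_.group).map { case (group, groupRows) =>
      val cells = distinctValues.map { v =>
        // Mirrors when(col(pivot) === v, col(colName)).otherwise(0.0), then a sum per group
        val total = groupRows.map(r => if (r.pivot == v) r.amount else 0.0).sum
        s"amount_TN$v" -> total
      }.toMap
      group -> cells
    }

  val sampleRows: Seq[Record] = Seq(
    Record("a", 1, 10.0),
    Record("a", 2, 5.0),
    Record("a", 1, 2.5),
    Record("b", 2, 7.0)
  )

  def main(args: Array[String]): Unit = {
    val result = pivotSum(sampleRows, Seq(1, 2))
    println(result("a")("amount_TN1")) // prints 12.5
    println(result("b")("amount_TN1")) // prints 0.0
  }
}
```

<p>Each distinct pivot value becomes its own column named <tt>&lt;column&gt;_TN&lt;value&gt;</tt>, rows that do not match the value contribute <tt>0.0</tt>, and the group-level sum collapses them into a single row per group.</p>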
<div class="section" id="slow-version" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Slow Version:</h2>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span> <span class="cm" style="box-sizing: border-box; color: #999999;">/**</span>
<span class="cm" data-skip="true" style="box-sizing: border-box; color: #999999;"> * Pivots the DataFrame by the pivot column. Prefer passing the distinct values explicitly; otherwise they have to be computed from the data first.</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> *</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> * @param groupBy The columns to groupBy</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> * @param pivot The pivot column</span>
<span class="cm" data-skip="true" style="box-sizing: border-box; color: #999999;"> * @param distinct An Optional Array of distinct values</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> * @param agg the aggregate function to apply. Default="sum"</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> * @param df the df to transpose and return</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> * @param ev the implicit encoder to use</span>
<span class="cm" data-skip="true" style="box-sizing: border-box; color: #999999;"> * @tparam A The type of pivot column</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> * @return the transposed dataframe</span>
<span class="cm" style="box-sizing: border-box; color: #999999;"> */</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">def</span> <span class="n" style="box-sizing: border-box;">doPivotTFSlow</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">A</span><span class="o" style="box-sizing: border-box; color: #66cccc;">](</span><span class="n" style="box-sizing: border-box;">groupBy</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Seq</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">],</span> <span class="n" style="box-sizing: border-box;">pivot</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">distinct</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Option</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Array</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">A</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]],</span> <span class="n" style="box-sizing: border-box;">agg</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="s" style="box-sizing: border-box; color: 
#99cc99;">"sum"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)(</span><span class="n" style="box-sizing: border-box;">df</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Dataset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">])(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">implicit</span> <span class="n" style="box-sizing: border-box;">ev</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Encoder</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">A</span><span class="o" style="box-sizing: border-box; color: #66cccc;">])</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Dataset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">colsToFilter</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">Seq</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">pivot</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">++</span> <span class="n" style="box-sizing: border-box;">groupBy</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">++</span> <span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">schema</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">filter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">dataType</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">match</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">NumericType</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=></span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">false</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Numeric</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">false</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">true</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}).</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">name</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">distinct</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">colsToTranspose</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">columns</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">filter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(!</span><span class="n" style="box-sizing: border-box;">colsToFilter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">contains</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">toSeq</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">logger</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">isDebugEnabled</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">logger</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">debug</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">s"colsToFilter </span><span class="si" style="box-sizing: border-box; color: #f99157;">$colsToFilter</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="n" style="box-sizing: border-box;">logger</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">debug</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">s"colsToTranspose </span><span class="si" style="box-sizing: border-box; color: #f99157;">$colsToTranspose</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">distinctValues</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Array</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">A</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">distinct</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">match</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">Some</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">v</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="n" style="box-sizing: border-box;">v</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">None</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">select</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">col</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">pivot</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">distinct</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">getAs</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">A</span><span class="o" style="box-sizing: border-box; color: #66cccc;">](</span><span class="n" style="box-sizing: border-box;">pivot</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">collect</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">var</span> <span class="n" style="box-sizing: border-box;">dfTemp</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">df</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">for</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">colName</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">&lt;- <span class="n" style="box-sizing: border-box;">colsToTranspose</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">for</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">index</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">&lt;- <span class="n" style="box-sizing: border-box;">distinctValues</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">colExpr</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">when</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">col</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">pivot</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">===</span> <span class="n" style="box-sizing: border-box;">index</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">col</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">colName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">otherwise</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="mf" style="box-sizing: border-box; color: #f99157;">0.0</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">colNameToUse</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">s"</span><span class="si" style="box-sizing: border-box; color: #f99157;">${</span><span class="n" style="box-sizing: border-box;">colName</span><span class="si" style="box-sizing: border-box; color: #f99157;">}</span><span class="s" style="box-sizing: border-box; color: #99cc99;">_TN</span><span class="si" style="box-sizing: border-box; color: #f99157;">$index</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"</span>
<span class="n" style="box-sizing: border-box;">dfTemp</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">dfTemp</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">withColumn</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">colNameToUse</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">colExpr</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">transposedDF</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">dfTemp</span>
<span class="c1" data-skip="true" style="box-sizing: border-box; color: #999999;">//Drop all original columns except columns in groupBy</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">colsToDrop</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">colsToFilter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">filter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(!</span><span class="n" style="box-sizing: border-box;">groupBy</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">contains</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">))</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">++</span> <span class="n" style="box-sizing: border-box;">colsToTranspose</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">dfBeforeGroupBy</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">transposedDF</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">drop</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">colsToDrop</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">*</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">finalDF</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">dfBeforeGroupBy</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">groupBy</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">groupBy</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">col</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">*</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">agg</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">dfBeforeGroupBy</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">columns</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">filter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(!</span><span class="n" style="box-sizing: border-box;">groupBy</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">contains</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" 
style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">-></span> <span class="n" style="box-sizing: border-box;">agg</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">toMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="c1" data-skip="true" style="box-sizing: border-box; color: #999999;">//Strip the Spark-generated "agg(" prefix and ")" suffix, e.g. sum(x) becomes x</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">finalColNames</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">finalDF</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">columns</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">stripPrefix</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">s"</span><span class="si" style="box-sizing: border-box; color: #f99157;">$agg</span><span class="s" style="box-sizing: border-box; color: #99cc99;">("</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">stripSuffix</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">")"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">))</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">logger</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">isDebugEnabled</span><span class="o" style="box-sizing: border-box; color: #66cccc;">())</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">logger</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">debug</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" data-skip="true" style="box-sizing: border-box; color: #99cc99;">s"Final set of column names </span><span class="si" style="box-sizing: border-box; color: #f99157;">$finalColNames</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="n" style="box-sizing: border-box;">finalDF</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">toDF</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">finalColNames</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">*</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</span></span></pre>
</div>
<div class="section" id="visualvm-image" style="box-sizing: border-box;">
<h3 style="box-sizing: border-box; margin: 1em 0px;">
VisualVM Image</h3>
<img alt="visual vm image" class="" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguY8Ka6pmgUkQdAWEXM9G5ASFTf-7_hYv2BgpKg1xZKHr8poZgM8kY4Yl69uCswdhiraYMy0ACVuceSk877l3So9TRj9pGSAeRtWb4LUY5RyAZ_MKPym2pNqgLJpmD8w9d-5NLzi5PI6c/s1600/Screen+Shot+2018-03-18+at+11.00.30+PM.png" style="border: 0px; box-sizing: border-box; height: auto; max-width: 100%;" /><div ct-id="3" data-meaningful="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
So the crux is: always prefer <tt class="docutils literal" style="box-sizing: border-box;">df.select</tt> over <tt class="docutils literal" style="box-sizing: border-box;">df.withColumn</tt>, unless you are sure that the transform will only ever be invoked on a few columns.</div>
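<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
To make the difference concrete, here is a minimal sketch of the two styles side by side. It is not the transform from this post: the object name <tt class="docutils literal" style="box-sizing: border-box;">SelectVsWithColumn</tt> and the helpers <tt class="docutils literal" style="box-sizing: border-box;">addWithColumn</tt> and <tt class="docutils literal" style="box-sizing: border-box;">addWithSelect</tt> are hypothetical, and it assumes Spark SQL is on the classpath. Both helpers should produce the same columns; the first grows the plan one projection at a time, the second builds all derived columns up front and adds them in one projection.</div>

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, when}

// Hypothetical helpers for illustration only; names are not from the post.
object SelectVsWithColumn {

  // One withColumn call per derived column: every call returns a new
  // DataFrame, so the plan is extended (and re-analyzed) once per column.
  def addWithColumn(df: DataFrame, pivot: String,
                    cols: Seq[String], values: Seq[Int]): DataFrame = {
    val pairs = for (c <- cols; v <- values) yield (c, v)
    pairs.foldLeft(df) { case (acc, (c, v)) =>
      acc.withColumn(s"${c}_TN$v", when(col(pivot) === v, col(c)).otherwise(0.0))
    }
  }

  // Build every derived column expression first, then add them all with a
  // single select on top of the existing columns.
  def addWithSelect(df: DataFrame, pivot: String,
                    cols: Seq[String], values: Seq[Int]): DataFrame = {
    val derived = for (c <- cols; v <- values)
      yield when(col(pivot) === v, col(c)).otherwise(0.0).as(s"${c}_TN$v")
    df.select(df.columns.map(col) ++ derived: _*)
  }
}
```

<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
With a handful of columns the two are hard to tell apart; the gap is expected to show up only when the number of derived columns runs into the hundreds, which is the situation described above.</div>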
<table class="docutils footnote" frame="void" id="id2" rules="none" style="border-collapse: collapse; border-spacing: 0px; box-sizing: border-box;"><colgroup style="box-sizing: border-box;"><col class="label" style="box-sizing: border-box;"></col><col style="box-sizing: border-box;"></col></colgroup><tbody style="box-sizing: border-box;" valign="top">
<tr style="box-sizing: border-box;"><td class="label" style="box-sizing: border-box;"><a class="fn-backref" href="http://orastack.com/efficient-spark-dataframe-transforms.html#id1" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">[1]</a></td><td style="box-sizing: border-box;"><a class="reference external" href="https://issues.apache.org/jira/browse/SPARK-18016" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">https://issues.apache.org/jira/browse/SPARK-18016</a></td></tr>
</tbody></table>
</div>
</div>
</div>
Writing Generic UDFs in Spark (published 2018-01-17, updated 2019-01-18)<div dir="ltr" style="text-align: left;" trbidi="on">
<div ct-id="1" data-meaningful="true" data-skip="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Apache Spark offers the ability to write <tt class="docutils literal" style="box-sizing: border-box;">Generic</tt> <tt class="docutils literal" style="box-sizing: border-box;">UDFs</tt>. However, for an idiomatic implementation, there are a couple of things that one needs to keep in mind.</div>
<ol class="arabic simple" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li ct-id="2" data-meaningful="true" style="box-sizing: border-box;">Return an <tt class="docutils literal" style="box-sizing: border-box;">Option</tt>: <tt class="docutils literal" style="box-sizing: border-box;">Spark</tt> automatically treats <tt class="docutils literal" style="box-sizing: border-box;">None</tt> as null and extracts the value from <tt class="docutils literal" style="box-sizing: border-box;">Some</tt>.</li>
<li ct-id="3" data-meaningful="true" data-skip="true" style="box-sizing: border-box;">Your generic UDFs should accept either an <tt class="docutils literal" style="box-sizing: border-box;">Option</tt> or a regular type as input. To accomplish this, pattern match on the input type and, when it is an Option, recursively extract the wrapped value. This scenario occurs when your UDF is in turn wrapped by another UDF.</li>
</ol>
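<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
The two rules above can be sketched without any Spark dependency. The helper below is hypothetical (the names <tt class="docutils literal" style="box-sizing: border-box;">GenericUdfSketch</tt> and <tt class="docutils literal" style="box-sizing: border-box;">toLongOpt</tt> are made up for illustration): it accepts a plain value or an arbitrarily nested Option, unwraps it recursively, and always answers with an Option, so a missing or unusable input becomes None rather than an exception.</div>

```scala
import scala.util.Try

// Hypothetical helper for illustration only; not part of the post's code.
object GenericUdfSketch {

  // Rule 1: always return an Option.
  // Rule 2: accept a raw value or an Option, and unwrap Options recursively.
  def toLongOpt[T](value: T): Option[Long] = (value: Any) match {
    case null        => None                        // plain null input
    case l: Long     => Some(l)
    case i: Int      => Some(i.toLong)
    case s: String   => Try(s.trim.toLong).toOption // unparseable text becomes None
    case Some(inner) => toLongOpt(inner)            // e.g. output of another Option-returning function
    case None        => None
    case _           => None                        // unsupported types are treated as missing
  }
}
```

<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Widening the argument to Any before matching is a design choice of this sketch: it lets the null case sit inside the match instead of needing a separate null check first.</div>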
<div ct-id="4" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
If these considerations are handled correctly, the implemented UDF has several important benefits:</div>
<ul class="simple" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li ct-id="5" data-meaningful="true" style="box-sizing: border-box;">It avoids code duplication.</li>
<li ct-id="6" data-meaningful="true" style="box-sizing: border-box;">It handles nulls in a more idiomatic way.</li>
</ul>
<div ct-id="7" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Here is an example of a UDF that calculates the interval between two dates in a chosen unit.</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc; font-family: sans-serif; font-size: 17.6px;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="k" style="box-sizing: border-box; color: #cc99cc;">import</span> <span class="nn" style="box-sizing: border-box; color: #ffcc66;">java.time.</span><span class="o" style="box-sizing: border-box; color: #66cccc;">{</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">LocalDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">ZoneId</span><span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">import</span> <span class="nn" style="box-sizing: border-box; color: #ffcc66;">java.time.format.DateTimeFormatter</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">import</span> <span class="nn" style="box-sizing: border-box; color: #ffcc66;">java.time.temporal.ChronoUnit</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">import</span> <span class="nn" style="box-sizing: border-box; color: #ffcc66;">scala.util.Try</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">def</span> <span class="n" style="box-sizing: border-box;">convertToDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">T</span><span class="o" style="box-sizing: border-box; color: #66cccc;">](</span><span class="n" style="box-sizing: border-box;">date</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">T</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Option</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">LocalDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">date</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">==</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">null</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">return</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">None</span>
<span class="n" style="box-sizing: border-box;">date</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">match</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="n" style="box-sizing: border-box;">dt</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">LocalDate</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=></span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">Some</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">dt</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="n" style="box-sizing: border-box;">dt</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=></span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">dt</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">isEmpty</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">return</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">None</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">retValue</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">Try</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="nc" style="box-sizing: border-box; color: #ffcc66;">LocalDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">parse</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">dt</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">DateTimeFormatter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">ISO_DATE</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}.</span><span class="n" style="box-sizing: border-box;">orElse</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">Try</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">LocalDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">parse</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">dt</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">DateTimeFormatter</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">ISO_LOCAL_DATE_TIME</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)))</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">// If neither format parses, return None instead of throwing</span>
<span class="n" style="box-sizing: border-box;">retValue</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">toOption</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="n" style="box-sizing: border-box;">dt</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.sql.Date</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=></span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">Some</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">dt</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">toLocalDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="n" style="box-sizing: border-box;">dt</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.util.Date</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=></span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">Some</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">dt</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">toInstant</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">atZone</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">ZoneId</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">systemDefault</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()).</span><span class="n" style="box-sizing: border-box;">toLocalDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="n" style="box-sizing: border-box;">dt</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Option</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">dt</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">isDefined</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="n" style="box-sizing: border-box;">convertToDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">dt</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">get</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">else</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">None</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">None</span> <span class="c1" style="box-sizing: border-box; color: #999999;">// unsupported input types yield None instead of a MatchError</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">def</span> <span class="n" style="box-sizing: border-box;">interval_between</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">V1</span>, <span class="kt" style="box-sizing: border-box; color: #ffcc66;">V2</span><span class="o" style="box-sizing: border-box; color: #66cccc;">](</span><span class="n" style="box-sizing: border-box;">fromDate</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">V1</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">toDate</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">V2</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">intType</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Option</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Long</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">def</span> <span class="n" style="box-sizing: border-box;">calculateInterval</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fromDate</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">LocalDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">toDate</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">LocalDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">intType</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">"months"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Option</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Long</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">returnVal</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">intType</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">match</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">"decades"</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">ChronoUnit</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">DECADES</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">between</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fromDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">toDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">"years"</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">ChronoUnit</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">YEARS</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">between</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fromDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">toDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">"months"</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">ChronoUnit</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">MONTHS</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">between</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fromDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">toDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">"days"</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">ChronoUnit</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">DAYS</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">between</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fromDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">toDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">"hours"</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">ChronoUnit</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">HOURS</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">between</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fromDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">toDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">"minutes"</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">ChronoUnit</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">MINUTES</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">between</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fromDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">toDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">"seconds"</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">ChronoUnit</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">SECONDS</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">between</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fromDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">toDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">throw</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">new</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">IllegalArgumentException</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">s"</span><span class="si" style="box-sizing: border-box; color: #f99157;">$intType</span><span class="s" style="box-sizing: border-box; color: #99cc99;"> is not supported"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="nc" style="box-sizing: border-box; color: #ffcc66;">Some</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">returnVal</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">fromDt</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">convertToDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fromDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">toDt</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">convertToDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">toDate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fromDt</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">isEmpty</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">||</span> <span class="n" style="box-sizing: border-box;">toDt</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">isEmpty</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">return</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">None</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="n" style="box-sizing: border-box;">calculateInterval</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fromDt</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">get</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">toDt</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">get</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">intType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">toLowerCase</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
<div ct-id="8" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
The above UDF takes care of the concerns mentioned earlier in the post. To use it, you simply have to register it as a <tt class="docutils literal" style="box-sizing: border-box;">UDF</tt> with <tt class="docutils literal" style="box-sizing: border-box;">SparkSession</tt>.</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc; font-family: sans-serif; font-size: 17.6px;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span> <span class="n" style="box-sizing: border-box;">ss</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">udf</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">register</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"interval_between"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">interval_between</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span></pre>
</div>
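<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Once registered, the function can be called by name from Spark SQL or through <tt class="docutils literal" style="box-sizing: border-box;">expr</tt>. The following is only a minimal sketch: the <tt class="docutils literal" style="box-sizing: border-box;">interval_between</tt> defined here is a hypothetical, simplified stand-in (ISO <tt class="docutils literal" style="box-sizing: border-box;">yyyy-MM-dd</tt> dates and <tt class="docutils literal" style="box-sizing: border-box;">days</tt> only) so that the snippet runs on its own, and the object name, column names and sample dates are assumptions made for illustration.</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc; font-family: sans-serif; font-size: 17.6px;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;">import java.time.LocalDate
import java.time.temporal.ChronoUnit
import scala.util.Try
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

object IntervalBetweenUsage {
  // Hypothetical stand-in for the UDF above: ISO dates and "days" only.
  // Unparsable (or null) dates give None, which Spark surfaces as NULL.
  def interval_between(fromDate: String, toDate: String, intType: String): Option[Long] =
    intType.toLowerCase match {
      case "days" => Try(ChronoUnit.DAYS.between(LocalDate.parse(fromDate), LocalDate.parse(toDate))).toOption
      case _ => throw new IllegalArgumentException(s"$intType is not supported")
    }

  def main(args: Array[String]): Unit = {
    // A local session, assumed here only so the sketch is self-contained.
    val ss = SparkSession.builder().master("local[1]").appName("interval-between-usage").getOrCreate()
    import ss.implicits._

    ss.udf.register("interval_between", interval_between _)

    // Call the UDF by name from SQL.
    ss.sql("SELECT interval_between('2018-01-01', '2018-01-31', 'days') AS days_between").show()

    // Or apply it to DataFrame columns through expr; the second row has an invalid date.
    val df = Seq(("2018-01-01", "2018-03-01"), ("not a date", "2018-03-01")).toDF("start_dt", "end_dt")
    df.withColumn("days_between", expr("interval_between(start_dt, end_dt, 'days')")).show()

    ss.stop()
  }
}
</pre>
</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Returning an <tt class="docutils literal" style="box-sizing: border-box;">Option</tt> from the function is what lets a row with a bad date come back as <tt class="docutils literal" style="box-sizing: border-box;">NULL</tt> rather than failing the whole job.</div>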
</div>
Anonymoushttp://www.blogger.com/profile/03385350305653681268noreply@blogger.comtag:blogger.com,1999:blog-7351084055463323761.post-14101315756859546552018-01-17T05:42:00.000+05:302019-01-18T05:43:36.039+05:30Testing Spark Dataframes<div dir="ltr" style="text-align: left;" trbidi="on">
<div ct-id="1" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Testing <tt class="docutils literal" style="box-sizing: border-box;">Spark</tt> <tt class="docutils literal" style="box-sizing: border-box;">Dataframe</tt> transforms is essential, and it can be done in a reusable manner. The way I generally go about it is to:</div>
<ul class="simple" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li ct-id="2" data-meaningful="true" style="box-sizing: border-box;">Read the expected and test Dataframes,</li>
<li ct-id="3" data-meaningful="true" style="box-sizing: border-box;">Invoke the desired transform, and</li>
<li ct-id="4" data-meaningful="true" style="box-sizing: border-box;">Calculate the difference between the resulting and expected Dataframes. The only caveat is that the built-in <tt class="docutils literal" style="box-sizing: border-box;">except</tt> function is not sufficient for columns with decimal types, so those columns need a bit of extra work.</li>
</ul>
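<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
The decimal caveat is easy to reproduce in plain Scala. In this small sketch, which uses doubles as an illustration (the object name, the sample values and the <tt class="docutils literal" style="box-sizing: border-box;">0.01</tt> tolerance are assumptions), exact equality reports a mismatch for two values that differ only by floating-point rounding, while a tolerance-based check accepts them:</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc; font-family: sans-serif; font-size: 17.6px;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;">object ToleranceDemo {
  val computed: Double = 0.1 + 0.2 // what a transform might produce
  val expected: Double = 0.3       // what the expected data holds
  val tolerance: Double = 0.01     // assumed acceptable precision mismatch

  // Exact equality fails because of floating-point rounding.
  val exactMatch: Boolean = computed == expected

  // A tolerance-based comparison treats the two values as equal.
  val withinTolerance: Boolean = tolerance > math.abs(computed - expected)

  def main(args: Array[String]): Unit = {
    println(s"exact match: $exactMatch")           // exact match: false
    println(s"within tolerance: $withinTolerance") // within tolerance: true
  }
}
</pre>
</div>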
<div ct-id="5" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
To accomplish generic dataframe comparison:</div>
<ul class="simple" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li ct-id="6" data-meaningful="true" style="box-sizing: border-box;">Look at the type of each column and, when it is numeric,</li>
<li ct-id="7" data-meaningful="true" style="box-sizing: border-box;">Convert its values to the corresponding Java type and compare them numerically, allowing for a configurable precision tolerance. Otherwise,</li>
<li ct-id="8" data-meaningful="true" style="box-sizing: border-box;">Just use <tt class="docutils literal" style="box-sizing: border-box;">except</tt> to compare the column.</li>
</ul>
<div class="section" id="comparison-code" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Comparison Code</h2>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="k" style="box-sizing: border-box; color: #cc99cc;">def</span> <span class="n" style="box-sizing: border-box;">compareDF</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">result</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Dataset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">],</span> <span class="n" style="box-sizing: border-box;">expected</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Dataset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">])</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Unit</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">expectedSchemaMap</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">expected</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">schema</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">sf</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">sf</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">name</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">sf</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">dataType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">toMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span>, <span class="kt" style="box-sizing: border-box; color: #ffcc66;">DataType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">resSchemaMap</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">result</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">schema</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">sf</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">sf</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">name</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">sf</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">dataType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">toMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span>, <span class="kt" style="box-sizing: border-box; color: #ffcc66;">DataType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//walk the expected schema, one column at a time</span>
<span class="n" style="box-sizing: border-box;">expectedSchemaMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">foreach</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">name</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">dType</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">NumericType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span>
<span class="n" style="box-sizing: border-box;">assert</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">compareNumericTypes</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">result</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">expected</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">resSchemaMap</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">name</span><span class="o" style="box-sizing: border-box; color: #66cccc;">),</span> <span class="n" style="box-sizing: border-box;">dType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">name</span><span class="o" style="box-sizing: border-box; color: #66cccc;">),</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">s"</span><span class="si" style="box-sizing: border-box; color: #f99157;">$name</span><span class="s" style="box-sizing: border-box; color: #99cc99;"> column was not equal"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">name</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span>
<span class="n" style="box-sizing: border-box;">assert</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">result</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">select</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">name</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">except</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">expected</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">select</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">name</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">count</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">==</span> <span class="mi" style="box-sizing: border-box; color: #f99157;">0</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">s"</span><span class="si" style="box-sizing: border-box; color: #f99157;">$name</span><span class="s" style="box-sizing: border-box; color: #99cc99;"> column was not equal"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">def</span> <span class="n" style="box-sizing: border-box;">compareNumericTypes</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">result</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Dataset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">],</span> <span class="n" style="box-sizing: border-box;">expected</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Dataset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">],</span> <span class="n" style="box-sizing: border-box;">resType</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">DataType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">expType</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">DataType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">colName</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: 
border-box;">precision</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Double</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="mf" style="box-sizing: border-box; color: #f99157;">0.01</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Boolean</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//collect Results</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">res</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">extractAndSortNumericRow</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">result</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">colName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">resType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">exp</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">extractAndSortNumericRow</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">expected</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">colName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">expType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//compare lengths first</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">if</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">res</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">length</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">!=</span> <span class="n" style="box-sizing: border-box;">exp</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">length</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">return</span> <span class="kc" style="box-sizing: border-box; color: #cc99cc;">false</span>
<span class="n" style="box-sizing: border-box;">res</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">match</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">Seq</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.lang.Integer</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">*)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">|</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">Seq</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.lang.Long</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">*)</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">!</span><span class="n" style="box-sizing: border-box;">res</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">zip</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">exp</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">exists</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">zipped</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">safelyGet</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">zipped</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">_1</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">longValue</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">-</span> <span class="n" style="box-sizing: border-box;">safelyGet</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">zipped</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">_2</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">longValue</span><span class="o" style="box-sizing: border-box; color: #66cccc;">())</span> <span class="o" style="box-sizing: border-box; 
color: #66cccc;">!=</span> <span class="mi" style="box-sizing: border-box; color: #f99157;">0L</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">Seq</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.lang.Float</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">*)</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">|</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">Seq</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.lang.Double</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="o" style="box-sizing: border-box; color: #66cccc;">*)</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">!</span><span class="n" style="box-sizing: border-box;">res</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">zip</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">exp</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">exists</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">zipped</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">safelyGet</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">zipped</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">_1</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">doubleValue</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">-</span> <span class="n" style="box-sizing: border-box;">safelyGet</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">zipped</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">_2</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">doubleValue</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()).</span><span class="n" style="box-sizing: 
border-box;">abs</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">>=</span> <span class="n" style="box-sizing: border-box;">precision</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//upcast types</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">def</span> <span class="n" style="box-sizing: border-box;">safelyGet</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">T</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">>:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Number</span><span class="o" style="box-sizing: border-box; color: #66cccc;">](</span><span class="n" style="box-sizing: border-box;">v</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">T</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">T</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">v</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">match</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.lang.Long</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">|</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.lang.Integer</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=></span> <span class="n" style="box-sizing: border-box;">java</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">lang</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">Long</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">parseLong</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">v</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">toString</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.lang.Float</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">|</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.lang.Double</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=></span>
<span class="n" style="box-sizing: border-box;">java</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">lang</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="nc" style="box-sizing: border-box; color: #ffcc66;">Double</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">parseDouble</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">v</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">toString</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="n" style="box-sizing: border-box;">v</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//map internal spark types to java types.</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">def</span> <span class="n" style="box-sizing: border-box;">extractAndSortNumericRow</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">T</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">&lt;:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">NumericType</span><span class="o" style="box-sizing: border-box; color: #66cccc;">](</span><span class="n" style="box-sizing: border-box;">df</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Dataset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">],</span> <span class="n" style="box-sizing: border-box;">colName</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">dt</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">T</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Seq</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Number</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">import</span> <span class="nn" style="box-sizing: border-box; color: #ffcc66;">ss.implicits._</span>
<span class="n" style="box-sizing: border-box;">dt</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">match</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">LongType</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=></span> <span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">select</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">colName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">row</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="n" style="box-sizing: border-box;">row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">getAs</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.lang.Long</span><span class="o" style="box-sizing: border-box; color: #66cccc;">](</span><span class="mi" style="box-sizing: border-box; color: #f99157;">0</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">sort</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span><span class="n" style="box-sizing: border-box;">collect</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">IntegerType</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=></span> <span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">select</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">colName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">row</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="n" style="box-sizing: border-box;">row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">getAs</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.lang.Integer</span><span class="o" style="box-sizing: border-box; color: #66cccc;">](</span><span class="mi" style="box-sizing: border-box; color: #f99157;">0</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">sort</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span><span class="n" style="box-sizing: border-box;">collect</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">DoubleType</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=></span> <span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">select</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">colName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">row</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="n" style="box-sizing: border-box;">row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">getAs</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.lang.Double</span><span class="o" style="box-sizing: border-box; color: #66cccc;">](</span><span class="mi" style="box-sizing: border-box; color: #f99157;">0</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">sort</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span><span class="n" style="box-sizing: border-box;">collect</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">FloatType</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=></span> <span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">select</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">colName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">row</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="n" style="box-sizing: border-box;">row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">getAs</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.lang.Float</span><span class="o" style="box-sizing: border-box; color: #66cccc;">](</span><span class="mi" style="box-sizing: border-box; color: #f99157;">0</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">sort</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span><span class="n" style="box-sizing: border-box;">collect</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">case</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">DecimalType</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=></span> <span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">select</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">colName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">map</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">row</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="n" style="box-sizing: border-box;">row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">getAs</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">java.math.BigDecimal</span><span class="o" style="box-sizing: border-box; color: #66cccc;">](</span><span class="mi" style="box-sizing: border-box; color: #f99157;">0</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)).</span><span class="n" style="box-sizing: border-box;">sort</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span><span class="n" style="box-sizing: border-box;">collect</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
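<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
To make the floating-point branch above concrete, here is a minimal sketch without Spark, assuming the extracted and sorted column values are already available as plain <tt class="docutils literal" style="box-sizing: border-box;">Seq[Double]</tt>. The name <tt class="docutils literal" style="box-sizing: border-box;">approxEqual</tt> and the default <tt class="docutils literal" style="box-sizing: border-box;">precision</tt> of <tt class="docutils literal" style="box-sizing: border-box;">1e-6</tt> are hypothetical and only for illustration. Like the code above, it treats two columns as equal when no pair of values differs by <tt class="docutils literal" style="box-sizing: border-box;">precision</tt> or more. It additionally checks the lengths, because <tt class="docutils literal" style="box-sizing: border-box;">zip</tt> silently drops unmatched trailing elements.</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;">object ApproxCompare {
  // Hypothetical helper (not part of the original code): true when both
  // sequences have the same length and no pair of values differs by
  // `precision` or more.
  def approxEqual(res: Seq[Double], exp: Seq[Double], precision: Double = 1e-6): Boolean =
    if (res.length != exp.length) false
    else !res.zip(exp).exists { case (r, e) => math.abs(r - e) >= precision }
}
</pre>
</div>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
With the default precision, <tt class="docutils literal" style="box-sizing: border-box;">ApproxCompare.approxEqual(Seq(1.0, 2.0), Seq(1.0000001, 2.0))</tt> holds, while comparing <tt class="docutils literal" style="box-sizing: border-box;">Seq(1.0)</tt> with <tt class="docutils literal" style="box-sizing: border-box;">Seq(1.1)</tt> does not.</div>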
<div ct-id="9" data-meaningful="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
The code above does the heavy lifting of comparing dataframes. Now all we need is a simple function that invokes the transform, plus some simple ScalaTest code showing all of this in action.</div>
<div ct-id="10" data-meaningful="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
The function that invokes the transform and performs the comparison:</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="k" style="box-sizing: border-box; color: #cc99cc;">def</span> <span class="n" style="box-sizing: border-box;">invokeAndCompare</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">testFileName</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">expectedFileName</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">func</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Dataset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=></span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">Dataset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">])</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: 
border-box; color: #ffcc66;">Unit</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">df</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">readJsonDF</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">testFileName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">expected</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">readJsonDF</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">expectedFileName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">transformResult</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="n" style="box-sizing: border-box;">func</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">df</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="n" style="box-sizing: border-box;">compareDF</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">transformResult</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">expected</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">def</span> <span class="n" style="box-sizing: border-box;">readJsonDF</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fileName</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">String</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">Dataset</span><span class="o" style="box-sizing: border-box; color: #66cccc;">[</span><span class="kt" style="box-sizing: border-box; color: #ffcc66;">Row</span><span class="o" style="box-sizing: border-box; color: #66cccc;">]</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">ss</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">read</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">json</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">fileName</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
</div>
<div class="section" id="testing-code" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Testing Code</h2>
<div ct-id="11" data-meaningful="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Just use <tt class="docutils literal" style="box-sizing: border-box;">ScalaTest</tt>. Here is what a test for your transforms looks like.</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;"><span style="box-sizing: border-box;"></span><span class="k" style="box-sizing: border-box; color: #cc99cc;">class</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">RandomTransformsTest</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">extends</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">FlatSpec</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">with</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">Matchers</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">with</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">BeforeAndAfter</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//session shared by before/after, assigned before each test</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">var</span> <span class="n" style="box-sizing: border-box;">ss</span><span class="k" style="box-sizing: border-box; color: #cc99cc;">:</span> <span class="kt" style="box-sizing: border-box; color: #ffcc66;">SparkSession</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span>
<span class="n" style="box-sizing: border-box;">after</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="c1" style="box-sizing: border-box; color: #999999;">//close spark session</span>
<span class="n" style="box-sizing: border-box;">ss</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">close</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="n" style="box-sizing: border-box;">before</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="n" style="box-sizing: border-box;">ss</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">SparkSession</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">builder</span><span class="o" style="box-sizing: border-box; color: #66cccc;">().</span><span class="n" style="box-sizing: border-box;">master</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="s" style="box-sizing: border-box; color: #99cc99;">"local[*]"</span><span class="o" style="box-sizing: border-box; color: #66cccc;">).</span><span class="n" style="box-sizing: border-box;">getOrCreate</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="s" style="box-sizing: border-box; color: #99cc99;">"testRandomTransform"</span> <span class="n" style="box-sizing: border-box;">should</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">"give correct output for input dataframe"</span> <span class="n" style="box-sizing: border-box;">in</span> <span class="o" style="box-sizing: border-box; color: #66cccc;">{</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">testFileLoc</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">""</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">expectedFileLoc</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="s" style="box-sizing: border-box; color: #99cc99;">""</span>
<span class="c1" data-skip="true" style="box-sizing: border-box; color: #999999;">//just get the function definition, it will be invoked by invokeAndCompare with the dataframe later on.</span>
<span class="k" style="box-sizing: border-box; color: #cc99cc;">val</span> <span class="n" style="box-sizing: border-box;">func</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">=</span> <span class="nc" style="box-sizing: border-box; color: #ffcc66;">RandomTransforms</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">someRandomFunc</span><span class="o" style="box-sizing: border-box; color: #66cccc;">()</span> <span class="k" style="box-sizing: border-box; color: #cc99cc;">_</span>
<span class="nc" style="box-sizing: border-box; color: #ffcc66;">SomeObject</span><span class="o" style="box-sizing: border-box; color: #66cccc;">.</span><span class="n" style="box-sizing: border-box;">invokeAndCompare</span><span class="o" style="box-sizing: border-box; color: #66cccc;">(</span><span class="n" style="box-sizing: border-box;">testFileLoc</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">expectedFileLoc</span><span class="o" style="box-sizing: border-box; color: #66cccc;">,</span> <span class="n" style="box-sizing: border-box;">func</span><span class="o" style="box-sizing: border-box; color: #66cccc;">)</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
<span class="o" style="box-sizing: border-box; color: #66cccc;">}</span>
</pre>
</div>
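<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
The <tt class="docutils literal" style="box-sizing: border-box;">someRandomFunc() _</tt> line relies on the transform being declared with two parameter lists, so that applying only the first one yields a plain function. Here is a minimal sketch of that shape without Spark: <tt class="docutils literal" style="box-sizing: border-box;">Seq[Int]</tt> stands in for <tt class="docutils literal" style="box-sizing: border-box;">Dataset[Row]</tt>, and <tt class="docutils literal" style="box-sizing: border-box;">keepNonNegative</tt> and <tt class="docutils literal" style="box-sizing: border-box;">invoke</tt> are hypothetical names, not part of the code above.</div>
<div class="highlight" style="background: rgb(45, 45, 45); box-sizing: border-box; color: #cccccc;">
<pre style="box-shadow: rgb(136, 136, 136) 2px 2px 2px; box-sizing: border-box; font-family: monospace, serif; font-size: 13.2px; margin-bottom: 1em; margin-top: 1em; overflow-wrap: break-word; padding: 15px; white-space: pre-wrap;">object CurriedTransformSketch {
  // Hypothetical transform: the first (empty) parameter list could carry
  // configuration, the second one takes the data.
  def keepNonNegative()(data: Seq[Int]): Seq[Int] =
    data.filter(_ >= 0)

  // Stand-in for invokeAndCompare: receives the transform as a function value
  // and decides when to call it.
  def invoke(input: Seq[Int], func: Seq[Int] => Seq[Int]): Seq[Int] = func(input)

  // Applying only the first parameter list and leaving a placeholder for the
  // second gives a Seq[Int] => Seq[Int]; nothing is executed yet.
  val func: Seq[Int] => Seq[Int] = keepNonNegative()(_)
}
</pre>
</div>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Passing the function value instead of its result keeps the test helper in charge of reading the input and applying the transform, which is why the test only hands over <tt class="docutils literal" style="box-sizing: border-box;">func</tt>.</div>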
</div>
<div class="section" id="wrap-up" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Wrap Up:</h2>
<div ct-id="12" data-meaningful="true" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
So, there we go, testing made easy for Spark dataframes. It requires some tedious mapping for decimal numbers, but once developed, tests are easy to write for all your dataframe transforms.</div>
</div>
</div>
Parallel Orchestration of Spark ETL Processing (posted by Anonymous, published 2017-10-28, updated 2019-01-18)<div dir="ltr" style="text-align: left;" trbidi="on">
<div ct-id="1" data-meaningful="true" data-skip="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
I have been working a lot with <span style="box-sizing: border-box; font-weight: 700;">Spark</span> and <span style="box-sizing: border-box; font-weight: 700;">Scala</span>. I really like Scala as a language because of its numerous advantages over Java; foremost, features such as <tt class="docutils literal" style="box-sizing: border-box;">Type classes</tt> and <tt class="docutils literal" style="box-sizing: border-box;">Default Method Arguments</tt> make for a much simpler API. Also, idiomatic Scala code uses higher-order functions, so it encourages a functional style of programming.</div>
<div ct-id="2" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
I also like Spark a lot, but I couldn't stand the inefficient way it was being used, i.e. processing a bunch of SQL queries sequentially. I strongly believe Spark wasn't designed to be used this way.</div>
<div ct-id="3" data-meaningful="true" data-skip="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
A <tt class="docutils literal" style="box-sizing: border-box;">SparkSession</tt> supports executing multiple queries in parallel, provided of course that they are independent. So, there was a clear optimization opportunity in <tt class="docutils literal" style="box-sizing: border-box;">Orchestrating</tt>, i.e. wresting control of execution whilst providing sufficient callback mechanisms. Thus, I developed a framework which, given a set of queries and their dependencies, builds a DAG <a class="footnote-reference" href="http://orastack.com/parallel-orchestration-of-spark-etl-processing.html#id3" id="id1" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">[1]</a>. It then uses dynamic programming to compute the depth of each node. The idea is then to create one stage per depth level; as the nodes within a stage are independent of each other, they can be executed in parallel.</div>
<img alt="image_dag" data-score="0.000009628177122842916" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgasKiQzRU9EXYqcL6GdYGkB-6-vS9WpQtL0hyjCRfhY_owV9tDz5c6YXHTFaqlnRB_-nmfeWc5PJa6DoMS8Kisbp6m3gq7iIoKWzS9lUEqvs4cftDjFtpxIEW64cau6ZZdlxP14gr9Mss/s1600/Screen+Shot+2017-10-05+at+7.28.26+PM.png" style="border: 0px; box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; height: auto; max-width: 100%;" /><span style="color: #333332; font-family: sans-serif; font-size: 17.6px;"></span><br />
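<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
The depth computation described above can be sketched roughly as follows. This is only an illustration, not the framework's actual code: the names <tt class="docutils literal" style="box-sizing: border-box;">DagStages</tt>, <tt class="docutils literal" style="box-sizing: border-box;">depths</tt> and <tt class="docutils literal" style="box-sizing: border-box;">stages</tt> are hypothetical, and the input is assumed to be an acyclic map from each query name to the names it depends on (no cycle detection is done).</div>
<pre class="brush:scala;">object DagStages {
  // Hypothetical input shape: query name mapped to the names it depends on.
  type Deps = Map[String, Set[String]]

  // Depth is 0 for a node without dependencies, otherwise
  // 1 + the maximum depth of its dependencies.
  // Results are memoized, so each node is computed only once.
  def depths(deps: Deps): Map[String, Int] = {
    val memo = scala.collection.mutable.Map.empty[String, Int]
    def depth(node: String): Int = memo.get(node) match {
      case Some(d) => d
      case None =>
        val parents = deps.getOrElse(node, Set.empty[String])
        val d = if (parents.isEmpty) 0 else parents.map(depth).max + 1
        memo(node) = d
        d
    }
    deps.keys.foreach(depth)
    memo.toMap
  }

  // Group nodes by depth; each group is one stage, ordered from depth 0 upward.
  def stages(deps: Deps): Seq[Set[String]] =
    depths(deps).groupBy(_._2).toSeq.sortBy(_._1).map(_._2.keySet)
}
</pre>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
For example, if queries b and c both depend on a, and d depends on b and c, this yields three stages: a, then b and c together, then d.</div>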
<div ct-id="4" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Once we have the <tt class="docutils literal" style="box-sizing: border-box;">DAG's</tt> stages, the execution is pretty straightforward using <tt class="docutils literal" style="box-sizing: border-box;">ExecutorService</tt> and configuring an <tt class="docutils literal" style="box-sizing: border-box;">implicit</tt> instance of the <tt class="docutils literal" style="box-sizing: border-box;">ExecutionContext</tt> to use the configured <tt class="docutils literal" style="box-sizing: border-box;">ExecutorService</tt>.</div>
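<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
A minimal sketch of that execution step is shown below. It assumes the stages are already computed and that <tt class="docutils literal" style="box-sizing: border-box;">run</tt> is a placeholder for whatever executes a single query or step; <tt class="docutils literal" style="box-sizing: border-box;">StageRunner</tt> and <tt class="docutils literal" style="box-sizing: border-box;">runStages</tt> are hypothetical names, and error handling and callbacks are left out.</div>
<pre class="brush:scala;">import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object StageRunner {
  // Runs stages one after another; the steps inside a stage run in parallel.
  def runStages[A](stages: Seq[Seq[String]], threads: Int)(run: String => A): Seq[Seq[A]] = {
    val pool = Executors.newFixedThreadPool(threads)
    // Futures created below pick up this context implicitly.
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      stages.toList.map { stage =>
        val inFlight = Future.sequence(stage.map(step => Future(run(step))))
        // Block until the whole stage is done before starting the next one.
        Await.result(inFlight, Duration.Inf)
      }
    } finally pool.shutdown()
  }
}
</pre>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
Waiting for each stage to finish keeps the sketch simple; a real orchestrator could instead start a node as soon as all of its own dependencies have completed.</div>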
<div ct-id="5" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
In general, once a framework wrests control of execution there are numerous advantages; the most potent ones are reusability, optimization and maintainability.</div>
<div ct-id="6" data-meaningful="true" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
The framework thus developed has the following features:</div>
<ul class="simple" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li ct-id="7" data-meaningful="true" style="box-sizing: border-box;"><span style="box-sizing: border-box; font-weight: 700;">Optimization</span> : Parallel execution of query or custom processing steps.</li>
<li style="box-sizing: border-box;"><span style="box-sizing: border-box; font-weight: 700;">Global</span> and <span style="box-sizing: border-box; font-weight: 700;">Local</span> bind variable substitutions.</li>
<li style="box-sizing: border-box;">Ability to enable <tt class="docutils literal" style="box-sizing: border-box;">explain plan</tt> by turning on configuration option.</li>
<li style="box-sizing: border-box;"><tt class="docutils literal" style="box-sizing: border-box;">JSON</tt> based logging using an <tt class="docutils literal" style="box-sizing: border-box;">AsyncAppender</tt> <a class="footnote-reference" href="http://orastack.com/parallel-orchestration-of-spark-etl-processing.html#id4" id="id2" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">[2]</a> (this is essentially a <tt class="docutils literal" style="box-sizing: border-box;">BlockingQueue</tt>: multiple threads can enqueue log events, while only a single consumer writes to the log file), so it can be easily integrated with <tt class="docutils literal" style="box-sizing: border-box;">splunk</tt>.</li>
<li style="box-sizing: border-box;"><span style="box-sizing: border-box; font-weight: 700;">Custom UDF</span> registration and default registration of a bunch of common UDF's.</li>
<li style="box-sizing: border-box;"><span style="box-sizing: border-box; font-weight: 700;">Custom hooks</span> into the execution by implementing a <tt class="docutils literal" style="box-sizing: border-box;">trait</tt> which is then invoked at the right stage by the <tt class="docutils literal" style="box-sizing: border-box;">Orchestrator</tt> (Inversion of Control).</li>
<li ct-id="8" data-meaningful="true" style="box-sizing: border-box;">Configuration based coding (users don't need to know Scala or Spark to use it).</li>
<li style="box-sizing: border-box;"><span style="box-sizing: border-box; font-weight: 700;">Reusability</span> and <span style="box-sizing: border-box; font-weight: 700;">Maintainability</span>.</li>
</ul>
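<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
To give an idea of what global and local bind variable substitution could look like, here is a small sketch. Everything in it is an assumption made for illustration: the <tt class="docutils literal" style="box-sizing: border-box;">${name}</tt> placeholder syntax, the rule that local bindings override global ones, and the names <tt class="docutils literal" style="box-sizing: border-box;">BindVariables</tt> and <tt class="docutils literal" style="box-sizing: border-box;">substitute</tt>.</div>
<pre class="brush:scala;">object BindVariables {
  // The ${name} placeholder syntax is an assumption made for this sketch.
  private val Placeholder = """\$\{(\w+)\}""".r

  // Local bindings take precedence over global ones;
  // placeholders without a binding are left untouched.
  def substitute(sql: String, global: Map[String, String], local: Map[String, String]): String = {
    val bindings = global ++ local
    Placeholder.replaceAllIn(sql, m =>
      java.util.regex.Matcher.quoteReplacement(bindings.getOrElse(m.group(1), m.matched)))
  }
}
</pre>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; margin-bottom: 1em; margin-top: 1em;">
The replacement is quoted because bound values (and unresolved placeholders) may themselves contain characters such as the dollar sign, which the regex engine would otherwise interpret.</div>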
<table class="docutils footnote" frame="void" id="id3" rules="none" style="border-collapse: collapse; border-spacing: 0px; box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;"><colgroup style="box-sizing: border-box;"><col class="label" style="box-sizing: border-box;"></col><col style="box-sizing: border-box;"></col></colgroup><tbody style="box-sizing: border-box;" valign="top">
<tr style="box-sizing: border-box;"><td class="label" style="box-sizing: border-box;"><a class="fn-backref" href="http://orastack.com/parallel-orchestration-of-spark-etl-processing.html#id1" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">[1]</a></td><td style="box-sizing: border-box;">DAG: <a class="reference external" href="https://en.wikipedia.org/wiki/Directed_acyclic_graph" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">https://en.wikipedia.org/wiki/Directed_acyclic_graph</a></td></tr>
</tbody></table>
<table class="docutils footnote" frame="void" id="id4" rules="none" style="border-collapse: collapse; border-spacing: 0px; box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px;"><colgroup style="box-sizing: border-box;"><col class="label" style="box-sizing: border-box;"></col><col style="box-sizing: border-box;"></col></colgroup><tbody style="box-sizing: border-box;" valign="top">
<tr style="box-sizing: border-box;"><td class="label" style="box-sizing: border-box;"><a class="fn-backref" href="http://orastack.com/parallel-orchestration-of-spark-etl-processing.html#id2" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">[2]</a></td><td style="box-sizing: border-box;">AsyncAppender: <a class="reference external" href="https://logback.qos.ch/manual/appenders.html#AsyncAppender" style="box-sizing: border-box; color: #8e8ed6; text-decoration-line: none;">https://logback.qos.ch/manual/appenders.html#AsyncAppender</a><br /></td></tr>
</tbody></table>
</div>
Anonymoushttp://www.blogger.com/profile/03385350305653681268noreply@blogger.comtag:blogger.com,1999:blog-7351084055463323761.post-30317532373630282082016-06-26T21:54:00.000+05:302016-06-26T21:58:08.297+05:30Apache Zeppelin Notebooks<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
<a class="reference external" href="https://zeppelin.incubator.apache.org/" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">Apache Zeppelin</a> provides a web UI where you can iteratively build Spark scripts in Scala, Python, etc. (with autocomplete support), run Spark SQL queries against Hive or other stores, and visualize the results of queries or Spark dataframes. This is somewhat akin to what IPython notebooks do for Python. Spark developers know that building, testing and fixing errors in Spark scripts can be a lengthy process (and a dull one, because it is not interactive). With Apache Zeppelin you can iteratively build and test portions of your script, which enhances your productivity significantly.</div>
<div class="section" id="installing-and-configuring-apache-zeppelin" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Installing and Configuring Apache Zeppelin</h2>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Ensure the following prerequisites are installed:</div>
<ul class="simple" style="box-sizing: border-box; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li style="box-sizing: border-box;">Java 8: <code class="inlinecode shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">su -c "yum install java-1.8.0-openjdk-devel"</code></li>
<li style="box-sizing: border-box;">Maven 3.1.x+: <code class="inlinecode shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">sudo yum install apache-maven</code> and then link it <code class="inlinecode shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">sudo ln -s /usr/share/apache-maven/bin/mvn /usr/bin/mvn</code>. If this does not work for you, you can install it the following way.
<pre class="brush:bash">wget http://www.eu.apache.org/dist/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
sudo tar -zxf apache-maven-3.3.3-bin.tar.gz -C /usr/local/
sudo ln -s /usr/local/apache-maven-3.3.3/bin/mvn /usr/local/bin/mvn
</pre>
</li>
<li style="box-sizing: border-box;"><div class="first" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Git: <code class="inlinecode shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">sudo yum install git</code></div>
</li>
<li style="box-sizing: border-box;"><div class="first" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
NPM: <code class="inlinecode shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">yum install nodejs npm</code></div>
</li>
<li style="box-sizing: border-box;"><div class="first" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Either download the source code from <a class="reference external" href="https://zeppelin.incubator.apache.org/download.html" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">here</a> or clone the git repository in a folder as <code class="inlinecode shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">git clone https://github.com/apache/incubator-zeppelin.git</code></div>
</li>
<li style="box-sizing: border-box;"><div class="first" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Build from source: go to the incubator-zeppelin directory and run the following command from it.</div>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
<code class="inlinecode shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">mvn clean package -Pspark-1.5 -Ppyspark -Dhadoop.version<span class="o" style="box-sizing: border-box;">=</span>2.6.0-cdh5.5.0 -Phadoop-2.6 -Dmaven.test.skip<span class="o" style="box-sizing: border-box;">=</span><span class="nb" style="box-sizing: border-box;">true</span></code></div>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
This command works for version 5.5 of the Cloudera distribution; make sure your versions of Hadoop and Spark are correct. In addition to installing support for Spark, this command will configure Zeppelin with support for PySpark as well.</div>
</li>
<li style="box-sizing: border-box;"><div class="first" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
To configure access to the Hive metastore, copy hive-site.xml to the conf directory under zeppelin.</div>
</li>
<li style="box-sizing: border-box;"><div class="first" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
In the conf folder create copies of files zeppelin-env.sh.template and zeppelin-site.xml.template as zeppelin-env.sh and zeppelin-site.xml respectively.</div>
</li>
<li style="box-sizing: border-box;"><div class="first" style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
If you would like to change the port for zeppelin, change the following property in zeppelin-site.xml.</div>
<pre class="brush:xml"><property>
<name>zeppelin.server.port</name>
<value>8999</value>
<description>Server port.</description>
</property>
</pre>
</li>
<li style="box-sizing: border-box;">To start zeppelin use the command <code class="inlinecode shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">./zeppelin-daemon.sh start</code>. Then you can access zeppelin ui at <a class="reference external" href="http://localhost:8999/" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">http://localhost:8999</a> <a class="footnote-reference" href="http://ramannanda.blogspot.com/2016/06/apache-zeppelin-notebooks.html#id2" id="id1" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">[1]</a></li>
<li style="box-sizing: border-box;">To stop zeppelin use the command <code class="inlinecode shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">./zeppelin-daemon.sh stop</code></li>
</ul>
</div>
<h2 style="box-sizing: border-box; color: #333332; font-family: sans-serif; margin: 0.83em 0px;">
Running Spark SQL queries against Hive and Visualizing Results</h2>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
In a Zeppelin cell, type %hive to activate the interpreter with HiveQL support. You can then run your query, and visualization support is automatically activated in the output. To execute the cell, press <code style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">Shift+Enter</code>.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://3.bp.blogspot.com/qxdfm83Xb0YWJqgwZ_W7veo2uKWcHOaCIHksKwvM-v0bAHNLNevJ_QLeZ7Kz0cjunv31FmaZnumpRckrW0NsPt8B-v47tFWA7Ccdhv9nyESnthwEmI7W5yHURO_2_prLBZa3BN7uFRBivxuikFaGHeOesv-G1KBkdu85Ymj_xKjdXnZ7sX5fqdztPXvHxPc13T0MiMHJO4qU2XkaO8rmnv3P8x4PxxtPWwfn2q0Tiz-KUeIbWv7Q5zMPomZPMcZgy7k2moyLznm1V6H6qIg7KDzG49oQOMzslejz4MZhPs0qoEMYhj-RLHnuLByaAPvvD-OAZ_VxXjy4R4nub0harD7RzzfINJuF9Bybsv-8up7VN5Nlshw0iKLFokoNNJzllRG6dbkbau1kjQw4JuG3rv5BQbzP-OlxZmcfJ9i10ebBahb_HqVyiEojL7wBopNh146RIVhzIHtYzrDXyv3R0EQX_C7o2yzQarkdllwgejHQgtKz1Qkp9HVz-dlcn8-Ud83ZdtlreuogBAdH5b1bFLz5U5kFgZrCiw6UCSLg_Ja-VVMAVcZV7y8zRNiy_WygEirA3Olk70zsTfXGACtl_CMW-JDEf8s=w2863-h1098-no" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="244" src="https://3.bp.blogspot.com/qxdfm83Xb0YWJqgwZ_W7veo2uKWcHOaCIHksKwvM-v0bAHNLNevJ_QLeZ7Kz0cjunv31FmaZnumpRckrW0NsPt8B-v47tFWA7Ccdhv9nyESnthwEmI7W5yHURO_2_prLBZa3BN7uFRBivxuikFaGHeOesv-G1KBkdu85Ymj_xKjdXnZ7sX5fqdztPXvHxPc13T0MiMHJO4qU2XkaO8rmnv3P8x4PxxtPWwfn2q0Tiz-KUeIbWv7Q5zMPomZPMcZgy7k2moyLznm1V6H6qIg7KDzG49oQOMzslejz4MZhPs0qoEMYhj-RLHnuLByaAPvvD-OAZ_VxXjy4R4nub0harD7RzzfINJuF9Bybsv-8up7VN5Nlshw0iKLFokoNNJzllRG6dbkbau1kjQw4JuG3rv5BQbzP-OlxZmcfJ9i10ebBahb_HqVyiEojL7wBopNh146RIVhzIHtYzrDXyv3R0EQX_C7o2yzQarkdllwgejHQgtKz1Qkp9HVz-dlcn8-Ud83ZdtlreuogBAdH5b1bFLz5U5kFgZrCiw6UCSLg_Ja-VVMAVcZV7y8zRNiy_WygEirA3Olk70zsTfXGACtl_CMW-JDEf8s=w2863-h1098-no" width="640" /></a></div>
<h2 style="box-sizing: border-box; color: #333332; font-family: sans-serif; margin: 0.83em 0px;">
Building Scala scripts and plotting model outputs</h2>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
You can also code in Scala or Python by activating the corresponding interpreter. The Scala (Spark) interpreter is active by default for a cell.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://3.bp.blogspot.com/kAkFkIFxtVItDZsyKwitwEB9bZkXFtefEQ31FcEtB_suh9eHBrqG6Od_D5Ckiv0hiR8iv-2IPBZkI_DxU1Zsh_FpzyeFMSpQDh7JDu0nYNk_AL7DkztL-HWKtrRNwBhYTDPTNOFfVFNWAqnuET5Qq-1Ag__qkkX1ge3UQI3VcRJXD3txBh3S1GO02kIDYpQra2jUOY66WDWCbzTKQHKu1VBW_70mjPtWW0bS-c4A-IMIdH9GcecFQ0dRxWfT6ULbPFb8Q5UWpbDQJl_u97oESLM5fgaSFrDzZKE5LDn0BMXaIOVxLzH55MSuVv5YFM9V5J_xt9vW5lSA4kNOUvt6Xgfwxh_rAeryUR91my_43sAu9ML6jEiIASdFiK5CEn_yTMrOptUnmKL6wtJRdIEIRxjOokUU13xp1W_L0-oyvT3M20Zh9rs1o5BWj6F408hoO4YQMc52f8NdW0KwTd_IO9yCCsGNavu4fhrDSHfoKB_Pl6ZNJGzkjAVDRFG9TPg7XG7HDJE3goIljLn4lzl403pjvSI5-flrTDKlCc5YlusC0TR3eMhqyUSd7pBH7BcDg8xqfmiZRuqOf8YRsZMvTaR-RFfXs44=w3113-h568-no" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="116" src="https://3.bp.blogspot.com/kAkFkIFxtVItDZsyKwitwEB9bZkXFtefEQ31FcEtB_suh9eHBrqG6Od_D5Ckiv0hiR8iv-2IPBZkI_DxU1Zsh_FpzyeFMSpQDh7JDu0nYNk_AL7DkztL-HWKtrRNwBhYTDPTNOFfVFNWAqnuET5Qq-1Ag__qkkX1ge3UQI3VcRJXD3txBh3S1GO02kIDYpQra2jUOY66WDWCbzTKQHKu1VBW_70mjPtWW0bS-c4A-IMIdH9GcecFQ0dRxWfT6ULbPFb8Q5UWpbDQJl_u97oESLM5fgaSFrDzZKE5LDn0BMXaIOVxLzH55MSuVv5YFM9V5J_xt9vW5lSA4kNOUvt6Xgfwxh_rAeryUR91my_43sAu9ML6jEiIASdFiK5CEn_yTMrOptUnmKL6wtJRdIEIRxjOokUU13xp1W_L0-oyvT3M20Zh9rs1o5BWj6F408hoO4YQMc52f8NdW0KwTd_IO9yCCsGNavu4fhrDSHfoKB_Pl6ZNJGzkjAVDRFG9TPg7XG7HDJE3goIljLn4lzl403pjvSI5-flrTDKlCc5YlusC0TR3eMhqyUSd7pBH7BcDg8xqfmiZRuqOf8YRsZMvTaR-RFfXs44=w3113-h568-no" width="640" /></a></div>
<div class="section" id="bulding-scala-scripts-and-plotting-model-outputs" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
To <span style="box-sizing: border-box; font-weight: 700;">visualize</span> a Spark dataframe, just use the <code class="inlinecode shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">z.show<span class="o" style="box-sizing: border-box;">(</span>df<span class="o" style="box-sizing: border-box;">)</span></code> command.</div>
</div>
<div class="section" id="writing-documentation" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Writing documentation</h2>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Activate Markdown support in a cell by using <code class="inlinecode shell" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;">%md</code>. You can then add documentation along with your code. Unfortunately, support for LaTeX is still not there, but it should arrive in future releases.</div>
</div>
<div class="section" id="what-s-missing" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
What's missing?</h2>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Unlike IPython notebooks, there is no option to export to HTML or PDF (using LaTeX). Also, support for embedding LaTeX expressions is missing, but these features should be added in future releases.</div>
</div>
<div class="section" id="conclusion" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Conclusion</h2>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Although certain features are missing, <a class="reference external" href="https://zeppelin.incubator.apache.org/" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">Apache Zeppelin</a> surely helps you increase your productivity by reducing the time required for the build, test and fix cycle. It also provides nice visualization capabilities for your queries and dataframes.</div>
<table class="docutils footnote" frame="void" id="id2" rules="none" style="border-collapse: collapse; border-spacing: 0px; box-sizing: border-box;"><colgroup style="box-sizing: border-box;"><col class="label" style="box-sizing: border-box;"></col><col style="box-sizing: border-box;"></col></colgroup><tbody style="box-sizing: border-box;" valign="top">
<tr style="box-sizing: border-box;"><td class="label" style="box-sizing: border-box;"><a class="fn-backref" href="http://ramannanda.blogspot.com/2016/06/apache-zeppelin-notebooks.html#id1" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">[1]</a></td><td style="box-sizing: border-box;">If you changed the port.</td></tr>
</tbody></table>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/03385350305653681268noreply@blogger.comtag:blogger.com,1999:blog-7351084055463323761.post-33230512224767164172016-06-26T21:39:00.000+05:302016-06-26T22:00:34.678+05:30Machine learning with Apache Spark, Scala and Hive<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
<span style="box-sizing: border-box; font-weight: 700;">Apache Spark</span> has an advanced <span style="box-sizing: border-box; font-weight: 700;">DAG execution engine</span> and supports <span style="box-sizing: border-box; font-weight: 700;">in-memory computation</span>. In-memory computation combined with DAG execution leads to far better performance than running MapReduce jobs. In this post, I will show an example of using linear regression with Apache Spark. The dataset is the NYC Yellow taxi dataset for a particular month in 2015. The data was filtered to extract the records for a single day.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
This example uses <tt class="docutils literal" style="box-sizing: border-box;">HiveContext</tt> <a class="footnote-reference" href="http://ramannanda.blogspot.com/2016/06/machine-learning-with-apache-spark.html#id4" id="id1" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">[1]</a> which is an instance of Spark SQL execution engine that integrates with Hive data store. The dataset has the following features.</div>
<table border="1" class="docutils" style="border-collapse: collapse; border-spacing: 0px; box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;"><colgroup style="box-sizing: border-box;"><col style="box-sizing: border-box;" width="75%"></col><col style="box-sizing: border-box;" width="25%"></col></colgroup><thead style="box-sizing: border-box;" valign="bottom">
<tr style="box-sizing: border-box;"><th class="head" style="box-sizing: border-box;">Feature Name</th><th class="head" style="box-sizing: border-box;">Feature Data Type</th></tr>
</thead><tbody style="box-sizing: border-box;" valign="top">
<tr style="box-sizing: border-box;"><td style="box-sizing: border-box;">trip_distance</td><td style="box-sizing: border-box;">Double</td></tr>
<tr style="box-sizing: border-box;"><td style="box-sizing: border-box;">duration (journey_end_time-journey_start_time)</td><td style="box-sizing: border-box;">Double</td></tr>
<tr style="box-sizing: border-box;"><td style="box-sizing: border-box;">store_and_forward_flag (categorical, requires conversion)</td><td style="box-sizing: border-box;">String "Y/N"</td></tr>
<tr style="box-sizing: border-box;"><td style="box-sizing: border-box;">ratecodeid (categorical, requires conversion)</td><td style="box-sizing: border-box;">Int</td></tr>
<tr style="box-sizing: border-box;"><td style="box-sizing: border-box;">start_hour</td><td style="box-sizing: border-box;">Int</td></tr>
<tr style="box-sizing: border-box;"><td style="box-sizing: border-box;">start_minute</td><td style="box-sizing: border-box;">Int</td></tr>
<tr style="box-sizing: border-box;"><td style="box-sizing: border-box;">start_second</td><td style="box-sizing: border-box;">Int</td></tr>
<tr style="box-sizing: border-box;"><td style="box-sizing: border-box;">fare_amount(target variable)</td><td style="box-sizing: border-box;">Double</td></tr>
</tbody></table>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
We want to predict the fare_amount given the set of features. As the fare is a continuous variable, predicting it requires a regression model.</div>
<div class="section" id="things-to-consider" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Things to consider:</h2>
<ul class="simple" style="box-sizing: border-box; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li style="box-sizing: border-box;">To load the data into a dataframe, we must first query the Hive store using the <tt class="docutils literal" style="box-sizing: border-box;">hiveCtxt.sql()</tt> method. We can drop invalid records using <tt class="docutils literal" style="box-sizing: border-box;">na.drop()</tt> <a class="footnote-reference" href="http://ramannanda.blogspot.com/2016/06/machine-learning-with-apache-spark.html#id5" id="id2" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">[2]</a> on the obtained dataframe and then cache it using the <tt class="docutils literal" style="box-sizing: border-box;">cache()</tt> method for later use.</li>
<li style="box-sizing: border-box;">The two categorical variables need to be converted to a vector representation. This is done by using <tt class="docutils literal" style="box-sizing: border-box;">StringIndexer</tt> and <tt class="docutils literal" style="box-sizing: border-box;">OneHotEncoder</tt>. Look at the method <tt class="docutils literal" style="box-sizing: border-box;">preprocessFeatures()</tt> in the code below.</li>
<li style="box-sizing: border-box;">Models can be saved by serializing them with <tt class="docutils literal" style="box-sizing: border-box;">sc.parallelize(Seq(model),<span class="pre" style="box-sizing: border-box;">1).saveAsObjectFile("nycyellow.model")</span></tt> and loaded again by deserializing them with <tt class="docutils literal" style="box-sizing: border-box;">sc.objectFile[CrossValidatorModel]("nycyellow.model").first()</tt>. Newer Spark APIs support saving and loading models out of the box, and using those methods is recommended.</li>
<li style="box-sizing: border-box;">Data can be split into training and testing data by using <tt class="docutils literal" style="box-sizing: border-box;">randomSplit()</tt> method on the DataFrame. Although if you are using cross validation, it is recommended to train the model on the entire sample dataset.</li>
<li style="box-sizing: border-box;">The features in the dataframe must be transformed using <tt class="docutils literal" style="box-sizing: border-box;">VectorAssembler</tt> into a vector representation, and the resulting column should be named <span style="box-sizing: border-box; font-weight: 700;">features</span>. The target variable should be renamed to <span style="box-sizing: border-box; font-weight: 700;">label</span>; you can use the <tt class="docutils literal" style="box-sizing: border-box;">withColumnRenamed()</tt> function to do so.</li>
<li style="box-sizing: border-box;">Cross validation can be performed using <tt class="docutils literal" style="box-sizing: border-box;">CrossValidator</tt>, which produces a CrossValidatorModel when fitted; the estimator can be set with <tt class="docutils literal" style="box-sizing: border-box;">setEstimator()</tt>.</li>
<li style="box-sizing: border-box;">The evaluator chosen depends on whether you are doing classification or regression. In this case, we would use <tt class="docutils literal" style="box-sizing: border-box;">RegressionEvaluator</tt>.</li>
<li style="box-sizing: border-box;">You can specify different values for parameters such as the regularization parameter and the number of iterations; CrossValidator uses them to come up with the best set of parameters for your model.</li>
<li style="box-sizing: border-box;">After this, you can fit the model on the dataset and evaluate its performance. As we are testing the accuracy of a regression model, we can use <tt class="docutils literal" style="box-sizing: border-box;">RegressionMetrics</tt> to compare predicted_fare with actual_fare. The measures that can be used include <a class="reference external" href="https://en.wikipedia.org/wiki/Coefficient_of_determination" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">R-Squared</a> (r2) and <a class="reference external" href="https://en.wikipedia.org/wiki/Mean_absolute_error" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">Mean Absolute Error</a>.</li>
<li style="box-sizing: border-box;">For new predictions the saved model can be reused. The new data needs to be transformed into the same format that was used to train the model. To do so, we must first create a dataframe using <tt class="docutils literal" style="box-sizing: border-box;">StructType</tt> to specify its structure, then preprocess the features the same way by invoking the <tt class="docutils literal" style="box-sizing: border-box;">preprocessFeatures()</tt> method.</li>
<li style="box-sizing: border-box;">The data can be visualized using Apache Zeppelin <a class="footnote-reference" href="http://ramannanda.blogspot.com/2016/06/machine-learning-with-apache-spark.html#id6" id="id3" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">[3]</a>.</li>
</ul>
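<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
To make the two accuracy measures named in the list above concrete, here is a small plain-Scala sketch of how they are defined. It does not use Spark and is not how <tt class="docutils literal" style="box-sizing: border-box;">RegressionMetrics</tt> is implemented; the object name <tt class="docutils literal" style="box-sizing: border-box;">RegressionMeasures</tt> and the sample values in the note that follows are made up for illustration.</div>
<pre class="brush:scala;">object RegressionMeasures {
  // Mean absolute error: the average of |actual - predicted|.
  def meanAbsoluteError(actual: Seq[Double], predicted: Seq[Double]): Double =
    actual.zip(predicted).map { case (a, p) => math.abs(a - p) }.sum / actual.size

  // R-squared: 1 - (residual sum of squares / total sum of squares).
  def r2(actual: Seq[Double], predicted: Seq[Double]): Double = {
    val mean = actual.sum / actual.size
    val ssRes = actual.zip(predicted).map { case (a, p) => (a - p) * (a - p) }.sum
    val ssTot = actual.map(a => (a - mean) * (a - mean)).sum
    1.0 - ssRes / ssTot
  }
}
</pre>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
For actual fares of 10, 20 and 30 and predicted fares of 12, 18 and 30, the mean absolute error is 4/3 (about 1.33) and R-squared is 1 - 8/200 = 0.96. An R-squared close to 1 means the model explains most of the variance in the fares.</div>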
</div>
<div class="section" id="the-code" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
The code.</h2>
<div>
<br /></div>
</div>
</div>
<pre class="brush:scala;">import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.DataFrame
import org.apache.spark.ml.PipelineStage
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.OneHotEncoder
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.{ Vector, Vectors }
import org.apache.spark.sql.Row;
import org.apache.spark.ml.tuning.CrossValidator
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.ml.tuning.ParamGridBuilder
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.types.DoubleType
import org.apache.spark.sql.types.StructType
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.PipelineModel
import org.apache.hadoop.mapred.InvalidInputException
import org.apache.spark.ml.regression.LinearRegressionModel
import org.apache.spark.ml.tuning.CrossValidatorModel
import scala.collection.mutable.ListBuffer
import edu.nyu.realtimebd.analytics.nyctaxi.domain.NYCDomain.NYCParams
import org.apache.spark.sql.types.IntegerType
/*
*@Author Ramandeep Singh
*/
object Analytics {
val sparkConf = new SparkConf().setAppName("NYC-TAXI-ANALYSIS").setMaster("local")
val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
val hiveCtxt = new HiveContext(sc)
var df: DataFrame = _
def initializeDataFrame(query: String): DataFrame = {
//cache the dataframe
if (df == null) {
df = hiveCtxt.sql(query).na.drop().cache()
}
return df
}
def preprocessFeatures(df: DataFrame): DataFrame = {
val stringColumns = Array("store_and_fwd_flag", "ratecodeid")
var indexModel: PipelineModel = null;
var oneHotModel: PipelineModel = null;
try {
indexModel = sc.objectFile[PipelineModel]("nycyellow.model.indexModel").first()
} catch {
case e: InvalidInputException => println()
}
if (indexModel == null) {
val stringIndexTransformer: Array[PipelineStage] = stringColumns.map(
cname => new StringIndexer().setInputCol(cname).setOutputCol(s"${cname}_index"))
val indexedPipeline = new Pipeline().setStages(stringIndexTransformer)
indexModel = indexedPipeline.fit(df)
sc.parallelize(Seq(indexModel), 1).saveAsObjectFile("nycyellow.model.indexModel")
}
var df_indexed = indexModel.transform(df)
stringColumns.foreach { x => df_indexed = df_indexed.drop(x) }
val indexedColumns = df_indexed.columns.filter(colName => colName.contains("_index"))
val oneHotEncodedColumns = indexedColumns
try {
oneHotModel = sc.objectFile[PipelineModel]("nycyellow.model.onehot").first()
} catch {
case e: InvalidInputException => println()
}
if (oneHotModel == null) {
val oneHotTransformer: Array[PipelineStage] = oneHotEncodedColumns.map { cname =>
new OneHotEncoder().
setInputCol(cname).setOutputCol(s"${cname}_vect")
}
val oneHotPipeline = new Pipeline().setStages(oneHotTransformer)
oneHotModel = oneHotPipeline.fit(df_indexed)
sc.parallelize(Seq(oneHotModel), 1).saveAsObjectFile("nycyellow.model.onehot")
}
df_indexed = oneHotModel.transform(df_indexed)
indexedColumns.foreach { colName => df_indexed = df_indexed.drop(colName) }
df_indexed
}
def buildPriceAnalysisModel(query: String) {
initializeDataFrame(query)
var df_indexed = preprocessFeatures(df)
df_indexed.columns.foreach(x => println("Preprocessed Columns Model Training" + x))
val df_splitData: Array[DataFrame] = df_indexed.randomSplit(Array(0.7, 0.3), 11L)
val trainData = df_splitData(0)
val testData = df_splitData(1)
//drop target variable
val testData_x = testData.drop("fare_amount")
val testData_y = testData.select("fare_amount")
val columnsToTransform = trainData.drop("fare_amount").columns
//Make feature vector
val vectorAssembler = new VectorAssembler().
setInputCols(columnsToTransform).setOutputCol("features")
columnsToTransform.foreach { x => println(x) }
val trainDataTemp = vectorAssembler.transform(trainData).withColumnRenamed("fare_amount", "label")
val testDataTemp = vectorAssembler.transform(testData_x)
val trainDataFin = trainDataTemp.select("features", "label")
val testDataFin = testDataTemp.select("features")
val linearRegression = new LinearRegression()
trainDataFin.columns.foreach(x => println("Final Column =>" + x))
trainDataFin.take(1)
//Params for tuning the model.
val paramGridMap = new ParamGridBuilder()
.addGrid(linearRegression.maxIter, Array(10, 100, 1000))
.addGrid(linearRegression.regParam, Array(0.1, 0.01, 0.001, 1, 10)).build()
//5 fold cross validation
val cv = new CrossValidator().setEstimator(linearRegression).
setEvaluator(new RegressionEvaluator()).setEstimatorParamMaps(paramGridMap).setNumFolds(5)
//Fit the model
val model = cv.fit(trainDataFin)
val modelResult = model.transform(testDataFin)
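//zip requires matching partitions; both RDDs derive from testData through narrow, order-preserving transformations
val predictionAndLabels = modelResult.map(r => r.getAs[Double]("prediction")).zip(testData_y.map(r => r.getAs[Double](0)))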
val regressionMetrics = new RegressionMetrics(predictionAndLabels)
//Print the results
println(s"R-Squared= ${regressionMetrics.r2}")
println(s"Explained Variance=${regressionMetrics.explainedVariance}")
println(s"MAE= ${regressionMetrics.meanAbsoluteError}")
val lrModel = model.bestModel.asInstanceOf[LinearRegressionModel]
println(lrModel.explainParams())
println(lrModel.weights)
sc.parallelize(Seq(model), 1).saveAsObjectFile("nycyellow.model")
}
def predictFare(list: ListBuffer[NYCParams]): DataFrame = {
var nycModel: CrossValidatorModel = null;
try {
nycModel = sc.objectFile[CrossValidatorModel]("nycyellow.model").first()
} catch {
case e: InvalidInputException => println()
}
if (nycModel == null) {
buildPriceAnalysisModel("""select
trip_distance,
(cast(journey_end_time as double)-cast(journey_start_time as double)) as duration,
store_and_fwd_flag,
ratecodeid,
hour(journey_start_time) as start_hour,
minute(journey_start_time) as start_minute,
second(journey_start_time) as start_second,
fare_amount from nyc_taxi_data_limited
where start_latitude <> 0 and trip_distance >0
and journey_end_time>journey_start_time and
trip_distance <200 and fare_amount>1 limit 12000""")
}
nycModel = sc.objectFile[CrossValidatorModel]("nycyellow.model").first()
var schema = StructType(Array(
StructField("trip_distance", DoubleType, true),
StructField("duration", DoubleType, true),
StructField("store_and_fwd_flag", StringType, true),
StructField("ratecodeid", DoubleType, true),
StructField("start_hour", IntegerType, true),
StructField("start_minute", IntegerType, true),
StructField("start_second", IntegerType, true)))
var rows: ListBuffer[Row] = new ListBuffer
list.foreach(x => rows += Row(x.trip_distance, x.duration, x.store_and_fwd_flag, x.ratecodeid, x.start_hour, x.start_minute, x.start_second))
val row = sc.parallelize(rows)
var dfStructure = sqlContext.createDataFrame(row, schema)
var preprocessed = preprocessFeatures(dfStructure)
preprocessed.columns.foreach(x => println("Preprocessed Columns " + x))
val vectorAssembler = new VectorAssembler().
setInputCols(preprocessed.columns).setOutputCol("features")
preprocessed = vectorAssembler.transform(preprocessed)
var results = nycModel.transform(preprocessed.select("features"))
results
}
}
</pre>
<h2 style="box-sizing: border-box; color: #333332; font-family: sans-serif; margin: 0.83em 0px;">
Results</h2>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
After training, the model gave the following results against the test data set.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
R-Squared= 0.954496421456682</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
MAE= 1.1704343793855545</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
To predict the fares for our own inputs we can invoke the <tt class="docutils literal" style="box-sizing: border-box;">predictFare()</tt> method. Example code to do so is shown below.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
<br /></div>
</div>
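<pre class="brush:scala">import scala.collection.mutable.ListBuffer
import edu.nyu.realtimebd.analytics.nyctaxi.domain.NYCDomain.NYCParams
//An object rather than a class, so that main is a runnable entry point.
object TestAnalytics {
def main(args: Array[String]) {
var testAnalytics = Analytics
val testData = new ListBuffer[NYCParams]()
//trip_distance, duration (seconds), store_and_fwd_flag, ratecodeid, start_hour, start_minute, start_second
testData += NYCParams(10.6, 600.0, "N", 1.0, 10, 2, 33)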
var result = testAnalytics.predictFare(testData)
result.describe().show()
}
}
</pre>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
After the initial invocation all the models are stored in the directory from which the execution is carried out.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
For the sample request above the result is shown below.</div>
<blockquote style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin: 1em 40px;">
<table border="1" class="docutils" style="border-collapse: collapse; border-spacing: 0px; box-sizing: border-box;"><colgroup style="box-sizing: border-box;"><col style="box-sizing: border-box;" width="28%"></col><col style="box-sizing: border-box;" width="72%"></col></colgroup><tbody style="box-sizing: border-box;" valign="top">
<tr style="box-sizing: border-box;"><td style="box-sizing: border-box;">summary</td><td style="box-sizing: border-box;">prediction</td></tr>
<tr style="box-sizing: border-box;"><td style="box-sizing: border-box;">count</td><td style="box-sizing: border-box;">1</td></tr>
<tr style="box-sizing: border-box;"><td style="box-sizing: border-box;">mean</td><td style="box-sizing: border-box;">31.146162583102516</td></tr>
<tr style="box-sizing: border-box;"><td style="box-sizing: border-box;">stddev</td><td style="box-sizing: border-box;">0.0</td></tr>
<tr style="box-sizing: border-box;"><td style="box-sizing: border-box;">min</td><td style="box-sizing: border-box;">31.146162583102516</td></tr>
<tr style="box-sizing: border-box;"><td style="box-sizing: border-box;">max</td><td style="box-sizing: border-box;">31.146162583102516</td></tr>
</tbody></table>
</blockquote>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
This prediction shows that a 10.6 mile journey in an NYC yellow taxi, if covered in 10 minutes, would cost roughly 31 dollars.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://1.bp.blogspot.com/SWQnnD8WpnTFL6JCIRiNQwpMKLOz4tNEm9lyhF1drSVFJvedBEqZp4xW7MN1jQCBhs_ZnAPPfUy3ybQ0zQmXT5-TlCmtBXupuTnvCxSNowaBlJu86X6UqYiW9LiMd8iFrVnPQ-igCsvOMfIz8z5JGYarSlu2s-sVQ6YqdUwXm_EbvdkAnWANjzihLRy-lYv3yELhv_IlqONBcTF62HeZW52QFwZcUEU1GLM0Fx6LhvU-91uLqYqRyReB8DWo79txfPyHXPBxce47XBYCCwpLFsN-ai4z_lIb6rjkNesmD8EUqh6Ickz2ILPtrUd2-n8lbmkD4ayUqT2gJMjbgqzR2c0xKMBkANAOvhJzbwE05UyWY_Cmnl0wqMqanObgsiya3MLMgS1sgG1FlMcozfF92-aubYrRLRx43G4Lkf_k6RoQ71qZvv9Z_hqKlKmG_IGFm8lX1ifKHZyc0aL1qwy8oJvZyi5-YBo9Dh--8z5c7jIkA-JgefVmsBaLjFEGkOnS1fpB7Wo7u8CFzIUxZqg9nrlBeDwa13IcsjFSXuntUcn5AnJWpJ3bHTlCOmR1-fMFtUB7ANLnkctA3DwYap94jhrjZsaFzIc=w1740-h606-no" imageanchor="1"><img alt="prediction_vs_actual_plotted" border="0" height="220" src="https://1.bp.blogspot.com/SWQnnD8WpnTFL6JCIRiNQwpMKLOz4tNEm9lyhF1drSVFJvedBEqZp4xW7MN1jQCBhs_ZnAPPfUy3ybQ0zQmXT5-TlCmtBXupuTnvCxSNowaBlJu86X6UqYiW9LiMd8iFrVnPQ-igCsvOMfIz8z5JGYarSlu2s-sVQ6YqdUwXm_EbvdkAnWANjzihLRy-lYv3yELhv_IlqONBcTF62HeZW52QFwZcUEU1GLM0Fx6LhvU-91uLqYqRyReB8DWo79txfPyHXPBxce47XBYCCwpLFsN-ai4z_lIb6rjkNesmD8EUqh6Ickz2ILPtrUd2-n8lbmkD4ayUqT2gJMjbgqzR2c0xKMBkANAOvhJzbwE05UyWY_Cmnl0wqMqanObgsiya3MLMgS1sgG1FlMcozfF92-aubYrRLRx43G4Lkf_k6RoQ71qZvv9Z_hqKlKmG_IGFm8lX1ifKHZyc0aL1qwy8oJvZyi5-YBo9Dh--8z5c7jIkA-JgefVmsBaLjFEGkOnS1fpB7Wo7u8CFzIUxZqg9nrlBeDwa13IcsjFSXuntUcn5AnJWpJ3bHTlCOmR1-fMFtUB7ANLnkctA3DwYap94jhrjZsaFzIc=w1740-h606-no" title="prediction_results_plotted" width="640" /></a></div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
This code is part of a project that I did. To browse the entire repository and access the dataset on GitHub, click <a class="reference external" href="https://github.com/ramannanda9/RT-UBER-NYC-TAXI" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">here</a>.</div>
<table class="docutils footnote" frame="void" id="id4" rules="none" style="border-collapse: collapse; border-spacing: 0px; box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;"><colgroup style="box-sizing: border-box;"><col class="label" style="box-sizing: border-box;"></col><col style="box-sizing: border-box;"></col></colgroup><tbody style="box-sizing: border-box;" valign="top">
<tr style="box-sizing: border-box;"><td class="label" style="box-sizing: border-box;"><a class="fn-backref" href="http://ramannanda.blogspot.com/2016/06/machine-learning-with-apache-spark.html#id1" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">[1]</a></td><td style="box-sizing: border-box;">To use hive, hive-site.xml must be placed in spark/conf directory.</td></tr>
</tbody></table>
<table class="docutils footnote" frame="void" id="id5" rules="none" style="border-collapse: collapse; border-spacing: 0px; box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;"><colgroup style="box-sizing: border-box;"><col class="label" style="box-sizing: border-box;"></col><col style="box-sizing: border-box;"></col></colgroup><tbody style="box-sizing: border-box;" valign="top">
<tr style="box-sizing: border-box;"><td class="label" style="box-sizing: border-box;"><a class="fn-backref" href="http://ramannanda.blogspot.com/2016/06/machine-learning-with-apache-spark.html#id2" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">[2]</a></td><td style="box-sizing: border-box;">Null columns are considered invalid records by ML models.</td></tr>
</tbody></table>
<table class="docutils footnote" frame="void" id="id6" rules="none" style="border-collapse: collapse; border-spacing: 0px; box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;"><colgroup style="box-sizing: border-box;"><col class="label" style="box-sizing: border-box;"></col><col style="box-sizing: border-box;"></col></colgroup><tbody style="box-sizing: border-box;" valign="top">
<tr style="box-sizing: border-box;"><td class="label" style="box-sizing: border-box;"><a class="fn-backref" href="http://ramannanda.blogspot.com/2016/06/machine-learning-with-apache-spark.html#id3" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">[3]</a></td><td style="box-sizing: border-box;">This will be covered in a future post.</td></tr>
</tbody></table>
</div>
Migrating to Google Sign-In with Android (published 2016-06-26)<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
Google recently deprecated <span style="box-sizing: border-box; font-weight: 700;">Google+ Sign in</span> and the process of obtaining OAuth access tokens via the <tt class="docutils literal" style="box-sizing: border-box;">GoogleAuthUtil.getToken</tt> API. They now recommend a single entry point through the new Google Sign-In API. The major reasons for doing so are that it (1) enhances the user experience and (2) improves security; more details are <a class="reference external" href="http://android-developers.blogspot.in/2016/05/improving-security-and-user-experience.html" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">here</a>. Also, starting with Android 6.0, the <span style="box-sizing: border-box; font-weight: 700;">GET_ACCOUNTS</span> permission has to be requested at runtime; implementing this API eliminates the need for that permission.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
The really exciting feature is the new <tt class="docutils literal" style="box-sizing: border-box;">silentSignIn</tt> API, which allows cross-device silent sign-in: if a user has already signed in to your application on another platform, they won't be shown the sign-in prompt, provided that the requested scopes are the same. This improves the user experience. In addition, you don't have to use the <tt class="docutils literal" style="box-sizing: border-box;">GoogleAuthUtil.getToken</tt> API to obtain the tokens, as they are granted on the initial sign-in.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
So if you have an Android application in which you previously implemented <span style="box-sizing: border-box; font-weight: 700;">Google+</span> sign in and used other Google Plus features, and you want to migrate it to the new Google Sign-In implementation, this post explains how to do so. The code differs slightly depending on whether you automate the lifecycle of <span style="box-sizing: border-box; font-weight: 700;">GoogleApiClient</span> (use <tt class="docutils literal" style="box-sizing: border-box;">enableAutoManage</tt>; this approach is recommended as it avoids boilerplate code) or manage the lifecycle of <span style="box-sizing: border-box; font-weight: 700;">GoogleApiClient</span> yourself by implementing the <tt class="docutils literal" style="box-sizing: border-box;">ConnectionCallbacks</tt> interface. As the latter approach requires a bit more code, I will explain the process using it.</div>
<div class="section" id="what-needs-to-be-changed" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
What needs to be changed</h2>
<ul class="simple" style="box-sizing: border-box; margin: 1em 0px; padding: 0px 0px 0px 40px;">
<li style="box-sizing: border-box;">Replace <tt class="docutils literal" style="box-sizing: border-box;">mGoogleApiClient.connect()</tt> with <tt class="docutils literal" style="box-sizing: border-box;">mGoogleApiClient.connect(GoogleApiClient.SIGN_IN_MODE_OPTIONAL)</tt>. This is required to allow the client to transition between authenticated and unauthenticated states and for use with <tt class="docutils literal" style="box-sizing: border-box;">GoogleSignInApi</tt>.</li>
<li style="box-sizing: border-box;">Build a GoogleSignInOptions instance. While building the instance, request the additional scopes via the requestScopes method (this is where you can request scopes such as <tt class="docutils literal" style="box-sizing: border-box;">SCOPE_PLUS_LOGIN</tt> and <tt class="docutils literal" style="box-sizing: border-box;">SCOPE_PLUS_PROFILE</tt>). Also, if you need to authenticate the user with your backend and want to obtain the authorization token to access the APIs from your backend, use the <tt class="docutils literal" style="box-sizing: border-box;">requestIdToken(serverToken)</tt> and <tt class="docutils literal" style="box-sizing: border-box;">requestServerAuthCode(serverToken)</tt> methods. Here, unlike Google Plus sign in, the serverToken is just the clientId of the web application.</li>
<li style="box-sizing: border-box;">Build the <tt class="docutils literal" style="box-sizing: border-box;">GoogleApiClient</tt> instance, using the addApi method to add <tt class="docutils literal" style="box-sizing: border-box;">Auth.GOOGLE_SIGN_IN_API</tt> and <tt class="docutils literal" style="box-sizing: border-box;">Plus.API</tt>.</li>
<li style="box-sizing: border-box;">In the onStart method, connect the client using <tt class="docutils literal" style="box-sizing: border-box;">mGoogleApiClient.connect(GoogleApiClient.SIGN_IN_MODE_OPTIONAL)</tt>, and in the onStop method disconnect the client. (You may also do this in the onResume and onPause methods.)</li>
<li style="box-sizing: border-box;">After the client is connected, first attempt the silentSignIn. If it fails with code <tt class="docutils literal" style="box-sizing: border-box;">SIGN_IN_REQUIRED</tt>, attempt a fresh sign in for the user.</li>
<li style="box-sizing: border-box;">After the sign in is completed, you can invoke the Plus.PeopleApi with the user's accountId to obtain the user's Google profile information.</li>
<li style="box-sizing: border-box;">To sign out the user, use the <tt class="docutils literal" style="box-sizing: border-box;">Auth.GoogleSignInApi.signOut</tt> method, and to revoke access use the <tt class="docutils literal" style="box-sizing: border-box;">Auth.GoogleSignInApi.revokeAccess</tt> method.</li>
<li style="box-sizing: border-box;">Remove the <tt class="docutils literal" style="box-sizing: border-box;">android.permission.GET_ACCOUNTS</tt> permission from android manifest.</li>
</ul>
</div>
<div class="section" id="here-s-the-relevant-code" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Here's the relevant code.</h2>
<div>
<br /></div>
<div>
<br /></div>
</div>
</div>
<pre class="brush:java;highlight:[]"> @Override
protected void onCreate(Bundle savedInstanceState) {
..
//Here unlike Google plus sign in the serverToken is just the clientId of the web application
GoogleSignInOptions gso = new GoogleSignInOptions.Builder(GoogleSignInOptions.DEFAULT_SIGN_IN)
.requestIdToken(serverToken).requestServerAuthCode(serverToken).
requestEmail().
requestScopes(Plus.SCOPE_PLUS_LOGIN, Plus.SCOPE_PLUS_PROFILE)
.build();
mGoogleApiClient = new GoogleApiClient.Builder(this)
.addConnectionCallbacks(this)
.addOnConnectionFailedListener(this).
addApi(Auth.GOOGLE_SIGN_IN_API, gso).
addApi(Plus.API)
.build();
}
protected void onResume() {
super.onResume();
//Here isConnected is just a flag that checks whether user is connected to internet.
/**To avoid execution of this block you can check whether user previously signedIn on this device by storing a userIdToken and checking whether user needs to be signedIn automatically or not. */
if (!mGoogleApiClient.isConnecting() && !mGoogleApiClient.isConnected() && isConnected) {
mGoogleApiClient.connect(GoogleApiClient.SIGN_IN_MODE_OPTIONAL);
} //Here isSignedIn is a boolean flag that tracks whether the user is signedIn or not.
else if(isConnected&&mGoogleApiClient.isConnected()&&!isSignedIn){
signInUsingNewAPI();
}
}
private void signInUsingNewAPI() {
if (!isSignedIn&&isConnected) {
attemptSilentSignIn();
}
}
private void attemptSilentSignIn(){
OptionalPendingResult<GoogleSignInResult> opr = Auth.GoogleSignInApi.silentSignIn(mGoogleApiClient);
if (opr.isDone()) {
// If the user's cached credentials are valid, the OptionalPendingResult will be "done"
// and the GoogleSignInResult will be available instantly.
Log.d(TAG, "Got cached sign-in");
GoogleSignInResult result = opr.get();
handleSignInResult(result);
} else {
// If the user has not previously signed in on this device or the sign-in has expired,
// this asynchronous branch will attempt to sign in the user silently. Cross-device
// single sign-on will occur in this branch.
showProgressDialog();
opr.setResultCallback(new ResultCallback<GoogleSignInResult>() {
@Override
public void onResult(GoogleSignInResult googleSignInResult) {
hideProgressDialog();
handleSignInResult(googleSignInResult);
}
});
}
}
private void handleSignInResult(GoogleSignInResult result){
if (!result.getStatus().isSuccess()) {
isSignedIn = false;
mIntentInProgress = false;
if(result.getStatus().hasResolution()||result.getStatus().getStatusCode()== CommonStatusCodes.SIGN_IN_REQUIRED){
freshSignIn(); //Rather than using startResolutionForResult, we invoke our method which attempts to do a fresh sign in and if there is error it is handled in onActivityResult method.
}
}
else {
mIntentInProgress = false;
isSignedIn = true;
final GoogleSignInAccount account = result.getSignInAccount();
//Maybe save this result.
SharedPreferences.Editor editor = preferences.edit();
editor.putString("client_id_token", account.getIdToken());
editor.putString("auth_code",account.getServerAuthCode());
editor.apply();
//You can pass these credentials to your server from here.
//Invoke the GPlus People API
Plus.PeopleApi.load(mGoogleApiClient, account.getId()).setResultCallback(new ResultCallback<People.LoadPeopleResult>() {
@Override
public void onResult(@NonNull People.LoadPeopleResult loadPeopleResult) {
Person person = loadPeopleResult.getPersonBuffer().get(0);
//Method that obtains the userInfo
getProfileInfo(person, account.getEmail());
}
});
}
}
private void freshSignIn(){
Intent signInIntent = Auth.GoogleSignInApi.getSignInIntent(mGoogleApiClient);
showProgressDialog();
startActivityForResult(signInIntent, RC_SIGN_IN);
}
@Override
protected void onActivityResult(int requestCode, int responseCode,
Intent intent) {
if(requestCode==RC_RESOLVE_ERROR){
mIntentInProgress = false;
if (responseCode != RESULT_OK) {
isSignedIn = false;
//Maybe show a dialog to user ?
return;
}
//Attempt connection again.
if (!mGoogleApiClient.isConnecting()) {
mGoogleApiClient.connect(GoogleApiClient.SIGN_IN_MODE_OPTIONAL);
}
}
else if (requestCode == RC_SIGN_IN) {
hideProgressDialog();
GoogleSignInResult result = Auth.GoogleSignInApi.getSignInResultFromIntent(intent);
handleSignInResult(result);
}
}
// Connection callbacks
@Override
public void onConnected(Bundle bundle) {
if(!isSignedIn) {
signInUsingNewAPI();
}
}
@Override
public void onConnectionSuspended(int i) {
isSignedIn = false;
if (!isSignedIn&&isConnected) {
mGoogleApiClient.connect(GoogleApiClient.SIGN_IN_MODE_OPTIONAL);
}
}
@Override
public void onConnectionFailed(ConnectionResult result) {
if (!result.hasResolution()) {
GoogleApiAvailability.getInstance().getErrorDialog(
this, result.getErrorCode(), RC_SIGN_IN).show();
return;
}
if (!mIntentInProgress) {
// Store the ConnectionResult for later usage
mConnectionResult = result;
if (!isSignedIn) {
// The user has already clicked 'sign-in' so we attempt to
// resolve all
// errors until the user is signed in, or they cancel.
resolveSignInError();
}
}
}
/**
* Method to resolve any signin errors
*/
private void resolveSignInError() {
if (mConnectionResult.hasResolution()) {
try {
mIntentInProgress = true;
mConnectionResult.startResolutionForResult(this, RC_RESOLVE_ERROR);
} catch (IntentSender.SendIntentException e) {
mIntentInProgress = false;
mGoogleApiClient.connect(GoogleApiClient.SIGN_IN_MODE_OPTIONAL);
}
}
}
//Sign out and revoke methods
/**
* Sign-out from google
*/
public void signOutFromGoogle() {
if (mGoogleApiClient.isConnected()) {
Auth.GoogleSignInApi.signOut(mGoogleApiClient).setResultCallback(
new ResultCallback<Status>() {
@Override
public void onResult(Status status) {
isSignedIn = false;
//do other stuff here.
mGoogleApiClient.disconnect();
//Builds a fresh instance of GoogleApiClient
buildGoogleApiClient();
}
});
}
}
/**
* Revoking access from google
*/
public void revokeGplusAccess() {
if (mGoogleApiClient.isConnected()) {
Auth.GoogleSignInApi.revokeAccess(mGoogleApiClient).setResultCallback(
new ResultCallback<Status>() {
@Override
public void onResult(Status status) {
isSignedIn = false;
//do other stuff here.
mGoogleApiClient.disconnect();
//Builds a fresh instance of GoogleApiClient
buildGoogleApiClient();
//You can inform your server of this change
}
});
}
}
//Other utility methods
private void hideProgressDialog() {
if (mProgressDialog != null && mProgressDialog.isShowing()) {
mProgressDialog.hide();
}
}
private void showProgressDialog() {
if (mProgressDialog == null) {
mProgressDialog = new ProgressDialog(this);
mProgressDialog.setMessage("Signing In");
mProgressDialog.setIndeterminate(true);
}
mProgressDialog.show();
}
</pre>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
I hope this code helps you move to the new Google Sign-In implementation.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://1.bp.blogspot.com/r5H_FlHyOwftHeAdoyCizADEu1bAFHMqMBgH9x46zsfX72uciQPFNYjDuJGZj_K0B3dc7cTFt1sgvfjw-WEk5S5-dBV9h9By9B_gohmvbcpaL6gTXpVyD1FPCD8HZ1F4_nmBC6wDEF76BM4HrWDCBCrOhjSamwZUYBHP12SVcIZ8vsGZyhmUyPf2MJiQY7eyHknVySwc8dQ62GzKTSjY0o5eeDLtM7pXXhaHgyAbYCRyybJbbXWvzOZL8QVbWilIvY4Zz_sJYRpZ4_LglDKEowz4Kw21rkBALy-0oDIweBiZLeOidSJ0IiHltbDF7X8x0BQhWrjfGeVtpfXRAPiKQZrvwvEuDntlz3iG6D864wz0fzRLo3DXaqHoPZRgymWaykpwYoDvVA9KJCX8mlPtDnHnhyTb-CTOmstRMJDspShd4uI_uQznPdR1TVPa6WO47IiuPnXBfny5S6sp_yRomk3JaXla_Up-p2wvg3UBCKv5Lb089RX6P7xW9pjgABqV2n3K3bd18_dM748oENQjpYZ4AaNSr7Ap7bEy0QOrZU3wYE0rjgIET-ieX0n0PUKfgA5pUHZLDC1p1-Qe7NjdKj-S1stdMyg=w500-h888-no" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="app_google_sign_in_image" border="0" height="400" src="https://1.bp.blogspot.com/r5H_FlHyOwftHeAdoyCizADEu1bAFHMqMBgH9x46zsfX72uciQPFNYjDuJGZj_K0B3dc7cTFt1sgvfjw-WEk5S5-dBV9h9By9B_gohmvbcpaL6gTXpVyD1FPCD8HZ1F4_nmBC6wDEF76BM4HrWDCBCrOhjSamwZUYBHP12SVcIZ8vsGZyhmUyPf2MJiQY7eyHknVySwc8dQ62GzKTSjY0o5eeDLtM7pXXhaHgyAbYCRyybJbbXWvzOZL8QVbWilIvY4Zz_sJYRpZ4_LglDKEowz4Kw21rkBALy-0oDIweBiZLeOidSJ0IiHltbDF7X8x0BQhWrjfGeVtpfXRAPiKQZrvwvEuDntlz3iG6D864wz0fzRLo3DXaqHoPZRgymWaykpwYoDvVA9KJCX8mlPtDnHnhyTb-CTOmstRMJDspShd4uI_uQznPdR1TVPa6WO47IiuPnXBfny5S6sp_yRomk3JaXla_Up-p2wvg3UBCKv5Lb089RX6P7xW9pjgABqV2n3K3bd18_dM748oENQjpYZ4AaNSr7Ap7bEy0QOrZU3wYE0rjgIET-ieX0n0PUKfgA5pUHZLDC1p1-Qe7NjdKj-S1stdMyg=w500-h888-no" title="app_google_sign_in_image" width="225" /></a></div>
<div>
<br /></div>
</div>
Impala vs Hive vs RDBMS (published 2016-06-26)<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="section" id="hive-or-impala" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Hive or Impala ?</h2>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Hive and Impala both support SQL operation, but the performance of <span style="box-sizing: border-box; font-weight: 700;">Impala</span> is far superior than that of<span style="box-sizing: border-box; font-weight: 700;">Hive</span>. Although now with Spark SQL engine and use of HiveContext the performance of hive queries is also significantly fast, impala still has a better performance. The reason that impala has better performance is that it already has daemons running on the worker nodes and thus it avoids the overhead that is incurred during the creation of map and reduce jobs.</div>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
The query that I mention later ran almost <span style="box-sizing: border-box; font-weight: 700;">10X faster</span> on Impala than on Hive <span style="box-sizing: border-box; font-weight: 700;">(61 seconds vs around 600 seconds)</span>, and Impala is known to deliver even larger speedups.</div>
</div>
<div class="section" id="schema-on-read-vs-schema-on-write" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Schema on read vs Schema on write</h2>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Schema on read differs from schema on write in that data is not validated until it is read. Although schema on read offers the flexibility of defining multiple schemas for the same data, it can cause nasty runtime errors. As an example, Hive and Impala are very particular about the timestamp formats that they recognize and support. One workaround for such bad records is to declare the column as a string rather than as a timestamp and then use the cast operator to convert the values to timestamps when querying. This way bad records are skipped and the query does not error out.</div>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
<br /></div>
</div>
</div>
<pre class="brush:sql">cast(field_name as timestamp)
</pre>
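<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
The effect of the cast trick can be mimicked in plain Java. This is only an analogy, not how Hive or Impala are implemented: the class <tt>LenientTimestampCast</tt>, its methods, and the single accepted format are assumptions made for this sketch. A value that cannot be parsed becomes null and is then filtered out instead of aborting the whole run.</div>

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of Hive or Impala: mimics a cast that yields null for bad input. */
final class LenientTimestampCast {

    // Assumed input format for this sketch; the real engines accept their own set of formats.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    /** Returns the parsed timestamp, or null when the text does not match the format. */
    static LocalDateTime castOrNull(String raw) {
        if (raw == null) {
            return null;
        }
        try {
            return LocalDateTime.parse(raw.trim(), FORMAT);
        } catch (DateTimeParseException e) {
            return null; // bad record: yield null instead of failing
        }
    }

    /** Keeps only the values that could be cast, like a filter on the cast result. */
    static List<LocalDateTime> validOnly(List<String> rawValues) {
        return rawValues.stream()
                .map(LenientTimestampCast::castOrNull)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }
}
```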
<h2 style="box-sizing: border-box; color: #333332; font-family: sans-serif; margin: 0.83em 0px;">
Window Functions, Top-N Queries, PL/SQL</h2>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
Hive and Impala generally do not support <span style="box-sizing: border-box; font-weight: 700;">update queries</span>, but they do support the <code class="inlinesql sql" style="box-sizing: border-box; font-family: monospace, serif; font-size: 1em;"><span class="k" style="box-sizing: border-box;">insert</span> <span class="k" style="box-sizing: border-box;">into</span> ... <span class="k" style="box-sizing: border-box;">select</span> <span class="o" style="box-sizing: border-box;">*</span> <span class="k" style="box-sizing: border-box;">from</span></code> operation. Hive and Impala also support window functions, which makes life easier because neither of them supports PL/SQL procedures.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
The example below uses the NYC Yellow Taxi dataset for January 2015. The query filters out records with invalid timestamps and selects the first 500 records per hour for 1 January 2015.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
<br /></div>
</div>
<pre class="brush:sql;highlight:[3,4,5]">/**Top-N Subquery selects first 500 records per hour for a day*/
insert into nyc_taxi_data_limited select VendorID, tpep_pickup_datetime , tpep_dropoff_datetime , passenger_count ,trip_distance ,pickup_longitude ,pickup_latitude,RateCodeID ,store_and_fwd_flag ,dropoff_longitude ,dropoff_latitude ,payment_type ,fare_amount ,extra,mta_tax ,tip_amount,tolls_amount,improvement_surcharge,total_amount from ( select *,
row_number() over (partition by trunc(cast(tpep_pickup_datetime as timestamp), 'HH') order by trunc(cast(tpep_pickup_datetime as timestamp), 'HH') desc)
as rownumb from nyc_taxi_data where cast(tpep_pickup_datetime as timestamp) between cast('2015-01-01 00:00:00' as timestamp) and cast('2015-01-01 23:59:59' as timestamp)
) as q where rownumb<=500;
</pre>
<br />
<div class="section" id="window-functions-top-n-queries-pl-sql" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
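Note the use of the window function row_number, the partitioning by truncated timestamp, and the cast operator to avoid invalid records. Because the rows are also ordered by the truncated timestamp, which is identical for every row in a partition, which 500 rows are kept within each hour is arbitrary; order by the full timestamp instead if the earliest 500 are required.</div>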
</div>
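<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
For readers more comfortable with Java than with window functions, the per-hour Top-N logic of the query can be sketched as follows. The class <tt>TopNPerHour</tt> is hypothetical and works on an in-memory list, so it only illustrates the idea of partitioning by hour and keeping at most N rows per partition; it says nothing about how Hive or Impala execute the query.</div>

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory analogue of "row_number() over (partition by hour) <= n". */
final class TopNPerHour {

    /** Keeps at most n pickup times for every distinct hour, in encounter order. */
    static List<LocalDateTime> firstNPerHour(List<LocalDateTime> pickups, int n) {
        // One bucket per truncated hour, which plays the role of the partition key.
        Map<LocalDateTime, List<LocalDateTime>> byHour = new TreeMap<>();
        for (LocalDateTime pickup : pickups) {
            LocalDateTime hour = pickup.truncatedTo(ChronoUnit.HOURS);
            List<LocalDateTime> bucket = byHour.computeIfAbsent(hour, h -> new ArrayList<>());
            if (bucket.size() < n) { // same role as "rownumb <= n"
                bucket.add(pickup);
            }
        }
        List<LocalDateTime> result = new ArrayList<>();
        byHour.values().forEach(result::addAll);
        return result;
    }
}
```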
<div class="section" id="what-s-the-catch" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
What's the catch?</h2>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
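Given the benefits of Impala, why would one ever use Hive? The answer lies in the fact that Impala queries are not fault tolerant: if a node fails while a query is running, the whole query fails and has to be rerun, whereas Hive on MapReduce can retry the failed tasks.</div>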
</div>
<div class="section" id="conclusion" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
Conclusion</h2>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
Although Impala and Hive do not offer the entire repertoire of functionality supported by traditional RDBMSs, in the world of distributed systems they come closest to it, and they offer scalable, large-scale data analysis.</div>
</div>
</div>
Java8: Decorating with Functional Programming and Generics (published 2016-06-26, updated 2017-06-02)<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="section" id="the-idea" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
The Idea</h2>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
<span style="box-sizing: border-box; font-weight: 700;">Java 8</span> introduced functional programming support, a powerful feature that was missing from earlier versions. One of the benefits of functional programming is that it makes the decorator pattern easy to implement. A common requirement is some kind of rate limiting for web service calls. Ideally, you would want a separation of concerns between the actual business logic and the rate limiting logic. With Java 8, we can use method references to achieve this separation and implement the decorator pattern.</div>
</div>
<div class="section" id="the-code" style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px;">
<h2 style="box-sizing: border-box; margin: 0.83em 0px;">
The code</h2>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
The code fragment below shows the implementation of the pattern. It is an example of integration with the Lyft API. The full source code is available <a class="reference external" href="https://github.com/ramannanda9/RT-UBER-NYC-TAXI/blob/master/Lyft-Client/src/main/java/edu/nyu/realtimebd/lyftclient/utils/LyftClientUtil.java" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">here</a>.</div>
<div style="box-sizing: border-box; margin-bottom: 1em; margin-top: 1em;">
<br /></div>
<pre class="brush:java;highlight:[13,14,35,52,54]">/**
 * Generic method which can invoke any function without applying rate limit
 *
 * @param method the function to apply to each input map
 * @param inputList The list of Maps, each of which contains the key value pair of service parameters
 * @param <R> Generic Return object type in the list
 * @param <K> Type of Key in Map
 * @param <V> Type of Value in Map
 * @return A list with object type <R>
 */
private <R, K, V> List<R> invokeWithoutRateLimit(Function<Map<K, V>, R> method, List<Map<K, V>> inputList) {
    List<R> returnList = new ArrayList<>();
    inputList.stream().forEach(m -> {
        returnList.add(method.apply(m));
    });
    return returnList;
}
/**
 * Generic method which can invoke any function with applying rate limit
 * It uses RxJava and Blocking invocation
 *
 * @param method the function to apply to each input map
 * @param inputList The list of Maps, each of which contains the key value pair of service parameters
 * @param <R> Generic Return object type in the list
 * @param <K> Type of Key in Map
 * @param <V> Type of Value in Map
 * @return A list with object type <R>
 */
private <R, K, V> List<R> invokeWithRateLimit(Function<Map<K, V>, R> method, List<Map<K, V>> inputList) {
    List<R> returnList = new ArrayList<>();
    Observable.zip(Observable.from(inputList),
            Observable.interval(RATE_LIMIT, TimeUnit.SECONDS), (obs, timer) -> obs)
            .doOnNext(item -> {
                R result = method.apply(item);
                returnList.add(result);
            }
            ).toList().toBlocking().first();
    return returnList;
}
/**
 * This method accepts a list of coordinates and returns the estimated
 * fare for different lyft rides
 *
 * @param costRequestList The list of coordinates
 * @param invokeWithRateLimit Apply Rate limiting
 * @return A list of Prices per request
 */
public List<CostEstimates> getCostEstimates(List<Map<String, Float>> costRequestList, boolean invokeWithRateLimit) {
    if (invokeWithRateLimit) {
        return invokeWithRateLimit(this::getCostEstimate, costRequestList);
    } else {
        return invokeWithoutRateLimit(this::getCostEstimate, costRequestList);
    }
}
</pre>
<div style="box-sizing: border-box; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em; white-space: normal;">
The highlighted code above shows how to pass a method reference to the methods <tt class="docutils literal" style="box-sizing: border-box;">invokeWithRateLimit()</tt> and <tt class="docutils literal" style="box-sizing: border-box;">invokeWithoutRateLimit()</tt>. Each of these methods adds its own processing logic (such as rate limiting with RxJava) around the call and invokes the supplied method through <tt class="docutils literal" style="box-sizing: border-box;">apply()</tt>. This implementation of the decorator pattern is much easier to grasp than one built on inheritance.</div>
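<div style="box-sizing: border-box; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em; white-space: normal;">
The same idea can be tried without RxJava or the Lyft client. The sketch below is not taken from the Lyft-Client repository: the class <tt>RateLimitDecorator</tt> and its methods are hypothetical, and the rate limit is enforced with a plain sleep rather than an RxJava interval. It wraps any <tt>Function</tt> in another <tt>Function</tt> that spaces the calls out, so the business logic stays unaware of the rate limiting.</div>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical decorator: returns a Function that delays calls to the wrapped Function. */
final class RateLimitDecorator {

    /** Wraps delegate so that consecutive calls start at least minIntervalMillis apart. */
    static <T, R> Function<T, R> rateLimited(Function<T, R> delegate, long minIntervalMillis) {
        final long intervalNanos = TimeUnit.MILLISECONDS.toNanos(minIntervalMillis);
        final long[] nextAllowed = {System.nanoTime()}; // earliest start of the next call
        return input -> {
            synchronized (nextAllowed) {
                long waitNanos = nextAllowed[0] - System.nanoTime();
                if (waitNanos > 0) {
                    try {
                        TimeUnit.NANOSECONDS.sleep(waitNanos);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("Interrupted while rate limiting", e);
                    }
                }
                nextAllowed[0] = System.nanoTime() + intervalNanos;
            }
            return delegate.apply(input); // the decorated business logic
        };
    }

    /** Applies the function to every input and collects the results in order. */
    static <T, R> List<R> applyAll(Function<T, R> function, List<T> inputs) {
        List<R> results = new ArrayList<>();
        for (T input : inputs) {
            results.add(function.apply(input));
        }
        return results;
    }
}
```

<div style="box-sizing: border-box; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em; white-space: normal;">
A caller would write, for instance, <tt>RateLimitDecorator.applyAll(RateLimitDecorator.rateLimited(String::length, 20), inputs)</tt>, and could drop the <tt>rateLimited</tt> wrapper to get the undecorated behaviour.</div>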
<div style="box-sizing: border-box; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em; white-space: normal;">
The entire code can be viewed in the GitHub repository through the following link.</div>
<div style="box-sizing: border-box; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em; white-space: normal;">
<a class="reference external" href="https://github.com/ramannanda9/RT-UBER-NYC-TAXI/blob/master/Lyft-Client/src/main/java/edu/nyu/realtimebd/lyftclient/utils/LyftClientUtil.java" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">Lyft-Client</a> on GitHub.</div>
</div>
</div>
Retrofit 2.0 Basic and Conditional Authentication (published 2016-06-26)<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
You might run into a scenario that requires conditional authentication with Retrofit 2.0.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
This post provides an example of integration with the <span style="box-sizing: border-box; font-weight: 700;">Lyft API</span>. In the case of the Lyft API, we first need to authenticate with and query the <tt class="docutils literal" style="box-sizing: border-box;">oauth/token</tt> endpoint to obtain the <span style="box-sizing: border-box; font-weight: 700;">OAuth</span> token, and then use this <tt class="docutils literal" style="box-sizing: border-box;">accessToken</tt> in other service calls. Such access tokens also have an expiry time (1 hour), so ideally there should be a mechanism to handle expiry.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
One lazy solution (which turns out to work perfectly) is to use an interceptor and check whether the <span style="box-sizing: border-box; font-weight: 700;">HTTP response code</span> from the service is <span style="box-sizing: border-box; font-weight: 700;">401</span>. If the code is 401, you can assume that the token has either expired or was never obtained; either way, you need to re-authenticate and query the endpoint to obtain the <tt class="docutils literal" style="box-sizing: border-box;">accessToken</tt>.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
The code block below shows how this is done. To access the entire source code you can visit <a class="reference external" href="https://github.com/ramannanda9/RT-UBER-NYC-TAXI/blob/master/Lyft-Client/src/main/java/edu/nyu/realtimebd/lyftclient/utils/LyftClientUtil.java" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">Lyft-Client</a> on GitHub.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
<br /></div>
<pre class="brush:java">
/**
 * This method initializes the retrofit clients
 * a) One for the initial authentication end point
 * b) Other for other service requests
 */
private void initializeRetrofitClients() {
    OkHttpClient clientNormal;
    OkHttpClient clientAuthenticated;
    OkHttpClient.Builder builder = new OkHttpClient.Builder();
    builder.addInterceptor(new Interceptor() {
        @Override
        public okhttp3.Response intercept(Chain chain) throws IOException {
            Request originalRequest = chain.request();
            // The header name is "Authorization"; its value is "Bearer " plus the token
            Request.Builder requestBuilder = originalRequest.newBuilder()
                    .header("Authorization", "Bearer " + accessToken);
            okhttp3.Response response = chain.proceed(requestBuilder.build());
            // A 401 implies that the token has expired or was never initialized
            if (response.code() == 401) {
                tokenExpired = true;
                logger.info("Token Expired");
                // Release the first response before issuing the request again
                response.close();
                getAuthenticationToken();
                requestBuilder = originalRequest.newBuilder()
                        .header("Authorization", "Bearer " + accessToken);
                response = chain.proceed(requestBuilder.build());
            }
            return response;
        }
    });
    clientAuthenticated = builder.build();
    retrofitAuthenticated = new Retrofit.Builder().client(clientAuthenticated)
            .baseUrl(API_ENDPOINT)
            .addConverterFactory(GsonConverterFactory.create())
            .build();
    OkHttpClient.Builder builder1 = new OkHttpClient.Builder();
    builder1.authenticator(new Authenticator() {
        @Override
        public Request authenticate(Route route, okhttp3.Response response) throws IOException {
            // Give up if the basic credentials were already sent and rejected
            if (response.request().header("Authorization") != null) {
                return null;
            }
            String authentication = Credentials.basic(CLIENT_ID, CLIENT_SECRET);
            return response.request().newBuilder()
                    .header("Authorization", authentication)
                    .build();
        }
    });
    clientNormal = builder1.build();
    retrofit = new Retrofit.Builder().client(clientNormal)
            .baseUrl(API_ENDPOINT)
            .addConverterFactory(GsonConverterFactory.create())
            .build();
}
/**
 * Is invoked only when the access token is required
 * Or it expires
 */
private void getAuthenticationToken() {
    LyftService lyftService = this.retrofit.create(LyftService.class);
    Call<OAuthResponse> authRequestCall = lyftService.getAccessToken(oAuthRequest);
    try {
        Response<OAuthResponse> response = authRequestCall.execute();
        if (response.isSuccessful()) {
            accessToken = response.body().getAccessToken();
        }
    } catch (IOException e) {
        logger.error("Exception occurred due to ", e);
    }
}
</pre>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
As can be seen in the above code example, we build <span style="box-sizing: border-box; font-weight: 700;">two OkHttpClient</span> objects. The <span style="box-sizing: border-box; font-weight: 700;">clientNormal</span> object is configured to use HTTP basic authentication and is used by the retrofit object to query the <tt class="docutils literal" style="box-sizing: border-box;">getAccessToken</tt> endpoint and obtain the access token, which the other Lyft service endpoints require. The <tt class="docutils literal" style="box-sizing: border-box;">clientAuthenticated</tt> object uses an interceptor to set the <tt class="docutils literal" style="box-sizing: border-box;">Authorization</tt> header to <tt class="docutils literal" style="box-sizing: border-box;">Bearer</tt> followed by the value of the <tt class="docutils literal" style="box-sizing: border-box;">accessToken</tt>, which is required for all other service endpoints.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
In the method <tt class="docutils literal" style="box-sizing: border-box;">initializeRetrofitClients</tt> it can be seen that the interceptor first invokes the service endpoint with the current value of accessToken (by calling <tt class="docutils literal" style="box-sizing: border-box;">chain.proceed</tt>). If the response code is 401, it invokes <tt class="docutils literal" style="box-sizing: border-box;">getAuthenticationToken</tt> and then calls <tt class="docutils literal" style="box-sizing: border-box;">chain.proceed</tt> again with the new value of <tt class="docutils literal" style="box-sizing: border-box;">accessToken</tt>. For subsequent calls, the interceptor uses the stored value of <tt class="docutils literal" style="box-sizing: border-box;">accessToken</tt>. This lazy way of obtaining the access token is better because the logic for deciding when to obtain the <tt class="docutils literal" style="box-sizing: border-box;">accessToken</tt> is not hardcoded. In addition, it keeps the code simple by avoiding unnecessary checks.</div>
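<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
The control flow of this lazy approach can be isolated from OkHttp and Retrofit. The sketch below is a simulation under stated assumptions, not the Lyft-Client code: <tt>LazyTokenClient</tt> is a hypothetical class, the endpoint is modelled as a function from a token to an HTTP status code, and the token endpoint as a supplier of tokens. It shows that a token is requested only after a 401 and is then reused until the next 401.</div>

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for the interceptor logic: fetch a token only after a 401. */
final class LazyTokenClient {

    private final Supplier<String> tokenSource;        // plays the role of the token endpoint
    private final Function<String, Integer> endpoint;  // takes a token, returns an HTTP status
    private String accessToken;                        // stays null until the first 401
    private int tokenRequests;                         // how often the token endpoint was hit

    LazyTokenClient(Supplier<String> tokenSource, Function<String, Integer> endpoint) {
        this.tokenSource = tokenSource;
        this.endpoint = endpoint;
    }

    /** Calls the endpoint, refreshing the token and retrying once if it answers 401. */
    int call() {
        int status = endpoint.apply(accessToken);
        if (status == 401) {
            accessToken = tokenSource.get();
            tokenRequests++;
            status = endpoint.apply(accessToken);
        }
        return status;
    }

    int tokenRequests() {
        return tokenRequests;
    }
}
```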
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
I hope this post helped clarify the use of interceptors for conditional authentication.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
The entire code can be viewed in the GitHub repository through the following link.</div>
<div style="box-sizing: border-box; color: #333332; font-family: sans-serif; font-size: 17.6px; line-height: 25.52px; margin-bottom: 1em; margin-top: 1em;">
<a class="reference external" href="https://github.com/ramannanda9/RT-UBER-NYC-TAXI/blob/master/Lyft-Client/src/main/java/edu/nyu/realtimebd/lyftclient/utils/LyftClientUtil.java" style="box-sizing: border-box; color: #8e8ed6; text-decoration: none;">Lyft-Client</a> on GitHub.</div>
</div>
Few links to Data Science and Machine learning posts. (published 2016-03-11)<div dir="ltr" style="text-align: left;" trbidi="on">
Blogger does not support LaTeX, and Windows Live Writer is being redeveloped. So, in the meantime, I have written a few posts on my Pelican blog and thought I might as well link to them here.<br />
<br />
<b>Why you should prefer to use the square root of Gini Index</b>: This post examines the advantages of using the square root of the Gini Index as the criterion for building decision trees.<br />
<a href="http://orastack.com/why-you-should-use-square-root-of-gini-index.html" target="_blank">http://orastack.com/why-you-should-use-square-root-of-gini-index.html</a><br />
<br />
<b>Do tweets have predictive power?</b> This post examines whether tweets have an effect on the opening weekend revenue of box office movies.<br />
<a href="http://orastack.com/scikit-learn_tweet_classifier.html">http://orastack.com/scikit-learn_tweet_classifier.html</a></div>
ADF querying the policy store (published 2015-08-05)<div dir="ltr" style="text-align: left;" trbidi="on">
Earlier, I had covered an example <a href="http://ramannanda.blogspot.com/2014/05/adf-dynamically-managing-application.html" target="_blank">here</a>, which showed how to dynamically create users and map the application roles to enterprise groups. In this post, the sample application is extended to show how you can query the application roles from the application stripe (the application-specific policies).<br />
To query the application-specific roles, you need to access the application's policy from the policy store and then invoke either searchAppRoles(String roleName) or searchAppRoles(String attributeToSearchRolesBy, String attributeValue, boolean equalityOrInequalityFlag). The response from the method is a List<AppRoleEntry>. The snippet below shows the code for doing so. Note that although there is another, much more flexible method to search across application stripes, policyStore.getAppRoles(StoreAppRoleSearchQuery obj), it is not implemented for the embedded policy store and throws UnsupportedOperationException. <br />
<pre class="brush: java;">/**
 * Given an application stripe name and a role name, searches the stripe for matching roles.
 * This method performs a wildcard search also.
 * @param roleName the rolename to search
 * @param applicationStripeName the application stripe name
 * @return the matching application role entries
 */
public List<AppRoleEntry> searchAppRoleInApplicationStripe(String roleName,
                                                           String applicationStripeName) {
    JpsContext ctxt = IdentityStoreConfigurator.jpsCtxt;
    PolicyStore ps = ctxt.getServiceInstance(PolicyStore.class);
    ApplicationPolicy policy;
    try {
        policy = ps.getApplicationPolicy(applicationStripeName);
        return policy.searchAppRoles(ApplicationRoleAttributes.NAME.toString(), roleName, false);
    } catch (PolicyStoreException e) {
        throw new RuntimeException(e);
    }
}
......
private static final class IdentityStoreConfigurator {
    private static final JpsContext jpsCtxt = initializeFactory();
    private static JpsContext initializeFactory() {
        String methodName =
            Thread.currentThread().getStackTrace()[1].getMethodName();
        JpsContextFactory tempFactory;
        JpsContext jpsContext;
        try {
            tempFactory = JpsContextFactory.getContextFactory();
            jpsContext = tempFactory.getContext();
        } catch (JpsException e) {
            DemoJpsLogger.severe("Exception in " + methodName + " " +
                                 e.getMessage() + " ", e);
            throw new RuntimeException("Exception in " + methodName + " " +
                                       e.getMessage() + " ", e);
        }
        return jpsContext;
    }
}
....</pre>
<br />
<br />
You can also perform various other operations on the policy store and alter the application-specific policies once you have access to those operations. A key thing to note is that these operations require specific PolicyStoreAccessPermissions to be granted in jazn-data.xml. The steps to do so are mentioned below. <br />
<br />
<br />
<ol><br />
<li>Define a resource type with the permission class PolicyStoreAccessPermission and the necessary actions that you want to grant access to (in this example, I am granting access to all operations, signified by *). The snippet is shown below: <br /> <br /><br /><br /> <pre class="brush: xml;"><resource-type>
<name>PolicyStorePermission</name>
<matcher-class>oracle.security.jps.service.policystore.PolicyStoreAccessPermission</matcher-class>
<actions>*</actions>
</resource-type></pre>
<br /> </li>
<br /><br />
<li>Next, create the resources that are to be granted permissions. In this case, I have created two of them: the first one is the superset that allows access to all the application stripes, and the second one grants access to only this application's stripe. The snippet is shown below: <br /> <br /><br /><br /> <pre class="brush: xml; highlight: [3,8];"><resources>
<resource>
<name>context=APPLICATION, name=*</name>
<type-name-ref>PolicyStorePermission</type-name-ref>
</resource>
<resource>
<name>context=APPLICATION,name=DemoAppSecurity#V2.0</name>
<type-name-ref>PolicyStorePermission</type-name-ref>
</resource>
</resources></pre>
<br /> </li>
<br /><br />
<li>In the last step you have to assign these resources to the application roles or groups. </li>
</ol>
<br />
<br />
<br />
<a href="http://lh3.googleusercontent.com/-5hqKuKJaB0Q/VcJJGddRbfI/AAAAAAAABXE/1xdytp8vKow/s1600-h/custom_permission%25255B4%25255D.png" target="_blank"><img alt="custom_permission" border="0" src="http://lh3.googleusercontent.com/-KRo-9xArm_M/VcJJIDRR9-I/AAAAAAAABXM/4oa6HAWldKs/custom_permission_thumb%25255B2%25255D.png?imgmax=800" height="342" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="custom_permission" width="644" /></a> <a href="http://lh3.googleusercontent.com/-pvhr_fMId4s/VcJJJBEjmeI/AAAAAAAABXU/sEPklBEsvmc/s1600-h/application_stripe_search%25255B5%25255D.png" target="_blank"><img alt="application_stripe_search" src="http://lh3.googleusercontent.com/-dZjNyVQbJ-k/VcJJKuC0AkI/AAAAAAAABXc/ZJ6J0KxBBM8/application_stripe_search_thumb%25255B3%25255D.png?imgmax=800" height="278" style="display: inline;" title="application_stripe_search" width="640" /></a> <br />
<br />
The enterprise identity store provider being used here is the embedded WebLogic LDAP. To run the application properly, you will need to configure a password for it in WebLogic and set the password in jps-config.xml as shown in the screenshot below. <br /> <br /><a href="http://lh3.googleusercontent.com/-g1Z3nI4CO9c/VcJJL-NQB5I/AAAAAAAABXk/GblAgHD1u94/s1600-h/image_jps_emb_ldap%25255B3%25255D.png"><img alt="image_jps_emb_ldap" border="0" src="http://lh3.googleusercontent.com/-fCAp-cUgXJg/VcJJNxiqZGI/AAAAAAAABXs/URSjra2TGhA/image_jps_emb_ldap_thumb%25255B1%25255D.png?imgmax=800" height="361" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image_jps_emb_ldap" width="644" /></a> <br />
<br />
<br />
To run the application, log in with the username/password combination john/oracle123. To view the search roles screen, either run SearchRoles.jspx or click on the Search Roles link in the left navigation bar. The link to download the application is given below:<br /> <br /><a href="https://dl.dropboxusercontent.com/u/42099017/DemoAppSecurity.zip" target="_blank">Download the application.</a></div>
Automatically Detect Content Encoding (published 2015-06-11)<p>You might be aware that text can be encoded in different content encoding formats. Generally, it is safe to use a UTF encoding, and you would at least expect websites to specify the encoding in the response. Alas, certain sites just send the content without specifying the content encoding that they are using. To detect the content encoding in such cases, you need an FSM (Finite State Machine). You split the input into individual characters and pass them on to different state machines, each of which checks for a different encoding scheme.  For each character that is passed to it, a state machine can immediately identify a character that is unique to its encoding format, continue, or error out.  At the end of the operation, you would generally expect a specific content encoding format to have been identified; if insufficient input is available, the default encoding format is returned.  You can read more about this here: <a href="http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html" target="_blank">Universal Charset Detection.</a> </p> <p>Mozilla already has a library for this and there is a Java port available for it. Mozilla’s library for this is universalchardet and the Java port is <a href="https://code.google.com/p/juniversalchardet/" target="_blank">juniversalchardet</a>. 
</p> <p>It’s really simple to use, as can be seen below:</p> <pre class="brush: java; toolbar: false">public static String detectCharset(InputStream is) throws IOException {<br /><br /> UniversalDetector detector = new UniversalDetector(null);<br /><br /> String encoding=null;<br /><br /> byte[] buf = new byte[1000];<br /> // Feed the data to the detector until it is confident or the input ends<br /> int nread;<br /> while ((nread = is.read(buf)) > 0 && !detector.isDone()) {<br /> detector.handleData(buf, 0, nread);<br /> }<br /> // Tell the detector that there is no more data<br /> detector.dataEnd();<br /><br /> // Read the detected charset, falling back to a default if none was found<br /> encoding = detector.getDetectedCharset();<br /> if (encoding != null) {<br /> logger.info("Detected encoding = " + encoding);<br /> } else {<br /> encoding = "ISO-8859-1";<br /> logger.info("No encoding detected.");<br /> }<br /><br /> // Reset the detector so that it can be reused<br /> detector.reset();<br /> return encoding;<br /><br /> }</pre><br /><br /><p>This can be really useful if you are planning to create a full text parser and the websites don’t return the content encoding that they are using.  I used this while designing the full text parser for my application. </p><br /><br /><p>Hope this is useful for others out there.</p> Analyzing memory leaks in Android (published 2015-04-10)<p>In this post I will share an example of <strong>analyzing a memory leak</strong> in an Android application. I recently tried to integrate ShowcaseView, a popular Android library that can be used for first-run demos in your Android application. While testing the library, I noticed severe memory leaks. They occurred because references to instances of the ShowcaseView class were being kept, and as this class internally creates bitmaps to overlay the content, each instance was holding a large chunk of memory. </p> <p><strong>Analyzing the issue</strong> </p> <p>For heap analysis, I used VisualVM. 
The first thing to realize is that a heap dump taken from Android cannot be analyzed directly with VisualVM or other tools such as Eclipse MAT. You need to convert it to a format that these tools can analyze. For this you have to run the following command. </p> <pre class="brush: xhtml; toolbar: false">hprof-conv Snapshot_2015.04.10_19.37.29.hprof heapissue</pre><br /><br /><p>Here the first parameter is the Android heap dump and the second is the output file. </p><br /><br /><p>After loading the file into VisualVM for analysis, I could see that instances of the ShowcaseView class were being retained: they were reachable from GC roots through live references, which meant that the garbage collector could not collect them. </p><br /><br /><p><a href="http://lh3.ggpht.com/-zg2s4rHae2c/VSfpCjVlcuI/AAAAAAAABTM/pm7F4VTELpY/s1600-h/heap_issue_1%25255B15%25255D.png" target="_blank"><img title="heap_issue_1" style="border-left-width: 0px; border-right-width: 0px; border-bottom-width: 0px; display: inline; border-top-width: 0px" border="0" alt="heap_issue_1" src="http://lh4.ggpht.com/-ZQhB0ezbH_U/VSfpEt0-WKI/AAAAAAAABTU/aP1_RFuRivE/heap_issue_1_thumb%25255B13%25255D.png?imgmax=800" width="504" height="266" /></a> </p><br /><br /><p>Simple <strong>OQL analysis</strong> revealed live paths to the instances. 
</p><br /><br /><p><strong>Query:</strong> </p><br /><br /><pre class="brush: sql; toolbar: false">select heap.livepaths(u,false) from com.github.amlcurran.showcaseview.ShowcaseView u </pre><br /><br /><br /><br /><p><a href="http://lh3.ggpht.com/-L8oC8d6V12A/VSfpGZyyVAI/AAAAAAAABTc/atF4DGf8WgM/s1600-h/heap_issue_2%25255B13%25255D.png" target="_blank"><img title="heap_issue_2" style="border-left-width: 0px; border-right-width: 0px; border-bottom-width: 0px; display: inline; border-top-width: 0px" border="0" alt="heap_issue_2" src="http://lh6.ggpht.com/-GYXWkbuNECs/VSfpIKM6tkI/AAAAAAAABTk/2JbPyXIJ0SQ/heap_issue_2_thumb%25255B9%25255D.png?imgmax=800" width="504" height="277" /></a> </p><br /><br /><p>As you can see, the phone's decor view was holding references to the view added by ShowcaseView, and therefore to the ShowcaseView instances themselves. </p><br /><br /><p>Ideally, once the showcase has been displayed and dismissed, its view should be removed, especially when it holds such large chunks of memory. Instead, due to an oversight, the library did not remove the view from the decor view; it only set the view's visibility to View.GONE, which hides the view but does not remove it from its parent ViewGroup. </p><br /><br /><p><strong>The Solution</strong></p><br /><br /><p>The fix, then, was to remove the added views once the ShowcaseView is hidden. I modified the hide() and hideImmediate() methods to remove the views that had been added on top of the decor view. </p><br /><br /><p>One of those methods is shown below. 
</p><br /><br /><pre class="brush: java; highlight: [9,10,11,12,13,14,15,16,17]">@Override<br /> public void hide() {<br /> clearBitmap();<br /> // If the type is set to one-shot, store that it has shot<br /> shotStateStore.storeShot();<br /> mEventListener.onShowcaseViewHide(this);<br /> fadeOutShowcase();<br /> getViewTreeObserver().removeOnPreDrawListener(draw);<br /> if(Build.VERSION.SDK_INT>15){<br /> getViewTreeObserver().removeOnGlobalLayoutListener(globalLayout);<br /> }<br /> else {<br /> getViewTreeObserver().removeGlobalOnLayoutListener(globalLayout);<br /> }<br />// detach this view from the decor view so that it can be garbage collected<br /> ((ViewGroup)mActivity.getWindow().getDecorView()).removeView(ShowcaseView.this);<br /> }</pre><br /><br /><p> </p><br /><br /><p>After implementing these changes, the library works as intended. </p><br /><br /><p><a href="http://lh3.ggpht.com/-F2OyWn3UNN8/VSfpKUbaVsI/AAAAAAAABTs/FS4yP7igbOE/s1600-h/heap_issue_fine_1%25255B7%25255D.png" target="_blank"><img title="heap_issue_fine_1" style="border-left-width: 0px; border-right-width: 0px; border-bottom-width: 0px; display: inline; border-top-width: 0px" border="0" alt="heap_issue_fine_1" src="http://lh4.ggpht.com/-_a8Jol9-YKk/VSfpLonSZ2I/AAAAAAAABT0/9XmpXJnuddU/heap_issue_fine_1_thumb%25255B5%25255D.png?imgmax=800" width="504" height="285" /></a> </p><br /><br /><p><a href="http://lh6.ggpht.com/-l2huGlhvFcI/VSfpM91fosI/AAAAAAAABT8/8T1u-2rIS9Y/s1600-h/heap_issue_fine_2%25255B8%25255D.png" target="_blank"><img title="heap_issue_fine_2" style="border-left-width: 0px; border-right-width: 0px; border-bottom-width: 0px; display: inline; border-top-width: 0px" border="0" alt="heap_issue_fine_2" src="http://lh5.ggpht.com/-qW4NPyxp194/VSfpQAaxYOI/AAAAAAAABUE/euOFE2XzVX8/heap_issue_fine_2_thumb%25255B6%25255D.png?imgmax=800" width="504" height="244" /></a> </p><br /><br /><p><a 
href="http://lh5.ggpht.com/-SRV09rGwEqo/VSfpSDrp2hI/AAAAAAAABUM/UKVaJK-phd0/s1600-h/showcaseview%25255B12%25255D.png" target="_blank"><img title="showcaseview" style="border-left-width: 0px; border-right-width: 0px; border-bottom-width: 0px; display: inline; border-top-width: 0px" border="0" alt="showcaseview" src="http://lh3.ggpht.com/-aU0LdD3AOxA/VSfpTs2ZUAI/AAAAAAAABUU/z5vjgSEw5x8/showcaseview_thumb%25255B10%25255D.png?imgmax=800" width="484" height="772" /></a> </p><br /><br /><p>The Gist below contains the entire source code for the modified ShowcaseView. Happy coding :-)</p><br /><script src="https://gist.github.com/ramannanda9/7bf837f535b0ba2b96f2.js"></script> Anonymoushttp://www.blogger.com/profile/03385350305653681268noreply@blogger.comtag:blogger.com,1999:blog-7351084055463323761.post-4152622454789625332015-03-21T21:03:00.001+05:302015-03-31T19:15:05.901+05:30Handling and Identifying SSL Handshake errors<p><strong>SSL handshake errors</strong> can occur for various reasons, such as a self-signed certificate, or a protocol or cipher suite requested by one side that the other side does not support. I recently ran into this issue while connecting to a third-party server using the HttpClient library. Here’s what I did to identify the cause: </p> <p>First, I enabled SSL debug output for the javax.net packages, restricted to handshake and failure messages. </p> <pre class="brush: xhtml; toolbar: false">-Djavax.net.debug=ssl,handshake,failure</pre><br /><br /><p>On examining the logs, I could see that the third-party site expected a 256-bit cipher key, while my GlassFish server supported only 128-bit keys. This happens because, out of the box, Java 6, 7 and 8 support only 128-bit encryption keys. 
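</p> <p>Before changing anything on the server, it can help to confirm what the JVM's crypto policy currently allows. The following is a minimal sketch and was not part of the original troubleshooting: the class and method names (KeyStrengthCheck, maxAesKeyBits, allows256BitAes) are made up for illustration, while Cipher.getMaxAllowedKeyLength is the standard javax.crypto call that reports the largest key size the installed policy files permit.</p>

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

// Hypothetical helper for illustration; the names are not from the original post.
class KeyStrengthCheck {

    // Largest AES key size, in bits, permitted by the JVM's installed policy files.
    static int maxAesKeyBits() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            // AES is expected on every Java SE runtime, so treat this as a broken setup.
            throw new IllegalStateException("AES is not available on this JVM", e);
        }
    }

    // True when 256-bit AES keys may be used.
    static boolean allows256BitAes() {
        return maxAesKeyBits() >= 256;
    }

    public static void main(String[] args) {
        System.out.println("Max AES key length: " + maxAesKeyBits() + " bits");
        System.out.println("256-bit AES allowed: " + allows256BitAes());
    }
}
```

<p>Run it with the same JRE that the application server uses: a result of 128 means the limited policy is in effect, while Integer.MAX_VALUE means the unlimited policy files are active.</p> <p>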
To enable key lengths of 256 bits or more, download the <strong><em>Java Cryptography Extension</em> (<em>JCE</em>) <em>Unlimited</em> Strength Jurisdiction Policy Files </strong>, which essentially contain two jars, <strong>US_export_policy.jar</strong> and <strong>local_policy.jar</strong>. Place them in the <JRE_HOME>/lib/security/ directory and restart the server.</p><br /><br /><p>This enables encryption keys of 256 bits or more and ensures that you do not face SSL handshake errors due to key strength. </p><br /><br /><p>You can download the policy files from the following links. </p><br /><br /><p><a href="http://www.oracle.com/technetwork/java/javase/downloads/jce-6-download-429243.html" target="_blank">JCE Unlimited for java 6</a></p><br /><br /><p><a href="http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html" target="_blank">JCE Unlimited for java 7</a></p><br /><br /><p><a href="http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html" target="_blank">JCE Unlimited for java 8</a></p> Anonymoushttp://www.blogger.com/profile/03385350305653681268noreply@blogger.comtag:blogger.com,1999:blog-7351084055463323761.post-38844180285512406052015-03-21T20:21:00.001+05:302015-03-21T20:25:03.258+05:30ProGuard configuration for Retrofit<p>If you use <strong>ProGuard</strong> to obfuscate your code and your application uses <strong>Retrofit </strong>, you need to configure ProGuard to exclude certain Retrofit classes from obfuscation. Also note that if you use <strong>GSON</strong> to convert <strong>JSON</strong> to POJOs, you must exclude those POJO classes from obfuscation as well. If their field names are obfuscated, the conversion from JSON fails, because GSON matches the keys of the JSON response against the POJO field names. 
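</p> <p>To see why renamed fields break the mapping, consider a minimal sketch of name-based binding through reflection. This is not GSON's implementation, only the same general idea in plain Java; FieldNameBinding, Article, ObfuscatedArticle and bind are hypothetical names used for illustration, and a Map stands in for the parsed JSON object.</p>

```java
import java.lang.reflect.Field;
import java.util.Collections;
import java.util.Map;

// Hypothetical illustration of reflection-based binding; not GSON's actual code.
class FieldNameBinding {

    // A POJO whose field name matches the JSON key "title".
    static class Article {
        String title;
    }

    // The same POJO after an obfuscator has renamed the field to "a".
    static class ObfuscatedArticle {
        String a;
    }

    // Copies each map entry into the String field with the same name.
    // Returns the number of fields that were populated.
    static int bind(Object target, Map<String, String> json) {
        int bound = 0;
        for (Field field : target.getClass().getDeclaredFields()) {
            // Only plain String fields are handled in this sketch.
            if (field.isSynthetic() || field.getType() != String.class) {
                continue;
            }
            String value = json.get(field.getName());
            if (value == null) {
                continue; // no JSON key carries this field's name
            }
            try {
                field.setAccessible(true);
                field.set(target, value);
                bound++;
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return bound;
    }

    public static void main(String[] args) {
        Map<String, String> json = Collections.singletonMap("title", "Hello");
        System.out.println("Article fields bound: " + bind(new Article(), json));
        System.out.println("ObfuscatedArticle fields bound: " + bind(new ObfuscatedArticle(), json));
    }
}
```

<p>Binding a map with the key title fills Article.title but leaves ObfuscatedArticle untouched, because no field is called title any more. That is the situation the keep rules for POJO classes are meant to prevent.</p> <p>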
In short, use the following configuration.</p> <pre class="brush: xhtml; toolbar: false; highlight: [16,17]">-keep class com.squareup.** { *; }<br />-keep interface com.squareup.** { *; }<br />-dontwarn com.squareup.okhttp.**<br />-keep class retrofit.** { *; }<br /><br />-keepclasseswithmembers class * {<br /> @retrofit.http.* <methods>;<br />}<br /><br />-keep interface retrofit.** { *;}<br />-keep interface com.squareup.** { *; }<br />-dontwarn rx.**<br />-dontwarn retrofit.**<br /><br /><br />#Include here the POJOs that you have created for mapping the JSON response, for example<br />-keep class com.blogspot.ramannanda.apps.xyz.FeedlyResponse {*;}</pre><br /><br /><p>Here FeedlyResponse is just a POJO class that maps to the JSON fields returned by the Feedly feed search API.</p> Anonymoushttp://www.blogger.com/profile/03385350305653681268noreply@blogger.comtag:blogger.com,1999:blog-7351084055463323761.post-44891016219173063542015-03-04T22:44:00.001+05:302015-03-04T23:14:51.460+05:30Book Review: Bulletproof Android: Practical Advice for Building Secure Apps<a href="http://www.amazon.com/gp/product/0133993329/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0133993329&linkCode=as2&tag=techniessent-20&linkId=IDA676O6C4DD6QMO"><img style="float: none; margin-left: auto; display: block; margin-right: auto" border="0" src="http://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&ASIN=0133993329&Format=_SL250_&ID=AsinImage&MarketPlace=US&ServiceVersion=20070822&WS=1&tag=techniessent-20" /></a><img style="border-top-style: none !important; border-bottom-style: none !important; border-right-style: none !important; margin: 0px; border-left-style: none !important" border="0" alt="" src="http://ir-na.amazon-adsystem.com/e/ir?t=techniessent-20&l=as2&o=1&a=0133993329" width="1" height="1" /> <p>I recently reviewed this title and found it lacking on a few important considerations, such as cross-client authorization. 
This is definitely a book for developers who are beginning Android application development, but it isn’t comprehensive.</p> <p>I describe what each chapter covers and then suggest how the book could be improved. </p> <p><strong>Chapter 1: Android Security Issues</strong> </p> <ol> <li>Covers the different security compliance standards </li> <li>Describes the common problems in Android applications </li> <li>Shows how easily your application's code can be reverse-engineered </li> </ol> <p></p> <p><strong>Chapter 2: Protecting your code</strong> <br />Here the author explains why you should obfuscate your code. The chapter starts by showing how easy it is to reverse-engineer code that is not obfuscated. It then covers obfuscation tools and how to use them on your application's code. Finally, the author uses disassemblers to show that, although obfuscation might deter someone from looking at your code, it cannot truly prevent someone from hacking it. </p> <p><strong>Chapter 3: Authentication</strong> <br />Here the author covers different authentication schemes: username/password, Facebook login, etc. </p> <p><strong>Chapter 4: Network communication</strong> <br />Covers asymmetric public key encryption and why you should use SSL, and demonstrates the man-in-the-middle attack. It also explains why your application should validate SSL certificates.</p> <p><strong>Chapter 5: Databases</strong> <br />Covers general database best practices such as encryption and preventing SQL injection.</p> <p><strong>Chapter 6: Web Server Attacks</strong> <br />Covers securing web services, XSS attacks, etc. Here, I feel the author should have covered the authentication and authorization challenges that one usually faces with Android applications, as one generally needs to validate requests coming from mobile devices. 
For example, a user can easily discover your service endpoint, as the code is deployed on the client side, and send requests to that URL from their own application. You therefore need to differentiate between requests from your application and requests from other applications. (I personally use the Google Plus sign-in APIs, along with server-side token validation, to ensure that back-end requests originate from my application and from the correct individual.)</p> <p><strong>Chapter 7: Third party library integration</strong> <br />Mentions that you should be aware of the permissions that you are granting to third-party libraries.</p> <p><strong>Chapter 8: Device Security</strong> <br />Covers device security issues and why you should enable encryption. It then describes how device security is enforced on KitKat. The author also discusses some Android version-specific exploits and offers certain solutions. </p> <p><strong>Chapter 9: The Future</strong> <br />This chapter covers intent hijacking and how to deal with it in your Android application. It then covers devices such as Android Wear, the extended ecosystem of Android devices and its impact on security considerations. Finally, it covers tools that expose security vulnerabilities in your application.</p> <p><strong>Conclusions: </strong> <br />The book covers many <strong>common</strong> security vulnerabilities that developers introduce while writing Android applications. The prose is lucid, the vulnerabilities are demonstrated with practical examples, and solutions are offered for them. For a developer who is beginning application development with Android, knowing about these issues is important. However, most of them would already be known to experienced developers. I feel that detailed coverage of topics such as securing back-end services unobtrusively, OAuth, OWSM, etc. would have added value to the book. 
Maybe it's just me, but I expect these topics to be covered in detail, as most Android applications use some form of back-end service to offload heavy processing. I rate it 3.5 for the content it has covered. </p> Anonymoushttp://www.blogger.com/profile/03385350305653681268noreply@blogger.comtag:blogger.com,1999:blog-7351084055463323761.post-20391636901847689892015-02-26T20:22:00.001+05:302015-02-26T20:28:33.269+05:30Apache Chemistry and CMIS<p>In this post, I will explain how you can use the Apache Chemistry APIs to query enterprise content management systems. The <strong>Apache Chemistry</strong> project provides client libraries that make it easy to integrate with any content management product that implements the CMIS standard. As you might be aware, there are multiple standards, such as <strong>JCR and CMIS</strong>, for interacting with content repositories. Although SQL-92 support is now available with JCR 2, I prefer the conciseness and wider adoption of the CMIS standard, and the Apache Chemistry APIs make it really easy to interact with the content repository. </p> <p>I am sharing an example that you can readily test without any special environment setup. <strong>Alfresco</strong> is one of the vendors that provide an ECM product supporting the CMIS standard, and it offers a public repository for you to play with. 
I will walk through the example step by step.</p> <ol> <li><strong>Connecting to the repository</strong>:  <pre class="brush: java; highlight: [5,9,10,11,12,13]">SessionFactory sessionFactory = SessionFactoryImpl.newInstance();<br /> Map<String,String> parameter = new HashMap<String,String>();<br /> parameter.put(SessionParameter.USER, "admin");<br /> //the binding type to use<br /> parameter.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());<br /> parameter.put(SessionParameter.PASSWORD, "admin");<br /> //the endpoint<br /> parameter.put(SessionParameter.ATOMPUB_URL, "http://cmis.alfresco.com/s/cmis");<br /> parameter.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());<br /> //fetch the list of repositories<br /> List<Repository> repositories = sessionFactory.getRepositories(parameter);<br /> //establish a session with the first repository<br /> Session session = repositories.get(0).createSession();</pre><br /><br /> <p>To connect to the repository, you need a few basic parameters: the username, the password, the endpoint URL (in this case the REST AtomPub service) and a binding type that specifies the kind of endpoint (the valid binding types are WEBSERVICES, ATOMPUB, BROWSER, LOCAL and CUSTOM). With this information in place, we still need a repository id to connect to. In this case, I use the first repository in the list of repositories to establish the session. Now, let’s create a query statement to search for documents. </p><br /><br /> <p></p><br /> </li><br /><br /> <li><strong>Querying the repository:</strong>  <pre class="brush: java; highlight: [2,4,6,10,14,16]">//query builder for convenience<br />QueryStatement qs=session.createQueryStatement("SELECT D.*, O.* FROM cmis:document AS D JOIN cm:ownable AS O ON D.cmis:objectId = O.cmis:objectId " +<br /> " where " +<br /> " D.cmis:name in (?)" +<br /> " and " +<br /> " D.cmis:creationDate > TIMESTAMP ? 
" +<br /> " order by cmis:creationDate desc");<br />//array for the in argument<br />String documentNames[]= new String[]{"Project Objectives.ppt","Project Overview.ppt"};<br />qs.setString(1, documentNames);<br />Calendar now = Calendar.getInstance();<br />//subtract 5 years to view documents from the last 5 years<br />now.add(Calendar.YEAR, -5);<br />qs.setDateTime(2, now);<br />//get the first 50 records only.<br />ItemIterable<QueryResult> results = session.query(qs.toQueryString(), false).getPage(50);</pre><br /><br /> <p>Here I have used the createQueryStatement method to build the query, just for convenience; you could also specify a query string directly (not recommended). The query is essentially a join between objects. The sample shows how to pass a date (line 14) and an array for the in clause (line 10) as parameters. Line 16 assigns the results to an ItemIterable, where each QueryResult is a record containing the selected columns. </p><br /> </li><br /><br /> <li><strong>Iterating the results:</strong> <br /><br /> <pre class="brush: java; toolbar: false; highlight: [1,2,7]">for(QueryResult record: results) {<br /> Object documentName=record.getPropertyByQueryName("D.cmis:name").getFirstValue();<br /> logger.info("D.cmis:name " + ": " + documentName);<br /><br /> Object documentReference=record.getPropertyByQueryName("D.cmis:objectId").getFirstValue();<br /> logger.info("--------------------------------------");<br /> logger.info("Content URL: http://cmis.alfresco.com/service/cmis/content?conn=default&id="+documentReference);<br />}</pre><br />As explained above, we get an iterable result set and loop over the individual records. Because an attribute can be multi-valued, I use the getFirstValue method of the PropertyData interface to fetch the first value. Note line 7: it builds the actual URL of the resource, which is just a base URL with the object id of the matched document appended. 
</li><br /><br /> <li><strong>Closing the connection ?</strong> As per the Chemistry javadoc, there is no need to close a session, as it is purely a client-side concept. This makes sense, as we are not holding a connection here. </li><br /></ol><br /><br /><p><strong>Viewing the results</strong>: To view the actual documents, open the URLs written by the log statements in a browser.</p><br /><br /><p><strong>Building the code</strong>: Add the following Maven dependency to build the sample. </p><br /><br /><pre class="brush: xhtml; toolbar: false"> <dependency><br /> <groupId>org.apache.chemistry.opencmis</groupId><br /> <artifactId>chemistry-opencmis-client-impl</artifactId><br /> <version>0.12.0</version><br /> </dependency></pre><br /><br /><p><strong>Wrapping up</strong>: I have covered just one example of using the CMIS Query API with Apache Chemistry to query for documents. Refer to the documentation links in the References section for other usages. The gist below contains the entire sample code. </p><br /><br /><br /><script src="https://gist.github.com/ramannanda9/c85e52225645e3b9db4c.js"></script><br /><br /><p><strong>References</strong>:</p><br /><a title="https://wiki.alfresco.com/wiki/CMIS_Query_Language" href="https://wiki.alfresco.com/wiki/CMIS_Query_Language">CMIS_Query_Language</a> <a href="http://chemistry.apache.org/java/examples/index.html">Java Examples for Apache Chemistry</a> Anonymoushttp://www.blogger.com/profile/03385350305653681268noreply@blogger.comtag:blogger.com,1999:blog-7351084055463323761.post-48974740243411741312015-02-20T12:45:00.001+05:302015-02-26T20:42:18.221+05:30Retrofit custom JSON deserializer<p><strong>Retrofit</strong> uses <strong>Google’s</strong> <strong>gson</strong> library to <strong>deserialize</strong> a <strong>JSON</strong> representation into a Java object representation. 
Although this default deserialization works in most cases, you sometimes have to override it, either to parse only a part of the response or because the JSON data has no clear object representation. </p> <p>In this post, I will share an example of a custom deserializer that parses the response from <strong>Wiktionary’s</strong> word definition API. First, let us take a look at the request and response. </p> <p>The request URL is shown below:</p> <p><a title="http://en.wiktionary.org/w/api.php?format=json&action=query&titles=sublime&prop=extracts&redirects&continue" href="http://en.wiktionary.org/w/api.php?format=json&action=query&titles=sublime&prop=extracts&redirects&continue">http://en.wiktionary.org/w/api.php?format=json&action=query&titles=sublime&prop=extracts&redirects&continue</a></p> <p>The response is shown below; it has been shortened for brevity. </p> <pre class="brush: javascript; toolbar: false">{"batchcomplete":"","query":{"pages":{"200363":{"pageid":200363,"ns":0,"title":"sublime","extract":"<p></p>\n<h2><span id=\"English\">English</span></h2>\n<h3><span id=\"Pronunciation\">Pronunciation</span></h3>\n<ul><li>\n</li>\n<li>Rhymes: <span lang=\"\">-a\u026am</span></li>\n</ul><h3><span id=\"Etymology_1\">Etymology 1</span></h3>\n<p>From <span>Middle English</span> <i class=\"Latn mention\" .......for brevity }}}} </pre><br /><br /><p>As you can see, the data we are interested in is extract and probably the pageid. The pages object is keyed by page id, so there is no straightforward Java object representation of the entire response, and we implement our own custom deserializer to parse it. </p><br /><br /><p>The code for the deserializer is shown below. 
</p><br /><br /><pre class="brush: java; toolbar: false; highlight: [7,14]">public class DictionaryResponseDeserializer implements JsonDeserializer<WicktionarySearchResponse> {<br /><br /> @Override<br /> public WicktionarySearchResponse deserialize(JsonElement json, Type typeOfT, JsonDeserializationContext context) throws JsonParseException {<br /> Gson gson=new Gson();<br /> JsonElement value = null;<br /> value = json.getAsJsonObject().get("query").getAsJsonObject().get("pages");<br /> WicktionarySearchResponse response = new WicktionarySearchResponse();<br /> if(value!=null) {<br /> Iterable<Map.Entry<String, JsonElement>> entries = value.getAsJsonObject().entrySet();<br /> Query query = new Query();<br />ArrayList<ResultPage> resultPages = new ArrayList<ResultPage>();<br /> for (Map.Entry<String, JsonElement> entry : entries) {<br /> resultPages.add(gson.fromJson(entry.getValue(), ResultPage.class));<br /><br /> }<br /> query.setPages(resultPages);<br /> response.setQuery(query);<br /> }<br /><br /><br /> return response;<br /> }<br />}</pre><br /><br /><p>Pay special attention to the highlighted lines. On the first highlighted line, we assign to the JsonElement the object that contains all the pages from the JSON response, as that is the only data we are interested in. Next, we iterate over its entries. We need only the values and not the keys (the pageid key is already present inside each page object), so we call entry.getValue() and convert the value to a Java POJO instance using the Gson instance. 
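</p> <p>The keep-the-values, drop-the-keys step can be tried in isolation. The sketch below is plain Java rather than Gson code: a Map stands in for the JSON pages object, and PagesFlattener and Page are hypothetical stand-ins (Page plays the role of ResultPage).</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; a Map stands in for the JSON "pages" object.
class PagesFlattener {

    // Stand-in for ResultPage: each page carries its own id.
    static class Page {
        final long pageId;
        final String title;

        Page(long pageId, String title) {
            this.pageId = pageId;
            this.title = title;
        }
    }

    // Collects the values of a map keyed by page id into a list, ignoring the keys.
    static List<Page> flatten(Map<String, Page> pages) {
        List<Page> result = new ArrayList<Page>();
        if (pages == null) {
            return result; // mirrors the null check in the deserializer
        }
        for (Map.Entry<String, Page> entry : pages.entrySet()) {
            result.add(entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Page> pages = new LinkedHashMap<String, Page>();
        pages.put("200363", new Page(200363L, "sublime"));
        for (Page page : flatten(pages)) {
            System.out.println(page.pageId + " " + page.title);
        }
    }
}
```

<p>Because every Page already carries its own id, nothing is lost by discarding the map keys; the deserializer above applies the same loop to the entry set of the Gson JsonObject. 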
</p><br /><br /><p>The service interface and a utility class that invokes the word search API are shown below.</p><br /><br /><pre class="brush: java; toolbar: false">public interface DictionaryService {<br /><br /> @GET("/w/api.php")<br /> public void getMeaningOfWord(@QueryMap Map<String, String> map, Callback<WicktionarySearchResponse> response);<br /><br /> @GET("/w/api.php")<br /> public WicktionarySearchResponse getMeaningOfWord(@QueryMap Map<String, String> map);<br />}</pre><br /><br /><pre class="brush: java; toolbar: false">/**<br /> * Created by Ramandeep on 07-01-2015.<br /> */<br />public class DictionaryUtil {<br /> private static final String tag="DictionaryUtil";<br /> private static Gson gson= initGson();<br /><br /> // builds a Gson instance that uses the custom deserializer for the search response<br /> private static Gson initGson() {<br /> if(gson==null){<br /> gson= new GsonBuilder().registerTypeAdapter(WicktionarySearchResponse.class,new DictionaryResponseDeserializer()).create();<br /> }<br /> return gson;<br /> }<br /><br /> public static WicktionarySearchResponse searchDefinition(String word){<br /> WicktionarySearchResponse searchResponse=null;<br /> // same host as the request URL shown earlier<br /> RestAdapter restAdapter = new RestAdapter.Builder()<br /> .setEndpoint("http://en.wiktionary.org").setConverter(new GsonConverter(gson))<br /> .build();<br /> DictionaryService serviceImpl= restAdapter.create(DictionaryService.class);<br /> Map<String,String> queryMap=new HashMap<String,String>();<br /> queryMap.put("action","query");<br /> queryMap.put("prop","extracts");<br /> queryMap.put("redirects",null);<br /> queryMap.put("format","json");<br /> queryMap.put("continue",null);<br /> queryMap.put("titles",word);<br /> try {<br /> searchResponse= serviceImpl.getMeaningOfWord(queryMap);<br /> }catch (Exception e){<br /> // log the failure; the caller receives null in that case<br /> if(e.getMessage()!=null) {<br /> Log.e(tag, e.getMessage());<br /> }<br /> }<br /> return searchResponse;<br /><br /> }<br /><br />}</pre><br /><br /><p> </p><br /><br /><p> </p><br /><br /><p>The POJO classes are listed below, in order of hierarchy. 
</p><br /><br /><pre class="brush: java; toolbar: false">public class WicktionarySearchResponse {<br /><br />private Query query=null;<br /><br /> public Query getQuery() {<br /> return query;<br /> }<br /><br /> public void setQuery(Query query) {<br /> this.query = query;<br /> }<br />}</pre><br /><br /><pre class="brush: java; toolbar: false">public class Query {<br /><br /><br /> public List<ResultPage> getPages() {<br /> return pages;<br /> }<br /><br /> public void setPages(List<ResultPage> pages) {<br /> this.pages = pages;<br /> }<br /><br /> private List<ResultPage> pages=null;<br /><br /><br />}</pre><br /><br /><pre class="brush: java; toolbar: false">public class ResultPage {<br /> // the JSON key is "pageid", so map it explicitly (com.google.gson.annotations.SerializedName)<br /> @SerializedName("pageid")<br /> private long pageId;<br /> private String title;<br /> private int index;<br /> private String extract;<br /><br /> public ResultPage() {<br /> }<br /><br /> public long getPageId() {<br /> return pageId;<br /> }<br /><br /> public void setPageId(long pageId) {<br /> this.pageId = pageId;<br /> }<br /><br /> public String getTitle() {<br /> return title;<br /> }<br /><br /> public void setTitle(String title) {<br /> this.title = title;<br /> }<br /><br /> public int getIndex() {<br /> return index;<br /> }<br /><br /> public void setIndex(int index) {<br /> this.index = index;<br /> }<br /><br /> public String getExtract() {<br /> return extract;<br /> }<br /><br /> public void setExtract(String extract) {<br /> this.extract = extract;<br /> }<br />}</pre> Anonymoushttp://www.blogger.com/profile/03385350305653681268noreply@blogger.comtag:blogger.com,1999:blog-7351084055463323761.post-82234130459961605412015-01-15T00:16:00.001+05:302015-01-15T00:26:22.776+05:30Simply Read: A Rss/Atom Feed Reader for android<p>Over the past few weeks, I have been working on an Android application for reading feed articles. There were quite a few takeaways from that experience, and I am sharing a few of them here, along with a link to the application and its features. 
</p> <p>The app can be downloaded from the link: <a title="https://play.google.com/store/apps/details?id=com.blogspot.ramannanda.apps.simplyread" href="https://play.google.com/store/apps/details?id=com.blogspot.ramannanda.apps.simplyread">Simply Read on Play Store</a></p> <p>Takeaways:</p> <ul> <li><strong>Parsing is slow</strong>: As I had to build a full-text content scraper and parser so that users can view the full content of articles, I soon realized that multiple iterations of parsing and scraping can be painfully slow. The same code that executes in under a second on a local machine would take 30-40 seconds on an Android device, which is unbearable for an end user. Considering that my device was pretty fast, this could be attributed to performance snags in Google's implementation of the standard Java APIs. <strong>Solution</strong>: Move the parsing code to the server and expose it as a REST service that returns the parsed article, while still giving the user the option to use the parser within the app. 
<a href="http://lh3.ggpht.com/-uFoWHGY1f7o/VLa5e3lt2dI/AAAAAAAABQg/6on2IWz53Yc/s1600-h/can_encode_drill_down%25255B12%25255D.png"><img title="can_encode_drill_down" style="border-top: 0px; border-right: 0px; border-bottom: 0px; float: none; margin-left: auto; border-left: 0px; display: block; margin-right: auto" border="0" alt="can_encode_drill_down" src="http://lh4.ggpht.com/-4hKgPlTsRUg/VLa5f0fiLVI/AAAAAAAABQo/M-K_EiYq7OY/can_encode_drill_down_thumb%25255B10%25255D.png?imgmax=800" width="244" height="133" /></a>  <a href="http://lh3.ggpht.com/-q6rIIOTo_mw/VLa5hRbrV7I/AAAAAAAABQw/ANQ0m8knDiU/s1600-h/can_encode_slow%25255B9%25255D.png"><img title="can_encode_slow" style="border-top: 0px; border-right: 0px; border-bottom: 0px; float: none; margin-left: auto; border-left: 0px; display: block; margin-right: auto" border="0" alt="can_encode_slow" src="http://lh4.ggpht.com/-ZD0uc5rdayA/VLa5idbwxpI/AAAAAAAABQ4/dRsDn8Edb-E/can_encode_slow_thumb%25255B7%25255D.png?imgmax=800" width="244" height="133" /></a> </li> <li><strong>Authenticating user requests</strong>: This problem is linked to the first one. Once the parsing code moved to the server side, I needed to ensure that only authenticated users of the application could send parsing requests, as I could not allow just anyone to query the backend and retrieve parsed articles. <strong>Solution</strong>: Use Google+ sign-in with server-side validation of the client OAuth tokens. This ensures that requests originate from my application on an Android device and that the user is authenticated before making a parse request. </li> <li><strong>Not Using Content providers, So Managing Data Refresh: </strong>I chose not to use content providers and instead went with the native APIs for fetching data from the SQLite database and performing updates. The obvious problem that arises is managing data refresh across different activities. 
<strong>Solution</strong>: I used Otto coupled with loaders to manage data refresh across activities. Using Otto ensured loose coupling of components. </li> <li><strong>ProGuard and Retrofit</strong> don’t gel well together: There are quite a few standard exclusion rules that you have to write to get Retrofit to work with <strong>ProGuard</strong>. Also make sure to exclude the classes and attributes that GSON uses to convert JSON to an object representation. Here’s a snippet of the rules. <pre class="brush: java; toolbar: false; highlight: [14]">-keepattributes Signature<br />-keepattributes *Annotation*<br />-keep interface com.squareup.** { *; }<br />-dontwarn rx.**<br />-dontwarn retrofit.**<br />-keep class com.squareup.** { *; }<br />-keep class retrofit.** { *; }<br /><br />-keepclasseswithmembers class * {<br /> @retrofit.http.* <methods>;<br />}<br /><br />#now exclude the response classes and POJOs<br />-keep class com.blogspot.ramannanda.apps.simplyread.model.rest.SampleResponse {*;}</pre><br /> </li><br /></ul><br /><br /><p> </p><br /><br /><p>There are a lot of other takeaways, but I am going to keep this brief. I look forward to hearing your feedback about the app. 
</p><br /><br /><div id="scid:66721397-FF69-4ca6-AEC4-17E6B3208830:f8f68c2d-0a77-4466-b83d-e081737fdf4b" class="wlWriterEditableSmartContent" style="width: 340px; float: none; padding-bottom: 0px; padding-top: 0px; padding-left: 0px; margin: 0px auto; display: block; padding-right: 0px"><a style="border:0px" href="https://onedrive.live.com/redir.aspx?cid=bb4c0c79b03cc854&page=browse&resid=BB4C0C79B03CC854!1767&parId=BB4C0C79B03CC854!107&type=5"><img style="border:0px" alt="View Simply Read" src="http://lh3.ggpht.com/-j3zy-IcBy84/VLa5jHHM8LI/AAAAAAAABP0/o0gIfCiAFUk/InlineRepresentation041c3b0c-2a29-455b-a9c6-83650ac9bb7f%25255B3%25255D.jpg?imgmax=800" /></a><div style="width:340px;text-align:right;" ><a href="https://onedrive.live.com/redir.aspx?cid=bb4c0c79b03cc854&page=browse&resid=BB4C0C79B03CC854!1767&parId=BB4C0C79B03CC854!107&type=5">View Full Album</a></div></div> Anonymoushttp://www.blogger.com/profile/03385350305653681268noreply@blogger.com