Data Lake Development in Clinical Analytics

Keshav Balivada has over 4 years of experience working with big data technologies like Apache NiFi, Hadoop, HDFS, Hive, Impala, Sqoop, and Spark SQL. He is currently a Senior Associate at Syneos Health working on a project to create a centralized clinical study data lake. Previous experience includes projects with Ernst & Young to build data lakes for banking and pharmaceutical clients. Technologies used include Python, SQL, Hive, Impala, MongoDB, SAS, and Spotfire.

KESHAV BALIVADA

Email: keshavbalivada@[Link]
Contact No.: +91-8500360567
Work Experience: 4 years

PROFESSIONAL SUMMARY
• Currently working as Senior Associate, IT at SYNEOS HEALTH (Jan 2019 to present) in the BT Application Engineering (Big Data Analytics) team.
• 2+ years of experience as Senior Analyst at ERNST & YOUNG, GDS Bangalore (Sept 2016 to Dec 2018).
• Good exposure to Python and big data platforms in the banking domain.
• Hands-on experience with Apache NiFi, Hadoop, HDFS, HUE, Hive, Impala, Sqoop, Beeline, Oozie, Spark SQL, and UNIX shell scripting.
• Hands-on experience writing HiveQL queries to process data for analysis.

EDUCATION

Bachelor of Technology in Mechanical Engineering from M.V.G.R College of Engineering.

TECHNICAL SKILLS & TOOLS

Languages : Python, SQL
Operating Systems : Windows XP/Vista, Linux
Visualizations : Spotfire, Power BI
Databases : SQL Server, MySQL, MongoDB
Hadoop Ecosystem : Hadoop, HUE, Hive, Impala, Sqoop, Oozie
ETL : Apache NiFi, Alteryx

PROJECT EXPERIENCE

Project #1:

Project Title : Sponsor Integration

Distribution : Hortonworks

Company : Syneos Health

Duration : Jan 2019 to Present

Created a centralized data repository (Data Lake) for clinical study data and transformed the data per sponsor requirements using Hive and Apache NiFi.
• Involved in end-to-end deployment of the data lake in both test and production environments.
• Designed and developed the layers of the Data Lake.
• Transformed the data based on the STM using Hive and created the target tables (see the sketch after this project).
• Created and scheduled Apache NiFi workflows to send study data to the sponsor on a weekly basis.
• Analyzed large data sets by running Hive queries.
• Involved in unit testing.
• Handled the development for each sponsor's study data.
• Gained knowledge of the pharma and clinical domain.

Technology used: Hadoop, Apache Hive, Python, Apache NiFi, Groovy scripting.
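
As a hedged illustration of the kind of STM-driven Hive transformation described above (not the actual sponsor logic), the sketch below submits a parameterized HiveQL statement through Beeline from Python; the JDBC URL, table names, and columns are hypothetical placeholders.

import subprocess

# Hypothetical HiveServer2 JDBC URL (placeholder, not a real host).
JDBC_URL = "jdbc:hive2://hiveserver.example.com:10000/clinical_lake"

# Illustrative STM-style mapping: raw source columns -> sponsor-facing target columns.
HQL = """
CREATE TABLE IF NOT EXISTS sponsor_ae_weekly
STORED AS PARQUET AS
SELECT
    study_id,
    subject_id,
    ae_term                AS adverse_event,
    CAST(ae_start AS DATE) AS ae_start_date
FROM raw_study_ae
WHERE study_id = '${hivevar:study_id}';
"""

def run_transformation(study_id: str) -> None:
    """Submit the HiveQL transformation for one study via Beeline."""
    subprocess.run(
        ["beeline", "-u", JDBC_URL,
         "--hivevar", f"study_id={study_id}",
         "-e", HQL],
        check=True,
    )

if __name__ == "__main__":
    run_transformation("STUDY-001")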

Project #2:

Project Title : PNC Bank, US

Distribution : Cloudera

Company : Ernst & Young, Bangalore

Duration : 1 year 2 months

Created a centralized data repository (Data Lake) for retail bank data and generated monthly reports through SAS Enterprise.
• Involved in analyzing the system and business requirements with clients.
• Designed and developed the layers of the Data Lake.
• Created Hive external tables on top of the data loaded into HDFS (see the sketch after this project).
• Involved in executing the Hive scripts in Impala.
• Analyzed large data sets by running Hive queries.
• Involved in unit testing.
• Handled the development for the credit card area.
• Migrated existing SAS scripts to Python.

Technology used: SAS, Cloudera Platform, Hadoop, Sqoop, Apache Hive, Impala, Python, PySpark
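
A minimal sketch of the external-table and reporting pattern described above, written with PySpark (listed in the project's technology stack); the HDFS path, database, table, and column names are hypothetical placeholders, not the client's actual schema.

from pyspark.sql import SparkSession

# Hive support lets spark.sql() manage external tables in the metastore.
spark = (SparkSession.builder
         .appName("retail-lake-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table over data already landed in HDFS (path and schema are placeholders).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS cc_transactions (
        account_id STRING,
        txn_date   DATE,
        amount     DOUBLE
    )
    STORED AS PARQUET
    LOCATION '/data/raw/credit_card/transactions'
""")

spark.sql("CREATE DATABASE IF NOT EXISTS reports")

# Monthly spend per account -- the kind of aggregate previously produced in SAS.
monthly = spark.sql("""
    SELECT account_id,
           date_format(txn_date, 'yyyy-MM') AS txn_month,
           SUM(amount)                      AS total_spend
    FROM cc_transactions
    GROUP BY account_id, date_format(txn_date, 'yyyy-MM')
""")
monthly.write.mode("overwrite").saveAsTable("reports.cc_monthly_spend")
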
Project #3:

Project Title : CITIBANK, US

Software/Tools/Technology Used : Python, JavaScript, MongoDB

Worked as a single resource creating interactive voice-analytics dashboards and worked on text analytics using Python NLP (NLTK and FrameNet).
• Created a working prototype for highlighting text in a PDF against a dictionary of keywords using Python and NLTK (a minimal sketch follows this project).
• Hosted the dashboards using Python Flask.
• Worked directly with clients.
• Ensured deliverables were on time.
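
A minimal sketch of the keyword-matching core of the prototype described above, exposed over Flask; the endpoint name and keyword dictionary are hypothetical placeholders, and PDF text extraction and in-document highlighting are assumed to happen elsewhere.

from flask import Flask, jsonify, request
from nltk.tokenize import wordpunct_tokenize

# Placeholder keyword dictionary; the real prototype used a domain-specific list.
KEYWORDS = {"refund", "dispute", "fraud", "chargeback"}

app = Flask(__name__)

@app.route("/highlight", methods=["POST"])
def highlight():
    """Return the tokens from the posted text that match the keyword dictionary."""
    text = request.get_json(force=True).get("text", "")
    tokens = wordpunct_tokenize(text)
    matches = sorted({t.lower() for t in tokens} & KEYWORDS)
    return jsonify({"matches": matches})

if __name__ == "__main__":
    app.run(port=5000)
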

Project #4:

Project Title : Chemours, US

Software/Tools/Technology Used : MS SQL Server 2012, SAP Database


• Understood and developed the high-end business logic.
• Worked as a single resource creating SQL scripts for user-to-role mapping.
• Created visualization dashboards using Spotfire.

Deep Learning POC:

• Currently upskilling in deep learning out of personal interest.
• Developing various POCs using deep learning algorithms for text extraction, OCR, and speech-to-text conversion (a minimal sketch follows).

Technology used: Python with fastai, TensorFlow, and Keras; currently upskilling on PyTorch.
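
A minimal Keras sketch of the kind of character-classification building block an OCR proof-of-concept might start from; it is not the author's actual model, and the input size and class count are assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models

# Tiny convolutional classifier for 28x28 grayscale character crops.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(36, activation="softmax"),  # e.g. 26 letters + 10 digits
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=5)  # trained on labelled character crops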

Hackathon at Ernst & Young:

Performed data quality checks for a UK-based real-time project.

Technology used: Python with machine learning algorithms.

Extra-Curricular Activities:

• Allocate resources to projects in the pipeline.
• Create bench reports of resources on a weekly basis.
• Look after resource management for FSO and generate weekly and monthly reports for the US team.
• Handled a team of five across locations within India.

Achievements:
• Won 4th position in a hackathon out of 27 teams. Topic: creating data quality checks using machine learning and deep learning in Python.
• Won the Extra Miler award with INR 10,000.
• Won the Spot Award with INR 2,000.
• Won a Reward & Recognition award as an all-rounder in learning, practice management, and knowledge transfer.
• Won an Onsite Recognition award with INR 25,000 for being the best resource in the development of a Data Lake for a retail bank project.
