Certificate Program in Big Data Foundation+Engineering

Blended Learning | 6 Months (Inclusive of Project - 1 Month) | INR 1,18,000/-

Certificate Program in Big Data Foundation+Engineering | Category: IT

Why Join the Certificate Program in Big Data Foundation+Engineering

  • Use research-based knowledge and research methods, including design of experiments, analysis and interpretation of data, and synthesis of information, to reach valid conclusions
  • Function effectively as an individual, and as a member or leader of diverse teams, in the data science corporate and research world
  • Demonstrate knowledge and understanding of data science and apply it to your own work as a team member and leader, in managing projects, and in multidisciplinary environments


Java / Python / SQL / NoSQL / MapReduce / Tableau / R / Hive / Pig / YARN / Oozie



This program trains you to use Hadoop as a data management tool with languages such as R and Java. More than 70% of the learning is hands-on, covering Apache projects such as Hive, Pig, YARN and Oozie as well as the data visualisation platform Tableau, so that you become comfortable working with Hadoop. An introductory overview of Apache Spark will also help you complete the 1-month capstone project, in which you implement a Hadoop-based research idea in practice.

Program Duration

6 Months (Inclusive of Project - 1 Month)

Mode Of Delivery

Blended Learning



6-Month Certificate Program in Big Data Foundation + Engineering
Big data processing and cloud computing

-Big data processing: the value of big data, history, Hadoop development
-Cloud computing with AWS: EC2, S3

Getting Hadoop up and running

-Fundamentals of the Python programming language and setup on Windows
-Hadoop on a local Ubuntu host
-Downloading and setting up Hadoop
-Setting up SSH
-Using Hadoop to calculate pi
-Configuring the pseudo-distributed mode
-Changing the base HDFS directory
-Formatting the NameNode
-Starting Hadoop
-Using HDFS
-WordCount, the "Hello World" of MapReduce
-Using Elastic MapReduce (EMR)
-WordCount in EMR using the management console
-Comparison of local vs. EMR Hadoop
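The "Using Hadoop to calculate pi" step runs the pi program from Hadoop's bundled examples JAR, which spreads sampling work across map tasks and combines the counts in a single reduce step. The Python sketch below imitates that map/reduce split locally, assuming plain pseudo-random Monte Carlo sampling; Hadoop's own example uses a quasi-Monte Carlo sequence instead, and the function names here are illustrative only.

```python
import random


def map_task(task_id, samples):
    """Map step: count random points that land inside the unit quarter circle."""
    rng = random.Random(task_id)  # per-task seed so each run is reproducible
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, samples


def reduce_counts(results):
    """Reduce step: sum the per-task counts and turn them into a pi estimate."""
    total_inside = sum(inside for inside, _ in results)
    total_samples = sum(samples for _, samples in results)
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * total_inside / total_samples


def estimate_pi(num_maps=10, samples_per_map=100_000):
    """Run every (hypothetical) map task locally, then reduce their outputs."""
    results = [map_task(i, samples_per_map) for i in range(num_maps)]
    return reduce_counts(results)


if __name__ == "__main__":
    print(estimate_pi())
```

The two parameters mirror the number of map tasks and samples per map that the examples JAR's pi program takes; more samples give a closer estimate.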

Understanding MapReduce

-Key-value pairs
-Hadoop Java API for MapReduce
-MapReduce program: setting up the classpath
-Implementing WordCount
-Building a JAR file
-Running WordCount on a local Hadoop cluster
-Running WordCount on EMR
-WordCount with a combiner
-Fixing WordCount to work with a combiner
-Hadoop-specific data types
-Using the Writable wrapper classes
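WordCount shows the whole key-value flow: map emits (word, 1) pairs, an optional combiner pre-sums them on the map side, the shuffle groups values by key, and the reducer adds them up. The pure-Python simulation below is a teaching sketch under that model, not the Hadoop Java API; each inner list in the hypothetical `splits` argument stands in for one map task's input.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Emit a (word, 1) key-value pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def combine(pairs):
    """Combiner: pre-sum counts on the map side, per input split."""
    partial = defaultdict(int)
    for word, count in pairs:
        partial[word] += count
    return list(partial.items())


def shuffle(pairs):
    """Group every value by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word, counts):
    """Sum the (possibly pre-combined) counts for one word."""
    return word, sum(counts)


def word_count(splits, use_combiner=True):
    map_outputs = []
    for split in splits:  # one split per simulated map task
        pairs = list(chain.from_iterable(mapper(line) for line in split))
        map_outputs.append(combine(pairs) if use_combiner else pairs)
    grouped = shuffle(chain.from_iterable(map_outputs))
    return dict(reducer(word, counts) for word, counts in grouped.items())
```

Summing is associative and commutative, and the combiner emits the same (word, count) types that the mapper does, which is why the reducer logic can safely double as a combiner here; an operation such as averaging needs extra care before it can be reused this way.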

Developing MapReduce programs

-Using languages other than Java with Hadoop
-WordCount using Hadoop Streaming
-Analysing a large dataset
-Summarizing the UFO data
-Summarizing the shape data
-Correlating sighting duration to UFO shape
-Performing the shape/time analysis from the command line
-Using ChainMapper for field validation/analysis
-Using the distributed cache to improve location output
-Counters, status and other output: creating counters, task states and writing log output
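Hadoop Streaming lets any executable act as mapper or reducer: each reads lines on standard input and writes tab-separated key/value lines to standard output, and the reducer receives its input sorted by key. A minimal Python word count under those assumptions could look like this; the file name `wordcount.py` and the `map` argument are hypothetical choices for this sketch.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: input arrives sorted by key, so equal words are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # "wordcount.py map" runs the mapper; any other invocation runs the reducer.
    step = map_lines if len(sys.argv) > 1 and sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

It can be tried locally with a pipeline such as `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`, where `sort` plays the part of Hadoop's shuffle. On a cluster, the same script would be passed to the streaming JAR through its `-mapper` and `-reducer` options; the JAR's location varies by Hadoop version.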

Advanced MapReduce techniques

-Simple, advanced, and in-between joins
-Reduce-side joins using MultipleInputs
-Graph algorithms
-Representing the graph
-Creating the source code
-The first run
-The second run
-The third run
-The fourth and last run
-Using language-independent data structures
-Getting and installing Avro
-Defining the schema
-Creating the source Avro data with Ruby
-Consuming the Avro data with Java
-Generating the shape summaries in MapReduce
-Examining the output data with Ruby
-Examining the output data with Java
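In a reduce-side join, each input is read by its own mapper (the role `MultipleInputs` plays in Hadoop's Java API), every record is tagged with its source and emitted under the join key, and the reducer combines the records that share a key. The sketch below simulates this in Python as an inner join; the sales and account records and all function names are hypothetical.

```python
from collections import defaultdict


def sales_mapper(record):
    """Mapper for the first input: key by user ID and tag the value as a sale."""
    user_id, amount = record
    return user_id, ("sale", amount)


def account_mapper(record):
    """Mapper for the second input: key by user ID and tag the value as an account."""
    user_id, name = record
    return user_id, ("account", name)


def join_reducer(key, tagged_values):
    """Split values by tag, then emit every (name, amount) pair for the key."""
    names = [value for tag, value in tagged_values if tag == "account"]
    amounts = [value for tag, value in tagged_values if tag == "sale"]
    for name in names:
        for amount in amounts:
            yield key, name, amount


def reduce_side_join(sales, accounts):
    grouped = defaultdict(list)
    # Each input has its own mapper; both feed the same shuffle.
    for key, value in map(sales_mapper, sales):
        grouped[key].append(value)
    for key, value in map(account_mapper, accounts):
        grouped[key].append(value)
    results = []
    for key in sorted(grouped):  # reducers see keys in sorted order
        results.extend(join_reducer(key, grouped[key]))
    return results
```

Keys present in only one input produce no output, which gives inner-join semantics; an outer join would instead emit a placeholder when one of the two tag lists is empty.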

Node breaking

-Hadoop node failure
-Killing a DataNode process
-The replication factor in action
-Intentionally causing missing blocks
-Killing a TaskTracker process
-Killing the JobTracker
-Killing the NameNode process
-Causing a task failure
-Handling dirty data by using skip mode
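The replication-factor and missing-block exercises come down to counting how many replicas of each HDFS block still sit on running DataNodes. The toy model below (a hypothetical helper, not HDFS code) sorts blocks into the healthy, under-replicated and missing categories that Hadoop's `fsck` tool also reports, assuming HDFS's default replication factor of 3.

```python
def block_report(placement, live_nodes, replication=3):
    """Classify blocks by how many of their replicas are on live DataNodes.

    placement maps a (hypothetical) block ID to the DataNodes holding a replica.
    """
    report = {"healthy": [], "under_replicated": [], "missing": []}
    for block, nodes in sorted(placement.items()):
        live = sum(1 for node in nodes if node in live_nodes)
        if live == 0:
            report["missing"].append(block)  # no live copy left to read
        elif live < replication:
            report["under_replicated"].append(block)  # readable, but short of target
        else:
            report["healthy"].append(block)
    return report


if __name__ == "__main__":
    placement = {
        "blk_1": ["dn1", "dn2", "dn3"],
        "blk_2": ["dn2", "dn3", "dn4"],
        "blk_3": ["dn1"],  # deliberately stored with a single replica
    }
    # Simulate killing the DataNode process on dn1.
    print(block_report(placement, live_nodes={"dn2", "dn3", "dn4"}))
```

Killing one DataNode leaves a three-replica block under-replicated, which the NameNode schedules for re-replication once it declares the node dead, whereas a block whose only replica was on the killed node becomes missing.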

Keeping things running

- Browsing default properties
- Setting up a cluster
- Examining a default rack configuration
- Adding a rack awareness script
- Cluster access control
- Demonstrating the default security
- Managing the NameNode
- Adding an additional fsimage location
- Swapping to a new NameNode host
- Managing HDFS
- MapReduce management
- Changing job priorities and killing a job
- Scaling

Relational view on data with Hive

- Overview of Hive
- Setting up Hive
- Installing Hive
- Creating a table for the UFO data
- Inserting the UFO data
- SAS user interface
- Validating the table
- Redefining the table with the correct column separator
- Creating a table with the correct column separator
- Creating a table from an existing file
- Performing a join
- Using views
- Exporting query output
- Making a partitioned UFO sighting table
- Adding a new user-defined function (UDF)
- Hive on AWS-T
- Running UFO analysis on EMR

Working with relational databases

- Common data paths
- Installing and setting up MySQL
- Configuring MySQL to allow remote connections
- Setting up the employee database
- Getting data into Hadoop
- Exporting data from MySQL to HDFS
- Exporting data from MySQL into Hive

Data collection with Flume

- AWS services
- Getting web server data into Hadoop
- Introducing Apache Flume
- Installing and configuring Flume
- Capturing network traffic to a log file
- Logging to the console
- Capturing the output of a command in a flat file
- Capturing a remote file in a local flat file
- Writing network traffic onto HDFS
- Adding timestamps
- Multilevel Flume networks
- Writing to multiple sinks

Apache projects and programming abstraction

- HBase
- Oozie
- Pig
- R for analytics

Data visualisation: Tableau

- Introduction
- Installation of Tableau Desktop
- Tableau architecture
- Tableau Server components
- Tableau environment
- Tableau workspace
- Building views in Tableau
- Connecting to a data source in Tableau
- Exporting a DB connection in Tableau
- Data blending in Tableau
- Joining tables in Tableau
- Data bins in Tableau
- Creating a dashboard in Tableau
- Tableau Desktop shortcuts cheat sheet



Eligibility

  • Engineering graduates and postgraduates in Computer Science or Information Technology, or in Electronics and Telecommunication or Electronics (with a prerequisite test)
  • Science graduates (B.Sc. IT, BCA) and diploma engineers in the above branches (with a prerequisite test)
  • Candidates pursuing a PhD in the above-mentioned domains


Program Fee

INR 1,18,000/-

Please consult your admission counselor about flexi-EMI options.


Please reach out to the admissions office if you have any queries.

100% Placement Assistance with Leading Corporates

Our placement assistance program offers students one-on-one career counselling and the chance to work with our corporate partners.

What if I am unable to complete the course?

If you drop out of the course for a genuine reason, you will have 6 months to return to it. If you do not return within this period, you will have to start afresh.

How will I pay for this course?

You can pay for your chosen course on the website by clicking the Fees tab, using e-wallets, net banking, credit cards, debit cards, or NEFT/bank transfer.

What is the refund policy?

Refunds must be claimed before your batch begins. The application form fee is non-refundable. Of the program fees paid up to the date of the refund application, Skillville will deduct 20% towards administrative charges; the remaining 80% will be refunded within 1 month of the refund being approved by an authorized Skillville representative.

What if you miss a class?

You can catch up by attending the same session with a parallel batch, or you can contact the relevant faculty member to cover the missed class.

Whom should I contact in case of any purchase related query?

Please contact your admission counselor or email your query to admissions@skillville.in.

Do I get a certificate of participation at the end of the training program?

Yes. You will receive a certificate on completing the program, provided you meet the attendance and evaluation criteria set for your program.