Interview Questions

21. What is meant by a Data node?

A Data node is the slave daemon deployed on each machine in the cluster; it provides the actual block storage and serves read and write requests from clients.
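As a rough sketch of that read/write path, the following Java example uses the standard org.apache.hadoop.fs.FileSystem API; the namenode URI (hdfs://namenode:8020) and the file path are placeholders chosen for illustration.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadWrite {
        public static void main(String[] args) throws Exception {
            // Placeholder namenode address; replace with your cluster's fs.defaultFS value.
            FileSystem fs = FileSystem.get(new URI("hdfs://namenode:8020"), new Configuration());
            Path path = new Path("/tmp/example.txt");

            // Write request: the name node picks data nodes, which store the actual bytes.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.writeUTF("hello hdfs");
            }

            // Read request: the client streams the bytes back from the data nodes.
            try (FSDataInputStream in = fs.open(path)) {
                System.out.println(in.readUTF());
            }
            fs.close();
        }
    }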

22. What is a daemon?

A daemon is a process that runs in the background in the UNIX environment. The rough equivalent is a 'service' in Windows and a 'TSR' in DOS.

23. What is the function of the 'job tracker'?

The job tracker is a daemon that runs on the name node; it accepts MapReduce job submissions and tracks their tasks. There is only one job tracker, and it distributes tasks to the various task trackers. If the job tracker goes down, all running jobs come to a halt.

24. What is the role played by task trackers?

Task trackers are daemons that run on the data nodes. Each task tracker takes care of the individual tasks on its slave node as entrusted to it by the job tracker.

25. What is meant by a heartbeat in HDFS?

Data nodes and task trackers send heartbeat signals to the name node and job tracker respectively to indicate that they are alive. If a heartbeat is not received, it indicates a problem with that data node or task tracker.
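The heartbeat timing is configurable. Below is a minimal sketch using Hadoop's Configuration API; the two property names are standard HDFS settings, and the values shown are their usual defaults (3 seconds and 5 minutes).

    import org.apache.hadoop.conf.Configuration;

    public class HeartbeatSettings {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Data nodes send a heartbeat to the name node every 3 seconds by default.
            conf.setLong("dfs.heartbeat.interval", 3);
            // Interval (in ms) after which the name node re-checks for dead data nodes.
            conf.setInt("dfs.namenode.heartbeat.recheck-interval", 300000);
            System.out.println("heartbeat interval (s): " + conf.get("dfs.heartbeat.interval"));
        }
    }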

26. Is it necessary that the name node and job tracker be on the same host?

No! They can run on different hosts.

27. What is meant by a 'block' in HDFS?

A block in HDFS is the minimum quantum of data for reading or writing. The default block size in HDFS is 64 MB. If a file is 52 MB, HDFS stores it in a single block but consumes only 52 MB of disk; the remaining 12 MB of the block is left empty and ready to use.
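A quick worked example of the block arithmetic (plain Java, no HDFS API; the 64 MB default and the 52 MB file size come from the answer above):

    public class BlockMath {
        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024;      // 64 MB default HDFS block size
            long fileSize  = 52L * 1024 * 1024;      // 52 MB file from the example above

            long fullBlocks = fileSize / blockSize;               // 0 completely filled blocks
            long lastBlockBytes = fileSize % blockSize;           // 52 MB in the last block
            long totalBlocks = fullBlocks + (lastBlockBytes > 0 ? 1 : 0);

            System.out.println("blocks used: " + totalBlocks);            // 1
            System.out.println("bytes in last block: " + lastBlockBytes); // only 52 MB of disk is consumed
        }
    }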

28. Can blocks be broken down by HDFS if a machine does not have the capacity to copy as many blocks as the user wants?

Blocks in HDFS cannot be broken down. The master node calculates the required space and decides how the data would be transferred to a machine that has less space.

29. What is the process of indexing in HDFS?

Once data is stored in blocks, HDFS relies on the last part of the data to find out where the next part of the data is stored.

30. How is a data node identified as saturated?

When a data node is full and has no space left, the name node identifies it as saturated.
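To see how full each data node actually is, the per-node statistics exposed by the DistributedFileSystem API can be inspected. A minimal sketch follows; the namenode URI is a placeholder, and since there is no official 'saturated' flag the reporting here is purely illustrative.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    public class DataNodeUsage {
        public static void main(String[] args) throws Exception {
            DistributedFileSystem dfs = (DistributedFileSystem)
                    FileSystem.get(new URI("hdfs://namenode:8020"), new Configuration());
            for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                double usedPct = 100.0 * dn.getDfsUsed() / Math.max(1, dn.getCapacity());
                System.out.printf("%s: %.1f%% used, %d bytes remaining%n",
                        dn.getHostName(), usedPct, dn.getRemaining());
            }
            dfs.close();
        }
    }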

31. What type of data is processed by Hadoop?

Hadoop processes digital data only.

32. How does the name node determine which data node to write to?

The name node holds metadata about all the data nodes, and it uses that information to decide which data nodes should be used to store the data.
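That placement decision can be inspected from a client by asking the name node for a file's block locations. A small sketch using the standard FileSystem API; the namenode URI and file path are placeholders.

    import java.net.URI;
    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocations {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new URI("hdfs://namenode:8020"), new Configuration());
            FileStatus status = fs.getFileStatus(new Path("/tmp/example.txt")); // placeholder path
            // The name node answers from its metadata: which data nodes hold each block of the file.
            for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("block at offset " + loc.getOffset()
                        + " -> " + Arrays.toString(loc.getHosts()));
            }
            fs.close();
        }
    }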

33. Who is the 'user' in HDFS?

Anyone who tries to retrieve data from HDFS is a user. The client is not the end user but an application that uses the job tracker and task tracker to retrieve data.

34. How does the client communicate with the name node and data nodes in HDFS?

Clients communicate with the name node through Hadoop's RPC protocol over TCP and stream block data directly to and from the data nodes over TCP connections. (SSH is used only by the cluster start/stop scripts, not for client communication.)

35. What is a rack in HDFS?

A rack is the storage location where a group of data nodes is put together; in other words, it is a physical collection of data nodes housed in a single location.
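For context, Hadoop is told which rack each data node belongs to through its rack-awareness configuration. A minimal sketch using the standard topology property; the script path is a placeholder, and the script itself (which maps node addresses to rack ids such as /rack1) is assumed to exist on the cluster.

    import org.apache.hadoop.conf.Configuration;

    public class RackAwareness {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Standard topology property: a script that maps a data node address to a rack id.
            conf.set("net.topology.script.file.name", "/etc/hadoop/conf/topology.sh"); // placeholder path
            System.out.println(conf.get("net.topology.script.file.name"));
        }
    }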

36. What is Big Data?

Big data is defined as the voluminous amount of structured, semi-structured, or unstructured data that has huge potential for mining but is so large that it cannot be processed by traditional database systems. Big data is characterized by its high volume, velocity, and variety, which require cost-effective and innovative methods of information processing to draw meaningful business insights. More than the volume of the data, it is the nature of the data that determines whether it is considered Big Data or not.

37. What do the four V’s of Big Data denote?

IBM has a nice, simple explanation for the four critical features of big data:
a) Volume – Scale of data
b) Velocity – Analysis of streaming data
c) Variety – Different forms of data
d) Veracity – Uncertainty of data

38. How does big data analysis help businesses increase their revenue? Give an example.

Big data analysis helps businesses differentiate themselves. For example, Walmart, the world's largest retailer by revenue in 2014, uses big data analytics to increase its sales through better predictive analytics, customized recommendations, and new products launched on the basis of customer preferences and needs. Walmart observed a significant 10% to 15% increase in online sales, amounting to $1 billion in incremental revenue. Many more companies, such as Facebook, Twitter, LinkedIn, Pandora, JPMorgan Chase, and Bank of America, use big data analytics to boost their revenue.

39. Name some companies that use Hadoop.

Yahoo (one of the biggest users and a contributor of more than 80% of the Hadoop code)
Facebook
Netflix
Amazon
Adobe
eBay
Hulu
Spotify
Rubikloud
Twitter

40. Differentiate between structured and unstructured data.

Data that can be stored in traditional database systems in the form of rows and columns, for example online purchase transactions, is referred to as structured data. Data that can be stored only partially in traditional database systems, for example data in XML records, is referred to as semi-structured data. Unorganized and raw data that cannot be categorized as structured or semi-structured is referred to as unstructured data. Facebook updates, tweets on Twitter, reviews, web logs, etc. are all examples of unstructured data.
