By combining MapReduce with the HDFS module, Hadoop aims to provide reliable, scalable and distributed computing.
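The MapReduce model can be illustrated with a minimal word-count sketch. The Python below is only a single-process simulation of the programming model (map emits key–value pairs, a shuffle groups values by key, reduce aggregates each group); it is not Hadoop's actual Java API, and all function names here are illustrative.

```python
from collections import defaultdict

def map_phase(record):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word, 1) for word in record.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the values for one key (here, sum the counts).
    return key, sum(values)

def word_count(lines):
    # Run the three phases in sequence over an in-memory "input split".
    pairs = [kv for line in lines for kv in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In Hadoop itself the shuffle is performed by the framework across the cluster, so a user supplies only the map and reduce functions; that separation is what lets the same job scale from one node to many.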
Aside from the Hadoop framework itself, numerous other popular open-source Apache projects are related to Hadoop, including HBase, Hive, Mahout, Pig and ZooKeeper.
The open-source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services.
In this article, we present MapReduce-based applications that can be employed in next-generation sequencing and other biological domains.
Computing and sequencing capabilities have improved rapidly in recent years.
Given the urgency of establishing a new computational framework, high-performance computing has become extremely important for large-scale data analysis.

Apache Hadoop and Microsoft Dryad/Azure are widely deployed in local systems, not only for their parallel processing capability but also for their easy deployment on commodity hardware clusters. MPI is a widely used traditional parallel programming model. Although MPI provides a powerful API for general programmers, researchers with a biology background still find such programs complicated to write.

The GPU architecture, in which one GPU chip contains hundreds of cores with threads running on those cores, enables a GPU to execute multiple tasks simultaneously. According to the NVIDIA CUDA Programming Guide 4.0, the CUDA programming model is organized into three hierarchies, in which threads comprise a thread block and thread blocks comprise a grid.

Dryad is an extension of MapReduce from Microsoft, and Azure is one of Microsoft's cloud technologies. With Dryad, Microsoft proposes the use of a directed acyclic graph (DAG) that combines computational 'vertices' with communication 'channels' to model data-flow graphs.

As the number of nodes increases beyond a certain threshold, the network incurs large data-transfer costs; however, the strong extensibility of Hadoop mitigates this problem. Moreover, when a node fails in a Hadoop system, the framework relaunches the same job on another healthy node.
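The thread/block/grid hierarchy described above determines how each CUDA thread locates its own position in the overall index space. The following is a hedged, host-side Python illustration (not actual CUDA code) of the standard one-dimensional mapping `blockIdx.x * blockDim.x + threadIdx.x`; the function names are illustrative, not part of any real API.

```python
def global_index(block_idx, block_dim, thread_idx):
    # Standard CUDA 1-D mapping: each block of block_dim threads
    # covers one contiguous slice of the grid's global index space.
    return block_idx * block_dim + thread_idx

def launch_grid(grid_dim, block_dim):
    # Enumerate the global index of every thread in a 1-D grid,
    # mimicking the set of threads spawned by a kernel launch
    # with grid_dim blocks of block_dim threads each.
    return [global_index(b, block_dim, t)
            for b in range(grid_dim)
            for t in range(block_dim)]
```

For example, a launch of 2 blocks of 4 threads yields global indices 0 through 7, which is why CUDA kernels conventionally use this index to select which data element each thread processes.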