REAL-TIME BIG DATA ANALYTICAL ARCHITECTURE FOR REMOTE
SENSING APPLICATION
ABSTRACT:
In today’s era, there is a great deal more to real-time remote sensing Big Data than it seems at first, and extracting the useful information efficiently confronts a system with major computational challenges, such as analyzing, aggregating, and storing data that are collected remotely. In view of these factors, there is a need to design a system architecture that supports both real-time and offline data processing. In this paper, we propose a real-time Big Data analytical architecture for remote sensing satellite applications.
The proposed architecture comprises three main units:
1) Remote sensing Big Data acquisition unit (RSDU);
2) Data processing unit (DPU); and
3) Data analysis decision unit (DADU).
First, RSDU acquires data from the satellite and sends this data to the Base Station, where initial processing takes place. Second, DPU plays a vital role in the architecture for efficient processing of real-time Big Data by providing filtration, load balancing, and parallel processing. Third, DADU is the upper layer unit of the proposed architecture, which is responsible for compilation, storage of the results, and generation of decisions based on the results received from DPU.
INTRODUCTION:
EXISTING SYSTEM:
Existing methods are inapplicable on standard computers, where it is neither desirable nor possible to load the entire image into memory before doing any processing. In this situation, it is necessary to load only part of the image and process it before saving the result to disk and proceeding to the next part. This corresponds to the concept of on-the-flow processing. Remote sensing processing can be seen as a chain of events or steps, each of which is generally independent of the following ones and generally focuses on a particular domain. For example, the image can be radiometrically corrected to compensate for atmospheric effects, and indices computed, before an object extraction based on these indices takes place.
The typical processing chain will process the whole image for each step, returning the final result after everything is done. For some processing chains, iterations between the different steps are required to find the correct set of parameters. Due to the variability of satellite images and the variety of the tasks that need to be performed, fully automated tasks are rare. Humans are still an important part of the loop. These concepts are linked in the sense that both rely on the ability to process only one part of the data.
In the case of simple algorithms, this is
quite easy: the input is just split into different non-overlapping pieces that
are processed one by one. But most algorithms do consider the neighborhood of
each pixel. As a consequence, in most cases, the data will have to be split
into partially overlapping pieces. The objective is to obtain the same result
as the original algorithm as if the processing was done in one go. Depending on
the algorithm, this is unfortunately not always possible.
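The partially overlapping split can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular tool: the 3x3 mean filter, the function names `box_filter` and `filter_in_strips`, and the strip height are all chosen for the example. Each strip is read with one extra row (a "halo") on each side, filtered, and then trimmed back to its own rows, so that the stitched output matches processing the image in one go.

```python
def box_filter(img):
    """3x3 mean filter over a list-of-rows image, clamping at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    total += img[rr][cc]
            out[r][c] = total / 9.0
    return out


def filter_in_strips(img, strip_rows, halo=1):
    """Filter the image strip by strip, reading `halo` extra rows per side."""
    h = len(img)
    out = []
    for start in range(0, h, strip_rows):
        end = min(start + strip_rows, h)
        lo = max(start - halo, 0)   # extend upward, but not past the image
        hi = min(end + halo, h)     # extend downward, but not past the image
        piece = box_filter(img[lo:hi])
        # Keep only the rows that belong to this strip; drop the halo rows.
        out.extend(piece[start - lo:start - lo + (end - start)])
    return out


image = [[(r * 7 + c * 3) % 11 for c in range(6)] for r in range(10)]
same_as_whole = filter_in_strips(image, strip_rows=3) == box_filter(image)
```

With `halo=0` the strips no longer overlap and the rows at strip boundaries come out different, which is the failure mode the overlap is meant to avoid. A filter with a larger neighborhood would need a correspondingly larger halo.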
DISADVANTAGES:
PROPOSED SYSTEM:
We present a remote sensing Big Data analytical architecture, which is used to analyze real-time as well as offline data. First, the data are remotely preprocessed so that they are readable by machines. Afterward, this useful information is transmitted to the Earth Base Station for further data processing. The Earth Base Station performs two types of processing: processing of real-time data and processing of offline data. Offline data are transmitted to an offline data-storage device, which allows later use of the data. Real-time data are transmitted directly to the filtration and load balancer server, where a filtration algorithm extracts the useful information from the Big Data.
The load balancer, in turn, balances the processing power by distributing the real-time data equally among the servers. The filtration and load-balancing server thus not only filters the data and balances the load but also enhances system efficiency. The filtered data are then processed by the parallel servers and sent to the data aggregation unit (if required, the processed data can be stored in the result storage device) for comparison by the decision and analyzing server. The proposed architecture accepts remote access sensory data as well as direct access network data (e.g., GPRS, 3G, xDSL, or WAN). The proposed architecture and algorithms are implemented and applied to remote sensing earth observatory data.
The proposed architecture has the capability of dividing, load balancing, and parallel processing of only useful data. Thus, it analyzes real-time remote sensing Big Data from an earth observatory system efficiently. Furthermore, the proposed architecture can store incoming raw data in order to perform offline analysis on large stored dumps when required. Finally, a detailed analysis of remotely sensed earth observatory Big Data for land and sea area is provided using .NET. In addition, various algorithms are proposed for each level of RSDU, DPU, and DADU to detect land as well as sea area and to illustrate the working of the architecture.
ADVANTAGES:
The proposed architecture processes high-speed, large volumes of real-time remote sensing image data. The algorithms work on both DPU and DADU, taking satellite data as input.
To evaluate our architecture on offline as well as online traffic, we perform a simple analysis on remote sensing earth observatory data. We assume that the data are big in nature and difficult for a single server to handle.
The data arrive continuously from a satellite at high speed. Hence, special algorithms are needed to process, analyze, and make decisions from that Big Data. In this analysis, we examine remote sensing data to find land, sea, or ice area.
We have used the proposed architecture to perform this analysis and have proposed an algorithm for handling, processing, analyzing, and decision-making for remote sensing Big Data images.
HARDWARE REQUIREMENT:
ARCHITECTURE DIAGRAM
MODULES:
DATA ANALYSIS DECISION UNIT (DADU):
DATA PROCESSING UNIT (DPU):
REMOTE SENSING APPLICATION RSDU:
FINDINGS AND DISCUSSION:
ALGORITHM DESIGN AND TESTING:
MODULES DESCRIPTION:
DATA PROCESSING UNIT (DPU):
In the data processing unit (DPU), the filtration and load balancer server has two basic responsibilities: filtration of data and load balancing of processing power. Filtration identifies the data that are useful for analysis and lets only that information through, whereas the rest of the data are blocked and discarded. This enhances the performance of the whole proposed system. The load-balancing part of the server, in turn, divides the filtered data into parts and assigns them to the various processing servers. The filtration and load-balancing algorithm varies from analysis to analysis; e.g., if only sea wave and temperature data need to be analyzed, the measurements of these data are filtered out of the product and segmented into parts.
Each processing server has its own algorithm implementation for processing the incoming segments of data from the filtration and load balancer server (FLBS). Each processing server makes statistical calculations, takes measurements, and performs other mathematical or logical tasks to generate intermediate results for each segment of data. Since these servers perform their tasks independently and in parallel, the performance of the proposed system is dramatically enhanced, and the results for each segment are generated in real time. The results generated by each server are then sent to the aggregation server for compilation, organization, and storage for further processing.
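The division of work among processing servers can be sketched as follows. This is an illustrative sketch only: the round-robin policy, the thread pool standing in for separate servers, the per-segment statistics, and the names `assign_round_robin`, `process_queue`, and `run_dpu` are assumptions, since the source does not fix a particular load-balancing policy.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def assign_round_robin(segments, n_servers):
    """Spread segments evenly: segment i goes to server i % n_servers."""
    queues = [[] for _ in range(n_servers)]
    for i, segment in enumerate(segments):
        queues[i % n_servers].append((i, segment))
    return queues


def process_queue(queue):
    """Stand-in for one processing server: one intermediate result per segment.

    Segments are assumed to be non-empty lists of numbers.
    """
    return [(i, {"n": len(seg), "mean": mean(seg)}) for i, seg in queue]


def run_dpu(segments, n_servers=3):
    """Process all segments on n_servers workers and return results in order."""
    queues = assign_round_robin(segments, n_servers)
    with ThreadPoolExecutor(max_workers=n_servers) as pool:
        per_server = list(pool.map(process_queue, queues))
    # Flatten and restore segment order before handing off to aggregation.
    results = [item for batch in per_server for item in batch]
    return sorted(results, key=lambda item: item[0])


intermediate = run_dpu([[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]], n_servers=2)
```

Each worker touches only its own queue, which mirrors the independence of the processing servers. In a real deployment the workers would be separate machines rather than threads in one process.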
DATA ANALYSIS DECISION UNIT (DADU):
DADU contains three major portions: the aggregation and compilation server, the results storage server(s), and the decision-making server. When the results are ready for compilation, the processing servers in DPU send the partial results to the aggregation and compilation server, since the partial results are not yet in an organized and compiled form. There is therefore a need to aggregate the related results and organize them into a proper form for further processing and storage. In the proposed architecture, the aggregation and compilation server is supported by various algorithms that compile, organize, store, and transmit the results. Again, the algorithm varies from requirement to requirement and depends on the analysis needs. The aggregation server stores the compiled and organized results in the results storage so that any server can use and process them at any time.
The aggregation server also sends a copy of each result to the decision-making server, which processes it to make decisions. The decision-making server is supported by decision algorithms, which query the result for different things and then make various decisions (e.g., in our analysis, we identify land, sea, and ice, whereas other findings such as fire, storms, tsunamis, and earthquakes can also be detected). The decision algorithm must be strong and correct enough to efficiently produce results that uncover hidden things and support decisions. The decision part of the architecture is significant, since any small error in decision-making can degrade the efficiency of the whole analysis. Finally, DADU displays or broadcasts the decisions so that any application can use them in real time for its own purposes. The applications can be any business software, general-purpose community software, or other social networks that need those findings.
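The broadcast step at the end of DADU can be sketched as a small publish/subscribe registry. This is a hypothetical illustration: the class `DecisionBroadcaster`, its methods, and the dictionary shape of a decision are assumptions, as the source does not specify how decisions are delivered to applications.

```python
class DecisionBroadcaster:
    """Hypothetical stand-in for the output stage of DADU."""

    def __init__(self):
        self._subscribers = []
        self.history = []  # decisions already broadcast, oldest first

    def subscribe(self, callback):
        """Register an application callback that receives each decision."""
        self._subscribers.append(callback)

    def publish(self, decision):
        """Record the decision and deliver it to every subscriber."""
        self.history.append(decision)
        for callback in self._subscribers:
            callback(decision)


broadcaster = DecisionBroadcaster()
received_by_app = []
broadcaster.subscribe(received_by_app.append)
broadcaster.publish({"block": 0, "label": "sea"})
```

Any number of applications can subscribe, and none of them needs to know how the decision was produced.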
REMOTE SENSING APPLICATION RSDU:
Remote sensing promotes the expansion of the earth observatory system as a cost-effective parallel data acquisition system to satisfy specific computational requirements. The Earth and Space Science Society originally approved this solution as the standard for parallel processing in this area. As the requirements for improved Big Data acquisition grew, it was soon recognized that traditional data processing technologies could not provide sufficient power for processing such data. Parallel processing of the massive volume of data was therefore needed to analyze the Big Data efficiently. For that reason, the proposed RSDU is introduced in the remote sensing Big Data architecture to gather data from various satellites around the globe. It is possible that the received raw data are distorted by scattering and absorption by various atmospheric gases and dust particles. We assume that the satellite can correct the erroneous data.
However, the raw data must first be converted into image format by the remote sensing satellite. For effective data analysis, the remote sensing satellite preprocesses data under many situations to integrate the data from different sources, which not only decreases storage cost but also improves analysis accuracy. The data must be corrected using different methods to remove distortions caused by the motion of the platform relative to the earth, platform attitude, earth curvature, nonuniformity of illumination, variations in sensor characteristics, etc. The data are then transmitted to the Earth Base Station for further processing over a direct communication link. We divide the data processing procedure into two steps: real-time Big Data processing and offline Big Data processing. In offline data processing, the Earth Base Station transmits the data to the data center for storage, and the data are then used for future analyses. In real-time data processing, however, the data are transmitted directly to the filtration and load balancer server (FLBS), since storing incoming real-time data would degrade the performance of real-time processing.
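The two paths through the Earth Base Station can be sketched as follows. This is a minimal sketch under stated assumptions: the class `EarthBaseStation`, the `real_time` flag, and the in-memory list and queue that stand in for the data center and the link to the FLBS are all hypothetical.

```python
from collections import deque


class EarthBaseStation:
    """Routes each incoming product to offline storage or to the FLBS."""

    def __init__(self):
        self.offline_store = []    # stands in for the data center
        self.flbs_queue = deque()  # stands in for the link to the FLBS

    def receive(self, product, real_time):
        if real_time:
            # Real-time data bypass storage so that processing is not delayed.
            self.flbs_queue.append(product)
        else:
            # Offline data are kept for future analyses.
            self.offline_store.append(product)


station = EarthBaseStation()
station.receive("product-A", real_time=True)
station.receive("product-B", real_time=False)
```

The point of the split is that the real-time path never waits on a storage write.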
FINDINGS AND DISCUSSION:
Preprocessed and formatted data from the satellite contain all or some of the following parts, depending on the product.
1) Main product header (MPH): It includes the product's basic information, i.e., id, measurement and sensing time, orbit information, etc.
2) Specific product header (SPH): It contains information specific to each product or product group, i.e., number of data set descriptors (DSD), directory of remaining data sets in the file, etc.
3) Annotation data sets (ADS): It contains information on quality, time-tagged processing parameters, geolocation tie points, solar angles, etc.
4) Global annotation data sets (GADS): It contains scaling factors, offsets, calibration information, etc.
5) Measurement data set (MDS): It contains measurements or geophysical parameters calculated from the measurements, including the quality flag and the time tag of each measurement. The image data are also stored in this part and are the main element of our analysis.
The MPH and SPH data are in ASCII format, whereas all the other data sets are in binary format. MDS, ADS, and GADS consist of a sequence of records, with one or more fields of data in each record. In our case, the MDS contains a number of records, and each record contains a number of fields. Each record of the MDS corresponds to one row of the satellite image, which is our main focus during analysis.
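The product layout described above can be sketched as a small data structure. This is an illustration only: the `KEY=VALUE` header layout, the big-endian unsigned 16-bit pixel encoding, and the names `Product`, `parse_ascii_header`, and `decode_mds_record` are assumptions made for the example and are not taken from a specific product specification.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class Product:
    mph: dict                                 # main product header (ASCII)
    sph: dict                                 # specific product header (ASCII)
    ads: list = field(default_factory=list)   # annotation records (binary)
    gads: list = field(default_factory=list)  # global annotation records (binary)
    mds: list = field(default_factory=list)   # one binary record per image row


def parse_ascii_header(text):
    """Parse assumed 'KEY=VALUE' lines into a dictionary of strings."""
    header = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            header[key.strip()] = value.strip().strip('"')
    return header


def decode_mds_record(record):
    """Decode one MDS record into a row of pixel values.

    Assumes each field is a big-endian unsigned 16-bit integer.
    """
    count = len(record) // 2
    return list(struct.unpack(f">{count}H", record[:count * 2]))


product = Product(
    mph=parse_ascii_header('PRODUCT="EXAMPLE"\nORBIT=42\n'),
    sph={},
    mds=[struct.pack(">3H", 10, 20, 30), struct.pack(">3H", 40, 50, 60)],
)
image = [decode_mds_record(rec) for rec in product.mds]  # one list per image row
```

Because each MDS record is one image row, decoding the records in order yields the image directly.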
ALGORITHM DESIGN AND TESTING:
Our algorithms are designed to process high-speed, large volumes of real-time remote sensing image data using the proposed architecture. They work on both DPU and DADU, taking satellite data as input, to identify land and sea area in the data set. The set contains four simple algorithms, algorithm I, algorithm II, algorithm III, and algorithm IV, which run on the filtration and load balancer, the processing servers, the aggregation server, and the decision-making server, respectively. Algorithm I, the filtration and load balancer algorithm (FLBA), runs on the filtration and load balancer and keeps only the required data, discarding all other information. It also provides load balancing by dividing the data into fixed-size blocks and sending them to the processing servers, i.e., one or more distinct blocks to each server. This filtration, division, and load balancing speeds up processing by ignoring unnecessary data and by enabling parallel processing. Algorithm II, the processing and calculation algorithm (PCA), processes the filtered data and is implemented on each processing server. It calculates the various parameters that are used in the decision-making process. The calculated parameters are then sent to the aggregation server for further processing. Algorithm III, the aggregation and compilation algorithm (ACA), stores, compiles, and organizes the results so that the decision-making server can use them for land and sea area detection. Algorithm IV, the decision-making algorithm (DMA), identifies land area and sea area by comparing the parameter results from the aggregation server with threshold values.
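The threshold comparison performed by DMA can be sketched as follows. This is a hypothetical sketch: the parameter names, the threshold values, and the direction of each comparison are assumptions chosen for illustration, since the source states only that parameter results are compared with thresholds.

```python
def decide(block_stats, mean_threshold=100.0, sd_threshold=20.0):
    """Label one block as 'land' or 'sea' from its aggregated parameters.

    The rule (high mean and high spread means land) is an assumption.
    """
    if block_stats["mean"] >= mean_threshold and block_stats["sd"] >= sd_threshold:
        return "land"
    return "sea"


def decide_all(aggregated):
    """Map each block id to a label, given {block_id: stats} from aggregation."""
    return {block_id: decide(stats) for block_id, stats in aggregated.items()}


labels = decide_all({
    0: {"mean": 150.0, "sd": 30.0},
    1: {"mean": 40.0, "sd": 5.0},
})
```

In practice the thresholds would be tuned on labeled products, and more parameters could enter the rule.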
IMPLEMENTATION:
Big Data, like cloud computing, covers diverse technologies. The input of Big Data comes from social networks (Facebook, Twitter, LinkedIn, etc.), Web servers, satellite imagery, sensory data, banking transactions, etc. Despite the very recent emergence of Big Data architecture in scientific applications, numerous efforts toward Big Data analytics architecture can already be found in the literature. Alongside these, we propose a remote sensing Big Data architecture to analyze Big Data in an efficient manner, as shown in Fig. 1. Fig. 1 depicts n satellites that obtain earth observatory Big Data images with sensors or conventional cameras, through which scenes are recorded using radiation. Special techniques are applied to process and interpret remote sensing imagery for the purpose of producing conventional maps, thematic maps, resource surveys, etc. We have divided the remote sensing Big Data architecture into three units: RSDU, DPU, and DADU.
In healthcare scenarios, medical practitioners gather massive volumes of data about patients, medical history, medications, and other details. These data are accumulated in drug-manufacturing companies. The nature of these data is very complex, and sometimes practitioners are unable to relate them to other information, which results in important information being missed. Employing advanced analytic techniques to organize and extract useful information from Big Data results in personalized medication, and such techniques also give insight into the hereditary causes of disease.
ALGORITHMS:
The filtration and load balancer algorithm (FLBA) takes the satellite data or product, filters it, divides it into segments, and performs load balancing.
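The filter-and-divide step can be sketched as follows. This is a minimal sketch under stated assumptions: the product is modeled as a dictionary keyed by part name, filtration keeps only the rows of the measurement data set, and the function name `filter_and_divide` and the block size are hypothetical.

```python
def filter_and_divide(product, block_rows):
    """Keep only the MDS rows and cut them into fixed-size blocks B_0, B_1, ...

    Headers and annotation data sets are dropped; the last block may be shorter.
    """
    rows = product["MDS"]
    return [rows[i:i + block_rows] for i in range(0, len(rows), block_rows)]


example_product = {
    "MPH": "header text",
    "ADS": ["annotation"],
    "MDS": [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]],
}
blocks = filter_and_divide(example_product, block_rows=2)
```

Each block can then be handed to a different processing server, one or more blocks per server.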
The processing and calculation algorithm (PCA) calculates results for different parameters for each incoming block and sends them to the next level. In step 1, it calculates the mean, the standard deviation (SD), the absolute difference, and the number of values that are greater than the maximum threshold. In the next step, the results are transmitted to the aggregation server.
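The per-block calculation can be sketched as follows. This is an illustration under stated assumptions: the source does not define "absolute difference", so it is taken here to be the mean absolute difference between horizontally adjacent values, the SD is the population standard deviation, and the function name `processing_and_calculation` is hypothetical.

```python
from statistics import mean, pstdev


def processing_and_calculation(block, max_threshold):
    """Compute the step-1 parameters for one block (a list of pixel rows)."""
    values = [v for row in block for v in row]
    # Assumed definition: differences between neighbors within each row.
    abs_diffs = [abs(b - a) for row in block for a, b in zip(row, row[1:])]
    return {
        "mean": mean(values),
        "sd": pstdev(values),
        "abs_diff": mean(abs_diffs) if abs_diffs else 0.0,
        "count_above": sum(1 for v in values if v > max_threshold),
    }


stats = processing_and_calculation([[1, 3], [5, 7]], max_threshold=4)
```

The resulting dictionary is the intermediate result that a processing server would send on to the aggregation server.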
ACA collects the results from each processing server for each block Bi and then combines, organizes, and stores these results in an RDBMS database.
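Storing the per-block results in a relational database can be sketched as follows. This is a sketch only: an in-memory SQLite database stands in for the RDBMS, and the table name, column names, and function names are assumptions made for the example.

```python
import sqlite3


def open_result_store():
    """Create an in-memory result store with one row per block."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE block_results ("
        "block_id INTEGER PRIMARY KEY, mean REAL, sd REAL, count_above INTEGER)"
    )
    return conn


def aggregate(conn, partial_results):
    """Store partial results; a repeated block_id replaces the earlier row."""
    rows = [
        (r["block_id"], r["mean"], r["sd"], r["count_above"])
        for r in partial_results
    ]
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO block_results VALUES (?, ?, ?, ?)", rows
        )


def compiled_results(conn):
    """Return all results organized by block id."""
    cursor = conn.execute(
        "SELECT block_id, mean, sd, count_above "
        "FROM block_results ORDER BY block_id"
    )
    return cursor.fetchall()


store = open_result_store()
aggregate(store, [
    {"block_id": 2, "mean": 5.0, "sd": 1.0, "count_above": 0},
    {"block_id": 1, "mean": 9.5, "sd": 2.0, "count_above": 3},
])
ordered = compiled_results(store)
```

Results may arrive from the processing servers in any order; the query returns them sorted by block, ready for the decision-making server.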
CONCLUSION AND FUTURE:
In this paper, we proposed an architecture for real-time Big Data analysis for remote sensing applications. The architecture efficiently processes and analyzes real-time and offline remote sensing Big Data for decision-making. The proposed architecture is composed of three major units: 1) RSDU; 2) DPU; and 3) DADU. These units implement algorithms for each level of the architecture depending on the required analysis. The real-time Big Data architecture is generic (application independent) and can be used for any type of remote sensing Big Data analysis. Furthermore, it filters, divides, and processes in parallel only the useful information, discarding all other data. These capabilities make it a better choice for real-time remote sensing Big Data analysis.
The algorithms proposed in this paper for each unit and subunit are used to analyze remote sensing data sets, which helps in better understanding of land and sea area. The proposed architecture allows researchers and organizations to perform any type of remote sensory Big Data analysis by developing algorithms for each level of the architecture according to their analysis requirements. For future work, we plan to extend the proposed architecture to make it compatible with Big Data analysis for all applications, e.g., sensors and social networking. We also plan to use the proposed architecture to perform complex analysis on earth observatory data for decision-making in real time, such as earthquake prediction, tsunami prediction, and fire detection.