Handbook of Big Data Analytics : Volume 2: Applications in ICT, security and business analytics / edited by Vadlamani Ravi and Aswani Kumar Cherukuri
Material type: Book
- ISBN: 9781839530593
- Call number: 005.7 RAV
| Item type | Current library | Call number | Status | Notes | Date due | Barcode |
|---|---|---|---|---|---|---|
| Reference Book | VIT-AP General Stacks | 005.7 RAV | Not for loan | CSE | | 021084 |
About the Book:
Big Data analytics is the complex process of examining big data to uncover information such as correlations, hidden patterns, trends, and user and customer preferences, allowing organizations and businesses to make more informed decisions. These methods and technologies have become ubiquitous in all fields of science, engineering, business and management due to the rise of data-driven models as well as data engineering developments using parallel and distributed computational frameworks, data and algorithm parallelization, and GPGPU programming. However, potential issues remain to be addressed to enable big data processing and analytics in real time. In the first volume of this comprehensive two-volume handbook, the authors present several methodologies to support Big Data analytics, including database management, processing frameworks and architectures, data lakes, query optimization strategies, real-time data processing, data stream analytics, Fog and Edge computing, and Artificial Intelligence for Big Data. The second volume is dedicated to a wide range of applications in secure data storage, privacy preservation, software-defined networking (SDN), the Internet of Things (IoT), behaviour analytics, traffic prediction, gender-based classification on e-commerce data, recommender systems, Big Data regression with Apache Spark, visual sentiment analysis, wavelet neural networks via GPU, stock market movement prediction, and financial reporting. The two-volume work aims to provide a unique platform for researchers, engineers, developers, educators and advanced students in the field of Big Data analytics.
The front matter includes sections about the editors and the contributors, a foreword, a preface and acknowledgements; an index is provided at the end.
Table of Contents:
Front Matter
p. (1)
1 Big data analytics for security intelligence
p. 1–19 (19)
There is a tremendous increase in the frequency of cyberattacks due to the rapid growth of the Internet. Many of these attacks can be prevented by well-known cybersecurity solutions; however, many traditional solutions are becoming obsolete because of the impact of big data on networks. Hence, corporate research has shifted its focus to security analytics. The role of security analytics is to detect malicious and normal events in real time by assisting network managers in the investigation of real-time network streams. This technique is intended to enhance all traditional security approaches. Various challenges have to be addressed to realize the potential of big data for information security. This chapter focuses on the major information security problems that can be solved by big data applications and outlines research directions for security intelligence through security analytics. The chapter presents a system called Seabed, which facilitates efficient analytics on huge encrypted datasets. In addition, we discuss a lightweight, scalable anomaly detection system (ADS). The identified anomalies help provide better cybersecurity by revealing network behaviour, identifying attacks and protecting critical infrastructures.
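The chapter's own ADS design is not reproduced here; as a rough, hypothetical sketch of what a lightweight anomaly detector over network-flow features can look like, the following flags flows whose per-feature z-scores exceed a cutoff (all names, features and the threshold value are illustrative, not from the chapter):

```python
import numpy as np

def detect_anomalies(flows: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Flag rows of an (n_flows, n_features) matrix whose z-score exceeds
    `threshold` on any feature. A stand-in for the kind of lightweight ADS
    the chapter describes, not its actual algorithm."""
    mu = flows.mean(axis=0)
    sigma = flows.std(axis=0) + 1e-9           # avoid division by zero
    z = np.abs((flows - mu) / sigma)
    return (z > threshold).any(axis=1)         # True -> anomalous flow

# Example: 1,000 synthetic flows with 4 features (bytes, packets, ...)
rng = np.random.default_rng(0)
flows = rng.normal(size=(1000, 4))
flows[42] += 10                                # inject one outlier
print(np.flatnonzero(detect_anomalies(flows, threshold=4.0)))
# flags flow 42 (a few chance outliers may also cross the cutoff)
```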
2 Zero attraction data selective adaptive filtering algorithm for big data applications
p. 21–35 (15)
Big data sets are characterized by raw data with large dimension, noise accumulation, spurious correlation and heavy-tailed behaviour, along with measurement outliers. Therefore, a unified approach is essential to preprocess the raw data in order to improve prediction accuracy. In this chapter, we present a data preprocessing framework based on an adaptive filtering algorithm with data selection capability, together with removal of outliers and noise. The l1-norm minimization, along with a new update rule based on higher-order statistics of the error, enables dimension reduction without any sacrifice in accuracy. Thus, the proposed zero attraction data selective least mean square (ZA-DS-LMS) algorithm reduces computational cost while improving convergence speed and steady-state mean square error (MSE) without compromising accuracy. Simulations were performed on real and simulated big data to validate the performance improvement of the proposed algorithm in the context of preprocessing for big data analysis.
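A minimal single-run sketch of the idea behind ZA-DS-LMS, assuming simple fixed error bounds in place of the chapter's higher-order-statistics update rule (`tau_min` and `tau_max` are illustrative stand-ins):

```python
import numpy as np

def za_ds_lms(X, d, mu=0.01, rho=1e-4, tau_min=0.1, tau_max=4.0):
    """Zero-attraction, data-selective LMS sketch.
    X: (n_samples, n_taps) regressors; d: desired signal.
    Updates only when the error is informative (|e| > tau_min) but not an
    outlier (|e| < tau_max); the rho*sign(w) term is the l1 zero attractor
    that promotes sparse weights."""
    w = np.zeros(X.shape[1])
    for x, dn in zip(X, d):
        e = dn - w @ x
        if tau_min < abs(e) < tau_max:         # data-selective update
            w += mu * e * x - rho * np.sign(w)
    return w
```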
3 Secure routing in software defined networking and Internet of Things for big data
p. 37–72 (36)
The Internet of Things (IoT) witnesses a rapid increase in the number of devices being networked together, along with a surge in big data. IoT poses various challenges, such as quality of service (QoS) for differentiated IoT tasks, power-constrained devices, time-critical applications and a heterogeneous wireless environment. The big data generated by billions of IoT devices needs different levels of service, and providing big data services based on user requirements is one of the complex tasks in the network. Software-defined networking (SDN) facilitates interoperability among IoT devices and QoS for the differentiated IoT tasks. SDN introduces programmability into the network by decoupling the control plane from the data plane. Moving the control logic to a centralized SDN controller drastically reduces the power that IoT devices spend on network control; SDN also considerably reduces hardware investment. In this chapter, we discuss the feasibility of using SDN for the big data generated by IoT devices. We discuss the architecture of IoT, the relationship between IoT and big data, the arrival of SDN in IoT and big data, routing mechanisms, security aspects of SDN and IoT routing, and the application of SDN to IoT.
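As a toy illustration of QoS- and security-aware route selection of the kind a centralized SDN controller can perform (the topology, trust scores and cost formula are all hypothetical, not from the chapter; `networkx` is assumed to be available):

```python
import networkx as nx

# Hypothetical IoT topology: edge weights combine delay and a trust score,
# so the selected route reflects both QoS and security.
G = nx.Graph()
G.add_edge("sensor", "gw1", delay=5, trust=0.9)
G.add_edge("sensor", "gw2", delay=2, trust=0.4)
G.add_edge("gw1", "cloud", delay=3, trust=0.95)
G.add_edge("gw2", "cloud", delay=3, trust=0.5)

for u, v, attrs in G.edges(data=True):
    # Lower cost = faster and more trusted; (1 - trust) penalizes risky links.
    attrs["cost"] = attrs["delay"] + 10 * (1 - attrs["trust"])

print(nx.shortest_path(G, "sensor", "cloud", weight="cost"))
# -> ['sensor', 'gw1', 'cloud']: the slower but more trusted path wins
```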
4 Efficient ciphertext-policy attribute-based signcryption for secure big data storage in cloud
p. 73–101 (29)
This chapter proposes an efficient ciphertext-policy attribute-based signcryption (ECP-ABSC) scheme for big data storage in the cloud. The ECP-ABSC scheme reduces the number of exponentiation operations required during signcryption and outsources the costly pairing computations of the designcryption process, which in turn reduces the computation overhead for the data owner and the user. Our scheme also provides flexible access control by granting data access rights either an unlimited or a fixed number of times, depending on the user. This flexible access control feature increases its applicability in commercial settings. Further, the security analysis proves that our scheme satisfies the desired security requirements, including data confidentiality, signcryptor privacy and unforgeability. The feasibility and practicality of our scheme are demonstrated in the performance evaluation.
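The actual ECP-ABSC construction relies on bilinear pairings and is not reproduced here; the following toy stand-in only illustrates the signcrypt/designcrypt workflow and the attribute-policy gate, with a demo XOR cipher that must not be mistaken for the scheme itself:

```python
import hashlib, hmac, os

# Toy stand-in for the ECP-ABSC workflow, NOT the pairing-based scheme: a
# single master secret derives a data key that is usable only when the
# user's attributes satisfy the ciphertext policy.
MASTER_KEY = os.urandom(32)

def signcrypt(msg: bytes, policy: frozenset) -> dict:
    key = hmac.new(MASTER_KEY, repr(sorted(policy)).encode(),
                   hashlib.sha256).digest()
    body = bytes(m ^ k for m, k in zip(msg, key))     # demo cipher only
    return {"policy": policy, "body": body,
            "tag": hmac.new(key, body, hashlib.sha256).hexdigest()}

def designcrypt(ct: dict, user_attrs: set) -> bytes:
    if not ct["policy"] <= user_attrs:                # attribute-policy check
        raise PermissionError("attributes do not satisfy policy")
    key = hmac.new(MASTER_KEY, repr(sorted(ct["policy"])).encode(),
                   hashlib.sha256).digest()
    assert hmac.new(key, ct["body"], hashlib.sha256).hexdigest() == ct["tag"]
    return bytes(c ^ k for c, k in zip(ct["body"], key))

ct = signcrypt(b"patient record", frozenset({"doctor", "cardiology"}))
print(designcrypt(ct, {"doctor", "cardiology", "staff"}))  # b'patient record'
```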
5 Privacy-preserving techniques in big data
p. 103–126 (24)
Big data is a collection of massive volumes of data from various sources such as social networks, the Internet of Things (IoT) and business applications. As multiple parties are involved in these systems, there is an increased chance of privacy breaches in the big data domain. Preservation of privacy plays a crucial role in keeping sensitive information from being visible to others. This chapter gives an overview of big data, the privacy challenges associated with different phases of the big data life cycle, and various privacy-preserving techniques in big data. It also outlines privacy-preserving solutions for resource-constrained devices.
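One classic primitive from this space, shown as a minimal sketch: the Laplace mechanism of differential privacy, which releases a query answer with noise calibrated to the query's sensitivity and the privacy budget epsilon (the numbers below are illustrative):

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float) -> float:
    """Epsilon-differentially private release: add Laplace noise with
    scale = sensitivity / epsilon to the true query answer."""
    scale = sensitivity / epsilon
    return true_value + np.random.default_rng().laplace(0.0, scale)

# Releasing a count query (sensitivity 1) under a privacy budget of 0.5
print(laplace_mechanism(true_value=1234, sensitivity=1.0, epsilon=0.5))
```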
6 Big data and behaviour analytics
p. 127–143 (17)
This chapter gives a deep insight into the field of big data and behaviour analytics, which asks questions such as 'How are customers using the product, if at all?' To answer such questions, we discuss a conceptual description of big data, the importance of big data, and behaviour analytics in the near future. Further, the tools required for processing and analysing big data (that is, for digitally tracking every user's actions and users' experiences with products) are discussed. Next, behaviour analysis is described in detail along with applications such as watching movies on Netflix (or any video programme on the web) and game playing. Moments such as streaming, sharing, stealing and pausing (reflecting users' needs and interests) are covered and analysed in these examples by a deep learning tool. The chapter also discusses various algorithms and techniques used for big data and behaviour analytics, with different examples. Finally, it covers future trends and the need to design novel techniques for big data and behaviour analytics using machine learning and deep learning (as future research directions and current research gaps) so that the analysis can be done efficiently in terms of computation.
7 Analyzing events for traffic prediction on IoT data streams in a smart city scenario
p. 145–167 (23)
We propose a framework for complex event processing (CEP) coupled with predictive analytics to predict simple and complex events on Internet of Things (IoT) data streams. The data is consumed through a REST service containing the traffic data of around 2,000 locations in the city of Madrid, Spain. Predicting complex events helps users understand the future state of road traffic and hence take meaningful decisions. For predicting events, we propose a framework that uses WSO2 Siddhi as the CEP engine, along with InfluxDB as persistent storage. The data is consumed by the CEP with the help of a high-speed Apache Kafka messaging pipeline and is used to build predictive models inside the CEP that help users derive meaningful insights. In such event analytics engines, events are created via rules that are triggered when the streaming data exceeds a certain threshold. The calculation of this threshold is of utmost importance, as it governs the generation of simple and complex events in an event analytics scenario. We propose a novel twofold approach, based on unsupervised learning, for finding thresholds in such large datasets. The first phase uses Node-RED and serverless computing to create the thresholds and supply them back to the CEP for prediction: the machine learning models run on a cloud service, and the predictions or thresholds are returned through REST services to the CEP. The second phase not only creates the thresholds but also uses novel hypothesis-testing techniques, along with a windowing mechanism on data streams, to implement clustering and supply the results back to the CEP. This approach leverages statistical techniques to understand changes in the distribution of the data; such changes trigger retraining of the machine learning models, whose results are fed back to the CEP for use in event generation. We also include a statistical analysis of the dataset used.
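A hedged sketch of the threshold-derivation idea: cluster a sliding window of readings and place the event threshold between the cluster centres (the data, k and the boundary rule are assumptions; the chapter's pipeline additionally involves Siddhi, Kafka, Node-RED and hypothesis testing):

```python
import numpy as np
from sklearn.cluster import KMeans

def derive_threshold(window: np.ndarray, k: int = 2) -> float:
    """Cluster one window of traffic readings and place the event
    threshold midway between the top two cluster centres; a CEP rule
    would then fire whenever a reading crosses this value."""
    km = KMeans(n_clusters=k, n_init=10).fit(window.reshape(-1, 1))
    centres = np.sort(km.cluster_centers_.ravel())
    return float((centres[-2] + centres[-1]) / 2)

# Example: one sliding window of vehicle-count readings
rng = np.random.default_rng(0)
window = np.concatenate([rng.normal(40, 5, 300), rng.normal(120, 10, 50)])
print(derive_threshold(window))   # ~80: readings above this raise an event
```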
8 Gender-based classification on e-commerce big data
p. 169–196 (28)
Existing classification techniques over e-commerce data are mainly based on users' purchasing patterns. However, gender preferences can significantly improve the recommendation of products, the targeting of customers for product branding, the provision of customized suggestions to users, and so on. We explain three methods for gender-based classification. All the methods are two-phased: features are extracted in the first phase, and gender is classified in the second phase based on the features identified in the first. The first technique exploits the hierarchical relationships among products and purchasing patterns: the first phase reduces the dimensionality of the data by identifying features that describe users' browsing patterns well, and the second phase uses these features to classify gender. The second technique extracts both basic and advanced features and uses a random forest to classify the data based on the identified features. The third approach extracts behavioural and temporal features along with product features, and classification is done using gradient-boosted trees. Experiments were also conducted against state-of-the-art classification algorithms.
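A minimal sketch of the two-phase pattern shared by the three methods, here with gradient-boosted trees as in the third approach; the feature set and the synthetic labels are illustrative only, not the chapter's:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Phase 1 (assumed features): behavioural/temporal signals per session.
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([
    rng.integers(1, 50, n),        # products viewed per session
    rng.uniform(0, 24, n),         # hour of day of the session
    rng.integers(0, 5, n),         # category switches (browsing pattern)
])
y = rng.integers(0, 2, n)          # 0/1 gender label (synthetic)

# Phase 2: gradient-boosted trees on the extracted features.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print(f"accuracy: {clf.score(X_te, y_te):.2f}")   # ~0.5 on random labels
```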
9 On recommender systems with big data
p. 197–228 (32)
This chapter introduces a taxonomy of recommender systems (RSs) in the context of big data and covers the traditional RSs, namely collaborative filtering (CF)-based and content-based methods, along with the state-of-the-art RSs. We also present a detailed study of (a) state-of-the-art methodologies, (b) issues and challenges such as cold start, scalability and sparsity, (c) similarity measures and methodologies, (d) evaluation metrics and (e) popular experimental datasets. This survey explores the breadth of the field of RSs, with a focus on big data to the extent possible, and summarizes it.
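For concreteness, a minimal user-based collaborative filtering scorer, one of the traditional RS families covered (the rating matrix is a toy example):

```python
import numpy as np

def user_based_scores(ratings: np.ndarray, user: int) -> np.ndarray:
    """Score all items for `user` as a cosine-similarity-weighted
    average of other users' ratings (classic user-based CF sketch)."""
    norms = np.linalg.norm(ratings, axis=1, keepdims=True) + 1e-9
    sims = (ratings / norms) @ (ratings[user] / norms[user])
    sims[user] = 0.0                       # exclude the user themself
    return sims @ ratings / (np.abs(sims).sum() + 1e-9)

R = np.array([[5, 4, 0, 1],                # users x items rating matrix
              [4, 5, 0, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
print(user_based_scores(R, user=0).round(2))  # high scores for items 0 and 1
```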
10 Analytics in e-commerce at scale
p. 229–240 (12)
This article focuses on the challenges of architecting distributed and analytical systems at scale, to handle more than a billion online visits per month at India's largest e-commerce company, Flipkart. The article explains how Flipkart evolved its technology, functions and analytics over time, as Internet users increasingly turned to the e-commerce market year-on-year. It then details how the Flipkart Data Platform (FDP), Flipkart's big data platform, handles the ingestion, storage, processing, analytics, querying and reporting of petabytes of data every day, and how the Data Sciences and machine learning (ML) department provides inferences that power various business workflows in Flipkart's systems. The article concludes with a summary of how data processing has evolved over the years at Flipkart, ensuring that the engineered information is as accurate as it needs to be.
11 Big data regression via parallelized radial basis function neural network in Apache Spark
p. 241–250 (10)
Among many versatile neural network architectures, the radial basis function neural network (RBFNN) is one that is used to address classification and regression problems. In their standard form, the supervised and unsupervised parts of the RBFNN cannot handle a large volume of data. The proposed work overcomes this drawback by implementing a parallel, distributed version of RBFNN with Apache Spark, referred to hereafter as PRBFNN. Incorporating K-means|| or parallel bisecting K-means between the input and hidden layers, and employing parallel least squares estimation (LSE) computed with outer products of matrices, are the novel contributions of this work. PRBFNN employs Gaussian as well as logistic activation functions (AFs) in the hidden layer for nonlinear transformation. The efficacy of PRBFNN was analysed on two real-world datasets under a 10-fold cross-validation (10-FCV) setup for a regression problem. The aim is to present PRBFNN as a way of handling regression in the big data paradigm.
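A single-node NumPy sketch of the RBFNN training pipeline the chapter parallelizes (the chapter distributes each step with Apache Spark and K-means||; the RBF width sigma and the synthetic dataset are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Step 1 (unsupervised): K-means picks the hidden-layer centres.
# Step 2: Gaussian activations transform the inputs.
# Step 3 (supervised): output weights via linear least squares (LSE).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 500)

centres = KMeans(n_clusters=20, n_init=10).fit(X).cluster_centers_
sigma = 0.5                                        # RBF width (assumed)
H = np.exp(-np.linalg.norm(X[:, None] - centres[None], axis=2) ** 2
           / (2 * sigma ** 2))                     # (n_samples, n_centres)
w, *_ = np.linalg.lstsq(H, y, rcond=None)          # LSE for output layer
print(f"train RMSE: {np.sqrt(np.mean((H @ w - y) ** 2)):.3f}")
```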
12 Visual sentiment analysis of bank customer complaints using parallel self-organizing maps
p. 251–271 (21)
Social media has reinforced consumer power, allowing customers to obtain more and more information about businesses and products, voice their opinions and convey their grievances. In this article, we introduce a descriptive analytics system for visual sentiment analysis of customer complaints using the self-organizing feature map (SOM). The network learns the underlying grouping of grievances, which can then be visualized using different methods. Executives in analytical customer relationship management (ACRM) can derive valuable business insights from the maps and enforce prompt remedial measures. We also propose CUDASOM, a high-performance version of the SOM algorithm implemented on NVIDIA's parallel computing platform, Compute Unified Device Architecture (CUDA), which accelerates the processing of high-dimensional text data. The effectiveness of the proposed model is demonstrated on a dataset of customer complaints about the products and services of four leading Indian banks. CUDASOM recorded an average speedup of 44 times. Our technique can support work on smart grievance redressal systems that provide complaining consumers with quick resolutions.
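A sequential sketch of the SOM training loop whose winner search and neighbourhood update CUDASOM parallelizes on the GPU (hyperparameters and the random data are illustrative; the chapter's inputs would be vectorized complaint texts):

```python
import numpy as np

def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, radius0=5.0):
    """Classic sequential SOM: find the best-matching unit (BMU) for each
    sample, then pull the BMU's grid neighbourhood towards the sample."""
    rows, cols = grid
    rng = np.random.default_rng(0)
    W = rng.normal(size=(rows * cols, data.shape[1]))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    n_steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            decay = np.exp(-t / n_steps)
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))      # winner search
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2 * (radius0 * decay) ** 2))   # neighbourhood
            W += (lr0 * decay) * h[:, None] * (x - W)
            t += 1
    return W.reshape(rows, cols, -1)

som = train_som(np.random.default_rng(1).normal(size=(200, 8)))
```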
13 Wavelet neural network for big data analytics in banking via GPU
p. 273–284 (12)
Big data is hard to process using conventional technologies and hence calls for massively parallel processing. Machine learning techniques have been widely adopted in several massive and complex data-intensive fields for handling large data. Artificial neural networks (ANNs) are among the most common machine learning techniques used for classification, function approximation, dimensionality reduction and so on; the wavelet neural network (WNN) is one such architecture. The WNN architecture involves many matrix computations, which can be parallelized on a GPU. Theano is used as the programming model for accelerating these general-purpose workloads. In our work, we implemented the WNN using Theano and tested its efficacy on various bank datasets. The performance of a conventional CPU implementation of the WNN was compared with that of the GPU implementation, and the latter was found to be much faster on all datasets.
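A NumPy sketch of a WNN forward pass using the Mexican-hat mother wavelet, to show why the model is dominated by matrix operations that a GPU accelerates (the wavelet choice and tensor shapes are assumptions; the chapter's implementation is in Theano):

```python
import numpy as np

def mexican_hat(u):
    """A common mother wavelet for WNNs: psi(u) = (1 - u^2) exp(-u^2 / 2)."""
    return (1 - u ** 2) * np.exp(-u ** 2 / 2)

def wnn_forward(X, translations, dilations, w_out):
    """WNN forward pass: each hidden unit applies a translated/dilated
    wavelet to the input, and the output layer is linear. Everything is
    a batched matrix operation, hence the GPU payoff."""
    U = (X[:, None, :] - translations[None]) / dilations[None]
    H = mexican_hat(U).prod(axis=2)        # multidimensional wavelet units
    return H @ w_out

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))              # e.g., customer features
t = rng.normal(size=(8, 4))                # 8 hidden wavelet units
lam = np.ones((8, 4))                      # dilations
w = rng.normal(size=8)
print(wnn_forward(X, t, lam, w).shape)     # (100,)
```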
14 Stock market movement prediction using evolving spiking neural networks
p. 285–312 (28)
Stock price movement direction prediction is regarded as one of the most difficult and challenging tasks in the real world. An accurate prediction can yield profit to investors and protect them from financial risk. In this study, we propose three variants of evolving spiking neural networks (eSNNs) for stock trend prediction: an eSNN model using technical stock indicators (SIs) as input variables (SI-eSNN); a parallel implementation of the SI-eSNN model on a GPU machine (Compute Unified Device Architecture (CUDA)-SI-eSNN); and a model for incremental learning using a sliding window (SW) of data (SW-eSNN). We also propose a logistic distribution in place of the Gaussian distribution for characterizing the receptive fields. The models are applied to nine large-scale benchmark stock indices of different countries. We use classification accuracy and area under the ROC curve (AUC) to measure the performance of the models in predicting UP or DOWN movements of daily prices. Our experimental results show that the eSNN model and its parallel implementation achieve a high accuracy of 80%–90% in predicting stock movements for forecasts from 1 day to 1 month ahead, a significant improvement over traditional AI methods. The performance of the eSNN model with logistic receptive fields is also compared with that of a deep learning architecture, the long short-term memory (LSTM) network; the comparative analysis shows that the proposed model outperformed the LSTM on four of the nine stock indices in terms of accuracy. The CUDA-SI-eSNN model runs three to five times faster than the sequential eSNN model, and the window-based SW-eSNN model achieved around 75% accuracy on average across all indices. As eSNN can be considered a specific implementation of a comprehensive SNN architecture called NeuCube, future development of improved models is also discussed. The results are promising for the future development of automated trading systems.
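A sketch of the Gaussian receptive-field population encoding commonly used in eSNN input layers; the chapter's contribution replaces the Gaussian with a logistic curve, which would change only the field shape below (parameter values are illustrative):

```python
import numpy as np

def encode_receptive_fields(value, n_fields=6, lo=0.0, hi=1.0, beta=1.5):
    """Population encoding for eSNN inputs: each value excites n_fields
    overlapping Gaussian receptive fields; stronger excitation means an
    earlier spike (rank-order coding)."""
    i = np.arange(1, n_fields + 1)
    centres = lo + (2 * i - 3) / 2 * (hi - lo) / (n_fields - 2)
    width = (hi - lo) / (beta * (n_fields - 2))
    excitation = np.exp(-((value - centres) ** 2) / (2 * width ** 2))
    firing_order = np.argsort(-excitation)      # earliest spike first
    return excitation, firing_order

exc, order = encode_receptive_fields(0.37)      # e.g., a normalized indicator
print(exc.round(2), order)
```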
15 Parallel hierarchical clustering of big text corpora
p. 313–342 (30)
Clustering is a technique that facilitates unsupervised learning of patterns or groups among entities in any application domain. Typically, clustering algorithms are iterative in nature and are designed to operate on huge application datasets, which places a prohibitive demand on computational and storage resources that cannot be met by a single workstation processor. Keeping in mind the advances in parallel processing systems such as multicore systems, distributed cluster computers and graphics processing units, and in programming platforms such as Open Multi-Processing (OpenMP), Hadoop, Compute Unified Device Architecture (CUDA) and Message Passing Interface (MPI), it is imperative to design algorithms that tackle the compute and storage demands of applications involving huge datasets. In the recent past, much seminal work has been done on parallel clustering algorithms. In this article, we highlight these research attempts and weave a timeline illustrating the developments in this active field, with an assessment of the pros and cons of each proposed algorithm and some open research problems. This should be useful for new researchers who want to pursue the design and analysis of parallel clustering algorithms. We begin by emphasizing the significance of designing parallel algorithms for the hierarchical clustering of large-scale text collections; the key challenges involved and the related seminal research attempts are then presented, followed by a list of open research problems.
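For reference, the sequential baseline that the surveyed parallel algorithms accelerate: agglomerative clustering of TF-IDF document vectors (the corpus is a toy example; the quadratic distance computation inside `linkage` is the part that OpenMP/MPI/CUDA variants distribute):

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.feature_extraction.text import TfidfVectorizer

# Vectorize a tiny corpus, then build the dendrogram with average
# linkage on cosine distances and cut it into three flat clusters.
docs = ["big data storage systems", "distributed data storage",
        "neural network training", "deep neural networks",
        "stock market prediction"]
X = TfidfVectorizer().fit_transform(docs).toarray()
Z = linkage(X, method="average", metric="cosine")
print(fcluster(Z, t=3, criterion="maxclust"))   # e.g., [1 1 2 2 3]
```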
16 Contract-driven financial reporting: building automated analytics pipelines with algorithmic contracts, Big Data and Distributed Ledger technology
p. 343–369 (27)
Future regulatory reporting should be automated to make it more efficient. Moreover, automation enables supervising authorities to effectively oversee and identify the risks of individual financial institutions and of the entire financial market. Over the last few years, we have developed new technologies that are important for reaching this goal. These technologies include (i) a suitable standardized representation of financial contracts, (ii) a standardized way of carrying out financial analytics, (iii) Big Data technology required to process hundreds of millions of financial contracts and (iv) Distributed Ledger and Smart Contract technology to create a secure layer for automated reporting. In this work, we provide an overview of the technological elements required to realize this previously established vision of future financial risk reporting.
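A hedged sketch of what a standardized, machine-executable contract representation can look like (the class, fields and annuity formula below are illustrative, in the spirit of algorithmic contract standards, not the chapter's actual specification):

```python
from dataclasses import dataclass

# The key idea: a contract is data plus a deterministic cash-flow
# generator, so the same record can drive analytics, reporting and
# smart contracts alike.
@dataclass
class AnnuityContract:
    notional: float
    annual_rate: float
    n_payments: int          # monthly instalments

    def cash_flows(self):
        r = self.annual_rate / 12
        instalment = self.notional * r / (1 - (1 + r) ** -self.n_payments)
        balance = self.notional
        for month in range(1, self.n_payments + 1):
            interest = balance * r
            balance -= instalment - interest
            yield month, round(instalment, 2), round(balance, 2)

loan = AnnuityContract(notional=100_000, annual_rate=0.05, n_payments=12)
for month, payment, remaining in loan.cash_flows():
    print(month, payment, remaining)    # balance amortizes to ~0
```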
Back Matter