Handbook of Big Data Analytics : (Record no. 45537)

MARC details
000 -LEADER
fixed length control field 14967nam a22002537a 4500
003 - CONTROL NUMBER IDENTIFIER
control field VITAP
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20230714165007.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 230714b ||||| |||| 00| 0 eng d
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9781839530647
040 ## - CATALOGING SOURCE
Transcribing agency VITAP
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Edition number
Classification number 005.7 RAV
245 ## - TITLE STATEMENT
Title Handbook of Big Data Analytics :
Remainder of title Volume 1: Methodologies /
Statement of responsibility, etc. edited by Vadlamani Ravi and Aswani Kumar Cherukuri
250 ## - EDITION STATEMENT
Edition statement First Ed.
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Place of publication, distribution, etc. London, United Kingdom
Name of publisher, distributor, etc. The Institution of Engineering and Technology
Date of publication, distribution, etc. 2021
300 ## - PHYSICAL DESCRIPTION
Extent xxx, 359 p. : ill. ; 24 cm. Volume 1.
500 ## - GENERAL NOTE
General note About the Book: Big Data analytics is the complex process of examining big data to uncover information such as correlations, hidden patterns, trends and user and customer preferences, to allow organizations and businesses to make more informed decisions. These methods and technologies have become ubiquitous in all fields of science, engineering, business and management due to the rise of data-driven models as well as data engineering developments using parallel and distributed computational analytics frameworks, data and algorithm parallelization, and GPGPU programming. However, there remain potential issues that need to be addressed to enable big data processing and analytics in real time. In the first volume of this comprehensive two-volume handbook, the authors present several methodologies to support Big Data analytics, including database management, processing frameworks and architectures, data lakes, query optimization strategies, progress towards real-time data processing, data stream analytics, fog and edge computing, and artificial intelligence with Big Data. The second volume is dedicated to a wide range of applications in secure data storage, privacy preservation, software-defined networks (SDN), the Internet of Things (IoT), behaviour analytics, traffic prediction, gender-based classification of e-commerce data, recommender systems, Big Data regression with Apache Spark, visual sentiment analysis, wavelet neural networks via GPU, stock market movement prediction, and financial reporting. The two-volume work aims to provide a unique platform for researchers, engineers, developers, educators and advanced students in the field of Big Data analytics.
504 ## - BIBLIOGRAPHY, ETC. NOTE
Bibliography, etc. note Includes sections about the editors and contributors, a foreword, preface, acknowledgements, and an index.
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note Table of contents:

Front Matter

1. The impact of Big Data on databases (p. 1-36)
The last decade, from the point of view of information management, is characterized by an exponential generation of data. In any interaction carried out by digital means, data is generated. Some popular examples are social networks on the Internet, mobile device apps, commercial transactions through online banking, a user's browsing history, and geolocation information generated by a user's mobile device. In general, all this information is stored by the companies or institutions with which the interaction is maintained (unless the user has expressly indicated that it cannot be stored).

2. Big data processing frameworks and architectures: a survey (p. 37-104)
In recent times, there has been rapid growth in data generated from autonomous sources. Existing data processing techniques are not suitable for dealing with these large volumes of complex data, which can be structured, semi-structured or unstructured. This large data is referred to as Big data because of its main characteristics: volume, variety, velocity, value and veracity. Extensive research on Big data is ongoing, and its primary focus is on processing massive amounts of data effectively and efficiently. However, researchers have paid little attention to how to store and analyze the large volumes of data to get useful insights from them. In this chapter, the authors examine existing Big data processing frameworks such as MapReduce, Apache Spark, Storm and Flink. The architectures of MapReduce and iterative MapReduce frameworks and the components of Apache Spark are discussed in detail. Most widely used classical machine learning techniques are implemented on these Big data frameworks in the form of the Apache Mahout and Spark MLlib libraries, and these need to be enhanced to support further techniques such as formal concept analysis (FCA) and neural embedding. Taking FCA as an application, the authors provide scalable FCA algorithms built on Big data processing frameworks like MapReduce and Spark. Streaming data processing frameworks like Apache Flink and Apache Storm are also examined. The authors also discuss storage architectures such as the Hadoop Distributed File System (HDFS), Dynamo and Amazon S3 in detail in the context of processing large Big data applications. The survey concludes with a proposal for best practices related to the studied architectures and frameworks.

3. The role of data lake in big data analytics: recent developments and challenges (p. 105-123)
We explore the concept of a data lake (DL), big data fabric, DL architecture and the various layers of a DL. We also present the components of each of these layers. We compare and contrast the notions of data warehouses and DLs with respect to some key characteristics. Moreover, we explore various commercial and open-source DLs with their strengths and limitations, and discuss some key best practices for DLs. Further, we present two case studies of DLs: the Lumada data lake (LDL) and the Temenos data lake (TDL) for digital banking. Finally, we explore some of the crucial challenges faced in the formation of DLs.

4. Query optimization strategies for big data (p. 125-155)
Query optimization for big data architectures like MapReduce, Spark and Druid is challenging due to the number of algorithmic issues to be addressed. Conventional algorithmic design issues like memory, CPU time and I/O cost should be analyzed in the context of additional parameters such as communication cost. The issue of data-resident skew further complicates the analysis. This chapter studies communication cost reduction strategies for conventional workloads such as joins, spatial queries and graph queries. We review algorithms for multi-way join using MapReduce. Multi-way θ-join algorithms address the multi-way join with inequality conditions. As θ-join output is much larger than that of an equi-join, the multi-way θ-join further complicates the analysis; it is analysed on the basis of the sizes of the input and output sets as well as the communication cost. Data-resident skew plays a key role in all the scenarios discussed. Addressing skew in a general sense is discussed, and partitioning strategies that minimize the impact of skew on the loads of computing nodes are presented. The application of join strategies to spatial data has drawn the interest of researchers, and distributed spatial join requires special emphasis to deal with the spatial nature of the dataset. A controlled-replicate strategy is reviewed to solve the problem of multi-way spatial join. Graph-based analytical queries such as triangle counting and subgraph enumeration in the context of distributed processing are presented. Being a primitive needed for many graph queries, triangle counting is analyzed from the perspective of the skew it introduces, using an elegant distribution scheme. The subgraph enumeration problem is also presented using various partitioning schemes, with a brief analysis of their performance.

5. Toward real-time data processing: an advanced approach in big data analytics (p. 157-174)
Nowadays, a huge quantity of data is produced by multiple data sources. Existing tools and techniques are not capable of handling such voluminous data produced from a variety of sources. This continuous and varied generation of data requires advanced technologies for processing and storage, which is a big challenge for data scientists. Some research studies are well established in the area of streaming in big data. Streaming data are real-time data, or data in motion, such as stock market data, sensor data, GPS data and Twitter data. In stream processing, the data are not stored in databases; instead they are processed and analyzed on the fly to extract value as soon as they are generated. A number of streaming frameworks have been proposed to date for big data applications, used to collect, evaluate and process data that are generated and captured continuously. In this chapter, we provide an in-depth summary of various big data streaming approaches such as Apache Storm, Apache Hive and Apache Samza. We also present a comparative study of these streaming platforms.

6. A survey on data stream analytics (p. 175-208)
With the exponential expansion of the interconnected world, we have a large volume, variety and velocity of data flowing through our systems. The dependencies on these systems have crossed the threshold of business value, and such communications have started to be classified as essential systems. As such, these systems have become vital social infrastructure that needs prediction, monitoring, safeguarding and immediate decision-making in case of threats. The key enabler is data stream analytics (DSA). In DSA, the key areas of stream processing are prediction and forecasting, classification, clustering, mining frequent patterns and finding frequent item sets (FISs), detecting concept drift, building synopsis structures to answer standing and ad hoc queries, sampling and load shedding in the case of bursts of data, and processing data streams emanating from the very large number of interconnected devices typical of the Internet of Things (IoT). The processing complexity is affected by the multidimensionality of the stream data objects, building 'forgetting' as a key construct in the processing, leveraging the time-series aspect to aid the processing, and so on. In this chapter, we explore some of the aforementioned areas and provide a survey of each of them. We also provide a survey of the data stream processing systems (DSPSs) and frameworks being adopted by industry at large.

7. Architectures of big data analytics: scaling out data mining algorithms using Hadoop–MapReduce and Spark (p. 209-296)
Many statistical and machine learning (ML) techniques have been successfully applied to small-sized datasets during the past decade and a half. However, in today's world, different application domains, viz. healthcare, finance, bioinformatics, telecommunications and meteorology, generate huge volumes of data on a daily basis. All these massive datasets have to be analyzed to discover hidden insights. With the advent of the big data analytics (BDA) paradigm, data mining (DM) techniques were modified and scaled out to adapt to the distributed and parallel environment. This chapter reviews 249 articles, appearing between 2009 and 2019, that implemented different DM techniques in a parallel, distributed manner in the Apache Hadoop MapReduce framework or the Apache Spark environment for solving various DM tasks. We present some critical analyses of these papers and bring out some interesting insights. We have found that methods like Apriori, support vector machines (SVM), random forests (RF), K-means and many of their variants, along with many other approaches, have been ported to the parallel, distributed environment and yield scalable and effective insights. The review concludes with a discussion of some open areas of research with future directions, which can be explored further by researchers and practitioners alike.

8. A review of fog and edge computing with big data analytics (p. 297-316)
In this review, we present and explore cloud computing offloading strategies with fog and edge computing that have been adopted in recent years. They reflect a noticeable improvement in the collection and transmission of information as well as the management of data in the field for computer consumers. This review also focuses on how various computing paradigms applied in fog and edge computing environments are used for realising recently emerging IoT applications, and on the associated cyber security threats.

9. Fog computing framework for Big Data processing using cluster management in a resource-constraint environment (p. 317-334)
This article presents the implementation details related to the distributed storage and processing of big datasets in a fog computing cluster environment. The implementation details of a fog computing framework using Apache Spark for big data applications in a resource-constrained environment are given. Results related to Big Data processing, modeling and prediction in a resource-constrained fog computing framework are presented through case studies using an e-commerce customer dataset and bank loan credit risk datasets.

10. Role of artificial intelligence and big data in accelerating accessibility for persons with disabilities (p. 335-343)
Artificial intelligence (AI) and big data have emerged from being niche tools into mainstream tools in the recent past. These technological improvements have changed the manner in which software tools are designed and have provided unprecedented benefits to users. This article analyses the impact of both of these technologies through the lens of accessibility computing, a sub-domain of human-computer interaction. The rationales for incorporating accessibility for persons with disabilities in the digital ecosystem are illustrated. The article proposes the key term 'perception porting', which is aimed at converting data suited to one sense into a form suited to another with the help of AI and big data. The specific tools and techniques available to assist persons with specific disabilities, such as smart vision, smart exoskeletons, captioning techniques and Internet of Things-based solutions, are explored.

Back Matter
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
9 (RLIN) 13131
Topical term or geographic name entry element Big Data; Data Analytics
700 ## - ADDED ENTRY--PERSONAL NAME
9 (RLIN) 13132
Personal name Ravi, Vadlamani.
Relator term Editor
700 ## - ADDED ENTRY--PERSONAL NAME
9 (RLIN) 13133
Personal name Cherukuri, Aswani Kumar.
Relator term Editor
856 ## - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier <a href="https://digital-library.theiet.org/content/books/pc/pbpc037f">https://digital-library.theiet.org/content/books/pc/pbpc037f</a>
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Koha item type Reference Book
Edition 23rd
Classification part 005.7
Call number suffix RAV
Holdings
Source of classification or shelving scheme: Dewey Decimal Classification
Materials specified (bound volume or other part): Hard Bound
Home library / Current library: School of Computer Science Section, VIT-AP
Shelving location: General Stacks
Date acquired: 2023-07-14
Source of acquisition: Donated by Aswani Kumar Cherukuri, Professor, VIT-Vellore Campus
Cost, normal purchase price: 20163.00
Inventory number: Donated by Aswani Kumar Cherukur
Full call number: 005.7 RAV
Barcode: 021083
Date last seen: 2023-07-14
Cost, replacement price: 20163.00
Price effective from: 2023-07-14
Koha item type: Reference Book
Public note: CSE
