Scala for Data Science

Leverage the power of Scala with different tools to build scalable, robust data science applications About This Book A complete guide for scalable data science solutions, from data ingestion to data visualization Deploy horizontally ...

Scala for Data Science

Author: Pascal Bugnion

Publisher: Packt Publishing Ltd

ISBN: 1785289381

Page: 416

View: 522

Leverage the power of Scala with different tools to build scalable, robust data science applications About This Book A complete guide for scalable data science solutions, from data ingestion to data visualization Deploy horizontally scalable data processing pipelines and take advantage of web frameworks to build engaging visualizations Build functional, type-safe routines to interact with relational and NoSQL databases with the help of tutorials and examples provided Who This Book Is For If you are a Scala developer or data scientist, or if you want to enter the field of data science, then this book will give you all the tools you need to implement data science solutions. What You Will Learn Transform and filter tabular data to extract features for machine learning Implement your own algorithms or take advantage of MLLib's extensive suite of models to build distributed machine learning pipelines Read, transform, and write data to both SQL and NoSQL databases in a functional manner Write robust routines to query web APIs Read data from web APIs such as the GitHub or Twitter API Use Scala to interact with MongoDB, which offers high performance and helps to store large data sets with uncertain query requirements Create Scala web applications that couple with JavaScript libraries such as D3 to create compelling interactive visualizations Deploy scalable parallel applications using Apache Spark, loading data from HDFS or Hive In Detail Scala is a multi-paradigm programming language (it supports both object-oriented and functional programming) and scripting language used to build applications for the JVM. Languages such as R, Python, Java, and so on are mostly used for data science. It is particularly good at analyzing large sets of data without any significant impact on performance and thus Scala is being adopted by many developers and data scientists. Data scientists might be aware that building applications that are truly scalable is hard. Scala, with its powerful functional libraries for interacting with databases and building scalable frameworks will give you the tools to construct robust data pipelines. This book will introduce you to the libraries for ingesting, storing, manipulating, processing, and visualizing data in Scala. Packed with real-world examples and interesting data sets, this book will teach you to ingest data from flat files and web APIs and store it in a SQL or NoSQL database. It will show you how to design scalable architectures to process and modelling your data, starting from simple concurrency constructs such as parallel collections and futures, through to actor systems and Apache Spark. As well as Scala's emphasis on functional structures and immutability, you will learn how to use the right parallel construct for the job at hand, minimizing development time without compromising scalability. Finally, you will learn how to build beautiful interactive visualizations using web frameworks. This book gives tutorials on some of the most common Scala libraries for data science, allowing you to quickly get up to speed with building data science and data engineering solutions. Style and approach A tutorial with complete examples, this book will give you the tools to start building useful data engineering and data science solutions straightaway

Scala Programming for Big Data Analytics

This is followed by sections on Scala fundamentals including mutable/immutable variables, the type hierarchy system, control flow expressions and code blocks.

Scala Programming for Big Data Analytics

Author: Irfan Elahi

Publisher: Apress

ISBN: 1484248104

Page: 306

View: 844

Gain the key language concepts and programming techniques of Scala in the context of big data analytics and Apache Spark. The book begins by introducing you to Scala and establishes a firm contextual understanding of why you should learn this language, how it stands in comparison to Java, and how Scala is related to Apache Spark for big data analytics. Next, you’ll set up the Scala environment ready for examining your first Scala programs. This is followed by sections on Scala fundamentals including mutable/immutable variables, the type hierarchy system, control flow expressions and code blocks. The author discusses functions at length and highlights a number of associated concepts such as functional programming and anonymous functions. The book then delves deeper into Scala’s powerful collections system because many of Apache Spark’s APIs bear a strong resemblance to Scala collections. Along the way you’ll see the development life cycle of a Scala program. This involves compiling and building programs using the industry-standard Scala Build Tool (SBT). You’ll cover guidelines related to dependency management using SBT as this is critical for building large Apache Spark applications. Scala Programming for Big Data Analytics concludes by demonstrating how you can make use of the concepts to write programs that run on the Apache Spark framework. These programs will provide distributed and parallel computing, which is critical for big data analytics. What You Will Learn See the fundamentals of Scala as a general-purpose programming language Understand functional programming and object-oriented programming constructs in Scala Use Scala collections and functions Develop, package and run Apache Spark applications for big data analytics Who This Book Is For Data scientists, data analysts and data engineers who intend to use Apache Spark for large-scale analytics. /div

Scala Guide for Data Science Professionals

Scala will be a valuable tool to have on hand during your data science journey for everything from data cleaning to cutting-edge machine learning About This Book Build data science and data engineering solutions with ease An in-depth look ...

Scala  Guide for Data Science Professionals

Author: Pascal Bugnion

Publisher: Packt Publishing Ltd

ISBN: 1787281035

Page: 1100

View: 182

Scala will be a valuable tool to have on hand during your data science journey for everything from data cleaning to cutting-edge machine learning About This Book Build data science and data engineering solutions with ease An in-depth look at each stage of the data analysis process — from reading and collecting data to distributed analytics Explore a broad variety of data processing, machine learning, and genetic algorithms through diagrams, mathematical formulations, and source code Who This Book Is For This learning path is perfect for those who are comfortable with Scala programming and now want to enter the field of data science. Some knowledge of statistics is expected. What You Will Learn Transfer and filter tabular data to extract features for machine learning Read, clean, transform, and write data to both SQL and NoSQL databases Create Scala web applications that couple with JavaScript libraries such as D3 to create compelling interactive visualizations Load data from HDFS and HIVE with ease Run streaming and graph analytics in Spark for exploratory analysis Bundle and scale up Spark jobs by deploying them into a variety of cluster managers Build dynamic workflows for scientific computing Leverage open source libraries to extract patterns from time series Master probabilistic models for sequential data In Detail Scala is especially good for analyzing large sets of data as the scale of the task doesn't have any significant impact on performance. Scala's powerful functional libraries can interact with databases and build scalable frameworks — resulting in the creation of robust data pipelines. The first module introduces you to Scala libraries to ingest, store, manipulate, process, and visualize data. Using real world examples, you will learn how to design scalable architecture to process and model data — starting from simple concurrency constructs and progressing to actor systems and Apache Spark. After this, you will also learn how to build interactive visualizations with web frameworks. Once you have become familiar with all the tasks involved in data science, you will explore data analytics with Scala in the second module. You'll see how Scala can be used to make sense of data through easy to follow recipes. You will learn about Bokeh bindings for exploratory data analysis and quintessential machine learning with algorithms with Spark ML library. You'll get a sufficient understanding of Spark streaming, machine learning for streaming data, and Spark graphX. Armed with a firm understanding of data analysis, you will be ready to explore the most cutting-edge aspect of data science — machine learning. The final module teaches you the A to Z of machine learning with Scala. You'll explore Scala for dependency injections and implicits, which are used to write machine learning algorithms. You'll also explore machine learning topics such as clustering, dimentionality reduction, Naive Bayes, Regression models, SVMs, neural networks, and more. This learning path combines some of the best that Packt has to offer into one complete, curated package. It includes content from the following Packt products: Scala for Data Science, Pascal Bugnion Scala Data Analysis Cookbook, Arun Manivannan Scala for Machine Learning, Patrick R. Nicolas Style and approach A complete package with all the information necessary to start building useful data engineering and data science solutions straight away. It contains a diverse set of recipes that cover the full spectrum of interesting data analysis tasks and will help you revolutionize your data analysis skills using Scala.

Scala and Spark for Big Data Analytics

About This Book Learn Scala's sophisticated type system that combines Functional Programming and object-oriented concepts Work on a wide array of applications, from simple batch jobs to stream processing and machine learning Explore the ...

Scala and Spark for Big Data Analytics

Author: Md. Rezaul Karim

Publisher: Packt Publishing Ltd

ISBN: 1783550503

Page: 786

View: 562

Harness the power of Scala to program Spark and analyze tonnes of data in the blink of an eye! About This Book Learn Scala's sophisticated type system that combines Functional Programming and object-oriented concepts Work on a wide array of applications, from simple batch jobs to stream processing and machine learning Explore the most common as well as some complex use-cases to perform large-scale data analysis with Spark Who This Book Is For Anyone who wishes to learn how to perform data analysis by harnessing the power of Spark will find this book extremely useful. No knowledge of Spark or Scala is assumed, although prior programming experience (especially with other JVM languages) will be useful to pick up concepts quicker. What You Will Learn Understand object-oriented & functional programming concepts of Scala In-depth understanding of Scala collection APIs Work with RDD and DataFrame to learn Spark's core abstractions Analysing structured and unstructured data using SparkSQL and GraphX Scalable and fault-tolerant streaming application development using Spark structured streaming Learn machine-learning best practices for classification, regression, dimensionality reduction, and recommendation system to build predictive models with widely used algorithms in Spark MLlib & ML Build clustering models to cluster a vast amount of data Understand tuning, debugging, and monitoring Spark applications Deploy Spark applications on real clusters in Standalone, Mesos, and YARN In Detail Scala has been observing wide adoption over the past few years, especially in the field of data science and analytics. Spark, built on Scala, has gained a lot of recognition and is being used widely in productions. Thus, if you want to leverage the power of Scala and Spark to make sense of big data, this book is for you. The first part introduces you to Scala, helping you understand the object-oriented and functional programming concepts needed for Spark application development. It then moves on to Spark to cover the basic abstractions using RDD and DataFrame. This will help you develop scalable and fault-tolerant streaming applications by analyzing structured and unstructured data using SparkSQL, GraphX, and Spark structured streaming. Finally, the book moves on to some advanced topics, such as monitoring, configuration, debugging, testing, and deployment. You will also learn how to develop Spark applications using SparkR and PySpark APIs, interactive data analytics using Zeppelin, and in-memory data processing with Alluxio. By the end of this book, you will have a thorough understanding of Spark, and you will be able to perform full-stack data analytics with a feel that no amount of data is too big. Style and approach Filled with practical examples and use cases, this book will hot only help you get up and running with Spark, but will also take you farther down the road to becoming a data scientist.

Hands On Data Analysis with Scala

This book helps you to leverage the popular Scala libraries and tools for performing core data analysis tasks with ease. The book begins with a quick overview of the building blocks of a standard data analysis process.

Hands On Data Analysis with Scala

Author: Rajesh Gupta

Publisher: Packt Publishing Ltd

ISBN: 1789344263

Page: 298

View: 581

Master scala's advanced techniques to solve real-world problems in data analysis and gain valuable insights from your data Key Features A beginner's guide for performing data analysis loaded with numerous rich, practical examples Access to popular Scala libraries such as Breeze, Saddle for efficient data manipulation and exploratory analysis Develop applications in Scala for real-time analysis and machine learning in Apache Spark Book Description Efficient business decisions with an accurate sense of business data helps in delivering better performance across products and services. This book helps you to leverage the popular Scala libraries and tools for performing core data analysis tasks with ease. The book begins with a quick overview of the building blocks of a standard data analysis process. You will learn to perform basic tasks like Extraction, Staging, Validation, Cleaning, and Shaping of datasets. You will later deep dive into the data exploration and visualization areas of the data analysis life cycle. You will make use of popular Scala libraries like Saddle, Breeze, Vegas, and PredictionIO for processing your datasets. You will learn statistical methods for deriving meaningful insights from data. You will also learn to create applications for Apache Spark 2.x on complex data analysis, in real-time. You will discover traditional machine learning techniques for doing data analysis. Furthermore, you will also be introduced to neural networks and deep learning from a data analysis standpoint. By the end of this book, you will be capable of handling large sets of structured and unstructured data, perform exploratory analysis, and building efficient Scala applications for discovering and delivering insights What you will learn Techniques to determine the validity and confidence level of data Apply quartiles and n-tiles to datasets to see how data is distributed into many buckets Create data pipelines that combine multiple data lifecycle steps Use built-in features to gain a deeper understanding of the data Apply Lasso regression analysis method to your data Compare Apache Spark API with traditional Apache Spark data analysis Who this book is for If you are a data scientist or a data analyst who wants to learn how to perform data analysis using Scala, this book is for you. All you need is knowledge of the basic fundamentals of Scala programming.

Programming Languages for Data Science Julia Scala Python and R

Since this is a very complex matter, no clear winner has been declared yet. However, it is possible to find out which programming language is best for you, for your data science endeavors.

Programming Languages for Data Science  Julia  Scala  Python  and R

Author: Zacharias Voulgaris

Publisher:

ISBN:

Page:

View: 997

"Ever since the birth of the data science field, there has been a constant debate about which language is the best, the one better suited for large-scale data analytics applications. Since this is a very complex matter, no clear winner has been declared yet. However, it is possible to find out which programming language is best for you, for your data science endeavors. In this video we explore just that, providing you with all the information you need to make an educated decision."--Resource description page.

Scala Applied Machine Learning

Leverage the power of Scala and master the art of building, improving, and validating scalable machine learning and AI applications using Scala's most advanced and finest features About This Book Build functional, type-safe routines to ...

Scala Applied Machine Learning

Author: Pascal Bugnion

Publisher: Packt Publishing Ltd

ISBN: 178712455X

Page: 1265

View: 860

Leverage the power of Scala and master the art of building, improving, and validating scalable machine learning and AI applications using Scala's most advanced and finest features About This Book Build functional, type-safe routines to interact with relational and NoSQL databases with the help of the tutorials and examples provided Leverage your expertise in Scala programming to create and customize your own scalable machine learning algorithms Experiment with different techniques; evaluate their benefits and limitations using real-world financial applications Get to know the best practices to incorporate new Big Data machine learning in your data-driven enterprise and gain future scalability and maintainability Who This Book Is For This Learning Path is for engineers and scientists who are familiar with Scala and want to learn how to create, validate, and apply machine learning algorithms. It will also benefit software developers with a background in Scala programming who want to apply machine learning. What You Will Learn Create Scala web applications that couple with JavaScript libraries such as D3 to create compelling interactive visualizations Deploy scalable parallel applications using Apache Spark, loading data from HDFS or Hive Solve big data problems with Scala parallel collections, Akka actors, and Apache Spark clusters Apply key learning strategies to perform technical analysis of financial markets Understand the principles of supervised and unsupervised learning in machine learning Work with unstructured data and serialize it using Kryo, Protobuf, Avro, and AvroParquet Construct reliable and robust data pipelines and manage data in a data-driven enterprise Implement scalable model monitoring and alerts with Scala In Detail This Learning Path aims to put the entire world of machine learning with Scala in front of you. Scala for Data Science, the first module in this course, is a tutorial guide that provides tutorials on some of the most common Scala libraries for data science, allowing you to quickly get up to speed building data science and data engineering solutions. The second course, Scala for Machine Learning guides you through the process of building AI applications with diagrams, formal mathematical notation, source code snippets, and useful tips. A review of the Akka framework and Apache Spark clusters concludes the tutorial. The next module, Mastering Scala Machine Learning, is the final step in this course. It will take your knowledge to next level and help you use the knowledge to build advanced applications such as social media mining, intelligent news portals, and more. After a quick refresher on functional programming concepts using REPL, you will see some practical examples of setting up the development environment and tinkering with data. We will then explore working with Spark and MLlib using k-means and decision trees. By the end of this course, you will be a master at Scala machine learning and have enough expertise to be able to build complex machine learning projects using Scala. This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products: Scala for Data Science, Pascal Bugnion Scala for Machine Learning, Patrick Nicolas Mastering Scala Machine Learning, Alex Kozlov Style and approach A tutorial with complete examples, this course will give you the tools to start building useful data engineering and data science solutions straightaway. This course provides practical examples from the field on how to correctly tackle data analysis problems, particularly for modern Big Data datasets.

Learning Spark

This book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run.

Learning Spark

Author: Holden Karau

Publisher: "O'Reilly Media, Inc."

ISBN: 144935906X

Page: 276

View: 949

This book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. You'll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning.--

Scala for Machine Learning

A data scientist solves problems related to data in a variety of fields, such as
biological sciences, health care, marketing, or finances. Data and text mining,
signal processing, statistical analysis, and modeling using machine learning
algorithms ...

Scala for Machine Learning

Author: Patrick R. Nicolas

Publisher: Packt Publishing Ltd

ISBN: 178355875X

Page: 520

View: 808

Are you curious about AI? All you need is a good understanding of the Scala programming language, a basic knowledge of statistics, a keen interest in Big Data processing, and this book!

Spark for Data Science

Analyze your data and delve deep into the world of machine learning with the latest Spark version, 2.0 About This Book Perform data analysis and build predictive models on huge datasets that leverage Apache Spark Learn to integrate data ...

Spark for Data Science

Author: Srinivas Duvvuri

Publisher: Packt Publishing Ltd

ISBN: 1785884778

Page: 344

View: 494

Analyze your data and delve deep into the world of machine learning with the latest Spark version, 2.0 About This Book Perform data analysis and build predictive models on huge datasets that leverage Apache Spark Learn to integrate data science algorithms and techniques with the fast and scalable computing features of Spark to address big data challenges Work through practical examples on real-world problems with sample code snippets Who This Book Is For This book is for anyone who wants to leverage Apache Spark for data science and machine learning. If you are a technologist who wants to expand your knowledge to perform data science operations in Spark, or a data scientist who wants to understand how algorithms are implemented in Spark, or a newbie with minimal development experience who wants to learn about Big Data Analytics, this book is for you! What You Will Learn Consolidate, clean, and transform your data acquired from various data sources Perform statistical analysis of data to find hidden insights Explore graphical techniques to see what your data looks like Use machine learning techniques to build predictive models Build scalable data products and solutions Start programming using the RDD, DataFrame and Dataset APIs Become an expert by improving your data analytical skills In Detail This is the era of Big Data. The words ҂ig Data' implies big innovation and enables a competitive advantage for businesses. Apache Spark was designed to perform Big Data analytics at scale, and so Spark is equipped with the necessary algorithms and supports multiple programming languages. Whether you are a technologist, a data scientist, or a beginner to Big Data analytics, this book will provide you with all the skills necessary to perform statistical data analysis, data visualization, predictive modeling, and build scalable data products or solutions using Python, Scala, and R. With ample case studies and real-world examples, Spark for Data Science will help you ensure the successful execution of your data science projects. Style and approach This book takes a step-by-step approach to statistical analysis and machine learning, and is explained in a conversational and easy-to-follow style. Each topic is explained sequentially with a focus on the fundamentals as well as the advanced concepts of algorithms and techniques. Real-world examples with sample code snippets are also included.

Scala Machine Learning Projects

This book is for data scientists, data engineers, and deep learning enthusiasts who have a background in complex numerical computing and want to know more hands-on machine learning application development.

Scala Machine Learning Projects

Author: Md. Rezaul Karim

Publisher: Packt Publishing Ltd

ISBN: 1788471474

Page: 470

View: 808

Powerful smart applications using deep learning algorithms to dominate numerical computing, deep learning, and functional programming. Key Features Explore machine learning techniques with prominent open source Scala libraries such as Spark ML, H2O, MXNet, Zeppelin, and DeepLearning4j Solve real-world machine learning problems by delving complex numerical computing with Scala functional programming in a scalable and faster way Cover all key aspects such as collection, storing, processing, analyzing, and evaluation required to build and deploy machine models on computing clusters using Scala Play framework. Book Description Machine learning has had a huge impact on academia and industry by turning data into actionable information. Scala has seen a steady rise in adoption over the past few years, especially in the fields of data science and analytics. This book is for data scientists, data engineers, and deep learning enthusiasts who have a background in complex numerical computing and want to know more hands-on machine learning application development. If you're well versed in machine learning concepts and want to expand your knowledge by delving into the practical implementation of these concepts using the power of Scala, then this book is what you need! Through 11 end-to-end projects, you will be acquainted with popular machine learning libraries such as Spark ML, H2O, DeepLearning4j, and MXNet. At the end, you will be able to use numerical computing and functional programming to carry out complex numerical tasks to develop, build, and deploy research or commercial projects in a production-ready environment. What you will learn Apply advanced regression techniques to boost the performance of predictive models Use different classification algorithms for business analytics Generate trading strategies for Bitcoin and stock trading using ensemble techniques Train Deep Neural Networks (DNN) using H2O and Spark ML Utilize NLP to build scalable machine learning models Learn how to apply reinforcement learning algorithms such as Q-learning for developing ML application Learn how to use autoencoders to develop a fraud detection application Implement LSTM and CNN models using DeepLearning4j and MXNet Who this book is for If you want to leverage the power of both Scala and Spark to make sense of Big Data, then this book is for you. If you are well versed with machine learning concepts and wants to expand your knowledge by delving into the practical implementation using the power of Scala, then this book is what you need! Strong understanding of Scala Programming language is recommended. Basic familiarity with machine Learning techniques will be more helpful.

Modern Scala Projects

What you will learn Create pipelines to extract data or analytics and visualizations Automate your process pipeline with jobs that are reproducible Extract intelligent data efficiently from large, disparate datasets Automate the extraction, ...

Modern Scala Projects

Author: Ilango gurusamy

Publisher: Packt Publishing Ltd

ISBN: 1788625277

Page: 334

View: 778

Develop robust, Scala-powered projects with the help of machine learning libraries such as SparkML to harvest meaningful insight Key Features Gain hands-on experience in building data science projects with Scala Exploit powerful functionalities of machine learning libraries Use machine learning algorithms and decision tree models for enterprise apps Book Description Scala, together with the Spark Framework, forms a rich and powerful data processing ecosystem. Modern Scala Projects is a journey into the depths of this ecosystem. The machine learning (ML) projects presented in this book enable you to create practical, robust data analytics solutions, with an emphasis on automating data workflows with the Spark ML pipeline API. This book showcases or carefully cherry-picks from Scala’s functional libraries and other constructs to help readers roll out their own scalable data processing frameworks. The projects in this book enable data practitioners across all industries gain insights into data that will help organizations have strategic and competitive advantage. Modern Scala Projects focuses on the application of supervisory learning ML techniques that classify data and make predictions. You'll begin with working on a project to predict a class of flower by implementing a simple machine learning model. Next, you'll create a cancer diagnosis classification pipeline, followed by projects delving into stock price prediction, spam filtering, fraud detection, and a recommendation engine. By the end of this book, you will be able to build efficient data science projects that fulfil your software requirements. What you will learn Create pipelines to extract data or analytics and visualizations Automate your process pipeline with jobs that are reproducible Extract intelligent data efficiently from large, disparate datasets Automate the extraction, transformation, and loading of data Develop tools that collate, model, and analyze data Maintain the integrity of data as data flows become more complex Develop tools that predict outcomes based on “pattern discovery” Build really fast and accurate machine-learning models in Scala Who this book is for Modern Scala Projects is for Scala developers who would like to gain some hands-on experience with some interesting real-world projects. Prior programming experience with Scala is necessary.

Machine Learning with Scala Quick Start Guide

Supervised and unsupervised machine learning made easy in Scala with this quick-start guide.

Machine Learning with Scala Quick Start Guide

Author: Md. Rezaul Karim

Publisher: Packt Publishing Ltd

ISBN: 1789345413

Page: 220

View: 972

Supervised and unsupervised machine learning made easy in Scala with this quick-start guide. Key Features Construct and deploy machine learning systems that learn from your data and give accurate predictions Unleash the power of Spark ML along with popular machine learning algorithms to solve complex tasks in Scala. Solve hands-on problems by combining popular neural network architectures such as LSTM and CNN using Scala with DeepLearning4j library Book Description Scala is a highly scalable integration of object-oriented nature and functional programming concepts that make it easy to build scalable and complex big data applications. This book is a handy guide for machine learning developers and data scientists who want to develop and train effective machine learning models in Scala. The book starts with an introduction to machine learning, while covering deep learning and machine learning basics. It then explains how to use Scala-based ML libraries to solve classification and regression problems using linear regression, generalized linear regression, logistic regression, support vector machine, and Naïve Bayes algorithms. It also covers tree-based ensemble techniques for solving both classification and regression problems. Moving ahead, it covers unsupervised learning techniques, such as dimensionality reduction, clustering, and recommender systems. Finally, it provides a brief overview of deep learning using a real-life example in Scala. What you will learn Get acquainted with JVM-based machine learning libraries for Scala such as Spark ML and Deeplearning4j Learn RDDs, DataFrame, and Spark SQL for analyzing structured and unstructured data Understand supervised and unsupervised learning techniques with best practices and pitfalls Learn classification and regression analysis with linear regression, logistic regression, Naïve Bayes, support vector machine, and tree-based ensemble techniques Learn effective ways of clustering analysis with dimensionality reduction techniques Learn recommender systems with collaborative filtering approach Delve into deep learning and neural network architectures Who this book is for This book is for machine learning developers looking to train machine learning models in Scala without spending too much time and effort. Some fundamental knowledge of Scala programming and some basics of statistics and linear algebra is all you need to get started with this book.

Advanced Analytics with Spark

With this book, you will: Familiarize yourself with the Spark programming model Become comfortable within the Spark ecosystem Learn general approaches in data science Examine complete implementations that analyze large public data sets ...

Advanced Analytics with Spark

Author: Sandy Ryza

Publisher: "O'Reilly Media, Inc."

ISBN: 1491972920

Page: 280

View: 962

In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this edition acts as an introduction to these techniques and other best practices in Spark programming. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—including classification, clustering, collaborative filtering, and anomaly detection—to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find the book’s patterns useful for working on your own data applications. With this book, you will: Familiarize yourself with the Spark programming model Become comfortable within the Spark ecosystem Learn general approaches in data science Examine complete implementations that analyze large public data sets Discover which machine learning tools make sense for particular problems Acquire code that can be adapted to many uses

Scientific Computing with Scala

Learn to solve scientific computing problems using Scala and its numerical computing, data processing, concurrency, and plotting libraries About This Book Parallelize your numerical computing code using convenient and safe techniques.

Scientific Computing with Scala

Author: Vytautas Jancauskas

Publisher: Packt Publishing Ltd

ISBN: 1785887475

Page: 232

View: 957

Learn to solve scientific computing problems using Scala and its numerical computing, data processing, concurrency, and plotting libraries About This Book Parallelize your numerical computing code using convenient and safe techniques. Accomplish common high-performance, scientific computing goals in Scala. Learn about data visualization and how to create high-quality scientific plots in Scala Who This Book Is For Scientists and engineers who would like to use Scala for their scientific and numerical computing needs. A basic familiarity with undergraduate level mathematics and statistics is expected but not strictly required. A basic knowledge of Scala is required as well as the ability to write simple Scala programs. However, complicated programming concepts are not used in the book. Anyone who wants to explore using Scala for writing scientific or engineering software will benefit from the book. What You Will Learn Write and read a variety of popular file formats used to store scientific data Use Breeze for linear algebra, optimization, and digital signal processing Gain insight into Saddle for data analysis Use ScalaLab for interactive computing Quickly and conveniently write safe parallel applications using Scala's parallel collections Implement and deploy concurrent programs using the Akka framework Use the Wisp plotting library to produce scientific plots Visualize multivariate data using various visualization techniques In Detail Scala is a statically typed, Java Virtual Machine (JVM)-based language with strong support for functional programming. There exist libraries for Scala that cover a range of common scientific computing tasks – from linear algebra and numerical algorithms to convenient and safe parallelization to powerful plotting facilities. Learning to use these to perform common scientific tasks will allow you to write programs that are both fast and easy to write and maintain. We will start by discussing the advantages of using Scala over other scientific computing platforms. You will discover Scala packages that provide the functionality you have come to expect when writing scientific software. We will explore using Scala's Breeze library for linear algebra, optimization, and signal processing. We will then proceed to the Saddle library for data analysis. If you have experience in R or with Python's popular pandas library you will learn how to translate those skills to Saddle. If you are new to data analysis, you will learn basic concepts of Saddle as well. Well will explore the numerical computing environment called ScalaLab. It comes bundled with a lot of scientific software readily available. We will use it for interactive computing, data analysis, and visualization. In the following chapters, we will explore using Scala's powerful parallel collections for safe and convenient parallel programming. Topics such as the Akka concurrency framework will be covered. Finally, you will learn about multivariate data visualization and how to produce professional-looking plots in Scala easily. After reading the book, you should have more than enough information on how to start using Scala as your scientific computing platform Style and approach Examples are provided on how to use Scala to do basic numerical and scientific computing tasks. All the concepts are illustrated with more involved examples in each chapter. The goal of the book is to allow you to translate existing experience in scientific computing to Scala.

Apache Spark 2 Data Processing and Real Time Analytics

This Learning Path includes content from the following Packt products: Mastering Apache Spark 2.x by Romeo Kienzler Scala and Spark for Big Data Analytics by Md. Rezaul Karim, Sridhar Alla Apache Spark 2.x Machine Learning Cookbook by ...

Apache Spark 2  Data Processing and Real Time Analytics

Author: Romeo Kienzler

Publisher: Packt Publishing Ltd

ISBN: 1789959918

Page: 616

View: 834

Build efficient data flow and machine learning programs with this flexible, multi-functional open-source cluster-computing framework Key Features Master the art of real-time big data processing and machine learning Explore a wide range of use-cases to analyze large data Discover ways to optimize your work by using many features of Spark 2.x and Scala Book Description Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your own data flow and machine learning programs on this platform. You will work with the different modules in Apache Spark, such as interactive querying with Spark SQL, using DataFrames and datasets, implementing streaming analytics with Spark Streaming, and applying machine learning and deep learning techniques on Spark using MLlib and various external tools. By the end of this elaborately designed Learning Path, you will have all the knowledge you need to master Apache Spark, and build your own big data processing and analytics pipeline quickly and without any hassle. This Learning Path includes content from the following Packt products: Mastering Apache Spark 2.x by Romeo Kienzler Scala and Spark for Big Data Analytics by Md. Rezaul Karim, Sridhar Alla Apache Spark 2.x Machine Learning Cookbook by Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen MeiCookbook What you will learn Get to grips with all the features of Apache Spark 2.x Perform highly optimized real-time big data processing Use ML and DL techniques with Spark MLlib and third-party tools Analyze structured and unstructured data using SparkSQL and GraphX Understand tuning, debugging, and monitoring of big data applications Build scalable and fault-tolerant streaming applications Develop scalable recommendation engines Who this book is for If you are an intermediate-level Spark developer looking to master the advanced capabilities and use-cases of Apache Spark 2.x, this Learning Path is ideal for you. Big data professionals who want to learn how to integrate and use the features of Apache Spark and build a strong big data pipeline will also find this Learning Path useful. To grasp the concepts explained in this Learning Path, you must know the fundamentals of Apache Spark and Scala.

Learning Scala Programming

Finally, you will learn the interoperability of Scala and Java. After reading this book, you'll be well versed with this language and its features, and you will be able to write scalable, concurrent, and reactive programs in Scala.

Learning Scala Programming

Author: Vikash Sharma

Publisher: Packt Publishing Ltd

ISBN: 1788391616

Page: 426

View: 591

Learn how to write scalable and concurrent programs in Scala, a language that grows with you. Key Features Get a grip on the functional features of the Scala programming language Understand and develop optimal applications using object-oriented and functional Scala constructs Learn reactive principles with Scala and work with the Akka framework Book Description Scala is a general-purpose programming language that supports both functional and object-oriented programming paradigms. Due to its concise design and versatility, Scala's applications have been extended to a wide variety of fields such as data science and cluster computing. You will learn to write highly scalable, concurrent, and testable programs to meet everyday software requirements. We will begin by understanding the language basics, syntax, core data types, literals, variables, and more. From here you will be introduced to data structures with Scala and you will learn to work with higher-order functions. Scala's powerful collections framework will help you get the best out of immutable data structures and utilize them effectively. You will then be introduced to concepts such as pattern matching, case classes, and functional programming features. From here, you will learn to work with Scala's object-oriented features. Going forward, you will learn about asynchronous and reactive programming with Scala, where you will be introduced to the Akka framework. Finally, you will learn the interoperability of Scala and Java. After reading this book, you'll be well versed with this language and its features, and you will be able to write scalable, concurrent, and reactive programs in Scala. What you will learn Get to know the reasons for choosing Scala: its use and the advantages it provides over other languages Bring together functional and object-oriented programming constructs to make a manageable application Master basic to advanced Scala constructs Test your applications using advanced testing methodologies such as TDD Select preferred language constructs from the wide variety of constructs provided by Scala Make the transition from the object-oriented paradigm to the functional programming paradigm Write clean, concise, and powerful code with a functional mindset Create concurrent, scalable, and reactive applications utilizing the advantages of Scala Who this book is for This book is for programmers who choose to get a grip over Scala to write concurrent, scalable, and reactive programs. No prior experience with any programming language is required to learn the concepts explained in this book. Knowledge of any programming language would help the reader understanding concepts faster though.

Apache Spark for Data Science Cookbook

Over insightful 90 recipes to get lightning-fast analytics with Apache Spark About This Book Use Apache Spark for data processing with these hands-on recipes Implement end-to-end, large-scale data analysis better than ever before Work with ...

Apache Spark for Data Science Cookbook

Author: Padma Priya Chitturi

Publisher: Packt Publishing Ltd

ISBN: 1785288806

Page: 392

View: 514

Over insightful 90 recipes to get lightning-fast analytics with Apache Spark About This Book Use Apache Spark for data processing with these hands-on recipes Implement end-to-end, large-scale data analysis better than ever before Work with powerful libraries such as MLLib, SciPy, NumPy, and Pandas to gain insights from your data Who This Book Is For This book is for novice and intermediate level data science professionals and data analysts who want to solve data science problems with a distributed computing framework. Basic experience with data science implementation tasks is expected. Data science professionals looking to skill up and gain an edge in the field will find this book helpful. What You Will Learn Explore the topics of data mining, text mining, Natural Language Processing, information retrieval, and machine learning. Solve real-world analytical problems with large data sets. Address data science challenges with analytical tools on a distributed system like Spark (apt for iterative algorithms), which offers in-memory processing and more flexibility for data analysis at scale. Get hands-on experience with algorithms like Classification, regression, and recommendation on real datasets using Spark MLLib package. Learn about numerical and scientific computing using NumPy and SciPy on Spark. Use Predictive Model Markup Language (PMML) in Spark for statistical data mining models. In Detail Spark has emerged as the most promising big data analytics engine for data science professionals. The true power and value of Apache Spark lies in its ability to execute data science tasks with speed and accuracy. Spark's selling point is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. It lets you tackle the complexities that come with raw unstructured data sets with ease. This guide will get you comfortable and confident performing data science tasks with Spark. You will learn about implementations including distributed deep learning, numerical computing, and scalable machine learning. You will be shown effective solutions to problematic concepts in data science using Spark's data science libraries such as MLLib, Pandas, NumPy, SciPy, and more. These simple and efficient recipes will show you how to implement algorithms and optimize your work. Style and approach This book contains a comprehensive range of recipes designed to help you learn the fundamentals and tackle the difficulties of data science. This book outlines practical steps to produce powerful insights into Big Data through a recipe-based approach.