ShareSpark

 

Author Topic: Developing Spark Applications with Python


Offline thewall81317

Developing Spark Applications with Python
« on: December 27, 2021, 09:08:57 AM »


Name      : Developing Spark Applications with Python
ISBN      : 1676414150
Author    : Xavier Morera, Nereo Campos
Release   : 2012
File Type : PDF
Language  : English

If you are going to work with Big Data or Machine Learning, you need to learn Apache Spark.

If you need to learn Spark, you should get this book.

About the Book: Ever since the dawn of civilization, humans have needed to organize data.

Accounting has existed for thousands of years. It was initially used to account for crops and herds,
but was later adopted for many other uses. Simple analog methods came first,
and at some point they evolved into mechanical devices.

Fast-forward a few years, and we get to the digital era, where things like databases and
spreadsheets started to be used to manage ever-growing amounts of data.

How much data? A lot.

More than a human could manage mentally or with analog methods, and it's still growing.

Paraphrasing a smart man, developing applications that worked with data used to go
something like this: you took a group of developers, put them in a room, fed them a lot of pizza,
wrote a big check for the largest database you could buy, and wrote another for the largest metal
box on the market.

Eventually, you got an application capable of handling large amounts of data for your enterprise.
But as expected, things change; they always do, don't they?
We reached an era of information explosion, in large part thanks to the internet.
Data started to be created at an unprecedented rate; so much so that some of these data sets
cannot be managed and processed using traditional methods.

In fact, we can say that the internet is partly responsible for taking us into the Big Data era.
Hadoop was created at Yahoo to help crawl the internet, something that could not be done with
traditional methods. The Yahoo engineers who created Hadoop were inspired by two papers released
by Google, on the Google File System and MapReduce, which explained how Google processed large amounts of data in parallel.

But Big Data was more than just Hadoop. Soon enough, Hadoop, which initially referred to
the framework used for distributed processing of large amounts of data (MapReduce),
became an umbrella term for an ecosystem of tools and platforms
capable of massively parallel data processing. This included Pig, Hive, Impala, and many more.

But sometime around 2009, Matei Zaharia started a research project at UC Berkeley's AMPLab.

At first, according to legend, the original project was a cluster management framework, known as Mesos.
Once Mesos was born, the team wanted to see how easy it would be to build a framework from scratch on top of Mesos,
and that is how Spark was born.

Spark can help you process large amounts of data, both in the Data Engineering world
and in the Machine Learning one.
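
Since the book is about developing Spark applications with Python, here is a minimal PySpark sketch of what such an application looks like. This is not an excerpt from the book: it assumes the pyspark package is installed, and the file name events.csv and its category column are invented purely for illustration.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start (or reuse) a Spark session running locally on all cores.
spark = (SparkSession.builder
         .appName("SparkBookExample")
         .master("local[*]")
         .getOrCreate())

# Read a CSV file into a DataFrame. "events.csv" and its "category"
# column are hypothetical names used only for this sketch.
df = spark.read.csv("events.csv", header=True, inferSchema=True)

# A typical Data Engineering step: count rows per category.
counts = df.groupBy("category").agg(F.count("*").alias("n"))
counts.show()

spark.stop()

Run it with plain python or spark-submit; the same DataFrame API scales from a laptop to a cluster.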

Welcome to the Spark era!

Table of Contents:
1. The Spark Era
2. Understanding Apache Spark
3. Getting Technical with Spark
4. Spark's RDDs
5. Going Deeper into Spark Core
6. Data Frames and Spark SQL
7. Spark SQL
8. Understanding Typed API: DataSet
9. Spark Streaming
10. Exploring NOAA's Datasets
11. Final Words
12. About the Authors






All Credit Goes To Original Uploader









      



      
      
      
 
The following users thanked this post: forumaks

