Books / non-fiction

Learning big data with Amazon Elastic MapReduce : easily learn, build, and execute real-world big data solutions using Hadoop and AWS EMR


Description


Summary: This book is aimed at developers and system administrators who want to learn about Big Data analysis using Amazon Elastic MapReduce. Basic Java programming knowledge is required. You should be comfortable with using command-line tools. Prior knowledge of AWS, API, and CLI tools is not assumed. Also, no exposure to Hadoop and MapReduce is expected.

Contents


Cover; Copyright; Credits; About the Authors; Acknowledgments; About the Reviewers; www.PacktPub.com; Table of Contents; Preface

Chapter 1: Amazon Web Services; What is Amazon Web Services?; Structure and Design; Regions; Availability Zones; Services provided by AWS; Compute; Amazon EC2; Auto Scaling; Elastic Load Balancing; Amazon Workspaces; Storage; Amazon S3; Amazon EBS; Amazon Glacier; AWS Storage Gateway; AWS Import/Export; Databases; Amazon RDS; Amazon DynamoDB; Amazon Redshift; Amazon ElastiCache; Networking and CDN; Amazon VPC; Amazon Route 53; Amazon CloudFront; AWS Direct Connect; Analytics; Amazon EMR; Amazon Kinesis; AWS Data Pipeline; Application services; Amazon CloudSearch (Beta); Amazon SQS; Amazon SNS; Amazon SES; Amazon AppStream; Amazon Elastic Transcoder; Amazon SWF; Deployment and Management; AWS Identity and Access Management; Amazon CloudWatch; AWS Elastic Beanstalk; AWS CloudFormation; AWS OpsWorks; AWS CloudHSM; AWS CloudTrail; AWS Pricing; Creating an account on AWS; Step 1 - Creating an Amazon.com account; Step 2 - Providing a payment method; Step 3 - Identity verification by telephone; Step 4 - Selecting the AWS support plan; Launching the AWS management console; Getting started with Amazon EC2; How to start a machine on AWS?; Step 1 - Choosing an Amazon Machine Image; Step 2 - Choosing an instance type; Step 3 - Configuring instance details; Step 4 - Adding storage; Step 5 - Tagging your instance; Step 6 - Configuring a security group; Communicating with the launched instance; EC2 instance types; General purpose; Memory optimized; Compute optimized; Getting started with Amazon S3; Creating a S3 bucket; Bucket naming; S3cmd; Summary

Chapter 2: MapReduce; The map function; The reduce function; Divide and conquer; What is MapReduce?; The map reduce function models; The map function model; The reduce function model; Data life cycle in the MapReduce framework; Creation of input data splits; Record reader; Mapper; Combiner; Partitioner; Shuffle and sort; Reducer; Real-world examples and use cases of MapReduce; Social networks; Media and entertainment; E-commerce and websites; Fraud detection and financial analytics; Search engines and ad networks; ETL and data analytics; Software distributions built on the MapReduce framework; Apache Hadoop; MapR; Cloudera distribution; Summary

Chapter 3: Apache Hadoop; What is Apache Hadoop?; Hadoop modules; Hadoop Distributed File System; Major architectural goals of HDFS; Block replication and rack awareness; The HDFS architecture; NameNode; DataNode; Apache Hadoop MapReduce; Hadoop MapReduce 1.x; JobTracker; TaskTracker; Hadoop MapReduce 2.0; Hadoop YARN; Apache Hadoop as a platform; Apache Pig; Apache Hive; Summary

Chapter 4: Amazon EMR - Hadoop on Amazon Web Services; What is AWS EMR?; Features of EMR; Accessing Amazon EMR features; Programming on AWS EMR; The EMR architecture; Types of nodes; EMR Job Flow and Steps; Job Steps; An EMR cluster; Hadoop filesystem on EMR - S3 and HDFS

