Journal
E-book, 1st ed., 2018
Cover -- Title Page -- Copyright and Credits -- Packt Upsell -- Contributors -- Table of Contents -- Preface
Chapter 1: Enterprise Data Architecture Principles -- Data architecture principles -- Volume -- Velocity -- Variety -- Veracity -- The importance of metadata -- Data governance -- Fundamentals of data governance -- Data security -- Application security -- Input data -- Big data security -- RDBMS security -- BI security -- Physical security -- Data encryption -- Secure key management -- Data as a Service -- Evolution data architecture with Hadoop -- Hierarchical database architecture -- Network database architecture -- Relational database architecture -- Employees -- Devices -- Department -- Department and employee mapping table -- Hadoop data architecture -- Data layer -- Data management layer -- Job execution layer -- Summary
Chapter 2: Hadoop Life Cycle Management -- Data wrangling -- Data acquisition -- Data structure analysis -- Information extraction -- Unwanted data removal -- Data transformation -- Data standardization -- Data masking -- Substitution -- Static -- Dynamic -- Encryption -- Hashing -- Hiding -- Erasing -- Truncation -- Variance -- Shuffling -- Data security -- What is Apache Ranger? -- Apache Ranger installation using Ambari -- Ambari admin UI -- Add service -- Service placement -- Service client placement -- Database creation on master -- Ranger database configuration -- Configuration changes -- Configuration review -- Deployment progress -- Application restart -- Apache Ranger user guide -- Login to UI -- Access manager -- Service details -- Policy definition and auditing for HDFS -- Summary
Chapter 3: Hadoop Design Consideration -- Understanding data structure principles -- Installing Hadoop cluster -- Configuring Hadoop on NameNode -- Format NameNode -- Start all services -- Exploring HDFS architecture -- Defining NameNode -- Secondary NameNode -- NameNode safe mode -- DataNode -- Data replication -- Rack awareness -- HDFS WebUI -- Introducing YARN -- YARN architecture -- Resource manager -- Node manager -- Configuration of YARN -- Configuring HDFS high availability -- During Hadoop 1.x -- During Hadoop 2.x and onwards -- HDFS HA cluster using NFS -- Important architecture points -- Configuration of HA NameNodes with shared storage -- HDFS HA cluster using the quorum journal manager -- Important architecture points -- Configuration of HA NameNodes with QJM -- Automatic failover -- Important architecture points -- Configuring automatic failover -- Hadoop cluster composition -- Typical Hadoop cluster -- Best practices Hadoop deployment -- Hadoop file formats -- Text/CSV file -- JSON -- Sequence file -- Avro -- Parquet -- ORC -- Which file format is better? -- Summary
Chapter 4: Data Movement Techniques -- Batch processing versus real-time processing -- Batch processing -- Real-time processing -- Apache Sqoop -- Sqoop Import -- Import into HDFS -- Import a MySQL table into an HBase table -- Sqoop export -- Flume -- Apache Flume architecture -- Data flow using Flume -- Flume complex data flow architecture -- Flume setup -- Log aggregation use case -- Apache NiFi -- Main concepts of Apache NiFi -- Apache NiFi architecture -- Key features -- Real-time log capture dataflow -- Kafka Connect -- Kafka Connect - a brief history -- Why Kafka Connect? -- Kafka Connect features -- Kafka Connect architecture -- Kafka Connect workers modes -- Standalone mode -- Distributed mode -- Kafka Connect cluster distributed architecture -- Example 1 -- Example 2 -- Summary
Chapter 5: Data Modeling in Hadoop -- Apache Hive -- Apache Hive and RDBMS -- Supported datatypes -- How Hive works -- Hive architecture -- Hive data model management -- Hive tables -- Managed tables -- External tables -- Hive table partition -- Hive static partitions and dynamic partitions -- Hive partition bucketing -- How Hive bucketing works -- Creating buckets in a non-partitioned table -- Creating buckets in a partitioned table -- Hive views -- Syntax of a view -- Hive indexes -- Compact index -- Bitmap index -- JSON documents using Hive -- Example 1 - Accessing simple JSON documents with Hive (Hive 0.14 and later versions) -- Example 2 - Accessing nested JSON documents with Hive (Hive 0.14 and later versions) -- Example 3 - Schema evolution with Hive and Avro (Hive 0.14 and later versions) -- Apache HBase -- Differences between HDFS and HBase -- Differences between Hive and HBase -- Key features of HBase -- HBase data model -- Difference between RDBMS table and column-oriented data store -- HBase architecture -- HBase architecture in a nutshell -- HBase rowkey design -- Example 4 - loading data from MySQL table to HBase table -- Example 5 - incrementally loading data from MySQL table to HBase table -- Example 6 - Load the MySQL customer changed data into the HBase table -- Example 7 - Hive HBase integration -- Summary
Chapter 6: Designing Real-Time Streaming Data Pipelines -- Real-time streaming concepts -- Data stream -- Batch processing versus real-time data processing -- Complex event processing -- Continuous availability -- Low latency -- Scalable processing frameworks -- Horizontal scalability -- Storage -- Real-time streaming components -- Message queue -- So what is Kafka? -- Kafka features -- Kafka architecture -- Kafka architecture components -- Kafka Connect deep dive -- Kafka Connect architecture -- Kafka Connect workers standalone versus distributed mode -- Install Kafka -- Create topics -- Generate messages to verify the producer and consumer -- Kafka Connect using file Source and Sink -- Kafka Connect using JDBC and file Sink Connectors -- Apache Storm -- Features of Apache Storm -- Storm topology -- Storm topology components -- Installing Storm on a single node cluster -- Developing a real-time streaming pipeline with Storm -- Streaming a pipeline from Kafka to Storm to MySQL -- Streaming a pipeline with Kafka to Storm to HDFS -- Other popular real-time data streaming frameworks -- Kafka Streams API -- Spark Streaming -- Apache Flink -- Apache Flink versus Spark -- Apache Spark versus Storm -- Summary
Chapter 7: Large-Scale Data Processing Frameworks -- MapReduce -- Hadoop MapReduce -- Streaming MapReduce -- Java MapReduce -- Summary -- Apache Spark 2 -- Installing Spark using Ambari -- Service selection in Ambari Admin -- Add Service Wizard -- Server placement -- Clients and Slaves selection -- Service customization -- Software deployment -- Spark installation progress -- Service restarts and cleanup -- Apache Spark data structures -- RDDs, DataFrames and datasets -- Apache Spark programming -- Sample data for analysis -- Interactive data analysis with pyspark -- Standalone application with Spark -- Spark streaming application -- Spark SQL application -- Summary
Chapter 8: Building Enterprise Search Platform -- The data search concept -- The need for an enterprise search engine -- Tools for building an enterprise search engine -- Elasticsearch -- Why Elasticsearch? -- Elasticsearch components -- Index -- Document -- Mapping -- Cluster -- Type -- How to index documents in Elasticsearch? -- Elasticsearch installation -- Installation of Elasticsearch -- Create index -- Primary shard -- Replica shard -- Ingest documents into index -- Bulk Insert -- Document search -- Meta fields -- Mapping -- Static mapping -- Dynamic mapping -- Elasticsearch-supported data types -- Mapping example -- Analyzer -- Elasticsearch stack components -- Beats -- Logstash -- Kibana -- Use case -- Summary
Chapter 9: Designing Data Visualization Solutions -- Data visualization -- Bar/column chart -- Line/area chart -- Pie chart -- Radar chart -- Scatter/bubble chart -- Other charts -- Practical data visualization in Hadoop -- Apache Druid -- Druid components -- Other required components -- Apache Druid installation -- Add service -- Select Druid and Superset -- Service placement on servers -- Choose Slaves and Clients -- Service configurations -- Service installation -- Installation summary -- Sample data ingestion into Druid -- MySQL database -- Sample database -- Download the sample dataset -- Copy the data to MySQL -- Verify integrity of the tables -- Single Normalized Table -- Apache Superset -- Accessing the Superset application -- Superset dashboards -- Understanding Wikipedia edits data -- Create Superset Slices using Wikipedia data -- Unique users count -- Word Cloud for top US regions -- Sunburst chart - top 10 cities -- Top 50 channels and namespaces via directed force layout -- Top 25 countries/channels distribution -- Creating wikipedia edits dashboard from Slices -- Apache Superset with RDBMS -- Supported databases -- Understanding employee database -- Employees table -- Departments table -- Department manager table -- Department Employees Table -- Titles table -- Salaries table -- Normalized employees table -- Superset Slices for employees database -- Register MySQL database/table -- Slices and Dashboard creation -- Department salary breakup -- Salary Diversity -- Salary Change Per Role Per Year -- Dashboard creation -- Summary
Chapter 10: Developing Applications Using the Cloud -- What is the Cloud? -- Available technologies in the Cloud -- Planning the Cloud infrastructure -- Dedicated servers versus shared servers -- Dedicated servers -- Shared servers -- High availability -- Business continuity planning