Oreilly - Hadoop Fundamentals for Data Scientists - 9781491913161
Released January 2015 | ISBN: 9781491913161


Get a practical introduction to Hadoop, the framework that made big data and large-scale analytics possible by combining distributed computing techniques with distributed storage. In this video tutorial, hosts Benjamin Bengfort and Jenny Kim discuss the core concepts behind distributed computing and big data, and then show you how to work with a Hadoop cluster and program analytical jobs. You'll also learn how to use higher-level tools such as Hive and Spark.

Hadoop is a cluster computing technology that has many moving parts, including distributed systems administration, data engineering and warehousing methodologies, software engineering for distributed computing, and large-scale analytics. With this video, you'll learn how to operationalize analytics over large datasets and rapidly deploy analytical jobs with a variety of toolsets.

Once you've completed this video, you'll understand how different parts of Hadoop combine to form an entire data pipeline managed by teams of data engineers, data programmers, data researchers, and data business people.

  • Understand the Hadoop architecture and set up a pseudo-distributed development environment
  • Learn how to develop distributed computations with MapReduce and the Hadoop Distributed File System (HDFS)
  • Work with Hadoop via the command-line interface
  • Use the Hadoop Streaming utility to execute MapReduce jobs in Python
  • Explore data warehousing, higher-order data flows, and other projects in the Hadoop ecosystem
  • Learn how to use Hive to query and analyze relational data using Hadoop
  • Use summarization, filtering, and aggregation to move Big Data towards last mile computation
  • Understand how analytical workflows including iterative machine learning, feature analysis, and data modeling work in a Big Data context

Benjamin Bengfort is a data scientist and programmer in Washington DC who prefers technology to politics but sees the value of data in every domain. Alongside his work teaching, writing, and developing large-scale analytics with a focus on statistical machine learning, he is finishing his PhD at the University of Maryland, where he studies machine learning and artificial intelligence.

Jenny Kim, a software engineer in the San Francisco Bay Area, develops, teaches, and writes about big data analytics applications and specializes in large-scale, distributed computing infrastructures and machine-learning algorithms to support recommendation systems.

  1. Overview of the Video Course 00:08:24
  2. A Distributed Computing Environment
    • The Motivation for Hadoop 00:09:23
    • A Brief History of Hadoop 00:05:34
    • Understanding the Hadoop Architecture 00:12:24
    • Setting Up A Pseudo-Distributed Environment 00:03:47
    • The Distributed File System (HDFS) 00:11:16
    • Distributed Computing with MapReduce 00:07:45
    • Word Count - the "Hello, World" of Hadoop! 00:08:02
  3. Computing with Hadoop
    • How a MapReduce Job Works 00:10:27
    • Mappers and Reducers in Detail 00:19:17
    • Working with Hadoop via the Command Line: Starting HDFS and YARN 00:07:54
    • Working with Hadoop via the Command Line: Loading Data into HDFS 00:07:05
    • Working with Hadoop via the Command Line: Running a MapReduce Job 00:07:55
    • How To Use Our GitHub Goodies 00:00:38
    • Working in Python with Hadoop Streaming 00:21:55
    • Common MapReduce Tasks 00:13:54
    • Spark on Hadoop 2 00:18:26
    • Creating a Spark Application with Python 00:22:31
  4. The Hadoop Ecosystem
    • The Hadoop Ecosystem 00:03:01
    • Data Warehousing with Hadoop 00:17:15
    • Higher Order Data Flows 00:11:21
    • Other Notable Projects 00:08:31
  5. Working with Data on Hive
    • Introduction to Hive 00:04:29
    • Interacting with Data via the Hive Console 00:10:40
    • Creating Databases, Tables, and Schemas for Hive 00:08:20
    • Loading Data into Hive from HDFS 00:09:26
    • Querying Data and Performing Aggregations With Hive 00:12:07
  6. Towards Last Mile Computing
    • Decomposing Large Data Sets to a Computational Space 00:07:56
    • Linear Regressions 00:20:11
    • Summarizing Documents with TF-IDF 00:14:11
    • Classification of Text 00:15:45
    • Parallel Canopy Clustering 00:11:03
    • Computing Recommendations via Linear Log-Likelihoods 00:14:51
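
To give a flavor of the word count exercise listed in the outline, here is a minimal sketch in Python of the map and reduce logic that such a job typically uses. It is not taken from the course's example code: the function names `mapper`, `reducer`, and `word_count` are hypothetical, and the shuffle step that a Hadoop cluster performs between the two phases is simulated locally with a plain sort.

```python
from itertools import groupby


def mapper(lines):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(pairs):
    """Sum the counts per word; expects pairs sorted by key, as after a shuffle."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Simulate map -> shuffle/sort -> reduce locally, without a cluster."""
    shuffled = sorted(mapper(lines))  # stand-in for the framework's shuffle and sort
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = ["Hello Hadoop", "hello world"]
    # Print tab-separated key/value pairs, one per line
    for word, count in sorted(word_count(sample).items()):
        print(f"{word}\t{count}")
```

In a real streaming job the mapper and reducer would usually be separate scripts reading from standard input and writing tab-separated pairs to standard output; keeping them as generator functions here makes the logic easy to try on a laptop first.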