Overview
This 3-day course teaches participants to use Microsoft R Server to create and run analyses on large datasets, and shows how to use it in big data environments such as a Hadoop or Spark cluster or a SQL Server database.
Prerequisites
In addition to their professional experience, participants who attend this course should have:
- Programming experience using R, and familiarity with common R packages
- Knowledge of common statistical methods and data analysis best practices
- Basic knowledge of the Microsoft Windows operating system and its core functionality
- Working knowledge of relational databases
Who Should Attend?
This course is recommended for people who want to analyze large datasets within a big data environment, and for developers who need to integrate R analyses into their solutions.
Course Outline
- What is Microsoft R Server
- Using Microsoft R Client
- The ScaleR functions
Lab: Exploring Microsoft R Server and Microsoft R Client
- Understanding ScaleR data sources
- Reading data into an XDF object
- Summarizing data in an XDF object
Lab: Exploring Big Data
- Visualizing in-memory data
- Visualizing big data
Lab: Visualizing data
- Transforming big data
- Managing datasets
Lab: Processing big data
- Using the RxLocalParallel compute context with rxExec
- Using the RevoPemaR package
Lab: Using rxExec and RevoPemaR to parallelize operations
- Clustering big data
- Generating regression models and making predictions
Lab: Creating a linear regression model
- Creating partitioning models based on decision trees
- Testing partitioning models by making and comparing predictions
Lab: Creating and evaluating partitioning models
- Using R in SQL Server
- Using Hadoop MapReduce
- Using Hadoop Spark
Lab: Processing big data in SQL Server and Hadoop