HPC Monthly Workshop – September 2, 2014 – Big Data

XSEDE, along with the Pittsburgh Supercomputing Center, is pleased to announce a one-day Big Data workshop, to be held September 2, 2014.

This workshop will focus on topics such as Hadoop and SPARQL.


Register for the on-site workshop at PSC here: https://portal.xsede.org/course-calendar/-/training-user/class/296/session/472


11:00AM – 1:00PM Eastern Time

Big Data Programming with Hadoop and Spark

This session will give an overview of programming big data applications focusing on Hadoop and Spark.

  1. Hadoop System Overview – This section will cover the basics of the Hadoop environment. We will discuss the MapReduce daemons, the scheduling and monitoring environment, and interacting with the distributed file system (HDFS).
  2. Hadoop Jobs – We will write a simple Java MapReduce program and run through the process of compiling, packaging, submitting, monitoring, and collecting the output of a Hadoop job. We will also briefly discuss other applications that run on the Hadoop platform, such as HBase and Hadoop Streaming.
  3. Spark – We will discuss the Spark platform and its concept of Resilient Distributed Datasets. We will cover the relationship between Spark and Hadoop, and we will write and submit an example job. We will also discuss the Spark Machine Learning API.

2:00PM – 5:00PM Eastern Time

Urika Training

  • Learn the graph analytic approach to data analysis, including some real-world examples.
  • Gain an introduction to the RDF data format and the SPARQL query language, with hands-on practice.
  • Learn how to interact with the Sherlock Urika system.
