Apache Spark | iDigBio

iDigBio and Data Carpentry go to Africa

Location: BIS (TDWG) 2015 Biodiversity Information Standards:
An amazing 2 weeks in Nairobi, Kenya.
By Deb Paul, with input from Libby Ellwood and Matt Collins.

Tags:

Biodiversity Information Standards (TDWG)

Nairobi

JRS Foundation

Gordon and Betty Moore Foundation

Data Carpentry

Biodiversity data mobilization

Biodiversity informatics skills

Blog

Exploring unique values in iDigBio using Apache Spark

Exploring large datasets is always challenging. Often you are left to choose among subsetting the dataset (randomly or on some facet), making slow progress while waiting for results only to find that something needs to be fixed, or optimizing code for performance before you even know whether the result will be interesting. A high-performance system capable of ad-hoc investigation has always been difficult or expensive to come by.

Whole-Dataset Analyses using Apache Spark

Poster Title: Whole-Dataset Analyses using Apache Spark
Authors: Matthew Collins, Jorrit Poelen, Alexander Thompson

Tags:

Apache Spark

Data analysis

Big Data processing