How to Tame a Data Format

Presented by George Wilson
Tuesday 1:40 p.m.–2:25 p.m. in Medium Lecture Theatre CB11.00.401
Target audience: Developer

Abstract

Many programmers, data scientists, and other science and mathematics professionals spend a lot of time cleaning and working with data stored in nested formats such as JSON, or in tabular formats such as CSV. Libraries for reading and writing these formats exist in many languages. In strongly, statically typed languages, we benefit from imposing structure on the parsed data. The messiness of real data makes this challenging, but hope is not lost!

This talk will discuss design decisions made in existing open source libraries, and highlight tools and techniques for designing practical, general, purely functional libraries for these formats while maintaining the benefits of strong static typing. We will learn about Encode and Decode type classes, zippers, and some different philosophies of error handling. We will also see how the application of these tools differs between nested and tabular data formats.

You will leave this talk with an understanding of the design of a modern functional encoding and decoding library, and should be able to jump into one of these libraries and understand how the pieces fit together. You will also have new tools in your toolbelt for when you come up against a data format and need to write your own library.
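
As a rough illustration of the Encode and Decode type classes mentioned above, here is a minimal Haskell sketch. It is not taken from the talk or from any particular library: the `Value` type, the `DecodeError` type, and the method names `encode` and `decode` are all hypothetical, chosen only to show the general shape such classes often take.

```haskell
module Main where

-- Hypothetical, deliberately tiny JSON-like value type (an assumption
-- for illustration, not the representation used by any real library).
data Value
  = VNull
  | VNumber Int
  | VString String
  | VArray [Value]
  deriving (Eq, Show)

-- Hypothetical error type carrying only a message.
newtype DecodeError = DecodeError String
  deriving (Eq, Show)

-- Turning a typed Haskell value into the untyped format.
class Encode a where
  encode :: a -> Value

-- Recovering a typed value, which may fail on messy input.
class Decode a where
  decode :: Value -> Either DecodeError a

instance Encode Int where
  encode = VNumber

instance Decode Int where
  decode (VNumber n) = Right n
  decode other = Left (DecodeError ("expected number, got " ++ show other))

-- Instances compose: a list is encodable if its elements are.
instance Encode a => Encode [a] where
  encode = VArray . map encode

-- 'traverse' in Either stops at the first element that fails to decode.
instance Decode a => Decode [a] where
  decode (VArray vs) = traverse decode vs
  decode other = Left (DecodeError ("expected array, got " ++ show other))

main :: IO ()
main = do
  -- Round trip through the untyped representation: a Right value.
  print (decode (encode [1, 2, 3 :: Int]) :: Either DecodeError [Int])
  -- Messy input: the string element makes the whole decode a Left.
  print (decode (VArray [VNumber 1, VString "oops"]) :: Either DecodeError [Int])
```

Decoding into `Either` as done here reports only the first failure. That is one possible error-handling philosophy; a design that accumulates every error would need a different result type.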

Presented by

George Wilson

George Wilson is a very enthusiastic functional programmer at the Queensland Functional Programming Lab, under Data61/CSIRO, located in Brisbane, Australia. He thoroughly enjoys writing Haskell all day. George co-organises the Brisbane Functional Programming Group and enjoys teaching and learning functional programming and related areas of mathematics.