Notes from the Kölner R meeting, 18 September 2015
R in a big data pipeline
Yuki Katoh had travelled all the way from Berlin to present on how to embed R with luigi into a heterogeneous workflow of different applications. This is especially useful when R needs to be integrated with Hadoop/HDFS-based technologies, such as Spark and Hive. Luigi is not unlike Make, which Kirill presented at our last meeting in June. In a configuration file Yuki specified the various workflow steps and the dependencies between the jobs.
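To make the Make-like idea concrete, here is a minimal sketch of resolving job dependencies before running each step. It uses only the Python standard library and is not luigi's API or Yuki's actual configuration; the step names and the `run_workflow` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each step maps to the set of steps it depends on.
# The step names are illustrative and not taken from the talk.
DEPENDENCIES = {
    "extract_from_hive": set(),
    "fit_model_in_r": {"extract_from_hive"},
    "score_with_spark": {"fit_model_in_r"},
    "publish_report": {"fit_model_in_r", "score_with_spark"},
}


def run_workflow(dependencies, done=frozenset()):
    """Return the steps in dependency order, skipping those already done."""
    executed = []
    for step in TopologicalSorter(dependencies).static_order():
        if step in done:
            continue  # like Make, do not redo work that is already complete
        # A real pipeline would launch the job here, e.g. an R script.
        executed.append(step)
    return executed


order = run_workflow(DEPENDENCIES)
print(order)
```

Passing `done={"extract_from_hive"}` would skip the first step, which mirrors how a workflow tool avoids recomputing results that already exist.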
The luigid server allows Yuki to monitor the various parts of the dependency graph visually. Thus, he can see the progress of his workflow in real time and quickly identify when and where a subprocess fails. As Yuki pointed out, this becomes critical in production systems, where failures need to be known and fixed quickly, unlike when one carries out an exploratory analysis in a development/research environment. See also Yuki’s blog post for further details.
Shiny + Shinyjs
Shiny is a very popular R package that allows users to develop interactive browser applications. Paul Viefers introduced us to the extension shinyjs, a package written by Dean Attali. As the name suggests, the package provides additional JavaScript functionality, but without the need to learn JavaScript, as those functions are wrapped in R.
Experience vs. Data
The last talk of the meeting had a more statistical focus with examples from insurance. I repeated my talk from the LondonR user group meeting in June. One of the challenges in insurance is that, despite having many customers, insurance companies have little claims data per customer with which to assess risk.
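When data is scarce, Bayes' formula shows how prior experience and a single piece of evidence combine. The sketch below applies it to the cab problem as it is commonly quoted from Kahneman's book (15% of cabs are Blue, the witness identifies colours correctly 80% of the time). These figures and the `posterior` helper are assumptions for illustration and are not taken from the slides.

```python
def posterior(prior, likelihood, false_alarm):
    """Bayes' formula for a binary hypothesis H given evidence E.

    prior       = P(H)
    likelihood  = P(E | H)
    false_alarm = P(E | not H)
    """
    evidence = likelihood * prior + false_alarm * (1 - prior)
    return likelihood * prior / evidence


# Assumed figures: 15% of cabs are Blue, the witness is right 80% of the time.
# H = "the cab was Blue", E = "the witness says Blue".
p_blue = posterior(prior=0.15, likelihood=0.80, false_alarm=0.20)
print(round(p_blue, 3))
```

Under these assumed numbers the probability that the cab was Blue is about 41%, well below the 80% reliability of the witness, because the low base rate of Blue cabs weighs heavily.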
I presented some Bayesian ideas for analysing risks with little data. I used the wonderful “Hit and run accident” example from Daniel Kahneman’s book Thinking, Fast and Slow to explain Bayes’ formula, introduced Bayesian belief networks for a claims analysis and discussed the challenge of predicting events when they haven’t happened yet (also in Stan). Along the way I mentioned a few ideas on communicating risk, which I learned from David Spiegelhalter earlier this year.
Next Kölner R meeting
The next meeting will be scheduled in December. Details will be published on our Meetup site. Thanks again to Revolution Analytics/Microsoft for their sponsorship.
Please get in touch if you would like to present at the next meeting.
Citation
For attribution, please cite this work as:
Markus Gesmann (Sep 22, 2015) Notes from the Kölner R meeting, 18 September 2015. Retrieved from https://magesblog.com/post/2015-09-22-notes-from-kolner-r-meeting-18/
@misc{ 2015-notes-from-the-kolner-r-meeting-18-september-2015,
author = { Markus Gesmann },
title = { Notes from the Kölner R meeting, 18 September 2015 },
url = { https://magesblog.com/post/2015-09-22-notes-from-kolner-r-meeting-18/ },
year = { 2015 },
updated = { Sep 22, 2015 }
}