R

R in Insurance 2017 Programme online

The programme for the 2017 R in Insurance conference in Paris has been published. Talks will discuss new ideas and research on applications in life and general insurance, from network analysis, reserving and pricing to catastrophe modelling. The conference will be followed by a dinner at the Musée d’Orsay. Registration is open until 22 May.

Agenda
9:00 am - 9:10 am Welcome - Julien Pouget (Director of ENSAE)
9:10 am - 10:00 am Opening Keynote Session: Textual analysis of expert reports to increase knowledge of technological risks - Julie Seguela, Covea
10:00 am - 11:00 am Session 1 - big data
10:00 - 10:20 Network Analytics in Claims Level Predictive Modelling - Marcela Granados, Ernst & Young

R in Insurance 2017

The fifth conference on R in Insurance will be held on 8 June 2017 at ENSAE. ENSAE is the Paris Graduate School for Economics, Statistics and Finance. The intended audience of the conference includes both academics and practitioners who are active or interested in the applications of R in Insurance. This one-day conference will focus again on applications in insurance and actuarial science that use R, the lingua franca for statistical computation.

Notes from the Kölner R meeting, 14 October 2016

Last Friday the Cologne R user group came together for two talks and a quiz at Eye/o, the company behind Adblock Plus, in Köln-Ehrenfeld. Eye/o were a great host, offering nibbles and drinks to warm up the event and pizza at the end. The first talk was given by Jiddu Alexander, a physicist turned freelance data scientist. Jiddu gave an introduction to the tidyverse. He presented the concept of tidy data, and how the tidyverse bundle can be used to manage multiple models.

Next Kölner R User Meeting: Friday 14 October

The 19th Cologne R user group meeting is scheduled for this Friday, 14 October 2016. We have three talks, followed by networking drinks:
Introduction to the tidyverse tools - Jiddu Alexander
Performance profiling and improvement in R - Nils Glück
Batch processing of R-Scripts with Excel - Klaus Jacobi
Venue: Eyeo GmbH, Lichtstraße 25, 50825 Köln
For further details visit our KölnRUG Meetup site. Notes from past meetings are available here.

Notes from 4th Bayesian Mixer Meetup

Last Tuesday we got together for the 4th Bayesian Mixer Meetup. Product Madness kindly hosted us at their offices in Euston Square. About 50 Bayesians came along, the biggest turnout so far, including developers of PyMC3 (Peadar Coyle) and Stan (Michael Betancourt). The agenda had two feature talks by Dominic Steinitz and Volodymyr Kazantsev and a lightning talk by Jon Sedar.
Dominic Steinitz: Hamiltonian and Sequential MC samplers to model ecosystems
Dominic shared with us his experience of using Hamiltonian and Sequential Monte Carlo samplers to model ecosystems.

Fitting a distribution in Stan from scratch

Last week the French National Institute of Health and Medical Research (Inserm) organised, together with the Stan Group, a training programme on Bayesian Inference with Stan for Pharmacometrics in Paris. Daniel Lee and Michael Betancourt, who ran the course over three days, are not only members of Stan’s development team, but also excellent teachers. Both were supported by Eric Novik, who gave an Introduction to Stan at the Paris Dataiku User Group last week as well.

googleVis 0.6.1 on CRAN

We released googleVis version 0.6.1 on CRAN last week. The update fixes issues with setting certain options, following the switch from RJSONIO to jsonlite. New to googleVis? The package provides an interface between R and the Google Charts Tools, allowing you to create interactive web charts from R without uploading your data to Google. The charts are displayed by default via the R internal help browser.

Notes from the 4th R in Insurance Conference

The 4th R in Insurance conference took place at Cass Business School London on 11 July 2016. This one-day conference focused once more on the wide range of applications of R in insurance, actuarial science and beyond. The conference programme covered topics including reserving, pricing, loss modelling, the use of R in a production environment and much more. The audience of the conference included both practitioners (c.80%) and academics (c.20%) who are active or interested in the applications of R in Insurance.

Notes from the Kölner R meeting, 9 July 2016

Last Thursday the Cologne R user group came together again. This time, our two speakers arrived from Bavaria to talk about Spark and R Server.
Introduction to Apache Spark
Dubravko Dulic gave an introduction to Apache Spark and why Spark might be of interest to data scientists using R. Spark is designed for cluster computing, i.e. to distribute jobs across several computers. Not all tasks in R can be split easily across several nodes in a cluster, but if you use functions like by() in R, then it is most likely doable.

Notes from 3rd and 3.5th Bayesian Mixer Meetup

Two Bayesian Mixer meet-ups in a row. Can it get any better? Our third ‘regular’ meeting took place at Cass Business School on 24 June. Big thanks to Pietro and Andreas, who supported us from Cass. The next day, Jon Sedar of Applied AI managed to arrange a special summer PyMC3 event.
3rd Bayesian Mixer meet-up
First up was Luis Usier, who talked about cross-validation. Luis is a former student of Andrew Gelman, so, of course, his talk touched on Stan and the ‘loo’ (leave one out) package in R.

Early bird registration for R in Insurance closes 30 May

Hurry! The early bird registration offer for the 4th R in Insurance conference, 11 July 2016, at Cass Business School closes 30 May. This one-day conference will focus once more on applications in insurance and actuarial science that use R, the lingua franca for statistical computation. Topics covered include reserving, pricing, loss modelling, the use of R in a production environment, and more. We have a fantastic programme with international speakers and a conference dinner at Ironmongers’ Hall.

R in Insurance 2016 Programme

We are delighted to announce that the programme for the 4th R in Insurance conference at Cass Business School in London, 11 July 2016, has been finalised. Register by the end of May to get the early bird booking fee. The organisers gratefully acknowledge the sponsorship of Verisk, Mirai Solutions, Applied AI, RStudio, CYBAEA and Oasis, without whom the event wouldn’t be possible.
Agenda
[09:00 - 10:00] Keynote 1:

New R package to access World Bank data

Staying on top of new CRAN packages is quite a challenge nowadays. However, thanks to Dirk’s CRANberries service I occasionally spot a new gem, such as wbstats, which appeared on CRAN last week. Similar to the WDI package, wbstats offers an interface to the World Bank database. With wbstats I can search the World Bank data and request data for several indicators. Unlike WDI, it returns the data in a ‘long’ table with one column for all values and a separate column for the indicators.
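To illustrate the difference between the two layouts without calling the World Bank API, here is a minimal base-R sketch. It uses stats::reshape on a small hypothetical ‘wide’ table; the country codes, indicator names and values are placeholders, and neither wbstats nor WDI is used.

```r
# Hypothetical 'wide' table: one column per indicator (placeholder values)
wide <- data.frame(
  country = c("A", "B"),
  date    = c(2015L, 2015L),
  ind1    = c(1, 2),
  ind2    = c(3, 4)
)

# Convert to a 'long' table: a single 'value' column for all indicators,
# plus an 'indicator' column saying which indicator each value belongs to
long <- reshape(
  wide,
  direction = "long",
  varying   = c("ind1", "ind2"),
  v.names   = "value",
  timevar   = "indicator",
  times     = c("ind1", "ind2"),
  idvar     = c("country", "date")
)
rownames(long) <- NULL
long
```

The long layout is handy when I want to add further indicators without changing the table structure.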

Notes from 2nd Bayesian Mixer Meetup

Last Friday the 2nd Bayesian Mixer Meetup (@BayesianMixer) took place at Cass Business School, thanks to Pietro Millossovich and Andreas Tsanakas, who helped to organise the event. First up was Davide De March talking about the challenges in biochemistry experimentation, which are often characterised by complex and emerging relations among components. Very little prior knowledge about how complex molecules bind made this a fertile field for a probabilistic graphical model.

R in Insurance: Abstract submission closes end of March

Hurry! The abstract submission deadline for the 4th R in Insurance conference in London, 11 July 2016 is approaching. You have until the 28th of March to submit a one-page abstract for consideration. Both academic and practitioner proposals related to R are encouraged. Please email your abstract of no more than 300 words (in text or pdf format) to rinsuranceconference@gmail.com. Invited talks will be given by: Mario V. Wüthrich, RiskLab, Department of Mathematics, ETH Zurich.

Notes from the Kölner R meeting, 26 February 2016

Last Friday the Cologne R user group came together for the 17th time. This time, we were in for a special treat, with two talks by psychologists! But there was nothing to fear, we were in safe hands, and for the first time, we met at the new Microsoft office in Cologne. First up was Meik Michalke from the University of Düsseldorf presenting the RKWard project.

Next Kölner R User Meeting: Friday, 26 February 2016

The 17th Cologne R user group meeting is scheduled for this Friday, 26 February 2016. We have two talks, followed by networking drinks:
Introduction to Bayesian Regression Models using Stan with the brms package - Paul-Christian Bürkner (Uni Münster)
RKWard: A Graphical User Interface and Integrated Development Environment for Statistical Analysis with R - Meik Michalke (Uni Düsseldorf)
Venue: Microsoft Deutschland, Holzmarkt 2a, 50676 Köln
For further details visit our KölnRUG Meetup site.

Bayesian Mixer on Meetup

We had our first successful Bayesian Mixer Meetup last Friday night at the Artillery Arms! When we booked the function room overlooking Bunhill Cemetery and Bayes’ grave, we expected about 15-20 people to turn up. Now, looking at the photos taken during the evening, it seems that our prior belief was pretty good.

The event started with a talk from my side about some very basic Bayesian models, which I used a while back to get my head around the concepts in an insurance context.

Using SVG graphics in blog posts

My traditional workflow for embedding R graphics into a blog post has been via PNG files that I upload online. However, when I created a ‘simple’ graphic with only basic curves and triangles for a recent post, I noticed that the PNG output didn’t look as crisp as I expected it to be. So, eventually I used an SVG (scalable vector graphic) instead. Creating an SVG file with R couldn’t be easier.
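A minimal sketch of the idea with base R’s svg() device from grDevices; the file name, size and the plotted curve and triangle are arbitrary choices for illustration, not the graphic from that post. The svg() device needs R built with cairo support, so the sketch checks capabilities() first.

```r
# Write a simple graphic of basic curves and a triangle to an SVG file
svg.file <- tempfile(fileext = ".svg")

if (capabilities("cairo")) {
  svg(svg.file, width = 5, height = 4)   # size in inches
  curve(dnorm(x), from = -3, to = 3, lwd = 2,
        main = "A 'simple' graphic")     # a basic curve
  polygon(c(-1, 0, 1), c(0.05, 0.35, 0.05),
          border = "red")                # a triangle
  invisible(dev.off())                   # close the device to write the file
}
```

Because SVG stores the shapes as vectors rather than pixels, the lines stay crisp at any zoom level in the browser.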

First Bayesian Mixer Meeting in London

There is a nice pub between Bunhill Fields and the Royal Statistical Society in London: The Artillery Arms. Clearly, the perfect place to bring people together to talk about Bayesian Statistics. Well, that’s what Jon Sedar (@jonsedar, applied.ai) and I thought. Hence, we’d like to organise a Bayesian Mixer Meetup on Friday, 12 February, 19:00. We booked the upstairs function room at the Artillery Arms and if you look outside the window, you can see Thomas Bayes’ grave.

Flowing triangles

I have admired the work of the artist Bridget Riley for a long time. She is now in her eighties, but still, it seems, very creative and productive. Some of her recent work combines simple triangles in fascinating compositions. The longer I look at them, the more patterns I recognise. Yet the actual painting can be described easily, in the sense of a specification document that reproduces the pattern precisely.

Formatting table output in R

Formatting data for output in a table can be a bit of a pain in R. The package formattable by Kun Ren and Kenton Russell provides some intuitive functions to create good-looking tables for the R console or HTML quickly. The package home page nicely demonstrates the functions with illustrative examples. There are a few points I really like: the functions accounting, currency and percent transform numbers into more human-readable output

R in Insurance: Registration and abstract submission opened

Following the successful 3rd R in Insurance conference in Amsterdam last year, we return to London this year. The registration for the 4th conference on R in Insurance on Monday 11 July 2016 at Cass Business School has opened. This one-day conference will focus again on applications in insurance and actuarial science that use R, the lingua franca for statistical computation. The intended audience of the conference includes both academics and practitioners who are active or interested in the applications of R in insurance.

Next Kölner R User Meeting: Friday, 4 December 2015

The 16th Cologne R user group meeting is scheduled for this Friday, 4 December 2015, and we have a great line-up with three talks followed by networking drinks:
Monitoring process change using Bayesian methods (Mick Cooney): A common business problem is to evaluate the effect of a change of process, and this talk will discuss a straightforward approach to this using conjugate priors.
Editing R files with DataJoy (Dietmar Janetzko)

Notes from Warsaw R meetup

I had the great pleasure of attending the Warsaw R meetup last Thursday. The organisers Olga Mierzwa and Przemyslaw Biecek had put together an event with a focus on R in Insurance (by the way, there is a conference with the same name), discussing examples of pricing and reserving in general and life insurance.
Experience vs. Data
I kicked off with some observations on the challenges in insurance pricing. Thankfully, accidents are rare events; that’s why we buy insurance.

Hierarchical Loss Reserving with Stan

I continue with the growth curve model for loss reserving from last week’s post. Today, following the ideas of James Guszcza [2] I will add a hierarchical component to the model, by treating the ultimate loss cost of an accident year as a random effect. Initially, I will use the nlme R package, just as James did in his paper, and then move on to Stan/RStan [6], which will allow me to estimate the full distribution of future claims payments.

Loss Developments via Growth Curves and Stan

Last week I posted a biological example of fitting a non-linear growth curve with Stan/RStan. Today, I want to apply a similar approach to insurance data using ideas by David Clark [1] and James Guszcza [2]. Instead of predicting the growth of dugongs (sea cows), I would like to predict the growth of cumulative insurance loss payments over time for different accident years. Loss payments of younger accident years are just like a new generation of dugongs: they will be small in size initially and grow as they get older, until the losses are fully settled.
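Clark’s approach describes the share of the ultimate loss that has emerged by development age with a growth curve such as the loglogistic or the Weibull. Here is a minimal sketch of the two standard curve forms; the ultimate loss of 5000 and the parameters omega and theta are hypothetical values chosen only for illustration.

```r
# Loglogistic growth curve: share of ultimate loss paid by development age t
loglogistic <- function(t, omega, theta) t^omega / (t^omega + theta^omega)

# Weibull growth curve as an alternative
weibull_curve <- function(t, omega, theta) 1 - exp(-(t / theta)^omega)

# Hypothetical accident year: expected cumulative paid = ultimate * G(t)
ult <- 5000
dev.age <- seq(12, 120, by = 12)   # development age in months
expected.cum <- ult * loglogistic(dev.age, omega = 1.5, theta = 24)
round(expected.cum)
```

Note that theta is the age at which the loglogistic curve reaches half of the ultimate loss, which makes it a fairly intuitive parameter to put a prior on.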

Non-linear growth curves with Stan

I suppose the go-to tool for fitting non-linear models in R is nls of the stats package. In this post I will show an alternative approach with Stan/RStan, as illustrated in the example Dugongs: “nonlinear growth curve”, which is part of Stan’s documentation. The original example itself is taken from OpenBUGS. The data describes the length and age measurements for 27 captured dugongs (sea cows). Carlin and Gelfand (1991) model the data using a nonlinear growth curve with no inflection point and an asymptote as $x_i$ tends to infinity:
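For comparison with the Stan approach, here is a minimal nls sketch. I assume the commonly used form of this growth curve, mean length = alpha - beta * lambda^age with 0 < lambda < 1. The data below is simulated from assumed ‘true’ parameter values; it is not the real dugong measurements.

```r
set.seed(42)
# Simulated ages and lengths in the spirit of the dugongs example
age <- c(1, 1.5, 2.5, 4, 5, 7, 8, 9, 10, 12, 13, 14.5,
         15.5, 16.5, 17, 22.5, 29, 31.5)
true.alpha <- 2.65; true.beta <- 0.97; true.lambda <- 0.87   # assumed values
len <- true.alpha - true.beta * true.lambda^age +
  rnorm(length(age), sd = 0.05)
dugong.sim <- data.frame(age, len)

# Least-squares fit of the growth curve; start values near the truth
fit <- nls(len ~ a - b * l^age, data = dugong.sim,
           start = list(a = 2.5, b = 1, l = 0.9))
coef(fit)
```

nls gives point estimates and asymptotic standard errors, whereas the Stan version returns the full posterior distribution of alpha, beta and lambda.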

R in Insurance 2016

Following the successful 3rd R in Insurance conference in Amsterdam this year, we will return to London next year. We will be back at Cass Business School, 11 July 2016. The event will focus again on the use of R in insurance, bringing together experts from industry and academia with a diverse background of disciplines, such as actuarial science, catastrophe modelling, finance, statistics and computer science. We are delighted to announce our keynote speakers: Dan Murphy and Mario V.

ChainLadder 0.2.2 is out with improved glmReserve function

We released version 0.2.2 of ChainLadder a few weeks ago. This version adds back the functionality to estimate the index parameter for the compound Poisson model in glmReserve using the cplm package by Wayne Zhang. OK, what does this all mean? I will run through a couple of examples and look behind the scenes of glmReserve. However, the clue is in the title: glmReserve is a function that uses a generalised linear model to estimate future claims, assuming claims follow a Tweedie distribution.
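To see why a GLM is a natural tool for reserving, here is a base-R sketch of the special case where the Tweedie power is 1, i.e. the over-dispersed Poisson model. A GLM with origin and development factors and a log link is known to reproduce the classical chain-ladder reserve in this case. This is not glmReserve itself, and the 3x3 triangle is hypothetical.

```r
# Hypothetical incremental claims triangle in long format
tri <- data.frame(
  origin = factor(c(1, 1, 1, 2, 2, 3), levels = 1:3),
  dev    = factor(c(1, 2, 3, 1, 2, 1), levels = 1:3),
  value  = c(100, 50, 10, 110, 60, 120)
)

# Over-dispersed Poisson GLM (Tweedie power 1) with log link
odp <- glm(value ~ origin + dev, family = quasipoisson(link = "log"),
           data = tri)

# Predict the missing lower triangle and sum it up
future <- data.frame(origin = factor(c(2, 3, 3), levels = 1:3),
                     dev    = factor(c(3, 2, 3), levels = 1:3))
glm.reserve <- sum(predict(odp, newdata = future, type = "response"))

# Classical chain-ladder on the cumulative triangle for comparison
inc <- matrix(NA_real_, 3, 3)
inc[cbind(as.integer(tri$origin), as.integer(tri$dev))] <- tri$value
cum <- t(apply(inc, 1, cumsum))
f <- sapply(1:2, function(j) {          # volume-weighted development factors
  rows <- !is.na(cum[, j + 1])
  sum(cum[rows, j + 1]) / sum(cum[rows, j])
})
latest   <- c(cum[1, 3], cum[2, 2], cum[3, 1])
ultimate <- c(cum[1, 3], cum[2, 2] * f[2], cum[3, 1] * f[1] * f[2])
cl.reserve <- sum(ultimate - latest)

c(glm = glm.reserve, chain.ladder = cl.reserve)
```

Both reserves agree; the GLM formulation, however, opens the door to other Tweedie powers and to standard errors of the predictions.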

Notes from the Kölner R meeting, 18 September 2015

Last Friday the Cologne R user group came together for the 15th time. Since its inception over three years ago, the group has evolved from a small gathering in a pub into an active data science community, covering wider topics than just R. Still, R is the link between the different interests. Last Friday’s agenda was a good example of this, with three talks touching on workflow management, web development and risk analysis.

Next Kölner R User Meeting: Friday, 18 September 2015

The 15th Cologne R user group meeting is scheduled for this Friday, 18 September 2015, and we have a full agenda with three talks followed by networking drinks:
R in big data pipeline with luigi (Yuki Katoh): Put your awesome R code into production. Learn how to build a solid big data pipeline around it.
shinyjs (Paul Viefers): Using JavaScript in shiny, without knowing JavaScript

Bayesian regression models using Stan in R

It seems the summer is coming to an end in London, so I shall take a final look at my ice cream data that I have been playing around with to predict sales statistics based on temperature for the last couple of weeks [1], [2], [3]. Here I will use the new brms (GitHub, CRAN) package by Paul-Christian Bürkner to derive the 95% prediction credible interval for the four models I introduced in my first post about generalised linear models.

Visualising the predictive distribution of a log-transformed linear model

Last week I presented visualisations of theoretical distributions that predict ice cream sales statistics based on linear and generalised linear models, which I introduced in an earlier post. Today I will take a closer look at the log-transformed linear model and use Stan/rstan, not only to model the sales statistics, but also to generate samples from the posterior predictive distribution. The posterior predictive distribution is what I am most interested in.
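Before turning to Stan, a quick plug-in approximation helps to show what a predictive distribution of the log-transformed model looks like. This sketch uses lm and treats the fitted coefficients and residual standard deviation as known, so unlike the Stan posterior predictive it ignores parameter uncertainty. The temperature of 35°C is an arbitrary point outside the observed range.

```r
icecream <- data.frame(
  temp  = c(11.9, 14.2, 15.2, 16.4, 17.2, 18.1, 18.5, 19.4, 22.1, 22.6, 23.4, 25.1),
  units = c(185L, 215L, 332L, 325L, 408L, 421L, 406L, 412L, 522L, 445L, 544L, 614L)
)
log.lin.mod <- lm(log(units) ~ temp, data = icecream)

mu    <- unname(predict(log.lin.mod, newdata = data.frame(temp = 35)))
sigma <- summary(log.lin.mod)$sigma   # residual sd on the log scale

# Plug-in predictive draws: log(units) ~ N(mu, sigma^2)
set.seed(1)
sims <- exp(rnorm(10000, mean = mu, sd = sigma))

pred.median <- exp(mu)                # median of the log-normal
pred.mean   <- exp(mu + sigma^2 / 2)  # mean of the log-normal
c(median = pred.median, mean = pred.mean, sim.mean = mean(sims))
```

The back-transformed point prediction exp(mu) is the median, not the mean, of the predictive distribution, which is easy to overlook.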

Visualising theoretical distributions of GLMs

Two weeks ago I discussed various linear and generalised linear models in R using ice cream sales statistics. The data showed, not surprisingly, that more ice cream was sold at higher temperatures.

icecream <- data.frame(
  temp=c(11.9, 14.2, 15.2, 16.4, 17.2, 18.1, 18.5, 19.4, 22.1, 22.6, 23.4, 25.1),
  units=c(185L, 215L, 332L, 325L, 408L, 421L, 406L, 412L, 522L, 445L, 544L, 614L)
)

I used a linear model, a log-transformed linear model, and Poisson and binomial generalised linear models to predict sales within and outside the range of data available.
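Here is a minimal sketch of how the four models could be fitted with base R’s lm and glm; it is not necessarily the exact code from the earlier post. The binomial model needs a maximum number of units that could be sold, and the market size of 800 below is an assumption for illustration.

```r
icecream <- data.frame(
  temp  = c(11.9, 14.2, 15.2, 16.4, 17.2, 18.1, 18.5, 19.4, 22.1, 22.6, 23.4, 25.1),
  units = c(185L, 215L, 332L, 325L, 408L, 421L, 406L, 412L, 522L, 445L, 544L, 614L)
)
market.size <- 800   # assumption: maximum number of units that could be sold
icecream$opportunity <- market.size - icecream$units

lin.mod     <- lm(units ~ temp, data = icecream)
log.lin.mod <- lm(log(units) ~ temp, data = icecream)
pois.mod    <- glm(units ~ temp, data = icecream,
                   family = poisson(link = "log"))
bin.mod     <- glm(cbind(units, opportunity) ~ temp, data = icecream,
                   family = binomial(link = "logit"))

# Predictions outside the observed temperature range
new.temp <- data.frame(temp = c(0, 35))
preds <- data.frame(
  temp       = new.temp$temp,
  linear     = predict(lin.mod, newdata = new.temp),
  log.linear = exp(predict(log.lin.mod, newdata = new.temp)),
  poisson    = predict(pois.mod, newdata = new.temp, type = "response"),
  binomial   = market.size * predict(bin.mod, newdata = new.temp,
                                     type = "response")
)
preds
```

At 0°C the linear model predicts negative sales, while the Poisson model can only predict positive values and the binomial model is capped by the assumed market size.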

Generalised Linear Models in R

Linear models are the bread and butter of statistics, but there is a lot more to it than taking a ruler and drawing a line through a couple of points. Some time ago Rasmus Bååth published an insightful blog article about how such models could be described from a distribution-centric point of view, instead of the classic error terms convention. I think the distribution-centric view makes generalised linear models (GLM) much easier to understand as well.

Notes from the 3rd R in Insurance Conference

The R in Insurance conference in Amsterdam was a sold-out success! Congratulations to the organising committee at the University of Amsterdam, and many thanks to our sponsors: Milliman, RStudio, CYBAEA, Deloitte, a.s.r., Triple A Risk Finance, AEGON, Delta Lloyd Amsterdam, QBE Re and APPLIED AI. This one-day conference focused once more on applications in insurance and actuarial science that use R. Topics covered included reserving, pricing, loss modelling, the use of R in a production environment and more.

ChainLadder 0.2.1 released

Over the weekend we released version 0.2.1 of the ChainLadder package for claims reserving on CRAN.
New Features
New function PaidIncurredChain by Fabio Concina, based on the 2010 Merz & Wüthrich paper Paid-incurred chain claims reserving method.
Functions plot.MackChainLadder and plot.BootChainLadder gained a new argument ‘which’, allowing users to specify which sub-plot to display. Thanks to Christophe Dutang for this suggestion.
Output of plot(MackChainLadder(MW2014, est.sigma="Mack"), which=3:6)
Changes
Updated NAMESPACE file to comply with new R CMD checks in R-3.

Adding mathematical notations to R plots

I have to admit that I find the plotmath expressions in R a little fiddly for annotating plots with mathematical notation. Apparently I am not the only one, but Stefano Meschiari actually did something about it. A few days ago his package latex2exp appeared on CRAN. The package provides the wonderful function latex2exp that translates LaTeX code into plotmath expressions. Brilliant! All I have to remember is to escape the “\” character, that is, write “\\” instead of “\”.

Notes from the Kölner R meeting, 26 June 2015

Last Friday the Cologne R user group came together for the 14th time. For the first time we met at Startplatz, a start-up incubator venue. The venue was excellent, not only did they provide us with a much larger room, but also with table-football and drinks. Many thanks to Kirill for organising all of this! We had two excellent advanced talks. Both were very informative and well presented.

Next Kölner R User Meeting: Friday, 26 June 2015

The next Cologne R user group meeting is scheduled for this Friday, 26 June 2015, and we have an exciting agenda with two talks followed by networking drinks:
Data Science at the Commandline (Kirill Pomogajko)
An Introduction to RStan and the Stan Modelling Language (Paul Viefers)
Please note: our venue has changed! We have outgrown the seminar room at the Institute of Sociology and are moving to Startplatz, a start-up incubator venue: Im Mediapark 5, 50670 Köln

How to place titles in lattice plots

I like the Economist theme in the latticeExtra package. It produces nice looking charts that mimic the design of the weekly newspaper, such as in this example:

For some time I wondered how I could put the title of my lattice plots into the top left corner as well (by default titles are centred). Reviewing the code of the theEconomist.theme function by Felix Andrews reveals the trick. It is the setting of par.
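Here is a minimal sketch of a left-aligned title with the lattice package, using settings modelled on the kind of theme discussed above. It assumes that lattice honours the just and x components of par.main.text when drawing the title; the data set and title text are arbitrary.

```r
library(lattice)

# Main title pushed to the top-left corner via par.main.text
# (assumption: lattice passes 'just' and 'x' on when drawing the title)
p <- xyplot(mpg ~ wt, data = mtcars,
            main = "Fuel economy vs weight",
            par.settings = list(
              par.main.text = list(font = 2, just = "left",
                                   x = grid::unit(5, "mm"))
            ))

# Render to a null PDF device, e.g. to check that the plot draws
pdf(NULL)
print(p)
invisible(dev.off())
```

Putting the setting into par.settings, rather than into each call’s main argument, means it can live in a reusable theme list.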

Using system and web fonts in R plots

The forthcoming R Journal has an interesting article on the showtext package by Yixuan Qiu. The package allows me to use system and web fonts directly in R plots, reminding me a little of the approach taken by XeLaTeX. But “unlike other methods to embed fonts into graphics, showtext converts text into raster images or polygons, and then adds them to the plot canvas. This method produces platform-independent image files that do not rely on the fonts that create them.”

Back from R/Finance in Chicago

I had a great time at the R/Finance conference in Chicago last Friday/Saturday. Some brief takeaways for me were: From Emanuel Derman’s talk: It is important to distinguish between theories and models. Theories live in an abstract world and for a given set of axioms they can be proven right. However, models live in the real world, are built on simplifying assumptions and are only useful until experiments/data prove them wrong.

Communicating Risk at the Bay Area R User Group

I will be speaking at the Bay Area R User Group meeting tonight about Communicating Risk. Anthony Goldbloom from Kaggle and Karim Chine from ElasticR will be there as well. The meeting will be at Microsoft in Mountain View. Later this week I will give a similar presentation at the R in Finance conference in Chicago. Please get in touch if you are around and would like to share a coffee with me.

Posterior predictive output with Stan

I continue my Stan experiments with another insurance example. Here I am particularly interested in the posterior predictive distribution from only three data points. Or, to put it differently, I have a customer of three years and I’d like to predict the expected claims cost for the next year to set or adjust the premium. The example is taken from section 16.17 in Loss Models: From Data to Decisions [1]. Some time ago I used the same example to get my head around a Bayesian credibility model.

Hello Stan!

In my previous post I discussed how Longley-Cook, an actuary at an insurance company in the 1950s, used Bayesian reasoning to estimate the probability for a mid-air collision of two planes. Here I will use the same model to get started with Stan/RStan, a probabilistic programming language for Bayesian inference. Last week my prior was given as a Beta distribution with parameters $\alpha=1, \beta=1$ and the likelihood was assumed to be a Bernoulli distribution with parameter $\theta$: $$\begin{aligned}
\theta & \sim \text{Beta}(\alpha, \beta)\\
y_i & \sim \text{Bernoulli}(\theta)
\end{aligned}$$

Predicting events, when they haven't happened yet

Suppose you have to predict the probabilities of events which haven’t happened yet. How do you do this? Here is an example from the 1950s when Longley-Cook, an actuary at an insurance company, was asked to price the risk for a mid-air collision of two planes, an event which as far as he knew hadn’t happened before. The civilian airline industry was still very young, but rapidly growing and all Longley-Cook knew was that there were no collisions in the previous 5 years [1].
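With the Beta(1, 1) prior and Bernoulli likelihood above, the posterior is available in closed form, so no sampler is needed for a first answer. This minimal sketch treats each of the five collision-free years as one Bernoulli trial, which is my simplifying assumption here.

```r
# Prior: Beta(alpha = 1, beta = 1), i.e. uniform on theta
a0 <- 1; b0 <- 1

# Data: 5 years without a mid-air collision (1 = collision, 0 = none)
y <- rep(0, 5)

# Conjugate update: Beta(a0 + #collisions, b0 + #collision-free years)
a1 <- a0 + sum(y)
b1 <- b0 + length(y) - sum(y)

# Posterior mean, which is also the posterior predictive
# probability of a collision in the next year
post.mean <- a1 / (a1 + b1)
post.mean
qbeta(c(0.025, 0.975), a1, b1)   # 95% credible interval for theta
```

Even without a single observed collision the model gives a non-zero probability, 1/7 here, driven by the prior and the short history.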

R in Insurance 2015 Conference Programme

The programme for the 3rd R in Insurance conference is online. The event will take place on 29 June 2015 at the University of Amsterdam. Time to register now. Special thanks to our sponsors, without whom the conference wouldn’t be possible: CYBAEA, RStudio, APPLIED AI, Milliman, QBE Re, AEGON, Delta Lloyd Amsterdam, Deloitte. You can find impressions from the previous events on www.rininsurance.com. We hope to see you in Amsterdam!

Combining several lattice charts into one

Last week I mentioned the grid.arrange function of the gridExtra package that allows me to combine graphical grid objects onto one page. The latticeExtra package provides another elegant solution for trellis (lattice) plots: the function c.trellis() or just c() combines the panels of multiple trellis objects into one. Here is a minimal example from the help file of c.trellis:

library(latticeExtra)
## Combine different types of plots.
c(wireframe(volcano), contourplot(volcano))

In my next example I am using data from Eurostat, the statistical office of the European Union, showing the use of public transport in four countries.

Plotting tables alongside charts in R

Occasionally I’d like to plot a table alongside a chart in R, e.g. to present summary statistics of the graph itself. Thanks to the gridExtra package this is quite straightforward. The function tableGrob creates a table-like plot of a data frame, while arrangeGrob allows me to arrange ggplot2, lattice and grid graphical objects (short ‘grobs’, such as tableGrob) on a page. Here is a little example:

Test Driven Analysis

I mused over Test Driven Analysis on this blog before, but it was Richard Pugh’s talk on SAS to R Migration at LondonR last week that brought the topic back into my mind and clarified a few things. Rich’s presentation focused on the challenge of how to ensure that the new system (R) would provide the same answers as the legacy system (SAS). This is when it clicked with me: My brain is just another system as well.

Interactive pivot tables with R

I love interactive pivot tables. That is the number one reason why I keep using spreadsheet software. The ability to look at data quickly in lots of different ways, without a single line of code, helps me to get an understanding of the data really fast. Perhaps I can do the same now in R as well. At yesterday’s LondonR meeting Enzo Martoglio briefly presented his rpivotTable package. Enzo builds on Nicolas Kruchten’s PivotTable.

ChainLadder 0.2.0 adds Solvency II CDR functions

ChainLadder is an R package that provides statistical methods and models for claims reserving in general insurance. With version 0.2.0 we added new functions to estimate the claims development result (CDR) as required under Solvency II. Special thanks to Alessandro Carrato, Giuseppe Crupi and Mario Wüthrich who have contributed code and documentation.
New Features
New generic function CDR to estimate the one year claims development result. S3 methods for the Mack and bootstrap model have been added already:

R in Insurance: Abstract submission closes end of March

Hurry! The abstract submission deadline for the 3rd R in Insurance conference in Amsterdam, 29 June 2015 is approaching. You have until the 28th of March to submit a one-page abstract for consideration. Both academic and practitioner proposals related to R are encouraged. Please email your abstract of no more than 300 words (in text or pdf format) to r-in-insurance@uva.nl. The intended audience of the conference includes both academics and practitioners who are active or interested in the applications of R in insurance.

Notes from the Kölner R meeting, 6 March 2015

At last Friday’s Cologne R user group meeting we welcomed two Northerners from the left and right (or ‘right’ and ‘wrong’) side of the Rhine.
Using R in Excel via R.NET - Günter Faes and Matthias Spix
Günter and Matthias presented examples of a new R Excel plugin, ‘Calidris’, which they developed using R.NET. The plugin itself is written in C# and adds an R ribbon to Excel with pre-built functions.

Next Kölner R User Meeting: Friday, 6 March 2015

The next Cologne R user group meeting is scheduled for this Friday, 6 March 2015, and we have an exciting agenda with two talks, followed by networking drinks:
Using R in Excel via R.NET - Günter Faes and Matthias Spix
MS Office and Excel are the ‘de-facto’ standards in many industries. Using R with Excel offers an opportunity to combine the statistical power of R with a familiar user interface. R.NET offers a user-friendly interface to Excel; R functions work just like Excel functions and are basically hidden away.

Minimal examples help

The other day I got stuck working with a huge data set using data.table in R. It took me a little while to realise that I had to produce a minimal reproducible example to actually understand why I got stuck in the first place. I know, this is the mantra I should follow before I reach out to R-help, Stack Overflow or indeed the package authors. Of course, more often than not, by following this advice, the problem becomes clear and with that the solution obvious.

Reading Arduino data directly into R

I have experimented with reading an Arduino signal into R in the past, using Rserve and Processing. Actually, it is much easier. I can read the output of my Arduino directly into R with the scan function. Here is my temperature sensor example again:

And all it needs to read the signal into the R console with my computer is:

> f <- file("/dev/cu.usbmodem3a21", open="r")
> scan(f, n=1)
Read 1 item
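Without an Arduino at hand, the same pattern can be tried with a text connection standing in for the serial port. In this sketch the readings are hypothetical; the point is that scan() on an already open connection continues from its current position, so each call picks up the next value in the stream.

```r
# Hypothetical sensor readings standing in for the serial port stream
con <- textConnection(c("21.5", "21.7", "21.6"))

first  <- scan(con, n = 1, quiet = TRUE)   # reads the first value
second <- scan(con, n = 1, quiet = TRUE)   # continues with the next value
close(con)

c(first, second)
```

With the real device the file connection plays the role of con, and calling scan(f, n=1) in a loop gives a simple way of polling the sensor.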

R in Insurance 2015: Registration Opened

The registration for the third conference on R in Insurance on Monday 29 June 2015 at the University of Amsterdam has opened. This one-day conference will focus again on applications in insurance and actuarial science that use R, the lingua franca for statistical computation. The intended audience of the conference includes both academics and practitioners who are active or interested in the applications of R in insurance. Invited talks will be given by:

googleVis version 0.5.8 released

We released googleVis version 0.5.8 on CRAN last week. The update is a maintenance release for the forthcoming release of R 3.2.0. New to googleVis? The package provides an interface between R and the Google Charts Tools, allowing you to create interactive web charts from R without uploading your data to Google. The charts are displayed by default via the R internal help browser.

Extended Kalman filter example in R

Last week’s post about the Kalman filter focused on the derivation of the algorithm. Today I will continue with the extended Kalman filter (EKF) that can also deal with nonlinearities. According to Wikipedia the EKF has been considered the de facto standard in the theory of nonlinear state estimation, navigation systems and GPS.
Kalman filter
I had the following dynamic linear model for the Kalman filter last week: $$\begin{align} x_{t+1} & = A x_t + w_t,\quad w_t \sim N(0,Q) \end{align}$$
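The key step of the EKF is to linearise the nonlinear functions around the current estimate and then apply the usual Kalman update. Here is a minimal scalar sketch; the model, a constant state observed through h(x) = x^2 with noise variance R, is hypothetical and chosen only so that the nonlinearity is easy to see. It is not the example from the post.

```r
# Scalar extended Kalman filter
#   state:       x[t+1] = f(x[t]) + w,  w ~ N(0, Q)
#   observation: y[t]   = h(x[t]) + v,  v ~ N(0, R)
ekf_1d <- function(y, f, f.jac, h, h.jac, Q, R, x0, P0) {
  n <- length(y)
  x.filt <- numeric(n); P.filt <- numeric(n)
  x.pred <- x0; P.pred <- P0
  for (t in seq_len(n)) {
    H <- h.jac(x.pred)                          # linearise h at the prediction
    K <- P.pred * H / (H^2 * P.pred + R)        # Kalman gain
    x.filt[t] <- x.pred + K * (y[t] - h(x.pred))
    P.filt[t] <- (1 - K * H) * P.pred
    Fj <- f.jac(x.filt[t])                      # linearise f at the estimate
    x.pred <- f(x.filt[t])
    P.pred <- Fj^2 * P.filt[t] + Q
  }
  list(x = x.filt, P = P.filt)
}

# Hypothetical example: constant state 2, observed via its square
set.seed(123)
true.x <- 2
y <- true.x^2 + rnorm(100, sd = sqrt(0.1))
fit <- ekf_1d(y,
              f = function(x) x,   f.jac = function(x) 1,
              h = function(x) x^2, h.jac = function(x) 2 * x,
              Q = 0, R = 0.1, x0 = 1.5, P0 = 1)
tail(fit$x, 1)
```

The linearisation works well here because the starting value is close to the truth; with a poor starting value (e.g. negative) the filter could lock onto the wrong root of x^2.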

Kalman filter example visualised with R

At the last Cologne R user meeting Holger Zien gave a great introduction to dynamic linear models (dlm). One special case of a dlm is the Kalman filter, which I will discuss in this post in more detail. I kind of used it earlier when I measured the temperature with my Arduino at home. Over the last week I came across the wonderful quantitative economic modelling site quant-econ.net, designed and written by Thomas J.

Notes from the Kölner R meeting, 12 December 2014

Last week’s Cologne R user group meeting was the best attended so far, and it was a remarkable event - I believe not a single line of R code was shown. Still, it was an R user group meeting with two excellent talks, and you will understand shortly why not much R code needed to be displayed. Introduction to Julia for R Users: Hans Werner Borchers joined us from Mannheim to give an introduction to Julia for R users.

Next Kölner R User Meeting: Friday, 12 December 2014

The next Cologne R user group meeting is scheduled for this Friday, 12 December 2014. We have an exciting agenda with two talks on Julia and Dynamic Linear Models. Introduction to Julia for R Users - Hans Werner Borchers: Julia is a high-performance dynamic programming language for scientific computing, with a syntax that is familiar to users of other technical computing environments (Matlab, Python, R, etc.). It provides a sophisticated compiler, high performance with numerical accuracy, and extensive mathematical function libraries.

How cold is it? A Bayesian attempt to measure temperature

It is getting colder in London, yet it is still quite mild considering that it is late November. Well, indoors it still feels like 20°C (68°F) to me, but I was told last week that I should switch on the heating. Luckily I found an old thermometer to check. The thermometer showed 18°C. Is it really below 20°C? The thermometer is quite old and I’m not sure that it works properly anymore.
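One way to frame the question is a conjugate normal update, sketched here under loudly stated assumptions: the prior belief about the room temperature is normal with mean 20°C and a made-up standard deviation of 2°C, and the thermometer reading of 18°C carries normal error with a made-up standard deviation of 1°C. The function name normal_update is hypothetical.

```r
# Conjugate normal-normal update for a single measurement
normal_update <- function(prior_mean, prior_sd, y, meas_sd) {
  prec <- 1 / prior_sd^2 + 1 / meas_sd^2      # posterior precision
  post_mean <- (prior_mean / prior_sd^2 + y / meas_sd^2) / prec
  c(mean = post_mean, sd = sqrt(1 / prec))
}

# Assumed prior: 20 degrees C (sd 2); assumed thermometer error: sd 1
post <- normal_update(prior_mean = 20, prior_sd = 2, y = 18, meas_sd = 1)
# Posterior probability that the room is below 20 degrees C
p_below_20 <- pnorm(20, mean = post[["mean"]], sd = post[["sd"]])
post
p_below_20
```

The posterior mean is a precision-weighted average of belief and reading, so a less trustworthy thermometer (larger meas_sd) pulls the answer back towards 20°C.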

First steps with ChainLadder: Import triangle from Excel into R

Taking the first step is often the hardest: getting data from Excel into R. Suppose you would like to use the ChainLadder package to forecast future claims payments for a run-off triangle that you have stored in Excel.

How do you get the triangle into R and execute a reserving function, such as MackChainLadder? Well, there are many ways to do this and the ChainLadder package vignette, as well as the R manual on Data Import/Export, have all of the details, but here is a quick and dirty solution using a CSV file.
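A minimal sketch of the CSV route, assuming the cumulative triangle was saved from Excel with one row per origin year and one column per development period. The small triangle and its numbers are made up, read.csv(text = ...) stands in for reading the actual file, and the chain-ladder factors are computed in base R so that the example runs without extra packages.

```r
# Cumulative paid triangle as it might look when saved from Excel as CSV;
# read.csv(text = ...) stands in for read.csv("triangle.csv")
csv_text <- "origin,dev1,dev2,dev3
2011,100,150,165
2012,110,160,NA
2013,120,NA,NA"
tri_df <- read.csv(text = csv_text)

# Convert to a numeric matrix with origin years as row names
tri <- as.matrix(tri_df[, -1])
rownames(tri) <- tri_df$origin

# Volume-weighted chain-ladder age-to-age factors
link_ratios <- function(tri) {
  sapply(seq_len(ncol(tri) - 1), function(j) {
    rows <- !is.na(tri[, j + 1])   # origins observed at the next period
    sum(tri[rows, j + 1]) / sum(tri[rows, j])
  })
}
f <- link_ratios(tri)
f

# With ChainLadder installed, the matrix can be passed on, e.g.
# library(ChainLadder); MackChainLadder(as.triangle(tri))
```

Empty cells in Excel become NA after import, which is what the reserving functions expect for the unobserved lower part of the triangle.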

Unknown pleasures

Have I missed unknown pleasures in Python by focusing on R? A comment on my blog post of last week suggested just that. Reason enough to explore Python a little. Learning another computer language is like learning another human language - it takes time. Often it is helpful to start by translating from the new language back into the old one. I found a Python script by Ludwig Schwardt that creates a plot like this:

Phase plane analysis in R

The forthcoming R Journal has an interesting article about phaseR: An R Package for Phase Plane Analysis of Autonomous ODE Systems by Michael J. Grayling. The package has some nice functions to analyse one- and two-dimensional dynamical systems. As an example I use here the FitzHugh-Nagumo system introduced earlier:
$$ \begin{align} \dot{v} &= 2 \left(w + v - \frac{1}{3}v^3\right) + I_0 \\
\dot{w} &= \frac{1}{2}(1 - v - w)
\end{align} $$
The FitzHugh-Nagumo system is a simplification of the Hodgkin-Huxley model of spike generation in squid giant axon.
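Before reaching for phaseR, the system can be explored in base R. The sketch below codes the two equations above and integrates them with a hand-written fourth-order Runge-Kutta scheme; the function names, step size and starting point are illustrative assumptions, not phaseR's own API.

```r
# Right-hand side of the FitzHugh-Nagumo system above
fhn <- function(state, I0 = 0) {
  v <- state[1]
  w <- state[2]
  c(dv = 2 * (w + v - v^3 / 3) + I0,
    dw = 0.5 * (1 - v - w))
}

# Classic fourth-order Runge-Kutta with fixed step size h
rk4 <- function(f, y0, h, n, ...) {
  out <- matrix(NA_real_, nrow = n + 1, ncol = length(y0))
  out[1, ] <- y0
  y <- y0
  for (i in seq_len(n)) {
    k1 <- f(y, ...)
    k2 <- f(y + h / 2 * k1, ...)
    k3 <- f(y + h / 2 * k2, ...)
    k4 <- f(y + h * k3, ...)
    y <- y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    out[i + 1, ] <- y
  }
  colnames(out) <- c("v", "w")
  out
}

# Trajectory from an illustrative starting point, with I0 = 0
traj <- rk4(fhn, c(v = 1, w = 0), h = 0.01, n = 2000)
```

plot(traj, type = "l") then draws the trajectory in the (v, w) phase plane; the nullclines are w = v^3/3 - v - I_0/2 (where the first derivative vanishes) and w = 1 - v (where the second one does).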

googleVis 0.5.6 released on CRAN

Version 0.5.6 of googleVis was released on CRAN over the weekend. This version fixes a bug in gvisMotionChart: its arguments xvar, yvar, sizevar and colorvar were not always picked up correctly. Thanks to Juuso Parkkinen for reporting this issue. Example: Love, or to love. A few years ago Martin Hilpert posted an interesting case study for motion charts. Martin is a linguist and he researched how the usage of words in American English changed over time, e.

Visualising the seasonality of Atlantic windstorms

Last week Arthur Charpentier sketched out a Markov spatial process to generate hurricane trajectories. Here, I would like to take another look at the data Arthur used, but focus on its time component. According to the Insurance Information Institute, a normal season, based on averages from 1980 to 2010, has 12 named storms, six hurricanes and three major hurricanes. The usual peak months of August and September passed without any major catastrophes this year, but the Atlantic hurricane season is not over yet.

Running RStudio via Docker in the Cloud

Deploying applications via Docker containers is the current talk of the town. I had heard about Docker and played around with it a little, but when Dirk Eddelbuettel posted his R and Docker talk last Friday I got really excited and had to have a go myself. My aim was to rent some resources in the cloud, pull an RStudio Server container and run RStudio in a browser. It was actually surprisingly simple to get started.

Managing R package dependencies

One of my takeaways from last week’s EARL conference was that R is increasingly growing out of its academic roots into the enterprise. And with that come some challenges, e.g. how do I ensure consistent and systematic access to a set of R packages in an organisation, in particular when one team is providing packages to others? Two packages can help here: roxyPackage and miniCRAN. I wrote about roxyPackage earlier on this blog.

Notes from the Kölner R meeting, 12 September 2014

Last Friday we had guests from Belgium and the Netherlands joining us in Cologne. Maarten-Jan Kallen from BeDataDriven came from The Hague to introduce us to Renjin, and the guys from DataCamp in Leuven, namely Jonathan, Martijn and Dieter, gave an overview of their new online interactive training platform.

Renjin: Maarten-Jan gave a fascinating introduction to Renjin, an R interpreter in the Java virtual machine (JVM). Why? Suppose all your other applications are in the Java ecosystem; then an R engine in the JVM can use your tools for profiling/debugging, project/dependency management, release/repository management, continuous integration, component lifecycle management, etc.

Next Kölner R User Meeting: Friday, 12 September 2014

The next Cologne R user group meeting is scheduled for this Friday, 12 September 2014. We have a great agenda with international speakers: Maarten-Jan Kallen: Introduction to Renjin, the R interpreter for the JVM; Jonathan Cornelissen, Martijn Theuwissen: DataCamp - An online interactive learning platform for R. The event will be followed by drinks and schnitzel at the Lux.

For further details visit our KölnRUG Meetup site. Please sign up if you would like to come along.

Zoom, zoom, googleVis

ChainLadder 0.1.8 released

Over the weekend we released version 0.1.8 of the ChainLadder package for claims reserving on CRAN. What is claims reserving? The insurance industry, unlike other industries, does not sell products as such but promises. An insurance policy is a promise by the insurer to the policyholder to pay for future claims for an upfront received premium. As a result insurers don’t know the upfront cost for their service, but rely on historical data analysis and judgement to predict a sustainable price for their offering.

googleVis 0.5.5 released

Earlier this week we released googleVis 0.5.5 on CRAN. The package provides an interface between R and Google Charts, allowing you to create interactive web charts from R. This is mainly a maintenance release, updating documentation and fixing minor issues. New to googleVis? Review the examples of all googleVis charts on CRAN. Perhaps the best known example of the Google Chart API is the motion chart, popularised by Hans Rosling in his 2006 TED talk.

GrapheR: A GUI for base graphics in R

How did I miss the GrapheR package? The author, Maxime Hervé, published an article about the package [1] in the same issue of the R Journal as we did on googleVis. Yet, it took me a package update notification on CRANbeeries to look into GrapheR in more detail - 3 years later! And what a wonderful gem GrapheR is. The package provides a graphical user interface for creating base charts in R.

Thanks to R Markdown: Perhaps Word is an option after all?

In many cases Word is still the preferred file format for collaboration in the office. Yet, it is often a challenge to work with it, not so much because of the software, but because of how it is used and abused. Thanks to Markdown it is no longer painful to include mathematical notation and R output in Word. I have been using R Markdown for a while now and have grown very fond of it.

Hit and run. Think Bayes!

At the R in Insurance conference Arthur Charpentier gave a great keynote talk on Bayesian modelling in R. Bayes’ theorem on conditional probabilities is strikingly simple, yet incredibly thought-provoking. Here is an example from Daniel Kahneman to test your intuition. But first I have to start with Bayes’ theorem. Bayes’ theorem states that given two events $D$ and $H$, the probability of $D$ and $H$ happening at the same time is the same as the probability of $D$ occurring, given $H$, weighted by the probability that $H$ occurs; or the other way round.
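In symbols, the statement above is the first line below; dividing both sides by $P(D)$, assuming it is positive, gives the familiar form of Bayes’ theorem:

$$\begin{align}
P(D \cap H) &= P(D \mid H)\, P(H) = P(H \mid D)\, P(D) \\
P(H \mid D) &= \frac{P(D \mid H)\, P(H)}{P(D)}, \quad P(D) > 0
\end{align}$$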

Notes from the 2nd R in Insurance Conference

The 2nd R in Insurance conference took place last Monday, 14 July, at Cass Business School London. This one-day conference focused once more on applications in insurance and actuarial science that use R. Topics covered included reserving, pricing, loss modelling, the use of R in a production environment and more. In the first plenary session, Montserrat Guillen (Riskcenter, University of Barcelona) and Leo Guelman (Royal Bank of Canada, RBC Insurance) spoke about the rise of uplift models.

Simple user interface in R to get login details

Occasionally I have to connect to services from R that ask for login details, such as databases. I don’t like to store my login details in the R source code file; instead I prefer to enter my login details when I execute the code. Fortunately, I found some old code in a post by Barry Rowlingson that does just that. It uses the tcltk package in R to create a little window in which the user can enter her details, without showing the password.
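The following is my own minimal sketch of such a dialog, not Barry Rowlingson's original code; the function name get_login and the widget layout are hypothetical. It needs a working Tcl/Tk installation and a display, and it assumes the user closes the dialog with the OK button.

```r
# Hypothetical helper: small Tk dialog returning user name and password
get_login <- function(title = "Login") {
  if (!requireNamespace("tcltk", quietly = TRUE)) stop("tcltk is not available")
  tt <- tcltk::tktoplevel()
  tcltk::tkwm.title(tt, title)
  user <- tcltk::tclVar("")
  pass <- tcltk::tclVar("")
  done <- tcltk::tclVar(0)
  entry_user <- tcltk::tkentry(tt, width = 25, textvariable = user)
  # show = "*" masks the characters typed into the password field
  entry_pass <- tcltk::tkentry(tt, width = 25, show = "*", textvariable = pass)
  ok_button <- tcltk::tkbutton(tt, text = "OK",
                               command = function() tcltk::tclvalue(done) <- 1)
  tcltk::tkgrid(tcltk::tklabel(tt, text = "User name:"), entry_user)
  tcltk::tkgrid(tcltk::tklabel(tt, text = "Password:"), entry_pass)
  tcltk::tkgrid(ok_button)
  tcltk::tkfocus(entry_user)
  # Block until the OK button sets 'done'
  tcltk::tkwait.variable(done)
  login <- c(user = tcltk::tclvalue(user), password = tcltk::tclvalue(pass))
  tcltk::tkdestroy(tt)
  login
}
```

In a script, login <- get_login() would then be followed by passing login[["user"]] and login[["password"]] to the database connection function, so the details never appear in the source file.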

googleVis 0.5.3 released

Recently we released googleVis 0.5.3 on CRAN. The package provides an interface between R and Google Charts, allowing you to create interactive web charts from R. Although this is mainly a maintenance release, I’d like to point out two changes: Default chart width is set to ‘automatic’ instead of 500 pixels. Intervals for column roles have to end with the suffix “.

Last chance to register for the R in Insurance conference

The registration for the 2nd R in Insurance conference at Cass Business School London will close this Friday, 4 July. The programme includes talks from international practitioners and leading academics, see below. For more details and registration visit: http://www.rininsurance.com. Still unsure? Review some impressions and presentations from last year’s conference. On behalf of the committee and sponsors, Mango Solutions, Cybaea, RStudio and PwC, we look forward to seeing you in London on 14 July!

Generating and visualising multivariate random numbers in R

This post will present the wonderful pairs.panels function of the psych package [1] that I discovered recently to visualise multivariate random numbers. Here is a little example with a Gaussian copula and normal and log-normal marginal distributions. I use pairs.panels to illustrate the steps along the way. I start with standardised multivariate normal random numbers:

library(psych)
library(MASS)
Sig <- matrix(c(1, -0.7, -.5,
                -0.7, 1, 0.6,
                -0.5, 0.6, 1), nrow=3)
X <- mvrnorm(1000, mu=rep(0,3), Sigma = Sig, empirical = TRUE)
pairs.
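The remaining copula steps can be sketched as follows, assuming the same correlation matrix: push each normal margin through pnorm to get uniforms (the Gaussian copula), then through the quantile functions of the target marginals. To keep the sketch dependency-free, it draws the normals via the Cholesky factor instead of MASS::mvrnorm, and the marginal parameters are made-up examples.

```r
set.seed(1)
# Same correlation matrix as above
Sig <- matrix(c(1, -0.7, -0.5,
                -0.7, 1, 0.6,
                -0.5, 0.6, 1), nrow = 3)
n <- 1000
# Multivariate normal draws via the Cholesky factor: rows of Z ~ N(0, Sig)
Z <- matrix(rnorm(n * 3), nrow = n) %*% chol(Sig)
# Gaussian copula: map each margin to uniform(0, 1)
U <- pnorm(Z)
# Hypothetical marginals: one normal, two log-normal
Y <- cbind(qnorm(U[, 1], mean = 10, sd = 2),
           qlnorm(U[, 2], meanlog = 0, sdlog = 0.5),
           qlnorm(U[, 3], meanlog = 1, sdlog = 0.3))
# psych::pairs.panels(Y) would show the new marginals with the dependence intact
```

Because pnorm, qnorm and qlnorm are strictly increasing, the rank (Spearman) correlations of Y are exactly those of Z; only the marginal shapes change.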

Who will win the World Cup and which prediction model?

The World Cup finally kicked off last Thursday and I have seen some fantastic games already. Perhaps the Netherlands appears to be the strongest side so far, following their 5-1 victory over Spain. To me the question is not only which country will win the World Cup, but also which prediction model will come closest to the actual results. Here I present three teams: FiveThirtyEight, a polling aggregation website; Groll & Schauberger, two academics from Munich; and finally Lloyd’s of London, the insurance market.

The joy of joining data.tables

The example I present here is a little silly, yet it illustrates how to join tables with data.table in R. Mapping old data to new data: Categories in general are never fixed; they always change at some point. And then the trouble starts with the data. For example, not that long ago we didn’t distinguish between smartphones and dumbphones, or video on demand and video rental shops. I would like to backtrack price change data for smartphones and online movie rental shops to create indices, assuming that their earlier development can be set to that of the categories they were formerly part of, namely mobile and video rental shops.
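The mapping idea can be sketched with base R's merge(), which performs the same kind of join that data.table does much faster on large tables; this is a swapped-in base R version, not the post's data.table code, and all categories, years and index values below are made up.

```r
# Hypothetical price indices for the old categories
old_idx <- data.frame(
  category = rep(c("mobile", "video rental"), each = 2),
  year     = rep(c(2008, 2009), times = 2),
  index    = c(100, 95, 100, 102)
)
# Hypothetical mapping from new categories to the categories they belonged to
mapping <- data.frame(
  new_category = c("smartphone", "online movie rental"),
  category     = c("mobile", "video rental")
)
# Join on 'category': each new category inherits the history of its parent
backcast <- merge(mapping, old_idx, by = "category")
backcast[order(backcast$new_category, backcast$year),
         c("new_category", "year", "index")]
```

The same pattern extends to many-to-one mappings: every new category pointing at the same old category simply receives a copy of its history.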

Early bird registration for R in Insurance closes tomorrow

The early bird registration offer for the 2nd R in Insurance conference, 14 July 2014, at Cass Business School closes tomorrow.

This one-day conference will focus once more on applications in insurance and actuarial science that use R, the lingua franca for statistical computation. Topics covered include reserving, pricing, loss modelling, the use of R in a production environment, and more. All topics are to be discussed within the context of using R as a primary tool for insurance risk management, analysis and modelling.

Notes from the Kölner R meeting, 23 May 2014

The 10th Kölner R user meeting took place last Friday at the Institute of Sociology and to celebrate the anniversary we invited Andrie de Vries to join us from Revolution Analytics. Andrie is well known in the R community; he is the co-author of the R for Dummies book and an active contributor on Stack Overflow. Taking R to the Enterprise

Next Kölner R User Meeting: Friday, 23 May 2014

The next Cologne R user group meeting is scheduled for this Friday, 23 May 2014. To celebrate our 10th meeting we welcome: Andrie de Vries (Revolution Analytics and co-author of R for Dummies): Taking R to the Enterprise; Markus Gesmann: googleVis overview and recent developments. Followed by drinks and schnitzel at the Lux. Further details available on our KölnRUG Meetup site. Please sign up if you would like to come along. Notes from past meetings are available here.

The Wiener takes it all? A review of the 2014 Eurovision results

Saturday’s Eurovision Song Contest (ESC) from Copenhagen was hilarious as usual with acts from all over Europe and some more or less sensible gimmicks: a circular piano, a giant hamster wheel, a see-saw, or indeed a beard and fancy dress. The results of the ESC were only a little different to what the bookmakers in the UK had predicted before the event started. Sweden was seen as the favourite, followed by Austria, Netherlands, Armenia and the UK.

Customising lines and points with googleVis

At the end of March Google released a new version of the Chart Tools API with new options for point shapes and line brushes. The arguments are called pointShape and lineDashStyle and can be set directly via googleVis. We published googleVis 0.5.2 on CRAN yesterday with added examples for those new options in gvisLineChart and gvisScatterChart. Note, these options can be used with most chart types as well, also in combination.

R in Insurance 2014: Conference Programme & Abstracts

I am delighted to announce that the programme and abstracts for the second R in Insurance conference at Cass Business School in London, 14 July 2014, have been finalised. Register by the end of May to get the early bird booking fee. The organisers gratefully acknowledge the sponsorship of Mango Solutions, CYBAEA, RStudio and PwC without whom the event wouldn’t be possible.

R in Insurance Cass Business School, London, 14 July 2014 9:00 - 10:00 Opening keynote:

Notes from the Tokyo R User Group meeting, 17 April 2014

Last Thursday I had the pleasure to attend the Tokyo R user group meeting. And what a fun meeting it was! Over 40 R users had come together in central Tokyo. Yohei Sato, who organises the meetings, allowed me to talk a little about the recent developments of the googleVis package.

Thankfully all talks were given in English: Takashi J. Ozaki presented on Visualisation of Supervised Learning with arules and arulesViz.

googleVis 0.5.1 released on CRAN

googleVis 0.5.1 was released on CRAN yesterday. New features: New functions gvisSankey, gvisAnnotationChart, gvisHistogram, gvisCalendar and gvisTimeline to support the new Google charts of the same names (without ‘gvis’). New demo Trendlines showing how trend-lines can be added to Scatter-, Bar-, Column-, and Line Charts. New demo Roles showing how different column roles can be used in core charts to highlight data. New vignettes written in R Markdown showcasing googleVis examples and how the package works with knitr.

Annotation charts and histograms with googleVis

After my posts on timeline, Sankey and calendar charts, this will be the last to introduce new chart types of the developer version of googleVis. Today I will give examples for the new annotation charts and histograms. Annotation charts: Annotation charts have been part of the Google Chart Tools for a long time, and of googleVis as well. However, in the past only a Flash-based version was available (gvisAnnotatedTimeLine in googleVis). With the new Google Chart Tools version an HTML5 version was also released.

Calendar charts with googleVis

My little series of posts about the new googleVis charts continues with calendar charts. Google’s calendar charts are still in beta, but they already provide a nice heat map visualisation of calendar year data. The current development version of googleVis supports this new chart type via gvisCalendar. Here is an example displaying daily stock price data. For the code below to run you will require the developer version (≥ 0.5.0-4) of googleVis from GitHub and R ≥ 3.

Sankey diagrams with googleVis

Sankey diagrams are great for visualising flows from one set of data values to another. Although named after Irish Captain Matthew Henry Phineas Riall Sankey, who used this type of diagram in 1898 to show the energy efficiency of a steam engine, the best known Sankey diagram is probably Charles Minard’s Map of Napoleon’s Russian Campaign of 1812, which he actually produced in 1869. An example from Thomas Rahlf’s book Datendesign mit R shows that Minard’s plot can be reproduced with base graphics in R.

Reminder: Abstract submission for the 2014 'R in Insurance' conference will close this Friday

Timeline charts with googleVis

Last year at the Google I/O conference Mitchell Foley presented new developments of the Google Chart Tools API and one of the new features he mentioned were timeline charts (about 6 min into the talk).

Timeline charts are a great way of visualising different dates/events over time and are now also supported by googleVis from version 0.5.0 onwards (currently only available from GitHub). Here is an example, showing classroom allocation in the afternoon.

googleVis code development moved to GitHub

After nearly 4 years of developing googleVis on Google Code with SVN we decided to move to GitHub. The main reason was that Google stopped offering the facility to host pre-CRAN builds of the package for user testing. The devtools package, on the other hand, makes it really easy to install packages from source hosted on GitHub. Additionally, we hope that GitHub will make collaboration with others more effective. Thus, bookmark http://github.

Review: Kölner R Meeting 26 February 2014

Last week’s Cologne R user group meeting was all about R and databases. We had three talks, from a generic overview on how to connect R to databases, to a specific example with kdb+ and perhaps the future with ArangoDB, a NoSQL database. Connecting R with databases: Diego de Castillo’s talk focused on the use of relational databases, such as PostgreSQL, SQLite and Oracle. For all these databases dedicated R drivers exist on CRAN that can be used in a generic way via the DBI package.

Next Kölner R User Meeting: 26 February 2014

The next Cologne R user group meeting is scheduled for tomorrow, 26 February 2014. We are delighted to welcome: Diego de Castillo: R and databases; Kim Kuen Tang: Hands on using R and kdb+ together; Frank Celler: ArangoDB (Lightning Talk). Further details and the agenda are available on our KölnRUG Meetup site. Please sign up if you would like to come along. Notes from past meetings are available here.

The organisers, Bernd Weiß and Markus Gesmann, gratefully acknowledge the sponsorship of Revolution Analytics, who support the Cologne R user group as part of their vector programme.

R in Insurance 2014 Conference Poster

Here is the poster for the 2nd R in Insurance conference on Monday 14 July 2014 at Cass Business School in London. Important deadlines to keep in mind: Abstract submissions: 28 March 2014; Early bird booking: 30 May 2014; R in Insurance Conference: 14 July 2014. For all further information see: www.rininsurance.com. The programme and the presentation files of the first R in Insurance conference have been published on GitHub.

Adding labels within lattice panels by group

The other day I had data that showed the development of many products over time. I grouped the products into categories and visualised the data as line graphs in lattice. But instead of adding an extensive legend to the plot I wanted to add labels to each line’s latest point. How do you do that? It turns out that panel.groups is there to help again. Here is my solution:
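A sketch of the approach, with made-up products and categories: panel.superpose draws one group at a time and passes the current group's label to panel.groups as group.value, so each line can be labelled at its last point. The data and the extended x-axis range are illustrative assumptions, not the original data.

```r
library(lattice)

# Hypothetical data: four products in two categories over ten periods
df <- data.frame(
  time     = rep(1:10, times = 4),
  product  = rep(c("A1", "A2", "B1", "B2"), each = 10),
  category = rep(c("A", "B"), each = 20),
  value    = c(1:10, 2 * (1:10), 0.5 * (1:10), 1.5 * (1:10))
)

p <- xyplot(value ~ time | category, data = df, groups = product,
            type = "l",
            xlim = c(0, 12),        # leave room on the right for labels
            panel = panel.superpose,
            panel.groups = function(x, y, ..., group.value) {
              panel.xyplot(x, y, ...)
              last <- which.max(x)  # latest point of this group
              panel.text(x[last], y[last], labels = group.value,
                         pos = 4, cex = 0.8)
            })
```

print(p) draws the chart; the labels replace a legend because each line carries its own name.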

Registration for the 2014 'R in Insurance' conference has opened

The registration for the second conference on R in Insurance on Monday 14 July 2014 at Cass Business School in London has opened. This one-day conference will focus again on applications in insurance and actuarial science that use R, the lingua franca for statistical computation. Topics covered may include actuarial statistics, capital modelling, pricing, reserving, reinsurance and extreme events, portfolio allocation, advanced risk tools, high-performance computing, econometrics and more. All topics will be discussed within the context of using R as a primary tool for insurance risk management, analysis and modelling.

Does sexual activity change with age?

Recently the Guardian’s Data Blog reported on the results from the third National Survey of Sexual Attitudes and Lifestyles in the UK. One of the questions asked in the survey was whether the participants had had sex in the last four weeks. The results - a summary is available in this infographic - show that the British have their most sexually active period when they are in their 20s to 40s.

Binomial testing with buttered toast

Rasmus’ post of last week on binomial testing made me think about p-values and testing again. In my head I was tossing coins, thinking about gender diversity and toast. The toast and tossing a buttered toast in particular was the most helpful thought experiment, as I didn’t have a fixed opinion on the probabilities for a toast to land on either side. I have yet to carry out some real experiments.
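As a thought experiment in code, here is an exact binomial test on made-up toast data: suppose 7 out of 10 tossed toasts landed butter side down (hypothetical numbers, no real experiment was carried out). The p-value from binom.test is recomputed by hand to show what it measures.

```r
# Hypothetical toast experiment: 7 of 10 toasts land butter side down
butter_down <- 7
tosses <- 10
# Exact binomial test of H0: P(butter side down) = 0.5
res <- binom.test(butter_down, tosses, p = 0.5)
res$p.value
# The same two-sided p-value by hand: add up the probabilities of all
# outcomes that are no more likely than the observed one
# (the small tolerance mirrors binom.test's handling of rounding)
probs <- dbinom(0:tosses, tosses, 0.5)
manual <- sum(probs[probs <= dbinom(butter_down, tosses, 0.5) * (1 + 1e-7)])
manual
```

With p = 0.5 the distribution is symmetric, so the p-value is simply twice the upper tail P(X >= 7), i.e. 2 * 176/1024 = 0.34375: hardly evidence that buttered toast prefers one side.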

Fun with the Raspberry Pi

Since Christmas I have been playing around with a Raspberry Pi. It is certainly not the fastest computer, but what a great little toy! Here are a few experiences and online resources that I found helpful. Setup: Initially I connected the Raspberry Pi via HDMI to a TV, together with keyboard, mouse and an old USB Wifi adapter. Everything worked out of the box and I could install Raspbian and set up the network.

How many more R-bloggers posts can I expect?

I noticed that the monthly number of posts on R-bloggers stopped increasing over the last year. Indeed, the last couple of months saw a decline in posts compared to the previous year. Thus, has most been said and written about R already? Who knows? Well, I took a stab at looking into the future. However, I can tell you already that I am not convinced by my predictions. But maybe someone else will be inspired to take this work forward.

Whale charts - Visualising customer profitability

The Christmas and New Year’s break is over, yet there is still time to return unwanted presents. Return to Santa was the title of an article in the Economist that highlighted the impact on online retailers, as return rates can be alarmingly high. The article quotes a study by Christian Schulze of the Frankfurt School of Finance and Management, which analyses the return habits of customers who bought at least five items over a five year period from a large European online retailer.

Review: Kölner R Meeting 13 December 2013

Last week’s Cologne R user group meeting was the best attended so far. Well, we had a great line up indeed. Matt Dowle came over from London to give an introduction to the data.table package. He was joined by his collaborator Arun Srinivasan, who is based in Cologne. Their talk was followed by Thomas Rahlf on Datendesign mit R (Data design with R). data.table: Matt’s goal with the data.

Next Kölner R User Meeting: 13 December 2013

Quick reminder: The next Cologne R user group meeting is scheduled for this Friday, 13 December 2013. We are delighted to welcome: Matt Dowle and Arun Srinivasan: Introduction to data.table; Thomas Rahlf: Book presentation - Datendesign mit R. Further details and the agenda are available on our KölnRUG Meetup site. Please sign up if you would like to come along. Notes from past meetings are available here.

The organisers, Bernd Weiß and Markus Gesmann, gratefully acknowledge the sponsorship of Revolution Analytics, who support the Cologne R user group as part of their vector programme.

R in Insurance Conference, London, 14 July 2014

Following the very positive feedback that Andreas and I have received from delegates of the first R in Insurance conference in July of this year, we are planning to repeat the event next year. We have already reserved a bigger auditorium. The second conference on R in Insurance will be held on Monday 14 July 2014 at Cass Business School in London, UK. This one-day conference will focus again on applications in insurance and actuarial science that use R, the lingua franca for statistical computation.

Not only verbs but also beliefs can be conjugated

Following on from last week, where I presented a simple example of a Bayesian network with discrete probabilities to predict the number of claims for a motor insurance customer, I will look at continuous probability distributions today. Here I follow example 16.17 in Loss Models: From Data to Decisions [1]. Suppose there is a class of risks that incurs random losses following an exponential distribution (density $f(x) = \Theta {e}^{- \Theta x}$) with mean $1/\Theta$.
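A minimal sketch of the conjugate update, assuming a gamma prior on $\Theta$ with shape alpha and rate beta (the book's example may use a different parametrisation). Multiplying the exponential likelihood by the prior gives a gamma posterior with shape alpha + n and rate beta + sum(x). The prior values and losses below are hypothetical, and the function name posterior_gamma is mine.

```r
# Gamma(alpha, rate = beta) prior on Theta, exponential losses with rate Theta:
# posterior is Gamma(alpha + n, rate = beta + sum(x))
posterior_gamma <- function(x, alpha, beta) {
  c(alpha = alpha + length(x), beta = beta + sum(x))
}

# Hypothetical prior and three hypothetical observed losses
post <- posterior_gamma(c(100, 950, 450), alpha = 4, beta = 1000)
post_mean_theta <- post[["alpha"]] / post[["beta"]]      # E[Theta | x]
# Predictive mean loss E[X | x] = E[1 / Theta | x] = beta / (alpha - 1)
pred_mean_loss <- post[["beta"]] / (post[["alpha"]] - 1)
post
pred_mean_loss
```

Each new loss adds one to the shape and its size to the rate, so the belief about the mean loss is updated by simple bookkeeping, which is what makes the pair conjugate.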

Predicting claims with a Bayesian network

Here is a little Bayesian Network to predict the claims for two different types of drivers over the next year, see also example 16.15 in [1]. Let’s assume there are good and bad drivers. The probabilities that a good driver will have 0, 1 or 2 claims in any given year are set to 70%, 20% and 10%, while for bad drivers the probabilities are 50%, 30% and 20% respectively. Further I assume that 75% of all drivers are good drivers and only 25% would be classified as bad drivers.
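With only two nodes, the network can be evaluated by enumeration. The sketch below uses the probabilities stated above and base R rather than a dedicated Bayesian network package: it computes the marginal claim distribution and the probability that a driver is good given the number of claims observed.

```r
# Prior over driver types and claim distributions given type (from the text)
p_type <- c(good = 0.75, bad = 0.25)
p_claims <- rbind(good = c(`0` = 0.7, `1` = 0.2, `2` = 0.1),
                  bad  = c(`0` = 0.5, `1` = 0.3, `2` = 0.2))
# Joint P(type, claims): each row scaled by the prior of that type
joint <- p_claims * p_type
# Marginal P(claims) and posterior P(type | claims) via Bayes' theorem
marginal <- colSums(joint)
posterior <- sweep(joint, 2, marginal, "/")
marginal
posterior
```

The marginal probabilities of 0, 1 and 2 claims are 0.65, 0.225 and 0.125; observing two claims lowers the probability of a good driver from 75% to 60%.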

googleVis 0.4.7 with RStudio integration on CRAN

In my previous post, I presented a preview version of googleVis that provided an integration with RStudio’s Viewer pane (introduced with version 0.98.441). Over 80% of respondents to my little survey favoured the new default output mechanism of googleVis within RStudio. Hence, I uploaded googleVis 0.4.7 to CRAN over the weekend. However, there were also some thoughtful comments, which suggested that the RStudio Viewer pane is not always the best option. Indeed, Flash charts and gvisMerge output will still be displayed in your default browser, and if you work with larger charts on a smaller screen, the browser might still be the better option compared to the Viewer pane - of course you can launch the browser from the Viewer pane as well.

Display googleVis charts within RStudio

The preview version 0.98.441 of RStudio introduced a new viewer pane to render local web content and with that it allows me to display googleVis charts within RStudio rather than in a separate browser window. I think this is a rather nice feature and hence I have updated the plot method in googleVis to use the RStudio viewer pane as the default output. If you use another editor, or if the plot is using one of the Flash based charts, then the browser is still the default display.

High resolution graphics with R

For most purposes PDF or other vector graphic formats such as Windows metafile and SVG work just fine. However, if I plot lots of points, say 100k, then those files can get quite large and bitmap formats like PNG can be the better option. I just have to be mindful of the resolution. As an example I create the following plot:

x <- rnorm(100000)
plot(x, main="100,000 points", col=adjustcolor("black", alpha=0.2))

Saving the plot as a PDF creates a 5.
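For the bitmap route, the resolution can be set explicitly when opening the PNG device: width and height in inches together with res in pixels per inch determine the pixel size. The sketch below writes to a temporary file; the 6 x 6 inch size at 300 ppi is an illustrative choice, not a recommendation from the post.

```r
x <- rnorm(100000)
out_file <- tempfile(fileext = ".png")   # illustrative output location
# 6 x 6 inches at 300 pixels per inch gives an 1800 x 1800 pixel bitmap
png(out_file, width = 6, height = 6, units = "in", res = 300)
plot(x, main = "100,000 points", col = adjustcolor("black", alpha.f = 0.2))
dev.off()
file.info(out_file)$size   # file size in bytes
```

Because text and symbol sizes are specified in points, a higher res keeps the plot's proportions and only increases the pixel count, unlike enlarging width and height in pixels.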

Review: Kölner R Meeting 18 October 2013

The Cologne R user group met last Friday for two talks on split apply combine in R and XLConnect by Bernd Weiß and Günter Faes respectively, before the usual Schnitzel and Kölsch at the Lux. Split apply combine in R

The apply family of functions in R is incredibly powerful, yet often somewhat mysterious for newcomers. Thus, Bernd gave an overview of the different apply functions and their cousins. The various functions differ in their object inputs, e.
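A compact side-by-side of the main family members, with toy inputs chosen only for illustration, shows how they differ in what they take and what they return:

```r
m <- matrix(1:6, nrow = 2)                 # 2 x 3 matrix
row_sums <- apply(m, 1, sum)               # apply: over rows (1) or columns (2)
lst <- list(a = 1:3, b = 4:6)
means_list <- lapply(lst, mean)            # lapply: always returns a list
means_vec <- sapply(lst, mean)             # sapply: simplifies to a vector if it can
means_chk <- vapply(lst, mean, numeric(1)) # vapply: like sapply, with a declared result type
by_group <- tapply(c(1, 2, 3, 4), c("x", "x", "y", "y"), sum)  # tapply: by group
pair_sums <- mapply(function(a, b) a + b, 1:3, 4:6)  # mapply: several vectors in parallel
```

vapply is the safer choice inside functions, because it fails loudly if FUN returns something other than the declared type, whereas sapply silently changes its output shape.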

Next Kölner R User Meeting: 18 October 2013

Quick reminder: The next Cologne R user group meeting is scheduled for this Friday, 18 October 2013. We will discuss and hear about the apply family of functions and the XLConnect package. Further details and the agenda are available on our KölnRUG Meetup site. Please sign up if you would like to come along. Notes from past meetings are available here. Thanks to Revolution Analytics, who sponsors the Cologne R user group as part of their vector programme.

Creating a matrix from a long data.frame

There can never be too many examples for transforming data with R. So, here is another example of reshaping a data.frame into a matrix. Here I have a data frame that shows incremental claim payments over time for different loss occurrence (origin) years.

The format of the data frame above is how this kind of data is usually stored in a database. However, I would like to see the payments of the different origin years in the rows of a matrix.
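One base R way to do the reshaping is sketched below with made-up payments: tapply cross-classifies the values by origin and development period and returns a matrix, leaving unobserved cells as NA. A row-wise cumsum then turns incremental into cumulative payments.

```r
# Hypothetical long-format incremental payments, as stored in a database
claims <- data.frame(
  origin  = c(2010, 2010, 2010, 2011, 2011, 2012),
  dev     = c(1, 2, 3, 1, 2, 1),
  payment = c(100, 50, 20, 110, 60, 120)
)
# tapply builds an origin x dev matrix; unobserved cells become NA
inc_tri <- tapply(claims$payment,
                  list(origin = claims$origin, dev = claims$dev), sum)
# Cumulative payments along each row (NA stays NA)
cum_tri <- t(apply(inc_tri, 1, cumsum))
inc_tri
cum_tri
```

xtabs(payment ~ origin + dev, data = claims) gives a similar table, but fills unobserved cells with 0 rather than NA, which matters for a run-off triangle.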

Changing the width of bars and columns in googleVis

Changing the plotting width in bar, column and combo charts of googleVis works identically and is defined by the bar.groupWidth argument. The dot in the argument means that it has to be split in R into bar="{groupWidth:'10%'}". Example:

library(googleVis)
cc <- gvisColumnChart(head(Population,10), xvar="Country", yvar="Population",
                      options=list(seriesType="bars", legend="top",
                                   bar="{groupWidth:'10%'}",
                                   width=500, height=450),
                      chartid="thincolumns")
plot(cc)

Session Info: R version 3.0.1 (2013-05-16) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages:

Using panel.groups in lattice

Last Tuesday I attended the LondonR user group meeting, where Rich and Andy from Mango argued about the better package for multivariate graphics with R: lattice vs. ggplot2. As part of their talk they had a little competition in visualising London Underground performance data; see their slides. Both made heavy use of the respective panelling / faceting capabilities. Additionally, Rich used the panel.groups argument of xyplot to fine-tune the content of each panel.
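Here is a minimal sketch of the panel.groups idea, assuming made-up delay data for two hypothetical lines "A" and "B" in two months. Rich's actual code and the Underground data are not shown here. With panel = panel.superpose, lattice calls the panel.groups function once per group inside every panel. This example uses it to draw each group's line together with a dashed line at that group's mean, in the group's own colour.

```r
library(lattice)
set.seed(1)
# Hypothetical data: delays for two lines ("A", "B") in two months (panels)
dat <- data.frame(
  day   = rep(1:10, times = 4),
  delay = rnorm(40, mean = 5),
  line  = rep(c("A", "B"), each = 10, times = 2),
  month = rep(c("Jan", "Feb"), each = 20)
)
p <- xyplot(delay ~ day | month, data = dat, groups = line, type = "l",
            panel = panel.superpose,
            # panel.groups is called once per group within each panel
            panel.groups = function(x, y, col.line, ...) {
              panel.xyplot(x, y, col.line = col.line, ...)
              panel.abline(h = mean(y), col = col.line, lty = 2)  # group mean
            },
            auto.key = list(lines = TRUE, points = FALSE))
```

Printing p draws the two panels.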

ave and the "[" function in R

The ave function in R is one of those little helper functions I feel I should be using more. Investigating its source code showed me another twist about R and the “[” function. But first, let’s look at ave. The top of ave’s help page reads: Group Averages Over Level Combinations of Factors. Subsets of x[] are averaged, where each subset consist of those observations with the same factor levels.
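A short sketch of ave in action follows, using toy data. It also shows two related facts about “[”. First, x[] <- value assigns into the existing object and keeps its attributes; ave’s source uses this form when no grouping is given. Second, “[” is itself a function that can be passed to sapply. Which of these the post calls the twist is my assumption.

```r
x <- c(1, 2, 3, 4, 10, 20)
g <- c("a", "a", "b", "b", "c", "c")

grp_mean <- ave(x, g)                 # each element replaced by its group mean
grp_cum  <- ave(x, g, FUN = cumsum)   # any function returning a same-length vector works

# x[] <- value assigns into the existing object, keeping its attributes (here: dim)
y <- matrix(1:4, nrow = 2)
y[] <- 0
# "[" is itself a function and can be passed around, e.g. to sapply
second <- sapply(list(1:3, 4:6), "[", 2)
```

By contrast, y <- 0 would have replaced the matrix with a plain scalar.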

Doughnut chart in R with googleVis

The guys at Google continue to update and enhance the Chart Tools API. One recent feature is a pie chart with a hole, or, as some call them, donut charts. Thankfully, the new functionality is achieved through new options for the existing pie chart, which means that those new features are available in R via googleVis as well, without the need to write new code. With the German election coming up soon, here is the composition of the current parliament as a doughnut chart.

googleVis 0.4.4 released with new formatting options for tables

Over the weekend googleVis 0.4.4 found its way to CRAN. The function gvisTable gained a new argument, formats, that allows users to define the format of numbers displayed in tables. Thanks to J. Buros, who contributed the code.

Session info: R version 3.0.1 (2013-05-16), platform x86_64-apple-darwin10.8.0 (64-bit), locale en_GB.UTF-8; attached base packages: stats, graphics, grDevices, utils, datasets, methods, base; other attached packages: googleVis_0.4.4.

ChainLadder 0.1.6 released with chain-ladder factor models

Version 0.1.6 of the ChainLadder package has been released and is already available from CRAN. The new version adds the function CLFMdelta. CLFMdelta finds consistent weighting parameters delta for a vector of selected age-to-age chain-ladder factors for a given run-off triangle. The added functionality was implemented by Dan Murphy, who is a co-author of the paper A Family of Chain-Ladder Factor Models for Selected Link Ratios by Bardis, Majidi and Murphy. You can find a more detailed explanation with R code examples on Dan’s blog; see also his slides from the CAS spring meeting.
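This base-R sketch shows what delta controls. It assumes the usual weighted least-squares form of the chain-ladder factor model, in which the factor from development period k to k+1 is sum(C_k^(1 - delta) * C_{k+1}) / sum(C_k^(2 - delta)):

- delta = 1 gives the volume-weighted factor.
- delta = 2 gives the simple average of the link ratios.
- delta = 0 gives the regression through the origin.

The helper ata_factor and the data are hypothetical. The final uniroot call finds a delta that reproduces one selected factor. It only illustrates the idea behind CLFMdelta and is not that function’s implementation.

```r
# Cumulative claims at development periods k and k+1 (hypothetical data)
ck  <- c(100, 200, 400)
ck1 <- c(150, 260, 480)

# Age-to-age factor under a chain-ladder factor model with weighting parameter delta
ata_factor <- function(ck, ck1, delta) {
  sum(ck^(1 - delta) * ck1) / sum(ck^(2 - delta))
}

f_volume <- ata_factor(ck, ck1, 1)   # volume weighted: sum(ck1) / sum(ck)
f_simple <- ata_factor(ck, ck1, 2)   # simple average of the link ratios ck1 / ck
f_regr   <- ata_factor(ck, ck1, 0)   # least squares regression through the origin

# Find a delta that reproduces a selected factor (illustration only)
selected  <- 1.3
delta_hat <- uniroot(function(d) ata_factor(ck, ck1, d) - selected,
                     interval = c(0, 2), tol = 1e-10)$root
```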

Setting axes limits with googleVis

I posted earlier about the various googleVis axis options for base charts, such as line, bar and area charts, but I somehow forgot to mention how to set the axes limits. Unfortunately, there are no arguments such as ylim and xlim. Instead, the Google Charts axes options are set via hAxes and vAxes, with h and v indicating the horizontal and vertical axis. More precisely, I have to set viewWindowMode to ‘explicit’ and set viewWindow to the desired min and max values.

R in Insurance: Presentations are online

The programme and the presentation files of the first R in Insurance conference have been published on GitHub. (Image: front slides of the conference presentations.) In addition to the slides, many presenters have made their R code available as well. Alexander McNeil shared the examples of the CreditRisk+ model he presented. Lola Miranda made a Windows version of the double chain-ladder package DCL available via the Cass knowledge web site. Alessandro Carrato’s 1-year re-reserving code is hosted on the ChainLadder project web site.

Review: Kölner R Meeting 19 July 2013

Despite the hot weather and the beginning of the school holiday season in North Rhine-Westphalia, the Cologne R user group met yet again for two fascinating talks, followed by beer and schnitzel. In his talk “Analysing Twitter data to evaluate the US Dollar / Euro exchange rates”, Dietmar Janetzko presented ideas to forecast US Dollar / Euro exchange rate movements for the following day. To forecast exchange rate movements, Dietmar distinguishes two schools of thought.

Quick review: R in Insurance Conference

Yesterday the first R in Insurance conference took place at Cass Business School in London. I think the event went really well, but as a member of the organising committee my view is probably biased. Still, we had a variety of talks, a full house, a great conference dinner and, to top it all, Tower Bridge opened while we had our drinks at the end of the evening. Once we have had a chance to collate all the information, I will post a more complete review with links to the presentation files and R code.

googleVis tutorial at useR!2013

There is definitely R in July

The useR!2013 conference in Albacete, Spain, will commence next Wednesday, 10 July, and on the day before Diego and I will give a googleVis tutorial. The following Monday, 15 July, the first R in Insurance event will take place at Cass Business School and I am absolutely delighted with the programme and the fact that we are sold out. On Tuesday, 16 July, the LondonR user group meets in the City, awaiting presentations by Andrie de Vries (Revolution Analytics), Rich Pugh (Mango Solutions) and Hadley Wickham (RStudio).

googleVis 0.4.3 released with improved Geocharts

The Google Charts Tools provide two kinds of heat map charts for geographical data: the Flash-based Geomap and the HTML5/SVG-based Geochart. I prefer the Geochart as it doesn’t require Flash, but so far it has had two shortcomings. I couldn’t add additional tooltip information, and the default Mercator projection shows Greenland the size of Africa. Both issues appear to have been resolved by Google. Although the features aren’t officially documented and released yet, Mitchell Foley from the Google Chart Tools team already presented the new developments at the Google I/O 2013 conference in May.

R package development

Building R packages is not particularly hard, but it can be a daunting endeavour at the beginning, particularly if you are more of a statistician than a computer scientist or programmer. Some concepts may appear foreign or like red tape, yet many of them evolved over time for a reason. They help you stay organised, collaborate more effectively with others and write better code. So, here are my slides from the R package development workshop at Lancaster University.
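The quickest way to see the basic structure of an R package is to let base R generate one. This sketch uses utils::package.skeleton on a hypothetical function add_one and writes the skeleton to a temporary directory. It is a starting point only, and it is not necessarily the workflow shown in the workshop slides. The generated DESCRIPTION, NAMESPACE, R/ and man/ files still need editing before the package passes R CMD check.

```r
# A hypothetical function to package
add_one <- function(x) x + 1

# Create the package skeleton in a temporary directory
pkg_dir <- tempdir()
utils::package.skeleton(name = "demoPkg", list = "add_one",
                        environment = environment(),
                        path = pkg_dir, force = TRUE)

# Inspect the generated files (DESCRIPTION, NAMESPACE, R/, man/, ...)
list.files(file.path(pkg_dir, "demoPkg"), recursive = TRUE)
```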

Interactive slides with googleVis on shiny

Following on from last week’s post, here are my slides on using googleVis on shiny from the Advanced R workshop at Lancaster University, 21 May 2013. (Slides: googleVis on shiny.) Again, I wrote my slides in R Markdown and used slidify to create the HTML5 presentation. Unfortunately, you may have to reload the slides that use googleVis on shiny, as the JavaScript code in the background is potentially not ideal. Any pointers that could help improve the performance would be much appreciated.

Interactive presentation with slidify and googleVis

Last week I was invited to give an introduction to googleVis at Lancaster University. This time I decided to use the R package slidify for my talk. Slidify, like knitr, is built on Markdown and makes it very easy to create beautiful HTML5 presentations. (Slides: Introduction to googleVis.) Separating content from layout is always a good idea. Markup languages such as TeX/LaTeX or HTML are built on this principle. Ramnath Vaidyanathan has done a fantastic job with slidify, as it is very straightforward to create presentations with R.

Claims Inflation - a known unknown

Over the last year I worked with two colleagues of mine on the subject of inflation, and claims inflation in particular. I didn’t expect it to be such a challenging topic, but we ended up with more questions than answers. The key question and biggest challenge is to define what inflation, or indeed claims inflation, actually is and how to measure it. We published a summary of our thoughts and findings in this month’s issue of The Actuary.

R in Insurance: Programme and Abstracts published

I am delighted to announce that the programme and abstracts for the first R in Insurance conference at Cass Business School in London, 15 July 2013, have been published. The conference committee received strong abstracts from academia and industry, covering pricing, reserving, data mining, capital modelling, automated reporting, catastrophe modelling, high-performance computing and software development management. Register by the end of May to get the early bird booking fee. We gratefully acknowledge the sponsorship of Mango Solutions and CYBAEA, without whom the event wouldn’t be possible.

How to change the alpha value of colours in R

I often like to reduce the alpha value (level of transparency) of colours to identify patterns of over-plotting when displaying lots of data points with R. So, here is a tiny function that adds an alpha value to a given vector of colours, e.g. an RColorBrewer palette. It uses col2rgb and rgb (which has an argument for alpha) in combination with the wonderful apply and sapply functions.
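This sketch follows the recipe just described and may differ in detail from the original function. col2rgb turns each colour into a column of red, green and blue values between 0 and 255. sapply stacks those columns into a matrix. apply then rebuilds each colour with rgb and the requested alpha. The name add.alpha is my own choice for illustration.

```r
add.alpha <- function(col, alpha = 1) {
  if (missing(col)) stop("Please provide a vector of colours.")
  # col2rgb gives a 3-row matrix of 0-255 values; rescale to [0, 1]
  rgb_mat <- sapply(col, col2rgb) / 255
  # Rebuild each colour (one per column) with the requested alpha
  apply(rgb_mat, 2, function(x) rgb(x[1], x[2], x[3], alpha = alpha))
}

semi <- add.alpha(c("red", "blue"), alpha = 0.5)
semi
```

The result is a named character vector of hex colours with an alpha channel appended, such as "#FF000080" for semi-transparent red. It can be passed straight to col= in base graphics.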

Review: Kölner R Meeting 12 April 2013

Our 5th Cologne R user group meeting was the best attended meeting so far, with 20 members finding their way to the Institute of Sociology. Diego de Castillo talked about shiny and Stephan Holtmeier about cluster analysis, followed by beer and schnitzel at the Lux, a gastropub nearby. Shiny: Diego gave an overview of the design principles behind shiny, which provides a powerful API to build web apps in pure R.

Test Driven Analysis?

At the last LondonR meeting, Francine Bennett from Mastodon C shared some of her experience and findings from an analysis of a large prescriptions data set of the UK’s National Health Service (NHS). However, it was her last slide that I found the most thought-provoking. It asked for the definition of the following term: Test-driven analysis? Francine explained that test-driven development (TDD) is a concept often used in software development for quality assurance, and she wondered if a similar approach could also be used for data analysis.
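This is purely an illustration of what test-driven analysis could look like. It is my interpretation, not Francine’s. The expectations about a hypothetical prescriptions data set are written down as executable checks before any analysis, and the analysis only proceeds if they pass. The column names practice, items and cost and the figures are made up.

```r
# Expectations about the data, written before the analysis (hypothetical columns)
check_prescriptions <- function(d) {
  stopifnot(
    is.data.frame(d),
    all(c("practice", "items", "cost") %in% names(d)),
    !anyNA(d$items),
    all(d$items >= 0),   # no negative prescription counts
    all(d$cost >= 0)     # no negative costs
  )
  invisible(TRUE)
}

good <- data.frame(practice = c("P1", "P2"), items = c(10, 3), cost = c(25.5, 7.2))
bad  <- transform(good, items = c(10, -1))

check_prescriptions(good)   # passes silently
# The second data set violates an expectation, so the check raises an error
bad_passes <- tryCatch(check_prescriptions(bad), error = function(e) FALSE)
```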

How to set axis options in googleVis

Setting axis options in googleVis charts can be a bit tricky. Here I present two examples where I set several options to customise the layout of a line chart and a combo chart with two axes. The parameters have to be set in line with the Google Chart Tools API, which uses a JavaScript syntax. In googleVis, chart options are set via a list in the options argument. Some of the list items can be a bit more complex, often wrapped in {} brackets, e.g. …

Next Kölner R User Meeting: 12 April 2013

Quick reminder: The next Cologne R user group meeting is scheduled for this Friday, 12 April 2013. We will discuss cluster analysis and shiny. Further details and the agenda are available on our KölnRUG Meetup site. Please sign up if you would like to come along. Notes from the last Cologne R user group meeting are available here.

Thanks also to Revolution Analytics, who sponsors the Cologne R user group as part of their vector programme.

Top 10 tips to get started with R

Be motivated. R has a steep learning curve. Find a problem you can’t solve otherwise, e.g. plotting multivariate data or a statistical analysis for which an R function already exists. Download and install R. Get to know the R console. Learn how to install additional packages, how to access the history, how to use auto-completion and how to open the help system. Review the R Installation and Administration manual and check out the free books section on CRAN.

ChainLadder 0.1.5-6 released on CRAN

Last week we released version 0.1.5-6 of the ChainLadder package on CRAN. The ChainLadder package provides statistical models which are typically used for the estimation of outstanding claims reserves in general insurance. The package vignette gives an overview of the package functionality. (Figure: output of plot(MackChainLadder(GenIns)).) Since the last CRAN release, Dan Murphy has added new features to the MackChainLadder function and we have fixed a bug in BootChainLadder. Here are the details:

Submit a talk for the first R in Insurance conference

The registration for the first R in Insurance is open and there is still time to submit a talk / lightning talk. The conference will take place at Cass Business School in London on Monday, 15 July 2013. This is the Monday following the useR! 2013 conference in Spain. Thus, if you come from overseas to Spain, why not stop in London on your way back? All further information and registration details are available on the Cass Business School conference site.

googleVis 0.4.2 with support for shiny released on CRAN

Version 0.4.2 of googleVis is now available via CRAN. Many thanks to all who provided feedback on version 0.4.0, particularly Sebastian Campbell, John Maindonald and Aonan Zhang. As usual, if you find any issues or bugs, please send us an email or add a line to our online issues log. With version 0.4.0 we introduced support for googleVis on shiny. See my previous post for more details and examples.

How to use optim in R

A friend of mine asked me the other day how she could use the function optim in R to fit data. Of course there are built-in functions for fitting data in R and I wrote about this earlier. However, she wanted to understand how to do this from scratch using optim. The function optim provides algorithms for general-purpose optimisations and the documentation is perfectly reasonable, but I remember that it took me a little while to get my head around how to pass data and parameters to optim.
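The key point is that optim passes any extra named arguments on to the objective function through its ... argument. This minimal sketch fits a straight line by minimising the residual sum of squares. The data are simulated, not my friend’s data, and the result is compared with lm as a sanity check. BFGS is one of several methods optim offers; Nelder-Mead is the default.

```r
set.seed(42)
# Hypothetical data from a straight line plus noise
x <- 1:20
y <- 2 + 3 * x + rnorm(20)

# Objective: residual sum of squares for intercept par[1] and slope par[2]
rss <- function(par, x, y) sum((y - (par[1] + par[2] * x))^2)

# Extra named arguments (x, y) are passed on to fn via optim's ... argument
fit <- optim(par = c(0, 1), fn = rss, x = x, y = y, method = "BFGS")
fit$par
coef(lm(y ~ x))   # closed-form least-squares fit for comparison
```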

Create an R package from a single R file with roxyPackage

Documenting code can be a bit of a pain. Yet the older (and wiser?) I get, the more I realise how important it is. When I was younger I said ‘documentation is for people without talent’. Well, I am clearly losing my talent, as I sometimes struggle to understand what I programmed years ago. Thus, anything that soothes the pain of writing and maintaining documentation must be good and should help me to better understand my ‘old me’ in the future.

First steps of using googleVis on shiny

The guys at RStudio have done a fantastic job with shiny. It is really easy to build web apps with R using shiny. With the help of Joe Cheng from RStudio we figured out a way to make googleVis work on shiny as well. This allows you to make use of the Google Charts Tools in your shiny app directly from R. What I present here are three initial examples which seem to work in most browsers.

Registration for 'R in Insurance' conference has opened

The registration for the first conference on R in Insurance on Monday 15 July 2013 at Cass Business School in London has opened. The intended audience of the conference includes both academics and practitioners who are active or interested in the applications of R in insurance. The 2013 R in Insurance conference builds upon the success of the R in Finance and R/Rmetrics events. We expect invited keynote lectures by:

New Data Scientist role at Lloyd's

Review: Kölner R Meeting 6 February 2013

The fourth Cologne R user meeting took place last Wednesday at the Institute of Sociology. Thanks to Bernd Weiß for hosting the event and to Revolution Analytics for their sponsorship. We had two fantastic talks, by Klaus Jacobi and Meik Michalke. Klaus talked about eliminating cloud pixels in satellite images via chronological interpolation, and Meik presented his new roxyPackage package, which makes it even easier to maintain R packages with roxygen2.

Next Kölner R User Meeting: 6 February 2013

Quick reminder: The next Cologne R user group meeting is scheduled for tomorrow, 6 February 2013. All details and the agenda are available on the KölnRUG Meetup site. Please sign up if you would like to come along. Notes from the last Cologne R user group meeting are available here. Thanks also to Revolution Analytics, who are sponsoring the Cologne R user group as part of their vector programme.

Follow the ants to richness

A friend of mine told me the secret of making money on the stock market. “It’s easy”, he said. All I would have to do is buy a big jar of ants. Then I should observe the ants’ movements on my kitchen table while following the stock market. I should keep the ants that walk in line with the stock market and remove those that don’t. Eventually I would have one ant left that had walked in line with the stock market all the way.
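A quick simulation shows why the last ant standing is no oracle. Assume the market and every ant move up or down at random each day. Then each ant matches the market on all of n days with probability 1/2^n. With 1024 ants and 10 days, one survivor is expected by pure chance. The number of ants, the number of days and the coin-flip model are all my assumptions for illustration.

```r
set.seed(2013)
n_ants <- 1024
n_days <- 10
# TRUE = up, FALSE = down; market and ants move at random
market <- sample(c(TRUE, FALSE), n_days, replace = TRUE)
ants   <- matrix(sample(c(TRUE, FALSE), n_ants * n_days, replace = TRUE),
                 nrow = n_ants, ncol = n_days)
# An ant survives only if it matched the market's direction on every day
matched   <- sweep(ants, 2, market, FUN = "==")
survivors <- sum(rowSums(matched) == n_days)
expected  <- n_ants / 2^n_days   # 1 surviving ant on average, by pure chance
```

Whatever survivors turns out to be for a given seed, a surviving ant’s next move is still a coin flip.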

Reserving based on log-incremental payments in R, part III

This is the third post about Christofides’ paper on Regression models based on log-incremental payments [1]. The first post covered the fundamentals of Christofides’ reserving model in sections A - F, the second focused on a more realistic example and model reduction of sections G - K. Today’s post will wrap up the paper with sections L - M and discuss data normalisation and claims inflation. I will use the same triangle of incremental claims data as introduced in my previous post.
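Part III deals with data normalisation and claims inflation. The sketch below shows one common normalisation, which is not necessarily the exact approach of Christofides’ sections L - M. It deflates hypothetical incremental payments to the money of the first origin year, using a made-up claims inflation rate of 3% per annum applied by calendar year of payment.

```r
# Hypothetical incremental payments for origin years 2001-2003
tri <- data.frame(
  origin = c(2001, 2002, 2003, 2001, 2002, 2001),
  dev    = c(1, 1, 1, 2, 2, 3),
  inc    = c(100, 110, 120, 60, 66, 20)
)
# Calendar year of payment = origin year + development period - 1
tri$cal <- tri$origin + tri$dev - 1
# Hypothetical claims inflation of 3% p.a., indexed to 1 in 2001
infl <- 0.03
tri$index <- (1 + infl)^(tri$cal - 2001)
# Express all payments in 2001 money before fitting a log-incremental model
tri$inc_2001 <- tri$inc / tri$index
```

On the log scale, dividing by the index is the same as subtracting (calendar year - 2001) * log(1.03), i.e. a linear calendar-year trend. This is why inflation assumptions and calendar-year parameters in a log-incremental model are so closely related.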

Reserving based on log-incremental payments in R, part II

Following on from last week’s post, I will continue to go through the paper Regression models based on log-incremental payments by Stavros Christofides [1]. In the previous post I introduced the model from the first 15 pages, up to section F. Today I will progress with sections G to K, which illustrate the model with a more realistic incremental claims payments triangle from a UK Motor Non-Comprehensive account, given on page D5.17 of the paper.

Reserving based on log-incremental payments in R, part I

A recent post on the PirateGrunt blog on claims reserving inspired me to look into the paper Regression models based on log-incremental payments by Stavros Christofides [1], published as part of the Claims Reserving Manual (Version 2) of the Institute of Actuaries. The paper is available together with a spreadsheet model illustrating the calculations. It is very much based on ideas by Barnett and Zehnwirth; see [2] for a reference.
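At its core the model regresses log incremental payments on an origin-year effect and a development-period effect, roughly log(P_ij) = a_i + d_j + e_ij. This minimal base-R sketch of that idea uses a simulated 4 x 4 triangle, not the data from the paper or the spreadsheet, and all parameter values are made up. Future cells are back-transformed with the lognormal mean correction exp(mu + sigma^2 / 2). Parameter uncertainty is ignored, so the reserve is a point estimate only.

```r
# Hypothetical 4 x 4 run-off triangle of incremental payments in long format
cells  <- expand.grid(origin = 1:4, dev = 1:4)
past   <- subset(cells, origin + dev <= 5)   # observed cells
future <- subset(cells, origin + dev > 5)    # cells to forecast

set.seed(123)
alpha <- c(8.0, 8.1, 8.3, 8.4)     # hypothetical origin-year levels (log scale)
delta <- c(0, -0.4, -1.0, -1.8)    # hypothetical development pattern (log scale)
past$inc <- exp(alpha[past$origin] + delta[past$dev] +
                rnorm(nrow(past), sd = 0.05))

# Treat origin and development period as factors with all levels declared
past$origin   <- factor(past$origin, levels = 1:4)
past$dev      <- factor(past$dev, levels = 1:4)
future$origin <- factor(future$origin, levels = 1:4)
future$dev    <- factor(future$dev, levels = 1:4)

# Regression on log-incremental payments: log(P_ij) = a_i + d_j + error
fit <- lm(log(inc) ~ origin + dev, data = past)

# Back-transform with the lognormal mean correction (ignores parameter uncertainty)
mu <- predict(fit, newdata = future)
s2 <- summary(fit)$sigma^2
future$inc <- exp(mu + s2 / 2)
reserve <- sum(future$inc)
```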

Clone all your gists locally with R

I really like gists as a quick way to include lengthy code snippets in my blog posts. However, I am not a git user as such, so I was quite concerned when I noticed that all my gists on this blog had vanished after Christmas. I suppose this was a result of GitHub’s downtime on December 22nd. Thankfully, an email to the support guys at GitHub resolved the issue within a few hours.

R in Insurance Conference, London, 15 July 2013

The first conference on R in Insurance will be held on Monday 15 July 2013 at Cass Business School in London, UK. The intended audience of the conference includes both academics and practitioners who are active or interested in the applications of R in insurance. This one-day conference will focus on applications in insurance and actuarial science that use R, the lingua franca for statistical computation. Topics covered may include actuarial statistics, capital modelling, pricing, reserving, reinsurance and extreme events, portfolio allocation, advanced risk tools, high-performance computing, econometrics and more.

Now I see it! K-means cluster analysis in R

Of course, a picture on a computer monitor is just a coloured plot of x and y coordinates, or pixels. Still, I was smitten by David Sparks’ posts on is.r(), where he shows how easy it is to read images into R to analyse them. In two posts [1], [2] he replicates functionality of image manipulation programmes like GIMP. I can’t resist writing about this here as well. David’s first post is about k-means cluster analysis.
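To show the idea without an image file, this sketch builds a hypothetical 30 x 30 "image" of three noisy colour bands as a table of RGB values per pixel. kmeans groups the pixel colours, and replacing each pixel by its cluster centre quantises the image to k colours. With a real picture you would first read the pixels into such a table, for example with readJPEG from the jpeg package. That step is not shown, and it is not necessarily how David did it.

```r
set.seed(1)
# Hypothetical 30 x 30 "image": three vertical colour bands plus noise
px   <- expand.grid(x = 1:30, y = 1:30)
band <- ifelse(px$x < 10, 1, ifelse(px$x < 20, 2, 3))
pure <- rbind(c(0.9, 0.1, 0.1),   # red
              c(0.1, 0.8, 0.1),   # green
              c(0.1, 0.1, 0.9))   # blue
rgb_vals <- pure[band, ] + matrix(rnorm(nrow(px) * 3, sd = 0.05), ncol = 3)
colnames(rgb_vals) <- c("R", "G", "B")

# Cluster the pixel colours into 3 groups
km <- kmeans(rgb_vals, centers = 3, nstart = 10)

# Replace each pixel's colour by its cluster centre (colour quantisation)
quantised <- km$centers[km$cluster, ]
clamp <- function(v) pmin(pmax(v, 0), 1)   # keep values in [0, 1] for rgb()
px$col <- rgb(clamp(quantised[, "R"]), clamp(quantised[, "G"]), clamp(quantised[, "B"]))
```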

Comparing regions: maps, cartograms and tree maps

Last week I attended a seminar where a talk was given about the economic opportunities in the SAAAME (South America, Asia, Africa and Middle East) regions. Of course a map was shown with those regions highlighted. The map was not that dissimilar to the one below.

```r
library(RColorBrewer)
library(rworldmap)
data(countryExData)
par(mai = c(0, 0, 0.2, 0), xaxs = "i", yaxs = "i")
mapByRegion(countryExData,
            nameDataColumn = "GDP_capita.MRYA",
            joinCode = "ISO3", nameJoinColumn = "ISO3V10",
            regionType = "Stern", mapTitle = " ",
            addLegend = FALSE, FUN = "mean",
            colourPalette = brewer.pal(6, "Blues"))
```

It is a map that most of us in the Northern Hemisphere see often.

Changing colours and legends in lattice plots

Lattice plots are a great way of displaying multivariate data in R. Deepayan Sarkar, the author of lattice, has written a fantastic book about Multivariate Data Visualization with R [1]. However, I often have to refer back to the help pages to remind myself how to set and change the legend and how to ensure that the legend uses the same colours as my plot. Thus, I thought I would write down an example for future reference.
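This minimal sketch uses the iris data shipped with R and colours I picked arbitrarily. The trick is to set the group colours via par.settings rather than via col=. auto.key builds the legend from the same superpose.symbol settings, so plot and legend stay in sync. Passing col= directly to xyplot changes only the points and leaves the legend in the default colours.

```r
library(lattice)
my.cols <- c("darkorange", "steelblue", "forestgreen")
p <- xyplot(Sepal.Length ~ Petal.Length, data = iris, groups = Species,
            # colours set here are used by both the panels and auto.key
            par.settings = list(superpose.symbol = list(col = my.cols, pch = 19)),
            auto.key = list(columns = 3))
```

Printing p draws the plot with a matching three-column legend.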

Data.table rocks! Data manipulation the fast way in R

I really should make a habit of using data.table. The speed and simplicity of this R package are astonishing. Here is a simple example. I have a data frame showing incremental claims development by line of business and origin year. Now I would like to add a column with the cumulative claims position for each line of business and each origin year along the development years. It’s one line with data.table.
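With data.table, that line would typically be written as dt[, cum := cumsum(inc), by = list(lob, origin)], with hypothetical column names. The sketch below does the same with base R’s ave on a small made-up data frame, for readers who want to compare without installing the package. It assumes the rows are sorted by development year within each group.

```r
# Hypothetical incremental claims by line of business (lob) and origin year,
# sorted by development year within each lob/origin combination
claims <- data.frame(
  lob    = rep(c("Motor", "Property"), each = 4),
  origin = rep(c(2011, 2011, 2012, 2012), times = 2),
  dev    = rep(1:2, times = 4),
  inc    = c(100, 50, 120, 60, 200, 80, 210, 90)
)
# Cumulative claims along the development years for each lob/origin group
claims$cum <- ave(claims$inc, claims$lob, claims$origin, FUN = cumsum)
claims
```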

Claims reserving in R: ChainLadder 0.1.5-4 released

Last week we released version 0.1.5-4 of the ChainLadder package on CRAN. The R package provides methods which are typically used in insurance claims reserving. If you are new to R or insurance, check out my recent talk on Using R in Insurance. The chain-ladder method, a popular method in the insurance industry to forecast future claims payments, gave the package its name. However, the ChainLadder package has many other reserving methods and models implemented as well, such as the bootstrap model demonstrated below.
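For readers new to the method, this base-R sketch shows the basic volume-weighted chain-ladder projection on a hypothetical 3 x 3 cumulative triangle. It illustrates the mechanics only. The package’s functions, such as chainladder and MackChainLadder, handle this and much more, including standard errors.

```r
# Hypothetical cumulative claims triangle (rows = origin years, cols = development)
tri <- matrix(c(100, 150, 165,
                110, 160,  NA,
                120,  NA,  NA), nrow = 3, byrow = TRUE)
n <- ncol(tri)

# Volume-weighted age-to-age factors, using origins observed at both k and k+1
f <- sapply(1:(n - 1), function(k) {
  rows <- 1:(n - k)
  sum(tri[rows, k + 1]) / sum(tri[rows, k])
})

# Fill the lower triangle by applying the factors successively
full <- tri
for (k in 1:(n - 1)) {
  miss <- is.na(full[, k + 1])
  full[miss, k + 1] <- full[miss, k] * f[k]
}

# Reserve = projected ultimate minus latest observed cumulative claims
latest  <- apply(tri, 1, function(r) tail(r[!is.na(r)], 1))
reserve <- full[, n] - latest
```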

Simulating neurons or how to solve delay differential equations in R

I discussed earlier how the action potential of a neuron can be modelled via the Hodgkin-Huxley equations. Here I will present a simple model that describes how action potentials can be generated and propagated across neurons. The tricky bit is that I use delay differential equations (DDEs) to take into account the propagation time of the signal across the network. My model is based on the paper: Epileptiform activity in a neocortical network: a mathematical model by F.
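What makes a DDE different is that the derivative depends on the solution at an earlier time. That value comes from a history function until the delay has elapsed, and from the solution computed so far afterwards. This sketch uses a fixed-step Euler scheme in base R on the toy equation dy/dt = -y(t - tau). It is not the neuron model from the paper. In practice, deSolve’s dede (together with lagvalue) offers a more accurate solver.

```r
# Fixed-step Euler scheme for the delay differential equation
#   dy/dt = -y(t - tau),  with y(t) = history(t) for t <= 0
dde_euler <- function(tau = 1, h = 0.01, t_end = 5, history = function(t) 1) {
  lag_steps <- round(tau / h)           # delay expressed in time steps
  n <- round(t_end / h)
  times <- seq(0, by = h, length.out = n + 1)
  y <- numeric(n + 1)
  y[1] <- history(0)
  for (i in seq_len(n)) {
    # y(t - tau): from the history before the delay has elapsed, else from the solution
    y_lag <- if (i <= lag_steps) history(times[i] - tau) else y[i - lag_steps]
    y[i + 1] <- y[i] - h * y_lag
  }
  data.frame(time = times, y = y)
}

sol <- dde_euler()
```

With a constant history of 1, the solution falls linearly to 0 at t = tau. After that, the delayed feedback makes it oscillate.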

googleVis 0.3.3 is released and on its way to CRAN

I am very grateful to all who provided feedback over the last two weeks and tested the previous versions 0.3.1 and 0.3.2, which were not released on CRAN. So, what changed since version 0.3.2? Not much. plot.gvis didn’t open a browser window when options(gvis.plot.tag) was not NULL and the user explicitly called plot.gvis with tag = NULL. Thanks to Sebastian Kranz for reporting this bug. Additionally, the vignette has been updated and includes an extended section on knitr.

googleVis 0.3.2 is released: Better integration with knitr

After last week’s kerfuffle I hope the roll-out of googleVis version 0.3.2 will be smooth. To test the water, I am releasing this version into the wild here, and if it doesn’t get shot down in the next few days, I shall try to upload it to CRAN. I am mindful of the CRAN policy, so please get in touch or add comments below if you find any show-stoppers.

googleVis 0.3.0/0.3.1 is released: It's faster!

Version 0.3.0 of the googleVis package for R was released on CRAN on 20 October 2012. With this version we have been able to speed up the code considerably. The transformation of R data frames into JSON works significantly faster. The execution of the gvisMotionChart function in the World Bank demo is over 35 times faster. Thanks to ideas by Wei Luo and in particular to Sebastian Kranz for providing the code.

Connecting the real world to R with an Arduino

If connecting data to the real world is the next sexy job, then how do I do this? And how do I connect the real world to R? It can be done, as Matt Shotwell showed at useR! 2011 with his home-made ECG and a patched version of R. However, there are other options as well, and here I will use an Arduino. The Arduino is an open-source electronics prototyping platform.

Next Kölner R User Meeting: 5 October 2012

The next Cologne R user group meeting is scheduled for 5 October 2012. All details and the agenda are available on the KölnRUG Meetup site. Please sign up if you would like to come along. Notes from the last Cologne R user group meeting are available here. Thanks also to Revolution Analytics, who are sponsoring the Cologne R user group as part of their vector programme.


Using R in Insurance, Presentation at GIRO 2012

Every year the UK’s general insurance actuarial community organises a big conference, which they call GIRO, short for General Insurance Research Organising committee. This year’s conference is in Brussels from 18 - 21 September 2012. Despite the fact that Brussels is actually in Belgium, the UK actuaries will travel all the way to enjoy good beer and great talks. On Wednesday morning I will run a session on Using R in insurance.

Interactive web graphs with R - Overview and googleVis tutorial

Today I feel very lucky, as I have been invited to the Royal Statistical Society conference to give a tutorial on interactive web graphs with R and googleVis. I prepared my slides with RStudio, knitr, pandoc and slidy, similar to my Cambridge R talk. You can access the RSS slides online here, and you can find the original R Markdown file on GitHub. You will notice some HTML code in the file, which I had to use to overcome gaps in my knowledge of Markdown, or its limitations.

Sigma motion visual illusion in R

Michael Bach, who is a professor and vision scientist at the University of Freiburg, maintains a fascinating site about visual illusions. One visual illusion really surprised me: the sigma motion. The sigma motion displays a flickering figure of black and white columns. Actually it is just a chart, as displayed below, with the columns changing backwards and forwards from black to white at a rate of about 30 transitions per second.

googleVis 0.2.17 is released: Displaying earthquake data

The next version of the googleVis package has been released on the project site and CRAN. This version provides updates to the package vignette and a new example for the gvisMerge function. The new sections of the vignette have been featured in more detail on this blog earlier: using googleVis with knitr, using Rook with googleVis, and using Reduce with gvisMerge to display several charts on a page.

London Olympics 100m men's sprint results

The 100m men’s sprint final of the 2012 London Olympics is over, and Usain Bolt won the gold medal again with a winning time of 9.63s. Time to compare the result with my forecast, posted on 22 July. My simple log-linear model predicted a winning time of 9.68s with a prediction interval from 9.39s to 9.97s. Well, that is of course a big interval of more than half a second, or ±3%.

Rook rocks! Example with googleVis

What is Rook? Rook is a web server interface for R, written by Jeffrey Horner, the author of rApache and brew. There are other web frameworks for R, such as brew, R.rsp (which I have used in the past), Rserve, gWidgetsWWW or sumo (which I haven’t used yet), but Rook appears incredibly lightweight. Rook doesn’t need any configuration. It is an R package, which works out of the box with the R HTTP server (R ≥ 2.

London Olympics and a prediction for the 100m final

It is less than a week before the 2012 Olympic Games start in London. No surprise, therefore, that the papers are all over it, including a lot of data and statistics around the games. In its recent issue and video, The Economist investigated the potential financial impact on sponsors (some benefits), tax payers (no benefits) and the athletes (if they are lucky). The Guardian has a whole series around the Olympics, including the data of all Summer Olympic Medallists since 1896.

Bridget Riley exhibition in London

The other day I saw a fantastic exhibition of work by Bridget Riley. Karsten Schubert, who is Riley’s main agent, has some of her most famous and influential artwork from 1960 - 1966 on display, including the seminal Moving Squares from 1961. (Photo: Moving Squares by Bridget Riley, 1961, emulsion on board, 123.2 x 121.3 cm.) In the 1960s Bridget Riley created some great black and white artwork. At first glance it may look simple and deterministic, or sometimes random, but it has fascinated me since I first saw some of her work about 9 years ago at the Tate Modern.

Review: Kölner R Meeting 6 July 2012

The second Cologne R user meeting took place last Friday, 6 July 2012, at the Institute of Sociology. Thanks to Bernd Weiß, who provided the meeting room, we didn’t have to worry about the infrastructure, like we did at our first gathering. Again, we had an interesting mix of people turning up, with a very diverse background from chemistry to geo-science, energy, finance, sociology, pharma, physics, psychology, mathematics, statistics, computer science, telco, etc.

Applying a function successively in R

At the R in Finance conference Paul Teetor gave a fantastic talk about Fast(er) R Code. Paul mentioned the common higher-order function Reduce, which I hadn’t used before. Reduce allows me to apply a function successively over a vector. What does that mean? Well, if I would like to add up the figures 1 to 5, I could say:

```r
add <- function(x, y) x + y
add(add(add(add(1, 2), 3), 4), 5)
# or, equivalently
Reduce(add, 1:5)
```

Now this might not sound exciting, but Reduce can be powerful.
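Two small examples hint at where Reduce becomes useful. The tiny data frames are made up for illustration. With accumulate = TRUE, Reduce returns all intermediate results, here a running sum. Folding merge over a list of data frames joins them all on a common key, which would otherwise need nested merge calls.

```r
add <- function(x, y) x + y
total   <- Reduce(add, 1:5)                     # 15
running <- Reduce(add, 1:5, accumulate = TRUE)  # 1 3 6 10 15

# Hypothetical data frames sharing an id column
df1 <- data.frame(id = 1:3, a = c(10, 20, 30))
df2 <- data.frame(id = 1:3, b = c("x", "y", "z"))
df3 <- data.frame(id = 2:3, c = c(TRUE, FALSE))

# Successive inner joins: merge(merge(df1, df2), df3)
merged <- Reduce(function(x, y) merge(x, y, by = "id"), list(df1, df2, df3))
merged
```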

Reminder: Next Kölner R User Meeting 6 July 2012

This post is a quick reminder that the next Cologne R user group meeting is only one week away. We will meet on 6 July 2012. The meeting will kick off at 18:00 with three short talks at the Institute of Sociology and will continue, even more informally, from 20:00 in a pub (LUX) nearby. All details are available on the KölnRUG Meetup site. Please sign up if you would like to come along.

Hodgkin-Huxley model in R

One of the great research papers of the 20th century celebrates its 60th anniversary in a few weeks’ time: A quantitative description of membrane current and its application to conduction and excitation in nerve by Alan Hodgkin and Andrew Huxley. This comes only shortly after Andrew Huxley died on 30 May 2012, aged 94. In 1952 Hodgkin and Huxley published a series of papers describing the basic processes underlying the nervous mechanisms of control and the communication between nerve cells. For this work they received the Nobel Prize in Physiology or Medicine in 1963, together with John Eccles.

Dynamical systems in R with simecol

This evening I will talk about Dynamical systems in R with simecol at the LondonR meeting. Thanks to the work of Thomas Petzoldt, Karsten Rinke, Karline Soetaert and R. Woodrow Setzer, it is really straightforward to model and analyse dynamical systems in R with their deSolve and simecol packages. I will give a brief overview of the functionality, using a predator-prey model as an example.
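To make the predator-prey example concrete without depending on deSolve or simecol, here is a self-contained sketch. It integrates the Lotka-Volterra equations with a hand-written fourth-order Runge-Kutta step in base R. The parameter values and starting populations are made up. The last lines check the quantity that is conserved along exact Lotka-Volterra trajectories, a handy test for any solver. deSolve’s ode would normally do the integration.

```r
# Lotka-Volterra predator-prey model:
#   d(prey)/dt = alpha * prey - beta * prey * predator
#   d(pred)/dt = delta * prey * predator - gamma * predator
lv_rhs <- function(s, p) {
  prey <- s[1]; pred <- s[2]
  c(p$alpha * prey - p$beta * prey * pred,
    p$delta * prey * pred - p$gamma * pred)
}

# One classical fourth-order Runge-Kutta step of size h
rk4_step <- function(s, h, p) {
  k1 <- lv_rhs(s, p)
  k2 <- lv_rhs(s + h / 2 * k1, p)
  k3 <- lv_rhs(s + h / 2 * k2, p)
  k4 <- lv_rhs(s + h * k3, p)
  s + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
}

p <- list(alpha = 1, beta = 0.1, gamma = 1.5, delta = 0.075)  # hypothetical
h <- 0.01
n_steps <- 2000                        # integrate over 20 time units
out <- matrix(NA_real_, nrow = n_steps + 1, ncol = 2,
              dimnames = list(NULL, c("prey", "predator")))
out[1, ] <- c(10, 5)
for (i in seq_len(n_steps)) out[i + 1, ] <- rk4_step(out[i, ], h, p)

# Quantity conserved along exact Lotka-Volterra trajectories
lv_invariant <- function(prey, pred, p)
  p$delta * prey - p$gamma * log(prey) + p$beta * pred - p$alpha * log(pred)
H <- lv_invariant(out[, "prey"], out[, "predator"], p)
```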

This is of course a repeat of my presentation given at the Köln R user group meeting in March.

Transforming subsets of data in R with by, ddply and data.table

Transforming data sets with R is usually the starting point of my data analysis work. Here is a scenario which comes up from time to time: transform subsets of a data frame, based on context given in one or a combination of columns. As an example, I use a data set which shows sales figures by product for a number of years:

```r
df <- data.frame(Product = gl(3, 10, labels = c("A", "B", "C")),
                 Year = factor(rep(2002:2011, 3)),
                 Sales = 1:30)
head(df)
#   Product Year Sales
# 1       A 2002     1
# 2       A 2003     2
# 3       A 2004     3
# 4       A 2005     4
# 5       A 2006     5
# 6       A 2007     6
```
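As one example of such a transformation, this sketch adds each year’s share of the product’s total sales. It uses base R’s by and redefines the sales data frame so the block runs on its own. The Share column is my illustrative choice. The same result is typically written as ddply(df, "Product", transform, Share = Sales / sum(Sales)) with plyr, or as DT[, Share := Sales / sum(Sales), by = Product] with data.table.

```r
df <- data.frame(Product = gl(3, 10, labels = c("A", "B", "C")),
                 Year = factor(rep(2002:2011, 3)),
                 Sales = 1:30)
# Split by Product, add each year's share of the product's total sales, recombine
res <- do.call(rbind, by(df, df$Product, function(d)
  transform(d, Share = Sales / sum(Sales))))
head(res)
```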

UK house prices visualised with googleVis-0.2.16

A new version of googleVis has been released on CRAN and the project site. Version 0.2.16 adds the functionality to plot quarterly and monthly data as a motion chart. To illustrate the new feature, I looked for a quarterly data set and stumbled across the quarterly UK house price data published by Nationwide, a building society. The data is available in a spreadsheet format and presents average house prices, as well as prices indexed to 100 in Q1 1993, by UK region from Q4 1973 to Q1 2012.

Interactive HTML presentation with R, googleVis, knitr, pandoc and slidy

Tonight I will give a talk about googleVis at the Cambridge R user group. Following my good experience with knitr and RStudio for creating interactive reports, I thought I should try to create the slides in the same way. Christopher Gandrud’s recent post reminded me of deck.js, a JavaScript library for interactive HTML slides, which I have used in the past. However, as Christopher found, it is currently not that straightforward to use with R and knitr.

End User Computing and why R can help meeting Solvency II

John D. Cook gave a great talk about ‘Why and how people use R’. The talk resonated with me and highlighted why R is such a great tool for end user computing, a topic that has become increasingly important in the European insurance industry. John’s main point on why people use R is that R gets the job done, and I think he is spot on. Of course, that is sometimes the trouble with R as well, or, to quote Bo again:

Interactive reports in R with knitr and RStudio

Last Saturday I met the guys from RStudio at the R in Finance conference in Chicago. I was curious to find out what RStudio could offer; in the past I have mostly used Emacs + ESS for editing R files. And what a surprise it was. JJ, Joe and Josh showed me a preview of version 0.96 of their software, which adds close integration of Sweave and knitr to RStudio, making it easier to create dynamic web reports with the new R Markdown and R HTML formats.

Waterfall charts in style of The Economist with R

Waterfall charts are sometimes quite helpful to illustrate the various moving parts in financial data, particularly when I have positive and negative values like a profit and loss statement (P&L). However, they can be a bit of a pain to produce in Excel. Not so in R, thanks to the waterfall package by James Howard. In combination with the latticeExtra package it is nearly a one-liner to produce a good looking waterfall chart that mimics the look of The Economist:
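The following is a dependency-free sketch of the underlying idea. It uses base graphics rather than the waterfall and latticeExtra packages. Every item is drawn as a bar floating between the running total before and after it, followed by a bar for the net result. The P&L figures and the helper name waterfall_data() are hypothetical, and the styling makes no attempt to mimic The Economist.

```r
# Hypothetical P&L figures (illustrative only)
pl <- data.frame(item  = c("Revenue", "Cost of sales", "Overheads", "Tax"),
                 value = c(100, -40, -25, -10))

# Each bar floats from the running total before the item to the total after it;
# a final "Net" bar runs from zero to the overall result
waterfall_data <- function(items, values) {
  end <- cumsum(values)
  start <- c(0, head(end, -1))
  data.frame(item  = c(items, "Net"),
             start = c(start, 0),
             end   = c(end, tail(end, 1)))
}

wf <- waterfall_data(pl$item, pl$value)
n <- nrow(wf)
cols <- ifelse(wf$end >= wf$start, "steelblue", "firebrick")  # up vs down moves
cols[n] <- "grey40"                                            # total bar

plot(NA, xlim = c(0.5, n + 0.5), ylim = range(0, wf$start, wf$end),
     xaxt = "n", xlab = "", ylab = "Amount")
rect(seq_len(n) - 0.4, wf$start, seq_len(n) + 0.4, wf$end, col = cols, border = NA)
axis(1, at = seq_len(n), labels = wf$item, cex.axis = 0.8)
abline(h = 0, col = "grey60")
```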

Next Kölner R User Meeting: 6 July 2012

Installing R packages without admin rights on MS Windows

It is not unusual not to have admin rights in an IT-controlled office environment. But then again, the limitations set by the IT department can spark some creativity, and I have to admit that I enjoy this kind of troubleshooting. The other day I ended up in front of a Windows PC with R installed, but with a locked-down “C:\Programme Files” folder. That meant that R couldn’t install any packages into the default directory “C:\Programme Files\R\R-X.
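A common workaround is a personal package library that R searches before the system library. The sketch below assumes only that some folder is writable for you. .libPaths(), install.packages(lib = ...) and library(lib.loc = ...) are standard base R. The folder location is a placeholder, and tempdir() is used only so that the sketch runs anywhere. To make the setting permanent, the environment variable R_LIBS_USER can point to that folder, for example via a .Renviron file in your home directory.

```r
# Placeholder for a folder you can write to; in practice pick a permanent
# location, e.g. a folder under your Documents directory
user_lib <- file.path(tempdir(), "Rlib")
dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)

# Put the writable folder first, so install.packages() and library()
# use it before the locked-down system library
.libPaths(c(user_lib, .libPaths()))
.libPaths()

# Install into and load from the user library (needs internet access):
# install.packages("zoo", lib = user_lib)
# library(zoo, lib.loc = user_lib)
```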

Sweeping through data in R

How do you apply one particular row of your data to all other rows? Today I came across a data set which showed the revenue split by product and location. The data was formatted to show only the split by product for each location and the overall split by location, similar to the example in the table below.

Revenue by product and continent

         Africa  America  Asia  Australia  Europe
A           40%      30%   50%        40%     40%
B           20%      40%   20%        30%     40%
C           40%      30%   30%        30%     20%
Total       10%      40%   20%        10%     20%

I wanted to understand the revenue split by product and location combined, i.e. each product’s share of total revenue in each location.
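sweep() does exactly this kind of row-to-all-rows operation. The sketch below types in the percentages from the table as proportions. It then multiplies each continent's column of product shares by that continent's share of total revenue. The resulting joint split should sum to 1 across all products and continents.

```r
# Product split within each continent (columns sum to 1) and, in the last
# row, each continent's share of total revenue
rev_split <- matrix(c(0.40, 0.30, 0.50, 0.40, 0.40,
                      0.20, 0.40, 0.20, 0.30, 0.40,
                      0.40, 0.30, 0.30, 0.30, 0.20,
                      0.10, 0.40, 0.20, 0.10, 0.20),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(c("A", "B", "C", "Total"),
                                    c("Africa", "America", "Asia", "Australia", "Europe")))

# sweep() with MARGIN = 2 multiplies every column of the product rows
# by the matching entry of the "Total" row
joint <- sweep(rev_split[c("A", "B", "C"), ], 2, rev_split["Total", ], "*")
round(joint, 2)
sum(joint)  # should be 1
```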

Review: Kölner R Meeting 30 March 2012

The first Kölner R user meeting was great fun. About 20 useRs had turned up to exchange their ideas, questions and experience with R. Three talks, about R & Excel, ggplot2 & XeLaTeX, and Dynamical systems with R & simecol, kicked off the evening, with Kölsch (beer) loosening our tongues further. Thankfully a lot of people had brought along their laptops, as unfortunately we lacked a cable to connect any of the computers to the installed projector.

Reminder: Kölner R User Group meets on 30 March 2012

Copy and paste small data sets into R

How can I embed a small data set into my R code? That was the question I came across today, when I prepared my talk about Dynamical Systems in R with simecol for the forthcoming Cologne R user group meeting. I wanted to add all the R code of the talk to the last slide. That’s easy, but the presentation makes use of a small data set of 3 columns and 21 rows.
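Two base R approaches are sketched below with a made-up three-column data set. read.table() accepts the data as a string through its text argument. dput() prints R code that recreates an existing object, and that code can then be pasted into a script or onto a slide.

```r
# Small data set embedded directly in the script (values are made up)
dat <- read.table(header = TRUE, text = "
time prey predator
0    10   5
1    12   5.5
2    15   6.2
")
str(dat)

# dput() prints R code that recreates the object, ready to copy and paste
dput(dat)
```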

Logistic map: Feigenbaum diagram in R

The other day I found some old BASIC code I had written about 15 years ago on a Mac Classic II to plot the Feigenbaum diagram for the logistic map. I remember that it took the little computer the whole night to produce the bifurcation chart. With today’s computers even a for-loop in a scripting language like R takes only a few seconds.

logistic.map <- function(r, x, N, M){
  ## r: bifurcation parameter
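The excerpt breaks off after the function header, so the following is only a sketch of how such an iteration could be written, not the original function. Instead of looping over r, it iterates all r values at once (vectorised). It discards the first n - m iterates as transients and keeps the last m for the diagram. The name logistic_orbit() and the defaults for x0, n and m are assumptions.

```r
# Iterate x[i+1] = r * x[i] * (1 - x[i]) for a whole vector of r values at once
# and keep the last m iterates for each r (the first n - m are transients)
logistic_orbit <- function(r, x0 = 0.5, n = 1000, m = 100) {
  x <- rep(x0, length(r))
  keep <- matrix(NA_real_, nrow = m, ncol = length(r))
  for (i in seq_len(n)) {
    x <- r * x * (1 - x)
    if (i > n - m) keep[i - (n - m), ] <- x
  }
  keep  # column j holds the last m iterates for r[j]
}

r <- seq(2.5, 4, length.out = 600)
orbit <- logistic_orbit(r)

# Feigenbaum (bifurcation) diagram: plot each column of orbit above its r value
plot(rep(r, each = nrow(orbit)), as.vector(orbit), pch = ".",
     xlab = "r", ylab = "x")
```

For r below 3 the iterates settle on the fixed point 1 - 1/r. Just above 3 they alternate between two values, which is the first period doubling visible in the diagram.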

Changes in life expectancy animated with geo charts

The data of the World Bank is absolutely amazing. I have said this before, but their updated iPhone App gives me a reason to return to this topic. Version 3 of the DataFinder App allows you to visualise the data on your phone, including motion maps; see the screen shot below.

Screen shot of DataFinder 3.0

I was intrigued by the changes in life expectancy over time around the world.

googleVis 0.2.15 is released: Improved geo and bubble charts

The guys behind the Google Visualisation API don’t seem to rest. On 22 February 2012 they released an update of their API. Google added options for a gradient colour axis to the bubble chart and a magnifying glass to the geo chart, which opens when the user hovers over cluttered markers (not available in Internet Explorer 8 and earlier). Those updates have been incorporated into version 0.2.15 of the googleVis package for R.

Examples of new features

Here are two examples demonstrating the new features.

Kölner R User Meeting 30 March 2012

I would like to organise the first R user group meeting in Cologne, Germany, on 30 March 2012. In the past few years I have attended the London R user group meetings, and I hope to find like-minded people in Cologne too, who would like to catch up over a Kölsch on R and life in general.

Big data seminar in London on 1 March 2012

David Chan from City University is organising an interdisciplinary symposium on tackling the ‘Big Data’ challenge on 1 March 2012. It is an open seminar trying to bring together academics and practitioners from across industry to tackle the challenges posed by “big data” - the growing amount of information that needs to be stored, searched, analysed and visualised in the digital age. The event will take place in the Oliver Thompson Lecture Theatre, Northampton Square, London EC1V 0HB.

Reshaping the IT world

During my university time I worked on the IT help desk for a while. One day I received a call from a professor, who said that his printer had stopped working. So I asked him if there was a message on the display and if he could read it to me. “Oh yes”, he said, “it says: ‘Load A4 paper.’” On ZDNet, Rachel King quotes a Cisco study which claims that college students and young employees under the age of 30 would rather accept a lower salary than give up social media freedom, device flexibility and work mobility.

The reshape function

The other day I wrote about the R functions by, apply and friends, which allow me to operate on subsets of data. All those functions work nicely if the data is given in the right format. More often than not it isn’t, and I have to reshape the data beforehand. So it is time to discuss the reshape function. I will focus on the reshape function in base R, not the package of the same name.
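As a small illustration, the sketch below uses base R's reshape() to turn a hypothetical wide table, with one sales column per year, into long format and back again. Passing varying as a list together with v.names, timevar and times spells out the mapping explicitly, instead of relying on reshape() to guess it from the column names.

```r
# Hypothetical sales data in "wide" format: one column per year
wide <- data.frame(Product = c("A", "B"),
                   Sales.2010 = c(10, 20),
                   Sales.2011 = c(12, 25))

# Wide to long: one row per Product and Year
long <- reshape(wide, direction = "long",
                varying = list(c("Sales.2010", "Sales.2011")),
                v.names = "Sales", timevar = "Year", times = c(2010, 2011),
                idvar = "Product")
long

# reshape() stores the information needed to undo the operation as an
# attribute, so going back to wide format needs no further arguments
back <- reshape(long, direction = "wide")
back
```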

googleVis 0.2.14 is released

Version 0.2.14 of the googleVis package was released on CRAN today.

Changes

The help files have been checked against changes in the Google Visualisation API, typos in the vignette have been ironed out (thanks to Pat Burns for pointing them out), a new section on dealing with apostrophes in column names has been added, and the example in the section “Setting options” has been reviewed. For more details and demos check out the project site.

R is the easiest language to speak badly

I am amazed by the number of comments I received on my recent blog entry about “by”, “apply” and friends. I had started my post by pointing out that R is a language. Well, indeed, I have come to the conclusion that it is a language with lots of irregular expressions and dialects. It feels a bit like German or French, where you have to learn and memorise the different articles.

Say it in R with "by", "apply" and friends

R is a language, as Luis Apiolaza pointed out in his recent post. This is absolutely true, and learning a programming language is not much different from learning a foreign language: it takes time and a lot of practice to become proficient. I started using R when I moved to the UK, and I wonder whether I have a better understanding of English or of R by now. Languages are full of surprises, in particular for non-native speakers.
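To illustrate the point about dialects, here is the same question, total sales by product, asked in four base R ways on made-up sales data. Each function returns a differently shaped object (array, data frame, "by" object, named vector), which is exactly the kind of irregularity a learner has to memorise.

```r
df <- data.frame(Product = gl(3, 10, labels = c("A", "B", "C")),
                 Sales = 1:30)

# The same question, total sales by product, in four base R "dialects"
t1 <- tapply(df$Sales, df$Product, sum)                  # named 1-d array
t2 <- aggregate(Sales ~ Product, data = df, FUN = sum)    # data frame
t3 <- by(df, df$Product, function(d) sum(d$Sales))       # object of class "by"
t4 <- sapply(split(df$Sales, df$Product), sum)           # named vector

t1; t2; t3; t4
```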

Credit rating by country

The financial crisis has put a lot of pressure on countries’ long-term foreign currency credit ratings, with France recently being downgraded by S&P. Wikipedia provides a list of countries by credit rating as reported by the US rating agencies S&P, Fitch and Moody’s, and by Dagong, a Chinese rating agency. So, what does the world look like today through the eyes of those rating agencies? I use the R packages XML and googleVis to read and display the data from Wikipedia with just a few lines of code.

Managing change

Why the old and the new need to share time together

It takes time to appreciate the new, even if the new is much better than the old. It is easy to forget when you yourself created the exciting new. At the end of August 2011 Google announced a new Blogger interface. The new interface offered about the same functionality, but had a different look and feel. At first I was reluctant to use it.

Feedback from vignette survey

Many thanks to all who participated in the survey about writing R package vignettes. Following my post last Thursday, the responses came in quickly in the evening and all day on Friday. Since Saturday the response rate has been decreasing steadily, and I think it is time for a summary based on the 56 responses received.

Summary - How to write a good vignette

Length: Trust yourself, but aim for about 20 pages.

Survey: Writing package vignette

I am currently co-writing the vignette for the ChainLadder package and wonder what I should be focusing on. I co-wrote the vignette of the googleVis package in the past and based it purely on what I thought would work. So this is an experiment to find out whether user feedback will help me to write a better vignette. Let’s see how it develops. I will make the data available once I have at least 10 submissions.

Is R turning into an operating system?

Over the years I have convinced my colleagues and the IT guys that LaTeX/XeLaTeX is the way forward for producing lots of customer reports with individual data, charts, analysis and text. Success! But of course the operating system in the office is still MS Windows. With my background in Solaris/Linux/Mac OS X I am still a little bit lost in the Windows world when I have to do such simple tasks as finding and replacing a string in lots of files.
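For what it's worth, that particular task can be done from within R itself. The following is a minimal base R sketch. The helper name replace_in_files(), the default .tex file filter and the demo files are hypothetical. It treats the search string literally (fixed = TRUE) and uses the platform's default file encoding, so back up the files before running something like this on real reports.

```r
# Replace a literal string in every matching file below a folder (base R only).
# Returns the paths of the files that were actually changed.
replace_in_files <- function(dir, pattern, replacement, file_pattern = "\\.tex$") {
  files <- list.files(dir, pattern = file_pattern, full.names = TRUE, recursive = TRUE)
  changed <- character(0)
  for (f in files) {
    txt <- readLines(f, warn = FALSE)
    new <- gsub(pattern, replacement, txt, fixed = TRUE)
    if (!identical(txt, new)) {
      writeLines(new, f)
      changed <- c(changed, f)
    }
  }
  changed
}

# Demo on two throw-away files in a temporary folder
tmp <- file.path(tempdir(), "reports")
dir.create(tmp, showWarnings = FALSE)
writeLines(c("Dear customer,", "Your 2010 report."), file.path(tmp, "a.tex"))
writeLines("Nothing to change here.", file.path(tmp, "b.tex"))

changed <- replace_in_files(tmp, "2010", "2011")
changed
```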

googleVis 0.2.13: new stepped area chart and improved geo charts

On 7th December Google published a new version of their Visualisation API. The new version adds a new chart type: Stepped Area Chart and provides improvements to Geo Chart. Now Geo Chart has similar functionality to Geo Map, but while Geo Map requires Flash, Geo Chart doesn’t, as it renders SVG/VML graphics. So it also works on your iOS devices. These new features have been added to the googleVis R package in version 0.

Data is the new gold

We need more data journalism. How else will we find the nuggets of data and information worth reading? Life should become easier for data journalists, as the Guardian, one of the data journalism pioneers, points out in this article about the new open data initiative of the European Union (EU). The aims of the EU’s open data strategy are bold. Data is seen as the new gold of the digital age.

LondonR, 6 December 2011

The London R user group met again last Wednesday at the Shooting Star pub, and it was busy: more than 80 people had turned up. Was it the free beer and food, sponsored by Mango, that attracted the folks, or the speakers? Or the venue? James Long, who organises the Chicago R user group meetings and who gave the first talk that night, noted that, to his knowledge, only the London and Chicago R users meet in a pub.

Fitting distributions with R

Fitting distributions with R is something I have to do once in a while, but where do I start? A good starting point to learn more about distribution fitting with R is Vito Ricci’s tutorial on CRAN. I also find the vignettes of the actuar and fitdistrplus packages a good read. I haven’t looked into the recently published Handbook of fitting statistical distributions with R by Z. Karian and E.J. Dudewicz, but it might be worthwhile in certain cases; see Xi’An’s review.
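For a quick first fit, MASS::fitdistr() estimates parameters by maximum likelihood. MASS is a recommended package that ships with R. The sketch below fits a lognormal and a gamma distribution to simulated data and compares them by AIC. The simulated sample stands in for real data and is an assumption of the example.

```r
library(MASS)  # recommended package shipped with R; provides fitdistr()

set.seed(1)
x <- rlnorm(2000, meanlog = 1, sdlog = 0.5)  # simulated stand-in for real data

fit_ln <- fitdistr(x, "lognormal")  # closed-form maximum likelihood estimates
fit_g  <- fitdistr(x, "gamma")      # numerical maximum likelihood (optim)

fit_ln$estimate
fit_g$estimate
AIC(fit_ln, fit_g)                   # lower AIC indicates the better fit
```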

Interactive presentations with deck.js

Data analysis is often an iterative and interactive process. However, when I give presentations on this subject, I often feel limited by the presentation software I use. It doesn’t matter whether I use LaTeX/PDF, PowerPoint or Keynote: in all cases it is either very difficult or impossible to include interactive charts, such as Flash or SVG charts. As a result I have to switch between various applications during the talk. This can be fun, but quite often it is not.

Stochastic reserving with R: ChainLadder 0.1.5-1 released

Today we published version 0.1.5-1 of the ChainLadder package for R. It provides methods which are typically used in insurance claims reserving to forecast future claims payments.

Claims development and chain-ladder forecast of the RAA data set using the Mack method

The package started out of presentations given at the Stochastic Reserving Seminar at the Institute of Actuaries in 2007, 2008 and 2010, followed by talks at CAS meetings in 2008 and 2010.
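To show what the basic (deterministic) chain-ladder method does, here is a small base R sketch that applies volume-weighted development factors to a made-up cumulative triangle. It is not the ChainLadder package's implementation and does not include the Mack standard errors. In the package itself, MackChainLadder(RAA) would be the starting point. The helper name chain_ladder() is hypothetical.

```r
# Made-up cumulative claims triangle: rows = accident years, columns = development years
tri <- matrix(c(100, 150, 175,
                110, 168,  NA,
                120,  NA,  NA), nrow = 3, byrow = TRUE)

chain_ladder <- function(tri) {
  n <- ncol(tri)
  # Volume-weighted development factors: use only rows where both columns are observed
  f <- sapply(seq_len(n - 1), function(j) {
    rows <- which(!is.na(tri[, j + 1]))
    sum(tri[rows, j + 1]) / sum(tri[rows, j])
  })
  # Fill the lower triangle column by column with the factors
  full <- tri
  for (j in seq_len(n - 1)) {
    miss <- is.na(full[, j + 1])
    full[miss, j + 1] <- full[miss, j] * f[j]
  }
  # Latest observed diagonal and reserve = projected ultimate - latest
  latest <- apply(tri, 1, function(r) tail(r[!is.na(r)], 1))
  list(factors = f, full = full, reserve = full[, n] - latest)
}

res <- chain_ladder(tri)
res$factors
res$reserve
```

Here f1 = (150 + 168) / (100 + 110) and f2 = 175 / 150, which gives reserves of 0, 28 and 92 for the three accident years.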

Installing R 2.14.0 on an iBook G4 running Mac OS 10.4.11

My 12” iBook G4 is celebrating its 8th birthday today! Time for a little present. How about R 2.14.0? The iBook is still in daily use, mostly for browsing the web, writing e-mails and this blog, and I still use it for R as well. For a long time it ran R 2.10.1, the last PowerPC binary version available on CRAN for Mac OS 10.4.11 (Tiger). But R 2.10.1 is a bit dated by now, and for the development of my googleVis package I require at least R 2.

Using Sweave with XeLaTeX

Using R with LaTeX via Sweave is a great way to create reproducible output. However, using specific fonts, e.g. your corporate fonts, can be painful with pdflatex. Over the last few weeks I have fallen in love with the TeX format XeLaTeX and its XeTeX engine. With XeLaTeX I had to overcome some hurdles, which I would like to share here: attaching files, trimming and clipping images, learning how to use the tikzDevice package.

R related books: Traditional vs online publishing

How many R related books have been published so far? Who is the most popular publisher? How many other manuals, tutorials and books have been published online? Let’s find out. A few years ago I used the publication list on r-project.org as an argument with the IT department that R is an established statistical programming language and that they should allow me to install it on my PC. I believe at the time there were about 20 R related books available.

Setting the initial view of a motion chart in R

Following on from my article about accessing and plotting World Bank data with R, I want to talk about how to change the initial view of a motion chart. Over the last couple of weeks I have been asked a few times how to do this. For instance, Stephen O’Grady wanted to create a motion chart which initially shows a line chart rather than a bubble chart. Changing the initial settings of a motion chart is actually quite easy once you know how.

Accessing and plotting World Bank data with R

Over the past couple of days I have played around with the data sets of the World Bank, and I have to admit that I am blown away by them. It is amazing to see what is available on their web site, and it is worth visiting their Data Visualisation Tools page. It is fantastic that they provide an API to their data; they have used it to build an iPhone App, which is pretty cool.

googleVis 0.2.9

Today we published googleVis 0.2.9 on CRAN. The new version updates the package for the new features of the Google Visualisation API and brings a new in-page editor option. Here is a simple example displaying the participants of the useR! 2011 conference in Warwick by country. Notice the ‘Edit me’ button in the top left corner of the chart, which allows you to change and customise the graph.

library(XML)
url <- "http://www.