Soapbox

MacBook Air battery replacement

After four years of daily use our MacBook Air informed us that it needed a battery replacement. That’s kind of nice to know, in particular as it still feels speedy and otherwise just works. A new battery isn’t that expensive and, according to iFixit, it appeared to be quite easy to replace. I needn’t have worried; it was actually super simple, given the appropriate tools: remove the 10 screws from the bottom case, open the case, disconnect the battery, remove the 5 screws from the battery, swap the battery, and reassemble everything. Job done.

It is the small data that matters the most

Everyone is talking about Big Data, but it is the small data that is holding everything together. The small, slowly changing reference tables are the linchpins. Unfortunately, too often politics gets in the way, as those small tables, maintained by humans, don’t get the attention they deserve; or, in other words, their owners (if they exist at all, as many of these little tables are orphans) make changes without understanding the potential consequences for downstream systems.

Approximating the impact of inflation

The other day someone mentioned to me a rule of thumb that he was using to estimate the number of years $n$ it would take for inflation to destroy half of the purchasing power of today’s money: $$ n = \frac{70}{p}$$ Here $p$ is the inflation rate in percent, e.g. if the inflation rate is $2\%$ then today’s money would buy only half of today’s goods and services in 35 years. You can also think of a savings account with an interest rate of $2\%$ that would double your money in 35 years.
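
Where the 70 comes from can be sketched in a few lines, assuming a constant inflation rate of $p$ percent that compounds once a year (an assumption made here; the rule of thumb itself does not say how inflation compounds). Purchasing power has halved once prices have doubled:

$$ \left(1 + \frac{p}{100}\right)^n = 2 \quad\Longrightarrow\quad n = \frac{\ln 2}{\ln\left(1 + \frac{p}{100}\right)} $$

For small $p$ we have $\ln\left(1 + \frac{p}{100}\right) \approx \frac{p}{100}$, so

$$ n \approx \frac{100 \ln 2}{p} \approx \frac{69.3}{p} $$

which is conveniently rounded to $70/p$. For $p = 2$ the exact expression gives $n = \ln 2 / \ln 1.02 \approx 35.0$, in line with the 35 years quoted above.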

The Wiener takes it all? A review of the 2014 Eurovision results

Saturday’s Eurovision Song Contest (ESC) from Copenhagen was hilarious as usual with acts from all over Europe and some more or less sensible gimmicks: a circular piano, a giant hamster wheel, a see-saw, or indeed a beard and fancy dress. The results of the ESC were only a little different to what the bookmakers in the UK had predicted before the event started. Sweden was seen as the favourite, followed by Austria, the Netherlands, Armenia and the UK.

How many more R-bloggers posts can I expect?

I noticed that the monthly number of posts on R-bloggers stopped increasing over the last year. Indeed, the last couple of months saw a decline in posts compared to the previous year. Thus, has most been said and written about R already? Who knows? Well, I took a stab at looking into the future. However, I can tell you already that I am not convinced by my predictions. But maybe someone else will be inspired to take this work forward.

Whale charts - Visualising customer profitability

The Christmas and New Year’s break is over, yet there is still time to return unwanted presents. Return to Santa was the title of an article in the Economist that highlighted the impact on online retailers, as return rates can be alarmingly high. The article quotes a study by Christian Schulze of the Frankfurt School of Finance and Management, which analyses the return habits of customers who bought at least five items over a five-year period from a large European online retailer.

Why models need a certain culture to flourish

About half a year ago Ian Branagan, Chief Risk Officer of Renaissance Re, a Bermudian reinsurance company with a focus on property catastrophe insurance, gave a talk about the use of models in risk management and how they have evolved over the last twenty years. Ian’s presentation, titled with the famous quote by George E.P. Box, “All models are wrong, but some are useful”, was part of the lunchtime lecture series at Lloyd’s, organised by the Insurance Institute of London.

Installing a SSD drive into a mid-2007 iMac

I have a mid-2007 iMac with a 2.4 GHz Core2Duo processor and despite the fact that it is already six years old, it still does a good job. However, compared to a friend’s recent MacBook Air with a solid state disk (SSD) it felt sluggish when opening programmes and loading larger documents. So, I thought it would be worthwhile to replace the old spinning hard disk drive with an SSD, instead of buying a new computer.

Don't be misguided by the beauty of mathematics, if the data tells you otherwise

I was trained as a mathematician and it was only last year, when I attended the Royal Statistical Society conference and met many statisticians, that I understood how different the two groups are. In mathematics you often start with some axioms, things you assume to be true, and these axioms are then the basis from which new theory is derived. In statistics or, more generally, in science you start with a theory, or better a hypothesis, and try to disprove it.

Test Driven Analysis?

At the last LondonR meeting Francine Bennett from Mastodon C shared some of her experience and findings from an analysis of a large prescriptions data set of the UK’s National Health Service (NHS). However, it was her last slide that I found the most thought-provoking. It asked for the definition of the following term: Test-driven analysis? Francine explained that test-driven development (TDD) is a concept often used in software development for quality assurance, and she wondered if a similar approach could also be used for data analysis.
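
One possible reading of the idea, sketched in R: write down what the data must look like as executable checks before any analysis runs. The function name `check_prescriptions()`, the toy data and the column names `practice`, `items` and `cost` are all made up for illustration; they are not taken from Francine's talk or from the NHS data set.

```r
# Hypothetical expectations about a prescriptions table, written as tests.
check_prescriptions <- function(df) {
  stopifnot(
    is.data.frame(df),
    all(c("practice", "items", "cost") %in% names(df)),  # required columns
    !anyNA(df$items),                                     # no missing counts
    all(df$items >= 0),                                   # counts cannot be negative
    all(df$cost >= 0)                                     # costs cannot be negative
  )
  invisible(TRUE)
}

# Toy data standing in for the real data set
toy <- data.frame(practice = c("A", "B"),
                  items    = c(10, 3),
                  cost     = c(25.5, 7.2))

check_prescriptions(toy)     # stops with an error if an expectation fails
total_cost <- sum(toy$cost)  # the analysis only runs on data that passed
```

As in test-driven development, the checks come first and the analysis is only trusted once they pass.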

Next Kölner R User Meeting: 12 April 2013

Quick reminder: The next Cologne R user group meeting is scheduled for this Friday, 12 April 2013. We will discuss cluster analysis and shiny. Further details and the agenda are available on our KölnRUG Meetup site. Please sign up if you would like to come along. Notes from the last Cologne R user group meeting are available here.

Thanks also to Revolution Analytics, who sponsor the Cologne R user group as part of their vector programme.

Top 10 tips to get started with R

Be motivated. R has a steep learning curve. Find a problem you can’t solve otherwise. E.g. plotting multivariate data, a statistical analysis for which an R function exists already. Download and install R. Get to know the R console. Learn how to install additional packages, how to access the history, how to use auto completion and open the help system. Review the R Installation and Administration manual and check out the free books section on CRAN.

Next Kölner R User Meeting: 6 February 2013

Quick reminder: The next Cologne R user group meeting is scheduled for tomorrow, 6 February 2013. All details and the agenda are available on the KölnRUG Meetup site. Please sign up if you would like to come along. Notes from the last Cologne R user group meeting are available here. Thanks also to Revolution Analytics, who are sponsoring the Cologne R user group as part of their vector programme.

Follow the ants to richness

A friend of mine told me the secret of making money at the stock market. “It’s easy”, he said. All I would have to do is to buy a big jar of ants. Then I should observe the ants’ movement on my kitchen table, while following the stock market. I shall keep the ants which walk in line with the stock market and remove those which don’t. Eventually I would have one ant left that walked all the way in line with the stock market.
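
The selection procedure can be simulated in a few lines of R. This is only a sketch under made-up assumptions: every ant and the market move up or down at random each day, independently of each other, and the numbers of ants and days and the seed are arbitrary.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
n_ants <- 1024
n_days <- 10

# +1 = up, -1 = down; ants and market are independent coin flips
market <- sample(c(-1, 1), n_days, replace = TRUE)
ants   <- matrix(sample(c(-1, 1), n_ants * n_days, replace = TRUE),
                 nrow = n_ants)

# matches[i, d] is TRUE if ant i moved like the market on day d
matches <- sweep(ants, 2, market, "==")

# alive[i, d] is TRUE if ant i matched the market on every day up to d
alive <- t(apply(matches, 1, cumprod)) == 1

# Number of ants still on the table after each day
survivors <- colSums(alive)
survivors
```

On average half of the remaining ants drop out each day, so roughly `n_ants / 2^n_days` are left at the end, whatever the market did. A surviving ant has a perfect track record without knowing anything about the market.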

Comparing regions: maps, cartograms and tree maps

Last week I attended a seminar where a talk was given about the economic opportunities in the SAAAME (South-America, Asia, Africa and Middle East) regions. Of course a map was shown with those regions highlighted. The map was not that dissimilar to the one below.

    library(RColorBrewer)  # brewer.pal() colour palettes
    library(rworldmap)     # mapByRegion() and the countryExData example data
    data(countryExData)
    # Remove the plot margins, leaving a little room at the top for the title
    par(mai=c(0,0,0.2,0), xaxs="i", yaxs="i")
    # Mean GDP per capita by Stern region, shaded in six blues
    mapByRegion(countryExData,
                nameDataColumn="GDP_capita.MRYA",
                joinCode="ISO3",
                nameJoinColumn="ISO3V10",
                regionType="Stern",
                mapTitle=" ",
                addLegend=FALSE,
                FUN="mean",
                colourPalette=brewer.pal(6, "Blues"))

It is a map that most of us in the Northern hemisphere see often.

Time for an old classic game: Moon-buggy

I discovered an old classic game of mine again: Moon-buggy by Jochen Voss, based on the even older Moon Patrol, which celebrates its 30th birthday this year. I remember installing the command line game on my Sun SPARCstation 1 computer at university many moons ago. Hours of fun! Well, wasted time actually. Never mind, I am delighted to have found it again. You can’t beat command line games. One day I shall try to control the moon buggy with my Arduino.

From guts to data driven decision making

Source: Wikipedia, License: CC0.

There is a wonderful cartoon by Loriot, a German humorist (1923 - 2011), about a couple sitting at a breakfast table, arguing about how to boil a four-and-a-half minute egg. The answer appears simple, but husband and wife argue about how to measure the time using experience, feelings and expert judgment (wife) or a clock (husband). The whole sketch is hilarious and is often regarded as a fine observation of miscommunication.

Next Kölner R User Meeting: 5 October 2012

The next Cologne R user group meeting is scheduled for 5 October 2012. All details and the agenda are available on the KölnRUG Meetup site. Please sign up if you would like to come along. Notes from the last Cologne R user group meeting are available here. Thanks also to Revolution Analytics, who are sponsoring the Cologne R user group as part of their vector programme.

Connecting data to the real world - The next sexy job?

At last week’s Royal Statistical Society (RSS) conference Hal Varian, Chief Economist at Google, gave a panel talk about ‘Statistics at Google’. Could he get a better audience than the RSS? Hal talked about his career in academia and at Google. He reminded us of the days when Google was still a small start-up with no real idea about how it could actually generate revenue. At that time Eric Schmidt asked him to ‘take a look’ at advertising because ‘it might make us a little money’.

Are career motivations changing?

The German news magazine Der Spiegel published a series of articles [1, 2] around career developments. The stories suggest that career aspirations of young professionals today are somewhat different to those of previous generations in Germany. Apparently money and people management responsibility are less desirable for new starters compared to being able to participate in interesting projects and to maintain a healthy work-life balance. Hierarchies are seen as a means to an end, and should be more flexible, depending on requirements and skill sets.

Sigma motion visual illusion in R

Michael Bach, who is a professor and vision scientist at the University of Freiburg, maintains a fascinating site about visual illusions. One visual illusion really surprised me: the sigma motion. The sigma motion displays a flickering figure of black and white columns. Actually it is just a chart, as displayed below, with the columns changing backwards and forwards from black to white at a rate of about 30 transitions per second.
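
The flickering chart described above can be approximated with base R graphics. This is a rough sketch, not the code behind Michael Bach's demonstration: the helper names `sigma_cols()` and `draw_sigma()`, the number of columns and the number of frames are made up here, and the actual frame rate depends on how quickly the graphics device redraws.

```r
# Colours of the columns for a given frame: the black/white pattern
# swaps on every frame, so each column flips at every transition.
sigma_cols <- function(frame, n_cols = 20) {
  pattern <- if (frame %% 2 == 0) c("black", "white") else c("white", "black")
  rep(pattern, length.out = n_cols)
}

# Redraw the columns about `rate` times per second
draw_sigma <- function(n_frames = 90, rate = 30, n_cols = 20) {
  for (frame in seq_len(n_frames)) {
    barplot(rep(1, n_cols), col = sigma_cols(frame, n_cols),
            space = 0, border = NA, axes = FALSE)
    Sys.sleep(1 / rate)
  }
}

# draw_sigma()  # run in an interactive session to see the flicker
```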

London Olympics 100m men's sprint results

The men’s 100m sprint final of the 2012 London Olympics is over and Usain Bolt won the gold medal again with a winning time of 9.63s. Time to compare the result with my forecast of 9.68s, posted on 22 July. My simple log-linear model predicted a winning time of 9.68s with a prediction interval from 9.39s to 9.97s. Well, that is of course a big interval of more than half a second, or ±3%.
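
The arithmetic behind these figures can be checked in a few lines of R. This sketch only uses the four numbers quoted in this post; the variable names are made up here and the log-linear model itself is not reproduced.

```r
forecast <- 9.68   # predicted winning time in seconds
lower    <- 9.39   # lower end of the prediction interval
upper    <- 9.97   # upper end of the prediction interval
actual   <- 9.63   # Usain Bolt's winning time

width          <- upper - lower                  # width of the interval in seconds
half_width_pct <- 100 * (width / 2) / forecast   # half-width relative to the forecast
forecast_error <- forecast - actual              # how far off the forecast was
inside         <- actual >= lower && actual <= upper

round(c(width = width, half_width_pct = half_width_pct,
        forecast_error = forecast_error), 2)
```

The interval is 0.58s wide, its half-width is about 3% of the forecast, and the forecast missed the actual time by 0.05s, well inside the interval.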

London Olympics and a prediction for the 100m final

It is less than a week before the 2012 Olympic Games start in London. No surprise therefore that the papers are all over it, including a lot of data and statistics around the games. The Economist investigated the potential financial impact on sponsors (some benefits), tax payers (no benefits) and the athletes (if they are lucky) in its recent issue and video. The Guardian has a whole series around the Olympics, including the data of all Summer Olympic Medallists since 1896.

Bridget Riley exhibition in London

The other day I saw a fantastic exhibition of work by Bridget Riley. Karsten Schubert, who is Riley’s main agent, has some of her most famous and influential artwork from 1960 - 1966 on display, including the seminal Moving Squares from 1961.

Photo of Moving Squares by Bridget Riley, 1961. Emulsion on board, 123.2 x 121.3cm.

In the 1960s Bridget Riley created some great black and white artwork, which at first glance may look simple and deterministic or sometimes random, but has fascinated me since I saw some of her work for the first time about 9 years ago at the Tate Modern.

Reminder: Next Kölner R User Meeting 6 July 2012

This post is a quick reminder that the next Cologne R user group meeting is only one week away. We will meet on 6 July 2012. The meeting will kick off at 18:00 with three short talks at the Institute of Sociology and will continue, even more informally, from 20:00 in a pub (LUX) nearby. All details are available on the KölnRUG Meetup site. Please sign up if you would like to come along.

UK house prices visualised with googleVis-0.2.16

A new version of googleVis has been released on CRAN and the project site. Version 0.2.16 adds the functionality to plot quarterly and monthly data as a motion chart. To illustrate the new feature I looked for a quarterly data set and stumbled across the quarterly UK house price data published by Nationwide, a building society. The data is available in a spreadsheet format and presents the average house prices, also indexed to 100 in Q1 1993, by region in the UK from Q4 1973 to Q1 2012.

End User Computing and why R can help meeting Solvency II

John D. Cook gave a great talk about ‘Why and how people use R’. The talk resonated with me and highlighted why R is such a great tool for end user computing, a topic which has become increasingly important in the European insurance industry. John’s main point on why people use R is that R gets the job done and I think he is spot on. Of course that’s the trouble with R sometimes as well, or to quote Bo again:

Next Kölner R User Meeting: 6 July 2012

From the Guardian's data blog: Visualising risk

The Guardian published a nice summary and link collection of an interdisciplinary visualisation workshop hosted by Microsoft dedicated to visualising probability and risk. Check it out here. The links I found most interesting were those to the pages of Gregor Aisch and Moritz Stefaner. You may have come across their work in the past, as Moritz worked on the OECD better life index and Gregor contributed to the Where does my money go site.

German train monitor provides access to train delay data

The German newspaper Süddeutsche Zeitung (SZ) worked together with OpenDataCity to create an online train monitor of the German network: Zugmonitor. This is another great example of the new form of data journalism. The project provides access to data of train delays collected over 150 days between 2 October 2011 and 1 March 2012 and allows you to analyse the delays in more detail. Here is an example showing the delays by station.

Show me the data! Or how to digitize plots

I had mentioned the Guardian’s data blog and the need for more data journalism earlier here. What I really like about the Guardian’s approach in particular is that they share the data of their articles and encourage readers to use it. Of course there are perfectly valid reasons for only displaying a chart and not making the underlying data available, e.g. to generate leads, as potential customers may get in touch with you asking for the underlying data, or technology issues that don’t allow you to upload data, etc.

Big data seminar in London on 1 March 2012

David Chan from City University is organising an interdisciplinary symposium on tackling the ‘Big Data’ challenge on 1 March 2012. It is an open seminar trying to bring together academics and practitioners from across industry to tackle the challenges posed by “big data” - the growing amount of information that needs to be stored, searched, analysed and visualised in the digital age. The event will take place in the Oliver Thompson Lecture Theatre, Northampton Square, London EC1V 0HB.

Reshaping the IT world

During my university time I worked on the IT help desk for a while. One day I received a call from a professor, who said that his printer had stopped working. So I asked him if there was a message on the display and if he could read it to me. “Oh yes”, he said, “it says: ‘Load A4 paper.’”

Rachel King, writing on ZDNet, quotes a study by Cisco, which claims to have found that college students and young employees under the age of 30 would rather take a lower salary than go without social media freedom, device flexibility and work mobility.

R is the easiest language to speak badly

I am amazed by the number of comments I received on my recent blog entry about “by”, “apply” and friends. I had started my post by pointing out that R is a language. Indeed, I have come to the conclusion that it is a language with lots of irregular expressions and dialects. It feels a bit like German or French, where you have to learn and memorise the different articles.

Credit rating by country

The financial crisis has put a lot of pressure on countries’ long-term foreign currency credit ratings, with France recently being downgraded by S&P. Wikipedia provides a list of countries by credit ratings as reported by the US rating agencies S&P, Fitch and Moody’s, and by Dagong, a Chinese rating agency. So, what does the world look like today through the eyes of those rating agencies? I use the R packages XML and googleVis to read and display the data from Wikipedia with just a few lines.

Managing change

Why the old and the new need to share time together

It takes time to appreciate the new. Even if the new is much better than the old. It is easy to forget when you yourself created the exciting new. At the end of August 2011 Google announced a new Blogger interface. The new interface offered about the same functionality, but had a different look and feel. At first I was reluctant to use it.

Feedback from vignette survey

Many thanks to all who participated in the survey about writing R package vignettes. Following my post last Thursday the responses came in quickly in the evening and all day on Friday. Since Saturday the response rate has been decreasing constantly and I think it is time for a summary based on the 56 responses received.

Summary - How to write a good vignette

Length: Trust yourself, but aim for about 20 pages.

Survey: Writing package vignette

I am currently co-writing the vignette for the ChainLadder package and wonder what I should be focusing on. I have co-written the vignette of the googleVis package in the past and based it purely on what I thought would work. So, this is an experiment to find out if user feedback will help me to write a better vignette. Let’s see how it develops. I will make the data available once I have at least 10 submissions.

Is R turning into an operating system?

Over the years I convinced my colleagues and IT guys that LaTeX/XeLaTeX is the way forward to produce lots of customer reports with individual data, charts, analysis and text. Success! But of course the operating system in the office is still MS Windows. With my background in Solaris/Linux/Mac OS X I am still a little bit lost in the Windows world when I have to do such simple tasks as finding and replacing a string in lots of files.
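
Finding and replacing a string in lots of files is the kind of task that base R can do on any operating system. The following is a minimal sketch, not necessarily the approach taken in this post: the function name `replace_in_files()`, its arguments and the demo files are made up for illustration, the search string is treated as fixed text rather than a regular expression, and encodings and line endings are not handled with any care.

```r
# Replace a fixed string in all files below `dir` whose names match
# `file_pattern`; returns (invisibly) the files that were changed.
replace_in_files <- function(dir, pattern, replacement,
                             file_pattern = "\\.txt$") {
  files <- list.files(dir, pattern = file_pattern,
                      full.names = TRUE, recursive = TRUE)
  changed <- character(0)
  for (f in files) {
    old <- readLines(f, warn = FALSE)
    new <- gsub(pattern, replacement, old, fixed = TRUE)
    if (!identical(old, new)) {   # only rewrite files that actually change
      writeLines(new, f)
      changed <- c(changed, f)
    }
  }
  invisible(changed)
}

# Demo on two throwaway files in a temporary directory
demo_dir <- file.path(tempdir(), "replace_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Dear customer,", "your report for 2010"),
           file.path(demo_dir, "a.txt"))
writeLines("nothing to see here", file.path(demo_dir, "b.txt"))

changed <- replace_in_files(demo_dir, "2010", "2011")
basename(changed)
```

Only `a.txt` contains the search string, so it is the only file rewritten. Try it on a copy of your files first, as the function overwrites them in place.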

Data is the new gold

We need more data journalism. How else will we find the nuggets of data and information worth reading? Life should become easier for data journalists, as the Guardian, one of the data journalism pioneers, points out in this article about the new open data initiative of the European Union (EU). The aims of the EU’s open data strategy are bold. Data is seen as the new gold of the digital age.

Stochastic reserving with R: ChainLadder 0.1.5-1 released

Today we published version 0.1.5-1 of the ChainLadder package for R. It provides methods which are typically used in insurance claims reserving to forecast future claims payments.

Claims development and chain-ladder forecast of the RAA data set using the Mack method

The package started out of presentations given at the Stochastic Reserving Seminar at the Institute of Actuaries in 2007, 2008 and 2010, followed by talks at CAS meetings in 2008 and 2010.

R related books: Traditional vs online publishing

How many R related books have been published so far? Who is the most popular publisher? How many other manuals, tutorials and books have been published online? Let’s find out. A few years ago I used the publication list on r-project.org as an argument with the IT department that R is an established statistical programming language and that they should allow me to install it on my PC. I believe at the time there were about 20 R related books available.