Wednesday 4 December 2013

A Recap of the Last Couple Months (Part 1)

It's been a while. I know. I had hoped to finish this a lot earlier and to make my regular contributions to the Stream's Stream. Oh well ... better late than never :)


STREAM Challenge Week (Morpeth, 7-12 July)

Back in July, (nearly) all research engineers from STREAM spent a couple of days together in Morpeth. I am only going to show you some photos here. You can find out more about the event from my fellow STREAM-ers Jack Bloodworth and Sarah Cotterill.


Kielder Water and Forest Park
Walking around Morpeth
Carlisle Park - It's Picnic Time!
The Beautiful Morpeth Stepping Stones
STREAM Conference at Newcastle University
Inter-Cohort Rounders' Championship Tournament
Group Presentations for the Morpeth Flooding Challenge 
Dinner and Awards Presentation
STREAM Group Photo at Longhirst Hall


35th IAHR World Congress (Chengdu, 8-13 September)

For the very first time, I travelled back to my home country for work. After a rather busy 2012, I cut down significantly on travel and conferences this year. This conference was my only international duty of the year (well, excluding the Institute of Water Annual Conference in Edinburgh, for which I only travelled from England to Scotland). The IAHR congress was definitely one of the biggest conferences I have ever been to, with over 1,400 attendees gathering in the Chengdu Century City International Convention Center.

On the first day of conference, I met with XP Solutions' distributor in China - Ewaters. We discussed some potential case studies based on projects in China and agreed on the dates for the post-conference software workshops in Shanghai.


Opening Ceremony. Looking at the back of Professors Dawei Han and Dragan Savic (my past and current supervisor) - what are the chances?

After the technical sessions in the afternoon, we were all invited to join Prof. Roger Falconer (IAHR President) for the president's reception, where he welcomed everyone to the congress. It was a very interesting and entertaining evening, with a stunning face-changing performance by a local Sichuan opera troupe.

IAHR President's Reception at Shunxing Teahouse

On the third day, I chose to visit the Dujiangyan Irrigation Project for the in-congress technical tour. It is one of the oldest irrigation systems in the world and an excellent example of ancient Chinese science and engineering. The project has successfully prevented flooding in Chengdu ever since its completion about 2,200 years ago. Simply astonishing!



In-Congress Technical Tour - Dujiangyan Irrigation Project
Congress Dinner

On a more serious note, I suited up and delivered a presentation on the last day of the conference. The presentation was a summary of my progress so far and a prologue to the post-conference software workshops. It generated some interest from the Hong Kong Drainage Services Department's representatives. Afterwards, we discussed the possibilities of applying my work in Hong Kong, and they kindly invited me to attend their conference in Hong Kong next year.


As usual, flooding the crowd with colourful visuals.

After the conference, I had a short window to explore the city. (Making the most of it - right, Sarah?)


Wu Hou Shrine Museum
Jin Li, The Ancient Chinese Machine Gun and the Inevitable Invasion of Starbucks
Chengdu at night - Tianfu Square (a Yin Yang from above!)

Overall, I think the IAHR congress was an invaluable experience for me. It allowed me to connect with the people there and to gain a much better understanding of the situation, challenges and needs in China - which means a lot to me as it is my home country. I would also like to thank STREAM and my supervisor Dragan for giving me the opportunity to attend different conferences over the last couple of years.

Next Stop: Shanghai

XPDRAINAGE workshops in Shanghai and life after the China trip. Watch this space!



Friday 29 November 2013

Introducing CrimeMap - A Web App Powered by ShinyApps!

A few months ago I did a mini project using open crime data and R to create crime visualisations. At that time, I was already thinking about a web app using Shiny, but I couldn't justify the time to develop the app and then set up a server etc. That changed two weeks ago, when I received an invitation to join the alpha testing of ShinyApps.


ShinyApps - A Wonderful Discovery

I went through the ShinyApps getting started guide. Everything looked pretty straightforward, so I decided to go ahead and moved my code from the crime visualisation project to a new ShinyApps project. The progress was unexpectedly smooth. Given that I had no web application development experience prior to this mini exercise, I consider this a quick success! All credit goes to the RStudio team for providing these tools and hosting services (especially Tareef Kawaf, who kindly answered all my questions). I would summarise the whole process in the following few steps:
  1. Sign up for a ShinyApps alpha testing account here (you will need a Google account).
  2. Install packages shiny and shinyapps (click on the links for installation help).
  3. Sign in to my.shinyapps.io, give your ShinyApps account a name ("blenditbayes" in my case) and get your application token/secret.
  4. Go through the tutorials, create the ui.R and server.R scripts for your app.
  5. Test the app locally using runApp().
  6. Once you're happy with the app, apply your token/secret and deploy your app using deployApp().
That's it! I can focus my effort solely on developing the app. The rest has been taken care of and simplified by deployApp() and the other ShinyApps functions.
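
For illustration, the last two steps might look something like this (a minimal sketch only; the token and secret below are placeholders obtained from my.shinyapps.io, and "CrimeMap" is the local folder holding ui.R and server.R):

## A minimal sketch of steps 5 and 6 (token/secret values are placeholders)
library(shiny)
library(shinyapps)

## step 5: test the app locally ("CrimeMap" contains ui.R and server.R)
runApp("CrimeMap")

## step 6: register the account token/secret, then deploy to ShinyApps
setAccountInfo(name = "blenditbayes", token = "<TOKEN>", secret = "<SECRET>")
deployApp("CrimeMap")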
 

CrimeMap in Action

So here is my first ever web app "CrimeMap", powered by ShinyApps! I will go through the usage briefly in the following sections. I hope you will give it a try and send me some feedback (e.g. what features are missing?) so I can continue to improve it. Your comments will be valuable to the ShinyApps development team too.

Basic Usage

First, enter a location of your choice (e.g. London) within England, Wales or Northern Ireland. Select the first month of crime data collection and then the length of the analysis period. After that, click on the "Update" button. My experience is that the graphs should come up within a minute if the length of analysis is less than or equal to 6 months. Depending on the location and the length of analysis, the process might take longer (say a couple of minutes). The outputs (at the time of writing) are displayed in three tabs: Data, Crime Map and Trends. The Data tab shows the original crime data records downloaded from data.police.uk (I may add a "download as CSV" feature in the future). The Crime Map tab shows a density plot of the crime data. Finally, the Trends tab shows bar charts of crime records over time in different crime categories. Right-click on the map and you can save the image in its original size (1280 x 1280).




Customise the Maps

As you continue to scroll down the menu on the left, you will find more settings for the map. Change the facet settings to create map facets by crime type, category and month. You can select different Google map formats (roadmap, satellite, hybrid or terrain). There are also options to download the Google map image in high resolution (it takes a bit longer), in black & white and at various zoom levels. When you click the "Update" button to refresh the maps, it may take a while to show the new graphs if you have lots of facets.



Fine-tuning the Density Plots

The Density Plot Settings allow users to modify the "behind the scenes" ggplot2 code. You can modify the layer transparency (alpha range), the number of bins, the width and colour of the boundary lines, as well as the colours of the gradient. My hope is to develop a user-friendly interface that allows users to quickly create maps with their own favourite themes. Of course, ggplot2 is much more powerful than that and has a lot more settings available. What other settings would you like to see here? Please let me know.
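
For the curious, here is a minimal sketch (not the app's exact code) of the kind of density layer those settings control, assuming an example data frame crime.df with lng and lat columns:

## A minimal sketch of the density layer behind the settings above
## (crime.df with 'lng' and 'lat' columns is an assumed example data frame)
library(ggmap)
library(ggplot2)

base.map <- get_googlemap("London", zoom = 12)
ggmap(base.map) +
  stat_density2d(data = crime.df,
                 aes(x = lng, y = lat, fill = ..level.., alpha = ..level..),
                 geom = "polygon", bins = 10,           ## number of bins
                 colour = "white", size = 0.3) +        ## boundary line colour and width
  scale_fill_gradient(low = "yellow", high = "red") +   ## gradient colours
  scale_alpha(range = c(0.1, 0.5), guide = FALSE)       ## layer transparency (alpha range)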



Exploring the Trends

Why am I including the trends visualisation? It just so happened that I read this article around the same time I received the ShinyApps invitation. As I had already coded something for crime data visualisation, I thought it would be interesting to look at the data myself. So here is a handy tool for you to explore and visualise the data with a few clicks. I will leave you to draw your own conclusions.



Feedback Please

As I mentioned above, I am new to web app development and this is my first ever experiment. Please have a go, create a few maps and let me know what can be done better. Thanks in advance!! All the code is available here.

Thursday 15 August 2013

Creating a Quick Report with knitr, xtable, R Markdown, Pandoc (and some OpenBLAS Benchmark Results)

To cut a long story short, I have always wanted to write professional-looking documents (technical reports and potentially my thesis) with R code. No more copy and paste. No more Microsoft Word. At the same time, I don't feel comfortable with LaTeX. Somehow I found a workaround with knitr, xtable, R Markdown and Pandoc.

I must say that my solution is far from perfect, as I haven't mastered the document layout configuration yet. But I did manage to get some satisfactory results (well, from a seasoned MS Word user's point of view) with minimal R Markdown, xtable and knitr code.
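
For illustration, the basic routine looks roughly like this (a minimal sketch with hypothetical file names, not my actual report code): a chunk with results='asis' inside report.Rmd prints a table via xtable, e.g. print(xtable(head(mtcars)), type = "html"), and the file is then knitted to Markdown and converted to PDF with Pandoc.

## A minimal sketch of the knit-then-convert routine (file names are hypothetical)
library(knitr)
knit("report.Rmd")                          ## report.Rmd -> report.md
system("pandoc report.md -o report.pdf")    ## report.md -> report.pdf via Pandoc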

Instead of showing some dummy results, I created a simple report on R-25 benchmark results with two versions of OpenBLAS (ver. 0.2.6-1: the default, 2-threaded version on my Linux system, and ver. 0.2.8-1: the latest, multi-threaded version which was made available recently). In short, the latest OpenBLAS performed slightly better in all but two of the R-25 tests. For more details, download the full pdf here.

The code-generated report looks like this ...


... which I think is pretty enough for a quick report. When I look at the source R Markdown file, which is nothing but geeky plain text (see below), I just can't find words to describe the awesomeness of knitr + xtable + pandoc. Thank you very much Yihui, the RStudio team, David, Charles and John.


The code is available on GitHub. As this was my first attempt at coding a report, the code structure isn't pretty enough for a showcase, but I have commented it as much as possible. I hope you enjoy this blog post and give this code-generated report routine a try!

Updated (21 Aug 2013): A basic 4-step example can be found here.
Updated (01 Feb 2014): The basic 4-step example is now here.

Friday 28 June 2013

Quantifying Green Values of Drainage Systems

This is the very first blog post about my research project. Although I am not going to write much about it this time, I do have a Prezi presentation here. Enjoy!

Tuesday 25 June 2013

Visualising Crime Hotspots in England and Wales using {ggmap}

Two weeks ago, I was looking for ways to make pretty maps for my own research project. A quick search led me to some very informative blog posts by Kim Gilbert, David Smith and Max Marchi. Eventually, I Googled the excellent crime weather map example by David Kahle and decided to stick with the gg-style approach.

Thanks to David Kahle and Hadley Wickham, who ramped up that example and subsequently developed the {ggmap} package, making maps in R is now really intuitive and fun!

I wrote a wrapper function that takes a location within England and Wales, downloads crime data around that location over a certain period of time and creates crime weather plots. This blog post discusses the data used, the methodology and the wrapper function, with some worked examples. The code is available here.

(Nov-2013 Update: I have updated the code and created a web app using Shiny and ShinyApps. For more information, please read this new blog post.)

Data

The street-level crime data is one of the 9,000 datasets available from data.gov.uk. The data can be downloaded systematically via the Police API. The latest version of the API no longer requires authentication.

The following URL can be used to obtain street-level crime records within a one-mile radius of a single point. The parameters required are latitude, longitude and month. The downloaded data is in JSON format, which can be converted into an R data structure using the {RJSONIO} package.

Example URL: http://data.police.uk/api/crimes-street/all-crime?lat=52.629729&lng=-1.131592&date=2012-04
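
As a minimal sketch (not the wrapper's exact code), one month of data can be pulled and converted like this:

## Download one month of street-level crime data and convert the JSON response
library(RJSONIO)

crime.url  <- "http://data.police.uk/api/crimes-street/all-crime?lat=52.629729&lng=-1.131592&date=2012-04"
crime.json <- paste(readLines(crime.url, warn = FALSE), collapse = "")
crime.list <- fromJSON(crime.json)   ## JSON -> R list (one element per crime record)
length(crime.list)                   ## number of crime records returned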

Methodology

The methodology can be summarised in the following six steps:

1. Obtain latitude and longitude of a user-defined location using ggmap::geocode.
2. Download crime data via the Police API as discussed above.
3. Convert JSON into a list and then a data frame.
4. Download a base map from Google using ggmap::get_googlemap.
5. Convert the base map into a ggplot object using ggmap::ggmap.
6. Add multiple layers on top of the base map using the data frame like a normal ggplot.

For more details, check out the following functions in the code (a minimal sketch of the basic pipeline is shown after the list):

  • "get.data" and "list2df" for steps 1, 2 and 3
  • "visualise.data" for steps 4, 5 and 6
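
Here is a minimal sketch of steps 1 and 4 to 6 (an assumed example, not the wrapper itself; crime.df stands in for the data frame produced by steps 2 and 3):

## A minimal sketch of steps 1 and 4-6
library(ggmap)
library(ggplot2)

loc <- geocode("London Eye")                                   ## step 1: latitude/longitude
base.map <- get_googlemap(center = c(lon = loc$lon, lat = loc$lat),
                          zoom = 14, maptype = "roadmap")      ## step 4: base map from Google
p <- ggmap(base.map)                                           ## step 5: base map as a ggplot object

## step 6: overlay the crime locations ('crime.df' is assumed to have 'lng' and 'lat' columns)
p + geom_point(data = crime.df, aes(x = lng, y = lat), colour = "red", alpha = 0.3)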

Wrapper and Worked Examples

The wrapper function looks like this ...
crimeplot.wrapper <- function(
  point.of.interest = "London Eye",  ## user-defined location
  period = c("2013-01","2013-02"),  ## period of time in YYYY-MM
  type.map = "roadmap",  ## roadmap, terrain, satellite or hybrid
  type.facet = NA,  ## options: NA, month, category or type
  type.print = NA,  ## options: NA, panel or window
  output.plot = TRUE,  ## print it to a png file?
  output.filename = "temp.png",  ## provide a filename
  output.size = c(700,700)) ## width and height setting                              
... given the location, time period and a few more graphical settings, the wrapper can produce a crime weather map. The following worked examples illustrate the usage.

Example 1 - All crimes around London Eye from Jan-2013 to Apr-2013




Comments:
Here we can see a huge crime hotspot in the Soho district of London - an area full of bars, restaurants, theatres and nightclubs (did I mention Chinatown?).

Codes:
## Define the period
ex1.period <- format(seq(as.Date("2013-01-01"),length=4,by="months"),"%Y-%m")

## Use the wrapper
ex1.plot <- crimeplot.wrapper(point.of.interest = "London Eye",
                              period = ex1.period,
                              type.map = "roadmap",
                              output.filename = "ex1.png",
                              output.size = c(700,700))

Example 2 - Typical crimes and traffic incidents around London Eye from Jan-2013 to Apr-2013


(Note: click on the image to see original image in higher resolution)

Comments:
Now we separate the data from the British Transport Police (BTP) and all other forces (Force) using the faceting functionality in {ggplot2}. We can see a traffic black spot on the other side of the River Thames.

Codes:
## Define the period
ex2.period <- format(seq(as.Date("2013-01-01"),length=4,by="months"),"%Y-%m")
 
## Use the wrapper
ex2.plot <- crimeplot.wrapper(point.of.interest = "London Eye",
                              period = ex2.period,
                              type.map = "roadmap",
                              type.facet = "type",
                              output.filename = "ex2.png",
                              output.size = c(1400,700))

Example 3 - Monthly crimes in Manchester for the year 2012 on a satellite map


(Note: click on the image to see original image in higher resolution)


Comments:
Using the facet function on "month", we can look at the changes in patterns over time. It looks like there is not much seasonality in Manchester, as the crime hotspots remain hot throughout the year.

Codes:
## Define the period
ex3.period <- format(seq(as.Date("2012-01-01"),length=12,by="months"),"%Y-%m")

## Use the wrapper
ex3.plot <- crimeplot.wrapper(point.of.interest = "Manchester",
                              period = ex3.period,
                              type.map = "satellite",
                              type.facet = "month",
                              output.filename = "ex3.png",
                              output.size = c(1400,1400))

Example 4 - Crimes by categories in Liverpool from Jan-2013 to Apr-2013 on a hybrid map


(Note: click on the image to see original image in higher resolution)

Comments:
Now we separate the different categories of crime. It is interesting to see that only a small part of the city is affected by shoplifting and other theft, while burglary, arson and vehicle crimes are very common problems in Liverpool.

Codes:
## Define the period
ex4.period <- format(seq(as.Date("2013-01-01"),length=4,by="months"),"%Y-%m")
 
## Use the wrapper
ex4.plot <- crimeplot.wrapper(point.of.interest = "Liverpool",
                              period = ex4.period,
                              type.map = "hybrid",
                              type.facet = "category",
                              output.filename = "ex4.png",
                              output.size = c(1400,1400))

Further Work

Further work is needed to ...
1. optimise the code for the "list2df" transformation (at the moment it is quite slow; I tried lapply but it didn't give me back the desired data frame format, though I know there must be a solution)
2. better automate the graphical settings for output resolution, font size etc.
3. make it interactive using {Shiny}

(Nov-2013 Update: I have improved (1) using plyr::ldply and completed (2) and (3). See this.)
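
For anyone interested, the ldply trick boils down to something like this (a minimal sketch with made-up records, not the project's actual code):

## Flatten a list of records into a data frame with plyr::ldply
library(plyr)

crime.list <- list(list(category = "burglary",    month = "2013-01"),
                   list(category = "other-theft", month = "2013-02"))
crime.df <- ldply(crime.list, as.data.frame)   ## list -> data frame, one row per record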

Acknowledgement

I would like to thank Yanchang Zhao for his excellent book "R and Data Mining: Examples and Case Studies", which encouraged me to shift from MATLAB to R. All embedded code was created with Pretty R at inside-R.org.

Key References

Wednesday 3 April 2013

Colour it up: my quest to master ggplot2 (part 2)

Tuesday 19 March 2013

Learning-by-doing: my quest to master ggplot2 (part 1)

Monday 18 March 2013

R, where should I start?


This is a dynamic post which I will continue to update whenever I find something new. I hope you will find the following links useful.

Online Courses for Learning the R language

Free Documentation for Learning the R Language

  1. R for Beginners by Emmanuel Paradis
  2. R Graphics by Paul Murrell
  3. ggplot2 (official documentation)
  4. Advanced R Programming by Hadley Wickham

Online Courses for Data Mining with R

e-Books for Data Mining with R

R Tutorials

  1. Twotorials by Anthony Damico (learning new tricks from short 2-min videos)
  2. Revolution Analytics Free Webinars
  3. ggplot2 Graphics Cheat Sheet
  4. 10 tips for making your R graphics look their best
  5. Making Maps with R
  6. Compiling R 3.0.1 with MKL support
  7. Flowing Data - Tutorials
  8. Quick-R
  9. R-Uni (A List of Free R Tutorials and Resources in University Webpages)

Interesting Blogs and Articles

Useful R Packages

  1. Ten R packages I wish I knew about earlier (Before you do anything, read this blog post first!!)
  2. caret (short for Classification And REgression Training) for a simple way to train and fine-tune models using different algorithms
  3. ff and bigmemory - two packages to solve memory issues with big datasets
  4. quantmod for financial modelling
  5. foreach and doSNOW for parallel computing in R

Interactive Development Environment

Sunday 17 March 2013

Blend what?

Why?

Over the years I have learned quite a few things about machine learning, but I have never thought of writing them down properly. Too often I can't figure out exactly what I did when I look at my old code. The time is NOW!

More importantly, I have fallen in love with the R programming language and the massive number of useful packages from the R community. I want to talk about tricks, tools and useful resources for data mining with R (and sometimes my old favourite MATLAB) here.

Bayesian Ensemble Learning

One of the interesting tricks I have learned is called "Bayesian Ensemble Learning". It involves combining (i.e. blending) different models to improve overall prediction accuracy. Although it has its downsides (e.g. it is computationally expensive and difficult to interpret), it is certainly my favourite data mining technique at the moment. I also decided to name the blog after it long before I started writing this first post!
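
As a rough (and decidedly non-Bayesian) illustration of the blending idea, here is a minimal sketch that simply averages the predictions of two different models fitted to the same data:

## Blend two models by averaging their predictions (illustration only)
library(randomForest)

fit.lm <- lm(mpg ~ ., data = mtcars)              ## model 1: linear regression
fit.rf <- randomForest(mpg ~ ., data = mtcars)    ## model 2: random forest
pred.blend <- 0.5 * predict(fit.lm, mtcars) +
              0.5 * predict(fit.rf, mtcars)       ## blended prediction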

Research

There is also a need to promote my own research project online, so I guess there will be times when I talk about drainage design, green infrastructure and decision support systems. This is not the main focus of the blog, but I will try to create some funky graphs and explain my research to a wider audience when the time is right (i.e. when I eventually master the art of graphics in R).

OK, so here we go: this is my journey into the wonderful world of data science!