Tag: javascript

Posting abroad: my book reviews at ScraperWiki

It’s been a bit quiet on my blog this year; this is partly because I’ve got a new job at ScraperWiki. It has reduced my blogging for two reasons: the first is that I am now much busier, and the second is that I write for the ScraperWiki blog. I thought I’d summarise here what I’ve done there, just to keep everything in one place.

There’s a lot of programming and data science in my new job, so I’ve been reading programming and data analysis books on the train into work. The book reviews are linked below:

I seem to have read quite a lot!

Related to this is a post I did on Enterprise Data Analysis and Visualization: An Interview Study, an academic paper published by the Stanford Visualization Group.

Finally, I’ve been on the stage – or at least presenting at a meeting – I spoke at Data Science London a couple of weeks ago about Scraping and Parsing PDF files. I wrote a short summary of the event here.


Book review: Interactive Data Visualization for the web by Scott Murray


This post was first published at ScraperWiki.

Next in my book reading, I turn to Interactive Data Visualization for the web by Scott Murray (@alignedleft on twitter). This book covers the d3 JavaScript library for data visualisation, written by Mike Bostock, who was also responsible for the Protovis library. If you’d like a taster of the book’s content, a number of the examples can also be found on the author’s website.

The book is largely aimed at web designers who are looking to include interactive data visualisations in their work. It includes some introductory material on JavaScript, HTML, and CSS, so it has some value for programmers moving into web visualisation. I quite liked the repetition of this relatively basic material, and the conceptual introduction to the d3 library.

I found the book rather slow: on page 197 – approaching the final fifth of the book – we were still making a bar chart, with a smaller effort expended in that period on scatter graphs. As a data scientist, I expect to see several dozen plot types in that number of pages! This is something of which Scott warns us, though. d3 is a visualisation framework built for explanatory presentation (i.e. you know the story you want to tell) rather than being an exploratory tool (i.e. you want to find out about your data). To be clear: this “slowness” is not a fault of the book, rather a disjunction between the book and my expectations.

From a technical point of view, d3 works by binding data to elements in the DOM of a webpage. It’s possible to do this for any element type, but practically speaking only Scalable Vector Graphics (SVG) elements make real sense. This restriction means that d3 will only work in more recent browsers, which may be a problem for those trapped in some corporate environments. The library contains a lot of helper functions for generating scales, loading data, selecting and modifying elements, animation and so forth. d3 is a low-level library; there is no PlotBarChart function.
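To give a flavour of what that low-level data binding looks like, here is a minimal bar chart sketch in the d3 v3-era API that the book uses; the dataset, dimensions and colour are invented for illustration, not taken from the book:

```javascript
// Minimal d3 (v3-era API) bar chart: bind an array of numbers to SVG rects.
var dataset = [5, 10, 15, 20, 25];             // hypothetical data

var svg = d3.select("body")                    // add an SVG element to the page
  .append("svg")
  .attr("width", 250)
  .attr("height", 100);

svg.selectAll("rect")
  .data(dataset)                               // bind one datum per rect
  .enter()                                     // placeholders for rects that don't exist yet
  .append("rect")
  .attr("x", function(d, i) { return i * 50; })
  .attr("y", function(d) { return 100 - d * 4; })
  .attr("width", 45)
  .attr("height", function(d) { return d * 4; })
  .attr("fill", "steelblue");
```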

Achieving the static effects demonstrated in this book using other tools such as R, Matlab, or Python would be a relatively straightforward task. The animations, transitions and interactivity would be more difficult to do. More widely, the d3 library supports the creation of hierarchical visualisations which I would struggle to create using other tools.
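As a rough illustration of how little code an animation takes, here is a hedged sketch of a d3 transition (again v3-era API, and assuming the bars from the sketch above already exist; the replacement values are my own invention):

```javascript
// Animate the existing bars to a new (hypothetical) dataset over one second.
var newData = [12, 7, 23, 5, 18];

d3.selectAll("rect")
  .data(newData)                                   // rebind the new values
  .transition()                                    // interpolate the attribute changes...
  .duration(1000)                                  // ...over 1000 ms
  .attr("y", function(d) { return 100 - d * 4; })
  .attr("height", function(d) { return d * 4; });
```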

This book is quite a basic introduction; you can get a much better overview of what is possible with d3 by looking at the API documentation and the Gallery. Scott lists quite a few other resources, including a wide range for the d3 library itself, systems built on d3, and alternatives to d3 if it is not the library you are looking for.

I can see myself using d3 in the future, perhaps not for building generic tools but for custom visualisations where the data is known and the aim is to best explain that data. Scott quotes Ben Shneiderman on this regarding the structure of such visualisations:

overview first, zoom and filter, then details on demand

Book review: JavaScript: The Good Parts by Douglas Crockford


This post was first published at ScraperWiki.

This week I’ve been programming in JavaScript, something of a novelty for me. Jealous of the Dear Leader’s automatic summarisation tool, I wanted to make something myself; hopefully a future post will describe my timeline-visualising tool. Further motivations are that web scraping requires some knowledge of JavaScript, since it is a key browser technology, and that, in its prototypical state, the ScraperWiki platform sometimes requires you to launch a console and type in JavaScript to do stuff.

I have two books on JavaScript; the one I review here is JavaScript: The Good Parts by Douglas Crockford – a slim volume which tersely describes what the author feels are the best bits of JavaScript, incidentally highlighting the bad bits. The second book is the JavaScript Bible by Danny Goodman, Michael Morrison, Paul Novitski, and Tia Gustaff Rayl, which I bought some time ago, impressed by its sheer bulk, but which I am unlikely ever to read, let alone review!

Learning new programming languages is easy in some senses: it’s generally straightforward to get something to happen, simply because core syntax is common across many languages. The only seriously different language I’ve used is Haskell. The difficulty with programming languages is idiom; the parallel is with human languages: the barrier to making yourself understood in a language is low, but speaking fluently and elegantly needs a higher level of understanding which isn’t simply captured in grammar. Programming languages are by their nature flexible, so it’s quite possible to write one in the style of another – whether you should do this is another question.

My first programming language was BASIC, and I suspect I speak all other computer languages with a distinct BASIC accent. As an aside, Edsger Dijkstra has said:

[…] the teaching of BASIC should be rated as a criminal offence: it mutilates the mind beyond recovery.

– so perhaps there is no hope for me.

JavaScript has always felt to me like a toy language: it originated in a web browser and relies on HTML to import libraries. But nowadays it is available on servers in the form of node.js, has a wide range of mature libraries, and is very widely used. So perhaps my prejudices are wrong.

The central idea of JavaScript: The Good Parts is to present an ideal subset of the language, the Good Parts, and ignore the less good parts. The particular bad parts of which I was glad to be warned (each illustrated in the short sketch after this list):

  1. JavaScript arrays aren’t proper arrays with array-like performance; they are weird dictionaries;
  2. variables have function scope, not block scope;
  3. unless declared inside a function, variables have global scope;
  4. there is a difference between the equality operators == and === (and similarly the inequality operators): the short one coerces and then compares, the longer one does not, and is thus preferred.
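To make these concrete, here is a minimal sketch of the four pitfalls as runnable snippets; the variable names and example values are mine, not taken from the book:

```javascript
// 1. Arrays are really objects keyed by strings, not contiguous arrays.
var a = [];
a[100000] = "sparse";           // no storage is allocated for the gap
console.log(a.length);          // 100001, although only one element exists
console.log(typeof a);          // "object"

// 2. var has function scope, not block scope.
function scopes() {
  for (var i = 0; i < 3; i++) { /* ... */ }
  console.log(i);               // 3 - i is still visible outside the loop
}
scopes();

// 3. Assigning to an undeclared name creates a global (in non-strict mode).
function leaky() {
  oops = 42;                    // no var: 'oops' lands on the global object
}
leaky();
console.log(oops);              // 42

// 4. == coerces before comparing; === does not.
console.log("1" == 1);          // true  (string coerced to number)
console.log("1" === 1);         // false (different types)
console.log(0 == "");           // true  - another surprising coercion
console.log(0 === "");          // false
```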

I liked the railroad-diagram presentation of syntax, and the section on regular expressions is good too.

[Figure: railroad syntax diagram]

Elsewhere, Crockford has spoken approvingly of CoffeeScript, which compiles to JavaScript but is arguably syntactically nicer; it appears to hide some of the bad parts of JavaScript which Crockford identifies.

If you are new to JavaScript but not to programming, then this is a good book which will give you a fine start and warn you of some pitfalls. You should be aware that you are reading about Crockford’s ideal, not the code you will find in the wild.