Tag Archive: scala

Oct 24 2017

Scala – installation behind a workplace web proxy

I’ve been learning Scala as part of my continuing professional development. Scala is a functional language which runs primarily on the Java Runtime Environment. It is a first class citizen for working with Apache Spark – an important platform for data science. My intention in learning Scala is to get myself thinking in a more functional programming style and to gain easy access to Java-based libraries and ecosystems, typically I program in Python.

In this post I describe how to get Scala installed and functioning on a workplace laptop, along with its dependency manager, sbt. The core issue here is that my laptop at work puts me behind a web proxy so that sbt does not Just Work™. I figure this is a common problem so I thought I’d write my experience down for the benefit of others, including my future self.

The test system in this case was a relatively recent (circa 2015) Windows 7 laptop, I like using bash as my shell on Windows rather than the Windows Command Prompt – I install this using the Git for Windows SDK.

Scala can be installed from the Scala website https://www.scala-lang.org/download/. For our purposes we will use the  Windows binaries since the sbt build tool requires additional configuration to work. Scala needs the Java JDK version 1.8 to install and the JAVA_HOME needs to point to the appropriate place. On my laptop this is:

JAVA_HOME=C:\Program Files (x86)\Java\jdk1.8.0_131

The Java version can be established using the command:

javac –version

My Scala version is 2.12.2, obtained using:

scala -version

Sbt is the dependency manager and build tool for Scala, it is a separate install from:

http://www.scala-sbt.org/0.13/docs/Setup.html

It is possible the PATH environment variable will need to be updated manually to include the sbt executables (:/c/Program Files (x86)/sbt/bin).

I am a big fan of Visual Studio Code, so I installed the Scala helper for Visual Studio Code:

https://marketplace.visualstudio.com/items?itemName=dragos.scala-lsp

This requires a modification to the sbt config file which is described here:

http://ensime.org/build_tools/sbt/

Then we can write a trivial Scala program like:

object HelloWorld {

def main(args: Array[String]): Unit = {

    println("Hello, world!")

  }

}

And run it at the commandline with:

scala first.scala

To use sbt in my workplace requires proxies to be configured. The symptom of a failure to do this is that the sbt compile command fails to download the appropriate dependencies on first run, as defined in a build.sbt file, producing a line in the log like this:

Error!

Server access Error: Connection reset url=https://repo1.maven.org/maven2/net/
sourceforge/htmlcleaner/htmlcleaner/2.4/htmlcleaner-2.4.pom

In my case I established the appropriate proxy configuration from the Google Chrome browser:

chrome://net-internals/#proxy

This shows a link to the pacfile, something like:

http://pac.madeupbit.com/proxy.pac?p=somecode

The PAC file can be inspected to identify the required proxy, in my this case there is a statement towards the end of the pacfile which contains the URL and port required for the proxy:

if (url.substring(0, 5) == ‘http:’ || url.substring(0, 6) == ‘https:’ || url.substring(0, 3) == ‘ws:’ || url.substring(0, 4) == ‘wss:’)

   {

       return ‘PROXY longproxyhosturl.com :80’;

   }

These are added to a SBT_OPTS environment variable which can either be set in a bash-like .profile file or using the Windows environment variable setup.

export SBT_OPTS="-Dhttps.proxyHost=longproxyhosturl.com -Dhttps.proxyPort=80 -Dhttps.proxySet=true"

As a bonus, if you want to use Java’s Maven dependency management tool you can use the same proxy settings but put them in a MAVEN_OPTS environment variable.

Typically to start a new project in Scala one uses the sbt new command with a pointer to a g8 template, in my workplace this does not work as normally stated because it uses the github protocol which is blocked by default (it runs on port 9418). The normal new command in sbt looks like:

sbt new scala/scala-seed.g8

The workaround for this is to specify the g8 repo in full including the https prefix:

sbt new https://github.com/scala/scala-seed.g8

This should initialise a new project, creating a whole bunch of standard directories.

So far I’ve completed one small project in Scala. Having worked mainly in dynamically typed languages it was nice that, once I had properly defined my types and got my program to compile, it ran without obvious error. I was a bit surprised to find no standard CSV reading / writing library as there is for Python. My Python has become a little more functional as a result of my Scala programming, I’m now a bit more likely to map a function over a list rather than loop over the list explicitly.

I’ve been developing intensively in Python over the last couple of years, and this seems to have helped me in configuring my Scala environment in terms of getting to grips with module/packaging, dependency managers, automated doocumentation building and also in finding my test library (http://www.scalatest.org/) at an early stage.

Jun 01 2017

Book Review: Scala for the Impatient by Cay S. Horstmann

scala_for_impatientI thought I should learn a new language, and Scala seemed like a good choice so I got Scala for the Impatient by Cay S. Horstmann.

Scala is a functional programming language which supports object orientation too. I’m attracted to it for a number of reasons. Firstly, I’m using or considering using a number of technologies which are based on Java – such as Elasticsearch, Neo4j and Spark. Although there are bindings to my favoured language, Python, for Spark in particular I feel a second class citizen. Scala, running as it does on the Java Virtual Machine, allows you to import Java functions easily and so gives better access to these systems.

I’m also attracted to Scala because it is rather less verbose than Java. It feels like some of the core aspects of the language ecosystem (like the dependency manager and testing frameworks) have matured rapidly although the range of available libraries is smaller than that of older languages.

Scala for the Impatient gets on with providing details of the language without much preamble. Its working assumption is that you’re somewhat familiar with Java and so concepts are explained relative to Java. I felt like it also made an assumption that you knew about the broad features of the language, since it made some use of forward referencing – where features are used in an example before being explained somewhat later in the book.

I must admit programming in Scala is a bit of a culture shock after Python. Partly because its compiled rather than interpreted, although the environment does what it can to elide this difference – Scala has an REPL (read-evaluate-print-loop) which works in the background by doing a quick compile. This allows you to play around with the language very easily. The second difference is static typing – Scala is a friendly statically typed language in the sense that if you initialise something with a string value then it doesn’t force you to tell it you want this to be a string. But everything does have a very definite type. It follows the modern hipster style of putting the type after the symbol name (i.e var somevariablename: Int = 5 ), as in Go rather than before, as in earlier languages (i.e int somevariablename = 5).

You have to declare new variables as either var or val. Variables (var) are mutable and values (val) are immutable. It strikes me that static typing and this feature should fix half of my programming errors which in a dynamically typed language are usually mis-spelling variable names, changing something as a side effect and putting the wrong type of thing into a variable – usually during I/O.

The book starts with chapters on basic control structures and data types, to classes and objects and collection data types. There are odd chapters on file handling and regular expressions, and also on XML processing which is built into the language, although it does not implement the popular xpath query language for XML. There is also a chapter on the parsing of formal grammars.

I found the chapter on futures and promises fascinating, these are relatively new ways to handle concurrency and parallelism which I hadn’t been exposed to before, I notice they have recently been introduced to Python.

Chapters on type parameters, advanced types and implicit types had me mostly confused although the early parts were straightforward enough. I’d heard of templating classes and data strctures but as someone programming mainly in a dynamically typed languages I hadn’t any call for them. I turns out templating is a whole lot more complicated than I realised!

My favourite chapter was the one on collections – perhaps because I’m a data scientists, and collections are where I put my data. Scala has a rich collection of collections and methods operating on collections. It avoids the abomination of the Python “dictionary” whose members are not ordered, as you might expect. Scala calls such a data structure a HashMap.

It remains to be seen whether reading, once again, chapters on object-oriented programming will result in me writing object-oriented programs. It hasn’t done in the past.

Scala for the Impatient doesn’t really cover the mechanics of installing Scala on your system or the development environment you might use but then such information tends to go stale fast and depends on platform. I will likely write a post on this, since installing Scala and its build tool, sbt, behind a corporate proxy was a small adventure. 

Googling for further help I found myself at the Scala Cookbook by Alvin Alexander quite frequently. The definitive reference book for Scala is Programming in Scala by Martin Odersky, Lex Spoon and Bill Venners. Resorting to my now familiar technique of searching the acknowledgements for women working in the area, I found Susan Potter whose website is here

Scala for the Impatient is well-named, it whistles through the language at a brisk pace, assuming you know how to program. It highlights the differences with Java, and provides you with the vocabulary to find out more.