
Explorations

Around the world in ... how long?

Andrew Elliott

In around 200 BCE, Eratosthenes, a Greek mathematician, geographer and librarian in Alexandria, calculated the size of the Earth.

He knew that in Syene, about 20 days' travel away, at midday on the summer solstice, the sun would be directly overhead. Sunlight would shine straight down a well shaft, and vertical columns would cast no shadows.

At the same time, in Alexandria, he measured the angle between the sun and the zenith. The angle he measured came to around 1/50 of a circle. He knew Syene to be, in his terms, 5000 stadia away. (A stadion was 125 paces, defined variously as between 157 and 185 metres. We now measure the Alexandria-to-Syene distance as 840 km, equivalent to a stadion that measures 168 m.)

Using a little geometry (which, after all, means "Earth-measuring"), he was able to calculate the circumference of the Earth. Depending on which definition of stadion you think he used, his answer (250,000 stadia) comes out to between 39,250 km and 46,250 km. Taking 50 times the distance from Alexandria to Syene (now modern-day Aswan) gives a result of 42,000 km. The true pole-to-pole circumference of the Earth is very close to 40,000 km. Eratosthenes pretty much nailed it. Remarkable for 2200 years ago.
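Here's the whole calculation as a few lines of R (a sketch, using the range of stadion lengths quoted above):

angle_fraction <- 1/50     # midday shadow angle at Alexandria, as a fraction of a full circle
dist_stadia <- 5000        # Eratosthenes' distance from Alexandria to Syene
circ_stadia <- dist_stadia / angle_fraction      # 250,000 stadia
circ_stadia * c(157, 168, 185) / 1000            # circumference in km for each stadion length
## [1] 39250 42000 46250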


If the distance from Alexandria was a 20-day journey on foot, then Eratosthenes could have worked out that to walk all the way around the world (assuming such a route was possible) would have taken 1000 days. Using our modern measurement, that means 40 km a day, quite reasonable for a fit adult over good terrain.

As the title suggests, in Jules Verne's novel Around the World in 80 Days, his hero Phileas Fogg manages his circumnavigation in 80 days. In fact, as the map below shows, virtually all of the journey occurs north of the equator. The most southerly point is near Singapore, which is pretty much on the equator. The indirect route lengthens the journey. The fact that the circumnavigation is not at the Earth's widest latitude shortens it. These factors more or less balance out and the journey as mapped comes to approximately 38,000 km, just 5% less than the circumference of a great circle. Managing this in 80 days meant that Fogg made an average speed of 475 km per day.

Now, Phileas Fogg (and don't forget his faithful companion Passepartout) made it in 80 days, but how long would it take to circle the Earth using other means of transport? Here's a table showing how long it would take to travel 40,000 km in different ways:

To travel around the world      At avg speed     For            Would take
On foot                         5 km/h           8 h / day      1000 days
By car                          100 km/h         8 h / day      50 days
By commercial plane             800 km/h         20 h / day     2 1/2 days
By military plane (B52)         940 km/h         continuous     42 h 33 min
International Space Station     26,000 km/h      continuous     92 min
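If you want to play with the numbers, the whole table boils down to one small R function (a sketch):

trip_days <- function(speed_kmh, hours_per_day) 40000 / (speed_kmh * hours_per_day)
trip_days(5, 8)        # on foot: 1000 days
trip_days(100, 8)      # by car: 50 days
trip_days(800, 20)     # commercial plane: 2.5 days
trip_days(940, 24)     # B52, non-stop: about 1.77 days, i.e. 42 h 33 min
trip_days(26000, 24)   # ISS: about 0.064 days, i.e. roughly 92 minutes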

But to close, let's turn to Shakespeare. In A Midsummer Night's Dream, he has Puck say: "I'll put a girdle round about the earth / In forty minutes." That means Puck would be travelling at 1,000 km each minute: that's a megametre per minute.

A Trip to the Theatre

Andrew Elliott

The Greek Theatre in the town of Taormina in Sicily has a spectacular view of the bay and of Europe’s highest volcano, Etna. The theatre itself dates to the third century BCE, and although it's called the “Greek” Theatre, it is largely the work of the Romans (the giveaway is that it is predominantly brick-built).

It’s regularly used as a concert venue, and the descriptive material suggests that it originally had a capacity of 5000. Is this claim credible? Let’s see if we can make an estimate of our own to compare.

As the image shows, the seating is arranged in sections, seven of them in total. So, trying to arrive at our own estimate of the capacity, we can tackle the simpler task of forming an idea of the capacity of one of those sections (and then later multiply by seven). Let’s take a closer look at one of the sections:

[Image: Taormina2.png, a close-up of one seating section]

Counting the rows of seats (the lower ones are original stone, the upper ones are wooden bleachers), we get to 26 rows of seats currently in place. But there is some evidence of further, unrestored structure lower down, so we can guess that there might have been a further block of perhaps 12 rows there, for a total of 38 rows.

How many people might sit on one of those rows? More on the back rows, fewer in the front, but a reasonable figure for one of the middle rows might be 15 people.

So, we have 7 sections of 38 rows, each accommodating (on average) 15 people. Multiply those together to get 3990. It’s the right order of magnitude, but somewhat short of the 5000 claimed: perhaps that’s an optimistic claim?

But hold on! Those seven sections don’t make a complete semicircle. There is, in fact, on each side, space for a further section which would bring the total to nine sections, and that would make the total number of seats 5130. We can probably conclude that 5000 is a fair estimate.
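For the record, here's the arithmetic as a couple of lines of R (a sketch):

rows <- 26 + 12    # restored rows plus the guessed missing block
7 * rows * 15      # seven sections at 15 people per row: 3990
9 * rows * 15      # nine sections: 5130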

Beware the Giant Ants! Or not ...

Andrew Elliott

In the 1954 horror movie Them!, the plot revolves around "atomic testing in 1945 [that] developed ... dangerous mutant ants".

The relationship between length, area and volume is sometimes called the "square-cube law": as the linear dimension of an object increases, so the surface area increases by the square of the multiplier, and the volume increases by the cube of the multiplier. The square-cube law explains why this great cliché of horror movies is so improbable. 

From the movie poster these ants look to be easily four metres in length, which means they must be around 1000 times longer than the 4 mm ants we are familiar with. But, in accordance with the square-cube law, a thousand-fold increase in length would mean a million-fold increase in measures of area and a billion-fold increase in measures of volume. Since the strength of the ants' legs would relate to the area of the cross-section of their limbs, while the mass of their bodies would relate to their volume, it follows that the ants' bodies would now be 1000 times too heavy for their limbs, and they would simply collapse under their own weight. The same would apply to the mass of their internal organs, now 1000 times too heavy to be contained by their chitinous "skins". Visualisation is left as an exercise for the reader.
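The square-cube law fits in a few lines of R (a sketch):

scale <- 4 / 0.004     # a 4 m ant versus a 4 mm ant: 1000 times longer
strength <- scale^2    # limb strength scales with cross-sectional area: a million-fold
mass <- scale^3        # body mass scales with volume: a billion-fold
mass / strength        # the load on each limb grows 1000-fold
## [1] 1000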

 

How Many Tennis Balls Does It Take To Fill St Paul's?

Andrew Elliott

There's a report available on the internet on the acoustic characteristics of St Paul’s Cathedral in London, and it has this little snippet of information: the interior volume of the cathedral is 152,000 m³. Is that a credible number? Let’s use a little bit of solid geometry to do some rough-and-ready cross-comparison.

A quick Google search for pictures and measurements tells me that, to a very rough approximation, the interior main body of St Paul’s can be treated as a cuboid, roughly 50 m wide, 150 m long and 30 m high. The famous interior Whispering Gallery is at a height of 30 metres, and the exterior Stone Gallery around the dome is at 53 metres.

Based on this, I don’t think it’s too unreasonable to imagine a simplified shape with the following interior dimensions (if you pushed all the interior stonework to the edges): 40 m wide x 25 m high x 140 m long, giving a total of 140,000 m³. The dome is around 30 m in diameter and, together with the cylindrical drum it sits on, adds approximately another 30 m to the interior vertical height. Working this through gives about another 18,000 m³ for the dome. We’ve reached a total of 158,000 m³, which is enough to convince me that the figure the acoustic engineers used is probably close enough.
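Here's that cross-check as R arithmetic (a sketch, assuming the extra 30 m of dome height splits into a 15 m cylindrical drum topped by a 15 m hemisphere):

body <- 40 * 25 * 140                       # simplified cuboid: 140,000 m³
r <- 15                                     # dome radius, from the 30 m diameter
dome <- pi * r^2 * 15 + (2/3) * pi * r^3    # drum plus hemisphere: about 18,000 m³
body + dome                                 # roughly 158,000 m³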

Now for the tennis balls. If you tumble a load of balls into a container, they won’t completely fill the space. If you pack them super-carefully you can bring the proportion of space filled to around 74%, but if you just let them settle for themselves, you can expect around 65% of the space to be filled. A tennis ball of 6.8 cm diameter will have a volume of around 165 cm³, but when loosely packed will occupy a volume of around 250 cm³, roughly a cupful. This means a box with a volume of one cubic metre will hold around 4000 tennis balls (not allowing for the “edge effect”, which stops them from packing so closely around the edges). And that means that the interior of St Paul’s will hold 152,000 times as much, for a total of 608 million tennis balls.

But what if, instead of tennis balls, we used pool balls? With a diameter of 5.715 cm, their volume is just about 60% of the volume of a tennis ball. You can see where I’m going with this, can’t you? A cubic metre can accommodate 6700 pool balls, and if we multiply up, we get to just over a billion (1,018,400,000) pool balls to fill St Paul’s. And that’s one way to visualise a billion.
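The ball-counting is easy to reproduce in R (a sketch):

ball_volume <- function(d) (4/3) * pi * (d/2)^3            # volume in cm³ for diameter d in cm
balls_per_m3 <- function(d) 1e6 * 0.65 / ball_volume(d)    # loose packing fills ~65% of space
152000 * balls_per_m3(6.8)      # tennis balls: around 600 million
152000 * balls_per_m3(5.715)    # pool balls: just over a billion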

How Much Did King Kong Weigh?

Andrew Elliott

The Empire State Building always brings to mind that iconic image of King Kong atop the skyscraper, swatting away biplanes as he clutches Fay Wray in his massive hand. But how massive? How much would the 1933-version of the mighty Kong have weighed?

[Image: King Kong, the 1933 movie]

IMDb provides some relevant information on scale. Apparently, the size of the enormous ape varies from location to location and scene to scene. The publicity described Kong as 50 feet tall, but the sets in the jungle of his home island were consistent with an 18 ft beast. The models for close-up photography of his hand were built to a scale that would fit a 40 ft animal, and the New York scenes were consistent with a 24 ft scale. Since it was the image of Kong on the Empire State Building that sparked this thought, let’s go with that figure, and treat him as 7.32 m tall.

If we take a Western Gorilla as the model for Kong when calculating height/weight ratios, we can scale up the height and use the square-cube law to scale up the weight. A very large gorilla of this species would be around 1.8m high and would weigh about 230 kg. So Kong was just over 4 times as tall as a very large gorilla, and using the cube of that ratio to scale his weight, we need a factor of 67.25 to give us a final mass of just under 15,500 kilograms. Does this seem reasonable? Three times as big as an elephant? I guess it does.
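In R, the scaling looks like this (a sketch):

gorilla_height <- 1.8    # metres: a very large Western Gorilla
gorilla_mass <- 230      # kilograms
kong_height <- 7.32      # metres: the 24 ft Kong of the New York scenes
scale <- kong_height / gorilla_height    # just over 4
gorilla_mass * scale^3                   # cube the ratio: just under 15,500 kg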

It’s entirely understandable that Kong would be scornful of the aircraft, since the planes used in the scene were Curtiss O2C-2 'Helldivers', which have a gross mass of a little over 2000 kg, around one eighth of his weight. But those annoying planes, equipped as they are with machine guns, finally cause the mighty Kong to lose his grip and tumble the 381 metres (52 times his own height) to the street below. And compared to that iconic building, the giant ape comes in at less than 1/20,000 of the mass of the Empire State Building itself, which is estimated to weigh 331,000 tonnes.

Plain Talking About Numbers

Andrew Elliott

I've recently been taking forward an idea that's been in the back of my mind for a while. www.IsThatABigNumber.com is a website with a simple aim: to put big numbers in context and, in so doing, start to develop a more intuitive feel for them.

While I can intellectually understand the meaning of large numbers, typically written in scientific notation (e.g., 2.5 x 10^8) or expressed in billions and trillions, that's not quite the same as having a "feeling" for very large numbers. In fact, when I really think about it, my sense of comfort with numbers runs out somewhere around the 1000 mark. That is, I think I can visualise 1000 items without things becoming blurry, but not much more than that. But that is another blog post for another day.

The topic for today is how we talk about numbers. The website IsThatABigNumber.com is all about numbers, and the expression of those numbers needs to be clear and comprehensible.

Take measurements of length: I was taught the SI system, based on meters, kilograms and seconds. Now for scientists and engineers, it's perfectly fine to talk about 4 x 10^7 m. It's convenient for calculations and it's the proper thing to do. But if I want to explain how long the equator is, I want to talk about 40 thousand kilometers instead.

Because? Because that's the way folk talk. Not 4 x 10^7 m; not 40 Megameters; not even 40 million meters. In my mind, things that can be measured using "meters" as the unit range from a bit less than one meter to somewhat more than a thousand. Half a meter? 0.5m is just fine; a 10,000m race? That's fine too. 50,000m? Nah, I'm better with 50km; 0.02m? Nope, give me 2cm or 20mm.

So, here are some of the principles that I am using for IsThatABigNumber (a rough sketch in R follows these lists):

For all numbers:

  • Numbers are expressed in three parts: a base magnitude between 1 and 1000, followed by a multiple, and where needed, a unit.  So the population of the world is expressed as 7 billion, not 7,000,000,000 (all those zeroes? too hard to grok)
  • The multiple used is based around powers of 1000, with the exception that ...
  • "12,500" is more natural than "12.5 thousand", so for numbers in the 1000 - 999,999 range, we make an exception and use numerals
  • But "12.5 million" is more natural than 12,500,000, so for a million and beyond, we use "*illion" words, to the limit of septillion - 10^24 (and I struggle with septillion!)
  • Beyond septillion, fall back to scientific notation starting with 10^27. In this area, the game is pretty much out of the hands of "folk", and in the hands of the scientists.

Then, when it comes to units. For distance measures:

  • Meters are used between 1m and 999m
  • Kilometers are used for distances above 1km
  • Millimeters are used for distances below 1 m.

For measuring mass:

  • Kilograms are used for masses above 1kg
  • Grams are used for masses below 1 kg
  • (Thinking about using metric tons - 1000kg for bigger masses - but currently undecided)
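To make the magnitude rules above concrete, here's a rough sketch of them in R (illustrative only; the site's actual code may differ):

formatBigNumber <- function(x) {
  if (x < 1000) return(format(x))                   # base magnitude, as-is
  if (x < 1e6) return(format(x, big.mark = ","))    # numerals up to 999,999
  illions <- c("million", "billion", "trillion", "quadrillion",
               "quintillion", "sextillion", "septillion")
  power <- floor(log10(x) / 3)                      # which power of 1000 we've reached
  if (power - 1 <= length(illions))
    return(paste(signif(x / 1000^power, 3), illions[power - 1]))
  # beyond septillion: fall back to scientific notation
  paste0(signif(x / 10^floor(log10(x)), 3), " x 10^", floor(log10(x)))
}
formatBigNumber(7e9)      # "7 billion"
formatBigNumber(12500)    # "12,500"
formatBigNumber(1e27)     # "1 x 10^27"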

Time is a whole separate problem, not yet addressed. For now, years are the only units in use, but really, days and seconds seem more natural for small time periods. But then this is about BIG Numbers.

Money is the other measure included in IsThatABigNumber.com. For now, US Dollars are the standard unit, rendered with a "$" sign.

Is That A Big Number?

Andrew Elliott

Do numbers make you numb?

If they do, have a look here (www.isthatabignumber.com) to restore some number sensitivity. Or read on to understand why...

Way back in May 1982, Douglas Hofstadter (he of "Gödel, Escher, Bach" fame) wrote an article for Scientific American called "Number Numbness, or Why innumeracy may be just as dangerous as illiteracy". To provoke the readers to think about how they internalise big numbers, he concocted this scenario:

'The renowned cosmologist Professor Bignumska, lecturing on the future of the universe, had just stated that in about a billion years, according to her calculations, the earth would fall into the sun in a fiery death. In the back of the auditorium a tremulous voice piped up: "Excuse me, Professor, but h-h-how long did you say it would be?" Professor Bignumska calmly replied, "About a billion years." A sigh of relief was heard. "Whew! For a minute there, I thought you'd said a million years."'

The absurdity of the comment arises because a million and a billion years are both so far beyond our lifespans as to make the difference meaningless from a personal point of view. In the article, he makes the case that most people have little real grasp of large numbers: not really being able to distinguish millions from billions from trillions, even though there is a thousand-fold difference between each.

But while this distinction may not give us sleepless nights when used in comparison to human lifespans, there are areas of life (national and corporate budgets, national population statistics, even hard disk sizes) where the billion vs million distinction DOES affect our lives, and many of us lack the "Number Sense" to be aware, instinctively, of the difference. Hofstadter argues that this "numbness" to numbers causes a loss of perspective, to the detriment of public debate.

Numbers in the News

The media themselves often fail to establish a proper context for the numbers in the news. Any number ending in "...illion" just ends up in a mental category called "big number".  

In November 2015, the UK public sector net borrowing was around £14 billion; debt was around £1.5 trillion.  Are those big numbers? Of course they are, but are they unexpectedly big? Are they alarmingly big? Are they big in context?

Lionel Messi earns around 25 million Euros a year.  Is this a big number? Of course it is, but how big, in context? And what context should we use? Other footballers? Other sports people? Other individuals? Corporations?

I'm a huge fan of the BBC Radio 4 programme "More or Less". This programme tears apart statistical claims floating about current debates: I think it makes a vital contribution to understanding what's really going on, and debunking inaccurate claims. And one question they will often start with, when looking at some reported statistic is "Is that a big number?".

So, Is That A Big Number?

All this is by way of introducing an idea I am currently working on - an online service to answer just that question. Enter a number, any number, and it'll respond with a bunch of relevant comparisons, to put the number in context. 

For example: in 2015, there were 72.4 million cars sold in the world. Is that a big number? The web service tells us: "One for every 100 people in the world". 17.5 million cars sold in the USA? That's "One for every 18 people in the USA". Big numbers? You can draw your own conclusions. And that's the point: to allow people to make informed judgements by putting things in context.
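Under the hood, a comparison like that is little more than a division plus a sentence template. A sketch of the flavour in R (illustrative only, taking the world population as 7.3 billion):

compare_per_person <- function(n, pop) paste("One for every", signif(pop / n, 2), "people")
compare_per_person(72.4e6, 7.3e9)   # "One for every 100 people" - the world
compare_per_person(17.5e6, 3.2e8)   # "One for every 18 people" - the USA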

We'll throw in a few quirky measures too, just for fun. How long is an Imperial Star Destroyer, in terms of X-Wings? How long is a football pitch in terms of iPhones laid end to end?

It's very much in development but you can play around with what's been done here (www.isthatabignumber.com). As you can see from all the not-yet-live links, there's a lot more to come. We're hoping to use this as a hub for a variety of numeracy-related services: a number-led blog, educational resources and more.

So, is 25 million Euros a big number? Click this link to see:
http://www.isthatabignumber.com/itabn/compare?number=25+m+EUR

I'd love to think this could play some small role in helping people such as journalists, teachers or just the curious to better understand the numbers around us.

How Fresh is that Code?

Andrew Elliott

One of the beauties of the "R" programming language is the vitality of the user community. Language users are continuously uploading newly developed or revised versions of extension functionality. Looking at the range of packages available on CRAN, the "Comprehensive R Archive Network", I was struck by how many of these packages had recent versions registered. So, I decided to dig a little, and at the same time give you a little flavour of quick-and-dirty data exploration with R. Some highlights:

Load in the package list from CRAN:

packages<- getRPackages("http://cran.r-project.org/web/packages/available_packages_by_date.html")
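(getRPackages isn't a built-in, by the way: it's a small scraping helper. Here's a sketch of what such a helper might look like, using readHTMLTable from the XML package; the real version may differ.)

library(XML)
getRPackages <- function(url) {
  # assumes the page is one HTML table: date, package name, title
  tbl <- readHTMLTable(url, which = 1, stringsAsFactors = FALSE)
  names(tbl) <- c("dt", "name", "desc")
  tbl$dt <- as.POSIXct(tbl$dt, tz = "UTC")   # parse dates like "2015-11-03"
  tbl
}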

How many packages are in the archive?

dim(packages)[1]
## [1] 7422

Date of stalest package?

min(packages$dt)
## [1] "2005-10-29 UTC"

Date of freshest package?

max(packages$dt)
## [1] "2015-11-03 UTC"

Ooh! That's today. How many packages are fresh today?

nrow(packages[packages$dt==max(packages$dt),])
## [1] 5

And just for interest, which are they?

packages[packages$dt==max(packages$dt),c("name", "dt")]
##            name         dt
## 1      DLMtool  2015-11-03
## 2   epiDisplay  2015-11-03
## 3         MM2S  2015-11-03
## 4    quickmapr  2015-11-03
## 5  SALTSampler  2015-11-03

Ok, so let's compute the ages of the packages (in weeks). How many packages are less than 4 weeks old?

library(lubridate)   # for interval() and ddays()
today<-max(packages$dt)
packages$age<-interval(packages$dt,today)/ddays(7)   # age in weeks
sum(packages$age<=4)
## [1] 587

Around 8%! Let's look at the distribution by age; for convenience, convert weeks to approximate years:

ageInYears <- packages$age / 52
hist(ageInYears, breaks=20)

More than half the packages are fresher than one year old, and it's easy to see that growth took off about four years ago after several years of slow burn. Let's look at the growth over just the past year (roughly 44 weeks):

freshThisYear<-packages[packages$age<=44,]$age
hist(freshThisYear, breaks=44)

I think it's clear that the takeup of R continues to accelerate, if the freshness of the user-contributed archive is any sort of guide.

"R" is for Re-use

Andrew Elliott

Previously on "R is for ..."

One of R's greatest strengths is the level of activity in the user community and the range of packages that have been developed and contributed to the general good. There are thousands of packages out there and the list grows daily. How is the young data scientist to stay on top of this flood of material? I hear you ask. Various helpful lists have been contributed by bloggers and other commentators, such as 10 R packages I wish I knew about earlier. The CRANtastic website provides a list of the favourites based on user ratings http://crantastic.org/popcon, and r-bloggers provides a list by frequency of download in RStudio http://www.r-bloggers.com/a-list-of-r-packages-by-popularity/.

Dependencies

Another way of looking at this is to ask which packages are most fundamental to the broader R community: which packages do package authors build upon? The CRAN repository provides structured data on each package: among the data provided are "Depends" and "Imports", which list the packages each is built upon. It seemed a fun thing to see which packages were most depended upon - which were the most fundamental in the R ecosystem.

First-Order

For this exercise I didn't bother distinguishing between "Depends" and "Imports". I wrote a simple routine to take the list of packages from CRAN and then, for each, to harvest from the relevant page on the CRAN website the contents of the "Depends" and "Imports" properties, and stash those package names in a table which I called "antecedants". The table has columns "self" (the package in question), "ante" (the antecedant package) and "order" (the depth of the dependency).

        options(width=100)
        source("Rpackages.R")
        load("Packages.Rda")
        load("Antecedants.Rda")
        head(antecedants)
##         self         ante order
## 1  cleangeo            sp     1
## 2  cleangeo         rgeos     1
## 3  cleangeo      maptools     1
## 4     smerc  SpatialTools     1
## 5     smerc        fields     1
## 6     smerc          maps     1

That gave the first-order dependencies, and here are some interesting glimpses into that table. I used table() to count the order-1 dependencies for each antecedant, to see which are most re-used, and then sorted that table to reveal the top ten.

        ante1<-table(antecedants[antecedants["order"]==1,]$ante)
        anteSorted1<-ante1[order(ante1, decreasing=TRUE)]
        length(anteSorted1)
## [1] 1458
        dim(antecedants[antecedants["order"]==1,])
## [1] 10330     3
        head(anteSorted1, 10)
##
##     MASS     Rcpp  ggplot2     plyr   Matrix  lattice  stringr reshape2       sp  mvtnorm
##      374      370      321      266      183      173      157      151      146      142

So something over a thousand packages are in some way re-used, for a total of over 10,000 order-1 dependencies, and the most popular include many of the usual suspects like ggplot2 and plyr.

Going Deeper

But just looking at the first level is not good enough. If your package builds on, say, ggplot2, which has among its antecedants plyr, then of course plyr is an antecedant of your package too - a second-order antecedant. So we need to get recursive, and we can do this just by analysing the antecedants table: we can build the order 2 antecedants from the order 1 table, the order 3 from the order 2, and so on, until we finally bottom out at the maximum depth. Along the way we need to make sure we don't double-count: if a package uses ggplot2 and also uses plyr directly, we don't want to count plyr twice.
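Sketched in R, that recursion might look like this (nextOrder is an illustrative helper, not the exact code I ran):

        nextOrder <- function(ants, k) {
            # the order-k antecedants of each package ...
            lower <- ants[ants$order == k, c("self", "ante")]
            names(lower) <- c("self", "mid")
            # ... joined to the direct (order 1) antecedants of those antecedants
            direct <- ants[ants$order == 1, c("self", "ante")]
            names(direct) <- c("mid", "ante")
            step <- unique(merge(lower, direct, by = "mid")[, c("self", "ante")])
            # avoid double-counting pairs already recorded at a shallower order
            seen <- paste(ants$self, ants$ante)
            step <- step[!(paste(step$self, step$ante) %in% seen), ]
            if (nrow(step) == 0) return(ants)    # bottomed out: maximum depth reached
            step$order <- k + 1
            nextOrder(rbind(ants, step), k + 1)
        }
        antecedants <- nextOrder(antecedants, 1)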

So for example here are the most frequent order 3 dependencies.

        ante3<-table(antecedants[antecedants["order"]==3,]$ante)
        anteSorted3<-ante3[order(ante3, decreasing=TRUE)]
        head(anteSorted3, 10)
##
##      lattice         Rcpp      stringr RColorBrewer         plyr     magrittr       digest
##          674          659          503          469          465          421          394
##    dichromat     labeling      munsell
##          380          380          380

And having chased this down until there were no more levels, the winners are ...

        anteN<-table(unique(antecedants[,-3])$ante)
        anteSortedN<-anteN[order(anteN, decreasing=TRUE)]
        top10ante<-head(anteSortedN, 10)
        top10ante
##
##         Rcpp      lattice         MASS     magrittr      stringi      stringr       digest
##         1341         1119         1048          911          876          864          853
##         plyr RColorBrewer   colorspace
##          799          654          650

So what are these packages that float to the top of the list?

        # trim() strips whitespace (presumably defined in the sourced Rpackages.R;
        # base R's trimws() would do the same job)
        packages[trim(packages$name) %in% names(top10ante),1:2]
##                name                                                            desc
## 449           Rcpp                                  Seamless R and C++ Integration
## 674           MASS   Support Functions and Datasets for Venables and Ripley's MASS
## 1489       lattice                                          Trellis Graphics for R
## 1820       stringi                          Character String Processing Facilities
## 2013          plyr                Tools for Splitting, Applying and Combining Data
## 2460       stringr        Simple, Consistent Wrappers for Common String Operations
## 2871    colorspace                                        Color Space Manipulation
## 3479        digest                  Create Cryptographic Hash Digests of R Objects
## 3661  RColorBrewer                                            ColorBrewer Palettes
## 3796      magrittr                                   A Forward-Pipe Operator for R

Oh, and ...

Just for fun, some other bits and pieces

The deepest dependency:

        head(antecedants[antecedants$order==max(antecedants$order),])
##                self       ante order
## 53579  BIFIEsurvey     lattice    10
## 53580  BIFIEsurvey        Rcpp    10
## 53581  BIFIEsurvey     stringi    10
## 53582  BIFIEsurvey    magrittr    10
## 53583  BIFIEsurvey  colorspace    10

The number of dependencies for each order maxes out at second-order dependencies, and then tails away:

        table(antecedants$order)
##
##     1     2     3     4     5     6     7     8     9    10
## 10330 14504 11647  8289  5052  2582   932   208    34     5

The most dependent packages - the ones which will pull in the greatest number of other packages:

        selfN<-table(unique(antecedants[,-3])$self)
        selfSortedN<-selfN[order(selfN, decreasing=TRUE)]
        top10self<-head(selfSortedN, 10)
        top10self
##
##  BIFIEsurvey      miceadds         immer          sirt     treescape       semPlot       bootnet
##           120           119           108           106            92            87            84
##    IATscores           RAM     diveRsity
##            83            82            81

And these highly dependent packages, what do they do?

        packages[packages$name %in% names(top10self),1:2]
##               name                                                                     desc
## 386         immer                                Item Response Models for Multiple Ratings
## 437     treescape              Statistical Exploration of Landscapes of Phylogenetic Trees
## 484   BIFIEsurvey                    Tools for Survey Statistics in Educational Assessment
## 1546     miceadds    Some Additional Multiple Imputation Functions, Especially for\n'mice'
## 1817         sirt                                Supplementary Item Response Theory Models
## 2231          RAM                        R for Amplicon-Sequencing-Based Microbial-Ecology
## 2277    IATscores                 Implicit Association Test Scores Using Robust Statistics
## 2921      bootnet                Bootstrap Methods for Various Network Estimation Routines
## 3526    diveRsity   A Comprehensive, General Purpose Population Genetics Analysis\nPackage
## 4335      semPlot       Path diagrams and visual analysis of various SEM packages'\noutput