A Simple Visual Random Number Test

Random numbers are very difficult to test since there are so many ways a series of numbers can exhibit non-random properties. Correlation between consecutive numbers in a sequence and an uneven distribution of numbers across the entire sequence are just two of the properties that need to be tested for.

I was writing a simple random number routine in C# and made the mistake of placing the declaration of the Random object inside the loop:

private List<int> RandNumbers()
{
    var numbers = new List<int>();

    for (int i = 0; i < 100000; i++)
    {
        // Mistake: a new Random is created (and reseeded) on every iteration.
        var rnd = new Random();
        numbers.Add(rnd.Next(1, 1000));
    }

    return numbers;
}

In .NET this will produce a very poor series of random numbers. By default the Random object is seeded from the system clock, so when it is created inside a tight loop it is constructed and seeded many times with identical values, and each of those instances returns the same first number. (This applies to the .NET Framework; .NET Core and later versions choose the default seed differently.) Developers normally write tests to check an application’s data quality, but writing a battery of statistical tests just to check that a simple random number routine is working correctly is overkill.
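The effect of repeated seeding can be reproduced deterministically by passing an explicit seed. The sketch below is only an illustration: the class name `SeedDemo` and the seed value are hypothetical, and the explicit seed stands in for the clock value that two `new Random()` calls would share within one clock tick.

```csharp
using System;
using System.Linq;

public static class SeedDemo
{
    // Draws `count` values in [1, 1000) from a Random created with an explicit seed.
    public static int[] Draw(int seed, int count)
    {
        var rnd = new Random(seed);
        return Enumerable.Range(0, count).Select(_ => rnd.Next(1, 1000)).ToArray();
    }

    public static void Main()
    {
        // Two generators seeded with the same value emit the same sequence.
        int[] first = Draw(12345, 5);
        int[] second = Draw(12345, 5);
        Console.WriteLine(first.SequenceEqual(second)); // True
    }
}
```

Because each loop iteration in the faulty routine only takes the first value from a freshly seeded generator, long runs of identical numbers are the expected result.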

A neat solution to this is to use a simple visual inspector which plots the values on a two-dimensional surface. The ASP.NET MVC controller action below outputs a bitmap of the above random number routine.

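public ActionResult RandomNumberImage()
{
    var path = @"c:\users\yourfilepath\filename.bmp";
    Bitmap bm = new Bitmap(1000, 1000);
    var randNumbers = RandNumbers();
    var randNumberCount = randNumbers.Count();

    // Plot each consecutive pair: x is one value, y is the value that follows it.
    // Stop one short of the end because the loop reads element i + 1.
    for (int i = 0; i < randNumberCount - 1; i++)
    {
        bm.SetPixel(randNumbers[i], randNumbers[i + 1], Color.Black);
    }

    bm.Save(path, System.Drawing.Imaging.ImageFormat.Bmp);
    return File(path, "image/bmp");
}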

This outputs the below visualisation:

Definitely not a random data set! Now we can correct the mistake and instantiate the Random object outside the loop:

private List<int> RandNumbers()
{
    var numbers = new List<int>();
    // Created once, so every call to Next continues the same sequence.
    var rnd = new Random();

    for (int i = 0; i < 100000; i++)
    {
        numbers.Add(rnd.Next(1, 1000));
    }

    return numbers;
}

Now let’s look at the output of the correctly implemented routine:

This appears much better. In addition to a broader scatter, note that there appear to be more data points. This is because in the previous routine the numbers were often identical and hence plotted on top of each other.
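The overlap can also be put into a single number by counting how many distinct points the plot would contain. This is a rough sketch rather than a statistical test; the class and method names are hypothetical and the input is assumed to be the list returned by the routine above.

```csharp
using System;
using System.Collections.Generic;

public static class PairCheck
{
    // Counts the distinct (x, y) points that the consecutive-pair plot would draw.
    public static int DistinctPairs(IReadOnlyList<int> values)
    {
        var seen = new HashSet<(int, int)>();
        for (int i = 0; i < values.Count - 1; i++)
        {
            seen.Add((values[i], values[i + 1]));
        }
        return seen.Count;
    }
}
```

For example, `PairCheck.DistinctPairs(new[] { 5, 5, 5, 5 })` returns 1, since every consecutive pair lands on the same pixel. A count far below the number of pairs plotted is a hint that values are repeating.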

Visualisation can also help surface more subtle issues in random number generation. The below image is from a data set generated using the PHP random number generator on a Windows machine:

Source: random.org

The relationships between these numbers may be quite difficult to discern using statistical tests, but the above image clearly shows structure within the data set, and hence the data cannot be random.

Note that this is not a robust statistical test: a generator can pass a visual inspection and still be poor. The inbuilt random number generation in .NET is generally considered to perform poorly (as is the case with Excel random numbers).

Correlation – The Need For ‘Stationary’ Data

How correlated are Intel and Google’s stock prices? The graph below shows the daily closing prices of both from mid 2009 to mid 2011.

GOOG INTL Stock Prices

From first impressions, it certainly looks like the two price series move in tandem and should have a high correlation. Indeed, it turns out that the correlation coefficient of the two stock price series is 0.88, indicating a high (88%) correlation between the Google and Intel stock prices.

However, this is totally misleading – in reality the correlation between the two is a mere 36%.

Correlation, in common with most time-series data analysis techniques, requires ‘stationary’ data as an input. To be stationary the data must have a constant variance over time and be mean reverting. Stock price data (and many other economic data series) exhibit trending patterns which violate the criteria of stationarity. Transformation to stationary data is quite simple, however, as converting the daily price closes into daily returns will normally be sufficient. The return series of a stock is usually considered stationary for time series analysis purposes, since it is mean reverting (the daily returns oscillate above and below a constant mean) and has a constant variance (the magnitude of the returns above and below the mean will be relatively constant over time despite numerous spikes).

The daily series of returns (i.e. percentage price changes) for both Google and Intel stocks can be seen below. Note that there is no trend to the series, which moves above and below a constant mean – which for daily stock price returns is almost always very close to 0%.

GOOG INTL Stationary Series

The requirement for stationary data in calculating correlation can also be explained intuitively. Imagine you were looking to hedge a long position in Google stock with a short position in Intel: you would want the return on the Google stock to match the return on the Intel stock. Hence correlating the prices would be irrelevant. In such a scenario you would want to know the correlation between the two sets of returns, as this is what you would essentially be attempting to match with the hedge.
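The transformation and the correlation itself take only a few lines of code. The following is a minimal sketch, assuming two equal-length arrays of daily closing prices; the class and method names are hypothetical, and the coefficient is the ordinary Pearson correlation.

```csharp
using System;
using System.Linq;

public static class ReturnCorrelation
{
    // Converts a price series into simple day-on-day returns.
    public static double[] ToReturns(double[] prices)
    {
        var returns = new double[prices.Length - 1];
        for (int i = 1; i < prices.Length; i++)
        {
            returns[i - 1] = prices[i] / prices[i - 1] - 1.0;
        }
        return returns;
    }

    // Pearson correlation coefficient of two equal-length series.
    public static double Correlation(double[] x, double[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Series must have the same length.");

        double meanX = x.Average(), meanY = y.Average();
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.Sqrt(varX * varY);
    }
}
```

Calling `Correlation(googPrices, intcPrices)` gives the price correlation, while `Correlation(ToReturns(googPrices), ToReturns(intcPrices))` gives the correlation of returns; the two array names here are placeholders for whatever price data is loaded.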

Correcting For Drift And Seasonality

In correlating stock price data, transforming the raw price data to returns is usually considered sufficient. However, to be more rigorous, any additional trends could be stripped out of the data. Most models of stock price behaviour include the risk-free interest rate plus a required rate of return as a constant drift over time, the argument being that stock investors require this return for holding the stock and over the long term the stock should deliver that return. Thus, this return could be backed out of the series before calculating correlation. In practice, since we are dealing with daily returns, the long-term drift has a minimal impact on the calculation of correlation.

Some economic data series, such as durable goods orders, exhibit strong effects of seasonality. When raw durable goods orders data is transformed into percentage changes, it is indeed mean reverting with a constant mean. However, the series will still not be stationary due to the strong seasonality effects: orders will be much higher during the Christmas shopping season, so the percentage changes will always spike at that time, resulting in a non-constant variance.

Seasonality can be dealt with by cleaning the data series using another series which exhibits the same seasonality. In the case of durable goods orders, the raw CPI index (note: not the percentage change in CPI) would be such a series, since the CPI index will typically spike during shopping seasons. Thus the durable goods orders could be divided by the CPI to arrive at a ‘deflated’ durable goods series, which could then be made stationary by transforming it into percentage changes between periods.
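The two steps described above (divide by the seasonal series, then take percentage changes) can be sketched as follows. The names are hypothetical and the code assumes both series cover the same periods in the same order.

```csharp
using System;

public static class Deseasonalise
{
    // Divides each observation by the matching value of the deflator series.
    public static double[] Deflate(double[] series, double[] deflator)
    {
        if (series.Length != deflator.Length)
            throw new ArgumentException("Series must have the same length.");

        var result = new double[series.Length];
        for (int i = 0; i < series.Length; i++)
        {
            result[i] = series[i] / deflator[i];
        }
        return result;
    }

    // Percentage change between consecutive periods, as a fraction (0.5 = 50%).
    public static double[] PercentChanges(double[] series)
    {
        var changes = new double[series.Length - 1];
        for (int i = 1; i < series.Length; i++)
        {
            changes[i - 1] = series[i] / series[i - 1] - 1.0;
        }
        return changes;
    }
}
```

The stationary candidate is then `PercentChanges(Deflate(orders, cpi))`, where `orders` and `cpi` are placeholder names for the two input arrays.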






The Myth of Card Counting

There are lots of articles on how to count cards but very little on the returns. Ever wonder why there is so little, or why casinos are relatively unconcerned with card counting? There are card counting books in casino gift shops, and a few innocuous rule changes would probably wipe it out altogether. The underlying reason is that, far from being a ticket to wealth, the returns from card counting are atrocious.


First, why does card counting work at all? In Blackjack the dealer has the advantage of collecting a player’s bet whenever the player busts, regardless of the dealer’s outcome. The player has two main advantages: a payout of 3:2 for Blackjack (an Ace and a ten-value card) and the ability to stand on any card combination, whereas the dealer must hit until reaching 17.

The result of this is that a deck with a heavy concentration of high value cards greatly favours the player. High cards increase the likelihood of busts, and a player can avoid these by standing on low values. For example, a player holding 12 against a dealer’s show card of 6 should stand, as the next card is likely to bust the player. The dealer, however, will have a high likelihood of busting, as he will be required to take at least two cards from a deck loaded with high value cards. Thus the effect of high cards is more to bust the dealer than to win the player’s hands. High cards also increase the chances of blackjack for the player.

It would therefore be advantageous to know when there is a high proportion of high value cards in a deck so a player can increase bets and stand on lower values.

Basic Strategy

The starting point for any blackjack counting system is basic strategy: a set of rules determining when to hit, stand, split and double down based on the player’s cards and the dealer’s show card. This can be represented as a grid or listing of rules. Learning it shouldn’t be a daunting task, although you need to be almost flawless at it; only one error per twenty shoes is permissible.


Next is learning a counting system. Hi-Lo is the most popular system: one is added for each card of value 6 and under, one is deducted for each 10-value card and ace, and 7, 8 and 9 are ignored. Thus for a sequence of Jack, 8, 3, Ace, the count would be -1 (i.e. -1, 0, +1, -1).
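The Hi-Lo rule is short enough to write out directly. In this sketch the card encoding is an assumption made for illustration: ranks 2 to 9 are face value, 10 stands for any ten-value card (10, J, Q, K) and 11 stands for an ace.

```csharp
using System;
using System.Collections.Generic;

public static class HiLo
{
    // Hi-Lo tag for one card: +1 for 2-6, 0 for 7-9, -1 for ten-value cards and aces.
    public static int Tag(int rank)
    {
        if (rank < 2 || rank > 11)
            throw new ArgumentOutOfRangeException(nameof(rank), "Rank must be between 2 and 11.");
        if (rank <= 6) return 1;
        if (rank >= 10) return -1;
        return 0;
    }

    // Running count: the sum of the tags of every card seen so far.
    public static int RunningCount(IEnumerable<int> ranks)
    {
        int count = 0;
        foreach (int rank in ranks)
        {
            count += Tag(rank);
        }
        return count;
    }
}
```

With that encoding, the sequence Jack, 8, 3, Ace is `new[] { 10, 8, 3, 11 }`, and `HiLo.RunningCount` returns -1 for it, matching the worked example.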

This appears extremely simple but requires a great amount of effort, since it needs to be executed almost flawlessly (only one or two counting errors are permitted over a six-deck shoe). Reaching that standard typically takes several hours of practice per day for two to three months.

This count (the ‘running count’) only gives the excess number of high cards over low cards; what we need, however, is the proportion of high cards relative to the remaining cards. So the ‘running count’ needs to be divided by the number of remaining decks in the shoe to arrive at the ‘true count’. To do this the counter also needs to keep a count of the number of cards played (a rough approximation is usually sufficient).

Once you have mastered card counting, you need to learn the modifications to basic strategy, since the correct play will be very different in situations where the true count is high. That’s a solid three to six months’ work, spending several hours per day.
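As a small sketch of that division, assuming a six-deck shoe by default and an exact tally of cards played (at the table an estimate would be used); the class name is hypothetical.

```csharp
using System;

public static class TrueCountSketch
{
    private const int CardsPerDeck = 52;

    // Converts a running count into a true count by dividing by the decks still in the shoe.
    public static double TrueCount(int runningCount, int cardsPlayed, int decksInShoe = 6)
    {
        int cardsRemaining = decksInShoe * CardsPerDeck - cardsPlayed;
        if (cardsRemaining <= 0)
            throw new ArgumentOutOfRangeException(nameof(cardsPlayed), "No cards remain in the shoe.");

        double decksRemaining = cardsRemaining / (double)CardsPerDeck;
        return runningCount / decksRemaining;
    }
}
```

For example, a running count of +6 with 156 cards played from a six-deck shoe leaves three decks, so `TrueCountSketch.TrueCount(6, 156)` returns 2.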

Calculating Returns

What advantage does all this effort give a card counter over the dealer? A shade over 1%. Hardly juicy, but let’s work through the returns.

We will look at this as an investment and so work backwards from the bankroll. Say you have $100,000 to invest in the bankroll. Your betting unit (i.e. the amount your bet is increased for every +0.5 in the true count) should be $200. This isn’t an exact number, but to avoid the fatal blow of wiping out the bankroll, the betting unit should be 0.1% to 0.5% of the bankroll; in this example I went with 0.2%.

With perfect basic strategy and perfect card counting the expected returns will be the betting advantage multiplied by the betting unit. Thus, in this case the expected return on a hand would be 0.01 x $200 = $2.

Assuming you can play at a rate of 50 hands per hour, that gives you $100 per hour. It’s possible to play more hands per hour, especially playing one-on-one with the dealer, but this increases the chance of being detected and also leads to counting errors due to the speed of play.

Next, we set up a confidence interval to estimate the distribution of returns over time. The standard deviation of a bet in blackjack is 1.1 times the bet (slightly larger than the bet size due to the increased payout in the event of blackjack). Assuming a normal distribution, we can therefore say that the earnings will be the expected return (i.e. the mean) plus or minus three standard deviations with 99.7% certainty.

The real issue for blackjack card counters is that randomness dominates until you play a very large number of hands.

Take the scenario after 100 bets:
The expected return would be $2 x 100 = $200. The standard error increases with the square root of the number of trials, so the standard error after 100 trials is 10 x (1.1 x $200) = $2,200. Thus we can say with 99.7% certainty that after 100 hands of blackjack the return should be $200 plus or minus $6,600 (or between -$6,400 and +$6,800).

Even 100,000 hands doesn’t provide a guaranteed return. The expected return after 100,000 hands is $200,000 plus or minus about $208,710 (SQRT(100,000) x 220 x 3).

You really need to be approaching half a million hands of blackjack to be deep into positive returns. After 500,000 hands you would be 99.7% sure of having a return of $1,000,000 plus or minus about $466,690. Alas, at 50 hands per hour, 500,000 hands would take about 10,000 hours, or 1,250 days of playing 8 hours per day.
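The same arithmetic can be repeated for any number of hands. The sketch below uses the figures from the text (a 1% edge, a $200 betting unit, a standard deviation of 1.1 bets and 50 hands per hour) and, like the text, assumes every hand is played at the betting unit and that results are normally distributed. The class and method names are hypothetical.

```csharp
using System;

public static class CountingReturns
{
    public const double Edge = 0.01;          // player advantage
    public const double BetUnit = 200.0;      // betting unit in dollars
    public const double SdPerBet = 1.1;       // standard deviation of one hand, in bets
    public const double HandsPerHour = 50.0;  // playing speed

    // Expected winnings in dollars after the given number of hands.
    public static double Expected(int hands) => Edge * BetUnit * hands;

    // Half-width of the three-standard-deviation (99.7%) band in dollars.
    public static double ThreeSigma(int hands) => 3 * Math.Sqrt(hands) * SdPerBet * BetUnit;

    // Hours at the table needed to play the given number of hands.
    public static double Hours(int hands) => hands / HandsPerHour;

    public static void Main()
    {
        foreach (int hands in new[] { 100, 100000, 500000 })
        {
            Console.WriteLine(
                $"{hands} hands: {Expected(hands):F0} +/- {ThreeSigma(hands):F0} dollars, {Hours(hands):F0} hours");
        }
    }
}
```

The band only shrinks relative to the expected return because the mean grows in proportion to the number of hands while the spread grows with its square root.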

Now To The Real World

Unfortunately the returns only get worse once you start adjusting for the real world. Card counting is actually very obvious; even an inexperienced dealer can quickly identify a counter. The reason is that the betting profile of a counter is totally different to that of any other player. Dealers all have a working knowledge of basic strategy and will recognise a player playing decent basic strategy, which a card counter will do.

The problem is that a card counter will suddenly switch from playing perfect basic strategy with a low bet to playing a high bet with major deviations from basic strategy. Some situations in particular are a major tell. Never splitting tens is not just basic strategy but also common sense; however, when the true count is high and the dealer’s show card is a 5 or 6, a counter must split tens.

The best way to avoid detection is to play as a team with one player as the counter who plays basic strategy but no more. Once the count is high the counter will signal the high-roller who plays a larger bet and does not follow basic strategy. This, however, vastly dilutes returns since the 1% advantage is divided between two players and one player will be playing at approximately a 0.5% disadvantage (although on a lower bet).

Given the large number of hands required and the necessity of playing with more than one player, card counters usually play in large teams. However, the meager 1% advantage over the house means there is very little return to be shared around.

Hence, most card counting teams have a hierarchy of investors/team leaders, senior players and junior players, with the players being paid a very small return for their work.

In practice the real trick of card counting is finding smart, energetic workers willing to put in six months of training (probably unpaid) and then work back-breaking shifts of 8-12 hours a day at the tables for a shade above minimum wage.