# Playing with Data – Part 1

There are tons of things we can do with data. This is the first in a series of posts on what we can do with publicly available data and share some simple tricks with examples using Excel. (I do endorse Excel as it is a simple, fast and brilliant tool that can churn lower volumes of data quickly, and yes, it has earned me a few brownie points in my past).

A simple attempt at Monte Carlo

Google ‘Monte Carlo’ and most of the sites point to an example on how the value of Pi can be estimated using the Monte Carlo method. However, I won’t delve into what the method is or what Pi means or how to Google.
Below is an extract from Wikipedia for a quick read:

Let us attempt to do this in Excel, and see how a combination of “Randbetween” function and Pythagoras theorem helps us estimate the value of Pi.

Plot the circumference of 1/4th of the circle

-          To plot the circumference of the circle, first generate the co-ordinates of a large number of points on the circumference.
-          Any point on the circumference is made of two components – X and Y co-ordinates
-          Easiest way to plot these points is to simply ‘generate these points’
-          Remember trigonometry? The points can be simply thought of as formulas

• X = R*Cosine(angle)
• Y = R*Sine(angle)

-          This means, if we can take the cosine and sine values of a large number of angles (data points) between 0° and 90° then, bingo! we can plot a quarter circle.
-          Excel has an inbuilt function for calculating Cosine and Sine values, but then, you need to supply the angle in radians! (But worry not, as Excel has a Radians function, which comes to our rescue)
-          In this example, the value of the radius is 1000. So, the formula is -

• X = 1000 * COS (RADIANS ( ANGLE in DEGREES))
• Y = 1000 * SIN (RADIANS (ANGLE in DEGREES))

-          Now, for about 180 data points, I should increase the degree by 0.5, starting at 0°, all the way up to 90°.

Snapshots of the above steps:

-          Enter 0 in Cell A2
-          Enter “=A2+0.5” in A3
-          Drag drop the column A3, till you hit 90°

-          To calculate the X co-ordinates on the circumference for each Angle, simply enter “=1000*COS(RADIANS(A2))” in Cell B2
-          Drag and drop to calculate the X co-ordinates of all circumference points for every angle in Column A

-          To calculate the Y co-ordinates on the circumference for each Angle, simply enter “=1000*SIN(RADIANS(A2))” in Cell C2
-          Drag drop to calculate the Y co-ordinates of all circumference points for every angle in Column A

Want to see how it looks like in the Scatter plot?

-          Select the X, Y data (Cell B2, C182)
-          Click on Insert in the main TAB menu (Ribbon)
-          Select Scatter plot, Scatter with only Markers option
-          Voila! You can see the outline of the quarter of your circle, in the first quadrant
-          Play around with options to customize the marker, colour, fill, labels, legends etc.
-          You can also delete the intermediary lines, to make the graph look cooler

Plot the Square

-          The simplest way to do this is plot (0,0), (0,1000), (1000,0) and (1000,1000) and insert a line graph
-          But then, lets complicate it by doing a scatter plot
-          We need the co-ordinates of all points of two sides only

• One side will be X= 1000, Y = 0 through 1000
• The other side will be X = 0 through 1000, Y = 1000

-          Given that we have 180 data points for the circumference, let’s do the same number of data points for the two sides of the square
-          The first points are X=1000, Y=0
Enter the values in Cells D2, E2
-          Now, we need the value of D (X) to remain at 1000, while the value of E (Y) changes till 1000, and then we need the value of (Y) to be at 1000 and the value of (X) to decrease to 0

We achieve this with the following formula:

Calculate the value of Y. Y needs to increase till 1000, and then stay at 1000. The following formula in E3 dragged till the end will do the trick:

“=IF(AND(E2<=1000,D2=1000),IF((E2+1000/90)>1000,1000,E2+1000/90),1000)”

Calculate the value of X. X stays at 1000 till Y reaches 1000, and then, decreases to 0. The following formula in D3 dragged till the end does the trick

=IF(E3<1000,1000,D2-1000/90)

Now, to see the Square and the Circle (1/4th of the circle) together, follow these steps below:

-          Right click on the scatter plot and select “Select Data”

Select the series that you want to plot (Column D and E)

Now you can view the square as well in the scatter plot!

Generating random numbers to spray on the graph

Next, we need to generate random numbers to spray on the graph. The ratio of the total number of random number plots within the square to the dots inside the quarter of the circle will be in the ratio of 4:Pi . But then, counting the number of dots inside the circle is going to be tedious, so we will employ a simpler way to count the same.

To generate random dots / spray inside the square:

-          Enter “=RANDBETWEEN(0,1000)” in Column F for X co-ordinates
-          Enter “=RANDBETWEEN(0,1000)” in Column G for Y co-ordinates
-          Now, drag the formula up to say, 10,001 rows to generate 10,000 random spray points

Include the F and G series in the graph.

-          Right click on the scatter plot graph
-          Click ‘Select Data”
-          ‘Add’ data columns F and G

Now, if only we can spray the dots in different colours to know which dots fall outside the circle and which ones are inside!

Formatting the spray colour

Unfortunately, we can’t have different colours based on certain conditions in the same series for a scatter plot. So, the easiest way out is to split the random spray plots into two series – Series within the circle and Series outside the circle. We use the good old Pythagoras theorem for the same.

Using the Pythagoras

Length of the hypotenuse of a right angled triangle is the square root of the sum of squares of the other two sides.

-          Assume we draw a right angled triangle with (0, 0) as one end, (Random X, Random Y) as the second point, and (0, Random Y) as the third point.
-          The distance from (0,0) to (Random X, Random Y) is simply:

Square Root of [(Random X) ^2 + (Random Y) ^5]

-          If this value is less than 1000, it means the spray point is inside the circle, and if it is greater than 1000, then it is outside the circle. Simple isn’t it?
-          Insert the formula in the H column, and then, drag and drop for all the random numbers

Cell H2: “=SQRT(F2^2+G2^2)”

Now, to split the Random spray series into two, split the columns F and G into ‘Points inside the Circle’ and ‘Points outside the Circle’. Formulas as given below:

Now, we add these two series on our scatter plot and the spray points are coloured differently!

Estimating Pi

-          The above round about stuff we did was just to graphically represent what it looks like to generate Random numbers and plot it on a graph, using EXCEL.
-          None of this is actually required if we have to simply estimate Pi from random numbers, but doing it through counting the dots is not going to work
-          We simply count the number of random points with distance less than 1000 for the count inside the circle, and the total number of random points is the total number of points inside the square.
-          The following formulas help in doing the same, and also knowing the deviation of the estimated Pi from the actual value of Pi.

Applying Monte Carlo in the context of Banking Intellisense: Monte Carlo and related methods are used in the application of Risk Measurements. For some of the key parameters like Currency Valuations, Portfolio Loss and Economic Capital (Value at Risk), Monte Carlo method can be applied to model the outcome / predict the values with probability of occurrence. If an asset portfolio contains historical default rates and the dimensions of customers who defaulted, using the Simulation Method, the mostly likely portfolio returns can be predicted and capital can be adjusted to provision for expected loss in the portfolio.

Full Excel working is captured below:

Playing with data – Part 1

This entry was posted in Analytics, Business Intelligence, Compliance, Data Warehousing, Predictive Analytics and tagged , , , , , , , . Bookmark the permalink.