
Predictive Analysis on Stock Prices using R Studio - A case study for Beginners

Sanket Thodge | Monday, January 8, 2018 | Category: Sensex,Analytics,R & SIP


Sensex, NSE (National Stock Exchange), BSE (Bombay Stock Exchange), Stockholm, NASDAQ, Dow Jones, Nikkei - all of these words light up the same things in our brain: trades, stocks, PE ratios, money and volatility. In one word: INVESTMENT.

I have been rigorously studying the whole stock market system for the last few months. I am a Hadoop Developer (or Big Data Developer) working at one of India's biggest MNCs, and a geek by nature. While trying to understand how the stock market works, I came across an investment method called SIP (Systematic Investment Plan).

SIP is for those who are willing to invest in the stock market but cannot, for one of these reasons: lack of time, lack of knowledge of stocks, or lack of a corpus (money to invest). I will try to explain SIP in brief.

With SIP, an investor puts a fixed amount of money (resource allocation) into a chosen stock or portfolio (a set of stocks of different companies) at regular intervals: every month, every quarter or every year. It is the best type of investment for a salaried person who is not willing to take a lot of risk, is not a regular trader, and wants a strong corpus in the long term (10+ years). SIP benefits from price volatility and from compounding on your invested amount (ignore the jargon).

So, here is the use case: a person named John wants to invest Rs. 100/- in his portfolio every month. John's portfolio contains shares of only one company (the guy is poor) - ABC Corp. John wants to keep investing in this share for the next 3 years, i.e. 36 months.

John will buy shares of this company for the next three years and sell them thereafter. So, logically, John makes more profit the more shares he manages to accumulate in these three years. The more shares he holds, the higher his profit.

This implies that John should invest his Rs. 100/- each month on the day when the share price of ABC Corp is at its lowest for that month.

Let me simplify it! John has Rs. 100/- in his kitty each month. He can choose the day of the month on which he invests. Say he decides to invest on the 15th of every month for the next 36 months. John has a friend, Michael, to whom he has suggested the same share, but Michael decides to invest on the 5th of every month. He believes it's his lucky number.

From a layman's perspective, this should not make any difference. But it does!!

After 36 months John ends up with 72 shares, whereas his friend Michael ends up with 108 shares. That's a huge difference. Why so?

It is because on the 15th of every month the shares of ABC Corp rise steeply in price, whereas on the 5th of every month traders sell the shares, bringing the share price to its lowest point for that month.

When John invested his Rs. 100/-, the price of one share was high, say Rs. 50/-, so he could buy only 2 shares each month, or 72 shares in 36 months. His friend Michael, on the contrary, bought when the price was lower, say Rs. 33.33/-, so he got 3 shares a month, or 108 shares in total. Here we have used the mean price per share: the total amount invested by John or Michael divided by the number of shares each of them bought.
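The arithmetic behind the two share counts can be checked with a short sketch. It is written in Python rather than R, the function name `shares_accumulated` is made up for this illustration, and Rs. 33.33 is treated as the rounded form of 100/3 so that the numbers come out exactly.

```python
from fractions import Fraction

def shares_accumulated(monthly_amount, price_per_share, months):
    """Shares held after investing a fixed amount every month at one price."""
    return Fraction(monthly_amount) / Fraction(price_per_share) * months

MONTHS = 36
john = shares_accumulated(100, 50, MONTHS)                   # buys on the 15th at Rs. 50
michael = shares_accumulated(100, Fraction(100, 3), MONTHS)  # buys on the 5th at about Rs. 33.33

total_invested = 100 * MONTHS
print(john, michael)                              # 72 108
print(float(total_invested / john))               # 50.0  (mean price John paid)
print(round(float(total_invested / michael), 2))  # 33.33 (mean price Michael paid)
```

Both friends invest the same Rs. 3600/- in total; only the price on their chosen day differs.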

This was exactly my situation. I therefore downloaded 25 years of BSE data from the archive, for all listed companies, from 1st Jan 1990 to 1st April 2015.

The data included the trading date, the opening value, the intraday high, the intraday low and the closing value, in CSV (comma-separated values) format.

I loaded it into R Studio and started working with the best statistical tool available: R.

Here are a few lines from the dataset, to show what the data looked like.

I simply extracted the day of the month from each row of the 'Date' column and placed it beside the 'Close' column, in a new 'Extracted_date' column. The full dataset consisted of around 6048 rows. The new dataset looked like this -
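The extraction step might look roughly like the sketch below. It is in Python rather than R, and several things in it are assumptions made for illustration: the sample rows are invented (they are not real Sensex values), the `%Y-%m-%d` date format may not match the downloaded file, and `add_extracted_date` is a made-up helper name.

```python
import csv
import io
from datetime import datetime

# Invented rows in the column layout described above; not real Sensex values.
SAMPLE_CSV = """Date,Open,High,Low,Close
2015-03-04,100.0,102.0,99.0,101.0
2015-03-05,101.0,103.0,98.0,99.0
2015-04-01,99.0,104.0,99.0,103.0
"""

def add_extracted_date(rows, date_format="%Y-%m-%d"):
    """Return copies of the rows with the day of the month in 'Extracted_date'."""
    result = []
    for row in rows:
        day = datetime.strptime(row["Date"], date_format).day
        result.append({**row, "Extracted_date": day})
    return result

rows = add_extracted_date(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row["Date"], row["Extracted_date"], row["Close"])
```

If the real file uses another date layout, only the `date_format` argument should need to change.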

After that, I grouped all rows in the dataset by the value in the 'Extracted_date' column and took the mean (average) of the 'Close' values in each group. That left me with only 31 rows, one for each day of the month, each holding the mean 'Close' value for that day. The new table looked as follows -
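A minimal sketch of this group-and-average step, again in Python rather than R. The (day, close) pairs are invented, and `mean_close_by_day` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Invented (Extracted_date, Close) pairs; not real Sensex values.
sample = [(4, 100.0), (5, 98.0), (4, 104.0), (5, 102.0), (26, 97.0)]

def mean_close_by_day(pairs):
    """Group closes by day of the month and average each group."""
    groups = defaultdict(list)
    for day, close in pairs:
        groups[day].append(close)
    return {day: mean(groups[day]) for day in sorted(groups)}

summary = mean_close_by_day(sample)
print(summary)  # {4: 102.0, 5: 100.0, 26: 97.0}
```

On the full dataset the result would have at most 31 entries, one per day of the month.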

I exported the above data from R Studio to Excel to get the following graph.

From the aggregated data I found that, on average, stock prices were lowest on the 4th, 5th and 26th of the month. So, I would suggest investing your Rs. 100/- on one of these days.
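Picking the cheapest days out of the 31-row table is a simple sort. The sketch below is in Python rather than R; the mean closes are invented for illustration (they are not the values from this study) and `cheapest_days` is a hypothetical helper name.

```python
def cheapest_days(mean_close_by_day, how_many=3):
    """Days of the month with the lowest mean close, cheapest first."""
    return sorted(mean_close_by_day, key=mean_close_by_day.get)[:how_many]

# Invented mean closes for a handful of days; not the values from the study.
example = {1: 105.0, 4: 98.5, 5: 98.0, 15: 110.0, 26: 99.0}
print(cheapest_days(example))  # [5, 4, 26]
```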

But the stock market is full of uncertainty, and 25 years is a huge period on which to base any prediction. Also, the ruling government and its economic policies play a major role in the development of the stock market. Therefore, I split the data by ruling government: first the Indian National Congress (Congress) and then the Bharatiya Janata Party (BJP).

This trimmed the dataset from 6048 rows to 2465 rows. I kept only those days when the Indian National Congress was the ruling party, from 10th May 2004 to 25th March 2014.
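Trimming to a tenure is a date-range filter. A small sketch in Python rather than R, using the Congress dates given above; the rows are invented, `rows_in_period` is a hypothetical helper name, and the 'Date' values are assumed to be parsed into `datetime.date` objects already.

```python
from datetime import date

# Tenure boundaries as stated in the text.
CONGRESS_START = date(2004, 5, 10)
CONGRESS_END = date(2014, 3, 25)

def rows_in_period(rows, start, end):
    """Keep rows whose 'Date' falls within [start, end], both ends included."""
    return [row for row in rows if start <= row["Date"] <= end]

# Invented rows; not real Sensex values.
rows = [
    {"Date": date(2004, 5, 7), "Close": 100.0},
    {"Date": date(2004, 5, 10), "Close": 101.0},
    {"Date": date(2014, 3, 25), "Close": 102.0},
    {"Date": date(2014, 3, 27), "Close": 103.0},
]
congress_rows = rows_in_period(rows, CONGRESS_START, CONGRESS_END)
print(len(congress_rows))  # 2
```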

I applied the same method as above to the trimmed dataset and got the following output -

This new graph shows a very different picture. Here we get the 2nd and 9th as the dates for investment.

And this is the last graph, for the Bharatiya Janata Party (BJP) tenure, from 27th March 2014 to 1st April 2015 -

Observing the trend, we find that, as per our case scenario, the 7th and 15th are the dates on which we should invest in SIP during the BJP tenure.

Let's compare all the results side by side.

Whole Sensex tenure - 4th, 5th and 26th

Congress tenure - 2nd and 9th

BJP tenure - 7th and 15th

We got 7 different dates across the different periods, 5 of which fall before the 10th of the month. But even now we cannot pick the one perfect date on which to invest. Here I have used data for all publicly listed companies. To find the perfect date for our own portfolio, we need to consider data for only those companies that are in our portfolio or that we are considering for future growth.

In the following graph I have compared all three graphs by converting them into percentages. Investing on the 7th during BJP's tenure would have given the same returns as investing on the 9th or 2nd during Congress's tenure.

This is a relative graph so it will show us a better picture.
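The post does not say exactly how the values were converted into percentages. One plausible way, sketched here in Python rather than R, is to express each day's mean close as a percentage of that period's overall mean, so that periods with very different index levels land on the same scale. The normalisation itself, the helper name `as_percent_of_period_mean` and the numbers are all assumptions for illustration.

```python
def as_percent_of_period_mean(mean_close_by_day):
    """Express each day's mean close as a percentage of the period's mean."""
    values = list(mean_close_by_day.values())
    period_mean = sum(values) / len(values)
    return {day: 100.0 * close / period_mean
            for day, close in mean_close_by_day.items()}

# Invented mean closes for two periods at different index levels.
period_a = {2: 90.0, 9: 100.0, 15: 110.0}
period_b = {2: 180.0, 9: 200.0, 15: 220.0}

print(as_percent_of_period_mean(period_a))  # {2: 90.0, 9: 100.0, 15: 110.0}
print(as_percent_of_period_mean(period_b))  # {2: 90.0, 9: 100.0, 15: 110.0}
```

In this made-up case the second period trades at twice the level of the first, yet both show the same relative pattern once normalised.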

Trusting this analysis, I will myself be investing my money through SIP as per my portfolio. If you want any guidance, feel free to contact me.

This experiment doesn't end here. We can run predictive analysis, using pattern matching, to find out which stock to invest in, and when, to get the best returns.


UPDATE: My model predicted that the market would fall on the 6th and 7th and surge on the 8th. Sensex plunged on 5th, 6th and 7th May and surged on the 8th.

The prediction performed above my expectations and might be a milestone towards precisely predicting the rise and fall of the Sensex.

Below is a comparison, for the last 2 weeks, of what I predicted and what actually happened.

Date    Predicted Index    Actual Index

4       moderate rise      minor fall
5       heavy fall         heavy fall
6       heavy fall         moderate fall
7       heavy rise         heavy rise
8       moderate rise      moderate rise
11      moderate rise      moderate rise
12      minor fall         heavy fall
13      moderate fall      moderate rise
14      minor fall         minor rise
15      minor fall         minor rise
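One simple way to summarise the table above is to count the days on which the predicted direction (rise or fall) matched the actual one, ignoring the magnitude. The sketch below is in Python rather than R; the rows are copied from the table, while the `direction` helper and the direction-only scoring are choices made for this illustration.

```python
# (date, predicted, actual) exactly as listed in the table above.
TABLE = [
    (4, "moderate rise", "minor fall"),
    (5, "heavy fall", "heavy fall"),
    (6, "heavy fall", "moderate fall"),
    (7, "heavy rise", "heavy rise"),
    (8, "moderate rise", "moderate rise"),
    (11, "moderate rise", "moderate rise"),
    (12, "minor fall", "heavy fall"),
    (13, "moderate fall", "moderate rise"),
    (14, "minor fall", "minor rise"),
    (15, "minor fall", "minor rise"),
]

def direction(label):
    """Return 'rise' or 'fall', dropping the magnitude word."""
    return label.split()[-1]

matches = [day for day, predicted, actual in TABLE
           if direction(predicted) == direction(actual)]
print(len(matches), "of", len(TABLE))  # 6 of 10
print(matches)                         # [5, 6, 7, 8, 11, 12]
```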


Mail me your portfolio, your queries, or any challenging analytical task at [email protected]

# +91-9028398260

Website -

Sanket Thodge

Sanket has expertise in Hadoop and Analytics and experience in the banking and ecommerce fields. He has worked for a major ecommerce toys manufacturer and with one of the top banks in the world.
He provides training in Hadoop Admin, Hadoop Development, Cloudera Admin, Hortonworks Admin, AWS Developer and Data Science in R, as well as project support and consultancy for Digital (BigData, Cloud, IoT, Data Science) projects.