Can we solve the tomato crisis using analytics?

Prerit Saxena
4 min read · Jan 23, 2021
Tomato crisis new — India 2018

I recently came across this news article on “The Logical Indian” website.

Source: The Logical Indian

Link: https://thelogicalindian.com/news/tomato-farmers-dump-lake-tamil-nadu/

Demand-supply mismatch is a recurring problem in many developing economies, and a lot of businesses struggle with it day in and day out. But the problem becomes far more serious in situations like this one.

In this case, an excess harvest has led to a steep drop in the price of tomatoes in the Indian state of Tamil Nadu, and the farmers can do little about it. The stakeholders are not just the consumers, wholesalers and producers but also the thousands of families who depend on the crop for their livelihood. Most of the farmers are not even able to earn back the production cost of the crop.

Several factors have contributed to this problem:

  • High sale prices of tomatoes last year, which lured farmers into switching to tomato production.
  • Limited water supply, which forced farmers to grow tomatoes as the only alternative.
  • Lack of knowledge of the estimated market demand for the crop.
  • Poor cold storage facilities.

Now that the situation has already deteriorated, and given that tomatoes are perishable items with a short shelf life, the government and local market stakeholders are left with limited options.

However, better late than never: a lot can still be done to prevent these kinds of scenarios in the future. My intention in this article is to look at the problem from an analyst's or data scientist's point of view.

Good data generation and collection can help predict the demand-supply balance in the future. Given that tomatoes are grown all over the country and have a very short growing cycle (~3 months), a good amount of data can be collected in a few years.

The major variables in the data can include:

  1. Average price of tomatoes in the market per season.
  2. Production volume of tomatoes in the market per season.
  3. Sale volume of tomatoes in the market per season.
  4. Number of rainy days/sunny days per season.
  5. Precipitation amount for the season.
  6. Amount of water available.
  7. Amount of water utilized for production of tomatoes.
  8. Number of farmers cultivating tomatoes.
  9. Cultivated land area.
  10. Local transportation costs.
  11. Labour costs.

The list is not exhaustive, but these can be major factors in determining the price of tomatoes in the market.
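As a rough illustration, the variables above could be collected into one record per market and season, which is the shape a panel dataset needs. The class name, field names and units below are assumptions made for this sketch, not an established schema, and variable 4 is reduced to a rainy-day count for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeasonRecord:
    """One row of the panel: one market observed in one season.

    All field names and units are hypothetical, chosen for illustration.
    """
    market: str                # panel unit, e.g. a local market or district
    season: int                # time index, seasons in chronological order
    avg_price: float           # 1. average price (currency per kg)
    production_volume: float   # 2. tonnes produced
    sale_volume: float         # 3. tonnes sold
    rainy_days: int            # 4. number of rainy days
    precipitation_mm: float    # 5. total precipitation
    water_available: float     # 6. water available
    water_used: float          # 7. water used for tomato production
    n_farmers: int             # 8. farmers cultivating tomatoes (estimate)
    cultivated_area_ha: float  # 9. cultivated land area
    transport_cost: float      # 10. local transportation cost
    labour_cost: float         # 11. labour cost

    def unsold_volume(self) -> float:
        """Produce that did not sell: a simple oversupply signal."""
        return max(self.production_volume - self.sale_volume, 0.0)
```

A list of such records, sorted by market and season, is the input the analysis methods discussed in this article would work from.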

Data Collection:

Of the above variables, 1, 2 and 3 are tracked by local market associations; 4 and 5 are reported accurately by the weather department; 6 and 7 are generally found in the records of the irrigation department; 8 can only be obtained through a guesstimate, since it is not recorded exactly anywhere; 9 is recorded by the state tax department but contains a lot of noise; and 10 and 11 can be estimated with the help of local labour associations.

Adding crop insurance will help make the values of variables 8 and 9 more accurate.

Type of Analysis:

Method — 1:

The data contains both a dependence on explanatory factors and a temporal element, so panel regression can be a suitable method of analysis. Panel regression takes panel data as input: the same layout as for linear regression, but with the Y variable observed repeatedly over time. Because successive observations are correlated over time, the “no autocorrelation” prerequisite of linear regression is violated, which is why panel regression differs from ordinary linear regression.
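To make the idea concrete, here is a minimal sketch of one common panel estimator, the fixed-effects ("within") estimator, for a single explanatory variable. The article does not say which panel estimator is meant, so this choice is an assumption, and the function name and input layout are hypothetical. A real analysis would use several regressors and a statistics library.

```python
from collections import defaultdict


def within_estimator(rows):
    """Fixed-effects (within) slope for one regressor.

    rows: iterable of (unit, x, y) tuples, for example
          (market, production_volume, avg_price).

    Subtracting each unit's own mean from x and y removes that unit's
    time-invariant level, so the slope is estimated only from changes
    over time within each unit.
    """
    groups = defaultdict(list)
    for unit, x, y in rows:
        groups[unit].append((x, y))

    sxy = 0.0  # sum of demeaned x times demeaned y
    sxx = 0.0  # sum of squared demeaned x
    for obs in groups.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2

    if sxx == 0:
        raise ValueError("x does not vary within any unit")
    return sxy / sxx
```

With toy rows in which two markets have different price levels but the same response to production, the function returns the shared slope rather than one distorted by the level difference between markets.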

Performance evaluation:

Performance can be evaluated by training on data from n seasons and predicting the sale price for seasons n+1, n+2 and n+3. For the purpose of evaluation, a higher weight can be given to n+1 than to n+2 and n+3.
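One way to express this weighting is a weighted average of the percentage errors across the three horizons. The sketch below is only an illustration: the function name, the choice of absolute percentage error and the default weights (0.5, 0.3, 0.2) are assumptions, not values from any real evaluation.

```python
def weighted_horizon_error(actual, predicted, weights=(0.5, 0.3, 0.2)):
    """Weighted mean absolute percentage error over seasons n+1, n+2, n+3.

    actual, predicted: sale prices for the three forecast seasons, in order.
    weights: hypothetical weights, largest for the nearest season (n+1).
    Actual prices are assumed to be non-zero.
    """
    if not (len(actual) == len(predicted) == len(weights)):
        raise ValueError("actual, predicted and weights must have equal length")

    weighted_sum = sum(
        w * abs(a - p) / abs(a)
        for a, p, w in zip(actual, predicted, weights)
    )
    # Normalise so the weights do not have to sum to exactly 1.
    return weighted_sum / sum(weights)
```

A lower value means better forecasts, with mistakes in the nearest season penalised most.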

Method — 2:

A stacked model can be built as a combination of three models:

1) A panel regression model to predict demand for a particular season.

2) A panel regression model to predict production for a particular season.

3) A linear regression model to predict the sale price, taking inputs from models 1 and 2 plus additional variables, such as transportation costs, which might be independent for every season.
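The wiring of these three models can be sketched as follows. This shows only how predictions flow from models 1 and 2 into model 3. The two panel models are represented by plain callables, and the class name, feature keys and coefficients are all hypothetical placeholders that would come from fitting on real data.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

Features = Mapping[str, float]


@dataclass
class StackedPriceModel:
    """Model 3: a linear model on top of the outputs of models 1 and 2."""
    demand_model: Callable[[Features], float]      # model 1 (already fitted)
    production_model: Callable[[Features], float]  # model 2 (already fitted)
    intercept: float                               # model 3 coefficients
    coef_demand: float
    coef_production: float
    coef_transport: float

    def predict(self, features: Features) -> float:
        # Stage 1: predicted demand and production for the season.
        demand = self.demand_model(features)
        production = self.production_model(features)
        # Stage 2: linear combination with a season-specific extra input.
        return (
            self.intercept
            + self.coef_demand * demand
            + self.coef_production * production
            + self.coef_transport * features["transport_cost"]
        )
```

When fitting such a stack, model 3 is usually trained on predictions that models 1 and 2 made for seasons they were not trained on, so that it does not learn from overly optimistic inputs.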

Performance evaluation:

Performance can be evaluated by the error in the predicted value of the target variable (the sale price): the lower the error, the better the model.

Outcomes:

Predicting the sale price of tomatoes with low error can form a basis for educating farmers about the dangers of overproduction. It can give governments and local market wholesalers a good estimate of what to expect in the upcoming season, rather than relying on guesstimates. It will also help transport companies and cold storage facility owners plan ahead for the upcoming season. Above all, it will help protect the livelihoods of farmers and their families who rely heavily on agriculture.

Prerit Saxena

Data Scientist at Microsoft, passionate about making a social impact through data