
MOSDAC Data Extraction Guide

The document outlines data extraction methods for the MOSDAC website, detailing various types of data available such as satellite imagery and interactive maps. It provides specific techniques for static and dynamic scraping, API access, file downloads, and geospatial data processing, along with example code snippets. Additionally, it emphasizes the importance of adhering to site policies and using official APIs when available.

Uploaded by Suchismita Das

Copyright © All Rights Reserved

Data Extraction Methods for www.mosdac.gov.in

Overview of Website Content

---------------------------

The MOSDAC (Meteorological and Oceanographic Satellite Data Archival Centre) website includes:

- Satellite imagery and data products (static + dynamic)

- Interactive maps and charts

- FAQs and documentation

- Searchable data archives

- Downloadable files (PDF, NetCDF, GeoTIFF, etc.)

Data Extraction Methods

------------------------

| Type             | Description                                           | Tools                       |
|------------------|-------------------------------------------------------|-----------------------------|
| Static Scraping  | HTML pages, FAQs, documents                           | BeautifulSoup, Scrapy       |
| Dynamic Scraping | JavaScript-rendered content (maps, charts)            | Selenium, Playwright        |
| API Access       | Undocumented API endpoints called by the site's pages | Browser DevTools, Requests  |
| File Downloads   | PDFs, NetCDF, GeoTIFFs                                | wget, curl, requests        |
| Geospatial Data  | Map layers, GeoTIFF                                   | GDAL, rasterio              |


Procedure

---------

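1. Inspect Website Structure:

- Use browser DevTools to inspect the page's HTML (Elements tab), the API requests it makes (Network tab) and the JavaScript that renders dynamic content (Sources tab).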
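One quick way to choose between static and dynamic extraction is to compare what the browser shows with the raw HTML the server returns. The sketch below assumes that if a phrase visible on screen is missing from the raw response, JavaScript probably injected it. `fetch_raw_html` and `needs_browser` are hypothetical helper names, and the check is only a heuristic: HTML entities or text split across tags can also make a phrase "disappear".

```python
import urllib.request


def fetch_raw_html(url: str, timeout: float = 30.0) -> str:
    """Download the page exactly as the server sends it (no JavaScript is executed)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def needs_browser(raw_html: str, visible_text: str) -> bool:
    """Heuristic: True if text seen in the browser is absent from the raw HTML,
    which suggests it is rendered by JavaScript (use Selenium/Playwright or the API)."""
    haystack = " ".join(raw_html.split()).lower()     # collapse whitespace, ignore case
    needle = " ".join(visible_text.split()).lower()
    return needle not in haystack
```

For example, `needs_browser(fetch_raw_html('https://www.mosdac.gov.in/live'), 'some text you see on that page')` returning `True` points toward dynamic extraction or API access rather than static parsing.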

2. Static Data Extraction:

Example using BeautifulSoup:

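```
import requests
from bs4 import BeautifulSoup

url = 'https://www.mosdac.gov.in/site/content/faq'
res = requests.get(url, timeout=30)
res.raise_for_status()  # stop on HTTP errors instead of parsing an error page

soup = BeautifulSoup(res.text, 'html.parser')
# '.faq-question-class' is a placeholder: use the real CSS class found in DevTools
questions = soup.select('.faq-question-class')
for q in questions:
    print(q.get_text(strip=True))
```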
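Where installing BeautifulSoup is not an option, the standard library's `html.parser` can do the same class-based selection for simple pages. This is a minimal sketch: `ClassTextExtractor` is a hypothetical name, it matches a single class name rather than full CSS selectors, and `faq-question-class` remains a placeholder for the class you find in DevTools.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the whitespace-normalized text of every element whose
    class attribute contains `css_class` (one class name, not a CSS selector)."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.depth = 0       # > 0 while inside a matching element
        self.current = []    # text fragments of the element being read
        self.results = []    # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested tag inside a match
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # closed the matching element itself
            self.results.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)
```

Usage: `parser = ClassTextExtractor('faq-question-class'); parser.feed(res.text); parser.close()`, then iterate over `parser.results`.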

3. Dynamic Content Extraction:

Example using Selenium:

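```
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument('--headless')  # run Chrome without opening a window
driver = webdriver.Chrome(options=options)
try:
    driver.get('https://www.mosdac.gov.in/live')
    # Wait up to 20 s; replace 'body' with a selector for the map/chart
    # container identified in DevTools so the wait targets the rendered content
    WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, 'body')))
    data = driver.page_source  # HTML after JavaScript has run
finally:
    driver.quit()  # always shut down the browser process
```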
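Maps and charts are often drawn from data that the page either fetches separately or embeds as JSON inside a `<script>` tag. Assuming DevTools shows the embedded form with a `type="application/json"` attribute (verify this on the actual page), the data can be decoded straight from `driver.page_source` without interacting with the UI. `JsonScriptCollector` is a hypothetical helper built on the standard library.

```python
import json
from html.parser import HTMLParser
from typing import Any, List


class JsonScriptCollector(HTMLParser):
    """Collect and decode the contents of <script type="application/json"> tags."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._buf: List[str] = []
        self.payloads: List[Any] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/json":
            self._inside = True
            self._buf = []

    def handle_data(self, data):
        if self._inside:  # script bodies arrive here as raw text
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            text = "".join(self._buf).strip()
            if text:
                try:
                    self.payloads.append(json.loads(text))
                except json.JSONDecodeError:
                    pass  # not valid JSON: skip it rather than abort the scrape
```

Usage: `collector = JsonScriptCollector(); collector.feed(data); collector.close()`, then inspect `collector.payloads`. If nothing is found, the data is more likely loaded through a separate request, which is what API access via network interception targets.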

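4. API Access via Network Interception:

In the DevTools Network tab (filtered to Fetch/XHR), find the request that returns the data, then replay it with Requests: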

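```
import requests

# Placeholder endpoint and parameters: copy the real ones from the request seen in DevTools
url = 'https://www.mosdac.gov.in/api/data?type=...'
headers = {'User-Agent': 'Mozilla/5.0'}
params = {'date': '2025-07-08', 'product': 'temp'}

res = requests.get(url, headers=headers, params=params, timeout=30)
res.raise_for_status()  # stop on 4xx/5xx instead of trying to decode an error page
print(res.json())
```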
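Undocumented endpoints can be slow or briefly unavailable, so it can help to wrap the call in a retry loop with exponential backoff, which also keeps request pressure on the server low. A minimal sketch, assuming transient failures surface as `OSError`: Requests' `RequestException` (including the `HTTPError` raised by `raise_for_status()`) subclasses `IOError`, which is `OSError` in Python 3. `with_retries` is a hypothetical helper.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 4, base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on `retry_on` exceptions with exponential backoff
    (base_delay, 2*base_delay, 4*base_delay, ...); re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be >= 1")
```

Usage: put the `requests.get(...)`, `raise_for_status()` and `res.json()` lines from the example above into a zero-argument function and pass it as `call`. The `sleep` parameter exists so the backoff can be tested without waiting.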

5. Automating File Downloads:

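```
import requests

file_url = 'https://www.mosdac.gov.in/file_download/sample_data.tif'

# stream=True avoids loading large NetCDF/GeoTIFF files fully into memory
with requests.get(file_url, stream=True, timeout=60) as r:
    r.raise_for_status()
    with open('data.tif', 'wb') as f:
        for chunk in r.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
            f.write(chunk)
```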
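For batch downloads, writing to a temporary `.part` file and renaming it only after the transfer completes means an interrupted run never leaves a truncated NetCDF or GeoTIFF under its final name, and finished files can be skipped on a rerun. This sketch uses only the standard library; `download_file` and its `skip_existing` parameter are hypothetical, and it assumes the URL needs no login or session cookie.

```python
import os
import shutil
import urllib.request


def download_file(url: str, dest: str, chunk_size: int = 1 << 20,
                  timeout: float = 60.0, skip_existing: bool = True) -> str:
    """Stream `url` to `dest` via `dest + '.part'`, renaming only on success."""
    if skip_existing and os.path.exists(dest):
        return dest  # finished on an earlier run
    tmp = dest + ".part"
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp, open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out, chunk_size)  # copy in fixed-size chunks
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)  # discard the partial file
        raise
    os.replace(tmp, dest)  # atomic rename on the same filesystem
    return dest
```

Because nothing is kept in memory beyond one chunk, the same function works for multi-gigabyte files; pair it with a delay between calls when looping over many URLs.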

Tool Summary

-------------

| Tool             | Use Case                       |
|------------------|--------------------------------|
| BeautifulSoup    | HTML parsing                   |
| Scrapy           | Large-scale scraping           |
| Selenium         | JavaScript/dynamic data        |
| Requests         | APIs and file downloads        |
| Postman          | API testing                    |
| GDAL/rasterio    | Geospatial data processing     |
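Once a GeoTIFF is downloaded, GDAL and rasterio relate pixel positions to map coordinates through an affine geotransform. The sketch below spells out GDAL's six-coefficient form, as returned by `Dataset.GetGeoTransform()` (rasterio's `dataset.transform.to_gdal()` yields the same ordering), so the arithmetic can be checked by hand. The function names and the example grid are hypothetical.

```python
from typing import Sequence, Tuple


def pixel_to_map(gt: Sequence[float], col: float, row: float) -> Tuple[float, float]:
    """Apply a GDAL geotransform to a (col, row) pixel position.
    gt[0], gt[3]: x/y of the top-left corner of the top-left pixel
    gt[1], gt[5]: pixel width/height (gt[5] is negative for north-up images)
    gt[2], gt[4]: rotation terms (0 for north-up images)"""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def map_to_pixel(gt: Sequence[float], x: float, y: float) -> Tuple[int, int]:
    """Inverse for north-up rasters only: the (col, row) of the pixel containing (x, y)."""
    if gt[2] != 0 or gt[4] != 0:
        raise ValueError("rotated geotransforms need the full affine inverse")
    col = int((x - gt[0]) // gt[1])
    row = int((y - gt[3]) // gt[5])
    return col, row
```

With `gt = (60.0, 0.25, 0.0, 40.0, 0.0, -0.25)`, a hypothetical 0.25° grid whose top-left corner sits at 60°E, 40°N, `pixel_to_map(gt, 4, 8)` gives `(61.0, 38.0)` (the pixel's corner), while `pixel_to_map(gt, 4.5, 8.5)` gives its center, `(61.125, 37.875)`.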

Important Considerations

-------------------------

- Check robots.txt at: https://www.mosdac.gov.in/robots.txt

- Respect site policies and licenses

- Add delays between automated requests, and honor any Crawl-delay directive in robots.txt

- Prefer official APIs if available
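Python's standard `urllib.robotparser` can apply the robots.txt rules, including any `Crawl-delay` directive, before each request. A minimal sketch follows; `PoliteGate`, the 2-second fallback delay and the injectable `clock`/`sleep` arguments are assumptions made for illustration and testing, not values taken from MOSDAC's policies.

```python
import time
import urllib.robotparser
from typing import Callable, Iterable, Optional

ROBOTS_URL = "https://www.mosdac.gov.in/robots.txt"


def load_rules(lines: Optional[Iterable[str]] = None) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt from `lines` if given (offline), otherwise download ROBOTS_URL."""
    rules = urllib.robotparser.RobotFileParser(ROBOTS_URL)
    if lines is None:
        rules.read()  # network request
    else:
        rules.parse(lines)
    return rules


class PoliteGate:
    """Refuses URLs that robots.txt disallows and spaces out the ones it allows."""

    def __init__(self, rules: urllib.robotparser.RobotFileParser, user_agent: str = "*",
                 default_delay: float = 2.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.rules = rules
        self.user_agent = user_agent
        declared = rules.crawl_delay(user_agent)  # None if no Crawl-delay is set
        self.delay = float(declared) if declared is not None else default_delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def acquire(self, url: str) -> bool:
        """Return False for disallowed URLs; otherwise wait out the delay and return True."""
        if not self.rules.can_fetch(self.user_agent, url):
            return False
        if self._last is not None:
            remaining = self.delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
        return True
```

Usage: create `gate = PoliteGate(load_rules())` once, then call `gate.acquire(url)` before every `requests.get(url, ...)` and skip the URL when it returns `False`.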
