First, you will of course need an AWS account and an S3 bucket set up with a dataset. It takes some time, but check out all the free data you can use from the sources below. In the next part, we will get into creating a SageMaker notebook instance for modeling.
Here are some smashing resources to know:
1. Kaggle
- Description: Kaggle is a platform for data science competitions and a vast repository of datasets on a wide range of topics.
- Website: Kaggle Datasets
- Examples: Titanic survival data, house prices, and image classification datasets.
2. UCI Machine Learning Repository
- Description: A well-known repository containing a diverse collection of datasets for machine learning research.
- Website: UCI Machine Learning Repository
- Examples: Iris dataset, wine quality dataset, and adult income dataset.
3. Google Dataset Search
- Description: A search engine specifically for finding datasets across the web.
- Website: Google Dataset Search
- Examples: Various datasets from multiple domains including science, government, and education.
4. AWS Public Datasets
- Description: A collection of publicly available datasets hosted on Amazon Web Services.
- Website: AWS Public Datasets
- Examples: Genomics data, satellite imagery, and climate data.
5. Microsoft Azure Open Datasets
- Description: A collection of open datasets made available for research and experimentation on the Azure cloud platform.
- Website: Microsoft Azure Open Datasets
- Examples: Weather data, healthcare data, and public safety data.
6. Data.gov
- Description: The U.S. government’s open data site, providing access to a wealth of public datasets.
- Website: Data.gov
- Examples: Government spending, healthcare, education, and environmental data.
7. Open Data on AWS
- Description: Datasets that are available on AWS under an open data license.
- Website: Open Data on AWS
- Examples: Landsat data, genomic data, and climate data.
8. Quandl (now Nasdaq Data Link)
- Description: A platform providing financial, economic, and alternative datasets for economic analysis and trading.
- Website: Quandl
- Examples: Stock prices, commodity prices, and economic indicators.
9. World Bank Open Data
- Description: Provides free and open access to global development data.
- Website: World Bank Open Data
- Examples: Global financial indicators, demographic data, and development indicators.
10. GitHub Repositories
- Description: Many data scientists and organizations share datasets in public GitHub repositories.
- Website: GitHub Datasets
- Examples: Datasets for NLP, computer vision, and various machine learning tasks.
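
Once you have downloaded a dataset from one of the sources above, getting it into your S3 bucket can be sketched roughly as follows. This is a minimal sketch, not a definitive setup: the helper names `build_s3_key` and `upload_dataset`, the `datasets` prefix, and the bucket name `my-example-bucket` are all hypothetical, and it assumes you pass in a boto3 S3 client (`boto3.client("s3")`) with AWS credentials already configured.

```python
from pathlib import Path


def build_s3_key(local_path, prefix="datasets"):
    """Return the object key for a local file under an (assumed) key prefix."""
    name = Path(local_path).name          # keep only the file name, drop local folders
    prefix = prefix.strip("/")            # avoid leading or trailing slashes in the key
    return f"{prefix}/{name}" if prefix else name


def upload_dataset(s3_client, local_path, bucket, prefix="datasets"):
    """Upload one local file to S3 and return its s3:// URI.

    s3_client is expected to behave like a boto3 S3 client,
    i.e. it exposes upload_file(Filename, Bucket, Key).
    """
    key = build_s3_key(local_path, prefix)
    s3_client.upload_file(str(local_path), bucket, key)
    return f"s3://{bucket}/{key}"


# Example with a real client (requires `pip install boto3` and AWS credentials):
#   import boto3
#   print(upload_dataset(boto3.client("s3"), "titanic.csv", "my-example-bucket"))
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before you point it at a real bucket. The returned `s3://` URI is the kind of path you would later hand to a SageMaker notebook when loading the data.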
