Timbo Smash

Read it, Smash it!

Now some SageMaker smashing Part 1 – TBD Getting data to use

Here are some smashing resources to know:

AWS SageMaker Documentation

AWS Machine Learning Blog

Amazon SageMaker Examples

First, of course, you need an AWS account and an S3 bucket set up with a data set. It takes some time, but check out all the free data you can use from the sources below. In the next part we will get into creating a SageMaker notebook instance for modeling.

1. Kaggle

  • Description: Kaggle is a platform for data science competitions and a vast repository of datasets on a wide range of topics.
  • Website: Kaggle Datasets
  • Examples: Titanic survival data, house prices, and image classification datasets.

2. UCI Machine Learning Repository

  • Description: A well-known repository containing a diverse collection of datasets for machine learning research.
  • Website: UCI Machine Learning Repository
  • Examples: Iris dataset, wine quality dataset, and adult income dataset.

3. Google Dataset Search

  • Description: A search engine specifically for finding datasets across the web.
  • Website: Google Dataset Search
  • Examples: Various datasets from multiple domains including science, government, and education.

4. AWS Public Datasets

  • Description: A collection of publicly available datasets hosted on Amazon Web Services.
  • Website: AWS Public Datasets
  • Examples: Genomics data, satellite imagery, and climate data.

5. Microsoft Azure Open Datasets

  • Description: A collection of open datasets made available for research and experimentation on the Azure cloud platform.
  • Website: Microsoft Azure Open Datasets
  • Examples: Weather data, healthcare data, and public safety data.

6. Data.gov

  • Description: The U.S. government’s open data site, providing access to a wealth of public datasets.
  • Website: Data.gov
  • Examples: Government spending, healthcare, education, and environmental data.

7. Open Data on AWS

  • Description: Datasets that are available on AWS under an open data license.
  • Website: Open Data on AWS
  • Examples: Landsat data, genomic data, and climate data.

8. Quandl (now Nasdaq Data Link)

  • Description: A platform providing financial, economic, and alternative datasets for economic analysis and trading.
  • Website: Quandl
  • Examples: Stock prices, commodity prices, and economic indicators.

9. World Bank Open Data

  • Description: Provides free and open access to global development data.
  • Website: World Bank Open Data
  • Examples: Global financial indicators, demographic data, and development indicators.

10. GitHub Repositories

  • Description: Many data scientists and organizations share datasets on GitHub repositories.
  • Website: GitHub Datasets
  • Examples: Datasets for NLP, computer vision, and various machine learning tasks.
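
Once you have downloaded a data set from one of the sources above, getting it into your S3 bucket could look something like the sketch below. It is a minimal example, not the only way to do it: the helper names (`s3_key_for`, `upload_dataset`), the `datasets` prefix, the bucket name and the file name are all made up for illustration. It assumes you have `boto3` installed and AWS credentials already configured.

```python
from pathlib import Path


def s3_key_for(local_path, prefix="datasets"):
    """Build the S3 object key for a local file (hypothetical helper)."""
    name = Path(local_path).name
    prefix = prefix.strip("/")
    return f"{prefix}/{name}" if prefix else name


def upload_dataset(s3_client, local_path, bucket, prefix="datasets"):
    """Upload one local file to S3 and return its s3:// URI.

    s3_client is expected to behave like boto3.client("s3"),
    i.e. to provide upload_file(Filename, Bucket, Key).
    """
    key = s3_key_for(local_path, prefix)
    s3_client.upload_file(str(local_path), bucket, key)
    return f"s3://{bucket}/{key}"


# Example usage (assumed names; replace with your own bucket and file):
#
#   import boto3
#   s3 = boto3.client("s3")
#   uri = upload_dataset(s3, "titanic.csv", "my-example-bucket")
#   print(uri)
```

Passing the client in as an argument, rather than creating it inside the function, keeps the helper easy to try out with a stand-in object before pointing it at a real bucket.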