- 1 FREE DataSets (Real-World)
- 2 Most Popular Research Datasets
- 3 Most Popular Kaggle Datasets
- 4 Most Popular Deep Learning Datasets
- 5 Generic Datasets
FREE DataSets (Real-World)
In this article you will take a tour of real-world machine learning problems. You will see how machine learning is actually used in fields such as education, science, technology, and medicine.
Each machine learning problem listed also includes a link to a publicly available dataset. This means that if a particular machine learning problem interests you, you can download the dataset and start practicing right away.
Most Popular Research Datasets
Wine dataset. Given a chemical analysis of wines, predict the origin of the wine.
Car Evaluation dataset. Given details about cars, predict the estimated safety of the car.
Breast Cancer Wisconsin dataset. Given the results of a diagnostic test on breast tissue, predict whether the mass is malignant or benign.
Iris dataset. Given flower measurements in centimeters, predict the species of iris.
Heart Disease dataset. Given the results of various diagnostic tests on a patient, predict the degree of heart disease in the patient.
Poker Hand dataset. Given a database of poker hands, predict the quality of the hand.
Human Activity Recognition Using Smartphones dataset. Given smartphone movement data, predict the type of activity performed by the person holding the smartphone.
Forest Fires dataset. Given meteorological and other data, predict the burned area of forest fires.
Adult dataset. Given census data, predict whether an individual will earn more than $XX,XXX a year.
Internet Advertisements dataset. Given the details of images on web pages, predict whether an image is an advertisement or not.
Abalone dataset. Given measurements of abalone, predict the age of the abalone.
Wine Quality dataset. Given various measurements of wine, predict the quality of the wine.
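To give a feel for how quickly you can start practicing with one of these datasets, here is a minimal sketch that parses Iris-style CSV data and fits a very simple nearest-centroid classifier using only the Python standard library. The embedded rows are a small illustrative sample in the same column layout as the UCI Iris file (four measurements in centimeters, then the species), and the function names `load_rows`, `fit_centroids`, and `predict` are hypothetical helpers made up for this example, not part of any library.

```python
import csv
import io
from collections import defaultdict

# A few illustrative rows in the layout of the UCI Iris file:
# sepal length, sepal width, petal length, petal width (cm), species.
SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""


def load_rows(text):
    """Parse CSV text into (features, label) pairs, skipping blank lines."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue
        *values, label = record
        rows.append(([float(v) for v in values], label))
    return rows


def fit_centroids(rows):
    """Average the feature vectors of each class (a nearest-centroid model)."""
    grouped = defaultdict(list)
    for features, label in rows:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


rows = load_rows(SAMPLE)
centroids = fit_centroids(rows)
print(predict(centroids, [5.0, 3.4, 1.5, 0.2]))  # prints "Iris-setosa"
```

With the full dataset you would read the downloaded file instead of the embedded string and hold back part of the rows to measure accuracy. A nearest-centroid model is only a baseline chosen here for brevity.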
Most Popular Kaggle Datasets
Bike Sharing Demand. Given daily bike rental and weather records, predict future daily bike rental demand.
Restaurant Revenue Prediction. Given the details of a restaurant site, predict the revenue of the restaurant in a given year.
Rossmann Store Sales. Given historical sales data for products across stores, forecast future sales.
Otto Group Product Classification Challenge. Given features of products, classify products into one of 9 product categories.
Liberty Mutual Group: Property Inspection Prediction. Given the details of inspected properties, predict a hazard score for properties.
Higgs Boson Machine Learning Challenge. Given the description of simulated particle collisions, predict whether an event decays into a Higgs boson or not.
Forest Cover Type Prediction. Given cartographic variables, predict forest cover type.
Amazon.com Employee Access Challenge. Given historical resource access changes for employees, predict the resources required by employees.
The Analytics Edge. Given details of New York Times articles, predict which news articles will be popular.
Most Popular Deep Learning Datasets
Open Images Dataset : Open Images is a dataset of ~9 million URLs to images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.
Fashion-MNIST : Fashion-MNIST is a dataset of Zalando’s article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image, associated with a label from 10 classes.
IMDB Reviews : This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of ~25,000 highly polar movie reviews for training and ~25,000 for testing.
The Wikipedia Corpus : Wikipedia is a relatively large and consistent resource for NLP researchers to work with. However, extracting meaningful sentences and passages that are useful for research is not straightforward.
Free Spoken Digit Dataset : FSDD is a simple audio/speech dataset consisting of recordings of spoken digits in WAV files at 8 kHz. It is an open dataset, which means it will grow over time as data is contributed.
Million Song Dataset : The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks. The core of the dataset is the feature analysis and metadata for one million songs, provided by The Echo Nest. The dataset does not include any audio, only the derived features.
Sentiment140 : Sentiment140 isn’t open source, but there are resources with open source code that offer a similar implementation. It has rich features such as the tweet ID, the date of the tweet, the query, the text of the tweet, and the polarity of the tweet.
MNIST : MNIST is one of the most popular deep learning datasets. It consists of handwritten digits and contains a large training set of examples, so you should not miss it.
WordNet : WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
VisualQA: VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.
LibriSpeech : LibriSpeech is a corpus of approximately 1000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned.
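The image datasets above are easier to work with once you know how they are stored. MNIST and Fashion-MNIST image files are distributed in the IDX binary format: a 16-byte header (a magic number, the image count, the row count, and the column count, each a big-endian 32-bit integer) followed by one unsigned byte per pixel. The sketch below decodes that layout with the standard library only. The function name `parse_idx_images` is a hypothetical helper written for this example, and the payload is synthetic (two blank 28×28 images built in memory), so nothing is downloaded.

```python
import struct

IMAGE_MAGIC = 2051  # 0x00000803: unsigned-byte data with 3 dimensions
HEADER_SIZE = 16    # four big-endian 32-bit integers


def parse_idx_images(payload):
    """Decode IDX image bytes into a list of images (each a list of pixel rows)."""
    magic, count, n_rows, n_cols = struct.unpack(">IIII", payload[:HEADER_SIZE])
    if magic != IMAGE_MAGIC:
        raise ValueError(f"unexpected magic number: {magic}")
    size = n_rows * n_cols
    if len(payload) != HEADER_SIZE + count * size:
        raise ValueError("payload length does not match header")
    images = []
    for i in range(count):
        start = HEADER_SIZE + i * size
        flat = payload[start:start + size]
        # Slice the flat pixel run into n_rows rows of n_cols values (0-255).
        images.append(
            [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]
        )
    return images


# Synthetic stand-in for a real file: two blank 28x28 grayscale images.
header = struct.pack(">IIII", IMAGE_MAGIC, 2, 28, 28)
fake_payload = header + bytes(2 * 28 * 28)

images = parse_idx_images(fake_payload)
print(len(images), len(images[0]), len(images[0][0]))  # prints "2 28 28"
```

For a real download you would typically decompress the `.gz` file first (for example with `gzip.open(path, "rb").read()`) and pass the resulting bytes to the parser. In practice most deep learning libraries ship their own loaders, so this is mainly useful for understanding the data.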
Generic Datasets
data.gov.in : This is the home of the Indian Government’s open data. You can find data on industry, climate, health care, and more.
data.gov : This is the home of the U.S. Government’s open data. You can find datasets ranging from climate and education to energy and finance.
RBI : Data available from the Reserve Bank of India. This includes several metrics on money market operations, balance of payments, use of banking, and several products.
World Bank : The open data from the World Bank.
Google datasets : Google provides a few datasets, such as baby names, data from GitHub public repositories, a few stories, and so on.
Amazon Web Services (AWS) datasets : Amazon provides a few large datasets, which can be used on its platform or on your own computers.
YouTube Labeled Video Dataset : This dataset consists of more than 80 lakh (8 million) YouTube video IDs and associated labels, and it comes with precomputed visual features from billions of frames.
UCI Machine Learning Repository : This is arguably the most popular data repository. It is usually the first place to go if you are looking for datasets related to machine learning.
Driven Data : Driven Data finds real-world challenges where data science can be used to create a positive social impact.
Movie Review Data : This dataset provides collections of movie review documents and sentences labeled with respect to their subjectivity status.
Spam – Non Spam : This dataset is used to build a classifier that labels SMS messages as spam or non-spam.
MovieLens : This data comes from online field experiments in MovieLens in the areas of automated content recommendation, recommendation interfaces, and tagging-based recommenders.
Reddit Datasets Subreddit : Reddit is a community forum. Here you can sort datasets by popularity or votes to see the most popular ones. You can also find some interesting discussions.
Jester : This dataset comes from an online joke recommender system.
You can try out a few of these datasets with 40 Fun Machine Learning Projects for Beginners, and use 100+ Final Year Project Ideas in Machine Learning to find real machine learning problems posed or investigated by scientific and business organizations around the world.
Even more exciting is that these varied problems have publicly available datasets and are also widely studied and well understood.
This means you can download the data right now and explore the problem by implementing your own model, or by reproducing someone else’s from a paper.
Note: Some of these datasets are very large. Please make sure you have a good internet connection before downloading them.
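Because some of these files are very large, it helps to stream a download to disk in chunks instead of loading it into memory all at once. The following is a minimal sketch of that idea using only the Python standard library. The `download` function is a hypothetical helper written for this example, and the demonstration copies a temporary local file through a `file://` URL so it runs without network access; for a real dataset you would pass its actual URL.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download(url, destination, chunk_size=1024 * 1024):
    """Stream `url` to `destination` in fixed-size chunks; return bytes written."""
    destination = Path(destination)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # copyfileobj reads and writes chunk_size bytes at a time,
        # so memory use stays flat however large the file is.
        shutil.copyfileobj(response, out, length=chunk_size)
    return destination.stat().st_size


# Local demonstration with a file:// URL, so no network access is needed.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.bin"
    source.write_bytes(b"x" * 3_000_000)
    copied = download(source.as_uri(), Path(tmp) / "copy.bin")
    print(copied)  # prints 3000000
```

This sketch does not resume interrupted downloads or verify checksums. For multi-gigabyte datasets on an unreliable connection, a tool that supports resuming may be a better choice.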