Have you tried to build data science projects, only to be intimidated by the sheer number of concepts and the amount of code involved? In this article I have collected the top 20 Kaggle data science projects, together with links to their source code.

1. Heart-disease-prediction 

The heart disease prediction project involves training a machine learning model that predicts whether someone is suffering from heart disease; the model reaches an accuracy of 87%. Because it predicts in advance, it gives doctors insights that help them choose the right diagnosis and treatment for each patient.
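
To make the accuracy figure concrete, here is a minimal sketch of how such a score is computed on held-out patients. The labels and predictions are made up for illustration (1 = heart disease, 0 = no heart disease) and do not come from the project.

```python
def accuracy(y_true, y_pred):
    """Fraction of patients whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical test-set labels and model predictions (illustrative only)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]

score = accuracy(y_true, y_pred)
print(f"accuracy: {score:.3f}")  # 7 of 8 correct
```

An accuracy of 87% means roughly 87 of every 100 test patients are classified correctly; it does not say which kind of mistake (missed disease or false alarm) makes up the rest.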

 CLICK FOR MORE DETAILS

2.  House Prices Advanced Regression Techniques 

The project starts from the idea of a home buyer describing their dream house or apartment. You work with 79 explanatory variables that describe almost every aspect of the residential homes in the dataset, and your task is to predict the final price of each home. You will need Python with the libraries NumPy, Matplotlib, seaborn, scikit-learn, and XGBoost, along with some machine learning knowledge, to carry out the project.
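
The Kaggle competition scores predictions by the root mean squared error between the logarithm of the predicted price and the logarithm of the actual price. The sketch below shows that metric in plain Python; the function name and the prices are made up for illustration.

```python
import math

def rmse_log(predicted, actual):
    """RMSE between log(predicted) and log(actual) prices."""
    squared = [(math.log(p) - math.log(a)) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical sale prices and model predictions (illustrative only)
actual_prices = [200000, 150000, 320000]
predicted_prices = [210000, 140000, 300000]

score = rmse_log(predicted_prices, actual_prices)
print(f"log-RMSE: {score:.4f}")
```

Working on the log scale means that a 10% error on a cheap house counts about as much as a 10% error on an expensive one.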

 CLICK FOR MORE DETAILS

3. Prediction of Airbnb new user booking 

In the 21st century, people have developed a culture of traveling, which has driven up the demand for booking accommodation away from home. The answer to this demand is a platform where travelers can book empty rooms in hosts' houses. The project uses machine learning methods to predict which city or country a new user will choose for their first booking. It applies algorithms such as logistic regression, decision trees, SVM, and XGBoost to build models that help identify users' behavior patterns.

CLICK FOR MORE DETAILS

4. Pneumonia Diagnosis using X-rays 96 Percent 

This project diagnoses pneumonia from X-ray images of human lungs, using a custom-built convolutional neural network and transfer learning with Inception V3. You build a neural network that is tuned repeatedly to find the best hyperparameters, and you use several Keras utility functions, such as callbacks for checkpointing and for reducing the learning rate.
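
To illustrate the idea behind a learning-rate reduction callback, here is a plain-Python sketch of a reduce-on-plateau rule. It imitates the spirit of Keras's `ReduceLROnPlateau` callback but is not its implementation; the function name, parameter values, and validation losses are all assumptions made for illustration.

```python
def reduce_on_plateau(val_losses, lr=1e-3, factor=0.5, patience=2, min_lr=1e-6):
    """Return the learning rate used at each epoch.

    The rate is multiplied by `factor` whenever the validation loss has
    failed to improve for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    schedule = []
    for loss in val_losses:
        schedule.append(lr)  # rate in effect during this epoch
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
    return schedule

# Hypothetical validation losses over eight epochs (illustrative only)
losses = [0.9, 0.7, 0.71, 0.72, 0.6, 0.61, 0.62, 0.63]
rates = reduce_on_plateau(losses, lr=0.001, factor=0.5, patience=2)
print(rates)
```

A checkpointing callback follows the same pattern: it watches the validation metric and saves the model weights whenever a new best value appears.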

CLICK FOR MORE DETAILS

5. Plant Seedlings Classification 

This project aims to reliably tell a crop seedling apart from a weed. You need a dataset of images of different plants; when the model is shown a plant, it identifies the species and tells you whether it is a weed or a crop seedling.

 CLICK FOR MORE DETAILS

6. ML workflow automation 

This is a Python-based machine learning project whose main aim is to demonstrate the archetypal ML workflow within a Jupyter notebook. It also offers proof-of-concept ideas for automating key steps, using the Titanic binary classification dataset hosted on Kaggle. The ML workflow covers data visualization and exploration, model selection, and training.

CLICK FOR MORE DETAILS

7. 3D object detection for autonomous vehicles 

The world of technology is rapidly evolving, and the automotive industry needs to keep up with the changing trends. Self-driving cars have gained a lot of popularity and hype, although most vehicles branded as having autopilot cannot drive without human assistance. This project tackles a bigger problem: 3D object detection over semantic maps.

 CLICK FOR MORE DETAILS

8. MNIST Kaggle Competition The Winning Solution. 

This project gives you a step-by-step guide to solving and winning the MNIST competition on Kaggle. It uses the following techniques, each of which gives an incremental gain in test set accuracy:

  1. Random forest algorithm
  2. A convolutional neural network
  3. CNN with data augmentation
  4. An ensemble of CNNs
  5. A CNN ensemble with a learning rate annealer and batch normalization
  6. Multiple DL and ML algorithms
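
One common way to build an ensemble of CNNs is to average the class probabilities that each model predicts and pick the class with the highest average. The sketch below shows that averaging step only; the function name and the three sets of model outputs are made up, and only three classes are used for brevity. The project itself may combine its models differently.

```python
def ensemble_predict(prob_sets):
    """Average per-class probabilities from several models and pick the top class."""
    n_models = len(prob_sets)
    n_classes = len(prob_sets[0])
    avg = [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]
    best_class = max(range(n_classes), key=lambda c: avg[c])
    return best_class, avg

# Hypothetical softmax outputs of three models for the same image
model_outputs = [
    [0.10, 0.60, 0.30],
    [0.20, 0.35, 0.45],  # this model alone would pick class 2
    [0.05, 0.55, 0.40],
]

label, avg = ensemble_predict(model_outputs)
print(label, [round(a, 3) for a in avg])
```

Here the second model disagrees with the other two, but the averaged probabilities still favor class 1, which is how an ensemble smooths out the mistakes of individual networks.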

CLICK FOR MORE DETAILS

9. Global Wheat Detection 

This project showcases how deep learning is used to detect wheat heads across different crops. It detects wheat heads in outdoor photos of wheat plants, drawing on datasets from around the world. With it, you will be able to estimate the size and number of wheat heads around the world.

 CLICK FOR MORE INFORMATION

10. Bio Response 

The main objective of this project is to build a model that relates molecular information to an actual biological response as well as the data allows. Each row of the dataset represents a molecule, and the first column contains experimental data describing an actual biological response.

 CLICK FOR MORE DETAILS

11. Kaggle predict future sales 

In this project, you are given daily historical sales data. The problem to solve is predicting the total number of products sold in every shop for the test set. The list of shops and products keeps changing from month to month, and the project's model is supposed to determine the expected sales.
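
A typical first step with daily sales data is to roll the daily records up into totals per month, shop, and product. The sketch below does this in plain Python; the record layout (date, shop id, item id, units sold) and all values are assumptions for illustration, not the competition's actual files.

```python
from collections import defaultdict

# Hypothetical daily records: (date, shop_id, item_id, units_sold)
daily_sales = [
    ("2015-01-03", 5, 101, 2),
    ("2015-01-17", 5, 101, 1),
    ("2015-01-20", 7, 101, 4),
    ("2015-02-02", 5, 101, 3),
    ("2015-02-09", 5, 202, 1),
]

def monthly_totals(rows):
    """Sum units sold per (month, shop, item)."""
    totals = defaultdict(int)
    for date, shop, item, units in rows:
        month = date[:7]  # "YYYY-MM" prefix of an ISO date string
        totals[(month, shop, item)] += units
    return dict(totals)

totals = monthly_totals(daily_sales)
for key in sorted(totals):
    print(key, totals[key])
```

A forecasting model would then be trained on these monthly totals rather than on the raw daily rows.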

 CLICK FOR MORE DETAILS

12. State Farm Distracted Driver Detection 

Road accidents are on the rise because some people drive while texting, absorbed in social media, or in a lively hand-held conversation on their phones. This project classifies each driver's behavior, for example whether they are driving attentively, wearing a seat belt, or taking a selfie with friends in the back seat, based on a dataset of 2D dashboard camera images.

CLICK FOR MORE DETAILS

13. COVID-19 classification 

COVID-19 has become a pandemic. It is diagnosed using reverse transcription polymerase chain reaction (RT-PCR). X-ray machines provide a variety of chest images that can support early diagnosis of COVID-19. This project should be able to go through the images, determine which ones show COVID-19, and distinguish them from normal images and from those showing pneumonia.

 CLICK FOR MORE DETAILS

14. Emotion classification 

This project comprises a deep learning face detector and an emotion classification DNN that sorts faces into six or seven basic human emotions. Emotion classification is a very challenging task in computer vision. The project uses the SSD object detection algorithm to extract the face from an image.

 CLICK FOR MORE DETAILS

15. Ultrasound nerve segmentation 

This project uses a deep convolutional network adapted for segmentation, so that image-level features can be learned to classify each pixel. Because the images have a more or less fixed spatial structure (the nerves are mostly in the same region), locally connected layers are used in parallel with the convolutional ones from 10 x 14 resolution.

CLICK FOR MORE DETAILS

16. Football Dataset Analysis 

The main objective of this project is to study a football dataset: analyze it, extract information from it, and make forecasts based on that data. In other words, it identifies the strengths and weaknesses of a team and provides ways to measure and help improve its performance.

 CLICK FOR MORE DETAILS

17. Kaggle Rainfall Prediction 

The main aim of this machine learning project is to learn and predict rain behavior based on 14 weather features. It applies a KNN model, a random forest model, and a clustering model to make its predictions.
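
To show how a KNN model makes a prediction, here is a minimal k-nearest-neighbors classifier in plain Python. It uses two made-up, pre-scaled features (humidity and pressure) instead of the project's 14, and all data points and names are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (humidity, pressure) readings scaled to the 0-1 range
train_X = [(0.90, 0.20), (0.80, 0.30), (0.85, 0.25),
           (0.30, 0.80), (0.20, 0.90), (0.25, 0.70)]
train_y = ["rain", "rain", "rain", "dry", "dry", "dry"]

print(knn_predict(train_X, train_y, (0.75, 0.30)))
print(knn_predict(train_X, train_y, (0.30, 0.85)))
```

Because KNN relies on distances, features measured on very different scales should be normalized first, which is why the sketch assumes values already scaled to the same range.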

 CLICK FOR MORE DETAILS

18. Lung cancer detection 

This project consists of a computer-aided diagnosis system that automatically detects lung cancer. It first detects the lung region by applying image processing techniques such as dilation, outlining, median filtering, and flood fill to the CT scan images.
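
Of the techniques listed, the median filter is the easiest to show in a few lines. The sketch below is a plain-Python 3 x 3 median filter applied to a tiny made-up image with one noisy pixel; it only illustrates the idea and is not the project's code.

```python
from statistics import median

def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighbourhood.

    Pixels outside the image are handled by clamping to the nearest edge.
    """
    h, w = len(image), len(image[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            out[y][x] = median(window)
    return out

# Hypothetical 4 x 4 grayscale patch with a single bright noise pixel
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]

smoothed = median_filter(noisy)
print(smoothed)
```

The isolated bright pixel disappears because it is never the middle value of any 3 x 3 window, while larger structures such as the lung outline would survive the filtering.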

 CLICK FOR MORE DETAILS

19. Volume control using hand gestures recognition 

In this project, the computer's camera tracks human body motions, i.e. gestures (hence the term gesture recognition). This lets the PC understand human gestures, building a better link between you and the machine than GUIs alone.

 CLICK FOR MORE DETAILS

20. Cat human face classification 

In this project, you create a model that can classify human and cat faces. You train the model on cat and human face data, then test it with your laptop's camera, for example on a cat's face, and the model tells the two apart.

CLICK FOR MORE DETAILS

Conclusion

The source code for the projects above can easily be found on GitHub; all you need to do is follow the links. Get started by building one data science project, and once it is done, move on to the others; each one will be easier than the last.