Here is a list of some key resources and influences I have found helpful.
For Learning
- Coursera – A huge range of courses, but some key ones for me have been:
  - Machine Learning – Andrew Ng – A very good primer for ML
  - Social Network Analysis – Lada Adamic
  - Mathematical Biostatistics Boot Camp 1 and 2 – These provided a good refresher in statistics after a few years away.
- SQL Zoo – I tend to recommend this to people starting out learning SQL. It’s been around a few years though, so there may be better starters available now!
- Enthought Canopy Online Python Training – This is very good training for someone who already knows how to code, and maybe uses R, to get to grips with Python’s SciPy/NumPy/pandas packages. Notably, this training is free to university members.
- Practical Computer Vision with SimpleCV – Combined with the SimpleCV Python package, this provides a good entry point to computer vision.
- Hortonworks Apache Pig & Hive Training – Good intro to Hadoop 2, as well as providing a grounding in Pig / Hive.
For Working
- RStudio – The de facto R environment. Mixes REPL, scripting and graph output nicely for semi-interactive work.
- IPython Notebook – This does for Python what RStudio does for R (and more). A very intuitive working environment for doing data science work.
- GitHub / Gist – Git, as source control, is obviously excellent for large projects and applications. Gist is a very useful lightweight counterpart, handy for posting and sharing code snippets.
- NBViewer – This is a good online tool for sharing IPython Notebooks. It takes an ipynb file uploaded to GitHub or Gist and displays it as a notebook.
- Gephi – A fantastic tool for learning about graph analysis. Use it in conjunction with ‘Social Network Analysis’ on Coursera; it also has good guides of its own.