Here I wanted to list some key resources and influences I have found helpful.

For Learning

  • Coursera - A huge range of courses, but some key ones for me have been:
  • SQL Zoo – I tend to recommend this to people starting out learning SQL. It’s been around a few years though, so there may be better starters available now!
  • Enthought Canopy Online Python Training – Very good training for someone who already knows how to code, and perhaps uses R, to get to grips with Python’s SciPy/NumPy/pandas packages. Notably, this training is free to university members.
  • Practical Computer Vision with SimpleCV – Combined with the SimpleCV Python package, this provides a good entry point to computer vision.
  • Hortonworks Apache Pig & Hive Training – Good intro to Hadoop 2, as well as providing a grounding in Pig / Hive.

For Working

  • RStudio – The de facto R environment. Mixes REPL, scripting and graph output nicely for semi-interactive work.
  • IPython Notebook – This does for Python what RStudio does for R (and more). A very intuitive working environment for doing data science work.
  • GitHub / Gist – Git, as source control, is obviously excellent for large projects and applications. Gist is a very useful lightweight version of Git, handy for posting and sharing code snippets.
  • NBViewer – A good online tool for sharing IPython Notebooks. It takes an ipynb file uploaded to GitHub or Gist and displays it as a notebook.
  • Gephi – A fantastic tool for learning about graph analysis. Use it in conjunction with ‘Social Network Analysis’ on Coursera; it also has good guides of its own here.