Use the command-space Spotlight search a lot. If you want to change an option, you'd better be typing command-space, then "s", and it will probably autocomplete-suggest "System Preferences", which is what you want, so you can just press enter and get there in four keystrokes.
SizeUp lets you easily move windows to occupy the left or right half of the screen, quadrants of the screen, maximize, send to other desktops or monitors, etc., using hotkeys. It's great and makes me way more productive; I can't recommend it enough. It costs like 13 bucks. Just buy it, or use the trial version, but seriously, it's 13 bucks. It doesn't sound that useful, but being able to consistently operate in splitscreen mode is great when reading documentation/papers/math and coding at the same time, etc.
Karabiner-Elements lets you remap keys. As a PC programmer I was so used to the ctrl key being under my left pinky (on Windows it acts more like the Mac's command key), which is where Mac laptops put the "fn" key, and the hand contortions for command-x/c/v felt so unnatural that I wanted to remap my keys. I mapped fn to left_command so it was under my left pinky like on a PC, left_command to left_option, and left_option to fn. Karabiner-Elements is free (I think).
Sublime Text 3 is the correct text editor to use. Don't use it to write code for languages that have a good IDE (usually that means an IDE made by JetBrains, e.g. IntelliJ and all its plugins, like the one for Scala; PyCharm is basically the IntelliJ Python plugin bundled as a separate application). Install Package Control on it to easily install new packages, and use it to edit things like config files, LaTeX (if you're not collaborating on Overleaf), this document, etc. It costs money and will bug you occasionally; whether to pay depends on how much money you have.
Skim is a great PDF reader with support for annotating documents and an API for syncing up with other applications, such as the tex editing plugins for Sublime.
I use the app QuickRes to hack around with crazy window resolutions on my retina display. I don't know how necessary this is on modern OSX in 2018 vs. in 2012 when you needed it, but you can technically run at native retina resolution if you're insane. Generally the highest resolution you can stand is best: you get more screen real estate to work with, you can split screens with SizeUp, and you'll be more productive.
Go to System Preferences > Trackpad and enable:
Under the Point & Click tab:
Tap to click. If you're physically depressing your trackpad to click stuff I hate you.
Secondary click (right click) should be "click or tap with 2 fingers".
Under Scroll & Zoom tab:
Enable Scroll direction: Natural. It's sooooo much better. It's like you're pushing and pulling the page: "Get away from me, internet! Get over here, internet!".
Enable pinch to zoom.
I don't care about the others.
Under More Gestures tab:
I enable all of them but to be honest, the ones I really use are:
Swipe up with 3 fingers for Mission Control (see all windows).
Swipe down with 3 fingers for Exposé (see all windows of current application).
In most text situations, holding option while using the arrow keys lets you skip over entire words at a time. I CAN'T EMPHASIZE ENOUGH HOW MUCH MORE PRODUCTIVE THIS MAKES YOU. Especially if you also hold down shift, you can use it to quickly highlight specific words or lines to cut, copy, and paste.
Using the shell is important and very powerful. To demonstrate this power, the following one-liner downloads the lyrics of the song Juicy by Biggie Smalls and reads them over your speakers (warning, lots of explicit lyrics):
curl -Ls http://genius.com/The-notorious-big-juicy-lyrics | xmllint --html --xpath "//div[@class='lyrics']" - 2> /dev/null | tr $'\n' ' ' | sed 's/<[^>]*>//g' | sed 's/\[[^]]*\]//g' | say
Get a good .bash_profile file that colors things differently, makes your prompt nice, etc. Set your terminal to white on black. Add the alias ll for "ls -lha" because you'll use it a lot.
I've also heard good things about zsh, so maybe do that (with someone's monster config script).
ls shows the contents of the current folder.
ls -lha is like ls but shows sizes, hidden files, more information, etc. It's good and should be aliased to ll.
~ is an alias for your home directory, . is an alias for the current directory, and .. is an alias for the parent directory.
cd DIR or pushd DIR changes directories. pushd is nice because it lets you return to where you came from later with popd, and it nests.
pwd prints the full path of the current directory.
man COMMAND prints the help page for a given COMMAND.
cat FILE prints a file to the screen (standard out, or stdout).
The | separator (pipe) between two commands pipes the stdout of one to the stdin of the other. Many utilities can accept input from stdin (standard in), so this lets you chain cool commands.
The > separator, used as COMMAND > filename, writes the stdout of COMMAND to that file. Very useful.
echo FOO prints FOO to the screen. echo $FOO prints the environment variable FOO to the screen. This is a good way to check your path, the ordered list of folders that your shell looks in when you type a command.
touch FILENAME creates a new empty file called FILENAME. Nice for adding a blank README.md to a new git repository or whatever.
mv SOURCE DEST moves SOURCE to DEST.
cp SOURCE DEST copies SOURCE to DEST. If SOURCE is a directory, you need cp -r for recursive.
ln -s TARGET LINKNAME makes a shortcut ("symlink" or "soft link") without actually moving or copying anything. Very useful, and dangerous, because the argument order trips people up: the existing file comes first, the new link second. You can mess a ton of stuff up.
rm FILE removes a file. If it's a directory, you need rm -r DIR.
wc -l FILENAME counts the number of lines in a file. You can also count words, etc. Even if you couldn't, you could use tr to change " " to "\n" and pipe that to wc -l (see the "slightly more advanced" section).
du -h -s gives the disk space usage of the current directory; du -h -s DIR gives it for DIR.
head -n NUM FILE gives the first NUM lines of FILE. There's a similar command, tail. head also accepts negative lengths; see the man page. If you don't give it a file, it will use stdin. This is common of most commands, so you can chain them together with pipes.
Slightly more advanced commands (these shine when you pipe line-based output into them, from ls, etc.):
sort sorts lines, ascending by default, in dictionary order as characters. -n makes it sort in numeric order, -r gives descending, -k lets you pick a column as a key to sort by, and -t lets you pick a field delimiter to create columns.
uniq gives the unique lines, but only by collapsing neighboring duplicates, because generally we are stream processing. So to do a true unique count, you want to sort first. cat FILE | sort | uniq | wc -l gives the number of unique lines in a file.
sed lets you edit streams with regular expressions, but Macs ship a crufty old BSD sed, so type sed -E to make it act more like Linux sed. Usually something like cat file | sed -E 's/REGEX/whatever it is you want to do with the matches/g' will get the job done.
grep lets you search with regular expressions, though once again I think you need grep -E because, uhh, Mac. The -v option inverts the match. This is useful for finding things in a text file, finding a file in a directory listing, etc. For example, cat FILENAME | grep -E '^foo' gets the lines starting with "foo".
awk is really nice for manipulating tabular data and more, but a bit arcane. It can be useful for things like figuring out the longest line, or just plucking out specific columns.
If you want the longest line in characters, that would be something like cat tempfile | awk 'BEGIN {FS=""} {print NF}' | sort -rn | head -n 1, whereas the longest line in words (things separated by a space) would be cat tempfile | awk 'BEGIN {FS=" "} {print NF}' | sort -rn | head -n 1. FS means "field separator" and NF means "number of fields".
If you want the index of the line with the most fields, that could be something gross and imperative (but it shows the power of awk) like cat tempfile | awk 'BEGIN {FS=" ";idx=0} {print idx,NF;idx+=1}' | sort -rnk 2 | head -n 1 | cut -d' ' -f 1.
cut -d' ' -f 1 is equivalent to awk 'BEGIN{FS=" "} {print $1}'; it sets a delimiter and selects a column or subset of columns.
Don't hesitate to ignore this stuff once it gets really hairy and just write a quick Python script. Doing complicated things in bash is often a fool's errand.
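For example, here's a quick Python stand-in for the "index of the line with the most fields" pipeline above (a sketch; the function name and sample lines are made up), which is arguably easier to read than the awk version:

```python
# Find the index of the line with the most whitespace-separated fields,
# like the awk | sort | head | cut pipeline above.
def index_of_longest_line(lines):
    counts = [len(line.split()) for line in lines]
    return max(range(len(counts)), key=counts.__getitem__)

lines = ["a b", "a b c d", "a"]
print(index_of_longest_line(lines))  # -> 1
```

Once your pipeline needs state or branching, a little function like this is usually less painful than debugging quoting inside a one-liner.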
If you need to edit a file right there in the terminal, use nano. vim and emacs are madness.
If you're ever trapped in vim, you can quit it by typing ":", then "q", then enter. No kidding.
If you're ever trapped in emacs, you can quit by holding control-x, then control-c.
Download things from the web with curl or wget. I don't think wget is on Macs by default, but if you are using a package manager WHICH YOU SHOULD BE, you can grab it as easily as brew install wget.
Use scp (ugh) or rsync (yea!) to upload/download stuff from remote sites. rsync is the best because it uses a sophisticated diffing algorithm to send minimal data, which is useful if it's something you keep updating a lot but only incrementally, like a code directory.
If you suspend a running process (ctrl-z), the bg command will let it continue running in the background while you use this terminal. The fg command brings it back to the foreground, putting you back in the action and unable to use your terminal. You can do this with more than one process. It's often useful when you want to download a few things at once, etc. Look into fork for more multiprocessing.
xargs is very slept on, but it lets you take a stream of incoming lines and repeatedly execute a command on them. With the -I option it gives a lambda-like syntax where you introduce a free variable. For example, if you want to copy a bunch of files in the current directory, matching a certain prefix, to the target directory DEST, you could do ls | grep -E '^PREFIX' | xargs -I X cp X DEST.
tr will translate specific characters one to another. join can do SQL-style joins on specific columns and stuff. There's also cut, paste, etc. I do most of it with awk or Python.
Use git repositories for everything (even non-code stuff like this document) and commit more often than you think you should. It's always nice to have a full history.
The command line git is hard to use. Use SourceTree: it's free, incredibly slow, and lets you do all kinds of git stuff like reverting files and committing things without losing your mind learning git commands. Branching is also good for working on new features without mauling your existing code, but I've been known to just copy-paste instead.
Put your projects in a folder that is automatically backed up remotely. I currently use Dropbox, but I hate it and should switch to Google Drive or whatever else doesn't take days for a fresh sync.
Use IDEs, not text editors. Even the most die-hard emacs fan will eventually crumble upon seeing the power of IntelliJ and the like. IntelliJ Community Edition is free to all, or if you have a student email you can get the professional edition for free. I don't think I've ever used any of the features of the professional edition. One of the best features (besides actually working, type-based autocomplete) is being able to "Go To Definition" of functions that live in libraries you are referencing. Looking at library code to see what it does is often more useful than googling a bunch of confusing documentation. USE "GO TO DEFINITION" A LOT. I CANNOT OVEREMPHASIZE HOW MUCH EASIER THIS WILL MAKE YOUR LIFE. The IDE will also point out syntax errors, do type-aware autocomplete, etc., before you even run anything.
Some languages don't have an IDE or an IntelliJ plugin. For those, use Sublime Text with the appropriate packages, or maybe there's a Jupyter notebook addon that handles the language, which can be very cool.
Learn the shortcuts of your IDE to do things like rename symbols, search for symbols, classes, files, look up definitions of functions/classes or binding points of variables, find the usages of a function/class/variable. Hotkeys are good and don't hesitate to customize them. A lot of them are not good to begin with. Why is "go to definition" command-b and not command-d for "definition"? etc.
Use Jupyter notebooks. More languages than just Python are supported now, the interactivity is really awesome, and there are plugins to basically build web apps with GUI widgets that interact with code.
Use Jupyter notebooks a lot to develop new ideas, interactively explore, and make little algorithms and peek at data before moving it into a more rigorously curated home in an IntelliJ project. Do this iteratively, you can import external Python modules into Jupyter.
Use PyCharm for your main development. I think it's basically just the IntelliJ Python plugin packaged as a standalone product. The community edition is free, professional edition with a student email is free. All rules from "general coding stuff" apply.
Use the pip package manager, or Anaconda, or both (recommended), and/or virtualenv, to get libraries and manage packages / Python package environments. (I just use Anaconda instead of virtualenv for making environments, but virtualenv could be good for distribution.) This will make your life incredibly easier. More detail in the data science section.
Use the IntelliJ Scala plugin.
Don't get too cute with the module system and make yourself a total mess of code, but don't get too uncute either, otherwise you might as well be using Java. Higher kinded types are wild, as are first class modules.
The collections are slow. For loops are slow. Many things are compiled to methods that should be simple member lookups, like private vars in traits.
We like Python for data science because of libraries like scipy, numpy, matplotlib, tensorflow, pandas, jupyter, and lots of other stuff. Otherwise we'd be using a language whose for loops aren't glacial. It's all about the tooling and libraries; I would be using Scala or OCaml all day otherwise.
I recommend installing Anaconda as a combination environment manager / package manager specifically for scientific computing. It lets you create different environments with their own Python versions, package versions, etc. That's useful for checking which package upgrade broke your project, as well as for generally having complete control over your Python, especially in situations where you don't have root, like on clusters. Sometimes Anaconda doesn't have an up-to-date package; you can always use pip in that case, or sometimes you'll have to build the package from source and install it. This should still happen safely inside your current Anaconda environment. Anaconda rules.
Learn to love numpy. Get comfortable with vectorizing your code (turning it into batch operations on (pseudo)tensors). It's the way numerical algorithms are programmed these days, when possible, to allow GPU acceleration.
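To make "vectorizing" concrete, here's a minimal sketch (the arrays are made up): the same computation written as a Python loop and as one batch numpy operation:

```python
import numpy as np

# Squared distance from each of N points to a query point.
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
query = np.array([0.0, 0.0])

# Loop version: one point at a time (slow in Python).
loop_dists = [float(((p - query) ** 2).sum()) for p in points]

# Vectorized version: one batch operation over the whole (N, 2) array.
vec_dists = ((points - query) ** 2).sum(axis=1)

print(vec_dists.tolist())  # -> [0.0, 25.0, 100.0]
```

Both give the same numbers, but the vectorized form pushes the loop into optimized C (and, with the right library, onto a GPU).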
scipy has crazy functions you won't believe. Never assume some wild polygamma function or something hasn't been implemented.
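For instance, the polygamma family really is sitting right there in scipy.special (a tiny sketch):

```python
import numpy as np
from scipy.special import digamma, polygamma

# digamma(1) is exactly -gamma (the Euler-Mascheroni constant),
# and polygamma(1, 1), the trigamma function at 1, is pi**2 / 6.
print(float(digamma(1)))       # ~ -0.5772
print(float(polygamma(1, 1)))  # ~ 1.6449
```

Search the scipy.special docs before implementing anything by hand.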
matplotlib is great for exploring and visualizing data, especially in combination with jupyter.
I use TensorFlow for most machine learning since I use neural networks a lot. Some people use PyTorch, which has its advantages, but TensorFlow's "eager" mode is an attempt to capture those advantages too. I will bet on the enormous team run by Jeff Dean to eventually outpace everything in the end, and I imagine it's the only library that can run on the Google TPU cloud.
Scikit-learn is a great library for general machine learning, with good implementations of gnarly algorithms that can be very useful. For example, when doing data analysis I wanted to try some sparse inverse covariance estimation, and scikit-learn did it with no problem using its GLasso implementation.
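A rough sketch of what that looks like (note: in current scikit-learn the estimator is called GraphicalLasso, and the data here is synthetic):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Synthetic data: 200 samples of 5 features, two of them correlated.
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
X[:, 1] += 0.8 * X[:, 0]

# Fit a sparse estimate of the inverse covariance (precision) matrix.
model = GraphicalLasso(alpha=0.05).fit(X)
print(model.precision_.shape)  # -> (5, 5)
```

The nonzero off-diagonal entries of model.precision_ are the estimated conditional dependencies between features.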
graphviz can let you do some cool visualizations of graph data, especially inside jupyter notebooks.
I don't currently use pandas much, but apparently it is amazing for manipulating and exploring data.
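For a tiny taste of what people like about it (the table here is made up):

```python
import pandas as pd

# A toy table of experiment runs.
df = pd.DataFrame({
    "model": ["a", "a", "b", "b"],
    "accuracy": [0.81, 0.83, 0.90, 0.88],
})

# Group, aggregate, and sort in one readable chain.
best = df.groupby("model")["accuracy"].mean().sort_values(ascending=False)
print(best.index[0])  # -> b
```

Whole classes of awk/sort/join pipelines collapse into one-liners like this.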
Any good scientist is going to have to write up some gnarly papers at some point; that's what tex is for.
I recommend the Sublime Text tex plugins when editing locally, paired with Skim to view PDFs; Skim can sync up with the editor when it renders, scroll to the right spot, etc.
Usually, though, you might as well edit your tex in the cloud; Overleaf is great for this. It even supports collaborative editing like Google Docs and will render previews in the browser.
When you just want to write little snippets of math without worrying about a document, LaTeXiT is an amazing tool. You can also easily drag/export/copy created formulae out of it in various formats, including vectorized. Very useful when typesetting math for presentations that you'll end up doing in Keynote or PowerPoint, too, or pasting into emails.