Josh Nichols - Software Engineer
- josh@technicalpickles.com
- http://technicalpickles.com
- Savannah, GA
Goals
- To build products that users enjoy using
- To work on a team that works well together and supports each other
- To write code that developers don't hate
- To engineer systems that can be monitored, operated, and scaled
Skills
- Programming Languages
-
- Ruby
- JavaScript
- Bash
- Java
- CoffeeScript
- Python
- View Languages
- Testing
- Tools
- Deployment
- Web Frameworks
- Databases
- Operating Systems
-
- Mac OS X
- Gentoo Linux
- Ubuntu Linux
- Agile practices
Skill Stories
- Endlessly researches and evaluates new tools, techniques, processes, etc
- Frequently reads and patches source code of tools and libraries used for day-to-day development.
- Comfortable taking a freshly installed system and preparing it to act as a production server
- Creates prototypes to experiment with or better understand problems, tools, techniques, etc
Career History
GitHub, Distributed
Infrastructure Engineer, June 2013 - July 2015
Senior Application Engineer (Developer Experience), July 2015 - March 2016
Senior Application Engineer (Internal Tools), March 2016 - Present
Rails Machine, Distributed
Awesomeness Engineer of Supreme Versatility, January 2010 through July 2012
Chief Technology Officer, July 2012 through April 2013
-
I have extensive knowledge of the ‘Rails Machine Stack’, in the sense of all the tools and services that enable day-to-day work to happen:
- I built everythingisawful, an application to help periodically review & audit the monitoring systems. I used it to find alerts that had been outstanding, track down why they were left unattended, and get them fixed. It also served as a source of inspiration to refactor & improve what we monitored and the process we used to handle alerts.
- I periodically reviewed the status of support tickets. I added some heuristics to identify when tickets were in a ‘weird’ state, such as missing a deadline, missing a description, or being open for more than a week. I looked for ways to close the loops and unblock tickets that had been languishing open. I also steadily improved how we used HelpSpot, including adding and removing filters as the need became apparent, customizing statuses, adding custom fields, adding automated responses, and generally improving the day-to-day experience of HelpSpot.
- I periodically reviewed backups, including fixing backups on hosts that were failing, calculating capacity, reallocating, and improving visibility.
- I was the primary maintainer and contributor of all Rails Machine open source projects.
- I made massive improvements to moonshine & many plugins, to significantly reduce the amount of manual work that has to happen for managed deploys.
- I built beanmachine, an application to help automate billing. It has logic to automatically calculate Credit Hours from Application Management fees to apply if they haven’t been used this billing period. It pulls time entries from HelpSpot tickets to add as line items to invoices. For BaseCamp-managed projects, it reduces work even more, with only a few clicks needed to invoice for the previous month’s logged time across all BaseCamp projects. A side effect of this work is that OurCustomers now tracks FreshBook and Ubersmith client IDs, BaseCamp project IDs, and Authorize.net IDs, which could be used as the foundation for further billing improvements.
- I overhauled how we track & allocate IP addresses. Allocation was previously managed via DNS of managedmachine.com for the ‘managed subnets’, and plain text files for the self-managed subnets. The problem was that we ran out of IPs in the ‘managed subnets’, which led to double-allocating IPs, and the process was generally time consuming. IPD already modeled IP addresses, but they were treated as artifacts of servers rather than as first-class objects themselves. I refactored the modeling so that we model all our Subnets, each of which has many IP addresses; a Guest has many IP address assignments; and multiple guests can be intentionally assigned the same IP address (e.g. for heartbeat or mmm). As part of rolling this out, I audited our IP assignments to make sure there weren’t duplicates, and that any IP address that was pingable was assigned to a server.
- I added support to IPD for tracking bare metal hosts. Previously, our HelpSpot/OurCustomers livelookup service wasn’t able to work for bare metal hosts, because they simply didn’t exist in IPD. This addition meant that livelookup would work as expected, and bare metal servers could be treated like any other server in the context of our internal tools.
- I automated large parts of the weekly report we generate for our VIP customers. Among other things, it puts together a list of tickets, abuse reports, monitoring alerts, and open GitHub pull requests. The work to do it manually wasn’t hard, but it was time consuming and easy to miss something. This made it trivial to do each week, and also opened the door for applying this to other customers. A side effect of this work is that OurCustomers now tracks GitHub information about account/applications/users.
- Most day-to-day work relies heavily on systems and tools I’ve built, managed, or improved.
-
I implemented ChatOps. As part of that, I dove into learning CoffeeScript and understanding how Hubot worked. I re-implemented our bot using hubot, and re-implemented most of our campfire functionality in hubot. I also began making improvements to hubot and hubot-scripts, and soon was asked to be a hubot maintainer. Since then, I have continued to keep an eye out for things that can either be done directly from Campfire with ChatOps, or at the very least provide additional context for getting work done. Some examples of improvements I’ve made in this vein:
- claptrap alerts: for showing monitoring alerts in helpspot for a specific guest (used for striking)
- claptrap pmta: for showing metrics from PMTA
- claptrap autopmta: for displaying those metrics every N minutes
- claptrap backup critical: for showing hosts needing backups fixed
- claptrap billing: for linking to the correct billing system for a particular guest/account
- claptrap basecamp: for linking to the basecamp project for a particular guest/account
- claptrap authorize.net: for linking to the authorize.net page for a particular guest/account
- claptrap dns: for linking to the DNS Admin interface for a specific guest
- claptrap free ip: for finding a list of unassigned IP addresses
- claptrap guests: for finding guests in IPD
- claptrap github: for linking to the github repo for a guest’s application
- claptrap hostip: for geocoding IP addresses/domains, for cases when a guest is getting slammed by specific IPs, to find out if they are coming from somewhere suspicious
- claptrap hosts: for finding hosts in IPD
- claptrap moonshine me: for linking to moonshine & plugins
- claptrap newrelic: for showing all accounts with NewRelic, or linking to newrelic for a particular guest, which is super useful while handling extended outages
- claptrap page: for paging specific people easily from Campfire
- claptrap papertrail: for dumping logs from papertrail into campfire
- claptrap puppet: for linking to Puppet’s Type Reference for a particular resource, e.g. file, which saves googling
- claptrap raid: for showing hosts with raid warnings
- claptrap redis: for linking to the redis command reference
- claptrap scout: for linking to the scout page for a particular guest
- claptrap teach me: for accumulating links for reference, such as documentation about Rails Machine’s inner workings
- claptrap who's on call: for showing who is on call; also announces it on shift changes
-
I spoke at Rocky Mountain Ruby about ChatOps, and received a surprising amount of good feedback on the process I used, as well as the story I told on the road to ChatOps. I also gave the same talk at Geekend, but I don’t think they were quite as ready for it.
-
I acted as the Curator of the Rails Machine Way of Support, and worked to improve the quality of the team’s customer interactions. This spanned a lot of things, including:
- optimizing ticket language for readability
- wordsmithing tickets & responses to create the right tone of awesomeness
- rectifying the differences between what a customer says and what a customer wants, and giving them the latter
- closing ticket loops, and trending tickets towards done
- practicing inbox zero
- optimizing handling of tickets to provide great support, while minimizing the number of back & forth responses with the customer
-
I provided expert operational consulting for our customers. Based on some rough scripting, this consulting alone easily paid for my salary. In addition, it supported a 1000% growth of monthly revenue for a VIP customer, as well as moderate growth for a number of our other customers.
-
I provided support to a VIP customer for the months leading up to their peak traffic for the year, allowing them to have smooth sailing during their most important time of year, further building their confidence in us.
-
I acted as an escalation point for difficult, critical, and extended outages. Some of the types of issues I’ve worked on in this role:
- help customers deal with a large influx of traffic
- diagnose guests not booting
- diagnose & fix mysql replication failing
- diagnose & fix heartbeat suffering split brain
- fix application errors after a ruby upgrade
- debug large memory increases in application servers
- debug request queue backing up due to riak issues
- investigate & fix 503 errors caused by slow customer backup processes
- block evil IP addresses flooding a server with traffic
- investigate mmm processes with large memory
- investigate & mitigate application performance issues during peak traffic loads
-
I’ve written and collected extensive documentation covering a wide range of topics, from the tools we use day to day, to details on the operation of different tools & services, to information about the company. Some examples:
- haproxy
- heartbeat
- mysql
- mysql mmm
- memcached
- redis
- mongodb
- xfs
- caching
- lvm
- sidekiq
- ntp
- sphinx
- pagerduty
- postfix
- ssl
- postgresql
- denyhosts
- delayed_job
- solr
- xsendfile
- screen
- resque
-
I was a training resource for onboarding new hires. This included generating documentation, syllabuses, and schedules, as well as directly teaching, pairing, and providing feedback on code & support.
Fan Vs Fan, Distributed
Fan Vs Fan is a site for sports fans where they can face off with other members using their webcams, as well as write articles and record fancast shows.
Lead Engineer (Contract), March 2009 through January 2010
- Was solely responsible for planning, developing, deploying, and maintaining the site
- Worked with client to plan site direction, identify business requirements, and schedule feature implementation
- Worked with EngineYard to monitor fanvsfan.com, and respond to downtime and other production problems
- Monitored site performance (both client and backend), evaluated improvements, and implemented them
SNIF Labs, Boston, MA
SNIF Labs designed and built an intelligent dog tag which lets you monitor your dog's activity while you're away and keep in touch with his friends and yours.
Software Engineer (Full-time), August 2008 through December 2008
- Evaluated and introduced Ruby and Rails 'best practices' to improve code clarity and maintainability
- Evaluated and introduced agile practices which reduced management overhead and improved developer productivity
- Automated build system by implementing a continuous integration server
- Improved test coverage from 40% to 80%
- Migrated 2 years of subversion data into git on GitHub
Boston Ruby Group, Boston, MA
The Boston Ruby Group is composed of some 400 members in the Greater Boston area with the shared interest in the Ruby programming language.
Member, Presenter, and Organizer (Volunteer), November 2007 through Present
- Presented Git as a Subversion replacement, Extracting Plugin and Gems from Rails application, RailsRumble 2008 Retrospective, and Rake: The Familiar Stranger
- Schedules local and out-of-town speakers to present at monthly meetings
- Collaborates with publishers and other sponsors to provide giveaways at meetings
- Developed and maintains the group's website for managing events, jobs, and projects
Broad Institute, Cambridge, MA
The Broad Institute of MIT and Harvard is a research institute dedicated to the study of genomics for the biomedical sciences.
Software Engineer (Full-time), Chemical Biology Department, June 2007 through August 2008
- Designed and implemented a web application for managing chemical biology research on a team of 8 developers
- Collaborated with chemists and chemical screeners to determine requirements and to improve user experience
- Extended Acegi Security to support internal authentication infrastructure
- Implemented AJAX to provide a better user experience for long user tasks, reducing user errors
- Evaluated, implemented, and embraced new technologies to improve product quality, developer productivity, and developer sanity
- Presented technical talks to the Broad developer community: Java on Gentoo Linux and You, Me, and Acegi
Gentoo Linux
Gentoo Linux is a special flavor of Linux that can be automatically optimized and customized for just about any application or need. Extreme performance, configurability and a top-notch user and developer community are all hallmarks of the Gentoo experience.
Public Relations Project (Volunteer), January 2008 through July 2008
- Researched and implemented changes to the project, resulting in improved community relations, openness, and transparency
- Drafted announcements for Gentoo's front page
- Documented best practices for the project, such as for writing accessible announcements
Ruby Project (Volunteer), December 2006 through July 2008
- Provided online support for users on the #gentoo-ruby IRC channel
- Triaged and resolved Ruby bugs filed with Gentoo's Bugzilla
- Maintained widely used Ruby packages including: Rails, Capistrano, and RSpec
Java Project Lead (Volunteer), January 2006 through June 2007
- Oversaw recruitment and training of potential developers, doubling the team size
- Collaborated with other open source leaders to promote and facilitate Linux as a Java platform
Java Project (Volunteer), Spring 2005 through June 2007
- Drastically improved Gentoo Linux as a Java platform
- Provided online support for users on the #gentoo-java IRC channel and gentoo-java mailing list
- Triaged and resolved Java bugs filed with Gentoo's Bugzilla
- Contributed bug fixes and enhancements to upstream projects
- Wrote documentation including end-user documentation and developer documentation
- Maintained many widely used Java packages, including: Eclipse, Groovy, JRuby, Maven, Ant, Spring, Hibernate
R. R. Donnelley (formerly Banta Internet Solutions), Cambridge, MA
RR Donnelley is the world's premier full-service provider of print and related solutions.
Software Engineer (Full-time), August 2005 through June 2007
- Developed and maintained web-based CMSs and marketing tools
- Collaborated with the architect and senior engineers for project planning and high level design
- Evaluated, implemented, and embraced new technologies to improve product quality, developer productivity, and developer sanity
Scientific Computation Research Center, RPI, Troy, NY
The Scientific Computation Research Center is an organization focused on the development of reliable simulation technologies for engineers, scientists, medical professionals, and other practitioners.
Assistant System Administrator (Intern), Fall 2001 through Spring 2005
- Supported approximately 30 workstations, 20 servers, and 4 clusters.
- Compiled, configured, and installed various open source products for Linux, Solaris and IRIX.
- Worked towards improving the network's infrastructure to facilitate the administration of the network.
- Configured and maintained several network services including: IMAP, SMTP, mailing list, web hosting, network monitoring, centralized logging, and automated installations.
Education
Rensselaer Polytechnic Institute, Dual B.S. Computer Science and Psychology, May 2005