Configuring git credentials for CodeCommit and other repositories

Working with AWS CodeCommit repositories alongside other repositories can be a challenge, depending on your git configuration. I’d like to share an approach that works for me when using HTTPS (instead of SSH keys), and hopefully it will be helpful for you, too. The solution I describe works on macOS with Microsoft Visual Studio Code.

The solution I’m going to suggest uses static CodeCommit credentials from the Security credentials section of your AWS IAM user. An alternative is to use IAM keys with CodeCommit, but that requires credential helper configuration with the AWS CLI in order to handle the dynamics of IAM session management. With static git credentials, there is no need for AWS CLI integration. Either solution requires adding a section to your $HOME/.gitconfig file for the credential being used.
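For reference, the AWS CLI credential-helper approach mentioned above looks roughly like this in $HOME/.gitconfig (this mirrors AWS’s documented setup; we won’t use it here):

```ini
[credential "https://git-codecommit.us-east-1.amazonaws.com"]
    # Delegate credential lookup to the AWS CLI, which signs requests
    # using your IAM keys/session rather than a stored password
    helper = !aws codecommit credential-helper $@
    UseHttpPath = true
```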

I have found that it helps to break out different repositories (CodeCommit, GitLab, GitHub, etc.) into separate .git-credentials files for HTTPS access, because the credential helper matches stored credentials by host rather than by full repository URL. A single .git-credentials file works fine when one set of credentials covers every repository at a given host. However, if I have different credentials (e.g., different personal access tokens in GitLab) for different repositories at the same host, problems arise.

After you’ve generated your CodeCommit credentials in the IAM console for your IAM user, you will need to configure a .git-credentials file in your home directory for each repository you want to access via git commands. Let’s say we are working with a repo called “awesome-microservice” in us-east-1. Here’s what the HTTPS git credential string looks like:

https://jsmith-at-012345678912:somesuperdupersecretstring@git-codecommit.us-east-1.amazonaws.com/v1/repos/awesome-microservice

Next, store this string in a file; I might name it something like:

$HOME/.git-credentials.awesome-microservice

Now that you have your credential file, we need to tell the git binaries where to find the credential. Create/update your $HOME/.gitconfig file with a new credential section for your CodeCommit credentials:

[credential "https://git-codecommit.us-east-1.amazonaws.com/v1/repos/awesome-microservice"]
  helper = store --file /Users/jsmith/.git-credentials.awesome-microservice

At this point, you should be able to clone down your repo from CodeCommit without having to input a username/password. If you have other repos to access, in CodeCommit or someplace else, you can repeat these steps if you use HTTPS and static git credentials to connect to those repos.
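To make the multi-repo case concrete, here’s a sketch of how $HOME/.gitconfig ends up with one credential stanza per repository, each pointing at its own credentials file (the second repo name and both file paths are illustrative):

```ini
[credential "https://git-codecommit.us-east-1.amazonaws.com/v1/repos/awesome-microservice"]
    helper = store --file /Users/jsmith/.git-credentials.awesome-microservice
[credential "https://git-codecommit.us-east-1.amazonaws.com/v1/repos/another-service"]
    helper = store --file /Users/jsmith/.git-credentials.another-service
```

Because each stanza is keyed to a full repository URL, git picks the right credentials file even when both repos live at the same host.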

Something Wicked This Way Comes?

I’ve been through multiple ‘reboots’ in my tech career. I started out initially doing networking on PCs (DOS/Win/Mac/Linux): installing network interface cards (NICs), configuring them, and then moving on to the next machine. It was a very dynamic and transformational period in the tech world. Then, I became a “systems administrator”, which at the time in the mid-90’s connoted a high degree of skill with UNIX and UNIX-like operating systems used for distributed applications like the brand new “World Wide Web” among other things. At the time, I was as enthusiastic as ever, but also now a bit seasoned, which gave me just enough bravado to advance my career into the private sector (after starting out exclusively in academia and higher-ed).

That was the biggest jump I’d experienced in my nascent tech career path, at a time when companies were hiring ALL THE PEOPLE who had even remotely heard something about this thing called the Internet. I was lucky: I had both knowledge and experience. As a grad student, before deciding to become a computer professional, in addition to reading journal articles for my doctoral program of study, I’d print out and read Internet RFCs. This was back when there were fewer than a few hundred of them, and you recognized many of the authors’ names because they were writing them all. This Internet thing was undeniably super cool and, like many of my peers, I could already sense that it was going to change the world dramatically.

A few years before, I recall having a conversation with my major professor at one point (before leaving the academic world for the computing world), and describing to him this thing called The Well and proposing that we do some research into the influence of computer networks on human social behavior. Mind you, this was around 1990, still several years before the average person owned a computer or means of connecting to the Internet from home. I was envisioning an advanced network of social communication not unlike today’s social media platforms (my chosen field of study was called social cognition, and I focused on impression formation and attitude formation).

I think he laughed at me, because neither one of us had any real grasp of the concept at the time. I had been on the Well, was using BBSes a lot, and was starting to explore FTP, Archie, and Gopher. I had heard of HCI (Human Computer Interaction) research, but was imagining something else as a field of study, something that didn’t yet exist, with an added social element that I found completely intriguing. He didn’t really know where to go with my idea, so we dropped it, deciding that it was outside of what we were studying and trying to accomplish with our respective lines of research.

After making the jump to the private sector and the privatization of Internet “sites”, most of my work was still in the sysadmin realm, but it soon expanded to include both storage and networking on an enterprise scale. I wore whatever hat needed wearing on a given day in a given workplace, and I was grateful that my self-taught tech skills seemed to let me progress in my career without many obstacles. As time marched on, staffing became more of a scale issue: hire more and more engineers to get the infrastructure substrate solid and support application development faster.

Around 2005 or so, I began noticing a trend of people applying for systems and software engineering jobs with little to no experience or deep understanding, but plenty of technical training and certifications. Positions were starting to go unfilled, but also many new hires were simply not up to speed and often caused more problems than they solved. It didn’t matter what the industry or the company; no workplace seemed immune to this problem.

Now, it seems this problem has become both chronic and more acute. I believe that’s part of the rise of DevOps. DevOps itself has resisted hard definitions, but I tend to think of it as adopting a more agile methodology for software development projects and infrastructure operations by tapping people to do both development and operations work. It’s been embraced by organizations large and small, but it’s still early days. And finding a good devops engineer, one who can effectively traverse the trees as well as the forest, so to speak, is even more difficult. Positions are still going unfilled, and orgs are turning to coaching and training existing staff to get them “up to speed” on devops practices.

And, lo and behold, guess what? It’s not working, except in the rarest of occasions in my experience. Yes, devops is expanding, but it’s dragging a lot of legacy cruft with it, which dilutes the gains of the paradigm shift. I recently read an article by James Beswick in which he describes almost exactly the same story I’ve been sharing here, but against the backdrop of cloud adoption and the role automation is playing in eliminating human labor. It’s a real thing; I see it happening in many places.

To the extent that cost reduction is the prime driver for cloud adoption, it’s natural to expect some sort of fallout like eliminating staff. However, and this is where my concern spikes a bit, companies seem to be risking the gains offered by devops in the cloud with an attitude of “Well, we can’t find the right person, so let’s just have our existing staff get training and certifications and all will be well.” In all but the rarest of situations, that isn’t working. Many people are not able to make the jump, and it drags down the entire org, which ends up operating in the devops/cloud paradigm without realizing its essential benefits (streamlined workflows, automation and code testing, rapid cut/release cycles, etc.).

It remains to be seen if we are simply at the bottom of the S-curve as Beswick suggests; it’s a compelling analogy, one that has described technological bifurcation well over the last 40 years or so. But we clearly have a problem: a lack of skills and capabilities in the existing labor supply, magnified by orgs’ insatiable demand to move faster and better daily. Is it simply a waiting game, whereby more automation will eventually fill in the skilled labor gaps? Machine learning and state machine processing can effectively replace many humans. Data mining is occurring on a level that is mind-boggling and, to be honest, downright frightening.

For example, Michael Bloomberg’s recent massive investment in a data mining approach to controlling the electorate does not bode well for you and me. While I applaud the goal, I am deeply afraid of the implications for what is left of consumer (i.e., voter) privacy. Where are the conversations about ethics and individual freedom and privacy? Where is the EFF or other orgs that normally care about such things? Have we all just given up any and all pretense of having privacy in this new world of data mining and individual profiling? We know where this can go; look at China’s recent ratcheting up of control over its citizens via massive investments in data mining and facial recognition.

From a purely technological perspective, it’s nothing short of amazing how advances in cloud computing are removing barriers, sometimes daily, to processing what has always seemed like an infinite sea of data. Yet, at the same time, I am starting to feel a bit like how I think Bill Joy must have felt in the early 2000’s when he expressed regret over contributing indirectly to the rise of nanotechnology and what that meant for the future of humankind. My fearfulness is very real and legitimate, but at the end of the day, I don’t feel like I can stop the machine. I hope we survive this new period of even more rapid technological advancements in our culture and learn from it.

GitLab Workflow and VSC

I work almost exclusively these days in Visual Studio Code, for reasons blogged about previously. I also primarily use GitLab for SCM and today I discovered a nifty VSC extension, GitLab Workflow.

To use GitLab Workflow, add your GitLab instance URL (if self-hosted) and a GitLab personal access token to your user settings. The extension prompts you for this configuration the first time you launch VSC after installing it.

Some of the immediate benefits I’m realizing with the extension are:

  • ability to create a merge request (MR) for your current working branch
  • ability to quickly inspect a CI pipeline via a status bar indicator, then click and select to open the pipeline in your browser
  • sidebar navigation for GitLab, including issues and MRs

I love working in GitLab and VSC and now, especially with some custom keybindings in VSC, I am able to work even more productively within the VSC editor window without needing to switch over to my browser.

Empathy In The Last Place You’d Expect It

For *nix tech friends, who’d have thought we’d ever see this day?

“I am going to take time off and get some assistance on how to
understand people’s emotions and respond appropriately.”

               — Linus Torvalds, LKML, Sun, 16 Sep 2018 12:22:43 -0700

I’ve always chalked up Linus’ poor empathy skills to the demands of maintaining this amazing juggernaut he created and gave to the world freely. This was especially so in the early days when he was regularly on defense from others trying to do the same thing as he was, others who were older, more experienced, more respected, etc. (e.g., Tanenbaum, rms, the FreeBSD crew). He was but a young grad student, albeit an extremely industrious and clever one at that. But, he always has had a problem with empathy, and it was always just part of the landscape, perhaps even a “rite of passage” if you will, for becoming a kernel developer (I never was a kernel dev, but as a well-rounded system administrator and free software zealot, I followed the LKML closely for its first decade).

This was, of course, long before the concept of empathy ever entered into the fabric of professional technologists. Some of Linus’ infamous attacks on others who were helping him as volunteers were mind-blowingly abusive and the stuff of hushed conversation at conferences and banter at Linux User Group functions all over. In the late 1990’s, I was doing pro bono work for rms and the GNU Project, first as a volunteer evaluator then as the first coordinator of GNU software evaluators (all volunteers). We worked directly with Richard via email/IRC to review software submitted by developers to be considered for membership in the GNU operating system.

And while rms could be highly challenging to work with/for, and he was with some regularity, he was never abusive. He pushed us hard and could be as stubborn as a mule, but he was always fair and, ultimately, respectful if you had a logical position that differed from his. I wouldn’t say, however, that he was ever “empathic”, because again… it wasn’t a concept that was part of the fabric of FOSS development at the time. It was a very different period in the history of Internet technology. You had to have fire-proof pants to be involved with a lot of what was going on back then (remember flame wars?).

So, his latest LKML posting reveals a different Linus now, one whom I don’t think I ever expected to see, and it’s promising to see this evolution (if that’s what it is) and what it may mean for the future of his interface with the kernel development community. I’m glad to see this day come and hope it makes Linux even more successful than it already has been. Good luck, Linus!

Pythonic Moments

I’ve been heads-down with development work for months it seems, a lot of it in Python, and found a couple of great resources recently that I wanted to share.

If you code in Python and haven’t already heard of Dan Bader or his book, “Python Tricks: A Buffet of Awesome Python Features”, check both of them out. Dan has a series of helpful YouTube videos as well, some of which cover Python Tricks material.

If you listen to podcasts, check out Michael Kennedy’s “Talk Python To Me” series. He also does a more concise, headline-oriented podcast called “Python Bytes”.

Huzzah, I’m AWS Certified!

I passed the AWS Certified Solutions Architect – Associate exam on 12/11 and the AWS Certified Developer – Associate exam on 12/15. Woohoo!


AWS re:Invent 2017 Download

I had a good time at AWS re:Invent 2017 last week, despite being sick as a dog for most of it. Though I caught fewer sessions than I would have liked, the ones I did attend on serverless topics were top notch. Here are some links to my favorites:

ARC401 – Serverless Architectural Patterns and Best Practices

Highlights:

  • Serverless foundations
  • Web applications
  • Data Lakes
  • Stream processing
  • Operations automation (e.g., Tailor, for automating AWS account creation)
  • Excellent review of best practices and new features in Lambda

SRV401 – Become a Serverless Black Belt: Optimizing Your Serverless Applications

Highlights:

  • Optimization Katas
    • Lean Functions
    • Eventful Invocations
    • Coordinated Calls
    • Serviceful Operations
  • Cold start issues in Lambda
  • Instrumenting Lambda with X-Ray
  • Resource allocation
  • Concurrency vs. latency
  • Compelling customer story from ACloudGuru’s VP of Engineering on going 100% serverless

SRV305 – What’s New in Serverless

Highlights:

  • Announced Serverless Application Repository
  • Reviewed new Lambda console
  • Reviewed new Lambda features
  • Reviewed Cloud9 IDE
  • Reviewed X-Ray tracing for Lambda
  • New API Gateway features
  • Compelling customer story from FICO’s VP of Engineering

Integration Gotcha: SNS and Lambda

When using SNS pub/sub components, a common integration pattern is to use Lambda to process SNS messages. This can include the use of data blobs as the SNS payload for doing file processing, data transformations, and archiving data in S3 among other things. SNS messages have a large payload limit of 256KB per message, but I recently ran into a situation where I could not reliably deliver messages that were sized well under that limit.

As it turns out, when Lambda consumes large SNS payloads via event invocations, you hit a limit within Lambda that is exactly half of the SNS payload limit: for Event (asynchronous) invocations, Lambda has a 128KB payload limit. So, if your SNS messages are not being processed by Lambda, check the size of the messages and verify that they are below 128KB. This was a confusing problem until I looked at the CloudWatch console for SNS message deliveries and noticed the errors there.
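A cheap way to catch this at the producer side is to check the message size before calling sns.publish(). The helper below is a hypothetical name of my own, and the 128KB figure reflects the async-invocation limit as described above:

```python
import json

# SNS itself accepts messages up to 256 KB, so anything between the two
# limits will publish successfully but then fail delivery to Lambda.
# 128 KB is the Lambda Event (async) invocation limit described above.
LAMBDA_ASYNC_LIMIT = 128 * 1024

def fits_async_lambda(message):
    """Return True if an SNS message body fits within Lambda's
    asynchronous (Event) invocation payload limit."""
    # Accept either a pre-serialized string or a JSON-serializable object
    body = message if isinstance(message, str) else json.dumps(message)
    return len(body.encode('utf-8')) <= LAMBDA_ASYNC_LIMIT
```

In a producer, you might log or split any payload for which this check fails instead of publishing it blindly.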


Patterns for Kinesis Streams

I’ve recently been working on a streaming component in a project and have been spending a lot of time with both Kinesis Streams and Kinesis Firehose. I tend to think of the two as event queue frameworks, with Firehose also having the ability to forward events to other AWS services like Elasticsearch (for Kibana-style dashboarding) and to back up the same data to an S3 bucket. If you don’t need either of those destinations, then most likely you will get plenty of mileage out of Streams alone.

Potential uses abound, but one powerful pattern is making Kinesis a destination for CloudWatch Logs streams via subscription filters. By creating a Kinesis stream and making it a CloudWatch log destination in one account, you can readily add CloudWatch subscription filters in other accounts to create a cross-account log sink. Once your CloudWatch Logs are in one or more Kinesis Streams shards, you can process that log data via Lambda and/or possibly forward to Kinesis Firehose for ES/S3 delivery. There’s a great blog post over at Blend about this exact sort of usage, including a link to their GitHub repo for the CloudFormation templates they use to build and deploy the solution.

One of the best overviews I’ve read recently about design and scale-out issues in event queue processing, and how Kinesis resolves a lot of those challenges by design (e.g., data duplication, ABA problems), is by the fine folks over at Instrumental, entitled “Amazon Kinesis: the best event queue you’re not using”. If you are considering using Kinesis at scale, or are already designing/deploying a consumer/producer pattern for use with Kinesis, I highly recommend you check out the Instrumental blog post.


Python’s logging() module in a boto3/botocore context

Python’s logging module provides a powerful framework for adding log statements to code, well beyond what can be done with print() statements. It provides a system of syslog-style logging levels that can be used to produce both on-screen runtime diagnostics and more detailed logs with full debug-level insight into per-module/submodule behavior.

Managing usage of logging() can be complicated, especially around the hierarchical nature of the log streams that it provides. I have developed a simple boto3 script that integrates logging to illustrate a basic usage that is easy to adopt and, in the end, not much more work than using print() statements. For detailed information on logging beyond what I present here, consult the excellent Python docs on the topic, as well as the links in the References section at the end of this post.

Logging Configuration

The setup for logging() that I am using involves two configuration files, logger_config.yaml and logger_config_debug.yaml. The difference between the two files has to do with the log levels used by the log handlers. By default, the example module deployVpc.py uses the logger_config setup. This config will produce no screen output by default except at the ERROR level and above. It produces a log file, however, that contains messages at the INFO level for the module and at the WARNING level for boto-specific calls.

Note: boto (including botocore) ships with some logging active at the INFO level. While not as detailed as DEBUG, boto is busy enough at that level that you will likely not want to see its messages except when troubleshooting or debugging your code. That is the approach I took in the current configuration: by setting custom logger definitions for boto and friends, the root logger will not display boto’s native log messages by default.
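If you don’t want a full config file, the same quieting of boto’s loggers can be done in two lines of code, equivalent in effect to the boto/boto3/botocore stanzas in the YAML configuration:

```python
import logging

# Raise boto's logger levels so their INFO-level chatter stays out of
# your output; drop back to INFO/DEBUG only when troubleshooting.
for name in ('boto', 'boto3', 'botocore'):
    logging.getLogger(name).setLevel(logging.WARNING)
```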

Let’s take a look at the default logging configuration file I’ve put together, logger_config.yaml:

---
version: 1
disable_existing_loggers: False
formatters:
  simple:
    format: "%(asctime)s %(levelname)s %(module)s %(message)s"
  fancy:
    format: "%(asctime)s|%(levelname)s|%(module)s.%(funcName)s:%(lineno)-2s|%(message)s"
  debug:
    format: "%(asctime)s|%(levelname)s|%(pathname)s:%(funcName)s:%(lineno)-2s|%(message)s"

handlers:
  console:
    class: logging.StreamHandler
    level: DEBUG
    formatter: simple
    stream: ext://sys.stdout

  screen:
    class: logging.StreamHandler
    level: ERROR
    formatter: fancy
    stream: ext://sys.stdout

  logfile:
    class: logging.handlers.RotatingFileHandler
    level: DEBUG
    formatter: debug
    filename: "/tmp/deployVpc.log"
    maxBytes: 1000000
    backupCount: 10
    encoding: utf8

loggers:
  boto:
    level: WARNING
    handlers: [logfile, screen]
    propagate: no
  boto3:
    level: WARNING
    handlers: [logfile, screen]
    propagate: no
  botocore:
    level: WARNING
    handlers: [logfile, screen]
    propagate: no
  deployVpc:
    level: INFO
    handlers: [logfile, screen]
    propagate: no
  __main__:
    level: INFO
    handlers: [logfile, screen]
    propagate: no

root:
  level: NOTSET
  handlers: [console, logfile]

I chose to use YAML for the configuration file as it’s easier to parse, both visually and programmatically. Python’s logging module defaults to an INI-style file format (via fileConfig()), but both JSON and YAML are easily supported through dictConfig().

At the top of the file is some basic configuration information. Note the disable_existing_loggers setting. This allows us to avoid timing problems with module-level invocation of loggers. When logging per module/submodule, as those modules are imported early in your main script, they will not find the correct configuration information as it’s yet to be loaded. By setting disable_existing_loggers to False, we avoid that problem.

The remainder of the file consists of four sections:

  • formatters
  • handlers
  • loggers
  • root logger definition

Formatters

Formatters are used to define the log message string format. Here, I am using three different formatters:

  • simple – very simple and brief
  • fancy – more detail including timestamp for a helpful log entry
  • debug – fancy with module pathname instead of module name, useful for boto messages

By default, I leave simple for the console handler (for root logger), use fancy for the screen handler, and debug for the logfile handler.

Handlers

Handlers are used to define at what level, in what format, and exactly where a particular log message should be generated. I’ve left console in its default configuration, but added a StreamHandler and a RotatingFileHandler. Python’s logging module supports multiple types of handlers including Syslog, SMTP, HTTP, and others. Very flexible and powerful!

  • console – used by the root logger
  • screen – log ERROR level and above using fancy formatting to the screen/stdout
  • logfile – log DEBUG level messages and above using debug formatting to a file in /tmp that gets automatically rotated at 1MB and retention of 10 copies

Loggers

Loggers are referenced in your code whenever a message is generated, and the configuration for a given logger lives in this section of the file. In my case, I wanted a separate logger per module/function where necessary, so I’ve made entries at that level. I also include entries for boto and friends so I can adjust their default log levels and avoid seeing their detailed output except when and where I want it (i.e., by logging at WARNING instead of INFO or DEBUG in normal operation). A logger entry also defines where its log streams end up; in this case, I send everything to both my screen handler and my logfile handler.

I also don’t want custom loggers to propagate messages throughout the logging hierarchy (i.e., up to the root logger). So I’ve set propagate to “no”.
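A small sketch of why that matters: give the root logger and a child logger their own handlers, then toggle propagation on the child (the 'deployVpc' name just mirrors the module in this post; the buffers stand in for the screen/logfile handlers):

```python
import io
import logging

root_buf, child_buf = io.StringIO(), io.StringIO()

root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(logging.StreamHandler(root_buf))

child = logging.getLogger('deployVpc')
child.addHandler(logging.StreamHandler(child_buf))
child.propagate = False           # same effect as "propagate: no" in the YAML

child.info('vpc created')         # reaches only the child's handler

child.propagate = True            # the default: the record also climbs to root
child.info('igw attached')        # now lands in both buffers
```

With propagation left on, every message from the custom logger would also hit the root logger’s handlers and show up twice.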

Implementing logging in code

Setup

I created a module called loggerSetup.py which is where I do the initialization for defining how logging() will be configured, via the configuration files:

#!/usr/bin/env python
"""Setup logging module for use"""

import os
import logging
import logging.config
import yaml

home = os.path.expanduser('~')
logger_config = home + "/git-repos/rcrelia/aws-mojo/boto3/loggerExample/logger_config.yaml"
logger_debug_config = home + "/git-repos/rcrelia/aws-mojo/boto3/loggerExample/logger_config_debug.yaml"

def configure(default_path=logger_config, default_level=logging.DEBUG, env_key='LOG_CFG'):
    """Setup logging configuration"""
    # The LOG_CFG environment variable, if set, overrides the config path
    path = os.getenv(env_key) or default_path
    if os.path.exists(path):
        with open(path, 'rt') as f:
            config = yaml.safe_load(f)
        logging.config.dictConfig(config)
    else:
        # Fall back to a basic config if no file can be found
        logging.basicConfig(level=default_level)

def configure_debug(default_path=logger_debug_config, default_level=logging.DEBUG, env_key='LOG_CFG'):
    """Setup logging configuration for debugging"""
    configure(default_path, default_level, env_key)

This module defines two functions: configure() and configure_debug(). This provides another way of running a non-default logging configuration without using the LOG_CFG environment variable (i.e., on a per-module basis). When you setup logging in your module like so:

loggerSetup.configure()
logger = logging.getLogger(__name__)

You would simply edit the first line to use .configure_debug() instead of .configure().

Usage

Usage is straightforward, simply do the following in each module you wish to use logging(). Refer to the deployVpc.py script for the full syntax and usage around these bits of code.

Note: deployVpc.py requires use of AWS API key access that is stored in a config profile (I used one called ‘aws-mojo’, change to your own favorite profile). It will create a VPC and Internet Gateway in your AWS account. But it will also, by default, remove those objects as well. Caveat emptor…

  1. Import the logging modules and the loggerSetup module:

     import logging, logging.config, loggerSetup

  2. Activate the logging configuration and define your logger for the module:

     loggerSetup.configure()
     logger = logging.getLogger(__name__)

Note: By using __name__ instead of a custom logger name, you can easily re-use this setup code in any module.

  3. Add a logger command to your code using the level of your choice:

     logger.info('EC2 Session object created')

That’s all there is to it. Below are some screenshots that show the handler output (screen and logfile) for both the default and debug configurations. Hopefully this will encourage you to look at using Python’s logging() framework for your own projects.

The full source for all of the logging module configuration as well as sample boto script is available over on GitHub in my aws-mojo repository.

Screenshots

Example: Default configuration – output to screen handler (should be no output except ERROR and above)

Default screen handler output

Example: Default configuration – output to logfile handler (should be messages at INFO and above for your code and at WARNING and above for boto library code messaging)

Default logfile handler output

Example: Debug configuration – output to screen handler (should be messages at INFO and above for your code and at WARNING and above for boto library code messaging)

Debug screen handler output

Example: Debug configuration – output to logfile handler (should be messages at DEBUG and all levels for your code and boto library code messaging)

Debug logfile handler output

References
