AWS re:Invent 2017: Architecting Security and Governance Across a Multi-Account Strategy (SID331)



Welcome to re:Invent 2017. Thank you all for being here, thank you for taking the time. I'm sure there are plenty of things you could be doing, and you're here, and thank you for coming right after a holiday weekend; hopefully your families are okay. I'm guessing if you're here, you've heard about AWS, you've heard about accounts, and you're thinking about how to set up accounts within AWS. Or maybe you've already set some up and something went wrong somewhere, hopefully not, but if it has: how do I think about this? What's the framework for thinking about AWS accounts?

So we'll go on a little adventure. Imagine for a moment that you are Little Red Riding Hood. You're living in the forest, you go visit Grandma, sometimes you run across the wolf and sometimes you don't, and you've heard about this cloud thing. At first you were confused, but then you figured out what it was: you can host servers, you can deploy them anywhere around the world within minutes, and you can set things up and you're ready to go. And you started thinking, well, Grandma makes these amazing cookies and amazing pies, and I'm going to start selling them online. So you set up your AWS account, you're in your little red cape hunched over your laptop typing away, and you built this whole application. Before long you set up a site: you've got the basket, people add the pies and the cookies they want, you ship them out. Never mind that there's no postal service in the forest, you manage to figure it out. The money starts coming in, profits grow, you start making money selling all of this.

Then the Seven Dwarves hear about this and they're also interested. They mine for gold, and they decide maybe we can sell some gold online too. So you give them the credentials, they log in, they sit there and write their code, and they also start making money. They're so excited about it they even start selling t-shirts that say "I love the cloud" (so if you're wearing any of those, that's where they came from).

But one day sales stop coming in and you're wondering why. You let it go for a day; you figure maybe people are on a holiday and not ordering, but the orders just aren't there. On top of that you get a note from AWS Security telling you there's Bitcoin mining going on in your account. Of course you suspect the dwarves, you go ask them, and of course they'd decided: well, we do gold mining, let's try this Bitcoin mining thing. You also find out that your servers are no longer up, because they went in and accidentally killed something. You didn't have CloudTrail enabled, so you didn't have the API logs of what was going on within the account; you have no idea what happened or why. Then you decide: that's it, I'm going to create a separate account for them. I'll keep my stuff in my account, selling the pies and cookies, give them their own account, and enable CloudTrail, so at least down the line I know what's going on. Things start moving along, your sales start growing, and money is coming in again.

And then sales drop again. Now you know it's not them, because they've got their own account. You start looking into it, and before long you get a note from the Evil Queen that she has your account: because you were sharing your credentials and using the root account, she got access to them. She blocked you out of the account and demanded all the apples you have, because she wants to poison them to get Snow White. Of course, you being Red Riding Hood, you're a good person, and you decide not to give them to her.
So she burns everything down within the account, deletes all the order history, the CloudTrail logs are gone because they're all sitting within that same account, and you don't even have time anymore to go visit Grandma because you're bankrupt. So how do we live happily ever after in this? As we see customers ranging from Red Riding Hood to large enterprises, how do you start thinking through accounts? What do you get at the end of this session? We're looking to give you a multi-account, enterprise-ready framework for thinking through AWS, so that you don't also end up bankrupt with somebody stealing all your information, and an action plan to implement that approach.

For those of you who don't know, an AWS account is the account you set up: you log in, you have an email and a password, and you set up users, servers and instances. It is the highest level of isolation within AWS, meaning that if I set up an account and you set up an account, that level of separation and isolation is the same as between you and someone else within the same company; it is truly separate. From an API limits or API throttling perspective, it's again down to the account; that is the highest level of isolation there. And from a billing perspective: say there are applications for the Seven Dwarves doing a lot of network traffic, all those transfer charges and things you couldn't tag, you don't have that separation within a single account, so a separate account is the only true billing separation you have.

When we start working with customers, oftentimes you'll start with a single account; I started with a single account when I started at AWS, for example. At the other end of the spectrum we have customers that have thousands of accounts, 30,000, 40,000. But to get from one account to thousands of accounts, the level of automation, discipline, maturity and approach is different, and the question is how do we get to that model. Not everybody will be in the thousands of accounts, so it doesn't make sense for everyone, but how do we build the fundamentals? Most customers are somewhere in between, and the complexity when I go from one account to more gets more difficult, in terms of aggregation and distribution of the data once I have it, billing data for example.

But working within a single account can be dangerous: people might overwrite one another, just like the dwarves went in and killed her instances; something like that could happen. Or if somebody manages to leak credentials, or accidentally checks something in to GitHub, having that limited blast radius matters. You might have multiple teams, different areas of responsibility, different business models; you need different isolation, whether it's because they handle different levels of data or are subject to a compliance program they need to work within. Maybe a different set of security controls: perhaps I need everything encrypted in one account because it's hosting highly confidential data, but another one is a front-facing web application serving public information, still protected of course, but I don't need every single thing encrypted since I'm serving it out to the public. You might have completely different business processes: gold mining and Bitcoin mining are very different from selling pies or selling cookies. And finally, billing isolation: being able to know what everyone spent, where, when and how, down to that individual thing. So I'll turn it over to Ben from Thomson Reuters to talk about their story and their approach to multi-account.
Thank you, Sam. I'm here today to talk to you about the journey Thomson Reuters has been on with regard to our multi-account strategy. If you didn't attend this session last year, you may not know that this is actually the second time we're presenting a multi-account strategy. If after this session you want to go out there and consume more knowledge about how we've gone through this journey, go to YouTube and type in SAC319 and you'll be able to get last year's presentation and find out more, but that's by no means a prerequisite for this presentation. In this session I really want to focus on the changes we've made over the past year and the lessons we've learned, and talk through the key factors that have driven us sometimes to create new accounts, and at times actually not to create new accounts.

A few high-level bullet points about Thomson Reuters: we are a global organization, we operate in more than 100 countries, and we provide information and products to legal, tax and financial professionals. You also may know Reuters.com, which is our news and media division, the world's largest international multimedia news provider; and a bit of a shameless plug, as you'll see on the screen, they're actually presenting later this week. They've migrated Reuters.com onto AWS in a highly available, multi-region setup, so if that seems interesting to you, I'd recommend you tag along. We have five business units, including Reuters, that make up Thomson Reuters, and then a centralized technology team that spans across them, and we have about 12,000 technologists within our organization today. So: 12,000 dwarves trying to get access to create resources in our accounts; that's the scale we're trying to operate at.

Before we actually decided to create any accounts, the first question we asked ourselves was how we wanted to establish connectivity between our AWS accounts and our on-premise data centers. Like a lot of organizations, we knew we needed Direct Connect to provide the consistent network performance and bandwidth our applications required, so we went ahead and implemented the hub and spoke model, which is fairly traditional. We provisioned multiple Direct Connect connections into a centralized AWS account, and from that we created spokes in the form of private virtual interfaces to allow other accounts to consume those Direct Connect connections. We set out by provisioning those connections into the regions paired closely with our existing data centers, and that was really to keep latency to a minimum, so that while we're living in this hybrid world our applications could talk back to our data centers a lot quicker.

Since we set up that network connectivity, we've had a lot of requests come through from our business units in two general areas. The first is additional regions: asking for additional region support, so additional Direct Connect connections into new regions we didn't initially set out with. These were for reasons such as data residency (new laws are coming through all the time about where data must live), new growth opportunities (deploying our products into new markets), and latency requirements too: where AWS has a region where we don't have an existing data center, we can take advantage of that to get closer to our customers.
This is really driving us to look at how much each of these components can scale, because on the other side of things we've also got the AWS accounts themselves: requests for new accounts for things such as project isolation, where there may be a particular compliance workload or a data sensitivity reason that we just want to segregate into its own account to better isolate it; things such as API limits, so if a new application comes along that is going to be hammering one of the Amazon APIs, we may make the choice to move it into its own account so we don't hit those limits; and the final one is billing separation, because the only way to see all the costs for an application is by taking it into its own account; you can't see things such as some of the networking costs or other non-taggable resources, and the only way to encapsulate all those costs is to move the application into its own account.

So again, this is telling us how we can scale these components: looking at things such as how many data centers we can hook up to AWS, how many regions we can support, how many Direct Connect connections we can have, and how many accounts we can create that can utilize this connectivity. On this aspect of scalability, if I had to pick out one thing that's keeping us from ballooning our number of accounts, it's really the network connectivity requirements our accounts have, and that's because of some of the hard limits that exist within AWS today. I'd like to walk through some of the ones we keep running into. The first is private virtual interfaces, the things that allow a VPC to consume a Direct Connect connection: you can currently only have 50 accounts using each of those Direct Connect connections, so you can't scale past that without provisioning new lines. Another is VPN connections: if you're using VPNs to establish connectivity back to your data centers, or to your customers, again there's a limit on how many of these you can have. And finally VPC peering, which allows you to create private network connectivity between VPCs; this is important for us where we want our applications to talk across to shared services or to security tooling hosted in another VPC we want to peer with, and again there is a hard limit on how many times you can do this.

This really keeps us from scaling as much as we want to, and it's driving us to look at how we can scale past these limits, what we can do to get around them, so we can evolve our strategy and create new accounts on demand. We're looking at things such as provisioning new Direct Connect connections; new features such as the Direct Connect Gateway, which has just been released; using VPNs in conjunction with Direct Connect where accounts don't have a hard requirement on that service; and looking at whole new network topologies, moving away from the hub and spoke model and introducing things such as transit VPCs and mesh networks, and actually asking our business units whether they can use the public internet more, because as soon as you start using the public internet you don't need some of these components, so it removes them as a limitation. Each of these options comes with its own overheads, costs and limitations, but we're investigating how we can scale our networking to support additional accounts, because what we don't want to do is become a bottleneck to teams: when a legitimate request comes through for a new account or a new region, we want to be able to cater for it, staying that one step ahead so we can set up the networking as it's needed.
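As a rough operational aid for staying ahead of those hard limits (this is an illustrative sketch, not Thomson Reuters' tooling), you could periodically count how many private virtual interfaces are already attached to each Direct Connect connection; the threshold value here is just a placeholder for whatever the current documented limit is.

```python
# Hedged sketch: count private virtual interfaces per Direct Connect connection
# so we notice when a connection is approaching its per-connection VIF limit.
# Assumes credentials for the central networking / Direct Connect account.
import boto3

dx = boto3.client("directconnect")

vif_counts = {}
for vif in dx.describe_virtual_interfaces()["virtualInterfaces"]:
    if vif["virtualInterfaceType"] == "private":
        vif_counts[vif["connectionId"]] = vif_counts.get(vif["connectionId"], 0) + 1

LIMIT = 50  # illustrative threshold; confirm the current limit for your account
for connection_id, count in vif_counts.items():
    print(f"{connection_id}: {count} private VIFs ({LIMIT - count} remaining)")
```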
Once we had a general idea of how we wanted to establish that network connectivity and had laid out the groundwork, the next question was how we wanted to create new accounts. Before last year's re:Invent, the only way to do this was to go into the console and manually type in the details, but thankfully since then Amazon has released Organizations, which allows us to create new accounts programmatically by consuming the Organizations API. We really want to move towards this model now, and the workflow we're writing looks a little bit like this. The first step is to use the Organizations API to create the new account; as part of that, Organizations provisions a cross-account role into the new account, which we then assume to get permissions to perform actions in it. We then "inflate" the account, bootstrapping it with the foundational networking we need, provisioning things such as the VPC, subnets, route tables and so on, what we classify as the foundational layer. We then move it into an organizational unit: within Organizations you can represent your accounts in a topology view, grouping them by business unit or by environment, and I'll talk about why that's important in a second. The final step we go through is to delete that Organizations role; it's a very privileged role that gets created, and after we've inflated the account we want to remove it because it's no longer needed. This is really trying to get us to the point where we can do self-service account creation, doing things in a repeatable and consistent way, so we don't have someone going in to do it manually, which is prone to error.
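As a rough illustration of that workflow (this is a minimal sketch, not Thomson Reuters' actual code; the email, account name, role name and OU ID are all placeholders), the Organizations and STS APIs can be driven from the master account like this:

```python
# Hedged sketch of the account-creation workflow just described: create the
# account, wait for it, assume the cross-account role Organizations provisions,
# and move the account into an organizational unit. Run from the master account.
import time
import boto3

org = boto3.client("organizations")
sts = boto3.client("sts")

# 1. Create the account via the Organizations API.
status = org.create_account(
    Email="new-account@example.com",           # hypothetical values
    AccountName="bu1-development",
    RoleName="OrganizationAccountAccessRole",  # role Organizations provisions
)["CreateAccountStatus"]

# 2. Poll until creation finishes.
while status["State"] == "IN_PROGRESS":
    time.sleep(10)
    status = org.describe_create_account_status(
        CreateAccountRequestId=status["Id"]
    )["CreateAccountStatus"]
account_id = status["AccountId"]

# 3. Assume the provisioned role to bootstrap ("inflate") the new account.
creds = sts.assume_role(
    RoleArn=f"arn:aws:iam::{account_id}:role/OrganizationAccountAccessRole",
    RoleSessionName="account-inflation",
)["Credentials"]
# ...use creds to create the VPC, subnets, route tables, logging, IAM roles...

# 4. Move the account into the right organizational unit (OU ID is illustrative).
root_id = org.list_roots()["Roots"][0]["Id"]
org.move_account(AccountId=account_id,
                 SourceParentId=root_id,
                 DestinationParentId="ou-xxxx-bu1dev")
```

The final step in the workflow described above, deleting that privileged role once inflation is complete, would be done with those temporary credentials before they are discarded.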
It's also making us look at the on-premise dependencies involved. Although we can automate this process in AWS, there are on-premise dependencies such as IP management: when you create a new VPC you need to mark that IP space as allocated in your IP management system, and you need an email address to register the account, so you need your SMTP system to dish that out. There are a lot of dependencies involved, and if it takes us two weeks to get a new email address to allocate to an account, it doesn't matter how automated it is in AWS; we need to focus on all those dependencies too, and that's where we're really focusing our efforts now.

The other reason we're using Organizations is to take advantage of what are called service control policies. Like I said, there's an organizational topology view you can represent your accounts in, and SCPs allow you to apply overarching blacklists and whitelists of IAM permissions, saying that all our accounts under the root level, or under a business unit, can't perform certain actions. We really want to use this for things such as disabling CloudTrail: we don't want anyone to turn off logging, so we can use service control policies to enforce that. Even if someone is given permission in the account itself through IAM, the SCP overrides it, so it's another security control that Amazon is offering us.

Turning now to the inflation process, what we do when we inflate an account: the first thing we do is vault the root credentials. The root user is created as part of that process, and we vault those credentials because they are very privileged and really only for break-glass situations. We then create a set of service management records, so we have an internal inventory of the new account and you can raise changes against it. We then set up federation, because we federate access into our accounts, and we also provision a set of operations roles to allow us to log into the account. We then do the VPC and networking setup, creating the VPC, subnets, route tables and so on, and optionally do things such as provisioning Direct Connect and VPC peering if required. The final step is to layer in the security controls and set up logging: we turn on logging and get it all sending to one account, and we also deploy a set of IAM roles: a security IAM role for the security account and a set of custodian roles, and I'll talk about what those are on a later slide.

Things we've learned along the way: the first is to use a workflow tool. Not all our accounts look the same; our sandbox account looks very different from our production account, as you can imagine, so having a workflow tool that lets us pick and choose what is deployed into an account is extremely useful. It gives us the flexibility to say how an account should look: if an account doesn't require Direct Connect or VPC peering, as in the case of a sandbox, then it's an optional step we can just turn off. Because of this, it's allowing us to build up CloudFormation dynamically; we're not working with static templates that we have to go into and hard-code changes every time a new AZ comes out that we want to take advantage of. Because the workflow tool lets us pick and choose what happens, we can build up that CloudFormation dynamically, version it, and have a representative view in version history of how the account has changed. The thing we're really trying to focus on now is making it as configuration-driven as possible, so that the inflation phase is all driven through a single configuration file. Again, that's to stop us going in and manually making bespoke changes; we can just set the configuration file slightly differently, and the workflow tool will pick that up and inflate the account the way we want. This inflation process is really there to get consistency across our accounts: we want one production account to look identical to another, we don't want them looking different, and again from a security posture point of view we don't want some security controls to be in one account and not in another. With an automated process driven by configuration, we can make sure those accounts look the same.
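To illustrate the configuration-driven idea (this is a hypothetical sketch of the general pattern, not Thomson Reuters' tooling), a per-account configuration can decide which optional pieces end up in the generated CloudFormation template:

```python
# Hypothetical sketch: assemble a CloudFormation template dynamically from a
# single per-account configuration, so optional pieces such as the gateway
# needed for Direct Connect can be switched on or off per account type.
import json

account_config = {                    # e.g. loaded from the account's config file
    "vpc_cidr": "10.20.0.0/16",
    "enable_direct_connect": False,   # sandbox accounts skip data center connectivity
}

def build_template(cfg):
    resources = {
        "Vpc": {
            "Type": "AWS::EC2::VPC",
            "Properties": {"CidrBlock": cfg["vpc_cidr"]},
        },
    }
    if cfg["enable_direct_connect"]:
        # Virtual private gateway that a private virtual interface can attach to.
        resources["Vgw"] = {
            "Type": "AWS::EC2::VPNGateway",
            "Properties": {"Type": "ipsec.1"},
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}

print(json.dumps(build_template(account_config), indent=2))
```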
Once we had that process to create new accounts and inflate them, it really enabled us to start building out our enterprise accounts, so I'd like to talk through some of the ones we've created. The first is what we call the logging account, and this is one we didn't have from day one. It's used to store things such as CloudTrail logs, VPC Flow Logs and S3 bucket access logs, and it's really there to give us the ability to see who did what in our accounts and when they did it: to determine things such as who deleted a particular resource, who downloaded a particular S3 object, or why I can't connect to this instance. We currently do log, as every organization should, but at the moment we store the logs in two locations, both shared services and our security account, because we have processes running in both that consume those logs. We really want to move away from this and centralize them into a single account, because it's a single source of truth: one place to go that has all our logs, one place to secure so we don't have to secure two different locations, and again we can have very limited access; once we've configured this logging account, no one needs to log into it. The account itself is set up with multiple S3 buckets. The reason we don't have one bucket and log to that is that bucket policies have a limit on how much text you can include in them; if we needed to reference all our accounts to give them access, we simply wouldn't fit it in. So we create multiple buckets separated by environment or business unit, and that lets us get around that limitation. Then in the bucket policy itself we can add read-only permissions that allow things such as our security tooling to consume those logs; it's just a step we go through to allow that read-only access.

The next account we created is something we call the custodian account, another one we didn't have as of last year. As an organization we've built up a set of best practices that we want all our developers to follow. Now, we trust our developers, but we really want a way of verifying that those practices are being followed; it's the trust-but-verify exercise we want to go through. To do this, what we needed was a single pane of glass into our accounts: one place to go to see the estate and whether things are being followed, and where they're not. We looked at the services Amazon offers, Trusted Advisor and AWS Config, and we saw that they both very much operate at a region or single-account level; it's not a multi-account setup you can use them for. They also very much act in a detect-and-notify mode: they'll notify you of something that's not being followed, but they won't action it. We've built up a set of best practices we just want to enforce, especially from a security angle: we don't want to have to send out an email, have a developer action it and go in and make a change; we just want to step in, make the change, and then as a retroactive action notify them that it happened and why, so they don't do it again in the future. Detect-and-notify is good for some situations, but for others we really want to just enforce, step in and make the change if something isn't being followed. That's what led us to create the custodian account. As I mentioned, in the inflation process of every single TR account we create a set of custodian roles: two roles, a read-only role and a read-write role. The custodian account looks very similar: it has a corresponding set of roles which have the ability to assume the roles in every single one of our accounts, so that gives it visibility into all of them and gives us that single pane of glass.
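A minimal sketch of that hub-and-spoke role pattern (the role name, account ID and tag key below are illustrative, not TR's actual names): a process in the custodian account assumes the read-only custodian role in a member account and inspects it.

```python
# Hedged sketch: from the custodian account, assume the read-only custodian
# role deployed into a member account during inflation, then inspect resources.
import boto3

MEMBER_ACCOUNT_ID = "111122223333"        # illustrative
ROLE_NAME = "CustodianReadOnlyRole"       # hypothetical role name

creds = boto3.client("sts").assume_role(
    RoleArn=f"arn:aws:iam::{MEMBER_ACCOUNT_ID}:role/{ROLE_NAME}",
    RoleSessionName="custodian-scan",
)["Credentials"]

ec2 = boto3.client(
    "ec2",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

# Example check: list instances that are missing an application-id tag.
for reservation in ec2.describe_instances()["Reservations"]:
    for instance in reservation["Instances"]:
        tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
        if "app-id" not in tags:
            print("untagged instance:", instance["InstanceId"])
```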
Then really, after that, it's just a choice of picking what tooling to use: what we can use to enforce these policies, but also to do things such as cost management. Like on Thanksgiving, if development instances are turned on that someone has just forgotten to switch off, we want to step in and turn them off, and likewise at the end of the working day we want to step in and turn off things that should have been turned off. We've initially gone out there and started to use Capital One's Cloud Custodian to write those policies, but the account is really there to be used as a hub that we can deploy services into, once we find out what tooling is out there and fill the gaps we identify; it's that place we can use as a single pane of glass.

The next enterprise account, which we have had from day one, is the security account. It works in conjunction with the custodian account, but it's really there for our security teams to use: for things such as processing the logs from the logging account, hosting the security tooling (things such as threat protection), performing incident management, so that if something happens they have a place to go and triage it, and also doing things such as security audit. That account is there for that team to use for their day-to-day activities.

Then the final one of our enterprise accounts is what we call the shared services account. This was originally designed to host our shared network services: things such as Direct Connect, DNS servers, bastion hosts, network monitors, and where we build out our AMIs. We originally aggregated these all into one account, but as you can see, as this list starts growing it means we have to allow more and more people access to that account to manage their applications, deploy them and support them. Since then we've made the decision to separate out the ones we classify as business critical, things such as Direct Connect and DNS, which are pivotal to establishing connectivity between our on-premise data centers and our AWS accounts. The reason we're doing this is so we can have more limited access: we can limit down with role-based segregation, so only the people who absolutely need access to these accounts to manage those services have access to them, and this in turn helps us reduce the blast radius.

This leads me on to what we create for our business units, what they have to use. From day one we created a set of sandbox accounts, one sandbox account for each of our business units, and it's really there for them to do team innovation: time-boxed POCs and experimentation in a team environment. We set it up so it has no data center connectivity; it's its own island, a playground environment. It's multi-tenant: because it's per business unit, we've got lots of teams from that business unit using the same account, and as such we have a set of restricted permissions. Because it's a shared environment, we can't allow access to destroy VPCs or subnets, because those resources are depended on by a variety of teams.
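As one hedged example of those restricted permissions in a shared, multi-tenant sandbox (my illustration, not TR's actual policy; the role and policy names are hypothetical), an explicit deny on tearing down shared networking could be attached to the federated sandbox role:

```python
# Hypothetical sketch: an explicit-deny inline policy for a shared sandbox role,
# so no team can destroy networking that other teams depend on.
import json
import boto3

deny_shared_network_teardown = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": [
            "ec2:DeleteVpc",
            "ec2:DeleteSubnet",
            "ec2:DeleteRouteTable",
            "ec2:DeleteInternetGateway",
        ],
        "Resource": "*",
    }],
}

iam = boto3.client("iam")
iam.put_role_policy(
    RoleName="bu1-sandbox-developer",              # hypothetical federated role
    PolicyName="DenySharedNetworkTeardown",
    PolicyDocument=json.dumps(deny_shared_network_teardown),
)
```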
And as such we also do a full account inflation. What I mean by this is that we make this account look very similar to our upper environments, which means that if a proof of concept works out and we want to promote it into a non-production environment or into production, it's very easy to do, because the accounts look very similar: we can just update the account ID and roll it through.

But after we created these accounts we got some feedback from developers, and they called out a few things. For example, they can't learn some of the VPC fundamentals: if they're training for an Amazon certification, or you've got network teams, they need to go in and discover how to use foundational services such as VPCs, route tables and internet gateways; they need that access to learn. Also things such as consuming templates from AWS: a lot of them depend on things like the default VPC, which we actually remove as part of that inflation process. So since then, taking on that feedback (because they are our customers, and that's why we do this), we've created a new set of sandbox accounts, and we're going to create them for every developer. I know it's a lot: as you can imagine, 12,000 developers means 12,000 accounts, and it's a big number, but I can tell you why we can do this. This is an area for learning and experimentation, so it allows them to do it in a solo environment. Again there's no data center connectivity, which removes a lot of those limitations: there's no connectivity back to our shared services or back to our data centers, it's its own island, so we're not hitting a lot of those limits because we're not requiring any of that connectivity. It's single tenant, used by that developer only; it's their account, and as such they can have a full set of permissions so they can do what they need: they can create VPCs, they can destroy them, and ultimately we can just refresh the account back to its original state. We also do a very minimal account inflation, by which I mean we still bring it under consolidated billing, under that master Organizations account, and we also deploy the security role and the custodian roles, so we still have visibility into those accounts.

The next set of accounts we create for business units is what we call software development lifecycle accounts, and this is where they build and deploy their business applications. As of last year we had a non-production account and a production account for each of our business units, and it worked really well, but what we were seeing is that we give developers read/write access to non-production and read-only access to production, and what that means is developers can make changes manually in the console in the non-production account. That meant not all resources may have been part of their build pipeline, so when they deployed to production something might have been missing, because someone had made a change behind the scenes. We really wanted a way of fixing that, so what we've done is rename non-production to development and create a new account called staging, which is there to stage our deployments before they go to production and test them out in a production-like way. It's kept in sync with production so it looks very production-like, and it gives them an area they can test in.

The second account change we've made is to create a new account called the disaster recovery account. We can tell ourselves as an organization that we have all the controls and processes in place so that no process will ever go rogue or have access to production that it shouldn't, and we believe this, but what we really want is a contingency plan, so that if a worst-case scenario happens we don't have to go into a production account, diagnose the issue and try to revert what's changed; we can just sever ties to production and fail over to DR.
And because that account is going to be there from day one, it means we can replicate the data of production applications into DR as soon as it lands in production, which makes the failover process easier, because if all the data for an application lives only in a production account then failing over is extremely difficult; you don't know whether that data may have been deleted. The accounts themselves look very similar and are set up in the same way through the inflation process; the only thing that really changes is that the security controls elevate as we go up through the environments, and access to them gets more and more restrictive.

As you can see, we've really started small: we create a small set of accounts for each of our business units, and we keep talking to them to find out what new applications are coming along, when they would need a new account, and what kinds of use cases are coming in the future. In a multi-account strategy, what I consider the pivotal point, and where you should concentrate, is how many of these accounts you should create: if you create them at a business unit level you can have a very small number, but if you switch to more of a per-microservice model you're very quickly going to get into the thousands. So we're focused on starting small and laying down the processes that set us up for scaling in the future. But because we're in this multi-tenant world, we're looking at different ways we can allow developers to work so that people don't step on each other's toes when you've got lots of developers working in the same account.

This has driven us to look more at IAM. IAM can be used to provide resource isolation between teams, to prevent a developer in team A from deleting the resources of a developer in team B, and this is done by setting tag conditions and resource names in your IAM policies. When we onboard a new application into AWS, for the team to start building the application we create a set of human roles that they log into the account with through federation, and that's at a per-project level, so they're effectively logging in as a project. What that means is we can set conditions in these policies. As an example, looking at ec2:TerminateInstances, controlling who can terminate an instance: we can set a condition using the conditional operator StringEquals, which says that if this property exists then it must equal this value. In this example, the developer would only be able to terminate an instance tagged with a tag named ID equal to 123; if they tried to delete a resource owned by someone else with a different application ID, they would get a permission denied. Because we have a mandatory tagging policy that enforces that every single one of our resources has these tags applied, such as an application ID, we can do this fairly easily. And because not all Amazon services support this tag-based permissioning, we can use resource names as well where that ability isn't available. Looking at the action iam:PassRole, the ability to pass an IAM role to a service, we can set a resource name that includes a value and a wildcard, which basically says you can only pass this IAM role if its name starts with your application ID. This is a good way of limiting who can pass IAM roles around between different teams; it's limited just to them.
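Putting those two mechanisms together, here's a hedged sketch of what such a per-project policy could look like; the tag key "app-id", the value "123", the account ID and the policy name are placeholders for illustration, not TR's actual conventions.

```python
# Hypothetical sketch of a per-project policy combining a tag condition
# (terminate only instances tagged with your application ID) and a
# resource-name prefix on iam:PassRole (pass only roles named after your app).
import json
import boto3

project_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TerminateOnlyOwnInstances",
            "Effect": "Allow",
            "Action": "ec2:TerminateInstances",
            "Resource": "*",
            "Condition": {"StringEquals": {"ec2:ResourceTag/app-id": "123"}},
        },
        {
            "Sid": "PassOnlyOwnRoles",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::111122223333:role/123-*",
        },
    ],
}

boto3.client("iam").create_policy(
    PolicyName="project-123-developer",            # hypothetical policy name
    PolicyDocument=json.dumps(project_policy),
)
```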
As you can imagine, this does come with some overhead, and we're still experiencing that as we go along, and there isn't a hundred percent coverage: not all Amazon services support tag-based permissioning in IAM, or resource names. But we're finding that once we write the first set of policies, and once we employ templates and automation, it's actually fairly easy to manage, because after that it's just a process of authoring another set of policies whenever a new Amazon service comes out. There is a bit of management, but once you start applying automation it makes things a lot easier, so this is something we're seriously considering.

This brings me to the last of our business unit accounts, what we're calling the CI/CD account. This is one we haven't created yet, but it's one we're about to. After we provisioned the dev, staging and production accounts, our software development lifecycle accounts, we realized we didn't really have a good location to store our CI/CD pipelines. We could deploy them into those accounts, but they would be deploying across environments, which didn't feel right. We also looked at the shared services account, but we want that account to have extremely limited access. So we're looking at a new place to put the CI/CD pipelines, and also to perform things such as chaos engineering, testing how self-healing our applications are, because we really want to make sure we use the cloud for these kinds of tools. This is driving us to create what we're going to call a CI/CD account, one per business unit, to host the build pipelines and things like the artifact store. As part of that process we'll deploy a set of cross-account roles, one into each of the accounts, which will give the build pipeline access to deploy into them, and again we'll apply the resource-based permissioning I mentioned on the previous slide, so the build pipeline for product A can only provision and interact with resources created by its own pipeline. We also have the code services running in each of those accounts. The general process looks like this: the build pipeline builds the artifact and stores it in the artifact store, then for each target account it assumes the cross-account role that gives it permission, deploys the CloudFormation, and then deploys the application, and it does that through the environments. Because the pipeline isn't deployed into one of our business unit accounts, it also means our applications are very portable: in a multi-account world we may create a new account for a new product because it's hitting one of the Amazon APIs quite hard, and because the CI/CD pipeline is in its own separate account we can just substitute an account ID, plug a separate cross-account role into a different account to give it access, and move our applications around a lot more easily.
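A minimal sketch of that deployment step (the target account ID, role name, stack name and template location are all hypothetical): the pipeline in the CI/CD account assumes the cross-account role in a target environment account and deploys the CloudFormation there.

```python
# Hedged sketch: from the CI/CD account, assume the cross-account deployment
# role in a target environment account and deploy a CloudFormation stack there.
import boto3

TARGET_ACCOUNT_ID = "222233334444"             # e.g. the staging account (illustrative)
DEPLOY_ROLE = "product-a-deployment-role"      # hypothetical cross-account role

creds = boto3.client("sts").assume_role(
    RoleArn=f"arn:aws:iam::{TARGET_ACCOUNT_ID}:role/{DEPLOY_ROLE}",
    RoleSessionName="product-a-pipeline",
)["Credentials"]

cfn = boto3.client(
    "cloudformation",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

cfn.create_stack(
    StackName="product-a",
    TemplateURL="https://s3.amazonaws.com/artifact-store-example/product-a.yaml",
    Capabilities=["CAPABILITY_NAMED_IAM"],
)
```

Switching the same application to a different account is then largely a matter of substituting the account ID and role, which is the portability described above.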
This is really where we were last year: if you go away and view last year's presentation, this is what we presented. Along the top you can see our Organizations master account, which is there for consolidated billing and now the automated account creation process and SCPs, with the security account for our security team to use, to host security tooling and perform things such as incident management. We have a shared services account to host the shared network services, and then on the business unit side we have a sandbox account at a team level for team-based POCs, a development account for them to build their applications, and then production to host them. Since that point, these are the new accounts we've created or are about to create: the logging account, to centralize our logs and have a single pane of glass into all of them and one place to secure; the custodian account, to start enforcing the policies we've built up over time from the lessons we've learned, and also to do things such as cost management; the separated-out business critical services, so the new Direct Connect account and DNS account, so we can have more limited access and reduce our blast radius; a new staging account to stage deployments for our business units, allowing them to test out a production deployment in a production-like way; and a new DR account to manage the worst-case scenario, so if a process went rogue we have the ability to migrate out of an account into an already set up, like-for-like copy of production. Then we have the CI/CD account that spans across them, giving it the ability to deploy our applications and CloudFormation into those accounts. And finally we have the developer accounts, the per-developer accounts, where we're now just working out the processes to wrap around them: agreeing things such as monthly spending limits, the onboarding and offboarding process (so when a new developer joins we create an account for them, and as they leave we destroy it), and regularly destroying the resources so we refresh each account back to its original state. This is the model we're moving to. As you can imagine, next year we expect to have a lot more accounts, especially in the business unit space, so we're making sure right now that we have the networking controls and automation in place to scale in a well-managed way and not hit any of the limits we're aware of, because ultimately we're still learning as we go along. With that, I'd like to hand back over to Sam to continue our presentation.

So hopefully by now you've started to think through this multi-account approach, and so far it sounds like this amazing solution to everything, the best thing since sliced bread. Nothing comes for free, and there are of course some cons. Yes, it gives you complete security and resource isolation, that smaller blast radius, simplified billing per account. But you've also got the aggregation problem: how do I get all of my logs and all of my resources from all my accounts into a single source of truth, and more importantly, how do I analyze and process it?
The same goes for distribution of that data: if I'm generating billing reports across all of my accounts, how do I tell each account owner how much they're spending? There's a certain setup and operations overhead: how do I manage all of those accounts and all those cross-account roles? You've got more complex security policies as well: now there are Organizations SCPs, IAM roles, accounts that are shared, and other things in between.

So when we start thinking about this, there's a set of principles, or goals, or tenets, however you want to call them, to think through. One: aim for being automated. The more automated we are, the easier we can replicate, and the less access we give to humans; our CCO likes to say the more he can keep humans away from the data, the less likely something goes wrong: maybe somebody didn't get enough sleep, logged in at the wrong time, or typed the wrong command. Scalable: by being automated we also want something that scales; as I create and launch additional accounts, I want to be able to create them easily, scale out, and handle that aggregation and distribution. Make it as self-service as possible: you don't want to be the one getting the call at three o'clock in the morning because somebody can't log into an account where they're trying to create something; provide self-service while still defining guardrails rather than blockers: define what people can do, define those rules and policies, and be able to act on them, sometimes automatically, sometimes with an email. Maybe I get an email if somebody puts something on the internet that's open to the world, a note that lets us take a look: does it need to be on the internet? Auditable: I need to be able to view what's going on within the accounts and know if there are any potential areas of risk. And finally, and I think most importantly, be flexible: as new services and new approaches come out, this model will change. It's something you start out with today, but there will be unique requirements; in fact Thomson Reuters varies slightly from the framework we're about to show you, because of specific requirements and things they needed.

Then, when I set up each account, there's a set of things to do on day one. Take my root credentials and lock them away; create my initial admin user group, put one or two people in there, and that's my break-glass access. The root account should almost never be used, even for your personal account at home; keep it locked away and certainly don't generate access keys for it, because that account has unlimited powers and you can't restrict it with a policy. Enable CloudTrail, not just for the region you're operating in but for every region: if somebody does compromise your account, they're going to aim to launch things in every single region they can, especially if they're trying to do Bitcoin mining or similar, so enable it for every single region. It's a simple checkbox, you have those logs, and you're only paying for the storage.
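Done programmatically, enabling that everywhere is a single multi-region trail; here's a minimal sketch, assuming the destination bucket already exists and already has the CloudTrail bucket policy applied (the trail and bucket names are placeholders).

```python
# Hedged sketch: create one trail that covers every region, then start it.
# Assumes an existing S3 bucket (with the CloudTrail bucket policy) to log to.
import boto3

cloudtrail = boto3.client("cloudtrail")

cloudtrail.create_trail(
    Name="org-trail",                             # hypothetical trail name
    S3BucketName="central-cloudtrail-logs-example",
    IsMultiRegionTrail=True,                      # cover every region, not just one
    EnableLogFileValidation=True,
)
cloudtrail.start_logging(Name="org-trail")
```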
Think about your enterprise roles: development teams, QA, administrators. What do they map into in terms of privileges within the AWS environment? Can a developer go and kill a production server? Start thinking through how you build this, and federate into all of your accounts; the only exception might be the developer sandboxes, and that's a decision that gets made at some point, but otherwise you federate into the account. That means the joiner-and-leaver process is handled: if somebody joins your organization they get access, and if they leave, you already have policies in place to remove them from your central directory, so their access is immediately revoked. Make sure the cross-account role to the security account, used to audit, verify and validate what's running, is created in every single account. And think about the actions and conditions you want to apply to the accounts: maybe I don't want unencrypted EBS volumes, maybe I want to make sure that every object in every S3 bucket is encrypted; think about these actions and conditions and define what you want them to look like.

So what accounts should I create? Well, the first one is the master Organizations account that spans everything: all the accounts are under it, the billing goes into it, and you can define service control policies from there. Then there are the central, or enterprise, accounts that get created once: your logging account, your security account, a Direct Connect or networking account, shared services, billing tooling. Once I've defined those, there are the accounts per product line or per business unit: maybe my sandbox, my dev, my pre-prod, my prod. And finally, back to that idea of being flexible, there will be other accounts you'll need to create, maybe for a compliance reason, maybe for some other reason; just be aware that you will at some point have to create them. Just as TR went from one approach to creating additional accounts, or Red Riding Hood went from a single account to two and then built things out, you need to be flexible.

To quickly go through each of those accounts: the first one is the Organizations master account. It's not connected to your data center, there are very minimal resources in it, it's where your bills go, and it's also where you define your service control policies and get your volume discounts. Minimal resources, if any, and the amount of access to that account should be as limited as possible, because it is a very powerful account: even without a cross-account role into anything, that account can turn off everything, because it can define a service control policy that blocks access to everything else. When AWS Organizations is used to create an account, it creates a cross-account role in the sub-account; you take that and use it to do the initial baselining of the account, things like creating the role for the security account, or in the case of TR also the custodian account, but you delete that role when you're done with it. We want to keep the areas of responsibility limited, and we'll have a separate role in the security account that allows us to do those things when they're needed. You can apply service control policies, things like denying access to stop CloudTrail logging, and because it's at the SCP level, even the root account can't go in and stop it. I can also define that I don't want anybody to be able to attach an internet gateway to a VPC: you've defined a VPC, it's connected to your data center through Direct Connect or a VPN, and you don't want anyone to be able to attach an internet gateway, so I define a policy like that as an SCP. When I do need to make one of those changes as an administrator, I move the account into a maintenance group, maybe, make the change, and move it back, and this way the policies still apply.
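A hedged sketch of those two SCP examples (the policy name and OU ID are placeholders), created and attached through the Organizations API from the master account:

```python
# Hypothetical sketch: a service control policy that denies stopping CloudTrail
# and attaching internet gateways, attached to an organizational unit.
import json
import boto3

org = boto3.client("organizations")

scp = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": [
            "cloudtrail:StopLogging",
            "cloudtrail:DeleteTrail",
            "ec2:AttachInternetGateway",
        ],
        "Resource": "*",
    }],
}

policy_id = org.create_policy(
    Name="deny-logging-tamper-and-igw",            # hypothetical name
    Description="Keep CloudTrail on; no internet gateways on connected VPCs",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(scp),
)["Policy"]["PolicySummary"]["Id"]

org.attach_policy(PolicyId=policy_id, TargetId="ou-xxxx-connected")  # illustrative OU
```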
Once we've done that and created our Organizations account, where do we put all of our logs? We create a logging account. It's a centralized place: you create a versioned S3 bucket, so every time you add an object it creates a new version; it's restricted; and you enable MFA Delete on that bucket, meaning that every time someone tries to delete an object they have to enter an MFA token, because these are your single-source-of-truth logs, things like your CloudTrail logs and security logs. Extremely limited access: there should be almost no one logging into this account. You provide read-only access to the logs to the accounts that need them, so each developer, for example, gets access to their own CloudTrail logs, but it's very limited; there should be almost no one logging into that account, and it's there to hold the logs as your single source of truth.
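A minimal sketch of that bucket setup (the bucket name and reader role ARN are illustrative; MFA Delete itself has to be enabled separately using the bucket owner's root credentials and an MFA device, so it's only noted in a comment):

```python
# Hedged sketch: version the central log bucket and grant read-only access to a
# role in the security account. MFA Delete must be enabled separately with the
# bucket owner's root credentials and MFA device, so it isn't shown here.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "central-log-archive-example"             # illustrative bucket name

s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SecurityToolingReadObjects",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::555566667777:role/security-audit"},
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
        {
            "Sid": "SecurityToolingListBucket",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::555566667777:role/security-audit"},
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
    ],
}
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(read_only_policy))
```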
Once we've created that, let's move on to the security account. We might have created security first, but we needed somewhere to store the logs, so now we create the security account and send its logs to the logging account: again, CloudTrail, VPC Flow Logs, all the logs you've got enabled, at least around the security portion. It might be connected to your data center, depending on how your security tooling works; some customers have that, some don't, but it's an option, and it could be through Direct Connect or a VPN. It hosts your security and auditing tools, things that are processing and analyzing your CloudTrail and AWS Config rules, and it hosts that cross-account read/write role. There are going to be two roles: a read-only role, the one that audits the environment, checks what's going on and runs scans (are any security groups open to the whole world, and if so, should they be?), and a read/write role for incident response. For example, there's a compromise or something is wrong, I can't call or page the person, so maybe I log in and do something to stop it, maybe isolate a compromised instance. Again, extremely limited access, and certainly even more limited access to the ability to assume that read/write role, because that role can do anything in any of the accounts.

Once you've created that, the next one is the Direct Connect, or network, account. Some customers need Direct Connect and others don't, but this is the idea of areas of responsibility: I have an area of responsibility and business-critical infrastructure, such as Direct Connect, and I separate it into its own account, and you'll see that theme throughout; it's all about areas of responsibility, limiting the blast radius, and what I allow people to do. So again, I send its logs into the logging account, limit access, and build it out.

Then shared services: there may be common tooling and services you need to provide for your organization, and you connect that account to your data center; it might need access to a central directory or other resources. Its logs go to the logging account, and we do a backfill step here and connect it to the security account, because the security team will probably need access to the resources running in shared services: things like our DNS, our Active Directory or LDAP, deployment tools, possibly golden AMI pipelines, scanning infrastructure (do I have any active instances, are there improper tags, are there snapshots sitting there that shouldn't be?), and monitoring might sit there too. It might be that, just like Thomson Reuters, at some point you realize there are portions of this you should spin out into separate accounts, but as a starting framework to think it through, this is the idea.

Once we've done that, remember that the Organizations account is very powerful and has a lot within it, so create a separate billing tooling account: instead of giving access to that Organizations account with all its power, there's a separate account for managing billing, tooling and reporting. Same thing: send the logs to the logging account; it holds our billing reports, tooling, usage metrics, maybe things around usage optimization and Reserved Instance management (should I be purchasing more, should I cut back, what does my utilization look like?), and again limited access to the people who need it.

If you're subject to a compliance program, maybe PCI, maybe SOC 2, maybe HIPAA, maybe you create an internal audit account, providing read-only access to the logs and the things needed to establish compliance with that program. Again, it's regulatory compliance: I'm sending the logs back to my logging account, giving read-only access to the logs, and limiting access to the people who need that level of access. If you're interested in the compliance aspect, we have a whole other session focused primarily on the compliance aspects of this multi-account architecture; that's ENT324.

Once I've done that, let's think about a development team. We have this other umbrella around the developer accounts, and we create a sandbox per developer that allows them to go in, innovate and build things. Again, we send the logs to our logging account, and it's not connected to our data center. This is where they can download the latest open source package, do whatever they want, experiment, learn; it's an innovation space. Define a fixed spending limit: maybe that's fifty, maybe a hundred, maybe a million (if you're giving a million dollars a month in spending, I don't know, but you can think it through), but define that limit. Not only will that help them understand how AWS works, innovate, use new services and try things out by being autonomous and experimenting, they've also got responsibility for the billing for the month, which gets development teams thinking about cost-aware architecture: how much will it cost me to actually deploy this?

So once we've got the development teams taken care of, how do we start deploying our solutions? That's where we define these accounts: it might be per business unit, per product, or a set of resources for an area of responsibility for a team, based on the level of isolation you need, and you want to match your development lifecycle. You might be doing development, pre-prod, prod; you might be doing development, alpha, beta, gamma, prod; match that development lifecycle with these accounts, and you're going to scale those out and create multiples of those sets. So: a development account. Again, send our logs to our logging account, connect it to our data center but to a development network; you're not connecting it to your production network, so the rules that exist around separation of environments, higher environment and lower environment, still apply. Connect it to my shared services; by the way, all of those central accounts might have a prod and pre-prod version, because again you want to test out changes before you roll them out across the organization. People can develop and iterate quickly, they might be doing things manually, there's collaboration in there, and it's just a stage of your development lifecycle.
If you have a fancy name for it other than development, use that name. Then there's my pre-prod. Again, send my logs to my logging account and connect it to my shared services; it'll probably leverage tooling from there, whether that's DNS or a central directory. Connect it to my data center. By the way, a lot of these connections could take the form of VPC peering or other services for connecting VPCs together privately, so you need non-overlapping IPs, and you want to think through that approach (see the peering sketch after this section). Pre-prod is production-like: the idea behind this account is that things should look just like production. If you use automated deployments in production, use automated deployments here; if you use automated deployments with a manual approval, use automated deployments with a manual approval here. Make it look as close to production as possible. It might be your staging, maybe your QA, but it has to match your production environment as closely as possible.

Once I've done pre-prod, the next step is production. You can have a few other stages in between depending on how you're set up, but now we've got a production environment. Once again, send my logs to my logging account, connect it to my shared services, and connect it to a production network in my data center. These are our production applications, and hopefully you are promoting into it from pre-prod and not just deploying straight in. Very limited access. Now, I recognize there's a lot of messaging around DevOps and how everything should be automated, and do that, but the fact is it's not going to happen overnight, and for some environments it might never be something you do. So this is also an opportunity where maybe the production account is owned by a deployment team or an operations team that helps make the deployment happen when it's time. Aim for being as automated as possible, but you can also define that area of ownership based on the responsibility of that team.

As we start growing, we might have multiple teams building things, and they might start needing common tooling or a common set of services for that business unit, maybe a data lake or some piece of information they all need. So I might have a shared services account for the business unit, and there might be more than one depending on what kind of services I'm offering; that's different from the central shared services. Again, we send the logs over, connect it to our shared services, connect it to our data center, and connect it to pre-prod, prod, all the accounts it needs to be connected to. This account grows organically, so it's not something you immediately create today and say I'm ready to go, I have shared services. It becomes clear as things become common: you might spin out a team to be responsible for a set of services and offer them as shared services to the rest of the accounts, or it might be your central data lake, where you put everything in a set of S3 buckets and allow other accounts to go in and use it. Your common tooling across that particular business unit might fit in there too.

Once we define that, another account you might want to think about is a sandbox. I've already got a sandbox for each individual developer, but maybe I want a collaboration space for them to work as a team. Again, that one is disconnected, sending its logs over to the logging account. It's for a new initiative, it's disconnected, it's the experimentation place, it's for innovation, where people can play and collaborate as a team.
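As a rough illustration of the cross-account VPC peering mentioned above (my example, not the speaker's), here is a minimal boto3 sketch of requesting a peering connection from a shared-services VPC to a VPC in a development account. The VPC IDs, account ID, region, and CIDRs are hypothetical.

```python
# A minimal sketch, assuming boto3 and credentials in the shared-services
# account. VPC IDs, peer account ID, and CIDR ranges are hypothetical.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request a peering connection from the shared-services VPC to a
# development-account VPC. The two VPCs must use non-overlapping CIDRs.
peering = ec2.create_vpc_peering_connection(
    VpcId="vpc-0aaa1111bbb22222c",        # shared services VPC, e.g. 10.0.0.0/16
    PeerVpcId="vpc-0ddd3333eee44444f",    # development VPC, e.g. 10.1.0.0/16
    PeerOwnerId="222222222222",           # development account ID
)
pcx_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# The development account must accept the request with its own credentials,
# and both sides still need route table entries and security-group rules
# that reference the peer CIDR before traffic flows.
dev_ec2 = boto3.client("ec2", region_name="us-east-1")  # session in the dev account
dev_ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)
```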
So when I start thinking about this idea of a sandbox-to-innovation pipeline, what does that look like? You've got your developer accounts, you've got your BU accounts, and it might look like this: I might create a POC account, for instance. From my developer accounts I create a POC account, and that's where development teams work together and collaborate. Then, once I'm ready, I move that into a sandbox, then into development, and continue through my development lifecycle the way I normally would. I could also go from the POC account right to the development account; that sandbox might not be necessary. Or, for some organizations, I've got my developer accounts, I've figured out what I needed to figure out, I'm ready, and I go right to development. Again, think about the framework: how do I want to build this? There might actually be multiples of these within the same organization, depending on how the teams work and interact. And back to that idea of being flexible with special accounts: you might have a regulatory compliance reason, you might need additional security controls for PII, you might have a complex product or platform.

I know you're anxious to get out and it's almost time, so let's summarize. This is a diagram with everything drawn out; I don't know if you can read it, but: the organizations account is our account management; the logging account holds our centralized logs; security holds our Config rules and security tools; shared services is maybe directory, DNS, and limit monitoring; billing and tooling covers cost monitoring and our RIs; a sandbox for experiments, whether it's the developer sandbox or the common sandbox we talked about; a development account as a stage of our development lifecycle; pre-prod, staging, or QA; and prod for our production.

Now, I promised you an action plan, so here's your homework. Define your tagging strategy: what does that look like, how do I tag my resources across my organization, is there a cost center? Think about your automation strategy. Then start creating an organizations account; from there, create your logging, your security, your shared services, possibly billing and tooling, and start with the developer sandboxes, so that while the development teams are learning and ramping up, you're in the process of setting up everything else. The action plan covers how to start thinking through this, what your strategy is around it, and the checklists of things to do for the logging account, the security account, Direct Connect, and so on: how do I think about this and what are the steps? It's by no means exhaustive, but hopefully it gives you a starting point for what to think about and which items to enable or disable. And if you noticed, there were a lot of references to a common checklist; here it is. It's pretty much common to every single account that you create (a small example of one checklist item follows below). By the way, we are working on a whitepaper around this topic with a lot more detail, and that should be coming out soon. For those of you interested in this topic, which I assume you are since you haven't left and we're a minute over, we actually built a track around multi-account this year. You are in the first session; it's underlined in case you couldn't figure it out. We've got implementation, we've got auditing, we've got a number of other sessions there. Thank you, and please do fill out your evaluations. [Applause]
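As a footnote to the common checklist mentioned above, here is a hedged sketch of the item that shows up in nearly every account: an all-region CloudTrail trail delivering to the central logging account's bucket. The trail name and bucket name are hypothetical, and the bucket policy in the logging account must already allow delivery from this account.

```python
# A minimal sketch, assuming boto3 and credentials in the new member
# account. Trail and bucket names are hypothetical; the central logging
# account's bucket policy must already trust this account.
import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

cloudtrail.create_trail(
    Name="org-account-trail",
    S3BucketName="central-logging-cloudtrail-bucket",  # lives in the logging account
    IsMultiRegionTrail=True,
    IncludeGlobalServiceEvents=True,
    EnableLogFileValidation=True,
)
cloudtrail.start_logging(Name="org-account-trail")
```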

Maurice Vega

6 Responses

  1. He's basically using accounts as something between Docker containers and steps in a devops pipeline, right up to the restore state for rollback. That is so cool.

  2. With the concept of disposable accounts, how do you actually get rid of them? A conundrum: there's no "DeleteAccount" in the AWS Orgs API.

    Apparently you have to leave the org, make the account standalone, put in your credit card details, and then you can delete it. I find it super cumbersome. If creating a new account in Orgs (not talking about inviting one into the org) is one API call, deleting it should be equally straightforward.
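For reference, the only part of that workaround that was scriptable via the Organizations API is removing the member account from the organization so it becomes standalone; a minimal boto3 sketch (with a hypothetical account ID) might look like the following. The remaining steps the commenter describes (adding payment details and closing the account) were manual console actions.

```python
# A minimal sketch, assuming boto3 and credentials in the management
# account. The account ID is hypothetical, and the member account must
# already have what it needs to operate standalone.
import boto3

orgs = boto3.client("organizations")
orgs.remove_account_from_organization(AccountId="333333333333")
```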
