{"id":134,"date":"2020-02-26T04:33:27","date_gmt":"2020-02-26T04:33:27","guid":{"rendered":"https:\/\/reviewnprep.com\/blog\/?p=134"},"modified":"2024-08-29T01:00:25","modified_gmt":"2024-08-29T01:00:25","slug":"practical-learnings-from-aws-architect-certification-dynamodb-for-the-mainframe-cobol-vsam-programmer","status":"publish","type":"post","link":"https:\/\/reviewnprep.com\/blog\/practical-learnings-from-aws-architect-certification-dynamodb-for-the-mainframe-cobol-vsam-programmer\/","title":{"rendered":"Practical Learnings from AWS Architect Certification &#8211; DynamoDB for the mainframe\/COBOL\/VSAM Programmer"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">How to Convert VSAM to AWS DynamoDB?<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"#Background\">Background<\/a><\/li>\n\n\n\n<li><a href=\"#H1\">What is Amazon DynamoDB<\/a><\/li>\n\n\n\n<li><a href=\"#H2\">What is IBM VSAM (Virtual Storage Access Method)<\/a><\/li>\n\n\n\n<li><a href=\"#H3\">What are DynamoDB Partitions and How do they work?<\/a><\/li>\n\n\n\n<li><a href=\"#H4\">What are CIs (control intervals) and CAs (control areas) in VSAM &amp; how do CI\/CA splits work?<\/a><\/li>\n\n\n\n<li><a href=\"#H5\">VSAM files Versus DynamoDB<\/a><\/li>\n\n\n\n<li><a href=\"#H6\">Migrating VSAM files\/DB2 tables to DynamoDB<\/a><\/li>\n\n\n\n<li><a href=\"#H7\">Additional Examples and Further Reading<\/a><\/li>\n\n\n\n<li><a href=\"#H8\">Thank you<\/a><\/li>\n<\/ul>\n\n\n\n<p><strong>(TL;DR)<\/strong><\/p>\n\n\n\n<p id=\"Background\"><a><strong>Background<\/strong><\/a><\/p>\n\n\n\n<p>I want to explain the reasoning behind this blog\nfor two very specific reasons.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>I struggled to arrive at a title that would make the blog relevant to its audience. If I had chosen a title for just the mainframe, or COBOL, or VSAM, a lot of mainframe fanatics like myself would have pooh-poohed this blog. 
This is the reason I included all three, deliberately separated by slashes. <\/li>\n\n\n\n<li>I wanted to give our mainframe programmers a way into the cloud by leading them into NoSQL databases, which we relationally minded mainframe technologists consider really complex, and to give them a way to equate it with something that we have worked on for years. <\/li>\n<\/ol>\n\n\n\n<p>This idea came to me as I was preparing for my <a href=\"https:\/\/reviewnprep.com\/amazon-web-services-certification\">AWS Solutions Architect<\/a> exam, and I have been meaning to write this ever since it struck me to draw parallels between the two seemingly disconnected paradigms. <\/p>\n\n\n\n<p id=\"H1\"><a><strong>What is\nAmazon DynamoDB<\/strong><\/a><\/p>\n\n\n\n<p><strong>Amazon DynamoDB<\/strong>&nbsp;is\na fully managed NoSQL database service that supports key-value and\ndocument data models, and enables developers to build modern, serverless\napplications that can start small and scale globally to support petabytes of\ndata and tens of millions of read and write requests per second. It automatically spreads the data and traffic of tables over\nmultiple servers and maintains performance. It also relieves customers of\nthe burden of operating and scaling a distributed database. Hence, hardware\nprovisioning, setup, configuration, replication, software patching, cluster\nscaling, etc. are completely managed by AWS.<\/p>\n\n\n\n<p><a href=\"https:\/\/docs.aws.amazon.com\/amazondynamodb\/latest\/developerguide\/HowItWorks.CoreComponents.html\">Basic Concepts\nof DynamoDB<\/a><\/p>\n\n\n\n<p>A DynamoDB database is made\nup of tables, items and attributes. Each table is a collection of\nzero or more items, and each item is composed of one or more\nattributes that describe it. 
<\/p>\n\n\n\n<p><strong>Primary Keys<\/strong><\/p>\n\n\n\n<p>When you create a table, in\naddition to the table name, you must specify the primary key of the table. The\nprimary key uniquely identifies each item in the table, so that no two items\ncan have the same key.<\/p>\n\n\n\n<p><strong>DynamoDB supports two different kinds of primary keys:<\/strong><\/p>\n\n\n\n<p><strong>Partition\nkey<\/strong>&nbsp;\u2013 A simple primary key, composed of one attribute known as\nthe&nbsp;<em>partition key<\/em>. DynamoDB uses the partition key&#8217;s value as input\nto an internal hash function. The output from the hash function determines the\npartition (physical storage internal to DynamoDB) in which the item will be\nstored.<\/p>\n\n\n\n<p><strong>Partition\nkey and sort key<\/strong>&nbsp;\u2013 Referred to as a&nbsp;<em>composite primary key<\/em>, this\ntype of key is composed of two attributes. The first attribute is the&nbsp;<em>partition\nkey<\/em>, and the second attribute is the&nbsp;<em>sort key<\/em>.<\/p>\n\n\n\n<p>DynamoDB\nuses the partition key value as input to an internal hash function. The output\nfrom the hash function determines the partition (physical storage internal to\nDynamoDB) in which the item will be stored. All items with the same partition\nkey value are stored together, in sorted order by sort key value.<\/p>\n\n\n\n<p>In\na table that has a partition key and a sort key, it&#8217;s possible for two items to\nhave the same partition key value. However, those two items must have different\nsort key values.<\/p>\n\n\n\n<p>There are many features of\nDynamoDB. Instead of reproducing the documentation I am listing some of the\nlinks I have found really useful from AWS as well as other sources. 
<\/p>\n\n\n\n<p>Performance at scale<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/docs.aws.amazon.com\/amazondynamodb\/latest\/developerguide\/bp-general-nosql-design.html\">Key-Value &amp; Document Data Models<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/aws.amazon.com\/dynamodb\/pricing\/provisioned\/#.E2.80.A2_DynamoDB_Accelerator_.28DAX.29\">Microsecond latency with DAX \u2013 DynamoDB Accelerator<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/docs.aws.amazon.com\/amazondynamodb\/latest\/developerguide\/GlobalTables.html\">Multi-Region replication with Global Tables<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/docs.aws.amazon.com\/amazondynamodb\/latest\/developerguide\/Streams.html\">Real-time data processing with DynamoDB Streams<\/a><\/li>\n<\/ul>\n\n\n\n<p>Serverless<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/docs.aws.amazon.com\/amazondynamodb\/latest\/developerguide\/Streams.html\">On-demand read\/write capacity modes<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/docs.aws.amazon.com\/amazondynamodb\/latest\/developerguide\/AutoScaling.html\">Auto Scaling<\/a><\/li>\n<\/ul>\n\n\n\n<p>AWS has reported that\nDynamoDB performs <a href=\"https:\/\/aws.amazon.com\/blogs\/database\/amazon-dynamodb-auto-scaling-performance-and-cost-optimization-at-any-scale\/\">better<\/a> as\nload increases. 
<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/docs.aws.amazon.com\/amazondynamodb\/latest\/developerguide\/Streams.Lambda.html\">DynamoDB Streams &amp; Lambda Triggers<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/docs.aws.amazon.com\/amazondynamodb\/latest\/developerguide\/Streams.Lambda.html\">Complex Workflow support with DynamoDB ACID transactions<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/docs.aws.amazon.com\/amazondynamodb\/latest\/developerguide\/EncryptionAtRest.html\">Encryption at rest<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/docs.aws.amazon.com\/amazondynamodb\/latest\/developerguide\/PointInTimeRecovery.html\">Point-in-time recovery for DynamoDB<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/docs.aws.amazon.com\/amazondynamodb\/latest\/developerguide\/BackupRestore.html\">On-Demand Backup &amp; Restore<\/a><\/li>\n<\/ul>\n\n\n\n<p id=\"H2\"><a><strong>What is IBM VSAM (Virtual Storage Access Method)<\/strong><\/a><\/p>\n\n\n\n<p>From\nIBM\u2019s definition of a mainframe system in the&nbsp;<a href=\"https:\/\/www.ibm.com\/support\/knowledgecenter\/zosbasics\/com.ibm.zos.zmainframe\/toc.htm\" target=\"_blank\" rel=\"noreferrer noopener\">IBM Knowledge Center<\/a>&nbsp;&#8211;<\/p>\n\n\n\n<p><strong><em>A\nmainframe is what businesses use to host the commercial databases, transaction\nservers, and applications that require a greater degree of security and\navailability than is commonly found on smaller-scale machines<\/em><\/strong><em>.<\/em><\/p>\n\n\n\n<p>I\nhave no statistics to prove it, but a large number of financial, banking,\nbrokerage, travel and food processing companies continue to use IBM mainframes\nto securely house, process and manage data. However, a lot of this landscape is\nchanging with cloud and innovations around hybrid cloud architectures and\ndesign patterns. IBM is working on several innovations to carve out\nits place and solidify it on the cloud side. 
More on this later; for now,\nI want to focus on the Virtual Storage Access Method (VSAM) datasets and\nhow close some of their design features are to AWS DynamoDB.<\/p>\n\n\n\n<p>To\nread more about IBM mainframe data storage types and the concepts around them, please\nsee &#8211;<\/p>\n\n\n\n<p>\u00b7&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=\"https:\/\/www.ibm.com\/support\/knowledgecenter\/zosbasics\/com.ibm.zos.zconcepts\/zconc_zosstorfilesds.htm\" target=\"_blank\" rel=\"noreferrer noopener\">z\/OS Storage Constructs: File Systems, Datasets, etc.<\/a><\/p>\n\n\n\n<p>\u00b7&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=\"https:\/\/www.ibm.com\/support\/knowledgecenter\/zosbasics\/com.ibm.zos.zconcepts\/zconcepts_150.htm\" target=\"_blank\" rel=\"noreferrer noopener\">Dataset Access Methods on IBM Mainframes<\/a><\/p>\n\n\n\n<p>\u00b7&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=\"https:\/\/www.ibm.com\/support\/knowledgecenter\/zosbasics\/com.ibm.zos.zconcepts\/zconcepts_159.htm\" target=\"_blank\" rel=\"noreferrer noopener\">Dataset Record Formats on IBM Mainframes<\/a><\/p>\n\n\n\n<p>A\nlot of information is available on IBM\u2019s z\/OS concepts site regarding VSAM. The\nterm VSAM can refer to a specific dataset type as well as\nthe access method for managing the various dataset types. There are four kinds\nof VSAM dataset types \u2013<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>VSAM Type<\/strong>   <\/td><td>\n  <strong>Storage<\/strong>\n  <\/td><td>\n  <strong>Usage<\/strong>\n  <\/td><\/tr><tr><td>KSDS (Key-Sequenced Data Set)   <\/td><td>Records are organized by key fields, and records can be   accessed or inserted using those key fields. The data can be accessed   sequentially or randomly by explicitly specifying the key value in the START   key (comparable to DynamoDB&#8217;s ExclusiveStartKey).   
<\/td><td>IBM&#8217;s IMS System   <\/td><\/tr><tr><td>ESDS (Entry-Sequenced Data Set)   <\/td><td>Records are stored in the order they are entered &amp; data   is accessed sequentially   <\/td><td>IBM&#8217;s IMS &amp; DB2 System   <\/td><\/tr><tr><td>RRDS (Relative Record Data Set)   <\/td><td>This format allows accessing records by a record   (#item) number. The data can be accessed randomly   <\/td><td>\n  &nbsp;\n  <\/td><\/tr><tr><td>LDS (Linear Data Set)   <\/td><td>This format stores data as a byte stream   <\/td><td>IBM DB2   <\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>VSAM files can be\nprocessed or used in a program ONLY after the file has been defined\nusing IBM\u2019s Access Method Services (AMS). The table below summarizes the access modes available for each VSAM file type; a sample KSDS VSAM\nCLUSTER definition, including index and data components, follows it.<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes\"><table><tbody><tr><td><strong>VSAM FILE TYPE<\/strong><\/td><td><strong>Sequential Access<\/strong><\/td><td><strong>Random Access<\/strong><\/td><td><strong>Dynamic Access<\/strong><\/td><\/tr><tr><td>VSAM sequential (ESDS)   <\/td><td>\n  Yes\n  <\/td><td>\n  No\n  <\/td><td>\n  No\n  <\/td><\/tr><tr><td>VSAM indexed (KSDS)   <\/td><td>\n  Yes\n  <\/td><td>\n  Yes\n  <\/td><td>\n  Yes\n  <\/td><\/tr><tr><td>VSAM relative (RRDS)   <\/td><td>\n  Yes\n  <\/td><td>\n  Yes\n  <\/td><td>\n  Yes\n  <\/td><\/tr><tr><td>Access Mode in File Control (when allowed)   <\/td><td>ACCESS IS SEQUENTIAL   <\/td><td>ACCESS IS RANDOM   <\/td><td>ACCESS IS DYNAMIC   <\/td><\/tr><tr><td>Key Access Pattern   <\/td><td>Initiate reading of a KSDS VSAM file with a START key   <\/td><td>Pass the exact key of the record when reading the VSAM file   <\/td><td>Combines features of sequential &amp; random access of a VSAM   file   <\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Defining a VSAM cluster<\/p>\n\n\n\n<p>Here is how a VSAM cluster is defined:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img 
loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"417\" src=\"https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Screen-Shot-2020-02-25-at-9.36.18-PM-1024x417.png\" alt=\"\" class=\"wp-image-141\" srcset=\"https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Screen-Shot-2020-02-25-at-9.36.18-PM-1024x417.png 1024w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Screen-Shot-2020-02-25-at-9.36.18-PM-300x122.png 300w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Screen-Shot-2020-02-25-at-9.36.18-PM-768x313.png 768w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Screen-Shot-2020-02-25-at-9.36.18-PM.png 1114w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p id=\"H3\"><strong><a>What are DynamoDB Partitions and How do they work?<\/a><\/strong><\/p>\n\n\n\n<p>A partition is an allocation of storage for a table, backed by\nsolid-state drives (SSDs) and automatically replicated across multiple\nAvailability Zones within an AWS region.<\/p>\n\n\n\n<p>Data in DynamoDB is spread across multiple DynamoDB partitions.\nAs the data grows and throughput requirements increase, the number of\npartitions increases automatically. DynamoDB handles this process in the\nbackground.<\/p>\n\n\n\n<p>When we create an item, the value of the partition key (or hash\nkey) of that item is passed to the internal hash function of DynamoDB. This\nhash function determines in which partition the item will be stored. When you\nask for that item, DynamoDB only needs to search the\npartition determined by the item&#8217;s partition key.<\/p>\n\n\n\n<p>The internal hash function of DynamoDB ensures data is spread\nevenly across available partitions. 
This simple mechanism is the magic behind\nDynamoDB&#8217;s performance.<\/p>\n\n\n\n<p><strong>Limits of a Partition<\/strong><\/p>\n\n\n\n<p>DynamoDB Partition Size Limit = 10 GB<\/p>\n\n\n\n<p>DynamoDB Item Size Limit = 400 KB<\/p>\n\n\n\n<p>Items stored in a partition when every item is at the maximum size = 10 GB \/ 400 KB =\n10,485,760 KB \/ 400 KB = 26,214 items (rounded down); smaller items mean proportionally more items per partition.<\/p>\n\n\n\n<p>Each partition can support a maximum of 3,000 read capacity\nunits (RCUs) or 1,000 write capacity units (WCUs) irrespective of the size of\nthe data.<\/p>\n\n\n\n<p><strong>When and How Partitions Are Created<\/strong><\/p>\n\n\n\n<p>When a table is first created, the&nbsp;provisioned\nthroughput&nbsp;capacity of the table determines how many partitions will\nbe created. The following equation from the DynamoDB Developer Guide helps you\ncalculate how many partitions are created initially.<\/p>\n\n\n\n<p><em>InitialPartitions (rounded up) = ( readCapacityUnits \/ 3,000 ) +\n( writeCapacityUnits \/ 1,000 )<\/em><\/p>\n\n\n\n<p>This means that if you specify RCUs and WCUs of 3,000 and 1,000\nrespectively, then the number of initial partitions will be&nbsp;<\/p>\n\n\n\n<p>( 3_000 \/ 3_000 ) + ( 1_000 \/ 1_000 ) = 1 + 1 = 2.<\/p>\n\n\n\n<p>Suppose you are launching a read-heavy service in which a few\nhundred authors generate content and a lot more users are interested in simply\nreading the content. So, you specify RCUs as 1,500 and WCUs as 500, which\nresults in one initial partition:&nbsp;( 1_500 \/ 3_000 ) + ( 500 \/ 1_000 ) = 0.5 +\n0.5 = 1.<\/p>\n\n\n\n<p><strong>Subsequent Allocation of Partitions<\/strong><\/p>\n\n\n\n<p>Let&#8217;s go on to suppose that within a few months, the application\nbecomes very popular and lots of authors are publishing their content to reach\na larger audience. 
This increases both write and read operations on the DynamoDB\ntables.<\/p>\n\n\n\n<p>As a result, you scale provisioned RCUs from an initial 1,500\nunits to 2,500 and WCUs from 500 units to 1,000 units.<\/p>\n\n\n\n<p>( 2_500 \/ 3_000 ) + ( 1_000 \/ 1_000 ) = 0.83 + 1 = 1.83, which rounds up to 2<\/p>\n\n\n\n<p>The single partition splits into two partitions to handle this\nincreased throughput capacity. All existing data is spread evenly across\nthe two partitions.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"276\" src=\"https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture1-1024x276.png\" alt=\"\" class=\"wp-image-135\" srcset=\"https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture1-1024x276.png 1024w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture1-300x81.png 300w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture1-768x207.png 768w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture1.png 1043w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><strong>Allocation of Partitions<\/strong><\/figcaption><\/figure>\n\n\n\n<p>Another important thing to notice here is that the increased\ncapacity units are also spread evenly across newly created partitions. 
This\nmeans that each partition will have&nbsp;2_500 \/ 2 = 1_250&nbsp;RCUs\nand&nbsp;1_000 \/ 2 = 500&nbsp;WCUs.<\/p>\n\n\n\n<p><strong>When Partition Size Exceeds Storage Limit of DynamoDB Partition<\/strong><\/p>\n\n\n\n<p>With time, the partitions get filled with new items, and as soon\nas data size exceeds the maximum limit of 10 GB for the partition, DynamoDB\nsplits the partition into two partitions.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"999\" height=\"629\" src=\"https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture2.png\" alt=\"\" class=\"wp-image-136\" srcset=\"https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture2.png 999w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture2-300x189.png 300w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture2-768x484.png 768w\" sizes=\"(max-width: 999px) 100vw, 999px\" \/><figcaption class=\"wp-element-caption\"><strong>When Partition Size Exceeds Storage Limit of DynamoDB Partition<\/strong><\/figcaption><\/figure>\n\n\n\n<p>The splitting process is the same as shown in the previous\nsection; the data and throughput capacity of an existing partition are evenly\nspread across the newly created partitions.<\/p>\n\n\n\n<p><strong>How Items Are\nDistributed Across New Partitions<\/strong><\/p>\n\n\n\n<p>Each item has a partition key, and depending\non table structure, a range key might or might not be present. In any case,\nitems with the same partition key are always stored together under the same\npartition. 
A range key (another name for the sort key) ensures that items with the same partition key are\nstored in order.<\/p>\n\n\n\n<p>There is one caveat here:&nbsp;<strong>Items with\nthe same partition key are stored within the same partition, and a partition\ncan hold items with different partition keys<\/strong>, which means that\npartitions and partition keys are not mapped on a one-to-one basis. Therefore,\nwhen a partition split occurs, the items in the existing partition are moved to\none of the new partitions according to the&nbsp;<strong>mysterious internal hash\nfunction of DynamoDB<\/strong>.<\/p>\n\n\n\n<p id=\"H4\"><a><strong>What are CIs (control intervals) and CAs (control areas) in VSAM &amp; how do CI\/CA splits work?<\/strong><\/a><\/p>\n\n\n\n<p>Here is what a\nVSAM cluster looks like on a mainframe system. <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"974\" height=\"808\" src=\"https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture3.png\" alt=\"\" class=\"wp-image-137\" srcset=\"https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture3.png 974w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture3-300x249.png 300w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture3-768x637.png 768w\" sizes=\"(max-width: 974px) 100vw, 974px\" \/><figcaption class=\"wp-element-caption\"><strong>CI\/CA Split<\/strong><\/figcaption><\/figure>\n\n\n\n<p>A control interval (CI)\nis the VSAM unit of I\/O and the structure around the logical records an\napplication manipulates. Choosing a CI size requires understanding many factors\nand is generally a trade-off between saving DASD space by using large blocks vs.\nusing smaller CIs for potentially better online performance. The application\nrecord length also matters because VSAM has discrete CI sizes to pick from,\nwhich rarely accommodate the records with no space left over. 
A Control Area\n(CA) is a logical grouping of CIs, usually one cylinder in size. The programmer\nhas no direct control over CA size. Instead, VSAM picks a CA size based on the\ndata set\u2019s allocation units.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"974\" height=\"285\" src=\"https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture4.png\" alt=\"\" class=\"wp-image-138\" srcset=\"https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture4.png 974w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture4-300x88.png 300w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture4-768x225.png 768w\" sizes=\"(max-width: 974px) 100vw, 974px\" \/><\/figure>\n\n\n\n<p>R1-R5 &#8211; Records in the\ncontrol interval (CI)<\/p>\n\n\n\n<p>FS &#8211; Free space in the CI\nused for expansion<\/p>\n\n\n\n<p>RDF &#8211; Record definition\nfields describing the length of the records in the CI<\/p>\n\n\n\n<p>CIDF &#8211; Control\ninterval definition field, which records the location and amount of free space in the CI.<\/p>\n\n\n\n<p>In a\ncluster definition, the CI free space is expressed as the percentage of bytes\nto leave empty when a cluster initially loads. For CAs, the free space is the\npercentage of empty CIs left at the end of the area.<\/p>\n\n\n\n<p>As VSAM loads a cluster, it lays the records end to end into an\nempty CI until it determines the next record will leave less than the minimum\namount of free space. VSAM then proceeds to put records into the next available\nCI. This continues until the number of remaining free CIs reaches the CA free\nspace limit and VSAM moves on to the next CA.<\/p>\n\n\n\n<p>Each CI has its prescribed amount of free space specified in the\ncluster\u2019s definition. For instance, if you specify 10 percent free space for a 4K\ndata CI, VSAM will reserve at least 409 free bytes (10 percent of 4,096). At the end of each CA are\nseveral empty CIs, depending on the CA free space in the definition. 
For\nexample, a data component with 90 CIs per CA and a 10 percent CA free space\nwill have nine free CIs per CA. The actual number of free bytes in a CI is\nindeterminate because it depends on the size of the records going into the CI\nat the time. But VSAM guarantees there should never be less free space than\nwhat\u2019s specified in the CI free space definition.<\/p>\n\n\n\n<p>Free\nspace applies only at data set loading time. After that, VSAM uses the reserved\nfree space when the application adds records while maintaining the records in\nkey sequence in a CI. If there isn\u2019t enough free space in the CI, VSAM\ndivides the records more or less evenly and leaves half of them in the original\nCI. The other half goes into one of the empty CIs left at the end of the\nCA. <strong>This is called a CI split<\/strong>. If there aren\u2019t any free CIs, VSAM\nhas to go through a <strong>CA split<\/strong>. To accomplish this, VSAM moves half\nof the CIs in the CA to empty space at the end of the cluster. The other half\nof the CIs stay in the original CA. 
With some clusters having upward of 90 CIs\nper CA, you can see where this turns into a lot of work.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"974\" height=\"543\" src=\"https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture5.png\" alt=\"\" class=\"wp-image-139\" srcset=\"https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture5.png 974w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture5-300x167.png 300w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture5-768x428.png 768w\" sizes=\"(max-width: 974px) 100vw, 974px\" \/><\/figure>\n\n\n\n<p id=\"H5\"><a><strong>VSAM files Versus DynamoDB<\/strong><\/a><\/p>\n\n\n\n<figure class=\"wp-block-table aligncenter\"><table><thead><tr><td><strong>Virtual Storage Access Method (VSAM)<\/strong>    <\/td><td><strong>DynamoDB<\/strong>    <\/td><\/tr><\/thead><tbody><tr><td>VSAM datasets are   defined as clusters   <\/td><td>DynamoDB data is   defined as tables, which DynamoDB stores internally in partitions.   <\/td><\/tr><tr><td>VSAM clusters   contain the data as well as the index components   <\/td><td>DynamoDB&#8217;s base   table contains the partition key, an optional sort key (local secondary indexes offer alternate sort keys) and   the additional attributes that make up a table item.    
<\/td><\/tr><tr><td>VSAM path   definitions are definitions of alternate indexes   <\/td><td>DynamoDB global secondary indexes (GSIs) are   alternate\/secondary indexes    <\/td><\/tr><tr><td>Alternate indexes   (path definitions) are defined after the base VSAM cluster and key definitions   are made   <\/td><td>DynamoDB GSIs can be   created along with the base table or added after it exists   <\/td><\/tr><tr><td>An alternate index is a   virtualization of the same VSAM data, accessed by a different key to   serve a distinct access pattern   <\/td><td>A DynamoDB GSI is a   definition of keys different from the base table&#8217;s, to serve a distinct access   pattern   <\/td><\/tr><tr><td>VSAM control   interval (CI) and control area (CA) are defined during the base cluster   definition<br>   <br>   DEFINE CLUSTER (NAME(VSAM.FILE.NAME) &#8211;<br>   BLOCKS(number) &#8211;<br>VOLUMES(volume-serial) &#8211; Volume on which the cluster resides <br>[INDEXED \/ NONINDEXED \/ NUMBERED \/ LINEAR] \u2013 Type of dataset<br>RECSZ(average maximum) &#8211; Average   &amp; maximum record size<br>[FREESPACE(CI-Percentage, CA-Percentage)] &#8211; Free space definition<br>CISZ(number) &#8211;   Size of control interval   <\/td><td>DynamoDB partitions   are allocated based on the provisioned throughput capacity using the logic   below &#8211;<br>   <br>InitialPartitions   (rounded up) = ( readCapacityUnits[RCU] \/ 3,000 ) + ( writeCapacityUnits[WCU]   \/ 1,000 ) <br><br>Each partition can support a maximum of 3,000 read capacity units (RCUs) or   1,000 write capacity units (WCUs) irrespective of the size of the data.    
<\/td><\/tr><tr><td>\n  The CI size\n  definition of an alternate index is usually different from that of the base cluster\n  <\/td><td>The provisioned RCUs   and WCUs of a GSI are set separately from the base table&#8217;s   <\/td><\/tr><tr><td>VSAM control   interval (CI) and control area (CA) splits occur to accommodate new data in   the cluster by utilizing the free space allocated during cluster definition   <\/td><td>DynamoDB partition splits   occur when provisioned RCUs and WCUs are increased to accommodate an increase in   traffic and load, or when a table partition reaches the 10 GB limit.    <\/td><\/tr><tr><td>The RLS (record   level sharing) feature of VSAM allows it to be shared across different logical   partitions (LPARs) on an MVS system   <\/td><td>DynamoDB tables are   replicated across multiple Availability Zones in a region, and depending on   whether a read requests strong consistency (the ConsistentRead parameter), it returns either the most   current data or possibly slightly stale data.    <\/td><\/tr><tr><td>Depending on the initial   definition of the cluster, it&#8217;s possible for the VSAM cluster to undergo CI   size changes during file reorganizations   <\/td><td>When traffic concentrates on a few partition key values, or   provisioned RCUs and WCUs are insufficient, a DynamoDB table can   develop hot partitions, leading to throttled reads and writes.   <\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p id=\"H6\"><strong><a>Migrating VSAM files\/DB2 tables to DynamoDB<\/a><\/strong><\/p>\n\n\n\n<p>If you understand DynamoDB and are comfortable\nworking out your access patterns up front, it\nwill become apparent how much easier it is to denormalize the data spread across\nyour plethora of VSAM files and design a single DynamoDB table that contains\nall your data. <\/p>\n\n\n\n<p>The\nsame logic can be used for DB2 tables. 
However, I wanted to touch on VSAM and\nflat (physical sequential \u2013 PS) files first because of the virtualization and\naccessibility struggle that organizations are going through in an effort to\nmake data that exists in mainframe VSAM &amp; flat files accessible to\ndistributed applications.<\/p>\n\n\n\n<p>IBM\u2019s\nnative SQL stored procedures have simplified how DB2 data is exposed to\ndistributed applications. The REST (Representational State Transfer) native API\nimplemented in DB2 V12 is a lightweight interface using HTTP POST \/ GET\nrequest handling to drive SQL and stored procedures, with the result sets being\nreturned in JSON (JavaScript Object Notation) format.<\/p>\n\n\n\n<p>Here\nis my very high-level attempt at rethinking the data model of a trading\napplication spread across multiple VSAM files with varying fields and file\nlengths. <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"974\" height=\"633\" src=\"https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture6.png\" alt=\"\" class=\"wp-image-140\" srcset=\"https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture6.png 974w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture6-300x195.png 300w, https:\/\/reviewnprep.com\/blog\/wp-content\/uploads\/2020\/02\/Picture6-768x499.png 768w\" sizes=\"(max-width: 974px) 100vw, 974px\" \/><figcaption class=\"wp-element-caption\"><strong>Example<\/strong><\/figcaption><\/figure>\n\n\n\n<p>There are several use cases &amp; examples\navailable on the web for adopting single-table design patterns using DynamoDB &#8211;<\/p>\n\n\n\n<p><a href=\"https:\/\/www.youtube.com\/watch?v=HaEPXoXVf2k\">Advanced\ndesign patterns using DynamoDB \u2013 re:Invent Session<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/www.youtube.com\/watch?v=DIQVJqiSUkE\">Data Modeling\nwith DynamoDB \u2013 re:Invent Session<\/a><\/p>\n\n\n\n<p>Some of the examples shared during\nthe session 
will help you understand NoSQL database systems layer by layer.\n<\/p>\n\n\n\n<p id=\"H7\"><a><strong>Additional Examples and Further Reading<\/strong><\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/www.dynamodbguide.com\/about\/\">https:\/\/www.dynamodbguide.com\/about\/<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/www.alexdebrie.com\/\">https:\/\/www.alexdebrie.com\/<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/www.alexdebrie.com\/posts\/dynamodb-single-table\/\">DynamoDB\nSingle Table Design<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/docs.aws.amazon.com\/amazondynamodb\/latest\/developerguide\/workbench.html\">NoSQL\nWorkbench for DynamoDB<\/a> <\/p>\n\n\n\n<p>DB2 \u2013 NSPs \u2013 Future of computing<\/p>\n\n\n\n<p><a href=\"https:\/\/www.ibm.com\/support\/knowledgecenter\/SSEPEK_12.0.0\/apsg\/src\/tpc\/db2z_createnativesqlprocedure.html\">Creating DB2\nNSPs<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/www.businesswire.com\/news\/home\/20170405006402\/en\/Rocket-Software-Unlocks-Mainframe-Data-API-Economy\">Rocket Software\n\u2013 VSAM Virtualization<\/a><\/p>\n\n\n\n<p id=\"H8\"><a><strong>Thank you<\/strong><\/a><\/p>\n\n\n\n<p>If you have any questions, concerns, suggestions or feedback, please feel free to email me or leave a comment under the blog. It will help me and other readers!<\/p>\n\n\n\n<p>Thank you!<\/p>\n\n\n\n<p><a href=\"https:\/\/reviewnprep.com\/blog\/cloud-101-for-mainframe-developers\/\">Also read Cloud 101 for Mainframe Developers<\/a><\/p>\n\n\n\n<p>AUTHOR: Mukesh is a cloud enthusiast with a bias for AWS. You can reach him on <a href=\"https:\/\/www.linkedin.com\/in\/mukesh-s-a399303\/\">LinkedIn<\/a>. <\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-medium-font-size\"><strong>Announcement from ReviewNPrep<\/strong>: Our Marketplace is live now. 
<a href=\"https:\/\/reviewnprep.com\/marketplace\/details\/aws-certified-solutiona-architecture-associate\/13\/EXAM\" target=\"_blank\" rel=\"noreferrer noopener\">Check out AWS Solutions Architect SAA-C02 Practice exam with over 800 questions to help you pass this exam<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Convert Mainframe VSAM to AWS DynamoDb<\/p>\n","protected":false},"author":1,"featured_media":163,"comment_status":"closed","ping_status":"closed","sticky":true,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,11,4,3],"tags":[18,6,26,27],"class_list":["post-134","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-aws","category-aws-solution-architect","category-certification-reviews","category-reviewnprep","tag-aws","tag-aws-cloud-certifications","tag-mainframe-to-aws","tag-vsam-to-dynamodb"],"_links":{"self":[{"href":"https:\/\/reviewnprep.com\/blog\/wp-json\/wp\/v2\/posts\/134"}],"collection":[{"href":"https:\/\/reviewnprep.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/reviewnprep.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/reviewnprep.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/reviewnprep.com\/blog\/wp-json\/wp\/v2\/comments?post=134"}],"version-history":[{"count":32,"href":"https:\/\/reviewnprep.com\/blog\/wp-json\/wp\/v2\/posts\/134\/revisions"}],"predecessor-version":[{"id":5583,"href":"https:\/\/reviewnprep.com\/blog\/wp-json\/wp\/v2\/posts\/134\/revisions\/5583"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/reviewnprep.com\/blog\/wp-json\/wp\/v2\/media\/163"}],"wp:attachment":[{"href":"https:\/\/reviewnprep.com\/blog\/wp-json\/wp\/v2\/media?parent=134"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/reviewnprep.com\/blog\/wp-json\/wp\/v2\/categories?post=134"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/review
nprep.com\/blog\/wp-json\/wp\/v2\/tags?post=134"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}