
gcobol: a native COBOL compiler

source link: https://lwn.net/Articles/887927/


[Posted March 15, 2022 by corbet]
The gcobol project has announced its existence; it is a compiler for the COBOL language currently implemented as a fork of GCC.

There's another answer to Why: because a free Cobol compiler is an essential component to any effort to migrate mainframe applications to what mainframe folks still call "distributed systems". Our goal is a Cobol compiler that will compile mainframe applications on Linux. Not a toy: a full-blooded replacement that solves problems. One that runs fast and whose output runs fast, and has native gdb support.

The developers hope to merge back into GCC after the project has advanced further.



gcobol: a native COBOL compiler

Posted Mar 15, 2022 17:04 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

Am I correct in interpreting the phrase "distributed systems" to mean "there's a whole separate computer on each person's desk instead of one big computer the size of a room?" If so, that is actually hilarious (but of course this is a real problem that needs to be solved...).

gcobol: a native COBOL compiler

Posted Mar 15, 2022 20:05 UTC (Tue) by k3ninho (subscriber, #50375) [Link]

I figured it means you break up the functions and modules to execute across containers and functions-as-a-service micro-VM's of a cloud provider.

gcobol: a native COBOL compiler

Posted Mar 16, 2022 19:56 UTC (Wed) by smurf (subscriber, #17840) [Link]

This is exactly what the banks are doing. They compartmentalize their legacy COBOL applications into nice little VMs – except that in some cases they're not-so-little because the one COBOL compiler that supports their old code cannot produce machine code that works on any current hardware. So yes, your accounts might *still* be balanced by some 60s machine getting emulated by the equivalent of qemu.

There's rumors of two levels of emulation out there (in production code!) but I don't really believe that until I see it.

gcobol: a native COBOL compiler

Posted Mar 16, 2022 20:54 UTC (Wed) by mpr22 (subscriber, #60784) [Link]

I am completely willing to believe that there is still someone running code for the IBM 14xx in an emulator that runs on a S/370 where the "S/370" is now just a VM on an IBM z in somebody's production data centre.

gcobol: a native COBOL compiler

Posted Mar 16, 2022 21:37 UTC (Wed) by smurf (subscriber, #17840) [Link]

Possibly, but the "S/370" is running native code in its VM (more or less), so it doesn't really count IMO.

Apple Macs are on their fourth CPU architecture by now; too bad nobody's running their M68k emulator on their PowerPC emulator on their x86 emulator on their M1 Mac.

gcobol: a native COBOL compiler

Posted Mar 17, 2022 13:21 UTC (Thu) by scientes (subscriber, #83068) [Link]

Don't forget that Apple screwed their users by switching to 32-bit Intel when 64-bit "Intel" (really AMD) CPUs existed, so you missed one transition.

gcobol: a native COBOL compiler

Posted Mar 18, 2022 12:30 UTC (Fri) by khim (subscriber, #9252) [Link]

Apple never had 64-bit PowerPC laptops. Which means there was never a situation where someone could release a purely 64-bit app; they had to provide dual-mode apps.

That's what allowed it to switch, relatively painlessly, to 32-bit Intel.

And Apple has done that on each transition: 68K Macs never got the 68060, so PowerPC wasn't a slowdown even for emulated apps; PowerPC never got 64-bit laptops, so Apple could go back to 32-bit Intel; Intel Macs never got a modern GPU, so the switch to Apple silicon was not a regression.

That's why Apple was able to transition from one CPU architecture to another, while with PCs or Android that's impossible. Where you have full control over the platform, that transition happens regularly. Just look at consoles.

gcobol: a native COBOL compiler

Posted Mar 17, 2022 23:42 UTC (Thu) by jschrod (subscriber, #1646) [Link]

I have seen 3 levels of emulation at the Dresdner Bank, more than two decades ago, during the Y2K frenzy.

FTR: Once I programmed Cobol, but I didn't compile. ;-)

gcobol: a native COBOL compiler

Posted Mar 18, 2022 1:32 UTC (Fri) by ncm (subscriber, #165) [Link]

Five levels of emulation used to be not unheard of. One may hope those have been retired, but if not we might instead have seven-level systems today. And, still faster than on original hardware.

The countervailing force would be that at some point one may expect an emulator to have been coded in a portable language, and the source code for that emulator not yet misplaced; then, an emulation could be moved sideways to a new host rather than itself being wrapped.

The source code for the original program, inner emulations, and specs for the machines and OSes emulated are of course all lost in time, along with will to read them if ever found.

gcobol: a native COBOL compiler

Posted Mar 16, 2022 8:01 UTC (Wed) by professor (subscriber, #63325) [Link]

By "distributed systems" we mean platforms other than Z; today that is mostly x86 Linux and Windows systems.

gcobol: a native COBOL compiler

Posted Mar 16, 2022 13:31 UTC (Wed) by immibis (subscriber, #105511) [Link]

I guessed it meant a company with more than one server computer - say, one server per application instead of one really big server that runs all the applications.

I think modern mainframes are server clusters running cluster-aware operating systems that emulate old mainframes, but don't tell the admins that...

gcobol: a native COBOL compiler

Posted Mar 15, 2022 19:12 UTC (Tue) by Wol (subscriber, #4433) [Link]

In other words, "we want to move our haulage firm away from using artics to using porsches, so we can deliver the goods quicker".

Yup, decent Cobol on PCs is a goal worth having, but if the hardware isn't up to snuff (as I understood it, mainframe CPUs are woefully underpowered compared to PCs), "real programs" won't be ported across because they'll be too slow.

"whose output runs fast" ... the whole point of a mainframe is it's *hard* to overwhelm the i/o bus. Port your mainframe program to a PC and chances are the i/o bus will collapse under the load.

Cheers,
Wol

gcobol: a native COBOL compiler

Posted Mar 15, 2022 19:25 UTC (Tue) by acarno (subscriber, #123476) [Link]

Perhaps it depends on what you define a "mainframe" as, but IBM's latest z15 mainframe system (at least that's how they market it) uses some rather powerful chips if Wikipedia is to be believed: https://en.wikipedia.org/wiki/IBM_z15_(microprocessor)

gcobol: a native COBOL compiler

Posted Mar 15, 2022 19:53 UTC (Tue) by pwfxq (subscriber, #84695) [Link]

The early mainframes used to have a lot of function specific processors to offload work from the main CPU. Although the CPU appeared sluggish, it had a lot less work to do. Some of them took the CISC architecture to the extreme and even had microcode where you could write your own instructions!

As acarno mentioned, looking at the specs of the z15 CPU, it isn't a slouch. (e.g. 5.2GHz max frequency)

gcobol: a native COBOL compiler

Posted Mar 15, 2022 22:00 UTC (Tue) by bartoc (subscriber, #124262) [Link]

The wikipedia page indicates there's some kind of XML co-processor, which .... boy would I like to see some technical documentation on how that thing works.

gcobol: a native COBOL compiler

Posted Mar 15, 2022 22:11 UTC (Tue) by Wol (subscriber, #4433) [Link]

Shades of the MegaHurtz wars ... processor speed on its own means nothing ...

(AMD chips many moons ago were both more powerful, and slower, than the equivalent Intel. So Intel emphasized how fast their chips were to try and counter the fact that AMD were better. Have you heard of "pipeline stall"? AMD chips might have been slower, but they wasted far less effort on mis-prediction ...)

Okay, I would expect IBM chips to be state-of-the-art, but reading that article it looks like the processors can be assigned to all sorts of tasks. Including driving i/o. I think it still stands that porting programs from mainframe to PC is going to hit trouble on the i/o bus ...

Cheers,
Wol

gcobol: a native COBOL compiler

Posted Mar 16, 2022 0:12 UTC (Wed) by kenmoffat (subscriber, #4807) [Link]

Nicely put!

I used to earn my living as an IBM mainframe application programmer using mostly COBOL. Compared to a modern PC, the processors appear slow - but they could handle a massive amount of I/O. I see claims that farms of NVMe drives can do massive I/O, so perhaps such setups could use this. That depends, of course, on which variant of COBOL you wished to migrate from - ISTR IBM had a lot of extensions, and of course much of the fun stuff was weird:

I recall that we needed to copy files from a VSAM package, and the application had to read the -1 subscript (word-size) of a file definition (probably an ESDS, but it's more than 30 years ago) to find out how long a particular record was.

But tell that to the youth of today and they won't believe you ;-)

gcobol: a native COBOL compiler

Posted Mar 16, 2022 17:26 UTC (Wed) by Wol (subscriber, #4433) [Link]

Yup. The youth of today think that throwing hardware at it is the best solution. We've pretty much hit the limit with 5GHz processors: until we find a way of communicating faster than light, chip speeds can't really get any faster. Communication across your typical motherboard is limited to about 1GHz for the same reason ...

Us old hands cut our teeth when you had to put smarts into the logic, not throw brute force at it. And that technique still pays dividends today ... Why do I spend so much time today watching Excel send a query to Oracle while I'm pining for Pick ...

"We can solve any problem by introducing an extra level of indirection."
"…except for the problem of too many levels of indirection."

Cheers,
Wol

gcobol: a native COBOL compiler

Posted Mar 17, 2022 13:21 UTC (Thu) by scientes (subscriber, #83068) [Link]

Power10's global memory addressing across a whole rack of computers is pretty impressive.

gcobol: a native COBOL compiler

Posted Mar 16, 2022 19:24 UTC (Wed) by Sesse (subscriber, #53779) [Link]

The Z-series CPUs are basically what you get when trying to make a CPU that is as fast as possible, without real regard for cost or power usage. So you end up with a super-wide, watercooled high-frequency chip with lots of accelerators. Everything is dialed up to 11 :-)

(This is not criticism; IBM is simply targeting a different design space from most other chip manufacturers.)

gcobol: a native COBOL compiler

Posted Mar 16, 2022 7:35 UTC (Wed) by anton (subscriber, #25547) [Link]

I have listened to several presentations (and I think for more than one project) about porting IBM mainframe programs to cheaper hardware by translating the binary to, e.g., C and then compiling that C program on a cheaper machine.

The advantage of this approach is that the machine language of IBM mainframes is a simpler language than Cobol and that it can be used for programs where the source has been lost, or where the original program was written in assembly language. The advantage of having a Cobol compiler is that you can maintain the result at the Cobol level (if you have Cobol programmers) instead of having to deal with whatever the Cobol compiler in combination with the translator produced.
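The approach can be illustrated with a toy sketch (an invented three-opcode register machine, not any real mainframe ISA): each decoded instruction is emitted as one line of C, and the resulting C program can then be compiled and maintained on commodity hardware.

```python
# Toy illustration of binary-to-C translation. The ISA here is invented
# (three opcodes of a simple register machine); a real translator would
# decode an actual mainframe instruction stream.

OPCODES = {
    0x01: lambda a, b: f"r[{a}] = r[{a}] + r[{b}];",   # ADD   ra, rb
    0x02: lambda a, b: f"r[{a}] = {b};",               # LOADI ra, imm
    0x03: lambda a, b: f"r[{a}] = r[{b}];",            # MOVE  ra, rb
}

def translate(program):
    """Turn a list of decoded (opcode, a, b) tuples into a C program."""
    lines = ["#include <stdio.h>",
             "int main(void) {",
             "    long r[16] = {0};"]
    for op, a, b in program:
        lines.append("    " + OPCODES[op](a, b))
    lines += ['    printf("%ld\\n", r[0]);', "    return 0;", "}"]
    return "\n".join(lines)

# LOADI r1, 40; LOADI r2, 2; ADD r1, r2; MOVE r0, r1
print(translate([(0x02, 1, 40), (0x02, 2, 2), (0x01, 1, 2), (0x03, 0, 1)]))
```

The trade-off anton describes falls out of the sketch: the generated C is faithful but mechanical, so future maintenance happens at this level rather than in the original source language.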

My impression from the presentation(s?) at the project end was that the customer(s?) were satisfied with the result.

It may be that current IBM mainframes offer capabilities that your run-of-the-mill PC or even PC-based server does not have (although I think that many claims about mainframe superiority are pure bullshit, as is evidenced by the fact that these claims are rarely (if ever) supported by hard benchmark numbers), but a legacy program from the last century does not need the capabilities of current mainframes. And every PC now runs rings around last-century mainframes, including in I/O.

gcobol: a native COBOL compiler

Posted Mar 17, 2022 8:39 UTC (Thu) by Wol (subscriber, #4433) [Link]

> (although I think that many claims about mainframe superiority are pure bullshit, as is evidenced by the fact that these claims are rarely (if ever) supported by hard benchmark numbers)

The big problem here is "Where are the industry benchmarks?". And that is a SERIOUS problem.

As I know from the Pick space, everybody wants their database benchmarks to include JOINs. A concept Pick *does not have*, because joins are both expensive, and for Pick completely un-needed.

Any benchmark which asks "how fast can you do this expensive operation" will unfairly cripple a system which has no need for that particular operation.

(For Pick, the answer to "how fast can you do a join?" is "I don't. I do an indexed lookup. It's physically impossible to do it any faster!")

Cheers,
Wol

gcobol: a native COBOL compiler

Posted Mar 17, 2022 12:54 UTC (Thu) by nix (subscriber, #2304) [Link]

> (For Pick, the answer to "how fast can you do a join?" is "I don't. I do an indexed lookup. It's physically impossible to do it any faster!")

"I optimize all my queries by hand". OK, OK, that tells me everything I need to know about how you do complex queries (you mostly don't: it's too hard). Meanwhile SQL databases let you say what you want rather than how you want it done, and have the machine do the boring bits of picking the data out. Most of the time, even at enormous scale, this is good enough. I've seen people whip out queries that combine a hundred tables to get results needed just once and do it in a couple of hours. Doing that with Pick sounds like it would take weeks or simply be more or less impossible.

Pick had semi-competitors in the same era that worked the same way, like DBase. They died for a reason: the same reason people don't write whole systems in assembler any more. Computers can now do a good enough job. Not as good as could be done by the whole thing being written by hand, perhaps, but doing that is *so much harder* and takes so much more of the actually expensive human time rather than the nearly-free computer time that it's *obvious* that moving to getting the machine to optimize physical reads for you is the right approach.

gcobol: a native COBOL compiler

Posted Mar 17, 2022 13:50 UTC (Thu) by Wol (subscriber, #4433) [Link]

> > (For Pick, the answer to "how fast can you do a join?" is "I don't. I do an indexed lookup. It's physically impossible to do it any faster!")

> "I optimize all my queries by hand".

NO I DON'T.

Do you use calculated fields in SQL? I write a one-line calculated field in the table definition, and the database converts it into an indexed lookup.

Which means, in effect, I write the join ONCE, and it's there for any query that wants to use it. Contrast that to SQL, where I have to rewrite the join for every single view, and as I'm learning the hard way, it gives me ample opportunity to screw up every time.

In my new job, I am regularly writing horrible, complex SQL queries. That would be so much easier with a decent query language running on a proper database :-)

Yes, it was ages ago (the story refers to a P90), but if it takes a team of consultants six months to get Oracle on a twin Xeon to run faster than Pick on a P90, something is damn wrong there ...

> I've seen people whip out queries that combine a hundred tables to get results needed just once and do it in a couple of hours. Doing that with Pick sounds like it would take weeks or simply be more or less impossible.

Well, from my example above, I guess it would be more like ten minutes - and from an ORDINARY developer, not some whizz-kid genius (I notice you didn't say YOU could do that ...)

Another poster said here that once people start porting Cobol to PCs, they realise that just maybe a mainframe might be the right tool for the job. Most of the stories I hear about people porting Pick to relational are about how companies went bankrupt because they suddenly discovered that their IT department - despite having plenty of resource to manage the Pick system - was just way too small to cope with the resources to feed the hungry relational monster. And I think pretty much every study I've heard about (admittedly very few) showed that - for the same size company - Pick needed maybe a third (or less?) the resources of the equivalent relational system.

And a few years back some University or someone bought a Pick-style system for their astronomical catalog. Okay the alternative was Snoracle, but they had to seriously cheat - disabling indexes, doing batch updates, whatever whatever - to achieve the target 100K insertions/min. When the - I think it was Cache - system went live, it just breezed through 250K without breaking a sweat.

What's that quote I mentioned elsewhere "just adding another level of indirection will solve any problem ... except the problem of too many levels of indirection".

I'm hoping we'll soon announce an industrial-grade Free Pick system that people can play with, and if you truly approach it with an OPEN mind, I'm sure you'll be blown away by the simplicity and power.

Cheers,
Wol

gcobol: a native COBOL compiler

Posted Mar 17, 2022 20:15 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

gcobol: a native COBOL compiler

Posted Mar 17, 2022 22:42 UTC (Thu) by Wol (subscriber, #4433) [Link]

> > Do you use calculated fields in SQL? I write a one-line calculated field in the table definition, and the database converts it into an indexed lookup.

> You mean, like this: https://www.postgresql.org/docs/14/ddl-generated-columns....

Exactly. Except in Pick they're all virtual.

Bear in mind, in Pick the table definition includes, in the definition of a column, the offset into the row. (In Pick, a row is an array.) Relational hides this from the user.

So instead of defining the location as an offset into the row, I can define it as a calculation that is evaluated on viewing. Let's assume I have two normal columns called PRICE and VAT.RATE, I can just define a third column VAT with the location "PRICE * VAT.RATE". Or if I have a column ITEM.CODE, I can define a column ITEM.NAME as "TRANS( ITEMS, ITEM.CODE, ITEM.NAME)" which means "read the table items, look up the item.code, return the item.name". And because it's key/value, it's a single request to disk ... there's absolutely no searching of the ITEMS file. That's why Pick doesn't have joins - anywhere SQL would use a join, Pick just uses a TRANS.
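A rough sketch of the idea in Python (file and column names are hypothetical, and Pick actually stores rows as delimited arrays rather than dicts): virtual columns are expressions evaluated on viewing, and TRANS is a single keyed read of another file, with no scan and no join.

```python
# Key/value "files": a keyed read is one lookup, never a search.
ITEMS = {
    "A100": {"ITEM.NAME": "Widget", "PRICE": 10.0},
    "B200": {"ITEM.NAME": "Gadget", "PRICE": 25.0},
}
ORDERS = {
    "1": {"ITEM.CODE": "A100", "QTY": 3, "VAT.RATE": 0.2},
}

# Dictionary entries for virtual columns: defined once in the table
# definition, evaluated whenever a view asks for them.  The ITEM.NAME
# entry plays the role of TRANS(ITEMS, ITEM.CODE, ITEM.NAME).
VIRTUAL = {
    "ITEM.NAME": lambda row: ITEMS[row["ITEM.CODE"]]["ITEM.NAME"],
    "PRICE":     lambda row: ITEMS[row["ITEM.CODE"]]["PRICE"],
    "VAT":       lambda row: VIRTUAL["PRICE"](row) * row["VAT.RATE"],
}

def view(order_id, columns):
    """Render an ORDERS row, computing any virtual columns on the fly."""
    row = ORDERS[order_id]
    return {c: VIRTUAL[c](row) if c in VIRTUAL else row[c] for c in columns}

print(view("1", ["ITEM.NAME", "QTY", "VAT"]))
```

Every query over ORDERS gets ITEM.NAME and VAT "for free" from the one-time definitions, which is the point being made: the join logic is written once, not repeated per view.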

> Or this: https://www.postgresql.org/docs/14/rules-materializedview...

So my Pick table IS your materialised view, with up-to-date data, because I've predefined all items of interest as virtual generated columns. Of course, I can't update them through this view ... but postgresql has the same limitation, and I believe many RDBMSs have difficulty updating views ...

> Or this: https://www.postgresql.org/docs/14/sql-createtrigger.html

Ummm. Pick copied triggers from relational, so it did nick plenty of good ideas :-)

Cheers,
Wol

gcobol: a native COBOL compiler

Posted Mar 16, 2022 9:10 UTC (Wed) by joib (subscriber, #8541) [Link]

From what I've read, mainframes have been moving towards commodity technology as much as possible, as the economics just aren't there to make everything custom like it was back in the day. CPUs are obviously the exception, as they need to be custom in order to run the ISA. But the I/O stuff is PCIe, FC, and NVMe, just like you'd find in a high-end x86 server setup.

Maybe the mainframes have more PCIe lanes etc. than you'd find even in the highest end x86 server due to the market they're targeting, but that's a difference in degree not in kind. I'd guess there's lots of mainframe workloads that'd run just fine on a slightly lower end platform.

gcobol: a native COBOL compiler

Posted Mar 16, 2022 18:03 UTC (Wed) by mmaug (subscriber, #61003) [Link]

"whose output runs fast" ... the whole point of a mainframe is it's *hard* to overwhelm the i/o bus. Port your mainframe program to a PC and chances are the i/o bus will collapse under the load.

But a lot of the I/O load on the mainframe was memory management. Emulate the section swapping, making it a no-op, and you're left with an I/O load that any JS code written by a script kiddie could make look pedestrian. We had constraints that we had to design and program for; those are not the constraints we have in this world...

gcobol: a native COBOL compiler

Posted Mar 15, 2022 22:13 UTC (Tue) by bartoc (subscriber, #124262) [Link]

I think this is quite lovely. There are a lot of pretty decent ideas in languages like cobol (and to a lesser extent Fortran and pascal, or even Ada) that have not gotten the attention they deserved because everyone was so distracted with C, C++, and java, and so forth (but not forth, I think).

It will be nice to see what kind of nifty code they can generate for the kinds of programs Cobol is suited to. Maybe it will make filing my taxes easier some day (afaict the US IRS electronic filing system uses COBOL quite extensively).

gcobol: a native COBOL compiler

Posted Mar 16, 2022 9:29 UTC (Wed) by sdalley (subscriber, #18550) [Link]

This sounds a really interesting project. James Lowden & Co. seem like an intrepid bunch!

> Eventually, if the gcc maintainers are interested,
> we would like to pursue full integration with gcc.
> For the moment, we have questions
> we're hoping can be answered here
> by those who ran the gauntlet before us.
> Given the state of the internals documentation,
> that seems like our best option.
> We've been rummaging around in the odd sock
> drawer for too long.

It would be interesting to hear whether they considered using LLVM as a foundation rather than gcc, as the Rust developers did, since it is reputedly more modular with better documented pieces. Was it because gcc has a lot more available back-ends? Do they have more of an inside track with the gcc developers and/or familiarity with gcc internals and can thus get more quickly up to speed?

James did say:

> I am happy to debate the lunacy of this project...

so I just thought I'd ask. At the end of the day it's absolutely their project and they can of course do what they wish.

gcobol: a native COBOL compiler

Posted Mar 17, 2022 17:20 UTC (Thu) by jklowden (guest, #107637) [Link]

Thanks for your generous comments. I can answer your question. :-)

We didn't know anyone working on GCC or have any prior experience extending GCC. My "read" on the project is that it would be amenable, culturally and technically, to adding new languages, as evidenced by the several new additions in recent years.

I do have some prior negative experience working with LLVM. Upon a time, I wanted to use the clang "toolkit" to produce a code-analysis database for C++ projects. Experience in large C++ shops proved that existing tools to analyze and navigate C++ projects are either based on C (or not even that), and thus can't distinguish A::foo from B::foo, or fall over when the corpus approaches 1 million SLOC.

For example: show all functions calling A::f(int) and their antecedents, back to main(). Show all calls to A::f(int) that provide an rvalue for the argument and ignore the returned value. There used to be tools to do that kind of thing in C (Digital's Source Code Analyzer, for one) but never a free one, and I've never heard of any for C++ at any price.
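The first of those queries can be sketched over a call graph (the graph here is hypothetical; a real tool would extract it from the compiler's AST). Keying the graph on fully qualified names is exactly what the C-based tools cannot do, since they can't tell A::f from B::f.

```python
# Hypothetical call graph, caller -> callees, keyed by qualified name.
CALLS = {
    "main":      ["B::run", "helper"],
    "B::run":    ["A::f(int)", "log"],
    "helper":    ["A::f(int)"],
    "log":       [],
    "A::f(int)": [],
}

def callers_back_to_main(target):
    """All functions calling `target`, and their antecedents, via the
    reversed call graph."""
    reverse = {}
    for caller, callees in CALLS.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    seen, stack = set(), [target]
    while stack:
        fn = stack.pop()
        for caller in reverse.get(fn, []):
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return sorted(seen)

print(callers_back_to_main("A::f(int)"))
```

The second query (rvalue argument, ignored return value) needs type and use-def information from the front end, which is where a database built on the compiler itself earns its keep.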

Clang at the time was thinly documented and already very complicated. I was unable to get off the ground with it. Because of that experience -- and, yes, because of the wide adoption and deployment of GCC -- we opted to base our project on GCC.

FTR: my experience is just my experience. Time has gone by, and the same clang river is not even there to step into again. You asked how we decided, and that's the answer.

gcobol: a native COBOL compiler

Posted Mar 17, 2022 18:24 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link]

> We didn't know anyone working on GCC or have any prior experience extending GCC

I would be interested in a blog post or two discussing this aspect of the journey: how long did it take to become productive, what you found easy or difficult, what you consider the positives of the architecture, etc.

gcobol: a native COBOL compiler

Posted Mar 16, 2022 23:48 UTC (Wed) by professor (subscriber, #63325) [Link]

The problem with the IBM Z mainframe is that you always have to defend it to outside people who have no idea what it is about! (Somehow they always seem to have the "correct" opinion about it anyway, mostly based on some stupid article somewhere, I guess.)

IBM, please make the platform available to people who don't have $999999999 in their pocket!

I think this thing is good since it can get new people to be interested in COBOL and when they finally are in their finest conversion cloud they realize that "oh, this crap actually runs best on the IBM Z/Mainframe anyway so why are we not doing it? Why GOTO Java, crap, etc."

gcobol: a native COBOL compiler

Posted Mar 17, 2022 8:41 UTC (Thu) by Wol (subscriber, #4433) [Link]

+1

Universities don't have mainframes ...

(Didn't some ISP somewhere buy a bunch of mainframes as the cheapest way to provision/sell loads of managed servers for geeks like us running linux?)

Cheers,
Wol

gcobol: a native COBOL compiler

Posted Mar 18, 2022 0:00 UTC (Fri) by jccleaver (subscriber, #127418) [Link]

What we really need is a "mainframe to cloud native" converter, since in both cases we are simple lusers on some far off abstraction that we have to pay for CPU time on.
