Software developers routinely handle large volumes of complex, heavily structured files written in intricate syntax that leaves little margin for error.

Does this sound like your own line of work…?

Here are some of the most common techniques developers use to streamline their workflow, and how you can apply them when working with richly formatted, plain natural language.

  • Integrated Development Environment (IDE)
  • Syntax Highlighting
  • Brace Matching
  • Predictive Typing / Auto-Complete
  • Dynamic Version Control (Git/GitHub)

Integrated Development Environment (IDE)

Software engineers and web developers customarily use what is referred to as an integrated development environment, or “IDE”. As the name suggests, these platforms offer an integrated workspace in which developers can edit, debug, compile, or interpret source code.

But the key takeaway for our purposes is the ability to work with multiple files simultaneously within the same project environment. Users can declare a dedicated “project” folder and effortlessly import, reuse, or refer to sections of text from other files belonging to the same project.

This supports what developers call the “DRY” principle (as in “don’t repeat yourself!”), which aims to reduce the repetition of software patterns by replacing them with abstractions that avoid redundancy.

In practice, this means you can reference variables or concepts defined in other sections with surgical precision, without replicating their definitions.
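To make the idea concrete, here is a minimal Python sketch of DRY applied to plain language: each term is defined once in a definitions file, and other documents refer to it by name. Everything here is hypothetical: the `{{Term}}` placeholder syntax, the `Term: value` definitions format, and the helper names `load_definitions` and `resolve` are assumptions for illustration, not the syntax of any particular tool.

```python
import re

# Hypothetical project files: one holds the definitions, another refers to them.
# The "Term: value" format and {{Term}} placeholders are illustrative assumptions.
definitions_txt = """\
Borrower: Acme Holdings Ltd.
Facility Amount: USD 25,000,000
Maturity Date: 31 December 2030
"""

agreement_txt = (
    "{{Borrower}} shall repay {{Facility Amount}} "
    "in full on {{Maturity Date}}."
)

PLACEHOLDER = re.compile(r"\{\{([^{}]+)\}\}")


def load_definitions(text: str) -> dict[str, str]:
    """Parse 'Term: value' lines into a dict: the single source of truth."""
    defs = {}
    for line in text.splitlines():
        if ":" in line:
            term, value = line.split(":", 1)  # split on the first colon only
            defs[term.strip()] = value.strip()
    return defs


def resolve(text: str, defs: dict[str, str]) -> str:
    """Replace each {{Term}} with its definition; fail loudly if it is undefined."""
    def substitute(match: re.Match) -> str:
        term = match.group(1).strip()
        if term not in defs:
            raise KeyError(f"Undefined term: {term!r}")
        return defs[term]
    return PLACEHOLDER.sub(substitute, text)


defs = load_definitions(definitions_txt)
print(resolve(agreement_txt, defs))
```

Because every reference points back to one definition, changing the amount in the definitions file updates every document that uses it, and a reference to a term that was never defined raises an error instead of slipping through.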

This can be particularly handy in document-intensive industries such as large-scale construction, infrastructure, healthcare, and banking, where massive volumes of documents define complex legal, financial, and technical mechanisms, all relating to the same project.

In complex legal transactions, it is not uncommon to find hundreds of references to other concepts in just a few lines of text, spread across multiple files relating to the same subject matter! Applying the DRY principle would substantially reduce the risk of errors, which is no bad thing when dealing with language designed to reflect certainty and predictability.

Syntax Highlighting

Color-coded syntax highlighting is used by programmers to display source code in different colors, each associated with a unique meaning. This feature facilitates reading and writing in a structured language… one that is not open to interpretation.

Does that sound like the syntax of your own profession?

Syntax highlighting makes both structure and syntax errors visually distinct, so content becomes easier to read and understand. Highlighting does not affect the meaning of the text itself; it is intended only for the human eye, which is why it can work just as well when applied to business (and social) languages.

Syntax highlighting improves readability and makes context easier to follow, especially for structured content that spans several pages. Readers can easily skip over large sections of comments, code, or text and spot errors just by skimming through the pages.

For example, most IDEs display certain data types, such as strings, in predefined colors. As a result, a missing separator, such as an unclosed quotation mark, is easy to spot, because the text that follows suddenly takes on the wrong color.

Research shows that syntax highlighting significantly reduces the time taken for a programmer to internalize the semantics of a program, enabling them to pay less attention to standard syntactic components such as keywords.

With unwynd, you can apply syntax highlighting to plain natural language, either by configuring your own syntax or by using the default modes, depending on your preferences.
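As a rough illustration of what a configurable syntax for plain language might involve, here is a minimal Python sketch that classifies spans of contract text with regular expressions and colors them with terminal (ANSI) escape codes. The categories, patterns, term list, and colors are all hypothetical assumptions; this is not unwynd’s configuration format or implementation.

```python
import re

# Hypothetical highlighting rules for contract language. The categories,
# patterns, term list and colors are illustrative assumptions only.
RULES = [
    ("reference", r"\b(?:Clause|Section|Schedule)\s+\d+(?:\.\d+)*"),
    ("amount", r"\b(?:USD|EUR|GBP)\s?\d{1,3}(?:,\d{3})*(?:\.\d+)?"),
    ("defined_term", r"\b(?:Borrower|Lender|Facility Amount|Maturity Date)\b"),
]
# ANSI escape codes: blue, green and magenta text in most terminals.
COLORS = {"reference": "\033[34m", "amount": "\033[32m", "defined_term": "\033[35m"}
RESET = "\033[0m"


def tokenize(text):
    """Return non-overlapping (start, end, category) spans in reading order."""
    spans = [
        (m.start(), m.end(), category)
        for category, pattern in RULES
        for m in re.finditer(pattern, text)
    ]
    # Earliest match first; on a tie, prefer the longer match.
    spans.sort(key=lambda s: (s[0], -s[1]))
    result, last_end = [], 0
    for start, end, category in spans:
        if start >= last_end:  # drop matches overlapping an earlier span
            result.append((start, end, category))
            last_end = end
    return result


def highlight(text):
    """Wrap each recognized span in its color code; other text is unchanged."""
    out, pos = [], 0
    for start, end, category in tokenize(text):
        out.append(text[pos:start])
        out.append(COLORS[category] + text[start:end] + RESET)
        pos = end
    out.append(text[pos:])
    return "".join(out)


sample = "The Borrower shall repay USD 25,000,000 under Clause 4.2."
print(highlight(sample))
```

Note that the highlighting is purely visual: stripping the escape codes gives back exactly the original sentence, so the meaning of the text is untouched.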

Brace Matching

Brace matching (or bracket matching) is another important feature widely used by developers. It makes it easy to see whether a brace has been left out, or to locate the brace that matches the one at the cursor, often by highlighting the pair in a different color.

The purpose is to help the writer or reader navigate the content and spot any improper matching, which could cause ambiguity or conflicting readings in heavily structured documents (such as corporate and legal documentation).
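The classic way editors do this is with a stack: push the position of every opening bracket, and pop it when the matching closing bracket arrives. The Python sketch below applies that technique to a line of contract text; the function names `match_brackets` and `partner` are hypothetical, chosen for illustration.

```python
# Closing bracket -> the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def match_brackets(text):
    """Return (pairs, unmatched) for (), [] and {} in `text`.

    pairs: list of (open_index, close_index), in the order they close.
    unmatched: sorted indices of brackets with no valid partner.
    """
    stack, pairs, unmatched = [], [], []
    for i, ch in enumerate(text):
        if ch in OPENERS:
            stack.append(i)
        elif ch in PAIRS:
            if stack and text[stack[-1]] == PAIRS[ch]:
                pairs.append((stack.pop(), i))
            else:
                unmatched.append(i)  # closer without a matching opener
    unmatched.extend(stack)  # openers that were never closed
    return pairs, sorted(unmatched)


def partner(text, cursor):
    """Index of the bracket paired with the one at `cursor`, or None."""
    pairs, _ = match_brackets(text)
    for open_i, close_i in pairs:
        if cursor == open_i:
            return close_i
        if cursor == close_i:
            return open_i
    return None


clause = "pay (a) the Fee (subject to Clause 5 (Conditions) on demand."
pairs, unmatched = match_brackets(clause)
print(unmatched)  # → [16]: the "(" before "subject" is never closed
```

In this example the parenthetical that starts with “subject to” never closes, so a reader cannot tell whether “on demand” falls inside or outside it: exactly the kind of ambiguity described above.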

This may sound trivial, but lawyers can sometimes spend years in costly litigation arguing over the parties’ intentions because of an omitted bracket.

Predictive Typing / Auto-Complete

Predictive typing, sometimes referred to as “auto-complete”, is an input technology in which a key or set of keys is associated with pre-configured rules or concepts. Each keypress produces a prediction of terms or sentences.

Auto-complete can allow an entire word to be entered with a single keypress, making typing more efficient and, by extension, more cost-effective. It requires fewer keystrokes to write any type of file.

It can also considerably reduce the margin of error when referring to previously defined concepts. To avoid drafting ambiguities, a defined concept should always be referred to by the exact term associated with its definition.
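A simple way to enforce this is to complete only from the list of terms a document actually defines. The Python sketch below does a prefix search over a sorted glossary using binary search (`bisect`); the glossary contents and the `complete` helper are hypothetical assumptions, not any product’s implementation.

```python
import bisect

# Hypothetical glossary of defined terms, kept sorted so that prefix
# lookups can use binary search instead of scanning every term.
DEFINED_TERMS = sorted([
    "Borrower", "Business Day", "Facility Agreement", "Facility Amount",
    "Finance Party", "Lender", "Maturity Date",
])


def complete(prefix, terms=DEFINED_TERMS, limit=5):
    """Return up to `limit` defined terms that start with `prefix`.

    Matching is case-sensitive, mirroring how defined terms are capitalized.
    """
    start = bisect.bisect_left(terms, prefix)  # first term >= prefix
    matches = []
    for term in terms[start:]:
        # Stop at the limit, or at the first non-match: in sorted order,
        # no later term can share the prefix either.
        if len(matches) >= limit or not term.startswith(prefix):
            break
        matches.append(term)
    return matches


print(complete("Fa"))
```

Because suggestions come only from the glossary, typing “Fa” offers “Facility Agreement” and “Facility Amount” but never an undefined variant such as “Loan Amount”.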

Predictive typing can play a key role in ensuring that defined terms, concepts, and other time- or cost-sensitive terms are used correctly.

Dynamic Version Control (Git/GitHub)

No discussion of these techniques would be complete without addressing the world of version control. Who isn’t familiar with the good old file-naming conventions such as “V1”, “Version 2”, “dated 04122020” or “comments by John D.”?

There is a solution for this too!

Git is a distributed version-control system for tracking changes in source files. It has been around for 15 years (it was first released in 2005) and was designed to coordinate work among programmers, but it can also be used to track changes in any set of files in distributed, non-linear workflows.

Using a handful of built-in commands, a programmer can “commit” their changes to a dedicated branch, which can later be merged into the “master” branch once any conflicts have been resolved. Git keeps track of every change made to a file across the entire lifecycle of the project.
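Git’s internals are far richer than this, but the core idea that makes “V1” and “Version 2” file names unnecessary can be sketched in Python: every version is named by a hash of its content, and any two versions can be compared line by line. `blob_id` follows the hashing scheme Git uses for file contents (so it should agree with `git hash-object` for the same bytes); `VersionHistory` and its methods are hypothetical names for a toy history, not Git’s API.

```python
import difflib
import hashlib


def blob_id(content: str) -> str:
    """Name content the way Git names a blob object.

    Git hashes the header "blob <size in bytes>" plus a NUL byte, followed
    by the raw bytes, using SHA-1 (its default object format).
    """
    data = content.encode("utf-8")
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class VersionHistory:
    """Toy, single-file history: an ordered list of content-named snapshots.

    Not Git: there are no trees, commits, branches or merges here.
    """

    def __init__(self):
        self.snapshots = []  # list of (object_id, message, content)

    def commit(self, content: str, message: str) -> str:
        oid = blob_id(content)
        self.snapshots.append((oid, message, content))
        return oid

    def diff(self, old: int, new: int) -> str:
        """Unified, line-by-line diff between two snapshots (by index)."""
        old_id, _, old_text = self.snapshots[old]
        new_id, _, new_text = self.snapshots[new]
        return "".join(difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=old_id[:7],  # short ids, as Git displays them
            tofile=new_id[:7],
        ))


history = VersionHistory()
history.commit("The Borrower shall pay the Fee.\n", "first draft")
history.commit("The Borrower shall pay the Fee within 5 Business Days.\n",
               "comments by John D.")
print(history.diff(0, 1))
```

The reviewer’s name and the nature of the change live in the commit message and the diff, not in the file name, so there is only ever one file to open.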

While Git is free and open-source software distributed under the terms of the GNU General Public License version 2, it requires a fair amount of coding experience, or at least some proficiency with the command-line interface, so it is not readily accessible to non-developers.


We Created Document Dysfunction. It Is Time to Fix It.

It is time for some of us building software to take a hard look in the mirror.

For years, we promised technology would solve the world’s information management problems, but 85% of business information is still “dark data,” potentially useful insights lost in a rising tide of disconnected documents, emails, Slack conversations, voice-to-text messages, and myriad other forms.

As the digital transformation accelerates, the sheer volume and opacity of documents make it harder to ensure quality, consistency, accountability, and regulatory compliance.

We call this problem “document dysfunction,” and it affects nearly every type of organization, from finance to health care to real estate to government and more, impacting millions of citizens, customers and companies.

What does document dysfunction look like?

  • A bank with thousands of loan documents, but zero visibility into the terms and conditions that impact the value of those loans.
  • A government agency with hundreds of project agreements that need to be audited and updated due to a regulatory change.
  • A commercial real estate firm with hundreds of contracts, but no insight into millions of dollars in underlying obligations.
  • A health care system with dozens of doctors spending “pajama time” every night recording and writing patient notes in a laborious and disconnected process.

Now multiply those cases by hundreds of thousands of companies and organizations around the world. That’s document dysfunction.

It is not enough to shrug our shoulders and say “hey, we just make the tools… we are not responsible for how people use them.” It’s time for the tech industry to step up and help solve these problems.

Right now, there are lots of smart people using artificial intelligence to tackle mind-boggling problems like asteroid mining or AI-enhanced humans.

We think that is great, but we are focused on using AI to solve much more mundane problems. We are a document engineering company, and we think AI can solve the information management problems that afflict businesses large and small.

If that sounds boring compared to human settlements on Mars, that is okay.

We think “Boring AI” could be a pretty big deal.

We envision a world where AI enables computers to understand documents written for humans as data, quickly and securely. Even better, we envision a world where AI helps people construct documents that are engineered for maximum data reuse from the start, fostering human creativity and unlocking billions of dollars in increased efficiency, improved compliance, and business insights for companies around the world.

We know we are not the only people thinking about these issues; researchers, academics, and other luminaries have been raising them for years. But we think science and technology have advanced to the point where we can finally solve these problems.

We see five principles that can lead us to more effective solutions:

First, we need to bring together multiple scientific domains in innovative and powerful ways.

Of course, we need to apply artificial intelligence in natural language processing using machine learning methods like neural networks or Bayesian techniques. But we also need other disciplines like image processing and recognition, semi-structured information, declarative markup, and even approaches inspired by natural sciences like the theories of cognition and evolution. Breaking down the walls and combining these disciplines will give us new ways to solve these very hard problems.

Second, instead of “Big Data,” we need AI that understands “Small Data”: the unique sets of business documents distinctive to individual companies.

There is a lot of this “Small Data,” and each company’s small data is different. What people call Big Data artificial intelligence these days is usually just highly supervised machine learning on massive datasets. The preparation of those datasets is labor intensive and prohibitively expensive for most individual companies. We need algorithms that are smart enough to figure out your specific documents in your company or even your division within your company, in a potentially small volume, with only minimal learning and guidance.

Third, this focus on company-specific “Small Data” will enable us to maintain the privacy and security of each individual customer.

As an industry, it’s fine to develop and hone algorithms using massive amounts of publicly available documents and data sets, but we should not use learning from one customer to train algorithms for use with other customers. At a time when some are looking to combine data from multiple customers to increase their insights, raising questions about privacy and security, we believe it is better to treat each customer’s data as its own unique universe.

Fourth, past attempts to use AI to try to solve business data and document problems have failed because they focused on the wrong altitude — helping to complete words or sentences instead of applying AI to the document as a whole.

Algorithms need to understand the structure and strategy behind a company’s business documents, not just the co-occurrence of individual words and phrases. If we can create tools that can understand the different portions of a document, and their unique usages in an individual company, COOs will have powerful new ways to accelerate performance, monitor accountability, and ensure legal and regulatory compliance.

And fifth, to be truly effective, we need solutions that do not disrupt existing workflows or require massive investments in staff training, IT development, or armies of consultants.

From the start, AI should enrich the tools and routines that frontline workers already use to get their work done. The past 50 years have proven that you cannot force employees into straitjacket templates; you have to provide solutions that fit how they already work and that reduce their repetitive tasks to foster their creativity. The more users accept the AI’s help, the smarter and more helpful the AI will become. It’s a virtuous cycle.

It is fashionable to say that “AI is going to take our jobs,” but we can do better. Companies that focus on AI to cut costs may do okay in the short run, but companies that use AI to empower their frontline workers and drive their strategic advantage will be the real winners.

The future isn’t about AI making human beings obsolete. The future is about AI making human beings and companies more productive, effective, and creative.

We don’t think that’s boring at all.

And we look forward to working with others across the industry to make that future a reality.

We want to start a public conversation about these issues.

Tell us your document dysfunction horror stories, or your dream for how technology could give you greater efficiency and control. Or maybe you completely disagree and have never met a dysfunctional document in your life. Or maybe you think our principles are all wrong — we’d still like to hear from you!

Join the document dysfunction conversation on Twitter.

Learn more at Docugami.com.
