How We Automate 80-100% of Media Workflows with Cognitive Computing

Cognitive computing has been on a lot of minds lately. While exploring how far artificial intelligence can go in imitating human perception, technology innovators have found that cognitive computing is a better fit for the task.

We suddenly realized that a lot more can be done in that regard: instead of imitating only perception, we can have technology make decisions the way humans do.

After sharing the idea across the AIHunters team, we set ourselves an ambitious goal: cognitive business automation for the media and entertainment industry.

Let us take you on a tour of how we did it: how we delivered a solution that optimizes video processing and post-production while pushing beyond the limitations of regular AI analysis.

Target industry acquired: media and entertainment

In the media and entertainment business, video content is at the center of everything. It’s what keeps us glued to our phones, TVs, and laptops for the vast majority of our free time, generating huge revenues and brand recognition.

Hundreds of hours of that video are created by humans for the most part. We get it: technology is nowhere near matching human creativity.

But what about routine work? Those tasks result in countless hours of manual work performed by hundreds of editors.

Answering that question, our AI scientists figured out a solution to reduce the time and costs of video processing and post-production: cognitive computing technology paired with a unified and well-structured workflow.

Let’s talk about the solution we came up with.

The AIHunters team has built a cloud platform that can analyze video content, make informed decisions, and act on them with no human assistance. We call it CognitiveMill™.

CognitiveMill™ can analyze a tremendous variety of video content, relying on technologies such as:

Deep learning;
Digital image processing;
Cognitive computer vision;
Traditional computer vision.

The solution chews through anything you throw at it: sporting events, movies, TV shows, user-generated content, and live streams.

The problem with regular AI for video analysis

Trying to tackle the challenge, we came across a problem: the limitations of deep learning algorithms did not allow us to use them effectively for reasoning and adaptive decision-making.

Other traditional means of processing have also proven ineffective: neural networks are tied to their training sets, while complex end-to-end deep learning pipelines can serve as the basis of cognitive automation only in theory.

Dealing with video content of different types and genres is beyond the scope of these technologies.

So, we had to figure out our own approach.

We looked into combining the best practices of deep learning, cognitive science, computer vision, probabilistic AI, and math modeling to imitate human behavior: our solution can not only see like a human, but also think and decide like one.

So, let us show you how CognitiveMill™ processes complex video data.

Cognitive computing is the decider-type

To start things off, we have divided the cognitive decision-making process into two stages:

Representations stage. Here, the cloud robot imitates the way humans focus on things and perceive them. It does so by emulating human eyes with the help of deep learning, digital image processing, and cognitive and traditional computer vision.
Cognitive decisions stage. Imitating the way the human brain works, the robot makes decisions based on content analysis. To facilitate the process, we apply probabilistic AI, cognitive science, machine perception, and math modeling.

In-depth research by our AI scientists produced over 50 algorithms, laying the foundation for cognitive business automation.

And we are going about the process with ease of use in mind.

We developed separate modules that contain each of the cognitive abilities imitating human thought processes. That approach helps us easily re-use and combine modules to imitate various cognition flows down the line.
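To make the idea of reusable modules concrete, here is a minimal sketch of how independent modules could be chained into different flows. It is an illustration only, not the CognitiveMill™ implementation: the module names (`detect_shots`, `score_shots`, `pick_highlights`), the context keys, and the toy scoring rule are all assumptions made for this example.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Module = Callable[[Context], Context]


def detect_shots(ctx: Context) -> Context:
    """Representations stage (hypothetical): split the frame range into fixed-size shots."""
    size = ctx.get("shot_size", 100)
    count = ctx["frame_count"]
    shots = [(start, min(start + size, count)) for start in range(0, count, size)]
    return {**ctx, "shots": shots}


def score_shots(ctx: Context) -> Context:
    """Representations stage (hypothetical): toy score, equal to the shot length in frames."""
    scores = [end - start for start, end in ctx["shots"]]
    return {**ctx, "scores": scores}


def pick_highlights(ctx: Context) -> Context:
    """Decisions stage (hypothetical): keep shots whose score reaches the threshold."""
    threshold = ctx.get("threshold", 100)
    picked = [shot for shot, score in zip(ctx["shots"], ctx["scores"]) if score >= threshold]
    return {**ctx, "highlights": picked}


def build_pipeline(modules: List[Module]) -> Module:
    """Combine modules into one callable that runs them in order."""
    def run(ctx: Context) -> Context:
        for module in modules:
            ctx = module(ctx)
        return ctx
    return run


# The same modules are reused in two different flows.
highlights_pipeline = build_pipeline([detect_shots, score_shots, pick_highlights])
shots_only_pipeline = build_pipeline([detect_shots])

result = highlights_pipeline({"frame_count": 250})
print(result["highlights"])
```

Because every module has the same shape (context in, context out), a new flow is only a new list of modules, which is the property the paragraph above relies on.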

So you can forget about all of that retraining and re-modeling stuff. A couple of adjustments here and there and we are ready to tackle the new challenge.

Wrapping the tech into a scalable product

To further enhance the development process, we have built Docker containers for each AI module. The productization functionality was moved to a special SDK that we support independently.

With all of that, the AI team doesn’t have to bother with frame rate or file system issues and can focus on developing math algorithms.

Furthermore, each of the modules is configured to facilitate a certain pipeline depending on a business case.

But this is not all we do to make building our pipelines more efficient. On top of that, we rely on a GPU transcoding module to:

Optimize the resizing process to streamline computer vision and deep learning analysis;
Stay unaffected by media container differences and broken metadata.

We have also added media download workers along with media container parsers to facilitate the cloud productization pipeline, orchestrating the whole thing with Kubernetes.

All of the data is stored with the help of EFS. The workers have access to a queue of module segments with tasks created for them.

At the head of it all is a central processing DB that relies on a set of scheduling microservices to control and prioritize the tasks.
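As a rough illustration of what "control and prioritize the tasks" can mean, the sketch below uses an in-memory priority queue. This is an assumption made for the example: the article does not describe how the scheduling microservices or the central DB work internally, and the `TaskQueue` class and task names are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TaskQueue:
    """Hypothetical scheduler: lower priority number runs first, ties run in submit order."""
    _heap: List[Tuple[int, int, str]] = field(default_factory=list)
    _counter: itertools.count = field(default_factory=itertools.count)

    def submit(self, task_id: str, priority: int) -> None:
        # The counter breaks ties so equal-priority tasks keep their submit order.
        heapq.heappush(self._heap, (priority, next(self._counter), task_id))

    def next_task(self) -> Optional[str]:
        # Return the most urgent task id, or None when the queue is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = TaskQueue()
queue.submit("transcode-a", priority=5)
queue.submit("live-stream-b", priority=1)
queue.submit("transcode-c", priority=5)

print(queue.next_task())  # live-stream-b
print(queue.next_task())  # transcode-a
```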

All platform events are logged in a Kafka journal.

We manage the business logic of the platform through the QBIT (internal name) microservice. That’s where pipeline configuration and flow processing happen.

QBIT manages Kafka messages and makes changes to the DB accordingly.

Powering an outside communication layer, we have implemented an independent RTP media server for live stream ingestion.

To top it all off, we have created the ZOOLU (internal name) microservice that monitors Kubernetes and the ROVAR microservice that builds dynamic web visualizers for any type of media cognitive automation output.

But all of that is just a scattering of tech terms. Let’s see how the entire process flows.

The system receives a new process request;
The process gets registered in the Kafka journal;
The pipeline downloads the video asset which the customer has provided;
The downloaded file is transcoded into several files. Depending on the pipeline configuration, the files get different resolutions;
The pipeline creates a proxy to adapt the media for web UI visualization;
The platform creates a task for pipeline stages.
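The steps above can be sketched as a single function that records each event in order. This is a simulation under stated assumptions, not platform code: the function name `run_process`, the request fields (`asset_url`, `stages`), the event names, and the file naming are all hypothetical, and a plain list stands in for the Kafka journal.

```python
from typing import Dict, List


def run_process(request: Dict, resolutions: List[int]) -> Dict:
    """Walk one request through the six steps, recording each event in order."""
    journal: List[str] = []  # stand-in for the Kafka journal

    # Steps 1-2: receive the request and register the process.
    journal.append("process_registered")

    # Step 3: download the customer's asset (simulated: keep only the file name).
    asset = request["asset_url"].rsplit("/", 1)[-1]
    journal.append("asset_downloaded")

    # Step 4: transcode into one file per configured resolution.
    renditions = [f"{asset}.{height}p" for height in resolutions]
    journal.append("asset_transcoded")

    # Step 5: create a proxy for web UI visualization.
    proxy = f"{asset}.proxy"
    journal.append("proxy_created")

    # Step 6: create one task per pipeline stage.
    tasks = [{"stage": stage, "inputs": renditions} for stage in request["stages"]]
    journal.append("tasks_created")

    return {"journal": journal, "renditions": renditions, "proxy": proxy, "tasks": tasks}


result = run_process(
    {"asset_url": "https://example.com/media/match.mp4",
     "stages": ["representations", "decisions"]},
    resolutions=[360, 720],
)
print(result["journal"])
```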

At the end of the workflow, depending on the pipeline configuration, you get a JSON file that contains metadata on highlight-worthy scenes, specific events, or time-markers that can be used for post-production, and so on.

After that is done, the EFS is automatically cleaned and the reusable files get moved to S3.

So what’s next?

You can take that JSON file to your infrastructure and use third-party software to further process it, or even use it in other CognitiveMill™ pipelines.
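As an example of such downstream processing, the sketch below reads a JSON document and converts selected segments into timecodes an editor could use. The schema is invented for illustration: the article does not specify the output format, so the `segments`, `type`, `start_ms`, and `end_ms` fields are assumptions, as are the helper names.

```python
import json
from typing import List, Tuple

# Hypothetical output; the real schema depends on the pipeline configuration.
SAMPLE_OUTPUT = """
{
  "segments": [
    {"type": "highlight", "start_ms": 12000, "end_ms": 27500},
    {"type": "credits", "start_ms": 5400000, "end_ms": 5520000},
    {"type": "highlight", "start_ms": 61000, "end_ms": 75000}
  ]
}
"""


def to_timecode(ms: int) -> str:
    """Format milliseconds as HH:MM:SS.mmm."""
    total_seconds, millis = divmod(ms, 1000)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def segments_of_type(raw: str, kind: str) -> List[Tuple[str, str]]:
    """Return (start, end) timecodes for every segment of the given type."""
    data = json.loads(raw)
    return [
        (to_timecode(seg["start_ms"]), to_timecode(seg["end_ms"]))
        for seg in data["segments"]
        if seg["type"] == kind
    ]


for start, end in segments_of_type(SAMPLE_OUTPUT, "highlight"):
    print(f"cut from {start} to {end}")
```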

There is also nothing to tinker with to get CognitiveMill™ to work: you can easily create new processes through the API or web UI.

Apart from processing media content, CognitiveMill™ also offers options for monitoring, administration, and scaling the production for each customer.

Answering the media and entertainment demand

CognitiveMill™ offers a new way of content production and management for OTT platforms, broadcasters, TV channels, telecoms, media producers, sports leagues, and everybody else involved in video content generation.

Our product solves a variety of problems that media and entertainment companies come across on a daily basis. To meet the demand for solutions to these issues, we offer a platform that includes six products:

CognitiveReelz™ — creates sports highlights compilations automatically;
CognitiveSkip™ — enables viewers to skip side content and content creators to insert targeted and intelligent ads, as well as correct EPG;
CognitiveCrop™ — crops the video from landscape to portrait aspect ratio prevalent on social media;
CognitiveCast™ — identifies the cast members of the movie, marking main and secondary characters based on the plot;
CognitiveShapes™ — recognizes moving graphics based on your sample in live and recorded video content;
CognitiveNude™ — detects inappropriate content in videos and marks the segments in the timeline.

With this roster of tools, you can increase the effectiveness of video content production and management, automating the most trivial and mundane workflows.

Have a human shoot a video, and get a robot to process it. Sounds like a great deal to me.

Conclusion

It’s great and all, but what about the benefits the technology brings to the table for media and entertainment companies?

Let’s roll the numbers then.

Here’s what we managed to achieve in less than one year:

We built over 50 proprietary AI algorithms that are packed in more than 15 cognitive automation pipelines;
We cover 20+ business cases with our cognitive computing products;
Cognitive Mill™ is miles ahead of humans in terms of speed: it processes video content 50 times faster;
As of now, Cognitive Mill™ analyzes over 100 hours of content on a daily basis. Having analyzed market demand, we are now striving to reach 2,000 hours of media per day.

With Cognitive Mill™ you get an automation solution that delivers the best video production features serving any purpose you might need.

Frankly speaking, the sky's the limit here.

Unlike regular AI that some might use for video content analysis, our cognitive computing cloud platform can be easily adapted to cover even more business cases. All of our pipelines can be customized and adapted to your use case in a very short time.

Much shorter than the time it takes to retrain an AI model.

You can begin with cognitive business automation today: start by seeing how Cognitive Mill™ works.