Computer vision and machine learning have the potential to be real game-changers in surgery and healthcare. In this article, I'm going to outline how this technology has been integrated so far, how it's overcoming the hurdles of the past, and where it's leading us!

This is a pretty weighty topic, but I'm going to break it down into a few easy-to-digest talking points.

Before we look into how this technology can really act as a solution to problems, let’s zoom in closer on the challenges. Historically, invasive surgery has had some pretty drastic long-term effects on patients.

Interested in seeing more? Download Max's presentation below.

Warning: The presentation contains some graphic images of surgery.

Computer Vision meets Robotics - The Future of Surgery
Max Allan, Senior Computer Vision Engineer at Intuitive, describes groundbreaking robotics innovations in surgery and the healthcare industry.

Invasive procedures

Retropubic prostatectomy

Here, a surgeon will make a 10 to 15-centimeter incision from your belly button down to your pelvis. The aim is to remove the prostate and the lymph nodes. In the case of prostate surgery, it's normal to need around six weeks of recovery after the procedure before you can go back to a normal life. That's pretty debilitating, right?


Mandibulotomy

A surgeon needs to be able to access the oral cavity and the pharynx to deal with oropharyngeal tumors. The surgeon has to split open a patient's jaw, lip, and tongue in order to open up the face.

It sounds drastic, I know, but it’s necessary to get both of their hands inside. Even after a year of reconstructive surgery, you’re looking at lifelong consequences of significant facial scarring.

Median sternotomy

This is a very common cardiothoracic operation. A surgeon will make a large incision in the middle of your chest, cutting through the sternum and opening up the ribcage. This is so they can access the heart and lungs.

Obviously, it is an enormously invasive operation. It typically takes around three months before a person can get back to basic mobility after this operation. But even then, you're not out of the woods: you'll need around six months to get back to any semblance of normal life.

The main takeaway

No, I'm not describing this graphic stuff just to gross you out. The fact is, massively invasive operations can take a lot from patients. Just imagine this kind of trauma occurring to your body.

Imagine the long-term consequences of this kind of trauma: patients suffer long recovery times, post-operative pain, scarring, and blood loss.

In the most extreme circumstances, a surgeon needs to simply be able to get their hands directly on the diseased tissue. There isn’t really much concern for the long-term discomfort of the patient at all. So, what this has led to is…

Minimally invasive surgery

This has been around for about 30 to 40 years, and it's fairly common. Around 35% of surgeries are minimally invasive. Rather than making enormous incisions in a patient's body, instead, you make small keyhole ports.


The surgeon slides long, thin tools called 'laparoscopic instruments' through these ports, along with a laparoscopic camera through another port. They then treat the patient's tissue in a more remote way.

In the image on the left-hand side, you can see the surgeon observing what's going on inside the patient on a monitor.

Minimally invasive endoscopic prostatectomy

We're replacing that 10 to 15-centimeter incision down your abdomen with four or five single-centimeter ports. This cuts down the recovery time from six weeks to something more like four weeks. A significant reduction in recovery time!

Transoral endoscopic ultrasonic surgery

This is as opposed to the very graphic mandibulotomy I described in the first section. Rather than cutting open a patient's face, jaw, lips, and tongue, all the instruments and the cameras slide directly into the mouth. The end result of this is that the patient has no external scarring on their face.

Lifelong consequences, such as changes to appearance, are very meaningful. When we talk about these long-term consequences, we're not just talking about physical damage to the patient; we're also talking about potentially long-lasting psychological damage.

Thoracoscopic surgery

Rather than opening up your ribcage, the instruments and the camera pass through the intercostal space between your ribs. Obviously, if we can carry out a procedure on a patient without having to break open their rib cage, that's a massive plus.

If you weren’t convinced that the last procedure I showed was significantly better, this one cuts down the patient recovery time from months to weeks! It goes without saying that this is a massive game-changer in terms of the long-term impact on a patient’s quality of life.

Challenges in minimally invasive surgery


This might lead you to ask, 'Why aren't we doing all surgeries minimally invasively?' Well, it's because minimally invasive surgery is very challenging. In complicated procedures, such as when the diseased tissue is fragile and sits close to blood vessels, the operation can be very complex.

Patients may not be eligible for minimally invasive surgery. This is especially true of overweight or elderly patients who may not be suitable candidates for this type of technique. So, why is minimally invasive treatment so difficult? Let’s go through some of the reasons below. 👇

The design of the instruments

Surgeons manipulate the tissue by passing surgical instruments through a keyhole port. As they move an instrument around, it pivots around the port, so their hand motions are mirrored and scaled at the instrument tip. It's a complicated manoeuvre, and the cognitive strain, plus the need for additional surgeons, makes complicated operations harder to perform.

Tremors in the surgeon's hand

The margin for error is very slight, and the potential ramifications for a very slight error can be monumental.

The smallest tremor can be significant when you get down to the tip of the instrument. If you're working close to structures such as a blood vessel, or an artery, even a small mistake can be fatal for a patient.

The instruments themselves are fairly simplistic

Typically, the instruments used have single-degree-of-freedom manipulators, and oftentimes they're simple cut-and-grab tools. This means that simple operations, like picking up and manipulating tissue, are fairly straightforward.

But if you want to do something more complex, something that requires more dexterous control of the tissue, it's very hard with these sorts of instruments. Suturing is a good example: each stitch demands looping motions with the needle and thread.

It's very hard to accomplish that type of motion with these instruments. Only very skilled or experienced surgeons are able to do complex procedures with minimally invasive instruments.

Depth perception

This is a challenge probably no one would ever think of. When a surgeon looks at someone's anatomy with their naked eyes, they have depth perception. But when you're looking at a flat TV monitor across the room, depth perception becomes compromised.

Of course, the ability to perceive depth is fairly obviously linked to surgical skill. There are tools you can use to improve depth perception, such as 3D glasses, but they complicate the setup of the operating room. Unsurprisingly, this is not something that surgical teams typically like.
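As a small aside on why stereo viewing helps: with two cameras, depth can be recovered from the horizontal offset (the disparity) of a feature between the left and right images. Here's a minimal sketch of the triangulation formula, with made-up camera numbers rather than values from any real endoscope:

```python
# Stereo triangulation: depth Z = f * B / d, where f is the focal length in
# pixels, B the baseline between the two cameras, and d the disparity in
# pixels. Depth grows as disparity shrinks. All numbers are illustrative.
def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Return the metric depth implied by a stereo disparity."""
    return focal_px * baseline_mm / disparity_px

# With a hypothetical 1000 px focal length and a 5 mm stereo baseline:
print(depth_from_disparity(1000.0, 5.0, 50.0))   # 100.0 mm away
print(depth_from_disparity(1000.0, 5.0, 100.0))  # 50.0 mm away
```

A single monitor throws the disparity cue away entirely, which is exactly what a stereo viewer restores.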

So, we've gone through some pretty significant challenges. Now let's look at some solutions and see how engineers started solving this problem around 20 years ago.

Robotic minimally invasive surgery


There are typically two systems we look at when we get into this subject, both used at the dawn of the 21st century: ZEUS and the Da Vinci system. These two merged to create what we have today 👇

The history of the two systems

Both systems were teleoperated. As you can see in the image on the right, the surgeon sits apart from the patient at a console, where they control the robot remotely. The designs of the two systems are fairly different: ZEUS placed the arms of the robot on the patient bed itself, whereas the Da Vinci surgical system placed the arms on a single tower.

The advantage of the Da Vinci design is that it keeps one side of the patient's bed free for surgical assistants to access the patient.

These were the first two systems cleared by the FDA in the early 2000s, and the two companies actually merged shortly after. The ZEUS design was replaced by the Da Vinci Surgical System.

The Da Vinci XI system today

This is the most popular system today. It's been around since 2014, but the design hasn't changed enormously. Structurally, it's still pretty much the same system.

Patient cart

There are two tower-mounted arms held up on a boom. The advantage of this sort of design is that it allows the arms to be quickly repositioned around the patient for different types of operations. It really speeds up operations, and it means you don't have to spend as much time moving the robot around yourself.

Surgeon cart and manipulators

This is where the surgeon sits, and they put their eyes into this small viewer that you see right in the middle of the cart. In their hands, they hold the manipulators. These sit in the middle of the surgeon's cart.

Advantages of this system

It’s a popular, long-lasting system for a reason. The system allows surgeons to overcome many different problems they had in the past. Let’s break them down here.

Stereo HD viewer

In the past, surgeons had very limited depth perception with the monitors they used. The advantage of this viewer is that even those with relatively poor depth perception are able to conduct this kind of surgery. This obviously broadens the availability of this kind of surgery quite considerably and means that more patients are potentially able to access it.

Surgeons are seated in a comfortable position.

This might seem like only a small thing, but you have to remember that in the past surgeons were often carrying out these kinds of procedures standing up for as long as 8 hours.

To broaden the availability of this kind of procedure as much as possible, we obviously want to maximize and extend the career of a surgeon. This isn’t going to happen if surgeons are burning out by the time they reach their 40s.

The positioning of the viewer

The way that the system is designed means that the surgeon is always looking slightly down through the viewer. Their hands are visually coupled with the instruments on the display in front of them. It creates a strong immersive feeling for the surgeon.

The surgeon feels that their hands are where the instruments are. The advantage of this is that we don't necessarily need the most dexterous surgeons performing the procedure.

Provides greater dexterity

The newest version of this system, the SPX, allows for motions that are not too dissimilar from how a surgeon's wrists would move. It’s able to access very narrow spaces, which makes it a very useful system for procedures like natural orifice surgery, for example.

It can pass through a single orifice of the patient's body and perform an operation without creating any external scarring.

The impact of this technology today

So, we’ve gone through how machine learning and computer vision can optimize surgery, and how they can have an impact on healthcare as a whole. The potential for computer vision and machine learning goes well beyond that point of care where the robot is just interacting with the surgeon during an operation.


Pre-Operative and Intra-Operative

We can actually bring in data sources from the whole patient journey, and this starts preoperatively. Before surgery, the patient might go through diagnostic imaging where a surgical plan is built. There's a lot of data from this point in time that isn't currently being used.

At this stage, we can use the image and the system data. There's potential to build a lot of useful applications that can actually impact the surgery as it happens.



Post-operative

Computer vision and machine learning can also be used post-operatively, once a patient has returned home, to understand how that patient recovers from the operation. This is very important for enabling surgeons to understand which surgical techniques actually have patient impact.

This is really one of the significant challenges of surgery. It can be very hard to gather enough data to show that a given technique is working, and that the surgeon is actually creating benefits for the patient so that they recover more quickly and have fewer complications. This post-operative data is extremely valuable.

Personalized planning

Personalized patient planning with IRIS

The medical corporation Intuitive has a product called IRIS. It connects to a hospital's PACS (picture archiving and communication system), which contains the medical imaging data for a given patient. It allows a surgeon to request a segmentation of pre-operative imaging: you could ask for a segmentation of the tumors, organ structures, and blood vessels.


The surgeon can then validate that segmentation when they see it and visualize it in a 3D volume rendering environment on a mobile device. This enables them to actually plan the operation before they start. With the segmentation, surgeons can understand the relationship of blood vessels to tumors.
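Intuitive hasn't published how IRIS computes its segmentations (real products use trained neural networks), so purely as an illustration of what "segmentation" means here: the simplest possible CT segmentation is an intensity threshold on Hounsfield units, turning a 3D image into a binary mask of a structure.

```python
import numpy as np

def threshold_segment(volume_hu, lo, hi):
    """Toy segmentation: mark voxels whose Hounsfield value falls in [lo, hi].
    This only illustrates the input/output shape of the problem; production
    systems use learned models, not fixed thresholds."""
    return (volume_hu >= lo) & (volume_hu <= hi)

# Synthetic CT volume: air background (-1000 HU) with a soft-tissue blob (~40 HU).
vol = np.full((32, 32, 32), -1000.0)
vol[10:20, 10:20, 10:20] = 40.0
mask = threshold_segment(vol, lo=0.0, hi=100.0)
print(mask.sum())  # 1000 voxels segmented (the 10x10x10 blob)
```

The resulting binary masks, one per tumor, organ, or vessel, are what get turned into the 3D surfaces the surgeon validates and views.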

That’s a very important component of building a comprehensive plan for a patient. You need to understand the size of different structures in the patient's anatomy. It allows them to know what they're gonna do before they do the operation. It's also important for communicating to patients how an operation is going to happen.


This is a crucial aspect of surgeon-patient communication, but it also allows medical professionals to communicate the plan to other members of the staff. Right now this product is cleared for CT and kidneys. The plan for the future is that it will eventually support other modalities and other organs.

This is the start of a major paradigm shift in the medical community and the scope of it is only going to broaden over time.

How do we integrate this?

One of the big challenges that's existed in surgery is that it's very difficult to integrate preoperative planning data so that the surgeon can make use of it intra-operatively.

Traditionally, when a surgeon has a surgical plan, for them to actually bring that into their workflow, they may have to wheel a display monitor into the cart with them.

Then, when they're doing the operation, they may have to pull up the 3D model and manipulate it with a mouse and keyboard to get it into roughly the right orientation, so that what they're looking at corresponds with what the 3D volume rendering shows them. This is very time-consuming for a surgeon.


What you find with surgeons is that if it makes their workflow more complex, they just won't do it. I've even seen surgeons print out a super low-res rendering of a 3D model and just stick it to the side of the console.

One of the really great things about a platform like Da Vinci is that it has a camera system, and with the stereo HD viewer it becomes possible to effectively build an augmented reality system. To do this, you're basically taking these pre-operative segmented models and aligning them with the surgical video.

Manual integration

We have a tool where the surgeon can pick up the model with the master manipulators, and orient it with what they see. It's important to have this type of manual tool because sometimes in surgery, a patient's organs may be almost completely covered with fat, which makes doing any kind of automated computer vision-based alignment impossible.

It can also be helpful to use computer vision to track the camera and the organ as the surgeon moves. This means you can automatically align the structures so that the pre-operative model visually lines up with what's there under the tissue.
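The alignment step itself is a classic rigid registration problem. As a sketch only, and assuming point correspondences between the pre-operative model and points reconstructed from the stereo video are already known (establishing those correspondences is the hard part in practice), the best-fit rotation and translation can be found with the Kabsch algorithm:

```python
import numpy as np

def rigid_align(model_pts, observed_pts):
    """Least-squares rigid registration (Kabsch algorithm): find rotation R
    and translation t such that observed ~= R @ model + t.
    Assumes known one-to-one point correspondences."""
    mu_m = model_pts.mean(axis=0)
    mu_o = observed_pts.mean(axis=0)
    H = (model_pts - mu_m).T @ (observed_pts - mu_o)  # 3x3 covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_o - R @ mu_m
    return R, t

# Toy check: rotate a model 30 degrees about z, shift it, recover the transform.
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
model = np.random.default_rng(0).normal(size=(50, 3))
observed = model @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = rigid_align(model, observed)
print(np.allclose(R, R_true))  # True
```

A real system would run something like this continuously (e.g. inside an ICP loop with tracked features) and also handle tissue deformation, which a rigid transform alone cannot capture.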

Clinically defined stages

You can also break down operations into clinically defined stages. Think about those cooking tutorials on YouTube where the cooking process is broken down into different parts. It's also possible to break down an operation into multiple phases where a surgeon is performing a particular task.

Doing this involves taking representations of the images and learning from the robot data to understand exactly what a surgeon is doing at a given point in time. Let's say, for example, the surgeon is clamping blood vessels, which is common in a lot of tumor surgery.

For this, you’re going to want to bring up a visualization of the segmentation of the blood vessels. This lets the surgeon know the branching structure of the vessels, so they know which vessels to block off blood to.
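Real phase-recognition systems are learned temporal models over video and robot kinematics, and the details here aren't from any specific product. As a toy illustration of one idea such systems rely on, temporal context, here's a sliding-window majority vote that cleans up hypothetical noisy per-frame phase labels:

```python
from collections import Counter

def smooth_phases(frame_preds, window=5):
    """Smooth per-frame surgical-phase predictions with a sliding-window
    majority vote. A single misclassified frame gets outvoted by its
    neighbors, so the predicted phase only changes when it persists."""
    half = window // 2
    smoothed = []
    for i in range(len(frame_preds)):
        lo, hi = max(0, i - half), min(len(frame_preds), i + half + 1)
        smoothed.append(Counter(frame_preds[lo:hi]).most_common(1)[0][0])
    return smoothed

# Hypothetical per-frame labels: one spurious 'clamping' frame mid-dissection,
# followed by a genuine transition into the clamping phase.
preds = ["dissection"] * 4 + ["clamping"] + ["dissection"] * 4 + ["clamping"] * 5
print(smooth_phases(preds))
```

Once the system is confident the surgeon has entered the clamping phase, that's the trigger for surfacing the vessel segmentation automatically.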

Final thoughts

The most crucial thing to take away from all this is that computer vision and machine learning in medical procedures is not a benefit that’s confined to the medical space. It’s something that’s going to impact society at large.

Needless to say, surgery is necessary, but if we’re being truthful, it’s no fun at all. It’s extremely traumatic for the patient and the families of the patient. Our goal is that machine learning and computer vision can be the greatest facilitator for minimally invasive surgery at large. Not just that, but for better practice in general. We’ve already made such great strides and we’re just getting started!