
Mailing List Archive: Xen: Devel

Load increase after memory upgrade (part2)


carsten at schiers

Nov 24, 2011, 4:28 AM

Post #1 of 66
Load increase after memory upgrade (part2)

Hello again, I would like to come back to this issue; sorry that I did not have the time until now.

We are now talking about:

* Xen 4.1.2
* Dom0 is Jeremy's 2.6.32.46 64 bit
* DomU in question is now 3.1.2 64 bit
* Same thing if DomU is also 2.6.32.46
* DomU owns two PCI cards (DVB-C) that do DMA
* Machine has 8GB, Dom0 pinned at 512MB

Compared to the 2.6.34 kernel with backported patches, the load on the DomU is at least twice as high. It becomes "close to normal" if I reduce the memory to 4GB.

As you can see from the attachment, you once had an idea. Shall we try to find the cause?

Carsten.

-----Original Message-----
To: konrad.wilk <konrad.wilk [at] oracle>;
CC: linux <linux [at] eikelenboom>; xen-devel <xen-devel [at] lists>;
From: Carsten Schiers <carsten [at] schiers>
Sent: Wed 29.06.2011 23:17
Subject: Re: Re: Re: Re: Re: Re: [Xen-devel] Re: Load increase after memory upgrade?
> Let's first do the c) experiment, as that will likely explain your load average increase.
...
> >c). If you want to see if the fault here lies in the bounce buffer being used more
> >often in the DomU b/c you have 8GB of memory now and you end up using more pages
> >past 4GB (in DomU), I can cook up a patch to figure this out. But an easier way is
> >to just do (on the Xen hypervisor line): mem=4G and that will make it think you only have
> >4GB of physical RAM. If the load comes back to the normal "amount", then the likely
> >culprit is that and we can think about how to fix this.

You are on the right track. Load went down to the "normal" 10% when I limited
Xen to 4GB with that parameter. Load still seems to be a tiny bit lower
with the Xenified kernel (8-9%), but this is drastically lower than the 20% we had
before.


konrad at darnok

Nov 25, 2011, 10:42 AM

Post #2 of 66
Re: Load increase after memory upgrade (part2)

On Thu, Nov 24, 2011 at 01:28:44PM +0100, Carsten Schiers wrote:
> Hello again, I would like to come back to this issue; sorry that I did not have the time until now.
>
> We are now talking about:
>
> * Xen 4.1.2
> * Dom0 is Jeremy's 2.6.32.46 64 bit
> * DomU in question is now 3.1.2 64 bit
> * Same thing if DomU is also 2.6.32.46
> * DomU owns two PCI cards (DVB-C) that do DMA
> * Machine has 8GB, Dom0 pinned at 512MB
>
> Compared to the 2.6.34 kernel with backported patches, the load on the DomU is at least twice as high. It
> becomes "close to normal" if I reduce the memory to 4GB.

Is that in dom0, or just in general on the machine?
>
> As you can see from the attachment, you once had an idea. Shall we try to find the cause?

I think that idea was to instrument swiotlb to see how
often it is called and basically get a metric of its load, and
from there figure out whether the issue is that:

1). The drivers allocate/bounce/deallocate buffers on every interrupt
(bad: the driver should be using some form of DMA pool, and most of
ivtv does that).

2). The buffers allocated to the drivers are above 4GB and we end
up bouncing them needlessly. That can happen if dom0 has most of
the precious memory under 4GB. However, that is usually not the case,
as the domain is usually allocated from the top of memory. The
fix for that was to set dom0_mem=max:XX ... but with Dom0 kernels
before 3.1, the parameter would be ignored, so you had to use
'mem=XX' on the Linux command line as well.

3). Where did you get the load values? Was it dom0 or domU?
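To make the instrumentation idea concrete, here is a user-space simulation (not kernel code) that counts how often a streaming map is entered and how often it actually has to bounce. The names `struct bounce_stats`, `sim_map_single()` and `per_irq_mapping()`, and the 32-bit device limit, are hypothetical; the real swiotlb paths are more involved (for a device-to-memory transfer the copy happens at unmap/sync time, not at map time).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters, in the spirit of instrumenting swiotlb. */
struct bounce_stats {
    unsigned long map_calls; /* times the streaming map path was entered */
    unsigned long bounced;   /* map calls that needed a bounce buffer */
    uint64_t bytes_copied;   /* bytes copied through the bounce buffer */
};

/* Assumption: the device can only address the first 4 GiB (32-bit DMA). */
#define DEV_LIMIT (1ULL << 32)

/* Simulated streaming map: count one bounce whenever any byte of the
 * buffer lies at or above the device limit. */
static void sim_map_single(struct bounce_stats *s, uint64_t bus_addr, size_t len)
{
    assert(len > 0);
    s->map_calls++;
    if (bus_addr + len > DEV_LIMIT) {
        s->bounced++;
        s->bytes_copied += len;
    }
}

/* Case 1): the driver maps a buffer on every interrupt. If that buffer is
 * above 4 GiB (case 2), every single interrupt pays for a copy. */
static void per_irq_mapping(struct bounce_stats *s, uint64_t bus_addr,
                            size_t len, unsigned long irqs)
{
    for (unsigned long i = 0; i < irqs; i++)
        sim_map_single(s, bus_addr, len);
}
```

Reading the two counters together is what separates the cases: a high `map_calls` rate points at 1), while `bounced` close to `map_calls` points at 2).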




_______________________________________________
Xen-devel mailing list
Xen-devel [at] lists
http://lists.xensource.com/xen-devel


carsten at schiers

Nov 25, 2011, 2:11 PM

Post #3 of 66
Re: Load increase after memory upgrade (part2)

I got the values in DomU. I see:

- approx. 5% load in DomU with the 2.6.34 Xenified kernel
- approx. 15% load in DomU with Jeremy's 2.6.32.46 or the 3.1.2 kernel with one card attached
- approx. 30% load in DomU with Jeremy's 2.6.32.46 or the 3.1.2 kernel with two cards attached

I looked through your old mails, and you already explained the need for double
bounce buffering (PCI -> below 4GB -> above 4GB). What I don't understand is: why does the
Xenified kernel not have this kind of issue?

The driver in question is nearly identical between the two kernel versions. It is in
drivers/media/dvb/ttpci, by the way, and if I understood the code correctly, the allocation in
question is:

        /* allocate and init buffers */
        av7110->debi_virt = pci_alloc_consistent(pdev, 8192, &av7110->debi_bus);
        if (!av7110->debi_virt)
                goto err_saa71466_vfree_4;

Is that right? I think the cards are constantly transferring the received stream via DMA.

By the way, I have set dom0_mem=512M; should I change that in some way?

I can try out some things if you want me to, but I have no idea what to do or where to
start, so I am relying on your help.

Carsten.



carsten at schiers

Nov 26, 2011, 1:14 AM

Post #4 of 66
Re: Load increase after memory upgrade (part2)

To add (read from some munin statistics I have collected over time):

- by load I mean the %CPU of xentop
- there is no change in the CPU usage of the DomU or Dom0
- xenpm shows that the core dedicated to that DomU is doing more work

I should also mention that the reduction to 4GB was done with the Xen parameter.

Carsten.




konrad at darnok

Nov 28, 2011, 7:28 AM

Post #5 of 66
Re: Load increase after memory upgrade (part2)

On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> I got the values in DomU. I see:
>
> - approx. 5% load in DomU with the 2.6.34 Xenified kernel
> - approx. 15% load in DomU with Jeremy's 2.6.32.46 or the 3.1.2 kernel with one card attached
> - approx. 30% load in DomU with Jeremy's 2.6.32.46 or the 3.1.2 kernel with two cards attached

HA!

I wonder whether the issue is that the reporting of CPU time spent is wrong.
Laszlo Ersek and Zhenzhong Duan have both reported a bug in the pvops
code's accounting of CPU time.
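Since the question is whether the reported CPU figure can be trusted, here is a sketch of how a utilisation percentage is typically derived from two samples of a domain's cumulative CPU time. It assumes xentop works roughly this way (cumulative nanoseconds per domain, sampled at two points in time); `cpu_percent()` is a hypothetical helper. Any error in the cumulative counter shows up directly in the percentage.

```c
#include <assert.h>
#include <stdint.h>

/* Utilisation over one sampling interval, in percent, computed from a
 * domain's cumulative CPU time and the wall-clock time, both in ns.
 * With several vCPUs the result can exceed 100. */
static double cpu_percent(uint64_t cpu_ns_prev, uint64_t cpu_ns_now,
                          uint64_t wall_ns_prev, uint64_t wall_ns_now)
{
    assert(cpu_ns_now >= cpu_ns_prev);  /* the counter is cumulative */
    assert(wall_ns_now > wall_ns_prev); /* the interval must be non-empty */
    return 100.0 * (double)(cpu_ns_now - cpu_ns_prev) /
           (double)(wall_ns_now - wall_ns_prev);
}
```

For example, 150 ms of CPU time consumed during a 1 s interval gives 15%; scaling by 1000 instead of 100 gives per mille.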

>
> I looked through your old mails, and you already explained the need for double
> bounce buffering (PCI -> below 4GB -> above 4GB). What I don't understand is: why does the
> Xenified kernel not have this kind of issue?

That is a puzzle. It should not. The code is very much the same: both
use the generic SWIOTLB, which has not changed for years.
>
> The driver in question is nearly identical between the two kernel versions. It is in
> drivers/media/dvb/ttpci, by the way, and if I understood the code correctly, the allocation in
> question is:
>
>         /* allocate and init buffers */
>         av7110->debi_virt = pci_alloc_consistent(pdev, 8192, &av7110->debi_bus);

Good. So it allocates the buffer during init and keeps using it.
>         if (!av7110->debi_virt)
>                 goto err_saa71466_vfree_4;
>
> Is that right? I think the cards are constantly transferring the received stream via DMA.

Yeah, and that memory is set aside for the life of the driver, so there
should be no bounce buffering happening (as the memory was allocated
below the 4GB mark).
>
> By the way, I have set dom0_mem=512M; should I change that in some way?

Does the reporting (CPU usage of DomU) change in any way with that?
>
> I can try out some things if you want me to, but I have no idea what to do or where to
> start, so I am relying on your help.
>
> Carsten.


konrad at darnok

Nov 28, 2011, 7:30 AM

Post #6 of 66
Re: Load increase after memory upgrade (part2)

On Sat, Nov 26, 2011 at 10:14:08AM +0100, Carsten Schiers wrote:
> To add (read from some munin statistics I have collected over time):
>
> - by load I mean the %CPU of xentop
> - there is no change in the CPU usage of the DomU or Dom0

Uhh, which metric are you using for that? CPU usage? Is this when you
change the DomU or the amount of memory the guest has? Is this not
the load number (the xentop value)?

> - xenpm shows that the core dedicated to that DomU is doing more work
>
> I should also mention that the reduction to 4GB was done with the Xen parameter.
>
> Carsten.


Ian.Campbell at citrix

Nov 28, 2011, 7:40 AM

Post #7 of 66
Re: Load increase after memory upgrade (part2)

On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:

> > I looked through your old mails, and you already explained the need for double
> > bounce buffering (PCI -> below 4GB -> above 4GB). What I don't understand is: why does the
> > Xenified kernel not have this kind of issue?
>
> That is a puzzle. It should not. The code is very much the same: both
> use the generic SWIOTLB, which has not changed for years.

The swiotlb-xen used by classic-Xen kernels (which I assume is what
Carsten means by "Xenified") isn't exactly the same as the code in
mainline Linux; for one thing, it has been heavily refactored. It's not
impossible that mainline is bouncing something it doesn't really need
to.

It's also possible that the DMA mask of the device is different/wrong in
mainline, leading to such additional bouncing.
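The DMA-mask point can be made concrete with a small user-space sketch. `DMA_BIT_MASK()` below mirrors the kernel macro of that name; `sim_dma_capable()` is a hypothetical simplification of the "can the device reach this range?" check that decides whether swiotlb has to bounce. Under Xen PV the address being tested is the machine (bus) address rather than the guest's pseudo-physical one, so a DomU buffer can need bouncing even if it looks low from inside the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical simplified check: can a device with this mask reach every
 * byte of [addr, addr + size) directly? If not, the buffer must bounce. */
static bool sim_dma_capable(uint64_t mask, uint64_t addr, size_t size)
{
    assert(size > 0);
    return addr + size - 1 <= mask;
}
```

A mask that is narrower than the hardware really supports (for example 32 bits on a device that can do more) turns every buffer above that limit into a bounce, which is the kind of "different/wrong mask" effect described above.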

I guess it's also possible that the classic-Xen kernels are playing fast
and loose by not bouncing something they should (although, if so, they
appear to be getting away with it...), or that there is some difference
which really means mainline needs to bounce while classic-Xen doesn't.

Ian.





carsten at schiers

Nov 28, 2011, 7:52 AM

Post #8 of 66
Re: Load increase after memory upgrade (part2)

Hi,

let me try to explain a bit more. Here you see the output of my xentop munin graph for one
week. Look only at the bluish bump. Notice the small step in front of it? The graph shows the CPU
per mille used by the DomU that owns the cards. The small step is from when I had put in only
one PCI card. Afterwards the load is constantly and noticeably higher. Note that Dom0 (green) is
not affected. As you can see, I am back on the Xenified kernel.

[Image: xentop munin graph, one week]

The next picture shows the xenpm output visualized, so this might be an indicator that
something really is happening. It covers only the core that I dedicated to that DomU. By the way,
I have a three-core AMD CPU:

[Image: xenpm output for the core dedicated to the DomU]

In the CPU usage of Dom0, there is nothing to see:

[Image: Dom0 CPU usage]

In the CPU usage of the DomU, there is also not much to see, possibly a very slight change in the
mix:

[Image: DomU CPU usage]

There is a slight increase in sleeping jobs in the time slot in question, but I guess it is nothing
we can directly map to the issue:

[Image: sleeping jobs]

If you need other charts, I can try to produce them.

BR,
Carsten.

-----Ursprüngliche Nachricht-----
An:Carsten Schiers <carsten [at] schiers>; zhenzhong.duan [at] oracle; lersek [at] redhat;
CC:xen-devel <xen-devel [at] lists>; konrad.wilk <konrad.wilk [at] oracle>;
Von:Konrad Rzeszutek Wilk <konrad [at] darnok>
Gesendet:Mo 28.11.2011 16:33
Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2)
On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> I got the values in DomU. I will have
>
>   - aprox. 5% load in DomU with 2.6.34 Xenified Kernel
>   - aprox. 15% load in DomU with 2.6.32.46 Jeremy or 3.1.2 Kernel with one card attached
>   - aprox. 30% load in DomU with 2.6.32.46 Jeremy or 3.1.2 Kernel with two cards attached

HA!

I just wonder if the issue is that the reporting of CPU spent is wrong.
Laszlo Ersek and Zhenzhong Duan have both reported a bug in the pvops
code when it came to account of CPU time.

>
> I looked through my old mails from you and you explained already the necessity of double
> bounce buffering (PCI->below 4GB->above 4GB). What I don't understand is: why does the
> Xenified kernel not have this kind of issue?

That is a puzzle. It should not. The code is very much the same - both
use the generic SWIOTLB which has not changed for years.
>
> The driver in question is nearly identical between the two kernel versions. It is in
> Drivers/media/dvb/ttpci by the way and when I understood the code right, the allo in
> question is:
>
>         /* allocate and init buffers */
>         av7110->debi_virt = pci_alloc_consistent(pdev, 8192, &av7110->debi_bus);

Good. So it allocates it during init and uses it.
>         if (!av7110->debi_virt)
>                 goto err_saa71466_vfree_4;
>
> isn't it? I think the cards are constantly transferring the stream received through DMA.

Yeah, and that memory is set aside for the life of the driver. So there
should be no bounce buffering happening (as it allocated the memory
below the 4GB mark).
>
> I have set dom0_mem=512M by the way, shall I change that in some way?

Does the reporting (CPU usage of DomU) change in any way with that?
>
> I can try out some things, if you want me to. But I have no idea what to do and where to
> start, so I rely on your help...
>
> Carsten.
>
> -----Urspr?ngliche Nachricht-----
> Von: xen-devel-bounces [at] lists [mailto:xen-devel-bounces [at] lists] Im Auftrag von Konrad Rzeszutek Wilk
> Gesendet: Freitag, 25. November 2011 19:43
> An: Carsten Schiers
> Cc: xen-devel; konrad.wilk
> Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2)
>
> On Thu, Nov 24, 2011 at 01:28:44PM +0100, Carsten Schiers wrote:
> > Hello again, I would like to come back to that thing...sorry that I did not have the time up to now.
> >
> > ??
> > We (now) speak about
> >
> > ??
> > *Xen 4.1.2
> > *Dom0 is Jeremy's 2.6.32.46 64 bit
> > *DomU in question is now 3.1.2 64 bit
> > *Same thing if DomU is also 2.6.32.46
> > *DomU owns two PCI cards (DVB-C) that o DMA
> > *Machine has 8GB, Dom0 pinned at 512MB
> >
> > ??
> > As compared to 2.6.34 Kernel with backported patches, the load on the DomU is at least twice as high. It
> >
> > will be "close to normal" if I reduce the memory used to 4GB.
>
> That is in the dom0 or just in general on the machine?
> >
> > ??
> > As you can see from the attachment, you once had an idea. So should we try to find something...?
>
> I think that was to instrument swiotlb to give an idea of how
> often it is called and basically have a matrix of its load. And
> from there figure out if the issue is that:
>
>  1). The drivers allocoate/bounce/deallocate buffers on every interrupt
>     (bad, driver should be using some form of dma pool and most of the
>     ivtv do that)
>
>  2). The buffers allocated to the drivers are above the 4GB and we end
>     up bouncing it needlessly. That can happen if the dom0 has most of
>     the precious memory under 4GB. However, that is usually not the case
>     as the domain isusually allocated from the top of the memory. The
>     fix for that was to set dom0_mem=max:XX. .. but with Dom0 kernels
>     before 3.1, the parameter would be ignored, so you had to use
>     'mem=XX' on the Linux command line as well.
>
>  3). Where did you get the load values? Was it dom0? or domU?
>
>
>
> >
> > ??
> > Carsten.
> > ??
> > -----Urspr??ngliche Nachricht-----
> > An:konrad.wilk <konrad.wilk [at] oracle>;
> > CC:linux <linux [at] eikelenboom>; xen-devel <xen-devel [at] lists>;
> > Von:Carsten Schiers <carsten [at] schiers>
> > Gesendet:Mi 29.06.2011 23:17
> > Betreff:AW: Re: Re: Re: AW: Re: [Xen-devel] AW: Load increase after memory upgrade?
> > > Lets first do the c) experiment as that will likely explain your load average increase.
> > ...
> > > >c). If you want to see if the fault here lies in the bounce buffer
> > > being used more
> > > >often in the DomU b/c you have 8GB of memory now and you end up using
> > > more pages
> > > >past 4GB (in DomU), I can cook up a patch to figure this out. But an
> > > easier way is
> > > >to just do (on the Xen hypervisor line): mem=4G and that will make
> > > think you only have
> > > >4GB of physical RAM. ??If the load comes back to the normal "amount"
> > > then the likely
> > > >culprit is that and we can think on how to fix this.
> >
> > You are on the right track. Load was going down to "normal" 10% when reducing
> > Xen to 4GB by the parameter. Load seems to be still a little, little bit lower
> > with Xenified Kernel (8-9%), but this is drastically lower than the 20% we had
> > before.
>
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel [at] lists
> > http://lists.xensource.com/xen-devel
>
>
>
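Point 2) above turns on whether a buffer's machine address range fits under the device's DMA mask. A minimal user-space sketch of that test, loosely modelled on the kernel's dma_capable() check (the function name needs_bounce() is hypothetical, and the macro is only a copy of the kernel's DMA_BIT_MASK() so the example builds outside the kernel):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(), repeated here so the
 * example builds in user space. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Returns true if a buffer at machine address 'addr' spanning 'size' bytes
 * is not fully reachable by a device with DMA mask 'mask', i.e. a streaming
 * mapping would have to copy it through a bounce buffer below the mask. */
static bool needs_bounce(uint64_t addr, uint64_t size, uint64_t mask)
{
    assert(size > 0);
    return addr + size - 1 > mask;
}
```

On an 8GB machine, any buffer whose machine frames land above 4GB fails this test for a 32-bit device; booting Xen with mem=4G makes every frame pass it, which is consistent with the load dropping in that experiment.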



konrad.wilk at oracle

Nov 28, 2011, 8:45 AM

Post #9 of 66 (472 views)
Re: Load increase after memory upgrade (part2) [In reply to]

On Mon, Nov 28, 2011 at 03:40:13PM +0000, Ian Campbell wrote:
> On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
> > On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
>
> > > I looked through my old mails from you and you explained already the necessity of double
> > > bounce buffering (PCI->below 4GB->above 4GB). What I don't understand is: why does the
> > > Xenified kernel not have this kind of issue?
> >
> > That is a puzzle. It should not. The code is very much the same - both
> > use the generic SWIOTLB which has not changed for years.
>
> The swiotlb-xen used by classic-xen kernels (which I assume is what
> Carsten means by "Xenified") isn't exactly the same as the stuff in
> mainline Linux, it's been heavily refactored for one thing. It's not
> impossible that mainline is bouncing something it doesn't really need
> to.

The usage, at least with 'pci_alloc_coherent' is that there is no bouncing
being done. The alloc_coherent will allocate a nice page, underneath the 4GB
mark and give it to the driver. The driver can use it as it wishes and there
is no need to bounce buffer.
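The coherent path described here can be pictured with a toy allocator that hands out only frames lying entirely below the device's coherent mask, so the driver can give them straight to the device and no copy is ever needed. This is a simplified model, not the kernel's allocator; struct frame and alloc_coherent_frame() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096ULL

/* One page in a toy pool: its machine address and whether it is taken. */
struct frame {
    uint64_t maddr;
    bool used;
};

/* Returns the index of a free frame lying entirely below coherent_mask and
 * marks it used, or -1 if no such frame exists. The caller can give the
 * frame to the device directly: no bounce copy is required afterwards. */
static int alloc_coherent_frame(struct frame *pool, int n, uint64_t coherent_mask)
{
    assert(pool != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (!pool[i].used && pool[i].maddr + PAGE_SZ - 1 <= coherent_mask) {
            pool[i].used = true;
            return i;
        }
    }
    return -1;
}
```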

But I can't find the implementation of that in the classic Xen-SWIOTLB. It looks
as if it is using map_single which would be taking the memory out of the
pool for a very long time, instead of allocating memory and "swizzling" the MFNs.
[Note: I looked at the 2.6.18 hg tree for classic; 2.6.34 has probably
improved on that, so let me check it.]
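By contrast, a streaming mapping through a bounce pool reserves a low slot for as long as the mapping lives and copies data on every sync. The sketch below is a hypothetical model (bounce_pool, bounce_map() and the other helpers are not kernel API), assuming 2 KB slots like the SWIOTLB's IO TLB slabs; it shows why a device that DMAs into high memory and syncs constantly costs one memcpy() per sync.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 2048 /* mirrors the 2 KB IO TLB slab (IO_TLB_SHIFT = 11) */
#define NSLOTS 8

struct bounce_pool {
    unsigned char slot[NSLOTS][SLOT_SIZE]; /* low memory the device can reach */
    void *orig[NSLOTS];                    /* driver buffer shadowed, NULL if free */
    unsigned long copies;                  /* memcpy()s performed so far */
};

/* "map": reserve a slot for buf and return its index, or -1 if the pool is
 * full. The slot stays reserved until unmap, so long-lived mappings keep
 * memory out of the pool. */
static int bounce_map(struct bounce_pool *p, void *buf, size_t len)
{
    assert(len <= SLOT_SIZE);
    for (int i = 0; i < NSLOTS; i++) {
        if (p->orig[i] == NULL) {
            p->orig[i] = buf;
            return i;
        }
    }
    return -1;
}

/* "sync for CPU": the device wrote into the low slot, so copy the data up
 * to the driver's (possibly high) buffer. Every call is a full memcpy(). */
static void bounce_sync_for_cpu(struct bounce_pool *p, int i, size_t len)
{
    assert(p->orig[i] != NULL && len <= SLOT_SIZE);
    memcpy(p->orig[i], p->slot[i], len);
    p->copies++;
}

/* "unmap": release the slot back to the pool. */
static void bounce_unmap(struct bounce_pool *p, int i)
{
    p->orig[i] = NULL;
}
```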

Carsten, let me prep up a patch that will print some diagnostic information
during the runtime - to see how often it does the bounce, the usage, etc..

>
> It's also possible that the dma mask of the device is different/wrong in
> mainline leading to such additional bouncing.

If one were to use map_page and such - yes. But the alloc_coherent bypasses
that and ends up allocating it right under the 4GB (or rather it allocates
based on the dev->coherent_mask and swizzles the MFNs as required).

>
> I guess it's also possible that the classic-Xen kernels are playing fast
> and loose by not bouncing something they should (although if so they
> appear to be getting away with it...) or that there is some difference
> which really means mainline needs to bounce while classic-Xen doesn't.

<nods> Could be very well.
>
> Ian.
>



lersek at redhat

Nov 28, 2011, 8:58 AM

Post #10 of 66 (473 views)
Re: Load increase after memory upgrade (part2) [In reply to]

On 11/28/11 16:40, Ian Campbell wrote:
> On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
>> On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
>
>>> I looked through my old mails from you and you explained already the necessity of double
>>> bounce buffering (PCI->below 4GB->above 4GB). What I don't understand is: why does the
>>> Xenified kernel not have this kind of issue?
>>
>> That is a puzzle. It should not. The code is very much the same - both
>> use the generic SWIOTLB which has not changed for years.
>
> The swiotlb-xen used by classic-xen kernels (which I assume is what
> Carsten means by "Xenified") isn't exactly the same as the stuff in
> mainline Linux, it's been heavily refactored for one thing. It's not
> impossible that mainline is bouncing something it doesn't really need
> to.

Please excuse me if I'm completely mistaken; my only point of reference
is that we recently had to backport
<http://xenbits.xensource.com/hg/linux-2.6.18-xen.hg/rev/940>.

> It's also possible that the dma mask of the device is different/wrong in
> mainline leading to such additional bouncing.

dma_alloc_coherent() -- which I guess is the precursor of
pci_alloc_consistent() -- asks xen_create_contiguous_region() to back
the vaddr range with frames machine-addressable inside the device's dma
mask. xen_create_contiguous_region() seems to land in a XENMEM_exchange
hypercall (among others). Perhaps this extra layer of indirection allows
the driver to use low pages directly, without bounce buffers.
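The "swizzling" that xen_create_contiguous_region() performs can be pictured as rewriting the guest's pseudo-physical-to-machine table so that a run of guest pages becomes backed by contiguous machine frames below the device's limit, while the guest-visible addresses stay the same. The toy model below is an assumption-laden sketch: exchange_for_contiguous() is hypothetical, and it skips the real XENMEM_exchange step of handing the old frames back to Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NPFNS 8

/* Toy table: p2m[pfn] is the machine frame number (MFN) backing guest page
 * pfn. Replace the MFNs behind pfn[start .. start+count-1] with the
 * contiguous run low_base, low_base+1, ... The PFNs (and thus the driver's
 * virtual addresses) do not change; only the frames underneath do. Fails if
 * the run would end above max_mfn (the device's mask expressed in frames). */
static bool exchange_for_contiguous(uint64_t p2m[NPFNS], int start, int count,
                                    uint64_t low_base, uint64_t max_mfn)
{
    assert(start >= 0 && count > 0 && start + count <= NPFNS);
    if (low_base + (uint64_t)count - 1 > max_mfn)
        return false;
    for (int i = 0; i < count; i++)
        p2m[start + i] = low_base + (uint64_t)i;
    return true;
}
```

With 4 KB pages, a 32-bit device corresponds to max_mfn = 0xFFFFF.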

> I guess it's also possible that the classic-Xen kernels are playing fast
> and loose by not bouncing something they should (although if so they
> appear to be getting away with it...) or that there is some difference
> which really means mainline needs to bounce while classic-Xen doesn't.

I'm sorry if what I just posted is painfully stupid. I'm taking the risk
for the 1% chance that it could be helpful.

Wrt. the idle time accounting problem, after Niall's two pings, I'm also
waiting for a verdict, and/or for myself finding the time and fishing
out the current patches.

Laszlo



JBeulich at suse

Nov 29, 2011, 12:31 AM

Post #11 of 66 (470 views)
Re: Load increase after memory upgrade (part2) [In reply to]

>>> On 28.11.11 at 17:45, Konrad Rzeszutek Wilk <konrad.wilk [at] oracle> wrote:
> But I can't find the implementation of that in the classic Xen-SWIOTLB.

linux-2.6.18-xen.hg/arch/i386/kernel/pci-dma-xen.c:dma_alloc_coherent().

Jan




carsten at schiers

Nov 29, 2011, 1:31 AM

Post #12 of 66 (475 views)
Re: Load increase after memory upgrade (part2) [In reply to]

I attached the 2.6.34 file that is actually in use, in case that helps. BR, C.
 
-----Original Message-----
To: Konrad Rzeszutek Wilk <konrad.wilk [at] oracle>;
CC: Ian Campbell <Ian.Campbell [at] citrix>; Konrad Rzeszutek Wilk <konrad [at] darnok>; xen-devel <xen-devel [at] lists>; zhenzhong.duan [at] oracle; lersek [at] redhat; Carsten Schiers <carsten [at] schiers>;
From: Jan Beulich <JBeulich [at] suse>
Sent: Tue 29.11.2011 09:52
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
Attachments: pci-dma-xen.c (9.38 KB)


carsten at schiers

Nov 29, 2011, 1:37 AM

Post #13 of 66 (474 views)
Re: Load increase after memory upgrade (part2) [In reply to]

> The swiotlb-xen used by classic-xen kernels (which I assume is what
> Carsten means by "Xenified") isn't exactly the same as the stuff in
> mainline Linux, it's been heavily refactored for one thing. It's not
> impossible that mainline is bouncing something it doesn't really need
> to.

Yes, it's a 2.6.34 kernel with Andrew Lyon's backported patches found here:

 
  http://code.google.com/p/gentoo-xen-kernel/downloads/list

 
GrC.


carsten at schiers

Nov 29, 2011, 1:42 AM

Post #14 of 66 (470 views)
Re: Load increase after memory upgrade (part2) [In reply to]

 

>   - with load I mean the %CPU of xentop
>   - there is no change in CPU usage of the DomU or Dom0

> Uhh, which metric are you using for that? CPU usage...? Is this when you
> change the DomU or the amount of memory the guest has? This is not
> the load number (xentop value)?

I had a quick look into the munin plugin. It reads the Time(s) column (CPU time in seconds) from the output of "xm li" and normalizes it.
But the effect is also visible in the CPU(%) column of xentop when the DomU is under higher load.
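Normalizing the cumulative Time(s) column into a load figure amounts to dividing the CPU time consumed between two samples by the wall-clock time that elapsed. The sketch below assumes that is roughly what the munin plugin does (cpu_percent() is a hypothetical name); the result can exceed 100% for a guest with several vCPUs.

```c
#include <assert.h>

/* Average CPU percentage of a domain between two samples of its cumulative
 * CPU time (seconds), taken at two wall-clock times (seconds). */
static double cpu_percent(double cpu_s_prev, double cpu_s_now,
                          double wall_s_prev, double wall_s_now)
{
    assert(wall_s_now > wall_s_prev);
    return 100.0 * (cpu_s_now - cpu_s_prev) / (wall_s_now - wall_s_prev);
}
```

For example, 30 s of CPU over a 300 s (5-minute) interval is 10%, while 60 s is 20%, matching the "normal" and elevated figures reported earlier in the thread.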

 
BR,C.


carsten at schiers

Nov 29, 2011, 1:46 AM

Post #15 of 66 (476 views)
Re: Load increase after memory upgrade (part2) [In reply to]

> Carsten, let me prep up a patch that will print some diagnostic information
> during the runtime - to see how often it does the bounce, the usage, etc..

 
Yup, looking forward to applying it. I can include it in any kernel. 2.6.18 would be
a bit difficult, though, as the driver pack isn't compatible any longer, so I'd prefer 2.6.34 Xenified
vs. 3.1.2 pvops.

 
BR,C.


Ian.Campbell at citrix

Nov 29, 2011, 2:23 AM

Post #16 of 66 (474 views)
Re: Load increase after memory upgrade (part2) [In reply to]

On Mon, 2011-11-28 at 16:45 +0000, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 28, 2011 at 03:40:13PM +0000, Ian Campbell wrote:
> > On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> >
> > > > I looked through my old mails from you and you explained already the necessity of double
> > > > bounce buffering (PCI->below 4GB->above 4GB). What I don't understand is: why does the
> > > > Xenified kernel not have this kind of issue?
> > >
> > > That is a puzzle. It should not. The code is very much the same - both
> > > use the generic SWIOTLB which has not changed for years.
> >
> > The swiotlb-xen used by classic-xen kernels (which I assume is what
> > Carsten means by "Xenified") isn't exactly the same as the stuff in
> > mainline Linux, it's been heavily refactored for one thing. It's not
> > impossible that mainline is bouncing something it doesn't really need
> > to.
>
> The usage, at least with 'pci_alloc_coherent' is that there is no bouncing
> being done. The alloc_coherent will allocate a nice page, underneath the 4GB
> mark and give it to the driver. The driver can use it as it wishes and there
> is no need to bounce buffer.

Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
subset of swiotlb is in use then, all the bouncing stuff _should_ be
idle/unused -- but has that been confirmed?

>
> But I can't find the implementation of that in the classic Xen-SWIOTLB. It looks
> as if it is using map_single which would be taking the memory out of the
> pool for a very long time, instead of allocating memory and "swizzling" the MFNs.
> [.Note, I looked at the 2.6.18 hg tree for classic, the 2.6.34 is probably
> improved much better so let me check that]
>
> Carsten, let me prep up a patch that will print some diagnostic information
> during the runtime - to see how often it does the bounce, the usage, etc..
>
> >
> > It's also possible that the dma mask of the device is different/wrong in
> > mainline leading to such additional bouncing.
>
> If one were to use map_page and such - yes. But the alloc_coherent bypasses
> that and ends up allocating it right under the 4GB (or rather it allocates
> based on the dev->coherent_mask and swizzles the MFNs as required).
>
> >
> > I guess it's also possible that the classic-Xen kernels are playing fast
> > and loose by not bouncing something they should (although if so they
> > appear to be getting away with it...) or that there is some difference
> > which really means mainline needs to bounce while classic-Xen doesn't.
>
> <nods> Could be very well.
> >
> > Ian.
> >





konrad.wilk at oracle

Nov 29, 2011, 7:33 AM

Post #17 of 66 (474 views)
Re: Load increase after memory upgrade (part2) [In reply to]

On Tue, Nov 29, 2011 at 10:23:18AM +0000, Ian Campbell wrote:
> On Mon, 2011-11-28 at 16:45 +0000, Konrad Rzeszutek Wilk wrote:
> > On Mon, Nov 28, 2011 at 03:40:13PM +0000, Ian Campbell wrote:
> > > On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
> > > > On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> > >
> > > > > I looked through my old mails from you and you explained already the necessity of double
> > > > > bounce buffering (PCI->below 4GB->above 4GB). What I don't understand is: why does the
> > > > > Xenified kernel not have this kind of issue?
> > > >
> > > > That is a puzzle. It should not. The code is very much the same - both
> > > > use the generic SWIOTLB which has not changed for years.
> > >
> > > The swiotlb-xen used by classic-xen kernels (which I assume is what
> > > Carsten means by "Xenified") isn't exactly the same as the stuff in
> > > mainline Linux, it's been heavily refactored for one thing. It's not
> > > impossible that mainline is bouncing something it doesn't really need
> > > to.
> >
> > The usage, at least with 'pci_alloc_coherent' is that there is no bouncing
> > being done. The alloc_coherent will allocate a nice page, underneath the 4GB
> > mark and give it to the driver. The driver can use it as it wishes and there
> > is no need to bounce buffer.
>
> Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
> subset of swiotlb is in use then, all the bouncing stuff _should_ be
> idle/unused -- but has that been confirmed?

Nope. I hope that the diagnostic patch I have in mind will prove/disprove that.
Now I just need to find a moment to write it :-)



konrad at darnok

Dec 2, 2011, 7:23 AM

Post #18 of 66 (458 views)
Re: Load increase after memory upgrade (part2) [In reply to]

> > > > > That is a puzzle. It should not. The code is very much the same - both
> > > > > use the generic SWIOTLB which has not changed for years.
> > > >
> > > > The swiotlb-xen used by classic-xen kernels (which I assume is what
> > > > Carsten means by "Xenified") isn't exactly the same as the stuff in
> > > > mainline Linux, it's been heavily refactored for one thing. It's not
> > > > impossible that mainline is bouncing something it doesn't really need
> > > > to.
> > >
> > > The usage, at least with 'pci_alloc_coherent' is that there is no bouncing
> > > being done. The alloc_coherent will allocate a nice page, underneath the 4GB
> > > mark and give it to the driver. The driver can use it as it wishes and there
> > > is no need to bounce buffer.
> >
> > Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
> > subset of swiotlb is in use then, all the bouncing stuff _should_ be
> > idle/unused -- but has that been confirmed?
>
> Nope. I hope that the diagnostic patch I have in mind will prove/disprove that.
> Now I just need to find a moment to write it :-)

Done!

Carsten, can you please patch your kernel with this hacky patch and
when you have booted the new kernel, just do

modprobe dump_swiotlb

it should give an idea of how many bounces are happening, coherent
allocations, syncs, and so on, along with the last driver that
did those operations.
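The kind of accounting such a diagnostic patch needs can be sketched with per-operation counters that the DMA paths increment and a periodic reporter reads and clears. This is a hypothetical reconstruction, not the contents of swiotlb-debug.patch; struct dma_stats and the helper names are invented for illustration, and the report format only imitates the log lines posted in the replies.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical per-device counters bumped by the DMA paths. */
struct dma_stats {
    atomic_ulong bounce_from; /* copies from a bounce slot back to the driver */
    atomic_ulong map;         /* streaming mappings created */
    atomic_ulong sync;        /* sync-for-CPU calls */
};

static void stats_on_map(struct dma_stats *s)
{
    atomic_fetch_add(&s->map, 1);
}

/* A sync that had to copy out of a bounce slot counts as a sync and a bounce. */
static void stats_on_sync_bounce(struct dma_stats *s)
{
    atomic_fetch_add(&s->sync, 1);
    atomic_fetch_add(&s->bounce_from, 1);
}

/* Read and clear a counter in one step, so each report covers only the
 * interval since the previous report. */
static unsigned long take(atomic_ulong *c)
{
    return atomic_exchange(c, 0);
}

/* Format one report line; meant to be called periodically (e.g. every 5 s). */
static int format_report(struct dma_stats *s, const char *dev, char *out, size_t n)
{
    assert(dev != NULL && out != NULL);
    unsigned long from = take(&s->bounce_from);
    unsigned long map = take(&s->map);
    unsigned long sync = take(&s->sync);
    return snprintf(out, n, "[%s] bounce: from:%lu map:%lu sync:%lu",
                    dev, from, map, sync);
}
```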
Attachments: swiotlb-debug.patch (10.3 KB)


carsten at schiers

Dec 4, 2011, 3:59 AM

Post #19 of 66 (455 views)
Re: Load increase after memory upgrade (part2) [In reply to]

Thank you, Konrad.

I applied the patch to 3.1.2. In order to have a clear picture, I only enabled one PCI card.
The result is:

[ 28.028032] Starting SWIOTLB debug thread.
[ 28.028076] swiotlb_start_thread: Go!
[ 28.028622] xen_swiotlb_start_thread: Go!
[ 33.028153] 0 [budget_av 0000:00:00.0] bounce: from:555352(slow:0)to:0 map:329 unmap:0 sync:555352
[ 33.028294] SWIOTLB is 2% full
[ 38.028178] 0 budget_av 0000:00:00.0 alloc coherent: 4, free: 0
[ 38.028230] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[ 38.028352] SWIOTLB is 2% full
[ 43.028170] 0 [budget_av 0000:00:00.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 43.028310] SWIOTLB is 2% full
[ 48.028199] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[ 48.028334] SWIOTLB is 2% full
[ 53.028170] 0 [budget_av 0000:00:00.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 53.028309] SWIOTLB is 2% full
[ 58.028138] 0 [budget_av 0000:00:00.0] bounce: from:126994(slow:0)to:0 map:0 unmap:0 sync:126994
[ 58.028195] SWIOTLB is 2% full
[ 63.028170] 0 [budget_av 0000:00:00.0] bounce: from:121401(slow:0)to:0 map:0 unmap:0 sync:121401
[ 63.029560] SWIOTLB is 2% full
[ 68.028193] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[ 68.028329] SWIOTLB is 2% full
[ 73.028104] 0 [budget_av 0000:00:00.0] bounce: from:122717(slow:0)to:0 map:0 unmap:0 sync:122717
[ 73.028244] SWIOTLB is 2% full
[ 78.028191] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[ 78.028331] SWIOTLB is 2% full
[ 83.028112] 0 [budget_av 0000:00:00.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 83.028171] SWIOTLB is 2% full

Was that long enough? I hope this helps.

Carsten.
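To put these counters in perspective: each line covers one reporting interval (the timestamps advance by 5 seconds), so the sync count can be turned into a rate. A small sketch, assuming the 5-second interval and the line format shown above (syncs_per_second() is a hypothetical helper):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the "sync:" count from one debug line and convert it to a
 * per-second rate. Returns -1.0 if the line has no sync counter. */
static double syncs_per_second(const char *line, double interval_s)
{
    assert(interval_s > 0);
    const char *p = strstr(line, "sync:");
    unsigned long n;
    if (p == NULL || sscanf(p, "sync:%lu", &n) != 1)
        return -1.0;
    return (double)n / interval_s;
}
```

For the line at 43.028170 this gives 128310 / 5 = 25662 synced bounces per second with a single card; the two-card run reaches 341831 per interval, roughly 68,000 per second.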

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad [at] darnok]
Sent: Friday, 2 December 2011 16:24
To: Konrad Rzeszutek Wilk
Cc: Ian Campbell; xen-devel; Carsten Schiers; zhenzhong.duan [at] oracle; lersek [at] redhat
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)



carsten at schiers

Dec 4, 2011, 4:09 AM

Post #20 of 66 (452 views)
Re: Load increase after memory upgrade (part2) [In reply to]

Here with two cards enabled, creating a bit of "work" by watching TV with one of them:

[ 23.842720] Starting SWIOTLB debug thread.
[ 23.842750] swiotlb_start_thread: Go!
[ 23.842838] xen_swiotlb_start_thread: Go!
[ 28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
[ 28.841592] SWIOTLB is 4% full
[ 33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
[ 33.840283] SWIOTLB is 4% full
[ 33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
[ 38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 38.840361] SWIOTLB is 4% full
[ 43.840182] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 43.840323] SWIOTLB is 4% full
[ 48.840094] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
[ 48.840154] SWIOTLB is 4% full
[ 53.840160] 0 [budget_av 0000:00:01.0] bounce: from:119756(slow:0)to:0 map:0 unmap:0 sync:119756
[ 53.840301] SWIOTLB is 4% full
[ 58.840202] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 58.840339] SWIOTLB is 4% full
[ 63.840626] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 63.840686] SWIOTLB is 4% full
[ 68.840122] 0 [budget_av 0000:00:01.0] bounce: from:127323(slow:0)to:0 map:0 unmap:0 sync:127323
[ 68.840180] SWIOTLB is 4% full
[ 73.840647] 0 [budget_av 0000:00:01.0] bounce: from:211547(slow:0)to:0 map:0 unmap:0 sync:211547
[ 73.840784] SWIOTLB is 4% full
[ 78.840204] 0 [budget_av 0000:00:01.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[ 78.840344] SWIOTLB is 4% full
[ 83.840114] 0 [budget_av 0000:00:01.0] bounce: from:255304(slow:0)to:0 map:0 unmap:0 sync:255304
[ 83.840178] SWIOTLB is 4% full
[ 88.840158] 0 [budget_av 0000:00:01.0] bounce: from:256620(slow:0)to:0 map:0 unmap:0 sync:256620
[ 88.840302] SWIOTLB is 4% full
[ 93.840185] 0 [budget_av 0000:00:00.0] bounce: from:250040(slow:0)to:0 map:0 unmap:0 sync:250040
[ 93.840319] SWIOTLB is 4% full
[ 98.840181] 0 [budget_av 0000:00:00.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[ 98.841563] SWIOTLB is 4% full
[ 103.841221] 0 [budget_av 0000:00:00.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[ 103.841361] SWIOTLB is 4% full
[ 108.840247] 0 [budget_av 0000:00:00.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[ 108.840389] SWIOTLB is 4% full
[ 113.840157] 0 [budget_av 0000:00:00.0] bounce: from:261555(slow:0)to:0 map:0 unmap:0 sync:261555
[ 113.840298] SWIOTLB is 4% full
[ 118.840119] 0 [budget_av 0000:00:00.0] bounce: from:295442(slow:0)to:0 map:0 unmap:0 sync:295442
[ 118.840259] SWIOTLB is 4% full
[ 123.841025] 0 [budget_av 0000:00:00.0] bounce: from:295113(slow:0)to:0 map:0 unmap:0 sync:295113
[ 123.841164] SWIOTLB is 4% full
[ 128.840175] 0 [budget_av 0000:00:00.0] bounce: from:294784(slow:0)to:0 map:0 unmap:0 sync:294784
[ 128.840310] SWIOTLB is 4% full
[ 133.840194] 0 [budget_av 0000:00:00.0] bounce: from:293797(slow:0)to:0 map:0 unmap:0 sync:293797
[ 133.840330] SWIOTLB is 4% full
[ 138.840498] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 138.840637] SWIOTLB is 4% full
[ 143.840173] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 143.840313] SWIOTLB is 4% full
[ 148.840215] 0 [budget_av 0000:00:00.0] bounce: from:341831(slow:0)to:0 map:0 unmap:0 sync:341831
[ 148.840355] SWIOTLB is 4% full
[ 153.840205] 0 [budget_av 0000:00:01.0] bounce: from:329658(slow:0)to:0 map:0 unmap:0 sync:329658
[ 153.840341] SWIOTLB is 4% full
[ 158.840137] 0 [budget_av 0000:00:00.0] bounce: from:342160(slow:0)to:0 map:0 unmap:0 sync:342160
[ 158.840277] SWIOTLB is 4% full
[ 163.841288] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 163.841424] SWIOTLB is 4% full
[ 168.840198] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 168.840339] SWIOTLB is 4% full
[ 173.840167] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 173.840304] SWIOTLB is 4% full
[ 178.840184] 0 [budget_av 0000:00:00.0] bounce: from:328013(slow:0)to:0 map:0 unmap:0 sync:328013
[ 178.840324] SWIOTLB is 4% full
[ 183.840129] 0 [budget_av 0000:00:00.0] bounce: from:341831(slow:0)to:0 map:0 unmap:0 sync:341831
[ 183.840269] SWIOTLB is 4% full
[ 188.840123] 0 [budget_av 0000:00:01.0] bounce: from:340515(slow:0)to:0 map:0 unmap:0 sync:340515
[ 188.841647] SWIOTLB is 4% full
[ 193.840192] 0 [budget_av 0000:00:00.0] bounce: from:338541(slow:0)to:0 map:0 unmap:0 sync:338541
[ 193.840329] SWIOTLB is 4% full
[ 198.840148] 0 [budget_av 0000:00:01.0] bounce: from:330316(slow:0)to:0 map:0 unmap:0 sync:330316
[ 198.840230] SWIOTLB is 4% full
[ 203.840860] 0 [budget_av 0000:00:00.0] bounce: from:341831(slow:0)to:0 map:0 unmap:0 sync:341831
[ 203.841000] SWIOTLB is 4% full
[ 208.840562] 0 [budget_av 0000:00:01.0] bounce: from:337883(slow:0)to:0 map:0 unmap:0 sync:337883
[ 208.840698] SWIOTLB is 4% full
[ 213.840171] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 213.840311] SWIOTLB is 4% full
[ 218.840214] 0 [budget_av 0000:00:01.0] bounce: from:320117(slow:0)to:0 map:0 unmap:0 sync:320117
[ 218.840354] SWIOTLB is 4% full
[ 223.840238] 0 [budget_av 0000:00:01.0] bounce: from:299390(slow:0)to:0 map:0 unmap:0 sync:299390
[ 223.840373] SWIOTLB is 4% full
[ 228.841415] 0 [budget_av 0000:00:01.0] bounce: from:298732(slow:0)to:0 map:0 unmap:0 sync:298732
[ 228.841560] SWIOTLB is 4% full
[ 233.840705] 0 [budget_av 0000:00:00.0] bounce: from:299061(slow:0)to:0 map:0 unmap:0 sync:299061
[ 233.840844] SWIOTLB is 4% full
[ 238.840145] 0 [budget_av 0000:00:01.0] bounce: from:293468(slow:0)to:0 map:0 unmap:0 sync:293468
[ 238.840280] SWIOTLB is 4% full

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad [at] darnok]
Sent: Friday, 2 December 2011 16:24
To: Konrad Rzeszutek Wilk
Cc: Ian Campbell; xen-devel; Carsten Schiers; zhenzhong.duan [at] oracle; lersek [at] redhat
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)



carsten at schiers

Dec 4, 2011, 4:18 AM

Post #21 of 66 (454 views)
Re: Load increase after memory upgrade (part2) [In reply to]

I should perhaps mention that I create the DomU with only the parameter iommu=soft. I hope
nothing more is required. For the Xenified kernel, it's swiotlb=32,force.

Carsten.

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad [at] darnok]
Sent: Friday, 2 December 2011 16:24
To: Konrad Rzeszutek Wilk
Cc: Ian Campbell; xen-devel; Carsten Schiers; zhenzhong.duan [at] oracle; lersek [at] redhat
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)



konrad.wilk at oracle

Dec 5, 2011, 7:26 PM

Post #22 of 66 (456 views)
Re: Load increase after memory upgrade (part2) [In reply to]

On Sun, Dec 04, 2011 at 01:09:28PM +0100, Carsten Schiers wrote:
> Here with two cards enabled, creating a bit of "work" by watching TV with one of them:
>
> [ 23.842720] Starting SWIOTLB debug thread.
> [ 23.842750] swiotlb_start_thread: Go!
> [ 23.842838] xen_swiotlb_start_thread: Go!
> [ 28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
> [ 28.841592] SWIOTLB is 4% full
> [ 33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
> [ 33.840283] SWIOTLB is 4% full
> [ 33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
> [ 38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310

Whoa. Yes. You are definitely using the bounce buffer :-)

Now it is time to look at why the driver is not using those coherent ones - it
looks to allocate just eight of them but does not use them. Unless it is
using them _and_ bouncing them (which would be odd).

And BTW, you can lower your 'swiotlb=XX' value. The 4% is how much you
are using of the default size.

I should find out _why_ the old Xen kernels do not use the bounce buffer
so much...
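As a rough sense of scale for the 4% figure, assuming the default 64 MB Xen-SWIOTLB pool built from 2 KB slabs (IO_TLB_SHIFT = 11), and assuming the debug patch's percentage is measured against that pool, 4% full corresponds to about 1311 slabs, or roughly 2.6 MB. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IO_TLB_SHIFT 11                  /* 2 KB per slab */
#define DEFAULT_POOL_BYTES (64ULL << 20) /* assumed 64 MB default pool */

/* Number of slabs in a pool of the given size. */
static uint64_t pool_slabs(uint64_t pool_bytes)
{
    return pool_bytes >> IO_TLB_SHIFT;
}

/* Slabs in use for a given fill percentage, rounded up. */
static uint64_t slabs_in_use(uint64_t total_slabs, unsigned percent)
{
    assert(percent <= 100);
    return (total_slabs * percent + 99) / 100;
}
```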




konrad at darnok

Dec 14, 2011, 12:23 PM

Post #23 of 66 (444 views)
Re: Load increase after memory upgrade (part2) [In reply to]

On Mon, Dec 05, 2011 at 10:26:21PM -0500, Konrad Rzeszutek Wilk wrote:
> On Sun, Dec 04, 2011 at 01:09:28PM +0100, Carsten Schiers wrote:
> > Here with two cards enabled, creating a bit of "work" by watching TV with one of them:
> >
> > [ 23.842720] Starting SWIOTLB debug thread.
> > [ 23.842750] swiotlb_start_thread: Go!
> > [ 23.842838] xen_swiotlb_start_thread: Go!
> > [ 28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
> > [ 28.841592] SWIOTLB is 4% full
> > [ 33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
> > [ 33.840283] SWIOTLB is 4% full
> > [ 33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
> > [ 38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
>
> Whoa. Yes. You are definitely using the bounce buffer :-)
>
> Now it is time to look at why the driver is not using those coherent ones - it
> looks to allocate just eight of them but does not use them. Unless it is
> using them _and_ bouncing them (which would be odd).
>
> And BTW, you can lower your 'swiotlb=XX' value. The 4% is how much you
> are using of the default size.

So I was able to see this with an atl1c ethernet driver on my SandyBridge i3
box. It looks as if the card is truly 32-bit so on a box with 8GB it
bounces the data. If I booted the Xen hypervisor with 'mem=4GB' I get no
bounces (no surprise there).

In other words - I see the same behavior you are seeing. Now off to:
>
> I should find out _why_ the old Xen kernels do not use the bounce buffer
> so much...

which will require some fiddling around.



konrad.wilk at oracle

Dec 14, 2011, 2:07 PM

Post #24 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

On Wed, Dec 14, 2011 at 04:23:51PM -0400, Konrad Rzeszutek Wilk wrote:
> In other words - I see the same behavior you are seeing. Now off to:
> >
> > I should find out _why_ the old Xen kernels do not use the bounce buffer
> > so much...
>
> which will require some fiddling around.

And I am not seeing any difference - the swiotlb is used to the same extent when
booting a classic (old-style XenoLinux) 2.6.32 kernel as with a brand new pvops (3.2) one.
Obviously if I limit the physical amount of memory (so 'mem=4GB' on the Xen hypervisor
line), the bounce usage disappears. Hmm, I wonder if there is a nice way to
tell the hypervisor: hey, please stuff dom0 under 4GB.

Here is the patch I used against classic XenLinux. Any chance you could run
it with your classic guests and see what numbers you get?
Attachments: swiotlb-against-old-type.patch (7.61 KB)


carsten at schiers

Dec 15, 2011, 6:52 AM

Post #25 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

...

> which will require some fiddling around.

> Here is the patch I used against classic XenLinux. Any chance you could run
> it with your classic guests and see what numbers you get?

Sure, it might take a bit, but I'll try it with my 2.6.34 classic kernel.

 
Carsten.


carsten at schiers

Dec 16, 2011, 6:56 AM

Post #26 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

Well, it will do nothing but print out “SWIOTLB is 0% full”.

 
Does that help? Or do you think something went wrong with the patch…

 
BR,

Carsten.

 
 
 


konrad.wilk at oracle

Dec 16, 2011, 7:04 AM

Post #27 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

On Fri, Dec 16, 2011 at 03:56:10PM +0100, Carsten Schiers wrote:
> Well, it will do nothing but print out “SWIOTLB is 0% full”.
>
>  
> Does that help? Or do you think something went wrong with the patch…
>

And you are using swiotlb=force on the 2.6.34 classic kernel and passing
your budget-av card through to it? Could you append the dmesg output please?


Thanks.


carsten at schiers

Dec 16, 2011, 7:51 AM

Post #28 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

> And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it?

Yes, two of them with swiotlb=32,force.


> Could you append the dmesg output please?

Attached. You will find a "normal" boot after the one with the patched kernel.

Carsten.
Attachments: dmesg.txt (30.1 KB)


konrad.wilk at oracle

Dec 16, 2011, 8:19 AM

Post #29 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

On Fri, Dec 16, 2011 at 04:51:47PM +0100, Carsten Schiers wrote:
> > And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it?
>
> Yes, two of them with swiotlb=32,force.
>
>
> > Could you append the dmesg output please?
>
> Attached. You find a "normal" boot after the one with the patched kernel.

Uh, what happens when you run the driver, meaning when you capture something? I remember that with
pvops you had ~30K or so bounces, but I am not sure about the bootup.

Thanks for being willing to be a guinea pig while trying to fix this.
>
> Carsten.
>
>





carsten at schiers

Dec 17, 2011, 2:12 PM

Post #30 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

OK, double-checked. Both PCI cards are enabled, running and working, but there is nothing but "SWIOTLB is 0% full". Any chance
to check that the patch is working? Does it print out something else with your setup? BR, Carsten.



linux at eikelenboom

Dec 17, 2011, 4:19 PM

Post #31 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

I have also done some experiments with the patch. In domU I also get 0% full for my USB controllers with video grabbers; in dom0 I get 12% full, and both my Realtek 8169 Ethernet controllers seem to use bounce buffering...
And that with an IOMMU (AMD)? It all seems kind of strange, although it is also working...
I don't have much time now; I hope to get back with a full report soon.

--
Sander



konrad at darnok

Dec 19, 2011, 6:54 AM

Post #32 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

On Sat, Dec 17, 2011 at 11:12:45PM +0100, Carsten Schiers wrote:
> OK, double checked. Both PCI cards enabled, running, working, but nothing but "SWIOTLB is 0% full". Any chance
> to check that the patch is working? Does it print out something else with your setting? BR, Carsten.

Hm, and with the pvops you got some numbers along with tons of 'bounce'.

The one thing I neglected in this patch is the alloc_coherent
part, which I don't think is that important, as we did show that the
alloc buffers are used.

I don't have anything concrete yet, but after the holidays should have a
better idea of what is happening. Thanks for being willing to test
this!


konrad at darnok

Dec 19, 2011, 6:56 AM

Post #33 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

On Sun, Dec 18, 2011 at 01:19:16AM +0100, Sander Eikelenboom wrote:
> I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ...
> And that with a iommu (amd) ? it all seems kind of strange, although it is also working ...
> I'm not having much time now, hoping to get back with a full report soon.

Hm, so in domU it reports nothing, but in dom0 it does. Maybe the patch is incorrect
when running as a PV guest. I will look into it in more detail after the
holidays. Thanks for being willing to try it out.



konrad.wilk at oracle

Jan 10, 2012, 1:55 PM

Post #34 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

On Mon, Dec 19, 2011 at 10:56:09AM -0400, Konrad Rzeszutek Wilk wrote:
> On Sun, Dec 18, 2011 at 01:19:16AM +0100, Sander Eikelenboom wrote:
> > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ...
> > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ...
> > I'm not having much time now, hoping to get back with a full report soon.
>
> Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect
> when running as PV guest .. Will look in more details after the
> holidays. Thanks for being willing to try it out.

Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU:

[ 771.896140] SWIOTLB is 11% full
[ 776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0
[ 776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0
[ 776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0

But interestingly enough, if this is the first guest I boot, I do not get these bounce
requests. I will shortly boot up a Xen-O-Linux kernel and see if I get the same
numbers.
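To compare runs (first boot versus relaunched guest, pvops versus XenoLinux) it helps to pull the counters out of the debug output mechanically. A minimal sketch follows; the field layout is taken from the log lines above, the meaning of each counter is only inferred from its label (the debug patch itself is not reproduced here), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Counters from one "bounce:" line of the debug output. Field meanings
 * are inferred from the labels only. */
struct bounce_stats {
    int idx;
    char driver[32];
    char bdf[16];
    unsigned long from, slow, to, map, unmap, sync;
};

/* Hypothetical parser for lines such as
 *   [ 776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0
 * Returns 1 if all nine fields were read, 0 otherwise. */
static int parse_bounce_line(const char *line, struct bounce_stats *s)
{
    assert(line != NULL && s != NULL);

    /* Skip an optional "[ 776.896116]" dmesg timestamp. */
    if (line[0] == '[') {
        const char *end = strchr(line, ']');
        if (end == NULL)
            return 0;
        line = end + 1;
    }

    /* %15[^]] reads the PCI address up to the closing bracket. */
    int n = sscanf(line,
                   "%d [%31s %15[^]]] bounce: from:%lu(slow:%lu)to:%lu"
                   " map:%lu unmap:%lu sync:%lu",
                   &s->idx, s->driver, s->bdf, &s->from, &s->slow,
                   &s->to, &s->map, &s->unmap, &s->sync);
    return n == 9;
}
```

Lines that do not match the pattern, such as "SWIOTLB is 11% full" or the "alloc coherent" lines, return 0 and can simply be skipped.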




linux at eikelenboom

Jan 12, 2012, 2:06 PM

Post #35 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

Hello Konrad,

Tuesday, January 10, 2012, 10:55:33 PM, you wrote:

> On Mon, Dec 19, 2011 at 10:56:09AM -0400, Konrad Rzeszutek Wilk wrote:
>> On Sun, Dec 18, 2011 at 01:19:16AM +0100, Sander Eikelenboom wrote:
>> > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ...
>> > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ...
>> > I'm not having much time now, hoping to get back with a full report soon.
>>
>> Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect
>> when running as PV guest .. Will look in more details after the
>> holidays. Thanks for being willing to try it out.

> Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU:

> [ 771.896140] SWIOTLB is 11% full
> [ 776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0
> [ 776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0
> [ 776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0

> but interestingly enough, if I boot the guest as the first one I do not get these bounce
> requests. I will shortly bootup a Xen-O-Linux kernel and see if I get these same
> numbers.


I started to experiment some more with what I encountered.

In dom0 I saw that my r8169 Ethernet controllers were using bounce buffering, according to the dump-swiotlb module.
It was showing "12% full".
Checking in sysfs shows:
serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
32
serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
32
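The dma_mask_bits value shown above translates directly into the highest bus address the device can reach: 32 bits means only the first 4GB, which is why such a device needs bouncing once its buffers sit higher. A minimal sketch of that conversion, assuming the file contents have already been read into a string; the helper name is hypothetical and the arithmetic mirrors the kernel's DMA_BIT_MASK().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: turn the text of a sysfs dma_mask_bits file
 * (for example "32\n") into the highest bus address the device can
 * reach. Returns 0 if the text is not a bit count between 1 and 64. */
static uint64_t dma_limit_from_sysfs(const char *text)
{
    assert(text != NULL);
    char *end;
    errno = 0;
    unsigned long bits = strtoul(text, &end, 10);
    if (errno != 0 || end == text || bits == 0 || bits > 64)
        return 0;
    /* Same arithmetic as the kernel's DMA_BIT_MASK(n). */
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}
```

Reading the file itself is a plain fopen()/fgets() on /sys/bus/pci/devices/<BDF>/dma_mask_bits; the I/O is left out so the conversion can be checked without the hardware.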

If I remember correctly, wasn't the allocation for dom0 changed to the top of memory instead of low memory, somewhere between 2.6.32 and 3.0?
Could that change cause all devices to need bounce buffering, and could it therefore explain why some people see more CPU usage in dom0?

I have forced my r8169 to use a 64-bit DMA mask (using use_dac=1):
serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
32
serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
64

This results in dump-swiotlb reporting:

[ 1265.616106] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
[ 1265.625043] SWIOTLB is 0% full
[ 1270.626085] 0 [r8169 0000:08:00.0] bounce: from:6(slow:0)to:0 map:0 unmap:0 sync:12
[ 1270.635024] SWIOTLB is 0% full
[ 1275.635091] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
[ 1275.644261] SWIOTLB is 0% full
[ 1280.654097] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10



So it has changed from 12% to 0%, although it still reports some bouncing. Or am I misinterpreting things?


Another thing I was wondering about: couldn't the hypervisor offer a small window of 32-bit addressable memory to all domUs (or only when PCI passthrough is used) to be used for DMA?

(Oh yes, I haven't got a clue what I'm talking about... so it probably makes no sense at all :-) )


--
Sander






JBeulich at suse

Jan 13, 2012, 12:12 AM

Post #36 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

>>> On 12.01.12 at 23:06, Sander Eikelenboom <linux [at] eikelenboom> wrote:
> Another thing i was wondering about, couldn't the hypervisor offer a small
> window in 32bit addressable mem to all (or only when pci passthrough is used)
> domU's to be used for DMA ?

How would use of such a range be arbitrated/protected? You'd have to
ask for reservation (aka allocation) of a chunk anyway, which is as good
as using the existing interfaces to obtain address restricted memory
(and the hypervisor has a [rudimentary] mechanism to preserve some
low memory for DMA allocations).

Jan




konrad.wilk at oracle

Jan 13, 2012, 7:13 AM

Post #37 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

> >> > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ...
> >> > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ...
> >> > I'm not having much time now, hoping to get back with a full report soon.
> >>
> >> Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect
> >> when running as PV guest .. Will look in more details after the
> >> holidays. Thanks for being willing to try it out.
>
> > Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU:
>
> > [ 771.896140] SWIOTLB is 11% full
> > [ 776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0
> > [ 776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0
> > [ 776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0
>
> > but interestingly enough, if I boot the guest as the first one I do not get these bounce
> > requests. I will shortly bootup a Xen-O-Linux kernel and see if I get these same
> > numbers.
>
>
> I started to expiriment some more with what i encountered.
>
> On dom0 i was seeing that my r8169 ethernet controllers where using bounce buffering with the dump-swiotlb module.
> It was showing "12% full".
> Checking in sysfs shows:
> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
> 32
> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
> 32
>
> If i remember correctly wasn't the allocation for dom0 changed to be to the top of memory instead of low .. somewhere between 2.6.32 and 3.0 ?

? We never actually had dom0 support in the upstream kernel until 2.6.37. The 2.6.32<->2.6.36 kernels you are
referring to must have been the trees that I spun up, but the SWIOTLB implementation in them
had not really changed.

> Could that change cause the need for all devices to need bounce buffering and could it therefore explain some people seeing more cpu usage for dom0 ?

The issue I am seeing is not CPU usage in dom0, but rather CPU usage in the domU guests,
and the fact that older domUs (XenOLinux) do not have this.

That I can't understand - the implementation in both cases _looks_ to do the same thing.
There was one issue I found in the upstream one, but even with that fix I still
get that "bounce" usage in domU.

Interestingly enough, I only get this after I have launched, destroyed, relaunched, etc. the guest multiple
times. That leads me to believe this is not a kernel issue, but that we have
simply fragmented the Xen memory so much that when it launches the guest, all of its
memory is above 4GB. But that seems counter-intuitive, as by default Xen places guests at the far end of
memory (so on my 16GB box it would put a 4GB guest at roughly 12GB->16GB). The SWIOTLB
swizzles some memory under 4GB, and this is where we get the bounce buffer effect
(as the data in that pool under 4GB is then copied to the memory at 12GB->16GB).
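One way to check the fragmentation theory would be to measure how much of a guest's memory ends up above what a 32-bit card can reach. A minimal sketch, assuming the guest's machine frame numbers (MFNs) are available as an array and 4KB pages are used; how to obtain the MFNs is not covered here, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4KB pages */

/* Hypothetical diagnostic: percentage of a guest's machine frames whose
 * start address lies above 'dma_limit' (e.g. 0xFFFFFFFF for a 32-bit
 * card). Every streaming DMA buffer in such a frame would be bounced. */
static unsigned pct_frames_above(const uint64_t *mfns, size_t n, uint64_t dma_limit)
{
    assert(mfns != NULL && n > 0);
    size_t above = 0;
    for (size_t i = 0; i < n; i++)
        if ((mfns[i] << PAGE_SHIFT) > dma_limit)
            above++;
    return (unsigned)(above * 100 / n);
}
```

For a guest placed entirely at 12GB->16GB, as described above, the result would be 100%, so every streaming mapping for a 32-bit card would bounce.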

But it does not explain why on the first couple of starts I did not see this with pvops.
And it does not seem to happen with the XenOLinux kernel, so there must be something else
in here.

>
> I have forced my r8169 to use 64bits dma mask (using use_dac=1)

Ah yes.
> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
> 32
> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
> 64
>
> This results in dump-swiotlb reporting:
>
> [ 1265.616106] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
> [ 1265.625043] SWIOTLB is 0% full
> [ 1270.626085] 0 [r8169 0000:08:00.0] bounce: from:6(slow:0)to:0 map:0 unmap:0 sync:12
> [ 1270.635024] SWIOTLB is 0% full
> [ 1275.635091] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
> [ 1275.644261] SWIOTLB is 0% full
> [ 1280.654097] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10

Which is what we expect. No need to bounce since the PCI adapter can reach memory
above the 4GB mark.

>
>
>
> So it has changed from 12% to 0%, although it still reports something about bouncing ? or am i mis interpreting stuff ?

Bouncing can happen in two cases:
- The memory is above 4GB.
- The memory crosses a page boundary (this rarely happens).
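The two cases above, plus the effect of swiotlb=force (which forces bounce buffers even where they would not otherwise be used), can be combined into one decision. The sketch below is a simplified model that roughly follows the condition the pvops Xen SWIOTLB checks when mapping a page: device-reachable address, no straddling of machine-discontiguous pages, no forcing. In a PV guest, pages that are contiguous in pseudo-physical memory need not be contiguous in machine memory, which is why a page crossing can matter. The p2m array, struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Toy pseudo-physical -> machine frame table (p2m) for a PV guest. */
struct p2m {
    const uint64_t *mfn; /* mfn[pfn] */
    size_t nr_pages;
};

/* True if [phys, phys+size) spans pages whose machine frames are not
 * consecutive, so the device would not see one contiguous buffer. */
static bool straddles_noncontiguous(const struct p2m *p, uint64_t phys, uint64_t size)
{
    uint64_t first = phys >> PAGE_SHIFT;
    uint64_t last = (phys + size - 1) >> PAGE_SHIFT;
    assert(last < p->nr_pages);
    for (uint64_t pfn = first; pfn < last; pfn++)
        if (p->mfn[pfn + 1] != p->mfn[pfn] + 1)
            return true;
    return false;
}

/* Bounce if forced, if the machine address is beyond the DMA mask,
 * or if the buffer straddles non-contiguous machine frames. */
static bool must_bounce(const struct p2m *p, uint64_t phys, uint64_t size,
                        uint64_t dma_mask, bool force)
{
    assert(size > 0);
    uint64_t pfn = phys >> PAGE_SHIFT;
    assert(pfn < p->nr_pages);
    uint64_t machine = (p->mfn[pfn] << PAGE_SHIFT) | (phys & (PAGE_SIZE - 1));
    if (force)
        return true;
    if (machine + size - 1 > dma_mask)
        return true;
    return straddles_noncontiguous(p, phys, size);
}
```

This also shows why booting with swiotlb=force would be expected to make every mapping bounce, regardless of where the guest's memory sits.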
>
>
> Another thing i was wondering about, couldn't the hypervisor offer a small window in 32bit addressable mem to all (or only when pci passthrough is used) domU's to be used for DMA ?

It does. That is what the Xen SWIOTLB does by "swizzling" the pages in its pool.
But it can't do that for every part of memory. That is why there are DMA pools,
which are used by graphics adapters, video capture devices, storage and network
drivers. They are used for small packet sizes so that the driver does not have
to allocate DMA buffers when it gets a 100-byte ping response. But for large
packets (say, that ISO file you are downloading) the driver allocates memory on the fly
and "maps" it into the PCI space using the DMA API. That "mapping" sets up
a "physical memory" -> "guest memory" translation, and if the allocated
memory is above 4GB, part of this mapping is to copy ("bounce") the data
below 4GB (where the Xen SWIOTLB has allocated a pool), so that the adapter
can physically fetch/put the data. Once that is completed, it is "sync"-ed
back, which means bouncing the data back into the "allocated memory".
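The map/sync sequence just described amounts to an extra memcpy per bounced buffer, which is the additional CPU work suspected behind the higher domU load. The toy model below, with hypothetical names, uses a slot in a "low" array to stand in for the SWIOTLB pool below 4GB and an ordinary array for the driver's buffer above it; it only covers the two one-way directions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dma_dir { TO_DEVICE, FROM_DEVICE };

/* One bounced mapping: 'orig' is the driver's buffer the device cannot
 * reach, 'slot' is the reachable copy in the low pool. */
struct bounce_map {
    unsigned char *orig;
    unsigned char *slot;
    size_t len;
    enum dma_dir dir;
};

/* "map": hand the device the low slot; for transmits, copy the data
 * down first so the device reads the right bytes. */
static void bounce_map_single(struct bounce_map *m, unsigned char *orig,
                              unsigned char *slot, size_t len, enum dma_dir dir)
{
    assert(len > 0);
    m->orig = orig;
    m->slot = slot;
    m->len = len;
    m->dir = dir;
    if (dir == TO_DEVICE)
        memcpy(slot, orig, len);
}

/* "sync"/"unmap": for receives, copy what the device wrote into the
 * low slot back up into the driver's buffer. */
static void bounce_sync_for_cpu(const struct bounce_map *m)
{
    if (m->dir == FROM_DEVICE)
        memcpy(m->orig, m->slot, m->len);
}
```

For a capture card, every received buffer goes through the FROM_DEVICE path, so the copy is paid on every sync.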

So having a DMA pool is very good, and most drivers use it. The things I can't
figure out are:
- why the DVB drivers do not seem to use it, even though they appear to use the videobuf_dma
driver.
- why XenOLinux does not seem to have this problem (and this might be false:
perhaps it does have this problem and it just takes a couple of guest launches,
destructions, starts, etc. to actually see it).
- whether there are any flags in the domain builder to say: "ok, this domain is going to
service 32-bit cards, hence build its memory from 0->4GB". This seems like
a good knob at first, but it probably is a bad idea (imagine using it by mistake
on every guest). Also, nowadays most cards are PCIe and can do 64-bit, so
it would not be that important in the future.
>
> (oh yes, i haven't got i clue what i'm talking about ... so it probably make no sense at all :-) )

Nonsense. You were on the correct path. Hopefully the level of detail hasn't
scared you off now :-)

>
>
> --
> Sander
>
>



linux at eikelenboom

Jan 15, 2012, 3:32 AM

Post #38 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

Friday, January 13, 2012, 4:13:07 PM, you wrote:

>> >> > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ...
>> >> > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ...
>> >> > I'm not having much time now, hoping to get back with a full report soon.
>> >>
>> >> Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect
>> >> when running as PV guest .. Will look in more details after the
>> >> holidays. Thanks for being willing to try it out.
>>
>> > Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU:
>>
>> > [ 771.896140] SWIOTLB is 11% full
>> > [ 776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0
>> > [ 776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0
>> > [ 776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0
>>
>> > but interestingly enough, if I boot the guest as the first one I do not get these bounce
>> > requests. I will shortly bootup a Xen-O-Linux kernel and see if I get these same
>> > numbers.
>>
>>
>> I started to expiriment some more with what i encountered.
>>
>> On dom0 i was seeing that my r8169 ethernet controllers where using bounce buffering with the dump-swiotlb module.
>> It was showing "12% full".
>> Checking in sysfs shows:
>> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
>> 32
>> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
>> 32
>>
>> If i remember correctly wasn't the allocation for dom0 changed to be to the top of memory instead of low .. somewhere between 2.6.32 and 3.0 ?

> ? We never actually had dom0 support in the upstream kernel until 2.6.37.. The 2.6.32<->2.6.36 you are
> referring to must have been the trees that I spun up - but the implementation of SWIOTLB in them
> had not really changed.

>> Could that change cause the need for all devices to need bounce buffering and could it therefore explain some people seeing more cpu usage for dom0 ?

> The issue I am seeing is not CPU usage in dom0, but rather the CPU usage in domU with guests.
> And that the older domU's (XenOLinux) do not have this.

> That I can't understand - the implementation in both cases _looks_ to do the same thing.
> There was one issue I found in the upstream one, but even with that fix I still
> get that "bounce" usage in domU.

> Interestingly enough, I get that only if I have launched, destroyed, launched, etc, the guest multiple
> times before I get this. Which leads me to believe this is not a kernel issue but that we
> are simply fragmented the Xen memory so much, so that when it launches the guest all of the
> memory is above 4GB. But that seems counter-intuive as by default Xen starts guests at the far end of
> memory (so on my 16GB box it would stick a 4GB guest at 12GB->16GB roughly). The SWIOTLB
> swizzles some memory under the 4GB , and this is where we get the bounce buffer effect
> (as the memory from 4GB is then copied to the memory 12GB->16GB).

> But it does not explain why on the first couple of starts I did not see this with pvops.
> And it does not seem to happen with the XenOLinux kernel, so there must be something else
> in here.

>>
>> I have forced my r8169 to use 64bits dma mask (using use_dac=1)

> Ah yes.
>> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
>> 32
>> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
>> 64
>>
>> This results in dump-swiotlb reporting:
>>
>> [ 1265.616106] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
>> [ 1265.625043] SWIOTLB is 0% full
>> [ 1270.626085] 0 [r8169 0000:08:00.0] bounce: from:6(slow:0)to:0 map:0 unmap:0 sync:12
>> [ 1270.635024] SWIOTLB is 0% full
>> [ 1275.635091] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
>> [ 1275.644261] SWIOTLB is 0% full
>> [ 1280.654097] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10

> Which is what we expect. No need to bounce since the PCI adapter can reach memory
> above the 4GB mark.

>>
>>
>>
>> So it has changed from 12% to 0%, although it still reports something about bouncing ? or am i mis interpreting stuff ?

> The bouncing can happen due to two cases:
> - Memory is above 4GB
> - Memory crosses a page-boundary (rarely happens).
>>
>>
>> Another thing i was wondering about, couldn't the hypervisor offer a small window in 32bit addressable mem to all (or only when pci passthrough is used) domU's to be used for DMA ?

> It does. That is what the Xen SWIOTLB does with "swizzling" the pages in its pool.
> But it can't do it for every part of memory. That is why there are DMA pools
> which are used by graphics adapters, video capture devices,storage and network
> drivers. They are used for small packet sizes so that the driver does not have
> to allocate DMA buffers when it gets a 100bytes ping response. But for large
> packets (say that ISO file you are downloading) it allocates memory on the fly
> and "maps" it into the PCI space using the DMA API. That "mapping" sets up
> an "physical memory" -> "guest memory" translation - and if that allocated
> memory is above 4GB, part of this mapping is to copy ("bounce") the memory
> under the 4GB (where XenSWIOTLB has allocated a pool), so that the adapter
> can physically fetch/put the data. Once that is completed it is "sync"-ed
> back, which is bouncing that data to the "allocated memory".


> So having a DMA pool is very good - and most drivers use it. The thing I can't
> figure out is:
> - why the DVB drivers do not seem to use it, even though they appear to use the videobuf_dma
> driver.
> - why XenOLinux does not seem to have this problem (and this might be false:
> perhaps it does have this problem and it just takes a couple of guest launches,
> destructions, starts, etc. to actually see it).
> - whether there are any flags in the domain builder to say: "ok, this domain is going to
> service 32-bit cards, hence build the memory from 0->4GB". This seems like
> a good knob at first, but it probably is a bad idea (imagine using it by mistake
> on every guest). And nowadays most cards are PCIe and can do 64-bit DMA, so
> it would not be that important in the future.
>>
>> (oh yes, I haven't got a clue what I'm talking about... so it probably makes no sense at all :-) )

> Nonsense. You were on the correct path. Hopefully the level of detail hasn't
> scared you off now :-)

Well, it only raises more questions :-)
The thing is, PCI passthrough, and especially the DMA part of it, all works behind the scenes without giving much output about how it is actually working.

The thing I was wondering about is whether my AMD IOMMU is actually doing something for PV guests.
When booting with iommu=off (the machine has 8GB of memory, dom0 limited to 1024M) and starting just one domU with iommu=soft, with PCI passthrough of the USB PCI cards that have USB video grabbers attached, I would expect to see some bounce buffering going on.

(HV_START_LOW 18446603336221196288)
(FEATURES '!writable_page_tables|pae_pgdir_above_4gb')
(VIRT_BASE 18446744071562067968)
(GUEST_VERSION 2.6)
(PADDR_OFFSET 0)
(GUEST_OS linux)
(HYPERCALL_PAGE 18446744071578849280)
(LOADER generic)
(SUSPEND_CANCEL 1)
(PAE_MODE yes)
(ENTRY 18446744071594476032)
(XEN_VERSION xen-3.0)

Still, I only see:

[ 47.449072] Starting SWIOTLB debug thread.
[ 47.449090] swiotlb_start_thread: Go!
[ 47.449262] xen_swiotlb_start_thread: Go!
[ 52.449158] 0 [ehci_hcd 0000:0a:00.3] bounce: from:432(slow:0)to:1329 map:1756 unmap:1781 sync:0
[ 52.449180] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:16 map:23 unmap:0 sync:0
[ 52.449187] 2 [ohci_hcd 0000:0a:00.4] bounce: from:0(slow:0)to:4 map:5 unmap:0 sync:0
[ 52.449226] SWIOTLB is 0% full
[ 57.449180] 0 ehci_hcd 0000:0a:00.3 alloc coherent: 35, free: 0
[ 57.449219] 1 ohci_hcd 0000:0a:00.6 alloc coherent: 1, free: 0
[ 57.449265] SWIOTLB is 0% full
[ 62.449176] SWIOTLB is 0% full
[ 67.449336] SWIOTLB is 0% full
[ 72.449279] SWIOTLB is 0% full
[ 77.449121] SWIOTLB is 0% full
[ 82.449236] SWIOTLB is 0% full
[ 87.449242] SWIOTLB is 0% full
[ 92.449241] SWIOTLB is 0% full
[ 172.449102] 0 [ehci_hcd 0000:0a:00.7] bounce: from:3839(slow:0)to:664 map:4486 unmap:4617 sync:0
[ 172.449123] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:82 map:111 unmap:0 sync:0
[ 172.449130] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:32 map:36 unmap:0 sync:0
[ 172.449170] SWIOTLB is 0% full
[ 177.449109] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5348(slow:0)to:524 map:5834 unmap:5952 sync:0
[ 177.449131] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:76 map:112 unmap:0 sync:0
[ 177.449138] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:4 map:6 unmap:0 sync:0
[ 177.449178] SWIOTLB is 0% full
[ 182.449143] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5349(slow:0)to:563 map:5899 unmap:5949 sync:0
[ 182.449157] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:27 map:35 unmap:0 sync:0
[ 182.449164] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:10 map:15 unmap:0 sync:0
[ 182.449204] SWIOTLB is 0% full
[ 187.449112] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5375(slow:0)to:592 map:5941 unmap:6022 sync:0
[ 187.449126] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:46 map:69 unmap:0 sync:0
[ 187.449133] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:9 map:12 unmap:0 sync:0
[ 187.449173] SWIOTLB is 0% full
[ 192.449183] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5360(slow:0)to:556 map:5890 unmap:5978 sync:0
[ 192.449226] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:52 map:74 unmap:0 sync:0
[ 192.449234] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:10 map:14 unmap:0 sync:0
[ 192.449275] SWIOTLB is 0% full

And the devices do work ... so how does that work ...

Thx for your explanation so far !

--
Sander

--
Best regards,
Sander mailto:linux [at] eikelenboom


_______________________________________________
Xen-devel mailing list
Xen-devel [at] lists
http://lists.xensource.com/xen-devel


konrad.wilk at oracle

Jan 17, 2012, 1:02 PM

Post #39 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

> The thing I was wondering about is whether my AMD IOMMU is actually doing something for PV guests.
> When booting with iommu=off (the machine has 8GB of memory, dom0 limited to 1024M) and starting just one domU with iommu=soft, with PCI passthrough of the USB PCI cards that have USB video grabbers attached, I would expect to see some bounce buffering going on.
>
> (HV_START_LOW 18446603336221196288)
> (FEATURES '!writable_page_tables|pae_pgdir_above_4gb')
> (VIRT_BASE 18446744071562067968)
> (GUEST_VERSION 2.6)
> (PADDR_OFFSET 0)
> (GUEST_OS linux)
> (HYPERCALL_PAGE 18446744071578849280)
> (LOADER generic)
> (SUSPEND_CANCEL 1)
> (PAE_MODE yes)
> (ENTRY 18446744071594476032)
> (XEN_VERSION xen-3.0)
>
> Still, I only see:
>
> [ 47.449072] Starting SWIOTLB debug thread.
> [ 47.449090] swiotlb_start_thread: Go!
> [ 47.449262] xen_swiotlb_start_thread: Go!
> [ 52.449158] 0 [ehci_hcd 0000:0a:00.3] bounce: from:432(slow:0)to:1329 map:1756 unmap:1781 sync:0

There is bouncing there.
..
> [ 172.449102] 0 [ehci_hcd 0000:0a:00.7] bounce: from:3839(slow:0)to:664 map:4486 unmap:4617 sync:0

And there: 3839 of them.
> [ 172.449123] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:82 map:111 unmap:0 sync:0
> [ 172.449130] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:32 map:36 unmap:0 sync:0
> [ 172.449170] SWIOTLB is 0% full
> [ 177.449109] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5348(slow:0)to:524 map:5834 unmap:5952 sync:0

And 5348 here!

So bounce-buffering is definitely happening with this guest.
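To compare runs (different kernels, mem= settings), the counters in these debug lines can be extracted mechanically. The sketch below assumes the line layout quoted in this thread (`bounce: from:N(slow:N)to:N map:N unmap:N sync:N`). The struct and function names are hypothetical. The exact meaning of each field is whatever the debug patch defines, which is not spelled out here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Counters from one line of the SWIOTLB debug-thread output. */
struct bounce_stats {
    unsigned long from, slow, to, map, unmap, sync;
};

/* Parse the record that starts at "bounce:". Returns 0 on success, or -1
 * if the line does not contain a complete bounce record (e.g. the
 * "SWIOTLB is 0% full" lines). */
static int parse_bounce_line(const char *line, struct bounce_stats *st)
{
    assert(line != NULL && st != NULL);
    const char *p = strstr(line, "bounce:");
    if (p == NULL)
        return -1;
    int n = sscanf(p, "bounce: from:%lu(slow:%lu)to:%lu map:%lu unmap:%lu sync:%lu",
                   &st->from, &st->slow, &st->to, &st->map, &st->unmap, &st->sync);
    return n == 6 ? 0 : -1;
}
```

Feeding it the `[ 172.449102]` line above yields `from` = 3839, the figure quoted in this reply.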
.. snip..
>
> And the devices do work ... so how does that work ...

Most (all?) drivers are written to work with bounce-buffering.
That has never been a problem.

The issue as I understand is that the DVB drivers allocate their buffers
from 0->4GB most (all the time?) so they never have to do bounce-buffering.

While the pv-ops one ends up quite frequently doing the bounce-buffering, which
implies that the DVB drivers end up allocating their buffers above the 4GB.
This means we end up spending some CPU time (in the guest) copying the memory
from >4GB to 0-4GB region (And vice-versa).

And I am not clear why this is happening. Hence my thought
was to run a Xen-O-Linux kernel v2.6.3X and a PVOPS v2.6.3X (where X is the
same) with the same PCI device (and the test would entail rebooting the
box in between the launches) to confirm that the Xen-O-Linux is doing something
that the PVOPS is not.

So far, I haven't had much luck compiling a Xen-O-Linux v2.6.38 kernel
so :-(

>
> Thx for your explanation so far !

Sure thing.



pasik at iki

Jan 18, 2012, 3:28 AM

Post #40 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

On Tue, Jan 17, 2012 at 04:02:25PM -0500, Konrad Rzeszutek Wilk wrote:
> >
> > And the devices do work ... so how does that work ...
>
> Most (all?) drivers are written to work with bounce-buffering.
> That has never been a problem.
>
> The issue as I understand is that the DVB drivers allocate their buffers
> from 0->4GB most (all the time?) so they never have to do bounce-buffering.
>
> While the pv-ops one ends up quite frequently doing the bounce-buffering, which
> implies that the DVB drivers end up allocating their buffers above the 4GB.
> This means we end up spending some CPU time (in the guest) copying the memory
> from >4GB to 0-4GB region (And vice-versa).
>
> And I am not clear why this is happening. Hence my thought
> was to run a Xen-O-Linux kernel v2.6.3X and a PVOPS v2.6.3X (where X is the
> same) with the same PCI device (and the test would entail rebooting the
> box in between the launches) to confirm that the Xen-O-Linux is doing something
> that the PVOPS is not.
>
> So far, I haven't had much luck compiling a Xen-O-Linux v2.6.38 kernel
> so :-(
>

Did you try downloading a binary rpm (or src.rpm) from OpenSuse?
I think they have 2.6.38 xenlinux kernel available.

-- Pasi




JBeulich at suse

Jan 18, 2012, 3:35 AM

Post #41 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

>>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk [at] oracle> wrote:
> The issue as I understand is that the DVB drivers allocate their buffers
> from 0->4GB most (all the time?) so they never have to do bounce-buffering.
>
> While the pv-ops one ends up quite frequently doing the bounce-buffering,
> which
> implies that the DVB drivers end up allocating their buffers above the 4GB.
> This means we end up spending some CPU time (in the guest) copying the
> memory
> from >4GB to 0-4GB region (And vice-versa).

This reminds me of something (not sure what XenoLinux you use for
comparison) - how are they allocating that memory? Not vmalloc_32()
by chance (I remember having seen numerous uses under - iirc -
drivers/media/)?

Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do
what their (driver) callers might expect in a PV guest (including the
contiguity assumption for the latter, recalling that you earlier said
you were able to see the problem after several guest starts), and I
had put into our kernels an adjustment to make vmalloc_32() actually
behave as expected.
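Jan's point can be illustrated with a simple model. It assumes the guest's allocator zones (and so `GFP_DMA32`) are defined on pseudo-physical frame numbers, while the device needs machine frame numbers, and Xen chooses the mapping between the two. The `p2m` array and the helper names below are hypothetical stand-ins, not the kernel's P2M interface. In this model, a frame can pass the "below 4GB" test in guest terms and still be backed by a machine frame the device cannot reach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                               /* assumes 4 KiB pages */
#define FRAMES_BELOW_4G   (1ULL << (32 - SKETCH_PAGE_SHIFT))

/* What a zone-based (GFP_DMA32-style) check sees: the guest frame number. */
static bool pfn_below_4g(uint64_t pfn)
{
    return pfn < FRAMES_BELOW_4G;
}

/* What a 32-bit device actually needs: the machine frame must be below
 * 4GB. p2m[pfn] is a hypothetical table giving the machine frame that
 * backs guest frame pfn. */
static bool mfn_below_4g(const uint64_t *p2m, uint64_t pfn)
{
    return p2m[pfn] < FRAMES_BELOW_4G;
}

/* Count how many of nr guest frames a 32-bit device cannot reach, i.e.
 * how many would have to be bounced. */
static size_t frames_needing_bounce(const uint64_t *p2m, const uint64_t *pfns,
                                    size_t nr)
{
    assert(p2m != NULL && pfns != NULL);
    size_t n = 0;
    for (size_t i = 0; i < nr; i++)
        if (!mfn_below_4g(p2m, pfns[i]))
            n++;
    return n;
}
```

The same model shows why contiguity cannot be assumed either: consecutive `p2m` entries need not hold consecutive machine frames.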

Jan




JBeulich at suse

Jan 18, 2012, 3:39 AM

Post #42 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

>>> On 18.01.12 at 12:28, Pasi Kärkkäinen<pasik [at] iki> wrote:
> On Tue, Jan 17, 2012 at 04:02:25PM -0500, Konrad Rzeszutek Wilk wrote:
>> >
>> > And the devices do work ... so how does that work ...
>>
>> Most (all?) drivers are written to work with bounce-buffering.
>> That has never been a problem.
>>
>> The issue as I understand is that the DVB drivers allocate their buffers
>> from 0->4GB most (all the time?) so they never have to do bounce-buffering.
>>
>> While the pv-ops one ends up quite frequently doing the bounce-buffering, which
>> implies that the DVB drivers end up allocating their buffers above the 4GB.
>> This means we end up spending some CPU time (in the guest) copying the memory
>> from >4GB to 0-4GB region (And vice-versa).
>>
>> And I am not clear why this is happening. Hence my thought
>> was to run a Xen-O-Linux kernel v2.6.3X and a PVOPS v2.6.3X (where X is the
>> same) with the same PCI device (and the test would entail rebooting the
>> box in between the launches) to confirm that the Xen-O-Linux is doing something
>> that the PVOPS is not.
>>
>> So far, I haven't had much luck compiling a Xen-O-Linux v2.6.38 kernel
>> so :-(
>>
>
> Did you try downloading a binary rpm (or src.rpm) from OpenSuse?
> I think they have 2.6.38 xenlinux kernel available.

openSUSE 11.4 is using 2.6.37; 12.1 is on 3.1 (and SLE is on 3.0).
Pulling out (consistent) patches at 2.6.38 level might be a little
involved.

Jan



konrad at darnok

Jan 18, 2012, 6:29 AM

Post #43 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:
> >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk [at] oracle> wrote:
> > The issue as I understand is that the DVB drivers allocate their buffers
> > from 0->4GB most (all the time?) so they never have to do bounce-buffering.
> >
> > While the pv-ops one ends up quite frequently doing the bounce-buffering,
> > which
> > implies that the DVB drivers end up allocating their buffers above the 4GB.
> > This means we end up spending some CPU time (in the guest) copying the
> > memory
> > from >4GB to 0-4GB region (And vice-versa).
>
> This reminds me of something (not sure what XenoLinux you use for
> comparison) - how are they allocating that memory? Not vmalloc_32()

I was using the 2.6.18, then the one I saw on Google for Gentoo, and now
I am going to look at the 2.6.38 from OpenSuSE.

> by chance (I remember having seen numerous uses under - iirc -
> drivers/media/)?
>
> Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do
> what their (driver) callers might expect in a PV guest (including the
> contiguity assumption for the latter, recalling that you earlier said
> you were able to see the problem after several guest starts), and I
> had put into our kernels an adjustment to make vmalloc_32() actually
> behave as expected.

Aaah.. The plot thickens! Let me look in the sources! Thanks for the
pointer.



konrad.wilk at oracle

Jan 23, 2012, 2:32 PM

Post #44 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:
> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk [at] oracle> wrote:
> > > The issue as I understand is that the DVB drivers allocate their buffers
> > > from 0->4GB most (all the time?) so they never have to do bounce-buffering.
> > >
> > > While the pv-ops one ends up quite frequently doing the bounce-buffering,
> > > which
> > > implies that the DVB drivers end up allocating their buffers above the 4GB.
> > > This means we end up spending some CPU time (in the guest) copying the
> > > memory
> > > from >4GB to 0-4GB region (And vice-versa).
> >
> > This reminds me of something (not sure what XenoLinux you use for
> > comparison) - how are they allocating that memory? Not vmalloc_32()
>
> I was using the 2.6.18, then the one I saw on Google for Gentoo, and now
> I am going to look at the 2.6.38 from OpenSuSE.
>
> > by chance (I remember having seen numerous uses under - iirc -
> > drivers/media/)?
> >
> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do
> > what their (driver) callers might expect in a PV guest (including the
> > contiguity assumption for the latter, recalling that you earlier said
> > you were able to see the problem after several guest starts), and I
> > had put into our kernels an adjustment to make vmalloc_32() actually
> > behave as expected.
>
> Aaah.. The plot thickens! Let me look in the sources! Thanks for the
> pointer.

Jan's hints led me to videobuf-dma-sg.c, which does indeed call vmalloc_32
and then performs PCI DMA operations on the allocated vmalloc_32
area.

So I cobbled up the attached patch (I haven't actually tested it and sadly
won't until next week) which removes the call to vmalloc_32 and instead
sets up a DMA-allocated set of pages.

If that fixes it for you that is awesome, but if it breaks please
send me your logs.

Cheers,
Konrad
Attachments: vmalloc (3.64 KB)


JBeulich at suse

Jan 24, 2012, 12:58 AM

Post #45 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

>>> On 23.01.12 at 23:32, Konrad Rzeszutek Wilk <konrad.wilk [at] oracle> wrote:
> On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:
>> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:
>> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk [at] oracle> wrote:
>> > > The issue as I understand is that the DVB drivers allocate their buffers
>> > > from 0->4GB most (all the time?) so they never have to do bounce-buffering.
>> > >
>> > > While the pv-ops one ends up quite frequently doing the bounce-buffering,
>> > > which
>> > > implies that the DVB drivers end up allocating their buffers above the 4GB.
>> > > This means we end up spending some CPU time (in the guest) copying the
>> > > memory
>> > > from >4GB to 0-4GB region (And vice-versa).
>> >
>> > This reminds me of something (not sure what XenoLinux you use for
>> > comparison) - how are they allocating that memory? Not vmalloc_32()
>>
>> I was using the 2.6.18, then the one I saw on Google for Gentoo, and now
>> I am going to look at the 2.6.38 from OpenSuSE.
>>
>> > by chance (I remember having seen numerous uses under - iirc -
>> > drivers/media/)?
>> >
>> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do
>> > what their (driver) callers might expect in a PV guest (including the
>> > contiguity assumption for the latter, recalling that you earlier said
>> > you were able to see the problem after several guest starts), and I
>> > had put into our kernels an adjustment to make vmalloc_32() actually
>> > behave as expected.
>>
>> Aaah.. The plot thickens! Let me look in the sources! Thanks for the
>> pointer.
>
> Jan's hints led me to videobuf-dma-sg.c, which does indeed call vmalloc_32
> and then performs PCI DMA operations on the allocated vmalloc_32
> area.
>
> So I cobbled up the attached patch (I haven't actually tested it and sadly
> won't until next week) which removes the call to vmalloc_32 and instead
> sets up a DMA-allocated set of pages.

What a big patch (which would need re-doing for every vmalloc_32()
caller)! Fixing vmalloc_32() would be much less intrusive (reproducing
our 3.2 version of the affected function below, but clearly that's not
pv-ops ready).

Jan

static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
				 pgprot_t prot, int node, void *caller)
{
	const int order = 0;
	struct page **pages;
	unsigned int nr_pages, array_size, i;
	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
#ifdef CONFIG_XEN
	/* Non-zero if the caller asked for DMA-capable (low) memory. */
	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);

	BUILD_BUG_ON((__GFP_DMA | __GFP_DMA32) != (__GFP_DMA + __GFP_DMA32));
	/*
	 * Both zone flags at once (the CONFIG_XEN variant of GFP_VMALLOC32
	 * below) is not a usable zone request: drop them and enforce the
	 * 32-bit machine-address limit per page further down instead.
	 */
	if (dma_mask == (__GFP_DMA | __GFP_DMA32))
		gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
#endif

	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
	array_size = (nr_pages * sizeof(struct page *));

	area->nr_pages = nr_pages;
	/* Please note that the recursion is strictly bounded. */
	if (array_size > PAGE_SIZE) {
		pages = __vmalloc_node(array_size, 1, nested_gfp|__GFP_HIGHMEM,
				PAGE_KERNEL, node, caller);
		area->flags |= VM_VPAGES;
	} else {
		pages = kmalloc_node(array_size, nested_gfp, node);
	}
	area->pages = pages;
	area->caller = caller;
	if (!area->pages) {
		remove_vm_area(area->addr);
		kfree(area);
		return NULL;
	}

	for (i = 0; i < area->nr_pages; i++) {
		struct page *page;
		gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;

		if (node < 0)
			page = alloc_page(tmp_mask);
		else
			page = alloc_pages_node(node, tmp_mask, order);

		if (unlikely(!page)) {
			/* Successfully allocated i pages, free them in __vunmap() */
			area->nr_pages = i;
			goto fail;
		}
		area->pages[i] = page;
#ifdef CONFIG_XEN
		if (dma_mask) {
			/* Ask Xen to back this page with a machine frame below 4GB. */
			if (xen_limit_pages_to_max_mfn(page, 0, 32)) {
				area->nr_pages = i + 1;
				goto fail;
			}
			/* Re-zero the page if the caller asked for zeroed memory. */
			if (gfp_mask & __GFP_ZERO)
				clear_highpage(page);
		}
#endif
	}

	if (map_vm_area(area, prot, &pages))
		goto fail;
	return area->addr;

fail:
	warn_alloc_failed(gfp_mask, order,
			  "vmalloc: allocation failure, allocated %ld of %ld bytes\n",
			  (area->nr_pages*PAGE_SIZE), area->size);
	vfree(area->addr);
	return NULL;
}

...

#if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
#define GFP_VMALLOC32 GFP_DMA32 | GFP_KERNEL
#elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
#define GFP_VMALLOC32 GFP_DMA | GFP_KERNEL
#elif defined(CONFIG_XEN)
#define GFP_VMALLOC32 __GFP_DMA | __GFP_DMA32 | GFP_KERNEL
#else
#define GFP_VMALLOC32 GFP_KERNEL
#endif
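The Xen-specific part of the function above comes down to a per-page loop. Each page is allocated from any zone, and Xen then replaces it with one whose machine frame is below 4GB via `xen_limit_pages_to_max_mfn(page, 0, 32)`. The user-space model below mimics only that loop. The `low_pool` type and `sketch_limit_to_32bit` are hypothetical stand-ins for the hypervisor exchange, and a failed exchange is reduced to a return code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAMES_BELOW_4G (1ULL << 20)    /* 4 GiB / 4 KiB pages, assumed */

/* Hypothetical supply of free machine frames below 4GB that the
 * "hypervisor" can hand out in exchange. */
struct low_pool {
    uint64_t *frames;
    size_t    nr;
};

/* Replace every machine frame at or above 4GB with one from the low pool,
 * one page at a time. Returns the number of frames exchanged, or -1 if the
 * pool runs dry (the kernel code above then fails the allocation). The
 * displaced high frames would go back to Xen; that is not modelled. */
static long sketch_limit_to_32bit(uint64_t *mfns, size_t nr, struct low_pool *pool)
{
    assert(mfns != NULL && pool != NULL);
    long exchanged = 0;
    for (size_t i = 0; i < nr; i++) {
        if (mfns[i] < FRAMES_BELOW_4G)
            continue;               /* already reachable by a 32-bit device */
        if (pool->nr == 0)
            return -1;
        mfns[i] = pool->frames[--pool->nr];
        exchanged++;
    }
    return exchanged;
}
```

Because the fix-up happens inside the allocator, every vmalloc_32() caller would get 32-bit machine addresses without driver changes. This is the reason given above for preferring it over patching each caller.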




konrad.wilk at oracle

Jan 24, 2012, 6:17 AM

Post #46 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

On Tue, Jan 24, 2012 at 08:58:22AM +0000, Jan Beulich wrote:
> >>> On 23.01.12 at 23:32, Konrad Rzeszutek Wilk <konrad.wilk [at] oracle> wrote:
> > On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:
> >> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:
> >> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk [at] oracle> wrote:
> >> > > The issue as I understand is that the DVB drivers allocate their buffers
> >> > > from 0->4GB most (all the time?) so they never have to do bounce-buffering.
> >> > >
> >> > > While the pv-ops one ends up quite frequently doing the bounce-buffering,
> >> > > which
> >> > > implies that the DVB drivers end up allocating their buffers above the 4GB.
> >> > > This means we end up spending some CPU time (in the guest) copying the
> >> > > memory
> >> > > from >4GB to 0-4GB region (And vice-versa).
> >> >
> >> > This reminds me of something (not sure what XenoLinux you use for
> >> > comparison) - how are they allocating that memory? Not vmalloc_32()
> >>
> >> I was using the 2.6.18, then the one I saw on Google for Gentoo, and now
> >> I am going to look at the 2.6.38 from OpenSuSE.
> >>
> >> > by chance (I remember having seen numerous uses under - iirc -
> >> > drivers/media/)?
> >> >
> >> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do
> >> > what their (driver) callers might expect in a PV guest (including the
> >> > contiguity assumption for the latter, recalling that you earlier said
> >> > you were able to see the problem after several guest starts), and I
> >> > had put into our kernels an adjustment to make vmalloc_32() actually
> >> > behave as expected.
> >>
> >> Aaah.. The plot thickens! Let me look in the sources! Thanks for the
> >> pointer.
> >
> > Jan's hints led me to videobuf-dma-sg.c, which does indeed call vmalloc_32
> > and then performs PCI DMA operations on the allocated vmalloc_32
> > area.
> >
> > So I cobbled up the attached patch (I haven't actually tested it and sadly
> > won't until next week) which removes the call to vmalloc_32 and instead
> > sets up a DMA-allocated set of pages.
>
> What a big patch (which would need re-doing for every vmalloc_32()
> caller)! Fixing vmalloc_32() would be much less intrusive (reproducing
> our 3.2 version of the affected function below, but clearly that's not
> pv-ops ready).

I just want to get to the bottom of this before attempting a proper fix.




carsten at schiers

Jan 24, 2012, 1:32 PM

Post #47 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

Konrad,

I applied the patch to a 3.1.2 kernel, but the patched function doesn't seem to be called (I set debug=1 for the module).
I think it's only used for video capture devices.

But I grepped around and found a vmalloc_32 in drivers/media/common/saa7146_core.c, line 182, function saa7146_vmalloc_build_pgtable,
which is part of the saa7146.ko module. This would be the DVB chip. Maybe you can rework the patch so that we can test what
you intended to test.

Consequently, the patch so far doesn't change the load.

Carsten.




-----Original Message-----
From: xen-devel-bounces [at] lists [mailto:xen-devel-bounces [at] lists] On Behalf Of Konrad Rzeszutek Wilk
Sent: Monday, January 23, 2012 23:32
To: Konrad Rzeszutek Wilk
Cc: Sander Eikelenboom; xen-devel; Jan Beulich
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)



carsten at schiers

Jan 25, 2012, 4:02 AM

Post #48 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

I can now confirm that saa7146_vmalloc_build_pgtable and vmalloc_to_sg are called once per
PCI card and will allocate 329 pages. Sorry, but I am not in a position to modify your patch
to patch the functions in the right way, but I'm happy to test...
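For scale, here is the size of such a buffer, assuming 4 KiB pages (the message does not state the page size). If the 329 pages are per card, as the wording suggests, the two DVB-C cards together account for twice that. vmalloc_32() obtains these pages one at a time, as the loop in Jan's function shows. So in a PV guest each page is a separate frame whose machine address may or may not lie below 4GB. This fits Jan's explanation, although the thread has not confirmed it at this point.

$$329 \times 4096\ \text{B} = 1\,347\,584\ \text{B} \approx 1.29\ \text{MiB}, \qquad 2 \times 1\,347\,584\ \text{B} = 2\,695\,168\ \text{B} \approx 2.57\ \text{MiB}$$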

 
BR, Carsten.
 
-----Original Message-----
To: Konrad Rzeszutek Wilk <konrad [at] darnok>;
CC: Sander Eikelenboom <linux [at] eikelenboom>; xen-devel <xen-devel [at] lists>; Jan Beulich <JBeulich [at] suse>;
From: Konrad Rzeszutek Wilk <konrad.wilk [at] oracle>
Sent: Mon 23.01.2012 23:42
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
Attachment: vmalloc


carsten at schiers

Jan 25, 2012, 11:06 AM

Post #49 of 66
Re: Load increase after memory upgrade (part2) [In reply to]

Some news: in order to prepare a clean setup, I upgraded to the 3.2.1 kernel. I noticed that the load increase is
reduced a bit, but noticeably. It's only a simple test, running the DomU for 2 minutes, but the idle load is approximately:

- 2.6.32 pvops 12-13%
- 3.2.1 pvops 10-11%
- 2.6.34 XenoLinux 7-8%

BR, Carsten.


-----Original Message-----
From: xen-devel-bounces [at] lists [mailto:xen-devel-bounces [at] lists] On Behalf Of Konrad Rzeszutek Wilk
Sent: Monday, January 23, 2012 23:32
To: Konrad Rzeszutek Wilk
Cc: Sander Eikelenboom; xen-devel; Jan Beulich
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)



konrad.wilk at oracle

Jan 25, 2012, 1:02 PM

Post #50 of 66 (405 views)
Re: Load increase after memory upgrade (part2) [In reply to]

On Wed, Jan 25, 2012 at 08:06:12PM +0100, Carsten Schiers wrote:
> Some news: in order to prepare a clean setting, I upgraded to 3.2.1 kernel. I noticed that the load increase is
> reduced a bit, but noticably. It's only a simple test, running the DomU for 2 minutes, but the idle load is aprox.
>
> - 2.6.32 pvops 12-13%
> - 3.2.1 pvops 10-11%

Yeah. I think this is due to the fix I added in xen-swiotlb to not always
do the bounce copying.

> - 2.6.34 XenoLinux 7-8%
>
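A rough sketch of the kind of decision that lets a swiotlb layer skip the copy: hand the buffer to the device directly when its machine addresses are contiguous and fit under the device's DMA mask, and bounce only otherwise. This is a user-space model under stated assumptions (4 KiB pages, a p2m array standing in for the guest's frame table); needs_bounce() is a hypothetical helper, not the actual xen-swiotlb change mentioned above, which this message does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (UINT64_C(1) << MODEL_PAGE_SHIFT)

/*
 * Return true if a buffer at guest-physical address phys, size bytes
 * long, would have to be copied through a bounce buffer for a device
 * that can only reach machine addresses up to dma_mask.
 * p2m[] is a hypothetical guest-frame -> machine-frame table.
 */
static bool needs_bounce(const uint64_t *p2m, size_t p2m_len,
                         uint64_t phys, uint64_t size, uint64_t dma_mask)
{
    uint64_t first_pfn = phys >> MODEL_PAGE_SHIFT;
    uint64_t last_pfn = (phys + size - 1) >> MODEL_PAGE_SHIFT;
    uint64_t offset = phys & (MODEL_PAGE_SIZE - 1);
    uint64_t dev_addr;

    assert(size > 0 && last_pfn < p2m_len);

    /* 1. Guest-contiguous pages must also be machine-contiguous. */
    for (uint64_t pfn = first_pfn + 1; pfn <= last_pfn; pfn++)
        if (p2m[pfn] != p2m[pfn - 1] + 1)
            return true;

    /* 2. The last machine byte of the contiguous range must be reachable. */
    dev_addr = (p2m[first_pfn] << MODEL_PAGE_SHIFT) | offset;
    return dev_addr + size - 1 > dma_mask;
}
```

With a 32-bit mask, a two-page buffer mapped to machine frames 0x100 and 0x101 goes straight to the device, while a buffer whose frames are not machine-contiguous, or that sits above 4 GiB, is bounced.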



konrad.wilk at oracle

Feb 15, 2012, 11:28 AM

Post #51 of 66 (265 views)
Re: Load increase after memory upgrade (part2) [In reply to]

On Wed, Jan 25, 2012 at 08:06:12PM +0100, Carsten Schiers wrote:
> Some news: in order to prepare a clean setting, I upgraded to 3.2.1 kernel. I noticed that the load increase is
> reduced a bit, but noticably. It's only a simple test, running the DomU for 2 minutes, but the idle load is aprox.
>
> - 2.6.32 pvops 12-13%
> - 3.2.1 pvops 10-11%
> - 2.6.34 XenoLinux 7-8%

I took a stab at Jan's idea; it compiles, but I haven't been able to test it properly.
Attachments: vmalloc_using_xen_limit_pages.patch (6.68 KB)


JBeulich at suse

Feb 16, 2012, 12:56 AM

Post #52 of 66 (261 views)
Re: Load increase after memory upgrade (part2) [In reply to]

>>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk [at] oracle> wrote:
>@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> struct page **pages;
> unsigned int nr_pages, array_size, i;
> gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>-
>+ gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
>+ if (xen_pv_domain()) {
>+ if (dma_mask == (__GFP_DMA | __GFP_DMA32))

I didn't spot where you force this normally invalid combination, without
which the change won't affect vmalloc32() in a 32-bit kernel.

>+ gfp_mask &= (__GFP_DMA | __GFP_DMA32);

gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);

Jan

>+ }
> nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> array_size = (nr_pages * sizeof(struct page *));
>
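The difference between the two masking statements in the review above can be checked in isolation. The flag values below are hypothetical stand-ins chosen for this sketch (the kernel's real __GFP_* bit values differ); only the bitwise behaviour matters: "&= (A | B)" keeps only the zone bits, while "&= ~(A | B)" clears them and keeps everything else.

```c
#include <assert.h>

/* Hypothetical bit values for the sketch; NOT the kernel's __GFP_* values. */
typedef unsigned int model_gfp_t;
#define MODEL_GFP_DMA    0x01u
#define MODEL_GFP_DMA32  0x04u
#define MODEL_GFP_OTHER  0xD0u /* stands in for the non-zone flags */

static_assert((MODEL_GFP_OTHER & (MODEL_GFP_DMA | MODEL_GFP_DMA32)) == 0,
              "model flags must not overlap the zone bits");

/* As first posted: keeps ONLY the zone bits, dropping every other flag. */
static model_gfp_t strip_zone_bits_as_posted(model_gfp_t gfp_mask)
{
    gfp_mask &= (MODEL_GFP_DMA | MODEL_GFP_DMA32);
    return gfp_mask;
}

/* As corrected in the review: clears the zone bits, keeps everything else. */
static model_gfp_t strip_zone_bits_corrected(model_gfp_t gfp_mask)
{
    gfp_mask &= ~(MODEL_GFP_DMA | MODEL_GFP_DMA32);
    return gfp_mask;
}
```

With gfp_mask = MODEL_GFP_OTHER | MODEL_GFP_DMA | MODEL_GFP_DMA32 (the normally invalid combination the patch tests for), the posted form yields just the two zone bits, while the corrected form yields MODEL_GFP_OTHER.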





konrad.wilk at oracle

Feb 17, 2012, 7:07 AM

Post #53 of 66 (260 views)
Re: Load increase after memory upgrade (part2) [In reply to]

On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:
> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk [at] oracle> wrote:
> >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> > struct page **pages;
> > unsigned int nr_pages, array_size, i;
> > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> >-
> >+ gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> >+ if (xen_pv_domain()) {
> >+ if (dma_mask == (__GFP_DMA | __GFP_DMA32))
>
> I didn't spot where you force this normally invalid combination, without
> which the change won't affect vmalloc32() in a 32-bit kernel.
>
> >+ gfp_mask &= (__GFP_DMA | __GFP_DMA32);
>
> gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
>
> Jan

Duh!
Good eyes. Thanks for catching that.

>
> >+ }
> > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> > array_size = (nr_pages * sizeof(struct page *));
> >
>



carsten at schiers

Feb 28, 2012, 6:35 AM

Post #54 of 66 (262 views)
Re: Load increase after memory upgrade (part2) [In reply to]

Well, let me check over a longer period of time, and especially whether the DomU is still working (I can only do that from home), but the load looks pretty good after applying the patch to 3.2.8 :-D.

BR,
Carsten.

-----Original Message-----
To: Jan Beulich <JBeulich [at] suse>;
CC: Konrad Rzeszutek Wilk <konrad [at] darnok>; xen-devel <xen-devel [at] lists>; Carsten Schiers <carsten [at] schiers>; Sander Eikelenboom <linux [at] eikelenboom>;
From: Konrad Rzeszutek Wilk <konrad.wilk [at] oracle>
Sent: Fri 17.02.2012 16:18
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)


carsten at schiers

Feb 29, 2012, 4:10 AM

Post #55 of 66 (262 views)
Re: Load increase after memory upgrade (part2) [In reply to]

Great news: it works, and the load is back to normal. In the attached graph you can see the peak in blue (compilation of the patched 3.2.8 kernel) and then, after 16:00, the video DomU going live. We are below an average of 7% usage (the figures are in per mille).

Thanks so much. Is that already "the final patch"?

BR, Carsten.

-----Original Message-----
To: Konrad Rzeszutek Wilk <konrad.wilk [at] oracle>;
CC: Sander Eikelenboom <linux [at] eikelenboom>; xen-devel <xen-devel [at] lists>; Jan Beulich <jbeulich [at] suse>; Konrad Rzeszutek Wilk <konrad [at] darnok>;
From: Carsten Schiers <carsten [at] schiers>
Sent: Tue 28.02.2012 15:39
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
Attachment: inline.txt




carsten at schiers

Feb 29, 2012, 4:56 AM

Post #56 of 66 (253 views)
Re: Load increase after memory upgrade (part2) [In reply to]

I am very sorry. I accidentally started the DomU with the wrong config file, which explains why there was no difference between the two. Unfortunately, the DomU with the correct config file hits a BUG:

 


[ 14.674883] BUG: unable to handle kernel paging request at ffffc7fffffff000
[ 14.674910] IP: [<ffffffff811b4c0b>] swiotlb_bounce+0x2e/0x31
[ 14.674930] PGD 0
[ 14.674940] Oops: 0002 [#1] SMP
[ 14.674952] CPU 0
[ 14.674957] Modules linked in: nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc tda10023 budget_av evdev saa7146_vv videodev v4l2_compat_ioctl32 videobuf_dma_sg videobuf_core budget_core snd_pcm dvb_core snd_timer saa7146 snd ttpci_eeprom soundcore snd_page_alloc i2c_core pcspkr ext3 jbd mbcache xen_netfront xen_blkfront
[ 14.675057]
[ 14.675065] Pid: 0, comm: swapper/0 Not tainted 3.2.8-amd64 #1
[ 14.675079] RIP: e030:[<ffffffff811b4c0b>] [<ffffffff811b4c0b>] swiotlb_bounce+0x2e/0x31
[ 14.675097] RSP: e02b:ffff880013fabe58 EFLAGS: 00010202
[ 14.675106] RAX: ffff880012800000 RBX: 0000000000000001 RCX: 0000000000001000
[ 14.675116] RDX: 0000000000001000 RSI: ffff880012800000 RDI: ffffc7fffffff000
[ 14.675126] RBP: 0000000000000002 R08: ffffc7fffffff000 R09: ffff880013f98000
[ 14.675137] R10: 0000000000000001 R11: ffff880003376000 R12: ffff8800032c5090
[ 14.675147] R13: 0000000000000149 R14: ffff8800033e0000 R15: ffffffff81601fd8
[ 14.675163] FS: 00007f3ff9893700(0000) GS:ffff880013fa8000(0000) knlGS:0000000000000000
[ 14.675175] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 14.675184] CR2: ffffc7fffffff000 CR3: 0000000012683000 CR4: 0000000000000660
[ 14.675195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 14.675205] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 14.675216] Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff8160d020)
[ 14.675227] Stack:
[ 14.675232] ffffffff81211826 ffff880002eda000 0000000000000000 ffffc90000408000
[ 14.675251] 00000000000b0150 0000000000000006 ffffffffa013ec4a ffffffff810946cd
[ 14.675270] ffffffff81099203 ffff880003376000 0000000000000000 ffff880002eda4b0
[ 14.675289] Call Trace:
[ 14.675295] <IRQ>
[ 14.675307] [<ffffffff81211826>] ? xen_swiotlb_sync_sg_for_cpu+0x2e/0x47
[ 14.675322] [<ffffffffa013ec4a>] ? vpeirq+0x7f/0x198 [budget_core]
[ 14.675337] [<ffffffff810946cd>] ? handle_irq_event_percpu+0x166/0x184
[ 14.675350] [<ffffffff81099203>] ? __rcu_process_callbacks+0x71/0x2f8
[ 14.675364] [<ffffffff8104d175>] ? tasklet_action+0x76/0xc5
[ 14.675376] [<ffffffff8120a9ac>] ? eoi_pirq+0x5b/0x77
[ 14.675388] [<ffffffff8104cbc6>] ? __do_softirq+0xc4/0x1a0
[ 14.675400] [<ffffffff8120a022>] ? __xen_evtchn_do_upcall+0x1c7/0x205
[ 14.675412] [<ffffffff8134b06c>] ? call_softirq+0x1c/0x30
[ 14.675425] [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79
[ 14.675436] [<ffffffff8104c996>] ? irq_exit+0x44/0xb5
[ 14.675452] [<ffffffff8120b032>] ? xen_evtchn_do_upcall+0x27/0x32
[ 14.675464] [<ffffffff8134b0be>] ? xen_do_hypervisor_callback+0x1e/0x30
[ 14.675473] <EOI>

 
Complete log is attached.

 
BR, Carsten.
 
-----Original Message-----
To: Konrad Rzeszutek Wilk <konrad.wilk [at] oracle>;
CC: Konrad Rzeszutek Wilk <konrad [at] darnok>; xen-devel <xen-devel [at] lists>; Jan Beulich <jbeulich [at] suse>; Sander Eikelenboom <linux [at] eikelenboom>;
From: Carsten Schiers <carsten [at] schiers>
Sent: Wed 29.02.2012 13:16
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
Attachment: inline.txt





 
Attachments: debug.log (20.9 KB)


carsten at schiers

May 11, 2012, 2:39 AM

Post #57 of 66 (219 views)
Re: Load increase after memory upgrade (part2) [In reply to]

Hi Konrad,

I don't want to be pushy, as I have no real problem: I simply use the Xenified kernel or put up with the doubled load.

But I think this mystery is still open. My last status was that the latest patch you produced resulted in a BUG, so we still have not checked whether our theory is correct.

BR,
Carsten.
 
-----Original Message-----
From: Carsten Schiers <carsten [at] schiers>
Sent: Wed 29.02.2012 14:01
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
Attachments: debug.log, inline.txt
To: Konrad Rzeszutek Wilk <konrad.wilk [at] oracle>;
CC: Sander Eikelenboom <linux [at] eikelenboom>; xen-devel <xen-devel [at] lists>; Jan Beulich <jbeulich [at] suse>; Konrad Rzeszutek Wilk <konrad [at] darnok>;




konrad.wilk at oracle

May 11, 2012, 12:41 PM

Post #58 of 66 (220 views)
Re: Load increase after memory upgrade (part2) [In reply to]

On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote:
> Hi Konrad,
>
>
> don't want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load.
>
> But I think this mistery is still open. My last status was that the latest patch you produced resulted in a BUG,

Yes, that is right. Thank you for reminding me.
>
> so we still have not checked whether our theory is correct.

No, we haven't. And I should have no trouble reproducing this: I can just write
a tiny module that allocates memory with vmalloc_32().

But your timing sucks: I am going on a week's vacation next week :-(

Ah, if only there were a cloning machine: I could stick myself in it,
and Baseline_0 would go on vacation while Clone_1 kept working. Then I'd
git merge Baseline_0 and Clone_1 in a week, fix up the merge conflicts,
and carry on. Sigh.

Can I ask you to be patient with me once more and ping me in a week, when
I am back from vacation and my brain is fresh enough to work on this?

_______________________________________________
Xen-devel mailing list
Xen-devel [at] lists
http://lists.xen.org/xen-devel


konrad.wilk at oracle

Jun 13, 2012, 9:55 AM

Post #59 of 66 (206 views)
Re: Load increase after memory upgrade (part2) [In reply to]

On Fri, May 11, 2012 at 03:41:38PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote:
> > Hi Konrad,
> >
> >
> > don't want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load.
> >
> > But I think this mistery is still open. My last status was that the latest patch you produced resulted in a BUG,
>
> Yes, that is right. Thank you for reminding me.
> >
> > so we still have not checked whether our theory is correct.
>
> No we haven't. And I should be have no trouble reproducing this. I can just write
> a tiny module that allocates vmalloc_32().

Done. I found some bugs, and here is a new version. Can you please
try it out? It has #define DEBUG 1 set, so it should print a lot of
output when the DVB module loads. If it crashes, please send me the full log.

Thanks.
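Steps 0 and 2-3 of xen_limit_pages_to_max_mfn() in the patch that follows can be modelled in user space to show two details they depend on. First, the per-page bitmap needs 1 << order bits, so it fits in a single unsigned long only while 1 << order <= BITS_PER_LONG. Second, assuming the exchange request reads the first n entries of its input frame array (as the nr_extents/extent_start pair of the memory-exchange interface suggests), the frames of the marked pages have to be packed densely before the exchange. This is a sketch under stated assumptions (4 KiB pages; build_limit_map(), gather_marked() and test_page_bit() are hypothetical names, not functions from the patch or the kernel), not a replacement for the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_BITS_TO_LONGS(nbits) \
    (((nbits) + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

static int test_page_bit(const unsigned long *map, size_t i)
{
    return (map[i / MODEL_BITS_PER_LONG] >> (i % MODEL_BITS_PER_LONG)) & 1UL;
}

/*
 * Step 0 (modelled): one bit per page of a 2^order block, set when the
 * page's machine frame lies at or above 2^address_bits bytes.
 * Returns a heap bitmap sized for all 2^order bits (free() it), or NULL.
 */
static unsigned long *build_limit_map(const uint64_t *mfns, unsigned int order,
                                      unsigned int address_bits)
{
    size_t nr_pages = (size_t)1 << order;
    unsigned long *map;

    /* The shift below must be well defined for a 64-bit MFN. */
    assert(address_bits >= MODEL_PAGE_SHIFT &&
           address_bits - MODEL_PAGE_SHIFT < 64);

    map = calloc(MODEL_BITS_TO_LONGS(nr_pages), sizeof(*map));
    if (!map)
        return NULL;

    for (size_t i = 0; i < nr_pages; i++)
        if (mfns[i] >> (address_bits - MODEL_PAGE_SHIFT))
            map[i / MODEL_BITS_PER_LONG] |= 1UL << (i % MODEL_BITS_PER_LONG);
    return map;
}

/*
 * Steps 2-3 (modelled): pack the machine frames of the marked pages into
 * out[0..n-1], so an exchange request for n extents reads exactly those
 * frames. slot[k] remembers which page out[k] belongs to.
 */
static size_t gather_marked(const uint64_t *mfns, unsigned int order,
                            const unsigned long *map,
                            uint64_t *out, size_t *slot)
{
    size_t n = 0;

    for (size_t i = 0; i < ((size_t)1 << order); i++) {
        if (!test_page_bit(map, i))
            continue;
        out[n] = mfns[i];
        slot[n] = i;
        n++;
    }
    return n;
}
```

The sketch always allocates the bitmap on the heap, trading a small allocation for a single code path; the slot[] array records which page each packed entry came from, so exchanged frames can be scattered back to the right pages in the remap step.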
From 5afb4ab1fb3d2b059fe1a6db93ab65cb76f43b8a Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk [at] oracle>
Date: Thu, 31 May 2012 14:21:04 -0400
Subject: [PATCH] xen/vmalloc_32: Use xen_exchange_.. when GFP flags are DMA.
[v3]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk [at] oracle>
---
arch/x86/xen/mmu.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++-
include/xen/xen-ops.h | 2 +
mm/vmalloc.c | 18 +++++-
3 files changed, 202 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3a73785..960d206 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -47,6 +47,7 @@
#include <linux/gfp.h>
#include <linux/memblock.h>
#include <linux/seq_file.h>
+#include <linux/slab.h>

#include <trace/events/xen.h>

@@ -2051,6 +2052,7 @@ void __init xen_init_mmu_ops(void)
/* Protected by xen_reservation_lock. */
#define MAX_CONTIG_ORDER 9 /* 2MB */
static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER];
+static unsigned long limited_frames[1<<MAX_CONTIG_ORDER];

#define VOID_PTE (mfn_pte(0, __pgprot(0)))
static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
@@ -2075,6 +2077,42 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
}
xen_mc_issue(0);
}
+static int xen_zap_page_range(struct page *pages, unsigned int order,
+ unsigned long *in_frames,
+ unsigned long *out_frames,
+ void *limit_bitmap)
+{
+ int i, n = 0;
+ struct multicall_space mcs;
+ struct page *page;
+
+ xen_mc_batch();
+ for (i = 0; i < (1UL<<order); i++) {
+ if (!test_bit(i, limit_bitmap))
+ continue;
+
+ page = &pages[i];
+ mcs = __xen_mc_entry(0);
+#define DEBUG 1
+ if (in_frames) {
+#ifdef DEBUG
+ printk(KERN_INFO "%s:%d 0x%lx(pfn) 0x%lx (mfn) 0x%lx(vaddr)\n",
+ __func__, i, page_to_pfn(page),
+ pfn_to_mfn(page_to_pfn(page)), page_address(page));
+#endif
+ in_frames[i] = pfn_to_mfn(page_to_pfn(page));
+ }
+ MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), VOID_PTE, 0);
+ set_phys_to_machine(page_to_pfn(page), INVALID_P2M_ENTRY);
+
+ if (out_frames)
+ out_frames[i] = page_to_pfn(page);
+ ++n;
+
+ }
+ xen_mc_issue(0);
+ return n;
+}

/*
* Update the pfn-to-mfn mappings for a virtual address range, either to
@@ -2118,6 +2156,53 @@ static void xen_remap_exchanged_ptes(unsigned long vaddr, int order,

xen_mc_issue(0);
}
+static void xen_remap_exchanged_pages(struct page *pages, int order,
+ unsigned long *mfns,
+ unsigned long first_mfn, /* in_frame if we failed*/
+ void *limit_map)
+{
+ unsigned i, limit;
+ unsigned long mfn;
+ struct page *page;
+
+ xen_mc_batch();
+
+ limit = 1ULL << order;
+ for (i = 0; i < limit; i++) {
+ struct multicall_space mcs;
+ unsigned flags;
+
+ if (!test_bit(i, limit_map))
+ continue;
+
+ page = &pages[i];
+ mcs = __xen_mc_entry(0);
+ if (mfns)
+ mfn = mfns[i];
+ else
+ mfn = first_mfn + i;
+
+ if (i < (limit - 1))
+ flags = 0;
+ else {
+ if (order == 0)
+ flags = UVMF_INVLPG | UVMF_ALL;
+ else
+ flags = UVMF_TLB_FLUSH | UVMF_ALL;
+ }
+#ifdef DEBUG
+ printk(KERN_INFO "%s (%d) pfn:0x%lx, pfn: 0x%lx vaddr: 0x%lx\n",
+ __func__, i, page_to_pfn(page), mfn, page_address(page));
+#endif
+ MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page),
+ mfn_pte(mfn, PAGE_KERNEL), flags);
+
+ set_phys_to_machine(page_to_pfn(page), mfn);
+ }
+
+ xen_mc_issue(0);
+}
+

/*
* Perform the hypercall to exchange a region of our pfns to point to
@@ -2136,7 +2221,9 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in,
{
long rc;
int success;
-
+#ifdef DEBUG
+ int i;
+#endif
struct xen_memory_exchange exchange = {
.in = {
.nr_extents = extents_in,
@@ -2157,7 +2244,11 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in,

rc = HYPERVISOR_memory_op(XENMEM_exchange, &exchange);
success = (exchange.nr_exchanged == extents_in);
-
+#ifdef DEBUG
+ for (i = 0; i < exchange.nr_exchanged; i++) {
+ printk(KERN_INFO "%s 0x%lx (mfn) <-> 0x%lx (mfn)\n", __func__,pfns_in[i], mfns_out[i]);
+ }
+#endif
BUG_ON(!success && ((exchange.nr_exchanged != 0) || (rc == 0)));
BUG_ON(success && (rc != 0));

@@ -2231,8 +2322,8 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
xen_zap_pfn_range(vstart, order, NULL, out_frames);

/* 3. Do the exchange for non-contiguous MFNs. */
- success = xen_exchange_memory(1, order, &in_frame, 1UL << order,
- 0, out_frames, 0);
+ success = xen_exchange_memory(1, order, &in_frame,
+ 1UL << order, 0, out_frames, 0);

/* 4. Map new pages in place of old pages. */
if (success)
@@ -2244,6 +2335,94 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
}
EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);

+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+ unsigned int address_bits)
+{
+ unsigned long *in_frames = discontig_frames, *out_frames = limited_frames;
+ unsigned long flags;
+ struct page *page;
+ int success;
+ int i, n = 0;
+ unsigned long _limit_map;
+ unsigned long *limit_map;
+
+ if (xen_feature(XENFEAT_auto_translated_physmap))
+ return 0;
+
+ if (unlikely(order > MAX_CONTIG_ORDER))
+ return -ENOMEM;
+
+ if (BITS_PER_LONG >> order) {
+ limit_map = kzalloc(BITS_TO_LONGS(1U << order) *
+ sizeof(*limit_map), GFP_KERNEL);
+ if (unlikely(!limit_map))
+ return -ENOMEM;
+ } else
+ limit_map = &_limit_map;
+
+ /* 0. Construct our per page bitmap lookup. */
+
+ if (address_bits && (address_bits < PAGE_SHIFT))
+ return -EINVAL;
+
+ if (order)
+ bitmap_zero(limit_map, 1U << order);
+ else
+ __set_bit(0, limit_map);
+
+ /* 1. Clear the pages */
+ for (i = 0; i < (1ULL << order); i++) {
+ void *vaddr;
+ page = &pages[i];
+
+ vaddr = page_address(page);
+#ifdef DEBUG
+ printk(KERN_INFO "%s: page: %p vaddr: %p 0x%lx(mfn) 0x%lx(pfn)\n", __func__, page, vaddr, virt_to_mfn(vaddr), mfn_to_pfn(virt_to_mfn(vaddr)));
+#endif
+ if (address_bits) {
+ if (!(virt_to_mfn(vaddr) >> (address_bits - PAGE_SHIFT)))
+ continue;
+ __set_bit(i, limit_map);
+ }
+ if (!PageHighMem(page))
+ memset(vaddr, 0, PAGE_SIZE);
+ else {
+ memset(kmap(page), 0, PAGE_SIZE);
+ kunmap(page);
+ ++n;
+ }
+ }
+ /* Check to see if we actually have to do any work. */
+ if (bitmap_empty(limit_map, 1U << order)) {
+ if (limit_map != &_limit_map)
+ kfree(limit_map);
+ return 0;
+ }
+ if (n)
+ kmap_flush_unused();
+
+ spin_lock_irqsave(&xen_reservation_lock, flags);
+
+ /* 2. Zap current PTEs. */
+ n = xen_zap_page_range(pages, order, in_frames, NULL /*out_frames */, limit_map);
+
+ /* 3. Do the exchange for non-contiguous MFNs. */
+ success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
+ n, 0, out_frames, address_bits);
+
+ /* 4. Map new pages in place of old pages. */
+ if (success)
+ xen_remap_exchanged_pages(pages, order, out_frames, 0, limit_map);
+ else
+ xen_remap_exchanged_pages(pages, order, NULL, *in_frames, limit_map);
+
+ spin_unlock_irqrestore(&xen_reservation_lock, flags);
+ if (limit_map != &_limit_map)
+ kfree(limit_map);
+
+ return success ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(xen_limit_pages_to_max_mfn);
#ifdef CONFIG_XEN_PVHVM
static void xen_hvm_exit_mmap(struct mm_struct *mm)
{
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 6a198e4..2f8709f 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -29,4 +29,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
unsigned long mfn, int nr,
pgprot_t prot, unsigned domid);

+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+ unsigned int address_bits);
#endif /* INCLUDE_XEN_OPS_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2aad499..194af07 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -31,6 +31,8 @@
#include <asm/tlbflush.h>
#include <asm/shmparam.h>

+#include <xen/xen.h>
+#include <xen/xen-ops.h>
/*** Page table manipulation functions ***/

static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
struct page **pages;
unsigned int nr_pages, array_size, i;
gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
-
+ gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
+ if (xen_pv_domain()) {
+ if (dma_mask == (__GFP_DMA | __GFP_DMA32))
+ gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
+ }
nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
array_size = (nr_pages * sizeof(struct page *));

@@ -1612,6 +1618,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
goto fail;
}
area->pages[i] = page;
+ if (xen_pv_domain()) {
+ if (dma_mask) {
+ if (xen_limit_pages_to_max_mfn(page, 0, 32)) {
+ area->nr_pages = i + 1;
+ goto fail;
+ }
+ if (gfp_mask & __GFP_ZERO)
+ clear_highpage(page);
+ }
+ }
}

if (map_vm_area(area, prot, &pages))
--
1.7.7.6


_______________________________________________
Xen-devel mailing list
Xen-devel [at] lists
http://lists.xen.org/xen-devel


JBeulich at suse

Jun 14, 2012, 12:07 AM

Post #60 of 66 (206 views)
Re: Load increase after memory upgrade (part2) [In reply to]

>>> On 13.06.12 at 18:55, Konrad Rzeszutek Wilk <konrad.wilk [at] oracle> wrote:
> @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> struct page **pages;
> unsigned int nr_pages, array_size, i;
> gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> -
> + gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> + if (xen_pv_domain()) {
> + if (dma_mask == (__GFP_DMA | __GFP_DMA32))

As said in an earlier reply - without having any place that would
ever set both flags at once, this whole conditional is meaningless.
In our code - which I suppose is where you cloned this from - we
set GFP_VMALLOC32 to such a value for 32-bit kernels (which
otherwise would merely use GFP_KERNEL, and hence not trigger
the code calling xen_limit_pages_to_max_mfn()). I don't recall
though whether Carsten's problem was on a 32- or 64-bit kernel.

Jan

> + gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> + }
> nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> array_size = (nr_pages * sizeof(struct page *));
>





dvrabel at cantab

Jun 14, 2012, 1:38 AM

Post #61 of 66 (206 views)
Re: Load increase after memory upgrade (part2) [In reply to]

On 13/06/12 17:55, Konrad Rzeszutek Wilk wrote:
>
> + /* 3. Do the exchange for non-contiguous MFNs. */
> + success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
> + n, 0, out_frames, address_bits);

vmalloc() does not require physically contiguous MFNs.

David



konrad.wilk at oracle

Jun 14, 2012, 11:31 AM

Post #62 of 66 (206 views)
Re: Load increase after memory upgrade (part2) [In reply to]

On Thu, Jun 14, 2012 at 09:38:31AM +0100, David Vrabel wrote:
> On 13/06/12 17:55, Konrad Rzeszutek Wilk wrote:
> >
> > + /* 3. Do the exchange for non-contiguous MFNs. */
> > + success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
> > + n, 0, out_frames, address_bits);
>
> vmalloc() does not require physically contiguous MFNs.

<nods> It doesn't matter that much in this context, as vmalloc calls this
per page, so only one page is swizzled at a time.

>
> David



konrad.wilk at oracle

Jun 14, 2012, 11:33 AM

Post #63 of 66 (206 views)
Re: Load increase after memory upgrade (part2) [In reply to]

On Thu, Jun 14, 2012 at 08:07:55AM +0100, Jan Beulich wrote:
> >>> On 13.06.12 at 18:55, Konrad Rzeszutek Wilk <konrad.wilk [at] oracle> wrote:
> > @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> > struct page **pages;
> > unsigned int nr_pages, array_size, i;
> > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> > -
> > + gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> > + if (xen_pv_domain()) {
> > + if (dma_mask == (__GFP_DMA | __GFP_DMA32))
>
> As said in an earlier reply - without having any place that would
> ever set both flags at once, this whole conditional is meaningless.
> In our code - which I suppose is where you cloned this from - we

Yup.
> set GFP_VMALLOC32 to such a value for 32-bit kernels (which
> otherwise would merely use GFP_KERNEL, and hence not trigger

Ah, let me double check. Thanks for looking out for this.

> the code calling xen_limit_pages_to_max_mfn()). I don't recall
> though whether Carsten's problem was on a 32- or 64-bit kernel.
>
> Jan
>
> > + gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> > + }
> > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> > array_size = (nr_pages * sizeof(struct page *));
> >


carsten at schiers

Jun 14, 2012, 11:40 AM

Post #64 of 66 (206 views)
Re: Load increase after memory upgrade (part2) [In reply to]

Konrad, against which kernel version did you produce this patch? It does not apply
to 3.4.2, at least; I will look for an older version now...

-----Original Message-----
From: xen-devel-bounces [at] lists [mailto:xen-devel-bounces [at] lists] On Behalf Of Konrad Rzeszutek Wilk
Sent: Wednesday, June 13, 2012 18:55
To: Carsten Schiers
Cc: Konrad Rzeszutek Wilk; xen-devel; Jan Beulich; Sander Eikelenboom
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

On Fri, May 11, 2012 at 03:41:38PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote:
> > Hi Konrad,
> >
> >
> > I don't want to be pushy, as I have no real issue. I simply use the Xenified kernel or put up with the doubled load.
> >
> > But I think this mystery is still open. My last status was that the
> > latest patch you produced resulted in a BUG,
>
> Yes, that is right. Thank you for reminding me.
> >
> > so we still have not checked whether our theory is correct.
>
> No, we haven't. And I should have no trouble reproducing this. I can
> just write a tiny module that allocates memory with vmalloc_32().

Done. Found some bugs... and here is a new version. Can you please try it out? It has #define DEBUG 1 set, so it should print a lot of output when the DVB module loads. If it crashes, please send me the full log.

Thanks.
From 5afb4ab1fb3d2b059fe1a6db93ab65cb76f43b8a Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk [at] oracle>
Date: Thu, 31 May 2012 14:21:04 -0400
Subject: [PATCH] xen/vmalloc_32: Use xen_exchange_.. when GFP flags are DMA.
[v3]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk [at] oracle>
---
arch/x86/xen/mmu.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++-
include/xen/xen-ops.h | 2 +
mm/vmalloc.c | 18 +++++-
3 files changed, 202 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3a73785..960d206 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -47,6 +47,7 @@
#include <linux/gfp.h>
#include <linux/memblock.h>
#include <linux/seq_file.h>
+#include <linux/slab.h>

#include <trace/events/xen.h>

@@ -2051,6 +2052,7 @@ void __init xen_init_mmu_ops(void)
/* Protected by xen_reservation_lock. */
#define MAX_CONTIG_ORDER 9 /* 2MB */
static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER];
+static unsigned long limited_frames[1<<MAX_CONTIG_ORDER];

#define VOID_PTE (mfn_pte(0, __pgprot(0)))
static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
@@ -2075,6 +2077,42 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
}
xen_mc_issue(0);
}
+static int xen_zap_page_range(struct page *pages, unsigned int order,
+ unsigned long *in_frames,
+ unsigned long *out_frames,
+ void *limit_bitmap)
+{
+ int i, n = 0;
+ struct multicall_space mcs;
+ struct page *page;
+
+ xen_mc_batch();
+ for (i = 0; i < (1UL<<order); i++) {
+ if (!test_bit(i, limit_bitmap))
+ continue;
+
+ page = &pages[i];
+ mcs = __xen_mc_entry(0);
+#define DEBUG 1
+ if (in_frames) {
+#ifdef DEBUG
+ printk(KERN_INFO "%s:%d 0x%lx(pfn) 0x%lx (mfn) 0x%lx(vaddr)\n",
+ __func__, i, page_to_pfn(page),
+ pfn_to_mfn(page_to_pfn(page)), page_address(page));
+#endif
+ in_frames[i] = pfn_to_mfn(page_to_pfn(page));
+ }
+ MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), VOID_PTE, 0);
+ set_phys_to_machine(page_to_pfn(page), INVALID_P2M_ENTRY);
+
+ if (out_frames)
+ out_frames[i] = page_to_pfn(page);
+ ++n;
+
+ }
+ xen_mc_issue(0);
+ return n;
+}

/*
* Update the pfn-to-mfn mappings for a virtual address range, either to
@@ -2118,6 +2156,53 @@ static void xen_remap_exchanged_ptes(unsigned long vaddr, int order,

xen_mc_issue(0);
}
+static void xen_remap_exchanged_pages(struct page *pages, int order,
+ unsigned long *mfns,
+ unsigned long first_mfn, /* in_frame if we failed*/
+ void *limit_map)
+{
+ unsigned i, limit;
+ unsigned long mfn;
+ struct page *page;
+
+ xen_mc_batch();
+
+ limit = 1ULL << order;
+ for (i = 0; i < limit; i++) {
+ struct multicall_space mcs;
+ unsigned flags;
+
+ if (!test_bit(i, limit_map))
+ continue;
+
+ page = &pages[i];
+ mcs = __xen_mc_entry(0);
+ if (mfns)
+ mfn = mfns[i];
+ else
+ mfn = first_mfn + i;
+
+ if (i < (limit - 1))
+ flags = 0;
+ else {
+ if (order == 0)
+ flags = UVMF_INVLPG | UVMF_ALL;
+ else
+ flags = UVMF_TLB_FLUSH | UVMF_ALL;
+ }
+#ifdef DEBUG
+ printk(KERN_INFO "%s (%d) pfn:0x%lx, pfn: 0x%lx vaddr: 0x%lx\n",
+ __func__, i, page_to_pfn(page), mfn, page_address(page));
+#endif
+ MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page),
+ mfn_pte(mfn, PAGE_KERNEL), flags);
+
+ set_phys_to_machine(page_to_pfn(page), mfn);
+ }
+
+ xen_mc_issue(0);
+}
+

/*
* Perform the hypercall to exchange a region of our pfns to point to
@@ -2136,7 +2221,9 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in,
{
long rc;
int success;
-
+#ifdef DEBUG
+ int i;
+#endif
struct xen_memory_exchange exchange = {
.in = {
.nr_extents = extents_in,
@@ -2157,7 +2244,11 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in,

rc = HYPERVISOR_memory_op(XENMEM_exchange, &exchange);
success = (exchange.nr_exchanged == extents_in);
-
+#ifdef DEBUG
+ for (i = 0; i < exchange.nr_exchanged; i++) {
+ printk(KERN_INFO "%s 0x%lx (mfn) <-> 0x%lx (mfn)\n", __func__,pfns_in[i], mfns_out[i]);
+ }
+#endif
BUG_ON(!success && ((exchange.nr_exchanged != 0) || (rc == 0)));
BUG_ON(success && (rc != 0));

@@ -2231,8 +2322,8 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
xen_zap_pfn_range(vstart, order, NULL, out_frames);

/* 3. Do the exchange for non-contiguous MFNs. */
- success = xen_exchange_memory(1, order, &in_frame, 1UL << order,
- 0, out_frames, 0);
+ success = xen_exchange_memory(1, order, &in_frame,
+ 1UL << order, 0, out_frames, 0);

/* 4. Map new pages in place of old pages. */
if (success)
@@ -2244,6 +2335,94 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
}
EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);

+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+ unsigned int address_bits)
+{
+ unsigned long *in_frames = discontig_frames, *out_frames = limited_frames;
+ unsigned long flags;
+ struct page *page;
+ int success;
+ int i, n = 0;
+ unsigned long _limit_map;
+ unsigned long *limit_map;
+
+ if (xen_feature(XENFEAT_auto_translated_physmap))
+ return 0;
+
+ if (unlikely(order > MAX_CONTIG_ORDER))
+ return -ENOMEM;
+
+ if (BITS_PER_LONG >> order) {
+ limit_map = kzalloc(BITS_TO_LONGS(1U << order) *
+ sizeof(*limit_map), GFP_KERNEL);
+ if (unlikely(!limit_map))
+ return -ENOMEM;
+ } else
+ limit_map = &_limit_map;
+
+ /* 0. Construct our per page bitmap lookup. */
+
+ if (address_bits && (address_bits < PAGE_SHIFT))
+ return -EINVAL;
+
+ if (order)
+ bitmap_zero(limit_map, 1U << order);
+ else
+ __set_bit(0, limit_map);
+
+ /* 1. Clear the pages */
+ for (i = 0; i < (1ULL << order); i++) {
+ void *vaddr;
+ page = &pages[i];
+
+ vaddr = page_address(page);
+#ifdef DEBUG
+ printk(KERN_INFO "%s: page: %p vaddr: %p 0x%lx(mfn) 0x%lx(pfn)\n", __func__, page, vaddr, virt_to_mfn(vaddr), mfn_to_pfn(virt_to_mfn(vaddr)));
+#endif
+ if (address_bits) {
+ if (!(virt_to_mfn(vaddr) >> (address_bits - PAGE_SHIFT)))
+ continue;
+ __set_bit(i, limit_map);
+ }
+ if (!PageHighMem(page))
+ memset(vaddr, 0, PAGE_SIZE);
+ else {
+ memset(kmap(page), 0, PAGE_SIZE);
+ kunmap(page);
+ ++n;
+ }
+ }
+ /* Check to see if we actually have to do any work. */
+ if (bitmap_empty(limit_map, 1U << order)) {
+ if (limit_map != &_limit_map)
+ kfree(limit_map);
+ return 0;
+ }
+ if (n)
+ kmap_flush_unused();
+
+ spin_lock_irqsave(&xen_reservation_lock, flags);
+
+ /* 2. Zap current PTEs. */
+ n = xen_zap_page_range(pages, order, in_frames, NULL /*out_frames */, limit_map);
+
+ /* 3. Do the exchange for non-contiguous MFNs. */
+ success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
+ n, 0, out_frames, address_bits);
+
+ /* 4. Map new pages in place of old pages. */
+ if (success)
+ xen_remap_exchanged_pages(pages, order, out_frames, 0, limit_map);
+ else
+ xen_remap_exchanged_pages(pages, order, NULL, *in_frames, limit_map);
+
+ spin_unlock_irqrestore(&xen_reservation_lock, flags);
+ if (limit_map != &_limit_map)
+ kfree(limit_map);
+
+ return success ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(xen_limit_pages_to_max_mfn);
#ifdef CONFIG_XEN_PVHVM
static void xen_hvm_exit_mmap(struct mm_struct *mm)
{
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 6a198e4..2f8709f 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -29,4 +29,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
unsigned long mfn, int nr,
pgprot_t prot, unsigned domid);

+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+ unsigned int address_bits);
#endif /* INCLUDE_XEN_OPS_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2aad499..194af07 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -31,6 +31,8 @@
#include <asm/tlbflush.h>
#include <asm/shmparam.h>

+#include <xen/xen.h>
+#include <xen/xen-ops.h>
/*** Page table manipulation functions ***/

static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
struct page **pages;
unsigned int nr_pages, array_size, i;
gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
-
+ gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
+ if (xen_pv_domain()) {
+ if (dma_mask == (__GFP_DMA | __GFP_DMA32))
+ gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
+ }
nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
array_size = (nr_pages * sizeof(struct page *));

@@ -1612,6 +1618,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
goto fail;
}
area->pages[i] = page;
+ if (xen_pv_domain()) {
+ if (dma_mask) {
+ if (xen_limit_pages_to_max_mfn(page, 0, 32)) {
+ area->nr_pages = i + 1;
+ goto fail;
+ }
+ if (gfp_mask & __GFP_ZERO)
+ clear_highpage(page);
+ }
+ }
}

if (map_vm_area(area, prot, &pages))
--
1.7.7.6




carsten at schiers

Jun 14, 2012, 11:43 AM

Post #65 of 66 (207 views)
Re: Load increase after memory upgrade (part2) [In reply to]

It's a 64-bit kernel...

-----Original Message-----
From: Jan Beulich [mailto:JBeulich [at] suse]
Sent: Thursday, June 14, 2012 09:08
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk; Sander Eikelenboom; xen-devel; Carsten Schiers
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

>>> On 13.06.12 at 18:55, Konrad Rzeszutek Wilk <konrad.wilk [at] oracle> wrote:
> @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> struct page **pages;
> unsigned int nr_pages, array_size, i;
> gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> -
> + gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> + if (xen_pv_domain()) {
> + if (dma_mask == (__GFP_DMA | __GFP_DMA32))

As said in an earlier reply - without having any place that would ever set both flags at once, this whole conditional is meaningless.
In our code - which I suppose is where you cloned this from - we set GFP_VMALLOC32 to such a value for 32-bit kernels (which otherwise would merely use GFP_KERNEL, and hence not trigger the code calling xen_limit_pages_to_max_mfn()). I don't recall though whether Carsten's problem was on a 32- or 64-bit kernel.

Jan

> + gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> + }
> nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> array_size = (nr_pages * sizeof(struct page *));
>





carsten at schiers

Jun 14, 2012, 12:16 PM

Post #66 of 66 (206 views)
Re: Load increase after memory upgrade (part2) [In reply to]

OK, found the problem in the patch file; baking 3.4.2 now... BR, Carsten.
