That's a good question, and we'll clarify it in the camera-ready version.
In Table 1, under "Acc with 8 (greedy search)", we computed the proportion of each kind of filter in each layer; those proportions are what we use in Table 2.
I'm looking forward to seeing more of your work in the future.
I finally managed to look at Eq. 4.
I believe it doesn't represent DS-CNNs. In a depthwise convolution each kernel is convolved with its own channel into a separate feature map, so you can't factor the kernels out. (I marked the part I don't think represents DS-CNNs in red.)
Overall, DS-CNNs are not linearly combining (LCing) kernels; they are LCing feature maps.
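To spell out what I mean (my own notation, not necessarily the paper's; correct me if this isn't what Eq. 4 intends):

```latex
% Depthwise step: each kernel K_c acts on its own input channel x_c
F_c = K_c * x_c
% Pointwise step: a 1x1 convolution linearly combines the feature maps
y = \sum_c a_c F_c = \sum_c a_c \,(K_c * x_c)
```

Because each K_c sees a different input x_c, the sum can't be rewritten as (sum_c a_c K_c) * x for a single input x; that factorization would only hold if all kernels were convolved with the same input. So the linear combination is over feature maps, not kernels.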
Maybe we can cite it again in "Master key filters" as another piece of visual evidence 👌
Nice work, by the way.
Actually, two people referenced your work :D
Please correct me if I'm wrong, but are you sure that the pointwise layers are LCing the "filters"? I'm having difficulty seeing that.
If we call the filters K and the feature maps F, how can this result in an LC of the Ks?
A model learning (x, y, z) is mathematically equivalent to one learning the coefficients of an LC of "frozen" filters (1,0,0), (0,1,0), (0,0,1). They're doing the same optimization, just parameterized differently. The same goes for an LC of random filters.
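Spelling that out (just the linear-algebra identity, nothing specific to either paper):

```latex
(x, y, z) = x\,(1,0,0) + y\,(0,1,0) + z\,(0,0,1)
% More generally, if frozen filters B_1, ..., B_n form a basis,
% any learned filter W has a unique expansion
W = \sum_{i=1}^{n} \alpha_i B_i
% so optimizing over W and optimizing over the coefficients \alpha_i
% parameterize exactly the same set of filters.
```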
I found out that they create new filters through linear combinations of random filters, which isn't what we're doing. 🤔
And mathematically, 49 random filters should (almost surely) span the entire 7x7 space, so any filter is reachable as an LC of them and it's not surprising that it works; see the quick check below.
Open to discussion if I'm misunderstanding something!
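A minimal NumPy sanity check of that spanning claim (the setup and names are mine, just illustrating the linear algebra):

```python
import numpy as np

rng = np.random.default_rng(0)

# 49 random 7x7 filters, each flattened into a row of a 49x49 matrix.
basis = rng.standard_normal((49, 7 * 7))

# Generic random filters are linearly independent with probability 1,
# so they span the full 49-dimensional space of 7x7 filters.
print(np.linalg.matrix_rank(basis))  # 49

# Therefore any target 7x7 filter is an exact linear combination of them.
target = rng.standard_normal(7 * 7)
coeffs = np.linalg.solve(basis.T, target)      # basis.T @ coeffs == target
print(np.allclose(basis.T @ coeffs, target))   # True
```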
After reading the paper, TBH, I couldn't see a deep connection. I'm open to being wrong, since both you and the AC pointed this out. If I am wrong, please correct me.
bsky.app/profile/kias...
and
bsky.app/profile/kias...
But why? Is this because of residual connections?
We have experimented with both DS-CNNs and classical CNNs (ResNets in our paper; you are right that our main focus was DS-CNNs). In the DS-CNNs we froze only the depthwise filters, but in the classical CNNs all parameters are frozen, just as Yosinski did.
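In PyTorch terms, the two setups look roughly like this (a rough illustrative sketch, not the exact code from the paper; the specific models and the choice of which parameters stay trainable are assumptions for the example):

```python
import torch.nn as nn
from torchvision.models import mobilenet_v2, resnet18

# DS-CNN setup: freeze only the depthwise filters.
# A Conv2d is depthwise when groups == in_channels (and > 1).
ds_model = mobilenet_v2()
for m in ds_model.modules():
    if isinstance(m, nn.Conv2d) and m.groups == m.in_channels and m.groups > 1:
        m.weight.requires_grad = False   # depthwise kernels stay fixed

# Classical-CNN setup: freeze all parameters of the backbone
# (here the final classifier is left trainable purely for illustration).
cnn = resnet18()
for name, p in cnn.named_parameters():
    if not name.startswith("fc."):
        p.requires_grad = False
```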