Replies: 10 comments 21 replies
-
Honestly I think the idea has a lot of merit. "Orthogonalized" de-censored models have, I think, clearly shown that refusal is represented by a single direction in latent space, at least for a number of large models. Constraining the hidden state to the hyperplane where the refusal component is zero is a much more powerful technique than trying to bias the token selection at the output stage, where you'd need another language model to make informed decisions about which tokens to prefer. Likewise, adding random noise along the forward pass could be a more powerful way to get "creative" outputs than just sampling at the output stage. It's definitely on my list of things to experiment with, along with orthogonalization and other forms of intervention. I've added a simple API for it so far:

```python
def pre_hook(hidden_states, *args, **kwargs):
    hidden_states *= 0.9
    return hidden_states

def post_hook(hidden_states, *args, **kwargs):
    hidden_states /= 0.9
    return hidden_states

# Wrap one module with some hooks
model.modules[13] = Intervention(model.modules[13], pre_hook, post_hook)
```

As usual, though, time is limited and a few other things take priority at the moment.
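As an illustration of the hyperplane constraint mentioned above, here is a hedged sketch written against this hook API. `refusal_dir` is assumed to be precomputed elsewhere (e.g. as a difference of mean activations over refused vs. complied prompts) and is not part of the API:

```python
import torch

def make_orthogonalize_hook(refusal_dir: torch.Tensor):
    # refusal_dir: assumed precomputed refusal direction, shape (hidden_dim,)
    r = refusal_dir / refusal_dir.norm()

    def pre_hook(hidden_states, *args, **kwargs):
        # Project out the component along r, constraining the hidden state
        # to the hyperplane where the refusal component is zero
        comp = torch.sum(hidden_states * r, dim = -1, keepdim = True)
        return hidden_states - comp * r

    return pre_hook
```

Wrapping a module would then look like the `Intervention` example above (whether the post-hook can simply be omitted is an assumption).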
-
As usual I'm impressed that you already have a (first) solution for this! That's what I call fast delivery... :-D Thanks a lot! I'm wondering about the coincidence: what was the original motivation for this change, if I may ask? Since the release of Anthropic's latest paper on this, I've read a lot about the interpretability of black-box LLMs and about how to use this knowledge to alter/de-censor models. It's exciting that my request here and this development come together... But be that as it may: I still don't understand how the interface you created can help to de-censor models. Or should I understand it more as a very first step in that direction?
-
See here for what I think would be a good way to get a similar effect to orthogonalization using something like a hook system. If that explanation sounds a bit complicated, basically what it's saying is: "do the same thing that first-person shooters do to the player's velocity vector when the player runs into a wall at an angle, but do it in hyperdimensional space and make the wall be the plane orthogonal to the refusal direction."
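A hedged sketch of that "wall sliding" idea (all names here are hypothetical; unlike full orthogonalization, the component is only removed when the state is moving "into the wall", i.e. when the refusal component is positive):

```python
import torch

def slide_along_wall(hidden_states: torch.Tensor, refusal_dir: torch.Tensor):
    r = refusal_dir / refusal_dir.norm()  # unit normal of the "wall"
    comp = torch.sum(hidden_states * r, dim = -1, keepdim = True)
    # Only cancel the component that points into the wall (comp > 0),
    # like clipping a velocity vector against a surface in an FPS
    return hidden_states - torch.clamp(comp, min = 0.0) * r
```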
-
That makes sense. For now I will just go with @turboderp's simple API. But I have a problem if I want to alter only specific layers: how can the hook functions know which layer they are currently operating on? I don't see any clue in the code for this.
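(Judging by the later snippets in this thread, the hook apparently receives the module index through kwargs; a minimal sketch of gating on it, with the layer range purely illustrative:)

```python
TARGET_LAYERS = range(20, 40)  # hypothetical choice of layers to touch

def pre_hook(hidden_states, *args, **kwargs):
    # Later snippets in this thread read the layer index from kwargs
    if kwargs.get("module_num") not in TARGET_LAYERS:
        return hidden_states
    hidden_states *= 0.9
    return hidden_states
```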
-
The current status: I precalculate the hidden states of each layer for a number of negative prompts, then I calculate the average. At inference time I just subtract that average vector in the pre_hook and add it back in the post_hook of each Attention and MLP module. I only operate on the middle and upper layers (so as not to restrict basic "understanding") and I use a linear scaling to ramp up the effect. Result: it works surprisingly well! I know this is still a very simple approach and not really targeted towards adding or removing a specific feature. Now I want to go further and extract a cleaner feature vector. I also want to limit the subtract and add-back steps to specific modules (where it gets more architecture-specific), or even refrain from adding anything back to the hidden states. But at the moment I'm still struggling with the math part...
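A minimal sketch of the approach as described (the `module_num` kwarg is taken from this thread; the layer count, ramp shape and `avg_neg` placeholder are assumptions):

```python
import torch

num_layers = 48   # e.g. len(model.modules); placeholder
avg_neg = {}      # layer index -> mean hidden state over negative prompts

def ramp(layer_idx, start = num_layers // 2):
    # 0 below the middle layers, then a linear ramp up to 1 at the top
    if layer_idx < start:
        return 0.0
    return (layer_idx - start) / max(1, num_layers - 1 - start)

def pre_hook(hidden_states, *args, **kwargs):
    i = kwargs["module_num"]
    if i in avg_neg:
        hidden_states -= ramp(i) * avg_neg[i]   # subtract the average vector
    return hidden_states

def post_hook(hidden_states, *args, **kwargs):
    i = kwargs["module_num"]
    if i in avg_neg:
        hidden_states += ramp(i) * avg_neg[i]   # add it back afterwards
    return hidden_states
```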
-
After an evening of debugging I have no idea what I'm doing wrong (I guess there are enough options), but it doesn't work at all. Either I get garbage or there is no change at all. The inner product and squared magnitude keep rising and eventually reach inf if not limited. Of course the math is wrong somewhere... Any ideas? Sorry for the dirty code. Eternal gratitude is assured.

```python
import torch
from pathlib import Path

### Inference

intercept_module_names = ["ExLlamaV2Attention", "ExLlamaV2MoEMLP"]  # TODO: both?

def inner_product(x, y):
    # y = y / (torch.norm(y, dim = -1, keepdim = True) + 1e-8)
    return torch.sum(x * y, dim = -1)

def squared_magnitude(x):
    # x = x / (torch.norm(x, dim = -1, keepdim = True) + 1e-8)
    return torch.sum(x * x)

def pre_hook(hidden_states, *args, **kwargs):
    global cur_scaling_distribution, cur_neg_injections, intercept_module_names
    module_num = kwargs["module_num"]
    module_name = kwargs["module_name"]
    if hidden_states.shape[1] != 1:  # TODO: deal with prefill
        return hidden_states
    try:
        # scaling factor per module_num, between 0 and 1
        csd = cur_scaling_distribution[module_num]
        if csd > 0:
            for dl in cur_neg_injections:
                cfv = dl.get(module_num, None)
                if cfv is not None:
                    ip = inner_product(hidden_states, cfv)
                    if ip > 0:
                        sm = squared_magnitude(cfv)
                        # Subtract the projection of the hidden state onto cfv
                        hidden_states -= csd * cfv * ip / sm
    except:
        raise  # TODO
    return hidden_states

### Prepare

NEG_PROMPTS = "neg_prompts.txt"
POS_PROMPTS = "pos_prompts.txt"

def calculate_feature_vectors(neg_hidden_states_dicts, pos_hidden_states_dicts):
    if len(neg_hidden_states_dicts) == 0 or len(pos_hidden_states_dicts) == 0:
        print("ERR: neg/pos dict list empty")
        return None
    avg_neg_hidden_states_dict = {}
    avg_pos_hidden_states_dict = {}
    feature_vector_dict = {}
    for k in neg_hidden_states_dicts[0].keys():
        hs = []
        for e in neg_hidden_states_dicts:
            if isinstance(e, dict) and k in e and not isinstance(e[k], int):
                hs.append(e[k])
        if len(hs) > 0:
            avg_neg_hidden_states_dict[k] = torch.mean(torch.stack(hs), dim = 0)
        else:
            avg_neg_hidden_states_dict[k] = 0
    for k in pos_hidden_states_dicts[0].keys():
        hs = []
        for e in pos_hidden_states_dicts:
            if isinstance(e, dict) and k in e and not isinstance(e[k], int):
                hs.append(e[k])
        if len(hs) > 0:
            avg_pos_hidden_states_dict[k] = torch.mean(torch.stack(hs), dim = 0)
        else:
            avg_pos_hidden_states_dict[k] = 0
    # feature vector = mean(negative) - mean(positive), per layer
    for k in avg_neg_hidden_states_dict.keys():
        if k in avg_pos_hidden_states_dict:
            feature_vector_dict[k] = avg_neg_hidden_states_dict[k] - avg_pos_hidden_states_dict[k]
    return feature_vector_dict

# Called to prepare feature vectors
def prepare_injections():
    global cur_neg_hs, cur_pos_hs
    nt = Path(NEG_PROMPTS).read_text().splitlines()
    pt = Path(POS_PROMPTS).read_text().splitlines()
    for line in nt:
        if len(line) > 0:
            # saves cloned and detached hidden states per layer in cur_neg_hs
            add_negative_run(line)
    for line in pt:
        if len(line) > 0:
            # saves cloned and detached hidden states per layer in cur_pos_hs
            add_positive_run(line)
    # return value will be stored in the cur_neg_injections list for pre_hook
    return calculate_feature_vectors(cur_neg_hs, cur_pos_hs)
```
-
Just trying to work out if this can be adapted to work with control vectors:

```python
def pre_hook(hidden_states, *args, **kwargs):
    global cur_scaling_distribution, cur_neg_injections, intercept_module_names
    module_num = kwargs["module_num"]
    module_name = kwargs["module_name"]
    if hidden_states.shape[1] != 1:  # TODO: deal with prefill
        return hidden_states
    try:
        # scaling factor per module_num, between 0 and 1
        csd = cur_scaling_distribution[module_num]
        if csd > 0:
            for dl in cur_neg_injections:
                cfv = dl.get(module_num, None)
                if cfv is not None:
                    # ip = inner_product(hidden_states, cfv)
                    # if ip > 0:
                    #     sm = squared_magnitude(cfv)
                    #     hidden_states -= csd * cfv * ip / sm
                    hidden_states += csd * cfv  # simply add the control vector
    except:
        raise  # TODO
    return hidden_states
```

Someone on HF has been adapting control vectors to work with exl2 models: https://huggingface.co/gghfez/DarkMage-123b-exl2/discussions/1

Using this hook looks a lot simpler than trying to make a fake LoRA (as the ...). I'll link him to this thread.
-
I struggled for ages with trying to get the "abliteration" method to work to remove "positivity", but when you try to modify the weights you collapse both sides of the subspace... :(

I also found that you can replace the -1 scaler with -2 (e.g. turning the projection into a reflection). In theory you can use an (even) multiple of Householder transformations to rotate the vector space too, but no amount of rotation can actually get rid of "positivity", as the opposite end of the subspace just gets rotated at the same time... :(

In the end I concluded the only viable method was to do what you have done here at inference time:

```python
ip = inner_product(hidden_states, cfv)
if ip > 0:
```

but sadly the ...

It does make you wonder what (if anything) is on the other end of the "refusals" axis that gets collapsed when you do the "abliteration" by modifying the weights? I guess it would be interesting to flip your `ip > 0` test...
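For reference, a hedged sketch of the two weight edits being contrasted here, assuming `v` is the (unit) refusal/positivity direction and `W` is a weight matrix applied as `W @ x` so that `v` lives in its output space:

```python
import torch

def edit_weight(W: torch.Tensor, v: torch.Tensor, scaler: float) -> torch.Tensor:
    # W' = W + scaler * v v^T W
    # scaler = -1: project out the v component entirely ("abliteration"),
    #              which collapses *both* ends of the axis
    # scaler = -2: Householder reflection across the hyperplane orthogonal
    #              to v, which flips the axis instead of erasing it
    v = v / v.norm()
    return W + scaler * torch.outer(v, v) @ W
```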
-
I would love to see support for this in ExLlama. I have played around with this in llama.cpp, and it's a game-changer.
-
@jukofyork I personally wonder if the hidden state is really the right target for this. Why not the MLP intermediate state? Surely that's where concepts like "inappropriate" are most unambiguously expressed by the model, where you'd find the closest thing to a conditional expression, and where you could most easily intervene by just erasing activations (zeroing rows in the down projection, for instance).
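A minimal sketch of that idea, assuming a plain (non-gated) MLP for simplicity; `erase_idx` would be the intermediate channels identified as carrying the concept:

```python
import torch
import torch.nn.functional as F

def mlp_with_erasure(x, W_up, W_down, erase_idx):
    h = F.silu(x @ W_up)       # intermediate activations, shape (..., d_ff)
    h[..., erase_idx] = 0.0    # erase the "concept" channels; equivalent to
                               # zeroing the matching rows of W_down
    return h @ W_down
```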
-
Hey there!
Not sure if this was already discussed somewhere around here, but I stumbled across the idea of injecting noise during inference, BEFORE sampling.
See https://github.com/EGjoni/DRUGS and the discussion at https://www.reddit.com/r/LocalLLaMA/comments/18toidc/stop_messing_with_sampling_parameters_and_just/
Apart from the freaky name and far too many puns, I like the idea, but I wonder how it could be implemented in exllamav2 in a performant way...
@turboderp To be honest, I find it difficult to make a meaningful judgement as to whether the effort is worth it. What do you think?
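For what it's worth, a minimal sketch of how this might look with the hook API from the top of this thread (the 0.02 scale and per-token norm scaling are arbitrary assumptions, not DRUGS' actual method):

```python
import torch

def noisy_pre_hook(hidden_states, *args, **kwargs):
    # Add Gaussian noise proportional to the hidden state's magnitude,
    # perturbing the forward pass instead of the sampling stage
    scale = 0.02 * hidden_states.norm(dim = -1, keepdim = True)
    hidden_states += scale * torch.randn_like(hidden_states)
    return hidden_states
```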